Bruh moment! Not about Machine Learning

I am not like him. Definitely not.

K-Nearest Neighbors

KNN is one of the most basic approaches in ML. Basically, it memorizes all the training data, and when a new data point comes, it measures the distance from that point to every memorized point; the new point is then classified into the same class as the majority of its nearest neighbors. KNN has one critical parameter, which gives the approach its name: k, the number of neighbors. All the results can change depending on this parameter. The following figure explains KNN in one picture.

Blue (I am a boy, so I can only name a few colors) triangles are Class 1, rectangles are Class 2, and the purple triangle is the new data point. The circles mark the distances. More in the list below.
  • I am not sure if it is an advantage, but I am totally sure it is not cool: there is no training time. As discussed above, KNN just memorizes the training data. Fast to train, slow to test.
  • Easy to implement.
  • New data can be added to the model at any time to improve its performance.
  • Not a good choice for large datasets.
  • Very sensitive to noise and missing data.
  • All features must be normalized to the same scale.
  • Performance degrades on high-dimensional data.
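To make the "memorize, measure, vote" idea concrete, here is a minimal sketch in plain Python. The points are made up to echo the figure (triangles vs. rectangles); nothing here comes from a real dataset or library:

```python
import math
from collections import Counter

# Hypothetical 2-D points echoing the figure: triangles (Class1) vs rectangles (Class2).
train_X = [(1.0, 1.0), (1.5, 2.0), (2.0, 1.5), (5.0, 5.0), (6.0, 5.5), (5.5, 6.0)]
train_y = ["Class1", "Class1", "Class1", "Class2", "Class2", "Class2"]

def knn_predict(train_X, train_y, x, k=3):
    """Classify x by majority vote among its k nearest memorized points."""
    # "Training" already happened: the two lists above ARE the model.
    dists = sorted((math.dist(p, x), label) for p, label in zip(train_X, train_y))
    # Majority vote among the k closest neighbors.
    votes = Counter(label for _, label in dists[:k])
    return votes.most_common(1)[0][0]

print(knn_predict(train_X, train_y, (2.0, 2.0), k=3))  # the "purple triangle" -> Class1
```

Try changing k to see the sensitivity mentioned above: with a different k, the vote among the neighbors — and therefore the answer — can flip.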

Support Vector Machines (SVM)

Ok, now things are getting more complicated. SVM is a very powerful method. Basically, an SVM fits its decision boundary using only the few most challenging data points; these are called support vectors. Baaaaam! I added two figures to illustrate what is going on: Figure 1 shows an R² environment and Figure 2 an R³ environment. Ok, I know I lost you in that last sentence, so let me simplify it like this: if the space is linearly separable, look at Figure 1. If it is not, look at Figure 2, which shows an example of a kernel operation (I will talk about kernels in later posts). This is a very useful trick.

Figure 1: Two classes; the separating hyperplane is shown in blue.
Figure 2: Support Vector Machines with the mlr package, 10 November 2019, hefinioanrhys, [https://www.r-bloggers.com/2019/10/support-vector-machines-with-the-mlr-package/], last access: 06.12.2020
  • Good performance on high-dimensional data
  • Performs better than many other methods when data is scarce.
  • Memory efficient
  • It can be a rival to neural networks.
  • It cannot be used directly as a regressor; you need the SVR (Support Vector Regression) variant for that.
  • Bad naming. SVM, hmmm… it could be better (my opinion).
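The "only the hardest points matter" idea can be sketched with a tiny linear SVM trained by sub-gradient descent on the L2-regularized hinge loss. Everything below — the data points, the learning rate, the regularization strength — is my own made-up toy setup, not code from any real SVM library:

```python
import random

# Hypothetical linearly separable points in R^2: class +1 vs class -1.
X = [(2.0, 2.5), (3.0, 3.0), (2.5, 3.5), (-2.0, -2.5), (-3.0, -3.0), (-2.5, -3.5)]
y = [1, 1, 1, -1, -1, -1]

def train_linear_svm(X, y, lam=0.01, eta=0.1, epochs=100, seed=0):
    """Rough sketch: sub-gradient descent on the hinge loss + L2 penalty."""
    rng = random.Random(seed)
    w, b = [0.0, 0.0], 0.0
    idx = list(range(len(X)))
    for _ in range(epochs):
        rng.shuffle(idx)
        for i in idx:
            margin = y[i] * (w[0] * X[i][0] + w[1] * X[i][1] + b)
            # L2 regularization shrinks w a little on every step.
            w = [wj * (1 - eta * lam) for wj in w]
            if margin < 1:  # inside the margin: this point acts like a support vector
                w = [wj + eta * y[i] * xj for wj, xj in zip(w, X[i])]
                b += eta * y[i]
    return w, b

def predict(w, b, x):
    return 1 if w[0] * x[0] + w[1] * x[1] + b >= 0 else -1

w, b = train_linear_svm(X, y)
print([predict(w, b, x) for x in X])
```

Notice that only points with margin < 1 ever update the weights: the easy, far-away points contribute nothing, which is exactly the support-vector idea from the paragraph above.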

Decision Trees

Decision trees are the algorithm we all use in our daily lives; unconsciously, we all follow this kind of decision flow. Accept it or not, everybody has that spirit! Basically, a decision tree is a sequence of binary decisions optimized to fit the input data.

Figure3: Decision Trees
  • Easy to interpret.
  • White box man! White box.
  • Can be merged with other decision trees
  • Very sensitive to changes in data
  • Generally, the accuracy of a single tree is not adequate, so instead of decision trees we use forests.
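A decision tree really is just learned if/else questions, which is why it is such a white box. Here is a minimal sketch that greedily picks binary splits by Gini impurity; the toy "daily life" data and the max_depth choice are my own assumptions for illustration:

```python
from collections import Counter

# Hypothetical "daily life" data: (free_hours, is_raining) -> what to do.
data = [
    ((3, 0), "go out"),
    ((4, 0), "go out"),
    ((3, 1), "stay in"),
    ((1, 0), "stay in"),
    ((1, 1), "stay in"),
    ((5, 0), "go out"),
]

def gini(labels):
    """Gini impurity: 0 means the group is pure (one class only)."""
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())

def best_split(rows):
    """Try every (feature, threshold) binary split and keep the purest one."""
    best, best_score = None, float("inf")
    for f in range(len(rows[0][0])):
        for t in sorted({x[f] for x, _ in rows}):
            left = [lbl for x, lbl in rows if x[f] <= t]
            right = [lbl for x, lbl in rows if x[f] > t]
            if not left or not right:
                continue
            score = (len(left) * gini(left) + len(right) * gini(right)) / len(rows)
            if score < best_score:
                best_score, best = score, (f, t)
    return best

def build(rows, depth=0, max_depth=2):
    labels = [lbl for _, lbl in rows]
    split = best_split(rows)
    if depth == max_depth or gini(labels) == 0.0 or split is None:
        return Counter(labels).most_common(1)[0][0]  # leaf: majority class
    f, t = split
    left = build([r for r in rows if r[0][f] <= t], depth + 1, max_depth)
    right = build([r for r in rows if r[0][f] > t], depth + 1, max_depth)
    return (f, t, left, right)

def predict(tree, x):
    # Walk the binary decisions until we hit a leaf label: a true white box.
    while isinstance(tree, tuple):
        f, t, left, right = tree
        tree = left if x[f] <= t else right
    return tree

tree = build(data)
print(predict(tree, (4, 1)))
```

You can print the learned tuple and read the whole model off in one glance — that interpretability is the "white box" advantage from the list, and the sensitivity to small data changes is why forests average many such trees.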
