Fundamental Algorithms
Linear Regression
Linear regression is a popular regression learning algorithm that learns a model which is a linear combination of features of the input example.
Problem Statement
We have a collection of labeled examples $\{(\mathbf{x}_i, y_i)\}_{i=1}^N$, where $N$ is the size of the collection, $\mathbf{x}_i$ is the $D$-dimensional feature vector of example $i = 1, \ldots, N$, and $y_i$ is a real-valued target. We want to build a model of the form $f_{\mathbf{w},b}(\mathbf{x}) = \mathbf{w}\mathbf{x} + b$, where $\mathbf{w}$ is a $D$-dimensional vector of parameters and $b$ is a real number. The notation $f_{\mathbf{w},b}$ indicates that the model $f$ is parametrized by the two values $\mathbf{w}$ and $b$.
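To make the model concrete, prediction is just a dot product plus a bias. A minimal sketch in Python with NumPy; the parameter values below are made up for illustration, not learned:

```python
import numpy as np

# Sketch of the linear model f_{w,b}(x) = w.x + b.
# The weights and bias are illustrative values, not learned ones.
w = np.array([0.5, -1.2, 3.0])  # D-dimensional parameter vector
b = 0.7                         # real-valued bias term

def predict(x):
    """Return the model's prediction for a feature vector x."""
    return np.dot(w, x) + b

x = np.array([1.0, 2.0, 0.5])
print(predict(x))  # w.x + b for this example
```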
Solution
The optimization procedure we use to find the optimal values $\mathbf{w}^*$ and $b^*$ minimizes the following expression: $\frac{1}{N}\sum_{i=1}^{N}\left(f_{\mathbf{w},b}(\mathbf{x}_i) - y_i\right)^2$.
In mathematics, the expression we minimize or maximize is called an objective function, or simply an objective. The expression $(f_{\mathbf{w},b}(\mathbf{x}_i) - y_i)^2$ in the objective above is called the loss function; here it is the squared error loss. Adrien-Marie Legendre, who first published the sum of squares method for gauging the quality of a model, stated that squaring the error before summing is convenient. Why did he say that? The absolute value is not convenient, because it doesn't have a continuous derivative, which makes the function not smooth. Functions that are not smooth create unnecessary difficulties when employing linear algebra to find closed form solutions to optimization problems. Closed form solutions for finding an optimum of a function are simple algebraic expressions and are often preferable to numerical optimization methods such as gradient descent.
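To make the closed-form idea concrete, here is a sketch (not from the original text) that fits $\mathbf{w}$ and $b$ via the normal equations on tiny synthetic data; the data and the trick of appending a constant-1 column for the bias are illustrative choices:

```python
import numpy as np

# Closed-form fit via the normal equations: augment each x with a
# constant 1 so the bias b is learned as an extra weight.
# Synthetic data chosen so that y = 2x + 1 exactly.
X = np.array([[0.0], [1.0], [2.0], [3.0]])
y = np.array([1.0, 3.0, 5.0, 7.0])

X_aug = np.hstack([X, np.ones((X.shape[0], 1))])  # append the 1-column
# Solve (X^T X) theta = X^T y for theta = [w, b]
theta = np.linalg.solve(X_aug.T @ X_aug, X_aug.T @ y)
w, b = theta[:-1], theta[-1]
print(w, b)  # recovers w close to [2.0] and b close to 1.0
```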
Logistic Regression
The first thing to say is that logistic regression is not a regression, but a classification learning algorithm. The name comes from statistics and is due to the fact that the mathematical formulation of logistic regression is similar to that of linear regression.
Problem statement
In logistic regression, we still want to model $y_i$ as a linear function of $\mathbf{x}_i$; however, with a binary $y_i$ this is not straightforward, because a linear combination of features spans from minus infinity to plus infinity while $y_i$ has only two possible values. One function that maps any real value into the interval $(0, 1)$ is the standard logistic function, also known as the sigmoid: $f(x) = \frac{1}{1 + e^{-x}}$. The logistic regression model then looks like this: $f_{\mathbf{w},b}(\mathbf{x}) = \frac{1}{1 + e^{-(\mathbf{w}\mathbf{x} + b)}}$, where $\mathbf{w}$ is a $D$-dimensional vector of parameters and $b$ is a real number, as in linear regression.
Solution
In logistic regression, instead of using squared loss and trying to minimize the empirical risk, we maximize the likelihood of our training set according to the model. In practice, it is more convenient to maximize the log-likelihood: $\frac{1}{N}\sum_{i=1}^{N}\left[y_i \ln f_{\mathbf{w},b}(\mathbf{x}_i) + (1 - y_i)\ln\left(1 - f_{\mathbf{w},b}(\mathbf{x}_i)\right)\right]$.
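Unlike linear regression, this maximization has no closed-form solution, so a numerical method such as gradient ascent is typically used. A minimal sketch; the toy data, learning rate, and iteration count are illustrative choices:

```python
import numpy as np

# Gradient-ascent sketch for maximizing the average log-likelihood
# of logistic regression on toy 1-D data (class 1 roughly when x > 2).
def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

X = np.array([[0.0], [1.0], [3.0], [4.0]])
y = np.array([0.0, 0.0, 1.0, 1.0])

w = np.zeros(X.shape[1])
b = 0.0
lr = 0.5  # illustrative learning rate
for _ in range(2000):
    p = sigmoid(X @ w + b)           # predicted probabilities
    grad_w = X.T @ (y - p) / len(y)  # gradient of avg log-likelihood
    grad_b = np.mean(y - p)
    w += lr * grad_w                 # ascend: we maximize, not minimize
    b += lr * grad_b

print(sigmoid(X @ w + b))  # probabilities close to [0, 0, 1, 1]
```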
Decision Tree Learning
A decision tree is an acyclic graph that can be used to make decisions. In each branching node of the graph, a specific feature j of the feature vector is examined. If the value of the feature is below a specific threshold, the left branch is followed; otherwise, the right branch is followed. Once a leaf node is reached, the decision is made about the class to which the example belongs.
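The traversal described above can be sketched as follows; the node structure, features, and thresholds are made up for illustration:

```python
# Sketch of decision-tree prediction: each branching node tests one
# feature against a threshold; leaves hold a class label.
class Node:
    def __init__(self, feature=None, threshold=None,
                 left=None, right=None, label=None):
        self.feature = feature      # index j of the tested feature
        self.threshold = threshold  # split value
        self.left = left            # followed when x[j] < threshold
        self.right = right          # followed otherwise
        self.label = label          # class label (leaf nodes only)

def predict(node, x):
    """Walk from the root to a leaf, following the branch tests."""
    while node.label is None:
        node = node.left if x[node.feature] < node.threshold else node.right
    return node.label

# Tiny hand-built tree: split on feature 0 at 2.5, then feature 1 at 1.0.
tree = Node(feature=0, threshold=2.5,
            left=Node(label=0),
            right=Node(feature=1, threshold=1.0,
                       left=Node(label=0),
                       right=Node(label=1)))

print(predict(tree, [3.0, 2.0]))  # → 1
print(predict(tree, [1.0, 2.0]))  # → 0
```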
Problem Statement
Given a collection of labeled examples with labels in $\{0, 1\}$, build a simple decision tree classifier.
Solution
There are various formulations of the decision tree learning algorithm. One of them is ID3. The optimization criterion, in this case, is the average log-likelihood: $\frac{1}{N}\sum_{i=1}^{N}\left[y_i \ln f_{ID3}(\mathbf{x}_i) + (1 - y_i)\ln\left(1 - f_{ID3}(\mathbf{x}_i)\right)\right]$, where $f_{ID3}$ is the decision tree.
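One step of ID3-style learning can be sketched as a search for the split that minimizes the weighted entropy of the two resulting leaves, which is closely related to maximizing the criterion above when each leaf predicts the empirical class frequency. The toy data below are illustrative:

```python
import numpy as np

# Sketch of one ID3-style split decision: choose the (feature,
# threshold) pair minimizing the weighted entropy of the two leaves.
def entropy(labels):
    """Binary entropy (in bits) of a 0/1 label array."""
    if len(labels) == 0:
        return 0.0
    p = np.mean(labels)  # fraction of class-1 examples
    if p in (0.0, 1.0):
        return 0.0
    return -p * np.log2(p) - (1 - p) * np.log2(1 - p)

def best_split(X, y):
    """Return (feature, threshold) with the lowest weighted entropy."""
    best = (None, None, float("inf"))
    n = len(y)
    for j in range(X.shape[1]):
        for t in np.unique(X[:, j]):
            mask = X[:, j] < t
            score = (mask.sum() / n) * entropy(y[mask]) + \
                    ((~mask).sum() / n) * entropy(y[~mask])
            if score < best[2]:
                best = (j, t, score)
    return best[0], best[1]

X = np.array([[1.0], [2.0], [3.0], [4.0]])
y = np.array([0, 0, 1, 1])
print(best_split(X, y))  # splits on feature 0 at threshold 3.0
```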
Support Vector Machine
K-Nearest Neighbors
K-Nearest Neighbors (KNN) is a non-parametric learning algorithm. Contrary to other learning algorithms that allow discarding the training data after the model is built, KNN keeps all training examples in memory. Once a new, previously unseen example x comes in, the KNN algorithm finds k training examples closest to x and returns the majority label (in case of classification) or the average label (in case of regression). The closeness of two points is given by a distance function. For example, the Euclidean distance seen above is frequently used in practice. Another popular choice of the distance function is the negative cosine similarity. Cosine similarity, defined as
$s(\mathbf{x}, \mathbf{x}') = \cos\left(\angle(\mathbf{x}, \mathbf{x}')\right) = \frac{\sum_{j=1}^{D} x^{(j)} x'^{(j)}}{\sqrt{\sum_{j=1}^{D} \left(x^{(j)}\right)^2}\sqrt{\sum_{j=1}^{D} \left(x'^{(j)}\right)^2}},$
is a measure of similarity of the directions of two vectors. If the angle between two vectors is 0 degrees, the two vectors point in the same direction and the cosine similarity is 1. If the vectors are orthogonal, the cosine similarity is 0. For vectors pointing in opposite directions, the cosine similarity is -1. To use cosine similarity as a distance metric, we need to multiply it by -1.
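A minimal sketch of cosine similarity and the negative-cosine distance, checking the three cases just described:

```python
import numpy as np

# Cosine similarity and the negative-cosine distance used with KNN.
def cosine_similarity(a, b):
    return np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b))

def cosine_distance(a, b):
    # Multiplying by -1 turns the similarity into a distance-like score.
    return -cosine_similarity(a, b)

a = np.array([1.0, 0.0])
print(cosine_similarity(a, np.array([2.0, 0.0])))   # same direction → 1.0
print(cosine_similarity(a, np.array([0.0, 3.0])))   # orthogonal → 0.0
print(cosine_similarity(a, np.array([-1.0, 0.0])))  # opposite → -1.0
```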
Other popular distance metrics include Chebyshev distance, Mahalanobis distance, and Hamming distance.
The choice of the distance metric, as well as the value for k, are choices the analyst makes before running the algorithm. So these are hyperparameters.
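Putting the pieces together, here is a minimal KNN classifier sketch using Euclidean distance and majority voting; the data and the value of k are illustrative:

```python
import numpy as np
from collections import Counter

# Minimal KNN classifier: find the k nearest training examples and
# return the majority label. The metric and k are the hyperparameters.
def knn_predict(X_train, y_train, x, k=3):
    dists = np.linalg.norm(X_train - x, axis=1)  # Euclidean distances
    nearest = np.argsort(dists)[:k]              # indices of k closest
    return Counter(y_train[nearest]).most_common(1)[0][0]  # majority vote

X_train = np.array([[0.0, 0.0], [0.0, 1.0], [1.0, 0.0],
                    [5.0, 5.0], [5.0, 6.0], [6.0, 5.0]])
y_train = np.array([0, 0, 0, 1, 1, 1])

print(knn_predict(X_train, y_train, np.array([0.2, 0.2])))  # → 0
print(knn_predict(X_train, y_train, np.array([5.5, 5.5])))  # → 1
```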