Anatomy of a Learning Algorithm

Building Blocks of a Learning Algorithm

Every learning algorithm consists of three parts:

  1. A loss function;
  2. An optimization criterion based on the loss function (a cost function, for example);
  3. An optimization routine that uses training data to find a solution to the optimization criterion.
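As a sketch, the three building blocks can be illustrated for linear regression; the function names below are our own, chosen for illustration:

```python
import numpy as np

# 1. Loss function: the penalty for one wrong prediction (squared error here).
def loss(y_true, y_pred):
    return (y_true - y_pred) ** 2

# 2. Optimization criterion: average loss over the training set (mean squared error).
def criterion(w, b, x, y):
    return np.mean(loss(y, w * x + b))

# 3. Optimization routine: any procedure that lowers the criterion,
#    here one gradient descent step on w and b.
def step(w, b, x, y, lr=0.01):
    err = y - (w * x + b)
    w = w - lr * np.mean(-2 * x * err)
    b = b - lr * np.mean(-2 * err)
    return w, b
```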

Gradient Descent

Gradient descent is an iterative optimization algorithm for finding the minimum of a function. To find a local minimum of a function using gradient descent, one starts at some random point and takes steps proportional to the negative of the gradient (or approximate gradient) of the function at the current point. Gradient descent can be used to find optimal parameters for linear regression, logistic regression, SVM, and also neural networks, which we consider later. For many models, such as logistic regression or SVM, the optimization criterion is convex. Convex functions have only one minimum, which is global. Optimization criteria for neural networks are not convex, but in practice even finding a local minimum suffices.
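A minimal sketch of this procedure on the convex function f(x) = (x − 3)², whose only (and therefore global) minimum is at x = 3; the starting point, step size, and iteration count are arbitrary choices:

```python
# Gradient descent on f(x) = (x - 3)**2.

def gradient(x):
    # Derivative of (x - 3)**2 with respect to x.
    return 2 * (x - 3)

x = 0.0       # some starting point
alpha = 0.1   # step size (learning rate)
for _ in range(100):
    x -= alpha * gradient(x)   # step proportional to the negative gradient

print(round(x, 4))  # → 3.0
```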

How Gradient Descent Works

The linear regression model looks like f(x) = wx + b, where w is called the weight and b is called the bias. In order to get the optimal model, we have to find the optimal values for both w and b. We look for such values of w and b that minimize the mean squared error:

  l = (1/N) Σ_{i=1..N} (y_i − (w x_i + b))²
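The mean squared error can be checked on a tiny synthetic dataset (the numbers below are made up for illustration):

```python
import numpy as np

# Toy data generated by y = 2x + 1, so w = 2, b = 1 is a perfect fit.
x = np.array([1.0, 2.0, 3.0, 4.0])
y = np.array([3.0, 5.0, 7.0, 9.0])

def mse(w, b):
    # Mean squared error of the model f(x) = w*x + b on the data above.
    return np.mean((y - (w * x + b)) ** 2)

print(mse(2.0, 1.0))  # perfect fit → 0.0
print(mse(0.0, 0.0))  # → 41.0
```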

Gradient descent starts by calculating the partial derivative for every parameter.

To find the partial derivative of the term (y_i − (w x_i + b))² with respect to w, we apply the chain rule. Here we have the chain l = f², where f = y_i − (w x_i + b). To find the partial derivative of l with respect to w, we first find the partial derivative of l with respect to f, which is equal to 2f, and then multiply it by the partial derivative of f with respect to w, which is equal to −x_i. So overall:

  ∂l/∂w = (1/N) Σ_{i=1..N} −2 x_i (y_i − (w x_i + b))
  ∂l/∂b = (1/N) Σ_{i=1..N} −2 (y_i − (w x_i + b))
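These chain-rule derivatives can be sanity-checked against a numerical finite-difference approximation; the dataset, parameter values, and step size h below are arbitrary choices:

```python
import numpy as np

# Arbitrary toy data.
x = np.array([1.0, 2.0, 3.0])
y = np.array([2.0, 4.5, 6.5])

def l(w, b):
    # Mean squared error criterion.
    return np.mean((y - (w * x + b)) ** 2)

def grads(w, b):
    # Analytic partial derivatives from the chain rule.
    err = y - (w * x + b)
    dw = np.mean(-2 * x * err)  # ∂l/∂w
    db = np.mean(-2 * err)      # ∂l/∂b
    return dw, db

w, b = 0.5, 0.1
dw, db = grads(w, b)
h = 1e-6
# Central finite differences should agree with the analytic gradients.
num_dw = (l(w + h, b) - l(w - h, b)) / (2 * h)
num_db = (l(w, b + h) - l(w, b - h)) / (2 * h)
print(abs(dw - num_dw) < 1e-4)  # → True
print(abs(db - num_db) < 1e-4)  # → True
```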

We initialize w ← 0 and b ← 0 and then iterate through our training examples, each example having the form (x_i, y_i). For each example we update w and b using our partial derivatives. The learning rate α controls the size of an update:

  w_i ← w − α (−2 x_i (y_i − (w x_i + b)))
  b_i ← b − α (−2 (y_i − (w x_i + b)))

where w_i and b_i denote the values of w and b after using the example (x_i, y_i) for the update. One pass through all training examples is called an epoch.
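The per-example update loop described above can be sketched as follows; the synthetic dataset, learning rate α, and epoch count are assumptions made for illustration:

```python
import numpy as np

# Synthetic data from y = 2x + 1 plus a little noise.
rng = np.random.default_rng(0)
x = rng.uniform(0, 10, 100)
y = 2.0 * x + 1.0 + rng.normal(0, 0.1, 100)

w, b = 0.0, 0.0
alpha = 0.001
for epoch in range(50):           # one epoch = one pass over all examples
    for xi, yi in zip(x, y):
        err = yi - (w * xi + b)
        w -= alpha * (-2 * xi * err)  # update w with its partial derivative
        b -= alpha * (-2 * err)       # update b with its partial derivative

print(w, b)  # should approach w = 2, b = 1
```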

How Machine Learning Engineers Work

Machine learning engineers use libraries instead of implementing learning algorithms themselves. The most frequently used open-source library is scikit-learn:

def train(x, y):
	from sklearn.linear_model import LinearRegression
	# scikit-learn expects x as a 2D array of shape (n_samples, n_features)
	model = LinearRegression().fit(x, y)
	return model

model = train(x, y)

x_new = 23.0
# predict also expects a 2D array, hence the double brackets
y_new = model.predict([[x_new]])
print(y_new)