The Perceptron Learning Algorithm (PLA) is one of the simplest Machine Learning (ML) algorithms. This post goes through an elementary example to illustrate how the PLA works.

# Notes on Perceptron Learning Algorithm

We start with some definitions and an outline of the classification problem; next we move on to the plain vanilla perceptron learning algorithm.

## Linear Classifier

Consider a set of points $X \subset \mathbb{R}^{d+1}$ with the first coordinate fixed to $1$ and a mapping $f: X \to \{-1, +1\}$, which maps elements of $X$ to two different classes denoted by elements of $\{-1, +1\}$. The two classes are assumed to be linearly separable, i.e. there exists a vector $w \in \mathbb{R}^{d+1}$ such that $f(x) = \mathrm{sign}(w * x)$, where $*$ is the dot product on $\mathbb{R}^{d+1}$. For the sake of the following discussion we will define $\mathrm{sign}(0) = +1$. Such a mapping is **a linear classifier** (on the set $X$). The set of the pairs

$$D = \{(x_n, y_n)\}_{n=1}^{N}, \qquad y_n = f(x_n),$$

is called the **training data set**.
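In code, such a classifier is just the sign of a dot product. Below is a minimal Python sketch; the helper names `sign` and `classify` are mine, introduced only for illustration:

```python
import numpy as np

def sign(v):
    # the convention adopted above: sign(0) = +1
    return 1 if v >= 0 else -1

def classify(w, x):
    # linear classifier h(x) = sign(w * x); x[0] is assumed to be 1
    return sign(np.dot(w, x))
```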

## Perceptron Learning Algorithm (PLA)

Given a training data set $D$, the goal of the PLA is to find some $w$ that provides correct classification on $D$, i.e. such that $\mathrm{sign}(w * x_n) = y_n$ holds for all pairs $(x_n, y_n)$ in the training set. Obviously, for a linearly separable training data set such a classifier is not unique.

The algorithm is iterative. On each iteration $t$ we obtain a new candidate value for $w$, which we will denote as $w_t$, and the corresponding linear mapping will be denoted as $h_t(x) = \mathrm{sign}(w_t * x)$.

Initialization. Assign $w_0 = 0$.

Step $t$.

- Compute the values $h_t(x_n)$ on the training set until we find a misclassified pair, say, $h_t(x_n) \neq y_n$ for some pair $(x_n, y_n)$.
- Update the candidate using the rule: $w_{t+1} = w_t + y_n x_n$.

Termination. Stop iterations when all points are correctly classified.

The intuition behind the update method (item 2) is that it adjusts $w$ to improve its performance on the selected misclassified point:

$$w_{t+1} * x_n = (w_t + y_n x_n) * x_n = w_t * x_n + y_n \|x_n\|^2,$$

and since always $\|x_n\|^2 > 0$, due to the first non-zero coordinate, the value of $w_{t+1} * x_n$ is closer to having the same sign as $y_n$ than $w_t * x_n$ is. Since each step improves performance only on a single training pair (and may actually make classification performance worse on some or all other pairs), it is not obvious why the PLA converges to a solution of the linear classification problem. The proof will be discussed in one of the future posts. Here we will use an extremely simple example to illustrate the PLA.
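To make the steps above concrete, here is a minimal NumPy sketch of the algorithm just described; the function name `pla` and the `max_updates` safety cap are my additions (the cap only guards against non-separable input, on which the plain PLA would loop forever):

```python
import numpy as np

def sign(v):
    # convention from above: sign(0) = +1
    return 1 if v >= 0 else -1

def pla(X, y, max_updates=1000):
    """Plain vanilla PLA.

    X -- array of shape (N, d+1); each row is a point with first coordinate 1.
    y -- array of shape (N,) with labels +1 / -1.
    """
    w = np.zeros(X.shape[1])                  # initialization: w_0 = 0
    for _ in range(max_updates):
        misclassified = [n for n in range(len(y))
                         if sign(np.dot(w, X[n])) != y[n]]
        if not misclassified:                 # termination: all points correct
            return w
        n = misclassified[0]                  # a misclassified pair (x_n, y_n)
        w = w + y[n] * X[n]                   # update: w_{t+1} = w_t + y_n * x_n
    raise RuntimeError("no convergence within max_updates updates")
```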

## Example: One-dimensional Training Data Set.

The training data set consists of two training points:

$$x_1 = (1, -1), \; y_1 = -1; \qquad x_2 = (1, \tfrac{1}{2}), \; y_2 = +1.$$


While this is probably the simplest example that can be devised to illustrate the PLA, it shows the idea behind the algorithm in a clear, unobstructed way. Executing the PLA step by step, we first initialize $w_0 = (0, 0)$.

Iteration 1. We obtain:

$$h_0(x_1) = \mathrm{sign}(w_0 * x_1) = \mathrm{sign}(0) = +1;$$

$$h_0(x_2) = \mathrm{sign}(w_0 * x_2) = \mathrm{sign}(0) = +1.$$

Note that $h_0$ classifies the entire range of real values of $x$ as $+1$. Out of the two training data points only the first one is misclassified: the desired value $y_1$ is $-1$ while $h_0$ returns $+1$. This results in an update:

$$w_1 = w_0 + y_1 x_1 = (0, 0) - (1, -1) = (-1, 1).$$

Iteration 2. With the new candidate $w_1$ we obtain the following values on the data points:

$$h_1(x_1) = \mathrm{sign}(w_1 * x_1) = \mathrm{sign}(-2) = -1;$$

$$h_1(x_2) = \mathrm{sign}(w_1 * x_2) = \mathrm{sign}(-\tfrac{1}{2}) = -1.$$

As a side note, the decision boundary between the two classes is now located at $x = 1$ (while for $h_0$ it was at $-\infty$).

The value of the decision boundary is obtained by solving the equation:

$$w_1 * (1, x) = -1 + x = 0,$$

or, equivalently, $x = 1$.

The misclassified point $x_2$ leads to an update:

$$w_2 = w_1 + y_2 x_2 = (-1, 1) + (1, \tfrac{1}{2}) = (0, \tfrac{3}{2}).$$

That moves the decision boundary to the root of the equation $w_2 * (1, x) = \tfrac{3}{2} x = 0$, i.e. to $x = 0$, and results in all training data points being classified correctly:

$$h_2(x_1) = \mathrm{sign}(-\tfrac{3}{2}) = -1 = y_1; \qquad h_2(x_2) = \mathrm{sign}(\tfrac{3}{4}) = +1 = y_2.$$
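As a sanity check, running the `pla` sketch from the previous section on this training set reproduces the trace above:

```python
import numpy as np

# the two training points of the example
X = np.array([[1.0, -1.0],
              [1.0,  0.5]])
y = np.array([-1, +1])

w = pla(X, y)          # requires the pla sketch defined earlier
print(w)               # [0.  1.5], i.e. w_2 = (0, 3/2)
print(-w[0] / w[1])    # root of w[0] + w[1] * x = 0: the decision boundary, 0.0
```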

## References

The PLA is covered in many books and articles. The AML Book listed below is easily accessible, well written, and has a corresponding MOOC.

[1] Yaser S. Abu-Mostafa, Malik Magdon-Ismail, Hsuan-Tien Lin. “Learning From Data.” AMLBook, 213 pages, 2012.