Here we look at the Pocket algorithm, which addresses an important practical issue with the PLA: its instability and lack of convergence on non-separable training datasets.
Generating Non-Separable Training Datasets
A minor modification of the code from the previous post on generating artificial linearly separable datasets lets us generate “almost” separable data, i.e. data where the number of points that violate linear separability can be controlled and the maximum violation distance from the “true” decision boundary is a parameter. The code below uses the function value_on instead of class_of (see the Classifier Python code in one of the previous posts). That allows the class of a data point to be picked randomly within a band of specified “width” around the classifier’s decision boundary. (The width here is in quotes because the classifier’s linear function vect_w is not normalized.)
import numpy as np

def nearly_separable_2d(seed, n_points, classifier, boundary_width):
    '''Generates n_points which are not linearly separable, with
    a controlled degree of inseparability (in 2d). The classifier has
    to be able to return the floating-point value of the underlying
    linear function.
    :param seed: random seed for the random generator,
    :param n_points: number of points to generate,
    :param classifier: a linear function returning a floating-point
        value for each point in 2d,
    :param boundary_width: instead of comparing
        classifier.value_on(data_point) with 0, we compare it with a
        random number in the range [-boundary_width, boundary_width]
        to assign the class +1 or -1 to a data point.
    '''
    np.random.seed(seed)
    dim_x = 2
    data_dim = dim_x + 1 + 1  # leading 1 and class value
    data = np.ones(data_dim * n_points).reshape(n_points, data_dim)
    # fill in random coordinates in [-1, 1)
    data[:, 1] = -1 + 2*np.random.rand(n_points)
    data[:, 2] = -1 + 2*np.random.rand(n_points)
    # in the vicinity of the boundary, generate a random threshold
    # to decide on the class of the data point:
    rand_boundary = -1 + 2*np.random.rand(n_points)
    rand_boundary *= boundary_width
    # TODO: use the numpy way of applying a function to rows
    for idx in range(n_points):
        # value_on is assumed to use only the leading (1, x1, x2)
        # part of the row, ignoring the trailing class slot
        curr_val = classifier.value_on(data[idx])
        data[idx, -1] = 1 if rand_boundary[idx] < curr_val else -1
    return data
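For example, assuming the Classifier class from the earlier post can be constructed from a weight vector (its exact constructor lives in that post, so treat this as a hypothetical sketch; the weights and boundary_width below are arbitrary choices), a dataset like the one plotted below could be generated with:

import numpy as np

# hypothetical "true" classifier with (unnormalized) weights vect_w
true_classifier = Classifier(np.array([0.0, 1.0, -1.0]))
data = nearly_separable_2d(seed=42, n_points=300,
                           classifier=true_classifier,
                           boundary_width=0.2)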
Below is a plot of 300 training points generated using the code above. Note that the separability violators are concentrated around the decision line of the classifier used to generate the data.

Pocket PLA
The modifications needed to turn the PLA into the Pocket algorithm are straightforward: in addition to the usual update, the algorithm keeps (“pockets”) the best weight vector seen so far, and termination now checks both the convergence criterion and the maximum iteration limit. A minimal sketch is given below.
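The sketch assumes the data layout produced by nearly_separable_2d above; pocket_pla and count_errors are hypothetical helpers written for this sketch, not code from the earlier PLA posts:

import numpy as np

def count_errors(x, y, w):
    '''Number of training points misclassified by weights w.'''
    return int(np.sum(np.sign(x @ w) != y))

def pocket_pla(data, max_iter=100):
    '''Minimal Pocket PLA sketch. Rows of data are (1, x1, x2, class),
    as produced by nearly_separable_2d above.'''
    x = data[:, :-1]  # points with the leading 1
    y = data[:, -1]   # class labels, +1 or -1
    vect_w = np.zeros(x.shape[1])
    pocket_w = vect_w.copy()
    pocket_errors = count_errors(x, y, vect_w)
    for _ in range(max_iter):
        misclassified = np.nonzero(np.sign(x @ vect_w) != y)[0]
        if misclassified.size == 0:
            break  # converged: the data turned out to be separable
        # standard PLA update on one misclassified point
        # (here the first one; a random choice works too)
        idx = misclassified[0]
        vect_w = vect_w + y[idx] * x[idx]
        # pocket step: keep the best weights seen so far
        errors = count_errors(x, y, vect_w)
        if errors < pocket_errors:
            pocket_errors = errors
            pocket_w = vect_w.copy()
    return pocket_w

Unlike the plain PLA, the returned pocket_w is not necessarily the last weight vector: on non-separable data the final iterate can be worse than an earlier one, which is exactly what the pocket step guards against.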
Animations
The following two animations show the behavior of the Pocket PLA on artificial non-separable datasets generated by the function nearly_separable_2d above. The maximum iteration number was set to 100. Both animations show that updates on later steps mostly happen on the data points that violate linear separability. Click on the thumbnail to see the gif animation.