def train(images, labels):
    # Machine Learning!
    return model

def predict(model, test_images):
    # Use model to predict labels
    return test_labels
In a nearest neighbor classifier, the train function simply memorizes all data and labels:
def train(images, labels):
    # Machine Learning!
    return model
The predict function returns the label of the most similar training image:
def predict(model, test_images):
    # Use model to predict labels
    return test_labels
L1 distance:
$$d_1(I_1, I_2) = \sum_{P} \left| I_1^{P} - I_2^{P} \right|$$
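To make the L1 distance concrete, here is a minimal sketch that evaluates it on two small arrays. The pixel values and the helper name `l1_distance` are made up for illustration; the only assumption is that both images have the same shape.

```python
import numpy as np

# Two hypothetical 4x4 single-channel "images" (values chosen for illustration).
# A signed integer dtype is used on purpose: subtracting uint8 arrays would
# wrap around instead of producing negative differences.
I1 = np.array([[56, 32, 10, 18],
               [90, 23, 128, 133],
               [24, 26, 178, 200],
               [2, 0, 255, 220]], dtype=np.int64)
I2 = np.array([[10, 20, 24, 17],
               [8, 10, 89, 100],
               [12, 16, 178, 170],
               [4, 32, 233, 112]], dtype=np.int64)

def l1_distance(a, b):
    """Sum of absolute pixel-wise differences between two equally shaped arrays."""
    return int(np.sum(np.abs(a - b)))

d = l1_distance(I1, I2)
print(d)  # 456
```

The distance of an image to itself is 0, and the value grows as more pixels differ or differ by larger amounts.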
import numpy as np

class NearestNeighbor:
    def __init__(self):
        pass

    def train(self, X, y):
        """ X is N x D where each row is an example. y is 1-dimensional of size N """
        # the nearest neighbor classifier simply remembers all the training data
        self.Xtr = X
        self.ytr = y

    def predict(self, X):
        """ X is N x D where each row is an example we wish to predict a label for """
        num_test = X.shape[0]
        # let's make sure that the output type matches the input type
        Ypred = np.zeros(num_test, dtype=self.ytr.dtype)
        # loop over all test rows
        for i in range(num_test):
            # find the nearest training image to the i'th test image
            # using the L1 distance (sum of absolute value differences)
            distances = np.sum(np.abs(self.Xtr - X[i, :]), axis=1)
            # get the index with the smallest distance
            min_index = np.argmin(distances)
            # predict the label of the nearest example
            Ypred[i] = self.ytr[min_index]
        return Ypred
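A short usage sketch of the classifier above may help. The class is repeated in compact form so the snippet runs on its own, and the toy training and test arrays (`Xtr`, `ytr`, `Xte`) are invented for illustration, not real image data.

```python
import numpy as np

class NearestNeighbor:
    def train(self, X, y):
        # remember all the training data
        self.Xtr = X
        self.ytr = y

    def predict(self, X):
        num_test = X.shape[0]
        Ypred = np.zeros(num_test, dtype=self.ytr.dtype)
        for i in range(num_test):
            # L1 distance from test row i to every training row
            distances = np.sum(np.abs(self.Xtr - X[i, :]), axis=1)
            Ypred[i] = self.ytr[np.argmin(distances)]
        return Ypred

# Hypothetical toy data: 3 training examples with 4 features each.
Xtr = np.array([[0, 0, 0, 0],
                [10, 10, 10, 10],
                [20, 20, 20, 20]], dtype=np.int64)
ytr = np.array([0, 1, 2])

# Two test examples, each close to one of the training rows.
Xte = np.array([[1, 0, 2, 0],
                [19, 21, 18, 20]], dtype=np.int64)

nn = NearestNeighbor()
nn.train(Xtr, ytr)
pred = nn.predict(Xte)
print(pred)  # [0 2]
```

The first test row is at L1 distance 3 from the first training row, and the second is at distance 4 from the third training row, so the predicted labels are 0 and 2.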
Q & A:
Q: How fast are training and prediction?
A: Training is O(1); prediction is O(N) per test example, where N is the number of training examples.
This is bad: we want classifiers that are fast at prediction, while slow training is acceptable.
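The O(N) prediction cost can be illustrated by counting how many training examples each query is compared against. This is only a sketch: the function name `count_distance_evals`, the feature dimension, and the random data are assumptions made for the example.

```python
import numpy as np

def count_distance_evals(num_train, num_test, dim=8, seed=0):
    """Count how many training rows are compared against during prediction."""
    rng = np.random.default_rng(seed)
    Xtr = rng.integers(0, 256, size=(num_train, dim))
    Xte = rng.integers(0, 256, size=(num_test, dim))
    evals = 0
    for i in range(num_test):
        # one L1 distance per training row, so N distances per test row
        distances = np.sum(np.abs(Xtr - Xte[i, :]), axis=1)
        evals += distances.shape[0]
    return evals

# Doubling the training set doubles the work for a single query.
for n in (100, 200, 400):
    print(n, count_distance_evals(n, num_test=1))
```

Training, by contrast, only stores references to the arrays, so its cost does not depend on N.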