KNN
#Computers
A way to classify a data point based on its "similarity distance" to the $\displaystyle k$ nearest neighbors.
Topics
Steps to Classify (based on this animation)
- Initialize
- Place the data point in a feature space
- Calculate Distance
- Compute the Euclidean distance between the data point and every labeled data point
- May need to normalize feature values to z-scores along each coordinate so that no single feature dominates the distance
- Sort Distance
- Sort the distances to the labeled data points in increasing order
- For classification, the most common class among the $\displaystyle k$ nearest neighbors determines the data point's class
- For regression, use the average of the $\displaystyle k$ nearest neighbors' labels
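The steps above can be sketched in plain Python (the helper name and toy data here are my own, not from the course):

```python
from collections import Counter
import math

def knn_predict(query, points, labels, k=3):
    """Classify `query` by majority vote among its k nearest neighbors."""
    # Calculate Distance: Euclidean distance to every labeled point
    dists = [math.dist(query, p) for p in points]
    # Sort Distance: point indices ordered by increasing distance
    order = sorted(range(len(points)), key=lambda i: dists[i])
    # Majority vote among the k nearest labels
    nearest = [labels[i] for i in order[:k]]
    return Counter(nearest).most_common(1)[0][0]

points = [(1, 1), (1, 2), (2, 1), (6, 6), (7, 7), (6, 7)]
labels = ["A", "A", "A", "B", "B", "B"]
print(knn_predict((2, 2), points, labels, k=3))  # → A
```

For regression, the last line would instead average the neighbors' numeric labels rather than take a vote.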
Choices of $\displaystyle k$
- Too small $\displaystyle k$
- Results in overfitting: the decision boundary follows noise in the training data
- Too large $\displaystyle k$
- Results in underfitting: predictions drift toward the overall majority class
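A toy 1-D illustration of both failure modes (the data and the mislabeled "noise" point are invented for this example):

```python
from collections import Counter

# 1-D training data: class "A" clusters near 0, "B" near 10,
# with one mislabeled outlier ("B" at x=1.0) acting as noise.
xs     = [0.0, 0.5, 1.0, 1.5, 9.0, 9.5, 10.0]
labels = ["A", "A", "B", "A", "B", "B", "B"]

def predict(query, k):
    order = sorted(range(len(xs)), key=lambda i: abs(xs[i] - query))
    return Counter(labels[i] for i in order[:k]).most_common(1)[0][0]

# k=1 overfits: the single nearest neighbor is the outlier
print(predict(1.1, k=1))  # → B
# k=3 smooths over the noise
print(predict(1.1, k=3))  # → A
# k=7 (all points) underfits: every query gets the global majority class
print(predict(5.0, k=7))  # → B
```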
CS M146
$\displaystyle \text{nn}_{k}(x)=\text{argmin}_{n\in [N]\setminus \left\{ \text{nn}_{1}(x),\ldots ,\text{nn}_{k-1}(x) \right\}}\lVert x-x_{n}\rVert^{2}_{2}$
- Gives the index in $\displaystyle [N]$ of the $\displaystyle k$th nearest neighbor of $\displaystyle x$, the data point we want to classify; the previously found neighbors are excluded from the argmin
- $\displaystyle [N]=\left\{ 1,\ldots ,N \right\}$ is the set of indices of the training samples
- We use the square of the L2 norm here to calculate distance but other measures could work
$\displaystyle \text{knn}(x)=\left\{ \text{nn}_{1}(x),\text{nn}_{2}(x),\ldots ,\text{nn}_{K}(x) \right\}$
- Gives the set of indices of the $\displaystyle K$ nearest neighbors to $\displaystyle x$
$\displaystyle v_{c}=\sum_{n\in \text{knn}(x)}\mathbb{I}(y_{n}=c)~\forall~c\in [C]$
- Gives the number of "votes" for a particular class/label $\displaystyle c$
- We iterate over the $\displaystyle K$ nearest neighbors and increment by one every time the neighbor's label equals the class of interest
$\displaystyle y=h(x)=\text{argmax}_{c\in [C]}v_{c}$
- The prediction is the class $\displaystyle c$ that maximizes the number of votes
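A minimal NumPy sketch of these formulas (function names are mine; a single argsort stands in for the repeated argmin over the shrinking index set in $\text{nn}_{k}(x)$):

```python
import numpy as np

def knn_indices(x, X, K):
    """knn(x): indices of the K nearest neighbors of x among the rows of X."""
    # Squared L2 distance ||x - x_n||^2 to every sample x_n, n in [N]
    dists = np.sum((X - x) ** 2, axis=1)
    # Sorting once by distance gives nn_1(x), ..., nn_K(x) in order
    return np.argsort(dists)[:K]

def predict(x, X, y, C, K=3):
    """h(x) = argmax_c v_c, where v_c counts neighbors with label c."""
    neighbors = knn_indices(x, X, K)
    votes = np.array([np.sum(y[neighbors] == c) for c in range(C)])
    return int(np.argmax(votes))

X = np.array([[0, 0], [0, 1], [1, 0], [5, 5], [5, 6], [6, 5]], dtype=float)
y = np.array([0, 0, 0, 1, 1, 1])
print(predict(np.array([1.0, 1.0]), X, y, C=2))  # → 0
```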
Training Time Complexity: $\displaystyle O(1)$
- KNN is a lazy learner: training performs no computation beyond storing the data, so training time is $\displaystyle O(1)$; the space complexity, however, is $\displaystyle O(n)$ in the number of samples ($\displaystyle O(nd)$ counting features)
Classifying Time Complexity: $\displaystyle O(nd+n\log n)$
- To classify, we need to calculate the distances from our point to all of our $\displaystyle n$ data points, where calculating the distances scales linearly with the number of features $\displaystyle d$, which overall takes $\displaystyle O(nd)$ time. We must also sort these distances before finding the $\displaystyle K$-nearest neighbors, which takes $\displaystyle O(n\log n)$ time
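As a side note, the $O(n\log n)$ sort term can be avoided when only the $K$ nearest neighbors are needed: a partial selection runs in $O(n)$ average time. A NumPy sketch (random data invented for the example):

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 5))   # n = 1000 samples, d = 5 features
x = rng.normal(size=5)
K = 3

# O(nd): one squared distance per sample, each costing O(d)
dists = np.sum((X - x) ** 2, axis=1)

# Full sort is O(n log n), but selecting the K smallest distances
# only needs O(n) on average via a partial partition:
nearest_sorted      = np.argsort(dists)[:K]
nearest_partitioned = np.argpartition(dists, K)[:K]

# Both select the same K neighbors (argpartition gives them unordered)
assert set(nearest_sorted) == set(nearest_partitioned)
```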