## k-Nearest Neighbors Regression

k-Nearest Neighbors (k-NN) regression is a type of non-parametric, instance-based learning algorithm that is used for regression tasks. It works by predicting the output value of a data point based on the output values of the k-nearest data points in the training set.

The k-NN algorithm has the following steps:

- Choose the number of neighbors *k* and collect a training set. The training set is a set of samples $(xs_i, ys_i)$, where $xs_i$ is the input (features) vector and $ys_i$ is the corresponding target (label) value for each sample $i$.
- Calculate the distance between the data point and all the points in the training set. The distance $d_i$ between the test sample $xt$ and a stored training sample $xs_i$ is usually calculated using the Euclidean distance:

$$d_i = \sqrt{\sum_{j=1}^{n} \left( xt_j - xs_{i,j} \right)^2}$$

where $n$ is the number of features in $xs_i$ or $xt$.

- Select the *k*-nearest points in the training set based on the distance.
- Predict the output value of the data point as the average of the output values of the *k*-nearest points. That is, k-NN regression finds the prediction for a test sample $xt$ by averaging the targets of the *k* training samples nearest to $xt$:

$$\hat{y} = \frac{1}{k} \sum_{i \in N_k(xt)} ys_i$$

where $ys_i$ are the target values of the *k*-nearest neighbors to $xt$, and $N_k(xt)$ is the set of their indices.
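The steps above can be sketched directly with NumPy (a minimal from-scratch illustration of the algorithm, not the scikit-learn implementation):

```python
import numpy as np

def knn_regress(X_train, y_train, x_test, k=3):
    """Predict a target for x_test as the mean of the k nearest training targets."""
    # Step 2: Euclidean distance from x_test to every training sample
    distances = np.sqrt(((X_train - x_test) ** 2).sum(axis=1))
    # Step 3: indices of the k smallest distances
    nearest = np.argsort(distances)[:k]
    # Step 4: average the targets of the k nearest neighbors
    return y_train[nearest].mean()

X_train = np.array([[0.0, 1.0], [1.0, 2.0], [2.0, 3.0], [3.0, 4.0], [4.0, 5.0]])
y_train = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
print(knn_regress(X_train, y_train, np.array([1.0, 1.0]), k=3))  # 2.0
```

Here the three nearest samples to $(1, 1)$ are the first three rows, so the prediction is the mean of their targets, $(1 + 2 + 3)/3 = 2$.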

Neighbors-based regression is a type of lazy learning: it does not attempt to construct a general internal model, but simply stores instances of the training data. The regression result is computed from the *k*-nearest neighbors of each point, as an average or a local linear approximation.

**Advantages:** The algorithm is simple to implement, robust to noisy training data, and effective when the training set is large.

**Disadvantages:** The value of *k* must be chosen, and the computational cost is high because the distance from each test instance to all the training samples must be computed. A feedback loop can be added to determine the number of neighbors.
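One common way to implement such a feedback loop is cross-validation over candidate values of *k*, for example with scikit-learn's `GridSearchCV` (a sketch; the synthetic dataset and the candidate grid here are illustrative assumptions):

```python
import numpy as np
from sklearn.model_selection import GridSearchCV
from sklearn.neighbors import KNeighborsRegressor

# Synthetic 1-D regression data (illustrative only)
rng = np.random.default_rng(0)
X = rng.uniform(0, 10, size=(40, 1))
y = np.sin(X[:, 0]) + rng.normal(0, 0.1, size=40)

# Cross-validate over candidate neighbor counts and keep the best
search = GridSearchCV(
    KNeighborsRegressor(),
    param_grid={"n_neighbors": [1, 3, 5, 7]},
    cv=5,
    scoring="neg_mean_squared_error",
)
search.fit(X, y)
print(search.best_params_)  # best k found by cross-validation
```

The value of *k* that minimizes the cross-validated error is then used for the final regressor.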

**k-Nearest Neighbors Regression in Python**

Below is an example of how to implement k-NN regression in Python using the scikit-learn library:

```python
import numpy as np
from sklearn.neighbors import KNeighborsRegressor

# Assume that we have a training set of data points with input features X and output values y
X = np.array([[0, 1], [1, 2], [2, 3], [3, 4], [4, 5]])
y = np.array([1, 2, 3, 4, 5])

# Create a k-NN regressor with k=3
knn = KNeighborsRegressor(n_neighbors=3)

# Fit the regressor to the training data
knn.fit(X, y)

# Predict the output value of a new data point
x_new = np.array([[1, 1]])
y_pred = knn.predict(x_new)
print(y_pred)  # Output: [2.]
```

In this example, we have a training set of 5 data points with input features *X* and output values *y*. We create a k-NN regressor with *k=3* and fit it to the training data. Then, we use the regressor to predict the output value of a new data point: the prediction is the average of the output values of the 3 nearest data points in the training set. The number of nearest neighbors is the primary hyper-parameter for tuning the performance of the regressor, for example by changing to *n_neighbors=5*:

```python
knn = KNeighborsRegressor(n_neighbors=5)
knn.fit(X, y)
yP = knn.predict(x_new)
print(yP)  # Output: [3.]
```
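Another tuning option is the `weights` parameter of scikit-learn's `KNeighborsRegressor`: setting `weights="distance"` replaces the plain average with a distance-weighted average, so closer neighbors contribute more to the prediction (a small sketch reusing the same toy data):

```python
import numpy as np
from sklearn.neighbors import KNeighborsRegressor

X = np.array([[0, 1], [1, 2], [2, 3], [3, 4], [4, 5]], dtype=float)
y = np.array([1.0, 2.0, 3.0, 4.0, 5.0])

# Weight each neighbor by the inverse of its distance to the query point
knn_w = KNeighborsRegressor(n_neighbors=3, weights="distance")
knn_w.fit(X, y)
print(knn_w.predict([[1.0, 1.0]]))  # pulled below 2.0, toward the two closest points
```

With uniform weights the prediction for this query is 2.0; distance weighting shifts it toward the two training points at distance 1.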


#### ✅ Knowledge Check

**1.** Which of the following statements best describes how k-Nearest Neighbors (k-NN) regression works?

**A.** It predicts the output value of a data point based solely on its own features, without considering any other data points.

- Incorrect. k-NN regression considers the output values of the k-nearest data points in the training set to predict the output value of a new data point.

**B.** It works by calculating the mode of the output values of the k-nearest data points in the training set.

- Incorrect. Taking the mode (a majority vote) is used in k-NN classification; k-NN regression predicts the output value as the average of the output values of the k-nearest points.

**C.** It predicts the output value of a data point based on the average of the output values of the k-nearest data points in the training set.

- Correct. k-NN regression predicts the output value by averaging the values of the k-nearest data points.

**D.** The algorithm tries to find the best-fit line for the data points.

- Incorrect. Unlike linear regression, k-NN regression does not fit a line; it is an instance-based method.

**2.** Which of the following is NOT a disadvantage of the k-NN regression algorithm?

**A.** It's complex to implement.

- Correct. k-NN is actually simple to implement; simplicity is one of its stated advantages, so implementation complexity is not a disadvantage of the algorithm.

**B.** Determining the value of k can be challenging.

- Incorrect. Choosing a suitable value of *k* is indeed one of the disadvantages of the k-NN algorithm.

**C.** It has a high computational cost since it computes the distance to all training samples.

- Incorrect. This is indeed a disadvantage of k-NN. It requires calculating distances between the test sample and every sample in the training dataset.

**D.** It must store the entire training set to make predictions.

- Incorrect. This is a disadvantage: k-NN is a type of lazy learning, meaning it does not construct a general internal model but simply stores instances of the training data, so the whole training set must be kept available.