Questions tagged [k-nn]

K-Nearest Neighbor (K-NN) is a classification algorithm that determines the label of some data point based on the most common label of the closest k other points.

186 questions
11
votes
1 answer

Learning with Positive labels only

I have ~7 million rows of customer data (~500 sparse attributes) A million out of them have opted in to a new service. How do I use this signal to predict which of the remaining customers are likely to adopt the service? And how do I measure the…
7
votes
2 answers

Coordinate System's influence on $L$ distances (Manhattan and Euclidean)

I don't understand this picture, which says if we change the coordinate system, we would have the same result for $L_2$ distance, whereas, our result would differ for $L_1$ distance. What does it mean by coordinate system? $(0,0)$ if yes, the…
Fatemeh Asgarinejad
  • 1,164
  • 1
  • 8
  • 17
7
votes
1 answer

How to decide how many n_neighbors to consider while implementing LocalOutlierFactor?

I have a data set with rows: 134000 and columns: 200. I am trying to identify the outliers in data set using LocalOutlierFactor from scikit-learn. Although I understand how the algorithm works, I am unable to decide n_neighbors for my data…
7
votes
1 answer

How to decide the shape of input features, when each data file is of different length?

To help me understand the benefits and shortcomings of decision trees, KNN, Neural Networks, I wanted to build a simple classifier that classifies into 2 classes (Bird Sound and Non-Bird Sound) using all of the above 3 methods. So I downloaded a…
6
votes
1 answer

How to add data points to a trained KNN

I have a trained KNN, I created with https://github.com/kevinzakka/blog-code/blob/master/knn/knn.py I want to add more data points to the KNN but I am on a raspberry pi so limited by RAM and therefore the number of data points I can add at a time to…
6
votes
2 answers

What does big O mean in KNN optimal weights?

Wiki gives this definition of KNN In pattern recognition, the k-nearest neighbors algorithm (k-NN) is a non-parametric method used for classification and regression. In both cases, the input consists of the k closest training examples in…
JJJohn
  • 614
  • 10
  • 22
6
votes
1 answer

Mapping of categorical features into binary indicator features

I am currently reading an introductory machine learning book by Daumé (ch. 03, p. 30) and when discussing the mapping of categorical features with "n" possible values into "n" binary indicator features, the following question is proposed: Is it a…
6
votes
1 answer

Sklearn: unsupervised knn vs k-means

Sklearn has an unsupervised version of knn and also it provides an implementation of k-means. If I am right, kmeans is done exactly by identifying "neighbors" (at least to a centroid which may be or may not be an actual data) for each cluster. But…
Universalis0
  • 105
  • 1
  • 2
  • 8
5
votes
2 answers

What is difference between Nearest Neighbor and KNN?

I was taking the tutorial of making Recommendation system , there I read that Nearest Neighbor is different from KNN classifier . Could anyone explain that what is Nearest Neighbor and how it is different between KNN ?
Hamza
  • 229
  • 3
  • 11
5
votes
4 answers

How to save a knn model?

I need to save the results of a fit of the SKlearn NearestNeighbors model: knn = NearestNeighbors(10) knn.fit(my_data) How do you save to disk the traied knn using Python?
Vincenzo Lavorini
  • 1,754
  • 1
  • 9
  • 22
5
votes
2 answers

Regression for discrete values?

I am a novice in machine learning/statistical algorithms, but I have worked with some simple classifiers and regression. I would like some opinions on whether I am going the right direction or not, given my limited knowledge. My problem is the…
urSus
  • 151
  • 1
  • 2
5
votes
2 answers

Is K-NN applicable for binary variables?

I need help because I'm just new to machine learning and I do not know if k-nearest neighbors algorithm can be used to identify the appropriate program(s) for Student 11 in the table below. The school subjects (Math, English, etc.) are the…
5
votes
1 answer

k-Nearest Neighbours with time series data - how to obtain whole-time-period estimators

I have a large dataset for the activities performed by multiple staff in a factory over a long period of time - 01/01/2017 - present. The activities performed by the different staff are recorded at each point in time (since they interact with…
Statsanalyst
  • 151
  • 1
4
votes
2 answers

KNN accuracy going worse with chosen k

This is my first ever KNN implementation. I was supposed to use (without scaling the data initially) linear regression and KNN models for predicting the loan status(Y/N) given a bunch of parameters like income, education status, etc. I managed to…
satan 29
  • 103
  • 6
4
votes
1 answer

KNN Regression: Distance function and/or vector representation for datetime features

Context: Trying to forecast some sort of consumption value (e.g. water) using datetime features and exogenous variables (like temperature). Take some datetime features like week days (mon=1, tue=2, ..., sun=7) and months (jan=1, ..., dec=12). A…
1
2 3
12 13