11

I would like to use the knn distance plot to be able to figure out which eps value should I choose for the DBSCAN algorithm. Based on this page:

The idea is to calculate, the average of the distances of every point to its k nearest neighbors. The value of k will be specified by the user and corresponds to MinPts. Next, these k-distances are plotted in an ascending order. The aim is to determine the “knee”, which corresponds to the optimal eps parameter.

Using python with numpy/sklearn, I have the following points, with the following distance for 6-knn:

X = np.array([[-1, -1], [-2, -1], [-3, -2], [1, 1], [2, 1], [3, 2]])
nbrs = NearestNeighbors(n_neighbors=len(X)).fit(X)
distances, indices = nbrs.kneighbors(X)

# Indices

[[0 1 2 3 4 5]
 [1 0 2 3 4 5]
 [2 1 0 3 4 5]
 [3 4 5 0 1 2]
 [4 3 5 0 1 2]
 [5 4 3 0 1 2]]

# Distances
[[ 0.          1.          2.23606798  2.82842712  3.60555128  5.        ]
[ 0.          1.          1.41421356  3.60555128  4.47213595  5.83095189]
[ 0.          1.41421356  2.23606798  5.          5.83095189  7.21110255]
[ 0.          1.          2.23606798  2.82842712  3.60555128  5.        ]
[ 0.          1.          1.41421356  3.60555128  4.47213595  5.83095189]
[ 0.          1.41421356  2.23606798  5.          5.83095189  7.21110255]]

then I computed the average distance:

distances.mean()
2.9269575028354495

The problem is I don't understand how exactly could I represent the same plot as them with distances in y-axis and number of points according to the distances on the x-axis using python.

Thank for your help.

Kasra Manshaei
  • 6,502
  • 1
  • 20
  • 45
Marc Lamberti
  • 327
  • 1
  • 3
  • 8
  • [![enter image description here](https://i.stack.imgur.com/KFDbs.png)](https://i.stack.imgur.com/KFDbs.png) Why does my neighboring point graph have this shape? Please help me!!! – Dung Le Oct 10 '17 at 00:37
  • See also [https://stackoverflow.com/a/48594371/3342058](https://stackoverflow.com/a/48594371/3342058) – Baschdl Apr 22 '20 at 22:23

1 Answers1

9

You

  1. take the last column of that matrix
  2. sort descending
  3. plot index, distance
  4. hope to see a knee (if the distance does not work well. there might be none)
Has QUIT--Anony-Mousse
  • 7,969
  • 1
  • 14
  • 30