I am trying to choose parameters for the DBSCAN clustering algorithm, in particular minPts.
The Wikipedia article suggests a rule of thumb that derives minPts from the number of dimensions D in the data set: minPts >= D + 1. For larger data sets with a lot of noise, it suggests minPts = 2 * dim. I assume D and dim stand for the same thing, don't they?
What is the value of D in my case?
I use DBSCAN to cluster 2-dimensional black-and-white scans of business documents based on their layout. First, each scan is segmented into black and white boxes, and then it is turned into a one-dimensional 0-1 array (a string). Here's an example of a segmented document scan.
I use the Levenshtein distance to measure similarity between scans during clustering.
So I guess my D = 1 and I should start with minPts = 2?
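
To make the setup concrete, here is a minimal sketch of the distance computation described above. It assumes each scan has already been encoded as a 0-1 string; the sample strings and the names `levenshtein`, `distance_matrix` and `scans` are hypothetical and only there for illustration.

```python
from typing import List, Sequence


def levenshtein(a: str, b: str) -> int:
    """Edit distance: minimum number of insertions, deletions and substitutions."""
    # Keep the shorter string in the inner loop so each row stays small.
    if len(a) < len(b):
        a, b = b, a
    previous = list(range(len(b) + 1))  # distances from the empty prefix of a
    for i, char_a in enumerate(a, start=1):
        current = [i]
        for j, char_b in enumerate(b, start=1):
            cost = 0 if char_a == char_b else 1
            current.append(min(
                previous[j] + 1,         # deletion
                current[j - 1] + 1,      # insertion
                previous[j - 1] + cost,  # substitution (or match)
            ))
        previous = current
    return previous[-1]


def distance_matrix(items: Sequence[str]) -> List[List[int]]:
    """Symmetric matrix of pairwise Levenshtein distances."""
    n = len(items)
    matrix = [[0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            matrix[i][j] = matrix[j][i] = levenshtein(items[i], items[j])
    return matrix


# Hypothetical encoded scans (assumption: 0 = white box, 1 = black box).
scans = ["0011100", "0011000", "1100011"]
matrix = distance_matrix(scans)
```

A matrix like this could then be handed to a DBSCAN implementation that accepts precomputed distances, for example scikit-learn's `DBSCAN(metric="precomputed", min_samples=...)`, where `min_samples` plays the role of minPts. Note that the strings are compared as whole objects here, so the data are not points with coordinates in a D-dimensional space.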
