I'm working on a project that seeks to identify clusters in urban development based on location (in lat/lon) and a categorical variable (what the particular site is zoned for). Ideally, the analysis would identify clusters of sites that are 1) near each other and 2) zoned the same. Below is a sample of what my data looks like:

 lat       lon      zone
 33.22320 -112.6741 R-43      
 33.45324 -113.0888 R-43      
 33.71800 -112.3885 R-43      
 33.45626 -111.9408 AG        
 33.45746 -111.9313 R-6       
 33.45747 -111.9309 R-6 

I've seen methods that define distance on just lat/lons using great circle distances, but I haven't seen any mixed clustering of the sort I'm trying to implement. I'm fairly new to cluster analysis so any guidance on implementing something like this in R or Python would be greatly appreciated!

Tasos
user27974
  • This is just a special case of mixed continuous/categorical clustering. If you search for that you should find something, e.g., [this](http://datascience.stackexchange.com/questions/22/k-means-clustering-for-mixed-numeric-and-categorical-data) question. – Emre Jan 16 '17 at 03:41
  • I'd **split the data** per category, cluster each, then compare. – Has QUIT--Anony-Mousse Jan 16 '17 at 07:00
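
The split-per-category idea from the comments could be sketched roughly as below, using only the standard library. This is an illustration under assumptions, not a recommended final method: the clustering step is a plain distance-threshold linkage (connected components), swapped in as a simple stand-in for a proper algorithm, and the names `haversine_km`, `cluster_by_zone`, and the `max_km` threshold of 1 km are all made up for this example.

```python
import math
from collections import defaultdict

EARTH_RADIUS_KM = 6371.0  # mean Earth radius

def haversine_km(a, b):
    """Great-circle distance in km between two (lat, lon) points given in degrees."""
    lat1, lon1, lat2, lon2 = map(math.radians, (a[0], a[1], b[0], b[1]))
    h = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(h))

def cluster_by_zone(sites, max_km=1.0):
    """Split sites by zone, then link same-zone sites within max_km of each other.

    max_km is a hypothetical threshold chosen for illustration.
    Returns a list of (zone, [site indices]) clusters.
    """
    # Step 1: split the data per category.
    by_zone = defaultdict(list)
    for idx, (_lat, _lon, zone) in enumerate(sites):
        by_zone[zone].append(idx)

    clusters = []
    for zone, members in by_zone.items():
        # Step 2: union-find over the sites of this zone only.
        parent = {i: i for i in members}

        def find(i):
            while parent[i] != i:
                parent[i] = parent[parent[i]]  # path compression
                i = parent[i]
            return i

        for n, i in enumerate(members):
            for j in members[n + 1:]:
                if haversine_km(sites[i][:2], sites[j][:2]) <= max_km:
                    parent[find(i)] = find(j)

        # Step 3: collect connected components as clusters.
        groups = defaultdict(list)
        for i in members:
            groups[find(i)].append(i)
        clusters.extend((zone, sorted(g)) for g in groups.values())
    return clusters

# The sample rows from the question.
sites = [
    (33.22320, -112.6741, "R-43"),
    (33.45324, -113.0888, "R-43"),
    (33.71800, -112.3885, "R-43"),
    (33.45626, -111.9408, "AG"),
    (33.45746, -111.9313, "R-6"),
    (33.45747, -111.9309, "R-6"),
]
clusters = cluster_by_zone(sites, max_km=1.0)
for zone, members in clusters:
    print(zone, members)
```

On the sample rows, the two R-6 sites end up in one cluster and the nearby AG site stays separate because it is zoned differently. The pairwise loop is quadratic in the number of sites per zone, so for larger data a density-based method with a great-circle metric would probably be a better fit for step 2.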

0 Answers