I am new to topic modeling and text clustering and am trying to learn more. I would like to use DBSCAN to cluster text data. There are many posts and sources on how to implement DBSCAN, such as 1, 2, 3, but they are either too difficult for me to understand or not in Python.
I have a CSV file containing a userID and the message that each user wrote, as follows:
user.csv (400 rows, one message per row)
userID messages
112 The car was broken and Kevin fixed it
.
.
.
I know some of the steps needed before applying DBSCAN, such as:
- Remove stop words
- Compute a distance between messages (I have code that computes cosine similarity)
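To make the question concrete, here is a rough sketch of how I understand these steps fitting together with a hand-written DBSCAN, using only the standard library. Everything in it is an assumption for illustration: the tiny stop-word list, the function names (`tokenize`, `cosine_distance`, `region_query`, `dbscan`), the `eps` and `min_pts` values, and all sample messages except the first one, which comes from my CSV. It is not meant as a definitive implementation.

```python
import math
import re
from collections import Counter

# Tiny illustrative stop-word list (an assumption; a real list is much longer).
STOP_WORDS = {"the", "a", "an", "and", "it", "was", "is", "to", "of", "in", "my"}

def tokenize(text):
    """Lowercase, keep alphabetic words only, and drop stop words."""
    return [w for w in re.findall(r"[a-z]+", text.lower()) if w not in STOP_WORDS]

def cosine_distance(a, b):
    """1 - cosine similarity between two word-count Counters."""
    dot = sum(a[w] * b[w] for w in a if w in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 1.0  # treat empty messages as maximally distant
    return 1.0 - dot / (norm_a * norm_b)

def region_query(dist, i, eps):
    """Indices of all points within eps of point i (including i itself)."""
    return [j for j in range(len(dist)) if dist[i][j] <= eps]

def dbscan(dist, eps, min_pts):
    """Return one label per point: 0, 1, ... for clusters, -1 for noise."""
    UNVISITED, NOISE = None, -1
    labels = [UNVISITED] * len(dist)
    cluster = -1
    for i in range(len(dist)):
        if labels[i] is not UNVISITED:
            continue
        neighbors = region_query(dist, i, eps)
        if len(neighbors) < min_pts:
            labels[i] = NOISE  # may later be claimed as a border point
            continue
        cluster += 1  # point i is a core point: start a new cluster
        labels[i] = cluster
        queue = list(neighbors)
        while queue:
            j = queue.pop()
            if labels[j] == NOISE:
                labels[j] = cluster  # border point reached from a core point
            if labels[j] is not UNVISITED:
                continue
            labels[j] = cluster
            j_neighbors = region_query(dist, j, eps)
            if len(j_neighbors) >= min_pts:
                queue.extend(j_neighbors)  # j is also core: keep expanding
    return labels

# Sample messages: only the first is from user.csv, the rest are made up.
messages = [
    "The car was broken and Kevin fixed it",
    "Kevin fixed the broken car",
    "My car was broken",
    "The weather is sunny today",
    "Sunny weather today",
    "I like pizza",
]
vectors = [Counter(tokenize(m)) for m in messages]
dist = [[cosine_distance(a, b) for b in vectors] for a in vectors]
labels = dbscan(dist, eps=0.5, min_pts=2)  # eps and min_pts are guesses
print(labels)  # [0, 0, 0, 1, 1, -1]
```

With these guessed parameters the three car messages form one cluster, the two weather messages form another, and the pizza message is marked as noise (-1). I am unsure how to choose `eps` and `min_pts` for the real 400 messages.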
I am also aware that scikit-learn has a demo at 4, but I would prefer a manual implementation so that I can see what is going on in the code.
It would be great if you could help with code that I can run on my side to learn from.