Questions tagged [nltk]

NLTK is a free, open-source natural language processing toolkit for Python. It is used primarily for text processing applications and includes libraries for classification, tokenization, stemming, tagging, parsing, and semantic reasoning.

138 questions
33
votes
5 answers

How can I get a measure of the semantic similarity of words?

What is the best way to figure out the semantic similarity of words? Word2Vec is okay, but not ideal: # Using the 840B word Common Crawl GloVe vectors with gensim: # 'hot' is closer to 'cold' than 'warm' In [7]: model.similarity('hot',…
Thomas Johnson
  • 665
  • 1
  • 7
  • 11
23
votes
6 answers

Similarity between two words

I'm looking for a Python library that helps me identify the similarity between two words or sentences. I will be doing audio-to-text conversion, which will result in English dictionary or non-dictionary word(s) (this could be a Person or Company…
gogasca
  • 749
  • 2
  • 8
  • 17
9
votes
6 answers

NLP: What are some popular packages for multi-word tokenization?

I intend to tokenize a number of job description texts. I have tried the standard tokenization using whitespace as the delimiter. However, I noticed that some multi-word expressions are split by whitespace, which may well cause…
CyberPlayerOne
  • 392
  • 1
  • 4
  • 14
8
votes
1 answer

Complex Chunking with NLTK

I am trying to figure out how to use NLTK's cascading chunker as per Chapter 7 of the NLTK book. Unfortunately, I'm running into a few issues when performing non-trivial chunking measures. Let's start with this phrase: "adventure movies between 2000…
grill
  • 234
  • 3
  • 7
8
votes
2 answers

Is there an alternative to nltk in golang?

Golang is one of my favourite languages and I want to use it for a personal NLP/ML project. Is golang's ecosystem good and rich enough for this? Is there an alternative package for nltk in golang?
Dariush
  • 183
  • 1
  • 5
7
votes
2 answers

Combining Machine Learning classifier with NLTK Vader for Sentiment Analysis

As a part of my university project, I am researching/developing a sentiment analysis model wherein I am trying to combine NLTK Vader (SentimentIntensityAnalyzer) results with a Machine Learning trained classifier for prediction of Sentiments on…
6
votes
1 answer

Is there an NLP corpus that contains common medical terms?

I am trying to use the NLTK library to extract keywords denoting medical symptoms from medical reports of patients. For example, I have a medical report as follows: s:a 33 year old female crystallographer presents with mild spells of vertigo, mild…
user112647
6
votes
3 answers

Training NLP with multiple text input features

Question: How can I train an NLP model with discrete labels that is based on multiple text input features? Background: I'm trying to predict the difficulty of a 4-option multiple choice exam question (probability of a test-taker selecting the correct…
Carl Molnar
  • 111
  • 2
  • 6
6
votes
1 answer

How to extract Question/s from document with NLTK?

How can I extract only the questions from a document with NLTK? Can these questions be categorised as yes/no or detail-type questions? Note: I am one week into NLTK ;-)
5
votes
3 answers

Chunking Sentences with Spacy

I have a lot of sentences (500k) which look like this: "Penalty missed! Bad penalty by Felipe Brisola - Riga FC - shot with right foot is very close to the goal. Felipe Brisola should be disappointed." "Penalty saved! Damir Kojasevic - Sutjeska…
senty
  • 153
  • 3
5
votes
3 answers

Machine learning or NLP approach to convert strings about month and year into dates

I'm developing a program that converts human-style date expressions into actual dates. Example: "last year last month" becomes December 2018. The string may be a complete sentence like: what were you doing 5…
Bipul
  • 201
  • 1
  • 9
5
votes
1 answer

Accuracy of word and sent tokenize versus custom tokenizers in nltk

The Natural Language Processing with Python book is a really good resource for understanding the basics of NLP. One of the chapters introduces training 'sentence segmentation' using a Naive Bayes Classifier and provides a method to perform sentence…
MrKickass
  • 111
  • 8
5
votes
1 answer

Inferring Relational Hierarchies of Words

I am new to natural language processing and I have not heard of a problem similar to mine yet. I was wondering if anyone could refer me to a method for solving my problem, or tell me how this problem is referred to in the academic literature, so…
4
votes
3 answers

TFIDF for very short sentences

I'm trying to build a regression model in which one of the features contains text data. I was thinking of using scikit-learn's sklearn.feature_extraction.text.TfidfVectorizer. The issue, however, is that the actual strings contain very few words.…
yatu
  • 293
  • 1
  • 11
4
votes
3 answers

Is there a good German Stemmer?

What I tried: # -*- coding: utf-8 -*- from nltk.stem.snowball import GermanStemmer st = GermanStemmer() token_groups = [(["experte", "Experte", "Experten", "Expertin", "Expertinnen"], []), (["geh", "gehe", "gehst", "geht", "gehen",…
Martin Thoma
  • 18,630
  • 31
  • 92
  • 167