Sentiment analysis refers to categorizing given data according to the sentiment(s) it expresses. Usually, it refers to extracting sentiment from text, e.g. tweets or blog posts.
Questions tagged [sentiment-analysis]
242 questions
67
votes
4 answers
What is the purpose of the [CLS] token and why is its encoding output important?
I am reading this article on how to use BERT by Jay Alammar and I understand things up until:
For sentence classification, we’re only interested in BERT’s output for the [CLS] token, so we select that slice of the cube and discard everything…
user3768495
- 887
- 1
- 7
- 8
29
votes
1 answer
NLP - why is "not" a stop word?
I am trying to remove stop words before performing topic modeling. I noticed that some negation words (not, nor, never, none, etc.) are usually considered to be stop words. For example, NLTK, spaCy and sklearn include "not" on their stop word lists.…
E.K.
- 405
- 4
- 6
12
votes
2 answers
Features of word vectors in Word2Vec
I am trying to do sentiment analysis. To convert the words to word vectors, I am using a Word2Vec model. Suppose I have all the sentences in a list named 'sentences' and I am passing these sentences to word2vec as follows:
model =…
enterML
- 3,011
- 9
- 26
- 38
12
votes
5 answers
How to overcome training examples' different lengths when working with Word Embeddings (word2vec)
I'm working on Sentiment Analysis over tweets using word2vec as the word representation.
I have trained my word2vec model. But when I train my classifier, I face the issue that every tweet has a different length and the classifier…
antorqs
- 221
- 2
- 5
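One common way to handle the variable-length issue raised in this question is to average the word vectors of each tweet, so that every tweet becomes a single fixed-size vector. The sketch below illustrates only that idea; it is not necessarily what the answers recommend. The `embeddings` dictionary is a hypothetical stand-in for a trained word2vec model, and `average_vector` is an illustrative helper name.

```python
# Hypothetical toy embeddings standing in for a trained word2vec model.
embeddings = {
    "good": [1.0, 0.0],
    "bad": [-1.0, 0.0],
    "movie": [0.0, 1.0],
}

def average_vector(tokens, embeddings, dim):
    """Mean-pool the vectors of known tokens into one vector of length dim."""
    vectors = [embeddings[t] for t in tokens if t in embeddings]
    if not vectors:
        # No known words: fall back to a zero vector (an assumption of this sketch).
        return [0.0] * dim
    # zip(*vectors) groups the values of each dimension together.
    return [sum(column) / len(vectors) for column in zip(*vectors)]

# Two tweets of different lengths map to vectors of the same size.
short_tweet = average_vector(["good", "movie"], embeddings, dim=2)
long_tweet = average_vector(["bad", "movie", "unknownword", "movie"], embeddings, dim=2)
```

Averaging discards word order, so it is a simple baseline; padding or truncating sequences to a fixed length is another option when the classifier needs the order.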
9
votes
3 answers
Sentiment Analysis Tutorial
I am trying to understand sentiment analysis and how to apply it using any language (R, Python, etc.). I would like to know if there is a good place on the internet for a tutorial that I can follow. I googled, but I wasn't very satisfied because they…
KurioZ7
- 285
- 3
- 7
9
votes
3 answers
BPE vs WordPiece Tokenization - when to use / which?
What's the general tradeoff between choosing BPE vs WordPiece Tokenization? When is one preferable to the other? Are there any differences in model performance between the two? I'm looking for a general overall answer, backed up with specific…
vgoklani
- 229
- 2
- 6
9
votes
4 answers
Improving accuracy of Text Classification
I am working on a text classification problem. The objective is to classify news articles into their corresponding categories, but in this case the categories are not broad ones like politics, sports, economics, etc., but are very closely related and…
ac-lap
- 159
- 1
- 1
- 6
8
votes
3 answers
What is the parts-of-speech technique in sentiment analysis?
In an article, I saw sentiment analysis using the parts-of-speech (POS) technique. When I searched, I found some papers on POS, but I couldn't understand what POS basically is. As I am new to sentiment analysis, please help me understand POS.
SRJ577
- 197
- 1
- 4
- 14
8
votes
1 answer
Extracting individual emails from an email thread
Most open-source datasets are well formatted, i.e. each email message is cleanly separated, as in the Enron email dataset. But in the real world it is very difficult to separate the top email message from a thread of emails.
For example consider…
Greedy Coder
- 143
- 1
- 6
8
votes
3 answers
Twitter Sentiment Analysis: Detecting neutral tweets despite training on only Positive and Negative Classes
I am a newbie when it comes to machine learning. I am trying to get hands-on experience by analyzing different supervised learning algorithms using Python's scikit-learn library. I am using the Sentiment140 dataset of 1.6 million tweets for…
tedghosh
- 81
- 1
- 4
7
votes
1 answer
Using Apache Spark to do ML. Keep getting serialization errors
I'm using Spark to do sentiment analysis, and I keep getting errors with the serializers it uses (I think) to pass Python objects around.
PySpark worker failed with exception:
Traceback (most recent call last):
File…
seashark97
- 71
- 1
- 3
7
votes
1 answer
On a multilingual sentiment corpus
I am looking to compile a sentiment corpus for news articles in multiple languages (~100k per lang. for a machine learning experiment) where each article is labeled positive, neutral, or negative. I have searched high and low but could not find…
Chris
- 193
- 1
- 6
7
votes
1 answer
Understanding of naive bayes: computing the conditional probabilities
For a task on sentiment analysis, suppose we have some classes represented by $c$ and features $i$.
We can represent the conditional probability of each class as: $$P(c | w_i) = \frac{P(w_i|c) \cdot P(c)}{P(w_i)}$$
where $w_i$ represents each…
user19241256
- 173
- 3
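The Bayes' rule expression in the question above can be checked numerically. The sketch below is a minimal illustration, assuming a hypothetical toy corpus and a multinomial estimate of $P(w_i|c)$ from token counts with no smoothing; the data and the `posterior` function name are not from the question.

```python
from collections import Counter

# Hypothetical toy corpus of (class label, tokens); not data from the question.
docs = [
    ("pos", ["good", "great", "good"]),
    ("pos", ["great", "fun"]),
    ("neg", ["bad", "boring"]),
    ("neg", ["bad", "good"]),
]

def posterior(word, docs):
    """Return P(c | w) for each class c, computed as P(w | c) * P(c) / P(w)."""
    class_counts = Counter(label for label, _ in docs)
    token_counts = {label: Counter() for label in class_counts}
    for label, tokens in docs:
        token_counts[label].update(tokens)

    joint = {}
    for label, n_docs in class_counts.items():
        prior = n_docs / len(docs)                      # P(c)
        total_tokens = sum(token_counts[label].values())
        likelihood = token_counts[label][word] / total_tokens  # P(w | c)
        joint[label] = likelihood * prior               # P(w | c) * P(c)

    evidence = sum(joint.values())                      # P(w), summed over classes
    if evidence == 0:
        # Without smoothing, an unseen word has P(w) = 0 and no defined posterior.
        raise ValueError(f"word {word!r} does not occur in any class")
    return {label: value / evidence for label, value in joint.items()}

result = posterior("good", docs)
```

Because $P(w_i)$ is the same for every class, it only normalizes the scores; a classifier that just needs the most likely class can compare $P(w_i|c) \cdot P(c)$ directly.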
7
votes
2 answers
Combining Machine Learning classifier with NLTK Vader for Sentiment Analysis
As part of my university project, I am researching and developing a sentiment analysis model in which I am trying to combine NLTK VADER (SentimentIntensityAnalyzer) results with a trained machine learning classifier for prediction of sentiments on…
Chetan
- 171
- 3
7
votes
5 answers
Training Dataset for Sentiment Analysis of Movie Reviews
I am currently working on sentiment analysis using Python. I wanted to find whether reviews given for a movie are positive or negative based on sentiment analysis. I have found a training dataset, provided in this link.
This dataset has reviews…
SRS
- 1,045
- 5
- 11
- 22