Questions tagged [ngrams]

34 questions
12
votes
1 answer

ngram and RNN prediction rate wrt word index

I tried to plot the rate of correct predictions (for the top-1 shortlist) with relation to the word's position in the sentence: I was expecting to see a plateau sooner in the n-gram setup, since it needs less context. However, one thing I wasn't expecting…
Arkantus
  • 157
  • 3
6
votes
2 answers

N-grams for RNNs

Given a word $w_{n}$, a statistical model such as a Markov chain using n-grams predicts the subsequent word $w_{n+1}$. The prediction is by no means random. How is this translated into a neural model? I have tried tokenizing and sequencing my sentences,…
mojbius
  • 61
  • 2
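The deterministic Markov-chain baseline the question describes can be sketched in a few lines; the toy corpus and names below are illustrative assumptions, not from the question:

```python
from collections import Counter, defaultdict

# A minimal bigram Markov-chain predictor: count transitions, then always
# pick the most frequent successor (deterministic, not random).
corpus = "the cat sat on the mat the cat ate the fish".split()

# Count bigram transitions w_n -> w_{n+1}.
transitions = defaultdict(Counter)
for w, w_next in zip(corpus, corpus[1:]):
    transitions[w][w_next] += 1

def predict_next(word):
    """Return the most frequent successor of `word`, or None if unseen."""
    followers = transitions[word]
    return followers.most_common(1)[0][0] if followers else None

print(predict_next("the"))  # "cat" follows "the" twice, "mat"/"fish" once each
```

A neural model replaces the count table with learned parameters, but the prediction target (the next token given the context) is the same.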
6
votes
2 answers

Clustering or classifying n-gram-based text categories

I have a large set of data records looking like this: "text", "category". I extract n-grams from the text (2-, 3- and 4-grams) and store the count of each n-gram per category, like so: "ngram1", "category1", 1000; "ngram1", "category2", 20; "ngram1",…
Andrzej H
  • 169
  • 1
  • 3
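The count-building step described in the excerpt can be sketched as below, assuming character n-grams and the ("ngram", "category", count) record layout shown; the toy records are illustrative:

```python
from collections import Counter

# Per-category n-gram counts over 2-, 3- and 4-grams, as in the excerpt.
records = [
    ("cheap flights to rome", "travel"),
    ("cheap hotel deals", "travel"),
    ("cheap guitar strings", "music"),
]

def char_ngrams(text, n):
    """All character n-grams of `text` as a list."""
    return [text[i:i + n] for i in range(len(text) - n + 1)]

# counts[(ngram, category)] -> frequency
counts = Counter()
for text, category in records:
    for n in (2, 3, 4):
        for g in char_ngrams(text, n):
            counts[(g, category)] += 1

print(counts[("ch", "travel")])  # "ch" appears once in each travel record
```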
4
votes
1 answer

In smoothing of n-gram model in NLP, why don't we consider start and end of sentence tokens?

When learning Add-1 smoothing, I found that somehow we are adding 1 to each word in our vocabulary, but not considering start-of-sentence and end-of-sentence as two words in the vocabulary. Let me give an example to explain. Example: Assume we have…
KGhatak
  • 123
  • 6
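The convention the question asks about can be made concrete with a tiny Add-1 example. The corpus is a toy assumption; keeping "</s>" in the vocabulary while excluding "<s>" follows the usual textbook treatment ("<s>" is only ever a context, never a predicted outcome):

```python
from collections import Counter

# Add-1 (Laplace) smoothed bigram probabilities.
sents = [["<s>", "i", "am", "sam", "</s>"],
         ["<s>", "sam", "i", "am", "</s>"]]

unigrams, bigrams = Counter(), Counter()
for s in sents:
    unigrams.update(s)
    bigrams.update(zip(s, s[1:]))

# Vocabulary for smoothing: every word we may need to predict, so not "<s>".
V = len({w for s in sents for w in s} - {"<s>"})

def p_add1(w, context):
    return (bigrams[(context, w)] + 1) / (unigrams[context] + V)

print(round(p_add1("i", "<s>"), 3))  # (1 + 1) / (2 + 4)
```

With this choice, the smoothed distribution over the four predictable words still sums to 1 for any context.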
4
votes
1 answer

Artificially increasing frequency weight of word ending characters in word building

I have a database of letter-pair bigrams. For example:

+-----------+--------+-----------+
| first     | second | frequency |
+-----------+--------+-----------+
| gs        | so     | 1         |
| gs        | sp     | 2         |
| gs        | sr …
Matt
  • 141
  • 2
3
votes
1 answer

FastText Model Explained

I was reading the FastText paper and I have a few questions about the model used for classification. Since I am not from an NLP background, I am unfamiliar with some of the jargon. In the figure, what exactly are the $x_i$? I am not sure what $N$…
3
votes
0 answers

Understanding Kneser-Ney Formula for implementation

I am trying to implement this formula in Python $$ \frac{\max(c_{KN}(w^{i}_{i-n+1}) - d,\, 0)}{c_{KN}(w^{i-1}_{i-n+1})} + \lambda(w^{i-1}_{i-n+1})\,\mathbb{P}_{KN}(w_{i} \mid w^{i-1}_{i-n+2})$$ where $$ \mathrm{c_{KN}}(\cdot) = \begin{cases} …
Wolfy
  • 237
  • 2
  • 9
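The lowest-order (bigram) case of the recursive interpolated Kneser-Ney formula can be sketched as below; the corpus, the discount d = 0.75, and all names are illustrative assumptions:

```python
from collections import Counter

# Interpolated Kneser-Ney for bigrams: discounted ML estimate plus a
# back-off weight times the continuation probability.
corpus = "the cat sat on the mat the dog sat on the log".split()
d = 0.75

bigrams = Counter(zip(corpus, corpus[1:]))
unigrams = Counter(corpus[:-1])                 # counts as bigram *contexts*
followers = Counter(w1 for (w1, _) in bigrams)  # distinct continuations per context
preceders = Counter(w2 for (_, w2) in bigrams)  # distinct histories per word

def p_kn(w, ctx):
    discounted = max(bigrams[(ctx, w)] - d, 0) / unigrams[ctx]
    lam = d * followers[ctx] / unigrams[ctx]            # reserved mass
    p_cont = preceders[w] / len(bigrams)                # continuation probability
    return discounted + lam * p_cont
```

The discounting and the back-off weight are arranged so that the probabilities over the vocabulary still sum to 1 for every seen context.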
2
votes
1 answer

Usage of KL divergence to improve BOW model

For a university project, I chose to do sentiment analysis on a Google Play store reviews dataset. I obtained decent results classifying the data using the bag of words (BOW) model and an ADALINE classifier. I would like to improve my model by…
Balocre
  • 23
  • 3
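One way KL divergence shows up in this setting is as a score for how far apart the word distributions of the two sentiment classes are; a minimal sketch, with toy reviews standing in for the dataset:

```python
import math
from collections import Counter

# KL divergence between add-one-smoothed word distributions of two classes.
pos = Counter("great app love it great design".split())
neg = Counter("bad app hate it crashes".split())
vocab = set(pos) | set(neg)

def smoothed_dist(c):
    """Add-one-smoothed unigram distribution over the shared vocabulary."""
    total = sum(c.values()) + len(vocab)
    return {w: (c[w] + 1) / total for w in vocab}

p, q = smoothed_dist(pos), smoothed_dist(neg)
kl = sum(p[w] * math.log(p[w] / q[w]) for w in vocab)  # D_KL(p || q) > 0 here
```

Smoothing is what keeps the log well-defined when a word appears in only one class.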
2
votes
1 answer

NLP: find the best preposition for connecting parts of a sentence

My task is to connect 2-3 parts of a sentence into one whole using a preposition. The first part is some kind of action, e.g. "take pictures"; the second part is an object that can consist of only one noun, or a noun with adjectives and…
2
votes
1 answer

How to customize word division in CountVectorizer?

>>> from sklearn.feature_extraction.text import CountVectorizer
>>> import numpy
>>> import pandas
>>> vectorizer = CountVectorizer()
>>> corpus1 = ['abc-@@-123','cde-@@-true','jhg-@@-hud']
>>> xtrain = vectorizer.fit_transform(corpus1)
>>>…
helloworld
  • 23
  • 1
  • 3
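The default CountVectorizer token pattern, `r"(?u)\b\w\w+\b"`, splits `abc-@@-123` at the non-word characters; passing a custom pattern (e.g. `CountVectorizer(token_pattern=r"\S+")`) is one way to keep whole strings. A sketch of the two patterns with plain `re`, so the effect is visible without scikit-learn:

```python
import re

corpus1 = ['abc-@@-123', 'cde-@@-true', 'jhg-@@-hud']

# Default CountVectorizer pattern vs. a "whitespace-delimited" pattern.
default_tokens = [re.findall(r"(?u)\b\w\w+\b", doc) for doc in corpus1]
custom_tokens = [re.findall(r"\S+", doc) for doc in corpus1]

print(default_tokens[0])  # ['abc', '123']
print(custom_tokens[0])   # ['abc-@@-123']
```

CountVectorizer also accepts a `tokenizer=` callable if a regex alone is not enough.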
2
votes
1 answer

How to improve Naive Bayes?

I am solving a problem that addresses this question: "What are the actions that lead to a high or low score?" I have data that consists of text and score; I want to derive the words or actions from the text that lead to a high/low score. I have…
sara
  • 481
  • 7
  • 15
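For "which words lead to a high or low score", one common Naive Bayes view is per-word log-odds between the two classes; a sketch with toy labelled texts standing in for the text/score data:

```python
import math
from collections import Counter

# Per-word log-odds under a multinomial Naive Bayes model with add-one
# smoothing: positive values lean "high score", negative lean "low score".
high = Counter("resolved issue quickly helpful staff".split())
low = Counter("ignored issue slow rude staff".split())
vocab = set(high) | set(low)

def log_odds(w):
    p_high = (high[w] + 1) / (sum(high.values()) + len(vocab))
    p_low = (low[w] + 1) / (sum(low.values()) + len(vocab))
    return math.log(p_high / p_low)

# Words with large positive log-odds are the ones associated with high scores.
ranked = sorted(vocab, key=log_odds, reverse=True)
```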
2
votes
1 answer

Classifying short strings of text with additional context

I have a list of short strings each identifying a city. Misspellings are very common. The example below shows some of these short strings, along with the correct city they're supposed to…
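A minimal stdlib sketch of the matching step, assuming a known city list and a similarity cutoff (both illustrative); the extra context the question mentions could then be used to break ties:

```python
import difflib

# Map misspelled city strings to the closest known city by sequence similarity.
known_cities = ["london", "paris", "berlin", "madrid"]

def best_match(s, cutoff=0.6):
    """Closest known city to `s`, or None if nothing clears the cutoff."""
    matches = difflib.get_close_matches(s.lower(), known_cities, n=1, cutoff=cutoff)
    return matches[0] if matches else None

print(best_match("Lodnon"))  # transposition still matches "london"
```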
1
vote
1 answer

What is the training phase in an N-gram model?

Following is my understanding of the N-gram model used in a text-prediction case: given a sentence, say "I love my" (say N = 1 / bigram), using the N-gram model and, say, 4 possible candidates (country, family, wife, school), I can estimate the conditional…
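The "training phase" for an N-gram model is just counting: collect N-gram and (N-1)-gram counts from the corpus and divide to get maximum-likelihood conditional probabilities. A sketch with a toy corpus (an assumption) using the sentence from the question:

```python
from collections import Counter

# "Training" a trigram model: count trigrams and their bigram contexts.
corpus = "i love my country i love my family".split()

trigram_counts = Counter(zip(corpus, corpus[1:], corpus[2:]))
bigram_counts = Counter(zip(corpus, corpus[1:]))

def p(w, context):
    """MLE of P(w | context) for a two-word context, e.g. ('love', 'my')."""
    return trigram_counts[(*context, w)] / bigram_counts[context]

print(p("country", ("love", "my")))  # "love my" occurs twice, once before "country"
```

There are no iteratively learned weights; smoothing and back-off are added on top of these counts.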
1
vote
0 answers

Self Organising Map with variable length ordered sets of N-grams

I want to preface my question by saying that the situation I have might not be applicable to Kohonen self-organising maps (SOM), due to a lack of understanding on my part, so I do apologise if that is the case. If this is the case, I would greatly…
Cookies
  • 111
  • 2
1
vote
0 answers

For an n-Gram model with n>2, do we need more context at end of each sentence?

Jurafsky's book says we need to add context to the left and right of a sentence. Does this mean, for example, that if we have a corpus of three sentences, "John read Moby Dick", "Mary read a different book", and "She read a book by Cher", and after training…
KGhatak
  • 123
  • 6
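The padding convention in question can be shown with the three sentences from the excerpt: an n-gram model needs n-1 `<s>` tokens on the left so the first word has a full context, but a single `</s>` on the right is enough for probability mass to end the sentence. The helper name below is an assumption:

```python
# Pad each sentence for an n-gram model: n-1 start tokens, one end token.
def pad(sentence, n):
    return ["<s>"] * (n - 1) + sentence.split() + ["</s>"]

for s in ["John read Moby Dick",
          "Mary read a different book",
          "She read a book by Cher"]:
    print(pad(s, 3))  # trigram model: two "<s>", one "</s>"
```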