Questions tagged [language-model]

Language models are used extensively in Natural Language Processing (NLP); a language model is a probability distribution over sequences of words or terms. Commonly, language models are constructed to estimate the probability of a word given the n-1 words that precede it. A popular choice is the n-gram model, whose simplest variants are the unigram and bigram models.

The unigram model (Bag of Words, n=1):

$P_{unigram}(w_1,w_2,w_3,w_4) = P(w_1)P(w_2)P(w_3)P(w_4)$

The bigram model (n=2):

$P_{bigram}(w_1,w_2,w_3,w_4) = P(w_1)P(w_2|w_1)P(w_3|w_2)P(w_4|w_3)$
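As a concrete illustration of the two formulas, here is a minimal Python sketch that builds maximum-likelihood unigram and bigram estimates from a toy corpus (the corpus is made up and no smoothing is applied):

```python
from collections import Counter

corpus = "the cat sat on the mat the cat ate".split()

unigram_counts = Counter(corpus)
bigram_counts = Counter(zip(corpus, corpus[1:]))
total = len(corpus)

def p_unigram(w):
    return unigram_counts[w] / total

def p_bigram(w, prev):
    # Maximum-likelihood estimate of P(w | prev)
    return bigram_counts[(prev, w)] / unigram_counts[prev]

# Score the sequence "the cat sat" under both models
p_uni = p_unigram("the") * p_unigram("cat") * p_unigram("sat")
p_bi = p_unigram("the") * p_bigram("cat", "the") * p_bigram("sat", "cat")
print(p_uni, p_bi)
```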

More sophisticated language models also exist, including exponential (maximum-entropy) models and neural network models.

142 questions
67 votes • 4 answers

What is purpose of the [CLS] token and why is its encoding output important?

I am reading this article on how to use BERT by Jay Alammar and I understand things up until: For sentence classification, we’re only interested in BERT’s output for the [CLS] token, so we select that slice of the cube and discard everything…
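A minimal sketch of the slice being described, assuming the Hugging Face transformers package and the bert-base-uncased checkpoint: the [CLS] token is always at position 0, so its final hidden state can be selected from the model output and handed to a classifier.

```python
import torch
from transformers import AutoModel, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModel.from_pretrained("bert-base-uncased")

inputs = tokenizer("a visually stunning rumination on love", return_tensors="pt")
with torch.no_grad():
    outputs = model(**inputs)

# [CLS] is prepended by the tokenizer, so position 0 is its final hidden state.
cls_embedding = outputs.last_hidden_state[:, 0, :]
print(cls_embedding.shape)  # torch.Size([1, 768])
```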
49 votes • 7 answers

What is the difference between model hyperparameters and model parameters?

I have noticed that such terms as model hyperparameter and model parameter have been used interchangeably on the web without prior clarification. I think this is incorrect and needs explanation. Consider a machine learning model, an SVM/NN/NB based…
minerals • 2,137 • 3 • 17 • 19
16 votes • 6 answers

Are there any good out-of-the-box language models for python?

I'm prototyping an application and I need a language model to compute perplexity on some generated sentences. Is there any trained language model in python I can readily use? Something simple like model = LanguageModel('en') p1 =…
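One commonly used option, sketched below on the assumption that a pretrained neural model is acceptable: score sentences with GPT-2 via the transformers package and exponentiate the average token loss. The LanguageModel('en') interface in the question is hypothetical; nothing ships with exactly that API.

```python
import torch
from transformers import GPT2LMHeadModel, GPT2TokenizerFast

tokenizer = GPT2TokenizerFast.from_pretrained("gpt2")
model = GPT2LMHeadModel.from_pretrained("gpt2").eval()

def perplexity(sentence):
    ids = tokenizer(sentence, return_tensors="pt").input_ids
    with torch.no_grad():
        loss = model(ids, labels=ids).loss  # mean cross-entropy per token
    return torch.exp(loss).item()

print(perplexity("The quick brown fox jumps over the lazy dog."))
print(perplexity("Dog lazy the over jumps fox brown quick The."))  # scrambled, should score worse
```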
Fred • 403 • 3 • 9
16 votes • 2 answers

Word2Vec embeddings with TF-IDF

When you train the word2vec model (using, for instance, gensim) you supply a list of words/sentences. But there does not seem to be a way to specify weights for the words, calculated for instance using TF-IDF. Is the usual practice to multiply the…
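A minimal sketch of the approach the question alludes to, assuming gensim (version 4 API) and scikit-learn: word2vec itself is trained without weights, and the TF-IDF weighting is applied afterwards when averaging word vectors into a sentence vector.

```python
import numpy as np
from gensim.models import Word2Vec
from sklearn.feature_extraction.text import TfidfVectorizer

sentences = [["the", "cat", "sat"], ["the", "dog", "barked"], ["the", "cat", "purred"]]

w2v = Word2Vec(sentences, vector_size=50, min_count=1, seed=0)  # trained without weights
tfidf = TfidfVectorizer().fit([" ".join(s) for s in sentences])
idf = dict(zip(tfidf.get_feature_names_out(), tfidf.idf_))

def sentence_vector(words):
    # TF-IDF-weighted average of the word vectors (weight 1.0 for unseen words)
    weights = np.array([idf.get(w, 1.0) for w in words])
    vectors = np.array([w2v.wv[w] for w in words])
    return (weights[:, None] * vectors).sum(axis=0) / weights.sum()

print(sentence_vector(["the", "cat", "sat"]).shape)  # (50,)
```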
SFD • 281 • 1 • 2 • 7
12 votes • 1 answer

What is whole word masking in the recent BERT model?

I was checking the BERT GitHub page and noticed that there are new models built with a new training technique called "whole word masking". Here is a snippet describing it: In the original pre-processing code, we randomly select WordPiece tokens to…
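A toy illustration of the difference (the tokens are invented): with WordPiece tokenization, continuation pieces start with "##", and whole word masking always masks a word together with all of its pieces rather than masking pieces independently.

```python
tokens = ["the", "phil", "##har", "##monic", "played", "well"]

# Group each word with its "##" continuation pieces.
word_groups = []
for i, tok in enumerate(tokens):
    if tok.startswith("##") and word_groups:
        word_groups[-1].append(i)
    else:
        word_groups.append([i])
print(word_groups)  # [[0], [1, 2, 3], [4], [5]]

# Original BERT could mask "##har" alone; whole word masking masks the whole word.
masked = list(tokens)
for i in word_groups[1]:  # suppose the word "philharmonic" is selected for masking
    masked[i] = "[MASK]"
print(masked)  # ['the', '[MASK]', '[MASK]', '[MASK]', 'played', 'well']
```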
kee • 223 • 2 • 6
10 votes • 5 answers

How to create a good list of stopwords

I am looking for some hints on how to curate a list of stopwords. Does someone know / can someone recommend a good method to extract stopword lists from the dataset itself for preprocessing and filtering? The Data: a huge amount of human text input…
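One common corpus-driven heuristic, sketched below with scikit-learn on a made-up corpus: treat words that occur in a very large fraction of documents as stopword candidates, since they carry little discriminative information. The 0.7 document-frequency cut-off is an arbitrary illustration, not a recommendation.

```python
import numpy as np
from sklearn.feature_extraction.text import CountVectorizer

docs = [
    "the model was trained on the data",
    "the data was cleaned before training the model",
    "we evaluate the model on held out data",
]

vectorizer = CountVectorizer(binary=True)      # binary counts -> document frequency
X = vectorizer.fit_transform(docs)
doc_freq = np.asarray(X.mean(axis=0)).ravel()  # fraction of docs containing each word

words = vectorizer.get_feature_names_out()
stopword_candidates = [w for w, df in zip(words, doc_freq) if df >= 0.7]
print(stopword_candidates)  # ['data', 'model', 'the']
```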
PlagTag • 333 • 1 • 3 • 10
10 votes • 1 answer

What are generative and discriminative models? How are they used in Natural Language Processing?

This question asks about generative vs. discriminative algorithms, but can someone give an example of the difference between these forms when applied to Natural Language Processing? How are generative and discriminative models used in NLP?
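A minimal sketch of the contrast with scikit-learn on a made-up corpus: Naive Bayes is generative (it models P(x, y) via class priors and word likelihoods), while logistic regression is discriminative (it models P(y | x) directly).

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.linear_model import LogisticRegression  # discriminative: P(y | x)
from sklearn.naive_bayes import MultinomialNB        # generative: P(x, y)

docs = ["great movie", "terrible plot", "loved the acting", "boring and slow"]
labels = [1, 0, 1, 0]

X = CountVectorizer().fit_transform(docs)

generative = MultinomialNB().fit(X, labels)            # learns priors + word likelihoods
discriminative = LogisticRegression().fit(X, labels)   # learns the decision boundary directly

print(generative.predict(X), discriminative.predict(X))
```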
alvas • 2,340 • 6 • 25 • 38
9 votes • 2 answers

Is BERT a language model?

Is BERT a language model in the sense of a function that gets a sentence and returns a probability? I know its main usage is sentence embedding, but can it also provide this functionality?
Amit Keinan • 776 • 6 • 19
7 votes • 3 answers

Can finite state machines be encoded as input/output for a neural network?

I want to encode finite state machines (specifically DFAs) as output (or input) of a neural network for a supervised learning task. Are there any ways in the literature for doing this? I've already found some algorithms being able to extract a DFA…
Gabrer • 211 • 2 • 6
7 votes • 5 answers

ChatGPT's Architecture - Decoder Only? Or Encoder-Decoder?

Does ChatGPT use an encoder-decoder architecture, or a decoder-only architecture? I have been coming across Medium and TowardsDataScience articles suggesting that it has an encoder-decoder architecture (see sources below): --…
user141493 • 191 • 1 • 1 • 8
7 votes • 1 answer

How Exactly Does In-Context Few-Shot Learning Actually Work in Theory (Under the Hood), Despite only Having a "Few" Support Examples to "Train On"?

Recent models like the GPT-3 Language Model (Brown et al., 2020) and the Flamingo Visual-Language Model (Alayrac et al., 2022) use in-context few-shot learning. The models are able to make highly accurate predictions even when only presented with a…
user141493 • 191 • 1 • 1 • 8
6 votes • 2 answers

What is the difference between GPT blocks and Transformer Decoder blocks?

I know GPT is a Transformer-based Neural Network, composed of several blocks. These blocks are based on the original Transformer's Decoder blocks, but are they exactly the same? In the original Transformer model, Decoder blocks have two attention…
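A rough PyTorch sketch of the difference, with illustrative sizes rather than GPT's exact configuration: a GPT-style block keeps the decoder's masked self-attention and feed-forward sublayers but drops the encoder-decoder (cross-)attention, since there is no encoder to attend to.

```python
import torch
import torch.nn as nn

class GPTBlock(nn.Module):
    def __init__(self, d_model=768, n_heads=12):
        super().__init__()
        self.ln1 = nn.LayerNorm(d_model)
        self.attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
        self.ln2 = nn.LayerNorm(d_model)
        self.mlp = nn.Sequential(
            nn.Linear(d_model, 4 * d_model), nn.GELU(), nn.Linear(4 * d_model, d_model)
        )
        # Note: no second attention sublayer over encoder outputs, unlike the
        # original Transformer decoder block.

    def forward(self, x):
        T = x.size(1)
        # Causal mask: True above the diagonal means "may not attend to the future".
        causal = torch.triu(torch.ones(T, T, dtype=torch.bool), diagonal=1)
        h = self.ln1(x)
        x = x + self.attn(h, h, h, attn_mask=causal, need_weights=False)[0]
        x = x + self.mlp(self.ln2(x))
        return x

out = GPTBlock()(torch.randn(2, 5, 768))  # (batch, sequence, d_model)
print(out.shape)
```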
Leevo • 6,005 • 3 • 14 • 51
5 votes • 1 answer

What does 'Linear regularities among words' mean?

Context: In the paper "Efficient Estimation of Word Representations in Vector Space" by T. Mikolov et al., the authors make use of the phrase: 'Linear regularities among words'. What does that mean in the context of the paper, or in a general…
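A minimal sketch of what those regularities look like in practice, assuming gensim and its downloader (the vector set named below is one convenient choice; "word2vec-google-news-300" matches the paper's own vectors but is a much larger download): analogies such as king - man + woman ≈ queen fall out of simple vector arithmetic.

```python
import gensim.downloader as api

# Downloads pretrained vectors on first use (~130 MB for this set).
vectors = api.load("glove-wiki-gigaword-100")
print(vectors.most_similar(positive=["king", "woman"], negative=["man"], topn=3))
```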
Dawny33 • 8,226 • 12 • 47 • 104
5 votes • 2 answers

Further Training a pre-trained LLM

My goal is to use the general knowledge and language understanding of a pre-trained LLM and to continue training on a smaller domain specific corpus to improve the model's knowledge on the domain. What is the best practice approach here without…
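A minimal sketch of the usual recipe (often called continued or domain-adaptive pretraining), assuming the transformers and datasets packages; the base model name, file path, and hyperparameters are placeholders: keep the original causal-LM objective and simply continue training on the domain corpus.

```python
from datasets import load_dataset
from transformers import (AutoModelForCausalLM, AutoTokenizer,
                          DataCollatorForLanguageModeling, Trainer, TrainingArguments)

model_name = "gpt2"  # placeholder base model
tokenizer = AutoTokenizer.from_pretrained(model_name)
tokenizer.pad_token = tokenizer.eos_token
model = AutoModelForCausalLM.from_pretrained(model_name)

# Plain-text domain corpus, one example per line (path is a placeholder).
corpus = load_dataset("text", data_files={"train": "domain_corpus.txt"})["train"]
tokenized = corpus.map(
    lambda batch: tokenizer(batch["text"], truncation=True, max_length=512),
    batched=True, remove_columns=["text"],
)

trainer = Trainer(
    model=model,
    args=TrainingArguments(output_dir="domain-lm", num_train_epochs=1,
                           per_device_train_batch_size=2, learning_rate=5e-5),
    train_dataset=tokenized,
    data_collator=DataCollatorForLanguageModeling(tokenizer, mlm=False),  # causal LM objective
)
trainer.train()
```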
5 votes • 1 answer

What size language model can you train on a GPU with x GB of memory?

I'm trying to figure out what size language model I will be able to train on a GPU with a certain amount of memory. Let's for simplicity say that 1 GB = 10^9 bytes; that means that, for example, on a GPU with 12 GB memory, I can theoretically fit 6…
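A back-of-the-envelope sketch of that arithmetic (the precision choices are assumptions): with 1 GB taken as 10^9 bytes, the number of parameters whose raw weights fit is roughly memory divided by bytes per parameter; training needs several times more for gradients, optimizer state, and activations.

```python
def max_params(gpu_gb, bytes_per_param):
    # Parameters whose raw weights fit in the given memory (1 GB = 10**9 bytes).
    return gpu_gb * 10**9 / bytes_per_param

print(max_params(12, 2))   # fp16 weights only: 6e9, i.e. ~6 billion parameters
print(max_params(12, 4))   # fp32 weights only: ~3 billion parameters
# Training with Adam in fp32 takes roughly 16 bytes per parameter
# (weights + gradients + two optimizer moments), so far fewer fit:
print(max_params(12, 16))  # ~0.75 billion parameters
```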
HelloGoodbye • 161 • 5