Questions tagged [language-model]

Language models are used extensively in Natural Language Processing (NLP); a language model is a probability distribution over sequences of words or terms. Commonly, language models are constructed to estimate the probability of a word given the n-1 words that precede it. A popular choice is the n-gram model, whose simplest variants are the unigram and bigram models.

The unigram model (Bag of Words, n=1):

$P_{unigram}(w_1,w_2,w_3,w_4) = P(w_1)P(w_2)P(w_3)P(w_4)$

The bigram model (n=2):

$P_{bigram}(w_1,w_2,w_3,w_4) = P(w_1)P(w_2|w_1)P(w_3|w_2)P(w_4|w_3)$
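As a concrete illustration of the two formulas, here is a minimal Python sketch that builds maximum-likelihood unigram and bigram estimates from a toy corpus (the corpus is made up and no smoothing is applied):

```python
from collections import Counter

corpus = "the cat sat on the mat the cat ate".split()

unigram_counts = Counter(corpus)
bigram_counts = Counter(zip(corpus, corpus[1:]))
total = len(corpus)

def p_unigram(w):
    return unigram_counts[w] / total

def p_bigram(w, prev):
    # Maximum-likelihood estimate of P(w | prev)
    return bigram_counts[(prev, w)] / unigram_counts[prev]

# Score the sequence "the cat sat" under both models
p_uni = p_unigram("the") * p_unigram("cat") * p_unigram("sat")
p_bi = p_unigram("the") * p_bigram("cat", "the") * p_bigram("sat", "cat")
print(p_uni, p_bi)
```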

More sophisticated language models also exist, including exponential (maximum-entropy) models and neural network models.

142 questions
67 votes • 4 answers

What is purpose of the [CLS] token and why is its encoding output important?

I am reading this article on how to use BERT by Jay Alammar and I understand things up until: For sentence classification, we’re only interested in BERT’s output for the [CLS] token, so we select that slice of the cube and discard everything…
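A minimal sketch of the slice being described, assuming the Hugging Face transformers package and the bert-base-uncased checkpoint: the [CLS] token is always at position 0, so its final hidden state can be selected from the model output and handed to a classifier.

```python
import torch
from transformers import AutoModel, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModel.from_pretrained("bert-base-uncased")

inputs = tokenizer("a visually stunning rumination on love", return_tensors="pt")
with torch.no_grad():
    outputs = model(**inputs)

# [CLS] is prepended by the tokenizer, so position 0 is its final hidden state.
cls_embedding = outputs.last_hidden_state[:, 0, :]
print(cls_embedding.shape)  # torch.Size([1, 768])
```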
49 votes • 7 answers

What is the difference between model hyperparameters and model parameters?

I have noticed that such terms as model hyperparameter and model parameter have been used interchangeably on the web without prior clarification. I think this is incorrect and needs explanation. Consider a machine learning model, an SVM/NN/NB based…
minerals • 2,137 • 3 • 17 • 19
16 votes • 6 answers

Are there any good out-of-the-box language models for python?

I'm prototyping an application and I need a language model to compute perplexity on some generated sentences. Is there any trained language model in python I can readily use? Something simple like model = LanguageModel('en') p1 =…
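One commonly used option, sketched below on the assumption that a pretrained neural model is acceptable: score sentences with GPT-2 via the transformers package and exponentiate the average token loss. The LanguageModel('en') interface in the question is hypothetical; nothing ships with exactly that API.

```python
import torch
from transformers import GPT2LMHeadModel, GPT2TokenizerFast

tokenizer = GPT2TokenizerFast.from_pretrained("gpt2")
model = GPT2LMHeadModel.from_pretrained("gpt2").eval()

def perplexity(sentence):
    ids = tokenizer(sentence, return_tensors="pt").input_ids
    with torch.no_grad():
        loss = model(ids, labels=ids).loss  # mean cross-entropy per token
    return torch.exp(loss).item()

print(perplexity("The quick brown fox jumps over the lazy dog."))
print(perplexity("Dog lazy the over jumps fox brown quick The."))  # scrambled, should score worse
```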
Fred • 403 • 3 • 9
16 votes • 2 answers

Word2Vec embeddings with TF-IDF

When you train the word2vec model (using, for instance, gensim) you supply a list of words/sentences. But there does not seem to be a way to specify weights for the words, calculated for instance using TF-IDF. Is the usual practice to multiply the…
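A minimal sketch of the approach the question alludes to, assuming gensim (version 4 API) and scikit-learn: word2vec itself is trained without weights, and the TF-IDF weighting is applied afterwards when averaging word vectors into a sentence vector.

```python
import numpy as np
from gensim.models import Word2Vec
from sklearn.feature_extraction.text import TfidfVectorizer

sentences = [["the", "cat", "sat"], ["the", "dog", "barked"], ["the", "cat", "purred"]]

w2v = Word2Vec(sentences, vector_size=50, min_count=1, seed=0)  # trained without weights
tfidf = TfidfVectorizer().fit([" ".join(s) for s in sentences])
idf = dict(zip(tfidf.get_feature_names_out(), tfidf.idf_))

def sentence_vector(words):
    # TF-IDF-weighted average of the word vectors (weight 1.0 for unseen words)
    weights = np.array([idf.get(w, 1.0) for w in words])
    vectors = np.array([w2v.wv[w] for w in words])
    return (weights[:, None] * vectors).sum(axis=0) / weights.sum()

print(sentence_vector(["the", "cat", "sat"]).shape)  # (50,)
```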
SFD • 281 • 1 • 2 • 7
12 votes • 1 answer

What is whole word masking in the recent BERT model?

I was checking the BERT GitHub page and noticed that there are new models built with a new training technique called "whole word masking". Here is a snippet describing it: In the original pre-processing code, we randomly select WordPiece tokens to…
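A toy illustration of the difference (the tokens are invented): with WordPiece tokenization, continuation pieces start with "##", and whole word masking always masks a word together with all of its pieces rather than masking pieces independently.

```python
tokens = ["the", "phil", "##har", "##monic", "played", "well"]

# Group each word with its "##" continuation pieces.
word_groups = []
for i, tok in enumerate(tokens):
    if tok.startswith("##") and word_groups:
        word_groups[-1].append(i)
    else:
        word_groups.append([i])
print(word_groups)  # [[0], [1, 2, 3], [4], [5]]

# Original BERT could mask "##har" alone; whole word masking masks the whole word.
masked = list(tokens)
for i in word_groups[1]:  # suppose the word "philharmonic" is selected for masking
    masked[i] = "[MASK]"
print(masked)  # ['the', '[MASK]', '[MASK]', '[MASK]', 'played', 'well']
```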
kee • 223 • 2 • 6
10 votes • 5 answers

How to create a good list of stopwords

I am looking for some hints on how to curate a list of stopwords. Does someone know / can someone recommend a good method to extract stopword lists from the dataset itself for preprocessing and filtering? The Data: a huge amount of human text input…
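One common corpus-driven heuristic, sketched below with scikit-learn on a made-up corpus: treat words that occur in a very large fraction of documents as stopword candidates, since they carry little discriminative information. The 0.7 document-frequency cut-off is an arbitrary illustration, not a recommendation.

```python
import numpy as np
from sklearn.feature_extraction.text import CountVectorizer

docs = [
    "the model was trained on the data",
    "the data was cleaned before training the model",
    "we evaluate the model on held out data",
]

vectorizer = CountVectorizer(binary=True)      # binary counts -> document frequency
X = vectorizer.fit_transform(docs)
doc_freq = np.asarray(X.mean(axis=0)).ravel()  # fraction of docs containing each word

words = vectorizer.get_feature_names_out()
stopword_candidates = [w for w, df in zip(words, doc_freq) if df >= 0.7]
print(stopword_candidates)  # ['data', 'model', 'the']
```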
PlagTag • 333 • 1 • 3 • 10
10 votes • 1 answer

What are generative and discriminative models? How are they used in Natural Language Processing?

This question asks about generative vs. discriminative algorithms, but can someone give an example of the difference between these forms when applied to Natural Language Processing? How are generative and discriminative models used in NLP?
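A minimal sketch of the contrast with scikit-learn on a made-up corpus: Naive Bayes is generative (it models P(x, y) via class priors and word likelihoods), while logistic regression is discriminative (it models P(y | x) directly).

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.linear_model import LogisticRegression  # discriminative: P(y | x)
from sklearn.naive_bayes import MultinomialNB        # generative: P(x, y)

docs = ["great movie", "terrible plot", "loved the acting", "boring and slow"]
labels = [1, 0, 1, 0]

X = CountVectorizer().fit_transform(docs)

generative = MultinomialNB().fit(X, labels)            # learns priors + word likelihoods
discriminative = LogisticRegression().fit(X, labels)   # learns the decision boundary directly

print(generative.predict(X), discriminative.predict(X))
```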
alvas • 2,340 • 6 • 25 • 38
9 votes • 2 answers

Is BERT a language model?

Is BERT a language model in the sense of a function that gets a sentence and returns a probability? I know its main usage is sentence embedding, but can it also provide this functionality?
Amit Keinan • 776 • 6 • 19
7 votes • 3 answers

Can finite state machines be encoded as input/output for a neural network?

I want to encode finite state machines (specifically DFAs) as output (or input) of a neural network for a supervised learning task. Are there any ways in the literature for doing this? I've already found some algorithms being able to extract a DFA…
Gabrer • 211 • 2 • 6
7 votes • 5 answers

ChatGPT's Architecture - Decoder Only? Or Encoder-Decoder?

Does ChatGPT use an encoder-decoder architecture, or a decoder-only architecture? I have been coming across Medium and TowardsDataScience articles suggesting that it has an encoder-decoder architecture (see sources below): --…
user141493 • 191 • 1 • 1 • 8
7 votes • 1 answer

How Exactly Does In-Context Few-Shot Learning Actually Work in Theory (Under the Hood), Despite only Having a "Few" Support Examples to "Train On"?

Recent models like the GPT-3 Language Model (Brown et al., 2020) and the Flamingo Visual-Language Model (Alayrac et al., 2022) use in-context few-shot learning. The models are able to make highly accurate predictions even when only presented with a…
user141493 • 191 • 1 • 1 • 8
6 votes • 2 answers

What is the difference between GPT blocks and Transformer Decoder blocks?

I know GPT is a Transformer-based Neural Network, composed of several blocks. These blocks are based on the original Transformer's Decoder blocks, but are they exactly the same? In the original Transformer model, Decoder blocks have two attention…
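A rough PyTorch sketch of the difference, with illustrative sizes rather than GPT's exact configuration: a GPT-style block keeps the decoder's masked self-attention and feed-forward sublayers but drops the encoder-decoder (cross-)attention, since there is no encoder to attend to.

```python
import torch
import torch.nn as nn

class GPTBlock(nn.Module):
    def __init__(self, d_model=768, n_heads=12):
        super().__init__()
        self.ln1 = nn.LayerNorm(d_model)
        self.attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
        self.ln2 = nn.LayerNorm(d_model)
        self.mlp = nn.Sequential(
            nn.Linear(d_model, 4 * d_model), nn.GELU(), nn.Linear(4 * d_model, d_model)
        )
        # Note: no second attention sublayer over encoder outputs, unlike the
        # original Transformer decoder block.

    def forward(self, x):
        T = x.size(1)
        # Causal mask: True above the diagonal means "may not attend to the future".
        causal = torch.triu(torch.ones(T, T, dtype=torch.bool), diagonal=1)
        h = self.ln1(x)
        x = x + self.attn(h, h, h, attn_mask=causal, need_weights=False)[0]
        x = x + self.mlp(self.ln2(x))
        return x

out = GPTBlock()(torch.randn(2, 5, 768))  # (batch, sequence, d_model)
print(out.shape)
```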
Leevo • 6,005 • 3 • 14 • 51
5 votes • 1 answer

What does 'Linear regularities among words' mean?

Context: In the paper "Efficient Estimation of Word Representations in Vector Space" by T. Mikolov et al., the authors make use of the phrase: 'Linear regularities among words'. What does that mean in the context of the paper, or in a general…
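A minimal sketch of what those regularities look like in practice, assuming gensim and its downloader (the vector set named below is one convenient choice; "word2vec-google-news-300" matches the paper's own vectors but is a much larger download): analogies such as king - man + woman ≈ queen fall out of simple vector arithmetic.

```python
import gensim.downloader as api

# Downloads pretrained vectors on first use (~130 MB for this set).
vectors = api.load("glove-wiki-gigaword-100")
print(vectors.most_similar(positive=["king", "woman"], negative=["man"], topn=3))
```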
Dawny33 • 8,226 • 12 • 47 • 104
5 votes • 2 answers

Further Training a pre-trained LLM

My goal is to use the general knowledge and language understanding of a pre-trained LLM and to continue training on a smaller domain specific corpus to improve the model's knowledge on the domain. What is the best practice approach here without…
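A minimal sketch of the usual recipe (often called continued or domain-adaptive pretraining), assuming the transformers and datasets packages; the base model name, file path, and hyperparameters are placeholders: keep the original causal-LM objective and simply continue training on the domain corpus.

```python
from datasets import load_dataset
from transformers import (AutoModelForCausalLM, AutoTokenizer,
                          DataCollatorForLanguageModeling, Trainer, TrainingArguments)

model_name = "gpt2"  # placeholder base model
tokenizer = AutoTokenizer.from_pretrained(model_name)
tokenizer.pad_token = tokenizer.eos_token
model = AutoModelForCausalLM.from_pretrained(model_name)

# Plain-text domain corpus, one example per line (path is a placeholder).
corpus = load_dataset("text", data_files={"train": "domain_corpus.txt"})["train"]
tokenized = corpus.map(
    lambda batch: tokenizer(batch["text"], truncation=True, max_length=512),
    batched=True, remove_columns=["text"],
)

trainer = Trainer(
    model=model,
    args=TrainingArguments(output_dir="domain-lm", num_train_epochs=1,
                           per_device_train_batch_size=2, learning_rate=5e-5),
    train_dataset=tokenized,
    data_collator=DataCollatorForLanguageModeling(tokenizer, mlm=False),  # causal LM objective
)
trainer.train()
```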
5 votes • 1 answer

What size language model can you train on a GPU with x GB of memory?

I'm trying to figure out what size language model I will be able to train on a GPU with a certain amount of memory. Let's for simplicity say that 1 GB = 10^9 bytes; that means that, for example, on a GPU with 12 GB memory, I can theoretically fit 6…
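A back-of-the-envelope sketch of that arithmetic (the precision choices are assumptions): with 1 GB taken as 10^9 bytes, the number of parameters whose raw weights fit is roughly memory divided by bytes per parameter; training needs several times more for gradients, optimizer state, and activations.

```python
def max_params(gpu_gb, bytes_per_param):
    # Parameters whose raw weights fit in the given memory (1 GB = 10**9 bytes).
    return gpu_gb * 10**9 / bytes_per_param

print(max_params(12, 2))   # fp16 weights only: 6e9, i.e. ~6 billion parameters
print(max_params(12, 4))   # fp32 weights only: ~3 billion parameters
# Training with Adam in fp32 takes roughly 16 bytes per parameter
# (weights + gradients + two optimizer moments), so far fewer fit:
print(max_params(12, 16))  # ~0.75 billion parameters
```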
HelloGoodbye • 161 • 5