Questions tagged [gpt]

88 questions
10 votes · 2 answers

Does BERT have any advantage over GPT-3?

I have read a couple of documents that explain in detail the edge that GPT-3 (Generative Pre-trained Transformer 3) has over BERT (Bidirectional Encoder Representations from Transformers). So I am curious to know whether BERT scores better…
Bipin · 203
10 votes · 1 answer

How is GPT able to handle large vocabularies?

From what I understand, GPT and GPT-2 are trained to predict the $N^{th}$ word in a sentence given the previous $N-1$ words. When the vocabulary size is very large (100k+ words), how is it able to generate any meaningful prediction? Shouldn't it…
AAC · 499
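A minimal sketch of why the vocabulary stays tractable, assuming the Hugging Face `transformers` package: GPT-2 uses byte-pair encoding, so the output softmax covers roughly 50k subword units rather than every surface word.

```python
# Sketch: GPT-2's BPE vocabulary is fixed at ~50k subword units, so the final
# softmax never has to cover every possible surface word.
# Assumes the Hugging Face `transformers` package is installed.
from transformers import GPT2Tokenizer

tok = GPT2Tokenizer.from_pretrained("gpt2")
print(tok.vocab_size)  # 50257, regardless of how many distinct words the corpus has
print(tok.tokenize("antidisestablishmentarianism"))
# A rare word is split into several subword pieces, so the model predicts the
# next *subword*, not the next whole word.
```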
9 votes · 1 answer

How to summarize a long text using GPT-3

What is the best way to summarize a long text that exceeds the 4096-token limit (like a podcast transcript, for example)? As I understand it, I need to split the text into chunks to summarize, then concatenate the results and summarize those. Is there…
Poma · 193
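One common answer is exactly the chunk-then-recombine loop the asker describes. A minimal sketch, where `summarize_chunk` is a hypothetical placeholder for whatever GPT-3 completion call is used:

```python
# Sketch of recursive ("map-reduce") summarization. `summarize_chunk` stands in
# for an actual GPT-3 completion call; it is not a real API function.
def summarize_chunk(text: str) -> str:
    raise NotImplementedError  # e.g. prompt the model with "Summarize:\n" + text

def split_into_chunks(text: str, max_words: int = 2500) -> list[str]:
    words = text.split()
    return [" ".join(words[i:i + max_words]) for i in range(0, len(words), max_words)]

def summarize_long(text: str, max_words: int = 2500) -> str:
    # Summarize each chunk, join the partial summaries, and repeat until the
    # remaining text fits into a single prompt.
    while len(text.split()) > max_words:
        parts = [summarize_chunk(c) for c in split_into_chunks(text, max_words)]
        text = "\n".join(parts)
    return summarize_chunk(text)
```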
8 votes · 1 answer

What tokenizer does OpenAI's GPT-3 API use?

I'm building an application for the API, but I would like to be able to count the number of tokens my prompt will use, before I submit an API call. Currently I often submit prompts that yield a 'too-many-tokens' error. The closest I got to an answer…
Herman Autore · 83
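One way to count tokens locally before submitting a prompt is OpenAI's `tiktoken` library; the sketch below assumes it is installed and has an encoding registered for the target model.

```python
# Sketch: count tokens client-side with tiktoken before calling the API.
import tiktoken

enc = tiktoken.encoding_for_model("text-davinci-003")  # GPT-3-era BPE encoding
prompt = "Translate the following sentence into French: Hello, world!"
print(len(enc.encode(prompt)))  # compare against the model's context limit first
```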
8 votes · 1 answer

How does an LLM "parameter" relate to a "weight" in a neural network?

I keep reading about how the latest and greatest LLMs have billions of parameters. As someone who is more familiar with standard neural nets but is trying to better understand LLMs, I'm curious whether an LLM parameter is the same as an NN weight, i.e. is…
slim_wizard · 83
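A short PyTorch sketch of what "parameters" means in practice: they are exactly the trainable tensors (weights and biases), the same objects one counts in a small feed-forward net.

```python
# Sketch: LLM "parameters" are the same kind of thing as ordinary NN weights;
# the parameter count is just the total number of trainable scalars.
import torch.nn as nn

model = nn.Sequential(nn.Linear(768, 3072), nn.GELU(), nn.Linear(3072, 768))
n_params = sum(p.numel() for p in model.parameters())
print(n_params)  # 768*3072 + 3072 + 3072*768 + 768 = 4,722,432 (weights + biases)
```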
8 votes · 1 answer

BERT vs GPT: architectural, conceptual and implementational differences

In the BERT paper, I learnt that BERT is an encoder-only model, that is, it involves only transformer encoder blocks. In the GPT paper, I learnt that GPT is a decoder-only model, that is, it involves only transformer decoder blocks. I was guessing what's…
Rnj · 205
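A tiny sketch of the core mechanical difference between the two stacks: GPT-style decoder blocks apply a causal attention mask, while BERT-style encoder blocks let every token attend to every other token.

```python
# Sketch of the attention-mask difference between decoder-only (GPT) and
# encoder-only (BERT) blocks.
import torch

seq_len = 5
causal_mask = torch.tril(torch.ones(seq_len, seq_len))  # GPT: token i attends to j <= i
bidirectional_mask = torch.ones(seq_len, seq_len)        # BERT: token i attends to all j
print(causal_mask)
```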
7 votes · 5 answers

ChatGPT's Architecture - Decoder Only? Or Encoder-Decoder?

Does ChatGPT use an encoder-decoder architecture, or a decoder-only architecture? I have been coming across Medium and TowardsDataScience articles suggesting that it has an encoder-decoder architecture (see sources below): --…
user141493 · 191
7 votes · 1 answer

How Exactly Does In-Context Few-Shot Learning Actually Work in Theory (Under the Hood), Despite only Having a "Few" Support Examples to "Train On"?

Recent models like the GPT-3 Language Model (Brown et al., 2020) and the Flamingo Visual-Language Model (Alayrac et al., 2022) use in-context few-shot learning. The models are able to make highly accurate predictions even when only presented with a…
user141493 · 191
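For readers new to the term, a minimal illustration of what "in-context few-shot learning" looks like: the support examples are simply placed in the prompt, and the frozen model conditions on them; no weights are updated.

```python
# Sketch: the "few shots" are just examples embedded in the prompt; the only
# "training" is the model's forward pass over this context.
few_shot_prompt = """\
Review: The plot dragged and the acting was wooden. Sentiment: negative
Review: A delightful surprise from start to finish. Sentiment: positive
Review: I walked out halfway through. Sentiment:"""
# Send few_shot_prompt to the model and read off the next predicted token.
```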
5 votes · 5 answers

Is using GPT-4 to label data advisable?

If I have a lot of text data that needs to be labeled (e.g. sentiment analysis), and given the high accuracy of GPT-4, could I use it to label data? Or would that introduce bias or some other issues?
4 votes · 1 answer

What's the right input for GPT-2 in NLP?

I'm fine-tuning pre-trained GPT-2 for text summarization. The dataset contains 'text' and 'reference summary'. So my question is how to add special tokens to get the right input format. Currently I'm thinking of doing it like this: example1 text …
yuqiong11 · 61
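A minimal sketch of one common (not the only) input format for GPT-2 summarization fine-tuning with Hugging Face `transformers`: join text and summary with a custom separator token and end with EOS. The token name `<|summarize|>` is an arbitrary choice for this example.

```python
# Sketch: one common input format for GPT-2 summarization fine-tuning.
from transformers import GPT2Tokenizer, GPT2LMHeadModel

tokenizer = GPT2Tokenizer.from_pretrained("gpt2")
tokenizer.add_special_tokens({"additional_special_tokens": ["<|summarize|>"]})

model = GPT2LMHeadModel.from_pretrained("gpt2")
model.resize_token_embeddings(len(tokenizer))  # make room for the new token

example = "long source text ... <|summarize|> reference summary" + tokenizer.eos_token
input_ids = tokenizer(example, return_tensors="pt").input_ids
# For causal-LM fine-tuning the labels are the same ids: the model learns to
# continue the text after <|summarize|> with the summary.
```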
4 votes · 2 answers

ChatGPT: How to use long texts in prompt?

I like the website chatpdf.com a lot. You can upload a PDF file and then discuss the textual content of the file with the file "itself". It uses ChatGPT. I would like to program something similar. But I wonder how to use the content of long PDF…
meyer_mit_ai · 63
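What tools like chatpdf.com typically do is retrieval, not one giant prompt. A minimal sketch, where `embed` is a hypothetical placeholder for any text-embedding model:

```python
# Sketch: chunk the PDF text, embed each chunk once, then at question time put
# only the most similar chunks into the ChatGPT prompt. `embed` is a placeholder.
import numpy as np

def embed(text: str) -> np.ndarray:
    raise NotImplementedError  # e.g. an embeddings endpoint or sentence-transformers

def build_prompt(question: str, chunks: list[str], k: int = 3) -> str:
    q = embed(question)
    vecs = [embed(c) for c in chunks]
    scores = [float(np.dot(q, v) / (np.linalg.norm(q) * np.linalg.norm(v))) for v in vecs]
    best = sorted(range(len(chunks)), key=lambda i: scores[i], reverse=True)[:k]
    context = "\n\n".join(chunks[i] for i in best)
    return f"Answer using only this context:\n{context}\n\nQuestion: {question}"
```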
4 votes · 2 answers

Does fine-tuning require retraining the entire model?

Would it be necessary to retrain the entire model if we were to perform fine-tuning? Let's say we somehow got the GPT-3 model from OpenAI (I know GPT-3 is closed source). Would anyone with access to a couple of RTX 3080 GPUs be able to fine-tune it…
Exploring · 125
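A short sketch of the usual answer: fine-tuning does not have to touch every weight. Freezing most of the network (or using adapters/LoRA) keeps memory needs far below full retraining. Shown on GPT-2 here, since GPT-3's weights are not available.

```python
# Sketch: freeze a pretrained model and fine-tune only the top blocks.
from transformers import GPT2LMHeadModel

model = GPT2LMHeadModel.from_pretrained("gpt2")
for param in model.parameters():
    param.requires_grad = False                     # freeze everything

for param in model.transformer.h[-2:].parameters():
    param.requires_grad = True                      # un-freeze only the last two blocks

trainable = sum(p.numel() for p in model.parameters() if p.requires_grad)
print(trainable)                                    # a small fraction of the ~124M total
```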
3 votes · 2 answers

Does the transformer decoder reuse previous tokens' intermediate states like GPT2?

I recently read Jay Alammar's blog post about GPT-2 (http://jalammar.github.io/illustrated-gpt2/), which I found quite clear apart from one point: he explains that the decoder of GPT-2 processes input tokens one at a time, only actively processing…
Johncowk · 195
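A minimal sketch of the cached-state mechanism the blog post describes, using Hugging Face's `past_key_values`: at generation time only the newest token is fed in, and the keys/values already computed for earlier positions are reused.

```python
# Sketch: GPT-2 reuses cached keys/values instead of re-processing old tokens.
import torch
from transformers import GPT2LMHeadModel, GPT2Tokenizer

tok = GPT2Tokenizer.from_pretrained("gpt2")
model = GPT2LMHeadModel.from_pretrained("gpt2").eval()

ids = tok("The decoder reuses", return_tensors="pt").input_ids
with torch.no_grad():
    out = model(ids, use_cache=True)                 # full pass, K/V cached per layer
    next_id = out.logits[:, -1].argmax(dim=-1, keepdim=True)
    out2 = model(next_id, past_key_values=out.past_key_values, use_cache=True)
    # Only the single new token is processed; earlier states come from the cache.
```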
3 votes · 2 answers

How to generate a sentence with exactly N words?

Thanks to the pretrained GPT-2 model, it is now possible to generate meaningful sequences of words with or without a prefix. However, a sentence should end with a proper ending (., !, ?). I am just wondering how to generate a sentence (with a proper ending) of…
user185597 · 31
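A crude but workable sketch: sample completions and keep only those whose first sentence has exactly N words and proper ending punctuation (constrained decoding would be more efficient; this just illustrates the idea).

```python
# Sketch: rejection sampling for a sentence with exactly n words ending in . ! or ?
import re
from transformers import GPT2LMHeadModel, GPT2Tokenizer

tok = GPT2Tokenizer.from_pretrained("gpt2")
model = GPT2LMHeadModel.from_pretrained("gpt2").eval()

def sentence_with_n_words(prefix: str, n: int, tries: int = 50) -> str | None:
    ids = tok(prefix, return_tensors="pt").input_ids
    for _ in range(tries):
        out = model.generate(ids, do_sample=True, top_p=0.9,
                             max_new_tokens=4 * n, pad_token_id=tok.eos_token_id)
        text = tok.decode(out[0], skip_special_tokens=True)
        first = re.match(r"[^.!?]*[.!?]", text)      # first complete sentence
        if first and len(first.group(0).split()) == n:
            return first.group(0).strip()
    return None
```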
3 votes · 0 answers

Fine-tune GPT-2 via the Hugging Face API for a domain-specific LM

I am using the script in the examples folder to fine-tune the LM for a bot meant to deal with insurance-related queries. So if someone were to type "I am looking to modify my ...", the autocomplete suggestions would be "modify my name", "modify…
Vikram Murthy · 328
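For reference, the fine-tuning the question describes can also be done directly with the Trainer API (the `run_clm.py` example script wraps essentially the same steps). A minimal sketch; `insurance_queries.txt` is a hypothetical file name.

```python
# Sketch: causal-LM fine-tuning of GPT-2 on a domain-specific text file.
from datasets import load_dataset
from transformers import (DataCollatorForLanguageModeling, GPT2LMHeadModel,
                          GPT2Tokenizer, Trainer, TrainingArguments)

tok = GPT2Tokenizer.from_pretrained("gpt2")
tok.pad_token = tok.eos_token
model = GPT2LMHeadModel.from_pretrained("gpt2")

data = load_dataset("text", data_files={"train": "insurance_queries.txt"})
data = data.map(lambda b: tok(b["text"], truncation=True, max_length=128),
                batched=True, remove_columns=["text"])

trainer = Trainer(
    model=model,
    args=TrainingArguments(output_dir="gpt2-insurance", num_train_epochs=3),
    train_dataset=data["train"],
    data_collator=DataCollatorForLanguageModeling(tok, mlm=False),
)
trainer.train()
```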