
I have read a couple of articles that explain in detail the edge that GPT-3 (Generative Pre-trained Transformer 3) has over BERT (Bidirectional Encoder Representations from Transformers). So I am curious: does BERT score better than GPT-3 in any particular area of NLP?

It's quite interesting to note that OpenAI's GPT-3 is not open-sourced, whereas tech behemoth Google's BERT is. I feel that OpenAI's stance, and the hefty price tag for the GPT-3 API, are in stark contrast to its mission statement ("OpenAI’s mission is to ensure that artificial general intelligence (AGI)—by which we mean highly autonomous systems that outperform humans at most economically valuable work—benefits all of humanity").

https://analyticsindiamag.com/gpt-3-vs-bert-for-nlp-tasks/
https://thenextweb.com/neural/2020/07/23/openais-new-gpt-3-language-explained-in-under-3-minutes-syndication/
https://medium.com/towards-artificial-intelligence/gpt-3-from-openai-is-here-and-its-a-monster-f0ab164ea2f8

Bipin
  • My query is: despite GPT-3 having the upper hand over BERT, does the latter have anything that scores over the former? – Bipin Sep 12 '20 at 06:28
  • There are some things that BERT is really good for: Named Entity Recognition, Sentiment Analysis, and Question Answering (see the sketch below). I don't see these articles claiming that GPT-3 can do these tasks better than BERT. – Akavall Sep 13 '20 at 05:37
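
As a reference point, here is a minimal sketch of the three tasks named in the comment above, using Hugging Face transformers pipelines with publicly shared BERT-family checkpoints. The library, the model names, and the example inputs are assumptions for illustration; they are not part of the original discussion.

```python
from transformers import pipeline

# Sentiment analysis: the pipeline's default checkpoint is a DistilBERT
# model fine-tuned on SST-2 (an assumed, commonly used default).
sentiment = pipeline("sentiment-analysis")
print(sentiment("BERT is still very useful for classification tasks."))

# Named entity recognition with a BERT checkpoint fine-tuned for NER
# (dslim/bert-base-NER is an assumed example from the model hub).
ner = pipeline("ner", model="dslim/bert-base-NER", aggregation_strategy="simple")
print(ner("Google released BERT in 2018."))

# Extractive question answering with a BERT checkpoint fine-tuned on SQuAD 2.0
# (deepset/bert-base-cased-squad2 is likewise an assumed example).
qa = pipeline("question-answering", model="deepset/bert-base-cased-squad2")
print(qa(question="Who released BERT?", context="Google released BERT in 2018."))
```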

2 Answers


This article on Medium introduces GPT-3 and makes some comparisons with BERT.

Specifically, section 4 examines how GPT-3 and BERT differ and mentions: "On the Architecture dimension, BERT still holds the edge. It's trained on challenges which are better able to capture the latent relationship between text in different problem contexts."

Also, in section 6 of the article, the author lists areas where GPT-3 struggles. It may be that BERT and other bidirectional encoder transformers do better in those areas, although I have no data or references to support this yet.

Langley

BERT needs to be fine-tuned to do what you want.

GPT-3 cannot be fine-tuned; even if you had access to the actual weights, fine-tuning it would be very expensive.

If you have enough data for fine-tuning, then per unit of compute (i.e. inference cost), you'll probably get much better performance out of BERT.
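
If it helps, here is a minimal sketch of what such fine-tuning looks like with the Hugging Face transformers and datasets libraries. The dataset (IMDB), the checkpoint, and the hyperparameters are illustrative assumptions, not something prescribed by this answer.

```python
from datasets import load_dataset
from transformers import (AutoModelForSequenceClassification, AutoTokenizer,
                          Trainer, TrainingArguments)

# Assumed example task: binary sentiment classification on IMDB.
dataset = load_dataset("imdb")
tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")

def tokenize(batch):
    return tokenizer(batch["text"], truncation=True,
                     padding="max_length", max_length=128)

dataset = dataset.map(tokenize, batched=True)

# BERT with a freshly initialized 2-label classification head.
model = AutoModelForSequenceClassification.from_pretrained(
    "bert-base-uncased", num_labels=2
)

args = TrainingArguments(
    output_dir="bert-imdb",          # where checkpoints are written
    per_device_train_batch_size=16,
    num_train_epochs=1,
    learning_rate=2e-5,              # a typical BERT fine-tuning learning rate
)

trainer = Trainer(
    model=model,
    args=args,
    # Small subsets keep this runnable on modest hardware.
    train_dataset=dataset["train"].shuffle(seed=42).select(range(2000)),
    eval_dataset=dataset["test"].select(range(500)),
)

trainer.train()
print(trainer.evaluate())  # reports eval loss on the held-out subset
```

Once fine-tuned, inference with a model like this is cheap compared to calling a 175B-parameter model, which is the trade-off this answer is pointing at.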

MWB