How to measure accuracy of GPT model

Question

I am working on a model that automatically generates questions from a text.

The model will analyse a provided article and ask the authors questions that can help them improve it.

How can we measure the accuracy of these ML-generated questions?

There is also the matter of relevance, since these questions are meant to represent areas of improvement in the article.

How can that be measured?

Any previous work on similar models would be a great help too.

Thanks

noe · Accepted Answer · 2023-07-29T18:40:39.057

2

You can check the Question Generation section of Papers with Code. There you can see, for each dataset, how performance is measured and how the proposed approaches compare.

Usually, you check how similar the question is to the reference text. Commonly used measures are BLEU-1 (based on matching unigrams) and ROUGE-L (based on the longest common subsequence). This is "unsupervised testing" in the sense that you don't need labeled data. However, these scores may not correlate directly with the actual quality of the questions (see "Towards a Better Metric for Evaluating Question Generation Systems" and "Addressing Semantic Drift in Question Generation for Semi-Supervised Question Answering").
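To make the two measures concrete, here is a minimal pure-Python sketch of BLEU-1 (clipped unigram precision with a brevity penalty) and ROUGE-L (an F-score over the longest common subsequence). The function names, the whitespace tokenizer, and the example strings are assumptions made for illustration; published results normally come from established implementations with proper tokenization, so treat this only as a way to see what the scores measure.

```python
from collections import Counter
import math


def tokenize(text):
    # Assumption: lowercase whitespace tokenization; real evaluations
    # usually apply a proper tokenizer.
    return text.lower().split()


def bleu1(candidate, reference):
    """Clipped unigram precision times the BLEU brevity penalty."""
    cand, ref = tokenize(candidate), tokenize(reference)
    if not cand:
        return 0.0
    ref_counts = Counter(ref)
    # Each candidate token is credited at most as often as it occurs
    # in the reference ("clipping").
    clipped = sum(min(n, ref_counts[tok]) for tok, n in Counter(cand).items())
    precision = clipped / len(cand)
    # Penalize candidates that are shorter than the reference.
    bp = 1.0 if len(cand) >= len(ref) else math.exp(1 - len(ref) / len(cand))
    return bp * precision


def lcs_length(a, b):
    """Length of the longest common subsequence of two token lists."""
    prev = [0] * (len(b) + 1)
    for x in a:
        curr = [0]
        for j, y in enumerate(b, 1):
            curr.append(prev[j - 1] + 1 if x == y else max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]


def rouge_l(candidate, reference, beta=1.0):
    """ROUGE-L F-score; beta=1.0 weights precision and recall equally."""
    cand, ref = tokenize(candidate), tokenize(reference)
    lcs = lcs_length(cand, ref)
    if lcs == 0:
        return 0.0
    p, r = lcs / len(cand), lcs / len(ref)
    return (1 + beta ** 2) * p * r / (r + beta ** 2 * p)


if __name__ == "__main__":
    # Hypothetical example strings, not taken from any dataset.
    generated = "what is the capital of france"
    reference = "what is the capital city of france"
    print(f"BLEU-1:  {bleu1(generated, reference):.3f}")
    print(f"ROUGE-L: {rouge_l(generated, reference):.3f}")
```

Both scores only count overlapping tokens, which is why a question can score well while still being a poor question to ask.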

In other cases, QA-based Evaluation (QAE) is used, which measures how similar the generated QA pairs are to some ground-truth QA pairs. For this, you need a labeled reference QA dataset on which to evaluate the model.

edited Jul 29 '23 at 18:40

answered Jul 29 '23 at 18:03


If you find the answer useful, please consider upvoting it and accepting it if you consider it correct or, alternatively, please describe in a comment why you consider it incorrect or not clear enough. – noe Jul 29 '23 at 18:13
Thanks, but these measures focus on the "answerability" of the questions. What I am looking for is a way to measure the "relativity" of the question, as in: is this the right question to ask? – asmgx Jul 30 '23 at 15:40
QAE focuses on answerability, but the others (BLEU, ROUGE) do not; you can check [_Towards a Better Metric for Evaluating Question Generation Systems_](https://aclanthology.org/D18-1429/) regarding this. How do you define "relativity"? Are you referring to how related those questions are to the reference text? – noe Jul 30 '23 at 20:32
Yes, I meant how relevant those questions are to the reference text. – asmgx Jul 30 '23 at 23:26
