
I need to write a program (like a chatbot) that retrieves an answer from a CSV data file based on a question the user asks. For example, if the CSV stores a list of products and their specifications in 5-10 columns, then when a user asks about specification Y for product X, the program should return the correct value from the CSV. I need to use NLP because the user may use synonyms of a particular word, or phrase the question a bit differently from the keywords in the dataset.

I think I am supposed to use a BERT model via the HuggingFace Transformers library, but I'm not sure how to apply NLP here, since the data is structured rather than free text. Additionally, I don't have a pre-generated list of questions.

Can anyone suggest how I should approach this?

Also, some of the specifications are numeric values such as prices. Is there a way for the program to return the average or sum over two or more products if the user asks for it?

1 Answer


One efficient way is to use the roberta-base-squad2 model: pass your text as the context and then ask questions against it. It should work well for lookup-style questions, and the model can be downloaded directly:

git lfs install
git clone https://huggingface.co/deepset/roberta-base-squad2

Here is an extract of code to use it:

from transformers import AutoModelForQuestionAnswering, AutoTokenizer, pipeline

model_name = "deepset/roberta-base-squad2"

# a) Get predictions
nlp = pipeline('question-answering', model=model_name, tokenizer=model_name)
QA_input = {
    'question': 'Why is model conversion important?',
    'context': 'The option to convert models between FARM and transformers gives freedom to the user and let people easily switch between frameworks.'
}
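res = nlp(QA_input)
print(res)  # dict with the keys 'score', 'start', 'end' and 'answer'

# b) Load model & tokenizer (only needed to call them directly instead of through the pipeline)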
model = AutoModelForQuestionAnswering.from_pretrained(model_name)
tokenizer = AutoTokenizer.from_pretrained(model_name)
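
Since the model expects free text as context, the CSV rows have to be turned into sentences first. Below is a minimal sketch of one way to do that, using only the standard library. The sample data, the column names, the sentence template, and the helpers `row_to_context` and `csv_to_context` are all assumptions made for illustration, not part of the model or the Transformers API.

```python
import csv
import io

# Hypothetical sample data; a real file would be opened with open(path, newline="").
SAMPLE_CSV = """product,price,weight
Widget A,10,2 kg
Widget B,30,5 kg
"""

def row_to_context(row, key_column="product"):
    """Turn one CSV row into short sentences, one per non-empty specification."""
    name = row[key_column]
    facts = [
        f"The {col} of {name} is {val}."
        for col, val in row.items()
        if col != key_column and val
    ]
    return " ".join(facts)

def csv_to_context(csv_text, key_column="product"):
    """Concatenate the sentences for every row into a single context string."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return " ".join(row_to_context(row, key_column) for row in reader)

context = csv_to_context(SAMPLE_CSV)
print(context)
```

The resulting string can be passed as the 'context' value, with the user's question as 'question', to the pipeline from the extract above. How reliably the model answers on templated sentences like these is not something this sketch verifies, and for a large file you would probably want to select the relevant rows first rather than pass the whole table.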

You can also try the roberta-base-squad2 model in the browser using this link:

https://huggingface.co/deepset/roberta-base-squad2?context=Francisco+is+in+the+west+USA.&question=Where+is+San+Francisco%3F

You can also fine-tune your model on your data: https://github.com/deepset-ai/haystack/blob/master/tutorials/Tutorial2_Finetune_a_model_on_your_data.ipynb
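
For the average or sum part of the question: an extractive question-answering model returns a span copied from the context, so it will not compute a value that is not written there. One possible approach is to handle such questions in ordinary code before falling back to the model. The sketch below is a crude illustration under stated assumptions: the sample data, the keyword map `AGGREGATES`, the helper `aggregate_answer`, and the substring matching of product names are all hypothetical.

```python
import csv
import io
from statistics import mean

# Hypothetical sample data with a numeric "price" column.
SAMPLE_CSV = """product,price
Widget A,10
Widget B,30
Widget C,50
"""

# Hypothetical keyword map; real use would need more synonyms.
AGGREGATES = {"average": mean, "mean": mean, "sum": sum, "total": sum}

def aggregate_answer(question, csv_text, key_column="product", value_column="price"):
    """Return the requested aggregate, or None if the question does not ask for one."""
    q = question.lower()
    # Pick the first aggregate keyword that appears in the question.
    func = next((f for word, f in AGGREGATES.items() if word in q), None)
    if func is None:
        return None
    rows = csv.DictReader(io.StringIO(csv_text))
    # Keep the values of every product whose name is mentioned in the question.
    values = [float(r[value_column]) for r in rows if r[key_column].lower() in q]
    return func(values) if values else None

result = aggregate_answer("What is the average price of Widget A and Widget C?", SAMPLE_CSV)
print(result)
```

Plain substring matching is fragile (for example, "sum" also matches inside "consume"), so this is only meant to show where the arithmetic would live, not how to parse questions robustly.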

Nicolas Martin