
At a super high level, I would like to create a fantasy language AI tutor. For this question, however, I would like to better understand how, generally speaking, you add your own custom data/knowledge to the LLM.

In my case, I have a spreadsheet of ~4,000 dictionary entries, each with a fantasy word and its English definition. I would like to "teach the AI" these terms, so that it can respond with the definition when asked about a term (and, eventually, speak using these terms with the fantasy language's grammar). In that sense it seems like I need to "add to the LLM," but I'm not sure where this fits into the LangChain/AI puzzle.
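Just to show the shape of the data, the spreadsheet is two columns, roughly like this (the rows below are made-up placeholders, not real entries):

```
word,definition
zanthir,"a greeting exchanged between close friends"
morvek,"to travel by starlight"
```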

How do I add my own custom knowledge to the AI?

I have read a little about vector databases such as Pinecone, but I'm not sure that's what I need. Is that how you customize the knowledge an AI assistant has? Why not just customize the LLM directly? Is a vector database essentially a knowledge plugin for an LLM? What am I missing in my understanding of this picture?

I need to add my dictionary definitions to the AI and have it remember them. Do I just send "system" prompts to the AI, one prompt per term, to "teach" it the terms? Or do I convert each term + definition into a vector embedding and store that in Pinecone? I'm not quite sure how the pieces fit together yet, and I'm looking for guidance at a high level. I don't need polished code at this point, just a sense of what goes where, but to show my current mental model, here is a rough, untested sketch of the embedding route as I picture it.
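(Sketch only: I'm assuming LangChain-style OpenAIEmbeddings with a local FAISS index as a stand-in for Pinecone, and a "dictionary.csv" file with hypothetical column names word and definition. Exact module paths differ between LangChain versions.)

```python
# Rough sketch of the "index the dictionary" step (untested).
# "dictionary.csv" and its column names are assumptions about my own file;
# LangChain module paths vary by version.
import pandas as pd
from langchain.embeddings.openai import OpenAIEmbeddings
from langchain.vectorstores import FAISS  # local stand-in for Pinecone

# Load the two-column spreadsheet of fantasy words and English definitions.
df = pd.read_csv("dictionary.csv")

# Turn each row into a small text "document" to be embedded.
texts = [f"{row.word}: {row.definition}" for row in df.itertuples()]

# Embed every entry and store the vectors in a searchable index on disk.
vectorstore = FAISS.from_texts(texts, OpenAIEmbeddings())
vectorstore.save_local("fantasy_dictionary_index")
```

Is an indexing step like that what people mean by "adding knowledge," or is that the wrong layer entirely?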

This "langchain retrieval augmentation" notebook says:

Large Language Models (LLMs) have a data freshness problem. The most powerful LLMs in the world, like GPT-4, have no idea about recent world events.

The world of LLMs is frozen in time. Their world exists as a static snapshot of the world as it was within their training data.

A solution to this problem is retrieval augmentation. The idea behind this is that we retrieve relevant information from an external knowledge base and give that information to our LLM. In this notebook we will learn how to do that.

That sort of helps, but I would appreciate a slightly richer picture of what actually needs to happen. Based on that notebook, the query-time step I'm picturing is roughly the sketch below; please correct me if I have the flow wrong.
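(Again, an untested sketch with the same version caveats as above; "zanthir" is a made-up placeholder for one of my fantasy words.)

```python
# Rough sketch of the query-time retrieval-augmentation step (untested).
from langchain.chat_models import ChatOpenAI
from langchain.chains import RetrievalQA
from langchain.embeddings.openai import OpenAIEmbeddings
from langchain.vectorstores import FAISS

# Reload the index of embedded dictionary entries built earlier.
vectorstore = FAISS.load_local("fantasy_dictionary_index", OpenAIEmbeddings())

# At question time: embed the question, pull the k most similar dictionary
# entries, and paste them into the prompt the LLM actually sees.
qa = RetrievalQA.from_chain_type(
    llm=ChatOpenAI(model_name="gpt-3.5-turbo"),
    retriever=vectorstore.as_retriever(search_kwargs={"k": 4}),
)

print(qa.run("What does 'zanthir' mean?"))
```

If that is roughly the right picture, then the vector store is the "knowledge plugin" and the LLM itself stays unchanged, which is the part I'd like confirmed.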

Lance
Comment (Aug 11 '23): https://blog.ml6.eu/leveraging-llms-on-your-domain-specific-knowledge-base-4441c8837b47
