
I was recently playing around with llama_index and Llama Hub and found it very easy to use on my own data set. While this is nice, I cannot expose my organization's sensitive data to external services (such as OpenAI).

I read there's a new local model called GPT4All, but I wasn't able to find out how (if at all) to introduce new data into its model.

Can someone please recommend a suitable alternative that can run fully locally?

Ben
  • introduce new data for domain fine-tuning, instruction fine-tuning or inference? – Franck Dernoncourt Apr 07 '23 at 18:24
  • Not sure what the difference is, so I'll put it in my own words: "I'd like to use my own dataset with the capabilities of ChatGPT" — e.g. creating a Slack bot that will be "trained" on the data discussed in my organization's Slack channels. – Ben Apr 07 '23 at 18:27

1 Answer


See the gpt4all issue #374, "How to train the model with my own files". One comment there says:

My understanding is that embeddings and retraining (fine-tuning) are different. If you just want extra info, you can embed, if you want new knowledge or style, you probably need to fine-tune.

which cites: "GPT4All Story, Fine-Tuning, Model Bias, Centralization Risks, AGI (Andriy Mulyar)", § "Fine Tuning Options" (@35:12).
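To make the embed-vs-fine-tune distinction concrete, here is a minimal, fully local sketch of the embedding (retrieval) approach: embed your documents, embed the query, and return the closest match. The bag-of-words "embedding" below is a toy stand-in for a real local embedding model, and all names (`embed`, `cosine`, `retrieve`) and the sample documents are illustrative, not part of any library.

```python
# Toy local retrieval: no external APIs, pure standard library.
import math
from collections import Counter

def embed(text):
    """Toy embedding: a bag-of-words term-frequency vector.
    A real setup would use a local embedding model instead."""
    return Counter(text.lower().split())

def cosine(a, b):
    """Cosine similarity between two sparse term-frequency vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def retrieve(query, docs, k=1):
    """Return the k documents most similar to the query."""
    q = embed(query)
    ranked = sorted(docs, key=lambda d: cosine(q, embed(d)), reverse=True)
    return ranked[:k]

docs = [
    "The deployment pipeline runs nightly at 2am.",
    "Vacation requests go through the HR portal.",
    "The staging database password is rotated weekly.",
]
print(retrieve("when does the deployment pipeline run?", docs))
```

The retrieved documents would then be passed to the local language model as context; "new knowledge or style", per the quoted comment, would instead require fine-tuning the model's weights.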

Geremia