
I was recently playing around with llama_index and Llama Hub and found it very easy to use on my own data set. While this is nice, I cannot expose my organization's sensitive data to external services (such as OpenAI).

I read there's a new local model called GPT4All, but I wasn't able to find out how (if at all) to introduce new data into its model.

Can someone please recommend a suitable alternative that can run fully locally?

Ben
  • introduce new data for domain fine-tuning, instruction fine-tuning or inference? – Franck Dernoncourt Apr 07 '23 at 18:24
  • Not sure what the difference is, so I'll put it in my own words: "I'd like to use my own dataset with the capabilities of ChatGPT" — e.g. creating a Slack bot that will be "trained" on the data discussed in my organization's Slack channels. – Ben Apr 07 '23 at 18:27

1 Answer


See the gpt4all issue #374, "How to train the model with my own files". One comment there says:

My understanding is that embeddings and retraining (fine-tuning) are different. If you just want extra info, you can embed, if you want new knowledge or style, you probably need to fine-tune.

which cites: "GPT4All Story, Fine-Tuning, Model Bias, Centralization Risks, AGI (Andriy Mulyar)", § "Fine Tuning Options" (@35:12).
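To make the embed-vs-fine-tune distinction concrete, here is a minimal, fully local sketch of the embedding (retrieval) approach: embed your documents, embed the query, and return the closest match. The bag-of-words "embedding" below is a toy stand-in for a real local embedding model, and all names (`embed`, `cosine`, `retrieve`) and the sample documents are illustrative, not part of any library.

```python
# Toy local retrieval: no external APIs, pure standard library.
import math
from collections import Counter

def embed(text):
    """Toy embedding: a bag-of-words term-frequency vector.
    A real setup would use a local embedding model instead."""
    return Counter(text.lower().split())

def cosine(a, b):
    """Cosine similarity between two sparse term-frequency vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def retrieve(query, docs, k=1):
    """Return the k documents most similar to the query."""
    q = embed(query)
    ranked = sorted(docs, key=lambda d: cosine(q, embed(d)), reverse=True)
    return ranked[:k]

docs = [
    "The deployment pipeline runs nightly at 2am.",
    "Vacation requests go through the HR portal.",
    "The staging database password is rotated weekly.",
]
print(retrieve("when does the deployment pipeline run?", docs))
```

The retrieved documents would then be passed to the local language model as context; "new knowledge or style", per the quoted comment, would instead require fine-tuning the model's weights.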

Geremia