Questions tagged [llm]
17 questions
2
votes
1 answer
Implementation of spBLEU
I was looking for a way to explore evaluation metrics for language translation models and I came across spBLEU. I can’t find any implementations/examples that would help me start. Does anyone have a lead on what I can pursue?
Thanks in advance!
Prithvi
- 23
- 2
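For the spBLEU question above, a minimal sketch using sacrebleu, assuming a recent version that ships the FLORES-200 SentencePiece tokenizer (the tokenizer name is the part to double-check against your installed version):

```python
# Sketch: spBLEU is BLEU computed on SentencePiece-tokenized text.
# sacrebleu exposes this via its "flores200" (or older "spm") tokenizer.
from sacrebleu.metrics import BLEU

hypotheses = ["Das Haus ist klein."]
references = [["Das Haus ist winzig."]]  # one reference stream for the corpus

spbleu = BLEU(tokenize="flores200")  # use "spm" on older sacrebleu versions
score = spbleu.corpus_score(hypotheses, references)
print(score)  # BLEU score, n-gram precisions, brevity penalty
```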
1
vote
1 answer
How to measure accuracy of GPT model
I am working on a model that builds questions automatically from some text.
My model will analyse a provided article and ask the authors questions that can help them improve their articles.
How can we measure the accuracy of these ML-generated questions?
There is…
asmgx
- 539
- 2
- 17
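For the question-quality evaluation above, one common baseline (when reference questions exist) is an n-gram overlap metric such as ROUGE; a hedged sketch with the rouge-score package, where the generated and reference lists are purely illustrative:

```python
# Sketch: compare model-generated questions against human-written references
# with ROUGE-L; higher overlap suggests the generated question is closer.
from rouge_score import rouge_scorer

generated = ["What dataset was used in the experiments?"]
reference = ["Which dataset did the authors use for their experiments?"]

scorer = rouge_scorer.RougeScorer(["rougeL"], use_stemmer=True)
for gen, ref in zip(generated, reference):
    scores = scorer.score(ref, gen)
    print(scores["rougeL"].fmeasure)
```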
1
vote
1 answer
Does Negative Prompting Exist?
All the prompt engineering techniques I've seen seem to focus on telling the model what to do, e.g. Few-Shot Prompting.
Is there any value in giving the model examples of what not to do? Can you link me to any papers/techniques on the…
codeananda
- 268
- 3
- 10
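For the negative prompting question, a small illustrative sketch of the idea: mix counter-examples ("do not answer like this") with positive few-shot examples in the prompt. The wording and examples below are hypothetical, not taken from any particular paper:

```python
# Sketch: "negative prompting" by including bad examples alongside good ones.
positive_examples = [
    ("Summarise: The cat sat on the mat.", "A cat sat on a mat."),
]
negative_examples = [
    ("Summarise: The cat sat on the mat.",
     "The feline positioned itself atop the rectangular floor covering."),  # too verbose
]

prompt = "Write short, plain-language summaries.\n\nGood examples:\n"
for q, a in positive_examples:
    prompt += f"Input: {q}\nOutput: {a}\n"
prompt += "\nBad examples (do NOT answer like this):\n"
for q, a in negative_examples:
    prompt += f"Input: {q}\nOutput: {a}\n"
prompt += "\nInput: Summarise: The dog barked all night.\nOutput:"

print(prompt)  # send this to whichever chat/completion API you are using
```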
0
votes
0 answers
Fine-tuning LLM with limited documents and hierarchy
Hello LLM enthusiasts.
With respect to a neighbouring project, I am wondering if there are state-of-the-art approaches to fine-tune a model if:
the realm of documents is limited (still more than just a few),
these documents are regularly in a…
MaK
- 1
- 2
0
votes
1 answer
Training embeddings on own dataset
In my project, I follow the retrieval-augmented generation (RAG) approach.
I want to create embeddings for my own dataset and use them in combination with llama-2.
The dataset consists of German annual reports: 548 reports as PDF files with about 300 pages…
Christian01
- 101
- 1
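For the embedding question above, a minimal sketch using sentence-transformers with a multilingual model; the model name and chunking are assumptions, and the PDFs would first need to be extracted to text:

```python
# Sketch: embed German report chunks and retrieve the closest ones for a query.
from sentence_transformers import SentenceTransformer, util

# Assume the PDFs have already been converted to text and split into chunks.
chunks = [
    "Der Umsatz stieg im Geschäftsjahr um 12 Prozent.",
    "Die Forschungsausgaben blieben konstant.",
]

model = SentenceTransformer("paraphrase-multilingual-MiniLM-L12-v2")
chunk_embeddings = model.encode(chunks, convert_to_tensor=True)

query = "Wie hat sich der Umsatz entwickelt?"
query_embedding = model.encode(query, convert_to_tensor=True)

hits = util.semantic_search(query_embedding, chunk_embeddings, top_k=1)
print(chunks[hits[0][0]["corpus_id"]])  # feed retrieved chunks to llama-2 as context
```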
0
votes
0 answers
Finetune LLM model on tabular data
Is it possible, or even recommended, to fine-tune LLMs such as llama2 on tabular data?
I have a CSV with historical gold buy prices.
DAY,HOUR,OPEN,HIGH,LOW,CLOSE,VOLUME
2018.06.28,03:02,1.15603,1.15613,1.15602,1.15605,107
I'm hoping to be able to ask…
fpena06
- 101
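For the tabular fine-tuning question, the usual trick is to serialize each row into a text prompt/completion pair before fine-tuning; a small sketch of that serialization step, using the column names from the CSV header shown above (the file name is hypothetical):

```python
# Sketch: turn OHLCV rows into instruction-style text pairs for an LLM.
import csv
import json

rows = []
with open("gold_prices.csv", newline="") as f:  # hypothetical file name
    for row in csv.DictReader(f):
        prompt = (f"On {row['DAY']} at {row['HOUR']}, the open was {row['OPEN']}, "
                  f"high {row['HIGH']}, low {row['LOW']}, volume {row['VOLUME']}. "
                  f"What was the close?")
        rows.append({"prompt": prompt, "completion": row["CLOSE"]})

with open("train.jsonl", "w") as f:
    for r in rows:
        f.write(json.dumps(r) + "\n")
```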
0
votes
0 answers
Named Entity Recognition (NER) using LLMs like Flan-T5 or LLaMA 2
I am trying to do NER (named entity recognition) using large language models like Flan-T5 or LLaMA 2. So far, I have only found approaches based on prompt engineering, which means we need to specify what to find in the text.
Can we fine-tune and use these…
Sand T
- 11
- 3
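For the NER question, a prompt-engineering sketch with an instruction-tuned seq2seq model via transformers; the model name and prompt format are assumptions, and fine-tuning on labelled spans is the alternative the question asks about:

```python
# Sketch: zero-shot NER by asking an instruction-tuned model to list entities.
from transformers import pipeline

generator = pipeline("text2text-generation", model="google/flan-t5-base")

text = "Angela Merkel visited Paris in 2019 to meet Emmanuel Macron."
prompt = (
    "Extract all person and location entities from the text "
    "and return them as 'entity: type' lines.\n\n" + text
)
print(generator(prompt, max_new_tokens=64)[0]["generated_text"])
```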
0
votes
0 answers
Understanding alpha parameter tuning in the LoRA paper
I was reading the LoRA paper (https://arxiv.org/pdf/2106.09685.pdf), and a thing I don't understand is Section 4.1, where the updates are scaled by alpha/r and alpha is a constant in r. It is said that alpha is set to the first r tried. Then if I…
jpotwor
- 1
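For the LoRA alpha question, the relevant relation from Section 4.1 is that the low-rank update is scaled by alpha/r before being added to the frozen weight's output; a tiny numerical sketch, with arbitrary shapes and values:

```python
# Sketch: forward pass of a linear layer with a LoRA update scaled by alpha / r.
import torch

d, r, alpha = 16, 4, 4          # alpha is often set to the first r tried
W = torch.randn(d, d)            # frozen pretrained weight
A = torch.randn(r, d) * 0.01     # trainable low-rank factor
B = torch.zeros(d, r)            # B starts at zero, so the update starts at zero

x = torch.randn(d)
scaling = alpha / r              # the constant discussed in Section 4.1
y = W @ x + scaling * (B @ (A @ x))
print(y.shape)
```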
0
votes
0 answers
How to add knowledge to the LLM using LangChain (at a high level)?
At a super high level, I would like to create a fantasy language AI tutor. For this question, however, I would like to better understand how, generally speaking, you add your own custom data/knowledge to the LLM.
In my case, I have a spreadsheet of…
Lance
- 263
- 2
- 7
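For the LangChain question, at a high level the usual pattern is retrieval augmentation rather than changing the LLM itself: embed the spreadsheet rows, store them in a vector store, and retrieve them into the prompt. A hedged sketch against the classic langchain API; module paths move between versions, so treat the imports as approximate, and the file name is a placeholder:

```python
# Sketch: load custom data, embed it, and answer questions over it with RAG.
from langchain.document_loaders import CSVLoader
from langchain.text_splitter import RecursiveCharacterTextSplitter
from langchain.embeddings import OpenAIEmbeddings
from langchain.vectorstores import Chroma
from langchain.chains import RetrievalQA
from langchain.llms import OpenAI

docs = CSVLoader("fantasy_vocab.csv").load()  # hypothetical spreadsheet export
splitter = RecursiveCharacterTextSplitter(chunk_size=500, chunk_overlap=50)
chunks = splitter.split_documents(docs)

store = Chroma.from_documents(chunks, OpenAIEmbeddings())
qa = RetrievalQA.from_chain_type(llm=OpenAI(), retriever=store.as_retriever())

print(qa.run("How do you say 'river' in the fantasy language?"))
```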
0
votes
0 answers
Teach LLM to generate code using a specific library
After seeing good code examples generated by GitHub Copilot, I am curious to know whether I can create an agent that takes commands in plain English and generates code based on one particular framework.
Let's say I have a basic…
Lakshay Dulani
- 265
- 2
- 6
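For the code-generation question, the simplest starting point (before retrieval over the framework's docs or fine-tuning) is few-shot prompting: put a few canonical snippets of the target framework in the system prompt and ask for new code. A hedged sketch using the OpenAI chat API, where the model name and the 'myframework' library are placeholders:

```python
# Sketch: few-shot prompt an LLM to emit code in one specific framework.
from openai import OpenAI

client = OpenAI()  # expects OPENAI_API_KEY in the environment

system = (
    "You generate code using only the 'myframework' Python library.\n"
    "Example:\n"
    "Command: create a button labelled 'OK'\n"
    "Code: myframework.Button(label='OK')\n"
)

response = client.chat.completions.create(
    model="gpt-4o-mini",  # placeholder model name
    messages=[
        {"role": "system", "content": system},
        {"role": "user", "content": "Command: create a window with a text box"},
    ],
)
print(response.choices[0].message.content)
```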
0
votes
0 answers
Imputing Missing Text Categories in Python Using Word Embeddings/Machine Learning/LLM
I'm working on a dataset where each row represents an entity with several attributes. The dataset includes fields such as 'id', 'category_name', 'text_content', 'created_at', 'last_updated_at', 'title', 'num_highlights', 'url', and 'external_url'.…
nehiljain
- 101
- 1
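For the imputation question, one straightforward baseline is to embed 'text_content' and let a nearest-neighbour classifier trained on the rows that do have 'category_name' fill in the missing ones; a sketch with sentence-transformers and scikit-learn, where the CSV file name is hypothetical and the column names come from the question:

```python
# Sketch: predict missing 'category_name' values from text embeddings.
import pandas as pd
from sentence_transformers import SentenceTransformer
from sklearn.neighbors import KNeighborsClassifier

df = pd.read_csv("entities.csv")  # hypothetical export of the dataset
model = SentenceTransformer("all-MiniLM-L6-v2")

known = df[df["category_name"].notna()]
missing = df[df["category_name"].isna()]

clf = KNeighborsClassifier(n_neighbors=5)
clf.fit(model.encode(known["text_content"].tolist()), known["category_name"])

df.loc[missing.index, "category_name"] = clf.predict(
    model.encode(missing["text_content"].tolist())
)
```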
0
votes
0 answers
Are LLMs floating-point FP16?
I am curious to experiment with a project like the one below, where one can use an open-source LLM and then retune it with their own data; in this repo it's a PDF in a folder called SOURCE_DOCUMENTS. The question I have is that I don't have a GPU…
bbartling
- 383
- 1
- 6
- 19
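For the FP16 question, many open-source LLM checkpoints are published in FP16 or BF16, and the dtype is something you choose at load time; a sketch with transformers showing a CPU-friendly load (the model name is a small placeholder, and 8-/4-bit quantization would additionally need bitsandbytes and usually a GPU):

```python
# Sketch: inspect and control the floating-point precision a model is loaded in.
import torch
from transformers import AutoModelForCausalLM

name = "gpt2"  # placeholder; the same pattern applies to larger checkpoints

# On CPU, float32 is the safe default; float16 mainly pays off on a GPU.
model_fp32 = AutoModelForCausalLM.from_pretrained(name)
model_fp16 = AutoModelForCausalLM.from_pretrained(name, torch_dtype=torch.float16)

print(next(model_fp32.parameters()).dtype)  # torch.float32
print(next(model_fp16.parameters()).dtype)  # torch.float16
```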
0
votes
1 answer
Can I run falcon-7b on a free google colab?
I'm a beginner at ML and AI.
Background:
I wanted to try out falcon-7b; the example I'm trying is https://colab.research.google.com/drive/1BiQiw31DT7-cDp1-0ySXvvhzqomTdI-o?usp=sharing (Falcon-Guanaco example by the Hugging Face team). But I'm not…
hungryWolf
- 103
- 2
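For the falcon-7b question, the usual way to squeeze a 7B model onto a free Colab T4 is 4-bit quantization via bitsandbytes; a hedged sketch, noting that whether it actually fits depends on the runtime you are assigned:

```python
# Sketch: load falcon-7b in 4-bit so the weights fit in a few GB of GPU memory.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

quant_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_compute_dtype=torch.float16,
)

tokenizer = AutoTokenizer.from_pretrained("tiiuae/falcon-7b")
model = AutoModelForCausalLM.from_pretrained(
    "tiiuae/falcon-7b",
    quantization_config=quant_config,
    device_map="auto",
    trust_remote_code=True,
)

inputs = tokenizer("The capital of France is", return_tensors="pt").to(model.device)
print(tokenizer.decode(model.generate(**inputs, max_new_tokens=10)[0]))
```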
0
votes
0 answers
What is the input and output of GPT model for fine-tuning?
From my understanding, for the pretraining of a GPT model, we need to do a next-token prediction task.
In this case,
Input -> The GPT models are general-purpose language models that can perform ... (2048 tokens)
Output-> GPT models are general-purpose…
Kyuwan
- 1
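For the input/output question, with Hugging Face causal LMs the labels are simply the input token IDs; the library shifts them by one position internally, so the model learns to predict each next token. A small sketch:

```python
# Sketch: for causal-LM fine-tuning, labels == input_ids; the shift-by-one
# needed for next-token prediction happens inside the model's loss computation.
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")

batch = tokenizer("GPT models are general-purpose language models.", return_tensors="pt")
labels = batch["input_ids"].clone()

outputs = model(**batch, labels=labels)
print(outputs.loss)  # cross-entropy of predicting token t+1 from tokens <= t
```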
0
votes
0 answers
Is it a problem to store your vector database in memory?
I'm learning ChatGPT/LLM Development and am regularly coming across all different kinds of vector database implementations.
Some of them, e.g. Chroma, currently only support in-memory implementations for Python.
My initial reaction when I read that…
codeananda
- 268
- 3
- 10
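For the in-memory question, the practical mitigation is persistence: recent Chroma versions can write the index to disk so a crash or restart does not lose the embeddings, while queries still run from memory. A hedged sketch (the client class name differs across Chroma versions):

```python
# Sketch: a Chroma collection persisted to disk instead of living only in RAM.
import chromadb

client = chromadb.PersistentClient(path="./chroma_db")  # recent chromadb versions
collection = client.get_or_create_collection("docs")

collection.add(
    ids=["1"],
    documents=["Vector databases store embeddings for similarity search."],
)

print(collection.query(query_texts=["what do vector databases store?"], n_results=1))
```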