Questions tagged [labelling]

26 questions
5
votes
5 answers

Is using GPT-4 to label data advisable?

I have a lot of text data that needs to be labeled (e.g. for sentiment analysis). Given the high accuracy of GPT-4, could I use it to label the data, or would that introduce bias or other issues?
2
votes
1 answer

Solutions for Labelling Training Data for Binary Classification Problems

I have a huge dataset on which I am trying to use an 80-20 holdout split to train and test my model. However, the dataset I have been given has 6m rows. The objective is to train+test+validate the model before using live data traffic…
2
votes
1 answer

Online Audio annotation tools

I need to find a decent online annotation tool to transcribe audio. There are some requirements for a potential tool: I should be able to deliver audio files to a few labelers. I should be able to track which files went to which labeler. It should…
Aidos
  • 123
  • 3
2
votes
1 answer

Python package for machine-learning aided data labelling

In a lot of cases unlabelled data needs to be transformed into labelled data. The best solution is to use (multiple) human classifiers. However, going through all the data by hand (e.g. in text mining or image processing) is often a daunting task. Is there…
Pieter
  • 961
  • 6
  • 19
2
votes
1 answer

Too much data to label

I'm working on a data science project to flag bots on Instagram. I collected a lot of data (80k+ users) and now I have to label them as bot/legit users. I already flagged 20k users with different techniques but now I feel like I'm gonna have to flag…
Marc
  • 222
  • 1
  • 7
2
votes
0 answers

Labelling a dataset for sentiment analysis, which model is the best?

I want to do some sentiment analysis on a large text dataset I scraped. From what I've learned so far, I know that I need to either manually label each text (positive, negative, neutral) or use a pre-trained model like BERT or TextBlob. I…
Dan K
  • 21
  • 1
1
vote
0 answers

Best practices for image annotation for object detection when objects overlap

If I have the following example: How should I annotate the bottom image? I can think of these scenarios: Create a large box that captures class B and a second box that captures entirely class A. This will lead to overlapping…
1
vote
2 answers

How do I label images faster

I have around 1600 images extracted from videos shot at night. I am labeling each image and trying to be as accurate as I can in assigning bounding boxes. I am labeling vehicles, traffic lights, and traffic signs. This is very time-consuming, I am…
Vendetta
  • 121
  • 2
1
vote
1 answer

Label A records B times or label A*B records

This question concerns pre-training data sourcing. Suppose you have a human workforce of B individuals and a potentially unlimited source of data. The task is labeling images with classes. These classes are somewhat subjective (emotions). This…
1
vote
1 answer

Labelling large amounts of audio data in automatic or semi-automatic way

I am working on a project where I have to label audio datasets containing thousands of clips, each one second long. I have to label each clip as idle, event happening, or noise. I used some tools like Audacity and Label Studio, I…
1
vote
1 answer

Sub labelling of an object

First timer in image processing, pardon my cluelessness. Is there a concept of sub-labeling in object identification? I want to label a person and sub-label the "eye" of a person and train a model to detect if the person's eye is open or closed. i.e…
Jean
  • 111
  • 2
1
vote
0 answers

How to label legit users when developing a bot-flagging classification model?

I’m working on a project where I try to tell bots apart from legit users on social media. The data I collected is not labeled but I have labeled about 17% of it (22k users) through different techniques. Finding bots was easy as they all have similarities…
Marc
  • 222
  • 1
  • 7
1
vote
1 answer

CRFSuite/Wapiti: How to create intermediary data for running a training?

After having asked for and been suggested two pieces of software last week (for training a model to categorize chunks of a string) I'm now struggling to make use of either one of them. It seems that in machine learning (or at least, with CRF?), you…
Sixtyfive
  • 125
  • 5
1
vote
1 answer

Software/Library Suggestion: Is there a usable open-source sequence tagger around?

(Not sure if this is the right community for the question - please do downvote if stats. or whatever else is more appropriate...) I'm looking for a suggestion for either a command-line tool or library (preferably Python or Ruby, but at this point,…
Sixtyfive
  • 125
  • 5
0
votes
1 answer

How should I construct a binary classifier for a small set of positive data and millions of unlabeled data points?

Does anyone have suggestions for a specific algorithm or implementation for labeled data of only one class and unlabeled data that can be from either class? And I'm unsure what proportion of Class A to B exists within the unlabeled data…
Deli
  • 1