Questions tagged [labels]

Use for questions about the ground-truth labels of a dataset. Typically these labels have been assigned by a domain expert and can therefore be assumed to be correct, which makes them the reference against which the predictions of our algorithms are compared.

104 questions
18 votes, 7 answers

Interactive labeling/annotating of time series data

I have a time series dataset. I'm looking for an annotation (or labeling) tool to visualize it and interactively add labels to it, in order to get annotated data that I can use for supervised ML. E.g. the input data is a…
mibrl12 (reputation 283)
12 votes, 3 answers

Mass convert categorical columns in Pandas (not one-hot encoding)

I have a pandas DataFrame with tons of categorical columns, which I am planning to use in a decision tree with scikit-learn. I need to convert them to numerical values (not one-hot vectors). I can do it with LabelEncoder from scikit-learn. The problem…
user1700890 (reputation 335)
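Instead of fitting one LabelEncoder per column, pandas can assign integer codes to every non-numeric column in a single pass. A minimal sketch, assuming every non-numeric column should be encoded; the helper name `encode_categoricals` and the toy DataFrame are hypothetical.

```python
import pandas as pd


def encode_categoricals(df: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical helper: replace each non-numeric column with integer codes."""
    out = df.copy()
    cat_cols = [c for c in out.columns if not pd.api.types.is_numeric_dtype(out[c])]
    for col in cat_cols:
        # Categories are sorted, so the mapping is deterministic; NaN becomes -1.
        out[col] = out[col].astype("category").cat.codes
    return out


df = pd.DataFrame({
    "color": ["red", "green", "red", "blue"],
    "size": ["S", "M", "L", "M"],
    "price": [1.0, 2.5, 3.0, 4.2],
})
encoded = encode_categoricals(df)
print(encoded)
```

The codes depend on the categories seen, so the same mapping has to be reused at prediction time (for example by storing each column's `cat.categories`). scikit-learn's `OrdinalEncoder` does the same job for a 2-D feature matrix and keeps the fitted mapping for you; `LabelEncoder` is intended for target labels.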
9 votes, 3 answers

Is (nearly) all data separable?

Suppose I have some dataset with two classes. I could draw a decision boundary around each data point belonging to one of these classes and hence separate the data, like so (figure not reproduced), where the red lines are the decision boundaries around the data points…
Data (reputation 467)
9 votes, 1 answer

Is it valuable to normalize/rescale labels in neural network regression?

Are there any papers, or does anyone have specific experience, indicating whether normalizing the labels in a regression problem is likely to improve the performance of a neural network? I have labels that are in the range (0, 1000) applying square…
davidparks21 (reputation 413)
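Mechanically, normalizing labels means fitting a transform on the training targets, training the network on the transformed targets, and inverting its outputs back to the original units. A minimal sketch, assuming plain min-max scaling to [0, 1] (an illustrative choice; the transform the asker mentions is cut off in the excerpt); `MinMaxTargetScaler` is a hypothetical helper.

```python
import numpy as np


class MinMaxTargetScaler:
    """Hypothetical helper: rescale regression targets to [0, 1] and back."""

    def fit(self, y):
        y = np.asarray(y, dtype=float)
        self.lo_ = y.min()
        # Fall back to 1.0 for constant targets to avoid dividing by zero.
        self.span_ = (y.max() - self.lo_) or 1.0
        return self

    def transform(self, y):
        return (np.asarray(y, dtype=float) - self.lo_) / self.span_

    def inverse_transform(self, y_scaled):
        return np.asarray(y_scaled, dtype=float) * self.span_ + self.lo_


y_train = np.array([0.0, 250.0, 500.0, 1000.0])
scaler = MinMaxTargetScaler().fit(y_train)
y_scaled = scaler.transform(y_train)         # train the network on these
# ... the network's predictions come out in the scaled space ...
y_pred = scaler.inverse_transform(y_scaled)  # map back to the original units
```

scikit-learn's `TransformedTargetRegressor` (in `sklearn.compose`) wraps the same fit/transform/inverse pattern around a regressor, which keeps the inversion from being forgotten at prediction time.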
8 votes, 1 answer

How to handle preprocessing (StandardScaler, LabelEncoder) when using data generator to train?

So, I have a dataset that is too big to load into memory all at once, and therefore I want to use a generator to load batches of data to train on. In this scenario, how do I go about scaling the features using LabelEncoder +…
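One common approach is two passes over the generator: the first accumulates per-feature statistics, the second yields scaled batches. A minimal NumPy sketch, assuming numeric features; `batch_generator` is a hypothetical stand-in for a disk-backed generator, and batch statistics are merged with the Chan et al. parallel variance update. scikit-learn's `StandardScaler` offers `partial_fit` for the same purpose.

```python
import numpy as np


def batch_generator(X, batch_size):
    """Hypothetical stand-in for a generator that reads batches from disk."""
    for start in range(0, len(X), batch_size):
        yield X[start:start + batch_size]


class StreamingStandardizer:
    """Per-feature mean and (population) std, accumulated one batch at a time."""

    def __init__(self):
        self.n = 0
        self.mean = None
        self.m2 = None  # running sum of squared deviations from the mean

    def partial_fit(self, batch):
        batch = np.asarray(batch, dtype=float)
        nb = batch.shape[0]
        if nb == 0:
            return self
        mb = batch.mean(axis=0)
        m2b = ((batch - mb) ** 2).sum(axis=0)
        if self.n == 0:
            self.n, self.mean, self.m2 = nb, mb, m2b
        else:
            n = self.n + nb
            delta = mb - self.mean
            # Chan et al. update: merge this batch's statistics into the running ones.
            self.mean = self.mean + delta * nb / n
            self.m2 = self.m2 + m2b + delta ** 2 * self.n * nb / n
            self.n = n
        return self

    def transform(self, batch):
        std = np.sqrt(self.m2 / self.n)
        std[std == 0] = 1.0  # leave constant features unscaled
        return (np.asarray(batch, dtype=float) - self.mean) / std


rng = np.random.default_rng(0)
X = rng.normal(loc=5.0, scale=3.0, size=(1000, 4))  # toy stand-in for on-disk data

scaler = StreamingStandardizer()
for batch in batch_generator(X, 128):  # pass 1: collect statistics
    scaler.partial_fit(batch)


def scaled_batches(X, batch_size):
    for batch in batch_generator(X, batch_size):  # pass 2: feed the model
        yield scaler.transform(batch)
```

Label-style encoding of categorical columns works the same way: the first pass would collect each column's set of unique values before any batch is encoded.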
6 votes, 2 answers

Can I use confident predictions to correct incorrect labels?

From visual inspection of a subset of my data, I estimate that around 5-6% of the labels are incorrect. My classifier still performs well, and when I take prediction probabilities above 0.95 for a given class that are in contrast to the…
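The flagging step described in the question can be sketched in a few lines. This assumes `probs` holds held-out (e.g. out-of-fold) predicted class probabilities, since a model scored on its own training samples tends to reproduce the labels it memorized; the 0.95 threshold comes from the question, and the helper name and toy arrays are hypothetical.

```python
import numpy as np


def flag_confident_disagreements(probs, labels, threshold=0.95):
    """Hypothetical helper: indices where the top predicted class differs from
    the given label with probability above `threshold`, plus that class."""
    probs = np.asarray(probs)
    labels = np.asarray(labels)
    pred = probs.argmax(axis=1)
    conf = probs.max(axis=1)
    idx = np.flatnonzero((pred != labels) & (conf > threshold))
    return idx, pred[idx]


probs = np.array([
    [0.98, 0.02],  # confident class 0, labeled 1 -> flagged
    [0.60, 0.40],  # disagrees, but not confident -> kept
    [0.03, 0.97],  # confident class 1, labeled 1 -> agrees
    [0.01, 0.99],  # confident class 1, labeled 0 -> flagged
])
labels = np.array([1, 1, 1, 0])
idx, suggested = flag_confident_disagreements(probs, labels)
print(idx, suggested)  # [0 3] [0 1]
```

Treating the flagged samples as candidates for review, rather than relabeling them automatically, avoids reinforcing the classifier's own systematic errors.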
6 votes, 3 answers

Tool for labeling audio

I have a few thousand audio signals to label into two classes and save to a NumPy array for further model training. MATLAB recently released Signal Labeler for their Signal Analyzer, which could help to label time series, but for certain…
6 votes, 4 answers

Tool to Label Images for Supervised Classification

I have a couple thousand photos of whales taken from drones and I'm planning to build a simple binary classifier to run on these and future images to see if they contain a whale. I'd like to label specific pixels within the image as whale (1) or not…
clifgray (reputation 179)
5 votes, 1 answer

What is the way to modify a neural network classifier to deal with sample points from outside of the label set?

I am solving an image classification problem. However, some photos may not belong to any category, and rather than give false information, I'd like to capture this situation. What are the ways to do it? One idea I have in mind is to give an…
kmichael08 (reputation 63)
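One simple technique (not necessarily the one the asker had in mind) is to abstain when the classifier's maximum softmax probability falls below a threshold. A minimal sketch, assuming raw logits from the network; the threshold of 0.7 and the `REJECT` marker of -1 are hypothetical choices that would need tuning on validation data.

```python
import numpy as np

REJECT = -1  # hypothetical marker for "none of the known classes"


def softmax(logits):
    z = logits - logits.max(axis=1, keepdims=True)  # subtract the max for stability
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)


def predict_with_reject(logits, threshold=0.7):
    probs = softmax(np.asarray(logits, dtype=float))
    pred = probs.argmax(axis=1)
    pred[probs.max(axis=1) < threshold] = REJECT  # abstain on low-confidence inputs
    return pred


logits = np.array([
    [4.0, 0.0, 0.0],  # clearly class 0
    [1.0, 1.0, 1.0],  # uniform scores, max probability 1/3 -> rejected
    [0.0, 0.0, 5.0],  # clearly class 2
])
print(predict_with_reject(logits).tolist())  # [0, -1, 2]
```

Softmax scores can still be high on unfamiliar inputs, so this is best treated as a baseline; an explicit extra class trained on "other" examples is a common alternative when such examples are available.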
4 votes, 1 answer

Given a t-SNE plot, how can I infer the "most correct" labels? How does one understand its structure?

Let's say I begin with an exceptionally large dataframe (e.g. imported/munged from TSV files). Several of these columns are categorical labels. (As a more concrete example, let's imagine a group of students in a school district, pre-school to…
3 votes, 2 answers

Is there a clustering algorithm which accepts some clusters as input and outputs some more clusters?

Here's the task: I have data I don't know much about. The final task is to build a classifier to classify the samples into a few categories. Some of the categories are pretty clear; we can easily use these as labels for a classifier. But I guess…
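One technique that takes known groups as input and can output additional clusters is seeded k-means: centroids for the known categories start at the means of their labeled samples, extra centroids are added for the unknown groups, and ordinary k-means iterations refine all of them. A minimal NumPy sketch, assuming every known category has at least one labeled sample; starting each extra centroid at the unlabeled point farthest from the existing ones is a deterministic choice made here for illustration (it is sensitive to outliers), and the function name and toy data are hypothetical.

```python
import numpy as np


def seeded_kmeans(X, seed_labels, n_extra, n_iter=100):
    """Hypothetical seeded k-means sketch.

    seed_labels[i] is a known category 0..k-1, or -1 for an unlabeled sample.
    Centroids 0..k-1 start at the known category means; each of the `n_extra`
    new centroids starts at the unlabeled point farthest from all centroids
    chosen so far. Lloyd iterations then refine every centroid.
    """
    X = np.asarray(X, dtype=float)
    seed_labels = np.asarray(seed_labels)
    k_known = int(seed_labels.max()) + 1
    centroids = [X[seed_labels == k].mean(axis=0) for k in range(k_known)]
    unlabeled = X[seed_labels < 0]
    for _ in range(n_extra):
        C = np.vstack(centroids)
        dist = ((unlabeled[:, None, :] - C[None, :, :]) ** 2).sum(axis=2).min(axis=1)
        centroids.append(unlabeled[dist.argmax()])
    centroids = np.vstack(centroids)

    assign = None
    for _ in range(n_iter):
        dist = ((X[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
        new_assign = dist.argmin(axis=1)
        if assign is not None and np.array_equal(new_assign, assign):
            break  # assignments stopped changing
        assign = new_assign
        for k in range(len(centroids)):
            members = X[assign == k]
            if len(members):  # an empty cluster keeps its previous centroid
                centroids[k] = members.mean(axis=0)
    return assign, centroids


# Toy data: three blobs, but only two of them have (partial) labels.
rng = np.random.default_rng(0)
blob_a = rng.normal([0.0, 0.0], 0.5, size=(50, 2))
blob_b = rng.normal([10.0, 0.0], 0.5, size=(50, 2))
blob_c = rng.normal([0.0, 10.0], 0.5, size=(50, 2))
X = np.vstack([blob_a, blob_b, blob_c])
seed = np.full(len(X), -1)
seed[:10] = 0    # a few samples known to be category 0
seed[50:60] = 1  # a few samples known to be category 1
assign, centroids = seeded_kmeans(X, seed, n_extra=1)
```

Because the known categories keep indices 0..k-1, cluster k can still be read as "category k", while clusters k and above are candidate new categories to inspect.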
3 votes, 3 answers

Should you turn off label smoothing when validating?

As the title says. On one hand, the answer should be yes, because label smoothing is a regularization feature, and how can you know whether it improves performance without turning it off? On the other hand, I haven't seen any authoritative source…
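For reference, label smoothing with factor eps replaces each one-hot target y with (1 - eps) * y + eps / K for K classes. A minimal sketch that scores the same predictions against hard and smoothed targets, showing that the two choices give different validation losses while accuracy, which only uses the argmax, is unaffected; eps = 0.1 and the toy probabilities are arbitrary, and the helper names are hypothetical.

```python
import numpy as np


def smooth_labels(labels, num_classes, eps=0.1):
    """Hypothetical helper: one-hot targets mixed with a uniform distribution."""
    onehot = np.eye(num_classes)[labels]
    return (1.0 - eps) * onehot + eps / num_classes


def cross_entropy(probs, targets):
    return float(-(targets * np.log(probs)).sum(axis=1).mean())


labels = np.array([0, 2, 1])
probs = np.array([
    [0.90, 0.05, 0.05],
    [0.10, 0.10, 0.80],
    [0.20, 0.70, 0.10],
])
hard = smooth_labels(labels, 3, eps=0.0)  # eps = 0 gives plain one-hot targets
soft = smooth_labels(labels, 3, eps=0.1)
print(cross_entropy(probs, hard), cross_entropy(probs, soft))

# Accuracy compares argmax predictions with the original labels,
# so it is the same whether or not smoothing is applied.
accuracy = float((probs.argmax(axis=1) == labels).mean())
```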
3 votes, 1 answer

When is unsupervised learning more beneficial than supervised learning, even when labels exist?

When is unsupervised learning more beneficial than supervised learning, even when labels exist? If there are no labels, unsupervised learning is better than supervised learning, but in some cases even when the labeling targets are…
user10296606 (reputation 1,784)
3 votes, 1 answer

Discriminator of a Conditional GAN with continuous labels

OK, let's say we have well-labeled images with continuous (non-discrete) labels such as brightness or size, and we want to generate images conditioned on them. With a discrete label it could be done like: def forward(self, inputs, label): …
3 votes, 2 answers

How to correct mislabeled data in dataset?

I have a dataset of about 300k records. Classes are highly imbalanced (one class may have 30k records while another has only 100). Unfortunately, about 5% of the records are incorrectly labeled. Is there any way of finding out which…
severin (reputation 31)