Use for questions about the ground-truth labels of a dataset. These labels have typically been assigned by a domain expert and can therefore be assumed correct, so they serve as the reference against which the predictions of our algorithms are compared.
Questions tagged [labels]
104 questions
18 votes, 7 answers
Interactive labeling/annotating of time series data
I have a data set of time series data. I'm looking for an annotation (or labeling) tool to visualize it and to be able to interactively add labels on it, in order to get annotated data that I can use for supervised ML.
E.g. the input data is a…
mibrl12
12 votes, 3 answers
Mass convert categorical columns in Pandas (not one-hot encoding)
I have a pandas DataFrame with tons of categorical columns, which I am planning to use in a decision tree with scikit-learn. I need to convert them to numerical values (not one-hot vectors). I can do it with LabelEncoder from scikit-learn. The problem…
user1700890
9 votes, 3 answers
Is (nearly) all data separable?
Suppose I have some data set with two classes. I could draw a decision boundary around each data point belonging to one of these classes, and hence, separate the data, like so:
Where the red lines are the decision boundaries around the data points…
Data
9 votes, 1 answer
Is it valuable to normalize/rescale labels in neural network regression?
Have there been any papers, or does anyone have specific experience, on whether normalizing labels in a regression problem is likely to improve the performance of a neural network? I have labels that are in the range (0, 1000); applying square…
davidparks21
8 votes, 1 answer
How to handle preprocessing (StandardScaler, LabelEncoder) when using data generator to train?
So, I have a dataset that is too big to load into memory all at once. Therefore I want to use a generator to load batches of data to train on.
In this scenario, how do I go about performing scaling of the features using LabelEncoder +…
Jim
6 votes, 2 answers
Can I use confident predictions to correct incorrect labels?
From visual inspection of a sub-portion of my data, I estimate that around 5-6% of the labels are incorrect.
My classifier still performs well, and when I take predictions with probabilities above .95 for a given class that are in contrast to the…
Danyal Andriano
6 votes, 3 answers
Tool for labeling audio
I have a few thousand audio signals to label into 2 different classes and save to a NumPy array for further training of models. MATLAB recently released Signal Labeler for their Signal Analyzer, which could help to label time series, but for certain…
Alexey Abramov
6 votes, 4 answers
Tool to Label Images for Supervised Classification
I have a couple thousand photos of whales taken from drones and I'm planning to build a simple binary classifier to run on these and future images to see if they contain a whale. I'd like to label specific pixels within the image as whale (1) or not…
clifgray
5 votes, 1 answer
How can a neural network classifier be modified to deal with sample points from outside the label set?
I am solving an image classification problem. However, some photos may not belong to any category, and I'd rather capture this situation than give false information. What are the ways to do this? One idea I have in mind is to give an…
kmichael08
4 votes, 1 answer
Given a t-SNE plot, how can I infer the "most correct" labels? How does one understand its structure?
Let's say I begin with an exceptionally large dataframe (e.g. imported/munged from tsv files). Several of these columns are categorical labels.
(As a more concrete example, let's imagine a group of students in a school district, pre-school to…
ShanZhengYang
3 votes, 2 answers
Is there a clustering algorithm which accepts some clusters as input and outputs some more clusters?
Here's the task: I have data I don't know much about. The final task is to build a classifier to classify the samples into a few categories. Some of the categories are pretty clear, and we can easily use these as labels for a classifier. But I guess…
chefhose
3 votes, 3 answers
Should you turn off label smoothing when validating?
As the subject says. On one hand, the answer should be yes because label smoothing is a regularization feature and how can you know if it improves performance without turning it off? On the other hand, I haven't seen any authoritative source…
Björn Lindqvist
3 votes, 1 answer
When is unsupervised learning more beneficial than supervised learning, even when labels exist?
When is unsupervised learning more beneficial than supervised learning, even when labels exist?
If there are no labels, unsupervised learning is better than supervised learning, but in some cases even when the labeling targets are…
user10296606
3 votes, 1 answer
Discriminator of a Conditional GAN with continuous labels
OK, let's say we have well-labeled images with non-discrete labels such as brightness or size, and we want to generate images based on them. If it were done with a discrete label, it could be done like:
def forward(self, inputs, label):
…
user3023715
3 votes, 2 answers
How to correct mislabeled data in dataset?
I have a dataset of about 300k records. The classes are highly imbalanced (one may have 30k records, while another may have only 100). Unfortunately, about 5% of the records are incorrectly labeled.
Is there any way of finding out which…
severin