Questions tagged [semi-supervised-learning]

Making use of both unsupervised and supervised learning paradigms to train on a partially labeled dataset.

In a partly labeled dataset, training a model on the labeled observations alone can be suboptimal. The unlabeled part of the dataset may contain valuable information about the structure of the data that can be used to improve the model, especially when only a small proportion of the data is labeled.
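
As a rough illustration of how unlabeled points can carry structural information, here is a minimal label-propagation sketch in plain Python. Each unlabeled point repeatedly takes a similarity-weighted average of its neighbours' scores, while labeled points stay clamped. The toy 2-D points, the Gaussian similarity kernel, and the `sigma` and `n_iter` parameters are assumptions chosen for illustration. They are not part of the tag description.

```python
import math


def label_propagation(points, labels, sigma=1.0, n_iter=100):
    """Minimal label propagation over a fully connected, Gaussian-weighted graph.

    points: list of coordinate tuples (toy data, assumed for illustration)
    labels: list with 0, 1, or None (None marks an unlabeled point)
    Returns a hard 0/1 label for every point.
    """
    n = len(points)
    # Affinity matrix: close points get weights near 1, distant points near 0.
    weights = [
        [0.0 if i == j else math.exp(-math.dist(points[i], points[j]) ** 2 / (2 * sigma ** 2))
         for j in range(n)]
        for i in range(n)
    ]
    # Soft score = estimated probability of class 1; unlabeled points start at 0.5.
    scores = [0.5 if lab is None else float(lab) for lab in labels]
    for _ in range(n_iter):
        new_scores = []
        for i in range(n):
            total = sum(weights[i])
            if labels[i] is not None or total == 0.0:
                # Clamp labeled points (and leave isolated points unchanged).
                new_scores.append(scores[i])
            else:
                # An unlabeled point takes the weighted average of its neighbours' scores.
                new_scores.append(sum(weights[i][j] * scores[j] for j in range(n)) / total)
        scores = new_scores
    return [1 if s >= 0.5 else 0 for s in scores]


# Two well-separated toy clusters with a single labeled point in each.
points = [(0, 0), (0.5, 0), (0, 0.5), (0.5, 0.5), (5, 5), (5.5, 5), (5, 5.5), (5.5, 5.5)]
labels = [0, None, None, None, 1, None, None, None]
predicted = label_propagation(points, labels)
print(predicted)
```

With only one labeled point per cluster, every unlabeled point inherits the label of its own cluster. The unlabeled data supplies the cluster structure that a model trained on two points alone could not see.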

The semi-supervised learning approach combines unsupervised and supervised learning concepts to make the most of a dataset. This paradigm includes dedicated semi-supervised techniques as well as hybrid approaches that combine standard supervised and unsupervised methods.
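
One common way to build a semi-supervised learner from a standard supervised one is self-training. A model is fitted on the labeled data, its most confident predictions on unlabeled data are added as pseudo-labels, and the model is refitted until nothing confident remains. The sketch below is a minimal pure-Python illustration that rests on several assumptions. The base model is a toy nearest-centroid classifier. Confidence is measured as the distance gap between the two nearest centroids. The `min_margin` and `max_rounds` values are arbitrary.

```python
import math


def fit_centroids(xs, ys):
    """Nearest-centroid classifier (assumed base model): the mean point of each class."""
    sums, counts = {}, {}
    for (a, b), y in zip(xs, ys):
        sa, sb = sums.get(y, (0.0, 0.0))
        sums[y] = (sa + a, sb + b)
        counts[y] = counts.get(y, 0) + 1
    return {y: (sa / counts[y], sb / counts[y]) for y, (sa, sb) in sums.items()}


def predict_with_margin(centroids, x):
    """Return (label, margin): margin is the gap between the two closest centroid distances."""
    ranked = sorted((math.dist(x, c), y) for y, c in centroids.items())
    margin = ranked[1][0] - ranked[0][0] if len(ranked) > 1 else math.inf
    return ranked[0][1], margin


def self_train(labeled_x, labeled_y, unlabeled_x, min_margin=1.0, max_rounds=10):
    """Self-training: fit, pseudo-label confident unlabeled points, refit, repeat."""
    xs, ys = list(labeled_x), list(labeled_y)
    pool = list(unlabeled_x)
    for _ in range(max_rounds):
        centroids = fit_centroids(xs, ys)
        confident, remaining = [], []
        for x in pool:
            label, margin = predict_with_margin(centroids, x)
            if margin >= min_margin:
                confident.append((x, label))
            else:
                remaining.append(x)
        if not confident:
            break  # nothing left that is confident enough to pseudo-label
        for x, label in confident:
            xs.append(x)
            ys.append(label)
        pool = remaining
    return fit_centroids(xs, ys), pool


labeled_x = [(0, 0), (10, 10)]
labeled_y = [0, 1]
unlabeled_x = [(1, 1), (2, 2), (3, 3), (7, 7), (8, 8), (9, 9), (5, 5)]
centroids, leftover = self_train(labeled_x, labeled_y, unlabeled_x)
print(centroids, leftover)
```

In this toy run, the point (5, 5) is equidistant from both centroids in every round. It is therefore never pseudo-labeled and comes back in `leftover`. For real work, scikit-learn's `sklearn.semi_supervised` module provides `SelfTrainingClassifier`, which wraps any estimator that exposes `predict_proba`. The same module also provides the graph-based `LabelPropagation` and `LabelSpreading`.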

54 questions
11 votes, 3 answers

Build a binary classifier with only positive and unlabeled data

I have 2 datasets, one with positive instances of what I would like to detect, and one with unlabeled instances. What methods can I use? As an example, suppose we want to detect spam email based on a few structured email characteristics.…
nassimhddd
9 votes, 4 answers

Why positive-unlabeled learning?

Machine learning can be divided into several areas: supervised learning, unsupervised learning, semi-supervised learning, learning to rank, recommendation systems, etc. One such area is PU Learning, where only Positive and Unlabeled instances…
5 votes, 1 answer

Custom conditional loss function in Keras

I'm looking for a way to create a conditional loss function that looks like this: there is a vector of labels, say l (l has the same length as the input x), then for a given input (y_true, y_pred, l) the loss should be: def…
Tian
5 votes, 2 answers

General strategy for imbalanced, semi-supervised, sparse problem

I am looking for some general advice on where to start with this problem. There are 350 sparse (low positive integer) features. I have 2000 positives, 1000 negatives, and infinite unlabeled data, where the estimated true positive rate in the…
5 votes, 3 answers

Predictive clustering

I have a hypothesis, but I don't know if it's true. If a cluster is dense and we apply supervised learning to this data, the model trained on this cluster will perform better on new data falling into this cluster than on other data. Thus we have…
KyBe
3 votes, 1 answer

How to approach a semi-supervised binary classification problem with few labels, only from one class?

I am facing a binary classification problem where I do have a few labeled instances (so far this is "semi-supervised" learning, as far as I know), but only from the positive class. So I cannot use any negative examples as a basis for learning…
3 votes, 2 answers

Time series binary classification with labeling issues

My situation is quite complicated, so I will give a similar example from a simpler domain. Suppose we want to predict WHEN mobile game users will make a purchase if offered a sale. Almost every user is always instantaneously a non-purchaser…
3 votes, 1 answer

Probability for label correctness in semi-supervised learning

I am aware of the existence of semi-supervised learning approaches, such as the Ladder Network, where only a subset of the data is labeled. Are there any methods or papers which consider correctness probabilities for the labels of that training data…
2 votes, 1 answer

Solutions for Labelling Training Data for Binary Classification Problems

I have a huge dataset for which I am trying to use an 80-20 (Holdout method) approach to train and test my model. However, the dataset I have been given has 6m rows. The objective is to train+test+validate the model before using live data traffic…
2 votes, 1 answer

What is the difference between all the different types of learning within machine learning?

This is a question that is really hard to google, and the differences are confusing. Does anyone have good examples of the differences between them all? Supervised Learning, Semi-Supervised Learning, Distant Supervision, Active Learning, Lightly…
2 votes, 4 answers

Supervised clustering

I'm working on a clustering problem. I have a training set composed of sets of points where the clusters are known, and I want to find the correct clusters in a test dataset. It's a kind of supervised clustering. I looked for articles about…
2 votes, 0 answers

Inductive vs Transductive Learning

I am reading about Inductive and Transductive Learning. Some of the questions that come to mind are the following: What is the difference between these two? Which algorithms are usually employed for these methods? Why would someone choose the…
Outcast
2 votes, 1 answer

Generic strategy for object detection

I have a huge collection of objects from which only a tiny fraction are in a class of interest. The collection is initially unlabelled, but labels can be added using an expensive operation (for example, by human). Currently I use the simple generic…
2 votes, 1 answer

Accuracy after self-training didn't change

I used a Decision Tree Classifier, which I trained with 50,000 samples. I also have a set of unlabeled samples, so I decided to use a self-training algorithm. The unlabeled set has 10,000 samples. I would like to ask if it is normal that after retraining…
SMI9
2 votes, 0 answers

Neural Network for detecting/checking for requirements in diagrams

My question is more about which approach is good (or the best) for my problem: THE PROBLEM - I'm a (mechanical/software) engineer, and we spend an extensive amount of time reviewing technical drawings before they are complete/ready/meeting…