Questions tagged [machine-learning]

Machine Learning is a subfield of computer science that draws on elements from algorithmic analysis, computational statistics, mathematics, optimization, etc. It is mainly concerned with the use of data to construct models that have high predictive/forecasting ability. Topics include model building, applications, theory, etc.

What is Machine Learning?

Modern applications of Machine Learning are wide-ranging and include Bioinformatics, Astronomy, Computational Physics, Economics, Natural Language Processing, Image Recognition/Object Detection, Robotics, Recommendation Systems, etc.

machine-learning Tag usage

When posting questions about Machine Learning, please make sure to take the following into consideration:

All questions should include enough detail and clarity for the problem at hand to be solved. This includes links to original data sources, code used for model construction, links to tutorials/other resources used, etc.
Questions should generally be more specific than "which model should I use" or "how can I achieve this" and should explain what has been attempted so far.
Unless directly related to the problem, questions about where to get data (sources, APIs, datasets, etc.) should not be posted on Stack Exchange Data Science, but rather on Open Data Stack Exchange.

Types

Please see below for a (non-exhaustive) list of the types of Machine Learning:

Supervised Learning (tag: supervised-learning)
Unsupervised Learning (tag: unsupervised-learning)
Semisupervised Learning (tag: semisupervised-learning)
Reinforcement Learning (tag: reinforcement-learning)
Deep Learning (tag: deep-learning)

External Resources

scikit-learn: Machine Learning in Python

Machine Learning Journals

11236 questions

193

votes

6 answers

How to draw Deep learning network architecture diagrams?

I have built my model. Now I want to draw the network architecture diagram for my research paper. An example is shown below:

machine-learning neural-network deep-learning svm software-recommendation

asked Nov 03 '16 at 03:10

Muhammad Ali

191

votes

16 answers

Train/Test/Validation Set Splitting in Sklearn

How could I randomly split a data matrix and the corresponding label vector into a X_train, X_test, X_val, y_train, y_test, y_val with scikit-learn? As far as I know, sklearn.model_selection.train_test_split is only capable of splitting into two not…
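
One way to approach the three-way split asked about here is to shuffle the row indices once and cut them into three parts. The sketch below uses only the Python standard library; the function name `train_val_test_split`, the 60/20/20 proportions, and the fixed seed are assumptions chosen for illustration, not part of scikit-learn. With scikit-learn itself, calling `train_test_split` twice in a row is a commonly used equivalent.

```python
import random


def train_val_test_split(X, y, val_frac=0.2, test_frac=0.2, seed=0):
    """Shuffle paired rows and labels, then cut them into three parts.

    Hypothetical helper for illustration; not a scikit-learn function.
    """
    if len(X) != len(y):
        raise ValueError("X and y must have the same length")

    # Shuffle indices rather than the data so rows and labels stay aligned.
    indices = list(range(len(X)))
    random.Random(seed).shuffle(indices)

    n_test = int(len(indices) * test_frac)
    n_val = int(len(indices) * val_frac)
    test_idx = indices[:n_test]
    val_idx = indices[n_test:n_test + n_val]
    train_idx = indices[n_test + n_val:]

    def pick(data, idx):
        return [data[i] for i in idx]

    return (pick(X, train_idx), pick(X, val_idx), pick(X, test_idx),
            pick(y, train_idx), pick(y, val_idx), pick(y, test_idx))


# Toy data: the label is the parity of the first feature.
X = [[i, i * 2] for i in range(10)]
y = [i % 2 for i in range(10)]
X_train, X_val, X_test, y_train, y_val, y_test = train_val_test_split(X, y)
print(len(X_train), len(X_val), len(X_test))
```

Shuffling indices instead of the two sequences separately is what keeps each row paired with its own label.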

machine-learning scikit-learn cross-validation

asked Nov 15 '16 at 14:55

Hendrik

188

votes

5 answers

What is the "dying ReLU" problem in neural networks?

Referring to the Stanford course notes on Convolutional Neural Networks for Visual Recognition, a paragraph says: "Unfortunately, ReLU units can be fragile during training and can "die". For example, a large gradient flowing through a ReLU…

machine-learning neural-network deep-learning

asked May 07 '15 at 04:11

tejaskhot

174

votes

20 answers

How do you visualize neural network architectures?

When writing a paper / making a presentation about neural networks, one usually visualizes the network's architecture. What are good / simple ways to visualize common architectures automatically?

machine-learning neural-network deep-learning visualization

asked Jul 18 '16 at 17:08

Martin Thoma

151

votes

6 answers

The cross-entropy error function in neural networks

In the MNIST For ML Beginners they define cross-entropy as $$H_{y'} (y) := - \sum_{i} y_{i}' \log (y_i)$$ $y_i$ is the predicted probability value for class $i$ and $y_i'$ is the true probability for that class. Question 1 Isn't it a problem that…
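
The cross-entropy definition quoted above can be evaluated directly. The following is a minimal sketch in plain Python; the function name `cross_entropy`, the example probability vectors, and the small clipping constant `eps` (used only to avoid taking the logarithm of zero) are assumptions made for illustration.

```python
import math


def cross_entropy(y_true, y_pred, eps=1e-12):
    """Compute H_{y'}(y) = -sum_i y'_i * log(y_i).

    y_true holds the true probabilities y'_i, y_pred the predicted y_i.
    Predictions are clipped to at least eps (an assumption of this sketch)
    so that log(0) is never evaluated.
    """
    return -sum(t * math.log(max(p, eps)) for t, p in zip(y_true, y_pred))


y_true = [0.0, 1.0, 0.0]   # one-hot target: the true class is index 1
good = [0.1, 0.8, 0.1]     # most probability mass on the true class
bad = [0.7, 0.2, 0.1]      # most probability mass on a wrong class

print(cross_entropy(y_true, good))
print(cross_entropy(y_true, bad))
```

With a one-hot target, only the predicted probability of the true class contributes to the sum, so the loss grows as that probability shrinks.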

machine-learning tensorflow

asked Dec 10 '15 at 06:22

Martin Thoma

150

votes

17 answers

Best python library for neural networks

I'm using Neural Networks to solve different Machine learning problems. I'm using Python and pybrain but this library is almost discontinued. Are there other good alternatives in Python?

machine-learning python neural-network

asked Jul 07 '14 at 19:17

marcodena

127

votes

14 answers

Python vs R for machine learning

I'm just starting to develop a machine learning application for academic purposes. I'm currently using R and training myself in it. However, in a lot of places, I have seen people using Python. What are people using in academia and industry, and…

machine-learning r python

asked Jun 12 '14 at 06:04

user721

114

votes

10 answers

Choosing a learning rate

I'm currently working on implementing Stochastic Gradient Descent, SGD, for neural nets using back-propagation, and while I understand its purpose I have some questions about how to choose values for the learning rate. Is the learning rate related…
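
As a rough illustration of why the learning rate matters, the sketch below runs plain (not stochastic) gradient descent on the one-dimensional function f(w) = w^2. This is a deliberate simplification of the neural-network setting described in the question; the function name `gradient_descent` and the two learning rates are assumptions chosen for the example.

```python
def gradient_descent(lr, steps=20, w0=1.0):
    """Minimize f(w) = w**2 with a fixed learning rate and return the final w."""
    w = w0
    for _ in range(steps):
        grad = 2.0 * w      # derivative of w**2
        w -= lr * grad      # standard gradient step
    return w


small_lr_result = gradient_descent(lr=0.1)   # each step scales w by 0.8
large_lr_result = gradient_descent(lr=1.1)   # each step scales w by -1.2

print(abs(small_lr_result))
print(abs(large_lr_result))
```

On this function each update multiplies w by (1 - 2 * lr), so any rate above 1.0 makes the iterate grow in magnitude instead of shrinking toward the minimum at 0.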

machine-learning neural-network deep-learning optimization hyperparameter

asked Jun 16 '14 at 18:08

ragingSloth

114

votes

5 answers

Why do cost functions use the square error?

I'm just getting started with some machine learning, and until now I have been dealing with linear regression over one variable. I have learnt that there is a hypothesis, which is: $h_\theta(x)=\theta_0+\theta_1x$ To find out good values for the…
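
For the one-variable hypothesis quoted above, a squared-error cost can be sketched as follows. The function names, the toy data, and the 1/(2m) scaling are assumptions made for illustration; the 1/2 factor is a common convention but the excerpt does not confirm which form the question uses.

```python
def hypothesis(theta0, theta1, x):
    """h_theta(x) = theta0 + theta1 * x."""
    return theta0 + theta1 * x


def squared_error_cost(theta0, theta1, xs, ys):
    """Sum of squared residuals divided by 2m (scaling assumed, see lead-in)."""
    m = len(xs)
    total = sum((hypothesis(theta0, theta1, x) - y) ** 2 for x, y in zip(xs, ys))
    return total / (2 * m)


# Toy data lying exactly on the line y = 2x.
xs = [1.0, 2.0, 3.0]
ys = [2.0, 4.0, 6.0]

perfect_fit_cost = squared_error_cost(0.0, 2.0, xs, ys)
worse_fit_cost = squared_error_cost(0.0, 1.0, xs, ys)
print(perfect_fit_cost, worse_fit_cost)
```

Squaring makes every residual count as a non-negative penalty and weights large errors more heavily than small ones.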

machine-learning linear-regression loss-function

asked Feb 10 '16 at 21:52

Golo Roden

110

votes

9 answers

When should I use Gini Impurity as opposed to Information Gain (Entropy)?

Can someone practically explain the rationale behind Gini impurity vs Information gain (based on Entropy)? Which metric is better to use in different scenarios while using decision trees?
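
To make the two criteria concrete, the sketch below computes both for a node's class-probability vector. The function names and the example distributions are assumptions for illustration; information gain would then be the parent's entropy minus the weighted entropy of the children, which is not shown here.

```python
import math


def gini_impurity(probs):
    """Gini impurity: 1 - sum_i p_i^2."""
    return 1.0 - sum(p * p for p in probs)


def entropy(probs):
    """Shannon entropy in bits: -sum_i p_i * log2(p_i), skipping p_i = 0."""
    return -sum(p * math.log2(p) for p in probs if p > 0)


# Compare the two measures on a balanced, a skewed, and a pure node.
for probs in ([0.5, 0.5], [0.9, 0.1], [1.0, 0.0]):
    print(probs, gini_impurity(probs), entropy(probs))
```

Both measures are zero for a pure node and largest for a balanced one; Gini avoids the logarithm, which makes it slightly cheaper to compute.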

machine-learning decision-trees information-theory gini-index entropy

asked Feb 12 '16 at 22:05

Krish Mahajan

votes

4 answers

Advantages of AUC vs standard accuracy

I was starting to look into area under the curve (AUC) and am a little confused about its usefulness. When first explained to me, AUC seemed to be a great measure of performance but in my research I've found that some claim its advantage is mostly…
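
A small example can show how the two metrics differ. The sketch below computes accuracy at a fixed threshold and AUC as the fraction of positive/negative pairs that are ranked correctly; the function names, the 0.5 threshold, and the toy scores are assumptions chosen for illustration.

```python
def accuracy(y_true, scores, threshold=0.5):
    """Fraction of correct predictions after thresholding the scores."""
    preds = [1 if s >= threshold else 0 for s in scores]
    return sum(p == t for p, t in zip(preds, y_true)) / len(y_true)


def auc(y_true, scores):
    """Probability that a random positive outscores a random negative.

    Ties count as half a win. Requires at least one example of each class.
    """
    pos = [s for s, t in zip(scores, y_true) if t == 1]
    neg = [s for s, t in zip(scores, y_true) if t == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Every positive is ranked above every negative, yet all scores are below 0.5.
y_true = [0, 0, 0, 0, 1, 1]
scores = [0.1, 0.2, 0.3, 0.4, 0.45, 0.48]

print(accuracy(y_true, scores))
print(auc(y_true, scores))
```

Here the ranking is perfect, so AUC is 1.0, while accuracy at the 0.5 threshold only reflects the majority class; AUC does not depend on any single threshold.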

machine-learning accuracy

asked Jul 22 '14 at 03:43

aidankmcl

votes

6 answers

strings as features in decision tree/random forest

I am doing some problems on an application of decision tree/random forest. I am trying to fit a problem which has numbers as well as strings (such as country name) as features. Now the library, scikit-learn, takes only numbers as parameters, but I…
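
One common way to turn a string feature such as a country name into numbers is one-hot encoding. The sketch below is a minimal plain-Python version; the function name `one_hot_encode` and the sample values are assumptions for illustration, and whether one-hot encoding is the best choice for a particular tree model is a separate question.

```python
def one_hot_encode(values):
    """Map each distinct string to its own 0/1 column.

    Returns the sorted category names and one encoded row per input value.
    """
    categories = sorted(set(values))
    index = {category: i for i, category in enumerate(categories)}

    rows = []
    for value in values:
        row = [0] * len(categories)
        row[index[value]] = 1   # mark the column for this category
        rows.append(row)
    return categories, rows


countries = ["France", "India", "France", "Brazil"]
categories, encoded = one_hot_encode(countries)
print(categories)
print(encoded)
```

Each row contains exactly one 1, so no artificial ordering between countries is introduced, unlike mapping names to arbitrary integers.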

machine-learning python scikit-learn random-forest decision-trees

asked Feb 25 '15 at 01:07

user3001408

votes

7 answers

In supervised learning, why is it bad to have correlated features?

I read somewhere that if we have features that are too correlated, we have to remove one, as this may worsen the model. It is clear that correlated features bring the same information, so it is logical to remove one of them. But I…

machine-learning correlation

asked Nov 07 '17 at 14:37

Spider

votes

9 answers

Data scientist vs machine learning engineer

What are the differences, if any, between a "data scientist" and a "machine learning engineer"? Over the past year or so "machine learning engineer" has started to show up a lot in job postings. This is particularly noticeable in San Francisco,…

machine-learning

asked Feb 20 '18 at 06:15

Ryan Zotti

votes

5 answers

GBM vs XGBOOST? Key differences?

I am trying to understand the key differences between GBM and XGBOOST. I tried to google it, but could not find any good answers explaining the differences between the two algorithms and why xgboost almost always performs better than GBM. What makes…

machine-learning algorithms xgboost ensemble-modeling gbm

asked Feb 11 '17 at 20:03

Aman
