Questions tagged [deep-learning]

A new area of Machine Learning research concerned with learning hierarchical representations of data, mainly using deep neural networks (i.e., networks with two or more hidden layers), but also certain probabilistic graphical models.

What is Deep Learning?

Deep Learning is an area of machine learning that attempts to learn complex functions using architectures composed of many layers (hence the term "deep").

Deep architectures can learn more complex tasks because each additional layer applies a further transformation to its input, and the stack of layers lets a hierarchical organization of functionality emerge: later layers compose the simpler features learned by earlier ones.

Deep Learning was introduced into machine learning research with the intention of moving machine learning closer to artificial intelligence. A significant impact of deep learning lies in feature learning: it removes much of the manual feature-engineering effort required by shallower approaches.
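
As a concrete (if simplified) illustration, here is a minimal sketch of such a layered architecture in Keras; the layer sizes and the input shape are assumptions for illustration only, not a recommended design:

    # A minimal sketch of a "deep" network: several stacked hidden layers.
    # All sizes here are illustrative.
    from tensorflow import keras
    from tensorflow.keras import layers

    model = keras.Sequential([
        keras.Input(shape=(784,)),               # e.g. flattened 28x28 images
        layers.Dense(256, activation="relu"),    # early layers: low-level features
        layers.Dense(128, activation="relu"),    # deeper layers compose earlier ones
        layers.Dense(64, activation="relu"),
        layers.Dense(10, activation="softmax"),  # output: class probabilities
    ])
    model.compile(optimizer="adam", loss="categorical_crossentropy")
    model.summary()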


New to Deep Learning?

There are a variety of resources including books, tutorials/workshops, etc. for those looking to learn more about Deep Learning.

A popular introductory tutorial is the SciPy 2020 Conference Tutorial.

Some popular introductory books include Deep Learning (I. Goodfellow, A. Courville, and Y. Bengio; MIT Press, 2016).



4825 questions
256 votes · 10 answers

How to set class weights for imbalanced classes in Keras?

I know that there is a possibility in Keras with the class_weights parameter dictionary at fitting, but I couldn't find any example. Would somebody be so kind as to provide one? By the way, in this case the appropriate practice is simply to weight up the…
Hendrik · 8,377
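
Since the question asks for an example, here is a minimal sketch of passing class_weight to Keras's fit; the data and the weight values are made up, with the rare class up-weighted roughly by inverse class frequency:

    import numpy as np
    from tensorflow import keras
    from tensorflow.keras import layers

    # Made-up imbalanced binary data: roughly 10% positives.
    x = np.random.rand(1000, 20)
    y = (np.random.rand(1000) < 0.1).astype(int)

    model = keras.Sequential([
        keras.Input(shape=(20,)),
        layers.Dense(16, activation="relu"),
        layers.Dense(1, activation="sigmoid"),
    ])
    model.compile(optimizer="adam", loss="binary_crossentropy")

    # class_weight maps class index -> weight; errors on the minority
    # class now contribute ~9x more to the loss.
    model.fit(x, y, epochs=3, class_weight={0: 1.0, 1: 9.0})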
193 votes · 6 answers

How to draw Deep learning network architecture diagrams?

I have built my model. Now I want to draw the network architecture diagram for my research paper. An example is shown below…
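
One commonly suggested option (among many) is Keras's built-in plot_model utility, which renders the layer graph to an image file; this sketch assumes the pydot package and the Graphviz binaries are installed:

    from tensorflow import keras
    from tensorflow.keras import layers

    model = keras.Sequential([
        keras.Input(shape=(32, 32, 3)),
        layers.Conv2D(16, 3, activation="relu"),
        layers.MaxPooling2D(),
        layers.Flatten(),
        layers.Dense(10, activation="softmax"),
    ])

    # Writes a box-and-arrow diagram of the layer graph to model.png.
    keras.utils.plot_model(model, to_file="model.png",
                           show_shapes=True, show_layer_names=True)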
188 votes · 5 answers

What is the "dying ReLU" problem in neural networks?

Referring to the Stanford course notes on Convolutional Neural Networks for Visual Recognition, a paragraph says: "Unfortunately, ReLU units can be fragile during training and can 'die'. For example, a large gradient flowing through a ReLU…
tejaskhot · 3,935
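
A small numerical sketch of the phenomenon (values are made up): once a unit's pre-activation is negative for every input, ReLU outputs zero and its gradient is zero, so gradient descent can never revive it; a leaky variant keeps a small gradient alive:

    import numpy as np

    def relu(z):
        return np.maximum(0.0, z)

    def relu_grad(z):
        return (z > 0).astype(float)        # exactly 0 for z <= 0

    def leaky_relu_grad(z, alpha=0.01):
        return np.where(z > 0, 1.0, alpha)  # small but nonzero for z <= 0

    # Suppose a large update drove this unit's pre-activations negative
    # for every input in the dataset:
    z = np.array([-3.2, -1.5, -0.7, -2.1])

    print(relu(z))             # [0. 0. 0. 0.] -> the unit always outputs 0
    print(relu_grad(z))        # [0. 0. 0. 0.] -> no gradient, no recovery: "dead"
    print(leaky_relu_grad(z))  # [0.01 0.01 0.01 0.01] -> can still learn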
176 votes · 6 answers

When to use GRU over LSTM?

The key difference between a GRU and an LSTM is that a GRU has two gates (reset and update gates) whereas an LSTM has three gates (namely input, output and forget gates). Why do we make use of a GRU when we clearly have more control over the network…
Sayali Sonawane · 2,001
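
One concrete consequence of the gate counts, verifiable directly in Keras (the input and hidden sizes below are arbitrary): for the same hidden size, a GRU layer has fewer parameters than an LSTM layer:

    from tensorflow import keras
    from tensorflow.keras import layers

    # Same input size (32) and hidden size (64) for both layers.
    lstm = keras.Sequential([keras.Input(shape=(None, 32)), layers.LSTM(64)])
    gru  = keras.Sequential([keras.Input(shape=(None, 32)), layers.GRU(64)])

    # LSTM has 4 weight blocks (3 gates + cell candidate); GRU has 3
    # (2 gates + candidate state), so it prints a smaller count. Exact
    # numbers depend on the Keras version's bias conventions.
    print(lstm.count_params())
    print(gru.count_params())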
174 votes · 20 answers

How do you visualize neural network architectures?

When writing a paper or making a presentation about a topic involving neural networks, one usually visualizes the network's architecture. What are good / simple ways to visualize common architectures automatically?
Martin Thoma · 18,630
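
For a quick text-only view (one of many options; the model below is an arbitrary example), Keras's summary() prints each layer with its output shape and parameter count automatically:

    from tensorflow import keras
    from tensorflow.keras import layers

    model = keras.Sequential([
        keras.Input(shape=(28, 28, 1)),
        layers.Conv2D(8, 3, activation="relu"),
        layers.MaxPooling2D(),
        layers.Flatten(),
        layers.Dense(10, activation="softmax"),
    ])

    # Prints a table of layers, output shapes, and parameter counts --
    # a simple automatic "visualization" for papers and debugging.
    model.summary()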
114 votes · 10 answers

Choosing a learning rate

I'm currently working on implementing Stochastic Gradient Descent (SGD) for neural nets using back-propagation, and while I understand its purpose, I have some questions about how to choose values for the learning rate. Is the learning rate related…
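
A toy illustration of why the value matters (the function and rates are arbitrary): gradient descent on f(w) = w² converges for a small learning rate but diverges for a large one:

    # Gradient descent on f(w) = w^2, whose gradient is 2w.
    def descend(lr, steps=10, w=1.0):
        for _ in range(steps):
            w -= lr * 2 * w   # w <- w - lr * f'(w)
        return w

    print(descend(lr=0.1))  # ~0.107: each step shrinks w toward the minimum at 0
    print(descend(lr=1.1))  # ~6.19 and growing: each step overshoots, |w| increases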
86 votes · 1 answer

When to use (He or Glorot) normal initialization over uniform init? And what are its effects with Batch Normalization?

I knew that Residual Network (ResNet) made He normal initialization popular. In ResNet, He normal initialization is used, while the first layer uses He uniform initialization. I've looked through the ResNet paper and "Delving Deep into Rectifiers"…
Rizky Luthfianto · 2,176
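
For reference, a sketch of how either initializer family can be selected per layer in Keras (layer sizes are arbitrary):

    from tensorflow import keras
    from tensorflow.keras import layers

    model = keras.Sequential([
        keras.Input(shape=(128,)),
        # He initializers are scaled for ReLU-family activations...
        layers.Dense(64, activation="relu", kernel_initializer="he_normal"),
        layers.Dense(64, activation="relu", kernel_initializer="he_uniform"),
        # ...while Glorot (Xavier) is Keras's default, suited to tanh/sigmoid.
        layers.Dense(10, activation="softmax", kernel_initializer="glorot_uniform"),
    ])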
86 votes · 8 answers

Time series prediction using ARIMA vs LSTM

The problem that I am dealing with is predicting time series values. I am looking at one time series at a time, and based on, for example, 15% of the input data, I would like to predict its future values. So far I have come across two models: LSTM…
ahajib · 1,075
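
For orientation, a minimal sketch of the classical side, assuming the statsmodels package; the series is synthetic and the order (p, d, q) = (2, 1, 2) is an arbitrary illustrative choice:

    import numpy as np
    from statsmodels.tsa.arima.model import ARIMA

    series = np.cumsum(np.random.randn(200))  # synthetic random-walk-like series

    fit = ARIMA(series, order=(2, 1, 2)).fit()
    print(fit.forecast(steps=10))  # next 10 predicted values

    # An LSTM approach would instead frame this as supervised learning:
    # slide a window over the series and learn window -> next value.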
80 votes · 5 answers

What is the difference between "equivariant to translation" and "invariant to translation"?

I'm having trouble understanding the difference between equivariant to translation and invariant to translation. In the book Deep Learning (I. Goodfellow, A. Courville, and Y. Bengio; MIT Press, 2016), one can find on the convolutional…
Aamir · 963
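
The distinction can be stated in one line each: f is equivariant to a translation T when f(T(x)) = T(f(x)), and invariant when f(T(x)) = f(x). A tiny numpy check illustrates both (the signal is made up; the convolution is circular so the property holds exactly at the boundaries):

    import numpy as np

    x = np.array([0., 1., 3., 1., 0.])
    T = lambda a: np.roll(a, 1)  # the translation: shift everything one step

    def f(a):
        # f: circular convolution with kernel [1, -1]
        # (wrap-around avoids boundary effects)
        return a - np.roll(a, 1)

    g = lambda a: a.max()        # g: global max pooling

    # Equivariance: translating the input translates the output the same way.
    print(np.allclose(f(T(x)), T(f(x))))  # True: f(T(x)) == T(f(x))

    # Invariance: the pooled value is unchanged by the translation.
    print(g(T(x)) == g(x))                # True: g(T(x)) == g(x)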
75 votes · 6 answers

What is the difference between Gradient Descent and Stochastic Gradient Descent?

What is the difference between Gradient Descent and Stochastic Gradient Descent? I am not very familiar with these; can you describe the difference with a short example?
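
Since the question asks for a short example, here is a sketch on linear regression with made-up data: batch gradient descent computes one update per pass using all samples, while SGD updates from one randomly drawn sample at a time:

    import numpy as np

    rng = np.random.default_rng(0)
    X = rng.normal(size=(100, 3))
    true_w = np.array([1.0, -2.0, 0.5])
    y = X @ true_w + 0.01 * rng.normal(size=100)

    w_gd, w_sgd = np.zeros(3), np.zeros(3)

    # Batch gradient descent: one update per pass, using ALL samples.
    for _ in range(100):
        grad = 2 * X.T @ (X @ w_gd - y) / len(y)
        w_gd -= 0.1 * grad

    # Stochastic gradient descent: one update per randomly drawn sample.
    for _ in range(100):
        for i in rng.permutation(len(y)):
            grad_i = 2 * X[i] * (X[i] @ w_sgd - y[i])
            w_sgd -= 0.01 * grad_i

    print(w_gd)   # both approach true_w; SGD's path is noisier per step
    print(w_sgd)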
73 votes · 6 answers

Cross-entropy loss explanation

Suppose I build a neural network for classification. The last layer is a dense layer with Softmax activation. I have five different classes to classify. Suppose for a single training example, the true label is [1 0 0 0 0] while the predictions be…
enterML · 3,011
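
The computation for a case like the one in the question can be done by hand: with a one-hot true label, cross-entropy reduces to minus the log of the probability assigned to the true class (the prediction vector below is made up):

    import numpy as np

    y_true = np.array([1, 0, 0, 0, 0])              # one-hot label for class 0
    y_pred = np.array([0.6, 0.2, 0.1, 0.05, 0.05])  # illustrative softmax output

    # General form: H(y, p) = -sum_i y_i * log(p_i)
    loss = -np.sum(y_true * np.log(y_pred))

    # With a one-hot label only the true class's term survives:
    print(loss)                # 0.5108...
    print(-np.log(y_pred[0]))  # same value: -log(0.6)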
68 votes · 5 answers

Adding Features To Time Series Model LSTM

I have been reading up a bit on LSTMs and their use for time series, and it's been interesting but difficult at the same time. One thing I have had difficulty understanding is the approach to adding additional features to what is already a list…
Rjay155 · 1,205
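
The key mechanical point, as a sketch with made-up shapes: Keras recurrent layers take input shaped (samples, timesteps, features), so extra per-timestep features simply become additional entries in the last axis:

    import numpy as np
    from tensorflow import keras
    from tensorflow.keras import layers

    # 500 windows of 10 timesteps; each timestep has 3 features, e.g. the
    # series value plus two extra covariates measured at the same time.
    X = np.random.rand(500, 10, 3)   # (samples, timesteps, features)
    y = np.random.rand(500)          # next value to predict

    model = keras.Sequential([
        keras.Input(shape=(10, 3)),  # timesteps=10, features=3
        layers.LSTM(32),
        layers.Dense(1),
    ])
    model.compile(optimizer="adam", loss="mse")
    model.fit(X, y, epochs=2, verbose=0)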
67 votes · 5 answers

In softmax classifier, why use exp function to do normalization?

Why use softmax as opposed to standard normalization? In the comment area of the top answer of this question, @Kilian Batzner raised two questions which also confuse me a lot. It seems no one gives an explanation except for the numerical benefits. I get the…
Hans · 773
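
A quick numeric contrast (the scores are made up): dividing raw scores by their sum can yield negative "probabilities", while exponentiating first keeps every output positive and makes the result depend only on score differences:

    import numpy as np

    z = np.array([2.0, 1.0, -1.0])

    # "Standard" normalization: not a valid probability distribution.
    print(z / z.sum())         # [ 1.   0.5 -0.5]

    def softmax(z):
        e = np.exp(z - z.max())  # subtract max for numerical stability
        return e / e.sum()

    print(softmax(z))          # [0.705 0.259 0.035] -- a valid distribution
    print(softmax(z + 100.0))  # identical: softmax depends only on differences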
67 votes · 4 answers

Why is mini-batch size better than one single "batch" with all training data?

I often read that in the case of Deep Learning models the usual practice is to apply mini-batches (generally small ones, 32/64) over several training epochs. I cannot really fathom the reason behind this. Unless I'm mistaken, the batch size is the…
Hendrik · 8,377
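
A sketch of the mechanics, with arbitrary shapes and batch size: with mini-batches the parameters are updated many times per epoch rather than once, and each update's gradient is a cheap, noisy estimate of the full-batch gradient:

    import numpy as np

    rng = np.random.default_rng(0)
    X, y = rng.normal(size=(1024, 5)), rng.normal(size=1024)
    w, lr, batch_size = np.zeros(5), 0.05, 32

    for epoch in range(3):
        idx = rng.permutation(len(y))              # new order each epoch
        for start in range(0, len(y), batch_size):
            b = idx[start:start + batch_size]      # one mini-batch of 32 rows
            grad = 2 * X[b].T @ (X[b] @ w - y[b]) / len(b)
            w -= lr * grad
    # 1024/32 = 32 parameter updates per epoch, vs. 1 with full-batch descent.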
66 votes · 11 answers

Why should the data be shuffled for machine learning tasks?

In machine learning tasks it is common to shuffle the data and normalize it. The purpose of normalization is clear (to bring feature values into the same range). But, after struggling a lot, I did not find any valuable reason for shuffling data. I have read…
Green Falcon · 13,868
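
One concrete reason, sketched with made-up data: if the data are ordered (say, by class), consecutive mini-batches are biased; a single permutation applied jointly to features and labels removes that ordering:

    import numpy as np

    rng = np.random.default_rng(0)
    X = np.arange(20).reshape(10, 2).astype(float)
    y = np.array([0, 0, 0, 0, 0, 1, 1, 1, 1, 1])  # sorted by class: early
                                                  # batches would see only class 0

    perm = rng.permutation(len(y))  # one permutation, applied to BOTH arrays
    X_shuf, y_shuf = X[perm], y[perm]

    print(y_shuf)  # classes now interleaved, so every mini-batch mixes them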