Questions tagged [hyperparameter]

Hyperparameters of a model are parameters that cannot be learned directly during training but must be set beforehand. Hyperparameters can define, for example, the complexity of the model or its capacity to learn.

In contrast to regular parameters, which are learned during the training process, hyperparameters (such as the learning rate, the number of layers, or a regularization strength) are fixed before training begins.
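The idea behind tuning can be sketched in a few lines of pure Python (the validation loss below is a hypothetical toy function, not any particular library's API): search candidate settings and keep the one that scores best on held-out data.

```python
from itertools import product

# Toy "validation loss" with two hyperparameters (hypothetical function;
# by construction its optimum is lr=0.1, depth=3).
def validation_loss(lr, depth):
    return (lr - 0.1) ** 2 + (depth - 3) ** 2

# Candidate settings, fixed before any training happens.
grid = {"lr": [0.01, 0.1, 1.0], "depth": [1, 3, 5]}

# Exhaustive grid search: evaluate every combination, keep the best.
best = min(product(grid["lr"], grid["depth"]),
           key=lambda cfg: validation_loss(*cfg))
print(best)  # (0.1, 3)
```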

148 questions
114 votes, 10 answers

Choosing a learning rate

I'm currently working on implementing Stochastic Gradient Descent (SGD) for neural nets using back-propagation, and while I understand its purpose, I have some questions about how to choose values for the learning rate. Is the learning rate related…
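A minimal sketch of why this choice matters, using plain gradient descent on f(x) = x² (toy values, not a recipe for real networks): a small rate converges, a too-large one diverges.

```python
# Gradient descent on f(x) = x^2 (gradient 2x), starting from x0 = 1.0.
# Each step multiplies x by (1 - 2*lr), so |1 - 2*lr| < 1 is needed to converge.
def descend(lr, steps=50, x=1.0):
    for _ in range(steps):
        x -= lr * 2 * x
    return x

small = descend(0.1)  # factor 0.8 per step: shrinks toward the minimum at 0
large = descend(1.1)  # factor -1.2 per step: oscillates and blows up
print(abs(small), abs(large))
```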
49 votes, 7 answers

What is the difference between model hyperparameters and model parameters?

I have noticed that such terms as model hyperparameter and model parameter have been used interchangeably on the web without prior clarification. I think this is incorrect and needs explanation. Consider a machine learning model, an SVM/NN/NB based…
minerals • 2,137 • 3 • 17 • 19
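A toy sketch of the distinction (pure Python, made-up data following y = 2x): the weight w is a parameter learned from data, while the learning rate and epoch count are hyperparameters fixed beforehand.

```python
# Data generated from y = 2x, so the learned weight should approach 2.
data = [(x, 2 * x) for x in [1.0, 2.0, 3.0]]

lr, epochs = 0.05, 200  # hyperparameters: chosen before training, never learned
w = 0.0                 # parameter: learned from the data during training

for _ in range(epochs):
    for x, y in data:
        w -= lr * 2 * (w * x - y) * x  # gradient of the squared error (w*x - y)^2

print(round(w, 3))  # 2.0
```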
40 votes, 6 answers

How to set the number of neurons and layers in neural networks

I am a beginner to neural networks and have had trouble grasping two concepts: How does one decide the number of middle layers a given neural network should have? 1 vs. 10, or whatever. How does one decide the number of neurons in each middle layer? Is it…
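One small, concrete consequence of these choices is the network's parameter count; a pure-Python sketch for a fully connected network (the layer sizes below are illustrative):

```python
# Each fully connected layer from n_in to n_out neurons contributes
# n_in * n_out weights plus n_out biases.
def mlp_param_count(layer_sizes):
    return sum(n_in * n_out + n_out
               for n_in, n_out in zip(layer_sizes, layer_sizes[1:]))

print(mlp_param_count([4, 8, 3]))  # 4*8 + 8  +  8*3 + 3  = 67
```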
21 votes, 4 answers

Hyperparameter search for LSTM-RNN using Keras (Python)

From Keras RNN Tutorial: "RNNs are tricky. Choice of batch size is important, choice of loss and optimizer is critical, etc. Some configurations won't converge." So this is more a general question about tuning the hyperparameters of an LSTM-RNN on…
wacax • 3,370 • 4 • 22 • 45
11 votes, 2 answers

What is the most efficient method for hyperparameter optimization in scikit-learn?

An overview of the hyperparameter optimization process in scikit-learn is here. Exhaustive grid search will find the optimal set of hyperparameters for a model. The downside is that exhaustive grid search is slow. Random search is faster than grid…
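The idea behind random search can be sketched in pure Python (hypothetical toy loss; scikit-learn's RandomizedSearchCV wraps roughly this loop together with cross-validation): sample a fixed budget of random configurations instead of enumerating a grid.

```python
import random

random.seed(0)  # fixed seed so the sketch is reproducible

# Hypothetical validation loss with optimum near lr=0.1, depth=3.
def validation_loss(lr, depth):
    return (lr - 0.1) ** 2 + (depth - 3) ** 2

best_cfg, best_loss = None, float("inf")
for _ in range(50):  # the search budget (n_iter in RandomizedSearchCV)
    # Sample lr log-uniformly in [1e-3, 1] and depth uniformly in 1..10.
    cfg = (10 ** random.uniform(-3, 0), random.randint(1, 10))
    loss = validation_loss(*cfg)
    if loss < best_loss:
        best_cfg, best_loss = cfg, loss

print(best_cfg, best_loss)
```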
10 votes, 2 answers

How do scientists come up with the correct Hidden Markov Model parameters and topology to use?

I understand how a Hidden Markov Model is used in genomic sequences, such as finding a gene. But I don't understand how to come up with a particular Markov model. I mean, how many states should the model have? How many possible transitions? Should…
ABCD • 3,510 • 2 • 18 • 30
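A minimal sketch of evaluating one candidate topology: a hypothetical two-state HMM (all probabilities below are invented for illustration) scored on a sequence with the forward algorithm.

```python
# Toy gene-finding HMM: two states, invented transition/emission tables.
states = ["gene", "intergenic"]
start = {"gene": 0.5, "intergenic": 0.5}
trans = {"gene": {"gene": 0.9, "intergenic": 0.1},
         "intergenic": {"gene": 0.2, "intergenic": 0.8}}
emit = {"gene": {"A": 0.1, "C": 0.4, "G": 0.4, "T": 0.1},
        "intergenic": {"A": 0.3, "C": 0.2, "G": 0.2, "T": 0.3}}

# Forward algorithm: total probability of the observed sequence,
# summed over all hidden state paths.
def forward(seq):
    alpha = {s: start[s] * emit[s][seq[0]] for s in states}
    for obs in seq[1:]:
        alpha = {s: emit[s][obs] * sum(alpha[p] * trans[p][s] for p in states)
                 for s in states}
    return sum(alpha.values())

p = forward("GCGC")
print(p)
```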
10 votes, 4 answers

Which comes first? Tuning the parameters or selecting the model

I've been reading about how we split our data into 3 parts; generally, we use the validation set to help us tune the parameters and the test set to get an unbiased estimate of how well our model performs, so that we can compare models based on…
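The usual order can be sketched with toy, hypothetical validation scores: tune each model family on the validation set first, then compare the tuned families; the test set is touched only once, at the very end.

```python
# Hypothetical validation scores for (model family, hyperparameter) pairs.
val_scores = {
    ("svm", "C=0.1"): 0.81, ("svm", "C=1.0"): 0.86,
    ("tree", "depth=3"): 0.84, ("tree", "depth=9"): 0.79,
}

# Step 1: per family, keep the best hyperparameter setting (validation set).
best_per_family = {}
for (family, hp), score in val_scores.items():
    if score > best_per_family.get(family, ("", -1.0))[1]:
        best_per_family[family] = (hp, score)

# Step 2: compare the tuned families and pick the winner;
# only the winner would then be scored once on the held-out test set.
winner = max(best_per_family.items(), key=lambda kv: kv[1][1])
print(winner)  # ('svm', ('C=1.0', 0.86))
```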
8 votes, 2 answers

XGBoost and Random Forest: ntrees vs. number of boosting rounds vs. n_estimators

So I understand the main difference between Random Forests and GB Methods. Random Forests grow parallel trees and GB Methods grow one tree for each iteration. However, I am confused on the vocab used with scikit's RF regressor and xgboost's…
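A toy sketch of why these names count the same thing: boosting is a sequential loop, and n_estimators / num_boost_round / ntrees is simply the number of iterations. The "weak learner" here is just the residual mean, for brevity; real libraries fit a tree per round.

```python
# Toy gradient boosting for a squared-error objective.
y = [1.0, 2.0, 3.0, 10.0]
pred = [0.0] * len(y)

learning_rate, n_estimators = 0.5, 25  # n_estimators == number of boosting rounds
for _ in range(n_estimators):
    residuals = [yi - pi for yi, pi in zip(y, pred)]
    step = sum(residuals) / len(residuals)      # stand-in for a fitted tree
    pred = [pi + learning_rate * step for pi in pred]

mse = sum((yi - pi) ** 2 for yi, pi in zip(y, pred)) / len(y)
print(round(mse, 4))  # predictions converge to the mean of y, so mse -> 12.5
```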
8 votes, 1 answer

Is it OK to try to find the best PCA k parameter as we do with other hyperparameters?

Principal Component Analysis (PCA) is used to reduce n-dimensional data to k-dimensional data to speed things up in machine learning. After PCA is applied, one can check how much of the variance of the original dataset remains in the resulting…
J. Doe • 81 • 1 • 2
8 votes, 1 answer

How can you decide the window size of a pooling layer?

In a convolutional neural network, one or more pooling layers are used. As far as I know, many tutorials instruct you to set the window size to either 2 or 3. For example, in this tutorial: Pooling Layers After some ReLU layers, programmers…
Blaszard • 901 • 1 • 13 • 29
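For reference, the common 2×2 window with stride 2 works like this (pure-Python sketch on a small feature map):

```python
# 2x2 max pooling with stride 2: each output cell is the maximum of a
# non-overlapping 2x2 block of the input feature map.
def max_pool_2x2(fm):
    return [[max(fm[i][j], fm[i][j + 1], fm[i + 1][j], fm[i + 1][j + 1])
             for j in range(0, len(fm[0]), 2)]
            for i in range(0, len(fm), 2)]

feature_map = [
    [1, 3, 2, 1],
    [4, 6, 5, 0],
    [7, 2, 8, 3],
    [1, 0, 4, 9],
]
print(max_pool_2x2(feature_map))  # [[6, 5], [7, 9]]
```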
7 votes, 3 answers

Regression model with variable number of parameters in dataset?

I work in physics. We have lots of experimental runs, with each run yielding a result, y, and some parameters that should predict the result, x. Over time, we have found more and more parameters to record. So our data looks like the following: Year 1…
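One common workaround (a sketch, not the only option) is to treat parameters that were not yet recorded as missing and impute them, e.g. with each column's mean over the runs that do have it:

```python
# Runs from different years record different parameter sets (toy data).
runs = [
    {"x1": 1.0, "x2": 2.0},             # year 1: two parameters recorded
    {"x1": 3.0, "x2": 4.0, "x3": 5.0},  # year 2: a third parameter added
    {"x1": 5.0, "x2": 6.0, "x3": 7.0},
]

# Union of all parameter names, then the per-column mean over runs that have it.
features = sorted({k for run in runs for k in run})
means = {f: sum(r[f] for r in runs if f in r) / sum(f in r for r in runs)
         for f in features}

# Fixed-width feature matrix: missing entries filled with the column mean.
X = [[run.get(f, means[f]) for f in features] for run in runs]
print(X)  # the year-1 row gets x3 imputed with mean(5.0, 7.0) = 6.0
```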
7 votes, 1 answer

Overfitting for minority class after SMOTE w/ random forests

I used SMOTE to build a predictive model, with class 1 having 1,800 samples and class 0 having 35,000+ samples. Hence, as per SMOTE, synthetic samples were created and the random forest was trained. However, I am now getting most results as class 1 when I…
TdBm • 423 • 1 • 5 • 15
6 votes, 1 answer

Why does BERT classification do worse with longer sequence lengths?

I've been experimenting with transformer networks like BERT for some simple classification tasks. My tasks are binary classification, the datasets are relatively balanced, and the corpora are abstracts from PubMed. The median number of tokens from…
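A sketch of what a fixed maximum sequence length does to short abstracts (token handling simplified; the "[PAD]" token name follows BERT's convention): a large maximum mostly adds padding, a small one truncates.

```python
# Every input is forced to exactly max_len tokens: pad if short, cut if long.
def pad_or_truncate(tokens, max_len, pad="[PAD]"):
    return (tokens + [pad] * (max_len - len(tokens)))[:max_len]

abstract = ["the", "median", "abstract", "is", "short"]
print(pad_or_truncate(abstract, 8))  # 3 trailing [PAD] tokens
print(pad_or_truncate(abstract, 3))  # truncated to the first 3 tokens
```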
6 votes, 1 answer

Neural Network Golf: smallest network for a certain level of performance

I am interested in any data, publications, etc about what is the smallest neural network that can achieve a certain level of classification performance. By small I mean few parameters, not few arithmetic operations (=fast). I am interested…
Alex I • 3,142 • 1 • 21 • 27
6 votes, 3 answers

Which parameters are hyperparameters in a linear regression?

Can the number of features used in a linear regression be regarded as a hyperparameter? Perhaps the choice of features?