Questions tagged [bootstraping]
15 questions
3 votes, 2 answers
List of samples that each tree in a random forest is trained on in Scikit-Learn
In Scikit-learn's random forest, you can set bootstrap=True so that each tree is trained on a bootstrap sample of the training data. Is there a way to see which samples are used by each tree?
I went through the documentation about the tree estimators and all the…
theonionring0127
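For the first question, here is a minimal sketch of what bootstrap=True means per tree: each tree gets its own draw of n row indices with replacement, and the rows it never draws form that tree's out-of-bag set. The helper names `draw_bootstrap_indices` and `out_of_bag` are hypothetical, and this is not Scikit-learn's internal sampling code, so it will not reproduce the indices of a fitted forest. For a fitted ensemble, Scikit-learn's `BaggingClassifier` documents an `estimators_samples_` attribute. Check the documentation for your installed version to see whether the forest classes expose the same attribute.

```python
import random

def draw_bootstrap_indices(n_samples, n_trees, seed=0):
    """Return one list of row indices per tree, drawn with replacement.

    Hypothetical helper: illustrates the idea behind bootstrap=True,
    not Scikit-learn's internal implementation.
    """
    rng = random.Random(seed)
    return [[rng.randrange(n_samples) for _ in range(n_samples)]
            for _ in range(n_trees)]

def out_of_bag(indices, n_samples):
    """Rows never drawn for a given tree (its out-of-bag set)."""
    drawn = set(indices)
    return [i for i in range(n_samples) if i not in drawn]

n_samples, n_trees = 10, 3
per_tree = draw_bootstrap_indices(n_samples, n_trees, seed=42)
for t, idx in enumerate(per_tree):
    # In-bag rows may repeat within idx; the set shows which rows were used.
    print(f"tree {t}: in-bag={sorted(set(idx))} oob={out_of_bag(idx, n_samples)}")
```

Seeding each draw makes the per-tree samples reproducible, which is what makes it possible to recover them after the fact.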
3 votes, 0 answers
Difference between bagging and bootstrap aggregating
The bootstrap is due to Efron, and Tibshirani co-wrote a book about it with Efron.
The bootstrap procedure for estimating the standard error of a statistic s(x): B bootstrap samples are generated from the original data. Finally, the standard deviation of the…
martin
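The standard-error procedure in the second excerpt can be sketched with the standard library alone: draw B resamples of size n with replacement, evaluate the statistic s on each to get B replicates, and take the sample standard deviation of those replicates (divisor B - 1). The data and the helper name `bootstrap_se` are illustrative only.

```python
import random
import statistics

def bootstrap_se(data, statistic, n_boot=1000, seed=0):
    """Bootstrap estimate of the standard error of statistic(data).

    Draws n_boot resamples of the same size with replacement, evaluates
    the statistic on each, and returns the sample standard deviation of
    those replicates.
    """
    rng = random.Random(seed)
    n = len(data)
    replicates = [statistic([data[rng.randrange(n)] for _ in range(n)])
                  for _ in range(n_boot)]
    return statistics.stdev(replicates)

x = [2.1, 3.4, 1.9, 5.6, 4.2, 3.3, 2.8, 4.9, 3.7, 2.5]
se_median = bootstrap_se(x, statistics.median, n_boot=2000, seed=1)
se_mean = bootstrap_se(x, statistics.mean, n_boot=2000, seed=1)
print(f"bootstrap SE of the median: {se_median:.3f}")
print(f"bootstrap SE of the mean:   {se_mean:.3f}")
# Value the bootstrap SE of the mean approaches as B grows (divisor n).
print(f"plug-in SE of the mean:     {statistics.pstdev(x) / len(x) ** 0.5:.3f}")
```

The median line shows the point of the method: there is no simple textbook formula for its standard error, but the resampling recipe is identical for any statistic.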
2 votes, 2 answers
Resampling train and test data in R
I need to try out a few different machine learning methods (SVM, logistic regression, etc.) that predict a true/false value, and record the AUC and accuracy of their predictions.
I have already done that successfully; now I have a two…
znoris007
2 votes, 1 answer
Question on bootstrap sampling
I have a corpus of manually annotated (aka "gold standard") documents and a collection of annotations produced by NLP systems on the text from the corpus. I want to do bootstrap sampling of the system and gold-standard annotations to approximate a mean and standard error…
horcle_buzz
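For the annotation question, one detail that often matters is to resample whole documents, so that each system annotation stays paired with its gold-standard annotation. Below is a minimal sketch. It assumes that the comparison has already been reduced to hypothetical per-document counts of true positives, false positives and false negatives, and it uses micro-averaged F1 as the metric. Both are assumptions, not details from the question.

```python
import random
import statistics

# Hypothetical per-document counts (tp, fp, fn) from comparing one
# system's annotations against the gold standard.
docs = [(8, 2, 1), (5, 1, 3), (12, 0, 2), (3, 4, 1), (7, 2, 2),
        (9, 1, 0), (4, 3, 3), (6, 0, 1), (10, 2, 4), (2, 1, 2)]

def micro_f1(sample):
    """Micro-averaged F1 over a list of (tp, fp, fn) tuples."""
    tp = sum(d[0] for d in sample)
    fp = sum(d[1] for d in sample)
    fn = sum(d[2] for d in sample)
    return 2 * tp / (2 * tp + fp + fn)

def bootstrap_documents(docs, metric, n_boot=1000, seed=0):
    """Resample whole documents with replacement so system and gold
    annotations of the same document always travel together."""
    rng = random.Random(seed)
    n = len(docs)
    return [metric([docs[rng.randrange(n)] for _ in range(n)])
            for _ in range(n_boot)]

reps = bootstrap_documents(docs, micro_f1, n_boot=2000, seed=7)
print(f"micro-F1 on full corpus: {micro_f1(docs):.3f}")
print(f"bootstrap mean: {statistics.mean(reps):.3f}  SE: {statistics.stdev(reps):.3f}")
```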
1 vote, 0 answers
How are the same observation sets treated in Random Forests with Bootstrapping?
Let's assume an extremely small dataset with only 4 observations, and that I create a random forest model with a fairly large number of trees, say 200. In that case, some of the bootstrap sample sets used for fitting can be identical to each other, right? Is that OK?
Even…
jlee
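The 4-observation case in the question above can be worked out exactly. A bootstrap sample of size n drawn from n observations is a multiset, and there are C(2n - 1, n) distinct multisets: 35 for n = 4. With 200 trees, the pigeonhole principle guarantees that some samples repeat. The snippet checks the count by enumeration. It also computes n!/n^n, the chance that a single draw contains every observation exactly once.

```python
import math
from itertools import combinations_with_replacement

n = 4          # observations, as in the question
n_trees = 200  # trees, as in the question

# Order does not matter for a training set, so count multisets of size n.
distinct_samples = math.comb(2 * n - 1, n)
enumerated = sum(1 for _ in combinations_with_replacement(range(n), n))

# Probability that one bootstrap draw uses every observation exactly once.
p_all_distinct = math.factorial(n) / n ** n

print(distinct_samples, enumerated)  # 35 35
print(p_all_distinct)                # 0.09375
print(n_trees > distinct_samples)    # True: repeats are unavoidable
```

Two trees fitted to identical bootstrap samples can still differ in a random forest. When max_features is smaller than the number of features, the candidate features at each split are also drawn at random.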
1 vote, 1 answer
nnet in caret. Bootstrapping or cross-validation?
I want to train a shallow neural network with one hidden layer using nnet in caret. In trainControl, I used method = "cv" to perform 3-fold cross-validation. A snippet of the code and a summary of the results are below.
myControl <- trainControl(## 3-fold CV
…
SiH
1 vote, 1 answer
About confidence/prediction intervals: parametric methods vs. non-parametric (via bootstrap) methods
Regarding the methodology for finding confidence and/or prediction intervals in, say, a regression problem, I know of two main options:
Checking the normality of the distribution of the estimates/predictions, and applying well-known Gaussian-based methods to find…
German C M
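To make the two options in the confidence-interval question concrete, here is a minimal sketch. It compares a normal-theory interval for the mean with a percentile bootstrap interval, using hypothetical skewed data generated in the script. The percentile interval takes the alpha/2 and 1 - alpha/2 quantiles of the bootstrap replicates. Other bootstrap intervals, such as BCa, exist and are not shown.

```python
import random
import statistics

def percentile_ci(data, statistic, alpha=0.05, n_boot=2000, seed=0):
    """Percentile bootstrap interval read off the sorted replicates."""
    rng = random.Random(seed)
    n = len(data)
    reps = sorted(statistic([data[rng.randrange(n)] for _ in range(n)])
                  for _ in range(n_boot))
    lo = reps[round(alpha / 2 * n_boot)]
    hi = reps[round((1 - alpha / 2) * n_boot) - 1]
    return lo, hi

def normal_ci(data, alpha=0.05):
    """z-based interval for the mean; relies on approximate normality
    of the sampling distribution of the mean."""
    z = statistics.NormalDist().inv_cdf(1 - alpha / 2)
    m = statistics.mean(data)
    se = statistics.stdev(data) / len(data) ** 0.5
    return m - z * se, m + z * se

# Hypothetical right-skewed sample (exponential draws).
gen = random.Random(3)
data = [gen.expovariate(1.0) for _ in range(40)]

print("normal:    (%.3f, %.3f)" % normal_ci(data))
print("bootstrap: (%.3f, %.3f)" % percentile_ci(data, statistics.mean))
```

The normal interval is symmetric around the mean by construction, while the percentile interval follows the shape of the bootstrap distribution and can come out asymmetric on skewed data.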
1 vote, 0 answers
What is the best way to combine cross-validation and bootstrapping for one application?
We intend to model data with non-parametric covariate splines and we would like to understand the uncertainty of the parameter estimates/response estimates.
Currently, we use cross-validation to model the optimal smoothness of our spline models…
Stan Tendijck
1 vote, 0 answers
How to perform bootstrap validation on CART decision tree?
I have a relatively small dataset (n = 500) on which I am training a CART decision tree.
My dataset has about 30 variables, and the outcome has 3 classes.
I am using CART for interpretability purposes, since what I am interested in is sharing and…
Eric Yamga
1 vote, 0 answers
Evaluate Dendrogram Statistical Significance
I have N = 21 objects, and each one has about 80 available (non-NaN) descriptors.
I carried out a hierarchical clustering on the objects and I obtained this dendrogram.
I want some kind of 'confidence' index for the dendrogram or for each node. I saw…
Mirko
1 vote, 0 answers
Stratified sampling - use of proxy variable
To split the data into train/test/validation sets, I use stratified sampling. Is it appropriate to define the strata using information extracted from the dataset itself? For example, is it appropriate to use machine learning to model a proxy variable that then defines the strata?
My worry is…
holoubekm
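The mechanics of a stratified split do not depend on what defines the strata. In the sketch below, the `labels` argument could hold a class label or a modeled proxy variable like the one in the question. `stratified_split` is a hypothetical helper. In practice, Scikit-learn's `train_test_split` provides the same behaviour through its `stratify` parameter.

```python
import random
from collections import defaultdict

def stratified_split(labels, test_frac=0.2, seed=0):
    """Split row indices so each stratum keeps roughly the same share
    in train and test. labels[i] is the stratum of row i."""
    rng = random.Random(seed)
    by_stratum = defaultdict(list)
    for i, s in enumerate(labels):
        by_stratum[s].append(i)
    train, test = [], []
    for idx in by_stratum.values():
        rng.shuffle(idx)                  # random order within the stratum
        k = round(len(idx) * test_frac)   # per-stratum test size
        test.extend(idx[:k])
        train.extend(idx[k:])
    return sorted(train), sorted(test)

strata = ["a"] * 50 + ["b"] * 30 + ["c"] * 20
train, test = stratified_split(strata, test_frac=0.2, seed=1)
print(len(train), len(test))  # 80 20
```

Whatever feeds `labels` is treated as fixed before the split. If the proxy is learned from the same data, that learning step happens outside this function.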
0 votes, 0 answers
Estimate class proportions of a feature, central limit theorem
I haven't been feeling smart lately, and this is probably the most trivial question ever, but I really need to know. I'm trying to point-estimate some population parameters. I sampled from 1000 randomly generated bootstrapped samples of 130000…
Laurent
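For the class-proportion question, here is a minimal sketch comparing what the bootstrap and the central limit theorem each say about a sample proportion. It uses a hypothetical binary feature of 500 rows, not the question's data. The bootstrap replicates center on the sample proportion p_hat, and their standard deviation comes out close to the normal-approximation value sqrt(p_hat (1 - p_hat) / n).

```python
import random
import statistics

rng = random.Random(0)
# Hypothetical binary feature: 1 marks the class of interest (~30%).
sample = [1 if rng.random() < 0.3 else 0 for _ in range(500)]
n = len(sample)
p_hat = sum(sample) / n

# Normal (CLT) approximation to the standard error of a proportion.
se_clt = (p_hat * (1 - p_hat) / n) ** 0.5

# Bootstrap: resample rows with replacement and recompute the proportion.
boot = [sum(sample[rng.randrange(n)] for _ in range(n)) / n
        for _ in range(1000)]
se_boot = statistics.stdev(boot)

print(f"p_hat={p_hat:.3f}  bootstrap mean={statistics.mean(boot):.3f}")
print(f"SE: CLT={se_clt:.4f}  bootstrap={se_boot:.4f}")
```

Averaging the bootstrap proportions gives essentially p_hat back. What the resampling adds is the spread, and hence an interval.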
0 votes, 1 answer
Perform bootstrapping of an ordinary linear regression model using B = 100 bootstrap resamples of my dataset, and get the RMSE
So I'm studying machine learning through R, and I'm working with the Boston data set from the MASS library. I am practicing bootstrapping. I already carried out an analysis to determine how many distinct data points, on average, are drawn from the sample…
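The question works in R with the Boston data. The following is a minimal Python sketch of the same loop on hypothetical synthetic data (y = 2 + 3x + noise) with one predictor. It fits ordinary least squares on each of B = 100 bootstrap resamples and records the RMSE. Computing that RMSE on the out-of-bag rows is an assumption here, because the question does not say where the RMSE is measured. The loop also tracks the fraction of distinct rows per resample, which should sit near 1 - (1 - 1/n)^n, about 0.632 for large n.

```python
import math
import random

def fit_ols(xs, ys):
    """Closed-form simple linear regression y = a + b * x."""
    mx = sum(xs) / len(xs)
    my = sum(ys) / len(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    b = sxy / sxx
    return my - b * mx, b

def rmse(a, b, xs, ys):
    return math.sqrt(sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys)) / len(xs))

rng = random.Random(0)
# Hypothetical data standing in for the Boston set.
n = 200
xs = [rng.uniform(0, 10) for _ in range(n)]
ys = [2 + 3 * x + rng.gauss(0, 1) for x in xs]

B = 100
oob_rmses, distinct_fracs = [], []
for _ in range(B):
    idx = [rng.randrange(n) for _ in range(n)]      # bootstrap resample
    in_bag = set(idx)
    oob = [i for i in range(n) if i not in in_bag]  # rows left out
    a, b = fit_ols([xs[i] for i in idx], [ys[i] for i in idx])
    oob_rmses.append(rmse(a, b, [xs[i] for i in oob], [ys[i] for i in oob]))
    distinct_fracs.append(len(in_bag) / n)

print(f"mean OOB RMSE over {B} resamples: {sum(oob_rmses) / B:.3f}")
print(f"mean fraction of distinct points: {sum(distinct_fracs) / B:.3f}")
print(f"theory 1 - (1 - 1/n)^n: {1 - (1 - 1 / n) ** n:.3f}")
```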
0 votes, 0 answers
Random walk through Bootstrap
I'm performing a Bootstrap Random Walk over a set of points which is a time series with a certain pattern.
Right now, I take the set of points and resample it with replacement.
# Resample data with replacement
bootstrap_data =…
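Resampling the points independently with replacement, as described in the question, discards their time order, so any pattern in the series is lost. One common alternative, named here as a swapped-in technique and not as what the question does, is a moving-block bootstrap of the increments: it resamples blocks of consecutive steps and cumulatively sums them into a new walk. The series and the helper `block_bootstrap_walk` are hypothetical.

```python
import math
import random

def block_bootstrap_walk(series, block_len=5, seed=0):
    """Build a synthetic path by resampling blocks of consecutive
    increments (moving-block bootstrap) and cumulatively summing them.
    Requires len(series) - 1 >= block_len."""
    rng = random.Random(seed)
    steps = [b - a for a, b in zip(series, series[1:])]
    n_steps = len(steps)
    starts = range(n_steps - block_len + 1)   # valid block start positions
    new_steps = []
    while len(new_steps) < n_steps:
        s = rng.choice(starts)
        new_steps.extend(steps[s:s + block_len])
    new_steps = new_steps[:n_steps]           # trim the last block
    path = [series[0]]
    for step in new_steps:
        path.append(path[-1] + step)
    return path

# Hypothetical series with a pattern: a trend plus a slow oscillation.
series = [0.1 * t + 2 * math.sin(t / 4) for t in range(100)]
path = block_bootstrap_walk(series, block_len=10, seed=1)
print(len(path), path[0] == series[0])  # 100 True
```

The block length controls how much short-range dependence survives. Longer blocks keep more structure but produce less varied paths.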
0 votes, 1 answer
Understanding bootstrapping in bias variance decomposition
I was going through an article on the bias-variance tradeoff, and it uses the bias_variance_decomp function from the mlxtend library. This function takes a parameter called num_rounds, which is described in the API docs as follows:
num_rounds : int…
Mahesha999
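As we understand it, num_rounds counts bootstrap rounds. In each round, the training set is resampled with replacement, the model is refit, and the refit model predicts a fixed test set. Loss, bias and variance are then averaged over rounds and test points. The following is a minimal pure-Python sketch of that idea for squared error, not mlxtend's source. `bias_variance_mse` and `fit_line` are hypothetical names, and the data are synthetic. In this version the "bias" term is measured against the observed test targets, so it also absorbs irreducible noise, and loss = bias + variance holds exactly.

```python
import random
import statistics

def fit_line(xs, ys):
    """Least-squares straight line; returns a prediction function."""
    mx, my = statistics.mean(xs), statistics.mean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    b = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sxx
    return lambda x: my - b * mx + b * x

def bias_variance_mse(X_train, y_train, X_test, y_test, fit, num_rounds=200, seed=0):
    """Bootstrap the training set num_rounds times, refit, and decompose
    squared-error loss on a fixed test set."""
    rng = random.Random(seed)
    n = len(X_train)
    preds = []  # preds[r][j]: prediction of round r on test point j
    for _ in range(num_rounds):
        idx = [rng.randrange(n) for _ in range(n)]
        model = fit([X_train[i] for i in idx], [y_train[i] for i in idx])
        preds.append([model(x) for x in X_test])
    m = len(X_test)
    main = [statistics.mean(p[j] for p in preds) for j in range(m)]  # average prediction
    loss = statistics.mean((p[j] - y_test[j]) ** 2 for p in preds for j in range(m))
    bias = statistics.mean((main[j] - y_test[j]) ** 2 for j in range(m))
    var = statistics.mean((p[j] - main[j]) ** 2 for p in preds for j in range(m))
    return loss, bias, var

rng = random.Random(0)

def make(n):
    # Hypothetical data with a curved truth, so a straight line is biased.
    xs = [rng.uniform(-2, 2) for _ in range(n)]
    return xs, [x ** 2 + rng.gauss(0, 0.3) for x in xs]

X_train, y_train = make(100)
X_test, y_test = make(50)
loss, bias, var = bias_variance_mse(X_train, y_train, X_test, y_test, fit_line)
print(f"loss={loss:.3f} bias={bias:.3f} variance={var:.3f}")
```

Raising num_rounds does not change what is estimated. It reduces the Monte Carlo noise in the averages.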