Most Popular
1500 questions
8 votes, 1 answer
What are the input and output channels of a convolution in PyTorch?
From the PyTorch documentation for convolution layers, I saw that torch.nn.Conv1d requires users to pass the parameters "in_channels" and "out_channels". I know they refer to input channels and output channels, but I am not sure about what they…
LastK7
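The channel semantics can be illustrated without PyTorch. The sketch below (not from the question; a NumPy re-implementation for illustration, single sample, stride 1, no padding, groups=1) mirrors how torch.nn.Conv1d stores its weight as (out_channels, in_channels, kernel_size): each output channel is one filter that spans all input channels at once. PyTorch itself expects a batched input of shape (N, in_channels, length); the batch axis is dropped here. The helper name `conv1d_naive` and the data are hypothetical.

```python
import numpy as np

def conv1d_naive(x, weight, bias):
    """Valid 1-D cross-correlation with torch.nn.Conv1d's shape conventions.

    x:      (in_channels, length)
    weight: (out_channels, in_channels, kernel_size)
    bias:   (out_channels,)
    returns (out_channels, length - kernel_size + 1)
    """
    out_channels, in_channels, k = weight.shape
    assert x.shape[0] == in_channels, "input must have in_channels rows"
    out_len = x.shape[1] - k + 1
    y = np.empty((out_channels, out_len))
    for o in range(out_channels):          # one filter per output channel
        for t in range(out_len):
            # the filter covers ALL input channels inside the current window
            y[o, t] = np.sum(weight[o] * x[:, t:t + k]) + bias[o]
    return y

rng = np.random.default_rng(0)
x = rng.normal(size=(3, 10))    # hypothetical: 3 input channels, 10 time steps
w = rng.normal(size=(8, 3, 5))  # 8 output channels, kernel_size=5
b = np.zeros(8)
print(conv1d_naive(x, w, b).shape)  # (8, 6)
```

In other words, in_channels must match the number of feature rows per time step in the input, while out_channels is simply how many filters you choose to learn.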
8 votes, 4 answers
XGBoost Huge Dataset ~1TB
Can a gradient boosting solution like XGBoost or LightGBM be used for a huge amount of data? I have a CSV file of 820 GB containing 1 billion observations, and each observation has 650 data points.
Is this too much data for XGBoost? I have searched…
Medz Benz
8 votes, 2 answers
How to determine input shape in keras?
I am having difficulty finding my error while building deep learning models, but I typically have issues when setting the input shape of the input layer.
This is my model:
model = Sequential([
    Dense(32, activation='relu', input_shape=(1461,…
Josh Zwiebel
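A frequent source of this error is that Keras' input_shape describes a single sample and leaves out the batch dimension. A minimal sketch, assuming the training data is a NumPy array X of shape (n_samples, 1461); the arrays below are hypothetical placeholders:

```python
import numpy as np

# Hypothetical tabular data: 100 samples, each with 1461 features.
X = np.zeros((100, 1461))

# input_shape describes ONE sample, so the first (batch) axis is dropped.
input_shape = X.shape[1:]
print(input_shape)  # (1461,)

# The same rule for sequence data shaped (n_samples, timesteps, features)
# gives (timesteps, features).
X_seq = np.zeros((100, 20, 4))
seq_input_shape = X_seq.shape[1:]
print(seq_input_shape)  # (20, 4)
```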
8 votes, 3 answers
How to find out if two datasets are close to each other?
I have the following three datasets.
data_a=[0.21,0.24,0.36,0.56,0.67,0.72,0.74,0.83,0.84,0.87,0.91,0.94,0.97]
data_b=[0.13,0.21,0.27,0.34,0.36,0.45,0.49,0.65,0.66,0.90]
data_c=[0.14,0.18,0.19,0.33,0.45,0.47,0.55,0.75,0.78,0.82]
data_a is real data…
Kartikeya Sharma
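One common way to compare two samples of different sizes (not taken from the question) is the two-sample Kolmogorov-Smirnov statistic: the largest vertical gap between their empirical CDFs. The pure-Python sketch below computes it for the three lists above; scipy.stats.ks_2samp computes the same statistic together with a p-value. The helper names are hypothetical.

```python
from bisect import bisect_right

def ecdf(sorted_sample, x):
    """Fraction of points in the sorted sample that are <= x."""
    return bisect_right(sorted_sample, x) / len(sorted_sample)

def ks_statistic(a, b):
    """Two-sample KS statistic: 0 means identical ECDFs, 1 means disjoint."""
    a, b = sorted(a), sorted(b)
    # Both ECDFs are step functions that only jump at observed points,
    # so the largest gap is attained at one of those points.
    return max(abs(ecdf(a, x) - ecdf(b, x)) for x in a + b)

data_a = [0.21, 0.24, 0.36, 0.56, 0.67, 0.72, 0.74, 0.83, 0.84, 0.87, 0.91, 0.94, 0.97]
data_b = [0.13, 0.21, 0.27, 0.34, 0.36, 0.45, 0.49, 0.65, 0.66, 0.90]
data_c = [0.14, 0.18, 0.19, 0.33, 0.45, 0.47, 0.55, 0.75, 0.78, 0.82]

d_ab = ks_statistic(data_a, data_b)
d_ac = ks_statistic(data_a, data_c)
# A smaller D means the two empirical distributions are closer.
print(f"D(a, b) = {d_ab:.3f}, D(a, c) = {d_ac:.3f}")
```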
8 votes, 1 answer
What makes binary cross entropy a better choice for binary classification than other loss functions?
I'm reading this post, where I came across this quote: "Cross-entropy is the default loss function to use for binary classification problems."
But what makes it the default, and presumably the best, loss function for binary classification?
John Slaine
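Two properties are often cited in favor of binary cross-entropy (BCE): it is the negative log-likelihood of a Bernoulli model, and with a sigmoid output its gradient with respect to the logit is simply p - y, so a confidently wrong prediction still gets a strong learning signal. The sketch below contrasts BCE with squared error (MSE) on the same predictions; it is an illustration under these assumptions, not the linked post's reasoning.

```python
import math

def bce(y, p, eps=1e-12):
    """Binary cross-entropy for one example: -[y*log(p) + (1-y)*log(1-p)]."""
    p = min(max(p, eps), 1 - eps)  # clip to avoid log(0)
    return -(y * math.log(p) + (1 - y) * math.log(1 - p))

def mse(y, p):
    """Squared error for one example; bounded by 1 for p in [0, 1]."""
    return (y - p) ** 2

# True label is 1: BCE grows without bound as the prediction gets worse.
for p in (0.9, 0.5, 0.1, 0.01):
    print(f"p={p:<5} BCE={bce(1, p):.3f}  MSE={mse(1, p):.3f}")

def grad_wrt_logit(y, z):
    """Gradients of both losses w.r.t. the pre-sigmoid logit z."""
    p = 1 / (1 + math.exp(-z))
    g_bce = p - y                      # d/dz BCE(y, sigmoid(z))
    g_mse = 2 * (p - y) * p * (1 - p)  # d/dz (y - sigmoid(z))**2
    return g_bce, g_mse

# Confidently wrong (y=1, z=-8): the MSE gradient almost vanishes.
print(grad_wrt_logit(1, -8))
```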
8 votes, 3 answers
Why does logistic function use e rather than 2?
The sigmoid function can be used as an activation function in machine learning.
$$S(x) = \frac{1}{1+e^{-x}} = \frac{e^{x}}{e^{x}+1}.$$
If we substitute 2 for e,
def sigmoid2(z):
    return 1/(1+2**(-z))

x = np.arange(-9,9,dtype=float)
y…
JJJohn
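Changing the base only rescales the input: since $2^{-z} = e^{-z\ln 2}$, the base-2 version equals the natural sigmoid evaluated at $z \ln 2$, and a learned weight can absorb that constant. Base e is convenient because its derivative is simply $S(z)(1 - S(z))$, while base 2 carries an extra factor of ln 2. A quick numerical check of both claims (a sketch, not from the question):

```python
import math

def sigmoid(z):
    return 1 / (1 + math.exp(-z))

def sigmoid2(z):
    return 1 / (1 + 2 ** (-z))

# Base 2 is the natural sigmoid with its input scaled by ln 2.
for z in (-9.0, -1.0, 0.0, 2.5, 9.0):
    assert math.isclose(sigmoid2(z), sigmoid(z * math.log(2)))

# Central-difference derivatives at one point.
h, z = 1e-6, 0.7
d_e = (sigmoid(z + h) - sigmoid(z - h)) / (2 * h)
d_2 = (sigmoid2(z + h) - sigmoid2(z - h)) / (2 * h)
print(d_e, sigmoid(z) * (1 - sigmoid(z)))                      # base e: s * (1 - s)
print(d_2, math.log(2) * sigmoid2(z) * (1 - sigmoid2(z)))      # base 2: extra ln 2
```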
8 votes, 2 answers
How to Use SHAP Kernel Explainer with Pipeline Models?
I have a pandas DataFrame X. I would like to get prediction explanations for a particular model.
My model is given below:
pipeline = Pipeline(steps=[
    ('imputer', imputer_function()),
    ('classifier', RandomForestClassifier()
    …
Nayana Madhu
8 votes, 1 answer
What are the differences between SVC, NuSVC, and LinearSVC?
What are the differences between SVC, NuSVC, and LinearSVC?
Please shed some light.
Taylor
8 votes, 1 answer
How to extract features and classify alert emails coming from monitoring tools into proper category?
My company provides managed services to many clients. Our customers typically use the following monitoring tools to monitor their servers/webapps:
OpsView
Nagios
Pingdom
Custom shell scripts
Whenever any issue is found, an alert mail comes to…
Kartikeya Sinha
8 votes, 2 answers
Model for Differing Number of Rows per Observation
I'm looking to build a response model (click or no click) on marketing data in which a varying number of offers is displayed to each person. I don't want to model which offer they click, but whether they click any of the offers presented to them. My issue is how to deal…
Zachary
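Because the target here is per person ("clicked any offer"), one common approach (an assumption, not something stated in the question) is to aggregate the variable number of offer rows into one fixed-length row per person before modeling. The field names person_id, offer_type, and clicked, and the data, are hypothetical.

```python
from collections import defaultdict

# Hypothetical offer-level log: one row per (person, offer shown).
rows = [
    {"person_id": 1, "offer_type": "discount", "clicked": 0},
    {"person_id": 1, "offer_type": "bundle",   "clicked": 1},
    {"person_id": 2, "offer_type": "discount", "clicked": 0},
    {"person_id": 3, "offer_type": "bundle",   "clicked": 0},
    {"person_id": 3, "offer_type": "bundle",   "clicked": 0},
    {"person_id": 3, "offer_type": "freebie",  "clicked": 0},
]

offer_types = sorted({r["offer_type"] for r in rows})

def to_person_level(rows):
    """Collapse each person's offer rows into fixed-length features
    plus a single 'clicked any offer' label."""
    grouped = defaultdict(list)
    for r in rows:
        grouped[r["person_id"]].append(r)
    table = {}
    for pid, offers in grouped.items():
        features = {"n_offers": len(offers)}
        for t in offer_types:  # count of offers of each type that were shown
            features[f"n_{t}"] = sum(o["offer_type"] == t for o in offers)
        label = int(any(o["clicked"] for o in offers))
        table[pid] = (features, label)
    return table

for pid, (feats, label) in to_person_level(rows).items():
    print(pid, feats, label)
```

Any per-person model can then be trained on these rows; other aggregates (e.g. offer position or price statistics) could be added the same way.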
8 votes, 2 answers
What are the disadvantages of having a left skewed distribution?
I'm currently working on a classification problem, and I have a numerical column that is left-skewed. I've read many posts where people recommend applying a log or Box-Cox transformation to fix the left skewness.
So I was wondering…
Jeeth
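A log transform compresses large values, so applied directly to left-skewed data it tends to make the skew worse. A common workaround (not from the question) is to reflect the column first, x' = max(x) + 1 - x, and then take the log. A minimal sketch with a hypothetical column and a hand-written skewness helper:

```python
import math

def skewness(xs):
    """Biased sample skewness (Fisher-Pearson): negative means left-skewed."""
    n = len(xs)
    mean = sum(xs) / n
    m2 = sum((x - mean) ** 2 for x in xs) / n
    m3 = sum((x - mean) ** 3 for x in xs) / n
    return m3 / m2 ** 1.5

def reflect_log(xs):
    """Reflect so the long tail points right, then log-transform.
    Note: the reflection reverses the ordering of the values."""
    top = max(xs)
    return [math.log(top + 1 - x) for x in xs]

# Hypothetical left-skewed column: mostly high values, a few low stragglers.
col = [1, 5, 20, 60, 70, 75, 78, 80, 82, 83, 84, 85]
print(f"before: {skewness(col):.2f}, after: {skewness(reflect_log(col)):.2f}")
```

Tree-based classifiers are largely insensitive to monotone transformations like this, so whether it matters depends on the model.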
8 votes, 1 answer
Is it OK to try to find the best PCA k parameter as we do with other hyperparameters?
Principal Component Analysis (PCA) is used to reduce n-dimensional data to k-dimensional data to speed things up in machine learning. After PCA is applied, one can check how much of the variance of the original dataset remains in the resulting…
J. Doe
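The "how much variance remains" check described above can be computed directly from the singular values of the centered data. The NumPy sketch below (hypothetical data and helper names) returns the cumulative retained-variance ratio and the smallest k that keeps a chosen fraction; scikit-learn's PCA can also select k this way when n_components is a float between 0 and 1. If k is instead tuned by downstream validation score, PCA should be fit inside each training fold like any other preprocessing step.

```python
import numpy as np

def retained_variance(X):
    """Cumulative fraction of total variance kept by the first k
    principal components, for k = 1..n_features."""
    Xc = X - X.mean(axis=0)                  # PCA operates on centered data
    s = np.linalg.svd(Xc, compute_uv=False)  # singular values, descending
    var = s ** 2                             # proportional to component variances
    return np.cumsum(var) / var.sum()

def smallest_k(X, threshold=0.95):
    """Smallest k whose components keep at least `threshold` of the variance."""
    cum = retained_variance(X)
    return int(np.searchsorted(cum, threshold) + 1)

rng = np.random.default_rng(0)
# Hypothetical data: 5 features driven by 2 latent factors plus small noise.
latent = rng.normal(size=(500, 2))
X = latent @ rng.normal(size=(2, 5)) + 0.05 * rng.normal(size=(500, 5))
print(retained_variance(X).round(3), smallest_k(X))
```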
8 votes, 2 answers
ValueError: could not convert string to float: '���'
I have a (2M, 23) dimensional numpy array X. It has a dtype of
cappy0704
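The error means at least one cell holds text that is not a number; the '�' characters (U+FFFD) are often a sign that bytes were decoded with the wrong encoding when the data was read. A minimal sketch (hypothetical array and helper name) that converts cell by cell, turns unparseable cells into NaN, and reports where they are so they can be inspected; pandas.to_numeric(..., errors="coerce") offers similar coercion per column.

```python
import numpy as np

def to_float_or_nan(value):
    """Convert one cell to float; return NaN if it is not numeric."""
    try:
        return float(value)
    except (TypeError, ValueError):
        return np.nan

# Hypothetical object array mixing numbers, numeric strings and junk.
X = np.array([["1.5", "2"], ["\ufffd\ufffd\ufffd", 3.0]], dtype=object)
X_float = np.vectorize(to_float_or_nan, otypes=[float])(X)

bad = np.isnan(X_float)
print(X_float)
print("unparseable cells at:", list(zip(*np.nonzero(bad))))
```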
8 votes, 2 answers
In which cases shouldn't we drop the first level of categorical variables?
As a beginner in machine learning, I'm looking into the concept of one-hot encoding.
Unlike in statistics, where you always want to drop the first level to have k-1 dummies (as discussed here on SE), it seems that some models need to keep it and have k…
Dan Chaltiel
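The reason for dropping a level is that the k full dummy columns always sum to 1, which duplicates an intercept column (the "dummy variable trap") in an unregularized linear model. Regularized linear models and tree-based models do not have this problem, and keeping all k levels is common there. A pure-Python sketch of both encodings (hypothetical helper; pandas.get_dummies(drop_first=True) and scikit-learn's OneHotEncoder(drop="first") provide the same option):

```python
def one_hot(values, drop_first=False):
    """Encode a categorical column as 0/1 dummy columns.
    With drop_first=True the first (sorted) level becomes the implicit
    baseline, giving k-1 columns instead of k."""
    levels = sorted(set(values))
    if drop_first:
        levels = levels[1:]
    header = [f"is_{lvl}" for lvl in levels]
    rows = [[int(v == lvl) for lvl in levels] for v in values]
    return header, rows

colors = ["red", "green", "blue", "green"]
print(one_hot(colors))                   # k = 3 columns; every row sums to 1
print(one_hot(colors, drop_first=True))  # k-1 = 2 columns; "blue" is all zeros
```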
8 votes, 3 answers
Fuzzy name and nickname match
I have a dataset with the following structure:
full_name,nickname,match
Christian Douglas,Chris,1
Jhon Stevens,Charlie,0
David Jr Simpson,Junior,1
Anastasia Williams,Stacie,1
Lara Williams,Ana,0
John Williams,Willy,1
where each predictor row…
David Masip
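One simple baseline (an assumption, not from the question) is to turn each (full_name, nickname) pair into string-similarity features and train a classifier on the match column. The sketch uses the standard library's difflib to score the nickname against each token of the full name; the helper name and feature names are hypothetical.

```python
from difflib import SequenceMatcher

def name_features(full_name, nickname):
    """Hypothetical similarity features for a (full_name, nickname) pair."""
    nick = nickname.lower()
    tokens = full_name.lower().split()
    # best overall character-level similarity to any single name token
    best_ratio = max(SequenceMatcher(None, nick, t).ratio() for t in tokens)
    # longest shared substring with any token, relative to the nickname length
    best_block = max(
        SequenceMatcher(None, nick, t).find_longest_match(0, len(nick), 0, len(t)).size
        for t in tokens
    ) / len(nick)
    return {"best_ratio": round(best_ratio, 3), "best_block": round(best_block, 3)}

pairs = [("Christian Douglas", "Chris"), ("Jhon Stevens", "Charlie"),
         ("Anastasia Williams", "Stacie"), ("John Williams", "Willy")]
for full, nick in pairs:
    print(full, "/", nick, name_features(full, nick))
```

Nicknames that share few letters with the name (e.g. Bill for William) would need a lookup table of known nickname pairs in addition to string similarity.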