Most Popular

1500 questions
8 votes, 1 answer

What are the input and output channels of a convolution in PyTorch?

From the PyTorch documentation for convolutions, I saw that torch.nn.Conv1d requires users to pass the parameters "in_channels" and "out_channels". I know they refer to input channels and output channels, but I am not sure about what they…
LastK7
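For context on the question above, PyTorch documents Conv1d as mapping an input of shape (N, C_in, L) to an output of shape (N, C_out, L_out), with a weight tensor of shape (out_channels, in_channels, kernel_size) when groups=1. The pure-Python sketch below mirrors that bookkeeping under simplifying assumptions (one example with no batch axis, stride 1, no padding, no dilation); `conv1d` is a hypothetical helper, not the PyTorch API. It shows that each output channel has its own filter, and that every filter spans all input channels and sums over them.

```python
def conv1d(x, weight, bias=None):
    """Naive 1-D cross-correlation (what deep-learning 'convolution' layers compute).

    x:      in_channels lists, each of length L (one example, no batch axis)
    weight: out_channels filters; each filter holds in_channels kernels of length K
    bias:   optional list with one value per output channel
    Returns out_channels lists, each of length L - K + 1 (stride 1, no padding).
    """
    in_channels = len(x)
    length = len(x[0])
    k = len(weight[0][0])
    out = []
    for o, filt in enumerate(weight):
        # each filter needs exactly one kernel per input channel
        if len(filt) != in_channels:
            raise ValueError("filter has the wrong number of input channels")
        row = []
        for t in range(length - k + 1):
            # sum over every input channel c and every kernel tap j
            s = sum(filt[c][j] * x[c][t + j]
                    for c in range(in_channels) for j in range(k))
            if bias is not None:
                s += bias[o]
            row.append(s)
        out.append(row)
    return out

# Usage: 2 input channels of length 4 -> 3 output channels of length 3
x = [[1, 2, 3, 4],
     [0, 1, 0, 1]]
weight = [
    [[1, 0], [0, 0]],  # picks the first value of the window in channel 0
    [[0, 0], [1, 1]],  # sums the window in channel 1
    [[1, 1], [1, 1]],  # sums the window across both channels
]
y = conv1d(x, weight)
print(len(x), "->", len(y), "channels")
print(y)
```

Here two input channels go in and three come out, simply because `weight` holds three filters; in_channels is fixed by the data, while out_channels is a design choice.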
8 votes, 4 answers

XGBoost Huge Dataset ~1TB

Can a gradient boosting library like XGBoost or LightGBM be used on a huge amount of data? I have an 820 GB CSV file containing 1 billion observations, each with 650 data points. Is this too much data for XGBoost? I have searched…
Medz Benz
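A back-of-the-envelope check of the in-memory size, assuming the matrix is held densely as 4-byte (32-bit) floats with no compression or sparsity. Real memory use also depends on the library and its overhead, so treat the result as a floor for a dense in-RAM copy rather than an exact figure.

```python
observations = 1_000_000_000   # rows, from the question
features = 650                 # values per row, from the question
bytes_per_float32 = 4          # assumption: dense single-precision storage

n_values = observations * features
dense_bytes = n_values * bytes_per_float32
print(f"{n_values:.2e} values -> {dense_bytes / 1e12:.1f} TB as dense float32")
```

A dense copy of roughly 2.6 TB is well beyond the RAM of a typical single machine, which is why out-of-core, distributed, or subsampled training tends to come up for data of this size.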
8 votes, 2 answers

How to determine the input shape in Keras?

I am having difficulty finding my error when building deep learning models; my problems typically come from setting the input shape of the input layer. This is my model: model = Sequential([ Dense(32, activation='relu', input_shape=(1461,…
Josh Zwiebel
8 votes, 3 answers

How to find out if two datasets are close to each other?

I have the following three datasets.
data_a=[0.21,0.24,0.36,0.56,0.67,0.72,0.74,0.83,0.84,0.87,0.91,0.94,0.97]
data_b=[0.13,0.21,0.27,0.34,0.36,0.45,0.49,0.65,0.66,0.90]
data_c=[0.14,0.18,0.19,0.33,0.45,0.47,0.55,0.75,0.78,0.82]
data_a is real data…
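One standard way to quantify how close two samples are is the two-sample Kolmogorov-Smirnov statistic, the largest vertical gap between their empirical CDFs. This is a swapped-in technique, not necessarily what the answers recommend; the sketch below computes only the statistic, in pure Python, on the three lists from the question (SciPy's `scipy.stats.ks_2samp` also returns a p-value).

```python
from bisect import bisect_right

data_a = [0.21, 0.24, 0.36, 0.56, 0.67, 0.72, 0.74, 0.83, 0.84, 0.87, 0.91, 0.94, 0.97]
data_b = [0.13, 0.21, 0.27, 0.34, 0.36, 0.45, 0.49, 0.65, 0.66, 0.90]
data_c = [0.14, 0.18, 0.19, 0.33, 0.45, 0.47, 0.55, 0.75, 0.78, 0.82]

def ecdf(sorted_sample, x):
    """Fraction of the (sorted) sample that is <= x."""
    return bisect_right(sorted_sample, x) / len(sorted_sample)

def ks_statistic(sample1, sample2):
    """Two-sample Kolmogorov-Smirnov statistic: the largest gap between the two
    empirical CDFs. Both ECDFs are step functions that only jump at observed
    values, so checking the pooled observations is enough."""
    s1, s2 = sorted(sample1), sorted(sample2)
    return max(abs(ecdf(s1, v) - ecdf(s2, v)) for v in s1 + s2)

print(f"a vs b: {ks_statistic(data_a, data_b):.3f}")
print(f"a vs c: {ks_statistic(data_a, data_c):.3f}")
```

On these lists the statistic is about 0.59 for data_a vs data_b and about 0.47 for data_a vs data_c, so by this measure data_c is the closer of the two. With samples this small, a p-value is needed before reading much into the difference.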
8 votes, 1 answer

What makes binary cross entropy a better choice for binary classification than other loss functions?

I'm reading a post in which I came across this quote: "Cross-entropy is the default loss function to use for binary classification problems." What about it makes it the default, and presumably the best, loss function for binary classification?
John Slaine
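Binary cross-entropy is the negative log-likelihood of a Bernoulli model for the label, which is one reason it is the default. The sketch below (hypothetical helper names, pure Python) contrasts it with squared error on a sigmoid output in two ways: BCE grows without bound as a wrong prediction becomes more confident, while squared error is capped at 1; and the gradient of BCE with respect to the logit is simply p - y, whereas the squared-error gradient carries an extra p(1 - p) factor that vanishes when the sigmoid saturates.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def binary_cross_entropy(y, p, eps=1e-12):
    """Log loss for one example: label y in {0, 1}, predicted probability p."""
    p = min(max(p, eps), 1 - eps)  # clip so log() never sees 0
    return -(y * math.log(p) + (1 - y) * math.log(1 - p))

def squared_error(y, p):
    return (y - p) ** 2

def bce_grad_logit(y, z):
    # d/dz of BCE(y, sigmoid(z)) simplifies to sigmoid(z) - y
    return sigmoid(z) - y

def mse_grad_logit(y, z):
    # d/dz of (y - sigmoid(z))**2 keeps the sigmoid's slope p * (1 - p)
    p = sigmoid(z)
    return 2 * (p - y) * p * (1 - p)

# True label 1; predictions range from good to confidently wrong
for p in (0.9, 0.5, 0.01):
    print(f"p={p:<5} BCE={binary_cross_entropy(1, p):.3f}  squared error={squared_error(1, p):.3f}")

# Gradient w.r.t. the logit when the model is confidently wrong (z = -5, p ~ 0.007)
print(f"BCE grad={bce_grad_logit(1, -5.0):.4f}  squared-error grad={mse_grad_logit(1, -5.0):.4f}")
```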
8 votes, 3 answers

Why does the logistic function use e rather than 2?

The sigmoid function can be used as an activation function in machine learning:
$$S(x) = \frac{1}{1+e^{-x}} = \frac{e^{x}}{e^{x}+1}.$$
If e is substituted with 2: def sigmoid2(z): return 1/(1+2**(-z)) x = np.arange(-9,9,dtype=float) y…
JJJohn
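Changing the base only rescales the input: since 2^(-z) = e^(-z ln 2), the base-2 curve equals S(z ln 2), so a model with learnable weights can absorb the factor ln 2 ≈ 0.693. The check below is a plain-Python sketch of that identity; the function names are illustrative.

```python
import math

def sigmoid(z):
    """Standard logistic function with base e."""
    return 1.0 / (1.0 + math.exp(-z))

def sigmoid2(z):
    """The base-2 variant from the question."""
    return 1.0 / (1.0 + 2.0 ** (-z))

LN2 = math.log(2)

# 2**(-z) == e**(-z * ln 2), so sigmoid2(z) is sigmoid evaluated at z * ln 2
for z in (-9.0, -1.0, 0.0, 2.5, 9.0):
    assert math.isclose(sigmoid2(z), sigmoid(z * LN2))
print("sigmoid2(z) == sigmoid(z * ln 2) for all tested z")
```

Base e is then mostly a convention that keeps derivatives clean: S'(x) = S(x)(1 - S(x)), while the base-2 version picks up an extra factor of ln 2.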
8 votes, 2 answers

How to use the SHAP Kernel Explainer with Pipeline models?

I have a pandas DataFrame X. I would like to explain the predictions of a particular model. My model is given below: pipeline = Pipeline(steps= [ ('imputer', imputer_function()), ('classifier', RandomForestClassifier() …
8 votes, 1 answer

What are the differences between SVC, NuSVC, and LinearSVC?

What are the differences between SVC, NuSVC, and LinearSVC? Please shed some light.
Taylor
8 votes, 1 answer

How to extract features and classify alert emails coming from monitoring tools into the proper category?

My company provides managed services to a lot of clients. Our customers typically use the following tools to monitor their servers/web apps: OpsView, Nagios, Pingdom, and custom shell scripts. Whenever an issue is found, an alert mail comes to…
8 votes, 2 answers

Model for Differing Number of Rows per Observation

I am looking to build a response model (click or no click) on marketing data in which a varying number of offers is displayed to each person. I don't want to model which offer they click, but whether they click any of the offers presented to them. My issue is how to deal…
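One common way to handle a varying number of offers (an assumption here, not necessarily what the answers propose) is to collapse the data to one row per person: the target becomes "clicked any offer", and the offers shown become fixed-width features such as a count, or a set of offer IDs that can later be multi-hot encoded. The impression log and field names below are hypothetical.

```python
from collections import defaultdict

# Hypothetical impression log: one row per (person, offer shown) with a click flag
impressions = [
    ("p1", "offer_a", 0),
    ("p1", "offer_b", 1),
    ("p1", "offer_c", 0),
    ("p2", "offer_a", 0),
    ("p3", "offer_b", 0),
    ("p3", "offer_d", 0),
]

def one_row_per_person(rows):
    """Collapse a variable number of offer rows into one fixed-width record per person."""
    grouped = defaultdict(list)
    for person, offer, clicked in rows:
        grouped[person].append((offer, clicked))
    table = {}
    for person, offers in grouped.items():
        table[person] = {
            "n_offers_shown": len(offers),
            "offers_shown": sorted({offer for offer, _ in offers}),
            # target: did the person click ANY of the offers shown?
            "clicked_any": int(any(clicked for _, clicked in offers)),
        }
    return table

for person, record in one_row_per_person(impressions).items():
    print(person, record)
```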
8 votes, 2 answers

What are the disadvantages of having a left skewed distribution?

I'm currently working on a classification problem and I have a numerical column that is left skewed. I've read many posts where people recommend a log or Box-Cox transformation to fix the left skewness. So I was wondering…
Jeeth
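Log and Box-Cox transforms compress the right tail, so applied directly to a left-skewed column they tend to lengthen the left tail rather than fix it. A common workaround is to reflect the values first (c - x, with c just above the maximum), take the log, and negate so the original ordering is kept. The column below is hypothetical, and `skewness` uses the plain moment formula without bias correction.

```python
import math

def skewness(values):
    """Sample skewness g1 = m3 / m2**1.5 (population moments, no bias correction)."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((v - mean) ** 2 for v in values) / n
    m3 = sum((v - mean) ** 3 for v in values) / n
    return m3 / m2 ** 1.5

def reflected_log(values):
    """Reflect around max + 1, take the log, then negate so the original order is kept."""
    c = max(values) + 1
    return [-math.log(c - v) for v in values]

# Hypothetical left-skewed column (long tail towards small values)
data = [1, 5, 7, 8, 8, 9, 9, 9, 10, 10]

print(f"raw:           {skewness(data):+.2f}")
print(f"plain log:     {skewness([math.log(v) for v in data]):+.2f}")  # left tail gets longer
print(f"reflected log: {skewness(reflected_log(data)):+.2f}")
```

Whether skewness is a disadvantage at all depends on the model: tree-based models split on thresholds, so monotone transforms like this one do not change what they can learn, while linear and distance-based models are more sensitive to it.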
8 votes, 1 answer

Is it OK to try to find the best PCA k parameter as we do with other hyperparameters?

Principal Component Analysis (PCA) is used to reduce n-dimensional data to k-dimensional data to speed things up in machine learning. After PCA is applied, one can check how much of the variance of the original dataset remains in the resulting…
J. Doe
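A common rule for picking k is the smallest number of components whose cumulative explained variance reaches a threshold. The sketch assumes the per-component variances (eigenvalues of the covariance matrix) are already available; the values below are hypothetical. Treating k as a hyperparameter and scoring the downstream model under cross-validation is the alternative the question asks about; in scikit-learn that is typically a Pipeline with a PCA step whose `n_components` is searched.

```python
def smallest_k(explained_variance, threshold=0.95):
    """Return the smallest k whose top-k components keep at least `threshold`
    of the total variance. `explained_variance` holds per-component variances
    (eigenvalues of the covariance matrix), in any order."""
    ordered = sorted(explained_variance, reverse=True)
    total = sum(ordered)
    kept = 0.0
    for k, var in enumerate(ordered, start=1):
        kept += var
        if kept / total >= threshold:
            return k
    return len(ordered)

# Hypothetical eigenvalues from a PCA fit
eigenvalues = [4.0, 2.5, 1.5, 1.0, 0.6, 0.3, 0.1]
print(smallest_k(eigenvalues, 0.85))
```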
8 votes, 2 answers

ValueError: could not convert string to float: '���'

I have a NumPy array X of shape (2M, 23). It has a dtype of
cappy0704
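The '���' in the message is most likely the Unicode replacement character (U+FFFD), which usually means some text was decoded with the wrong encoding. A first step is to locate the offending cells; the sketch below does that in pure Python on hypothetical string rows, since the question is cut off before saying how X was built.

```python
def find_bad_cells(rows):
    """Yield (row, col, value) for every cell that float() cannot parse."""
    for i, row in enumerate(rows):
        for j, value in enumerate(row):
            try:
                float(value)
            except (TypeError, ValueError):
                yield i, j, value

# Hypothetical rows; '\ufffd' is the replacement character that typically
# shows up as '�' after decoding text with the wrong encoding
rows = [
    ["1.5", "2", "3e-2"],
    ["4", "\ufffd\ufffd\ufffd", "6"],
    ["", "7.0", "nan"],
]
print(list(find_bad_cells(rows)))
```

Note that "nan" parses as a valid float, while empty strings do not, so both the encoding problem and missing values can surface as this same error.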
8 votes, 2 answers

In which cases shouldn't we drop the first level of categorical variables?

As a beginner in machine learning, I'm looking into one-hot encoding. Unlike in statistics, where you always want to drop the first level to have k-1 dummies (as discussed here on SE), it seems that some models need to keep it and have k…
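The sketch below (a hypothetical helper, pure Python) produces both encodings. In the full k-column version every row sums to 1, so the dummies are perfectly collinear with an intercept column (the "dummy variable trap"); that matters for unregularized linear models, whereas regularized and tree-based models are commonly fed all k columns. pandas `get_dummies(drop_first=True)` and scikit-learn's `OneHotEncoder(drop='first')` expose the same choice.

```python
def one_hot(values, drop_first=False):
    """Encode a categorical column as dummy columns.
    With drop_first=True the first (sorted) level becomes the all-zeros
    baseline, giving k-1 columns instead of k."""
    levels = sorted(set(values))
    if drop_first:
        levels = levels[1:]
    header = [f"is_{level}" for level in levels]
    matrix = [[int(v == level) for level in levels] for v in values]
    return header, matrix

colors = ["red", "green", "blue", "green"]
print(one_hot(colors))
print(one_hot(colors, drop_first=True))
```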
8 votes, 3 answers

Fuzzy name and nickname match

I have a dataset with the following structure:
full_name,nickname,match
Christian Douglas,Chris,1
Jhon Stevens,Charlie,0
David Jr Simpson,Junior,1
Anastasia Williams,Stacie,1
Lara Williams,Ana,0
John Williams,Willy,1
where each predictor row…
David Masip
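A starting point, assuming simple string features are fed to a classifier trained on the `match` label, is to compare the nickname against each token of the full name. The sketch below uses only the standard library's `difflib.SequenceMatcher` for a similarity ratio plus a hypothetical prefix feature, on the rows from the question; both helper names are illustrative.

```python
from difflib import SequenceMatcher

rows = [
    ("Christian Douglas", "Chris", 1),
    ("Jhon Stevens", "Charlie", 0),
    ("David Jr Simpson", "Junior", 1),
    ("Anastasia Williams", "Stacie", 1),
    ("Lara Williams", "Ana", 0),
    ("John Williams", "Willy", 1),
]

def best_token_similarity(full_name, nickname):
    """Highest SequenceMatcher ratio between the nickname and any single name token."""
    nick = nickname.lower()
    return max(SequenceMatcher(None, token.lower(), nick).ratio()
               for token in full_name.split())

def shares_prefix(full_name, nickname, n=3):
    """1 if the nickname starts with the first n letters of some name token."""
    nick = nickname.lower()
    return int(any(nick.startswith(token.lower()[:n]) for token in full_name.split()))

for full_name, nickname, match in rows:
    print(f"{full_name:20} {nickname:8} "
          f"sim={best_token_similarity(full_name, nickname):.2f} "
          f"prefix={shares_prefix(full_name, nickname)} label={match}")
```

Features like these pick up pairs such as Christian/Chris and Williams/Willy, but pairs like Anastasia/Stacie need more than character overlap, for example a nickname dictionary, which the question does not provide.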