Highest Voted Questions - Data Science Stack Exchange

8

votes

1 answer

What are the input and output channels of a convolution in PyTorch?

From the PyTorch documentation for convolution, I saw that the function torch.nn.Conv1d requires users to pass the parameters "in_channels" and "out_channels". I know they refer to input channels and output channels, but I am not sure about what they…

deep-learning pytorch

asked Jun 18 '19 at 09:46

LastK7

101
1
1
3

8

votes

4 answers

XGBoost Huge Dataset ~1TB

Can a gradient boosting solution like XGBoost or LightGBM be used for a huge amount of data? I have a CSV file of 820GB containing 1 billion observations, and each observation has 650 data points. Is this too much data for XGBoost? I have searched…

bigdata data xgboost

asked Jun 15 '19 at 08:05

Medz Benz

81
1
2

8

votes

2 answers

How to determine input shape in keras?

I am having difficulty finding where my error is while building deep learning models, but I typically have issues when setting the input layer input shape. This is my model: model = Sequential([ Dense(32, activation='relu', input_shape=(1461,…

python deep-learning keras numpy

asked Jun 12 '19 at 03:21

Josh Zwiebel

183
1
1
6

8

votes

3 answers

How to find out if two datasets are close to each other?

I have the following three datasets. data_a=[0.21,0.24,0.36,0.56,0.67,0.72,0.74,0.83,0.84,0.87,0.91,0.94,0.97] data_b=[0.13,0.21,0.27,0.34,0.36,0.45,0.49,0.65,0.66,0.90] data_c=[0.14,0.18,0.19,0.33,0.45,0.47,0.55,0.75,0.78,0.82] data_a is real data…

python statistics visualization simulation

asked Jun 09 '19 at 05:10

Kartikeya Sharma

167
1
9

8

votes

1 answer

What makes binary cross entropy a better choice for binary classification than other loss functions?

I'm reading this post where I came across this quote "Cross-entropy is the default loss function to use for binary classification problems." But what about it makes it the default and presumably best loss function for binary classification?

machine-learning classification loss-function

asked Jun 07 '19 at 15:41

John Slaine

81
1
2

8

votes

3 answers

Why does logistic function use e rather than 2?

The sigmoid function can be used as an activation function in machine learning. $${\displaystyle S(x)={\frac {1}{1+e^{-x}}}={\frac {e^{x}}{e^{x}+1}}.}$$ If we substitute e with 2, def sigmoid2(z): return 1/(1+2**(-z)) x = np.arange(-9,9,dtype=float) y…

machine-learning deep-learning

asked Jun 06 '19 at 07:55

JJJohn

614
10
22

8

votes

2 answers

How to Use SHAP Kernel Explainer with Pipeline models?

I have a pandas DataFrame X. I would like to find the prediction explanation of a particular model. My model is given below: pipeline = Pipeline(steps= [ ('imputer', imputer_function()), ('classifier', RandomForestClassifier() …

machine-learning machine-learning-model data-science-model ipython

asked May 23 '19 at 14:57

Nayana Madhu

406
1
3
8

8

votes

1 answer

What are the differences between SVC, NuSVC, and LinearSVC?

What are the differences between SVC, NuSVC, and LinearSVC? Please shed some light.

classification svm

asked May 12 '19 at 06:17

Taylor

93
1
2
5

8

votes

1 answer

How to extract features and classify alert emails coming from monitoring tools into proper category?

My company provides managed services to a lot of its clients. Our customers typically use the following monitoring tools to monitor their servers/webapps: OpsView, Nagios, Pingdom, and custom shell scripts. Whenever any issue is found, an alert mail comes to…

machine-learning classification clustering feature-extraction

asked Jan 27 '15 at 10:31

Kartikeya Sinha

181
3

8

votes

2 answers

Model for Differing Number of Rows per Observation

I am looking to build a response model (click or no click) on marketing data in which a varying number of offers is displayed to each person. I don't want to model which offer they click, but whether they click any of the offers presented to them. My issue is how to deal…

predictive-modeling feature-selection feature-engineering

asked Apr 17 '19 at 16:47

Zachary

181
1

8

votes

2 answers

What are the disadvantages of having a left skewed distribution?

I'm currently working on a classification problem and I have a numerical column which is left skewed. I've read many posts where people recommend a log transformation or Box-Cox transformation to fix the left skewness. So I was wondering…

machine-learning python

asked Apr 05 '19 at 19:36

Jeeth

911
2
10
18

8

votes

1 answer

Is it OK to try to find the best PCA k parameter as we do with other hyperparameters?

Principal Component Analysis (PCA) is used to reduce n-dimensional data to k-dimensional data to speed things up in machine learning. After PCA is applied, one can check how much of the variance of the original dataset remains in the resulting…

machine-learning pca hyperparameter

asked Mar 27 '19 at 18:58

J. Doe

81
1
2

8

votes

2 answers

ValueError: could not convert string to float: '��'

I have a (2M, 23) dimensional numpy array X. It has a dtype of

python dataframe csv data-formats

asked Mar 26 '19 at 17:18

cappy0704

231
1
3
7

8

votes

2 answers

In which cases shouldn't we drop the first level of categorical variables?

As a beginner in machine learning, I'm looking into the one-hot encoding concept. Unlike in statistics, where you always want to drop the first level to have k-1 dummies (as discussed here on SE), it seems that some models need to keep it and have k…

machine-learning algorithms encoding dummy-variables

asked Mar 19 '19 at 19:55

Dan Chaltiel

331
2
10

8

votes

3 answers

Fuzzy name and nickname match

I have a dataset with the following structure: full_name,nickname,match Christian Douglas,Chris,1, Jhon Stevens,Charlie,0, David Jr Simpson,Junior,1 Anastasia Williams,Stacie,1 Lara Williams,Ana,0 John Williams,Willy,1 where each predictor row…

deep-learning nlp

asked Mar 19 '19 at 13:36

David Masip

5,981
2
23
61
