Highest Voted 'features' Questions - Data Science Stack Exchange

13 votes, 2 answers

SHAP value analysis gives different feature importance on train and test set

Should SHAP value analysis be done on the train or the test set? What does it mean if the feature importance based on mean |SHAP value| differs between the train and test sets of my LightGBM model? I intend to use SHAP analysis to identify how each…

asked Oct 07 '19 at 19:10

pbk

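The importance measure named in the question, mean |SHAP value| per feature, is simple to compute once SHAP values exist for each set, and comparing the resulting rankings makes a train/test discrepancy concrete. This is a minimal sketch. It assumes the SHAP values have already been computed (for example with a tree explainer on the LightGBM model) and are passed as plain nested lists of shape (n_samples, n_features) for a single-output model. The helper names, feature names, and numbers are hypothetical.

```python
# Hypothetical helpers: rank features by mean |SHAP value|.
# Assumes `shap_values` is a list of rows (one per sample), each row a list of
# per-feature SHAP values for a single-output model.

def mean_abs_shap(shap_values):
    """Mean absolute SHAP value for each feature (column)."""
    n_rows = len(shap_values)
    n_features = len(shap_values[0])
    return [sum(abs(row[j]) for row in shap_values) / n_rows for j in range(n_features)]


def importance_ranking(shap_values, feature_names):
    """Feature names sorted from most to least important by mean |SHAP|."""
    scores = mean_abs_shap(shap_values)
    ranked = sorted(zip(scores, feature_names), key=lambda pair: pair[0], reverse=True)
    return [name for _, name in ranked]


# Toy, made-up SHAP values for three features on two small sets.
features = ["a", "b", "c"]
train_shap = [[0.5, -0.1, 0.2], [-0.7, 0.1, -0.2]]
test_shap = [[0.1, 0.4, 0.2], [-0.1, -0.6, 0.2]]

train_rank = importance_ranking(train_shap, features)  # ['a', 'c', 'b']
test_rank = importance_ranking(test_shap, features)    # ['b', 'c', 'a']
print(train_rank, test_rank)
```

The mean is taken over whichever rows are passed in, so the same model can produce different rankings on two sets whose feature values are distributed differently. The sketch only measures that difference. It does not say which set is the right one to explain.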

10 votes, 2 answers

How to get feature importance from a keras deep learning model?

In the case of scikit-learn models, we can get feature importance from the relevant attributes of the model. I've been working on an RNN that uses LSTMs for text embedding. Is there any way to get the feature importance of various features from the…

deep-learning keras lstm feature-selection features

asked Feb 14 '20 at 15:27

soham_dhole


5 votes, 2 answers

How to handle similarity search on vectors with mixed data types?

I think many beginners run into this question, and I could not find a decent generic guide for it. My issue is this: I want to evaluate the similarity of vectors that have features of mixed data types, such as numerical values, text, ordinal, GPS…

feature-engineering similarity feature-scaling features tools

asked Apr 14 '23 at 09:57

Chapo


4 votes, 2 answers

Similarity Measure between two feature vectors

I have a face identification system with the following details: a VGG16 model for feature extraction and a 512-dimensional (normalized) feature vector. I need to calculate a similarity measure between two feature vectors. So far I have tried as difference…

keras similarity distance features

asked Jul 24 '20 at 06:03

Elbek

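For feature vectors like the ones described (normalized 512-dimensional VGG16 outputs), two common measures are cosine similarity and Euclidean distance. Below is a minimal pure-Python sketch that uses hypothetical 2-dimensional toy vectors in place of real embeddings. The same functions work unchanged on lists of length 512. For unit-length vectors the two measures are linked by ||u - v||^2 = 2 - 2 cos(u, v), so they order candidate pairs identically.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between u and v (1.0 means same direction)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def euclidean_distance(u, v):
    """Straight-line (L2) distance between u and v."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


# Toy unit-length vectors standing in for normalized 512-d embeddings.
u = [0.6, 0.8]
v = [0.8, 0.6]

cos = cosine_similarity(u, v)  # approximately 0.96
dist = euclidean_distance(u, v)
# For unit vectors, dist**2 equals 2 - 2 * cos up to floating-point error.
print(cos, dist ** 2, 2 - 2 * cos)
```

Neither measure says where to draw the line between "same person" and "different person". That threshold would have to be chosen on held-out pairs.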

4 votes, 2 answers

What is the difference between handcrafted and learned features

I am having difficulty understanding what the differences are between handcrafted and learned features. Is it just the case that the handcrafted features are the input variables, and that the learned features would refer to the output variable? Or…

features

asked Jun 24 '19 at 13:06

GileBrt


3 votes, 0 answers

Non-Gaussian-like distributions - Classifier of source data fails on target data

I need help with a classification problem (classes are represented by the numbers 0, 1, and 2). All features are extracted from time-series data whose fundamental shape is sinusoidal. I have a source dataset with features which do not follow a Gaussian…

classification statistics feature-selection distribution features

asked Dec 30 '20 at 00:31

deniz


3 votes, 3 answers

How to insert two features in a model when a feature only applies to a certain group in the model

I'm building a machine learning model in Python to predict soccer player values. Consider the following feature columns of the dataframe: [features] --------------------------------- position | goals | goals_conceded --------…

machine-learning machine-learning-model prediction features

asked Aug 16 '20 at 04:45

Caldass_


3 votes, 2 answers

Mathematically prove why sparsity leads to model overfitting

Regarding the Stack Overflow answer here: https://stackoverflow.com/a/59566478/9130959, I can't quite see why the logic holds. When the number of features increases, the hypothesis space expands and the data become sparse, so the model overfits easily. Is there a…

overfitting features

asked Jun 20 '20 at 12:55

Wong


3 votes, 0 answers

NN training with repetitive features

I also posted this question on ai.stackexchange, but it didn't get any answers, so I thought I could try here. Here is a copy: Let's say you are training a NN in an RL setting where the state (i.e. features/input data) does not change in every…

neural-network deep-learning reinforcement-learning training features

asked May 19 '20 at 19:56

mkanakis


2 votes, 1 answer

Training & Test feature shape is different from number of columns in dataset

I am building a Sequential neural network for regression with 3 dense layers, to be trained on a simple dataset. But before I even reach the part of the code that runs the model, I get a different shape for my features than the number of columns in…

keras dataset data-cleaning feature-engineering features

asked Jun 24 '21 at 19:32

Victor Melvin


2 votes, 0 answers

How do SHAP values explain the contribution of features for outlier events?

I'm trying to understand and experiment with how SHAP values can explain the behaviour of each outlier event (row) and how this relates to shap.force_plot(). I have already created a simple synthetic dataset with 7 outliers. I didn't get how 4.85…

python features explainable-ai shap isolation-forest

asked Feb 17 '21 at 18:36

Mario

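A force plot rests on SHAP's additivity property. For a single row, the explainer's base (expected) value plus that row's SHAP values sums to the model output for that row, in whatever output space the explainer uses. The sketch below only checks that bookkeeping with made-up numbers. The base value, feature names, and SHAP values are hypothetical and do not correspond to the 4.85 in the question.

```python
import math

# Hypothetical explanation of one row: a base value plus per-feature contributions.
base_value = 1.0
row_shap = {"feature_1": 0.5, "feature_2": -0.25, "feature_3": 0.75}

# Additivity: base value + sum of the row's SHAP values = model output for the row.
model_output = base_value + sum(row_shap.values())

# A force plot shows features pushing the output up (positive SHAP)
# or down (negative SHAP) relative to the base value.
pushes_up = sorted(name for name, value in row_shap.items() if value > 0)
pushes_down = sorted(name for name, value in row_shap.items() if value < 0)

print(model_output)            # 2.0
print(pushes_up, pushes_down)  # ['feature_1', 'feature_3'] ['feature_2']
```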

2 votes, 1 answer

Multi-Feature One-Hot-Encoder with varying number of feature instances

Let's assume we have data instances like this: [ [15, 20, ("banana","apple","cucumber"), ...], [91, 12, ("orange","banana"), ...], ... ] I am wondering how I can encode the third element of these datapoints. For multiple features values…

scikit-learn encoding features

asked Jan 29 '21 at 11:54

crazyvalues

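One way to encode a column whose values are variable-length collections, like the third element in the question, is a multi-hot vector: one 0/1 indicator per category ever seen. This is a minimal pure-Python sketch with hypothetical helper names. The toy rows copy the question's example but drop its trailing "..." columns.

```python
# Hypothetical helpers: multi-hot ("multi-label one-hot") encoding of a column
# whose values are variable-length collections of categories.

def fit_vocabulary(rows, column):
    """Sorted list of every category seen in `column` across all rows."""
    return sorted({item for row in rows for item in row[column]})


def multi_hot(rows, column, vocabulary):
    """Replace `column` in each row with one 0/1 indicator per vocabulary entry."""
    index = {item: i for i, item in enumerate(vocabulary)}
    encoded = []
    for row in rows:
        indicators = [0] * len(vocabulary)
        for item in row[column]:
            if item in index:  # categories missing from the vocabulary are ignored
                indicators[index[item]] = 1
        encoded.append(list(row[:column]) + indicators + list(row[column + 1:]))
    return encoded


rows = [
    [15, 20, ("banana", "apple", "cucumber")],
    [91, 12, ("orange", "banana")],
]
vocab = fit_vocabulary(rows, column=2)  # ['apple', 'banana', 'cucumber', 'orange']
encoded = multi_hot(rows, column=2, vocabulary=vocab)
print(encoded)  # [[15, 20, 1, 1, 1, 0], [91, 12, 0, 1, 0, 1]]
```

In scikit-learn, `MultiLabelBinarizer` (in `sklearn.preprocessing`) performs this kind of encoding on an iterable of label sets. The version above just makes the mechanics explicit. Fitting the vocabulary on training data only and reusing it for new data keeps the column layout stable.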

2 votes, 1 answer

How to handle a feature vector that could be variable length?

I would like to train a machine learning model with several input features X[] and one output Y. For example, every sample has a data frame like this: X[0], X[1], X[2], X[3], X[4], Y. Let's say that for one sample the following data is only one…

feature-engineering feature-construction features

asked Jul 13 '20 at 10:00

Crazy9

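The excerpt is cut off, so this assumes the issue is that some samples provide fewer of the X[0] to X[4] values than others. One common workaround is to pad every sample to a fixed length and keep a 0/1 mask recording which positions were actually observed, so a model can receive both. Here is a minimal sketch with a hypothetical helper and toy values.

```python
def pad_with_mask(samples, length, fill=0.0):
    """Pad each sample to `length` with `fill`; return (padded, masks).

    masks[i][j] is 1 where samples[i] had a real value and 0 where it was padded.
    """
    padded, masks = [], []
    for sample in samples:
        if len(sample) > length:
            raise ValueError(f"sample has {len(sample)} values, expected at most {length}")
        missing = length - len(sample)
        padded.append(list(sample) + [fill] * missing)
        masks.append([1] * len(sample) + [0] * missing)
    return padded, masks


# Toy data: one sample with all five inputs, one with only a single input.
samples = [[1.0, 2.0, 3.0, 4.0, 5.0], [7.0]]
padded, masks = pad_with_mask(samples, length=5)
print(padded)  # [[1.0, 2.0, 3.0, 4.0, 5.0], [7.0, 0.0, 0.0, 0.0, 0.0]]
print(masks)   # [[1, 1, 1, 1, 1], [1, 0, 0, 0, 0]]
```

The fill value 0.0 is arbitrary. The mask is what lets a model tell a padded zero from a measured zero.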

2 votes, 1 answer

Getting the positive impacting features using SHAP

I'm attempting to use SHAP to automatically extract feature names that have a positive impact on my regression models. On inspection of the code I see that the bar plot, for example, determines these by taking the mean absolute SHAP values for a…

features shap

asked Apr 29 '20 at 09:56

amateurjustin

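As the question notes, the bar plot ranks features by mean absolute SHAP value, which discards the sign. Suppose "positive impact" means "on average pushes the prediction above the base value". That is one possible definition, and an assumption here. Under it, the sign can be recovered by averaging the signed SHAP values instead. This is a minimal sketch with hypothetical helper names and made-up values.

```python
# Hypothetical helpers operating on a SHAP matrix given as a list of rows
# (one per sample), each row a list of per-feature SHAP values.

def mean_abs_shap(shap_values):
    """What the bar plot ranks by: mean |SHAP| per feature (sign discarded)."""
    n = len(shap_values)
    return [sum(abs(row[j]) for row in shap_values) / n for j in range(len(shap_values[0]))]


def mean_signed_shap(shap_values):
    """Mean SHAP per feature with the sign kept."""
    n = len(shap_values)
    return [sum(row[j] for row in shap_values) / n for j in range(len(shap_values[0]))]


def positive_features(shap_values, feature_names):
    """Names of features whose mean signed SHAP value is above zero."""
    means = mean_signed_shap(shap_values)
    return [name for name, m in zip(feature_names, means) if m > 0]


names = ["x1", "x2", "x3"]
shap_matrix = [[0.4, -0.3, 0.1], [0.2, -0.1, -0.3]]

print(mean_abs_shap(shap_matrix))             # every feature has non-zero importance
print(positive_features(shap_matrix, names))  # ['x1']
```

A feature can have a sizeable mean |SHAP| while its signed mean is negative, as x3 does here. That is why the bar plot alone cannot separate positive from negative impact.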

2 votes, 3 answers

If a categorical feature only occurs a few times in a data set, should I drop it?

I have a dataset of mostly categorical variables. After one-hot encoding them, some of the features occur less than 3% of the time. For instance, the Tech-support feature only occurs 928 times in a dataset with 32561 samples, i.e. it only occurs 2.9%…

logistic-regression svm features one-hot-encoding

asked Feb 07 '20 at 20:08

dawndance

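The 2.9% in the question is 928 / 32561 ≈ 0.0285. Before deciding whether to drop anything, it can help to list every category whose share falls below a chosen threshold (3%, as in the question). Below is a minimal sketch using only the standard library. The helper names and the filler category "Other" are hypothetical.

```python
from collections import Counter


def category_shares(values):
    """Fraction of samples taking each category value."""
    counts = Counter(values)
    n = len(values)
    return {category: count / n for category, count in counts.items()}


def rare_categories(values, threshold=0.03):
    """Categories whose share of samples is below `threshold`, sorted by name."""
    return sorted(c for c, share in category_shares(values).items() if share < threshold)


# Toy column reproducing the counts from the question: 928 of 32561 samples.
column = ["Tech-support"] * 928 + ["Other"] * (32561 - 928)
shares = category_shares(column)
print(round(shares["Tech-support"] * 100, 1))  # 2.9
print(rare_categories(column))                 # ['Tech-support']
```

The helper only flags rare categories. Whether to drop, merge, or keep them is the question being asked, and the counts alone do not settle it.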
