Questions tagged [features]
97 questions
13
votes
2 answers
SHAP value analysis gives different feature importance on train and test set
Should SHAP value analysis be done on the train or test set?
What does it mean if the feature importance based on mean |SHAP value| differs between the train and test sets of my LightGBM model?
I intend to use SHAP analysis to identify how each…
pbk
- 133
- 1
- 5
10
votes
2 answers
How to get feature importance from a Keras deep learning model?
In the case of scikit-learn models, we can get feature importance using the relevant attributes of the model.
I've been working on an RNN, using LSTMs for text embedding.
Is there any way to get feature importance of various features from the…
soham_dhole
- 140
- 1
- 1
- 8
5
votes
2 answers
How to handle similarity search on mixed data types vectors?
I think this question is one that many beginners run into and I could not find a decent generic guide for it.
My issue is the following. I want to evaluate similarity of vectors which have mixed data type features.
Numerical values
Text
Ordinal
GPS…
Chapo
- 53
- 3
4
votes
2 answers
Similarity Measure between two feature vectors
I have a face identification system with the following details:
VGG16 model for feature extraction
512 dimensional feature vector (normalized)
I need to calculate a similarity measure between two feature vectors. So far I have tried as difference…
Elbek
- 143
- 1
- 7
4
votes
2 answers
What is the difference between handcrafted and learned features
I am having difficulty understanding what the differences are between handcrafted and learned features.
Is it just the case that the handcrafted features are the input variables, and that the learned features would refer to the output variable? Or…
GileBrt
- 236
- 1
- 4
- 12
3
votes
0 answers
Non-Gaussian like distributions - Classifier of source data fails on target data
I am asking for help with a classification problem (the classes are represented by the numbers 0, 1 and 2). All features are extracted from time series data (the fundamental is sinusoidal).
I have a source dataset with features, which do not follow a Gaussian…
deniz
- 41
- 1
3
votes
3 answers
How to insert two features in a model when a feature only applies to a certain group in the model
I'm building a machine learning model in Python to predict soccer player values. Consider the following feature columns of the dataframe:
[features]
---------------------------------
position | goals | goals_conceded
--------…
Caldass_
- 147
- 1
- 7
3
votes
2 answers
Mathematically prove why sparsity leads to model overfitting
With respect to the Stack Overflow post here: https://stackoverflow.com/a/59566478/9130959
I can't quite see why the logic holds: when the number of features increases, the hypothesis space expands, the data becomes sparse, and the model therefore overfits easily. Is there a…
Wong
- 103
- 4
3
votes
0 answers
NN training with repetitive features
I also posted the question on ai.stackexchange, but it didn't get any answers, so I thought I could try here.
Here is a copy-paste:
Let's say you are training a NN in a RL setting where the state (i.e. features/input data) does not change in every…
mkanakis
- 131
- 2
2
votes
1 answer
Training & Test feature shape is different from number of columns in dataset
I am making a Sequential Neural Network for regression with 3 dense layers which will be trained on a simple dataset. But before I even get to that part of the code to execute the model I am getting a different shape of my features than columns in…
Victor Melvin
- 23
- 2
2
votes
0 answers
How do SHAP values explain the contribution of features for outlier events?
I'm trying to understand and experiment with how SHAP values can explain the behaviour of each outlier event (row) and how this relates to shap.force_plot(). I have already created a simple synthetic dataset with 7 outliers.
I didn't get how 4.85…
Mario
- 335
- 5
- 18
2
votes
1 answer
Multi-Feature One-Hot-Encoder with varying amount of feature instances
Let's assume we have data instances like this:
[
[15, 20, ("banana","apple","cucumber"), ...],
[91, 12, ("orange","banana"), ...],
...
]
I am wondering how I can encode the third element of these data points. For multiple feature values…
crazyvalues
- 23
- 5
2
votes
1 answer
How to handle a feature vector that could be variable length?
I would like to train a machine learning model with several features as input, X[], and one output, Y. For example, every sample has a data frame like this: X[0], X[1], X[2], X[3], X[4], Y
Let's say that for one sample the following data is only one…
Crazy9
- 21
- 1
2
votes
1 answer
Getting the positive impacting features using SHAP
I'm attempting to use SHAP to automatically extract feature names that have a positive impact on my regression models. On inspection of the code I see that the bar plot, for example, determines these by taking the mean absolute SHAP values for a…
amateurjustin
- 75
- 1
- 8
2
votes
3 answers
If a categorical feature only occurs a few times in a data set, should I drop it?
I have a data set of mostly categorical variables. When I one-hot encoded them, some of the features occurred less than 3% of the time.
For instance, the Tech-support feature occurs only 928 times in a data set with 32561 samples, i.e. it occurs only 2.9%…
dawndance
- 21
- 2