Questions tagged [data-science-model]

Questions about the organization of elements of data, and the standardization of their relations.

647 questions
29
votes
1 answer

Should one-hot vectors be scaled with numerical attributes?

When I have a combination of categorical and numerical attributes, I usually convert the categorical attributes to one-hot vectors. My question is: do I leave those vectors as is and scale the numerical attributes through…
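A minimal sketch of one of the options the question describes: the numerical attributes are standardized while the one-hot vectors are left as they are. The column names and toy rows are made up for illustration, and this is not presented as the recommended answer, only as a concrete picture of the setup.

```python
from statistics import mean, pstdev

# Hypothetical toy rows: (age, income, colour). Names and values are illustrative only.
rows = [
    (25, 30000.0, "red"),
    (35, 50000.0, "blue"),
    (45, 70000.0, "red"),
]

def one_hot(value, categories):
    """Return a 0/1 list with a single 1 at the position of `value`."""
    return [1.0 if value == c else 0.0 for c in categories]

def standardize(column):
    """Z-score a list of numbers using the population standard deviation."""
    mu, sigma = mean(column), pstdev(column)
    return [(x - mu) / sigma for x in column]

categories = sorted({r[2] for r in rows})  # fixed category order
ages = standardize([r[0] for r in rows])
incomes = standardize([r[1] for r in rows])

# Numerical columns are scaled; one-hot columns are passed through unchanged.
features = [
    [a, i] + one_hot(r[2], categories)
    for a, i, r in zip(ages, incomes, rows)
]
```

Each row of `features` holds the two scaled numerical values followed by the untouched 0/1 indicators.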
17
votes
2 answers

Feature Scaling both training and test data

It is stated that, for feature normalization, the test set must use identical scaling to the training set. The point is also made: do not scale the training and test sets using different scalers, since this could lead to random skew in the…
aspiring1
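The rule quoted in the question (identical scaling for training and test sets) can be sketched as follows. This is a plain-Python illustration with made-up numbers; the helper names `fit_scaler` and `apply_scaler` are hypothetical and not taken from any library.

```python
from statistics import mean, pstdev

def fit_scaler(train_column):
    """Learn the mean and standard deviation from the training data only."""
    return mean(train_column), pstdev(train_column)

def apply_scaler(column, params):
    """Scale any column with parameters that were learned elsewhere."""
    mu, sigma = params
    return [(x - mu) / sigma for x in column]

train = [10.0, 20.0, 30.0]  # made-up training values
test = [20.0, 40.0]         # made-up test values

params = fit_scaler(train)                 # fitted on the training set only
train_scaled = apply_scaler(train, params)
test_scaled = apply_scaler(test, params)   # same parameters reused on the test set
```

Because the test set reuses the training mean and standard deviation, a test value equal to the training mean maps to 0 regardless of what else the test set contains.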
14
votes
5 answers

When to remove correlated variables

Can somebody please suggest the correct stage at which to remove correlated variables: before feature engineering or after it?
bp89
9
votes
2 answers

Is there any consensus on choosing an appropriate ML approach?

I am studying data science at the moment and we are taught a dizzying variety of basic regression/classification techniques (linear, logistic, trees, splines, ANN, SVM, MARS, and so on), along with a variety of extra tools (bootstrapping,…
8
votes
2 answers

image_dataset_from_directory vs. flow_from_directory

What is the main difference between flow_from_directory and image_dataset_from_directory in Keras? Which one should I use?
8
votes
2 answers

How to Use SHAP Kernel Explainer with Pipeline Models?

I have a pandas DataFrame X. I would like to find the prediction explanation of a particular model. My model is given below: pipeline = Pipeline(steps= [ ('imputer', imputer_function()), ('classifier', RandomForestClassifier() …
8
votes
1 answer

How to Combat Data Drift

I have customer demographic data that include columns like: age, the first half of the postcode, occupation (there is a defined list of possible occupations), and more. Each month I get a new batch of 1000 rows of this type of data (which is not…
7
votes
1 answer

What is the difference between Trax and TensorFlow?

What is the main difference between Trax and TensorFlow? Both are deep learning libraries developed by Google. https://github.com/google/trax https://github.com/tensorflow/tensorflow
Bala venkatesh
7
votes
3 answers

Should you use random state or random seed in machine learning models?

I'm starting to study machine learning. In all the examples I have seen, the person who created the ML model used a random state or a random seed to control the randomness of the process. But, in real life, when you're trying to apply a machine learning model…
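To make concrete what fixing a seed does, here is a small sketch using only Python's standard `random` module. The function `shuffled_split` and its parameters are hypothetical names chosen for this illustration; the point is only that the same seed reproduces the same split.

```python
import random

def shuffled_split(items, seed, test_fraction=0.25):
    """Shuffle a copy of `items` with a seeded generator and split it in two."""
    rng = random.Random(seed)  # local generator; the global random state is untouched
    shuffled = list(items)
    rng.shuffle(shuffled)
    cut = int(len(shuffled) * (1 - test_fraction))
    return shuffled[:cut], shuffled[cut:]

data = list(range(8))
split_a = shuffled_split(data, seed=42)
split_b = shuffled_split(data, seed=42)  # same seed: identical split
split_c = shuffled_split(data, seed=7)   # different seed: generally a different split
```

Repeating a run with the same seed makes results comparable across experiments, while trying several seeds gives a rough feel for how much a result depends on the particular split.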
7
votes
2 answers

Correct interpretation of the SHAP summary_plot graph

While going through various online resources to understand SHAP plots, I ended up slightly confused. Below is my interpretation of the overall plot given in the examples: a SHAP value of 0 for a feature corresponds to the average prediction using all…
Sanchez_P
7
votes
2 answers

How do real-world machine learning production systems run?

Dear Machine Learning/AI Community, I am just a budding and aspiring machine learner who has worked on open online data sets and some POCs built locally for my project. I have built some models and converted them into pickle objects in order to avoid…
7
votes
5 answers

How to handle missing values if imputation doesn't make sense

I have a column/feature in my dataset, "years_married", showing the number of years a person has been married. Since not every person is married, there are NaN fields. It does not make sense to fillna(0) "years_married", since 0 would mean the person just married. A…
methus
6
votes
2 answers

How do I decide if I need to go for Normalization and not Standardization or vice-versa?

While designing an ML model, how do I decide if I need to go for Normalization and not Standardization or vice-versa? On what factors is this decision based?
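For reference, a minimal sketch of the two transformations being compared. It assumes the common usage in which "normalization" means min-max rescaling to [0, 1] and "standardization" means z-scoring to zero mean and unit variance; the question does not define the terms, and they are sometimes used more loosely. The data is made up.

```python
from statistics import mean, pstdev

def min_max_normalize(xs):
    """Rescale values linearly so the minimum maps to 0 and the maximum to 1."""
    lo, hi = min(xs), max(xs)
    return [(x - lo) / (hi - lo) for x in xs]

def standardize(xs):
    """Shift and scale values to zero mean and unit (population) variance."""
    mu, sigma = mean(xs), pstdev(xs)
    return [(x - mu) / sigma for x in xs]

values = [1.0, 2.0, 3.0, 4.0, 100.0]  # made-up data with one large outlier
normalized = min_max_normalize(values)
standardized = standardize(values)
```

With an outlier like the last value, min-max rescaling squeezes the remaining points into a narrow band near 0, which is one of the practical differences usually weighed when choosing between the two.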
6
votes
1 answer

Differences between big data, data warehousing, business intelligence and data science?

I know they are four different areas, but I would like to know what the main differences between those disciplines are, how they are related to each other (if some of them depend on each other), and what the specific objective of each one is.
6
votes
2 answers

How to remove hotspots from a given image using Python and OpenCV?

In the picture below there are some regions which are very bright (i.e. more white). Some bright regions are wide and some are narrow or thin. The red box covers one such wide bright spot, and the blue box covers one thin bright spot. Thin bright spots…