Highest Voted Questions - Data Science Stack Exchange

27

votes

4 answers

macro average and weighted average meaning in classification_report

I use "classification_report" (from sklearn.metrics import classification_report) to evaluate an imbalanced binary classification. Classification report: precision recall f1-score support 0 1.00…

classification accuracy class-imbalance

asked Jan 04 '20 at 10:38

user10296606

1,784
5
17
31
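
As a rough illustration of the two averages named in the question title: the macro average is the unweighted mean of the per-class scores, while the weighted average weights each class score by its support. The sketch below uses plain Python and made-up per-class F1 values (the numbers are assumptions for illustration, not the asker's report).

```python
# Hypothetical per-class results for an imbalanced binary problem.
# Each entry maps a class label to (f1-score, support).
per_class = {0: (0.99, 950), 1: (0.60, 50)}


def macro_average(scores):
    """Unweighted mean over classes: every class counts equally."""
    values = [score for score, _ in scores.values()]
    return sum(values) / len(values)


def weighted_average(scores):
    """Mean over classes, weighted by support (number of true samples)."""
    total = sum(support for _, support in scores.values())
    return sum(score * support for score, support in scores.values()) / total


macro_f1 = macro_average(per_class)
weighted_f1 = weighted_average(per_class)
print(f"macro avg:    {macro_f1:.4f}")
print(f"weighted avg: {weighted_f1:.4f}")
```

With a dominant majority class, the weighted average stays close to the majority-class score, whereas the macro average is pulled down by the weak minority class.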

27

votes

5 answers

BERT vs. Word2Vec: Is BERT disambiguating the meaning of the word vector?

Word2vec: Word2vec provides a vector for each token/word, and those vectors encode the meaning of the word. Although those vectors are not human-interpretable, the meaning of the vectors is understandable/interpretable by comparing with other…

word2vec word-embeddings bert

asked Jun 21 '19 at 16:25

sovon

521
1
5
7

27

votes

4 answers

Cross validation Vs. Train Validate Test

I have a question about the cross-validation approach versus the train-validation-test approach. I was told that I can split a dataset into 3 parts: Train: we train the model. Validation: we validate and adjust model parameters. Test: never seen before…

machine-learning cross-validation

asked May 26 '19 at 06:15

NaveganTeX

445
1
4
9
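
One common way to combine the two approaches from the question is to hold out a test set once and then run k-fold cross-validation on the remaining data. The sketch below works on sample indices only; the helper names (holdout_split, k_fold_splits) are hypothetical, and shuffling is left out on purpose so the result is deterministic.

```python
def holdout_split(n_samples, test_frac=0.2):
    """Reserve the first n_samples * test_frac indices as a test set."""
    indices = list(range(n_samples))
    n_test = int(n_samples * test_frac)
    return indices[n_test:], indices[:n_test]


def k_fold_splits(indices, k):
    """Yield (train, validation) index lists; each fold validates once."""
    folds = [indices[i::k] for i in range(k)]
    for i in range(k):
        validation = folds[i]
        train = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        yield train, validation


# The test indices are set aside and never used during cross-validation.
development, test = holdout_split(10)
splits = list(k_fold_splits(development, k=4))

for fold_number, (train, validation) in enumerate(splits):
    print(fold_number, train, validation)
print("held-out test:", test)
```

In practice the indices would normally be shuffled (or stratified) before splitting; that step is omitted here to keep the sketch short.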

27

votes

2 answers

What is the difference between semantic segmentation, object detection and instance segmentation?

I'm fairly new to computer vision and I've read an explanation in a Medium post; however, it still isn't clear to me how they truly differ.

computer-vision object-detection

asked May 15 '19 at 15:00

Guilherme Marques

398
1
3
8

27

votes

4 answers

Word2Vec for Named Entity Recognition

I'm looking to use Google's word2vec implementation to build a named entity recognition system. I've heard that recursive neural nets with backpropagation through structure are well suited for named entity recognition tasks, but I've been unable…

machine-learning python neural-network nlp

asked Jun 19 '14 at 19:29

Madison May

2,029
2
17
18

27

votes

2 answers

What is the advantage of using log softmax instead of softmax?

Are there any advantages to using log softmax over softmax? What are the reasons to choose one over the other?

deep-learning loss-function

asked Nov 04 '18 at 18:29

rawwar

831
2
12
23
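
One frequently cited reason for preferring log softmax is numerical stability: computing softmax first and taking the log afterwards can overflow or underflow for large logits, while log softmax can be computed directly with the log-sum-exp trick. The following is a minimal pure-Python sketch under that assumption (function names are illustrative, not taken from any framework).

```python
import math


def naive_log_softmax(logits):
    """log(softmax(x)) computed literally; exp() overflows for large logits."""
    exps = [math.exp(x) for x in logits]
    total = sum(exps)
    return [math.log(e / total) for e in exps]


def log_softmax(logits):
    """Stable version: x_i - logsumexp(x), shifting by the maximum logit."""
    m = max(logits)
    log_sum = m + math.log(sum(math.exp(x - m) for x in logits))
    return [x - log_sum for x in logits]


logits = [1000.0, 0.0]
stable = log_softmax(logits)
print("stable:", stable)

try:
    print("naive:", naive_log_softmax(logits))
except OverflowError as error:
    print("naive version failed:", error)
```

For moderate logits both versions agree; the difference only shows up at the extremes, which is also where a loss such as negative log-likelihood is most sensitive.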

27

votes

1 answer

Adaboost vs Gradient Boosting

How is AdaBoost different from a Gradient Boosting algorithm, since both of them use a boosting technique? I could not figure out the actual difference between these two algorithms from a theoretical point of view.

algorithms similarity ensemble-modeling boosting

asked Oct 04 '18 at 14:25

CodeMaster GoGo

768
1
6
15

27

votes

2 answers

How to deal with time series which change in seasonality or other patterns?

Background: I'm working on a time series data set of energy meter readings. The length of the series varies by meter - for some I have several years, others only a few months, etc. Many display significant seasonality, and often multiple layers -…

data-mining clustering time-series beginner

asked Dec 22 '14 at 03:30

Jo Douglass

401
1
5
10

27

votes

2 answers

local minima vs saddle points in deep learning

I heard Andrew Ng (in a video I unfortunately can't find anymore) talk about how the understanding of local minima in deep learning problems has changed in the sense that they are now regarded as less problematic because in high-dimensional spaces…

machine-learning deep-learning optimization convergence

asked Sep 05 '17 at 19:14

oW_

6,254
4
28
45

27

votes

3 answers

How to sum values grouped by two columns in pandas

I have a Pandas DataFrame like this: df = pd.DataFrame({ 'Date': ['2017-1-1', '2017-1-1', '2017-1-2', '2017-1-2', '2017-1-3'], 'Groups': ['one', 'one', 'one', 'two', 'two'], 'data': range(1, 6)}) Date Groups data 0 …

python pandas dataframe

asked Jul 10 '17 at 15:47

Kevin

533
2
5
12
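
For the DataFrame shown in the excerpt, one possible approach is to group on both columns and sum the remaining one. The excerpt is truncated, so the exact output the asker wanted is not known; the sketch below simply assumes one total per (Date, Groups) pair.

```python
import pandas as pd

# DataFrame copied from the question excerpt.
df = pd.DataFrame({
    'Date': ['2017-1-1', '2017-1-1', '2017-1-2', '2017-1-2', '2017-1-3'],
    'Groups': ['one', 'one', 'one', 'two', 'two'],
    'data': range(1, 6)})

# Group by both key columns and sum 'data' within each group.
# as_index=False keeps 'Date' and 'Groups' as ordinary columns.
totals = df.groupby(['Date', 'Groups'], as_index=False)['data'].sum()
print(totals)
```

Dropping as_index=False would instead return a Series indexed by a (Date, Groups) MultiIndex, which can be convenient for later lookups.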

27

votes

3 answers

How to deal with string labels in multi-class classification with Keras?

I am a newbie to machine learning and Keras and am now working on a multi-class image classification problem using Keras. The input is tagged images. After some pre-processing, the training data is represented in a Python list as: [["dog",…

machine-learning scikit-learn tensorflow keras encoding

asked Mar 11 '17 at 13:42

Dracarys

393
1
3
5
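
A typical way to handle string labels is to map each distinct label to an integer index and then one-hot encode the indices. The sketch below does this in plain Python with a made-up label list (the labels are assumptions; the asker's data is truncated in the excerpt). Library helpers such as scikit-learn's LabelEncoder and Keras's to_categorical cover the same two steps.

```python
# Hypothetical string labels, one per training image.
labels = ["dog", "cat", "bird", "cat", "dog"]

# Step 1: assign each distinct label a stable integer index.
classes = sorted(set(labels))
class_to_index = {name: index for index, name in enumerate(classes)}
encoded = [class_to_index[name] for name in labels]

# Step 2: turn each index into a one-hot vector of length len(classes).
one_hot = [
    [1 if position == index else 0 for position in range(len(classes))]
    for index in encoded
]

print(classes)
print(encoded)
print(one_hot)
```

Keeping class_to_index (or its inverse) around makes it possible to translate predicted indices back into the original string labels.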

27

votes

4 answers

Is there a straightforward way to run pandas.DataFrame.isin in parallel?

I have a modeling and scoring program that makes heavy use of the DataFrame.isin function of pandas, searching through lists of Facebook "like" records of individual users for each of a few thousand specific pages. This is the most time-consuming…

performance python pandas parallel

asked May 19 '14 at 23:59

Therriault

871
1
8
13

27

votes

1 answer

PyTorch vs. TensorFlow Fold

Both PyTorch and TensorFlow Fold are deep learning frameworks meant to deal with situations where the input data has non-uniform length or dimensions (that is, situations where dynamic graphs are useful or needed). I would like to know how they…

python deep-learning tensorflow pytorch

asked Feb 08 '17 at 10:26

noe

22,074
1
43
70

27

votes

3 answers

Encoding categorical variables using likelihood estimation

I am trying to understand how I can encode categorical variables using likelihood estimation, but have had little success so far. Any suggestions would be greatly appreciated.

feature-engineering

asked Apr 04 '16 at 09:31

small dwarf

271
1
3
4

26

votes

2 answers

Text categorization: combining different kinds of features

The problem I am tackling is categorizing short texts into multiple classes. My current approach is to use tf-idf weighted term frequencies and learn a simple linear classifier (logistic regression). This works reasonably well (around 90% macro F-1…

machine-learning classification feature-selection logistic-regression information-retrieval

asked Aug 17 '14 at 17:29

elmille

361
1
3
4
