Highest Voted Questions - Data Science Stack Exchange

48

votes

5 answers

Opening a 20GB file for analysis with pandas

I am currently trying to open a file with pandas and Python for machine learning purposes; it would be ideal for me to have it all in a DataFrame. The file is 18 GB and my RAM is 32 GB, but I keep getting memory errors. From your experience…

python bigdata pandas anaconda

asked Feb 13 '18 at 14:03

Hari Prasad

491
1
5
4

47

votes

3 answers

What does Logits in machine learning mean?

"One common mistake that I would make is adding a non-linearity to my logits output." What does the term "logit" mean here, and what does it represent?

machine-learning deep-learning

asked Apr 30 '18 at 14:55

Rajat

1,017
2
9
10

47

votes

2 answers

What exactly is bootstrapping in reinforcement learning?

Apparently, in reinforcement learning, the temporal-difference (TD) method is a bootstrapping method, whereas Monte Carlo methods are not. What exactly is bootstrapping in RL? What is a bootstrapping method in RL?

reinforcement-learning

asked Jan 22 '18 at 23:18

user10640
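
A minimal sketch of the distinction the question asks about, assuming the usual textbook formulation: a Monte Carlo target uses only rewards actually observed until the episode ends, while a TD(0) target uses one observed reward plus the current estimate of the next state's value (the bootstrapping step). The function names, rewards, and `gamma` below are hypothetical and chosen only for illustration.

```python
def mc_target(rewards, gamma):
    """Monte Carlo target: discounted sum of all rewards observed in the episode."""
    g = 0.0
    for r in reversed(rewards):
        g = r + gamma * g
    return g


def td0_target(reward, next_value_estimate, gamma):
    """TD(0) target: one observed reward plus an *estimate* of what follows.

    Relying on next_value_estimate instead of waiting for the real
    remaining rewards is what "bootstrapping" refers to.
    """
    return reward + gamma * next_value_estimate


# Hypothetical episode and value estimate, for illustration only.
rewards = [1.0, 0.0, 2.0]
gamma = 0.5

mc = mc_target(rewards, gamma)  # uses every reward in the episode
td = td0_target(rewards[0], next_value_estimate=0.4, gamma=gamma)  # uses one reward + a guess
print(mc, td)
```

The TD target is available after a single step, at the cost of depending on an estimate that may itself be wrong; the Monte Carlo target needs the whole episode but uses no estimates.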

47

votes

3 answers

What is Ground Truth

In the context of Machine Learning, I have seen the term Ground Truth used a lot. I have searched a lot and found the following definition in Wikipedia: In machine learning, the term "ground truth" refers to the accuracy of the training set's…

machine-learning neural-network deep-learning

asked Mar 24 '17 at 12:09

Green Falcon

13,868
9
55
98

46

votes

6 answers

Calculating KL Divergence in Python

I am rather new to this and can't say I have a complete understanding of the theoretical concepts behind this. I am trying to calculate the KL Divergence between several lists of points in Python. I am using this to try and do this. The problem that…

python clustering scikit-learn

asked Dec 08 '15 at 10:37

Nanda

773
1
7
8
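
A minimal sketch of the quantity in question, assuming each list is already a normalized discrete probability distribution over the same outcomes (the excerpt does not say how the lists of points are turned into distributions, so that step is left out). The function name `kl_divergence` and the sample values are hypothetical.

```python
import math


def kl_divergence(p, q):
    """KL(p || q) in nats for two discrete distributions of equal length."""
    if len(p) != len(q):
        raise ValueError("p and q must have the same length")
    total = 0.0
    for pi, qi in zip(p, q):
        if pi == 0.0:
            continue  # terms with p_i = 0 contribute nothing by convention
        if qi == 0.0:
            return math.inf  # p puts mass where q puts none
        total += pi * math.log(pi / qi)
    return total


# Hypothetical distributions, for illustration only.
p = [0.5, 0.5]
q = [0.25, 0.75]
print(kl_divergence(p, q), kl_divergence(q, p))
```

The two printed values differ because KL divergence is not symmetric, which is worth keeping in mind when comparing several lists pairwise.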

46

votes

12 answers

Data Science in C (or C++)

I'm an R language programmer. I'm also in the group of people who are considered Data Scientists but who come from academic disciplines other than CS. This works out well in my role as a Data Scientist, however, by starting my career in R and only…

machine-learning bigdata statistics programming c

asked Mar 20 '15 at 14:56

Hack-R

1,919
1
21
34

46

votes

9 answers

How much of data wrangling is a data scientist's job?

I'm currently working as a data scientist at a large company (my first job as a DS, so this question may be a result of my lack of experience). They have a huge backlog of really important data science projects that would have a great positive…

data-wrangling

asked Apr 03 '19 at 15:16

Victor Valente

569
4
9

46

votes

2 answers

How does the validation_split parameter of Keras' fit function work?

The validation_split parameter of the Keras Sequential model's fit function is documented as follows on https://keras.io/models/sequential/ : validation_split: Float between 0 and 1. Fraction of the training data to be used as validation data. The model will set…

keras data cross-validation

asked Sep 30 '18 at 06:30

rnso

1,558
3
16
34
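
A rough sketch of the splitting behavior, based on the Keras documentation's statement that the validation data is taken from the last samples of the provided data, before shuffling. This is plain Python, not Keras code; the helper name `split_like_validation_split` is hypothetical, and the exact rounding of the split index is an assumption.

```python
def split_like_validation_split(samples, validation_split):
    """Hold out the trailing fraction of samples, without shuffling first."""
    if not 0.0 <= validation_split < 1.0:
        raise ValueError("validation_split must be in [0, 1)")
    # Assumption: the split index is truncated toward zero.
    split_at = int(len(samples) * (1.0 - validation_split))
    return samples[:split_at], samples[split_at:]


samples = list(range(10))  # stand-in for ten ordered training examples
train, val = split_like_validation_split(samples, 0.2)
print(train)
print(val)
```

Because the held-out part is always the tail of the data, data sorted by class or by time should usually be shuffled before it is passed in with a validation split.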

46

votes

2 answers

Merging two different models in Keras

I am trying to merge two Keras models into a single model and I am unable to accomplish this. For example in the attached Figure, I would like to fetch the middle layer $A2$ of dimension 8, and use this as input to the layer $B1$ (of dimension 8…

machine-learning python deep-learning keras tensorflow

asked Dec 29 '17 at 08:12

Rkz

1,033
1
10
12

46

votes

5 answers

Does gradient descent always converge to an optimum?

I am wondering whether there is any scenario in which gradient descent does not converge to a minimum. I am aware that gradient descent is not always guaranteed to converge to a global optimum. I am also aware that it might diverge from an optimum…

machine-learning neural-network deep-learning optimization gradient-descent

asked Nov 09 '17 at 16:41

wit221

563
1
4
5
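
One simple scenario of the kind the question describes can be sketched as follows, assuming plain gradient descent on f(x) = x² with a fixed step size. Each update multiplies x by (1 − 2·learning_rate), so the iterates shrink when that factor has magnitude below 1 and grow otherwise. The function name and the step sizes are hypothetical.

```python
def gradient_descent(x0, learning_rate, steps):
    """Minimize f(x) = x**2 with fixed-step gradient descent; f'(x) = 2x."""
    x = x0
    for _ in range(steps):
        x -= learning_rate * 2.0 * x
    return x


small_step = gradient_descent(1.0, learning_rate=0.1, steps=50)  # factor 0.8 per step
large_step = gradient_descent(1.0, learning_rate=1.1, steps=50)  # factor -1.2 per step
print(small_step, large_step)
```

With the smaller step the iterate approaches the minimum at 0; with the larger one it oscillates with growing magnitude, even though the function is convex.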

46

votes

5 answers

How to force weights to be non-negative in Linear regression

I am using a standard linear regression with scikit-learn in Python. However, I would like to force the weights to be non-negative for every feature. Is there any way I can accomplish that? I was looking in the documentation but could not find…

python scikit-learn linear-regression

asked Apr 11 '17 at 03:02

user

1,971
6
20
36

45

votes

4 answers

Early stopping on validation loss or on accuracy?

I am currently training a neural network and I cannot decide which to use for my early stopping criterion: validation loss, or a metric like accuracy/f1score/auc/whatever calculated on the validation set. In my research, I came upon articles…

machine-learning neural-network deep-learning classification

asked Aug 20 '18 at 12:22

qmeeus

1,239
1
10
13

45

votes

4 answers

Why is ReLU used as an activation function?

Activation functions are used to introduce non-linearities into the linear output of the type w * x + b in a neural network. I am able to understand this intuitively for activation functions like sigmoid. I understand the advantages of ReLU,…

machine-learning neural-network deep-learning activation-function

asked Jan 10 '18 at 13:07

Bunny Rabbit

573
1
4
6

45

votes

3 answers

What does the notation mAP@[.5:.95] mean?

For detection, a common way to determine if one object proposal was right is Intersection over Union (IoU, IU). This takes the set $A$ of proposed object pixels and the set of true object pixels $B$ and calculates: $$IoU(A, B) = \frac{A \cap B}{A…

computer-vision

asked Feb 07 '17 at 09:09

Martin Thoma

18,630
31
92
167
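
A minimal sketch of the set-based measure the excerpt describes, assuming the standard definition of IoU as the size of the intersection divided by the size of the union. Pixels are represented as hypothetical (row, column) tuples, and the function name `iou` is chosen only for illustration.

```python
def iou(proposed, truth):
    """Intersection over Union of two sets of pixel coordinates."""
    a, b = set(proposed), set(truth)
    union = a | b
    if not union:
        return 0.0  # assumption: two empty sets are scored as 0
    return len(a & b) / len(union)


# Hypothetical 2x2 proposal that overlaps a 2x2 ground-truth region in one column.
proposed = {(0, 0), (0, 1), (1, 0), (1, 1)}
truth = {(0, 1), (1, 1), (0, 2), (1, 2)}
score = iou(proposed, truth)  # 2 shared pixels out of 6 distinct pixels
print(score)
```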

44

votes

2 answers

What does from_logits=True do in SparseCategoricalcrossEntropy loss function?

In the documentation it has been mentioned that y_pred needs to be in the range of [-inf to inf] when from_logits=True. I truly didn't understand what this means, since the probabilities need to be in the range of 0 to 1! Can someone please explain…

machine-learning python keras tensorflow loss-function

asked Apr 27 '20 at 16:08

Nagendra Prasad

553
1
4
4
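
The idea behind the flag can be sketched in plain Python, assuming a single example with an integer class label: with from_logits=True the loss applies softmax itself, so the inputs may be any real numbers, whereas with from_logits=False the inputs are expected to be probabilities already. This is an illustration, not the Keras implementation; the function names below are hypothetical.

```python
import math


def softmax(logits):
    """Map unbounded scores to probabilities that sum to 1."""
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def sparse_cross_entropy(y_true, y_pred, from_logits=False):
    """Cross-entropy for one example whose label y_true is a class index."""
    probs = softmax(y_pred) if from_logits else y_pred
    return -math.log(probs[y_true])


logits = [2.0, -1.0, 0.5]  # raw scores, anywhere in (-inf, inf)
loss_from_logits = sparse_cross_entropy(0, logits, from_logits=True)
loss_from_probs = sparse_cross_entropy(0, softmax(logits), from_logits=False)
print(loss_from_logits, loss_from_probs)
```

Both calls give the same loss; the flag only tells the loss function whether the softmax step has already been applied by the model's last layer.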
