Highest Voted Questions - Data Science Stack Exchange

28

votes

5 answers

VM image for data science projects

There are numerous tools available for data science tasks, and it's cumbersome to install everything and build up a perfect system. Is there a Linux/Mac OS image with Python, R and other open-source data science tools installed and available for…

python r tools

asked Jan 22 '15 at 21:34

JeanVuda

421
4
6

28

votes

2 answers

Keras vs. tf.keras

I'm a bit unsure how to choose between Keras (keras-team/keras) and tf.keras (tensorflow/tensorflow/python/keras/) for my new research project. There is an argument that Keras isn't owned by anyone, so people are happier to contribute to it, and it'll be…

python deep-learning keras tensorflow

asked Mar 21 '19 at 20:20

Mo-

1,225
1
10
26

28

votes

7 answers

Publicly available social network datasets/APIs

As an extension to our great list of publicly available datasets, I'd like to know if there is any list of publicly available social network datasets/crawling APIs. It would be very nice if, alongside a link to the dataset/API, characteristics…

open-source dataset crawling

asked Jun 17 '14 at 05:29

Rubens

4,097
5
23
42

28

votes

2 answers

When should one use L1 or L2 regularization instead of a dropout layer, given that both serve the same purpose of reducing overfitting?

In Keras, there are two methods to reduce overfitting: L1/L2 regularization or a dropout layer. In what situations should L1/L2 regularization be used instead of a dropout layer? In what situations is a dropout layer better?

machine-learning keras overfitting regularization dropout

asked Aug 23 '18 at 15:46

user781486

1,305
2
16
18

28

votes

3 answers

What does "baseline" mean in the context of machine learning?

What does "baseline" mean in the context of machine learning and data science? Someone wrote to me: "Hint: An appropriate baseline will give an RMSE of approximately 200." I don't get this. Does he mean that if my predictive model on the training data…

machine-learning regression predictive-modeling terminology

asked Apr 26 '18 at 23:17

Meiiso

411
1
4
7
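One common reading of "baseline" in the question above is a trivial reference model whose error any real model should beat. The sketch below assumes that reading and uses a predict-the-training-mean baseline scored with RMSE. The helper names and the toy numbers are hypothetical and not taken from the question.

```python
import math

def rmse(y_true, y_pred):
    # Root mean squared error between two equal-length sequences.
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))

def mean_baseline(y_train):
    # Hypothetical helper: a "model" that ignores the features and
    # always predicts the mean of the training targets.
    mean = sum(y_train) / len(y_train)
    return lambda n: [mean] * n

# Made-up toy targets, for illustration only.
y_train = [100.0, 300.0, 500.0]
y_test = [200.0, 400.0]

predict = mean_baseline(y_train)
baseline_rmse = rmse(y_test, predict(len(y_test)))
print(baseline_rmse)
```

Under this reading, a trained model is only interesting if its RMSE on the same held-out data is clearly lower than `baseline_rmse`.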

28

votes

3 answers

How to combine categorical and continuous input features for neural network training

Suppose we have two kinds of input features, categorical and continuous. The categorical data may be represented as a one-hot code A, while the continuous data is just a vector B in N-dimensional space. It seems that simply using concat(A, B) is not a…

neural-network feature-selection categorical-data feature-construction

asked Mar 28 '18 at 08:49

JunjieChen

515
1
5
8
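For readers unfamiliar with the setup in the question above, here is a minimal sketch of what concat(A, B) means: a one-hot code A for the categorical feature joined with a continuous vector B. The function names and example values are hypothetical. The sketch only illustrates the construction the question describes and takes no position on whether it is a good input encoding.

```python
def one_hot(index, num_categories):
    # One-hot code A: all zeros except a 1.0 at the category's index.
    vec = [0.0] * num_categories
    vec[index] = 1.0
    return vec

def combine(category_index, num_categories, continuous):
    # concat(A, B): one-hot code followed by the continuous vector.
    return one_hot(category_index, num_categories) + list(continuous)

# Category 2 out of 4, plus a 2-dimensional continuous vector (made-up values).
x = combine(2, 4, [0.5, -1.2])
print(x)
```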

28

votes

2 answers

Is there a way to change the metric used by the Early Stopping callback in Keras?

When using the early stopping callback in Keras, training stops when some metric (usually validation loss) is no longer improving. Is there a way to use another metric (like precision, recall, or f-measure) instead of validation loss? All the examples I…

machine-learning neural-network deep-learning keras

asked Jan 19 '18 at 15:53

P.Joseph

393
1
3
9

28

votes

3 answers

Why do convolutions always use odd numbers as filter sizes?

If we look at 90-99% of the papers published using a CNN (ConvNet), the vast majority of them use odd filter sizes, with {1, 3, 5, 7} being the most used. This situation can lead to a problem: with these filter sizes, usually the…

deep-learning computer-vision convolutional-neural-network convolution

asked Sep 20 '17 at 17:53

Jonathan DEKHTIAR

590
2
5
10

28

votes

3 answers

What are weight and bias in deep learning?

I'm starting to learn machine learning from the TensorFlow website. I have developed a very rudimentary understanding of the flow a deep learning program follows (this method helps me learn faster than reading books and long articles). There…

machine-learning deep-learning tensorflow

asked May 20 '17 at 21:40

Umer Farooq

389
1
3
4

28

votes

5 answers

What is the benefit of splitting tfrecord file into shards?

I'm working on speech recognition with TensorFlow and plan to train an LSTM NN with a massive waves dataset. Because of the performance gains, I plan to use tfrecords. There are several examples on the internet (Inception, for example) where tfrecords files are…

python tensorflow

asked Jan 14 '17 at 08:59

striki70

281
1
3
3

28

votes

3 answers

Why are NLP and Machine Learning communities interested in deep learning?

I hope you can help me, as I have some questions on this topic. I'm new to the field of deep learning, and while I did some tutorials, I can't relate or distinguish concepts from one another.

machine-learning data-mining neural-network nlp deep-learning

asked Oct 11 '14 at 10:24

user3352632

449
3
7

28

votes

6 answers

Machine learning techniques for estimating users' age based on Facebook sites they like

I have a database from my Facebook application and I am trying to use machine learning to estimate users' age based on what Facebook sites they like. There are three crucial characteristics of my database: the age distribution in my training set…

machine-learning dimensionality-reduction python

asked May 17 '14 at 09:16

Wojciech Walczak

916
12
23

28

votes

5 answers

Improve the speed of t-sne implementation in python for huge data

I would like to do dimensionality reduction on nearly 1 million vectors, each with 200 dimensions (doc2vec). I am using the TSNE implementation from the sklearn.manifold module for it, and the major problem is time complexity. Even with method = barnes_hut,…

python bigdata nlp scikit-learn dimensionality-reduction

asked Feb 06 '16 at 14:19

chmodsss

1,954
2
17
37

27

votes

4 answers

What makes columnar databases suitable for data science?

What are some of the advantages of columnar data-stores which make them more suitable for data science and analytics?

databases tools

asked Sep 30 '15 at 10:43

Dawny33

8,226
12
47
104

27

votes

8 answers

Visualizing a graph with a million vertices

What is the best tool to use to visualize (draw the vertices and edges) a graph with 1000000 vertices? There are about 50000 edges in the graph. And I can compute the location of individual vertices and edges. I am thinking about writing a program…

visualization graphs

asked Jul 22 '14 at 15:17

Cici

443
1
4
10
