Most Popular
1500 questions
28 votes, 5 answers
VM image for data science projects
There are numerous tools available for data science tasks, and it's cumbersome to install everything and build up a perfect system.
Is there a Linux/Mac OS image with Python, R and other open-source data science tools installed and available for…
JeanVuda
28 votes, 2 answers
Keras vs. tf.keras
I'm unsure whether to choose Keras (keras-team/keras) or tf.keras (tensorflow/tensorflow/python/keras/) for my new research project.
There is a debate that Keras isn't owned by anyone, so people are happier to contribute to it and it'll be…
Mo-
28 votes, 7 answers
Publicly available social network datasets/APIs
As an extension to our great list of publicly available datasets, I'd like to know if there is any list of publicly available social network datasets/crawling APIs. It would be very nice if, alongside a link to the dataset/API, characteristics…
Rubens
28 votes, 2 answers
When should one use L1 or L2 regularization instead of a dropout layer, given that both serve the same purpose of reducing overfitting?
In Keras, there are two methods to reduce overfitting: L1/L2 regularization or a dropout layer.
In what situations should L1/L2 regularization be used instead of a dropout layer? In what situations is a dropout layer better?
user781486
28 votes, 3 answers
What does "baseline" mean in the context of machine learning?
What does "baseline" mean in the context of machine learning and data science?
Someone wrote me:
Hint: An appropriate baseline will give an RMSE of approximately 200.
I don't get this. Does he mean that if my predictive model on the training data…
Meiiso
28 votes, 3 answers
How to combine categorical and continuous input features for neural network training
Suppose we have two kinds of input features, categorical and continuous. The categorical data may be represented as a one-hot code A, while the continuous data is just a vector B in N-dimensional space. It seems that simply using concat(A, B) is not a…
JunjieChen
28 votes, 2 answers
Is there a way to change the metric used by the Early Stopping callback in Keras?
When using the early stopping callback in Keras, training stops when some metric (usually validation loss) stops improving. Is there a way to use another metric (like precision, recall, or f-measure) instead of validation loss?
All the examples I…
P.Joseph
28 votes, 3 answers
Why do convolutions always use odd numbers as filter sizes?
If we look at 90-99% of the papers published using a CNN (ConvNet), the vast majority of them use odd filter sizes, most commonly {1, 3, 5, 7}.
This situation can lead to some problems: with these filter sizes, usually the…
Jonathan DEKHTIAR
28 votes, 3 answers
What are weights and biases in deep learning?
I'm starting to learn machine learning from the TensorFlow website. I have developed a very rudimentary understanding of the flow a deep learning program follows (this method helps me learn faster than reading books and long articles).
There…
Umer Farooq
28 votes, 5 answers
What is the benefit of splitting a TFRecord file into shards?
I'm working on speech recognition with TensorFlow and plan to train an LSTM network on a massive dataset of audio waveforms. Because of the performance gains, I plan to use TFRecords. There are several examples on the internet (Inception, for example) where TFRecord files are…
striki70
28 votes, 3 answers
Why are NLP and Machine Learning communities interested in deep learning?
I hope you can help me, as I have some questions on this topic. I'm new to the field of deep learning, and while I did some tutorials, I can't relate or distinguish concepts from one another.
user3352632
28 votes, 6 answers
Machine learning techniques for estimating users' age based on Facebook sites they like
I have a database from my Facebook application and I am trying to use machine learning to estimate users' age based on what Facebook sites they like.
There are three crucial characteristics of my database:
the age distribution in my training set…
Wojciech Walczak
28 votes, 5 answers
Improve the speed of t-SNE implementation in Python for huge data
I would like to do dimensionality reduction on nearly 1 million vectors, each with 200 dimensions (doc2vec).
I am using the TSNE implementation from the sklearn.manifold module for it, and the major problem is time complexity. Even with method = barnes_hut,…
chmodsss
27 votes, 4 answers
What makes columnar databases suitable for data science?
What are some of the advantages of columnar data-stores which make them more suitable for data science and analytics?
Dawny33
27 votes, 8 answers
Visualizing a graph with a million vertices
What is the best tool to use to visualize (draw the vertices and edges) a graph with 1,000,000 vertices? There are about 50,000 edges in the graph, and I can compute the location of individual vertices and edges.
I am thinking about writing a program…
Cici