Most Popular
1500 questions
35
votes
1 answer
Time Series prediction using LSTMs: Importance of making time series stationary
In this link on Stationarity and differencing, it has been mentioned that models like ARIMA require a stationarized time series for forecasting, as its statistical properties like mean, variance, autocorrelation etc. are constant over time. Since…
Abhijay Ghildyal
- 785
- 2
- 9
- 10
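As a small illustration of the differencing step this question refers to, here is a minimal sketch in plain Python. The `difference` helper and the sample series are hypothetical and not taken from the question; the point is only that first-order differencing turns a linearly trending series into one with a constant mean.

```python
def difference(series, lag=1):
    """Return the lag-differenced series: y[t] - y[t - lag]."""
    return [series[t] - series[t - lag] for t in range(lag, len(series))]


# Hypothetical series with a linear trend, so its mean grows over time.
trend = [2 * t + 5 for t in range(8)]

# After first-order differencing, every value equals the slope of the trend.
diffed = difference(trend)
print(diffed)  # [2, 2, 2, 2, 2, 2, 2]
```

Whether an LSTM benefits from this preprocessing is exactly what the question asks; the sketch only shows what the transformation does.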
35
votes
6 answers
How do I load FastText pretrained model with Gensim?
I tried to load a fastText pretrained model from here: Fasttext model. I am using wiki.simple.en
from gensim.models.keyedvectors import KeyedVectors
word_vectors = KeyedVectors.load_word2vec_format('wiki.simple.bin', binary=True)
But, it shows the…
Sabbiu Shah
- 733
- 1
- 6
- 9
35
votes
2 answers
What is/are the default filters used by Keras Convolution2d()?
I am pretty new to neural networks, but I understand linear algebra and the mathematics of convolution pretty decently.
I am trying to understand the example code I find in various places on the net for training a Keras convolutional NN with MNIST…
ChrisFal
- 453
- 1
- 4
- 5
35
votes
6 answers
Why do convolutional neural networks work?
I have often heard people saying that convolutional neural networks are still poorly understood. Is it known why convolutional neural networks always end up learning increasingly sophisticated features as we go up the layers?
What caused them…
Praise the lord
- 451
- 1
- 4
- 5
35
votes
6 answers
Merging multiple data frames row-wise in PySpark
I have 10 data frames pyspark.sql.dataframe.DataFrame, obtained from randomSplit as
(td1, td2, td3, td4, td5, td6, td7, td8, td9, td10) = td.randomSplit([.1, .1, .1, .1, .1, .1, .1, .1, .1, .1], seed = 100)
Now I want to join 9 td's into a single…
krishna Prasad
- 1,147
- 1
- 14
- 23
35
votes
4 answers
Quick guide into training highly imbalanced data sets
I have a classification problem with approximately 1000 positive and 10000 negative samples in the training set. So this data set is quite unbalanced. A plain random forest just marks all test samples as the majority class.
Some good answers…
IgorS
- 5,444
- 11
- 31
- 43
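One common starting point for an imbalance like the one described above is to weight each class inversely to its frequency. The sketch below is an illustration only: the `balanced_class_weights` helper is hypothetical, and it uses the heuristic `n_samples / (n_classes * class_count)`, which is the same formula scikit-learn documents for `class_weight="balanced"`. The label counts are the approximate ones from the question.

```python
from collections import Counter


def balanced_class_weights(labels):
    """Weight each class by n_samples / (n_classes * class_count)."""
    counts = Counter(labels)
    n_samples = len(labels)
    n_classes = len(counts)
    return {cls: n_samples / (n_classes * count) for cls, count in counts.items()}


# Approximate class counts from the question: 1000 positive, 10000 negative.
labels = [1] * 1000 + [0] * 10000
weights = balanced_class_weights(labels)
print(weights)  # {1: 5.5, 0: 0.55}
```

The resulting dictionary could be passed to a classifier that accepts per-class weights. Whether weighting beats resampling for a given data set is an empirical question.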
35
votes
1 answer
What is the best Keras model for multi-class classification?
I am working on research where I need to classify one of three events WINNER=(win, draw, lose)
WINNER LEAGUE HOME AWAY MATCH_HOME MATCH_DRAW MATCH_AWAY MATCH_U2_50 MATCH_O2_50
3 13 550 571 1.86 3.34 …
SpanishBoy
- 557
- 1
- 5
- 11
34
votes
5 answers
What are the use cases for Apache Spark vs Hadoop
With Hadoop 2.0 and YARN, Hadoop is supposedly no longer tied only to map-reduce solutions. With that advancement, what are the use cases for Apache Spark vs Hadoop, considering both sit atop HDFS? I've read through the introduction documentation for…
idclark
- 521
- 1
- 5
- 6
34
votes
6 answers
Gini coefficient vs Gini impurity - decision trees
The problem refers to decision tree building. According to Wikipedia, 'Gini coefficient' should not be confused with 'Gini impurity'. However, both measures can be used when building a decision tree - these can support our choices when splitting the…
Damien
- 341
- 1
- 3
- 3
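To make the distinction in this question concrete, the sketch below computes both quantities under their standard textbook definitions: Gini impurity as `1 - sum(p_i ** 2)` over class proportions, and the Gini coefficient as half the relative mean absolute difference of a set of non-negative values. The function names and sample inputs are hypothetical and chosen only for illustration.

```python
from collections import Counter


def gini_impurity(labels):
    """1 - sum of squared class proportions; 0 means a pure node."""
    n = len(labels)
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())


def gini_coefficient(values):
    """Half the relative mean absolute difference; 0 means perfect equality."""
    n = len(values)
    mean = sum(values) / n
    abs_diff_sum = sum(abs(a - b) for a in values for b in values)
    return abs_diff_sum / (2 * n * n * mean)


print(gini_impurity(["a", "a", "b", "b"]))  # 0.5
print(gini_coefficient([0, 0, 0, 1]))       # 0.75
```

The first function takes class labels and the second takes numeric values, which is one way to see that they measure different things.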
33
votes
3 answers
Hypertuning XGBoost parameters
XGBoost has been doing a great job when it comes to dealing with both categorical and continuous dependent variables. But how do I select the optimized parameters for an XGBoost problem?
This is how I applied the parameters for a recent Kaggle…
Dawny33
- 8,226
- 12
- 47
- 104
33
votes
4 answers
Neural Network parse string data?
So, I'm just starting to learn how a neural network can operate to recognize patterns and categorize inputs, and I've seen how an artificial neural network can parse image data and categorize the images (demo with convnetjs), and the key there is to…
MidnightLightning
- 433
- 1
- 4
- 4
33
votes
6 answers
Are there any tools for feature engineering?
Specifically what I am looking for are tools with some functionality, which is specific to feature engineering. I would like to be able to easily smooth, visualize, fill gaps, etc. Something similar to MS Excel, but that has R as the underlying…
John
- 431
- 1
- 5
- 4
33
votes
4 answers
How to use LeakyRelu as activation function in sequence DNN in keras? When does it perform better than Relu?
How do you use LeakyRelu as an activation function in sequence DNN in keras?
If I want to write something similar to:
model = Sequential()
model.add(Dense(90, activation='LeakyRelu'))
What is the solution? Put LeakyRelu similar to Relu?
Second…
user10296606
- 1,784
- 5
- 17
- 31
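For reference, the two activations compared in this question differ only on negative inputs. The sketch below defines both in plain Python; the function names and the `alpha` default are assumptions for illustration, not the Keras API. In Keras itself, leaky ReLU is usually applied as a separate `LeakyReLU` layer placed after a `Dense` layer that has no activation, though the exact argument names depend on the Keras version.

```python
def relu(x):
    """Standard ReLU: negative inputs are clamped to zero."""
    return max(0.0, x)


def leaky_relu(x, alpha=0.01):
    """Leaky ReLU: negative inputs are scaled by a small slope alpha."""
    return x if x > 0 else alpha * x


for value in (-2.0, 0.0, 3.0):
    print(value, relu(value), leaky_relu(value, alpha=0.1))
```

Because the negative side keeps a non-zero slope, a unit with a leaky ReLU still receives a gradient when its input is negative, which is the usual argument for trying it in place of ReLU.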
33
votes
8 answers
Best practical algorithm for sentence similarity
I have two sentences, S1 and S2, both of which have a word count (usually) below 15.
What are the most practically useful and successful (machine learning) algorithms, which are possibly easy to implement (neural network is ok, unless the architecture…
DaveTheAl
- 493
- 1
- 5
- 11
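As a point of comparison for the approaches this question asks about, a very simple baseline is cosine similarity over bag-of-words counts. The sketch below is an assumption-laden illustration, not a recommended answer: it uses whitespace tokenization and lowercasing only, and the function name and sample sentences are hypothetical.

```python
import math
from collections import Counter


def bow_cosine_similarity(s1, s2):
    """Cosine similarity between word-count vectors of two sentences."""
    v1 = Counter(s1.lower().split())
    v2 = Counter(s2.lower().split())
    dot = sum(v1[word] * v2[word] for word in v1.keys() & v2.keys())
    norm1 = math.sqrt(sum(count * count for count in v1.values()))
    norm2 = math.sqrt(sum(count * count for count in v2.values()))
    if norm1 == 0 or norm2 == 0:
        return 0.0  # An empty sentence shares nothing with any other.
    return dot / (norm1 * norm2)


print(bow_cosine_similarity("the cat sat", "the cat ran"))
```

This baseline ignores word order and synonyms, so it mainly serves as a floor against which embedding-based methods can be judged.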
33
votes
1 answer
Ways to deal with longitude/latitude feature
I am working on a fictional dataset with 25 features. Two of the features are latitude and longitude of a place and others are pH values, elevation, windSpeed etc with varying ranges. I can perform normalization on the other features but how do I…
AllThingsScience
- 443
- 1
- 4
- 5
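One frequently used way to handle the latitude/longitude pair described above is to map it onto a point on the unit sphere, so that longitudes near +180 and -180 end up close together. The sketch below assumes coordinates in degrees and a spherical Earth; the function name is hypothetical and this is only one of several possible encodings.

```python
import math


def latlon_to_unit_xyz(lat_deg, lon_deg):
    """Map latitude/longitude in degrees to (x, y, z) on the unit sphere."""
    lat = math.radians(lat_deg)
    lon = math.radians(lon_deg)
    x = math.cos(lat) * math.cos(lon)
    y = math.cos(lat) * math.sin(lon)
    z = math.sin(lat)
    return x, y, z


print(latlon_to_unit_xyz(0.0, 0.0))  # (1.0, 0.0, 0.0)
```

The three outputs already lie in [-1, 1], so they need no further range normalization alongside the other features.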