Most Popular
1500 questions
8
votes
1 answer
sklearn SimpleImputer too slow for categorical data represented as string values
I have a data set with categorical features represented as string values and I want to fill in the missing values. I’ve tried sklearn’s SimpleImputer, but it takes much longer than pandas to complete the task. Both methods produce…
vlc146543
- 83
- 1
- 4
8
votes
1 answer
TensorFlow / Keras: What is stateful = True in LSTM layers?
Could you elaborate on this argument? I found the brief explanation from the docs unsatisfying:
stateful: Boolean (default False). If True, the last state for each sample at index i in a batch will be used as initial state for the sample of index i…
Leevo
- 6,005
- 3
- 14
- 51
8
votes
3 answers
What is the correct way to call Keras flow_from_directory() method?
The following article instructs that the dataset be divided into train, validation and test folders, where the test folder should not contain the labeled subfolders. Instead it should only contain a single folder (i.e.…
Tauno
- 739
- 2
- 9
- 8
8
votes
2 answers
NLP: variations of a text without modifying its meaning
I am currently working on the automation of recurring reports (weekly 30-50 page reports for around 100 districts). Those reports have a mostly fixed form: maps, graphs, data tables and small zones of text.
Apart from some discussion around colors…
Lucas Morin
- 2,513
- 5
- 19
- 39
8
votes
3 answers
Pivoting a two-column feature table in Pandas
How can I transform the following DataFrame into one with cities as rows and each cuisine as a column, and 1 or 0 as values (1 if the city has that kind of cuisine)?
I think this turns out to be a very common problem in transforming data into…
blue-dino
- 383
- 2
- 3
- 11
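The transformation described in the question above can be sketched with `pd.crosstab`. This is only an illustration, not an answer taken from the thread: the original DataFrame is not shown in the excerpt, so the column names `city` and `cuisine` and the sample rows are assumptions.

```python
import pandas as pd

# Hypothetical sample data; the DataFrame from the question is not shown here.
df = pd.DataFrame({
    "city": ["Paris", "Paris", "Rome", "Tokyo"],
    "cuisine": ["French", "Italian", "Italian", "Japanese"],
})

# Count each (city, cuisine) pair, then cap at 1 so repeated pairs still yield 1.
indicator = pd.crosstab(df["city"], df["cuisine"]).clip(upper=1)
print(indicator)
```

The result has one row per city and one column per cuisine, with 1 where the pair occurs and 0 elsewhere.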
8
votes
3 answers
How to combine GridSearchCV with Early Stopping?
I'm a beginner in machine learning and want to train a CNN (for image recognition) with optimized hyperparameters such as dropout rate, learning rate and number of epochs.
I am trying to find the optimal hyperparameters via GridSearchCV from scikit-learn.
I…
Code Now
- 393
- 5
- 10
8
votes
3 answers
How to find similarity between different factors in a dataset
Introduction
Let's say I have a dataset of different observations of different people and I want to group people together to know which person is closest to which other. I also want to have a measure to know how close they are to each other and…
zipp
- 183
- 1
- 4
8
votes
2 answers
Data anonymization in Python
I am working on an industrial project which consists of real data. The data contains sensitive information about company operations which cannot be disclosed publicly. As a result, I need to anonymize the original data first before…
Muhammad Ali
- 2,437
- 5
- 19
- 22
8
votes
1 answer
Why is word prediction an obsession in Natural Language Processing?
I have heard how great BERT is at masked word prediction, i.e. predicting a missing word from a sentence.
In a Medium post about BERT, it says:
The basic task of a language model is to predict words in a blank, or it predicts the probability that a…
SamR
- 183
- 1
- 5
8
votes
1 answer
Difference between Gensim word2vec and keras Embedding layer
I have used the gensim word2vec package and the Keras Embedding layer for various projects. Then I realized they seem to do the same thing: they both try to convert a word into a feature vector.
Am I understanding this properly? What exactly is the…
Edamame
- 2,705
- 5
- 23
- 32
8
votes
2 answers
Best way to store large data set using R from Twitter?
I am working on a project that aims to retrieve a large data set (i.e., tweet data which is a couple of days old) from Twitter using the twitteR library in R. I have difficulty storing tweets because my machine has only 8 GB of memory. It ran out of…
Digital Dude
- 181
- 1
8
votes
2 answers
Can a decision tree learn to solve an XOR problem?
I have read online that decision trees can solve XOR-type problems, as shown in images (XOR problem: 1) and (Possible solution as decision tree: 2).
My question is how can a decision tree learn to solve this problem in this scenario. I just don't…
lguerra
- 83
- 1
- 5
8
votes
3 answers
Algorithm for segmentation of sequence data
I have a large sequence of vectors of length N. I need some unsupervised learning algorithm to divide these vectors into M segments.
For example:
K-means is not suitable, because it puts similar elements from different locations into a single…
generall
- 273
- 1
- 11
8
votes
1 answer
How to check whether all values in a particular column have the same data type?
I have a column 'ABC' which has 5000 rows. Currently, the dtype of the column is object. It mostly holds string values, but some values are not strings. I want to find all those rows and modify them. The column looks as follows:
1 abc
2 def
3 ghi
4 23
5…
Kiran
- 195
- 1
- 1
- 5
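One way to locate the non-string entries described in the question above is to test each stored Python object with `isinstance`. The snippet is a minimal sketch, not an answer from the thread: the four sample values mirror the visible part of the excerpt, and casting the offending values to `str` is only an assumed example of "modifying" them.

```python
import pandas as pd

# Hypothetical column mirroring the excerpt: mostly strings, one integer.
df = pd.DataFrame({"ABC": ["abc", "def", "ghi", 23]})

# Boolean mask: True where the stored Python object is not a str.
not_str = df["ABC"].map(lambda v: not isinstance(v, str))

# Rows whose value is not a string.
print(df[not_str])

# Assumed fix for illustration: convert the offending values to strings.
df.loc[not_str, "ABC"] = df.loc[not_str, "ABC"].astype(str)
```

The mask can also drive any other change, for example dropping those rows with `df[~not_str]`.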
8
votes
2 answers
Visualize a horizontal box plot in R
I have a dataset like this. The data has been collected through a questionnaire and I am going to do some exploratory data analysis.
windows <- c("yes", "no","yes","yes","no")
sql <- c("no","yes","no","no","no")
excel <-…
Hamideh
- 920
- 2
- 11
- 22