Questions tagged [parallel]

39 questions
27
votes
4 answers

Is there a straightforward way to run pandas.DataFrame.isin in parallel?

I have a modeling and scoring program that makes heavy use of the DataFrame.isin function of pandas, searching through lists of facebook "like" records of individual users for each of a few thousand specific pages. This is the most time-consuming…
Therriault
  • 871
  • 1
  • 8
  • 13
16
votes
3 answers

Parallel and distributed computing

What are the differences between parallel and distributed computing? When it comes to scalability and efficiency, it is very common to see solutions dealing with computations in clusters of machines, and sometimes it is referred to as a…
Rubens
  • 4,097
  • 5
  • 23
  • 42
14
votes
1 answer

Make Keras run on a multi-machine, multi-core CPU system

I'm working on a Seq2Seq model using LSTM from Keras (with the Theano backend), and I would like to parallelize the processes, because even a few MBs of data need several hours for training. It is clear that GPUs are far better at parallelization…
chmodsss
  • 1,954
  • 2
  • 17
  • 37
12
votes
3 answers

Instances vs. cores when using EC2

Working on what could often be called "medium data" projects, I've been able to parallelize my code (mostly for modeling and prediction in Python) on a single system across anywhere from 4 to 32 cores. Now I'm looking at scaling up to clusters on…
Therriault
  • 871
  • 1
  • 8
  • 13
11
votes
3 answers

What needs to be done to make n_jobs work properly in sklearn, in particular with ElasticNetCV?

The constructor of sklearn.linear_model.ElasticNetCV takes n_jobs as an argument. Quoting the documentation here: "n_jobs: int, default=None. Number of CPUs to use during the cross validation. None means 1 unless in a joblib.parallel_backend context."…
11
votes
1 answer

GPU Accelerated Data Processing for R in Windows

I'm currently taking a paper on Big Data which has us utilising R heavily for data analysis. I happen to have a GTX 1070 in my PC for gaming reasons. Thus, I thought it would be really cool if I could use that to speed up some of the processing for…
Jesse Maher
  • 113
  • 1
  • 5
4
votes
1 answer

Parallel Q-learning

I'm looking for academic papers or other credible sources focusing on the topic of parallelized reinforcement learning, specifically Q-learning. I'm mostly interested in methods of sharing a Q-table between processes (or joining/syncing them together…
4
votes
1 answer

Open source solver for large mixed integer programming task?

I'm currently using General Algebraic Modeling System (GAMS), and more specifically CPLEX within GAMS, to solve a very large mixed integer programming problem. This allows me to parallelize the process over 4 cores (although I have more, CPLEX…
rnorberg
  • 203
  • 2
  • 7
4
votes
2 answers

Parallel active optimization

I'm trying to optimize an expensive function for which I can choose sample points. The difficulty is that many function evaluations may be computed in parallel, taking varying amounts of time. I don't know which keywords to search for to find…
Mark
  • 213
  • 1
  • 6
2
votes
1 answer

What makes a graph algorithm a good candidate for concurrency?

GraphX is the Apache Spark library for handling graph data. I was able to find a list of 'graph-parallel' algorithms on these slides (see slide 23). However, I am curious what characteristics of these algorithms make them parallelizable.
sheldonkreger
  • 1,169
  • 8
  • 20
2
votes
1 answer

Can parallel computing be utilized for boosting?

Since boosting is sequential, does that mean we cannot use multi-processing or multi-threading to speed it up? If my computer has multiple CPU cores, is there any way to utilize these extra resources in boosting?
Indominus
  • 155
  • 6
2
votes
0 answers

What should be the value of parallel iterations in TensorFlow RNN implementations?

tf.nn.dynamic_rnn() and tf.nn.raw_rnn() take in an argument called parallel_iterations. The documentation says: parallel_iterations: (Default: 32). The number of iterations to run in parallel. Those operations which do not have any temporal…
figs_and_nuts
  • 775
  • 1
  • 4
  • 13
2
votes
1 answer

Parallel Data preprocessing

I am looking for a suggestion. Is it possible to implement data preprocessing steps like missing value imputation, outlier detection, normalization, and label encoding in parallel? Can I implement CUDA/OpenMP/MPI programming for data…
Encipher
  • 359
  • 1
  • 9
1
vote
2 answers

How to load and run feature selection on a dataset with 5,000 samples and 500,000 features?

I have a dataset with 5000 samples and 500,000 features (all categorical with a cardinality of 3). Two problems I'm trying to solve: Loading the dataset - I can't load it in memory despite using a computing cluster, so I'm assuming I should use a…
1
vote
0 answers

PyTorch Distributed Computing - Recommendations/Resources/Courses?

I would like to get into some distributed computing for processing PyTorch CNN models. I am completely fresh in this field and want to get some recommendations as to where I should start researching and learning techniques in distributed computing…