
I understand Hadoop MapReduce and its features but I am confused about R MapReduce.

One difference I have read about is that R works in memory and uses as much RAM as it can, so to do parallel processing R is integrated with Hadoop.

My questions are:

  1. R can already do statistics, math, and data science work, so why is R MapReduce needed?
  2. Is there any new task I can achieve with R MapReduce that I cannot achieve with Hadoop MapReduce? If yes, please specify.
  3. We can achieve a task by using R with Hadoop directly, so what is the importance of MapReduce in R, and how is it different from normal MapReduce?
Air
user3782364

1 Answer


RHadoop (the part you are interested in is now called rmr2) is simply a client API for MapReduce written in R. You invoke MapReduce through the R package's API and send an R function to the workers, where a local R interpreter executes it. Otherwise it is exactly the same MapReduce.

You can call anything you like in R this way, but no R function is itself parallelized to run on MapReduce. The point is simply that you can invoke M/R from R. I don't think it lets you do anything more magical than that.
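To make the idea concrete, here is a minimal sketch of "the framework stays the same, only the user-supplied function changes". It is plain Python, not the rmr2 or Hadoop API: `run_mapreduce`, `word_map`, and `word_reduce` are hypothetical names made up for illustration, and the "workers" are simulated sequentially in a single process.

```python
from collections import defaultdict


def run_mapreduce(records, map_fn, reduce_fn, n_workers=2):
    """Toy map -> shuffle -> reduce engine (hypothetical, single process)."""
    # Split the input into one chunk per simulated worker.
    chunks = [records[i::n_workers] for i in range(n_workers)]

    # Map phase: every worker applies the user-supplied function
    # to each record in its own chunk.
    mapped = []
    for chunk in chunks:
        for record in chunk:
            mapped.extend(map_fn(record))

    # Shuffle phase: group all emitted values by key.
    groups = defaultdict(list)
    for key, value in mapped:
        groups[key].append(value)

    # Reduce phase: apply the user-supplied reducer once per key.
    return {key: reduce_fn(key, values) for key, values in groups.items()}


# User-supplied functions: the only part the client actually writes.
def word_map(line):
    return [(word.lower(), 1) for word in line.split()]


def word_reduce(key, values):
    return sum(values)


lines = ["R on Hadoop", "Hadoop MapReduce", "R MapReduce"]
counts = run_mapreduce(lines, word_map, word_reduce)
```

The engine knows nothing about word counting. It only ships `word_map` and `word_reduce` to the data, and in the same sense an R client hands an R function to an otherwise unchanged MapReduce job. Nothing inside `word_map` becomes parallel on its own; the parallelism comes from running it on many chunks.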

Sean Owen
  • Can we do regression, clustering, and classification using rmr? If that is possible, then we could also do it using R directly, so the only reason to use rmr is parallelism, if I am correct. Is there any main difference between Hadoop MapReduce and R MapReduce (apart from parallelism)? – user3782364 Jun 28 '14 at 16:21
  • rmr is a framework for running R functions in MapReduce. That is the thing it lets you do that you could not do before. It is not a library of new statistical functions of course. – Sean Owen Jun 28 '14 at 17:32
  • Then is there any other specific feature that Hadoop MapReduce can't handle? – user3782364 Jun 29 '14 at 11:32