Questions tagged [r]

R is a free, open-source programming language and software environment for statistical computing, bioinformatics, and graphics.

R is a language and environment for statistical computing and graphics. It is a GNU project which is similar to the S language and environment which was developed at Bell Laboratories (formerly AT&T, now Lucent Technologies) by John Chambers and colleagues. R can be considered as a different implementation of S. There are some important differences, but much code written for S runs unaltered under R.

R provides a wide variety of statistical (linear and nonlinear modelling, classical statistical tests, time-series analysis, classification, clustering, ...) and graphical techniques, and is highly extensible. The S language is often the vehicle of choice for research in statistical methodology, and R provides an Open Source route to participation in that activity.

One of R's strengths is the ease with which well-designed publication-quality plots can be produced, including mathematical symbols and formulae where needed. Great care has been taken over the defaults for the minor design choices in graphics, but the user retains full control.

R was created by Ross Ihaka and Robert Gentleman and is now developed by the R Development Core Team. The R environment is easily extended through a packaging system on CRAN.

R is available as Free Software under the terms of the Free Software Foundation's GNU General Public License in source code form. It compiles and runs on a wide variety of UNIX platforms and similar systems (including FreeBSD and Linux), Windows and Mac OS.

1480 questions
132
votes
1 answer

How to get correlation between two categorical variables and between a categorical variable and a continuous variable?

I am building a regression model and I need to calculate the following to check for correlations: correlation between two multi-level categorical variables; correlation between a multi-level categorical variable and a continuous variable; VIF (variance…
GeorgeOfTheRF
  • 2,018
  • 5
  • 17
  • 20
127
votes
14 answers

Python vs R for machine learning

I'm just starting to develop a machine learning application for academic purposes. I'm currently using R and training myself in it. However, in a lot of places, I have seen people using Python. What are people using in academia and industry, and…
user721
  • 159
  • 2
  • 3
  • 3
61
votes
10 answers

IDE alternatives for R programming (RStudio, IntelliJ IDEA, Eclipse, Visual Studio)

I use RStudio for R programming. I remember solid IDEs from other technology stacks, like Visual Studio or Eclipse. I have two questions: What IDEs other than RStudio are used (please consider providing a brief description of them)? Does…
IgorS
  • 5,444
  • 11
  • 31
  • 43
55
votes
9 answers

Is the R language suitable for Big Data?

R has many libraries aimed at data analysis (e.g. JAGS, BUGS, ARULES, etc.), and is mentioned in popular textbooks such as J. Kruschke, "Doing Bayesian Data Analysis" and B. Lantz, "Machine Learning with R". I've seen a guideline of 5TB for a…
akellyirl
  • 723
  • 1
  • 6
  • 9
36
votes
7 answers

Organized processes to clean data

From my limited dabbling with data science using R, I realized that cleaning bad data is a very important part of preparing data for analysis. Are there any best practices or processes for cleaning data before processing it? If so, are there any…
Jay Godse
  • 461
  • 5
  • 7
33
votes
3 answers

Hypertuning XGBoost parameters

XGBoost has been doing a great job when it comes to dealing with both categorical and continuous dependent variables. But how do I select the optimized parameters for an XGBoost problem? This is how I applied the parameters for a recent Kaggle…
Dawny33
  • 8,226
  • 12
  • 47
  • 104
30
votes
4 answers

Is pandas now faster than data.table?

Here is the GitHub link to the most recent data.table benchmark. The data.table benchmark has not been updated since 2014. I heard somewhere that Pandas is now faster than data.table. Is this true? Has anyone done any benchmarks? I have never used…
xiaodai
  • 620
  • 1
  • 5
  • 12
29
votes
10 answers

Any Online R console?

I am looking for an online console for the R language, where I write the code and the server executes it and returns the output, similar to the DataCamp website.
Gotham
  • 291
  • 1
  • 3
  • 3
28
votes
5 answers

VM image for data science projects

There are numerous tools available for data science tasks, and it's cumbersome to install everything and build up a perfect system. Is there a Linux/Mac OS image with Python, R and other open-source data science tools installed and available for…
JeanVuda
  • 421
  • 4
  • 6
26
votes
2 answers

Removing strings after a certain character in a given text

I have a dataset like the one below. I would like to remove all characters after the character ©. How can I do that in R? data_clean_phrase <- c("Copyright © The Society of Geomagnetism and Earth", "© 2013 Chinese National Committee…
Hamideh
  • 920
  • 2
  • 11
  • 22
26
votes
7 answers

Is Python a viable language to do statistical analysis in?

I originally came from R, but Python seems to be the more common language these days. Ideally, I would do all my coding in Python as the syntax is easier and I've had more real life experience using it - and switching back and forth is a pain. Out…
confused
  • 488
  • 4
  • 10
25
votes
4 answers

How to predict probabilities in xgboost using R?

The predict function below also returns negative values, so its output cannot be probabilities. param <- list(max.depth = 5, eta = 0.01, objective="binary:logistic",subsample=0.9) bst <- xgboost(param, data = x_mat, label = y_mat,nround = 3000) pred_s <-…
25
votes
4 answers

Is there any data tidying tool for python/pandas similar to R tidyr tool?

I'm working on a Kaggle challenge where some variables are represented by rows instead of columns (Telstra Network Disruption). I am currently searching for the equivalent of gather(), separate() and spread(), which can be found in R's tidyr package.
cpumar
  • 807
  • 1
  • 9
  • 14
21
votes
6 answers

What do you use to generate a dashboard in R?

I need to generate periodic (daily, monthly) web analytics dashboard reports. They will be static and don't require interaction, so imagine a PDF file as the target output. The reports will mix tables and charts (mainly sparkline and bullet graphs…
aiolias
20
votes
5 answers

Do modern R and/or Python libraries make SQL obsolete?

I work in an office where SQL Server is the backbone of everything we do, from data processing to cleaning to munging. My colleague specializes in writing complex functions and stored procedures to methodically process incoming data so that it can…
AffableAmbler
  • 363
  • 1
  • 2
  • 10