Questions tagged [dplyr]

Use this tag for questions relating to functions from the dplyr package, such as group_by, summarize, filter, and select.

The dplyr package for R is the next iteration of the plyr package. It has three main goals:

  1. Identify the most important data manipulation tools needed for data analysis and make them easy to use from R.
  2. Provide fast performance for in-memory data by writing key pieces in C++.
  3. Use the same interface to work with data no matter where it's stored, whether in a data.frame, a data.table or a database.
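
As a rough illustration of the verbs named in the tag description (filter, group_by, summarize, select), here is a minimal sketch on an in-memory data.frame. The `sales` data and its column names are invented for this example and do not come from any question on this page.

```r
library(dplyr)

# Hypothetical data, made up purely for illustration
sales <- data.frame(
  brand  = c("b1", "b1", "b2", "b2", "b2"),
  media  = c("tv", "radio", "tv", "tv", "radio"),
  amount = c(10, 0, 5, 20, 15)
)

result <- sales %>%
  filter(amount > 0) %>%                        # keep rows with a non-zero amount
  group_by(brand) %>%                           # one group per brand
  summarise(total = sum(amount), n = n()) %>%   # collapse each group to a single row
  select(brand, total, n)                       # choose and order the output columns

print(result)
```

Each verb takes a data frame and returns a data frame, which is why the steps chain with `%>%`.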

22 questions
3
votes
1 answer

Weighted mean with summarise_at dplyr

I strictly need to use summarise_at to compute a weighted mean, with weights based on the values of another column: df %>% summarise_at(.vars = vars(FACTOR,tv:`smart tv/console`), .funs = weighted.mean, w=INVESTMENT,…
3nomis
2
votes
1 answer

How to split data in R using dplyr so that rows of the same group belong to the same split?

In my current pipeline, I have sensed that there is data leakage. This is because the same person, though with slightly different values, is in both the training and testing sets. As a result, my model is overfitting. For example, my data looks like this: PID …
Dee
2
votes
2 answers

Divide a column by itself with mutate_at dplyr

Hi, I'd like to turn each non-zero value of my selected columns into a 1 using mutate_at():

BRAND  MEDIA_TYPE  INV1  INV2
b1     newspapers  2     27
b1     magazines   3     0
b2     …
3nomis
2
votes
3 answers

R summarise with condition

I have customer data with the products they purchased and the purchase date. I want to extract a result that shows each customer and the first two fruits they purchased. My actual set has 90000 rows with 9000 unique customers. I have tried groupby…
nut get
2
votes
0 answers

Group_by field is not showing in the summarise output in R

In R, using the dplyr package, I tried the function "summarise" and I expected the result to show along with the group_by field. However, all of a sudden I see summarized output without the group_by field, which makes the results meaningless. Any one any…
1
vote
1 answer

Flag consecutive dates by group

Below is an example of my data (Room and Date). I would like to generate variables Goal1, Goal2 and Goal3. A gap in the Date variable means that the room was closed. My goal is to identify consecutive dates by room. Room …
Mar355
1
vote
2 answers

Sort a data frame column based on another sorted column value in R

I have a data frame that is sorted on one numeric column to assign the rank. Where this column's value is zero, those rows should instead be arranged by another, character column. But…
Sddr
1
vote
1 answer

Find the mode value and frequency in R

I'm trying to come up with a function in R that gives the mode value of a column along with the number of times (or frequency) that the value occurs. I want it to exclude missing (or blank) values, and treat ties by showing both values. When there…
DataGuy23
1
vote
1 answer

Group_by 2 variables and pivot_wider distribution based on 2 others

I'm performing some calculations on a data frame and am stuck trying to calculate a few percentages. I'm trying to append 3 additional columns for %POS/NEG/NEU, e.g., the sum of the amount column for all observations with POS Direction in both Drew & A/total sum…
DataGuy23
1
vote
2 answers

R: Producing multiple plots (ggplot, geom_point) from a single CSV with multiple subcategories

I have a collection of bacteria data from approximately 140 monitoring locations in California. I would like to produce a scatterplot for each monitoring location with the Sampling Date on the Y-axis and the Bacteria Data on the X-axis. The Sampling…
Kota_K
1
vote
2 answers

Mutate with dynamic column names dplyr

Hi, I have this dataset (it has many more columns):

media  brand  radio  tv  cinema
radio  0      0      0   0
tv     0      0      …
3nomis
1
vote
4 answers

Mutate with custom function in R does not work

I have a data frame containing a column called "Frequency". Frequency has values like "Year", "Week", "Month" etc. Now I want to create a new column based on the Frequency column where Year's new corresponding value will be 1, Month's…
Nuibb
1
vote
2 answers

Theoretical Question: Data.table vs Data.frame with Big Data

I know that I can read in a very large csv file much faster with fread using the data.table library than with read.csv that reads a file in as a data.frame. However, dplyr can only perform operations on data.frame. My questions are: Why was…
Bear
1
vote
0 answers

More efficient way to create frequency column based on different groupings

I have code below that calculates a frequency for each column element (relative to its own column) and adds all five frequencies together in a column. The code works but is very slow, and the majority of the processing time is spent on this…
Curt
0
votes
1 answer

How to add a column for descending row numbers into dataset in R

I am new to R and would like to insert a new column that numbers the rows of a large dataset. I have no idea how to use 'mutate()' to insert this. Would appreciate any help. Thanks.
BioIsaac