dplyr – IT Nursery

How to interpret dplyr message `summarise()` regrouping output by ‘x’ (override with `.groups` argument)?

June 4, 2022 by IT Nursery

I started getting a new message (see post title) when running group_by() and summarise() after updating to dplyr development version 0.8.99.9003. Here is an example to recreate the output: library(tidyverse) library(hablar) df <- read_csv("year, week, rat_house_females, rat_house_males, mouse_wild_females, mouse_wild_males 2018,10,1,1,1,1 2018,10,1,1,1,1 2018,11,2,2,2,2 2018,11,2,2,2,2 2019,10,3,3,3,3 2019,10,3,3,3,3 2019,11,4,4,4,4 2019,11,4,4,4,4") %>% convert(chr(year,week)) %>% mutate(total_rodents = rowSums(select_if(., is.numeric))) %>% …
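Here is a minimal sketch of what the message means, assuming dplyr 1.0.0 or later. When summarise() runs on data grouped by more than one variable, it drops only the last grouping variable by default ("drop_last") and prints this message to say so. Passing `.groups` explicitly makes the choice visible and silences the message. The small `counts` tibble below is made up and stands in for the rodent data in the question.

```r
library(dplyr)

# Made-up counts standing in for the rodent data in the question
counts <- tibble(
  year  = c("2018", "2018", "2019", "2019"),
  week  = c("10", "11", "10", "11"),
  total = c(4, 8, 12, 16)
)

# .groups = "drop": return an ungrouped tibble, no message
by_year_week <- counts %>%
  group_by(year, week) %>%
  summarise(total = sum(total), .groups = "drop")

# .groups = "drop_last": the default behaviour, stated explicitly so no message
# is printed; only the last grouping variable ('week') is removed
kept <- counts %>%
  group_by(year, week) %>%
  summarise(total = sum(total), .groups = "drop_last")

print(by_year_week)
print(group_vars(kept))
```

Use `"drop"` when you want a plain tibble back, or `"keep"` to retain every grouping variable.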

Fixing multiple “unknown column” warnings

June 1, 2022 by IT Nursery

I keep getting repeated “unknown column” warnings for all kinds of commands (from str(x) to installing package updates), and I’m not sure how to debug or fix this. The warning is clearly related to a variable in a tbl_df that I renamed, but it comes up in all kinds …

Can dplyr package be used for conditional mutating?

May 28, 2022 by IT Nursery

Can mutate() be used when the mutation is conditional (depending on the values in certain columns)? This example helps show what I mean. structure(list(a = c(1, 3, 4, 6, 3, 2, 5, 1), b = c(1, 3, 4, 2, 6, 7, 2, 6), c = c(6, 3, 6, 5, 3, 6, 5, 3), …
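A minimal sketch, assuming the goal is a new column whose value depends on other columns. `if_else()` covers a single yes/no condition, and `case_when()` chains several conditions, taking the first one that is TRUE. The excerpt truncates both the data and the actual rule, so the tibble below keeps only the three columns shown and the conditions are made up for illustration.

```r
library(dplyr)

# Columns a, b, c as shown in the question; the rules below are hypothetical
df <- tibble(
  a = c(1, 3, 4, 6, 3, 2, 5, 1),
  b = c(1, 3, 4, 2, 6, 7, 2, 6),
  c = c(6, 3, 6, 5, 3, 6, 5, 3)
)

df <- df %>%
  mutate(
    # if_else(): one condition, two outcomes
    a_gt_b = if_else(a > b, "yes", "no"),
    # case_when(): conditions are checked in order; the first TRUE one wins
    size = case_when(
      a + b > 8 ~ "large",
      a + b > 4 ~ "medium",
      TRUE      ~ "small"
    )
  )

print(df)
```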

Extract a dplyr tbl column as a vector

May 25, 2022 by IT Nursery

Is there a more succinct way to get one column of a dplyr tbl as a vector, from a tbl with database back-end (i.e. the data frame/table can’t be subset directly)? require(dplyr) db <- src_sqlite(tempfile(), create = TRUE) iris2 <- copy_to(db, iris) iris2$Species # NULL That would have been too easy, so collect(select(iris2, Species))[, 1] …
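`pull()` does this in one step: it extracts a single column, by name or by position, as a plain vector. With dbplyr, calling it on a database-backed table runs the query and collects that column, so it should also cover the SQLite case in the question. The sketch below uses a local tibble so it runs without a database.

```r
library(dplyr)

iris_tbl <- as_tibble(iris)

# By name: returns the Species column as a factor vector
species <- iris_tbl %>% pull(Species)

# By position: 1 is the first column (negative values count from the right)
sepal_length <- iris_tbl %>% pull(1)

head(species)
head(sepal_length)
```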

Use dynamic name for new column/variable in `dplyr`

May 24, 2022 by IT Nursery

I want to use dplyr::mutate() to create multiple new columns in a data frame. The column names and their contents should be dynamically generated. Example data from iris: library(dplyr) iris <- as_tibble(iris) I’ve created a function to mutate my new columns from the Petal.Width variable: multipetal <- function(df, n) { varname <- paste("petal", n , …
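One way to do this, assuming dplyr 0.7.0 or later, is to build the name as a string and inject it on the left-hand side of `:=` with `!!`. The helper below is a hypothetical version of the question's function; the excerpt is truncated, so the column body `Petal.Width * n` is an assumption.

```r
library(dplyr)

iris_tbl <- as_tibble(iris)

# Hypothetical helper: builds the column name at run time and injects it
# with !! on the left of := (the body Petal.Width * n is assumed)
multipetal <- function(df, n) {
  varname <- paste0("petal.", n)
  mutate(df, !!varname := Petal.Width * n)
}

# Add petal.2 ... petal.5
for (i in 2:5) {
  iris_tbl <- multipetal(iris_tbl, i)
}

names(iris_tbl)
```

Recent dplyr versions also accept glue-style names, e.g. `mutate(df, "petal.{n}" := Petal.Width * n)`.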

Display / print all rows of a tibble (tbl_df)

May 23, 2022 by IT Nursery

tibble (previously tbl_df) is a version of a data frame created by the dplyr data frame manipulation package in R. It avoids printing very long tables when a data frame is printed by accident. Once a data frame has been wrapped by tibble/tbl_df, is there a command to view the whole data frame though (all the rows and …
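A minimal sketch: the print method for tibbles takes `n` (number of rows to show) and `width` (characters of output), and setting both to `Inf` prints the whole table. `iris` here is only a stand-in for your own data.

```r
library(dplyr)

iris_tbl <- as_tibble(iris)

# n = Inf shows every row; width = Inf shows every column instead of
# summarising the ones that do not fit in the console
print(iris_tbl, n = Inf, width = Inf)
```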

Filter rows which contain a certain string

May 20, 2022 by IT Nursery

I have to filter a data frame, keeping only the rows that contain the string RTB. I’m using dplyr. d.del <- df %>% group_by(TrackingPixel) %>% summarise(MonthDelivery = as.integer(sum(Revenue))) %>% arrange(desc(MonthDelivery)) I know I can use the function filter in dplyr but I don’t know exactly how to tell it to check for the …
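A minimal sketch, assuming the string should be matched anywhere inside TrackingPixel. `grepl()` returns TRUE for values that contain the pattern, and `filter()` keeps those rows; `fixed = TRUE` treats "RTB" as a literal string rather than a regular expression. The data below are made up, with column names taken from the question. `stringr::str_detect(TrackingPixel, "RTB")` is an equivalent alternative.

```r
library(dplyr)

# Made-up data using the column names from the question
df <- tibble(
  TrackingPixel = c("RTB_001", "Display_002", "xRTBx_003", "Search_004"),
  Revenue       = c(10.4, 5.2, 7.9, 3.3)
)

d.del <- df %>%
  # keep rows whose TrackingPixel contains "RTB" anywhere
  filter(grepl("RTB", TrackingPixel, fixed = TRUE)) %>%
  group_by(TrackingPixel) %>%
  summarise(MonthDelivery = as.integer(sum(Revenue))) %>%
  arrange(desc(MonthDelivery))

print(d.del)
```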

data.table vs dplyr: can one do something well the other can’t or does poorly?

April 19, 2022 by IT Nursery

Overview: I’m relatively familiar with data.table, not so much with dplyr. I’ve read through some dplyr vignettes and examples that have popped up on SO, and so far my conclusions are that: data.table and dplyr are comparable in speed, except when there are many (i.e. >10-100K) groups, and in some other circumstances (see benchmarks below) …