data.table vs dplyr: can one do something well the other can’t or does poorly?

Overview: I’m relatively familiar with data.table, not so much with dplyr. I’ve read through some dplyr vignettes and examples that have popped up on SO, and so far my conclusions are that: data.table and dplyr are comparable in speed, except when there are many (i.e. >10-100K) groups, and in some other circumstances (see benchmarks below) …

Grouping functions (tapply, by, aggregate) and the *apply family

Whenever I want to do something “map”py in R, I usually try to use a function in the apply family. However, I’ve never quite understood the differences between them — how {sapply, lapply, etc.} apply the function to the input/grouped input, what the output will look like, or even what the input can be — …

Sort (order) data frame rows by multiple columns

I want to sort a data frame by multiple columns. For example, with the data frame below I would like to sort by column ‘z’ (descending) then by column ‘b’ (ascending): dd <- data.frame(b = factor(c("Hi", "Med", "Hi", "Low"), levels = c("Low", "Med", "Hi"), ordered = TRUE), x = c("A", "D", "A", "C"), y = …

How to make a great R reproducible example

When discussing performance with colleagues, teaching, sending a bug report or searching for guidance on mailing lists and here on Stack Overflow, a reproducible example is often asked for and always helpful. What are …