Understanding exactly when a data.table is a reference to (vs a copy of) another data.table

I’m having a little trouble understanding the pass-by-reference properties of data.table. Some operations seem to ‘break’ the reference, and I’d like to understand exactly what’s happening. When I create a data.table from another data.table (via <-) and then update the new table with :=, the original table is also altered. This is expected, as per: ?data.table::copy and …
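The behaviour described in this question can be reproduced with a minimal sketch. It assumes the data.table package is installed; the names DT, alias and independent are illustrative and do not come from the original post.

```r
library(data.table)

DT <- data.table(a = 1:3)

# Plain assignment does not copy: both names refer to the same table.
alias <- DT
alias[, b := a * 2L]        # := updates in place, so DT gains column b too
"b" %in% names(DT)

# copy() makes an independent table, so later := calls leave DT alone.
independent <- copy(DT)
independent[, c := 0L]
"c" %in% names(DT)
```

The first check should report that DT has picked up column b, while the second should report that DT did not pick up column c.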

How do you delete a column by name in data.table?

To get rid of a column named “foo” in a data.frame, I can do:
    df <- df[-grep('foo', colnames(df))]
However, once df is converted to a data.table object, there is no way to just remove a column. Example:
    df <- data.frame(id = 1:100, foo = rnorm(100))
    df2 <- df[-grep('foo', colnames(df))] # works
    df3 <- data.table(df)
    df3[-grep('foo', …
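For what it's worth, data.table does support dropping a column by name. The sketch below shows two common ways, assuming the data.table package is installed; df4 is an illustrative name that is not in the original post.

```r
library(data.table)

df3 <- data.table(id = 1:100, foo = rnorm(100))

# Non-destructive: return a new table without "foo"; df3 keeps both columns.
df4 <- df3[, !"foo", with = FALSE]

# In place: assigning NULL with := deletes the column by reference.
df3[, foo := NULL]

names(df3)
names(df4)
```

The in-place form avoids copying the table, which matters for large data; the non-destructive form is safer when other code still needs the original columns.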

data.table vs dplyr: can one do something well the other can’t or does poorly?

Overview: I’m relatively familiar with data.table, not so much with dplyr. I’ve read through some dplyr vignettes and examples that have popped up on SO, and so far my conclusions are that: data.table and dplyr are comparable in speed, except when there are many (i.e. >10-100K) groups, and in some other circumstances (see benchmarks below) …