Understanding exactly when a data.table is a reference to (vs a copy of) another data.table

I’m having a little trouble understanding the pass-by-reference properties of data.table. Some operations seem to ‘break’ the reference, and I’d like to understand exactly what’s happening.

When I create a data.table from another data.table (via <-) and then update the new table with :=, the original table is also altered. This is expected, as per:

?data.table::copy
and the Stack Overflow question "Pass by reference: The := operator in the data.table package"

Here’s an example:

library(data.table)

DT <- data.table(a=c(1,2), b=c(11,12))
print(DT)
#      a  b
# [1,] 1 11
# [2,] 2 12

newDT <- DT        # reference, not copy
newDT[1, a := 100] # modify new DT

print(DT)          # DT is modified too.
#        a  b
# [1,] 100 11
# [2,]   2 12
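
For contrast, here is a minimal sketch of the documented way to avoid this, using copy() from ?data.table::copy. The data is the same as above; the name indepDT is only illustrative.

```r
library(data.table)

DT <- data.table(a = c(1, 2), b = c(11, 12))

indepDT <- copy(DT)      # explicit deep copy, independent of DT
indepDT[1, a := 100]     # update by reference touches only the copy

print(DT$a)              # DT keeps its original values
print(indepDT$a)         # only the copy was changed
```

With copy(), the := update should be confined to indepDT, so the behaviour no longer depends on which operations happen to run in between.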

However, if I insert a non-:= based modification between the <- assignment and the := lines above, DT is now no longer modified:

DT = data.table(a=c(1,2), b=c(11,12))
newDT <- DT        
newDT$b[2] <- 200  # new operation
newDT[1, a := 100]

print(DT)
#      a  b
# [1,] 1 11
# [2,] 2 12

So it seems that the newDT$b[2] <- 200 line somehow ‘breaks’ the reference. I’d guess that this invokes a copy somehow, but I would like to understand fully how R is treating these operations, to ensure I don’t introduce potential bugs in my code.
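
One way I can think of to check this guess is to compare memory addresses before and after the assignment. This is only a sketch, assuming data.table's address() function reports the location of the object a name is bound to; the variable names same_before and same_after are mine.

```r
library(data.table)

DT <- data.table(a = c(1, 2), b = c(11, 12))
newDT <- DT                                   # plain assignment, no copy yet

# Do both names point at the same object right after `<-`?
same_before <- identical(address(DT), address(newDT))

newDT$b[2] <- 200                             # base R style replacement

# Do they still point at the same object afterwards?
same_after <- identical(address(DT), address(newDT))

print(c(before = same_before, after = same_after))
```

If the addresses match before and differ after, that would be consistent with the $<- assignment creating a new object, so that the later := only touches newDT.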

I’d very much appreciate if someone could explain this to me.

2 Answers