Why are elementwise additions much faster in separate loops than in a combined loop?
Suppose a1, b1, c1, and d1 point to heap memory, and my numerical code has the following core loop. const int n = 100000; for (int j = 0; j < n; j++) { a1[j] += b1[j]; c1[j] += d1[j]; } This loop is executed 10,000 times via another outer for loop. To speed it … Read more