Cache (pronounced like cash)

A cache is used to speed up access to memory. There are two major ideas that govern the effectiveness of a cache:

  1. When going to memory to fill the cache, get more code or data than is needed.
  2. The hope is that the extra code or data fetched will be used by the next few instructions.

Since it takes so long to make a new request to RAM, it makes sense to fetch a large block of code or data every time RAM is accessed. Of course, if the code or data that is fetched is not used before it is purged from the cache, then it was a waste of time to bring it into the cache. However, computer programs tend to conform to the Locality Principle: the next memory location needed from RAM will be close to the current memory location.

Consider several situations for the locality principle:

  1. A loop without procedure calls.
  2. A procedure that does not call other procedures.
  3. Local variables in a function.
  4. The elements of an array.

In the first two examples, code is being referenced. When the first statement of the loop or procedure is loaded from RAM, it will be placed into the cache, along with many of the following instructions. Possibly all of the instructions in the loop or function will be loaded into the cache during the first read from RAM. Once instructions are in the cache, they can be read much more quickly than if they were in RAM, so the execution cycle is sped up. If the loop or function is large, then it may fill several cache lines. In this case, every time a needed statement is not in the cache, an access to RAM will occur, and another block of instructions will be placed into the cache.

In the second two examples, data is being referenced. It is important to note that all the local variables will be near each other in memory, as will all the elements of an array. When the first local variable is loaded into the cache, several more will be loaded at the same time. The same is true for the array: when a reference is made to one element, several neighboring elements will be fetched into the cache along with it. If there are a lot of local variables, or a large array is being accessed, several cache lines may need to be filled. Whenever a new cache line is filled, there will be a delay while RAM is being accessed and the data is transferred to the cache. Any subsequent references to data in that cache line will be much faster than accessing RAM again.

As a simple example, suppose that a cache line can hold either 4 instructions or 4 data items. Consider the following program segment.

1         int a, b, c, d;
2         int arr[10];
3         a = 0;
4         c = 2;
5         d = 3;
6  top:   if (a >= 10) goto done;
7         b = c + d;
8         c = c + 1;
9         d = b * 3;
10        arr[a] = c + d;
11        a = a + 1;
12        goto top;
13 done:  ;

Let's investigate how this program would get loaded into the cache. Assume that the cache is empty when we start the program.

Some observations about this process:

A question to consider about this process: What would happen if there were only 4 cache lines?