## ECSE 420, Fall 2008

## **Midterm Examination**

1- (Performance bounds- 20 points) We are going to find the average of elements in grid of (n\*n). Each element of this grid may be a mathematical expression. Therefore, every element of this grid requires computation. Based on the Amdhal's law compute the speed up of using K processors in these two following situations:

- a. Each processor has its own private value for holding the sum.
- b. The processors have to use n\_sv shared values to keep tracking of the sum. That is, every processor should sum its result to the shared variable of sum (assume that shared variables are assigned equally; for example, if we have 8 processors and 4 shared variables each shared variable is assigned to two processors.

Note: Use T(K) and T(1) respectively with respect to the running time under K processors and 1 processor:

2-(Communication cost – 15 points) For a machine with the message start-up time of  $T_0$  ns and the asymptotic peak bandwidth of 2 GB/s, if message lengths for reaching the 1/3rd of the peak bandwidth is 540 bytes what is the amount of  $T_0$  and the message length (bytes) for reaching to 2/3rd of the peak bandwidth?

3- (Domain Decomposition– 15 points) You are given a simple 3D Array of integers which consists of  $n^3$  elements. There are p processors in a system. Using the Domain Decomposition technique, you can be able to varyingly parallelize operations of P processors on this 3D array. What are two different ways of decomposition and the concerning computation and computation overhead of each way.

4- (Bus communication requirements – 20 points) Consider a bus-based machine with 4 processors, each running at a 0.5 GIPS and running a workload that consists of: 50% ALU operations, 20% loads, 10% stores and 20% branches. Suppose that the cache miss rate at each processor is1% for instruction cache and 3% for data cache, and that the cache sharing among 2 processors is 40% and zero otherwise. The system bus bandwidth is 7.5GB/s. Assuming that the cache line is 32 bytes large, and a snooping protocol, determine the bandwidth used. How many processors could the bus accommodate?

5- (Cache coherency -10 points) Using the MSI protocol, show the state transitions and bus transaction for the following scenario; for your convenience, we put the 3 states MSI protocol here:



| Processor Action | State in P1 | State in P2 | State in P3 | Bus Action | Data Supplied by |
|------------------|-------------|-------------|-------------|------------|------------------|
| P1 reads u       |             |             |             |            |                  |
| P3 reads u       |             |             |             |            |                  |
| P3 writes u      |             |             |             |            |                  |
| P1 reads u       |             |             |             |            |                  |
| P2 reads u       |             |             |             |            |                  |

6-(Cache coherency – 20 points) classify the misses in the following reference from three processors into categories shown in the attached page. Assume that each processor's cache consists of only a single four-word cache block and that all the cache is initially empty. Moreover, assume that each processor's cache consist of only a single four-word cache block. That is, w0 through w3 fall on the same cache block, as do words w4 through w7.

|    | P1    | P2    | P3    | Miss classification |
|----|-------|-------|-------|---------------------|
| 1  | Ld w0 |       | Ld w2 |                     |
| 2  |       |       | st w2 |                     |
| 3  |       | Ld w1 |       |                     |
| 4  |       | Ld w2 | Ld w7 |                     |
| 5  | Ld w5 |       |       |                     |
| 6  |       | Ld w6 |       |                     |
| 7  |       | st w6 |       |                     |
| 8  | Ld w5 |       |       |                     |
| 9  | Ld w6 |       |       |                     |
| 10 | Ld w2 | Ld w1 |       |                     |
| 11 | St w5 |       |       |                     |
| 12 |       |       | St w2 |                     |
| 13 |       |       | Ld w7 |                     |
| 14 |       |       | Ld w2 |                     |
| 15 | Ld w0 |       |       |                     |

