| Last Name: SAMPLE  | First Name:                                 |
|--------------------|---------------------------------------------|
| Student ID:        | Signature:                                  |
|                    | •                                           |
| Course 304-425B    | Computer Organization and Architecture      |

Final examination

April 18, 2000, 9:00 -- 12:00

Examiner: Prof. V. Hayward; Associate Examiner: Prof. K. Khordoč

### INSTRUCTIONS

- This is a closed book examination. Calculators and up to two sheets of notes are allowed.
- Explain every result concisely when asked. Marks will be given for clear, concise solutions.
- State any assumption required for an answer if it is not clear in the text of the question.
- This exam has 12 pages including this one. It has 7 sections for 24 questions (including a bonus question) indicated by the bullet sign (•). The marks add up to 100.
- Please sign this paper at the top of the page, and write your name and student number legibly there.
- Put your answers in the space provided and keep all the pages together.

### PLEASE NOTE CAREFULLY

- Make sure that the signed paper in its entirety is handed in (along with all signed exam books) at the end of examination.
- Make sure that the answers are put in the space provided; answers in any other location will not be marked.
- You have approximately 180 minutes to complete the exam.

## Section 1: Performance (12 points)

Apply Amdahl's law to compute the speed-up factor for a machine to which an enhancement is added to improve some mode of execution by a factor of 10. This mode is used 50% of the time, measured as a percentage of the original execution time. (4 points)

$$T_{e} = T_{u} \left[ \left( 1 - F_{e} \right) + \frac{F_{e}}{SU_{e}} \right]$$
  

$$SU = \frac{T_{u}}{T_{e}} = \frac{1}{\left( 1 - F_{e} \right) + \frac{F_{e}}{SU_{e}}} = \frac{1}{.5 + \frac{.5}{10}} = \frac{1}{.55} = 1.82$$
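The short sketch below checks the formula above numerically for $F_e = .5$ and $SU_e = 10$. The function name `amdahl_speedup` is illustrative only.

```python
def amdahl_speedup(fraction_enhanced: float, speedup_enhanced: float) -> float:
    """Overall speed-up SU = T_u / T_e when a fraction F_e of the ORIGINAL
    (unenhanced) execution time runs SU_e times faster."""
    # T_e / T_u = (1 - F_e) + F_e / SU_e
    normalized_time = (1.0 - fraction_enhanced) + fraction_enhanced / speedup_enhanced
    return 1.0 / normalized_time


if __name__ == "__main__":
    # Values from the question: F_e = 0.5, SU_e = 10
    print(f"{amdahl_speedup(0.5, 10.0):.2f}")  # prints 1.82
```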

Derive a variant of Amdahl's law to compute the speed-up factor for a machine to which an enhancement is added to improve some mode of execution by a factor of 10. However, in this question the mode is used 50% of the time, measured as a percentage of the *enhanced* execution time. *Hint*: start from the definition of speed-up, $Speed\_up = \frac{ExecTime_{unenhanced}}{ExecTime_{enhanced}}$, in short: $SU = \frac{T_u}{T_e}$. (4 points)

$$T_u = T_e \left[ (1 - F_e) + F_e \, SU_e \right]$$

$$SU = \frac{T_u}{T_e} = \frac{T_e \left[ (1 - F_e) + F_e \, SU_e \right]}{T_e} = (1 - F_e) + F_e \, SU_e = .5 + .5 \times 10 = 5.5$$
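The sketch below is a cross-check of the 5.5 result. It converts $F_e$ (a share of the *enhanced* time) into the equivalent share of the *original* time, about 0.909. Plugging that share into the classical form of Amdahl's law gives the same speed-up. Both helper names are hypothetical.

```python
def speedup_from_enhanced_fraction(f_enhanced_time: float, speedup_enhanced: float) -> float:
    """SU = T_u / T_e when F_e is measured as a fraction of the ENHANCED time."""
    # Undoing the enhancement stretches the F_e part by SU_e:
    # T_u = T_e * [(1 - F_e) + F_e * SU_e]
    return (1.0 - f_enhanced_time) + f_enhanced_time * speedup_enhanced


def fraction_of_original_time(f_enhanced_time: float, speedup_enhanced: float) -> float:
    """Convert F_e (share of enhanced time) into the share of the original time."""
    overall = speedup_from_enhanced_fraction(f_enhanced_time, speedup_enhanced)
    return f_enhanced_time * speedup_enhanced / overall


if __name__ == "__main__":
    su = speedup_from_enhanced_fraction(0.5, 10.0)
    f_orig = fraction_of_original_time(0.5, 10.0)
    # Cross-check with the classical form of Amdahl's law
    classical = 1.0 / ((1.0 - f_orig) + f_orig / 10.0)
    print(su, round(f_orig, 3), round(classical, 6))  # 5.5 0.909 5.5
```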

Assume that we have a Load/Store machine which behaves with a perfect cache as follows:

| Instruction class   | Frequency | Cycles per instruction |
|---------------------|-----------|------------------------|
| ALU ops             | 40%       | 1 clock cycle          |
| Load/Stores         | 30%       | 2 clock cycles         |
| Branches and others | 30%       | 2 clock cycles         |

The machine is modified to add new ALU instructions that have one source operand in memory. These new register-memory instructions have a clock cycle count of 2. The total number of ALU operations, branches, and other instructions remains the same, of course, but the number of loads and stores is divided by two. Is this enhancement worth implementing? (4 points)

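Halving the loads and stores removes $.15 \, IC_{old}$ instructions, each folded into a new register-memory ALU op.

$$\frac{CPU_{time}}{CC} = IC \sum_{i} F_{i} \, CPI_{i}, \qquad \sum_{i} F_{i} = 1$$

$$\text{Old: } \frac{CPU_{time}}{CC} = IC_{old} \left( .4 \times 1 + .3 \times 2 + .3 \times 2 \right) = 1.6 \, IC_{old}$$

| New class | ALU op (reg-reg)     | ALU op (reg-mem)  | Load/Store        | Branches and others |
|-----------|----------------------|-------------------|-------------------|---------------------|
| CPI       | 1                    | 2                 | 2                 | 2                   |
| $F_i$     | $\frac{.4-.15}{.85}$ | $\frac{.15}{.85}$ | $\frac{.15}{.85}$ | $\frac{.3}{.85}$    |

$$\text{New: } IC_{new} = .85 \, IC_{old}, \qquad \frac{CPU_{time}}{CC} = .85 \, IC_{old} \cdot \frac{.25 \times 1 + .15 \times 2 + .15 \times 2 + .3 \times 2}{.85} = 1.45 \, IC_{old}$$

Since $1.45 < 1.6$, the enhancement is worth implementing (speed-up $1.6 / 1.45 \approx 1.10$), provided the clock cycle time CC does not change.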
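The sketch below recomputes CPU time per clock cycle for both instruction mixes, with all counts normalized to the original instruction count. As in the derivation, it assumes that the removed 15% of $IC_{old}$ are loads folded into register-memory ALU ops, and that CC is unchanged. The dictionary keys and function names are hypothetical.

```python
# Instruction mix as (fraction of the ORIGINAL instruction count, CPI).
ORIGINAL_MIX = {
    "alu": (0.40, 1),
    "load_store": (0.30, 2),
    "branch_other": (0.30, 2),
}


def cycles_per_old_instruction(mix):
    """CPU time / (IC_old * CC) = sum of (count / IC_old) * CPI over classes."""
    return sum(fraction * cpi for fraction, cpi in mix.values())


def with_reg_mem_alu(mix, folded, reg_mem_cpi=2):
    """Fold `folded` (share of IC_old) loads into register-memory ALU ops."""
    alu_fraction, alu_cpi = mix["alu"]
    ls_fraction, ls_cpi = mix["load_store"]
    return {
        "alu": (alu_fraction - folded, alu_cpi),
        "alu_reg_mem": (folded, reg_mem_cpi),
        "load_store": (ls_fraction - folded, ls_cpi),
        "branch_other": mix["branch_other"],
    }


if __name__ == "__main__":
    new_mix = with_reg_mem_alu(ORIGINAL_MIX, folded=0.15)
    old_time = cycles_per_old_instruction(ORIGINAL_MIX)  # ~1.6
    new_time = cycles_per_old_instruction(new_mix)       # ~1.45
    new_ic = sum(f for f, _ in new_mix.values())         # ~0.85
    print(f"old {old_time:.2f}, new {new_time:.2f}, "
          f"new CPI {new_time / new_ic:.3f}, speed-up {old_time / new_time:.3f}")
```

Keeping every count relative to $IC_{old}$ avoids renormalizing the $F_i$. The new CPI (about 1.71) is higher than the old one (1.6). The enhancement still wins because fewer instructions execute.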

Timing analysis reveals that the memory cycles in the standard DLX pipeline are the limiting factor for clock cycle time improvement. One design option is to split the memory cycles in an attempt to increase the clock rate. This is often called superpipelining and is illustrated in the diagram below. A complete instruction fetch takes two stages, IA and IB. In the first stage, the memory address is specified; in the second, the instruction is read out. The same technique is applied to the MEM stage, now split into an MA and an MB stage. The new design is fully pipelined. This is symbolically represented by introducing two new pipe registers.



#### SUPERPIPELINED DLX

• Assuming full bypassing/forwarding (including to and from the memory), use the chart below to report the timing diagram for this code. Also note that the branch must stall. Show the branch behavior assuming a delayed branch. Suppose that the jump instructions benefit from branch folding and that there is a hit. (10 points)

| Label | Instruction | Operands  |
|-------|-------------|-----------|
| LOOP: | LW          | R1, 0(R2) |
|       | SW          | R1, 0(R3) |
|       | BEQZ        | R1, OUT   |
|       | ADDI        | R2, R2, 4 |
|       | ADDI        | R3, R3, 4 |
|       | J           | LOOP      |

[Timing chart (answer): rows LW, SW, BEQZ, ADDI, ADDI, J, followed by LW and SW of the next iteration, over the stages IA, IB, ID, EX, MA, MB, WB.]



#### STANDARD DLX PIPELINE

Recall that there are four basic techniques to handle branches in a pipeline like DLX's:

(A) flush (or freeze) a number of instructions after the branch; (B) static prediction, such as "predict-not-taken"; (C) delayed branch, which creates "delay slots"; (D) delayed branch with canceling.

Consider now the following sequence, which computes twice the absolute value of a number in memory:

| #  | Label | Instruction | Operands   | Comment         |
|----|-------|-------------|------------|-----------------|
| 1. |       | LW          | R2, 0(R3)  | load number     |
| 2. |       | SLTI        | R1, R2, 0  | R1 ← 1 if a < 0 |
| 3. |       | BEQZ        | R1, SKIP   | skip if a ≥ 0   |
| 4. |       | SUB         | R2, R0, R2 | negate          |
| 5. | SKIP: | ADD         | R2, R2, R2 | double          |
| 6. |       | SW          | 0(R3), R2  | store back      |

Show the timing of this sequence for the DLX pipeline, assuming full forwarding and bypassing hardware, and assuming that a register read and a write in the same clock cycle implicitly "forward" through the register file (write first, then read). Use the chart to show the timing of the instructions, starting at instruction SLTI, when the branch is taken. Fill in the two blank entries according to the case. (Note: a similar question was given last term; however, this is NOT the same question.)




• Assuming now that the machine can detect hazards, has forwarding hardware, and uses delayed branches (case C), schedule the following code to minimize the stalls. (5 points)

| #   | Label | Instruction | Operands   | Comment                 |
|-----|-------|-------------|------------|-------------------------|
| 1.  | LOOP: | SGT         | R4, R1, R6 | compare R1 with R6      |
| 2.  |       | BNEZ        | R4, OUT    | if R1 > R6              |
| 3.  |       | LW          | R2, 0(R3)  | load number             |
| 4.  |       | SLTI        | R4, R2, 0  | R4 ← 1 if a < 0         |
| 5.  |       | BEQZ        | R4, SKIP   | skip if a ≥ 0           |
| 6.  |       | SUB         | R2, R0, R2 | negate                  |
| 7.  | SKIP: | ADD         | R2, R2, R2 | double                  |
| 8.  |       | ADDI        | R3, R3, 4  | increment pointer       |
| 9.  |       | SW          | 0(R3), R2  | store back              |
| 10. |       | ADDI        | R1, R1, 1  | increment counter       |
| 11. |       | J           | LOOP       | loop back to while test |
| 12. | OUT:  | AND         | R2, R0, R0 | clear R2                |

Answer: move LW above BNEZ so that the number is loaded regardless of the branch outcome, put the SLTI test in the BNEZ delay slot, and fill the remaining slots with the ADDI instructions.

| Label | Instruction | Operands   | Comment                          |
|-------|-------------|------------|----------------------------------|
| LOOP: | SGT         | R4, R1, R6 |                                  |
|       | LW          | R2, 0(R3)  | load number regardless of branch |
|       | BNEZ        | R4, OUT    |                                  |
|       | SLTI        | R4, R2, 0  | test in delay slot               |
|       | ADDI        | R3, R3, 4  | fill with ADDI                   |
|       | BEQZ        | R4, SKIP   |                                  |
|       | ADDI        | R1, R1, 1  | fill delay slot with ADDI        |
|       | SUB         | R2, R0, R2 |                                  |
| SKIP: | ADD         | R2, R2, R2 |                                  |
|       | SW          | 0(R3), R2  |                                  |
|       | J           | LOOP       |                                  |
| OUT:  | AND         | R2, R0, R0 |                                  |


Consider the pipeline below. The integer units can be controlled to carry out any type of integer instruction, and the FP units any type of floating-point operation.



| Label | Instruction | Operands     |
|-------|-------------|--------------|
| LOOP: | LD          | F2, 0(R1)    |
|       | MULTD       | F4, F2, F0   |
|       | LD          | F6, 0(R2)    |
|       | ADDD        | F6, F4, F6   |
|       | SD          | 0(R2), F6    |
|       | ADDI        | R1, R1, #8   |
|       | ADDI        | R2, R2, #8   |
|       | SGTI        | R3, R1, DONE |
|       | BEQZ        | R3, LOOP     |

Consider the code above, which implements the vector operation Y = a \* X + Y, where X and Y are vector arrays.

Assume the latencies are 0 for all integer operations, including loads, 4 for additions, and 6 for the multiplication, regardless of the instruction using the result. The Common Data Bus is written and read in the same cycle and can support multiple data transfers (so there is no structural hazard there).

Use "Mem[0+Reg[R1]]" to denote, for example, the *value* fetched by the first load, "Reg[R1]" to denote the *value* to be held in register R1, and "#8" to denote the value 8.

To illustrate the operation, the table below indicates the status of the pipeline once the instructions of the first iteration have issued (that is, at clock cycle 8), starting from a blank state.

**Instruction Status** (clock cycles spent in each stage)

| Instruction | Issue | Execute | Write Result |
|-------------|-------|---------|--------------|
| LD          | 1     | 2       | 3            |
| MULTD       | 2     | 3-7     | 8            |
| LD          | 3     | 4       | 5            |
| ADDD        | 4     | 8-10    | 11           |
| SD          | 5     | 6       | ??           |
| ADDI        | 6     | 7       | 8            |
| ADDI        | 7     | 8       | 9            |
| SGTI        | 8     | 9       | 10           |


• Indicate in the table below the status of the registers.

**Register Status**

| Field | F0 | F2  | F4 | F6   | R1 | R2   | R3   |
|-------|----|-----|----|------|----|------|------|
| Qi    |    | LD1 |    | FPU2 |    | INT2 | INT3 |

• Indicate in the table below the state of the store buffers. (1 point)


**Reservation Stations**

| Name | Busy | Op   | Vj      | Vk             | Qj | Qk |
|------|------|------|---------|----------------|----|----|
| FPU2 | Y    | ADDD | Reg[F4] | Mem[0+Reg[R2]] |    | -  |
| INT2 | Y    | ADDI | Reg[R2] | #8             | -  | -  |

**Store Buffers**

| Field   | Store 1   | Store 2 | Store 3 |
|---------|-----------|---------|---------|
| Qi      | FPU2      |         |         |
| Busy    | Y         |         |         |
| Address | 0+Reg[R2] |         |         |

Ignoring the branch delay, now show the new state of the machine one clock cycle later (this means a new load has been issued).

• Indicate in the table below the state of the reservation stations.

• Indicate in the table below the state of the store buffers.

**Reservation Stations**

| Name | Busy | Op    | Vj             | Vk             | Qj   | Qk |
|------|------|-------|----------------|----------------|------|----|
| FPU1 | Y    | MULTD | Mem[0+Reg[R1]] | Reg[F0]        |      |    |
| FPU2 | Y    | ADDD  |                | Mem[0+Reg[R2]] | FPU1 |    |
| INT1 | Y    | ADDI  | Reg[R1]        | #8             |      | -  |
| INT2 | Y    | ADDI  | Reg[R2]        | #8             | -    |    |
| INT3 | Y    | SGTI  |                | DONE           | INT1 |    |

• Indicate in the table below the status of the registers. (2 points)

**Register Status**

| Field | F0 | F2 | F4   | F6   | R1   | R2   | R3   |
|-------|----|----|------|------|------|------|------|
| Qi    |    |    | FPU1 | FPU2 | INT1 | INT2 | INT3 |

Indicate in the table below the state of the reservation stations.

(4 points)

(2 points)

(4 points)

(2 points)

## Section 4: Unrolling (5 points)

Consider a standard FP pipeline as in the mid-term:



Consider again the loop of the previous question:

| Label | Instruction | Operands     |
|-------|-------------|--------------|
| LOOP: | LD          | F2, 0(R1)    |
|       | MULTD       | F4, F2, F0   |
|       | LD          | F6, 0(R2)    |
|       | ADDD        | F6, F4, F6   |
|       | SD          | 0(R2), F6    |
|       | ADDI        | R1, R1, #8   |
|       | ADDI        | R2, R2, #8   |
|       | SGTI        | R3, R1, DONE |
|       | BEQZ        | R3, LOOP     |

• Unroll this loop twice and schedule it for minimal execution time on average when run on the pipeline above. Ignore the branch delay and assume that all branches are correctly predicted. (5 points)

| Unrolled (before scheduling) | Scheduled                |
|------------------------------|--------------------------|
| LOOP: LD F2, 0(R1)           | LOOP: LD F2, 0(R1)       |
| MULTD F4, F2, F0             | LD F8, 8(R1)             |
| LD F6, 0(R2)                 | MULTD F4, F2, F0         |
| ADDD F6, F4, F6              | MULTD F10, F8, F0        |
| SD 0(R2), F6                 | LD F6, 0(R2)             |
| LD F8, 8(R1)                 | LD F12, 8(R2)            |
| MULTD F10, F8, F0            | ADDD F6, F4, F6          |
| LD F12, 8(R2)                | ADDD F12, F10, F12       |
| ADDD F12, F10, F12           | SD 0(R2), F6             |
| SD 8(R2), F12                | SD 8(R2), F12            |
| ADDI R1, R1, #16             |                          |
| ADDI R2, R2, #16             |                          |
| SGTI R3, R1, DONE            |                          |
| BEQZ R3, LOOP                |                          |

Note: start the multiplications as early as possible; the integer code could also be scheduled.
|       |                    |      |   |        |              |              |

## Section 5: Branch predictors (15 points)

Consider this infinite loop and its assembly code translation:

| C code                    | Label | Instruction | Operands  | Comment |
|---------------------------|-------|-------------|-----------|---------|
| a = 1;                    |       | ADDI        | R1, R0, 1 | init a  |
| b = 1;                    |       | ADDI        | R2, R0, 1 | init b  |
| while (1) { /* forever */ | B1:   | BNEZ        | R1, ELSE  |         |
| if (a == 0)               |       | ADDI        | R1, R0, 1 |         |
| a = 1;                    |       | J           | B2        |         |
| else                      | ELSE: | ADDI        | R1, R0, 0 |         |
| a = 0;                    | B2:   | BEQZ        | R1, B3    |         |
| if (a != 0)               |       | ADDI        | R2, R0, 0 |         |
| b = 0;                    | B3:   | BNEZ        | R2, B1    |         |
| if (b == 0)               |       | ADDI        | R2, R0, 1 |         |
| b = 1;                    |       | J           | B1        |         |
| }                         |       |             |           |         |

In the table below, the successive values of a and b are listed. Notice the period of two. The sequence of taken (T) and not taken (N) branch outcomes is also given in the table below.

| a | b | Branch outcome |
|---|---|----------------|
| 1 | 1 | B1 outcome: T  |
| 0 | 1 | B2 outcome: T  |
| 0 | 1 | B3 outcome: T  |
| 0 | 1 | B1 outcome: N  |
| 1 | 1 | B2 outcome: N  |
| 1 | 0 | B3 outcome: N  |
| 1 | 1 | B1 outcome: T  |
| 0 | 1 | B2 outcome: T  |
| … | … | and so on      |
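The sketch below mirrors the C loop and reproduces the branch outcome sequence in the table, which repeats with period two. B2 is treated as a branch-if-zero (BEQZ). The function name `branch_trace` is hypothetical.

```python
def branch_trace(iterations):
    """Return (branch, taken) pairs for the loop, starting from a = b = 1."""
    a, b = 1, 1
    trace = []
    for _ in range(iterations):
        # B1: BNEZ R1, ELSE -- taken when a != 0 (then a = 0, else a = 1)
        taken = a != 0
        trace.append(("B1", taken))
        a = 0 if taken else 1
        # B2: BEQZ R1, B3 -- taken when a == 0, which skips b = 0
        taken = a == 0
        trace.append(("B2", taken))
        if not taken:
            b = 0
        # B3: BNEZ R2, B1 -- taken when b != 0, which skips b = 1
        taken = b != 0
        trace.append(("B3", taken))
        if not taken:
            b = 1
    return trace


if __name__ == "__main__":
    print(" ".join(f"{name}:{'T' if t else 'N'}" for name, t in branch_trace(2)))
    # B1:T B2:T B3:T B1:N B2:N B3:N
```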

A machine has a 2-bit branch predictor mechanism. What is the performance of this predictor while executing this code in the steady state, in terms of correct prediction(s) per iteration? A concise explanation must be given to get the marks. (5 points)

Each branch (B1, B2, B3) has its own two-bit predictor and sees the alternating sequence T, N, T, N, … A two-bit predictor is correct on only one prediction out of two for such a sequence: 50%, that is, 1.5 correct predictions out of 3 per iteration.
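The sketch below models one 2-bit predictor as a saturating counter, one counter per branch, and feeds it the alternating T, N pattern that each branch sees. The saturating-counter variant and its start in the strongly-not-taken state are assumptions. For a saturating counter, the steady state on an alternating pattern depends on the starting state. The function name is hypothetical.

```python
def two_bit_accuracy(outcomes, initial_state=0):
    """Fraction of correct predictions of one 2-bit saturating counter.

    States 0 and 1 predict not taken; states 2 and 3 predict taken.
    """
    state = initial_state
    correct = 0
    for taken in outcomes:
        predicted_taken = state >= 2
        correct += predicted_taken == taken
        # Saturating update: count up on taken, down on not taken
        state = min(state + 1, 3) if taken else max(state - 1, 0)
    return correct / len(outcomes)


if __name__ == "__main__":
    alternating = [True, False] * 50  # T, N, T, N, ... as seen by each branch
    per_branch = two_bit_accuracy(alternating)
    print(per_branch, 3 * per_branch)  # 0.5 1.5 (correct predictions per iteration)
```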

• A machine has a (1,1) correlating branch predictor. What is its performance while executing the same code in the steady state, in terms of correct prediction(s) per iteration? Fill in the table below to get the marks. (5 points)

Prediction bits are written as (prediction if the last branch was not taken)(prediction if the last branch was taken).

| Iteration | Branch | Last branch | Prediction bits | Prediction | Outcome | Update    |
|-----------|--------|-------------|-----------------|------------|---------|-----------|
| 1         | B1     | N           | NN              | N          | T       | Update    |
| 1         | B2     | T           | TT              | T          | T       | No update |
| 1         | B3     | T           | NN              | N          | T       | Update    |
| 2         | B1     | T           | TN              | N          | N       | No update |
| 2         | B2     | N           | TT              | T          | N       | Update    |
| 2         | B3     | N           | NT              | N          | N       | No update |
| 3         | B1     | N           | TN              | T          | T       | No update |
| 3         | B2     | T           | NT              | T          | T       | No update |
| 3         | B3     | T           | NT              | T          | T       | No update |

Average number of correct predictions?

From iteration 3 on, all three branches are predicted correctly: 3 correct predictions per iteration (100%). Each outcome is fully determined by the outcome of the branch executed just before it (B1 is the opposite of the previous B3, B2 equals B1, and B3 equals B2).
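The sketch below replays the table with a simulated (1,1) correlating predictor. It uses one bit of global history and a pair of 1-bit predictors per branch, stored as [if last not taken, if last taken]. The initial bits (NN, TT, NN) come from the table's first rows. A not-taken initial history is an assumption. The function name `correlating_11` is hypothetical.

```python
def correlating_11(trace, initial_bits, initial_history=False):
    """Simulate a (1,1) correlating predictor.

    initial_bits maps each branch to [prediction if last branch NOT taken,
    prediction if last branch taken]; one bit of global history selects the slot.
    Returns a list of (branch, correct) pairs.
    """
    bits = {name: list(pair) for name, pair in initial_bits.items()}
    last_taken = initial_history
    results = []
    for name, taken in trace:
        slot = 1 if last_taken else 0
        results.append((name, bits[name][slot] == taken))
        bits[name][slot] = taken   # 1-bit predictor remembers the last outcome
        last_taken = taken         # global history: outcome of the last branch
    return results


if __name__ == "__main__":
    T, N = True, False
    one_period = [("B1", T), ("B2", T), ("B3", T), ("B1", N), ("B2", N), ("B3", N)]
    start = {"B1": [N, N], "B2": [T, T], "B3": [N, N]}  # bits NN, TT, NN
    results = correlating_11(one_period * 5, start)
    for i in range(0, len(results), 3):
        print(f"iteration {i // 3 + 1}: {sum(ok for _, ok in results[i:i + 3])}/3")
```

Run as a script, it reports 1/3 for iteration 1, 2/3 for iteration 2, and 3/3 for every iteration after that.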

for the code below, supposing that there is a hit in the buffer (that is, predicted taken) but the prediction is incorrect. (5 points)

| #  | Label | Instruction | Operands   | Comment           |
|----|-------|-------------|------------|-------------------|
| 1. |       | SLTI        | R5, R1, 0  | compare R1 with 0 |
| 2. |       | BNEZ        | R5, SKIP   | if R1 >= 0 skip   |
| 3. |       | SUBI        | R1, R0, R1 | negate            |
| 4. | SKIP: | MULT        | R1, R1, R1 | double            |
| 5. |       | SW          | R1, 0(R7)  | store it          |
| 6. |       | AND         | R1, R0, R0 | clear R1          |

| Instruction | 1  | 2  | 3  | 4  | 5   | 6  | 7  | Note    |
|-------------|----|----|----|----|-----|----|----|---------|
| SLTI        | IF | ID | EX | ME | WB  |    |    |         |
| BNEZ        |    | IF | S  | ID | EX  | ME | WB |         |
| MULT        |    |    |    | IF | ID* |    |    | SEND PC |
| SUBI        |    |    |    |    |     | IF |    |         |
| MULT        |    |    |    |    |     |    | IF |         |

## Section 6: Loop level parallelism (15 points)

Consider this loop:

• List all the dependencies (output dependencies, anti-dependencies, and true data dependencies), and indicate for each dependency the pair of statements and whether it is "loop carried". (5 points)

| Dependency type     | Statements | Loop carried? | Note                      |
|---------------------|------------|---------------|---------------------------|
| Output dependencies | S1 - S3    | Loop carried  | $a[i] \rightarrow a[i-1]$ |
| Anti-dependencies   | S3 - S1    | Loop carried  |                           |
| Data dependencies   | S2 - S3    |               |                           |

Rewrite the loop so it becomes parallel. Solve this problem in two different ways:

| Rewritten loop             |
|----------------------------|
| for (i = 1; i < 99; ++i) { |
| b[i] = m + c[i];           |
| a[i] = a[i] + b[i];        |
| a[i] = c[i] + m;           |
| }                          |
| b[m] = m + c[m];           |
| a[m] = a[m] + b[m];        |

## Section 7: Memory Hierarchy (18 points)

A cache system has B blocks of N words and a total storage capacity L (for valid bits, tags, and data) measured in bits. Recall that the degree of associativity A is defined as the number of blocks per set. Assume further that the memory address space is $2^{Z}$ and that the memory is word addressed (each word has W bits). Call H the hit time, M the miss rate, and P the miss penalty, measured in clock cycles.

Consider now this contrived but interesting example (read the whole section before starting). The benchmark test is to visit (read only) all the addresses in the address space exactly once.

• Calculate the AMAT of the cache system for this test as a function of B, N, W, Z, H, and P, starting from a blank cache (all the valid bits are off). In developing the formula, take the case of a direct-mapped cache, or equivalently A = 1, that is, B sets. (6 points)
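The formula can be sketched as follows. The sketch assumes the addresses are visited in increasing order, since the question does not fix the order. It also assumes the hit time H is paid on every access, misses included, which matches the 4 × 2 + 20 count per block in the numerical answer. Under these assumptions, each block of N words costs exactly one compulsory miss and is never needed again after eviction:

$$\text{accesses} = 2^{Z}, \qquad \text{misses} = \frac{2^{Z}}{N}, \qquad M = \frac{2^{Z} / N}{2^{Z}} = \frac{1}{N}$$

$$AMAT = H + M \cdot P = H + \frac{P}{N}$$

B, W, and Z cancel out. With N = 4, H = 2, and P = 20, this gives 2 + 20/4 = 7 clock cycles.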



• Work out the result for B = 16, N = 4, A = 1, Z = 32, W = 32, H = 2, and P = 20. (4 points)

For each block of 4 words there is 1 miss and 3 hits, which costs 4 × 2 + 20 = 28 clock cycles, so the average access time is 28/4 = 7 CC.

Note that these last two questions are independent. You can solve the numerical example by reasoning it out and then derive the formula, or you can develop the formula first and then plug the numbers in.

• Bonus question: Solve the same problems for A = 2. (5 + 5 points)

More associativity does not change the result: each block is still brought in by exactly one miss and then hit for its remaining N - 1 words.
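The sketch below simulates the benchmark on a word-addressed cache with A-way sets. It makes several assumptions: addresses are visited in increasing order, sets use LRU replacement, and Z is reduced from 32 to 12 so the sweep runs quickly. Under these assumptions it gives 7 CC for both A = 1 and A = 2, in line with H + P/N. The function name is hypothetical.

```python
from collections import OrderedDict


def amat_sequential_sweep(address_bits, blocks, words_per_block, associativity,
                          hit_time, miss_penalty):
    """Read every word address 0 .. 2**address_bits - 1 once, in increasing order,
    through a cache of `blocks` blocks with LRU replacement inside each set."""
    num_sets = blocks // associativity
    sets = [OrderedDict() for _ in range(num_sets)]  # tag -> None, in LRU order
    accesses = 2 ** address_bits
    misses = 0
    for address in range(accesses):
        block_number = address // words_per_block
        index = block_number % num_sets
        tag = block_number // num_sets
        lines = sets[index]
        if tag in lines:
            lines.move_to_end(tag)         # hit: becomes most recently used
        else:
            misses += 1
            if len(lines) == associativity:
                lines.popitem(last=False)  # evict the least recently used block
            lines[tag] = None
    # Hit time is charged on every access; each miss adds the penalty P
    return hit_time + misses / accesses * miss_penalty


if __name__ == "__main__":
    # Z reduced from 32 to 12 so the sweep runs quickly (an assumption)
    for a in (1, 2):
        print(a, amat_sequential_sweep(12, blocks=16, words_per_block=4,
                                       associativity=a, hit_time=2, miss_penalty=20))
```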

The CPU requests this sequence of addresses: 3AC54230, A35C2340, and 57BF2344. If there is a miss, indicate a replacement by showing which tag gets changed, and assume the blocks continue to hold the same … In any case, indicate below the values returned to the CPU. (4 points)

- 3AC54230: physical address 3AC5244; index 2; tag 3AC52; miss; return 2.2 or D.D.
- A35C2340: physical address A35C123; index 1; tag A35C1; miss; return 1.1 or E.E.
- 57BF2344: physical address 57BF244; index 2; tag 57BF2; miss; return D.D or 2.2.

[Figure: cache organization, labelled "Write Back, Write Allocate".]

all the bytes in each value. In this case: "8"