# Simulation Proposal: Cache Optimization in Chip Multi-Processors



## 1. Design Options

- Mapping protocol (direct-mapped, set-associative and fully associative) and performance evaluation
- Coherence protocol (MSI, MESI and Dragon) and performance evaluation
- Bus arbitration (Random, LRU and LFU) and performance evaluation
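To make the first design option concrete, the following is a minimal sketch of how the three mapping options differ in where a memory block may be placed. It is not taken from the simulator; the function name `map_block` and the 8-block cache are assumptions chosen for illustration. Direct mapping is treated as 1 way per set, and fully associative mapping as a single set holding every block.

```python
def map_block(block_address: int, cache_blocks: int, ways: int) -> tuple[int, int]:
    """Return (set_index, tag) for a memory block address.

    Hypothetical helper: ways == 1 models direct mapping,
    ways == cache_blocks models fully associative mapping,
    and anything in between is set-associative.
    """
    if ways < 1 or cache_blocks % ways != 0:
        raise ValueError("cache_blocks must be a positive multiple of ways")
    num_sets = cache_blocks // ways
    set_index = block_address % num_sets   # which set the block must go to
    tag = block_address // num_sets        # identifies the block within that set
    return set_index, tag


CACHE_BLOCKS = 8  # assumed cache size in blocks, for illustration only
for name, ways in [("direct", 1),
                   ("2-way set-associative", 2),
                   ("fully associative", CACHE_BLOCKS)]:
    print(name, map_block(13, CACHE_BLOCKS, ways))
```

With direct mapping, block 13 can live in exactly one cache line; with full associativity it can live anywhere, which removes conflict misses at the cost of a wider tag comparison.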

## 2. Simulator

- SMPcache 2.0 was chosen as the simulator. The parameters that can be configured in the simulator are: cache coherence protocol, bus arbitration policy, mapping, replacement policy, cache size (blocks in cache), number of cache sets, number of words per block (memory block size) and word width.
- Because a CMP is a MIMD (Multiple Instruction stream, Multiple Data stream) system, it requires parallel programs running across its processors to take full advantage of the SMP architecture's computational capacity.
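The parameters listed above can be collected in a single configuration record, which makes it easier to keep track of which settings vary between runs. The sketch below is only an illustration: the class `CacheConfig`, its field names and the example values are assumptions, and it does not reflect SMPcache's own configuration format.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CacheConfig:
    """Hypothetical record of one simulation configuration."""
    coherence_protocol: str   # e.g. "MSI", "MESI" or "Dragon"
    bus_arbitration: str      # e.g. "Random", "LRU" or "LFU"
    replacement_policy: str   # replacement policy within a set
    blocks_in_cache: int      # cache size in blocks
    cache_sets: int           # number of cache sets
    words_per_block: int      # memory block size in words
    word_width_bits: int      # word width in bits

    @property
    def ways(self) -> int:
        # 1 way -> direct-mapped; one set -> fully associative
        return self.blocks_in_cache // self.cache_sets

    @property
    def capacity_bytes(self) -> int:
        return self.blocks_in_cache * self.words_per_block * self.word_width_bits // 8


# Example values are illustrative, not the settings used in this project.
config = CacheConfig("MESI", "LRU", "LRU",
                     blocks_in_cache=128, cache_sets=32,
                     words_per_block=16, word_width_bits=32)
print(config.ways, config.capacity_bytes)
```

Deriving the associativity from the block and set counts keeps the mapping option consistent with the cache geometry instead of storing it as a separate, possibly conflicting, field.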

## 3. Workload

Table 1 shows three parallel programs, for two, four and eight processors respectively, which are representative of typical programs in real applications. We keep the problem size constant across all simulation configurations. Specifically, 40,000 memory references of the FFT parallel program were executed in our simulation project.

| Name    | References | Language | Description                                                                               |
|---------|------------|----------|-------------------------------------------------------------------------------------------|
| FFT     | 7,451,717  | Fortran  | Parallel application that simulates the fluid dynamics with FFT                           |
| Simple  | 27,030,092 | Fortran  | Parallel version of the SIMPLE application                                                |
| Weather | 31,764,036 | Fortran  | Parallel version of the WEATHER application, which is used for weather forecasting        |

Table 1. Details of parallel benchmarks

### 4.1 Mapping protocol and performance evaluation



### 4.2 Coherence protocol and performance evaluation



### 4.3 Bus arbitration and performance evaluation



## Conclusion

In this simulation we found that fully associative mapping, the Dragon coherence protocol and LFU bus arbitration had the most beneficial impact on system performance. As the number of processors increases, the total miss rate rises because communication among processors increases, which leads to more coherence misses.
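The link between processor count and coherence misses can be illustrated with a toy model. The sketch below is not the SMPcache simulation and does not use the FFT trace: it assumes a simple write-invalidate scheme with unbounded caches (so only cold and coherence misses occur) and a synthetic workload in which a fixed array of blocks is split evenly among the processors, each of which reads the two neighbouring blocks and writes its own. All names (`stencil_trace`, `count_misses`) and sizes are hypothetical. An update-based protocol such as Dragon would behave differently.

```python
def stencil_trace(num_processors, num_blocks, iterations):
    """Yield (cpu, is_write, block) for a fixed-size synthetic workload."""
    chunk = num_blocks // num_processors  # blocks owned by each processor
    for _ in range(iterations):
        for cpu in range(num_processors):
            for block in range(cpu * chunk, (cpu + 1) * chunk):
                if block > 0:
                    yield cpu, False, block - 1   # read left neighbour
                if block + 1 < num_blocks:
                    yield cpu, False, block + 1   # read right neighbour
                yield cpu, True, block            # write own block


def count_misses(num_processors, trace):
    """Return (references, cold_misses, coherence_misses)."""
    valid = [set() for _ in range(num_processors)]        # blocks currently cached
    invalidated = [set() for _ in range(num_processors)]  # blocks lost to remote writes
    references = cold = coherence = 0
    for cpu, is_write, block in trace:
        references += 1
        if block not in valid[cpu]:
            if block in invalidated[cpu]:
                coherence += 1                # had the block, lost it to a remote write
                invalidated[cpu].discard(block)
            else:
                cold += 1                     # first touch by this processor
            valid[cpu].add(block)
        if is_write:
            # Write-invalidate: remove every other processor's copy.
            for other in range(num_processors):
                if other != cpu and block in valid[other]:
                    valid[other].discard(block)
                    invalidated[other].add(block)
    return references, cold, coherence


for p in (1, 2, 4, 8):
    refs, cold, coh = count_misses(p, stencil_trace(p, 64, 10))
    print(p, refs, cold, coh, f"{(cold + coh) / refs:.3f}")
```

The number of references is the same for every processor count, mirroring the constant problem size, while each additional chunk boundary adds blocks that are written by one processor and read by its neighbour, so coherence misses grow with the number of processors.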

