- Trace Cache: A Low Latency Approach to High Bandwidth Instruction Fetching
by E. Rotenberg, S. Bennett, and J.E. Smith, Proceedings of the 29th Annual
International Symposium on Microarchitecture, December 1996. First paper on
trace caches.
- Combining Branch Predictors by S. McFarling, WRL Technical Note TN-36, June
1993. Proposes the gshare branch predictor and covers a few others. See also
the paper by Yeh and Patt (below).
- Alternative Implementations of Two-Level Adaptive Branch Prediction by
T.-Y. Yeh and Y. N. Patt. Proceedings of the 19th Annual International Symposium
on Computer Architecture, June 1992, pp. 124-134. The classic reference on
two-level branch prediction.
- Checkpoint processing and recovery: Towards scalable large instruction
window processors by H. Akkary, R. Rajwar, and S. T. Srinivasan. In MICRO 36,
December 2003. Reordering without the reorder buffer.
- Implementation of precise interrupts in pipelined processors by J. E. Smith
and A. R. Pleszkun. Proceedings of the 12th Annual International Symposium on
Computer Architecture, June 1985, pp. 36-44. The original paper on reorder
buffers and their alternatives.
- The MIPS R10000 superscalar microprocessor by K. C. Yeager, IEEE Micro,
April 1996. One of the first out-of-order microprocessors. Uses a merged
physical register file (unlike the P6).
- The Alpha 21264 microprocessor by R. E. Kessler, IEEE Micro, Mar/Apr 1999.
Another out-of-order microprocessor that also uses a merged physical register
file. The 21264 was easily the fastest processor available when it came out. The
"dual cluster" design that uses two copies of the register file to reduce the
complexity and latency of the bypass network is particularly interesting. This
paper has a substantial discussion of the 21264 tournament branch predictor,
which is also described in the textbook.
- The Microarchitecture of the Pentium® 4 Processor by Glenn Hinton et al.
Intel Technology Journal, Vol. 5 Issue 1 (February 2001). Description of the
Pentium 4 microarchitecture by the chief designers. Includes some comparisons
with P6 and some justification of the deep pipeline/high frequency design goal.