  1. Trace Cache: A Low Latency Approach to High Bandwidth Instruction Fetching by E. Rotenberg, S. Bennett, and J. E. Smith, Proceedings of the 29th Annual International Symposium on Microarchitecture, December 1996. First paper on trace caches.
  2. Combining Branch Predictors by S. McFarling, WRL Technical Note TN-36, June 1993. Proposes the gshare branch predictor and covers a few others. See also the paper by Yeh and Patt (below).
  3. Alternative Implementations of Two-Level Adaptive Branch Prediction by T.-Y. Yeh and Y. N. Patt. Proceedings of the 19th Annual International Symposium on Computer Architecture, June 1992, pp. 124-134. The classic reference on two-level branch prediction.
  4. Checkpoint processing and recovery: Towards scalable large instruction window processors by H. Akkary, R. Rajwar, and S. T. Srinivasan. In MICRO 36, December 2003. Reordering without the reorder buffer.
  5. Implementation of precise interrupts in pipelined processors by J. E. Smith and A. R. Pleszkun. Proceedings of the 12th Annual International Symposium on Computer Architecture, June 1985, pp. 36-44. The original paper on reorder buffers and their alternatives.
  6. The MIPS R10000 superscalar microprocessor by K. C. Yeager, IEEE Micro, April 1996. One of the first out-of-order microprocessors. Uses a merged physical register file (unlike the P6).
  7. The Alpha 21264 microprocessor by R. E. Kessler, IEEE Micro, Mar/Apr 1999. Another out-of-order microprocessor that also uses a merged physical register file. The 21264 was easily the fastest processor available when it came out. The "dual cluster" design that uses two copies of the register file to reduce the complexity and latency of the bypass network is particularly interesting. This paper also has a substantial discussion of the 21264 tournament branch predictor that's also described in the textbook.
  8. The Microarchitecture of the Pentium® 4 Processor by Glenn Hinton et al. Intel Technology Journal, Vol. 5 Issue 1 (February 2001). Description of the Pentium 4 microarchitecture by the chief designers. Includes some comparisons with the P6 and some justification of the deep pipeline/high frequency design goal.