Cache hit Haswell

From YourcmcWiki

http://www.7-cpu.com/cpu/Haswell.html has a table of latency per cache (which I'll copy here), and some other experimental numbers, including L2-TLB hit latency (on an L1DTLB miss).

Intel i7-4770 (Haswell), 3.4 GHz (Turbo Boost off), 22 nm. RAM: 32 GB (PC3-12800 cl11 cr2).

  • L1 Data cache = 32 KB, 64 B/line, 8-WAY.
  • L1 Instruction cache = 32 KB, 64 B/line, 8-WAY.
  • L2 cache = 256 KB, 64 B/line, 8-WAY
  • L3 cache = 8 MB, 64 B/line
  • L1 Data Cache Latency = 4 cycles for simple access via pointer (mov rax, [rax])
  • L1 Data Cache Latency = 5 cycles for access with complex address calculation (mov rax, [rsi + rax*8]).
  • L2 Cache Latency = 12 cycles
  • L3 Cache Latency = 36 cycles
  • RAM Latency = 36 cycles + 57 ns (57 ns ≈ 194 cycles at 3.4 GHz, so about 230 cycles in total)
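
The RAM figure mixes cycles and nanoseconds, so it helps to put everything in one unit. The following is a minimal sketch of that arithmetic, assuming the 3.4 GHz clock quoted above; the function and variable names are made up for illustration and are not part of the 7-cpu tooling.

```python
# Clock of the i7-4770 with Turbo Boost off, as quoted in the text.
CLOCK_GHZ = 3.4

def ns_to_cycles(ns, ghz=CLOCK_GHZ):
    """Convert nanoseconds to core clock cycles (1 GHz = 1 cycle per ns)."""
    return ns * ghz

def cycles_to_ns(cycles, ghz=CLOCK_GHZ):
    """Convert core clock cycles to nanoseconds."""
    return cycles / ghz

# Latencies in cycles, copied from the list above (simple-pointer L1 case).
LATENCY_CYCLES = {"L1": 4, "L2": 12, "L3": 36}

# RAM latency is given as "L3 latency + 57 ns"; convert the ns part to cycles.
ram_ns_part_cycles = ns_to_cycles(57)
ram_total_cycles = LATENCY_CYCLES["L3"] + ram_ns_part_cycles
ram_total_ns = cycles_to_ns(ram_total_cycles)

if __name__ == "__main__":
    for level, cycles in LATENCY_CYCLES.items():
        print(f"{level}: {cycles} cycles = {cycles_to_ns(cycles):.2f} ns")
    print(f"RAM: {ram_total_cycles:.0f} cycles = {ram_total_ns:.1f} ns")
```

The same helpers can rescale the numbers for a different clock by passing another `ghz` value, though only the nanosecond part of the RAM latency should be expected to stay roughly constant across clock speeds.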

The top-level benchmark page is http://www.7-cpu.com/utils.html. It doesn't really explain what the different test sizes mean, but the code is available. The test results include Skylake, which is nearly the same as Haswell in this test.
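
One plausible reading of the test sizes, which the benchmark page does not confirm, is that each size is the working set of a latency test, so a size that fits in a given cache level is mostly served by that level. The sketch below maps a working-set size to the smallest level that can hold it, using the capacities and latencies listed above. The names are hypothetical, and the model ignores TLB misses, prefetching, and associativity conflicts.

```python
KIB = 1024
MIB = 1024 * KIB

# (name, capacity in bytes, latency in cycles), from the Haswell list above.
LEVELS = [
    ("L1", 32 * KIB, 4),
    ("L2", 256 * KIB, 12),
    ("L3", 8 * MIB, 36),
]

def expected_level(working_set_bytes):
    """Return the smallest cache level whose capacity covers the working set.

    Assumption: a working set that fits in a level is served mostly by it.
    Anything larger than L3 is attributed to RAM.
    """
    for name, capacity, _latency in LEVELS:
        if working_set_bytes <= capacity:
            return name
    return "RAM"

if __name__ == "__main__":
    for size in (16 * KIB, 128 * KIB, 4 * MIB, 64 * MIB):
        print(f"{size // KIB} KiB -> {expected_level(size)}")
```

Under this assumption, measured latency should step up near 32 KB, 256 KB, and 8 MB as the test size grows, which is one way to cross-check the table against the raw test output.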

@paulsm4's answer has a table for a multi-socket Nehalem Xeon, including some remote (other-socket) memory / L3 numbers.