Yu. A. Nedbailo, A. V. Surchenko, I. N. Bychkov, “Reducing miss rate in a non-inclusive cache with inclusive directory of a chip multiprocessor”, Computer Research and Modeling, 15:3 (2023), 639


Computer Research and Modeling, 2023, Volume 15, Issue 3, Pages 639–656
DOI: https://doi.org/10.20537/2076-7633-2023-15-3-639-656 (Mi crm1080)

This article is cited in 1 scientific paper (total in 1 paper)

MODELS IN PHYSICS AND TECHNOLOGY

Reducing miss rate in a non-inclusive cache with inclusive directory of a chip multiprocessor

Yu. A. Nedbailo^ab, A. V. Surchenko^a, I. N. Bychkov^ab

^a MCST JSC, 108 Profsoyuznaya st., Moscow, 117437, Russia
^b INEUM im. I. S. Bruka, 24 Vavilova st., Moscow, 119334, Russia


Abstract: Although the era of exponential performance growth in computer chips has ended, processor core counts have reached 16 or more even in general-purpose desktop CPUs. As DRAM throughput is unable to keep pace with this growth in computing power, CPU designers need to find ways of lowering memory traffic per instruction. The straightforward way to do this is to reduce the miss rate of the last-level cache. Assuming that the “non-inclusive cache, inclusive directory” (NCID) scheme is already implemented, three ways of reducing the cache miss rate further were studied.
The first is to achieve more uniform usage of cache banks and sets by employing hash-based interleaving and indexing. In experiments on the SPEC CPU2017 refrate tests, even the simplest XOR-based hash functions demonstrated a performance increase of 3.2%, 9.1%, and 8.2% for CPU configurations with 16, 32, and 64 cores and last-level cache banks, respectively, which is comparable to the results of more complex matrix-, division- and CRC-based functions.
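As an illustration of why hashing evens out bank usage, the following minimal sketch compares plain low-bit interleaving with an XOR-folding hash on a strided access pattern. It is not the hash function evaluated in the paper: the 64-byte line size, the function names `plain_bank` and `xor_bank`, and the access pattern are all assumptions chosen for the example.

```python
from collections import Counter

LINE_BITS = 6  # assumption: 64-byte cache lines


def plain_bank(addr: int, bank_bits: int) -> int:
    """Conventional interleaving: the bank is the low bits of the line address."""
    return (addr >> LINE_BITS) & ((1 << bank_bits) - 1)


def xor_bank(addr: int, bank_bits: int) -> int:
    """XOR-fold the whole line address into bank_bits bits (illustrative hash)."""
    line = addr >> LINE_BITS
    mask = (1 << bank_bits) - 1
    bank = 0
    while line:
        bank ^= line & mask   # mix in the next bank_bits-wide chunk
        line >>= bank_bits
    return bank


# Hypothetical workload: 256 accesses with a stride of 16 cache lines,
# mapped onto 16 banks (bank_bits = 4).
addrs = [i * 16 * 64 for i in range(256)]
plain_hist = Counter(plain_bank(a, 4) for a in addrs)
hashed_hist = Counter(xor_bank(a, 4) for a in addrs)

print("banks used, plain interleaving:", len(plain_hist))
print("banks used, XOR hash:          ", len(hashed_hist))
```

With plain interleaving, every access in this pattern falls into a single bank, while the XOR-folded index spreads the same accesses evenly over all 16 banks. The same folding idea can be applied to the set index within a bank.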
The second optimisation is aimed at reducing replication at different cache levels by means of automatically switching to the exclusive scheme when it appears optimal. A known scheme of this type, FLEXclusion, was modified for use in NCID caches and showed an average performance gain of 3.8%, 5.4%, and 7.9% for 16-, 32-, and 64-core configurations.
The third optimisation is to increase the effective cache capacity using compression. The compression rate of the inexpensive and fast BDI*-HL (Base-Delta-Immediate Modified, Half-Line) algorithm, designed for NCID, was measured, and the respective increase in cache capacity yielded an average performance increase of roughly 1%.
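For readers unfamiliar with base-delta compression, the sketch below shows the general Base-Delta-Immediate idea that BDI*-HL builds on: a line is stored as one full-width base plus narrow per-word deltas, with small values encoded relative to an implicit zero base. It is not the BDI*-HL algorithm from the paper; the function name `bdi_compressed_size`, the single (8-byte word, 1-byte delta) configuration, and the omission of metadata bits are assumptions made for brevity.

```python
from typing import Optional


def bdi_compressed_size(line: bytes, word: int = 8, delta: int = 1) -> Optional[int]:
    """Return the compressed data size in bytes, or None if the line does not
    fit the (word, delta) encoding. Metadata (encoding tag, per-word base
    selector) is ignored in this sketch."""
    if len(line) % word != 0:
        raise ValueError("line length must be a multiple of the word size")
    values = [
        int.from_bytes(line[i:i + word], "little", signed=True)
        for i in range(0, len(line), word)
    ]
    lo = -(1 << (8 * delta - 1))
    hi = (1 << (8 * delta - 1)) - 1
    base = None
    for v in values:
        if lo <= v <= hi:
            continue          # "immediate": small value relative to the zero base
        if base is None:
            base = v          # first large value becomes the explicit base
        if not lo <= v - base <= hi:
            return None       # delta too wide: not compressible in this mode
    return word + len(values) * delta


# Hypothetical 64-byte line holding eight nearby pointers.
pointers = b"".join((0x7F0000001000 + 8 * i).to_bytes(8, "little") for i in range(8))
# Hypothetical line whose words differ too much to share one base.
scattered = b"".join((i * 0x0101010101010101).to_bytes(8, "little") for i in range(8))

print(bdi_compressed_size(pointers))   # 8-byte base + eight 1-byte deltas
print(bdi_compressed_size(scattered))  # None: falls back to uncompressed storage
```

A hardware implementation would typically try several (word, delta) combinations in parallel and keep the smallest result; this sketch checks only one.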
All three optimisations can be combined; together they demonstrated a performance gain of 7.7%, 16%, and 19% for CPU configurations with 16, 32, and 64 cores and banks, respectively.

Keywords: multicore processor, memory subsystem, distributed shared cache, NCID, XOR-based hash function, data compression.

Received: 14.04.2023
Accepted: 03.05.2023

Document Type: Article

UDC: 004.318

Language: English

Citation: Yu. A. Nedbailo, A. V. Surchenko, I. N. Bychkov, “Reducing miss rate in a non-inclusive cache with inclusive directory of a chip multiprocessor”, Computer Research and Modeling, 15:3 (2023), 639–656

Citation in format AMSBIB

\Bibitem{NedSurByc23}
\by Yu.~A.~Nedbailo, A.~V.~Surchenko, I.~N.~Bychkov
\paper Reducing miss rate in a non-inclusive cache with inclusive directory of a chip multiprocessor
\jour Computer Research and Modeling
\yr 2023
\vol 15
\issue 3
\pages 639--656
\mathnet{http://mi.mathnet.ru/crm1080}
\crossref{https://doi.org/10.20537/2076-7633-2023-15-3-639-656}

Linking options:

https://www.mathnet.ru/eng/crm1080

https://www.mathnet.ru/eng/crm/v15/i3/p639

This publication is cited in the following 1 articles:

A. Surchenko, Yu. Nedbailo, “Hardware compression method for on-chip and interprocessor networks with wide channels and wormhole flow control policy”, Informatics and Automation, 23:3 (2024), 859–885
