False sharing

{{Short description|Performance-degrading usage pattern}}
In computer science, '''false sharing''' is a performance-degrading usage pattern that can arise in systems with distributed, coherent caches at the size of the smallest resource block managed by the caching mechanism. When a system participant attempts to periodically access data that is not being altered by another party, but that data shares a cache block with data that ''is'' being altered, the caching protocol may force the first participant to reload the whole cache block despite a lack of logical necessity.<ref name="Patterson 2012 p. 537">{{cite book | last=Patterson | first=David | title=Computer organization and design: the hardware/software interface | publisher=Morgan Kaufmann | publication-place=Waltham, MA | year=2012 | isbn=978-0-12-374750-1 | oclc=746618653 | page=537}}</ref> The caching system is unaware of activity within this block and forces the first participant to bear the caching system overhead required by true shared access of a resource.

==Multiprocessor CPU caches==
By far the most common usage of this term is in modern multiprocessor CPU caches, where memory is cached in lines whose size is a small power of two (e.g., 64 aligned, contiguous bytes). If two processors operate on independent data that falls within the same cache line, the cache coherency mechanisms in the system may force the whole line across the bus or interconnect with every data write, forcing memory stalls in addition to wasting system bandwidth. In some cases, the elimination of false sharing can result in order-of-magnitude performance improvements.<ref name="Bolosky">{{cite journal |last1=Bolosky |first1=William J. |last2=Scott |first2=Michael L. |title=False sharing and its effect on shared memory performance |journal=Sedms'93: USENIX Systems on USENIX Experiences with Distributed and Multiprocessor Systems |date=1993-09-22 |volume=4 |url=https://www.usenix.org/legacy/publications/library/proceedings/sedms4/full_papers/bolosky.txt |access-date=11 July 2021}}</ref> False sharing is an inherent artifact of automatically synchronized cache protocols and can also exist in environments such as distributed file systems or databases, but it is currently prevalent only in RAM caches.
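As an illustration of when two independent variables end up in the same line, the following sketch computes the index of the cache line that holds a given address. It assumes 64-byte lines aligned to their own size, as in the example above; the names <code>kAssumedLineSize</code>, <code>lineIndex</code>, <code>sameLine</code> and <code>Counters</code> are hypothetical and chosen only for this illustration, and the actual line size depends on the hardware.

```cpp
#include <cstddef>
#include <cstdint>

// Assumption for illustration: 64-byte cache lines, aligned to their size.
constexpr std::size_t kAssumedLineSize = 64;

// Index of the (assumed) cache line that contains the given address.
inline std::uintptr_t lineIndex(const void* p) {
    return reinterpret_cast<std::uintptr_t>(p) / kAssumedLineSize;
}

// True if both addresses fall into the same (assumed) cache line.
inline bool sameLine(const void* a, const void* b) {
    return lineIndex(a) == lineIndex(b);
}

// Two logically independent counters. The first member starts on a line
// boundary, and the second follows directly after it, so both lie in the
// same 64-byte line.
struct Counters {
    alignas(kAssumedLineSize) long first = 0;
    long second = 0;
};
```

If one thread writes only <code>first</code> and another writes only <code>second</code>, no data is logically shared, yet <code>sameLine(&c.first, &c.second)</code> holds for any <code>Counters c</code>, so every write by one thread may invalidate the other thread's cached copy of the line.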

==Example==
<syntaxhighlight lang="cpp">
#include <atomic>
#include <chrono>
#include <cstddef>
#include <iostream>
#include <latch>
#include <thread>
#include <vector>

int main() {
    std::vector<std::jthread> threads;
    const int hc = static_cast<int>(std::jthread::hardware_concurrency());
    constexpr int testLimit = 256; // for simplicity, test at most this many threads

    for (int nThreads = 1; nThreads <= hc && nThreads <= testLimit; ++nThreads) {
        // precise measurement by starting all threads simultaneously
        std::latch sync(nThreads);

        // one individual piece of data for each thread
        struct { std::atomic_char mightBeShared; } globaldata[testLimit];

        // mitigation: occupying a full cache line
        // struct alignas(64) { std::atomic_char mightBeShared; } globaldata[testLimit];

        // sum of all threads' execution times
        std::atomic_int64_t nsSum(0);

        for (int t = 0; t != nThreads; ++t) {
            threads.emplace_back([&](int i) {
                sync.arrive_and_wait(); // sync beginning of thread execution on kernel-level

                auto start = std::chrono::high_resolution_clock::now();

                // each thread increments only its own element
                for (std::size_t r = 10'000'000; r--;)
                    globaldata[i].mightBeShared.fetch_add(1);

                nsSum += std::chrono::duration_cast<std::chrono::nanoseconds>(
                    std::chrono::high_resolution_clock::now() - start
                ).count();
            }, t);
        }

        threads.clear(); // join all threads

        // average time per increment in nanoseconds
        std::cout << nThreads << ": "
                  << static_cast<double>(nsSum / (1.0e7 * nThreads)) << std::endl;
    }
}
</syntaxhighlight>

This code shows the effect of false sharing. It creates an increasing number of threads, from one thread up to the number of hardware threads in the system. Each thread repeatedly increments one byte that belongs to that thread alone (i.e., it is not shared between threads). The higher the level of contention between threads, the longer each increment takes, even though each thread only increments its own piece of the global data. These are the results on a 12th-generation Intel Core i7 system with 6 performance and 8 efficiency cores, yielding 20 threads:

[Figure: Scaling of false sharing]

As one can see, without the mitigation, concurrent access by those threads takes about 50 times more computing time, whereas with the mitigation the time only approximately doubles.

==Mitigation==
There are ways of mitigating the effects of false sharing. For instance, false sharing in CPU caches can be prevented by reordering variables or adding padding (unused bytes) between variables. However, some of these program changes may increase the size of the objects, leading to higher memory use.<ref name="Bolosky"/> Compile-time data transformations can also mitigate false sharing.<ref name="Jeremiassen Eggers 1995 pp. 179–188">{{cite journal | last1=Jeremiassen | first1=Tor E. | last2=Eggers | first2=Susan J. | title=Reducing false sharing on shared memory multiprocessors through compile time data transformations | journal=ACM SIGPLAN Notices | publisher=Association for Computing Machinery (ACM) | volume=30 | issue=8 | year=1995 | issn=0362-1340 | doi=10.1145/209937.209955 | pages=179–188| doi-access=free }}</ref> Such transformations are not always permitted, though. For instance, the C++23 draft of the C++ programming language standard mandates that data members be laid out so that later members have higher addresses.<ref>{{cite web | title= Working Draft, Standard for Programming Language C++ [class] | website=eel.is | url=https://eel.is/c++draft/class | access-date=2021-07-11}}</ref>
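The padding approach and its memory cost can be sketched as follows. This is a minimal illustration that assumes a 64-byte cache line; <code>kAssumedLineSize</code>, <code>PackedCounter</code>, <code>PaddedCounter</code> and <code>kElements</code> are hypothetical names introduced here. Some C++17 implementations provide <code>std::hardware_destructive_interference_size</code> in <code>&lt;new&gt;</code> as a portable hint for this value, but it is not available in every toolchain, so a fixed constant is used instead.

```cpp
#include <atomic>
#include <cstddef>

// Assumption for illustration: 64-byte cache lines.
constexpr std::size_t kAssumedLineSize = 64;

// Unpadded: neighbouring array elements can share one cache line.
struct PackedCounter {
    std::atomic<char> value{0};
};

// Padded: alignas raises the alignment to the line size, and the compiler
// rounds the size up to a multiple of the alignment, so every array element
// occupies a line of its own.
struct alignas(kAssumedLineSize) PaddedCounter {
    std::atomic<char> value{0};
};

static_assert(alignof(PaddedCounter) == kAssumedLineSize,
              "each padded counter starts on a line boundary");
static_assert(sizeof(PaddedCounter) == kAssumedLineSize,
              "each padded counter fills one assumed cache line");

// Memory cost of the padding for an array of 256 counters.
constexpr std::size_t kElements = 256;
constexpr std::size_t kPackedBytes = sizeof(PackedCounter[kElements]);
constexpr std::size_t kPaddedBytes = sizeof(PaddedCounter[kElements]);
```

Under these assumptions the padded array takes 256 × 64 = 16,384 bytes, which illustrates the higher memory use mentioned above. The padded variant is usually worth that cost only for data that different threads write frequently.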

There are tools for detecting false sharing.<ref>{{cite web | title=perf-c2c(1) | website=Linux manual page | date=2016-09-01 | url=https://man7.org/linux/man-pages/man1/perf-c2c.1.html | access-date=2021-08-08}}</ref><ref>{{cite conference | last1=Chabbi | first1=Milind | last2=Wen | first2=Shasha | last3=Liu | first3=Xu | title=Proceedings of the 23rd ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming | chapter=Featherlight on-the-fly false-sharing detection | publisher=ACM | publication-place=New York, NY, USA | date=2018-02-10 | pages=152–167 | doi=10.1145/3178487.3178499 | isbn=9781450349826 }}</ref> There are also systems that both detect and repair false sharing in executing programs. However, these systems incur some execution overhead.<ref>{{cite conference | last1=Nanavati | first1=Mihir | last2=Spear | first2=Mark | last3=Taylor | first3=Nathan | last4=Rajagopalan | first4=Shriram | last5=Meyer | first5=Dutch T. | last6=Aiello | first6=William | last7=Warfield | first7=Andrew | title=Proceedings of the 8th ACM European Conference on Computer Systems | chapter=Whose cache line is it anyway? | publisher=ACM Press | publication-place=New York, New York, USA | year=2013 | pages=141–154 | doi=10.1145/2465351.2465366| isbn=9781450319942 }}</ref><ref>{{cite journal | last1=Liu | first1=Tongping | last2=Berger | first2=Emery D. | title=SHERIFF: precise detection and automatic mitigation of false sharing | journal=ACM SIGPLAN Notices | publisher=Association for Computing Machinery (ACM) | volume=46 | issue=10 | date=2011-10-18 | issn=0362-1340 | doi=10.1145/2076021.2048070 | pages=3–18}}</ref>

==References==
{{reflist}}

==External links==
* [https://parallelcomputing2017.wordpress.com/2017/03/17/understanding-false-sharing/ Easy Understanding on False Sharing]
* [https://cpp-today.blogspot.com/2008/05/false-sharing-hits-again.html C++ today blog, False Sharing hits again!]
* [http://www.drdobbs.com/parallel/eliminate-false-sharing/217500206 Dr Dobbs article: Eliminate False Sharing]
* [http://psy-lob-saw.blogspot.sg/2013/05/know-thy-java-object-memory-layout.html Be careful when trying to eliminate false sharing in Java]

[[Category:Cache coherency]]
[[Category:Computer memory]]