Processor consistency

{{Short description|Consistency model in concurrent computing}}
'''Processor consistency''' is one of the consistency models used in the domain of concurrent computing (e.g. in distributed shared memory, distributed transactions, etc.).

A system exhibits processor consistency if the order in which other processors see the writes from any individual processor is the same as the order in which they were issued. Because of this, processor consistency is only applicable to systems with multiple processors. It is weaker than the causal consistency model because it does not require writes from ''all'' processors to be seen in the same order, but stronger than the PRAM consistency model because it requires cache coherence.<ref name=":2">{{cite journal | author = David Mosberger | title = Memory Consistency Models | publisher = University of Arizona | date = 1992 | url = https://www-vs.informatik.uni-ulm.de/teach/ss05/dsm/arizona.pdf | accessdate = 2015-04-01}}</ref> Another difference between causal consistency and processor consistency is that processor consistency removes the requirements for loads to wait for stores to complete, and for write atomicity.<ref name=":2" /> Processor consistency is also stronger than cache consistency because it requires all writes by a processor to be seen in order, not just writes to the same memory location.<ref name=":2" />

== Examples of processor consistency ==

{| border="1" class="wikitable floatright"
|+ Example 1: Processor Consistent
! P{{sub|1}}
| W(x)1
| W(x)3
|
|
|-
! P{{sub|2}}
|
|
| R(x)1
| R(x)3
|-
! P{{sub|3}}
| W(y)1
| W(y)2
|
|
|-
! P{{sub|4}}
|
|
| R(y)1
| R(y)2
|}

{| border="1" class="wikitable floatright"
|+ Example 2: Not Processor Consistent
! P{{sub|1}}
| W(x)1
| W(x)3
|
|
|-
! P{{sub|2}}
|
|
| R(x)3
| R(x)1
|-
! P{{sub|3}}
| W(y)1
| W(y)2
|
|
|-
! P{{sub|4}}
|
|
| R(y)2
| R(y)1
|}

In Example 1 to the right, the simple system follows processor consistency: the other processors see the writes of each processor in the order in which they were issued, and the transactions are coherent.

Example 2 is ''not'' processor consistent, as the writes by P1 and P3 are seen out of order by P2 and P4 respectively.
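The reasoning applied to Examples 1 and 2 can be sketched as a small checker. This is a minimal illustration rather than a formal definition: the function name <code>processor_consistent</code> and the dictionary layout are assumptions made here, every (location, value) pair is assumed to be written exactly once by a single processor, and reads of initial values and coherence between different writers to the same location are not modelled.

```python
def processor_consistent(writes, reads):
    """Check that every reader sees each writer's writes in issue order.

    writes: {writer: [(location, value), ...]} in program order.
    reads:  {reader: [(location, value), ...]} in program order.
    Assumption: each (location, value) pair is written exactly once.
    """
    # Map every written value back to its writer and its position there.
    origin = {}
    for writer, ops in writes.items():
        for index, op in enumerate(ops):
            origin[op] = (writer, index)

    for observed in reads.values():
        newest_seen = {}  # writer -> highest write index observed so far
        for location, value in observed:
            writer, index = origin[(location, value)]
            horizon = newest_seen.get(writer, -1)
            # Writes to this location that the reader must already have seen.
            older = [i for i, (loc, _) in enumerate(writes[writer][:horizon + 1])
                     if loc == location]
            if older and index < older[-1]:
                return False  # the read returned a value that is too old
            newest_seen[writer] = max(horizon, index)
    return True


writes = {"P1": [("x", 1), ("x", 3)], "P3": [("y", 1), ("y", 2)]}
example_1 = {"P2": [("x", 1), ("x", 3)], "P4": [("y", 1), ("y", 2)]}
example_2 = {"P2": [("x", 3), ("x", 1)], "P4": [("y", 2), ("y", 1)]}

print(processor_consistent(writes, example_1))  # True
print(processor_consistent(writes, example_2))  # False
```

The checker tracks, per writer, the newest write a reader has observed so far, and rejects any later read that returns a value older than what that observation already implies.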

Example 3 is processor consistent but ''not'' causally consistent: P3 observes {{nowrap| <code>R(y)3,R(x)1</code> }}, whereas causal consistency would require {{Nowrap| <code>R(y)3,R(x)2</code> }}, since W(x)2 in P1 causally precedes W(y)3 in P2.

Example 4 is ''not'' processor consistent: P2 observes {{Nowrap| <code>R(y)3,R(x)1</code> }}, whereas processor consistency would require {{Nowrap| <code>R(y)3,R(x)2</code> }}, because W(x)2 is the latest write to x preceding W(y)3 in P1. The example is, however, cache consistent, because P2 sees the writes to each individual memory location in the order in which they were issued by P1.

{| border="1" class="wikitable"
|+ Example 3: Causal: No; Processor: Yes
! P{{sub|1}}
| W(x)1
| W(x)2
|
|
|
|
|-
! P{{sub|2}}
|
|
| R(x)2
| W(y)3
|
|
|-
! P{{sub|3}}
|
|
|
|
| R(y)3
| R(x)1
|}

{| border="1" class="wikitable"
|+ Example 4: Processor: No; Cache: Yes
! P{{sub|1}}
| W(x)1
| W(x)2
| W(y)3
|
|
|-
! P{{sub|2}}
|
|
|
| R(y)3
| R(x)1
|}
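The distinction drawn in Example 4 can be made concrete with a short sketch. The helper names <code>per_location_order_ok</code> and <code>required_value</code> are hypothetical, and the code assumes a single writer and values that are unique per location. The first helper looks at each location in isolation, as cache consistency does; the second computes which value of a location a reader is obliged to see once it has observed a later write by the same processor.

```python
# Example 4: writes issued by P1 and values read by P2, in program order.
writes = [("x", 1), ("x", 2), ("y", 3)]
reads = [("y", 3), ("x", 1)]


def per_location_order_ok(writes, reads):
    """Cache-consistency style check: for each location taken on its own,
    the values read must appear in the order in which they were written."""
    for location in {loc for loc, _ in writes}:
        issued = [val for loc, val in writes if loc == location]
        seen = [issued.index(val) for loc, val in reads if loc == location]
        if seen != sorted(seen):
            return False
    return True


def required_value(writes, observed_write, location):
    """Latest value written to `location` at or before `observed_write`."""
    visible = writes[: writes.index(observed_write) + 1]
    return [val for loc, val in visible if loc == location][-1]


print(per_location_order_ok(writes, reads))   # True
print(required_value(writes, ("y", 3), "x"))  # 2
```

Each location on its own is read in a valid order, so the per-location check passes. Once P2 has read the value 3 from y, however, processor consistency obliges it to see x as 2, while the history records a read of 1.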

== Processor consistency vs. sequential consistency ==

Processor consistency (PC) relaxes the ordering between older stores and younger loads that is enforced in sequential consistency (SC).<ref name=":1">{{cite book |author1=Kourosh Gharachorloo |author2=Daniel Lenoski |author3=James Laudon |author4=Phillip Gibbons |author5=Anoop Gupta |author6=John Hennessy |title=25 years of the international symposia on Computer architecture (Selected papers) |chapter=Memory consistency and event ordering in scalable shared-memory multiprocessors | publisher = ACM | date = 1 August 1998 |pages=376–387 |doi=10.1145/285930.285997 |isbn=1581130589 |s2cid=47089892 | chapter-url = http://dl.acm.org/citation.cfm?id=285997&CFID=494829542&CFTOKEN=96526574 | chapter-format = PDF | accessdate = 2015-04-01}}</ref> This allows loads to be issued to the cache and potentially complete before older stores, meaning that stores can be queued in a write buffer without the need for load speculation to be implemented (the loads can continue freely).<ref name=":0">{{cite book|last1=Solihin|first1=Yan|title=Fundamentals of parallel computer architecture : multichip and multicore systems|date=2009|publisher=Solihin Pub.|isbn=978-0-9841630-0-7|pages=297–299}}</ref> In this regard, PC performs better than SC because recovery techniques for failed speculations are not necessary, which means fewer pipeline flushes.<ref name=":0" /> The prefetching optimization that SC systems employ is also applicable to PC systems.<ref name=":0" /> ''Prefetching'' is the act of fetching data in advance for upcoming loads and stores before it is actually needed, to cut down on load/store latency. Since PC reduces load latency by allowing loads to be re-ordered before corresponding stores, the need for prefetching is somewhat reduced, as the prefetched data will be used more for stores than for loads.<ref name=":0" />
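The effect of relaxing the store-to-load ordering can be illustrated by enumerating the outcomes of a two-thread program in which each thread stores to one variable and then loads the other. The sketch below is a simplified model assumed for illustration only: each store is split into a "store" event (entering a write buffer) and a "flush" event (reaching memory), and the names <code>PROGRAM</code> and <code>outcomes</code> are hypothetical.

```python
from itertools import permutations

# Hypothetical two-thread program: each thread stores 1 to one variable,
# then loads the variable written by the other thread.
PROGRAM = {0: ("x", "y"), 1: ("y", "x")}  # thread -> (stored, loaded)


def outcomes(store_buffer):
    """Return every reachable (r0, r1) pair of loaded values."""
    events = [(t, kind) for t in PROGRAM for kind in ("store", "flush", "load")]
    results = set()
    for order in permutations(events):
        pos = {event: i for i, event in enumerate(order)}
        legal = all(
            pos[t, "store"] < pos[t, "flush"]
            and pos[t, "store"] < pos[t, "load"]
            # Without a write buffer, the store must reach memory before
            # the younger load is performed.
            and (store_buffer or pos[t, "flush"] < pos[t, "load"])
            for t in PROGRAM
        )
        if not legal:
            continue
        memory = {"x": 0, "y": 0}
        buffers = {0: {}, 1: {}}
        regs = {}
        for t, kind in order:
            stored, loaded = PROGRAM[t]
            if kind == "store":
                buffers[t][stored] = 1
            elif kind == "flush":
                memory[stored] = buffers[t].pop(stored)
            else:  # a load checks the thread's own buffer first, then memory
                regs[t] = buffers[t].get(loaded, memory[loaded])
        results.add((regs[0], regs[1]))
    return results


sc = outcomes(store_buffer=False)
pc = outcomes(store_buffer=True)
print(sorted(sc))       # [(0, 1), (1, 0), (1, 1)]
print(sorted(pc - sc))  # [(0, 0)]
```

In this model the only outcome added by the write buffer is the one in which both loads return 0, which corresponds to both loads completing before either older store has become visible.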

== Programmer's intuition ==

In terms of how well a PC system follows a programmer's intuition, it turns out that in properly synchronized systems, the outcomes of PC and SC are the same.<ref name=":0" /> Programmer's intuition is essentially how the programmer expects the instructions to execute, usually in what is referred to as "program order". Program order in a multiprocessor system is the execution of instructions resulting in the same outcome as a sequential execution. The fact that PC and SC both follow this expectation is a direct consequence of the fact that corresponding loads and stores in PC systems are still ordered with respect to each other.<ref name=":0" /> For example, in lock synchronization, the only operation whose behavior is not fully defined by PC is the lock-acquire store, where subsequent loads are in the critical section and their order affects the outcome.<ref name=":0" /> This operation, however, is usually implemented with a store conditional or atomic instruction, so that if the operation fails it will be repeated later and all the younger loads will also be repeated.<ref name=":0" /> All loads occurring before this store are still ordered with respect to the loads occurring in the critical section, and as such all the older loads have to complete before loads in the critical section can run.
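The claim that synchronized programs behave as they would under SC can be illustrated with a flag-based handoff, used here instead of a lock to keep the sketch small. The model is an assumption made for illustration: the writer's stores drain from a first-in, first-out write buffer, the reader's loads are performed in program order, and the names <code>EVENTS</code>, <code>ORDER</code> and <code>outcomes</code> are hypothetical.

```python
from itertools import permutations

# Hypothetical program.
#   Writer: data = 1; flag = 1
#   Reader: r_flag = flag; r_data = data
EVENTS = ["store data", "store flag", "flush data", "flush flag",
          "load flag", "load data"]

# Pairs (a, b) meaning that event a must happen before event b.
ORDER = [
    ("store data", "store flag"),  # writer program order
    ("store data", "flush data"),  # a store is buffered before it drains
    ("store flag", "flush flag"),
    ("flush data", "flush flag"),  # FIFO buffer: stores drain in issue order
    ("load flag", "load data"),    # loads are not reordered with each other
]


def outcomes():
    """Return every reachable (r_flag, r_data) pair."""
    results = set()
    for order in permutations(EVENTS):
        pos = {event: i for i, event in enumerate(order)}
        if any(pos[a] > pos[b] for a, b in ORDER):
            continue
        memory = {"data": 0, "flag": 0}
        regs = {}
        for event in order:
            kind, var = event.split()
            if kind == "flush":
                memory[var] = 1          # the store becomes visible
            elif kind == "load":
                regs[var] = memory[var]  # the reader only sees memory
        results.add((regs["flag"], regs["data"]))
    return results


print(sorted(outcomes()))  # [(0, 0), (0, 1), (1, 1)]
```

The outcome in which the reader sees the flag set but still reads stale data does not appear, because stores remain ordered with respect to other stores and loads with respect to other loads. Only the store-to-load order is relaxed, and this program does not depend on it.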

== Processor consistency vs. other relaxed consistency models ==

Processor consistency, while weaker than sequential consistency, is still in most cases a stronger consistency model than is needed. This is due to the number of synchronization points inherent to programs that run on multiprocessor systems.<ref name=":3" /> In a properly synchronized program, no data races can occur (a data race being multiple simultaneous accesses to a memory location where at least one access is a write).<ref name=":0" /> With this in mind, a model could allow reordering of all memory operations, as long as no operation crosses a synchronization point;<ref name=":0" /> one such model is weak ordering. However, weak ordering does impose some of the same restrictions as processor consistency, namely that the system must remain coherent, and thus all writes to the same memory location must be seen by all processors in the same order.<ref name=":3" /> Similar to weak ordering, the release consistency model allows reordering of all memory operations, but it goes further and distinguishes between kinds of synchronization operations to allow even more reordering.<ref name=":0" /> Both of these models assume properly synchronized code and, in some cases, hardware synchronization support, so processor consistency is a safer model to adhere to if one is unsure about the reliability of the programs to be run using the model.

== Similarity to SPARC V8 TSO, IBM-370, and x86-TSO memory models ==

One of the main components of processor consistency is that a write followed by a read is allowed to execute out of program order. Allowing loads to go ahead of stores essentially hides write latency. Since many applications function correctly under this relaxation, systems that implement it typically appear sequentially consistent. Two other models that conform to this specification are the SPARC V8 TSO (Total Store Ordering) and the IBM-370.<ref name=":3">{{cite journal | author = Kourosh Gharachorloo | title = Memory Consistency Models for Shared-Memory Multiprocessors | publisher = Western Research Laboratory | date = 1995 | url = http://www.hpl.hp.com/techreports/Compaq-DEC/WRL-95-9.pdf | accessdate = 2015-04-07}}</ref>

The IBM-370 model follows the specification of allowing a write followed by a read to execute out of program order, with a few exceptions. The first is that if the operations are to the same location, they must be in program order. The second is that if either operation is part of a serialization instruction or there is a serialization instruction between the two operations, then the operations must execute in program order.<ref name=":3" /> This model is perhaps the strictest of the three models being considered, as the TSO model removes one of the exceptions mentioned.

The SPARC V8 TSO model is very similar to the IBM-370 model, with the key difference that it allows operations to the same location to complete out of program order. As a result, a load may return the value of a store that is "out of date" in terms of program order.<ref name=":3" /> These models are similar to processor consistency, but whereas they assume only one copy of memory, processor consistency has no such restriction. This suggests a system in which each processor has its own memory, which is why processor consistency places emphasis on the "coherence requirement".<ref name=":3" />
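The difference between the two models can be sketched with a program in which each thread stores to its own variable, reads it back, and then reads the other thread's variable. The model below is a simplification assumed for illustration: the flag <code>forwarding</code> stands for a TSO-like rule that lets a load read its own thread's buffered store, while its absence stands for an IBM-370-like rule under which a store must reach memory before a later load of the same location. The names <code>PROGRAM</code>, <code>KINDS</code> and <code>outcomes</code> are hypothetical.

```python
from itertools import permutations

# Hypothetical litmus test.
#   Thread 0: A = 1; u = A; w = B
#   Thread 1: B = 1; v = B; x = A
PROGRAM = {0: ("A", "B"), 1: ("B", "A")}  # thread -> (own, other) variable
KINDS = ("store", "flush", "load_own", "load_other")


def outcomes(forwarding):
    """Return every reachable (u, w, v, x) tuple."""
    events = [(t, kind) for t in PROGRAM for kind in KINDS]
    results = set()
    for order in permutations(events):
        pos = {event: i for i, event in enumerate(order)}
        legal = all(
            pos[t, "store"] < pos[t, "flush"]
            and pos[t, "store"] < pos[t, "load_own"] < pos[t, "load_other"]
            # Without forwarding, a load of the same location must wait
            # until the store has reached memory.
            and (forwarding or pos[t, "flush"] < pos[t, "load_own"])
            for t in PROGRAM
        )
        if not legal:
            continue
        memory = {"A": 0, "B": 0}
        buffers = {0: {}, 1: {}}
        regs = {}
        for t, kind in order:
            own, other = PROGRAM[t]
            if kind == "store":
                buffers[t][own] = 1
            elif kind == "flush":
                memory[own] = buffers[t].pop(own)
            else:  # loads check the thread's own buffer first, then memory
                var = own if kind == "load_own" else other
                regs[t, kind] = buffers[t].get(var, memory[var])
        results.add((regs[0, "load_own"], regs[0, "load_other"],
                     regs[1, "load_own"], regs[1, "load_other"]))
    return results


target = (1, 0, 1, 0)  # u = 1, w = 0, v = 1, x = 0
print(target in outcomes(forwarding=True))   # True
print(target in outcomes(forwarding=False))  # False
```

In this model, each thread can read its own store early and still miss the other thread's store only when forwarding is permitted, which matches the same-location exception that distinguishes the two models in the text above.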

The x86-TSO model has a number of different definitions. The total store model, as the name suggests, is very similar to the SPARC V8 TSO model. The other definition is based on local write buffers. The differences between the x86 and SPARC TSO models lie in the omission of some instructions and the inclusion of others, but the models themselves are very similar.<ref name=":4">{{cite journal |author1=Scott Owens |author2=Susmit Sarkar |author3=Peter Sewell | title = A better x86 memory model: x86-TSO (extended version) | publisher = University of Cambridge | date = 2009 |doi=10.48456/tr-745 | url = http://www.cl.cam.ac.uk/techreports/UCAM-CL-TR-745.html | format = PDF | accessdate = 2015-04-08}}</ref> The write buffer definition uses various states and locks to determine whether a particular value can be read or written. In addition, this model for the x86 architecture is not plagued by the issues of previous (weaker consistency) models, and provides a more intuitive base for programmers to build upon.<ref name=":4" />

== See also ==

* Serializability

== References ==

Category:Consistency models