# Pointer analysis

> Mediated Wiki article. Canonical URL: https://mediated.wiki/source/Pointer_analysis
> Markdown URL: https://mediated.wiki/source/Pointer_analysis.md
> Source: https://en.wikipedia.org/wiki/Pointer_analysis
> Source revision: 1355881397
> License: Creative Commons Attribution-ShareAlike 4.0 International (https://creativecommons.org/licenses/by-sa/4.0/)

{{short description|Determining what or where each pointer points to in program code}}

In [computer science](/source/computer_science), '''pointer analysis''', or '''points-to analysis''', is a [static code analysis](/source/static_code_analysis) technique that establishes which [pointer](/source/pointer_(computer_programming))s, or [heap](/source/Heap_(data_structure)) references, can point to which [variables](/source/Variable_(computer_science)), or [storage locations](/source/Memory_address).  It is often a component of more complex analyses such as [escape analysis](/source/escape_analysis).  A closely related technique is [shape analysis](/source/shape_analysis_(software)).

This is the most common colloquial use of the term. A secondary use has ''pointer analysis'' be the collective name for both '''points-to analysis''', defined as above, and [alias analysis](/source/alias_analysis). Points-to and alias analysis are closely related but not always equivalent problems.

==Example==

Consider the following C program:

<syntaxhighlight lang="c">
int *id(int* p) {
  return p;
}
void main(void) {
  int x;
  int y;
  int *u = id(&x);
  int *v = id(&y);
}
</syntaxhighlight>

A pointer analysis computes a mapping from pointer expressions to a set of allocation sites of objects they may point to. For the above program, an idealized, fully precise analysis would compute the following results:

{| class="wikitable"
|-
! Pointer expression !! Allocation site
|-
| <code>&x</code> || <code>main::x</code>
|-
| <code>&y</code> || <code>main::y</code>
|-
| <code>u</code> || <code>main::x</code>
|-
| <code>v</code> || <code>main::y</code>
|-
| <code>p</code> || <code>main::x</code>, <code>main::y</code>
|}

(Where <code>X::Y</code> represents the stack allocation holding the local variable <code>Y</code> in the function <code>X</code>.)

However, a context-insensitive analysis such as Andersen's or Steensgaard's algorithm would lose precision when analyzing the calls to <code>id</code>, and compute the following result:

{| class="wikitable"
|-
! Pointer expression !! Allocation site
|-
| <code>&x</code> || <code>main::x</code>
|-
| <code>&y</code> || <code>main::y</code>
|-
| <code>u</code> || <code>main::x</code>, <code>main::y</code>
|-
| <code>v</code> || <code>main::x</code>, <code>main::y</code>
|-
| <code>p</code> || <code>main::x</code>, <code>main::y</code>
|}

==Introduction==

As a form of static analysis, fully precise pointer analysis can be shown to be [undecidable](/source/Undecidable_problem).<ref>{{Cite journal|last=Reps|first=Thomas|date=2000-01-01|title=Undecidability of context-sensitive data-dependence analysis|journal=ACM Transactions on Programming Languages and Systems|volume=22|issue=1|pages=162–186|doi=10.1145/345099.345137|s2cid=2956433|issn=0164-0925|doi-access=free}}</ref> Most approaches are [sound](/source/Soundness), but range widely in performance and precision. Many design decisions impact both the precision and performance of an analysis; often (but not always) lower precision yields higher performance. These choices include:<ref>{{cite conference
| title=Dimensions of Precision in Reference Analysis of Object-Oriented Programming Languages
| author=Barbara G. Ryder
| year=2003
| book-title=Compiler Construction, 12th International Conference, CC 2003 Held as Part of the Joint European Conferences on Theory and Practice of Software, ETAPS 2003 Warsaw, Poland, April 7–11, 2003 Proceedings
| pages=126–137
|doi = 10.1007/3-540-36579-6_10| doi-access=free
}}</ref><ref>{{harv|Hind}}</ref>

* ''Field sensitivity'' (also known as ''structure sensitivity''): An analysis can either treat each field of a [struct](/source/Record_(computer_science)) or [object](/source/Object_(computer_science)) separately, or merge them.
* ''Array sensitivity'': An array-sensitive pointer analysis models each index in an array separately. Other choices include modelling just the first entry separately and the rest together, or merging all array entries.
* ''Context sensitivity'' or ''[polyvariance](/source/polyvariance)'': Pointer analyses may qualify points-to information with a summary of the control flow leading to each program point.
* ''Flow sensitivity'': An analysis can model the impact of intraprocedural control flow on points-to facts.
* ''Heap modeling'': Run-time allocations may be abstracted by: 
** their allocation sites (the statement or instruction that performs the allocation, e.g., a call to <code>malloc</code> or an object constructor),
** a more complex model based on a [shape analysis](/source/Shape_analysis_(software)),
** the type of the allocation, or
** one single allocation (this is called ''heap-insensitivity'').
* ''Heap cloning'': Heap- and context-sensitive analyses may further qualify each allocation site by a summary of the control flow leading to the instruction or statement performing the allocation.
* ''Subset constraints'' or ''equality constraints'': When propagating points-to facts, different program statements may induce different constraints on a variable's points-to sets. Equality constraints (like those used in [Steensgaard's algorithm](/source/Steensgaard's_algorithm)) can be tracked with a [union-find data structure](/source/union-find_data_structure), leading to high performance at the expense of the precision of a subset-constraint based analysis (e.g., [Andersen's algorithm](/source/Andersen's_algorithm)).

==Context-insensitive, flow-insensitive algorithms==

Pointer analysis algorithms are used to convert collected raw pointer usages (assignments of one pointer to another or assigning a pointer to point to another one) to a useful graph of what each pointer can point to.<ref>{{cite conference
| url           =https://www.zyrianov.org/papers/ICPC19.pdf
| title         =srcPtr: A Framework for Implementing Static Pointer Analysis Approaches
| last1 = Zyrianov | first1 = Vlas
| last2 = Newman | first2 = Christian D.
| last3 = Guarnera | first3 = Drew T.
| last4 = Collard | first4 = Michael L.
| last5 = Maletic | first5 = Jonathan I.
| year          =2019
| book-title     =ICPC '19: Proceedings of the 27th IEEE International Conference on Program Comprehension
| location = Montreal, Canada
| publisher     =IEEE
}}
</ref> 

[Steensgaard's algorithm](/source/Steensgaard's_algorithm) and [Andersen's algorithm](/source/Andersen's_algorithm) are common context-insensitive, flow-insensitive algorithms for pointer analysis. They are often used in compilers, and have implementations in [https://github.com/SVF-tools/SVF SVF]
<ref>{{cite conference
| url = https://yuleisui.github.io/publications/cc16.pdf
| title = SVF: interprocedural static value-flow analysis in LLVM
| last1 = Sui | first1=Yulei
| last2 = Xue | first2=Jingling
| year = 2016
| book-title = CC'16: Proceedings of the 25th international conference on compiler construction
| publisher = ACM 
}}
</ref>
and [LLVM](/source/LLVM).

==Flow-insensitive approaches==

Many approaches to flow-insensitive pointer analysis can be understood as forms of [abstract interpretation](/source/abstract_interpretation), where heap allocations are abstracted by their allocation site (i.e., a program location).<ref>{{Cite book|last1=Smaragdakis|first1=Yannis|last2=Bravenboer|first2=Martin|last3=Lhoták|first3=Ondrej|title=Proceedings of the 38th annual ACM SIGPLAN-SIGACT symposium on Principles of programming languages |chapter=Pick your contexts well |date=2011-01-26|chapter-url=https://doi.org/10.1145/1926385.1926390|series=POPL '11|location=Austin, Texas, USA|publisher=Association for Computing Machinery|pages=17–30|doi=10.1145/1926385.1926390|isbn=978-1-4503-0490-0|s2cid=6451826}}</ref>

alt=A diagram showing how pointer analysis abstracts runtime memory|thumb|Flow-insensitive pointer analyses often abstract possible runtime allocations by their allocation site. At runtime, this program creates three separate heap allocations. A flow-insensitive pointer analysis would treat these as a single abstract memory location, leading to a loss of precision.

Many flow-insensitive algorithms are specified in [Datalog](/source/Datalog), including those in the Soot analysis framework for Java.<ref>{{Cite book|last1=Antoniadis|first1=Tony|last2=Triantafyllou|first2=Konstantinos|last3=Smaragdakis|first3=Yannis|title=Proceedings of the 6th ACM SIGPLAN International Workshop on State of the Art in Program Analysis |chapter=Porting doop to Soufflé |date=2017-06-18|chapter-url=https://doi.org/10.1145/3088515.3088522|series=SOAP 2017|location=Barcelona, Spain|publisher=Association for Computing Machinery|pages=25–30|doi=10.1145/3088515.3088522|isbn=978-1-4503-5072-3|s2cid=3074689}}</ref>

Context-sensitive, flow-sensitive algorithms achieve higher precision, generally at the cost of some performance, by analyzing each procedure several times, once per ''context''.<ref>{{harv|Smaragdakis|Balatsouras|p=29}}</ref> Most analyses use a "context-string" approach, where contexts consist of a list of entries (common choices of context entry include call sites, allocation sites, and types).<ref>{{Cite journal|last1=Thiessen|first1=Rei|last2=Lhoták|first2=Ondřej|date=2017-06-14|title=Context transformations for pointer analysis|url=https://doi.org/10.1145/3140587.3062359|journal=ACM SIGPLAN Notices|volume=52|issue=6|pages=263–277|doi=10.1145/3140587.3062359|issn=0362-1340|url-access=subscription}}</ref> To ensure termination (and more generally, scalability), such analyses generally use a ''k''-limiting approach, where the context has a fixed maximum size, and the least recently added elements are removed as needed.<ref>{{harv|Li|Tan|Møller|Smaragdakis|pp=1:4}}</ref> Three common variants of context-sensitive, flow-insensitive analysis are:<ref>{{harv|Smaragdakis|Balatsouras}}</ref>

* Call-site sensitivity
* Object sensitivity
* Type sensitivity

===Call-site sensitivity===

In call-site sensitivity, the points-to set of each variable (the set of abstract heap allocations each variable could point to) is further qualified by a context consisting of a list of callsites in the program. These contexts abstract the control-flow of the program.

The following program demonstrates how call-site sensitivity can achieve higher precision than a flow-insensitive, context-insensitive analysis.
<syntaxhighlight lang="c">
int *id(int* p) {
  return p;
}
void main(void) {
  int x;
  int y;
  int *u = id(&x);  // main.3
  int *v = id(&y);  // main.4
}
</syntaxhighlight>
For this program, a context-insensitive analysis would (soundly but imprecisely) conclude that {{var|p}} can point to either the allocation holding {{var|x}} or that of {{var|y}}, so {{var|u}} and {{var|v}} may alias, and both could point to either allocation:

{| class="wikitable"
|-
! Pointer expression !! Allocation site
|-
| <code>&x</code> || <code>main::x</code>
|-
| <code>&y</code> || <code>main::y</code>
|-
| <code>u</code> || <code>main::x</code>, <code>main::y</code>
|-
| <code>v</code> || <code>main::x</code>, <code>main::y</code>
|-
| <code>p</code> || <code>main::x</code>, <code>main::y</code>
|}

A callsite-sensitive analysis would analyze {{var|id}} twice, once for <code>main.3</code> and once for <code>main.4</code>, and the points-to facts for {{var|p}} would be qualified by the call-site, enabling the analysis to deduce that when {{var|main}} returns, {{var|u}} can only point to the allocation holding {{var|x}} and {{var|v}} can only point to the allocation holding {{var|y}}:

{| class="wikitable"
|-
! Context !! Pointer expression !! Allocation site
|-
| <code>[]</code> || <code>&x</code> || <code>main::x</code>
|-
| <code>[]</code> || <code>&y</code> || <code>main::y</code>
|-
| <code>[]</code> || <code>u</code> || <code>main::x</code>
|-
| <code>[]</code> || <code>v</code> || <code>main::y</code>
|-
| <code>[main.3]</code> || <code>p</code> || <code>main::x</code>
|-
| <code>[main.4]</code> || <code>p</code> || <code>main::y</code>
|}

===Object sensitivity===

In an object sensitive analysis, the points-to set of each variable is qualified by the abstract heap allocation of the receiver object of the method call. Unlike call-site sensitivity, object-sensitivity is ''non-syntactic'' or ''non-local'': the context entries are derived during the points-to analysis itself.<ref>{{harv|Smaragdakis|Balatsouras|p=37}}</ref>

===Type sensitivity===

Type sensitivity is a variant of object sensitivity where the allocation site of the receiver object is replaced by the class/type containing the method containing the allocation site of the receiver object.<ref>{{harv|Smaragdakis|Balatsouras|p=39}}</ref> This results in strictly fewer contexts than would be used in an object-sensitive analysis, which generally means better performance.

==References==
{{Reflist}}

==Bibliography==
*{{cite conference
| url           =https://www.zyrianov.org/papers/ICPC19.pdf
| title         =srcPtr: A Framework for Implementing Static Pointer Analysis Approaches
| last1 = Zyrianov | first1 = Vlas
| last2 = Newman | first2 = Christian D.
| last3 = Guarnera | first3 = Drew T.
| last4 = Collard | first4 = Michael L.
| last5 = Maletic | first5 = Jonathan I.
| year          =2019
| book-title     =ICPC '19: Proceedings of the 27th IEEE International Conference on Program Comprehension
| location = Montreal, Canada
| publisher     =IEEE
}}
*{{cite journal
 |last1=Smaragdakis
 |first1=Yannis
 |last2=Balatsouras
 |first2=George
 |date=2015
 |title=Pointer Analysis
 |url=https://yanniss.github.io/points-to-tutorial15.pdf
 |journal=Foundations and Trends in Programming Languages
 |volume=2
 |issue=1
 |pages=1–69
 |access-date=May 30, 2019
|doi=10.1561/2500000014
 |s2cid=207179267
 }}
*{{cite journal   
|last1=Li
|first1=Yue
|last2=Tan/
|first2=Tian
|last3=Møller
|first3=Anders
|last4=Smaragdakis
|first4=Yannis
|date=2020-05-18
|title=A Principled Approach to Selective Context Sensitivity for Pointer Analysis
|url=https://doi.org/10.1145/3381915
|journal=ACM Transactions on Programming Languages and Systems
|volume=42
|issue=2
|pages=10:1–10:40
|doi=10.1145/3381915
|s2cid=214812357
|issn=0164-0925
}}
* {{cite conference
| url           =http://www.cs.trinity.edu/~mlewis/CSCI3294-F01/Papers/p54-hind.pdf
| title         =Pointer analysis: haven't we solved this problem yet?
| author        =Michael Hind
| year          =2001
| book-title     =PASTE '01: Proceedings of the 2001 ACM SIGPLAN-SIGSOFT workshop on Program analysis for software tools and engineering
| pages         =54–61
| isbn          =1-58113-413-4
| publisher     =ACM
}}
*{{cite conference
| url           =http://cms.dc.uba.ar/materias/aap/2008/c2/descargas/pointsTo.pdf
| title         =Points-to analysis in almost linear time
| last = Steensgaard | first = Bjarne
| year          =1996
| book-title     =POPL '96: Proceedings of the 23rd ACM SIGPLAN-SIGACT symposium on Principles of programming languages
| pages         =32–41
| doi = 10.1145/237721.237727
| isbn          =0-89791-769-3
| location = New York, NY, USA
| publisher     =ACM
}}
* {{cite thesis 
| degree        =PhD
| title         =Program Analysis and Specialization for the C Programming Language
| url           =https://www.cs.cornell.edu/courses/cs711/2005fa/papers/andersen-thesis94.pdf
| author        =Andersen, Lars Ole
| year          =1994
}}

{{DEFAULTSORT:Pointer Analysis}}
{{Compiler optimizations}}
Category:Static program analysis
Category:Pointers (computer programming)

---
Adapted from the Wikipedia article [Pointer analysis](https://en.wikipedia.org/wiki/Pointer_analysis) by Wikipedia contributors ([contributor history](https://en.wikipedia.org/wiki/Pointer_analysis?action=history)). Available under [Creative Commons Attribution-ShareAlike 4.0 International](https://creativecommons.org/licenses/by-sa/4.0/). Changes may have been made.
