Simple precedence grammar

{{Short description|Context-free formal grammar}} {{Technical|date=December 2024}} In computer science, a '''simple precedence grammar''' is a context-free formal grammar that can be parsed with a simple precedence parser.<ref> The Theory of Parsing, Translation, and Compiling: Compiling, Alfred V. Aho, Jeffrey D. Ullman, Prentice–Hall, 1972.</ref> The concept was first created in 1964 by Claude Pair,<ref>{{cite journal|author=Claude Pair |title = Arbres, piles et compilation|year=1964|journal=Revue française de traitement de l'information}}, in English ''Trees, stacks and compiling''</ref> and was later rediscovered, from ideas due to Robert Floyd, by Niklaus Wirth and Helmut Weber who published a paper, entitled ''EULER: a generalization of ALGOL, and its formal definition'', published in 1966 in the Communications of the ACM.<ref>{{citation |title=Machines, Languages, and Computation |publisher=Prentice–Hall |year=1978 |isbn=9780135422588 |quote=Wirth and Weber [1966] generalized Floyd's precedence grammars, obtaining the simple precedence grammars. |url-access=registration |url=https://archive.org/details/machineslanguage00denn }}</ref>

==Formal definition==

G = (''N'', Σ, ''P'', ''S'') is a simple precedence grammar if all the production rules in ''P'' comply with the following constraints:

* There are no erasing rules (ε-productions) * There are no useless rules (unreachable symbols or unproductive rules) * For each pair of symbols ''X'', ''Y'' (''X'', ''Y'' <math>\in</math> (''N'' ∪ Σ)) there is only one Wirth–Weber precedence relation. * G is uniquely inversible

==Examples== : <math>S \to aSSb | c</math>

;precedence table: <math>\begin{array}{c|ccccc} & S& a& b& c & \$ \\ \hline S& \dot =& \lessdot & \dot = & \lessdot& \\ a& \dot =& \lessdot& & \lessdot& \\ b& & \gtrdot& & \gtrdot& \gtrdot \\ c& & \gtrdot& \gtrdot& \gtrdot& \gtrdot \\ \$& & \lessdot& & \lessdot& \end{array}</math>

==Simple precedence parser== A '''simple precedence parser''' is a type of bottom-up parser for context-free grammars that can be used only by simple precedence grammars.

The implementation of the parser is quite similar to the generic bottom-up parser. A stack is used to store a viable prefix of a sentential form from a rightmost derivation. The symbols ⋖, ≐ and ⋗ are used to identify the '''pivot''', and to know when to '''Shift''' or when to '''Reduce'''.

===Implementation=== * Compute the Wirth–Weber precedence relationship table for a grammar with initial symbol S. * Initialize a stack with the '''starting marker''' $. * Append an '''ending marker''' $ to the string being parsed ('''Input'''). * Until Stack equals "$ S" and Input equals "$" ** Search the table for the relationship between Top(stack) and NextToken(Input) ** if the relationship is ⋖ or ≐ *** '''Shift''': *** Push(Stack, relationship) *** Push(Stack, NextToken(Input)) *** RemoveNextToken(Input) ** if the relationship is ⋗ *** '''Reduce''': *** SearchProductionToReduce(Stack) *** Remove the Pivot from the Stack *** Search the table for the relationship between the nonterminal from the production and first symbol in the stack (Starting from top) *** Push(Stack, relationship) *** Push(Stack, Non terminal)

SearchProductionToReduce (Stack) * Find the topmost ⋖ in the stack; this and all the symbols above it are the '''Pivot'''. * Find the production of the grammar which has the '''Pivot''' as its right side.

===Example===

Given following language, which can parse arithmetic expressions with the multiplication and addition operations: <pre> E --> E + T' | T' T' --> T T --> T * F | F F --> ( E' ) | num E' --> E </pre>

'''num''' is a terminal, and the lexer parse any integer as '''num'''; '''E''' represents an arithmetic expression, '''T''' is a term and '''F''' is a factor.

and the Parsing table:

{| class="wikitable" ! | ||E|| E' || T || T' || F || + ||*||(||)||num || $ |- ! E | || || || || ||≐|| || ||⋗ || || |- ! E' | || || || || || || || ||≐|| || |- ! T | || || || || ||⋗||≐|| ||⋗|| ||⋗ |- ! T' | || || || || ||⋗|| || ||⋗|| ||⋗ |- ! F | || || || || ||⋗||⋗|| ||⋗|| ||⋗ |- ! + | || ||⋖||≐||⋖|| || ||⋖|| ||⋖ || |- ! * | || || || ||≐|| || ||⋖|| ||⋖ || |- ! ( |⋖||≐||⋖||⋖||⋖|| || ||⋖|| ||⋖ || |- ! ) | || || || || ||⋗||⋗|| ||⋗|| ||⋗ |- ! num | || || || || ||⋗||⋗||||⋗|| ||⋗ |- ! $ |⋖|| ||⋖||⋖||⋖|| || ||⋖|| ||⋖ || |}

{{aligned table|cols=4|row1header=Y|class=wikitable|style=font-family:monospace|col3align=right | STACK | PRECEDENCE | INPUT | ACTION

| $ | ⋖ | 2 * ( 1 + 3 )$ | SHIFT | $ ⋖ 2 | ⋗ | * ( 1 + 3 )$ | REDUCE (F -> num) | $ ⋖ F | ⋗ | * ( 1 + 3 )$ | REDUCE (T -> F) | $ ⋖ T | ≐ | * ( 1 + 3 )$ | SHIFT | $ ⋖ T ≐ * | ⋖ | ( 1 + 3 )$ | SHIFT | $ ⋖ T ≐ * ⋖ ( | ⋖ | 1 + 3 )$ | SHIFT | $ ⋖ T ≐ * ⋖ ( ⋖ 1 | ⋗ | + 3 )$ | REDUCE 4× (F -> num) (T -> F) (T' -> T) (E ->T ') | $ ⋖ T ≐ * ⋖ ( ⋖ E | ≐ | + 3 )$ | SHIFT | $ ⋖ T ≐ * ⋖ ( ⋖ E ≐ + | ⋖ | 3 )$ | SHIFT | $ ⋖ T ≐ * ⋖ ( ⋖ E ≐ + < 3 | ⋗ | )$ | REDUCE 3× (F -> num) (T -> F) (T' -> T) | $ ⋖ T ≐ * ⋖ ( ⋖ E ≐ + ≐ T | ⋗ | )$ | REDUCE 2× (E -> E + T) (E' -> E) | $ ⋖ T ≐ * ⋖ ( ≐ E' | ≐ | )$ | SHIFT | $ ⋖ T ≐ * ⋖ ( ≐ E' ≐ ) | ⋗ | $ | REDUCE (F -> ( E' )) | $ ⋖ T ≐ * ≐ F | ⋗ | $ | REDUCE (T -> T * F) | $ ⋖ T | ⋗ | $ | REDUCE 2× (T' -> T) (E -> T') | $ ⋖ E | | $ | ACCEPT }}

==Wirth–Weber precedence relationship== In computer science, a ''Wirth–Weber relationship'' between a pair of symbols <math>(V_t \cup V_n)</math> is necessary to determine if a formal grammar is a simple precedence grammar. In such a case, the simple precedence parser can be used. The relationship is named after computer scientists Niklaus Wirth and Helmut Weber.

The goal is to identify when the viable prefixes have the ''pivot'' and must be reduced. A <math>\gtrdot</math> means that the ''pivot'' is found, a <math>\lessdot</math> means that a potential ''pivot'' is starting, and a <math> \doteq</math> means that a relationship remains in the same ''pivot''. {{TOC right}}

===Formal definition=== : <math> G = \langle V_n, V_t, S, P \rangle </math> :: <math>\begin{align} X \doteq Y &\iff \begin{cases} A \to \alpha X Y \beta \in P \\ A \in V_n \\ \alpha , \beta \in (V_n \cup V_t)^* \\ X, Y \in (V_n \cup V_t) \end{cases} \\ X \lessdot Y &\iff \begin{cases} A \to \alpha X B \beta \in P \\ B \Rightarrow^+ Y \gamma \\ A, B \in V_n \\ \alpha , \beta, \gamma \in (V_n \cup V_t)^* \\ X, Y \in (V_n \cup V_t) \end{cases} \\ X \gtrdot Y &\iff \begin{cases} A \to \alpha B Y \beta \in P \\ B \Rightarrow^+ \gamma X \\ Y \Rightarrow^* a \delta \\ A, B \in V_n \\ \alpha , \beta, \gamma, \delta \in (V_n \cup V_t)^* \\ X, Y \in (V_n \cup V_t) \\ a \in V_t \end{cases} \end{align}</math>

===Precedence relations computing algorithm=== We will define three sets for a symbol: : <math>\begin{align} \mathrm{Head}^+(X) &= \{Y\mid X \Rightarrow^+ Y \alpha \}\\ \mathrm{Tail}^+(X) &= \{Y\mid X \Rightarrow^+ \alpha Y \}\\ \mathrm{Head}^*(X) &= (\mathrm{Head}^+(X) \cup \{ X \}) \cap V_t \end{align}</math> ::Head*(''X'') is ''X'' if ''X'' is a terminal, and if ''X'' is a non-terminal, Head*(''X'') is the set with only the terminals belonging to Head+(''X''). This set is equivalent to '''First-set''' or '''Fi(''X'')''' described in LL parser. ::Head+(''X'') and Tail+(''X'') are ∅ if ''X'' is a terminal.

The pseudocode for computing relations is:

* RelationTable := ∅ * For each production <math> A \to \alpha \in P </math> ** For each two adjacent symbols {{mvar|X Y}} in {{mvar|α}} *** add(RelationTable, <math>X \doteq Y</math>) *** add(RelationTable, <math>X \lessdot \mathrm{Head}^+(Y)</math>) *** add(RelationTable, <math>\mathrm{Tail}^+(X) \gtrdot \mathrm{Head}^*(Y)</math>) * add(RelationTable, <math>\$ \lessdot \mathrm{Head}^+(S)</math>) where {{mvar|S}} is the initial non terminal of the grammar, and $ is a limit marker * add(RelationTable, <math>\mathrm{Tail}^+(S) \gtrdot \$ </math>) where {{mvar|S}} is the initial non terminal of the grammar, and $ is a limit marker

:<math>\lessdot</math> and <math>\gtrdot</math> are used with sets instead of elements as they were defined, in this case you must add all the cartesian product between the sets/elements.

=== Example 1 === <math>S \to aSSb | c</math>

* Head{{sup|+}}(''a'') = ∅ * Head{{sup|+}}(''S'') = {''a, c''} * Head{{sup|+}}(''b'') = ∅ * Head{{sup|+}}(''c'') = ∅ * Tail{{sup|+}}(''a'') = ∅ * Tail{{sup|+}}(''S'') = {''b, c''} * Tail{{sup|+}}(''b'') = ∅ * Tail{{sup|+}}(''c'') = ∅ * Head{{sup|*}}(''a'') = ''a'' * Head{{sup|*}}(''S'') = {''a, c''} * Head{{sup|*}}(''b'') = ''b'' * Head{{sup|*}}(''c'') = ''c'' * <math>S \to aSSb</math> ** ''a'' Next to ''S'' *** <math> a \doteq S </math> *** <math> a \lessdot \mathrm{Head}^+(S)</math> **** <math> a \lessdot a </math> **** <math> a \lessdot c </math> ** ''S'' Next to ''S'' *** <math> S \doteq S </math> *** <math> S \lessdot \mathrm{Head}^+(S)</math> **** <math> S \lessdot a </math> **** <math> S \lessdot c </math> *** <math>\mathrm{Tail}^+(S) \gtrdot \mathrm{Head}^*(S)</math> **** <math> b \gtrdot a </math> **** <math> b \gtrdot c </math> **** <math> c \gtrdot a </math> **** <math> c \gtrdot c </math> ** ''S'' Next to ''b'' *** <math> S \doteq b </math> *** <math>\mathrm{Tail}^+(S) \gtrdot \mathrm{Head}^*(b)</math> **** <math> b \gtrdot b </math> **** <math> c \gtrdot b </math> * <math>S \to c</math> ** there is only one symbol, so no relation is added.

;precedence table: <math>\begin{array}{c|ccccc} & S & a & b & c & \$ \\ \hline S & \doteq & \lessdot & \doteq & \lessdot & \\ a & \doteq & \lessdot & & \lessdot & \\ b & & \gtrdot & \gtrdot & \gtrdot & \gtrdot \\ c & & \gtrdot & \gtrdot & \gtrdot & \gtrdot \\ \$ & & \lessdot & & \lessdot & \end{array}</math>

=== Example 2 === <math>S \to a | aT | [S]</math>

* Head{{sup|+}}( ''S'' ) = { ''a, ['' } * Head{{sup|+}}( ''a'' ) = ∅ * Head{{sup|+}}( ''T'' ) = { ''b'' } * Head{{sup|+}}( ''['' ) = ∅ * Head{{sup|+}}( '']'' ) = ∅ * Head{{sup|+}}( ''b'' ) = ∅

* Tail{{sup|+}}( ''S'' ) = { ''a, T, ], b'' } * Tail{{sup|+}}( ''a'' ) = ∅ * Tail{{sup|+}}( ''T'' ) = { ''b, T'' } * Tail{{sup|+}}( ''['' ) = ∅ * Tail{{sup|+}}( '']'' ) = ∅ * Tail{{sup|+}}( ''b'' ) = ∅

* Head{{sup|*}}( ''S'' ) = { ''a, ['' } * Head{{sup|*}}( ''a'' ) = ''a'' * Head{{sup|*}}( ''T'' ) = { ''b'' } * Head{{sup|*}}( ''['' ) = ''['' * Head{{sup|*}}( '']'' ) = '']'' * Head{{sup|*}}( ''b'' ) = ''b''

* <math>S \to aT</math>

** ''a'' Next to ''T'' *** <math> a \doteq T </math> *** <math> a \lessdot \mathrm{Head}^+(T)</math> **** <math> a \lessdot b </math>

* <math>S \to [S]</math>

** ''['' Next to ''S'' *** <math> [ \doteq S </math> *** <math> [ \lessdot \mathrm{Head}^+(S)</math> **** <math> [ \lessdot a </math> **** <math> [ \lessdot [ </math>

** ''S'' Next to '']'' *** <math> S \doteq ] </math> *** <math>\mathrm{Tail}^+( S ) \gtrdot \mathrm{Head}^*( ] )</math> **** <math> a \gtrdot ] </math> **** <math> T \gtrdot ] </math> **** <math> ] \gtrdot ] </math> **** <math> b \gtrdot ] </math>

* <math>T \to bT</math>

** ''b'' Next to ''T'' *** <math> b \doteq T </math> *** <math> b \lessdot \mathrm{Head}^+(T)</math> **** <math> b \lessdot b </math>

;precedence table: <math>\begin{array}{c|ccccccc} & S & T & a & b & [ & ] & \$ \\ \hline S & & & & & & \doteq & \doteq \\ T & & & & & & \gtrdot & \gtrdot \\ a & & \doteq & & \lessdot & & \gtrdot & \gtrdot \\ b & & \doteq & & \lessdot & & \gtrdot & \gtrdot \\ \text{[} & \doteq & & \lessdot & & \lessdot & & \\ ] & & & & & & \gtrdot & \gtrdot \\ \$ & \doteq & & \lessdot & & \lessdot & & \end{array}</math>

==Notes== {{reflist}}

==References== * Alfred V. Aho, Jeffrey D. Ullman (1977). ''Principles of Compiler Design''. 1st Edition. Addison–Wesley. * William A. Barrett, John D. Couch (1979). ''Compiler construction: Theory and Practice''. Science Research Associate. * Jean-Paul Tremblay, P. G. Sorenson (1985). ''The Theory and Practice of Compiler Writing''. McGraw–Hill.

==Further reading== * {{citation |last1=Aho |first1=Alfred V. |last2=Ullman |first2=Jeffrey D. |title=The theory of parsing, translation, and compiling}}

==External links== * [http://www.cs.clemson.edu/course/cpsc827/material/Simple%20Precedence/Relations.pdf "Simple Precedence Relations"] at Clemson University

{{DEFAULTSORT:Simple Precedence Grammar}} Category:Formal languages