# Complexity function

> Mediated Wiki article. Canonical URL: https://mediated.wiki/source/Complexity_function
> Markdown URL: https://mediated.wiki/source/Complexity_function.md
> Source: https://en.wikipedia.org/wiki/Complexity_function
> Source revision: 1314812973
> License: Creative Commons Attribution-ShareAlike 4.0 International (https://creativecommons.org/licenses/by-sa/4.0/)

{{Short description|Function that counts distinct factors of a string}}
In [computer science](/source/computer_science), the '''complexity function''' of a ''word'' or ''[string](/source/string_(computer_science))'' (a finite or infinite sequence of symbols from some [alphabet](/source/alphabet_(formal_languages))) is the function that counts the number of distinct ''factors'' (substrings of consecutive symbols) of that string. More generally, the complexity function of a [formal language](/source/formal_language) (a set of finite strings) counts the number of distinct words of given length.

==Complexity function of a word==
Let ''u'' be a (possibly infinite) sequence of symbols from an alphabet.  Define the function
''p''<sub>''u''</sub>(''n'') of a positive integer ''n'' to be the number of different factors (consecutive substrings) of length ''n'' from the string&nbsp;''u''.<ref name=L7>Lothaire (2011) p.7</ref><ref name=L46/><ref name=PF3>Pytheas Fogg (2002) p.3</ref><ref name=BLRS82>Berstel et al (2009) p.82</ref><ref name=AS298>Allouche & Shallit (2003) p.298</ref>

For a string ''u'' of length at least ''n'' over an alphabet of size ''k'' we clearly have 
:<math> 1 \le p_u(n) \le k^n \ , </math>
the bounds being achieved by the constant word and a [disjunctive word](/source/disjunctive_word),<ref name=Bug91>Bugeaud (2012) p.91</ref> for example, the [Champernowne word](/source/Champernowne_word) respectively.<ref name=CN165>Cassaigne & Nicolas (2010) p.165</ref>  For infinite words ''u'', we have ''p''<sub>''u''</sub>(''n'') bounded if ''u'' is ultimately periodic (a finite, possibly empty, sequence followed by a finite cycle).  Conversely, if ''p''<sub>''u''</sub>(''n'') ≤ ''n'' for some ''n'', then ''u'' is ultimately periodic.<ref name=PF3/><ref name=AS302>Allouche & Shallit (2003) p.302</ref>

An '''aperiodic sequence''' is one which is not ultimately periodic.  An aperiodic sequence has strictly increasing complexity function (this is the '''[Morse–Hedlund theorem](/source/Morse%E2%80%93Hedlund_theorem)'''),<ref name=BR166/><ref name=CN166>Cassaigne & Nicolas (2010) p.166</ref> so ''p''(''n'') is at least ''n''+1.<ref name=L22>Lothaire (2011) p.22</ref>

A set ''S'' of finite binary words is ''balanced'' if for each ''n'' the subset ''S''<sub>''n''</sub> of words of length ''n'' has the property that the [Hamming weight](/source/Hamming_weight) of the words in ''S''<sub>''n''</sub> takes at most two distinct values.  A '''balanced sequence''' is one for which the set of factors is balanced.<ref name=AS313>Allouche & Shallit (2003) p.313</ref>  A balanced sequence has complexity function at most ''n''+1.<ref name=L48>Lothaire (2011) p.48</ref>

A [Sturmian word](/source/Sturmian_word) over a binary alphabet is one with complexity function ''n''&nbsp;+&nbsp;1.<ref name=PF6>Pytheas Fogg (2002) p.6</ref>  A sequence is Sturmian [if and only if](/source/if_and_only_if) it is balanced and aperiodic.<ref name=L46>Lothaire (2011) p.46</ref><ref name=AS318>Allouche & Shallit (2003) p.318</ref>  An example is the [Fibonacci word](/source/Fibonacci_word).<ref name=PF6/><ref name=deluca1995>{{cite journal |  journal=Information Processing Letters | year=1995 | pages=307–312 | doi=10.1016/0020-0190(95)00067-M | title=A division property of the Fibonacci word | first=Aldo | last=de Luca |  volume=54 |  issue=6 }}</ref>  More generally, a Sturmian word over an alphabet of size ''k'' is one with complexity ''n''+''k''−1.  An Arnoux-Rauzy word over a ternary alphabet has complexity 2''n''&nbsp;+&nbsp;1:<ref name=PF6/> an example is the [Tribonacci word](/source/Tribonacci_word).<ref name=PF368>Pytheas Fogg (2002) p.368</ref>

For [recurrent word](/source/recurrent_word)s, those in which each factor appears infinitely often, the complexity function almost characterises the set of factors: if ''s'' is a recurrent word with the same complexity function as ''t'' are then ''s'' has the same set of factors as ''t'' or δ''t'' where δ denotes the letter doubling morphism ''a'' → ''aa''.<ref name=BLRS84>Berstel et al (2009) p.84</ref>

==Complexity function of a language==
Let ''L'' be a language over an alphabet and define the function ''p''<sub>''L''</sub>(''n'') of a positive integer ''n'' to be the number of different words of length ''n'' in ''L''<ref name=BR166>Berthé & Rigo (2010) p.166</ref>  The complexity function of a word is thus the complexity function of the language consisting of the factors of that word.

The complexity function of a language is less constrained than that of a word.  For example, it may be bounded but not eventually constant: the complexity function of the [regular language](/source/regular_language) <math>a(bb)^*a</math> takes values 3 and 4 on odd and even ''n''≥2 respectively.  There is an analogue of the Morse–Hedlund theorem: if the complexity of ''L'' satisfies ''p''<sub>''L''</sub>(''n'')  ≤ ''n'' for some ''n'', then ''p''<sub>''L''</sub> is bounded and there is a finite language ''F'' such that<ref name=BR166/>

:<math>L \subseteq \{ x y^k z : x,y,z \in F,\ k \in \mathbb{N} \} \ . </math>

A '''polynomial''' or '''[sparse language](/source/sparse_language)''' is one for which the complexity function ''p''(''n'') is bounded by a fixed power of ''n''.  A [regular language](/source/regular_language) which is not polynomial is ''exponential'': there are infinitely many ''n'' for which ''p''(''n'') is greater than ''k''<sup>''n''</sup> for some fixed ''k'' > 1.<ref name=BR136>Berthé & Rigo (2010) p.136</ref>

==Related concepts==
{{see also|Linguistic sequence complexity}}
The ''[topological entropy](/source/topological_entropy)'' of an infinite sequence ''u'' is defined by

:<math>H_{\mathrm{top}}(u) = \lim_{n \rightarrow \infty} \frac{\log p_u(n)}{n \log k} \ . </math>

The limit exists as the logarithm of the complexity function is [subadditive](/source/Subadditivity).<ref name=PF4>Pytheas Fogg (2002) p.4</ref><ref name=AS303>Allouche & Shallit (2003) p.303</ref>  Every [real number](/source/real_number) between 0 and 1 occurs as the topological entropy of some sequence is applicable,<ref name=CN169>Cassaigne & Nicolas (2010) p.169</ref> which may be taken to be [uniformly recurrent](/source/Uniformly_recurrent_word)<ref name=BR391>Berthé & Rigo (2010) p.391</ref> or even uniquely ergodic.<ref name=BR169>Berthé & Rigo (2010) p.169</ref>

For ''x'' a real number and ''b'' an [integer](/source/integer) ≥ 2 then the complexity function of ''x'' in base ''b'' is the complexity function ''p''(''x'',''b'',''n'') of the sequence of digits of ''x'' written in base ''b''.
If ''x'' is an [irrational number](/source/irrational_number) then ''p''(''x'',''b'',''n'') ≥ ''n''+1; if ''x'' is [rational](/source/rational_number) then ''p''(''x'',''b'',''n'') ≤ ''C'' for some constant ''C'' depending on ''x'' and ''b''.<ref name=Bug91>Bugeaud (2012) p.91</ref>  It is conjectured that for algebraic irrational ''x'' the complexity is ''b''<sup>''n''</sup> (which would follow if all such numbers were [normal](/source/Normal_number)) but all that is known in this case is that ''p'' grows faster than any linear function of ''n''.<ref name=BR414>Berthé & Rigo (2010) p.414</ref>

The '''abelian complexity function''' ''p''<sup>ab</sup>(''n'') similarly counts the number of occurrences of distinct factors of given length ''n'', where now we identify factors that differ only by a permutation of the positions.  Clearly ''p''<sup>ab</sup>(''n'') ≤ ''p''(''n'').  The abelian complexity of a Sturmian sequence satisfies ''p''<sup>ab</sup>(''n'') = 2.<ref>{{cite book | chapter=On the Asymptotic Abelian Complexity of Morphic Words | title=Developments in Language Theory. Proceedings, 17th International Conference, DLT 2013, Marne-la-Vallée, France, June 18-21, 2013 | pages=94–105 | year=2013 | doi=10.1007/978-3-642-38771-5_10 | isbn=978-3-642-38770-8 | series=Lecture Notes in Computer Science | volume=7907 | issn=0302-9743 | publisher=[Springer-Verlag](/source/Springer-Verlag) | location=Berlin, Heidelberg | editor1-first=Marie-Pierre | editor1-last=Béal | editor2-first=Olivier | editor2-last=Carton | author1-first=Francine | author1-last=Blanchet-Sadri | author2-first=Nathan | author2-last=Fox }}</ref>

==References==
{{reflist}}
*{{cite book | last1 = Allouche | first1 = Jean-Paul | last2 = Shallit | first2 = Jeffrey | author2-link = Jeffrey Shallit | isbn = 978-0-521-82332-6 | publisher = [Cambridge University Press](/source/Cambridge_University_Press) | title = Automatic Sequences: Theory, Applications, Generalizations | year = 2003 | zbl=1086.11015 }}
* {{cite book | last1=Berstel | first1=Jean | last2=Lauve | first2=Aaron | last3=Reutenauer | first3=Christophe | last4=Saliola | first4=Franco V. | title=Combinatorics on words. Christoffel words and repetitions in words | series=CRM Monograph Series | volume=27 | location=Providence, RI | publisher=[American Mathematical Society](/source/American_Mathematical_Society) | year=2009 | isbn=978-0-8218-4480-9 | url=https://www.ams.org/bookpages/crmm-27 | zbl=1161.68043 }}
* {{cite book | editor1-last=Berthé | editor1-first=Valérie|editor1-link= Valérie Berthé | editor2-last=Rigo | editor2-first=Michel | title=Combinatorics, automata, and number theory | series=Encyclopedia of Mathematics and its Applications | volume=135 | location=Cambridge | publisher=[Cambridge University Press](/source/Cambridge_University_Press) | year=2010 | isbn=978-0-521-51597-9 | zbl=1197.68006  }}
*{{cite book | last=Bugeaud | first=Yann | title=Distribution modulo one and Diophantine approximation | series=Cambridge Tracts in Mathematics | volume=193 | location=Cambridge | publisher=[Cambridge University Press](/source/Cambridge_University_Press) | year=2012 | isbn=978-0-521-11169-0 | zbl=1260.11001}}
* {{cite book | last1=Cassaigne | first1=Julien | last2=Nicolas | first2=François | chapter=Factor complexity | pages=163–247 | editor1-last=Berthé | editor1-first=Valérie|editor1-link= Valérie Berthé | editor2-last=Rigo | editor2-first=Michel | title=Combinatorics, automata, and number theory | series=Encyclopedia of Mathematics and its Applications | volume=135 | location=Cambridge | publisher=[Cambridge University Press](/source/Cambridge_University_Press) | year=2010 | isbn=978-0-521-51597-9 | zbl=1216.68204  }}
* {{cite book | last=Lothaire | first=M. | authorlink=M. Lothaire | title=Algebraic combinatorics on words | others=With preface by Jean Berstel and Dominique Perrin | edition=Reprint of the 2002 hardback | series=Encyclopedia of Mathematics and Its Applications | volume=90| publisher=Cambridge University Press | year=2011 | isbn=978-0-521-18071-9 | zbl=1221.68183 }}
* {{cite book | last=Pytheas Fogg | first=N. | editor1=Berthé, Valérie|editor1-link=Valérie Berthé|editor2=Ferenczi, Sébastien|editor3=Mauduit, Christian|editor4=Siegel, A. | title=Substitutions in dynamics, arithmetics and combinatorics | series=Lecture Notes in Mathematics | volume=1794 | location=Berlin | publisher=[Springer-Verlag](/source/Springer-Verlag) | year=2002 | isbn=3-540-44141-7 | zbl=1014.11015 }}

Category:Theoretical computer science

---
Adapted from the Wikipedia article [Complexity function](https://en.wikipedia.org/wiki/Complexity_function) by Wikipedia contributors ([contributor history](https://en.wikipedia.org/wiki/Complexity_function?action=history)). Available under [Creative Commons Attribution-ShareAlike 4.0 International](https://creativecommons.org/licenses/by-sa/4.0/). Changes may have been made.
