Delimiter

{{short description|Character(s) for specifying the boundary between regions of data}} {{hatnote|This article is about delimiters in computing. For delimiters in human use, see Word divider and digit grouping.}} thumb|Depiction of data using comma as a field delimiter.|alt=

In computing, a '''delimiter''' is a character or a sequence of characters for specifying the boundary between separate, independent regions in data such as a text file or data stream.<ref>{{Cite web |url=https://www.its.bldrdoc.gov/fs-1037/dir-011/_1544.htm |title=Definition: delimiter|work=Federal Standard 1037C - Telecommunications: Glossary of Telecommunication Terms |access-date=2019-11-25 |archive-url=https://web.archive.org/web/20130305032313/https://www.its.bldrdoc.gov/fs-1037/dir-011/_1544.htm |archive-date=2013-03-05 |url-status=live}}</ref><ref>{{Cite web|title=What is a Delimiter?|url=https://www.computerhope.com/jargon/d/delimite.htm|access-date=2020-08-09|website=www.computerhope.com|language=en|archive-date=2020-11-16|archive-url=https://web.archive.org/web/20201116162706/https://www.computerhope.com/jargon/d/delimite.htm|url-status=live}}</ref> For context, data boundaries can be indicated via other means. For example, declarative notation indicates the length of a field at the start of the field instead of relying on delimiters.<ref name="hollerity">{{cite book | last = Rohl | first = Jeffrey S. | title = Programming in Fortran | publisher = Oxford University Press | location = Oxford Oxfordshire | year = 1973 | isbn = 978-0-7190-0555-8 }} describing the method in Hollerith notation under the Fortran programming language.</ref>

In mathematics, delimiters are often used to specify the scope of an operation in an expression, and can occur both as isolated symbols (e.g., colon in "<math>1 : 4</math>") and as a pair of opposing-looking symbols (e.g., angled brackets in <math>\langle a, b \rangle</math>).

== Examples ==

Delimiters are used for a wide range of purposes. The following examples demonstrate a small fraction of their applicability.

===Tabular data===
Tabular data, organized as rows and columns, is often delimited. A field delimiter separates the columns of a row, with each column corresponding to a field in that row, and a record delimiter separates the rows, with each row corresponding to a record.<ref name="FldDelm">{{cite book | last = de Moor | first = Georges J. | title = Progress in Standardization in Health Care Informatics | publisher =IOS Press | year = 1993 | isbn =90-5199-114-2}} p. 141</ref> The commonly used comma-separated values (CSV) format uses a comma to delimit fields, and a newline to delimit records. The following CSV data represents three records each with four fields. The first line is metadata that names the fields.

<pre>
fname,lname,age,salary
nancy,davolio,33,$30000
erin,borakova,28,$25250
tony,raphael,35,$28700
</pre>

CSV data is an example of a flat-file database.
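As an illustration, the sample above could be read with Python's standard <code>csv</code> module. This is a minimal sketch; the names <code>SAMPLE</code> and <code>parse_records</code> are hypothetical and chosen only for this example.

<syntaxhighlight lang="python">
import csv
import io

# The sample data from this section, one record per line.
SAMPLE = """fname,lname,age,salary
nancy,davolio,33,$30000
erin,borakova,28,$25250
tony,raphael,35,$28700
"""


def parse_records(text):
    """Split comma-delimited text into (field_names, records)."""
    rows = list(csv.reader(io.StringIO(text)))
    # The first row is the metadata line that names the fields.
    return rows[0], rows[1:]


field_names, records = parse_records(SAMPLE)
print(field_names)
print(len(records), "records")
</syntaxhighlight>

The newline acts as the record delimiter and the comma as the field delimiter, so each record comes back as a list of four strings.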

===Bracket delimiters===
Bracket delimiters, also called block delimiters, region delimiters, or balanced delimiters, mark the start and end of a region of text.<ref name="BalaDelm">{{cite book | last = Friedl | first = Jeffrey E. F. | title = Mastering Regular Expressions: Powerful Techniques for Perl and Other Tools | publisher = O'Reilly | year = 2002| isbn = 0-596-00289-0}} p. 319</ref><ref name="Scott000">{{cite book | title = Programming Language Pragmatics | first = Michael Lee | last = Scott | publisher = Morgan Kaufmann | year = 1999 | isbn = 1-55860-442-1 }}</ref> Commonly used bracket delimiters include:<ref name="programmingperl">{{cite book | title=Programming Perl |edition=Third | publisher=O'Reilly |date=July 2000 | isbn=0-596-00027-8 | last1=Wall | first1=Larry | first2=Jon |last2=Orwant | author-link1=Larry Wall | author-link3=Jon Orwant }}</ref>

{| class="wikitable"
|-
! Delimiters
! style="text-align:left" | Description
|-
! <code>(</code> <code>)</code>
| Parentheses; Lisp code is cited as recognizable by its use of parentheses<ref name="Kaufmann000">{{cite book | title = Computer-Aided Reasoning: An Approach | first = Matt | last = Kaufmann | publisher = Springer | year = 2000 | isbn = 0-7923-7744-3 }}p. 3</ref>
|-
! <code>{</code> <code>}</code>
| Braces; also called curly brackets<ref name="curly_brace_cstyle">{{cite book | last = Meyer | first = Mark | title = Explorations in Computer Science | publisher = Oxford University Press | location = Oxford Oxfordshire | year = 2005 | isbn = 978-0-7637-3832-7 }} references C-style programming languages prominently featuring curly brackets and semicolons.</ref>
|-
! <code>[</code> <code>]</code>
| Square brackets; commonly used to denote a subscript
|-
! <code><</code> <code>></code>
| Angle brackets<ref name="id_1268443793898_27">{{cite book | last = Dilligan | first = Robert | title = Computing in the Web Age | publisher = Oxford University Press | location = Oxford Oxfordshire | year = 1998 | isbn = 978-0-306-45972-6 }}Describes syntax and delimiters used in HTML.</ref>
|-
! <code>"</code> <code>"</code>
| Double quote; commonly used to denote a string literal<ref name="id_1268443910269_75">{{cite book | last = Schwartz | first = Randal |author-link=Randal Schwartz | title = Learning Perl | publisher = Oxford University Press | location = Oxford Oxfordshire | year = 2005 | isbn = 978-0-596-10105-3 | url = https://archive.org/details/isbn_9780596101053 }}Describes string literals.</ref>
|-
! <code>'</code> <code>'</code>
| Single quote; commonly used to denote a string literal or character literal<ref name="id_1268443910269_75"/>
|-
! <code><?</code> <code>?></code>
| Used in XML to denote a processing instruction<ref name="id_1268443998814_32">{{cite book | last = Watt | first = Andrew | title = Sams Teach Yourself Xml in 10 Minutes | publisher = Oxford University Press | location = Oxford Oxfordshire | year = 2003 | isbn = 978-0-672-32471-0 | url-access = registration | url = https://archive.org/details/samsteachyoursel0000watt }} Describes XML processing instruction. p. 21.</ref>
|-
! <code>/*</code> <code>*/</code>
| Used in many programming languages to denote a comment<ref name="id_1268444112328_77">{{cite book | last = Cabrera | first = Harold | title = C# for Java Programmers | publisher = Oxford University Press | location = Oxford Oxfordshire | year = 2002 | isbn = 978-1-931836-54-8 }} Describes single-line and multi-line comments. p. 72.</ref>
|-
! <code><%</code> <code>%></code>
| Used in some web templates to specify a language boundary<ref>{{cite web|url=https://github.com/jakartaee/pages/blob/master/spec/src/main/asciidoc/ServerPages.adoc#jakarta-server-pages-specification-version-40|title=Jakarta Server Pages Specification, Version 4.0|website=GitHub|access-date=2023-02-10|archive-date=2023-02-10|archive-url=https://web.archive.org/web/20230210203352/https://github.com/jakartaee/pages/blob/master/spec/src/main/asciidoc/ServerPages.adoc#jakarta-server-pages-specification-version-40|url-status=live}}</ref>
|}

==Delimiter collision==
'''Delimiter collision''' describes a limitation of using delimiters. When content contains a delimiter, processing of the data fails because the embedded delimiter is incorrectly interpreted as a data boundary, unless provisions are made to prevent the collision.<ref name="FldDelm"/><ref name="mre_embed_problem">{{cite book | last = Friedl | first = Jeffrey | title = Mastering Regular Expressions | publisher = Oxford University Press | location = Oxford Oxfordshire | year = 2006 | isbn = 978-0-596-52812-6 }} describing solutions for embedded-delimiter problems p. 472.</ref> In XML, for example, collision can occur when content contains an angle bracket (< or >).

Each delimiter in a format can result in collision. In CSV, for example, field delimiter collision can occur when a field contains a comma (e.g., salary = "$30,000"), and record delimiter collision can occur when a field contains a newline. Both record and field delimiter collision occur frequently in CSV data.
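The field collision described here can be reproduced in a few lines of Python. This is a sketch only: the record values are hypothetical, and the quoting shown is the behaviour of Python's standard <code>csv</code> module, one of several ways a format can make provisions against collision.

<syntaxhighlight lang="python">
import csv
import io

# A four-field record whose last field contains the field delimiter.
record = ["nancy", "davolio", "33", "$30,000"]

# Naive approach: join on the delimiter, then split on it again.
naive_line = ",".join(record)
naive_fields = naive_line.split(",")  # the salary is cut into two fields

# csv module: a field containing the delimiter is wrapped in quotes.
buffer = io.StringIO()
csv.writer(buffer).writerow(record)
quoted_line = buffer.getvalue()
round_trip = next(csv.reader(io.StringIO(quoted_line)))

print(len(naive_fields), "fields after naive split")
print(round_trip == record)
</syntaxhighlight>

The naive split yields one field too many because the embedded comma is read as a boundary, while the quoted form round-trips to the original four fields.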

A malicious user may seek to exploit collision. Consequently, delimiter collision can be a source of security vulnerabilities and exploits. Well-known examples include SQL injection and cross-site scripting, in the context of SQL and HTML respectively.
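A small Python sketch using the standard <code>sqlite3</code> module shows how SQL injection arises from collision with the quote delimiter of an SQL string literal. The <code>staff</code> table and the hostile input are hypothetical, and the placeholder query is one common mitigation rather than the only one.

<syntaxhighlight lang="python">
import sqlite3

# Hypothetical in-memory table used only for this illustration.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE staff (fname TEXT, salary INTEGER)")
conn.executemany(
    "INSERT INTO staff VALUES (?, ?)",
    [("nancy", 30000), ("erin", 25250)],
)

# Input that contains the single-quote delimiter of an SQL string literal.
hostile = "x' OR '1'='1"

# Unsafe: the content is spliced between the quote delimiters, so the
# embedded quote ends the literal early and the rest is read as SQL.
unsafe_sql = "SELECT fname FROM staff WHERE fname = '" + hostile + "'"
unsafe_rows = conn.execute(unsafe_sql).fetchall()

# Safer: a placeholder passes the value separately from the SQL text,
# so no character in the value can act as a delimiter.
safe_rows = conn.execute(
    "SELECT fname FROM staff WHERE fname = ?", (hostile,)
).fetchall()

print(len(unsafe_rows), "rows from the spliced query")
print(len(safe_rows), "rows from the placeholder query")
</syntaxhighlight>

The spliced query matches every row because the condition after the early-terminated literal is always true; the placeholder query treats the whole input as one value and matches nothing.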

===Solutions===
Multiple methods for avoiding collision have been devised.

====Obfuscation====
Using a delimiter that is unlikely to appear in the content is an ad hoc approach with limited success. It requires knowledge of the expected content and a guess about what will not appear in it, and it offers little security against malicious collisions.

====Control characters====

If content is restricted from containing control characters (which is typical), then using control characters for delimiters prevents delimiter collision that otherwise can occur when using non-control character delimiters.<ref>{{Cite web |url=https://ronaldduncan.wordpress.com/2009/10/31/text-file-formats-ascii-delimited-text-not-csv-or-tab-delimited-text/ |title=Text File formats – ASCII Delimited Text – Not CSV or TAB delimited text |access-date=2026-04-28}}</ref> The ASCII character set was designed with this in mind by providing non-printing characters that can be used as delimiters in the range 28 to 31. Later, Unicode adopted the same code points.

{{Table alignment}}
{| class="wikitable col1left col2center col3center"
|-
! Common name
! ASCII (decimal)
! ASCII name
! Unicode name
! Use
|-
! file separator
| 28
| {{resize|200%|␜}}
| INFORMATION SEPARATOR FOUR
| End of file or between a concatenation of files
|-
! group separator
| 29
| {{resize|200%|␝}}
| INFORMATION SEPARATOR THREE
| Between sections of data; not needed in simple data files
|-
! record separator
| 30
| {{resize|200%|␞}}
| INFORMATION SEPARATOR TWO
| End of a record or row
|-
! unit separator
| 31
| {{resize|200%|␟}}
| INFORMATION SEPARATOR ONE
| Between fields of a record, or members of a row
|}
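A minimal Python sketch of this approach uses the unit separator (31) between fields and the record separator (30) between records. The function names are hypothetical, and the sketch assumes, as the text does, that content never contains these control characters; it rejects content that does.

<syntaxhighlight lang="python">
UNIT_SEPARATOR = "\x1f"    # ASCII 31, placed between fields
RECORD_SEPARATOR = "\x1e"  # ASCII 30, placed between records


def encode(records):
    """Join a list of records (each a list of strings) into one string."""
    for record in records:
        for field in record:
            if UNIT_SEPARATOR in field or RECORD_SEPARATOR in field:
                raise ValueError("content must not contain separator characters")
    return RECORD_SEPARATOR.join(UNIT_SEPARATOR.join(r) for r in records)


def decode(text):
    """Split an encoded string back into a list of records."""
    return [r.split(UNIT_SEPARATOR) for r in text.split(RECORD_SEPARATOR)]


# Commas and newlines are ordinary content here, not delimiters.
table = [["nancy", "$30,000"], ["erin", "line one\nline two"]]
assert decode(encode(table)) == table
</syntaxhighlight>

Because printable characters and newlines never act as boundaries in this scheme, fields such as <code>$30,000</code> need no quoting or escaping.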

====Escape sequence====
A commonly used method for avoiding delimiter collision is to use an escape sequence: a specific printable character or sequence of characters placed before a character that would otherwise indicate a boundary, signaling that the character is not to be treated as a boundary. Although effective, this technique has drawbacks, including:

* Content can be hard to read when it contains numerous escape sequences, a problem referred to as leaning toothpick syndrome (due to use of \ to escape / in Perl regular expressions, leading to sequences such as "\/\/")
* Data becomes difficult to parse via regular expression
* A way to escape the escape sequence itself is required (a way to use the escape sequence as content)
* An escape sequence can be cryptic to those unfamiliar with the syntax<ref name="Kahrel000">{{cite book | title = Automating InDesign with Regular Expressions | first = Peter | last = Kahrel | publisher = O'Reilly | year = 2006 | isbn = 0-596-52937-6 | page = 11 }}</ref>
* The method does not protect against injection attacks {{citation needed|date=March 2014}}
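The mechanism, including the need to escape the escape character itself, can be sketched in Python. The choice of backslash as the escape character and comma as the delimiter, and all function names, are assumptions made for this example only.

<syntaxhighlight lang="python">
ESCAPE = "\\"     # assumed escape character for this sketch
DELIMITER = ","   # assumed field delimiter for this sketch


def escape_field(field):
    """Prefix every escape character and delimiter in the content."""
    # Escape the escape character first, so the backslashes added for
    # delimiters are not doubled afterwards.
    return field.replace(ESCAPE, ESCAPE + ESCAPE).replace(
        DELIMITER, ESCAPE + DELIMITER
    )


def join_fields(fields):
    return DELIMITER.join(escape_field(f) for f in fields)


def split_fields(line):
    """Split on unescaped delimiters and remove the escape characters."""
    fields, current, escaped = [], [], False
    for ch in line:
        if escaped:
            current.append(ch)  # taken literally, whatever it is
            escaped = False
        elif ch == ESCAPE:
            escaped = True
        elif ch == DELIMITER:
            fields.append("".join(current))
            current = []
        else:
            current.append(ch)
    fields.append("".join(current))
    return fields


fields = ["$30,000", "C:\\temp", "plain"]
assert split_fields(join_fields(fields)) == fields
</syntaxhighlight>

Note that the splitter has to walk the text character by character and track state, which illustrates why escaped data is awkward to parse with a simple split or regular expression.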

====Higher level encoding====
Some systems allow any character to be represented as a sequence of characters. This allows text that otherwise is a delimiter to be encoded in the content indirectly and thus prevent delimiter collision. A drawback of this method is that character codes are relatively hard to read, understand and memorize.

For example, Perl allows a character to be encoded as the sequence {{code|\x##}} where ## is the hexadecimal value of the character code. The following shows how the sequence for double-quote ({{code|\x22}}) can be used to prevent collision with the delimiter that marks the beginning and end of a string literal.

<syntaxhighlight lang="perl"> print "Nancy said \x22Hello World!\x22 to the crowd."; </syntaxhighlight>

produces the same output as:

<syntaxhighlight lang="perl"> print "Nancy said \"Hello World!\" to the crowd."; ### use escape char </syntaxhighlight>

====Dual quoting delimiters====
In contrast to escape sequences and escape characters, dual delimiters provide yet another way to avoid delimiter collision. Some languages, for example, allow the use of either a single quote (') or a double quote (") to specify a string literal. For example, in Perl:

<syntaxhighlight lang="perl"> print 'Nancy said "Hello World!" to the crowd.'; </syntaxhighlight>

produces the desired output without requiring escapes. This approach, however, only works when the string does not contain ''both'' types of quotation marks.

====Padding quoting delimiters====
In contrast to escape sequences and escape characters, padding delimiters provide yet another way to avoid delimiter collision. Visual Basic, for example, uses double quotes as string delimiters and represents a double quote inside a string by writing it twice. This is similar to escaping the delimiter.

<syntaxhighlight lang="basic"> print "Nancy said ""Hello World!"" to the crowd." </syntaxhighlight>

produces the desired output without requiring escapes. Like regular escaping it can, however, become confusing when many quotes are used. The code to print the above source code would look more confusing:

<syntaxhighlight lang="basic"> print "print ""Nancy said """"Hello World!"""" to the crowd.""" </syntaxhighlight>
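The doubling rule can be expressed as a pair of small Python functions. This is an illustrative sketch with hypothetical names, not the implementation used by any particular language.

<syntaxhighlight lang="python">
def quote_by_doubling(text, quote='"'):
    """Wrap text in quotes, writing each embedded quote twice."""
    return quote + text.replace(quote, quote + quote) + quote


def unquote_doubled(literal, quote='"'):
    """Reverse quote_by_doubling for a well-formed literal."""
    if len(literal) < 2 or literal[0] != quote or literal[-1] != quote:
        raise ValueError("literal must start and end with the quote character")
    return literal[1:-1].replace(quote + quote, quote)


sentence = 'Nancy said "Hello World!" to the crowd.'
once = quote_by_doubling(sentence)
twice = quote_by_doubling("print " + once)
print(once)
print(twice)
</syntaxhighlight>

Applying the rule twice reproduces the growth in quote characters seen in the second BASIC example: every level of nesting doubles the quotes again.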

==== Configurable alternative quoting delimiters ====
Configurable delimiters are even more flexible than dual delimiters for avoiding delimiter collision, because the author can choose a delimiter that does not occur in the content.<ref name="programmingperl" />{{rp|63}}

For example, in Perl:

<syntaxhighlight lang="perl">
print qq^Nancy doesn't want to say "Hello World!" anymore.^;
print qq@Nancy doesn't want to say "Hello World!" anymore.@;
print qq(Nancy doesn't want to say "Hello World!" anymore.);
</syntaxhighlight>

all produce the desired output through use of [http://perldoc.perl.org/perlop.html#Quote-and-Quote-like-Operators quote operators], which allow any convenient character to act as a delimiter. Although this method is more flexible, few languages support it. Perl and Ruby are two that do.<ref name="programmingperl" />{{rp|62}}<ref name="Ruby000">{{cite book |last = Yukihiro |first = Matsumoto |title = Ruby in a Nutshell |publisher = O'Reilly |year = 2001 |isbn = 0-596-00214-9 |url = https://archive.org/details/rubyinnutshellde00mats }} In Ruby, these are indicated as ''general delimited strings''. p. 11</ref>

====Content boundary====
A '''content boundary''' is a special type of delimiter that is specifically designed to resist delimiter collision. It works by allowing the author to specify a sequence of characters that is guaranteed to always indicate a boundary between parts in a multi-part message, with no other possible interpretation.<ref name="Mime000">{{cite book | title = Network Protocols Handbook | publisher = Javvin Technologies Inc. | year = 2005 | isbn = 0-9740945-2-8 }} p. 26</ref>

The delimiter is frequently generated from a random sequence of characters that is statistically improbable to occur in the content. This may be followed by an identifying mark such as a UUID, a timestamp, or some other distinguishing mark. Alternatively, the content may be scanned to guarantee that a delimiter does not appear in the text. This may allow the delimiter to be shorter or simpler, and increase the human readability of the document. (''See e.g.'', MIME, Here documents).
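Both strategies described above, a random boundary and a scan of the content, can be combined in a short Python sketch. The message layout here is deliberately simplified and is not the MIME format; the function names and the <code>boundary-</code> prefix are hypothetical.

<syntaxhighlight lang="python">
import uuid


def choose_boundary(parts):
    """Pick a random boundary and verify that no part contains it."""
    while True:
        boundary = "boundary-" + uuid.uuid4().hex
        if not any(boundary in part for part in parts):
            return boundary


def pack(parts):
    """Join parts with a separator line built from a fresh boundary."""
    boundary = choose_boundary(parts)
    separator = "\n--" + boundary + "\n"
    return boundary, separator.join(parts)


def unpack(boundary, message):
    """Split a packed message using the boundary announced by its author."""
    return message.split("\n--" + boundary + "\n")


parts = ["first, part", "second\n--part", "third"]
boundary, message = pack(parts)
assert unpack(boundary, message) == parts
</syntaxhighlight>

The random value makes a collision statistically improbable, and the scan in <code>choose_boundary</code> turns that into a guarantee for the content actually being packed.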

====Whitespace or indentation====
Some programming and computer languages allow the use of whitespace delimiters or indentation as a means of specifying boundaries between independent regions in text.<ref name="id_1268444524465_10">{{cite book | title = Computational Linguistics and Intelligent Text Processing | publisher = Oxford University Press | location = Oxford Oxfordshire | year = 2001 | isbn = 978-3-540-41687-6 }} Describes whitespace delimiters. p. 258.</ref>

==== Regular expression syntax ====
{{see also|Regular expression examples}}

In specifying a regular expression, alternate delimiters may also be used to simplify the syntax for '''match''' and '''substitution''' operations in Perl.<ref name="Friedl000">{{cite book | last = Friedl | first = Jeffrey | title = Mastering Regular Expressions | publisher = Oxford University Press | location = Oxford Oxfordshire | year = 2006 | isbn = 978-0-596-52812-6 }} page 472.</ref>

For example, a simple match operation may be specified in Perl with the following syntax:

<syntaxhighlight lang="perl">
$string1 = 'Nancy said "Hello World!" to the crowd.';    # specify a target string
print $string1 =~ m/[aeiou]+/;                           # match one or more vowels
</syntaxhighlight>

The syntax is flexible enough to specify match operations with alternate delimiters, making it easy to avoid delimiter collision:

<syntaxhighlight lang="perl">
$string1 = 'Nancy said "http://Hello/World.htm" is not a valid address.'; # target string
print $string1 =~ m@http://@;       # match using alternate regular expression delimiter
print $string1 =~ m{http://};       # same as previous, but different delimiter
print $string1 =~ m!http://!;       # same as previous, but different delimiter.
</syntaxhighlight>

==== Here document ====
A here document allows the inclusion of arbitrary content by specifying a special end sequence. Many languages support this, including PHP, the Bourne shell, Ruby, and Perl. A here document starts by describing what the end sequence is and continues until that sequence occurs at the start of a new line.<ref>{{Cite web |url=http://perldoc.perl.org/perlop.html |title=Perl operators and precedence |access-date=2011-11-11 |archive-date=2012-07-17 |archive-url=https://web.archive.org/web/20120717041740/http://perldoc.perl.org/perlop.html |url-status=live }}</ref> If the content is known, this technique avoids delimiter collision since an end sequence can be chosen that does not exist in the content.

An example in Perl:

<syntaxhighlight lang="perl">
print <<ENDOFHEREDOC;
It's very hard to encode a string with "certain characters".

Newlines, commas, and other characters can cause delimiter collisions.
ENDOFHEREDOC
</syntaxhighlight>

This code prints:

<pre>
It's very hard to encode a string with "certain characters".

Newlines, commas, and other characters can cause delimiter collisions.
</pre>

====ASCII armor====
Although principally used as a mechanism for text encoding of binary data, ASCII armoring is a programming and systems administration technique that also helps avoid delimiter collision in some circumstances.<ref name="Rhee000">{{cite book | title = Internet Security: Cryptographic Principles, Algorithms and Protocols | first = Man | last = Rhee | publisher = John Wiley and Sons | year = 2003 | isbn = 0-470-85285-2 }}(an example usage of ASCII armoring in encryption applications)</ref><ref name="Gross000">{{cite book | title = Open Source for Windows Administrators | url = https://archive.org/details/opensourceforwin0000gros | url-access = registration | first = Christian | last = Gross | publisher = Charles River Media | year = 2005 | isbn = 1-58450-347-5 }}(an example usage of ASCII armoring in encryption applications)</ref> This technique is more complicated than many other collision avoidance techniques, and therefore is less suitable for small applications and simple data formats. The technique employs a special encoding scheme, such as base64, to ensure that delimiters or other significant characters do not appear in transmitted data. It avoids multilayered escaping, for example of double quotes.

This technique is used, for example, in ASP.NET, and is closely associated with the VIEWSTATE component of that system.<ref name="Kalani000">{{cite book | title = Developing and Implementing Web Applications with Visual C# . NET and Visual Studio . NET | first = Amit | last = Kalani | publisher = Que | year = 2004 | isbn = 0-7897-2901-6 }}(describes the use of Base64 encoding and VIEWSTATE inside HTML source code)</ref> This prevents delimiter collision and ensures that incompatible characters will not appear inside the HTML code, regardless of what characters appear in the original (decoded) text.<ref name="Kalani000" /> The following example demonstrates how this technique works.

An HTML tag in which the VIEWSTATE value contains double quotes {{endash}} characters that are incompatible with the delimiters of the HTML tag {{endash}} is not valid and would fail.

To store arbitrary text in an HTML attribute, HTML entities can be used. In this case <code>&quot;</code> stands for a double quote.

Alternatively, any encoding can be used that does not include characters that have special meaning in the context, such as base64 or percent-encoding.
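The three encodings mentioned above can be sketched with Python's standard library. The <code>__VIEWSTATE</code> field name follows the text; the sample string, the helper <code>hidden_input</code>, and the exact tag layout are hypothetical and are not taken from ASP.NET itself.

<syntaxhighlight lang="python">
import base64
import html
import urllib.parse

# Hypothetical content containing the attribute delimiter (double quote).
text = 'Nancy said "Hello World!" to the crowd.'

as_entity = html.escape(text, quote=True)          # " becomes &quot;
as_base64 = base64.b64encode(text.encode("utf-8")).decode("ascii")
as_percent = urllib.parse.quote(text, safe="")     # " becomes %22


def hidden_input(value):
    """Build an illustrative hidden field; value must already be encoded."""
    return '<input type="hidden" name="__VIEWSTATE" value="' + value + '" />'


for encoded in (as_entity, as_base64, as_percent):
    # None of the encoded forms contains a raw double quote.
    assert '"' not in encoded
    print(hidden_input(encoded))

# Each encoding is reversible, so the original text is recoverable.
assert html.unescape(as_entity) == text
assert base64.b64decode(as_base64).decode("utf-8") == text
assert urllib.parse.unquote(as_percent) == text
</syntaxhighlight>

In each case the encoded value can sit between the attribute's quote delimiters without ending the attribute early, and the receiver decodes it to recover the original characters.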

==See also==
* {{Annotated link|CDATA}}
* {{Annotated link|Decimal separator}}
* {{Annotated link|Delimiter-separated values}}
* {{Annotated link|Escape sequence}}
* {{Annotated link|Newline}}
* {{Annotated link|String literal}}
* {{Annotated link|Tab-separated values}}

==References==
{{reflist}}

==External links==
* [http://www.catb.org/esr/writings/taoup/html/ch05s02.html Data File Metaformats] from The Art of Unix Programming by Eric Steven Raymond

Category:Markup languages Category:Pattern matching Category:Programming constructs Category:String (computer science)