# Coreference

> Mediated Wiki article. Canonical URL: https://mediated.wiki/source/Coreference
> Markdown URL: https://mediated.wiki/source/Coreference.md
> Source: https://en.wikipedia.org/wiki/Coreference
> Source revision: 1351927860
> License: Creative Commons Attribution-ShareAlike 4.0 International (https://creativecommons.org/licenses/by-sa/4.0/)

Two or more expressions in a text with the same referent

Not to be confused with [conference](/source/Conference).

In [linguistics](/source/Linguistics), **coreference**, sometimes written **co-reference**, occurs when two or more expressions refer to the same person or thing; they have the same [referent](/source/Referent). For example, in *Bill said Alice would arrive soon, and she did*, the words *Alice* and *she* refer to the same person.[1]

Coreference is often non-trivial to determine. For example, in *Bill said he would come*, the word *he* may or may not refer to Bill. Determining which expressions are coreferential is an important part of analyzing or understanding the meaning of a text. It often requires information from the context as well as real-world knowledge, such as the tendency of some names to be associated with particular species ("Rover") or kinds of artifacts ("Titanic"), grammatical gender, or other properties.

Linguists commonly use indices to notate coreference, as in *Bill<sub>i</sub> said he<sub>i</sub> would come*. Such expressions are said to be *coindexed*, indicating that they should be interpreted as coreferential.

When expressions are coreferential, the first to occur is often a full or descriptive form (for example, an entire personal name, perhaps with a title and role), while later occurrences use shorter forms (for example, just a given name, surname, or pronoun). The earlier occurrence is known as the [antecedent](/source/Antecedent_(grammar)) and the other is called a [proform](/source/Proform), anaphor, or reference. However, pronouns can sometimes refer forward, as in "When she arrived home, Alice went to sleep." In such cases, the coreference is called [cataphoric](/source/Cataphora) rather than anaphoric.

Coreference is important for [binding](/source/Binding_(linguistics)) phenomena in the field of syntax. The theory of binding explores the syntactic relationship that exists between coreferential expressions in sentences and texts.

## Types

When exploring coreference, numerous distinctions can be made, such as [anaphora](/source/Anaphora_(linguistics)), [cataphora](/source/Cataphora), split antecedents, and coreferring noun phrases.[2] Several of these more specific phenomena are illustrated here:

**Anaphora**

- a. **The music<sub>i</sub>** was so loud that **it<sub>i</sub>** couldn't be enjoyed. – The anaphor *it* follows the expression to which it refers (its antecedent).
- b. **Our neighbors<sub>i</sub>** dislike the music. If **they<sub>i</sub>** are angry, the cops will show up soon. – The anaphor *they* follows the expression to which it refers (its antecedent).

**Cataphora**

- a. If **they<sub>i</sub>** are angry about the music, **the neighbors<sub>i</sub>** will call the cops. – The cataphor *they* precedes the expression to which it refers (its postcedent).
- b. Despite **her<sub>i</sub>** difficulty, **Wilma<sub>i</sub>** came to understand the point. – The cataphor *her* precedes the expression to which it refers (its postcedent).

**Split antecedents**

- a. **Carol<sub>i</sub>** told **Bob<sub>i</sub>** to attend the party. **They<sub>i</sub>** arrived together. – The anaphor *they* has a split antecedent, referring to both *Carol* and *Bob*.
- b. When **Carol<sub>i</sub>** helps **Bob<sub>i</sub>** and **Bob<sub>i</sub>** helps **Carol<sub>i</sub>**, **they<sub>i</sub>** can accomplish any task. – The anaphor *they* has a split antecedent, referring to both *Carol* and *Bob*.

**Coreferring noun phrases**

- a. **The project leader<sub>i</sub>** is refusing to help. **The jerk<sub>i</sub>** thinks only of **himself<sub>i</sub>**. – Coreferring noun phrases, whereby the second noun phrase is a predication over the first.
- b. **Some of our colleagues<sub>1</sub>** are going to be supportive. **These kinds of people<sub>1</sub>** will earn our gratitude. – Coreferring noun phrases, whereby the second noun phrase is a predication over the first.

## Relation to bound variables

Semanticists and logicians sometimes draw a distinction between coreference and what is known as a [bound variable](/source/Bound_variable).[3] Bound variables occur when the antecedent to the proform is an indefinite quantified expression, e.g.[4]

1. **Every student<sub>i</sub>** has received **his<sub>i</sub>** grade. – The pronoun *his* is an example of a bound variable.
1. **No student<sub>i</sub>** was upset with **his<sub>i</sub>** grade. – The pronoun *his* is an example of a bound variable.

[Quantified expressions](/source/Quantifier_(logic)) such as *every student* and *no student* are not considered referential. These expressions are grammatically singular but do not pick out single referents in the discourse or real world. Thus, the antecedents to *his* in these examples are not properly referential, and neither is *his*. Instead, it is considered a *variable* that is *bound* by its antecedent. Its reference varies based upon which of the students in the discourse world is thought of. The existence of bound variables is perhaps more apparent with the following example:

1. **Only Jack<sub>i</sub>** likes **his<sub>i</sub>** grade. – The pronoun *his* can be a bound variable.

This sentence is ambiguous. It can mean that Jack likes his grade but everyone else dislikes Jack's grade; or that no one likes their **own** grade except Jack. In the first meaning, *his* is coreferential; in the second, it is a bound variable because its reference varies over the set of all students.
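
The two readings can be made explicit in first-order notation. The following is only a simplified sketch: the constant $j$ (Jack), the function $\mathrm{grade}$, and the predicate $\mathrm{likes}$ are names assumed here for illustration, and the quantifier is taken to range over the students in the discourse.

$$
\begin{aligned}
\text{coreferential:}\quad & \mathrm{likes}(j, \mathrm{grade}(j)) \;\land\; \forall x\,[\,\mathrm{likes}(x, \mathrm{grade}(j)) \rightarrow x = j\,] \\
\text{bound variable:}\quad & \mathrm{likes}(j, \mathrm{grade}(j)) \;\land\; \forall x\,[\,\mathrm{likes}(x, \mathrm{grade}(x)) \rightarrow x = j\,]
\end{aligned}
$$

In the first formula the argument of $\mathrm{grade}$ is fixed to $j$, so *his* picks out Jack throughout. In the second it covaries with the quantified variable $x$, which is what makes *his* a bound variable on that reading.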

Coindexing notation is commonly used for both cases. That is, when two or more expressions are coindexed, the notation does not signal whether one is dealing with coreference or a bound variable (or, as in the last example, whether that depends on interpretation).

## Coreference resolution

In [computational linguistics](/source/Computational_linguistics), coreference resolution is a well-studied problem in [discourse](/source/Discourse). To derive the correct interpretation of a text, or even to estimate the relative importance of various mentioned subjects, pronouns and other [referring expressions](/source/Referring_expression) must be connected to the right individuals. Algorithms intended to resolve coreferences commonly look first for the nearest preceding individual that is compatible with the referring expression. For example, *she* might attach to a preceding expression such as *the woman* or *Anne*, but is much less likely to attach to *Bill*. Pronouns such as *himself* have much stricter constraints. As with many linguistic tasks, there is a tradeoff between [precision and recall](/source/Precision_and_recall). [Cluster](/source/Cluster_analysis)-quality metrics commonly used to evaluate coreference resolution algorithms include the [Rand index](/source/Rand_index), the [adjusted Rand index](/source/Adjusted_Rand_index), and different [mutual information](/source/Mutual_information)-based methods.
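
The nearest-compatible-antecedent heuristic and a Rand index evaluation can be sketched in a few lines of Python. This is a toy illustration under stated assumptions, not a description of any particular system: the `Mention` class, the gender-only `compatible` check, and the hand-annotated gold clusters are all invented for the example, and real resolvers use far richer features.

```python
from dataclasses import dataclass
from itertools import combinations
from typing import List, Optional


@dataclass(frozen=True)
class Mention:
    """Hypothetical mention record; the fields are assumptions for this sketch."""
    text: str
    gender: Optional[str]      # "f", "m", "n", or None when unknown
    is_pronoun: bool = False


def compatible(pronoun: Mention, candidate: Mention) -> bool:
    """Toy agreement check: genders must match unless the candidate's is unknown."""
    return candidate.gender is None or candidate.gender == pronoun.gender


def resolve_nearest(mentions: List[Mention]) -> List[int]:
    """Assign a cluster id to each mention, linking every pronoun to the
    nearest preceding compatible mention (if there is one)."""
    cluster_ids: List[int] = []
    next_id = 0
    for i, mention in enumerate(mentions):
        antecedent = None
        if mention.is_pronoun:
            for j in range(i - 1, -1, -1):      # scan leftwards from the pronoun
                if compatible(mention, mentions[j]):
                    antecedent = j
                    break
        if antecedent is None:                  # start a new entity
            cluster_ids.append(next_id)
            next_id += 1
        else:                                   # join the antecedent's entity
            cluster_ids.append(cluster_ids[antecedent])
    return cluster_ids


def rand_index(predicted: List[int], gold: List[int]) -> float:
    """Fraction of mention pairs on which two clusterings agree (both put the
    pair together, or both keep it apart). Assumes at least two mentions."""
    pairs = list(combinations(range(len(gold)), 2))
    agreements = sum(
        (predicted[a] == predicted[b]) == (gold[a] == gold[b]) for a, b in pairs
    )
    return agreements / len(pairs)


# "Bill said Alice would arrive soon, and she did."
doc = [Mention("Bill", "m"), Mention("Alice", "f"), Mention("she", "f", is_pronoun=True)]
predicted = resolve_nearest(doc)                 # [0, 1, 1]
score = rand_index(predicted, gold=[0, 1, 1])    # 1.0
```

The heuristic is easy to fool. When two compatible candidates precede the pronoun, as in *Alice told Anne that she was late*, it always picks the closer one (*Anne*), whichever reading the writer intended.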

A particular problem for coreference resolution in English is the pronoun *it*, which has many uses. *It* can refer much like *he* and *she*, except that it generally refers to inanimate objects (the rules are actually more complex: animals may be any of *it*, *he*, or *she*; ships are traditionally *she*; hurricanes are usually *it* despite having gendered names). *It* can also refer to abstractions rather than beings, e.g. *He was paid minimum wage, but didn't seem to mind it.* Finally, *it* also has [pleonastic](/source/Pleonastic) uses, which do not refer to anything specific:

1. **It**'s raining.
1. **It**'s really a shame.
1. **It** takes a lot of work to succeed.
1. Sometimes **it**'s the loudest who have the most influence.

Pleonastic uses are not considered referential, and so are not part of coreference.[5]

Approaches to coreference resolution can broadly be separated into mention-pair, mention-ranking, and entity-based algorithms. Mention-pair algorithms make a [binary](https://en.wiktionary.org/wiki/binary) decision on whether a given pair of mentions belongs to the same entity. Entity-wide constraints like [gender](/source/Gender) are not considered, which leads to [error propagation](/source/Error_propagation). For example, the pronouns *he* and *she* can each have a high probability of coreference with *the teacher*, but cannot be coreferent with each other. Mention-ranking algorithms expand on this idea but stipulate that a mention can be coreferent with only one (previous) mention. As a result, each previous mention is given a score, and the highest-scoring mention (or no mention) is linked. Finally, in entity-based methods, mentions are linked based on information about the whole coreference chain instead of individual mentions. Representing a variable-width chain is more complex and computationally expensive than mention-based methods, which has led to these algorithms being mostly based on [neural network](/source/Neural_network) architectures.
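
The *teacher* / *he* / *she* example can be traced through all three families with a small Python sketch. Everything here is hypothetical and heavily simplified: the pairwise scores are made-up numbers standing in for a classifier's output, the 0.5 thresholds are arbitrary, and the entity-based function uses a hard gender check plus an average score rather than the neural chain representations mentioned above.

```python
from typing import Dict, List, Optional, Tuple

# Toy input; genders and scores are invented for illustration.
MENTIONS = ["the teacher", "he", "she"]
GENDERS: List[Optional[str]] = [None, "m", "f"]          # None = unknown
# Hypothetical scores: (earlier index, later index) -> P(coreferent)
PAIR_SCORES: Dict[Tuple[int, int], float] = {(0, 1): 0.8, (0, 2): 0.7, (1, 2): 0.1}


def _group(labels: List[int]) -> List[List[int]]:
    """Turn per-mention cluster labels into sorted lists of mention indices."""
    groups: Dict[int, List[int]] = {}
    for i, label in enumerate(labels):
        groups.setdefault(label, []).append(i)
    return sorted(groups.values())


def mention_pair(n: int, scores, threshold: float = 0.5) -> List[List[int]]:
    """Accept every pair scoring above the threshold, then merge transitively."""
    cluster = list(range(n))                 # cluster[i] = label of mention i
    for (a, b), s in sorted(scores.items()):
        if s > threshold:
            old, new = cluster[b], cluster[a]
            cluster = [new if c == old else c for c in cluster]
    return _group(cluster)


def mention_ranking(n: int, scores, no_link: float = 0.5) -> List[Optional[int]]:
    """Each mention links to its single best earlier mention, or to none."""
    links: List[Optional[int]] = []
    for i in range(n):
        candidates = [(scores.get((j, i), 0.0), j) for j in range(i)]
        best_score, best_j = max(candidates, default=(0.0, None))
        links.append(best_j if best_score > no_link else None)
    return links


def entity_based(n: int, scores, genders, no_link: float = 0.5) -> List[List[int]]:
    """Attach each mention to the best-scoring whole chain without a gender clash."""
    chains: List[List[int]] = []
    for i in range(n):
        best_chain, best_score = None, no_link
        for chain in chains:
            clash = any(
                genders[j] and genders[i] and genders[j] != genders[i] for j in chain
            )
            avg = sum(scores.get((j, i), 0.0) for j in chain) / len(chain)
            if not clash and avg > best_score:
                best_chain, best_score = chain, avg
        if best_chain is None:
            chains.append([i])               # start a new entity
        else:
            best_chain.append(i)
    return chains


pair_result = mention_pair(3, PAIR_SCORES)             # [[0, 1, 2]]
ranking_result = mention_ranking(3, PAIR_SCORES)       # [None, 0, 0]
entity_result = entity_based(3, PAIR_SCORES, GENDERS)  # [[0, 1], [2]]
```

With these toy numbers, the mention-pair step merges *he* and *she* into one entity by transitivity, even though their direct score is only 0.1. The ranking step gives each pronoun a single antecedent but still attaches both to *the teacher*, because it also scores individual mentions. Only the chain-level check keeps *she* out of the chain that already contains *he*.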

## See also

- [Anaphora (linguistics)](/source/Anaphora_(linguistics)) – Use of an expression whose interpretation depends on context

- [Antecedent](/source/Antecedent_(grammar)) – Expression that gives its meaning to a pro-form in grammar

- [Binding](/source/Binding_(linguistics)) – Distribution of anaphoric elements

- [Cataphora](/source/Cataphora) – Use of an expression or word that co-refers with a later, more specific, expression

- [Nearest referent](/source/Nearest_referent)

- [Switch-reference](/source/Switch-reference) – Concept in linguistics

- [Word-sense disambiguation](/source/Word-sense_disambiguation) – Identification of which sense of a word is being used

## Notes

1. For definitions of coreference, see for instance Crystal (1997:94) and Radford (2004:332).

2. These distinctions (anaphora, cataphora, split antecedents, coreferring noun phrases, etc.) are discussed in Jurafsky and Martin (2000:669ff).

3. For discussions of bound variables, see for instance Portner (2005:102ff.).

4. See Jurafsky and Martin (2000:701) for an example of a bound variable like the ones given here.

5. Li et al. (2009) have demonstrated high accuracy in sorting out pleonastic *it*, and this success promises to improve the accuracy of coreference resolution overall.

## References

- Crystal, D. 1997. A dictionary of linguistics and phonetics. 4th edition. Cambridge, MA: Blackwell Publishing.

- [Jurafsky, D.](/source/Dan_Jurafsky) and J. H. Martin 2000. Speech and language processing: An introduction to natural language processing, computational linguistics, and speech recognition. New Delhi, India: Pearson Education.

- Portner, P. 2005. What is semantics?: Fundamentals of formal semantics. Malden, MA: Blackwell Publishing.

- Radford, A. 2004. [English syntax: An introduction](https://books.google.com/books?id=LdAi292Q4-0C&q=coreference). Cambridge, UK: Cambridge University Press.

- Li, Y., P. Musilek, M. Reformat, and L. Wyard-Scott 2009. [Identification of pleonastic *it* using the web](https://www.aaai.org/Papers/JAIR/Vol34/JAIR-3410.pdf) [Archived](https://web.archive.org/web/20221026062452/https://www.aaai.org/Papers/JAIR/Vol34/JAIR-3410.pdf) 2022-10-26 at the [Wayback Machine](/source/Wayback_Machine). *Journal of Artificial Intelligence Research* 34, 339–389.

---
Adapted from the Wikipedia article [Coreference](https://en.wikipedia.org/wiki/Coreference) by Wikipedia contributors ([contributor history](https://en.wikipedia.org/wiki/Coreference?action=history)). Available under [Creative Commons Attribution-ShareAlike 4.0 International](https://creativecommons.org/licenses/by-sa/4.0/). Changes may have been made.