# Text simplification

> Mediated Wiki article. Canonical URL: https://mediated.wiki/source/Text_simplification
> Markdown URL: https://mediated.wiki/source/Text_simplification.md
> Source: https://en.wikipedia.org/wiki/Text_simplification
> Source revision: 1337371129
> License: Creative Commons Attribution-ShareAlike 4.0 International (https://creativecommons.org/licenses/by-sa/4.0/)

**Text simplification** is an operation in [natural language processing](/source/Natural_language_processing) that modifies, reorganizes, or classifies existing text so that it is easier to understand while retaining its original [meaning](/source/Meaning_(linguistic)). Demand for it has grown as communication in science, technology, and media has become more complex. Human languages have large vocabularies and complex structures that are difficult for machines to process efficiently. Researchers have found that [semantic compression](/source/Semantic_compression) techniques can help simplify text by reducing linguistic diversity and restricting the vocabulary used in a given context.

## Example

Text simplification rewrites complex sentences as simpler ones to improve readability and comprehension. Siddharthan (2006) illustrates the process with the following example.[1] The original sentence, given first, contains several clauses; the simplified version, given second, splits them into shorter sentences.

- *Also contributing to the firmness in copper, the analyst noted, was a report by Chicago purchasing agents, which precedes the full purchasing agents report that is due out today and gives an indication of what the full report might hold.*

- *Also contributing to the firmness in copper, the analyst noted, was a report by Chicago purchasing agents. The Chicago report precedes the full purchasing agents report. The Chicago report gives an indication of what the full report might hold. The full report is due out today.*
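The first split in the example above can be sketched as a single rewrite rule. This is a minimal illustration, not Siddharthan's method: the function name `split_relative_clause` is hypothetical, the rule only handles a non-restrictive clause introduced by ", which", and the referent that replaces "which" is supplied by the caller rather than resolved automatically.

```python
def split_relative_clause(sentence, referent):
    """Split a sentence at its first ', which' clause.

    Returns a list of sentences. `referent` is the noun phrase that
    stands in for 'which' in the new sentence; choosing it correctly
    is the hard part and is assumed to be done elsewhere.
    """
    marker = ", which "
    head, found, clause = sentence.partition(marker)
    if not found:
        # No matching clause: leave the sentence unchanged.
        return [sentence]
    main = head.rstrip(" ,") + "."
    clause = clause.rstrip(" .") + "."
    return [main, f"{referent} {clause}"]


original = (
    "Also contributing to the firmness in copper, the analyst noted, "
    "was a report by Chicago purchasing agents, which precedes the full "
    "purchasing agents report that is due out today and gives an "
    "indication of what the full report might hold."
)
for part in split_relative_clause(original, "The Chicago report"):
    print(part)
```

Applied once, the rule yields two sentences. Reaching the four-sentence version shown above would need further rules for the coordinated verb phrase and the restrictive "that" clause, plus a way to pick referring expressions that keep the text cohesive.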

One approach to text simplification is [lexical simplification](/source/Lexical_simplification) via [lexical substitution](/source/Lexical_substitution), which replaces complex words with simpler synonyms. Identifying which words are complex is itself a challenge, typically addressed by machine learning classifiers trained on [labeled data](/source/Labeled_data). Researchers have found that asking labelers to rank words by complexity yields more consistent labels than the traditional binary method of categorizing each word as simple or complex.[2]
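Lexical substitution can be sketched in a few lines. The example below is a toy under stated assumptions: the frequency table, the synonym table, and the threshold are all invented for illustration, and word frequency stands in for the trained complexity classifier described above. It also ignores inflection and word sense, which a practical system would have to handle.

```python
import re

# Hypothetical toy resources. A real system would use corpus-derived
# frequencies and a thesaurus or embedding-based candidate generator.
WORD_FREQUENCY = {
    "use": 900, "utilize": 40,
    "help": 850, "facilitate": 30,
    "buy": 700, "purchase": 120,
}
SYNONYMS = {
    "utilize": ["use"],
    "facilitate": ["help"],
    "purchase": ["buy"],
}
COMPLEXITY_THRESHOLD = 200  # assumed cut-off between simple and complex


def is_complex(word):
    """Treat rare (or unknown) words as complex."""
    return WORD_FREQUENCY.get(word, 0) < COMPLEXITY_THRESHOLD


def simplify_word(word):
    """Replace a complex word with its most frequent simple synonym."""
    lower = word.lower()
    if not is_complex(lower):
        return word
    candidates = [c for c in SYNONYMS.get(lower, []) if not is_complex(c)]
    if not candidates:
        # No simpler synonym is known: keep the original word.
        return word
    best = max(candidates, key=lambda c: WORD_FREQUENCY[c])
    return best.capitalize() if word[0].isupper() else best


def simplify_text(text):
    """Apply word-level substitution to every alphabetic token."""
    return re.sub(r"[A-Za-z]+", lambda m: simplify_word(m.group(0)), text)


print(simplify_text("We utilize tools to facilitate learning."))
```

Words that are missing from the synonym table pass through unchanged, so the sketch never deletes content; it only swaps words for which a simpler alternative is listed.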

## See also

- [Automated paraphrasing](/source/Automated_paraphrasing)

- [Controlled natural language](/source/Controlled_natural_language)

- [Language reform](/source/Language_reform)

- [Lexical simplification](/source/Lexical_simplification)

- [Lexical substitution](/source/Lexical_substitution)

- [Semantic compression](/source/Semantic_compression)

- [Text normalization](/source/Text_normalization)

- [Simplified English](/source/Simplified_Technical_English)

- [Basic English](/source/Basic_English)

## References

1. **[^](#cite_ref-1)** Siddharthan, Advaith (28 March 2006). "Syntactic Simplification and Text Cohesion". *Research on Language and Computation*. **4** (1): 77–109. [doi](/source/Doi_(identifier)):[10.1007/s11168-006-9011-1](https://doi.org/10.1007%2Fs11168-006-9011-1). [S2CID](/source/S2CID_(identifier)) [14619244](https://api.semanticscholar.org/CorpusID:14619244).

2. **[^](#cite_ref-2)** Gooding, Sian; Kochmar, Ekaterina; Sarkar, Advait; Blackwell, Alan (August 2019). ["Comparative judgments are more consistent than binary classification for labelling word complexity"](https://www.aclweb.org/anthology/W19-4024/). *Proceedings of the 13th Linguistic Annotation Workshop*: 208–214. [doi](/source/Doi_(identifier)):[10.18653/v1/W19-4024](https://doi.org/10.18653%2Fv1%2FW19-4024). Retrieved 22 November 2019.

- Wei Xu, Chris Callison-Burch and Courtney Napoles. "[Problems in Current Text Simplification Research](https://www.transacl.org/ojs/index.php/tacl/article/viewFile/549/131)". In Transactions of the Association for Computational Linguistics (TACL), Volume 3, 2015, Pages 283–297.

- Advaith Siddharthan. "[Syntactic Simplification and Text Cohesion](http://www.cl.cam.ac.uk/techreports/UCAM-CL-TR-597.pdf)". In Research on Language and Computation, Volume 4, Issue 1, Jun 2006, Pages 77–109, Springer Science, the Netherlands.

- Siddhartha Jonnalagadda, Luis Tari, Joerg Hakenberg, Chitta Baral and Graciela Gonzalez. Towards Effective Sentence Simplification for Automatic Processing of Biomedical Text. In Proc. of the NAACL-HLT 2009, Boulder, USA, June. [\[1\]](https://web.archive.org/web/20110607105853/http://www.public.asu.edu/~sjonnal3/home/papers/NAACL%20HLT%202009.pdf)

## External links

- [Automatic Induction of Rules for Text Simplification](https://repository.upenn.edu/handle/20.500.14332/37520&context=ircs_reports) 1996

- [Text Simplification for Information-Seeking Applications](http://www.isi.edu/~marcu/papers/factoids04.pdf) [Archived](https://web.archive.org/web/20210425153136/https://www.isi.edu/~marcu/papers/factoids04.pdf) 2021-04-25 at the [Wayback Machine](/source/Wayback_Machine) 2004

---
Adapted from the Wikipedia article [Text simplification](https://en.wikipedia.org/wiki/Text_simplification) by Wikipedia contributors ([contributor history](https://en.wikipedia.org/wiki/Text_simplification?action=history)). Available under [Creative Commons Attribution-ShareAlike 4.0 International](https://creativecommons.org/licenses/by-sa/4.0/). Changes may have been made.
