# CJK characters

> Mediated Wiki article. Canonical URL: https://mediated.wiki/source/CJK_characters
> Markdown URL: https://mediated.wiki/source/CJK_characters.md
> Source: https://en.wikipedia.org/wiki/CJK_characters
> Source revision: 1353017014
> License: Creative Commons Attribution-ShareAlike 4.0 International (https://creativecommons.org/licenses/by-sa/4.0/)

Logographs in shared East Asian written tradition

Translation of "That old man is 72 years old" in [Vietnamese](/source/Vietnamese_language), [Cantonese](/source/Cantonese), [Mandarin](/source/Mandarin_Chinese) (in [simplified](/source/Simplified_Chinese_characters) and [traditional characters](/source/Traditional_Chinese_characters)), [Japanese](/source/Japanese_language), and [Korean](/source/Korean_language)

In [internationalization](/source/Internationalization_and_localization), **CJK characters** is a collective term for [graphemes](/source/Graphemes) used in the [Chinese](/source/Written_Chinese), [Japanese](/source/Japanese_writing_system), and [Korean writing systems](/source/Korean_writing_system), which each include [Chinese characters](/source/Chinese_characters). It can also go by **CJKV** to include [Chữ Nôm](/source/Ch%E1%BB%AF_N%C3%B4m), the Chinese-origin [logographic](/source/Logogram) script formerly used for the [Vietnamese language](/source/Vietnamese_language), or **CJKVZ** to also include [Sawndip](/source/Sawndip), used to write the [Zhuang languages](/source/Zhuang_languages).

## Character repertoire

[Standard Mandarin Chinese](/source/Standard_Mandarin_Chinese) and [Standard Cantonese](/source/Standard_Cantonese) are written almost exclusively in [Chinese characters](/source/Chinese_characters). Over 3,000 characters are required for general literacy, with up to 40,000 characters for reasonably complete coverage. Japanese uses fewer characters: general literacy in Japanese can be expected with 2,136 characters. The use of Chinese characters in Korea is increasingly rare, although idiosyncratic use of Chinese characters in proper names requires knowledge (and therefore availability) of many more characters. As of 2013, some South Korean students were still expected to learn [1,800 characters](/source/Basic_Hanja_for_educational_use).[1]

Other scripts used for these languages, such as [bopomofo](/source/Bopomofo) and the [Latin](/source/Latin_script)-based [pinyin](/source/Pinyin) for Chinese, [hiragana](/source/Hiragana) and [katakana](/source/Katakana) for Japanese, and [hangul](/source/Hangul) for Korean, are not strictly "CJK characters", although CJK character sets almost invariably include them as necessary for full coverage of the target languages.
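The distinction between Han characters and the accompanying phonetic scripts can be illustrated with a short Python sketch. It relies on the standard `unicodedata` module and on the prefixes of Unicode character names; the `SCRIPT_PREFIXES` table and the `script_of` helper are illustrative names introduced here, not part of any standard, and the prefix match is only a rough approximation of the Unicode script property.

```python
import unicodedata

# Hypothetical lookup table: Unicode character-name prefix -> script label.
# This is an approximation for illustration, not the Unicode Script property.
SCRIPT_PREFIXES = {
    "CJK UNIFIED IDEOGRAPH": "Han",
    "CJK COMPATIBILITY IDEOGRAPH": "Han",
    "HIRAGANA": "hiragana",
    "KATAKANA": "katakana",
    "HANGUL": "hangul",
    "BOPOMOFO": "bopomofo",
    "LATIN": "Latin",
}


def script_of(ch: str) -> str:
    """Return a coarse script label for a single character."""
    name = unicodedata.name(ch, "")  # "" when the character has no name
    for prefix, label in SCRIPT_PREFIXES.items():
        if name.startswith(prefix):
            return label
    return "other"


if __name__ == "__main__":
    # Han, hiragana, katakana, hangul, bopomofo, and a pinyin letter.
    for ch in "\u6f22\u304b\u30ab\ud55c\u3105\u012b":
        print(f"U+{ord(ch):04X} {ch} {script_of(ch)}")
```

Under this rough classification only the first sample character counts as a "CJK character" in the strict sense, while the others belong to the phonetic scripts that CJK character sets usually carry alongside it.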

The [sinologist](/source/Sinology) Carl Leban (1971) produced an early survey of CJK encoding systems.

Until the early 20th century, [Classical Chinese](/source/Classical_Chinese) was the written language of government and scholarship in Vietnam. Popular literature in [Vietnamese](/source/Vietnamese_language) was written in the *[chữ Nôm](/source/Ch%E1%BB%AF_N%C3%B4m)* script, consisting of Chinese characters with many characters created locally. Since the 1920s, the script used for recording literature has been the Latin-based [Vietnamese alphabet](/source/Vietnamese_alphabet).[2][3]

### Quadruplication

**Quadruplication** ([Chinese](/source/Chinese_language): 四叠字, literally "four-fold characters") is a method of forming CJK characters via ideographic repetition. [Ken Lunde](/source/Ken_Lunde) describes these characters as "clusters of four or more identical elements, along with three identical elements in a row arranged horizontally or vertically".[4] These characters were mostly used in [Old Chinese](/source/Old_Chinese) writings and are no longer commonly used, except as components in some modern [Han ideographs](/source/Han_ideographs) such as 惙.[5]

#### Examples

| Quadruplicate character | English meaning | Notes |
| --- | --- | --- |
| 𪚥 | (obsolete) verbose; talkative | 龍 ("dragon" in a grid of four) |
| 䲜 | the appearance of many kinds of fish | used in the chengyu 生活䲜䲜 |
| 㸚 | (obsolete) sparse and clear | only found in historical dictionaries such as the Shuowen Jiezi |

## Encoding

The number of characters required for complete coverage of all these languages' needs cannot fit in the 256-character code space of 8-bit [character encodings](/source/Character_encoding); at least a 16-bit fixed-width encoding or a multi-byte variable-length encoding is required. The 16-bit fixed-width encodings, such as those from [Unicode](/source/Unicode) up to and including version 2.0, are now deprecated for two reasons: more characters must be encoded than a 16-bit encoding can accommodate (Unicode 5.0 has some 70,000 Han characters), and the Chinese government requires that software in China support the [GB 18030](/source/GB_18030) character set.
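The code-space argument can be made concrete with a small Python sketch. It reports how many bytes one character occupies in the UTF-8, UTF-16, and UTF-32 encoding forms and whether its code point fits in 16 bits. The helper names `widths` and `fits_in_8_bit` are illustrative, and Latin-1 stands in here for 8-bit encodings in general.

```python
def widths(ch: str) -> dict:
    """Byte length of one character in three Unicode encoding forms."""
    return {
        "code_point": f"U+{ord(ch):04X}",
        "fits_in_16_bits": ord(ch) <= 0xFFFF,
        "utf-8": len(ch.encode("utf-8")),
        "utf-16": len(ch.encode("utf-16-be")),  # "-be" avoids a byte-order mark
        "utf-32": len(ch.encode("utf-32-be")),
    }


def fits_in_8_bit(ch: str) -> bool:
    """True if the character is representable in Latin-1, an 8-bit encoding."""
    try:
        ch.encode("latin-1")
        return True
    except UnicodeEncodeError:
        return False


if __name__ == "__main__":
    # "A", U+4E2D (Basic Multilingual Plane), U+20000 (Extension B, plane 2).
    for ch in ("A", "\u4e2d", "\U00020000"):
        print(widths(ch), "8-bit:", fits_in_8_bit(ch))
```

A character such as U+20000 lies beyond the 16-bit range, so UTF-16 represents it with two 16-bit code units (four bytes), which is why a fixed 16-bit width is no longer sufficient.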

Although CJK encodings have common character sets, the encodings often used to represent them have been developed separately by different East Asian governments and software companies, and are mutually incompatible. [Unicode](/source/Unicode) has attempted, with some controversy, to unify the character sets in a process known as [Han unification](/source/Han_unification).
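The incompatibility can be sketched in Python, whose standard library ships codecs for several of these encodings. The example encodes the single character 中 (U+4E2D) in five legacy encodings and then decodes Big5 bytes as if they were GB 2312. The choice of sample character and the helper names `encode_all` and `misread` are assumptions made for illustration.

```python
TEXT = "\u4e2d"  # the character 中, present in all of the encodings below

# Codec names as registered in the Python standard library.
LEGACY_CODECS = ["big5", "gb2312", "shift_jis", "euc_jp", "euc_kr"]


def encode_all(text: str, codecs=LEGACY_CODECS) -> dict:
    """Encode the same text once per codec and return the raw bytes."""
    return {name: text.encode(name) for name in codecs}


def misread(text: str, written_as: str, read_as: str) -> str:
    """Encode with one codec and decode with another, as a mislabelled file would be."""
    return text.encode(written_as).decode(read_as, errors="replace")


if __name__ == "__main__":
    for name, raw in encode_all(TEXT).items():
        print(f"{name:10} {raw.hex()}")
    # Reading Big5 bytes as GB 2312 yields a different character.
    print(misread(TEXT, "big5", "gb2312"))
```

Each codec maps the same character to a different byte sequence, so text must be labelled with its encoding, or converted to Unicode, before it can be exchanged reliably.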

CJK character encodings should include, at a minimum, Han characters plus language-specific phonetic scripts such as [pinyin](/source/Pinyin), [bopomofo](/source/Bopomofo), hiragana, katakana and hangul.[6]

CJK character encodings include:

- [Big5](/source/Big5) (the most prevalent encoding before Unicode was implemented)

- [CCCII](/source/Chinese_Character_Code_for_Information_Interchange)

- [CNS 11643](/source/CNS_11643) (official standard of [Republic of China](/source/Republic_of_China_(Taiwan)))

- [EUC-JP](/source/EUC-JP)

- [EUC-KR](/source/EUC-KR)

- [GB 2312](/source/GB_2312) (subset and predecessor of GB 18030)

- [GB 18030](/source/GB_18030) (mandated standard in the [People's Republic of China](/source/People's_Republic_of_China))

- Giga Character Set (GCS)

- [ISO-2022-JP](/source/ISO-2022-JP)

- [ISO-2022-KR](/source/ISO-2022-KR)

- [KS X 1001](/source/KS_X_1001)

- [KPS 9566](/source/KPS_9566)

- [Shift-JIS](/source/Shift-JIS)

- [TRON](/source/TRON_(encoding))

- [Unicode](/source/Unicode)

The CJK character sets take up the bulk of the assigned [Unicode](/source/Unicode) code space. There is much controversy among Japanese experts on Chinese characters about the desirability and technical merit of the [Han unification](/source/Han_unification) process used to map multiple Chinese and Japanese character sets into a single set of unified characters.

All three languages can be written both [left-to-right and top-to-bottom](/source/Horizontal_and_vertical_writing_in_East_Asian_scripts) (right-to-left and top-to-bottom in ancient documents), but are usually considered left-to-right scripts when discussing encoding issues.

## Legal status

Libraries cooperated on encoding standards for [JACKPHY](/source/JACKPHY) characters in the early 1980s. According to [Ken Lunde](/source/Ken_Lunde), the abbreviation "CJK" was a registered [trademark](/source/Trademark) of [Research Libraries Group](/source/Research_Libraries_Group)[7] (which merged with [OCLC](/source/OCLC) in 2006). The trademark, owned by OCLC between 1987 and 2009, has now expired.[8]

## See also

- [Chinese character description languages](/source/Chinese_character_description_languages)

- [Chinese character encoding](/source/Chinese_character_encoding)

- [Chinese input methods for computers](/source/Chinese_input_methods_for_computers)

- [CJK Compatibility Ideographs](/source/CJK_Compatibility_Ideographs)

- [Chinese character strokes](/source/Chinese_character_strokes)

- [CJK Unified Ideographs](/source/CJK_Unified_Ideographs)

- [Complex Text Layout languages](/source/Complex_Text_Layout_languages) (CTL)

- [Input method editor](/source/Input_method_editor)

- [Japanese language and computers](/source/Japanese_language_and_computers)

- [Korean language and computers](/source/Korean_language_and_computers)

- [List of CJK fonts](/source/List_of_CJK_fonts)

- [Sinoxenic](/source/Sinoxenic)

- [Variable-width encoding](/source/Variable-width_encoding)

- [Vietnamese language and computers](/source/Vietnamese_language_and_computers)

## References

1. Lunde, Ken (2009). *CJKV information processing* (2nd ed.). Beijing, Boston, Farnham, Sebastopol, Tokyo: O'Reilly. [ISBN](/source/ISBN_(identifier)) [978-0-596-51447-1](https://en.wikipedia.org/wiki/Special:BookSources/978-0-596-51447-1).

2. [Coulmas (1991)](#CITEREFCoulmas1991), pp. 113–115.

3. [DeFrancis (1977)](#CITEREFDeFrancis1977).

4. Lunde, Ken. ["UTN #43: Unihan Database Property 'kStrange'"](https://www.unicode.org/notes/tn43/). *www.unicode.org*.

5. Yuan, Alex (11 August 2020). ["A Discussion on the Approach of 'Connections' to Chinese Character Studies in Teaching Chinese as a Foreign Language"](https://engagedscholarship.csuohio.edu/cltmt/vol3/iss1/6). *Chinese Language Teaching Methodology and Technology*. **3** (1): 46. [ISSN](/source/ISSN_(identifier)) [2572-1739](https://search.worldcat.org/issn/2572-1739). Retrieved 24 December 2025.

6. This article is based on material taken from [CJK](https://foldoc.org/CJK) at the *[Free On-line Dictionary of Computing](/source/Free_On-line_Dictionary_of_Computing)* prior to 1 November 2008 and incorporated under the "relicensing" terms of the [GFDL](/source/GNU_Free_Documentation_License), version 1.3 or later.

7. [Ken Lunde, 1996](http://www.csse.monash.edu.au/~jwb/cjk.inf)

8. [Justia listing](http://trademarks.justia.com/736/38/cjk-73638777.html)

### Works cited

- Coulmas, Florian (1991). [*The writing systems of the world*](https://archive.org/details/writingsystemsof0000coul). Blackwell. [ISBN](/source/ISBN_(identifier)) [978-0-631-18028-9](https://en.wikipedia.org/wiki/Special:BookSources/978-0-631-18028-9).

- DeFrancis, John (1977). *Colonialism and language policy in Viet Nam*. The Hague: Mouton. [ISBN](/source/ISBN_(identifier)) [978-90-279-7643-7](https://en.wikipedia.org/wiki/Special:BookSources/978-90-279-7643-7).

## Sources

- [DeFrancis, John](/source/John_DeFrancis). *[The Chinese Language: Fact and Fantasy](/source/The_Chinese_Language%3A_Fact_and_Fantasy)*. Honolulu: University of Hawaii Press, 1990. [ISBN](/source/ISBN_(identifier)) [0-8248-1068-6](https://en.wikipedia.org/wiki/Special:BookSources/0-8248-1068-6).

- Hannas, William C. *Asia's Orthographic Dilemma*. Honolulu: University of Hawaii Press, 1997. [ISBN](/source/ISBN_(identifier)) [0-8248-1892-X](https://en.wikipedia.org/wiki/Special:BookSources/0-8248-1892-X) (paperback); [ISBN](/source/ISBN_(identifier)) [0-8248-1842-3](https://en.wikipedia.org/wiki/Special:BookSources/0-8248-1842-3) (hardcover).

- Lemberg, Werner: The CJK package for LATEX2ε—Multilingual support beyond babel. TUGboat, Volume 18 (1997), No. 3—Proceedings of the 1997 Annual Meeting.

- Leban, Carl. *[Automated Orthographic Systems for East Asian Languages (Chinese, Japanese, Korean)](https://books.google.com/books?id=ePLMGwAACAAJ)*, State-of-the-art Report, Prepared for the Board of Directors, Association for Asian Studies. 1971.

- [Lunde, Ken](/source/Ken_Lunde). *CJKV Information Processing*. Sebastopol, Calif.: O'Reilly & Associates, 1998. [ISBN](/source/ISBN_(identifier)) [1-56592-224-7](https://en.wikipedia.org/wiki/Special:BookSources/1-56592-224-7).

## External links

- [CJKV: A Brief Introduction](http://www.linfo.org/cjkv.html)

- [Lemberg CJK article from above, TUGboat18-3](http://tug.org/TUGboat/Articles/tb18-3/cjkintro600.pdf)

- [On "CJK Unified Ideograph"](http://www.wenlin.com/cdl/#jarg), from Wenlin.com

- [FGA: Unicode CJKV character set rationalization](https://web.archive.org/web/20130624130411/http://homepage.ntlworld.com/jonathan.deboynepollard/FGA/unicode-cjkv-character-set-rationalization.html)

CJK ideographs in Unicode (as of version 17.0)

| Block name | Plane | Chart range | Characters | Han unification | Scripts contained in block |
| --- | --- | --- | --- | --- | --- |
| CJK Unified Ideographs | 0 BMP | 4E00–9FFF | 20,992 | Unified | Han |
| CJK Unified Ideographs Extension A | 0 BMP | 3400–4DBF | 6,592 | Unified | Han |
| CJK Unified Ideographs Extension B | 2 SIP | 20000–2A6DF | 42,720 | Unified | Han |
| CJK Unified Ideographs Extension C | 2 SIP | 2A700–2B73F | 4,160 | Unified | Han |
| CJK Unified Ideographs Extension D | 2 SIP | 2B740–2B81F | 222 | Unified | Han |
| CJK Unified Ideographs Extension E | 2 SIP | 2B820–2CEAF | 5,774 | Unified | Han |
| CJK Unified Ideographs Extension F | 2 SIP | 2CEB0–2EBEF | 7,473 | Unified | Han |
| CJK Unified Ideographs Extension G | 3 TIP | 30000–3134F | 4,939 | Unified | Han |
| CJK Unified Ideographs Extension H | 3 TIP | 31350–323AF | 4,192 | Unified | Han |
| CJK Unified Ideographs Extension I | 2 SIP | 2EBF0–2EE5F | 622 | Unified | Han |
| CJK Unified Ideographs Extension J | 3 TIP | 323B0–3347F | 4,298 | Unified | Han |
| CJK Radicals Supplement | 0 BMP | 2E80–2EFF | 115 | Not unified | Han |
| Kangxi Radicals | 0 BMP | 2F00–2FDF | 214 | Not unified | Han |
| Ideographic Description Characters | 0 BMP | 2FF0–2FFF | 16 | Not unified | Common |
| CJK Symbols and Punctuation | 0 BMP | 3000–303F | 64 | Not unified | Han, Hangul, Common, Inherited |
| CJK Strokes | 0 BMP | 31C0–31EF | 39 | Not unified | Common |
| Enclosed CJK Letters and Months | 0 BMP | 3200–32FF | 255 | Not unified | Hangul, Katakana, Common |
| CJK Compatibility | 0 BMP | 3300–33FF | 256 | Not unified | Katakana, Common |
| CJK Compatibility Ideographs | 0 BMP | F900–FAFF | 472 | 12 are unified | Han |
| CJK Compatibility Forms | 0 BMP | FE30–FE4F | 32 | Not unified | Common |
| Enclosed Ideographic Supplement | 1 SMP | 1F200–1F2FF | 64 | Not unified | Hiragana, Common |
| CJK Compatibility Ideographs Supplement | 2 SIP | 2F800–2FA1F | 542 | Not unified | Han |
| **Totals: 22 blocks** | | | **104,053** | | |
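The chart ranges listed above lend themselves to a simple range lookup. The following Python sketch copies the ranges of the unified ideograph blocks from the list and reports which block a character falls in; `UNIFIED_BLOCKS`, `block_of`, and `capacity` are illustrative names, and the data reflects the ranges as given in this article rather than a live copy of the Unicode Character Database.

```python
from typing import Optional

# (block name, first code point, last code point), copied from the ranges above.
UNIFIED_BLOCKS = [
    ("CJK Unified Ideographs", 0x4E00, 0x9FFF),
    ("CJK Unified Ideographs Extension A", 0x3400, 0x4DBF),
    ("CJK Unified Ideographs Extension B", 0x20000, 0x2A6DF),
    ("CJK Unified Ideographs Extension C", 0x2A700, 0x2B73F),
    ("CJK Unified Ideographs Extension D", 0x2B740, 0x2B81F),
    ("CJK Unified Ideographs Extension E", 0x2B820, 0x2CEAF),
    ("CJK Unified Ideographs Extension F", 0x2CEB0, 0x2EBEF),
    ("CJK Unified Ideographs Extension G", 0x30000, 0x3134F),
    ("CJK Unified Ideographs Extension H", 0x31350, 0x323AF),
    ("CJK Unified Ideographs Extension I", 0x2EBF0, 0x2EE5F),
    ("CJK Unified Ideographs Extension J", 0x323B0, 0x3347F),
]


def block_of(ch: str) -> Optional[str]:
    """Return the unified ideograph block containing ch, or None."""
    cp = ord(ch)
    for name, first, last in UNIFIED_BLOCKS:
        if first <= cp <= last:
            return name
    return None


def capacity(name: str) -> int:
    """Number of code points spanned by a block (assigned or not)."""
    for block, first, last in UNIFIED_BLOCKS:
        if block == name:
            return last - first + 1
    raise KeyError(name)


if __name__ == "__main__":
    for ch in ("\u4e2d", "\u3400", "\U00020000", "A"):
        print(f"U+{ord(ch):04X} -> {block_of(ch)}")
    print(capacity("CJK Unified Ideographs"))
```

A block's span can exceed its character count: Extension D spans 224 code points, for example, while 222 characters are listed for it.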

---
Adapted from the Wikipedia article [CJK characters](https://en.wikipedia.org/wiki/CJK_characters) by Wikipedia contributors ([contributor history](https://en.wikipedia.org/wiki/CJK_characters?action=history)). Available under [Creative Commons Attribution-ShareAlike 4.0 International](https://creativecommons.org/licenses/by-sa/4.0/). Changes may have been made.
