{{Short description|Superseded Simplified Chinese character encoding, structured similarly to Shift JIS}}
{{Use mdy dates|date=September 2024}}
{{infobox character encoding
|name=IBM-936
|status=Deprecated
|alias=SHIFTGB<ref name="leisher">{{cite web |url=http://sofia.nmsu.edu/~mleisher/Software/csets/SHIFTGB.TXT |last=Leisher |first=Mark |date=2008 |orig-year=1998-03-06 |publisher=Department of Mathematical Sciences, [[New Mexico State University]] |title=SHIFTGB.TXT: Shifted GB2312.1980. Generated from an algorithm provided with some older Chinese packages. |archive-url=https://web.archive.org/web/20230120125054/http://sofia.nmsu.edu/~mleisher/Software/csets/SHIFTGB.TXT |archive-date=2023-01-20}}</ref>
|next=[[Code page 1381|IBM-1381]]
|encodes=[[GB 2312]]
|otherrelated=[[Shift JIS]]
|lang=[[Simplified Chinese]]
|by=[[IBM]]
}}
'''IBM code page 936''' is a character encoding for [[Simplified Chinese]], including 1880 [[Private Use Areas#Private-use characters in other character sets|user-defined characters]] (UDC). It was superseded in 1993.
It is a combination of the single-byte [[Code page 903]] and the double-byte '''Code page 928'''.<ref name="lunde2009">{{cite book |pages=278–282 |section=Chapter 4: Encoding Methods (§ Code Pages) |title=CJKV Information Processing |edition=2nd |last=Lunde |first=Ken |author-link=Ken Lunde |year=2009 |publisher=[[O'Reilly Media]] |isbn=978-0-596-51447-1 |location=[[Sebastopol, California]]}}</ref><ref>{{cite web|title=CCSID 936|publisher=[[IBM]]|archive-url=https://web.archive.org/web/20160327035758/http://www-01.ibm.com/software/globalization/ccsid/ccsid936.html|archive-date=2016-03-27|url=http://www-01.ibm.com/software/globalization/ccsid/ccsid936.html}}</ref> '''Code page 946''' uses the same double-byte component, but an extended single-byte component ([[Code page 1042]]).<ref name="lunde2009"/><ref>{{cite web|title=CCSID 946|publisher=[[IBM]]|archive-url=https://web.archive.org/web/20160326215526/http://www-01.ibm.com/software/globalization/ccsid/ccsid946.html|archive-date=2016-03-26|url=http://www-01.ibm.com/software/globalization/ccsid/ccsid946.html}}</ref>
IBM code page 936 should not be confused with [[Windows-936|the identically numbered Windows code page]], which is a variant of the [[GBK (character encoding)|GBK]] encoding;<ref name="lunde2009"/> IBM calls GBK [[Code page 1386]]. While GBK is a superset of the [[EUC-CN]] encoding of [[GB 2312]], IBM-936 uses a different coded form of GB 2312, whose relationship to GB 2312 more closely resembles that of [[Shift JIS]] to [[JIS X 0208]].
== History ==
[[File:IBM CJK Code Page Numbers.svg|right|thumb|Except for [[Shift JIS]] itself, the similarly structured code pages for other [[CJK characters|CJK]] locales were phased out between 1992 and 2016.]]
The encoding was in use mainly during the 1980s and early 1990s. While the original IBM PC ([[IBM 5150]]) lacked functionality for processing data in [[CJK characters|CJK]] languages, the [[IBM 5550]] possessed such functionality, and was available in models supporting Japanese, [[Korean language|Korean]], [[Traditional Chinese]] or [[Simplified Chinese]]. Code page 936 for Simplified Chinese accompanied [[code page 932 (IBM)|code page 932]] ([[Shift JIS]]) for Japanese, [[code page 934]] for Korean and [[code page 938]] for Traditional Chinese.
The last revision of IBM-928/936/946 was documented in 1992, and it was superseded in 1993 by the [[EUC-CN]]-based [[Code page 1380|code pages 1380 through 1383]]; code page 1380 encodes the same characters as code page 928, but in a different layout.<ref name="ibm1380">{{citation|mode=cs1|url=https://public.dhe.ibm.com/software/globalization/gcoc/attachments/CP01380.pdf|title=C-H 3-3220-130 1993-11: IBM Simplified Chinese Graphic Character Set|section=Table 1: Registration of GCSGID and CPGID for the IBM CH-S Graphic Character Set|year=1993|page=6}}</ref> As of 1998, "some older Chinese packages" still included an algorithm for converting between IBM-936 and other encodings of GB 2312.<ref name="leisher"/>
== Status ==
Although chart definitions for Code page 1380 (the document C-H 3-3220-130 1993-11) are provided online by IBM, IBM does not similarly provide the chart definition for the older Code page 928 (the document C-H 3-3220-130 1992-11, i.e. an earlier revision of the same specification).<ref name="ibm1380"/><ref>{{cite web|title=Code page 928 information document|archive-url=https://web.archive.org/web/20160317015802/http://www-01.ibm.com/software/globalization/cp/cp00928.html|archive-date=2016-03-17|url=https://www-01.ibm.com/software/globalization/cp/cp00928.html}}</ref> [[International Components for Unicode]] (ICU) does not include an IBM-936 or IBM-946 codec, and uses the Windows code page for the "cp936" label.<ref>{{cite web|url=https://ssl.icu-project.org/icu-bin/convexp?conv=cp936|work=ICU Demonstration – Converter Explorer|publisher=International Components for Unicode|title=windows-936-2000 (alias cp936)}}</ref> The ICU project does possess mapping data for IBM-946, which it makes publicly available,<ref name="icu946">{{cite web |url=https://github.com/unicode-org/icu-data/blob/main/charset/data/ucm/ibm-946_P100-1995.ucm |title=ibm-946_P100-1995 |work=[[International Components for Unicode]] Data Repository |publisher=[[Unicode Consortium]], [[IBM]]}}</ref> but does not ship it with ICU.
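
The same labelling convention can be observed outside ICU. The following minimal sketch checks how CPython's standard <code>codecs</code> module resolves the "cp936" label. It is an observation about Python rather than something stated in the sources cited above, and it says nothing about the byte layout of IBM-936 itself.

```python
import codecs

# CPython resolves the "cp936" label to its GBK codec (the Windows meaning),
# not to the older IBM code page 936.
info = codecs.lookup("cp936")
print(info.name)

# U+4E2D sits at row 54, cell 48 of GB 2312. GBK keeps the EUC-CN bytes for
# GB 2312 characters, i.e. row and cell each offset by 0xA0.
encoded = "\u4e2d".encode("cp936")
print(encoded.hex())
assert encoded == bytes([54 + 0xA0, 48 + 0xA0])
```

Data labelled "cp936" that really is IBM-936 would therefore be misread by such a codec and would need a separately built mapping table.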
== Structure ==
Code page 928, the double-byte component, includes 9,355 characters as double-byte sequences starting with 0x81 through 0xAC and 0xF0 through 0xFA.<ref>{{cite web|title=CCSID 928 information document|archive-url=https://web.archive.org/web/20160326215312/http://www-01.ibm.com/software/globalization/ccsid/ccsid928.html|archive-date=2016-03-26|url=http://www-01.ibm.com/software/globalization/ccsid/ccsid928.html}}</ref>
The 0x81–AC lead byte range is used for GB 2312 characters: lead bytes 0x81–87 are used for non-hanzi, 0x88–9C are used for level 1 hanzi and 0x9C–AC are used for level 2 hanzi.<ref name="leisher"/><ref name="ibm1380"/><ref name="icu946"/> Like [[Shift JIS]], trail (second) bytes are in the range 0x40–FC excluding 0x7F, allowing two GB 2312 rows to be encoded per lead byte;<ref name="icu946"/> unlike Shift JIS, the bytes 0xA0–AC are not excluded from the lead byte range,<ref name="ibm1380"/><ref name="icu946"/> since [[JIS X 0201]] compatibility was not required. The 0xF0–FA lead byte range is used for IBM extensions: 0xF0 through 0xF9 are used for user-defined characters, and 0xFA is used for additional non-hanzi.<ref name="ibm1380"/>
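
The byte ranges above can be illustrated with a short Python sketch. The function names are hypothetical, and the exact assignment of cells to trail bytes is an assumption borrowed from Shift JIS (odd rows in the lower half of the trail range, even rows in the upper half); it should be checked against the IBM-946 mapping data before being relied upon. Under that assumption, GB 2312 rows 16–55 (level 1 hanzi) and 56–87 (level 2 hanzi) fall on the lead bytes 0x88–9C and 0x9C–AC given above.

```python
def classify_lead_byte(b: int) -> str:
    """Classify a byte according to the lead byte ranges described above."""
    if 0x81 <= b <= 0xAC:
        return "gb2312"
    if 0xF0 <= b <= 0xF9:
        return "user-defined"
    if b == 0xFA:
        return "additional non-hanzi"
    return "not a lead byte"


def is_trail_byte(b: int) -> bool:
    """Trail bytes are 0x40-0xFC, excluding 0x7F."""
    return 0x40 <= b <= 0xFC and b != 0x7F


# 188 trail bytes hold two 94-cell rows per lead byte.
TRAIL_COUNT = sum(is_trail_byte(b) for b in range(256))


def gb2312_to_shifted(row: int, cell: int) -> tuple[int, int]:
    """Pack a GB 2312 row/cell pair into (lead, trail).

    ASSUMPTION: Shift JIS-style packing, not confirmed byte-for-byte
    against IBM's tables.
    """
    if not (1 <= row <= 87 and 1 <= cell <= 94):
        raise ValueError("row must be 1-87 and cell 1-94")
    lead = 0x81 + (row - 1) // 2
    if row % 2 == 1:
        # Odd row: lower half of the trail range, skipping 0x7F.
        trail = 0x3F + cell
        if trail >= 0x7F:
            trail += 1
    else:
        # Even row: upper half of the trail range, 0x9F-0xFC.
        trail = 0x9E + cell
    return lead, trail


# Ten user-defined lead bytes (0xF0-0xF9) times 188 trail bytes.
UDC_CAPACITY = (0xF9 - 0xF0 + 1) * TRAIL_COUNT

if __name__ == "__main__":
    print(TRAIL_COUNT, UDC_CAPACITY)
    print([hex(x) for x in gb2312_to_shifted(54, 48)])
```

The computed capacity of the user-defined area, 10 × 188, matches the 1880 user-defined characters mentioned in the lead.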
== References ==
{{reflist}}

{{character encoding}}

[[Category:Chinese character encodings]]