As of version 5.2, Unicode defines a total of 74,394 CJK Unified Ideographs, split across five blocks.
The terms "ideograph" and "ideogram" may be misleading, since the Chinese script is not strictly a picture-writing system.
CJK Unified Ideographs block
The CJK Unified Ideographs block (4E00-9FFF) contains 20,940 basic Chinese characters. These include the characters used in the Chinese writing system, the Kanji used in the Japanese writing system, and the Hanja, whose use is diminishing in Korea. Many characters in this block are used in all three writing systems, while others appear in only one or two of them. Chinese characters were also used in the now-obsolete Vietnamese Chữ Nôm script. The first 20,902 characters in the block are arranged according to the radical ordering of the Kangxi Dictionary: radicals are ordered by stroke count, and the characters under each radical are ordered by their number of additional strokes. The remaining characters were added later, so they are not in radical sequence.
The block is the result of Han unification,[1] which was somewhat controversial in East Asia.[2] Because corresponding Chinese, Japanese and Korean characters share a single code point, the glyph displayed for a character can depend on the font in use. However, the source separation rule states that characters encoded separately in an earlier character set remain separate in Unicode.[3]
Using variation selectors,[4] it is possible to specify certain variant forms of CJK ideographs within Unicode. The Adobe-Japan1 character set proposal, which calls for 14,658 ideographic variation sequences,[4] is an extreme example of the use of variation selectors.[5]
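As a sketch of how an ideographic variation sequence is formed, the Python below appends one of the ideographic variation selectors (U+E0100 to U+E01EF, named VARIATION SELECTOR-17 through VARIATION SELECTOR-256) to a base ideograph. The helper name `make_ivs` and the choice of U+845B as the base are illustrative assumptions. Whether a particular sequence is actually registered, for instance as part of the Adobe-Japan1 collection, has to be checked against the Ideographic Variation Database, which this code does not consult.

```python
import unicodedata

# Ideographic variation selectors: VS17 (U+E0100) through VS256 (U+E01EF).
IVS_FIRST = 0xE0100
IVS_LAST = 0xE01EF


def make_ivs(base: str, selector_index: int) -> str:
    """Return `base` followed by variation selector VS(17 + selector_index).

    Hypothetical helper: it only builds the code point pair and does not
    check whether the sequence is registered in the Ideographic Variation
    Database.
    """
    if len(base) != 1:
        raise ValueError("base must be a single code point")
    selector = IVS_FIRST + selector_index
    if not IVS_FIRST <= selector <= IVS_LAST:
        raise ValueError("selector_index must be between 0 and 239")
    return base + chr(selector)


# U+845B is an arbitrary example base ideograph.
seq = make_ivs("\u845B", 0)
print([f"U+{ord(c):04X}" for c in seq])  # ['U+845B', 'U+E0100']
print(unicodedata.name(seq[1]))          # VARIATION SELECTOR-17
```

The sequence remains two code points in the text; the selector only requests a particular glyph variant from fonts that support it.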
The following tables list the characters of the CJK Unified Ideographs block (4E00-9FFF), without the official Unicode name or a description of each character. For space reasons the character glyphs are divided among four separate articles.
- CJK Unified Ideographs, 4E00-62FF
- CJK Unified Ideographs, 6300-77FF
- CJK Unified Ideographs, 7800-8CFF
- CJK Unified Ideographs, 8D00-9FFF
The first 20,902 characters (4E00-9FA5) have been defined since Unicode version 1.0 (1991). 22 characters (9FA6-9FBB) were added in Unicode 4.1 (2005); 8 characters (9FBC-9FC3) were added in Unicode 5.1 (2008); and 8 characters (9FC4-9FCB) were added in Unicode 5.2 (2009).
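The version history above amounts to a small range table. The Python sketch below assumes only the four ranges listed in this article, so code points from U+9FCC upward are reported as unknown rather than attributed to any later version. The helper `version_added` is hypothetical and not part of any library.

```python
from typing import Optional

# (first, last, Unicode version) for the basic block, from the paragraph above.
BASIC_BLOCK_VERSIONS = [
    (0x4E00, 0x9FA5, "1.0"),
    (0x9FA6, 0x9FBB, "4.1"),
    (0x9FBC, 0x9FC3, "5.1"),
    (0x9FC4, 0x9FCB, "5.2"),
]


def version_added(cp: int) -> Optional[str]:
    """Return the Unicode version that added `cp` to the basic block, or
    None if `cp` lies outside the ranges listed (as of Unicode 5.2)."""
    for first, last, version in BASIC_BLOCK_VERSIONS:
        if first <= cp <= last:
            return version
    return None


# The four ranges together account for the block's 20,940 characters.
total = sum(last - first + 1 for first, last, _ in BASIC_BLOCK_VERSIONS)
print(total)                  # 20940
print(version_added(0x9FA6))  # 4.1
print(version_added(0x9FCC))  # None
```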
CJK Unified Ideographs Extension A
The CJK Unified Ideographs Extension A block (3400-4DBF) comprises 6,582 less common characters that were added in Unicode 3.0 (1999).
CJK Unified Ideographs Extension B
The CJK Unified Ideographs Extension B block (20000-2A6DF) comprises 42,711 characters that were added in Unicode 3.1 (2001). These include most of the characters used in the Kangxi Dictionary that are not in the basic CJK Unified Ideographs block, as well as many Chữ Nôm characters that were historically used for writing the Vietnamese language.
CJK Unified Ideographs Extension C
The CJK Unified Ideographs Extension C block (2A700-2B73F) comprises 4,149 characters that were added in Unicode 5.2 (2009).
CJK Compatibility Ideographs
The CJK Compatibility Ideographs block (F900-FAFF) includes twelve characters that, despite their names and location, are in fact classified as unified ideographs: FA0E, FA0F, FA11, FA13, FA14, FA1F, FA21, FA23, FA24, FA27, FA28 and FA29.
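Taken together, the block descriptions give enough information to test whether a code point is one of the 74,394 unified ideographs. The Python sketch below uses the counts from this article and additionally assumes that the assigned characters in each of the four main blocks run contiguously from the start of the block. The function name `unified_block` is hypothetical.

```python
import unicodedata
from typing import Optional

# (block name, first code point, number of unified ideographs), using the
# counts in this article. Assumes each block's assigned characters run
# contiguously from the block start.
CONTIGUOUS_BLOCKS = [
    ("CJK Unified Ideographs", 0x4E00, 20940),
    ("CJK Unified Ideographs Extension A", 0x3400, 6582),
    ("CJK Unified Ideographs Extension B", 0x20000, 42711),
    ("CJK Unified Ideographs Extension C", 0x2A700, 4149),
]

# The twelve unified ideographs inside the CJK Compatibility Ideographs block.
COMPAT_UNIFIED = {
    0xFA0E, 0xFA0F, 0xFA11, 0xFA13, 0xFA14, 0xFA1F,
    0xFA21, 0xFA23, 0xFA24, 0xFA27, 0xFA28, 0xFA29,
}


def unified_block(cp: int) -> Optional[str]:
    """Name the block holding unified ideograph `cp`, or None if `cp` is not one."""
    for name, first, count in CONTIGUOUS_BLOCKS:
        if first <= cp < first + count:
            return name
    if cp in COMPAT_UNIFIED:
        return "CJK Compatibility Ideographs"
    return None


total = sum(count for _, _, count in CONTIGUOUS_BLOCKS) + len(COMPAT_UNIFIED)
print(total)                  # 74394
print(unified_block(0xFA0E))  # CJK Compatibility Ideographs
print(unified_block(0xFA10))  # None (a compatibility ideograph, not unified)
# Despite "COMPATIBILITY" in its name, U+FA0E is a unified ideograph.
print(unicodedata.name(chr(0xFA0E)))  # CJK COMPATIBILITY IDEOGRAPH-FA0E
```

The twelve compatibility-block characters are listed individually because they are scattered among non-unified compatibility ideographs rather than forming a range.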
Notes
- [1] The Unicode Standard 4.0, Appendix A: Han Unification History.
- [2] Suzanne Topping, "The secret life of Unicode".
- [3] The Unicode Standard 4.0, Chapter 11: East Asian Scripts.
- [4] Andrew West, "The Secret Life of Variation Selectors", 28 June 2007, accessed 2008-08-02.
- [5] PRI 108: Combined registration of the Adobe Japan1 collection and of sequences in that collection.
External links
- Information on a number of the 98,884 characters in Unicode 5.0 from the decodeUnicode Wiki project at the University of Applied Sciences in Mainz, Germany
- PDF from Unicode Consortium
This entry is from Wikipedia, the leading user-contributed encyclopedia. It may not have been reviewed by professional editors.