
Code point

 
In character encoding terminology, a code point or code position is any of the numerical values that make up the code space.[1] For example, ASCII comprises 128 code points in the range 0 to 7F (hexadecimal), Extended ASCII comprises 256 code points in the range 0 to FF, and Unicode comprises 1,114,112 code points in the range 0 to 10FFFF. The Unicode code space is divided into seventeen planes (the Basic Multilingual Plane and 16 supplementary planes), each with 65,536 (= 2^16) code points. Thus the total size of the Unicode code space is 17 × 65,536 = 1,114,112.
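
The plane arithmetic can be checked with a short Python sketch. The names PLANE_SIZE, NUM_PLANES and plane_of are illustrative and are not part of any standard library; the sketch assumes only that each plane holds 2^16 consecutive code points, as stated above.

```python
# Illustrative constants (hypothetical names, not from any library).
PLANE_SIZE = 2 ** 16   # 65,536 code points per plane
NUM_PLANES = 17        # plane 0 (BMP) plus 16 supplementary planes


def plane_of(code_point: int) -> int:
    """Return the plane number (0 to 16) that contains a Unicode code point."""
    if not 0 <= code_point <= 0x10FFFF:
        raise ValueError(f"not a Unicode code point: {code_point:#x}")
    # Each plane spans 2**16 values, so the plane is the value above bit 16.
    return code_point >> 16


total = NUM_PLANES * PLANE_SIZE
print(total)                 # 1114112
print(plane_of(ord("A")))    # 0 (Basic Multilingual Plane)
print(plane_of(0x1F600))     # 1 (a supplementary plane)
print(plane_of(0x10FFFF))    # 16 (the last plane)
```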

Definition

The notion of a code point is used for abstraction. It distinguishes:

  • the number from its encoding as a sequence of bits, and
  • the abstract character from a particular graphical representation (glyph).

These distinctions matter because one may wish to:

  • encode a particular code space in different ways, or
  • display a character via different glyphs.

For Unicode, the particular sequence of bits is called a code value. In the UCS-4 encoding, every code point is encoded as a 4-byte (octet) binary number, which is fixed-width and simple but space-inefficient. In the UTF-8 encoding, code points are encoded as sequences of one to four bytes, which is variable-width, hence more space-efficient but more complex, and backwards compatible with ASCII.
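
As a rough illustration of the fixed-width versus variable-width trade-off, the following Python sketch encodes the same code points both ways. It uses Python's "utf-32-be" codec as a stand-in for UCS-4, which is an assumption of this example (the two coincide for valid Unicode code points); the helper name encoded_lengths is hypothetical.

```python
def encoded_lengths(ch: str) -> tuple[int, int]:
    """Return (bytes in UTF-32, bytes in UTF-8) for a single character."""
    # "utf-32-be" writes no byte order mark, so every character is 4 bytes.
    return len(ch.encode("utf-32-be")), len(ch.encode("utf-8"))


# One sample from each UTF-8 length class.
samples = ["A", "\u00e9", "\u20ac", "\U0001F600"]
for ch in samples:
    fixed, variable = encoded_lengths(ch)
    print(f"U+{ord(ch):04X}: {fixed} bytes fixed-width, {variable} in UTF-8")

# ASCII compatibility: an ASCII character is the same single byte in UTF-8.
print("A".encode("utf-8") == "A".encode("ascii"))  # True
```

The fixed-width form always costs four bytes, while UTF-8 costs between one and four depending on the magnitude of the code point.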

Code points are normally assigned to abstract characters. An abstract character is not a graphical glyph but a unit of textual data; its precise appearance depends on the font. However, code points may also be left reserved for future assignment (most of the Unicode code space is unassigned), or given other designated functions.
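
One way to inspect how a code point is used is through its general category, which Python exposes via unicodedata.category. The sketch below is illustrative only: the helper name describe is hypothetical, and the answer for reserved code points depends on the Unicode version bundled with the Python interpreter.

```python
import unicodedata


def describe(code_point: int) -> str:
    """Return the two-letter Unicode general category of a code point."""
    return unicodedata.category(chr(code_point))


print(describe(0x0041))   # Lu: assigned to a character (uppercase letter A)
print(describe(0xE000))   # Co: designated for private use
print(describe(0xD800))   # Cs: surrogate, reserved for use by UTF-16
print(describe(0xFFFF))   # Cn: a noncharacter, never assigned to a character
```

Reserved code points that have not yet been assigned also report the category Cn.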

Unicode text

A Unicode text file is not necessarily just a sequence of code points stored as 4-byte blocks. Instead, an encoding scheme is used to serialize a sequence of code points into a sequence of bytes. A number of such schemes exist, and they trade off space efficiency against ease of encoding. A scheme may use a variable number of bytes for each character; UTF-8, for example, does so while maintaining compatibility with ASCII. Encoding schemes also take endianness into account, and may have the property of being self-synchronizing, meaning that character boundaries can be found without having to read from the beginning of the string.
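
The self-synchronizing property can be sketched for UTF-8, where every continuation byte has the bit pattern 10xxxxxx and no leading byte does. The function names below are hypothetical, and the sketch assumes the input is well-formed UTF-8.

```python
def is_continuation(byte: int) -> bool:
    """True if the byte has the form 10xxxxxx (a UTF-8 continuation byte)."""
    return (byte & 0xC0) == 0x80


def char_start(data: bytes, index: int) -> int:
    """Return the index where the character containing data[index] begins.

    Only bytes near `index` are examined; the start of the buffer is
    never consulted unless the character actually begins there.
    """
    while index > 0 and is_continuation(data[index]):
        index -= 1
    return index


data = "a\u20acb".encode("utf-8")   # b'a\xe2\x82\xacb': 1 + 3 + 1 bytes
print(char_start(data, 3))          # 1: the last byte of the euro sign maps back to its lead byte
print(char_start(data, 4))          # 4: 'b' is a single-byte character
```

Because at most three continuation bytes can follow a lead byte in well-formed UTF-8, the loop steps back at most three positions.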

Copyrights:

Wikipedia. This article is licensed under the Creative Commons Attribution/Share-Alike License. It uses material from the Wikipedia article "Code point".