UTF-8 Encoding

UTF-8 (8-bit Unicode Transformation Format) is a variable-width character encoding that can encode all 1,112,064 valid Unicode code points using one to four 8-bit bytes. The "8" refers to the 8-bit units (bytes) from which each encoded character is built.

Since 2009, UTF-8 has been the most widely used character encoding on the World Wide Web.

Code points from 0 to 127 (0x00 to 0x7F) are encoded in a single byte whose value is identical to the ASCII code, so any ASCII text is also valid UTF-8.
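As a small illustration of the one-byte case, the following Python sketch compares the UTF-8 and ASCII encodings of a string that contains only characters up to 127. The sample string is an arbitrary choice made for this example.

```python
# Sample text made only of characters with code points 0..127 (arbitrary choice).
text = "Hello, UTF-8!"

utf8_bytes = text.encode("utf-8")
ascii_bytes = text.encode("ascii")

# The two encodings produce exactly the same bytes.
same_bytes = utf8_bytes == ascii_bytes
# Each character occupies exactly one byte.
one_byte_each = len(utf8_bytes) == len(text)
# Each byte value equals the character's code point.
values_match = list(utf8_bytes) == [ord(ch) for ch in text]

print(same_bytes, one_byte_each, values_match)
```

All three checks should come out true for any text restricted to this range.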

Code points from 128 to 2047 (0x0080 to 0x07FF) are encoded in two bytes.

Code points from 2048 to 65535 (0x0800 to 0xFFFF) are encoded in three bytes. The remaining code points, from 65536 to 1114111 (0x10000 to 0x10FFFF), are encoded in four bytes.
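The size ranges above can be sketched as a hand-written encoder. This is a minimal illustration, not a replacement for Python's built-in `str.encode`. The function name `utf8_encode` is hypothetical, and the bit patterns in the comments come from the standard UTF-8 definition rather than from this page. The last branch covers the four-byte case for code points above 65535.

```python
def utf8_encode(code_point: int) -> bytes:
    """Encode one Unicode code point as UTF-8 by hand (illustrative helper)."""
    if code_point < 0 or code_point > 0x10FFFF:
        raise ValueError("code point outside the Unicode range")
    if 0xD800 <= code_point <= 0xDFFF:
        raise ValueError("surrogate code points cannot be encoded in UTF-8")
    if code_point <= 0x7F:
        # 0..127: one byte, pattern 0xxxxxxx
        return bytes([code_point])
    if code_point <= 0x7FF:
        # 128..2047: two bytes, pattern 110xxxxx 10xxxxxx
        return bytes([0xC0 | (code_point >> 6),
                      0x80 | (code_point & 0x3F)])
    if code_point <= 0xFFFF:
        # 2048..65535: three bytes, pattern 1110xxxx 10xxxxxx 10xxxxxx
        return bytes([0xE0 | (code_point >> 12),
                      0x80 | ((code_point >> 6) & 0x3F),
                      0x80 | (code_point & 0x3F)])
    # 65536..1114111: four bytes, pattern 11110xxx 10xxxxxx 10xxxxxx 10xxxxxx
    return bytes([0xF0 | (code_point >> 18),
                  0x80 | ((code_point >> 12) & 0x3F),
                  0x80 | ((code_point >> 6) & 0x3F),
                  0x80 | (code_point & 0x3F)])


# One sample character per size class, checked against the built-in encoder.
for ch in ["A", "é", "€", "😀"]:
    encoded = utf8_encode(ord(ch))
    assert encoded == ch.encode("utf-8")
    print(f"U+{ord(ch):04X} -> {len(encoded)} byte(s): {encoded.hex(' ')}")
```

The four sample characters fall into the one-, two-, three- and four-byte ranges respectively; for example, the euro sign (U+20AC, decimal 8364) encodes to the three bytes `e2 82 ac`.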

The table below lists some of the Unicode character ranges that can be used in HTML5, with their code points in decimal and hexadecimal:

| Character Codes | Decimal | Hexadecimal |
|---|---|---|
| C0 Controls and Basic Latin | 0-127 | 0000-007F |
| C1 Controls and Latin-1 Supplement | 128-255 | 0080-00FF |
| Latin Extended-A | 256-383 | 0100-017F |
| Latin Extended-B | 384-591 | 0180-024F |
| Spacing Modifiers | 688-767 | 02B0-02FF |
| Diacritical Marks | 768-879 | 0300-036F |
| Greek and Coptic | 880-1023 | 0370-03FF |
| Cyrillic Basic | 1024-1279 | 0400-04FF |
| Cyrillic Supplement | 1280-1327 | 0500-052F |
| General Punctuation | 8192-8303 | 2000-206F |
| Currency Symbols | 8352-8399 | 20A0-20CF |
| Letterlike Symbols | 8448-8527 | 2100-214F |
| Arrows | 8592-8703 | 2190-21FF |
| Mathematical Operators | 8704-8959 | 2200-22FF |
| Box Drawings | 9472-9599 | 2500-257F |
| Block Elements | 9600-9631 | 2580-259F |
| Geometric Shapes | 9632-9727 | 25A0-25FF |
| Miscellaneous Symbols | 9728-9983 | 2600-26FF |
| Dingbats | 9984-10175 | 2700-27BF |
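To show how the decimal and hexadecimal values in these ranges are used in practice, the sketch below picks one code point from several of the listed ranges and prints the character, its decimal and hexadecimal HTML numeric character references, and its UTF-8 bytes. The chosen code points and the helper name `html_references` are assumptions made for this example.

```python
import html

# One code point picked from several of the listed ranges (illustrative choices).
samples = {
    "Greek and Coptic": 0x03A9,        # capital omega
    "Currency Symbols": 0x20AC,        # euro sign
    "Arrows": 0x2192,                  # rightwards arrow
    "Mathematical Operators": 0x221E,  # infinity
    "Dingbats": 0x2714,                # heavy check mark
}


def html_references(code_point: int) -> tuple:
    """Return the decimal and hexadecimal HTML numeric character references."""
    return f"&#{code_point};", f"&#x{code_point:X};"


for block, cp in samples.items():
    dec_ref, hex_ref = html_references(cp)
    # Both reference forms decode back to the same character.
    assert html.unescape(dec_ref) == html.unescape(hex_ref) == chr(cp)
    utf8_hex = chr(cp).encode("utf-8").hex(" ")
    print(f"{block}: {chr(cp)}  {dec_ref}  {hex_ref}  UTF-8: {utf8_hex}")
```

In a page that is itself served as UTF-8, the characters can also be written directly; numeric references are mainly useful when the character is hard to type or the file's encoding is uncertain.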

