List of Unicode characters
As of Unicode version 16.0, there are 155,063 characters with code points, covering 168 modern and historical scripts, as well as multiple symbol sets. This article includes the 1,062 characters in the Multilingual European Character Set 2 (MES-2) subset, and some additional related characters.
Character reference overview
HTML and XML provide ways to reference Unicode characters when the characters themselves either cannot or should not be used. A numeric character reference refers to a character by its Universal Character Set/Unicode code point, and a character entity reference refers to a character by a predefined name.

A numeric character reference uses the format &#nnnn; or &#xhhhh;, where nnnn is the code point in decimal form and hhhh is the code point in hexadecimal form. The x must be lowercase in XML documents. The nnnn or hhhh may be any number of digits and may include leading zeros. The hhhh may mix uppercase and lowercase, though uppercase is the usual style.

In contrast, a character entity reference refers to a character by the name of an entity whose replacement text is the desired character. The entity must either be predefined (built into the markup language) or explicitly declared in a Document Type Definition (DTD). The format is the same as for any entity reference, &name;, where name is the case-sensitive name of the entity. The semicolon is required. Because names are easier for humans to remember than numbers, character entity references are most often written by humans, while numeric character references are most often produced by computer programs.
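The two reference styles can be illustrated with a short Python sketch. The helper names decimal_ref and hex_ref are hypothetical and written only for this example. html.unescape and the "xmlcharrefreplace" codec error handler are part of Python's standard library. Note that html.unescape follows HTML5 rules, so it resolves HTML named references such as &eacute;. An XML parser would accept that name only if a DTD declared it, because XML itself predefines just five entities: &amp;, &lt;, &gt;, &quot; and &apos;.

```python
import html


def decimal_ref(ch: str) -> str:
    """Hypothetical helper: decimal numeric character reference, e.g. &#233;."""
    return f"&#{ord(ch)};"


def hex_ref(ch: str) -> str:
    """Hypothetical helper: hexadecimal form, e.g. &#xE9;.

    The 'x' stays lowercase because XML requires it; the hex digits use
    uppercase, the usual style.
    """
    return f"&#x{ord(ch):X};"


# The "xmlcharrefreplace" error handler is one way programs generate numeric
# references: each character the target encoding cannot represent is replaced
# by its decimal reference.
ascii_text = "café €".encode("ascii", "xmlcharrefreplace").decode("ascii")

# html.unescape resolves decimal, hexadecimal (any case, leading zeros allowed)
# and HTML named references alike.
decoded = [html.unescape(r) for r in ("&#233;", "&#xE9;", "&#x00e9;", "&eacute;")]

print(decimal_ref("é"), hex_ref("é"))  # &#233; &#xE9;
print(ascii_text)                      # caf&#233; &#8364;
print(decoded)                         # ['é', 'é', 'é', 'é']
```

The program-generated output (caf&#233; &#8364;) uses numbers, while a human would more likely type the named form &eacute; directly.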
Control codes
65 characters: the 32 C0 controls (U+0000–U+001F), DEL (U+007F) and the 32 C1 controls (U+0080–U+009F). All belong to the common script.
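The 65 control codes are exactly the characters with the Unicode general category Cc ("Other, control"), so Python's standard unicodedata module can check the count. This is a minimal sketch and the variable names are illustrative. Control codes have no formal character name, only name aliases, so unicodedata.name needs a default value for them.

```python
import sys
import unicodedata

# Scan every code point and keep those whose general category is Cc.
controls = [cp for cp in range(sys.maxunicode + 1)
            if unicodedata.category(chr(cp)) == "Cc"]

c0 = [cp for cp in controls if cp <= 0x1F]          # U+0000..U+001F
delete = [cp for cp in controls if cp == 0x7F]      # DEL
c1 = [cp for cp in controls if 0x80 <= cp <= 0x9F]  # U+0080..U+009F

print(len(controls), len(c0), len(delete), len(c1))  # 65 32 1 32

# No Name property for control codes, so the default is returned.
print(unicodedata.name("\x7f", "<no name>"))          # <no name>
```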
Latin script
The Unicode Standard (version ) classifies 1,487 characters as belonging to the Latin script.
Basic Latin
95 characters; the 52 letters belong to the Latin script. The remaining 43 (10 digits and 33 punctuation marks and symbols, including the space) belong to the common script. The 33 characters classified as ASCII Punctuation & Symbols are also sometimes referred to as ASCII special characters. When an organization says a password "requires punctuation marks", it often means only these characters and not other Unicode punctuation.
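The split of the 95 printable Basic Latin characters can be checked in Python. Subtracting 52 letters and 10 digits from 95 leaves 33, so the ASCII special characters are the 32 characters in string.punctuation plus the space. The has_ascii_special function is a hypothetical password rule, included only to illustrate the narrow meaning of "punctuation" described above.

```python
import string

# The 95 printable Basic Latin characters: U+0020 (space) through U+007E (~).
printable = [chr(cp) for cp in range(0x20, 0x7F)]

letters = [c for c in printable if c in string.ascii_letters]  # Latin script
digits = [c for c in printable if c in string.digits]          # common script
special = [c for c in printable if not c.isalnum()]            # common script, includes space

print(len(printable), len(letters), len(digits), len(special))  # 95 52 10 33


def has_ascii_special(password: str) -> bool:
    """Hypothetical rule: require at least one of the 33 ASCII special
    characters; punctuation outside Basic Latin (e.g. «) does not count."""
    return any(c in special for c in password)
```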
Latin-1 Supplement
96 characters; the 62 letters and the two ordinal indicators belong to the Latin script. The remaining 32 belong to the common script.
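Python's standard unicodedata module does not expose the Unicode Script property, so this sketch hard-codes the ranges instead. General category alone would not give the right answer. The micro sign µ (U+00B5) is a lowercase letter (Ll) but belongs to the common script. The ordinal indicators ª and º are Latin-script letters of category Lo.

```python
import unicodedata

# Graphic part of the block, U+00A0..U+00FF; the C1 controls U+0080..U+009F
# are counted with the control codes instead.
block = [chr(cp) for cp in range(0xA0, 0x100)]

# Letters: U+00C0..U+00FF minus the math signs × (U+00D7) and ÷ (U+00F7).
letters = [c for c in block if c >= "\u00c0" and c not in "\u00d7\u00f7"]
ordinals = ["\u00aa", "\u00ba"]  # ª and º
common = [c for c in block if c not in letters and c not in ordinals]

print(len(block), len(letters), len(ordinals), len(common))  # 96 62 2 32
print(sorted({unicodedata.category(c) for c in letters}))    # ['Ll', 'Lu']
```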
Latin Extended-A
128 characters; all belong to the Latin script.
Latin Extended-B
208 characters; all belong to the Latin script; 33 in the MES-2 subset.
Latin Extended Additional
256 characters; all belong to the Latin script; 23 in the MES-2 subset.
Additional Latin Extended
Phonetic scripts
IPA Extensions
96 characters; all belong to the Latin script; three in the MES-2 subset.
Spacing modifier letters
80 characters; 15 in the MES-2 subset.
Phonetic Extensions
Combining marks
Greek and Coptic
144 code points; 135 assigned characters; 85 in the MES-2 subset.
Greek Extended
For polytonic orthography. 256 code points; 233 assigned characters, all in the MES-2 subset (#670 – 902).
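The code-point and assigned-character counts for the two Greek blocks can be reproduced by counting code points whose general category is not Cn (unassigned). The helper assigned_count is hypothetical. The results depend on the Unicode version bundled with Python, reported by unicodedata.unidata_version. With the versions shipped by current Python 3 releases, they match the figures above.

```python
import unicodedata


def assigned_count(first: int, last: int) -> int:
    """Count assigned code points in first..last (inclusive).

    Unassigned code points have general category Cn.
    """
    return sum(unicodedata.category(chr(cp)) != "Cn"
               for cp in range(first, last + 1))


print(unicodedata.unidata_version)
print(0x03FF - 0x0370 + 1, assigned_count(0x0370, 0x03FF))  # 144 135 (Greek and Coptic)
print(0x1FFF - 0x1F00 + 1, assigned_count(0x1F00, 0x1FFF))  # 256 233 (Greek Extended)
```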
Cyrillic
256 characters; 191 in the MES-2 subset.
Cyrillic supplements
Armenian
Semitic languages
Arabic
Hebrew
Syriac
Mandaic
Samaritan
Thaana
Brahmic (Indic) scripts
The range from U+0900 to U+0DFF includes Devanagari, Bengali script, Gurmukhi, Gujarati script, Odia alphabet, Tamil script, Telugu script, Kannada script, Malayalam script, and Sinhala script.
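The ten blocks in this range are each 128 (0x80) code points long and follow one another in the order listed. Because of that, the block containing a character can be computed with integer division. The sketch below assumes exactly that layout. INDIC_BLOCKS and indic_block are hypothetical names. The list uses the official Unicode block names, which is why Bengali and Assamese appear as "Bengali" and Odia as "Oriya".

```python
from typing import Optional

# Official block names, in code-point order starting at U+0900.
INDIC_BLOCKS = ["Devanagari", "Bengali", "Gurmukhi", "Gujarati", "Oriya",
                "Tamil", "Telugu", "Kannada", "Malayalam", "Sinhala"]


def indic_block(ch: str) -> Optional[str]:
    """Return the block name for a character in U+0900..U+0DFF, else None."""
    cp = ord(ch)
    if not 0x0900 <= cp <= 0x0DFF:
        return None
    # Each block spans 0x80 code points, so the offset selects the block.
    return INDIC_BLOCKS[(cp - 0x0900) // 0x80]


print(indic_block("\u0915"))  # Devanagari (U+0915, KA)
print(indic_block("\u0b85"))  # Tamil (U+0B85, A)
```

The arithmetic shortcut only holds inside U+0900–U+0DFF. Other scripts need a lookup table of block ranges.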
Devanagari
Bengali and Assamese
Gurmukhi
Gujarati
Oriya
Tamil
Telugu
Kannada
Malayalam
Sinhala
Other Brahmic scripts
Other Brahmic and Indic scripts in Unicode include:
Other South and Central Asian writing systems
Southeast Asian writing systems
Georgian
African scripts
Ge'ez/Ethiopic script
Other African scripts
American scripts
Unified Canadian Aboriginal Syllabics
Other American scripts
Mongolian
Unicode symbols
General Punctuation
112 code points; 111 assigned characters; 24 in the MES-2 subset.
Superscripts and Subscripts
Currency Symbols
Letterlike Symbols
Number Forms
Arrows
Mathematical symbols
Miscellaneous Technical
Control Pictures
Optical Character Recognition
Enclosed Alphanumerics
Box Drawing
Block Elements
Geometric Shapes
Symbols for Legacy Computing
Symbols for Legacy Computing Supplement
Miscellaneous Symbols
Dingbats
East Asian writing systems
CJK Symbols and Punctuation
Hiragana
Katakana
Bopomofo
Hangul Jamo and Compatibility Jamo
Kanbun
Enclosed CJK Letters and Months
CJK Compatibility
CJK Compatibility Forms
CJK Unified Ideographs
CJK Radicals
Other East Asian writing systems
Alphabetic Presentation Forms
Ancient and historic scripts
Shavian
Notational systems
Braille
Music
Shorthand
Sutton SignWriting
Emoji
Alchemical symbols
Game symbols
Mahjong Tiles
Domino Tiles
Playing Cards
Chess Symbols
Special areas and format characters
This article is derived from Wikipedia and licensed under CC BY-SA 4.0.
Wikipedia® is a registered trademark of the Wikimedia Foundation, Inc. Bliptext is not affiliated with or endorsed by Wikipedia or the Wikimedia Foundation.