Unicode
Type | Text encoding standard | |
---|---|---|
Developer | Unicode Consortium | |
First Release | 1991 | |
Open Format? | Yes | |
Free Format? | Yes | |
Unicode is a standard character set: an assignment of numeric values to characters. A huge number of characters from various writing systems (modern or ancient), as well as special symbols of many types, are each given a number. On Blu-ray, Unicode is used for text-based subtitles and text fonts. It is also the character set used by the Java language, on which BD-J is built.
Unicode is an international standard and is the dominant text encoding format. It was first published in 1991, and subsequent revisions have continually expanded its character repertoire. Unicode was developed in reaction to the unwieldy multiplicity of character sets that had arisen to include various subsets of the many characters left out of the English-centric ASCII set.
The standard way to denote a Unicode code point is to prefix it with "U+" and write the number in hexadecimal, with a minimum of four hex digits. For example, code point 42 is written as U+002A, and code point 1,114,109 is U+10FFFD.
Code points are the numbers assigned by the Unicode Consortium to every character in every writing system. They are written as U+ followed by four to six hexadecimal digits (0–9 and A–F). Another example: the interrobang "‽" is U+203D.
Each code point is also assigned a human-readable name, which may be written after the "U+" notation. For example, you might see "U+002A ASTERISK" or "U+03A9 GREEK CAPITAL LETTER OMEGA".
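The notation described above can be sketched in Java as follows. This is a minimal illustration rather than anything taken from the Blu-ray specification: the class and method names are hypothetical, and `Character.getName` requires Java 7 or later, so it may not be available on a BD-J player's older Java profile.

```java
// Sketch: format code points in "U+" notation and look up their names.
// Class and method names are illustrative assumptions, not a BD-J API.
class CodePointNotation {
    /** Formats a code point as "U+" plus at least four uppercase hex digits. */
    static String toUPlus(int codePoint) {
        return String.format("U+%04X", codePoint);
    }

    /** Appends the character name when one is available, e.g. "U+002A ASTERISK". */
    static String describe(int codePoint) {
        // Character.getName returns null for unassigned code points.
        String name = Character.getName(codePoint);
        return name == null ? toUPlus(codePoint) : toUPlus(codePoint) + " " + name;
    }

    public static void main(String[] args) {
        System.out.println(toUPlus(42));       // U+002A
        System.out.println(toUPlus(1114109));  // U+10FFFD
        System.out.println(describe(0x03A9));  // U+03A9 GREEK CAPITAL LETTER OMEGA
    }
}
```

The `%04X` format pads short values to four digits but does not truncate longer ones, which matches the "minimum of four hex digits" rule.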
In the Blu-ray technical specification, all text encoding (both for coding and displaying text) uses Unicode 2.0, encoded as UTF-8 or UTF-16BE, which is aligned with ISO/IEC 10646-1:1993. Unicode 2.0, released in July 1996, was a significant update to the standard. It contains 38,885 assigned characters (excluding private-use characters, control characters, noncharacters, and surrogate code points), covering Basic Latin, Cyrillic, Greek, CJK ideographs (including Japanese kanji), and more. The characters are organized into blocks by script, symbol type, or usage, reflecting the state of text encoding standardization at that time. This version of Unicode may be "outdated" by today's standards, but it remains the relevant one for BD development.
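To make the two encodings concrete, the sketch below encodes the same character as UTF-8 and as UTF-16BE and prints the resulting bytes. It assumes a desktop Java runtime (Java 7 or later, for `StandardCharsets`); on an older BD-J profile the charset would more likely be passed by name, as in `getBytes("UTF-8")`. The class and method names are hypothetical.

```java
import java.nio.charset.StandardCharsets;

// Sketch: compare the UTF-8 and UTF-16BE byte sequences of a string.
// Names are illustrative assumptions, not part of any Blu-ray API.
class EncodingSketch {
    /** Renders bytes as space-separated uppercase hex, e.g. "E2 80 BD". */
    static String hex(byte[] bytes) {
        StringBuilder sb = new StringBuilder();
        for (byte b : bytes) {
            if (sb.length() > 0) sb.append(' ');
            sb.append(String.format("%02X", b & 0xFF));
        }
        return sb.toString();
    }

    static String utf8Hex(String s) {
        return hex(s.getBytes(StandardCharsets.UTF_8));
    }

    static String utf16beHex(String s) {
        // UTF_16BE writes big-endian code units and no byte order mark.
        return hex(s.getBytes(StandardCharsets.UTF_16BE));
    }

    public static void main(String[] args) {
        String interrobang = "\u203D";
        System.out.println(utf8Hex(interrobang));    // E2 80 BD
        System.out.println(utf16beHex(interrobang)); // 20 3D
        System.out.println(utf16beHex("*"));         // 00 2A
    }
}
```

UTF-8 uses one to three bytes per character in this range, while UTF-16BE always uses two, so ASCII-heavy text is smaller in UTF-8 and CJK-heavy text is smaller in UTF-16BE.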
Unicode 2.0 was likely chosen because it was a well-established standard by the early 2000s. During Blu-ray’s development (starting around 2000, with specs finalized by 2004–2006), the BDA likely prioritized a mature standard with broad compatibility over a newer, less-tested version like Unicode 4.1. Newer Unicode versions often introduce additional characters and complexity, which could require more extensive validation and risk introducing bugs or incompatibilities in a consumer product aiming for a global launch. Although “outdated” by 2005, Unicode 2.0 was a proven choice that met Blu-ray’s needs without overcomplicating the specification.
In BD applications, Unicode can be used with bitmap fonts (PNG) or vector-based fonts (OpenType). Most BD-J titles use bitmap fonts and use Java classes like StringBuffer, BufferedReader, FileInputStream, etc. to read the text and its code points and display the matching bitmap glyphs. Vector-based fonts are rarely used, but if a BD-J title uses OpenType, the fonts and text are stored inside the Blu-ray's 4 MB text cache and rendered by the BD player's font rendering engine. A BD-J application should include an OpenType font file that is Unicode 2.0 compatible; if it does not, the player will use its own default font. However, the majority of players may not include fonts of their own, so it is best to include font files.
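As a rough illustration of the bitmap-font workflow, the sketch below reads UTF-8 text with BufferedReader into a StringBuffer and lists the code points a title would then map to glyph images. It is an assumption-laden example, not code from an actual disc: the class and method names are hypothetical, an in-memory stream stands in for the FileInputStream a real title would open, and `UncheckedIOException` needs Java 8, which is newer than the Java profile on BD players.

```java
import java.io.BufferedReader;
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.UncheckedIOException;

// Sketch: read UTF-8 text and list its code points for a bitmap-font lookup.
// Names are illustrative assumptions, not a BD-J API.
class SubtitleTextReader {
    /** Reads all lines of UTF-8 text from the stream into one string. */
    static String readUtf8(InputStream in) {
        StringBuffer text = new StringBuffer();
        try (BufferedReader reader = new BufferedReader(new InputStreamReader(in, "UTF-8"))) {
            String line;
            while ((line = reader.readLine()) != null) {
                if (text.length() > 0) text.append('\n');
                text.append(line);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return text.toString();
    }

    /** Lists each char in U+ notation; one char is one code point within the BMP. */
    static String codePoints(String text) {
        StringBuffer out = new StringBuffer();
        for (int i = 0; i < text.length(); i++) {
            if (i > 0) out.append(' ');
            out.append(String.format("U+%04X", (int) text.charAt(i)));
        }
        return out.toString();
    }

    public static void main(String[] args) {
        // "Hi" followed by the interrobang, encoded as UTF-8 bytes.
        byte[] utf8 = {0x48, 0x69, (byte) 0xE2, (byte) 0x80, (byte) 0xBD};
        String text = readUtf8(new ByteArrayInputStream(utf8));
        System.out.println(codePoints(text)); // U+0048 U+0069 U+203D
    }
}
```

Each listed code point could then serve as the key into a table of PNG glyphs, though how a given title organizes that table is up to the developer.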
Unicode 2.0 defines a total of 38,885 assigned characters across its code points from U+0000 to U+FFFF (the Basic Multilingual Plane, BMP). The exact count comes from tallying each named entry across the 55 blocks.
Range | Block Name | Assigned Characters | Notes |
---|---|---|---|
U+0000–U+007F | Basic Latin | 128 | ASCII characters (letters, digits, punctuation, controls). |
U+0080–U+00FF | Latin-1 Supplement | 128 | Additional Latin characters, symbols, and controls (e.g., £, ©). |
U+0100–U+017F | Latin Extended-A | 128 | Extended Latin for European languages (e.g., Œ, Š). |
U+0180–U+024F | Latin Extended-B | 113 | More Latin letters for African, Native American languages (e.g., Ɓ, ƒ). |
U+0250–U+02AF | IPA Extensions | 89 | Phonetic symbols for International Phonetic Alphabet (e.g., ɐ, ʃ). |
U+02B0–U+02FF | Spacing Modifier Letters | 80 | Modifiers for phonetics/typography (e.g., ʰ, ː). |
U+0300–U+036F | Combining Diacritical Marks | 112 | Marks combining with base characters (e.g., ◌̀, ◌̈). |
U+0370–U+03FF | Greek | 135 | Greek letters and symbols (e.g., α, Ω). |
U+0400–U+04FF | Cyrillic | 256 | Cyrillic script for Slavic languages (e.g., А, Я). |
U+0530–U+058F | Armenian | 85 | Armenian script (e.g., Ա, Ֆ). |
U+0590–U+05FF | Hebrew | 87 | Hebrew script (e.g., א, ת). |
U+0600–U+06FF | Arabic | 237 | Arabic script and symbols (e.g., ا, ى). |
U+0900–U+097F | Devanagari | 114 | Script for Hindi, Sanskrit (e.g., अ, ह). |
U+0980–U+09FF | Bengali | 92 | Bengali script (e.g., অ, হ). |
U+0A00–U+0A7F | Gurmukhi | 79 | Script for Punjabi (e.g., ਅ, ਹ). |
U+0A80–U+0AFF | Gujarati | 83 | Gujarati script (e.g., અ, હ). |
U+0B00–U+0B7F | Oriya | 81 | Oriya script (e.g., ଅ, ହ). |
U+0B80–U+0BFF | Tamil | 72 | Tamil script (e.g., அ, ஹ). |
U+0C00–U+0C7F | Telugu | 88 | Telugu script (e.g., అ, హ). |
U+0C80–U+0CFF | Kannada | 86 | Kannada script (e.g., ಅ, ಹ). |
U+0D00–U+0D7F | Malayalam | 89 | Malayalam script (e.g., അ, ഹ). |
U+0E00–U+0E7F | Thai | 87 | Thai script (e.g., ก, ๏). |
U+0E80–U+0EFF | Lao | 65 | Lao script (e.g., ກ, ຳ). |
U+0F00–U+0FFF | Tibetan | 168 | Tibetan script (e.g., ཀ, ྼ). |
U+10A0–U+10FF | Georgian | 83 | Georgian script (e.g., Ⴀ, ჶ). |
U+1100–U+11FF | Hangul Jamo | 240 | Korean Hangul components (e.g., ᄀ, ᇿ). |
U+1E00–U+1EFF | Latin Extended Additional | 185 | More Latin extensions (e.g., Ḁ, ỿ). |
U+1F00–U+1FFF | Greek Extended | 233 | Precomposed Greek with diacritics (e.g., ἀ, ῼ). |
U+2000–U+206F | General Punctuation | 71 | Punctuation marks (e.g., —, ‘). |
U+2070–U+209F | Superscripts and Subscripts | 34 | Superscript/subscript digits and letters (e.g., ⁰, ₓ). |
U+20A0–U+20CF | Currency Symbols | 12 | Currency signs (e.g., ₧). |
U+20D0–U+20FF | Combining Diacritical Marks for Symbols | 33 | Combining marks for symbols (e.g., ◌⃐, ◌⃡). |
U+2100–U+214F | Letterlike Symbols | 55 | Symbols resembling letters (e.g., ℂ, ℏ). |
U+2150–U+218F | Number Forms | 50 | Fractions, Roman numerals (e.g., ½, Ⅻ). |
U+2190–U+21FF | Arrows | 91 | Arrow symbols (e.g., ←). |
U+2200–U+22FF | Mathematical Operators | 256 | Math symbols (e.g., ∀, √). |
U+2300–U+23FF | Miscellaneous Technical | 126 | Technical symbols (e.g., ⌈). |
U+2400–U+243F | Control Pictures | 39 | Graphical representations of control codes (e.g., ␀, ␣). |
U+2440–U+245F | Optical Character Recognition | 11 | OCR-specific symbols (e.g., ⑀, ⑊). |
U+2460–U+24FF | Enclosed Alphanumerics | 160 | Circled numbers/letters (e.g., ①, ⓿). |
U+2500–U+257F | Box Drawing | 128 | Line-drawing characters (e.g., ─, ┼). |
U+2580–U+259F | Block Elements | 32 | Block graphic characters (e.g., ▀, █). |
U+25A0–U+25FF | Geometric Shapes | 96 | Shapes (e.g., ■, ◯). |
U+2600–U+26FF | Miscellaneous Symbols | 171 | Various symbols (e.g., ★, ☆). |
U+2700–U+27BF | Dingbats | 174 | Decorative symbols (e.g., ✁, ❏). |
U+3000–U+303F | CJK Symbols and Punctuation | 63 | CJK-specific punctuation (e.g., 、, 〿). |
U+3040–U+309F | Hiragana | 93 | Japanese Hiragana (e.g., ぁ, ん). |
U+30A0–U+30FF | Katakana | 96 | Japanese Katakana (e.g., ァ, ヿ). |
U+3100–U+312F | Bopomofo | 27 | Chinese phonetic script (e.g., ㄅ, ㄩ). |
U+3130–U+318F | Hangul Compatibility Jamo | 94 | Legacy Korean Jamo (e.g., ㄱ, ㅿ). |
U+3200–U+32FF | Enclosed CJK Letters and Months | 191 | Enclosed CJK characters (e.g., ㈀, ㋿). |
U+3300–U+33FF | CJK Compatibility | 256 | Compatibility CJK variants (e.g., ㌀, ㏿). |
U+4E00–U+9FFF | CJK Unified Ideographs | 20,902 | Core Chinese/Japanese/Korean characters (e.g., 一, 龥). |
U+AC00–U+D7A3 | Hangul Syllables | 11,172 | Precomposed Korean syllables (e.g., 가, 힣). |
U+E000–U+F8FF | Private Use Area | 0 (reserved) | No predefined characters; for custom use. |
U+F900–U+FAFF | CJK Compatibility Ideographs | 302 | Compatibility variants of CJK ideographs (e.g., 豈). |
U+FB00–U+FB4F | Alphabetic Presentation Forms | 58 | Precomposed ligatures (e.g., ﬀ, ﬅ). |
U+FB50–U+FDFF | Arabic Presentation Forms-A | 611 | Arabic contextual forms (e.g., ﭐ, ﷿). |
U+FE20–U+FE2F | Combining Half Marks | 16 | Half-width combining marks (e.g., ◌︠, ◌︯). |
U+FE30–U+FE4F | CJK Compatibility Forms | 32 | Vertical CJK punctuation variants (e.g., ︰, ︴). |
U+FE50–U+FE6F | Small Form Variants | 26 | Small CJK punctuation (e.g., ﹐). |
U+FE70–U+FEFF | Arabic Presentation Forms-B | 141 | More Arabic forms, includes U+FEFF (BOM) (e.g., ﹰ, zero-width no-break space). |
U+FF00–U+FFEF | Halfwidth and Fullwidth Forms | 225 | Fullwidth ASCII, halfwidth Katakana/Hangul (e.g., ！, ｦ). |
U+FFF0–U+FFFF | Specials | 6 | Special-purpose characters (e.g., �). |
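A quick way to see which blocks from the table a piece of text touches is Java's `Character.UnicodeBlock`. The sketch below is only an approximation for BD work: a modern Java runtime follows a much newer Unicode version, so its block boundaries and names can differ from the Unicode 2.0 table above. The class and method names are hypothetical.

```java
import java.util.LinkedHashSet;
import java.util.Set;

// Sketch: collect the Unicode blocks used by a string, in order of first use.
// Uses the runtime's own (newer) block data, not the Unicode 2.0 tables.
class BlockLookup {
    static Set<Character.UnicodeBlock> blocksUsed(String text) {
        Set<Character.UnicodeBlock> blocks = new LinkedHashSet<>();
        for (int i = 0; i < text.length(); ) {
            int cp = text.codePointAt(i);
            // of() returns null when the code point belongs to no block.
            Character.UnicodeBlock block = Character.UnicodeBlock.of(cp);
            if (block != null) {
                blocks.add(block);
            }
            i += Character.charCount(cp);
        }
        return blocks;
    }

    public static void main(String[] args) {
        // Latin letter, smiling face, CJK ideograph, Hangul syllable.
        System.out.println(blocksUsed("A\u263A\u4E00\uAC00"));
    }
}
```

A result like this could help decide which glyph ranges a bundled font or bitmap set needs to cover.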
- List of Supported Unicode Characters (U+0000 - U+058F)
- List of Supported Unicode Characters (U+0590 - U+10FF)
- List of Supported Unicode Characters (U+1100 - U+218F)
- List of Supported Unicode Characters (U+2190 - U+27BF)
- List of Supported Unicode Characters (U+3040 - U+F8FF)
- List of Supported Unicode Characters (U+F900 - U+FFFF)
Missing Characters
Since Unicode 2.0 was released in 1996, it's missing several key symbols that emerged later, most notably the Euro sign (€), introduced in 1999 and added in Unicode 2.1 (U+20AC). Other absences include modern currency symbols like the Indian Rupee (₹, U+20B9, Unicode 6.0), extensive emoji sets (e.g. Unicode 6.0+), and newer scripts like Cherokee (added in Unicode 3.0). These gaps reflect Unicode 2.0’s pre-1996 scope, limited to 38,885 characters in the BMP. The Private Use Area (PUA, U+E000–U+F8FF), with 6,400 unassigned code points, offers a workaround: developers can assign custom glyphs—like the Euro sign or proprietary icons—to PUA slots and pair them with a custom font.
If a character from a newer Unicode version is used, it will appear as a replacement character "�", a box "□", or nothing at all.
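The Private Use Area workaround can be sketched as a small remapping step that runs before text is handed to the renderer. Everything specific here is an assumption made for illustration: the choice of U+E000 as the slot for the Euro glyph, the class and method names, and the existence of a custom font that draws a Euro sign at that slot.

```java
// Sketch: remap a character missing from Unicode 2.0 to a Private Use Area slot.
// The slot U+E000 is a hypothetical choice; the custom font must match it.
class PuaRemap {
    static final char EURO_SIGN = '\u20AC';
    static final char EURO_PUA_SLOT = '\uE000';

    /** Replaces every Euro sign with the private-use slot the font draws it at. */
    static String remapForFont(String text) {
        return text.replace(EURO_SIGN, EURO_PUA_SLOT);
    }

    /** True if the text holds surrogates, i.e. code points outside the BMP. */
    static boolean hasNonBmp(String text) {
        for (int i = 0; i < text.length(); i++) {
            if (Character.isSurrogate(text.charAt(i))) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        String price = EURO_SIGN + "5";
        String remapped = remapForFont(price);
        System.out.println(String.format("U+%04X", (int) remapped.charAt(0))); // U+E000
        System.out.println(hasNonBmp(remapped)); // false
    }
}
```

The `hasNonBmp` check is a coarse pre-flight test only: it flags text such as modern emoji that cannot be in Unicode 2.0, but it does not catch BMP characters that were added after version 2.0.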
Emoji Support
Because Blu-ray uses Unicode 2.0, it has only limited emoji-like support, offering basic symbols like ☺ (U+263A) or ♥ (U+2665) in the Arrows, Miscellaneous Technical, Miscellaneous Symbols, Dingbats, Geometric Shapes and CJK Symbols and Punctuation blocks. The player does not display rasterized bitmap images or layered graphics by default; the developer has to supply these manually in the BD-J application (using small PNG graphics). By default, the emoji-like characters are rendered as scalable vector-based symbols from a font (e.g., OpenType, via fonts like Noto Emoji).
Row | 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 |
---|---|---|---|---|---|---|---|---|---|---|
2 | 203C | 2194 | 2195 | 2196 | 2197 | 2198 | 2199 | 21A9 | 21AA | 231A |
3 | 231B | 2328 | ⎗ 2397 | ⎘ 2398 | ⎙ 2399 | ⎚ 239A | 24C2 | 25B6 | 25C0 | 25FB |
4 | 25FC | 25FD | 25FE | 2600 | 2601 | 2602 | 2603 | 2604 | ★ 2605 | ☆ 2606 |
5 | 260E | ☏ 260F | ☐ 2610 | ☒ 2612 | ☚ 261A | ☛ 261B | ☜ 261C | 261D | ☞ 261E | ☟ 261F |
6 | 2620 | ☡ 2621 | 2622 | 2623 | 2626 | 262A | 262E | 262F | 2638 | 2639 |
7 | 263A | ☻ 263B | ☼ 263C | ☽ 263D | ☾ 263E | 2640 | 2642 | 2648 | 2649 | 264A |
8 | 264B | 264C | 264D | 264E | 264F | 2650 | 2651 | 2652 | 2653 | ♔ 2654 |
9 | ♕ 2655 | ♖ 2656 | ♗ 2657 | ♘ 2658 | ♙ 2659 | ♚ 265A | ♛ 265B | ♜ 265C | ♝ 265D | ♞ 265E |
A | ♟ 265F | 2660 | ♡ 2661 | ♢ 2662 | 2663 | ♤ 2664 | 2665 | 2666 | ♧ 2667 | 2668 |
B | ♩ 2669 | ♪ 266A | ♫ 266B | ♬ 266C | ♭ 266D | ♮ 266E | ♯ 266F | ✁ 2701 | 2702 | ✃ 2703 |
C | ✄ 2704 | ✆ 2706 | ✇ 2707 | 2708 | 2709 | 270C | 270D | 270F | 2712 | 2714 |
D | 2716 | ✚ 271A | 271D | ✞ 271E | ✠ 2720 | 2721 | ✤ 2724 | ✧ 2727 | ✩ 2729 | ✪ 272A |
E | 2733 | ❀ 2740 | 2744 | 2747 | ❖ 2756 | 2763 | 2764 | ❥ 2765 | ❦ 2766 | ❧ 2767 |
F | 27A1 | 27BF | 〄 3004 | 〠 3020 | 3030 | 〶 3036 | ㉿ 327F | 3297 | 3299 |
This is not a full list, but these are the top suggestions for text in BD applications such as subtitles or video games.
Author(s) : Æ Firestone