Unicode
| Type | Text encoding standard | |
|---|---|---|
| Developer | Unicode Consortium | |
| First Release | 1991 | |
| Open Format? | Yes | |
| Free Format? | Yes | |
Unicode is a standard character set: an assignment of numeric values to characters. A huge number of characters from various writing systems (modern or ancient), as well as special symbols of many types, are each given a number. On Blu-ray, Unicode is used for text-based subtitles and fonts, and it is also the character set used by Java, the language BD-J applications are written in.
Unicode is an international standard and is the dominant text encoding format. It was first published in 1991, and subsequent revisions have continually expanded its character repertoire. Unicode was developed in reaction to the unwieldy multiplicity of character sets that had arisen to include various subsets of the many characters left out of the English-centric ASCII set.
The standard way to denote a Unicode code point is to prefix it with "U+" and write the number in hexadecimal, with a minimum of four hex digits. For example, code point 42 is written as U+002A, and code point 1,114,109 is U+10FFFD.
Code points are the numbers assigned by the Unicode Consortium to every character in every writing system. Each is written as U+ followed by four to six hexadecimal digits (0–9 and A–F). Another example: the interrobang "‽" is U+203D.
Each code point is also assigned a human-readable name, which may be written after the "U+" notation. For example, you might see "U+002A ASTERISK" or "U+03A9 GREEK CAPITAL LETTER OMEGA".
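The notation described above is easy to reproduce in code. The following is a minimal sketch in Python, used here only for illustration (it is not BD-J code, and the helper names `u_plus` and `describe` are made up for this example). Note that Python's standard `unicodedata` module reports names from the Unicode version bundled with the interpreter, not from Unicode 2.0.

```python
import unicodedata


def u_plus(code_point: int) -> str:
    """Format a code point as U+XXXX with at least four uppercase hex digits."""
    return f"U+{code_point:04X}"


def describe(char: str) -> str:
    """Return the 'U+XXXX NAME' form for a single character."""
    return f"{u_plus(ord(char))} {unicodedata.name(char)}"


print(u_plus(42))          # U+002A
print(u_plus(1114109))     # U+10FFFD
print(describe("\u203d"))  # U+203D INTERROBANG
print(describe("\u03a9"))  # U+03A9 GREEK CAPITAL LETTER OMEGA
```

The `04X` format pads short values to four digits but leaves longer ones alone, which matches the "minimum of four hex digits" rule.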
In the Blu-ray technical specification, all text encoding (both for coding and displaying text) uses Unicode 2.0 (UTF-8 and UTF-16BE), which is defined in ISO/IEC 10646-1:1993. Unicode 2.0, released in July 1996, was a significant update to the Unicode standard. It contains 38,885 assigned characters (excluding private-use characters, control characters, noncharacters, and surrogate code points), covering Basic Latin, Cyrillic, Greek, CJK ideographs (including Japanese kanji), and many other scripts. The characters are organized into blocks by script, symbol type, or usage, and they reflect the state of text encoding standardization at that time. This version of Unicode is outdated by today's standards, but it remains the relevant version for BD development.
Unicode 2.0 was probably chosen because it was a well-established standard by the early 2000s. During Blu-ray's development (starting around 2000, with specs finalized by 2004–2006), the Blu-ray Disc Association (BDA) likely prioritized a mature standard with broad compatibility over a newer, less-tested version like Unicode 4.1. Newer Unicode versions often introduce additional characters and complexity, which could require more extensive validation and risk introducing bugs or incompatibilities in a consumer product aiming for a global launch. Although outdated by 2005, Unicode 2.0 was a proven choice that met Blu-ray's needs without overcomplicating the specification.
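To make the two encoding forms named in the specification concrete, here is a small Python sketch (illustrative only, not taken from any BD authoring tool; the helper name `hex_bytes` is an assumption of this example). It shows how the same code points are stored as bytes in UTF-8 and in UTF-16BE.

```python
def hex_bytes(text: str, encoding: str) -> str:
    """Encode text and return the bytes as space-separated uppercase hex."""
    return " ".join(f"{b:02X}" for b in text.encode(encoding))


# Python's "utf-16-be" codec writes big-endian code units with no byte order mark.
print(hex_bytes("A", "utf-8"))           # 41
print(hex_bytes("A", "utf-16-be"))       # 00 41
print(hex_bytes("\u203d", "utf-8"))      # E2 80 BD
print(hex_bytes("\u203d", "utf-16-be"))  # 20 3D
```

Every character in the Basic Multilingual Plane takes exactly two bytes in UTF-16BE, while UTF-8 uses one to three bytes depending on the code point.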
In BD applications, Unicode text can be displayed with bitmap fonts (PNG) or vector-based fonts (OpenType). Most BD-J titles use bitmap fonts, together with Java classes such as StringBuffer, BufferedReader, and FileInputStream, to read the text and draw the bitmap glyph for each of its code points. Vector-based OpenType fonts are rarely used in BD-J; when they are, the fonts and text are stored inside the Blu-ray's 4 MB text cache and drawn by the BD player's font rendering engine. A BD-J application should include an OpenType font file that is compatible with Unicode 2.0; if it does not, the player will use its own default font. However, the majority of players may not include fonts of their own, so it is best to include font files.
Unicode 2.0 defines a total of 38,885 assigned characters across its code points from U+0000 to U+FFFF (the Basic Multilingual Plane, BMP). The count comes from tallying each named entry across the blocks listed below.
| Range | Block Name | Assigned Characters | Notes |
|---|---|---|---|
| U+0000–U+007F | Basic Latin | 128 | ASCII characters (letters, digits, punctuation, controls). |
| U+0080–U+00FF | Latin-1 Supplement | 128 | Additional Latin characters, symbols, and controls (e.g., £, ©). |
| U+0100–U+017F | Latin Extended-A | 128 | Extended Latin for European languages (e.g., Œ, Š). |
| U+0180–U+024F | Latin Extended-B | 113 | More Latin letters for African, Native American languages (e.g., Ɓ, ƒ). |
| U+0250–U+02AF | IPA Extensions | 89 | Phonetic symbols for International Phonetic Alphabet (e.g., ɐ, ʃ). |
| U+02B0–U+02FF | Spacing Modifier Letters | 80 | Modifiers for phonetics/typography (e.g., ʰ, ː). |
| U+0300–U+036F | Combining Diacritical Marks | 112 | Marks combining with base characters (e.g., ◌̀, ◌̈). |
| U+0370–U+03FF | Greek | 135 | Greek letters and symbols (e.g., α, Ω). |
| U+0400–U+04FF | Cyrillic | 256 | Cyrillic script for Slavic languages (e.g., А, Я). |
| U+0530–U+058F | Armenian | 85 | Armenian script (e.g., Ա, Ֆ). |
| U+0590–U+05FF | Hebrew | 87 | Hebrew script (e.g., א, ת). |
| U+0600–U+06FF | Arabic | 237 | Arabic script and symbols (e.g., ا, ى). |
| U+0900–U+097F | Devanagari | 114 | Script for Hindi, Sanskrit (e.g., अ, ह). |
| U+0980–U+09FF | Bengali | 92 | Bengali script (e.g., অ, হ). |
| U+0A00–U+0A7F | Gurmukhi | 79 | Script for Punjabi (e.g., ਅ, ਹ). |
| U+0A80–U+0AFF | Gujarati | 83 | Gujarati script (e.g., અ, હ). |
| U+0B00–U+0B7F | Oriya | 81 | Oriya script (e.g., ଅ, ହ). |
| U+0B80–U+0BFF | Tamil | 72 | Tamil script (e.g., அ, ஹ). |
| U+0C00–U+0C7F | Telugu | 88 | Telugu script (e.g., అ, హ). |
| U+0C80–U+0CFF | Kannada | 86 | Kannada script (e.g., ಅ, ಹ). |
| U+0D00–U+0D7F | Malayalam | 89 | Malayalam script (e.g., അ, ഹ). |
| U+0E00–U+0E7F | Thai | 87 | Thai script (e.g., ก, ๏). |
| U+0E80–U+0EFF | Lao | 65 | Lao script (e.g., ກ, ຳ). |
| U+0F00–U+0FFF | Tibetan | 168 | Tibetan script (e.g., ཀ, ྼ). |
| U+10A0–U+10FF | Georgian | 83 | Georgian script (e.g., Ⴀ, ჶ). |
| U+1100–U+11FF | Hangul Jamo | 240 | Korean Hangul components (e.g., ᄀ, ᇿ). |
| U+1E00–U+1EFF | Latin Extended Additional | 185 | More Latin extensions (e.g., Ḁ, ỿ). |
| U+1F00–U+1FFF | Greek Extended | 233 | Precomposed Greek with diacritics (e.g., ἀ, ῼ). |
| U+2000–U+206F | General Punctuation | 71 | Punctuation marks (e.g., —, ‘). |
| U+2070–U+209F | Superscripts and Subscripts | 34 | Superscript/subscript digits and letters (e.g., ⁰, ₓ). |
| U+20A0–U+20CF | Currency Symbols | 12 | Currency signs (e.g., ₧). |
| U+20D0–U+20FF | Combining Diacritical Marks for Symbols | 33 | Combining marks for symbols (e.g., ◌⃐, ◌⃡). |
| U+2100–U+214F | Letterlike Symbols | 55 | Symbols resembling letters (e.g., ℂ, ℏ). |
| U+2150–U+218F | Number Forms | 50 | Fractions, Roman numerals (e.g., ½, Ⅻ). |
| U+2190–U+21FF | Arrows | 91 | Arrow symbols (e.g., ←). |
| U+2200–U+22FF | Mathematical Operators | 256 | Math symbols (e.g., ∀, √). |
| U+2300–U+23FF | Miscellaneous Technical | 126 | Technical symbols (e.g., ⌈). |
| U+2400–U+243F | Control Pictures | 39 | Graphical representations of control codes (e.g., ␀, ␣). |
| U+2440–U+245F | Optical Character Recognition | 11 | OCR-specific symbols (e.g., ⑀, ⑊). |
| U+2460–U+24FF | Enclosed Alphanumerics | 160 | Circled numbers/letters (e.g., ①, ⓿). |
| U+2500–U+257F | Box Drawing | 128 | Line-drawing characters (e.g., ─, ┼). |
| U+2580–U+259F | Block Elements | 32 | Block graphic characters (e.g., ▀, █). |
| U+25A0–U+25FF | Geometric Shapes | 96 | Shapes (e.g., ■, ◯). |
| U+2600–U+26FF | Miscellaneous Symbols | 171 | Various symbols (e.g., ☀, ♥). |
| U+2700–U+27BF | Dingbats | 174 | Decorative symbols (e.g., ✁, ❏). |
| U+3000–U+303F | CJK Symbols and Punctuation | 63 | CJK-specific punctuation (e.g., 、, 〿). |
| U+3040–U+309F | Hiragana | 93 | Japanese Hiragana (e.g., ぁ, ん). |
| U+30A0–U+30FF | Katakana | 96 | Japanese Katakana (e.g., ァ, ヿ). |
| U+3100–U+312F | Bopomofo | 27 | Chinese phonetic script (e.g., ㄅ, ㄩ). |
| U+3130–U+318F | Hangul Compatibility Jamo | 94 | Legacy Korean Jamo (e.g., ㄱ, ㅿ). |
| U+3200–U+32FF | Enclosed CJK Letters and Months | 191 | Enclosed CJK characters (e.g., ㈀, ㋿). |
| U+3300–U+33FF | CJK Compatibility | 256 | Compatibility CJK variants (e.g., ㌀, ㏿). |
| U+4E00–U+9FFF | CJK Unified Ideographs | 20,902 | Core Chinese/Japanese/Korean characters (e.g., 一, 龥). |
| U+AC00–U+D7A3 | Hangul Syllables | 11,172 | Precomposed Korean syllables (e.g., 가, 힣). |
| U+E000–U+F8FF | Private Use Area | 0 (reserved) | No predefined characters; for custom use. |
| U+F900–U+FAFF | CJK Compatibility Ideographs | 302 | Compatibility variants of CJK ideographs (e.g., 豈). |
| U+FB00–U+FB4F | Alphabetic Presentation Forms | 58 | Precomposed ligatures (e.g., ﬀ, ﬅ). |
| U+FB50–U+FDFF | Arabic Presentation Forms-A | 611 | Arabic contextual forms (e.g., ﭐ, ﷿). |
| U+FE20–U+FE2F | Combining Half Marks | 16 | Half-width combining marks (e.g., ◌︠, ◌︯). |
| U+FE30–U+FE4F | CJK Compatibility Forms | 32 | Vertical CJK punctuation variants (e.g., ︐, ︴). |
| U+FE50–U+FE6F | Small Form Variants | 26 | Small CJK punctuation (e.g., ﹐). |
| U+FE70–U+FEFF | Arabic Presentation Forms-B | 141 | More Arabic forms; includes U+FEFF, the zero-width no-break space used as a byte order mark (e.g., ﹰ). |
| U+FF00–U+FFEF | Halfwidth and Fullwidth Forms | 225 | Fullwidth ASCII, halfwidth Katakana/Hangul (e.g., ！, ｦ). |
| U+FFF0–U+FFFF | Specials | 6 | Special-purpose characters (e.g., �). |
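A block table like the one above can be used to look up which block a code point falls in. The Python sketch below is illustrative only: it copies just a handful of rows from the table, and the names `BLOCKS` and `block_of` are assumptions of this example.

```python
# A few (start, end, name) rows copied from the table above; not the full list.
BLOCKS = [
    (0x0000, 0x007F, "Basic Latin"),
    (0x0370, 0x03FF, "Greek"),
    (0x2000, 0x206F, "General Punctuation"),
    (0x20A0, 0x20CF, "Currency Symbols"),
    (0x2600, 0x26FF, "Miscellaneous Symbols"),
    (0x4E00, 0x9FFF, "CJK Unified Ideographs"),
    (0xE000, 0xF8FF, "Private Use Area"),
]


def block_of(code_point: int):
    """Return the block name for a code point, or None if no listed block contains it."""
    for start, end, name in BLOCKS:
        if start <= code_point <= end:
            return name
    return None


print(block_of(0x002A))   # Basic Latin
print(block_of(0x203D))   # General Punctuation
print(block_of(0x1F600))  # None (outside the BMP)
```

Falling inside a block's range does not mean a code point is assigned in Unicode 2.0. The ranges include unassigned slots, which is why the "Assigned Characters" column is usually smaller than the size of the range.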
Missing Characters
Since Unicode 2.0 was released in 1996, it is missing several key symbols that emerged later, most notably the Euro sign (€, U+20AC), introduced in 1999 and added in Unicode 2.1. Other absences include modern currency symbols like the Indian Rupee sign (₹, U+20B9, Unicode 6.0), the extensive emoji sets (Unicode 6.0 and later), and newer scripts like Cherokee (added in Unicode 3.0). These gaps reflect Unicode 2.0's 1996 scope, limited to 38,885 characters in the BMP. The Private Use Area (PUA, U+E000–U+F8FF), with 6,400 code points that the standard leaves unassigned, offers a workaround: developers can assign custom glyphs, such as the Euro sign or proprietary icons, to PUA slots and pair them with a custom font.
For the Euro sign (€), it is also possible to use the old Euro-currency sign ₠ (U+20A0) as a substitute; a custom font can draw that code point with the Euro sign's glyph.
If a character from a newer Unicode version is used, it will appear as a replacement character "�", an empty box, or nothing at all.
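The Private Use Area workaround described above amounts to a character substitution applied to the text before it is stored on disc. The Python sketch below illustrates the idea under stated assumptions: the slot numbers in `PUA_MAP` are arbitrary choices made for this example, and they only work with a custom font that draws the matching glyphs at those code points.

```python
PUA_START, PUA_END = 0xE000, 0xF8FF

# Hypothetical assignments: the slot numbers are arbitrary picks for this sketch.
PUA_MAP = {
    "\u20ac": "\ue000",  # EURO SIGN (added in Unicode 2.1)
    "\u20b9": "\ue001",  # INDIAN RUPEE SIGN (added in Unicode 6.0)
}


def to_pua(text: str) -> str:
    """Replace characters that have a private-use stand-in; keep the rest."""
    return "".join(PUA_MAP.get(ch, ch) for ch in text)


def is_private_use(ch: str) -> bool:
    """True if the character lies in the BMP Private Use Area."""
    return PUA_START <= ord(ch) <= PUA_END


print(PUA_END - PUA_START + 1)                          # 6400
print([f"U+{ord(c):04X}" for c in to_pua("5 \u20ac")])  # ['U+0035', 'U+0020', 'U+E000']
```

Because the mapping is private, the same table has to be shared by whatever prepares the text and whatever designs the font.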
Special Symbols & Emoji Support
Since Blu-ray uses Unicode 2.0, it has only limited emoji-like support, offering basic symbols like ☺ (U+263A) or ♥ (U+2665) in the Arrows, Miscellaneous Technical, Miscellaneous Symbols, Dingbats, Geometric Shapes, and CJK Symbols and Punctuation blocks. The player does not display these symbols as rasterized bitmap images or layered graphics by default; the developer has to do that manually in the BD-J application (using small PNG graphics). By default, the emoji-like characters are drawn as scalable vector-based symbols from a font (e.g., an OpenType font such as Noto Emoji). For example, ⌂ (U+2302) represents a house, and a small bitmap image of a house could be drawn in its place.
| | 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 |
|---|---|---|---|---|---|---|---|---|---|---|
| 2 | ‼️ 203C | ↔️ 2194 | ↕️ 2195 | ↖️ 2196 | ↗️ 2197 | ↘️ 2198 | ↙️ 2199 | ↩️ 21A9 | ↪️ 21AA | ⌚ 231A |
| 3 | ⌛ 231B | ⌨️ 2328 | ⎗ 2397 | ⎘ 2398 | ⎙ 2399 | ⎚ 239A | Ⓜ️ 24C2 | ▶️ 25B6 | ◀️ 25C0 | ▪️ 25AA |
| 4 | ▫️ 25AB | ⌂ 2302 | - | - | ☀️ 2600 | ☁️ 2601 | ☂️ 2602 | ☃️ 2603 | ☄️ 2604 | ★ 2605 |
| 5 | ☆ 2606 | ☎️ 260E | ☏ 260F | ☐ 2610 | ☒ 2612 | ☚ 261A | ☛ 261B | ☜ 261C | ☝️ 261D | ☞ 261E |
| 6 | ☟ 261F | ☠️ 2620 | ☡ 2621 | ☢️ 2622 | ☣️ 2623 | ☦️ 2626 | ☪️ 262A | ☮️ 262E | ☯️ 262F | ☸️ 2638 |
| 7 | ☹️ 2639 | ☺️ 263A | ☻ 263B | ☼ 263C | ☽ 263D | ☾ 263E | ♀️ 2640 | ♂️ 2642 | ♈ 2648 | ♉ 2649 |
| 8 | ♊ 264A | ♋ 264B | ♌ 264C | ♍ 264D | ♎ 264E | ♏ 264F | ♐ 2650 | ♑ 2651 | ♒ 2652 | ♓ 2653 |
| 9 | ♔ 2654 | ♕ 2655 | ♖ 2656 | ♗ 2657 | ♘ 2658 | ♙ 2659 | ♚ 265A | ♛ 265B | ♜ 265C | ♝ 265D |
| A | ♞ 265E | ♟ 265F | ♠️ 2660 | ♡ 2661 | ♢ 2662 | ♣️ 2663 | ♤ 2664 | ♥️ 2665 | ♦️ 2666 | ♧ 2667 |
| B | ♨️ 2668 | ♩ 2669 | ♪ 266A | ♫ 266B | ♬ 266C | ♭ 266D | ♮ 266E | ♯ 266F | ✁ 2701 | ✂️ 2702 |
| C | ✃ 2703 | ✄ 2704 | ✆ 2706 | ✇ 2707 | ✈️ 2708 | ✉️ 2709 | ✌️ 270C | ✍️ 270D | ✏️ 270F | ✒️ 2712 |
| D | ✔️ 2714 | ✖️ 2716 | ✚ 271A | ✝️ 271D | ✞ 271E | ✠ 2720 | ✡️ 2721 | ✤ 2724 | ✧ 2727 | ✩ 2729 |
| E | ✪ 272A | ✳️ 2733 | ❀ 2740 | ❄️ 2744 | ❇️ 2747 | ❖ 2756 | ❣️ 2763 | ❤️ 2764 | ❥ 2765 | ❦ 2766 |
| F | ❧ 2767 | ➡️ 27A1 | 〄 3004 | 〠 3020 | 〰 3030 | 〶 3036 | ㉿ 327F | ㊗️ 3297 | ㊙️ 3299 | |
This is not a full list, but these are the top suggestions for text in BD applications such as subtitles, menus, or video games. They can be useful for showing how to operate a feature with the remote: for example, ← ↑ → ↓ are directional arrows, and ⓇⒼⒷⓎ can be used to represent the colored buttons.
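The circled letters used for the colored buttons sit in one contiguous run of the Enclosed Alphanumerics block (U+24B6 for circled A through U+24CF for circled Z), so they can be computed rather than typed. The Python sketch below is only an illustration; the helper names `circled` and `colour_button_hint` are made up for this example.

```python
CIRCLED_A = 0x24B6  # CIRCLED LATIN CAPITAL LETTER A


def circled(letter: str) -> str:
    """Map an ASCII capital letter A-Z to its circled form (U+24B6 to U+24CF)."""
    if len(letter) != 1 or not "A" <= letter <= "Z":
        raise ValueError("expected a single capital letter A-Z")
    return chr(CIRCLED_A + ord(letter) - ord("A"))


def colour_button_hint() -> str:
    """Build a hint string for the red, green, blue, and yellow remote buttons."""
    return " ".join(circled(c) for c in "RGBY")


print(colour_button_hint())
print(f"U+{ord(circled('M')):04X}")  # U+24C2, the circled M listed in the table above
```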
Private Use Area
The Private Use Area consists of 6,400 code points (U+E000–U+F8FF) that are reserved for private use and have no characters assigned by the standard. It can be used for many things: characters missing from Unicode 2.0 that were added in newer versions, custom symbols, constructed scripts like Klingon or Tengwar, or corporate symbols like the Apple logo.
Code charts and references
- Official Unicode 2.0 Documentation - Highly Recommended
- Unicode official site -- has lots of standards documents and code charts
- Unicode.org - Unicode Official Homepage
- Codepoints.net - Unicode Database (Best one)
- Unicodepedia.com - Unicode Database
- Unicode page at Archiveteam.org
- Wikipedia Page
- Wikipedia list of Unicode Characters
Author(s) : Æ Firestone