Index to this page:
Usage tips
Entity types
Useful links
Encoding recommendations
Go to the table with all HTML entities
Go to the test page with all Macintosh displayable Unicode characters
Go to a page with tables of Macintosh & Windows standard encodings and Symbol encodings.
Go to a page describing the UTF-8 encoding
Netscape Navigator: do not forget to activate 'Use page-specified fonts' (Preferences: Appearance: fonts).
Internet Explorer: do not forget to activate 'Allow page to specify fonts' (Preferences: Web browser: web content).
This document activates the UTF-8 charset in a META tag.
General tips for UTF-8 pages on a Macintosh:
Internet Explorer (4.5 and above) displays UTF-8 pages best by far.
If you use Netscape Navigator, make sure you use a TrueType font for displaying UTF pages. Why and how?
When something has to be displayed on the screen, the Macintosh first looks for a bitmap version of the font at the requested size. For printing, the TrueType outlines are used first. All standard Macintosh fonts are TrueType fonts that also include a series of bitmap sizes. A problem with these bitmap fonts is that characters with code values above 127 are not always defined. The result: you see a box instead of a character. If you use a TrueType font or a font with better bitmaps, you will see normal characters.
Force the display to TrueType by taking an odd font size, for instance 11 points or 15 points. You can do this in the following manner.
Netscape Navigator:
* 'Edit' menu: 'Preferences'
* then 'Appearance' preferences: 'Fonts'
* then 'For the encoding' menu: select 'Unicode'
* in the 'Variable width font' menu: select 'Geneva' or 'Times'
* in the 'Size' menu: select 'Other...', for instance 11 points.
Internet Explorer:
* 'Edit' menu: 'Preferences'
* then 'Web browsers: Language/Fonts'
* look at the 'Fonts' section
* in the 'Character set' menu: select 'Universal Alphabet (UTF-8)'
* in the 'Proportional font' menu: select for instance 'Times'
* close Preferences, then 'Edit' menu: 'Text size': select 'Medium' or 'Larger'.
Quick test: if you see a capital A with an inverted v above it here: Â
then you have selected a usable font.
An entity by name takes the following form: &hearts; result: ♥
An entity by decimal number takes the following form: &#9829; result: ♥
An entity by hexadecimal number takes the following form: &#x2665; result: ♥
Refer to a Unicode character in running text in the following way (with at least four hexadecimal digits): U+2665
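The three entity forms and the U+ notation can all be derived from a single code point. The following is a minimal sketch in Python, not part of the original page; the helper name entity_forms is made up for illustration, and the named form relies on the entity table shipped with Python's standard library.

```python
import html
from html.entities import codepoint2name


def entity_forms(ch):
    """Return the ways of writing one character as an HTML entity.

    `entity_forms` is a hypothetical helper used only for this illustration.
    """
    cp = ord(ch)
    forms = {
        "decimal": f"&#{cp};",   # entity by decimal number
        "hex": f"&#x{cp:X};",    # entity by hexadecimal number
    }
    name = codepoint2name.get(cp)  # only a limited set of characters has a name
    if name is not None:
        forms["named"] = f"&{name};"
    return forms


heart = "\u2665"
forms = entity_forms(heart)
notation = f"U+{ord(heart):04X}"  # the form used when referring to a character in text

for kind, text in forms.items():
    # html.unescape turns each entity back into the literal character
    print(kind, text, "->", html.unescape(text))
print("notation", notation)
```

Characters without an entry in the name table simply get no named form, which matches the observation that numeric entities cover far more characters than named ones.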
There is another way of displaying Unicode characters in a browser. This only works if the UTF-8 character set is enabled by a META tag or the HTTP header, or if the user has chosen UTF-8 for page display. You can use the 'literal' UTF-8 encoding of the Unicode character. For the hearts symbol this encoding takes three bytes with the numerical values 226, 153 and 165. These values can be calculated, which is explained on my UTF-8 page. Put three characters in a row that your computer stores as the byte values 226, 153, 165. On a Macintosh these characters are the single low-9 quotation mark, o-circumflex and bullet; on a Windows computer they are the a-circumflex, trade mark and Yen. Result: ♥
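The calculation of the three byte values can be sketched as follows. This is an illustrative Python fragment, not the method from the UTF-8 page referred to above; the function name utf8_three_bytes is hypothetical, and it only handles the three-byte range U+0800 to U+FFFF. The codec names mac_roman and cp1252 are Python's names for the Macintosh and Windows standard encodings.

```python
def utf8_three_bytes(code_point):
    """Encode a code point in the range U+0800..U+FFFF as three UTF-8 byte values."""
    if not 0x0800 <= code_point <= 0xFFFF or 0xD800 <= code_point <= 0xDFFF:
        raise ValueError("not a three-byte UTF-8 character")
    return [
        0xE0 | (code_point >> 12),           # 1110xxxx: top 4 bits
        0x80 | ((code_point >> 6) & 0x3F),   # 10xxxxxx: middle 6 bits
        0x80 | (code_point & 0x3F),          # 10xxxxxx: low 6 bits
    ]


values = utf8_three_bytes(0x2665)
print(values)  # [226, 153, 165]

# The same result from Python's built-in encoder, as a cross-check.
assert values == list("\u2665".encode("utf-8"))

# What the three bytes look like when read as single-byte characters.
raw = bytes(values)
as_macintosh = raw.decode("mac_roman")  # low-9 quote, o-circumflex, bullet
as_windows = raw.decode("cp1252")       # a-circumflex, trade mark, Yen
print(as_macintosh, as_windows)
```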
If you do not see a hearts symbol anywhere, it would help to install the Symbol font and to use a version 4 or later browser.
Named entities are hardly supported by the two Macintosh browsers Internet Explorer (4.5, 5.0) and Netscape Navigator (4.7). Only most of the entities with a decimal equivalent smaller than 256 can be displayed by referring to them by name.
Hexadecimal entities are not supported by Netscape Navigator, but you can give them a try with version 4.7.x; Internet Explorer displays them mostly correctly.
Many decimal entities above number 255 are reasonably well supported, but only if you use the UTF-8 character set and if you have installed the fonts Symbol and Zapf Dingbats. In that case the Symbol letters and the Dingbats from the Unicode character set are mapped to equivalent characters from the Symbol or Zapf Dingbats font.
Literal character with Zapf Dingbats font tag (doesn't work here): ª
A literal character with a decimal value of 128 or greater cannot be displayed reliably. You will only see a hearts symbol if the following requirements are ALL met:
* You have a Macintosh
* You use Netscape Navigator
* You have installed the Zapf Dingbats font
* You have specified 'Use page fonts' in the browser preferences
* You have chosen the MacRoman character set but only if the page doesn't specify a character set of its own - as does this page.
In all other cases you see no hearts symbol. Internet Explorer is not able to detect fonts with a space in their name. The MacRoman character set is usually not supported by browsers on other platforms. You could figure out which decimal value in the ISO 8859-1 character set (153, type option i o) is equivalent to the necessary decimal position in the MacRoman character set (170, type option 2). If you have a META tag that defines ISO 8859-1 or another character set for a document, then the reader of your document cannot choose a different character set.
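Looking up the equivalent position in another character set can be sketched in Python as below. This is an illustration under stated assumptions: the helper name same_character_position is hypothetical, and the Windows code page cp1252 stands in for what browsers of the time treated as ISO 8859-1, because strict ISO 8859-1 (Python's latin-1) reserves positions 128 to 159 for control codes.

```python
def same_character_position(byte_value, source="mac_roman", target="cp1252"):
    """Find where the character at `byte_value` in `source` sits in `target`.

    Returns (character, position), with position None if the target
    character set does not contain the character at all.
    """
    ch = bytes([byte_value]).decode(source)
    try:
        return ch, ch.encode(target)[0]
    except UnicodeEncodeError:
        return ch, None


# MacRoman position 170 (typed with option 2) is the trade mark sign.
print(same_character_position(170))                     # position 153 in cp1252
print(same_character_position(170, target="latin-1"))   # no position in strict ISO 8859-1
```

The second call shows why literal characters above 127 are so fragile: the same byte value means different things, or nothing at all, depending on the character set the browser assumes.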
By now you might have guessed that it is virtually impossible to get special characters right if you use literals.
In the case of really difficult characters, the most reliable approach for all current browsers is to use pictures.
Alternative: use a META tag to specify a page character set and/or give very clear instructions to the reader of your page.
How do you specify a UTF-8 character set?
First, the reader could try to choose the 'Unicode (UTF-8)' character set from the 'Character set' submenu in the 'Edit' menu.
Second, the web server could send the character set information in the HTTP header.
Third, the maker of a web page could use a META tag like:
<meta http-equiv="Content-Type" content="text/html; charset=utf-8">
Place this META tag in the <head> part of the document, nowhere else!
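The second and third options can be combined. The sketch below, in Python, builds a small page with the META tag in the head section and pairs it with a matching Content-Type header; the function name build_utf8_response and the page skeleton are assumptions made for this illustration, not something prescribed by the original page.

```python
import html


def build_utf8_response(title, body_text):
    """Return (headers, body) for a UTF-8 page that declares its charset twice."""
    page = (
        "<html>\n<head>\n"
        # The META tag belongs in the head section, nowhere else.
        '<meta http-equiv="Content-Type" content="text/html; charset=utf-8">\n'
        f"<title>{html.escape(title)}</title>\n"
        "</head>\n<body>\n"
        f"<p>{html.escape(body_text)}</p>\n"
        "</body>\n</html>\n"
    )
    body = page.encode("utf-8")  # literal characters become UTF-8 bytes here
    headers = {
        "Content-Type": "text/html; charset=utf-8",  # the HTTP header route
        "Content-Length": str(len(body)),
    }
    return headers, body


headers, body = build_utf8_response("Hearts", "Literal heart: \u2665")
print(headers["Content-Type"])
print(len(body), "bytes")
```

Declaring the character set in both places keeps the page readable when it is opened from disk, where no HTTP header is available.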
Mac OS 9 update: full Unicode set still not displayable
The Apple Macintosh operating system has very well designed Unicode support. A severe problem is that virtually no application program supports it fully. A minor problem is that there is hardly any documentation on the Macintosh Unicode support, apart from documentation for programmers.
The two most widely used Macintosh browsers, Internet Explorer and Netscape Navigator, have very poor and extremely obstinate support for Unicode. How so?
If you have a Unicode font in which all or many Unicode characters are defined, then these two Macintosh browsers refuse to display all the characters defined in that font, whatever you do. It is difficult to explain what is happening, but let's give it a try.
First, you install all Apple Language Kits from your Mac OS 9 CD, including the traditional Chinese, simplified Chinese, Japanese, Korean, Indic and Central European kits. All these kits together give the best Unicode support. If you examine how browsers display the Unicode Chinese characters (U+4E00 to U+9FA5), you will notice that the Mac displays at best only the characters that are defined in Apple's LiSung font, the Unicode Chinese font for traditional Chinese.
Even if you install a Unicode font with the complete set of Chinese characters (e.g. Bitstream Cyberbit) and tell your browser to use this font when displaying UTF-8, your browser obstinately refuses to display all Chinese characters. The browser does, however, use the glyphs of the assigned Unicode font!
The same goes for Thai, for example. A minimal Thai character set is defined in the Bitstream Cyberbit Unicode font. Whatever you do, it is not possible to use this font or any other font to display Thai text using UTF-8 in a (Macintosh) browser.
So a major flaw of the most popular (Macintosh) browsers is that the display of Unicode characters is somehow hard-coded into the browsers. It is not possible to use the full extent of Unicode fonts, and it is not possible to map Unicode characters to separate fonts. Documentation on how the browser manufacturers map Unicode characters for display is not available.
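To examine for yourself which characters of a block a browser displays, you can generate a test listing with one decimal entity per character. The following Python sketch does this for the Chinese range U+4E00 to U+9FA5 mentioned above; the function name entity_rows and the layout of 16 characters per row are arbitrary choices for this illustration.

```python
def entity_rows(first, last, per_row=16):
    """Return text rows, each labelled with its starting code point and
    holding up to `per_row` decimal entities."""
    rows = []
    for start in range(first, last + 1, per_row):
        stop = min(start + per_row - 1, last)
        cells = " ".join(f"&#{cp};" for cp in range(start, stop + 1))
        rows.append(f"U+{start:04X}: {cells}")
    return rows


rows = entity_rows(0x4E00, 0x9FA5)
print(len(rows), "rows")
print(rows[0])
```

Joining the rows with line breaks inside a page that declares the UTF-8 character set gives a test page; characters the browser cannot display will typically show up as boxes or question marks.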
Only the Mozilla browser is able to use a Unicode font and to display UTF-8 encoded characters from such a font. One typical Macintosh problem still remains: the Mac OS cannot handle fonts larger than 16 megabytes, so a full Unicode font cannot be used.
Are the effects described above not merely a shortcoming of the Macintosh operating system? Not quite. Proof: experiment with SUE, one of the very few fully Unicode-capable word processors. This program, made by Tomasz Kukielka, is currently under development. SUE even supports the Unicode hex input method of the Mac OS: hold the option key down and type the 4-character hexadecimal Unicode number. (It helps to choose the Unicode hex input method from the keyboard menu first, with all Apple Language Kits installed.) This program shows all characters defined in all fonts. Try it with the Bitstream Cyberbit font.
Download page for word processor SUE.
Download the Bitstream Cyberbit font from Netscape's FTP site.
You will obtain a zip archive. After expanding you will get a Windows ttf font file of 12.7 MB. You might convert this to Macintosh format with e.g. TTConverter, TT FontConvert or other programs (Tucows, Info-Mac or other sources).
The Unicode databases can be found in the directory ftp://ftp.unicode.org/Public/UNIDATA/
The file UnicodeData-Latest.txt is the latest version with all Unicode character definitions, except for the CJK (Chinese, Japanese, Korean) characters, whose definitions can be found in Unihan.txt. The latter is a very large file, so look for a gzipped version elsewhere. I once found one in ftp://ftp.unicode.org/Public/3.0-Update/
Another file worth mentioning is Blocks.txt with the boundaries of all character blocks.
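The character definitions file is plain text with one character per line and fields separated by semicolons. As a minimal sketch, the Python fragment below reads the first three fields of one line; the sample line is reproduced here for illustration rather than copied from a specific release, and the function name parse_unicode_data_line is hypothetical. The result is cross-checked against the Unicode tables built into Python.

```python
import unicodedata

# Illustrative sample line: code point; character name; general category; ...
SAMPLE_LINE = "2665;BLACK HEART SUIT;So;0;ON;;;;;N;;;;;"


def parse_unicode_data_line(line):
    """Extract code point, name and general category from one data line."""
    fields = line.rstrip("\n").split(";")
    return {
        "code_point": int(fields[0], 16),  # first field is hexadecimal
        "name": fields[1],
        "category": fields[2],
    }


record = parse_unicode_data_line(SAMPLE_LINE)
print(f"U+{record['code_point']:04X}", record["name"], record["category"])

# Cross-check against Python's own copy of the Unicode database.
ch = chr(record["code_point"])
print(unicodedata.name(ch), unicodedata.category(ch))
```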
The Unicode organization has at last published PDF files (Adobe Acrobat Reader needed) with the glyphs of nearly all Unicode characters, including the CJK characters.
The index can be found here: http://www.unicode.org/charts/
An example of the excellent quality of the glyph files, from within Acrobat Reader 12x enlarged:

According to the Unihan database this has Mandarin pronunciation ta4, Cantonese pronunciation daap6, Japanese On-reading tou and meaning: the appearance of a dragon walking.
If your web page uses letters with diacritics or some special punctuation marks, all with a numerical value of 255 or less, then you should check whether they are on the list of the iso-8859-1 character set. If they are, encode your characters with decimal entities. For instance, the Spanish inverted question mark is on the iso-8859-1 list at position 191 decimal. Encode this character as &#191; (result: ¿).
Put a meta-tag in the head-section of your page, assigning an iso-8859-1 or a utf-8 character set to the page.
If you would like to encode letters with 'normal' diacritics, then you should confirm that they are on the list of the character set encoding of your choice, preferably iso-8859-1. They should be encoded with decimal entities, but many 'normal' accented letters do work if they are encoded by name.
Example: e with an acute accent. Encode this as &#233; (result: é) or as &eacute;.
The advantage of encoding by name is that your html-code is easier to read. But many named entities do not work in all browsers.
When they encounter a numerical entity, most browsers assume that the number refers to a character from the Unicode list. But this is not always the case, so a meta-tag with a charset assignment should be given.
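Converting literal accented characters to decimal entities does not have to be done by hand. As a small sketch, Python's built-in xmlcharrefreplace error handler replaces every character outside ASCII with its decimal entity; the function name to_decimal_entities is made up for this illustration.

```python
import html


def to_decimal_entities(text):
    """Replace every non-ASCII character with its decimal entity."""
    return text.encode("ascii", "xmlcharrefreplace").decode("ascii")


encoded = to_decimal_entities("\u00bfQu\u00e9?")  # the literal text: ¿Qué?
print(encoded)                 # &#191;Qu&#233;?
print(html.unescape(encoded))  # back to the literal characters
```

The output contains only ASCII characters, so it survives regardless of the character set in which the file itself is saved.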
If you would like to display any other characters, then chances are that your reader will not see them as you intended. This situation is very difficult and should always be tested on the platforms and browsers of your intended audience. About 75% of web surfers have a Windows 98 machine, and most of them use Internet Explorer 5.0. A problem is that not every user has installed all possible language support files or Unicode fonts.
Suppose that your reader has a reasonable computer: how should you encode special characters?
That depends on the nature of the character. Characters from the following Unicode blocks have a rather high probability of being displayed correctly: