Thursday, 30 December 2010

ban legacy fonts!

Do you remember the bad old days before Unicode? The time when there was no standardized way of encoding phonetic symbols? when word processing was single-byte and fonts were 8-bit, so that any given font was limited to under two hundred characters? when the various phonetic fonts available all used different encodings, so that where one person had input ɥ another might see ɦ or ʰ or something else entirely arbitrary? when if you transferred a document to a different computer you would as likely as not get garbage for your phonetic symbols? when your Powerpoint presentation using the computer supplied by local organizers would probably fail to display your phonetic symbols properly?

Thank goodness those days are past. Nowadays we all use Unicode, the internationally agreed industry-wide character-encoding standard for all alphabets and scripts, covering all the languages of the world as well as all the phonetic (and other) symbols we might need. A single font can now contain thousands, indeed tens of thousands, of different characters. So we no longer have to keep switching fonts merely in order to include phonetic symbols. In this blog I can be confident that when I input a particular phonetic symbol you will see that same phonetic symbol on your screen, no matter where you are and no matter what platform you are using. (OK, there may be marginal cases where the font you are using falls down over one or two unusual symbols: but then you will probably see a blank square or something similar. You won’t see the wrong phonetic symbol or some ludicrous webding, as used to happen.)
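
The guarantee described above rests on each symbol having its own fixed code point, independent of any font. A minimal Python sketch, assuming only the standard unicodedata module (the helper name describe is purely illustrative), prints the code points of the three symbols that legacy fonts could confuse:

```python
import unicodedata

# The three symbols mentioned earlier as being confused under legacy encodings.
SYMBOLS = ["\u0265", "\u0266", "\u02b0"]  # ɥ, ɦ, ʰ


def describe(char: str) -> str:
    """Return 'U+XXXX NAME' for a single character (illustrative helper)."""
    return f"U+{ord(char):04X} {unicodedata.name(char)}"


for symbol in SYMBOLS:
    print(symbol, describe(symbol))
```

Whatever font is used to draw them, the three characters remain distinct in the text itself, which is exactly what an 8-bit font encoding could not promise.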

I celebrated this progress and documented the details in the poster paper I gave at the 2007 International Congress of Phonetic Sciences in Saarbrücken. (If you’re interested, here’s the printed version.)

But phoneticians haven’t all caught up.
The next ICPhS is due to be held in Hong Kong in a few months’ time. The deadline for paper submission is the beginning of March, so it’s time for everyone to get their thoughts in order and start writing. The Call for Papers page on the conference website gives the following instructions about phonetic symbols in submitted papers.
• One of the following IPA fonts is to be used for congress papers:
IPA-SAM phonetic fonts: http://www.phon.ucl.ac.uk/shop/fonts.php
SIL phonetic fonts: http://scripts.sil.org/encore-ipa-download

What are these fonts, so brusquely prescribed?
  • The IPA-SAM fonts are 8-bit fonts that I created around fifteen years ago. Building on SIL software, they enjoyed some considerable popularity because the encoding and therefore the keyboarding fitted in nicely with the way phoneticians actually use phonetic symbols. Nevertheless, once Unicode became available it rendered these and other specialist 8-bit fonts obsolete. For the last five years or more I have been actively discouraging people from using the fonts I created, because Unicode phonetic fonts are now widely available. Indeed, more and more of the ‘core’ fonts supplied with new computers include all the IPA symbols. So everyone should now use Unicode rather than ‘legacy’ fonts like the IPA-SAM fonts.

  • If you follow the ICPhS link to the SIL site, you will see this notice, prominently displayed.
    Important
    The SIL Encore IPA and SIL IPA93 fonts are obsolete, symbol-encoded fonts. Their use is discouraged. If you decide to download and use these fonts, please note there is no user support for these fonts.
    If your university or organization requires the use of these fonts, please request they change their requirement to Doulos SIL, a Unicode-encoded font which contains the complete IPA repertoire.

    Yes, their use is discouraged. Did you read that, conference organizers?

The Word template supplied by the organizers for ICPhS conference papers contains the following.
Phonetic fonts
You can use phonetic symbols and special characters in your paper. To make sure that readers of your article can see the phonetic symbols in the PDF document, all special symbols must be embedded in the PDF. Depending on the software you use to produce the PDF the details may vary. In our experience the fonts are usually embedded, but this can be checked e.g. by inspecting the "Document Properties -- Fonts" in Acrobat Reader.
It is recommended to use one of the following fonts to show phonetic symbols (links for free download can also be found at the Congress website):
• IPA-SAM phonetic fonts [3]
• SIL phonetic fonts [4] (Unicode is accepted)

“Unicode is accepted.” As an afterthought. Big deal.

Where have the congress organizers been for the last ten years? Unicode should be required. And legacy fonts firmly deprecated.

16 comments:

  1. Hear hear! I just hope you wrote them a personal letter on this, besides complaining here in (on?) your blog.

  2. Even the shiny new OED Online site, which has adopted Unicode, is still apparently using 3 instead of ɜ, e.g., s.v. learn:
    Pronunciation:  /l3ːn/ Inflections:  Pa. tense and pple. learned /l3ːnd/ , learnt /l3ːnt/ . Forms:  Pa. tense and pple. learned /l3ːnd/ , learnt /l3ːnt/ .

    Probably a throw-back to when they used SAMPA encodings but with a special font that had the proper IPA symbols in those positions.
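
    A rough sketch of how such leftovers could be repaired, assuming (as a simplification) that the digit 3 is the only stray SAMPA character and that transcriptions always sit between slashes. The function name repair_transcriptions is hypothetical, and this is not a description of how the OED actually processes its data:

```python
import re

# Assumption: a transcription is a run of non-space characters between slashes.
SLASHED = re.compile(r"/[^/\s]+/")


def repair_transcriptions(text: str) -> str:
    """Replace DIGIT THREE with IPA U+025C (ɜ), but only inside /.../ spans."""
    return SLASHED.sub(lambda m: m.group(0).replace("3", "\u025c"), text)


entry = "Pa. tense and pple. learned /l3\u02d0nd/ , learnt /l3\u02d0nt/ ."
print(repair_transcriptions(entry))
```

    Restricting the substitution to the slashed spans matters, because a genuine digit 3 elsewhere in the entry must be left alone.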

  3. I've read your document 'AN UPDATE ON PHONETIC SYMBOLS IN UNICODE'. Writing transcriptions is a pain in the neck, and I would like to know if there are better ways to type phonetic symbols than those mentioned in section 6.4 (considering the document was written in 2007).

  4. OK, there may be marginal cases where the font you are using falls down over one or two unusual symbols: but then you will probably see a blank square or something similar — you won’t see the wrong phonetic symbol or some ludicrous webding, as used to happen.
    Another Bad Thing which is unfortunately very common is that in some very widespread fonts some combining diacritics are broken, so the reader might see a tilde on the character before or after the one the author intended it to be on.
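
    The reason fonts get this wrong is that most IPA symbols with diacritics are stored as two characters, a base plus a combining mark, and the font has to position the mark itself. A minimal Python sketch, assuming only the standard unicodedata module (code_points is an illustrative helper), shows the difference between a sequence that has a precomposed form and one that does not:

```python
import unicodedata

COMBINING_TILDE = "\u0303"  # COMBINING TILDE, placed above the preceding base


def code_points(text: str) -> list:
    """List the code points in a string as 'U+XXXX' labels."""
    return [f"U+{ord(c):04X}" for c in text]


nasal_a = "a" + COMBINING_TILDE            # has a precomposed form, U+00E3
nasal_open_e = "\u025b" + COMBINING_TILDE  # ɛ + tilde: no precomposed form

print(code_points(unicodedata.normalize("NFC", nasal_a)))
print(code_points(unicodedata.normalize("NFC", nasal_open_e)))
```

    After NFC normalization the first sequence collapses to a single character, while the second stays as two, so its appearance depends entirely on how well the font handles combining marks.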

  5. @Mike Lladrorg: various input methods have been mentioned in this blog from time to time. Different people have different preferences. Personally, I'm very happy with Mark Huckvale's Unicode Phonetic Keyboard (from UCL), which I used to edit the most recent edition of LPD and also use (usually) for writing this blog. For one-off symbols in MS Word, the Insert Character function is straightforward.

  6. For those using Macs, I've made great use of a free program called Ukelele to make my own IPA keyboards. Once installed, they can be switched to by simple keyboard shortcut (I use mine so often I have it set to command+space).

    Come to think of it, I really should tweak the one I've been using for like two years—I made it before I realized things like the rarity of the bilabial trill [ʙ], which I have set to shift+b, while more common characters take a three-key combination.

    One note if you go that route: make sure to use the IPA [ɡ] and not the standard ‹g›, which can appear in non-IPA-approved looped form.

    Ah, I didn't realize/remember that it is distributed by SIL.

    Oh, and: Death to Legacy Fonts!

  7. @dirck: you're wrong to say the standard (looped, spectacle-shaped) g is "non-IPA-approved". Both forms of g are permitted, by an explicit decision of the council years ago. Note also the recommendation (p. 14 of the 1949 booklet) to use the spectacle-shaped form under certain circumstances.

  8. @Mike Lladrorg: A lot of phoneticians have told me that they have found this useful for inputting IPA characters http://rishida.net/scripts/pickers/ipa/

  9. Btw, fwiw, i just added a new view to the IPA picker that uses the layout of Mark Huckvale's Unicode Phonetic Keyboard (see http://rishida.net/scripts/pickers/ipa/?view=keyboard).

  10. "When your Powerpoint presentation using the computer supplied by local organizers would probably fail to display your phonetic symbols properly" certainly isn't restricted to days past: my experience of Powerpoint is that embedding the Unicode fonts I've used doesn't always work. I assume it's something flaky in Powerpoint rather than the fonts themselves since I don't seem to get such problems with pdf files. Strangely, I've never had a problem with embedding what we might call normal letters - it's only ever the phonetic symbols which are problematic. I don't see how that should be when I'm using Unicode fonts, but such are the mysteries of Powerpoint, I suppose.

  11. I completely agree. Not using Unicode is simply counterproductive, especially from the standpoint of the evolution of computing. Another example of this is that the majority of software is not 64-bit compatible, whereas the majority of hardware is. Why not take advantage of the newer and definitely better system?

    The only thing about Unicode is that, unlike 64-bit computing, it has been the standard for years and is very easily accessible. There is no excuse to use deprecated legacy fonts!

  12. I'm completely supportive of banning legacy fonts and moving entirely to Unicode. The difficulty lies in the fact that there is no converter to change the legacy fonts to the Unicode ones. Many of the documents in my files use the IPA-SAM and SIL IPA93 font sets.

    Also, as someone who has hated MS Word for years and despises it more in the 2007/2010 versions, I have been a WordPerfect user for years. Unfortunately, the Corel corporation has not made WordPerfect fully Unicode compliant. Open Office provides a good alternative to Word, though. It's far more customizable.

  13. Michael: SIL has some routines for converting SIL IPA93 to Unicode: http://scripts.sil.org/cms/scripts/page.php?item_id=SILIPA93DataConversion.
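
    In principle such a conversion is just a lookup table applied to every run of text formatted in the legacy font. The sketch below shows the idea in Python; the table is invented for illustration and is NOT the real IPA-SAM or SIL IPA93 encoding, and convert_run is a hypothetical name, not one of the SIL routines:

```python
# HYPOTHETICAL legacy-font table: these pairings are made up for illustration
# and do not reproduce the actual IPA-SAM or SIL IPA93 layouts.
LEGACY_TO_UNICODE = {
    "S": "\u0283",  # ʃ
    "N": "\u014b",  # ŋ
    "@": "\u0259",  # ə
}


def convert_run(text: str, table: dict = LEGACY_TO_UNICODE) -> str:
    """Map each legacy character to its Unicode equivalent; leave others as-is."""
    return text.translate(str.maketrans(table))


# Apply this only to text that was formatted in the legacy font.
print(convert_run("S@N"))
```

    The hard part in practice is not the mapping but identifying which runs of a document were set in the legacy font, since the same characters are ordinary letters everywhere else.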

  14. I will add that on the LaTeX front, most linguists use the TIPA package (http://www.ctan.org/tex-archive/fonts/tipa) for typesetting IPA strings. The TIPA package, like basic LaTeX in general, does not use Unicode technology. It is, however, possible to do so using the XeLaTeX extension of LaTeX, which uses Unicode explicitly. I wonder if XeLaTeX is just too new to have filtered down to users.

  15. Too many word processors use a limited (or nonstandard) version of Unicode for me to switch just yet. Besides, good old SAMPA is enough for my small requirements.
