Wednesday 30 June 2010

tittles and jots

It is only just over two months since I last discussed the question of foreign “accented letters” in ordinary spelling. We need them for loanwords (façade, cliché) and in particular for foreign names (congrats to the Guardian for getting Žižek right, see above). They are perceived as problematic not only because we don’t use them in writing ordinary English, but also because not all software can cope with them (certain email clients being notoriously unable to deal with anything beyond ASCII), and because documentation on how to input them is often inadequate.

The current issue of the Economist has an interesting article on the subject.
Over at Gulliver, our correspondent reports on his trip through Tromso airport. Or, as a commenter, Lafayette, notes, shouldn't we write it Tromsø, as the Norwegians do?
Our style book rule is to use the diacritic marks on French, German, Spanish and Portuguese names and words. The rest have to do without. Why?

It’s not difficult to input diacritics for those languages nowadays. Users of Word for Windows ought to be conversant with the keyboard shortcuts (see table). They’re just as easy to input on a Mac.

We need to remember that in German Günther is a different name from Gunther, and Köhler a different name from Kohler.

The Economist article was a follow-up to an earlier one that dealt particularly with the diacritics used in eastern European languages.
Estonia[n] has the õ, Latvian the ķ, Lithuanian the ų, Polish the infernally similar ż and ź, not to mention the ł; the Czechs have the ů, the Slovaks the ŕ and the Hungarians the ő. There are dozens of other examples, but you get the point. They tend to get overlooked.

East Europeans will have regained their real place in the world once their names are spelled properly, not mutilated by an inadequate foreign character set.

...But they matter. Estonia’s national anthem, for example, starts: “Mu isamaa, mu õnn ja rõõm” (My fatherland, my happiness and joy). Written in the western character set, “onn ja room” becomes something quite different: the comical "small hut and crawl".

The õ of Estonian happens to be needed for Portuguese, too, so it is covered by the Word keyboard shortcuts. But if ctrl-apostrophe e gives us é and ctrl-comma c gives us ç, why can we not use ctrl-apostrophe s to input ś and ctrl-comma k to input ķ? Wake up, Word!
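In Unicode terms there is no obstacle to this: ś and ķ exist as single precomposed characters, just as é and ç do. The minimal sketch below assumes a dead key can be modelled as "append a combining accent to the letter, then apply NFC normalization". The `DEAD_KEYS` table and `dead_key()` function are hypothetical illustrations, not how Word actually implements its shortcuts.

```python
import unicodedata

# Hypothetical dead-key table: the combining mark each Word-style
# prefix stands for (Ctrl+' and Ctrl+, are the shortcuts named above).
DEAD_KEYS = {
    "ctrl+'": "\u0301",  # COMBINING ACUTE ACCENT
    "ctrl+,": "\u0327",  # COMBINING CEDILLA
}

def dead_key(prefix: str, letter: str) -> str:
    """Add the prefix's accent to letter; NFC merges the pair into one
    precomposed code point wherever Unicode defines one."""
    return unicodedata.normalize("NFC", letter + DEAD_KEYS[prefix])

for prefix, letter in [("ctrl+'", "e"), ("ctrl+,", "c"), ("ctrl+'", "s"), ("ctrl+,", "k")]:
    ch = dead_key(prefix, letter)
    print(prefix, letter, "->", ch, f"U+{ord(ch):04X}", unicodedata.name(ch))
```

All four combinations come out as a single code point (é U+00E9, ç U+00E7, ś U+015B, ķ U+0137), so the ś and ķ shortcuts would be exactly as easy to support as the existing ones.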

Meanwhile, you will be glad to know that you can include in your blog comments here any Unicode character that is not on your keyboard, but only if you first compose your message in a Unicode-compliant word processor (e.g. Word) and then copy and paste it. In Word itself, failing implementation of the improvements I have suggested, you can use Insert Character, or alternatively type the character's hexadecimal Unicode number and then press Alt-X.
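Alt-X in Word converts the hexadecimal code point typed just before the cursor into the character itself. The rough Python sketch below shows the same conversion in both directions. The function names are hypothetical, and accepting an optional "U+" prefix is a convenience assumed by this sketch rather than a description of Word's behaviour.

```python
def hex_to_char(code: str) -> str:
    """Turn a hexadecimal code point such as '015B' (optionally 'U+015B')
    into the character it stands for."""
    code = code.upper().removeprefix("U+")  # str.removeprefix needs Python 3.9+
    return chr(int(code, 16))

def char_to_hex(ch: str) -> str:
    """The reverse direction: show a single character as U+XXXX."""
    return f"U+{ord(ch):04X}"

print(hex_to_char("015B"))    # ś
print(hex_to_char("U+0137"))  # ķ
print(char_to_hex("õ"))       # U+00F5
```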


  1. If online, I find this quite helpful too:

    and, to get back to phonetics:

    For me, the IPA picker has an advantage over "Insert Character" and suchlike in that the characters are arranged in a very similar fashion to the IPA chart and therefore are easier to find. I don't have to guess which Unicode block I should be looking in, for example.

  2. Thanks, Paul. Ishida's utilities are good, aren't they? I wasn't aware of them before.

  3. This IPA keyboard is what I use; I think it's better than Ishida's for IPA. You can find it easily by googling for [IPA keyboard].

    There are plenty of other Unicode-capable editors; you certainly don't have to fire up heavyweight Word just to write a blog comment. In particular, if you know the Unicode code point for a character, you can enter it directly into the comment window in the form &#xnnnn;. For example, &#x015B; gives you ś and &#x0137; gives you ķ.

  4. Perhaps someone could help me?
    I'm having trouble inserting a theta with an underscore bridge into Microsoft Word 2003.

    Inserting theta from "Insert > Symbol" and then pressing ALT + 810 for the bridge doesn't work properly (the bridge is offset slightly to the right).

    Copying and pasting from Wikipedia and the IPA keyboard both give the same results.

    This is a problem I'm only having with the Greek letters (beta, theta, gamma) when I try to combine them with diacritics. When I use diacritics with any other symbols, they show up perfectly.

  5. Tom -- this seems to be a font issue. I've just tested, and out of the box Word 2008 does the same if you follow your procedure. The thing is, the theta is inserted from whatever default font you use (Calibri for me), and the bridge from the default font that has the symbol (Cambria Math here). And not all fonts have "smart diacritic placement", which is needed for the diacritic to align properly (or at least they don't have it for all characters). Try selecting the whole thing (theta AND bridge) and changing the font to one that does smart placement throughout, e.g. Doulos SIL.

  6. I don't know why Word doesn't set it up as standard, but there is indeed an easy way to make CTRL+' work for all consonants and vowels. It takes about ten minutes to set up. Go to the usual 'insert character' interface, click on an accented character that you want to assign a shortcut to, select Shortcut Key, and press, for instance, CTRL+' and then c for ć. Repeat this for all the characters that you may need.

    Before I switched to Linux (which itself has a nice 'compose key' feature that allows me to insert characters as odd as ẘ with a few keystrokes whilst typing in a browser or an office document), I even used this feature to type IPA characters without having to ploddingly insert as I go along. I'd simply assign the shortcut as one that was unused for other diacritics, say CTRL+\ or |, and CTRL+\, a would produce ɑ, o = ɔ, etc. It made things much quicker - and can be set up for typing wherever you like by building a custom IPA keyboard in MS's custom keyboard layout creator. It saves quite an amount of time, so I'd recommend it.

  7. Jaroslaw: Thanks for the tip, it worked perfectly.

  8. I use (and love) the US Extended keyboard which came standard on my Mac - it enables me to type any accented Roman character.

  9. In fairness to Microsoft and the British press (never thought I'd hear myself say that), the problem with obscure characters is often not how to enter them but whether they even exist in the font that you, or rather your publisher, has chosen to use. As for myself, I'm fussy enough to want the proper Polish characters and not just a z with an acute accent overstrike or an l with a slash through it, but sometimes that's just not technically possible. Your quarrel is really with people who design or choose inadequate fonts, whether because they don't know any better or just don't think it matters.

  10. And I'll chuck in my two pennorth among all the others: a suitably tiny piece of open-source freeware called AllChars. It works anywhere in Windows and involves minimal keystrokes and no memorising of arbitrary key sequences. You can get it from SourceForge.

  11. HELP!

    Can any Mac user help me? The little tale in John's link is full of wrong characters when I read it. I've tried changing Text Encoding and the Default Font in my browser (Safari).

    When I Copy and then Paste into Word, the correct characters are displayed. That's all very well for a short short story, but I can't rely on this trick to consult the site on American "dialect" areas that John referred us to on 7 June.

    Any suggestions?

  12. I have been working on my own invented language where these diacritics are needed, and it took me a long time linguistically to see why I need them.

    You are right, it is very hard to get diacritics beyond those of German, French, and Scandinavian languages, and it is unfair.

  13. Mac users have an alternative to keyboard shortcuts -- one that I find much more congenial. The Character Viewer will insert any character from any font on your computer, working in any program on your computer.

    My favourite trick is to locate characters such as COMBINING GRAVE ACCENT (Unicode 0300), COMBINING ACUTE ACCENT (Unicode 0301) etc and store them among the Favourites.

    Less frequently used combining symbols can be accessed from Diacritics under the European Scripts heading using the By Character option. (The View should be set to All Characters.)

    Alternatively you can switch the View option to Code Tables, press the Unicode button and select from 00000300 Combining Diacritical Marks.

  14. The slightly irritating thing I've noticed about the Word for Windows deadkeys is that they don't seem to like ŵ, ŷ, and ỳ. This makes typing in Welsh on a Windows box (fortunately, I'm usually on Linux) a bit of a challenge: even with spell-checking software, there are still minimal pairs to watch for (e.g. "tŵr" ("tower") vs "twr" ("crowd", "heap")).

  15. I see now has a 'full' IPA keyboard on their site (they used to have one only containing symbols used in transcription of English). The new page is here.
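The tip in comment 3 relies on HTML numeric character references: `&#x` plus hexadecimal digits plus a semicolon (or `&#` plus decimal digits) stands for the character with that code point. Python's standard `html` module offers a quick way to check what a given reference will turn into. The `to_ncr` helper below is a hypothetical illustration for going the other way; whether a particular comment form accepts such references is, as comment 3 says, down to the blog software.

```python
import html

def to_ncr(text: str) -> str:
    """Replace every non-ASCII character with a hexadecimal numeric
    character reference, leaving plain ASCII untouched."""
    return "".join(ch if ord(ch) < 128 else f"&#x{ord(ch):04X};" for ch in text)

print(html.unescape("&#x015B; &#x0137;"))  # ś ķ
print(html.unescape("&#347;"))             # decimal form of the same ś
print(to_ncr("Žižek"))                     # &#x017D;i&#x017E;ek
```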
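Comments 4, 5 and 14 turn on the difference between precomposed characters and base-plus-combining-mark sequences. Welsh ŵ exists as a single code point (U+0175), so w followed by COMBINING CIRCUMFLEX ACCENT normalizes to it under NFC. Unicode has no precomposed theta with a bridge below, so θ followed by COMBINING BRIDGE BELOW (U+032A, which is 810 in decimal, matching the Alt+810 in comment 4) stays as two code points, and positioning the bridge is left to the font. A small check in Python, using a hypothetical `describe()` helper:

```python
import unicodedata

def describe(s: str) -> list[str]:
    """List each code point in s with its official Unicode name."""
    return [f"U+{ord(ch):04X} {unicodedata.name(ch)}" for ch in s]

welsh = unicodedata.normalize("NFC", "w\u0302")        # w + COMBINING CIRCUMFLEX ACCENT
dental = unicodedata.normalize("NFC", "\u03b8\u032a")  # θ + COMBINING BRIDGE BELOW

print(describe(welsh))   # ['U+0175 LATIN SMALL LETTER W WITH CIRCUMFLEX']
print(describe(dental))  # ['U+03B8 GREEK SMALL LETTER THETA', 'U+032A COMBINING BRIDGE BELOW']
```

This fits the diagnosis in comment 5: re-entering θ̪ by another route yields the same two code points, so the remedy lies in choosing a font that positions the mark well.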

