Thursday 1 December 2011

fun with symbols

Yesterday’s posting called for the small-cap-A symbol. I coded it straightforwardly in HTML as <small>A</small>. But blogspot accepts far fewer HTML tags in comments than it does in postings, so Paul, commenting, successfully entered it as a distinct Unicode entity, U+1D00.
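The post does not say exactly how Paul entered the character. Where a comment form strips most tags, one common route is an HTML numeric character reference, which names the code point directly instead of relying on markup. Whether Blogger's comment form decodes such references is an assumption here; the post does not say. Below is a minimal Python sketch using the standard `html` module. The helper name `to_numeric_ref` is made up for illustration.

```python
import html

def to_numeric_ref(ch: str) -> str:
    """Return the hexadecimal HTML numeric character reference for one character."""
    return f"&#x{ord(ch):X};"

small_cap_a = "\u1D00"  # LATIN LETTER SMALL CAPITAL A, a character in its own right
ref = to_numeric_ref(small_cap_a)
print(ref)                                # &#x1D00;
print(html.unescape(ref) == small_cap_a)  # True: the reference decodes back to U+1D00

# By contrast, <small>A</small> is still the ordinary letter U+0041;
# only the styling makes it look like a small capital.
print(f"U+{ord('A'):04X}")                # U+0041
```

The contrast is the one the post turns on. Markup changes how an ordinary A is displayed, whereas U+1D00 carries the small-capital form in the text itself, so it survives wherever tags are stripped.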

Many, though by no means all, alphabetic small capitals are available in the Unicode range 1D00 to 1D7F. This block is known as Phonetic Extensions, and carries the introductory note
These are non-IPA phonetic extensions, mostly for the Uralic Phonetic Alphabet (UPA).
The small capitals, superscript, and subscript forms are for phonetic representations where style variations are semantically important.
For general text, use regular Latin, Greek or Cyrillic letters with markup instead.

As well as small caps (ᴀ ᴁ ᴄ), superscripts (ᴬ ᴭ ᵃ) and a few subscripts (ᵢ ᵣ ᵤ), the block contains various other typographically interesting characters. (I have no idea what they are used for in the Uralic Phonetic Alphabet — though see here.)
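Python's `unicodedata` module can show how the block divides up. It walks the range 1D00 to 1D7F and sorts the characters by their formal names. This grouping is a rough heuristic chosen for illustration, not a Unicode property:

- superscripts are taken to be characters whose names begin MODIFIER LETTER;
- subscripts are those whose names contain SUBSCRIPT;
- small capitals are those whose names contain SMALL CAPITAL.

The exact counts depend on the Unicode version bundled with your Python.

```python
import unicodedata

def classify_phonetic_extensions(start: int = 0x1D00, end: int = 0x1D7F) -> dict[str, list[str]]:
    """Group characters in a code-point range by a crude, name-based category."""
    groups: dict[str, list[str]] = {
        "small capital": [], "superscript": [], "subscript": [], "other": [],
    }
    for cp in range(start, end + 1):
        ch = chr(cp)
        name = unicodedata.name(ch, "")
        if not name:
            continue  # unassigned in this Python's Unicode database
        if name.startswith("MODIFIER LETTER"):
            groups["superscript"].append(ch)   # e.g. U+1D2C MODIFIER LETTER CAPITAL A
        elif "SMALL CAPITAL" in name:
            groups["small capital"].append(ch)  # e.g. U+1D00 LATIN LETTER SMALL CAPITAL A
        elif "SUBSCRIPT" in name:
            groups["subscript"].append(ch)      # e.g. U+1D62 LATIN SUBSCRIPT SMALL LETTER I
        else:
            groups["other"].append(ch)
    return groups

groups = classify_phonetic_extensions()
for label, chars in groups.items():
    print(f"{label:14} {len(chars):3}  {''.join(chars[:8])}")
```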

Here among the small caps you will find a ‘reversed N’, ᴎ, a sideways Ø (ᴓ) and a sideways ü (ᴞ). There is a ‘Latin letter voiced laryngeal spirant’ (ᴤ) and a ‘Latin letter ain’ (ᴥ).

Not everything here is from the UPA. There is also a special ligature , which I can see appealing to English lexicographers who prefer respelling to proper phonetic symbols, as will ‘Latin small letter th with strikethrough’, ᵺ. There is also something called ‘insular g’, ᵹ, labelled ‘older Irish phonetic notation’.

Although they are not official IPA symbols, users of IPA will be happy to find here the lax high vowel symbols ‘with stroke’, ᵻ ᵼ ᵾ ᵿ: two of these are used in the Oxford Dictionary of Pronunciation, though the first, ᵻ, bears the Unicode warning ‘used with different meanings by Americanists and Oxford dictionaries’.
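Most of the descriptions quoted above are the characters' formal Unicode names. That means Python's `unicodedata.lookup` can turn a name back into a character and its code point. In the sketch below, `code_points` is a hypothetical helper. The reversed N and the stroked small capital I are given under their full formal names rather than the post's shorthand.

```python
import unicodedata

def code_points(names: list[str]) -> dict[str, str]:
    """Map each formal Unicode character name to its U+XXXX code point."""
    return {name: f"U+{ord(unicodedata.lookup(name)):04X}" for name in names}

NAMES = [
    "LATIN LETTER SMALL CAPITAL REVERSED N",
    "LATIN LETTER VOICED LARYNGEAL SPIRANT",
    "LATIN LETTER AIN",
    "LATIN SMALL LETTER TH WITH STRIKETHROUGH",
    "LATIN SMALL LETTER INSULAR G",
    "LATIN SMALL CAPITAL LETTER I WITH STROKE",
]

table = code_points(NAMES)
for name, cp in table.items():
    print(f"{cp}  {unicodedata.lookup(name)}  {name}")
```

A `KeyError` from `lookup` would mean the name is misspelt or missing from the Unicode version bundled with your Python.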

Together with the last few code points of the Phonetic Extensions block, a further Unicode block, Phonetic Extensions Supplement (1D80 to 1DBF), covers various former IPA symbols whose recognition was withdrawn at the Kiel Convention in 1989. These include symbols for consonants with velarization ᵬ ᵭ ᵮ ᵯ ᵰ ᵱ ᵲ ᵳ ᵴ ᵵ ᵶ (U+1D6C to U+1D76, still in the main block) and with palatalization ᶀ ᶁ ᶂ ᶃ ᶄ ᶅ ᶆ ᶇ ᶈ ᶉ ᶊ ᶋ ᶌ ᶍ ᶎ. There are also symbols for both vowels and consonants with retroflexion ᶏ ᶐ ᶑ ᶒ ᶓ ᶔ ᶕ ᶖ ᶗ ᶘ ᶙ ᶚ. So we can now find in Unicode everything we might need in order to digitize the 1949 IPA Principles, Jones’s The Phoneme, and various English-language accounts of Russian phonetics.
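Here is a quick check of where the three series listed above actually sit, using the block ranges given in the paragraph. Mapping each printed series to a contiguous code-point range is my own reading of the lists, and `block_of` is a hypothetical helper. The check shows that the palatal-hook and retroflex-hook letters are in the Supplement. The middle-tilde (velarized) letters, U+1D6C to U+1D76, come at the end of the original Phonetic Extensions block.

```python
import unicodedata

# Block ranges as stated in the post (half-open Python ranges).
BLOCKS = {
    "Phonetic Extensions": range(0x1D00, 0x1D80),
    "Phonetic Extensions Supplement": range(0x1D80, 0x1DC0),
}

# The three series from the paragraph, read as contiguous code-point ranges
# (an assumption about how the printed lists map onto the code charts).
SERIES = {
    "velarization": [chr(cp) for cp in range(0x1D6C, 0x1D77)],
    "palatalization": [chr(cp) for cp in range(0x1D80, 0x1D8F)],
    "retroflexion": [chr(cp) for cp in range(0x1D8F, 0x1D9B)],
}

def block_of(ch: str) -> str:
    """Return which of the two blocks a character falls in, or 'other'."""
    cp = ord(ch)
    for block, cps in BLOCKS.items():
        if cp in cps:
            return block
    return "other"

for label, chars in SERIES.items():
    blocks = sorted({block_of(c) for c in chars})
    print(f"{label}: {''.join(chars)}  in {blocks}")
    print(f"  first character: {unicodedata.name(chars[0])}")
```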


  1. The Wikipedia article on the UPA says small capitals represent unvoiced or partially voiced versions of voiced sounds, superscripted characters stand for very short sounds, and subscripted characters indicate coarticulation due to surrounding sounds.

  2. You're welcome for all that, John. For more information on many of the characters in those blocks, see the linked pages; I am sure there are others at …

  3. 'Insular g' was simply the letter between F and H until the Norman Conquest — after which it was steadily displaced by continental g.

    Strikingly, it lives on in a couple of Scottish surnames: Menzies (ˈmɪŋɪz) and Dalziel (dɪˈjɛl).

    Letter z is, of course, what insular g looked like.

    The latter name also has a spelling Dalyell which reflects the sound value of g before a vowel letter at the time when its use was insular.

  4. David, I thought that was "yogh", U+021C Ȝ (upper-case) and U+021D ȝ (lower-case).

  5. The distinctions between insular g, yogh, and plain Latin g are a little tricky, because you are only supposed to use the first two in Unicode when you need them for graphemic contrasts. Thus, if you want to represent Irish (of any period) or Old English in insular characters, you should use the plain Latin g with an insular font, just as you would use plain Latin d. In Middle English, the difference between inherited and French g is graphically and phonemically significant, at least in some written styles. It can conveniently be represented in Unicode with the yogh character and the plain g character respectively. So the only purpose left for the insular g character is as a phonetic symbol, and that is why it was introduced into Unicode.

    Here are some cool pictures of Irish-language typewriters. The Royal one is especially interesting because it has doubled keys for d, t, and g so that it can be used to type both Irish and English. Note also that both the Tironian et and ampersand are present.

  6. P.S. It is indeed the yogh, rather than its ancestor the insular g, that was replaced by z in certain Scottish names.

  7. There was no need to invent the name yogh when there was only one g-symbol in Ireland and the various parts of Great Britain.

    'Insular g' is the name palaeographers use when comparing Latin texts written here and written in Continental Europe. Look for the word angelus in this snippet from the Book of Kells.

    A good source of info on medieval scripts is this manual by Juan-José Marcos. Scroll down to Insular scripts.

    Manuscripts of the early Middle English period had continental g, insular g and yogh — sometimes interchangeable, sometimes with distinct sound values. This made sense to immigrant French scribes for whom the g-sounds of English and French were different. I seem to remember being told of a manuscript with all three symbols.

    Insular g was the first to go — leaving two symbols, hence two names.

    For the use of insular g in English, see this image of the start of Beowulf.

    Wikipedia refers to this article by Michael.

    1. Non-working link. I get an error message: "The resource could not be found. File not found: address."

  8. John Cowan

    I was composing my post while you posted. Michael's article in the link I gave confirms that yogh derived from insular g. I'm not sure why you call it the ancestor. For a short time, the one was simply an alternative way of writing the other.

  9. For any Mac users who haven't upgraded to the Lion OS, it's worth knowing that the new Character Viewer makes it much easier to scroll through Unicode characters and insert them. And they're broken into distinct blocks with headings.

  10. I think we must be at cross purposes. Insular g is the ancestor of yogh in the same sense that Greek beta is the ancestor of Latin b; that is, yogh was originally just a font variant (er, handwriting variant) of insular g, but later developed an independent context of use, and still later could be used in the same text with a different significance.

  11. John Cowan

    Parent might just about do as a metaphor. Ancestor implies a time depth and separation which simply didn't happen.

    Insular g was never called that until palaeographers felt the need many centuries later. Yogh was the same letter in the way that Mumbai is the same city as Bombay.

    It's as if I'd written that Byzantium lives on as Istanbul. The fact that there was an intervening name doesn't deny the equation.

  12. 'Istanbul' is a Turkicisation of modern Greek 'eis sten polin', they say, which means 'in(to) the city'.

  13. @Wojciech:
    I find it more plausible that İstanbul comes from Greek εἰς τὰν πόλιν eis tàn pólin /istamˈbolin/ ʻinto the city’, not from εἰς τὴν πόλιν eis tḕn pólin /istimˈbolin/ ʻin the city’.

  14. Ad homoid

    Sorry, I am no expert on/in later Greek. I don't know the form 'τὰν' in modern Greek. Where does it come from (not from Aeolic, I take it), and what does it mean? I thought the i-a change was somehow due to Turkish phonology.

  15. homoid, Wojciech

    It's hard to see how any Greek speaker in the time that Turks were around would write ταν. That doesn't mean that it wasn't pronounced somewhere with some sort of a-vowel. Especially as there's a suggestion that the Greek phrase was first turned into a word in Armenian.

  16. @ [ˈvɔ̝ˑi̯t͡sje̞x] :
    I am not an expert either but I thought it was due to long ᾱ ā changing into η ē in many dialects, e.g. ʻmother’ was μάτηρ mátēr or μήτηρ mḗtēr, depending on where the speaker-writer was from. (As time went by, only the forms with η ē remained, I think.)
    Interestingly I found that the German Wikipedia gives εἰς τὰν πόλιν as a plausible etymology too.
    PS: Sorry for my comparatively late response, but I kept getting strange error messages while trying to post this. (ʻThere is currently an attack which will attempt to mislead you into providing your username & password to a third party.’ and ʻYour OpenID credentials could not be verified.’)

  17. Ad homoid,

    yes, that was the point, already in the Hellenistic epoch there were no τὰν dialects around, only koine with its τὴν... . But as David Crosby has suggested, maybe there was some influence of a third language.

  19. Wojciech

    already in the Hellenistic epoch there were no τὰν dialects around, only koine with its τὴν

    It's interesting what has happened to the term dialect. Modern scholarship has taken the ancient Greek word and concept, then little-by-little has changed it into something that is in some respects radically different.

    It's inconceivable that Hellenistic and Byzantine Greek did not have dialects in the modern sense. But for the speakers of the time there were no longer any dialects, only 'common' (in common, shared) κοινή.

    I once tried to argue with a Scottish friend that Scots could be seen as a 'dialect' in the original sense of regional spoken and written varieties such as Doric, Ionian etc. She was not impressed.

    I've since concluded that the best way to characterise 'The Greek Dialects' is that they were literary varieties based on idealised standardisations of regional speech.

    A writer would change dialect according to the literary genre, something almost without parallel in the modern world. Yes, there are Scottish writers who write in both Scots and English, but no non-Scottish writers compose, say, narrative ballads in Scots. The only remotely similar practice I can think of is the way composers of popular song used to write comic songs in the supposed idiom of a derided minority such as Blacks or Irish.

  20. Ad David Crosby

    thank you, but I knew all of that. Tsakonian is, by the way, reputed to be a descendant of a Doric dialect. Anyway, Greek literary 'dialects' were such in a sense radically different from ours. One William Barnes once attempted to write in a Dorset dialect, and wrote poems with beginnings like 'the girt wold heuouse o' muossy stuone' but that was not quite like what you mean.

    In any event, I'd be surprised if any local variety of Byzantine Greek still had 'τὰν' rather than τὴν.

  21. Ad David Crosby

    Popular music! British bands singing in an imitation of American English or vice versa! Ain't that a remote analogy to the old Greek practice of using different 'dialects' for different literary 'genres'?

  22. Well, in my idiolect my parents are my nearest ancestors.

    The 'into the city' etymology is not universally agreed on. Istanbul may simply be an allegro form of Constantinopolis, in which case the initial vowel is epenthetic (Turkish, like Spanish, does not allow initial s+consonant, but uses i as the vowel).

    I love the Scots language, of which I have a reading knowledge, and would like to write it as well. But unlike French or German, you can't learn Scots without going to Scotland, which is not practical for me.

    Tsakonian is indeed a descendant of Doric, but its Doricisms and other survivals (like the participle, which in all other Greek languages has been replaced by the infinitive) are perhaps not as interesting as its innovations, particularly the phonological ones.

  23. I do not know if there is much (a lot of?) evidence for the locution 'eis (s)ten polin' being very frequent in the common language of Constantinople when it was conquered. If from 'Constantinople', the form would have to be allegrissimo indeed.

    There is also a thing called Doric in a Scottish context, but I forget what it is. One reason why you can't learn Scots other than in Scotland is simply the absence of corresponding learning books, like 'Scots for Beginners, with Exercises and a Grammar Outline' or such... .

  24. It would be interesting to see John's thoughts on Canepari's symbols; there are plenty of "fun" ones.

  25. Doric means north-east Scots, or sometimes Scots in general, and has to do with an analogy between the use of Scots or Scots dialects (or literary imitations of them) by Scottish writers on the one hand, and the use of Doric dialect (or literary imitations of it) by Attic-Ionic-speaking Ancient Greek writers on the other hand. Both were seen as "broad" (whatever that means exactly) and rustic.

    Scots for Beginners, with Exercises and a Grammar Outline

    Exactly. Of course a modern version would need to provide pedagogical (as opposed to research or generic) recordings in order to master the phonology.

  26. John Cowan

    Of course a modern version would need to provide pedagogical (as opposed to research or generic) recordings in order to master the phonology.

    Look out for this book and this accompanying CD.

    Ignore the ridiculous price quoted for a second-hand copy of the previous edition.

    I've seen the previous edition, and it looks sound. It won't equip you to write literary Doric, though.

    And do you know this steid?

  27. In case anybody's still interested in insular g and jogh, I've made a scan of part of the thirteenth century The Owl and the Nightingale available by clicking here.

    In this manuscript, after some initial confusion following 1066, the symbol has settled down as a letter ȝ distinct from the letter g. It's a survival, like the ƿ (wynne) and Þ (thorn) also used in the manuscript. Like them, it's used only in English words. Unlike them, it retains only some of its Old English uses.

    The ȝ shape is a little different from its Old English appearance — but not nearly as different as the shapes of f, r, and s. So, it's the same (a continuation) and yet different (with restricted sound values).

    The name jogh is clearly appropriate for Middle English. David Crystal (at least in Evolving English) simplifies the problems of nomenclature by using jogh to refer also to the insular g of Old English.

    As I posted before, the letter as used in Old English (for all sound values of g) is nicely visible in this image of the start of Beowulf.

  28. Look out for this book and this accompanying CD.

    Does it teach Scots as really spoken anywhere?

  29. Wojciech

    I believe so. I'm thinking of buying it when it comes out. If I do, I'll let you know.

  30. Here's another look at the manuscript of The Owl and the Nightingale with David Crystal's transliteration and translation. It exhibits all three 'survival' letters jogh, wynne and thorn.

  31. Ad David Crosbie

    thank you, please do let me know when it's out and you think it's worth while. I dreamt (in times auld lang syne) of learning Scots but no better means occurred then to me of so doing than memorising 'Tam O'Shanter', which is an entertaining poem after all and contains many Scandinavisms such as 'it gars me greet' (it makes me cry) which I'd be surprised if many people these days were still using.

  32. Ad David Crosbie

    Re Owl and...

    'unwight' is 'grotesque thing', am I getting this right?

    That would be like 'Unding' in German, or for that matter 'Untier', and other things 'un-' (approximately: monstrously disfigured), but 'Unwicht' existeth not in contemporary German, though 'Wicht' exists.

  33. My reference to Greek infinitives above should have been to complement clauses. Greek has lost the infinitive almost as thoroughly as the participle. It is preserved only in certain nouns historically derived from old infinitives, somewhat like the preservation of older English irregular past participles as frozen adjectives, like drunken rather than drunk. (Loss of infinitives is one of the earmarks of the Balkan Sprachbund.)

    David Crosbie: Thank you for the book information. Perhaps next year .... I do know about it and have been there often.

    To summarize on various g's: In Old English there is only one g, which appeared in an insular form as a matter of course, and is correctly represented by a Carolingian g in modern transcriptions, including Unicode ones. In Modern English, there is of course only the Carolingian g. In Middle English there are two letters: the yogh, represented in modern transcriptions by ȝ, and the Carolingian g, again represented by a g. The Unicode insular g character is used only for Irish-style phonetic transcription, where it represents IPA /ɣ/.

  34. Ad John Cowan

    Don't worry, I understood it that way (infinitives vs. complement clauses in Modern Greek) right away. A slip of the pen, presumably. Talking about Sprachbunds (-buende): d'you know any other but the Balkan? Just asking... . One frequently comes across 'balkanischer Sprachbund', but next to never across 'something-else-er Sprachbund'. Maybe in really insiderly literature which I am not aware of? Maybe Western-European Sprachbund, like what the Sapir-Whorf-boys call 'Standard-Average-European', meaning basically Germanic and Romance, minus Icelandic, minus Romanian?

  35. NED also used 'insular g' to distinguish different pronunciations of the letter G in Old English:

    'In OE. the letter stood for four different sounds, viz. the voiced guttural and palatal stop (in this Dictionary represented by g, g), and the voiced guttural and palatal spirant (here printed ᵹ, ).'

  36. John Cowan

    Yes, that's an elegant summary of what happened to letter g — more elegant than what actually happened in the hands of some scribes who were too muddled or forgetful to be consistent.

    One muddle that resolved itself a little against the grain is the use of jogh for velar fricatives. The practice of inserting a letter h eventually became standard, leading to modern gh spelling. Hence niȝtingale → nightingale.

    I was struck by your term 'Carolingian g'. It seemed like a really good idea until I looked up the manual by Juan-José Marcos. His font based on Carolingian models uses a shape very like Middle English jogh. The modern shape appears in what he calls Protogothic.

  37. Wojciech:

    The Wikipedia article is quite good, as with most articles on linguistics. It's important to realize that Sprachbund is a negative diagnosis: we call a group of similar languages spoken close to one another a Sprachbund if we cannot find sufficiently strong evidence of genetic relatedness. Other well-known cases mentioned in the article are the Indian subcontinent (where retroflex consonants diffused from the Dravidian languages into the Indo-European ones), the contour tones of south-east Asia, the diffusion of clicks from Khoisan into neighboring Bantu languages, and possibly the Altaic language group — there seem to be genetic links between Turkic and Mongolic, and between Mongolic and Tungusic, but few or none between Turkic and Tungusic, making some historical linguists reject the genetic hypothesis.

    It's important to distinguish between Whorf's use of "Standard Average European", a vague term with mostly semantic import (though supposedly mediated by syntax), and Haspelmath et al.'s use of "(Standard Average) European Sprachbund". The latter refers to the Germanic, Romance, Balto-Slavic, Albanian, Greek, and Hungarian languages, and is characterized primarily by morphosyntactic features. Note that it includes the Balkan Sprachbund as a subset. The core languages are considered to be French, Occitan, German, Dutch, and Northern Italian varieties. Firmly excluded are the Celtic, Armenian, Indo-Iranian, Basque, Turkic, and the other Uralic languages (with the exception of some varieties of European Turkish, which are inside the Balkan Sprachbund). Again, the Wikipedia article gives the details.

    Note that both the Balkan and the European Sprachbunds formed in historic times: ancestral varieties such as Latin and Classical Greek do not belong to them.


    Carolingian g is the standard term; I can take no credit for it. I agree that g in actual Carolingian minuscule looks more yogh-like. Current g is the result of an artificial revival (in the Renaissance) of Carolingian script, whose direct modern descendant (much altered) is Fraktur. People who deal with typefaces must often be terminological buccaneers.

  38. Ad John Cowan

    Thank you. None of these Bunds is so salient, so striking, as the Balkanian---this is my impression, at least.

  39. Basically, this amounts to little else but this: neighbo(u)ring languages tend to become similar in various respects, though they be not affine to one another genetically or only remotely so. This is of course very true: Breton is, for instance, in some respects more like French than like Cornish or Welsh, Retoroman is more like Swiss German than like Italian (again, in some respects), Lithuanian is more like Polish or White-Russian than like Latvian. In this sense, the world is full of Sprach-bunds, most often overlapping one another multiply, and the concept sort of loses its bite. Unless, I am saying, it is reserved for really salient Bunds like the Balkanian, but then there are not so many Bunds at all, I'd suggest the Koreo-Japonic (the languages though not related have a lot of structural similarities, such as the obligatory topic marking and such).
