Wednesday, 4 April 2012


No sooner had I got back from Kyiv than I was off to Leeds for the biennial Colloquium of the British Association of Academic Phoneticians (BAAP).

The presentations there were of very high quality. Everyone started on time and finished on time (thanks to tight chairing), everyone was audible, all the PowerPoint slides were readable, and as far as I could tell no one just read out a prepared text. The posters were good, too. And that’s more than you can say of some academic conferences I have attended.

At each BAAP colloquium members vote to award three prizes. One, the Peter Ladefoged prize, goes to the paper or poster that best reflects Peter’s approach to phonetics. This time it went to Adrian Simpson (pictured) for a paper on percussives. (“The percussive manner [of articulation] involves the striking together of two rigid or semi-rigid articulators.” — IPA Handbook, p. 187) As well as the [ʬ ʭ ¡] of ExtIPA, he discussed, with waveform support, the labial percussive that arises from the approach phase of the [p] in the [ʔp͡t] of words such as stamped, along with the transient ‘epiphenomenal clicks’ that arise as a consequence of other articulations. Adrian is the kind of phonetician that I approve of: he personally demonstrated to us each of the sound-types he referred to. (I don’t care for phoneticians who can’t or won’t perform in public. If we want our students to make this or that sound, we’ve got to be able to do it ourselves.)

The two Eugénie Henderson prizes go to the best oral presentation and the best poster by newcomers. The first was awarded to Michael Ramsammy for an electropalatographic investigation of the weakening of Spanish l in preconsonantal environments. The second was won by Nicholas Flynn for a poster comparing twenty vowel formant normalization methods. He concluded that “vowel-extrinsic, formant-intrinsic” methods performed the best at normalizing vowel formant data for sociophonetic study. (No, I don’t understand this, either.)

Here are some new technical terms I noted among the ninety or so papers and posters at BAAP:
attriter, a bilingual whose L1 has become subject to attrition through living in an L2 environment;
enchronic, relating to the time at which an utterance is uttered;
LADO, Language Analysis for the Determination of Origin (of an asylum seeker).

I have no idea what is meant by ‘7th- and 8th-order sliding-Gaussian-window lpc analysis’ (John Esling), nor by a ‘Bootstrap Markov Chain Monte Carlo algorithm’ (John Coleman). But perhaps they had their respective tongues in their cheeks when uttering these words. (If you feel strong, look here, here, and here.)


  1. ‘Bootstrap Markov Chain Monte Carlo algorithm’ is a statistical concept - not a linguistic one.

  2. I'm not sure if we aren't having our legs pulled here ;)

    But, FWIW, formant normalisation does what it says on the tin: it's a procedure to normalise formant measurements*. There are different formulas to achieve this, and vowel/formant/speaker in-/extrinsic refers to whether your formula uses data from one or many vowels/formants/speakers.

    (*) You can't meaningfully compare formant measurements from different speakers, even of the same sex, but especially of different sexes, due to anatomical variability in vocal tract sizes and shapes. Their F1-F2 vowel spaces will vary quite a bit in size and position without any corresponding difference in acoustic percepts. (And thus it's believed that the brain does some vowel normalisation as part of speech perception.) These days, because carrying out lots of computation is cheap and easy, formant normalisation is de rigueur in sociophonetics. One place to look at if you're interested is NORM. I have a strange feeling that studies on formant normalisation have proliferated greatly since NORM became available...

  3. Oh, and LPC, as one of your links shows, is Linear Predictive Coding, which is one of the techniques used for spectral analysis of speech. Due to its nature, it has to be done not at specific points in time, but over "windows", i.e. very short slices of the signal. "Gaussian" refers to how the slice is "cut out" from the signal -- the curve that represents this has a Gaussian, i.e. bell-curve, shape. "Sliding" means that the window, well, slides along the signal; there's slight overlap between neighbouring windows.
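    To make those three words concrete, here is a minimal Python sketch (an illustration only, not Esling's actual analysis; the toy signal and all parameters are invented) of sliding-Gaussian-window LPC: a bell-shaped taper is slid along the signal with 50% overlap, and 8th-order LPC coefficients are fitted to each windowed frame by the autocorrelation method (Levinson-Durbin recursion).

```python
import math

def gaussian_window(n, sigma=0.4):
    # bell-shaped taper; sigma is relative to half the window length
    half = (n - 1) / 2
    return [math.exp(-0.5 * ((i - half) / (sigma * half)) ** 2) for i in range(n)]

def lpc(frame, order):
    # autocorrelation method + Levinson-Durbin recursion
    n = len(frame)
    r = [sum(frame[i] * frame[i + k] for i in range(n - k)) for k in range(order + 1)]
    a = [1.0] + [0.0] * order
    err = r[0]
    for m in range(1, order + 1):
        acc = r[m] + sum(a[i] * r[m - i] for i in range(1, m))
        k = -acc / err
        new_a = a[:]
        for i in range(1, m):
            new_a[i] = a[i] + k * a[m - i]
        new_a[m] = k
        a = new_a
        err *= (1 - k * k)
    return a

# a toy "signal": two sinusoids plus a little deterministic pseudo-noise
signal = [math.sin(0.3 * t) + 0.6 * math.sin(1.1 * t + 0.5)
          + 0.01 * ((t * 7919) % 13 - 6) for t in range(400)]

# sliding Gaussian windows with 50% overlap, one LPC fit per frame
win_len, hop = 64, 32
win = gaussian_window(win_len)
frames = [[signal[start + i] * win[i] for i in range(win_len)]
          for start in range(0, len(signal) - win_len + 1, hop)]
coeffs = [lpc(f, order=8) for f in frames]
```

    In real work one would of course use a signal-processing library; the point here is only to show what "window", "Gaussian", and "sliding" refer to.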

  4. The source for Flynn's terminology is probably here:

    Here's the gist:

    "Summarizing, we find that procedures using information across vowels [vowel extrinsic] performed better than procedures using only information within vowels [vowel intrinsic] and procedures using information within formants [formant intrinsic] performed better than those using information across formants [formant extrinsic]"

    1. It's just another manifestation of professional terminology, aka jargon, isn't it? BTW, I think the intrinsic/extrinsic distinction goes much further back than Adank et al., maybe to Nearey's dissertation (I have a non-searchable pdf and don't have the time to check). Adank et al. is just a standard reference these days, as it compares several different methods.

    2. Sorry, I wasn’t clear. I’m never sure what is widespread knowledge and what isn’t, and the explanation might wind up being the length of a blog post. Someone else can no doubt explain this better than I, but here goes.

      The problem that normalization attempts to solve is that researchers using acoustic analysis want to filter out differences that are due to anatomical differences among speakers (i.e., the different lengths of the vocal tract in men, women, and children) but preserve those differences that are due to phonemic variation and sociolinguistic variation.

      In a vowel-intrinsic method, all the information for normalization is found within a single vowel token. That is, normalization occurs by using various combinations of F1, F2, F3, F0, etc. for, say, GOOSE. This would model a human being's ability to categorize a vowel token in isolation. In a vowel-extrinsic method, the formant values of all the vowels are computed in relationship to one another. That is, you would need tokens from more than one lexical set, possibly all of them, before you could begin normalization. This would suggest that a human being needs to hear how a speaker pronounces all their vowels before they could categorize a given vowel token correctly. So the distinction between vowel intrinsic and vowel extrinsic is whether you can normalize each individual vowel token in isolation from the others (vowel intrinsic) or whether you need to normalize all the vowels in relationship to one another (vowel extrinsic).

      The distinction between formant intrinsic and formant extrinsic is somewhat related. In a formant-intrinsic method, all the information you need for normalization is contained in the isolated formant values of a given vowel (that is, you can normalize an individual formant without knowing the values of the other formants). In a formant-extrinsic method, information across formants is incorporated into the analysis (i.e., the normalization depends upon calculating distances between formant values, e.g., F2−F1, or Bark-converted values Z2−Z1, to model advancement).
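      As a concrete (hypothetical) illustration of the category that came out on top, here is a minimal Python sketch of Lobanov-style z-score normalisation, which is vowel-extrinsic (the mean and standard deviation are computed over all of a speaker's vowel tokens) and formant-intrinsic (F1 and F2 are normalised independently of each other); the token values below are invented.

```python
from statistics import mean, stdev

def lobanov(tokens):
    """tokens: list of (F1, F2) pairs in Hz for one speaker.
    Vowel-extrinsic: mean/sd are taken over all the speaker's tokens.
    Formant-intrinsic: F1 and F2 are z-scored independently."""
    f1s = [t[0] for t in tokens]
    f2s = [t[1] for t in tokens]
    m1, s1 = mean(f1s), stdev(f1s)
    m2, s2 = mean(f2s), stdev(f2s)
    return [((f1 - m1) / s1, (f2 - m2) / s2) for f1, f2 in tokens]

# e.g. three made-up tokens (FLEECE-ish, TRAP-ish, GOOSE-ish) from one speaker
speaker = [(300, 2300), (750, 1700), (330, 900)]
normed = lobanov(speaker)
```

      A vowel-intrinsic method, by contrast, would transform each token using only that token's own values, with no speaker-level statistics at all.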

    3. I should think that there is plenty of (anecdotal) evidence that actual human beings use a vowel-extrinsic method. For example, the Aussie in the U.S. restaurant who wants his coffee at the end of the meal, but it keeps coming back with more cream in it! This does not mean that AusE has a FACE-PRICE merger, but that the AusE FACE is misinterpreted as PRICE by the American waiter due to the poverty of the stimulus (in a somewhat different sense than usual). Of course, two different geographical accents make an extreme case, but no two people have exactly the same accent, so milder versions of the problem must come up all the time.

      But perhaps I am missing something.

    4. John Cowan, can you explain your coffee example? Maybe it's because I'm not a coffee drinker, but I can't imagine what the Aussie in the example is saying to be misinterpreted as wanting more cream. "I'd like my coffee after the meal" (or "...with dessert") just doesn't sound like "I'd like more cream" even without any vowels, nor does it have a FACE vowel. Nor does "milk" or "cream" have a PRICE vowel. Thus, I'm puzzled.

    5. @John Cowan:

      I have a similar example: an Australian friend seemed to be talking about a "straight parade", which I could only imagine to be some kind of rebuttal to the gay parades that are quite common in nearby San Francisco. Only later did I realize that he was actually saying "street parade" :)

      (His heavily diphthongal FLEECE vowel was misinterpreted by me as FACE).

  5. @John Cowan,

    For normalization, it is perhaps best to think that we have speakers with the same accent, say, an adult male, an adult female, and a child. If you plotted a vowel token for each of these speakers, they wouldn't appear in the same position, because of the anatomical differences between men and women and between adults and children. However, the vowels would be perceived as being the "same." I think the evidence is mixed about whether vowel-intrinsic or vowel-extrinsic procedures model human perception (probably human beings do both: one would think redundancy would be an advantage here, and, contra Chomsky, redundancy isn't necessarily a bad thing for a biological organism).

  6. "Enchronic" doesn't refer to "the time at which an utterance is uttered" -- it refers to an understanding that talk happens in time, and so when we analyse e.g. turns at talk, we need to pay attention to the dynamics of how the turn unfolds in time, rather than to the result (the final product, the utterance).

    The word comes from Nick Enfield's The Anatomy of Meaning (2009: 10). I used it because it captures an important shift in our analytic focus from product to process, and how a current listener might parse what they hear and make sense of it -- and co-ordinate their talk accordingly.