Thursday, 29 December 2011

nonstandard assimilation

At the Łódź conference John Coleman presented an interesting talk about the spoken component of the British National Corpus. It comprises about ten percent of the entire corpus.

It includes a wide range of authentic spoken material, recorded in 1991-92 by volunteers who wore Walkman devices to capture all their conversational interactions over a 24-hour period. As well as all kinds of structured and unstructured talk directed at other people, from sermons to discussions of boyfriends, the files include dog-directed and parrot-directed speech. Who’s a pretty boy, then?

The material has now been digitized by the British Library from the original analogue recordings.

Although comprising only ten percent of the whole corpus, the audio material of the BNC extends to 9 TB (nine terabytes), about 1800 hours’ worth. So you won’t be downloading it all and storing it on your hard disc any time soon.
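
The two figures just quoted imply a mean bitrate, and a short calculation makes it explicit. What follows is only an illustrative sketch: the function names are invented for the purpose, and it is an assumption whether "9 TB" means decimal (10^12 bytes) or binary (2^40 bytes) terabytes, so both readings are computed. The CD rate (16-bit, 44.1 kHz, stereo) is included purely for comparison.

```python
# Implied mean bitrate of the BNC audio, from the figures quoted in the post.
# Function names are hypothetical; the size unit is an assumption (see below).

HOURS = 1800   # "about 1800 hours' worth"
SIZE_TB = 9    # "9 TB"


def mean_bitrate_kbps(size_bytes: float, hours: float) -> float:
    """Mean bitrate in kilobits per second (1 kbps = 1000 bits/s)."""
    return size_bytes * 8 / (hours * 3600) / 1000


def pcm_bitrate_kbps(bit_depth: int, sample_rate_hz: int, channels: int) -> float:
    """Bitrate of uncompressed PCM audio in kilobits per second."""
    return bit_depth * sample_rate_hz * channels / 1000


# Two readings of "TB": decimal (10**12 bytes) and binary (2**40 bytes).
decimal_kbps = mean_bitrate_kbps(SIZE_TB * 10**12, HOURS)
binary_kbps = mean_bitrate_kbps(SIZE_TB * 2**40, HOURS)

# Audio CD for comparison: 16-bit, 44.1 kHz, two channels.
cd_kbps = pcm_bitrate_kbps(16, 44_100, 2)

print(f"decimal TB: {decimal_kbps:.1f} kbps ({decimal_kbps / cd_kbps:.1f} x CD)")
print(f"binary TB:  {binary_kbps:.1f} kbps ({binary_kbps / cd_kbps:.1f} x CD)")
```

On either reading the result is several times the CD rate, which suggests that the archive masters are stored at a much higher resolution than the speech itself requires.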

Although the whole spoken corpus is unmanageably large, a selection of audio files from the BNC is now available online.

The ten most frequently used words in the spoken corpus, Coleman says, occur more than 58,000 times each. At the other extreme, 23% of the words used (12,400 words) occur only once. Many other words that are surely in people’s vocabulary never occur at all.
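
Counts of this kind come from a frequency table over the transcribed tokens. Here is a minimal sketch of how such a table yields both the most frequent words and the words that occur only once. It is not Coleman's procedure: the tokenizer is a deliberately naive assumption, the function name is hypothetical, and the sample sentence is a toy example rather than BNC data.

```python
import re
from collections import Counter


def frequency_profile(text: str, top_n: int = 10):
    """Return (most frequent words, once-only words, share of once-only word types)."""
    # Naive tokenizer (an assumption): lowercase runs of letters and apostrophes.
    tokens = re.findall(r"[a-z']+", text.lower())
    counts = Counter(tokens)
    # Words occurring exactly once (hapax legomena), sorted alphabetically.
    hapaxes = sorted(word for word, n in counts.items() if n == 1)
    # Share is measured against distinct word types, not running tokens.
    share = len(hapaxes) / len(counts) if counts else 0.0
    return counts.most_common(top_n), hapaxes, share


# Toy input, not corpus data.
sample = "who's a pretty boy then who's a good boy"
top, hapaxes, share = frequency_profile(sample, top_n=3)
print(top)
print(hapaxes, f"{share:.0%} of word types")
```

If the 23% is taken as a share of distinct word types, then 12,400 once-only words would imply roughly 54,000 types in all (12,400 / 0.23); that reading is an assumption.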

Coleman presented some observations about assimilation of place of articulation. As well as the familiar dealveolar type (ˈtem ˈmɪnɪts, ˈɡʊɡ ˈɡɜːl), he found various instances of “nonstandard place assimilation of word-final /m/ and /ŋ/”. Delabial examples included siːn in seem to and seɪŋ in same kind of. As well as plenty of cases of aɪŋ(ɡ)ənə etc for I’m going to, he reports “18 tokens per 10 million” of əˈlɑːŋ klɒk for alarm clock. The most frequent item classified as develar was swimming pool pronounced as ˈswɪmɪm puːl — but there of course the underlying form of the -ing ending would be ɪn rather than ɪŋ for some speakers in some styles of speech (as the sociolinguists have documented), so that the assimilation could be dealveolar after all, not develar. The same applies to ˈwedɪm in wedding present.

We await further reports with interest.

14 comments:

  1. I've finally discovered my copy of Gillian Brown's Listening to Spoken English. She gives a number of similar examples from data collected before 1976. For example:

    /əˈmaʊntbaɪ/ [əˈmaʊmʔpbaɪ] amount by
    /ˈbændfəˈlaɪf/ [ˈbæmbfəˈlaɪf] banned for life

  2. But those are "standard" (dealveolar) cases, David.

  3. John

    Ah, I see.

    But one of her examples is [aɪŋˈgɜŋ] I'm going.

  4. I see my link didn't work, so I'll try again. There's a description at this Amazon page:
    Listening to Spoken English.

  5. Among these kinds of known diachronic supplements for linguistic treatments, I do not know what else she has as far as meaningful linguistic analyses are concerned. But we know that English vowels, unlike those of other languages, are arguably not marked for stress but for syllables. So an unstressed syllable in initial position is also technically part of the syllable unless the unstressed one is syllabic. So why do we mark a primary stress after the unstressed vowel if it actually belongs to a syllable, even if the schwa is more or less variable in quality?

  6. If 1800 hours of audio occupies 9 TB (taking 1 TB as 2^40 bytes), its mean bitrate is 12216.8 kbps, nearly 9 times that used on CDs (1411.2 kbps). This seems needlessly high for speech transferred from recordings in other media.

    Replies
    1. Our copy is way less than 9 TB, but the British Library Sound Archive uses a standard for library/archive sound recordings - 24 bit, 96 kHz - that is indeed a needlessly high bitrate for digitized compact cassette recordings. But standards is standards ...

      For the BNC audio sampler at http://www.phon.ox.ac.uk/SpokenBNC, we use 16 bit, mono files at 16 kHz.

      John Coleman

  7. My post should actually have appeared like this, but that's fine:

    Among these kinds of known diachronic supplements for linguistic treatments, I do not know what else she has as far as meaningful linguistic analyses are concerned. But we know that English vowels, unlike those of other languages, are arguably not marked for stress, but a syllable (or two) is. So an unstressed vowel in initial position is also technically part of its adjacent syllable as a +body or -mora (or whatever pleases) unless the unstressed one is syllabic. So why do we mark a primary stress after the unstressed vowel if it actually belongs to a syllable, even if the schwa is more or less variable in quality?


