1. Introduction


This paper presents a digital list of words and short phrases in the Meegye variety of Mangbetu (ISO 639-3 code [mdj]), a Central Sudanic language spoken by perhaps 620,000 people in the northeastern part of the Democratic Republic of the Congo (Lewis, Simons & Fennig 2013). [1] The list exemplifies the occurrence of labial vibrants (i.e. trills and flaps) in the language.

This presentation form was generated from an archival form of the data (Olson et al. 2013). The procedure we followed for creating both forms is detailed in Simons, Olson & Frank (2007). In producing the archival form, we followed the best practice recommendations discussed in Bird & Simons (2003:574–579) and Simons, Olson & Frank (2007).


In addition to a description of the primary data in the form of phonetic transcription, we provide documentation of the data in the form of digital audio recordings (Himmelmann 1998), enabling the reader to verify and critique our transcription. This is important for these data because Mangbetu has some unusual phonological characteristics.


First, Mangbetu has a three-way contrast between voiceless, voiced, and prenasalized bilabial trills [ʙ̥ ʙ m͡ʙ] (Larochette 1958, Tucker & Bryan 1966, Thomas, Bouquiaux & Cloarec-Heiss 1976, Demolin 1988, 1990, 1991, 1992, 2002, 2013, and McKee 1991, 2007). Similar three-way contrasts have only been attested in the geographically nearby languages Asua, Lombi, Baka, and Lika (Keating 2007).

The three bilabial trills in Mangbetu have been interpreted as simple phonemes (Larochette 1958, Demolin 1992) or as complexes consisting of a stop phoneme and a labial release (McKee 1991, 2007). That there are multiple possible phonemic interpretations in this case is a reflex of the non-uniqueness of phonemic solutions (Chao 1934). Regardless, all researchers agree that the sounds contrast with each other on the surface phonetic level. McKee (2007:183) provides the following minimal triplet:



‘to bring out from within’


‘to fan’


‘to enclose’

In addition, the three bilabial trills [ʙ̥ ʙ m͡ʙ] contrast with their respective bilabial plosives [p b m͡b] (Demolin 1992:168,174, McKee 2007:183,185):





#42 in our wordlist


‘to push out’




‘to feed a fire’







‘have him tour’


‘clay pot’


Second, bilabial trills are often thought to nearly always occur in a very specific environment, i.e. preceded by prenasalization and followed by a close back vocalic segment, and it is thought that they nearly always developed historically in this environment (e.g. Ladefoged & Maddieson 1996:130). While this [mʙu] sequence is quite common cross-linguistically (Keating 2007), both Demolin and McKee provide robust evidence for additional environments for the realization of the sounds in Mangbetu: (a) the presence of both voiceless and plain (non-prenasalized) voiced bilabial trills in Mangbetu (cf. example (1) above) demonstrates that prenasalization does not obligatorily co-occur with a bilabial trill, and (b) the sample data in example (3) below show that the trills occur before vocalic segments other than close back ones in Mangbetu:




‘leaping like a leopard’

#44, #59


‘early morning’

#45, #60


‘to blink’

#50, #65



#52, #67






‘kind of plant’

#57, #72






‘water pulsing from a hose onto the ground’

#58, #73


‘to drink’



In cases where a bilabial trill is followed by a front vowel, there is a noticeable labial glide at the intersection of the two segments (cf. Ladefoged, Cochran & Disner 1977:54). Whether this glide should be interpreted as an integral part of the trill or as a distinct intervening segment has been a matter of some debate (Demolin 1992:174, 2013; McKee 2007:181–182). One factor to note is that there is no contrast between the presence and absence of the glide in this environment.

Note also that the presence of these additional environments is weak evidence against Ladefoged & Maddieson’s claim that bilabial trills nearly always develop historically from an [mbu] sequence. A historical study of the Mangbetu case would be necessary to address that claim.


Third, these three bilabial trills also contrast with a phonemic labiodental flap /ⱱ/ in Mangbetu (Larochette 1958, Tucker & Bryan 1966, Demolin 1988, 1991, 1992, McKee 1991, Demolin & Teston 1996). This sound is sometimes realized as a bilabial flap in Mangbetu (Demolin 1992, Demolin & Teston 1996). Olson & Hajek (1999:110, citing a personal communication from Robert McKee) provide the following minimal pair:



‘to defecate’



‘to get fat’



The presence of both labial trills and flaps in the same language is rare, and has only been attested in languages in geographic proximity to Mangbetu (Olson & Hajek 2003, Keating 2007).

Item 32 [nɛ́ⱱjàⱱjá] ‘black bird’ contributes one additional example of the rare occurrence of a labiodental flap in a consonant cluster (cf. Olson & Hajek 2003:168).

2. Creating the wordlist


This presentation of the data comprises the following materials:

  1. Wordlist: contains for each item a French and an English gloss, orthographic and broad phonetic transcriptions, and relevant notes.

  2. Recordings: MP3 [2] and WAV digital recordings of each item. The MP3 recording is accessible by clicking on the play icon next to the orthographic form of each word in the list below. Alternatively, you can play the WAV recording by clicking on the download icon. Your web browser will attempt to play the recording with the sound program that is set up as the default WAV player on your system. On Windows computers, you can download the WAV file by right-clicking on the download icon (control-click on Macintosh computers) and selecting “Save Target As…” or “Save Link As…” from the pull-down menu.

  3. Metadata: a description of the data, useful for resource discovery.


The original material consisted of an audio recording on two cassettes. The authors collected sample lexical items and short phrases, mostly from Demolin (1992) and McKee (2007) (the latter in draft form at the time). The second author transcribed the items in Meegye orthography (McKee 2002), adjusting them according to his own speech variety. Mangbetu exhibits much dialectal variation, so these adjustments should not be construed as invalidating the data from the other sources.


The wordlist was recorded in March 2004 at the ACATBA center (l’Association Centrafricaine pour la Traduction de la Bible et d’Alphabétisation) in Bangui, Central African Republic. The recording was made with a Marantz PMD 420 monaural cassette recorder and an Audio-Technica ATM 33a condenser microphone. The second author, an adult male mother-tongue speaker of Meegye, produced the French gloss and one token of each item, reading from the orthographic transcription. Some items were then repeated a second time.


The recording was digitized at the International Linguistics Center (ILC) in Dallas, Texas in March 2005 by Roger E. Olson. The audio cassettes were played on a Marantz PMD 221 analog cassette recorder, and the recording was digitized onto a standard Windows XP computer using a Tascam US-122 USB Audio/Midi interface and CoolEdit 2000 for audio capture. The digitization was done at a 48 kHz sampling rate and a 24-bit quantization, which according to IASA-TC03 (2005:8) is the minimum recommended digital resolution for the archiving of analog originals. (Plichta & Kornbluh 2002 recommend 96 kHz and 24 bits.) The recordings were stored in non-compressed WAV format, also the recommended industry standard (IASA-TC03 2005:8).
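To give a sense of what these resolutions imply for storage, the following Python sketch computes the size of one minute of uncompressed PCM audio at the two resolutions mentioned above. It assumes a single (mono) channel, which is not stated for the digitized file, and the function name is illustrative only.

```python
def bytes_per_minute(sample_rate_hz, bit_depth, channels=1):
    """Size in bytes of one minute of uncompressed PCM audio.

    channels=1 is an assumption made for this illustration.
    """
    bytes_per_sample = bit_depth // 8
    return sample_rate_hz * bytes_per_sample * channels * 60


# Resolution used for the archival digitization: 48 kHz, 24-bit.
archival = bytes_per_minute(48_000, 24)        # 8,640,000 bytes

# Resolution recommended by Plichta & Kornbluh (2002): 96 kHz, 24-bit.
higher = bytes_per_minute(96_000, 24)          # 17,280,000 bytes

print(archival, higher)
```

Doubling the sampling rate doubles the storage required, which is one practical consideration when choosing between the two recommendations.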


At present, the hardware of some computers does not handle 48 kHz, 24-bit audio. For the purpose of the presentation form, then, the first author downsampled the audio recording to 44.1 kHz, 16-bit (i.e. standard audio CD quality) using CoolEdit 2000, applying a low-pass pre-filter to the data in order to prevent aliasing (Ladefoged 1996:139–141). This sampling rate is sufficient for most technical purposes since it covers nearly all acoustic information pertinent to language (Ladefoged 2003:18,26). The higher quality original digital recording is available in the archival form.
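The need for the low-pass pre-filter can be illustrated with a little arithmetic. The Python sketch below computes the rational resampling ratio between the two rates and the frequency at which an unfiltered component above the new Nyquist limit would reappear after downsampling. The function name is hypothetical, and the sketch is not a description of CoolEdit 2000's internal processing.

```python
from fractions import Fraction


def alias_frequency(f_hz, sample_rate_hz):
    """Frequency at which a pure tone of f_hz appears when it is
    sampled at sample_rate_hz without an anti-aliasing filter."""
    folded = f_hz % sample_rate_hz
    return min(folded, sample_rate_hz - folded)


# Ratio between the presentation and archival sampling rates.
ratio = Fraction(44_100, 48_000)          # reduces to 147/160

# Highest frequency representable at the new rate.
new_nyquist_hz = 44_100 / 2               # 22050.0

# A 23 kHz component fits in the 48 kHz original (Nyquist 24 kHz)
# but lies above the new Nyquist limit, so it would fold back.
alias_hz = alias_frequency(23_000, 44_100)

print(ratio, new_nyquist_hz, alias_hz)
```

A 48 kHz recording can contain energy up to 24 kHz, so anything between 22.05 kHz and 24 kHz has to be removed before resampling; otherwise it folds back below the new Nyquist limit, as the 23 kHz example shows.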


Working from the orthographic version, the first author produced a broad phonetic transcription of the data that conforms to the extant version of the International Phonetic Alphabet (IPA 2006). This transcription is provided in the Phonetic column of the list. He also added clarifying notes to certain entries, found in the Notes [3] column.


He keyboarded the data into a Microsoft Word document using Unicode 5.1 characters. [4] He converted the data to a comma-delimited (CSV) file and imported them into TableTrans v. 1.2 software (Bird et al. 2002). Judy Kuntz time-aligned the items to the original audio recording at the ILC in February 2008. This annotation was exported as an XML [5] annotation graph and transformed into an XML descriptive wordlist format using an XSLT script.
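As a simplified illustration of moving wordlist data from a CSV table into descriptive XML markup, the following Python sketch uses only the standard library. It does not reproduce the TableTrans or XSLT steps described above. The column names and the element names (`wordlist`, `item`, and so on) are hypothetical, since the actual archival schema is not described here. The single sample row uses item 32 as cited in Section 1; its French gloss and orthographic form are not given in the text and are left empty.

```python
import csv
import io
import xml.etree.ElementTree as ET

# Hypothetical column layout; only the phonetic form and English
# gloss of item 32 are taken from the text, the rest is left blank.
SAMPLE_CSV = (
    "id,french,english,orthographic,phonetic,notes\n"
    "32,,black bird,,nɛ́ⱱjàⱱjá,\n"
)

FIELDS = ("french", "english", "orthographic", "phonetic", "notes")


def csv_to_wordlist_xml(csv_text):
    """Build an XML tree with one <item> element per CSV row."""
    root = ET.Element("wordlist")
    for row in csv.DictReader(io.StringIO(csv_text)):
        item = ET.SubElement(root, "item", id=row["id"])
        for field in FIELDS:
            ET.SubElement(item, field).text = row[field]
    return root


root = csv_to_wordlist_xml(SAMPLE_CSV)
# encoding="unicode" returns a str, so the IPA characters are kept
# as-is rather than being escaped as numeric character references.
print(ET.tostring(root, encoding="unicode"))
```

Keeping the text in Unicode throughout means that characters such as the labiodental flap symbol survive the conversion without any font-specific encoding.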


The original digitized WAV file, the XML descriptive wordlist, and a metadata record constitute the archival form of the list. The metadata record follows the standard used by the Open Language Archives Community (OLAC). [6] The archival materials may be downloaded for free from the Internet (Olson et al. 2013).


The presentation form of the wordlist was then generated from the archival form. An XSLT script was employed to convert the archival XML descriptive wordlist into an HTML presentation wordlist. Then TableTrans was used to automatically create individual sound files corresponding to each of the segments identified in the transcription process for use in the presentation form. In-browser playing of the MP3 audio files was implemented using UbaPlayer 1.0.0, [7] an open source HTML5 audio player with Adobe Flash fallback.
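The step of cutting one sound file per time-aligned item can be sketched with Python's standard `wave` module. This is an illustration of the general technique under stated assumptions, not the TableTrans implementation. The function name `extract_segment` is hypothetical, and the demonstration uses one second of synthetic silence rather than the actual recording.

```python
import io
import wave


def extract_segment(src, start_s, end_s, dst):
    """Copy the audio between start_s and end_s (in seconds) from the
    WAV file or file object src into a new WAV written to dst."""
    with wave.open(src, "rb") as reader:
        rate = reader.getframerate()
        first = int(round(start_s * rate))
        count = int(round(end_s * rate)) - first
        reader.setpos(first)
        frames = reader.readframes(count)
        with wave.open(dst, "wb") as writer:
            # Preserve the channel count, sample width, and rate.
            writer.setnchannels(reader.getnchannels())
            writer.setsampwidth(reader.getsampwidth())
            writer.setframerate(rate)
            writer.writeframes(frames)


# Demonstration: one second of mono, 16-bit, 44.1 kHz silence in memory.
source = io.BytesIO()
with wave.open(source, "wb") as w:
    w.setnchannels(1)
    w.setsampwidth(2)
    w.setframerate(44_100)
    w.writeframes(b"\x00\x00" * 44_100)
source.seek(0)

# Extract the stretch from 0.25 s to 0.75 s into a second buffer.
clip = io.BytesIO()
extract_segment(source, 0.25, 0.75, clip)
clip.seek(0)
with wave.open(clip, "rb") as r:
    clip_frames = r.getnframes()

print(clip_frames)
```

In practice the start and end times would come from the time-aligned annotation, with one call per item and file paths in place of the in-memory buffers.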

3. Discussion


Some best practice recommendations were not followed in the creation of this resource. First, the items are spoken in isolation rather than in carrier sentences. A carrier sentence may help avoid list intonation and make it easier to measure the length of some sounds (e.g. word-initial stops) (cf. Ladefoged 2003:7–8). Second, only one speaker was recorded. Recording a larger number of speakers could help ensure that the data reflect the language as a whole and not an individual’s idiolect. Ladefoged (2003:14) recommends recording half a dozen speakers of each sex. Third, only one token of several items was recorded. Fourth, the original hand-written orthographic transcriptions were not retained, and hence not archived.


On the other hand, the project succeeded in following most of the best practice recommendations found in Bird & Simons (2003:574–579) and Simons, Olson & Frank (2007). For example, the primary recording was digitized at industry-standard archival rates and stored in open (i.e. published) formats. The transcriptions were made in both IPA and orthography, encoded in Unicode, time-aligned to the underlying recording, and stored in an open descriptive markup format. The archival form was deposited into an institutional archive dedicated to the long-term preservation and availability of the resource. A human-readable version of the resource using presentational markup was provided (i.e. this web resource). Finally, the resource was described using metadata listed with an industry-standard repository for resource discovery.

4. Audio Files


Click HERE to get to the wordlist (including all audio examples)


