A Cat_ToBI transcription for an utterance consists of the following five annotation tiers:
- The Orthographic Tier
- The Phonetic Transcription Tier
- The Break Index Tier
- The Tone Tier
- The Miscellaneous Tier
1. The Orthographic Tier
This tier contains the orthographic transcription of the text.
2. The Phonetic Transcription Tier
This tier contains the phonetic transcription of the text in the International Phonetic Alphabet (IPA).
3. The Break Index Tier
The break index tier contains five break indices (BI), 0, 1, 2, 3 and 4.
- BI 0 is used to mark cohesion between orthographic words. Orthographic words separated by BI 0 constitute a prosodic word (PrWord) that may bear only one pitch accent.
- BI 1 is used to indicate boundaries between prosodic words (PrWords). Items separated by BI 1 should carry at most one pitch accent each.
- BI 2 is used to indicate either a perceived disjuncture with no intonation effect, or an apparent intonational boundary but with no slowing or other break cues.
- BI 3 is used to indicate prosodic boundaries between ips (intermediate phrases).
- BI 4 is used to indicate prosodic boundaries between IPs (Intonational Phrases).
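The grouping role of BI 0 can be sketched in a few lines of Python. This is only an illustration, not part of the Cat_ToBI conventions: the function name `group_prosodic_words`, the assumption that one break index is stored after each orthographic word, and the example utterance are all hypothetical.

```python
from typing import List

# The five break indices recognized on the Break Index Tier.
VALID_BREAK_INDICES = {0, 1, 2, 3, 4}


def group_prosodic_words(words: List[str], breaks: List[int]) -> List[List[str]]:
    """Group orthographic words into prosodic words (PrWords).

    Assumption: breaks[i] is the break index that follows words[i].
    BI 0 joins a word to the next one; any higher index closes the
    current prosodic word.
    """
    if len(words) != len(breaks):
        raise ValueError("expected one break index per orthographic word")
    groups: List[List[str]] = []
    current: List[str] = []
    for word, bi in zip(words, breaks):
        if bi not in VALID_BREAK_INDICES:
            raise ValueError(f"invalid break index: {bi}")
        current.append(word)
        if bi != 0:
            groups.append(current)
            current = []
    if current:  # the utterance ended on BI 0; keep the trailing words
        groups.append(current)
    return groups


# Hypothetical example: "la" is cohesive with "nena" (BI 0).
example = group_prosodic_words(["la", "nena", "menja", "mandarines"], [0, 1, 1, 4])
```

Each inner list returned here is one prosodic word, that is, a unit expected to carry at most one pitch accent.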
4. The Tone Tier
For the intonational analysis of Catalan utterances we recognize two types of tonal events, pitch accents and boundary tones, and two levels of phrasing, the intermediate phrase (ip) and the intonational phrase (IP). The following subsections describe the inventory of those tonal events for Catalan, describing their phonetic realizations and their distributional properties.
4.1. The pitch accents
Nine basic pitch accents have been found in Catalan:
- 2 monotonal: H* and L*
- 5 bitonal: H+L*, L*+H, L+H*, L+¡H* and L+
- 2 additional pitch accents, H*+L and ¡H+L*, have a more restricted dialectal distribution in Catalan
The prototypical phonetic realization and distribution of each accent are described in the corresponding link.
Please bear in mind that lexically stressed syllables do not necessarily bear a pitch accent and that they can be unaccented (see Vanrell & Prieto 2024 in the References section).
4.2. The boundary tones
Seven types of boundary tones at the end of IPs (marked with the % symbol after the tone) and four types of boundary tones at the end of ips (marked with the - symbol after the tone) have been found in Catalan:
- 3 monotonal IP boundary tones: L%, !H% and H%
- 3 bitonal IP boundary tones: HL%, LH% and L!H%
- 1 tritonal IP boundary tone: LHL%
- 3 monotonal ip boundary tones: L-, !H- and H-
- 1 bitonal ip boundary tone: LH-
The prototypical phonetic realization and distribution of each boundary tone are described in the corresponding link.
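As a rough sketch, the boundary tone inventory can be encoded so that labels on the Tone Tier are checked against it. The names `boundary_level` and `expected_break_index` are hypothetical, and pairing IP-final tones with BI 4 and ip-final tones with BI 3 is an assumption drawn from the break index definitions above, not a rule stated by the conventions.

```python
# Boundary tone inventories as listed in this section.
IP_BOUNDARY_TONES = frozenset({"L%", "!H%", "H%", "HL%", "LH%", "L!H%", "LHL%"})
INTERMEDIATE_BOUNDARY_TONES = frozenset({"L-", "!H-", "H-", "LH-"})


def boundary_level(label: str) -> str:
    """Return "IP" or "ip" for a listed boundary tone label."""
    if label in IP_BOUNDARY_TONES:
        return "IP"
    if label in INTERMEDIATE_BOUNDARY_TONES:
        return "ip"
    raise ValueError(f"not a listed boundary tone: {label!r}")


def expected_break_index(label: str) -> int:
    """Assumption: IP-final tones align with BI 4, ip-final tones with BI 3."""
    return 4 if boundary_level(label) == "IP" else 3
```

A labelling script could call `expected_break_index` on each boundary tone and flag cases where the Break Index Tier disagrees, leaving the final decision to the transcriber.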
5. The Miscellaneous Tier
The Miscellaneous Tier has been used for everything from noting non-speech events to commenting on labelling difficulties. Because much of the notation in this tier has not been standardized, it has been of limited use for drawing conclusions from large labelled corpora.
The miscellaneous tier is in essence a ‘comment’ tier for the optional marking of events of any kind other than the standard words, phonetics, tones, and disjunctures marked on the orthographic tier, the phonetic transcription tier, the tone tier, and the break index tier.
Even though the examples on this webpage do not contain this tier, we encourage its use to note the following events with the corresponding standard labels:
Event | Event subtype | Label |
---|---|---|
disfluencies | such phenomena as stumbling over a word, or abruptly cutting off a word or phrase in midstream to make a fragment | phonetic error |
disfluencies | lexical self-corrections of parts of sentences | repair |
disfluencies | lexical self-corrections of whole sentences | fresh start |
disfluencies | hesitation pause | |
differences in rate of speech | increase of the rate of speech | fast |
differences in rate of speech | decrease of the rate of speech | slow |
cough | | cough |
laugh | | laugh |
noises | other noises that contaminate the data | noise |
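Because inconsistent notation is what limits the usefulness of this tier, a small check against the labels in the table may help keep a corpus uniform. The following is a minimal sketch; the function name `unknown_misc_labels` is hypothetical, and the hesitation pause is left out because the table gives no label for it.

```python
from typing import Iterable, List

# Standard labels taken from the table above.
MISC_LABELS = frozenset({
    "phonetic error",
    "repair",
    "fresh start",
    "fast",
    "slow",
    "cough",
    "laugh",
    "noise",
})


def unknown_misc_labels(labels: Iterable[str]) -> List[str]:
    """Return the labels that are not in the standard set, in input order."""
    return [label for label in labels if label not in MISC_LABELS]
```

An empty result means that every label on the tier follows the table; anything returned is a free-form comment or a misspelled label worth reviewing.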