Acoustic analysis pipeline for focus production in Yi-Mandarin bilingual speakers.
Yi (Nuosu) is a Tibeto-Burman language that lacks prosodic focus marking (no post-focus compression). When Yi-dominant bilinguals speak Mandarin -- a language that uses both on-focus F0 expansion and post-focus compression (PFC) -- do they transfer the full Mandarin prosodic system, or produce expansion without compression? This repo contains the extraction, storage, and statistical analysis pipeline for investigating that question.
| | Yi speakers | Beijing Mandarin controls |
|---|---|---|
| N | 20 (10F, 10M) | 5 (3F, 2M) |
| L1 | Yi (Nuosu) | Beijing Mandarin |
| Task | Read Mandarin sentences | Read Mandarin sentences |
4 focus conditions (broad, subject, verb, object focus) x 4 sentences x ~5 repetitions = ~80 recordings per speaker, or ~2,000 across all 25 speakers.
Each recording includes .wav, Praat .TextGrid (words + phones tiers), and .Pitch files.
    wav + TextGrid
          |
          v
    [extract_f0.py]      parselmouth: word-level F0 contours (10-point normalised),
                         duration, intensity, speaker-normalised semitones
          |
          v
    [build_database.py]  SQLite: speakers / recordings / f0_word / f0_contour
          |              + example queries (group aggregation, delta computation,
          v              missing-data checks)
    [focus_lmm.R]        RSQLite -> lme4 LMMs:
                         F0 delta ~ focus_region * group + (1|speaker)
                         + emmeans contrasts, contour plots
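
The two normalisation steps named in the extraction stage can be sketched in plain Python. This is a minimal illustration, not the code in `extract_f0.py`: the function names are hypothetical, linear interpolation is assumed for the 10-point time normalisation, and the speaker's mean F0 is assumed as the semitone reference (the actual script may use a different reference or interpolation method).

```python
import math

def resample_contour(f0_hz, n_points=10):
    """Linearly interpolate an F0 track onto n_points equally spaced samples."""
    if len(f0_hz) < 2:
        raise ValueError("need at least two F0 samples")
    last = len(f0_hz) - 1
    out = []
    for i in range(n_points):
        pos = i * last / (n_points - 1)   # fractional index into the track
        lo = int(math.floor(pos))
        hi = min(lo + 1, last)
        frac = pos - lo
        out.append(f0_hz[lo] * (1 - frac) + f0_hz[hi] * frac)
    return out

def hz_to_semitones(f0_hz, ref_hz):
    """Convert Hz to semitones relative to a reference frequency."""
    return [12 * math.log2(f / ref_hz) for f in f0_hz]

# Toy values for illustration only, not corpus data.
word_f0 = [200.0, 210.0, 220.0, 230.0]   # voiced F0 samples within one word
speaker_mean_hz = 200.0                   # assumed per-speaker reference

contour = resample_contour(word_f0)
contour_st = hz_to_semitones(contour, speaker_mean_hz)
```

Expressing F0 in semitones relative to a per-speaker reference puts female and male speakers on a comparable scale before group-level modelling.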
    # 1. extract F0 features from raw corpus
    python src/extract_f0.py /path/to/yi_language/ -o data/

    # 2. build SQLite database
    python src/build_database.py data/df_word.csv data/df_contour.csv -o data/corpus.db

    # 3. run statistical analysis (requires R with lme4, RSQLite)
    Rscript analysis/focus_lmm.R

Yi-Mandarin bilinguals show on-focus F0 expansion comparable to Beijing Mandarin controls, but without post-focus compression -- consistent with L1 transfer of Yi's prosodic typology.
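
The F0 delta that feeds the mixed model can be illustrated with a small in-memory SQLite example. This is a sketch under stated assumptions: the table names `speakers` and `f0_word` come from the pipeline description, but every column name here is hypothetical, the delta is assumed to be the focused-condition value minus the broad-focus value for the same speaker, sentence, and word, and the numbers are toy values chosen only to exercise the query, not corpus results.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE speakers (speaker_id TEXT PRIMARY KEY, grp TEXT);
CREATE TABLE f0_word (
    speaker_id  TEXT,
    sentence_id INTEGER,
    focus       TEXT,   -- broad / subject / verb / object
    word_pos    TEXT,   -- which word the row describes
    mean_f0_st  REAL    -- mean F0 in speaker-normalised semitones
);
""")

# Toy rows for illustration only (hypothetical speakers and values).
con.executemany("INSERT INTO speakers VALUES (?, ?)",
                [("Y01", "yi"), ("B01", "beijing")])
con.executemany("INSERT INTO f0_word VALUES (?, ?, ?, ?, ?)", [
    ("Y01", 1, "broad", "verb", 2.0), ("Y01", 1, "broad", "object", 1.0),
    ("Y01", 1, "verb",  "verb", 4.0), ("Y01", 1, "verb",  "object", 1.0),
    ("B01", 1, "broad", "verb", 2.0), ("B01", 1, "broad", "object", 1.0),
    ("B01", 1, "verb",  "verb", 4.5), ("B01", 1, "verb",  "object", -1.0),
])

# Delta = verb-focus value minus the broad-focus baseline for the same
# speaker, sentence, and word, averaged within each group.
query = """
SELECT s.grp, f.word_pos, AVG(f.mean_f0_st - b.mean_f0_st) AS delta_st
FROM f0_word AS f
JOIN f0_word AS b
  ON  b.speaker_id  = f.speaker_id
  AND b.sentence_id = f.sentence_id
  AND b.word_pos    = f.word_pos
  AND b.focus       = 'broad'
JOIN speakers AS s ON s.speaker_id = f.speaker_id
WHERE f.focus = 'verb'
GROUP BY s.grp, f.word_pos
ORDER BY s.grp, f.word_pos
"""
deltas = {(grp, pos): d for grp, pos, d in con.execute(query)}
con.close()
```

Under verb focus, the verb row corresponds to the on-focus region and the object row to the post-focus region, so a table of this shape could supply the `focus_region` and `group` predictors in the model formula.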
Li, C., et al. (2026). Prosodic realisation of focus in Yi-Mandarin bilingual speakers: on-focus expansion without post-focus compression. Under review, Interspeech 2026.
Python >= 3.9: pip install -r requirements.txt
R: lme4, lmerTest, emmeans, RSQLite, ggplot2
Raw audio data is not included in this repository.