The IFA corpus: A phonemically segmented Dutch speech database

R.J.J.H. van Son
Louis C.W. Pols

Affiliation:
The Institute of Phonetic Sciences of the University of Amsterdam (IFA)
The Amsterdam Center for Language and Communication (ACLC)
Herengracht 338
NL-1016CG Amsterdam
The Netherlands

The IFA corpus is an open source database of hand-segmented Dutch speech. The corpus contains high-quality two-channel speech recordings, made with both a head-mounted and a fixed microphone, from 4 male and 4 female speakers in 8 different speaking styles:

1. Informal story telling face-to-face to an "interviewer" (I)
2. Retelling a previously read narrative story (R)

And reading aloud:

3. A narrative story (T)
4. A random list of all sentences of the narrative stories (S)
5. "Pseudo-sentences" (PS)
6. Lists of selected words from the texts (W)
7. Lists of all distinct syllables from the word lists (Sy)
8. Idiomatic and "diagnostic" sequences (e.g., the alphabet, numbers, /hVd/) (Pr)

The total amount of labeled speech is 5:30 hours (50,000 words and 200,000 segments). Speech preparation took around 3 person-weeks per speaker. Hand segmentation took 1,000 hours of labeling altogether. The asymptotic segmentation speed was about one word, or four boundaries, per minute. An evaluation showed that the Median Absolute Difference of the segment boundaries was 6 ms between labelers and 4 ms within labelers. Label differences (substitutions, insertions, and deletions) were found in 8% of the segments between labelers and 5% within labelers.

The IFA corpus incorporates some fundamental principles of Speech Databases.

1) All speech is annotated on several aligned linguistic levels, e.g., from the sentence down to the phoneme level. These annotations are strictly hierarchical, so they can always be expressed in a nested XML format.
2) Each and every annotated fragment of speech, e.g., a phoneme or a word, has a unique identifier that is used as a pointer to both the speech fragment (i.e., storage location and begin and end times) and all related data (annotations, acoustic analyses, perceptive tests).
3) All data is treated in a content-neutral way. Each annotation is just an aligned ASCII text string (in Praat TextGrid format), just as each acoustic analysis is treated identically to the original Speech Waveform.
4) All annotations are amenable to reversible corrections. Both "branching" of annotations and retrieving old (original) annotations are possible.
5) Extensive meta-data on the speakers is collected.

All data are stored in a Relational Database and are searchable with SQL. Each realization of a linguistic element has Parents, Neighbours, and Children, which can be accessed by joining the relevant tables using the unique fragment pointers. Annotations (e.g., POS, lemmas, ToDI) and measurements (e.g., shadowing delays, reaction times, F0-F3 values) can be added and removed without affecting access to other data.
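
The access pattern described above, in which unique fragment identifiers tie the annotation levels and the measurements together, can be illustrated with a small sketch. It uses Python's standard sqlite3 module on an in-memory database. The table names (word, phoneme, f0), the column names, the identifier format, and the sample rows are all hypothetical and made up for illustration; they are not the actual IFA corpus schema or data.

```python
import sqlite3


def build_demo_db():
    """Create a tiny in-memory database with a HYPOTHETICAL schema."""
    con = sqlite3.connect(":memory:")
    con.executescript("""
        -- One table per annotation level; 'id' is the unique fragment pointer.
        CREATE TABLE word (
            id TEXT PRIMARY KEY, sentence_id TEXT, label TEXT,
            t_begin REAL, t_end REAL);
        CREATE TABLE phoneme (
            id TEXT PRIMARY KEY, word_id TEXT REFERENCES word(id), label TEXT,
            t_begin REAL, t_end REAL);
        -- A measurement table keyed only by the fragment pointer.
        CREATE TABLE f0 (fragment_id TEXT PRIMARY KEY, mean_hz REAL);
    """)
    # Invented example rows (times in seconds), not corpus data.
    con.executemany("INSERT INTO word VALUES (?, ?, ?, ?, ?)", [
        ("w1", "s1", "kat", 0.00, 0.30),
        ("w2", "s1", "zit", 0.30, 0.55),
    ])
    con.executemany("INSERT INTO phoneme VALUES (?, ?, ?, ?, ?)", [
        ("w1.p1", "w1", "k", 0.00, 0.08),
        ("w1.p2", "w1", "A", 0.08, 0.20),
        ("w1.p3", "w1", "t", 0.20, 0.30),
        ("w2.p1", "w2", "z", 0.30, 0.38),
        ("w2.p2", "w2", "I", 0.38, 0.47),
        ("w2.p3", "w2", "t", 0.47, 0.55),
    ])
    con.executemany("INSERT INTO f0 VALUES (?, ?)", [
        ("w1.p2", 118.0),
        ("w2.p2", 124.5),
    ])
    return con


def children_of(con, word_id):
    """Return the phoneme labels (Children) of one word, in time order."""
    rows = con.execute(
        "SELECT p.label FROM phoneme AS p "
        "JOIN word AS w ON p.word_id = w.id "
        "WHERE w.id = ? ORDER BY p.t_begin",
        (word_id,),
    )
    return [label for (label,) in rows]


def phonemes_with_f0(con):
    """Join Parent word, phoneme, and measurement via the fragment pointers."""
    rows = con.execute(
        "SELECT w.label, p.label, f.mean_hz FROM phoneme AS p "
        "JOIN word AS w ON p.word_id = w.id "
        "JOIN f0 AS f ON f.fragment_id = p.id "
        "ORDER BY p.t_begin"
    )
    return rows.fetchall()


if __name__ == "__main__":
    con = build_demo_db()
    print(children_of(con, "w1"))
    print(phonemes_with_f0(con))
```

Because the measurement table refers to fragments only through their identifiers, dropping it (DROP TABLE f0) leaves the word and phoneme queries unaffected in this sketch, which mirrors the statement that annotations and measurements can be added and removed without affecting access to other data.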