July 2022 Newsletter

Friday, July 15, 2022

New Corpora 

Qatari Corpus of Argumentative Writing

Second DIHARD Challenge Evaluation - SEEDLingS


Fall 2022 LDC Data Scholarship Program 
Student applications for the Fall 2022 LDC Data Scholarship program are being accepted now through September 15, 2022. This program provides eligible students with no-cost access to LDC data. Students must complete an application consisting of a data use proposal and a letter of support from their advisor. For application requirements and program rules, visit the LDC Data Scholarships page.

30th Anniversary Highlight: ATIS0 Complete 
The ATIS corpora were among the first publications to appear when LDC’s catalog launched in 1993. ATIS0 Complete (LDC93S4A) comprises the spontaneous speech, read speech and other material from participants in the ATIS collection that is contained in ATIS0 Pilot (LDC93S4B), ATIS0 Read (LDC93S4B-2) and ATIS0 SD-Read (LDC93S4B-3).

The ATIS (Air Travel Information Services) collection was developed to support the research and development of speech understanding systems. Participants were presented with various hypothetical travel planning scenarios and asked to solve them by interacting with partially or completely automated ATIS systems. The resulting utterances were recorded and transcribed. Data was collected in the early 1990s at five US sites: Raytheon BBN, Carnegie Mellon University, MIT Laboratory for Computer Science, National Institute of Standards and Technology and SRI International.

The ATIS collection has been widely used to further research in spoken language understanding and slot filling (Kuo et al., 2020). Other data sets published from the collection include ATIS2 (LDC93S5), ATIS3 Training and Test Data (LDC94S19, LDC95S26) and, more recently, Multilingual ATIS (LDC2019T04) and ATIS - Seven Languages (LDC2021T04).

All ATIS corpora are available for licensing by Consortium members and non-members. Visit Obtaining Data for more information.