November 2023 Newsletter

Wednesday, November 15, 2023

New Corpora

Announcements

Join LDC for Membership Year 2024 
It’s time to renew your LDC membership for 2024. Current (2023) members who renew before March 1, 2024, will receive a 10% discount. New or returning organizations will receive a 5% discount if they join the Consortium by March 1, 2024.

In addition to receiving new publications, current LDC members enjoy the benefit of licensing older data from our Catalog of 940+ holdings at reduced fees. Current-year for-profit members may use most data for commercial applications.

Plans for 2024 publications are in progress. Among the expected releases are: 

  • KASET: 147 hours of Sorani Kurdish and Kurmanji Kurdish conversational telephone speech and web broadcasts, 65 hours transcribed 
  • AIDA topic source data and annotations: multimodal source data and annotations in multiple languages (Russian, Ukrainian, English, Spanish) for information and entity extraction 
  • RATS Low Speech Density Data: 87 hours of Levantine Arabic, English, Persian, Pashto, and Urdu audio files selected from RATS speech activity detection and keyword spotting data sets, also including communications systems sounds and silence
  • Call My Net 1: 364 hours of conversational telephone speech recordings in Tagalog, Cebuano, Cantonese and Mandarin from speakers in the Philippines and China using various handsets under diverse noise conditions 
  • Ravnursson Faroese Speech and Transcripts: 109 hours of read speech from 433 native speakers with transcripts 
  • Diaspora Tibetan Speech: elicited, read and spontaneous speech from 73 native Tibetan speakers in Kathmandu’s diaspora Tibetan community, some recordings transcribed
  • IARPA MATERIAL language packs: conversational telephone speech, transcripts, English translations, annotations and queries in multiple languages (e.g., Bulgarian, Somali, Georgian)
  • LORELEI: representative and incident language packs containing monolingual text, bi-text, translations, annotations, supplemental resources and related tools in various languages (e.g., Farsi, Hungarian, Hindi, Amharic) 

For full descriptions of all LDC data sets, browse our Catalog. Visit the Join LDC page for details on membership, user accounts and payment.

Spring 2024 Data Scholarship Application Deadline
Applications for the Spring 2024 LDC data scholarship program, which provides university students with no-cost access to LDC data, are being accepted through January 15, 2024. Consult the LDC Data Scholarships page for more information about program rules and submission requirements.