Middle East Technical University Turkish Microphone Speech v 1.0

Item Name: Middle East Technical University Turkish Microphone Speech v 1.0
Authors: Ozgul Salor, Tolga Ciloglu, Bryan Pellom, and Mubeccel Demirekler
LDC Catalog No.: LDC2006S33
ISBN: 1-58563-384-4
Release Date: May 18, 2006
Data Type: speech
Sample Rate: 16000 Hz
Data Source(s): microphone speech
Language(s): Turkish
Language ID(s): tur
Distribution: 1 CD
Member fee: $0 for 2006 members
Non-member Fee: US $1500.00
Reduced-License Fee: US $750.00
Extra-Copy Fee: US $150.00
Non-member License: yes
Online documentation: yes
Licensing Instructions: Subscription Members, Standard Members, Non-Members
Citation: Ozgul Salor, et al. 2006. Middle East Technical University Turkish Microphone Speech v 1.0. Linguistic Data Consortium, Philadelphia.

Introduction

This file contains documentation on the Middle East Technical University Turkish Microphone Speech v 1.0, Linguistic Data Consortium (LDC) catalog number LDC2006S33 and ISBN 1-58563-384-4.

The corpus was collected at the Middle East Technical University (METU) as part of a collaboration between the Department of Electrical and Electronics Engineering of METU in Turkey and the Center for Spoken Language Research (CSLR) of the University of Colorado at Boulder, USA. The collaboration was supported by TUBITAK, the Scientific and Technical Research Council of Turkey, through a combined doctoral scholarship program. The corpus was used to port SONIC, the speech recognition system of CSLR, to Turkish.

The corpus contains text, speech and alignment files and is approximately 600 MB in size. Each of the 120 speakers (60 male and 60 female) speaks 40 sentences (approximately 300 words per speaker), for approximately 500 minutes of speech in total. The 40 sentences are selected randomly for each speaker from a triphone-balanced set of 2,462 Turkish sentences. The speakers were selected from students, faculty and staff at METU, and all are native speakers of Turkish. Ages range from 19 to 50 years, with an average of 23.9 years.
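The figures above imply a few derived quantities that are useful when planning experiments, such as the total number of utterances and the average utterance length. The following is a minimal sketch of that arithmetic. The constant and function names are illustrative and not part of the corpus distribution, and because the word count and total duration are stated only approximately, the derived values are approximate too.

```python
# Figures taken from the corpus description; the word count and
# total duration are approximate in the source.
SPEAKERS = 120
SENTENCES_PER_SPEAKER = 40
WORDS_PER_SPEAKER = 300   # approximate
TOTAL_MINUTES = 500       # approximate


def corpus_stats():
    """Return rough derived statistics for the corpus (illustrative helper)."""
    utterances = SPEAKERS * SENTENCES_PER_SPEAKER
    return {
        "utterances": utterances,
        "total_words": SPEAKERS * WORDS_PER_SPEAKER,
        "words_per_sentence": WORDS_PER_SPEAKER / SENTENCES_PER_SPEAKER,
        "seconds_per_utterance": TOTAL_MINUTES * 60 / utterances,
        "minutes_per_speaker": TOTAL_MINUTES / SPEAKERS,
    }


if __name__ == "__main__":
    for name, value in corpus_stats().items():
        print(f"{name}: {value:g}")
```

Under these figures the corpus holds 4,800 utterances and roughly 36,000 words, with sentences averaging about 7.5 words and about 6.25 seconds each.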

The data was digitally recorded with a Sound Blaster sound card on a PC at a 16 kHz sampling rate.
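As a rough illustration of what the 16 kHz sampling rate means in practice, the sketch below converts a duration into a sample count and reports the highest frequency the recordings can represent (the Nyquist frequency, half the sampling rate). The function names are hypothetical. Bit depth and file format are not stated here, so no file-size estimate is attempted.

```python
SAMPLE_RATE_HZ = 16_000  # sampling rate stated in the corpus description


def samples_for(seconds: float) -> int:
    """Number of audio samples in a recording of the given duration."""
    return round(seconds * SAMPLE_RATE_HZ)


def nyquist_hz() -> float:
    """Highest frequency representable at this sampling rate."""
    return SAMPLE_RATE_HZ / 2


if __name__ == "__main__":
    # Approximately 500 minutes of speech in total, per the description.
    print("samples in 500 minutes:", samples_for(500 * 60))
    print("Nyquist frequency (Hz):", nyquist_hz())
```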

Samples

Please listen to this audio sample and examine its companion transcript for an example of the data contained in this publication.

Content Copyright

Portions © 2001, 2002, 2005 Middle East Technical University, Tolga Ciloglu, Ozgul Salor, Bryan Pellom, Kadri Hacioglu, Mubeccel Demirekler, © 1993, 2006 Trustees of the University of Pennsylvania



Contact: ldc@ldc.upenn.edu

(c) 1992-2010 Linguistic Data Consortium, University of Pennsylvania. All Rights Reserved.