


CALLHOME Egyptian Arabic Speech

Item Name: CALLHOME Egyptian Arabic Speech
Authors: Alexandra Canavan, George Zipperlen, and David Graff
LDC Catalog No.: LDC97S45
ISBN: 1-58563-114-0
Data Type: speech
Sample Rate: 8000 Hz
Sampling Format: 2-channel ulaw
Data Source(s): telephone conversations
Project(s): EARS, GALE, Hub5-LVCSR
Application(s): speech recognition
Language(s): Egyptian Arabic
Language ID(s): ARZ
Distribution: 3 CD
Member fee: $0 for 1997 members
Non-member Fee: US$1500.00
Reduced-License Fee: US$750.00
Extra-Copy Fee: US$450.00
Non-member License: yes
Online documentation: yes
Licensing Instructions: Subscription Members, Standard Members, Non-Members
Citation: Alexandra Canavan, George Zipperlen, and David Graff. 1997. CALLHOME Egyptian Arabic Speech. Linguistic Data Consortium, Philadelphia.

Introduction

The CALLHOME Egyptian Arabic corpus of telephone speech consists of 120 unscripted telephone conversations between native speakers of Egyptian Colloquial Arabic (ECA), the spoken variety of Arabic found in Egypt. The dialect of ECA that this corpus represents is Cairene Arabic.

Data

All calls, which lasted up to 30 minutes, originated in North America and were placed to locations overseas (typically Egypt). Most participants called family members or close friends.

This corpus contains speech data files ONLY, along with the minimal amount of documentation needed to describe the contents and format of the speech files and the software packages needed to uncompress the speech data. The transcripts and documentation (LDC97T19) are available separately, as is an associated lexicon and transducer (LDC97L19).

Updates

The "shorten" and "sphere" directories have been removed.

The sphere directory contained NIST "SPeech HEader REsources" (SPHERE): C-language source code libraries and utilities for manipulating NIST SPHERE-format waveform files.

The shorten directory contained files for Tony Robinson's "shorten" software for speech compression.

A more recent version of the SPHERE utilities is now available on the NIST web site; additional utilities for converting from SPHERE to other waveform file formats are also available on the LDC web site.
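
For readers who want to inspect the audio without the original utilities, the following is a minimal Python sketch of reading a SPHERE header and expanding 2-channel ulaw samples to 16-bit linear values. It assumes the file has already been decompressed (files compressed with "shorten" must be uncompressed first), that the header follows the usual NIST_1A layout of "name type value" lines ending in end_head, and that the two channels are interleaved sample by sample. The function names are illustrative only and are not part of the NIST or LDC tools.

```python
def parse_sphere_header(data: bytes) -> dict:
    """Parse a NIST SPHERE header into a dict (assumed NIST_1A layout)."""
    first = data[:16].split(b"\n")
    if first[0] != b"NIST_1A":
        raise ValueError("not a SPHERE file")
    # The second line gives the total header size in bytes (commonly 1024).
    header_size = int(first[1].decode("ascii").strip())
    fields = {"header_size": header_size}
    text = data[16:header_size].decode("ascii", "replace")
    for line in text.split("\n"):
        line = line.strip()
        if line == "end_head":
            break
        if not line or line.startswith(";"):
            continue  # skip blank and comment lines
        parts = line.split(None, 2)
        if len(parts) != 3:
            continue
        name, kind, value = parts
        if kind == "-i":
            fields[name] = int(value)
        elif kind == "-r":
            fields[name] = float(value)
        else:
            fields[name] = value  # string fields such as "-s4 ulaw"
    return fields


def ulaw_to_linear(byte: int) -> int:
    """Expand one 8-bit mu-law (G.711) code to a signed linear sample."""
    u = ~byte & 0xFF
    sign = u & 0x80
    exponent = (u >> 4) & 0x07
    mantissa = u & 0x0F
    sample = (((mantissa << 3) + 0x84) << exponent) - 0x84
    return -sample if sign else sample


def decode_ulaw_channels(payload: bytes, channel_count: int = 2) -> list:
    """Decode interleaved mu-law bytes into one sample list per channel."""
    samples = [ulaw_to_linear(b) for b in payload]
    return [samples[c::channel_count] for c in range(channel_count)]
```

Given the raw bytes of an uncompressed file in `data`, one would call `parse_sphere_header(data)` and then pass `data[fields["header_size"]:]` to `decode_ulaw_channels` along with the `channel_count` value read from the header.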




Contact: ldc@ldc.upenn.edu

(c) 1992-2008 Linguistic Data Consortium, University of Pennsylvania. All Rights Reserved.