
Linguistic Resources  
LDC Facilities

Introduction

The Linguistic Data Consortium has its offices on the top floor of 3600 Market Street in Philadelphia's University City Science Center. The eighth floor suite, with over 11,000 usable square feet, was configured specifically for LDC with 22 single, double and triple offices, large and small conference rooms, a recording booth, a focus group room and six laboratories including separate labs for broadcast news collection, participant recruiting, annotation and publications plus specially equipped telecommunications and data closets, corpus packaging workroom/mailroom and pantry.

The large conference room seats 30 and contains Windows and Unix workstations, power and network outlets for guests, a wireless network access point, high resolution computer projector, overhead projector and a large screen TV connected to the University's cable network. The small conference room seats four with network outlets for guests.

The consortium also has a specialized server area within the Franklin Building Annex, connected to the offices by a redundant fiber optic Gigabit Ethernet link.

Networking Infrastructure

LDC maintains several systems supporting the following functions:

  • Web Servers:
    • 2x Quad Core Opteron server with 2 GB of RAM
    • Xeon server with 2 GB of RAM
  • Virtualization Servers:
    • 2x Quad Core Xeon server with 16 GB of RAM (production)
    • 2x Quad Core Xeon server with 16 GB of RAM (development)
  • Database Servers:
    • MySQL - Itanium server with 2 GB of RAM
    • MySQL - Opteron server with 2 GB of RAM
    • MySQL - Virtualized slave server with 1 GB of RAM
  • File Servers: Multiple servers supporting 70 TB of storage
    • Main storage system: Sun 5320 with redundant fiber channel RAIDs
    • Additional storage: 12 FreeBSD systems with internal and external redundant RAIDs
    • Virtualized SCP server for data distribution
  • Backup System:
    • Replication system (Sun 4500) for dynamic data
    • 2 Tape robots for static data
    • 2 Backup servers
  • Administrative:
    • Mail servers:
      • Core 2 Duo server with 2 GB of RAM
      • 1 Dual Opteron server with 2 GB of RAM
  • Admin Directory space - Restricted Access Storage Area
  • Network:
    • Public
      • 24 switched Gigabit Ethernet ports
      • Optic fibre to the Internet
      • CIDR /23 network (equivalent to 2 class-C networks)
    • Private - 264 fully switched Gigabit Ethernet ports
    • Wireless network 802.11n
    • Optic fibre to the main server room
    • Real time monitoring server with alarming services
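As a quick check of the address arithmetic in the network list above, the following Python sketch uses the standard `ipaddress` module to confirm that a /23 block covers the same address space as two /24 (former class-C sized) networks. The block `10.0.0.0/23` is a placeholder chosen for illustration only; it is not LDC's actual address range.

```python
import ipaddress

# Placeholder /23 block (assumption for illustration, not LDC's real range).
public_net = ipaddress.ip_network("10.0.0.0/23")

# A /23 leaves 32 - 23 = 9 host bits, i.e. 2**9 = 512 addresses.
total_addresses = public_net.num_addresses

# Split the /23 into /24 networks, each the size of a former class-C network.
class_c_equivalents = list(public_net.subnets(new_prefix=24))

print(f"{public_net} holds {total_addresses} addresses")
for subnet in class_c_equivalents:
    print(f"  {subnet} holds {subnet.num_addresses} addresses")
```

Each /24 holds 256 addresses, so two of them account for the 512 addresses of the /23.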

    In addition, we have approximately seventy (70) annotation/transcription workstations running various operating systems, including Solaris, Windows, FreeBSD, and Linux. Sixty of these workstations are located in four common work areas of varying size.

    Recording Rooms

    The Focus Group Room allows flexible configuration of audio and video recording devices. The room was built asymmetrically, furnished informally and wired extensively to provide a relaxing atmosphere with excellent sound capture capability. Audio/visual faceplates positioned just above the floor on three walls accept microphone and video camera jacks, routing all wiring invisibly to a central panel in the control room next door. Staff can configure the room so that the collection equipment is as unobtrusive or intensive as necessary. Microphones include four wireless transmitters with swappable head-mounted and lavalier microphones, two dynamic studio microphones, a PZM microphone, a shotgun microphone, two conference room microphones and an eight-element microphone array. The control room houses wireless receivers, amplifiers, a multi-track digital recorder and digital capture boards in a high-performance workstation with a disk array.

    The sound booth walls are constructed of triple layers of sheetrock, covered with sound isolating panels and then with acoustic foam panels and corner traps. The solid wood door is outfitted with a sweep. A triple-paned window with acoustically sealed 1/4" glass allows visual communication between the room and a control desk outside. Custom audio faceplates connect in-room microphones with external recorders. The in-room computer, an Eon X-terminal, has no moving parts to produce noise.

    Publications Laboratory

    Production systems in the Publications Laboratory include one Rimage 8100 DVD replicator with four 16X DVD burners, one Rimage Producer DVD replicator with four 8X DVD burners, and one Logicube hard drive replicator capable of duplicating fifteen hard drives at once. Each DVD replicator can duplicate and print up to 100 discs without supervision.

    Recruiting and Annotation Laboratories

    The Collection and Annotation Laboratories include 60 workstations. Each work space is equipped with a Sun or Intel workstation running Unix or Windows.

    Data Collection Laboratories

    Many of the innovations in LDC's new offices involve data collection infrastructure where multiple collection systems, partially overlapping in function, provide broad, redundant coverage.

    Three telephone collection platforms connected to separate T1 circuits provide substantial collection capacity and redundancy. Each system consists of a Windows 2000 server with Dialogic telephony hardware, development libraries, and customized computer telephony software. Each system can store thousands of hours of recorded telephone conversations and can process up to 24 simultaneous active lines. These systems are connected to the main LDC network by Gigabit Ethernet.
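The capacity figures above can be worked through with a short Python sketch. The platform and line counts come from the text; the audio format (8 kHz, 8-bit mu-law, one channel per line, as in standard G.711 telephony) is an assumption made only to give a rough storage estimate, since the text does not state how the recordings are stored.

```python
# Figures taken from the text.
PLATFORMS = 3
LINES_PER_PLATFORM = 24  # matches the 24 voice channels of one T1 circuit

# Assumption (not stated in the text): 8 kHz, 8-bit mu-law audio,
# one channel per line, which is 8000 bytes per second per line.
BYTES_PER_SECOND_PER_LINE = 8000


def total_simultaneous_lines() -> int:
    """Aggregate simultaneous lines across all collection platforms."""
    return PLATFORMS * LINES_PER_PLATFORM


def storage_gb(hours: float, lines: int = 1) -> float:
    """Rough uncompressed storage in GB (10**9 bytes) under the assumed format."""
    return hours * 3600 * BYTES_PER_SECOND_PER_LINE * lines / 1e9


print("Simultaneous lines across all platforms:", total_simultaneous_lines())
print("Approx. GB for 1000 line-hours:", round(storage_gb(1000), 1))
```

Under these assumptions, a thousand hours of single-channel telephone audio needs only tens of gigabytes, which is consistent with each server holding thousands of hours.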

    LDC maintains 7 satellite dishes, providing access to C-Band, Ku-Band, DirecTV, and Dish Network programming. A 3.7 meter solid dish, a 3.1 meter sectional dish, and a 1.2 meter solid dish are installed on movable, horizon-to-horizon mounts and are connected to computer-controlled dish movers. Four small fixed DBS dishes provide access to DirecTV and Dish Network. We are currently operating one Wegener MPEG-1 receiver for SCOLA multilingual broadcasts, two dedicated DVB-S receivers for CCTV and PhoenixTV programming, one Chaparral M100+ receiver, and thirteen FTA DVB-S/MPEG-2 receivers for free-to-air broadcasts and wild feeds on both C and Ku bands. We have eight Dish Network receivers, which provide the capability to receive all of Dish Network's domestic and international programming, and one DirecTV receiver for domestic U.S. programming. A Dressler active shortwave antenna together with an AOR wideband antenna cover the entire range of the electromagnetic spectrum used for speech communication. Finally, the University's Penn Video Network provides local, national and international television programming.

    A control computer coordinates the activities of all satellite dishes, receivers and CATV tuners/demodulators, routing signals via a Knox AV matrix switch (256 inputs / 64 outputs) to eight Ubuntu Linux-based recording nodes. Each recording node is capable of simultaneously digitizing two streams of video plus stereo audio as DV25 direct to local disk. The broadcast collection system also includes substantial, flexible monitoring capabilities via an integrated LCD monitoring matrix (nine separate video monitors, 4 channels of audio). To support additional high-volume multimedia transcoding and processing tasks, the LDC runs a set of three dedicated Ubuntu-based multiprocessor transcoding nodes as well as an Intel MFSYS25 Modular Server System configured as a Linux mini-cluster with six Xeon-based compute nodes and an integrated 2 TB SAN.
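To make the recording capacity concrete, here is a back-of-the-envelope Python sketch. The node, stream and switch-output counts are from the text; the DV25 data rate of roughly 3.6 MB per second per stream (video, audio and overhead combined) is a commonly quoted approximation and is treated here as an assumption, not a measured figure for this system.

```python
# Figures taken from the text.
RECORDING_NODES = 8
STREAMS_PER_NODE = 2
MATRIX_OUTPUTS = 64

# Assumption: approximate DV25 rate of about 3.6 MB/s per stream,
# including audio and container overhead.
DV25_MB_PER_SECOND = 3.6


def total_streams() -> int:
    """Maximum number of streams the recording nodes can digitize at once."""
    return RECORDING_NODES * STREAMS_PER_NODE


def disk_gb_per_hour(streams: int) -> float:
    """Approximate local disk consumed per hour, in GB (1000 MB)."""
    return streams * DV25_MB_PER_SECOND * 3600 / 1000


streams = total_streams()
print("Simultaneous DV25 streams:", streams)
print("Fits within matrix switch outputs:", streams <= MATRIX_OUTPUTS)
print("Approx. GB per hour, one stream:", round(disk_gb_per_hour(1), 2))
print("Approx. GB per hour, all streams:", round(disk_gb_per_hour(streams), 2))
```

Under the assumed rate, one stream consumes about 13 GB per hour, so running every node at full capacity fills on the order of 200 GB per hour across the recording nodes.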

    Three NT servers running BBN speech recognition software for English, Chinese and Arabic provide automatic audio indexing. A gigabit Ethernet switch connects all broadcast recording hardware to the LDC's static storage and backup facilities.

    The LDC has two Non-Linear Editing/Video Digitization workstations. Both systems run Windows XP; the first system includes Canopus editing software and hardware (DVRexRT Pro), and the second system includes Matrox Digisuite DTV hardware. Digitization efforts are also supported by one professional-quality SVHS VTR, two DV VTRs, and a Panasonic multi-VTR edit controller.



    Contact ldc@ldc.upenn.edu
    Last modified: Wednesday, 12-Nov-2008 11:00:57 EST
    © 1992-2009 Linguistic Data Consortium, University of Pennsylvania. All Rights Reserved.