Linguistic Resources LDC Facilities
The Linguistic Data Consortium has its offices on the top floor of 3600 Market Street in Philadelphia's University City Science Center. The eighth floor suite, with over 11,000 usable square feet, was configured specifically for LDC with 22 single, double and triple offices, large and small conference rooms, a recording booth, a focus group room and six laboratories including separate labs for broadcast news collection, participant recruiting, annotation and publications plus specially equipped telecommunications and data closets, corpus packaging workroom/mailroom and pantry. The large conference room seats 30 and contains Windows and Unix workstations, power and network outlets for guests, a wireless network access point, high resolution computer projector, overhead projector and a large screen TV connected to the University's cable network. The small conference room seats four with network outlets for guests. The consortium also has a specialized server area within the Franklin Building Annex, connected to offices with a redundant fiber optic Gigabit Ethernet link.
IT and Networking Infrastructure
LDC IT infrastructure is a comprehensive, autonomous and internally-managed system within University of Pennsylvania computing. This allows for the modularity and flexibility required to best match the needs of research projects. LDC infrastructure includes:
LDC maintains several systems supporting the following functions:
In addition we have approximately seventy (70) Annotation/Transcription workstations running various operating systems such as Solaris, Windows, FreeBSD, and Linux. Sixty of these workstations are collected in four common work areas of varying size.
Human Subjects Data Collection Laboratories
LDC maintains facilities for producing recordings of speech both on-site and via telephone.
LDC operates three computer telephony systems specifically for collecting speech from the telephone network. Each system is connected to a dedicated T-1 line, which provides 24 audio channels and has Toll-Free service enabled. The systems incorporate Dialogic telephony hardware; specifically, each system houses a Dialogic D/480JCT-2T1 telephony board which can perform interactive voice response (IVR) functions and call logging functions. In addition, one of the systems incorporates an AudioCodes DP6409 Passive-Tap call logging board. The telephony hardware provides the ability to record up to 12 two-person conversations simultaneously. Customized IVR software is installed on each system; the telephony application handles all interactions with callers, connects callers to one another, and starts and stops recordings. Each system includes a set of supporting software which handles automatic transfers of recordings to the main LDC network.
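The channel arithmetic behind these figures can be sketched as follows. This is only an illustration: the channel count, number of systems and two-person conversations come from the text above, while the 8 kHz, one-byte-per-sample audio format is the standard encoding of a T-1 voice channel and is assumed here rather than stated by LDC. The function names are hypothetical, and the combined total assumes that the limit of 12 conversations applies to each system separately.

```python
# Figures taken from the description above.
CHANNELS_PER_T1 = 24           # audio channels provided by one T-1 line
TELEPHONY_SYSTEMS = 3          # computer telephony systems, one T-1 each
PARTIES_PER_CONVERSATION = 2   # two-person conversations

# Assumption: standard T-1 voice channel encoding (8 kHz, 8-bit samples).
SAMPLE_RATE_HZ = 8000
BYTES_PER_SAMPLE = 1


def conversations_per_line(channels=CHANNELS_PER_T1,
                           parties=PARTIES_PER_CONVERSATION):
    """Each party occupies one channel, so a line carries channels // parties calls."""
    return channels // parties


def raw_bytes_per_conversation(minutes, parties=PARTIES_PER_CONVERSATION):
    """Uncompressed size of one recording, with one audio channel per party."""
    seconds = minutes * 60
    return seconds * SAMPLE_RATE_HZ * BYTES_PER_SAMPLE * parties


per_line = conversations_per_line()
# Assumption: every system can reach its own limit at the same time.
all_systems = per_line * TELEPHONY_SYSTEMS

print(per_line)                        # simultaneous conversations on one T-1
print(all_systems)                     # upper bound across the three systems
print(raw_bytes_per_conversation(10))  # bytes for a 10-minute conversation
```

Under these assumptions, 24 channels shared between two-party calls gives the 12 simultaneous conversations quoted above, and a ten-minute conversation occupies a little under 10 MB before any container overhead.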
Broadcast Data Collection Laboratories
LDC operates an extensive collection system dedicated to the capture and processing of broadcast content from a wide range of sources. The system is able to collect audio and video from satellite, CATV, and off-the-air. The satellite reception facilities allow us to address up to three simultaneous C-Band and Ku-Band satellite downlinks, as well as Dish Network and DirecTV satellite downlinks.
Facilities For Off-Site Broadcast Collection
In addition to the primary broadcast collection system, LDC has also deployed two portable broadcast collection platforms outside of the United States. Each portable platform is a TiVo-style digital video recording (DVR) system capable of recording two streams of A/V material simultaneously. The platform includes integrated analog CATV (NTSC and PAL) and digital satellite DVB-S reception components; it supports international specifications and is capable of recording programming outside of the United States. The system has a very small footprint and is suitable for transportation as a piece of carry-on luggage.
The portable platform and the main LDC collection system share the same code base and rely on a modular, unified hardware specification. Improvements in the main collection platform therefore translate into benefits for both platforms. The portable system runs Ubuntu Linux, using a WinTV-PVR-500 for analog cable and a Technotrend Premium S-2300 PCI DVB-S receiver for DVB satellite reception. dvbstream is used for satellite recording, and ivtv is used for cable recording.
The portable platform deployed in Hong Kong is currently dedicated to collecting multiple streams of CCTV programming and is maintained by local technical staff. The platform deployed in North Africa is maintained remotely by personnel at LDC. Recordings are scheduled from LDC and automatically downloaded into LDC’s collections server. In each case, LDC is able to collect high-quality broadcast data with minimal equipment and, in the case of data collected in North Africa, to receive that data immediately.
The LDC Publications Group maintains a robust production capacity and can produce publications on a variety of media. The LDC's Publication Laboratory is equipped with two Rimage CD/DVD duplicators and an OmniClone hard drive replicator.
The CD/DVD duplicators have a capacity of two hundred discs and can print full-color labels with high-resolution graphics directly on the disc face. With the addition of a Blu-ray duplicator, LDC can now use high-capacity optical media, allowing for releases of up to 50 GB on a single disc. Large data sets may also be produced to span discs. Each disc contains an install script to reassemble the parts. Publications uses a Just in Time Inventory system with a web interface to provide quick and responsive fulfillment of orders.
LDC can also produce very large publications on hard drives, producing up to fifteen copies at once. The hard drive replicator can also perform diagnostics, erase sensitive data from drives in bulk, and repair bad hard drives. This allows the Publications Group to maintain pools of reusable hard drives dedicated to specific projects. All systems employ hardware and software verification to ensure the reliability and quality of all releases.
Software Development Infrastructure
LDC's technical staff has developed a large amount of custom-built software for data collection, data processing, manual annotation of text, audio, image and video data (e.g., transcription, translation, named entity annotation, relation annotation), annotation workflow management, text indexing and searching, automatic annotation (e.g., language identification, content duplicate identification, segmentation, tokenization, tagging, morphological analysis), and quality control. These software resources are ready for reuse in similar future tasks. In particular, some of these resources, such as AGTK, are component-oriented and are specifically designed for reuse in various applications. LDC also has experience in using a wide range of third-party software for research, data production, and software development. All of these software resources are accessible from any of our centrally managed Linux and FreeBSD workstations via NFS file volumes. A newly developed application can be immediately deployed in data collection and annotation tasks by LDC staff members.
LDC's software developers are equipped with desktop development workstations, computational servers, relational database servers, web servers, software development resources (e.g., various compilers, interpreters, debuggers, text editors, GUI-builders, IDEs, revision control systems), issue tracking systems, e-mail discussion lists, a wiki-based knowledge base and other documentation.
Contact ldc@ldc.upenn.edu