Membership Coordinator. Ilya fosters the continuance and development of LDC membership, oversees all distribution of corpora, and provides information regarding data licensing and usage.
Dr. Yiwola Awoyale
Yiwola's main research program is to prepare an electronic database for a modern dictionary of the Yoruba language, a Kwa language of the Niger-Congo family. Yiwola has joined LDC on sabbatical leave from the University of Ilorin, Nigeria, where he is Professor and Head of Department of Linguistics and Nigerian Languages. Yiwola has participated actively in the teaching and planning of the Yoruba language for many years, at both university and national levels. In addition to the dictionary project, he also co-ordinates the activities of the African Language Research Council (ALRC), jointly hosted by the LDC and the African Studies Centre. He teaches Yoruba in the African language teaching program of the African Studies Centre.
Dr. Moussa Bamba
Senior Researcher. Moussa's research programme is to prepare an electronic database for the lexicon of three Mandekan (Manding) languages of the Mande group of the Niger-Congo family: (a) Bambara (Bamanankan or Bamanan), (b) Mawukakan (also known as Mawu, Mahou or Mau) and (c) Odienne Jula. Moussa has been a Postdoctoral Fellow at IRCS since 1994 and has been actively involved in the teaching of Manding languages, both in Africa and elsewhere, for many years.
Ann Bies
Senior Research Associate. Ann works on projects involving the syntactic ("treebank") and semantic annotation of English and Arabic, including the Arabic Treebank, the Biomedical Information Extraction (ITR/E) project, the English/Chinese Treebank, and the English/Arabic Treebank. Her previous work includes contributing to the original Penn Treebank, as well as creating an annotated corpus of Early New High German.
Business Administrator. Natalia provides direct financial and administrative support to the Linguistic Data Consortium. Manages all financial transactions and financial reporting. Assists director, executive director and project managers in long and short term planning, creation of budgets, and budget analysis. Acts as liaison between LDC staff and management as well as multiple Penn departments, and government sponsors. Manages HR activities and payroll for all full-time and part-time employees.
Chris is a programmer analyst in the data collections group, providing support to local and remote data collection, automated speech recognition, and machine translation systems. He has worked on the LCTL, Greybeard, GALE, RATS, and TRECVID projects and currently supports Mixer, HAVIC, and LRE.
Dr. Christopher Cieri
Executive Director. Chris provides oversight for the Linguistic Data Consortium including planning, operations, project management, external relations and financial performance.
Programmer Analyst. Mike is involved with the Arabic Treebanking program, improving tools for annotation of Arabic corpora and English translations and streamlining the annotation pipeline. He is also involved in the development of online tools for teaching Arabic. His contributions include GUI and web development with an eye towards improving usability. Mike's other pursuits include generative computer music written in the SuperCollider language.
Manager, External Relations. Denise is
responsible for the overall management of the External Relations area
which includes intellectual property rights, licensing, distribution,
publications, membership and the LDC's newswire and broadcast data
Lead Annotator. Joseph acts as lead linguistic annotator and coordinator
on projects including Machine Reading and Knowledge Base Population. His
responsibilities include developing annotation procedures, training and
supervising annotation staff, performing linguistic annotation, and
overseeing annotation studies involving human subjects.
Lead Programmer Analyst. As the founding member of the LDC's technical staff, Dave has played a role in the production, preparation and maintenance of virtually every data collection that has been made available through the LDC. Dave's focus has recently shifted toward design of corpus structures and specifications, design of custom user interfaces for transcription and annotation, and the planning and layout of computational and media resources to accommodate the capture and handling of data in large quantity from varied sources.
Lead Annotator. Kira is responsible for linguistic annotation, guidelines development and annotator training for projects including DARPA Machine Reading.
Dr. Stephen Grimes
Seth works on various aspects of annotation projects for English and Arabic, concerning both morphological annotation and syntactic structure ("treebanking"). He is particularly concerned these days with adapting ideas from natural language processing (NLP) to create new techniques for quality control and consistency checking in treebanks. He is also working on issues related to the integration of Arabic morphological and syntactic information in a pipeline, both for annotation and NLP.
Programmer Analyst. Haejoong's primary tasks include annotation tool development, data modeling and data handling. He is interested in linguistic database systems that enable efficient storage and query of linguistic data, and works on the Querying Linguistic Database (QLDB) project. He is currently managing the Annotation Graph Library (AGLIB) software package, a part of AGTK. He also provides programming support for the Open Language Archives Community.
Dr. Xuansong Li
Xuansong acts as Chinese lead annotator and coordinator for several projects including GALE Word Alignment. She is responsible for hiring and training new annotators, developing annotation guidelines and quality control methods and managing the daily activities of the Chinese annotation team.
Dr. Mark Liberman
Mark is professor of Linguistics and Computer and Information Science at the University of Pennsylvania (1990-) and director of the Linguistic Data Consortium (1992-). His research interests are in phonetics, phonology, speech technology and computational linguistics. He is on the editorial boards of Speech Communication, Computer Speech and Language and The International Journal of Corpus Linguistics. Mark came to Penn after being a member of the technical staff and department head of the Linguistics Research Department at AT&T Bell Laboratories (1975-1990).
Xiaoyi's responsibilities include using speech recognition technologies to collect and create linguistic data, researching data collection technology, contributing to LDC Online and other tasks related to computational linguistics.
As of November 2001, Mohamed Maamouri is a Senior Research Administrator at LDC, where he heads the Arabic Treebanking Group and the development of Arabic resources and projects. From 1995 to 2001 he was the Associate Director of the International Literacy Institute (ILI) at the Graduate School of Education of the University of Pennsylvania. Dr. Maamouri was a Professor of Linguistics and English at the University of Manouba (1967-1995) in Tunisia, and formerly the Director of the Bourguiba Institute of Modern Languages (1975-1988) at the University of Tunis. Dr. Maamouri specializes primarily in Arabic linguistics, reading, language development, corpus linguistics, and sociolinguistics. His other interests include educational linguistics, language and literacy acquisition, language policy and planning, as well as bilingualism and multilingual issues.
Mark Mandel
Mark is the Research Administrator of PennBioIE and Less Commonly Taught Languages. For the former, he plans and manages the annotation of biomedical texts. For the latter, he does linguistic work for the collection and development of resources in languages where they are in short supply. He views the coordination part of his work as "Oh, it's an interpretation task! I can deal with that."
Lead Annotator and Technical Assistant. Justin works on English Treebank projects; his responsibilities include performing linguistic annotation, training and managing part-time annotation staff and providing technical support. Prior to joining the LDC in 2005, his graduate work concentrated on historical linguistics, Sanskrit and Paninian grammar.
Application Developer. As part of the delivery team, Kate works on sanity checking outgoing packages and scripting infrastructure for building deliveries. She supports web-based annotation with work on the database backend. She also supports the Chinese and Arabic word alignment annotation pipelines for BOLT. Kate has an interdisciplinary background in logic and is interested in machine learning, functional programming, belief revision, and logic systems with more than two truth values.
Marketing/Communications Coordinator. Marian develops marketing strategies to grow the LDC's community, conducts the LDC's market research and identifies target markets.
Project Manager. Zhiyi manages translation activities for sponsored programs including GALE and NIST Open MT. Previously Zhiyi managed GALE Distillation and ACE and acted as lead annotator for several Chinese annotation efforts.
Senior Associate Director. Stephanie manages LDC's Annotation Group and oversees linguistic resource development for sponsored programs including GALE, MADCAT, Machine Reading, ACE, REFLEX, EARS, TIDES and NIST Open Technology Evaluations including MT, TAC, RT and TREC Video Events.
Kevin Walker
Kevin has been on the programming staff at the LDC since 1998. He is the POC for the LDC's video contributions to TREC-VID and VACE. His areas of responsibility include:
Lead Annotator/Coordinator. Dalal is responsible for Arabic translation resources including parallel text and word alignment. She manages the Arabic translation and word alignment teams, performs quality control and develops annotation guidelines. Past projects have included Biomedical Information Extraction (ITR/E) and TIDES.
Lead Annotator. Ramez acts as Arabic lead annotator for ACE and GALE Transcription, and his duties include annotator hiring, training, management and quality control. He has also acted as lead annotator for the Biomedical Information Extraction project. Ramez has a B.S. in biological sciences, an M.D. and a Master's degree in Microbiology from Egypt.
Dr. Steven Bird
Senior Research Associate. Steven works on data models, formats and tools for managing language resources. Steven is Associate Professor of Computer Science at Melbourne University (Australia) and collaborates on several LDC research projects.
Arabic Language Analyst. Safa performs linguistic annotation and analysis, supervises Arabic annotation staff and coordinates Arabic human subjects collection. He supports multiple projects including GALE, MADCAT and TransTac.
Programmer/Analyst. Gary performs technical support and data/tool development for LDC annotation projects including GALE translation and word alignment.
Wajdi is currently managing the Arabic POS and Treebank workflow effort locally and remotely. He previously worked as an Arabic computational linguist at Nstein Technologies in Montreal and then at the JRC, a research center of the European Commission in Italy. His main interests are Arabic computational linguistics, in particular named entity extraction techniques, Propbank and lexicon creation. Wajdi holds a B.A. in computational linguistics from the University of Quebec at Montreal and an M.A. in linguistics from the University of Montreal.
External Relations Programmer. Angelo supports LDC's External Relations Group by developing and maintaining LDC's business systems and coordinating and preparing publications of language resources.
Tony is the primary programmer for the LDC Publications Group and is responsible for converting, documenting, and verifying LDC publications, as well as managing and training publications staff in order to assure the release of the publications on their scheduled dates. He has contact with LDC data providers and sponsors to determine data quality, structure, and output specifications.
Bill is a Systems Administrator for the LDC Systems Department with over twelve years of experience in administering BSD systems. He has a BA in Philosophy from Rutgers University in New Brunswick, NJ, and is currently taking graduate courses in Computer Information Systems and Linguistics at Penn. He lives on a boat on the Delaware River and is an avid bicyclist.
A graduate student and teaching assistant at ISLT, Basma is serving as a junior visiting scholar. While here Basma will focus on learning Arabic Treebank II annotation and methodology.
Until August 2005, James was the Manager of the External Relations areas of Membership and Intellectual Property Rights (IPR) & Licensing. He had oversight for LDC's Newswire & Broadcast News data collection efforts and coordinated LDC data resources and licenses with various research projects and sponsored evaluations.
Project Manager. Meghan manages speech annotation projects including GALE, RT and Phanotics. She also manages machine translation post-editing efforts for the GALE and MADCAT projects. Previously Meghan acted as lead annotator and manager for multiple LDC projects including HARD, TDT, TREC and DUC.
Financial Coordinator. Johnathan has several financial duties, including processing financial transactions, maintaining records of activities and funding schedules for all LDC grants and administrative accounts, resolving issues with vendors, reconciling transactions, resolving wayward charges, and assisting the Business Administrator with grant management and audits.
Chad Jackson is a systems programmer who is responsible for creating and administering servers for the LDC. He also provides systems-level programming for the LDC as well as desktop, workstation, and server support. Chad is currently pursuing a Master's degree in Computer Science at the University of Pennsylvania.
Sr. Research Programmer/Manager of Software Development. Kazuaki leads a group of technical staff who create technical and research resources for LDC's data creation projects, such as GALE, MADCAT, Machine Reading, NIST OpenMT, and TAC/KBP. He is also a linguist with interests in phonetics, phonology and computational linguistics.
Nii Olokwei Martey
Until Summer 2005, Nii managed and coordinated various work areas at the LDC, including collection, recruitment, annotation, training and quality control processes. Starting in 1996, Nii worked on and managed numerous projects at the LDC, including HUB4, HUB5, SPINE and the TDT project.
Human Subjects Data Collection Coordinator. Abby coordinates various human speech collection projects including Mixer, LVDID, and Greybeard. She helps develop guidelines, train new employees, and handle payments.
Until Fall 2005, Mike collected computational linguistic resources for Less Commonly Taught Languages (aka "Low Density Languages"). He is particularly interested in techniques to rapidly analyze the morphology of a language, including both ways of assisting linguists to do morphological and phonological analysis, and machine learning methodologies. His research interests include phonology and morphology, particularly as these relate to computational linguistics. In the past, he has helped document endangered languages of Ecuador and Colombia.
J. Michael Schultz
Until Summer 2005, Mike researched and developed search technology for use at the LDC. Two main applications of his research are LDC Online and text annotation. Mike's main interests are in information retrieval, topic detection and tracking and information extraction.
Manager. Heather manages information extraction projects including GALE
Distillation, TAC and TRECVid Events. Previously Heather managed the
Less Commonly Taught Languages project, and acted as lead annotator for LCTL, ACE and Distillation.
Project Manager. Until December 2006, Shudong functioned as a Project Manager for the collection and annotation of language data (especially Mandarin/English). He established procedures for projects and trained and supervised his staff. Shudong also provided needs assessment and high-level design of project interfaces, and represented the LDC's efforts to sponsors and the research community.
Programmer Analyst. Until Fall 2006, Ke developed software for collecting, processing, and delivering newswire data. He was also responsible for designing and implementing annotation interfaces for broadcast audio data. He processed and created parallel-language data for the MT research community. He took principal roles in creating the Gigaword, Cynewulf, and Aquaint corpora.
Dr. Olga Babko-Malaya
Project Manager, Text Annotation. Olga works on projects involving semantic annotation, including the distillation task for GALE and Arabic Propbank. She previously contributed to the English Propbank and Ontobank projects and managed the Propbank II annotation effort. Her work experience includes knowledge engineering at Teknowledge Corporation and research in computational linguistics at the Academy of Sciences in St. Petersburg, Russia. Olga holds a PhD in Linguistics from Rutgers University with a specialization in lexical and formal semantics and the syntax-semantics interface.
Senior Programmer Analyst. Tim is involved in Arabic POS-tagging, Treebanking, and EARS. His Arabic morphological parser is distributed by the LDC. His primary research interest is Arabic corpus-based lexicography. He previously was involved in Arabic MT at Alpnet and Arabic text input methods for cell phones at Tegic/AOL.
Andrew W. Cole
Andy is the Associate Director for Operations, which includes External Relations, Publications, Management Information Systems (MIS), and IT/System Administration. He is married to Janet Lewis, MSN/CNM (Certified Nurse Midwife), and lives in West Philadelphia.
Frank Di Maria
Programmer. Frank works for the External Relations department and has been active in the development of the web interface for LDC business systems, developing licensing and other sales data for the LDC.
External Data Coordinator. Lauren oversees technical infrastructure development for the Annotation Group and for externally funded projects including GALE. Previously Lauren acted as project manager for translation, and managed outsourced annotation
Huaichuan "Hubert" Jin
Hubert is a senior programmer analyst at LDC. His current primary project is Arabic Treebanking, where he develops tools and manages data for the project. His interests are information retrieval, machine learning, natural language processing and speech recognition. He was formerly a researcher at BBN, working on Hub4 and TDT.
Mark is responsible for the design, implementation and day-to-day operation of the LDC's
Dr. Wigdan A. Mekki
Research Coordinator. Wigdan joined the LDC in April of 2002. Before joining the LDC, she worked as a computational linguist at France-Telecom Research and Development as a Post-Doc Fellow. Her research dealt with morpho-syntactic analysis, summarization and grammar formalization. She earned her Ph.D. in France at the University of Lyon. Her mission at the LDC is to work as a lead linguist on multiple projects involving the creation of Arabic language resources, development of annotation guidelines and quality assurance measures. She will be providing training, support, and documentation for the annotation staff. Her primary project is currently Arabic TreeBanking.
Julie develops tools for use by the annotation group. Her current primary project is ACE; she is responsible for developing tools for the creation, conversion, validation and analysis of annotation data. In addition, she is in charge of the workflow system used to manage several annotation projects.
From February 2005 till January 200, Shawn designed user interfaces, developed web applications, and helped maintain systems infrastructure. He was most proud of his work on LDC Online and the Annotation Collection Kit (ACK). His main interests were user-centered design, data mining and visualization, internet technologies, large-scale systems and virtual machines.
Coordinator. Alexis is responsible for human subject coordination for LDC's Telephone Collections. Previously she acted as English lead annotator for the ACE Project.
Project Manager, Information Extraction. Christopher coordinates several Information Extraction annotation projects including ACE, and he manages the Less Commonly Taught Languages project. His primary responsibilities include working with program sponsors and affiliated researchers to specify corpora and annotation tasks, identifying deliverables that support program goals, and managing the execution of those deliverables.
Carrie Ann Theisen
Lead Annotator. Carrie is the Lead Annotator for the Less Commonly Taught Languages project. She is responsible for hiring, training, and managing native speakers of Less Commonly Taught Languages.
Office Manager. Keith's responsibilities include managing the business office, weekly payroll, and facilities liaison.
Last modified: Friday, 08-Jan-2010 03:24:39 EDT
© 1996-2010 Linguistic Data Consortium, University of Pennsylvania. All Rights Reserved.