LDC Papers

The following papers, presented or published by LDC staff, are listed by year and then alphabetically by the last name of the first author.

2007

Christopher Cieri
Phonological Variation in Multi-Dialectal Italy: distinguishing e from ε
NWAV 2007, Philadelphia, October 11-14, 2007
Available: Presentation Slides

Christopher Cieri, Stephanie Strassel, Meghan Lammie Glenn, Lauren Friedman
Linguistic Resources in Support of Various Evaluation Metrics
MT Summit XI, Workshop on Automatic Procedures in MT Evaluation, Copenhagen, September 9-14, 2007
Available: Presentation Slides

Christopher Cieri, Linda Corson, David Graff, Kevin Walker
Resources for New Research Directions in Speaker Recognition: The Mixer 3, 4 and 5 Corpora
Interspeech 2007, Antwerp, August 2007.
Available: Paper in PDF, Presentation Slides

K. Ganchev, K. Crammer, F. Pereira, G. Mann, K. Bellare, A. McCallum, S. Carroll, Y. Jin, P. White
Penn/UMass/CHOP Biocreative II Systems
Biocreative 2. [In Press]
Available: Paper in PDF

Kuzman Ganchev, Fernando Pereira, Mark Mandel, Steven Carroll, Peter White
Semi-automated Named Entity Annotation
Linguistic Annotation Workshop 2007 [In Press]
Available: PDF

2006

Olga Babko-Malaya, Ann Bies, Ann Taylor, Szuting Yi, Martha Palmer, Mitch Marcus, Seth Kulick, Libin Shen
Issues in Synchronizing the English Treebank and PropBank
Frontiers in Linguistically Annotated Corpora, A Merged Workshop with 7th International Workshop on Linguistically Interpreted Corpora (LINC-2006) and Frontiers in Corpus Annotation III, Coling/ACL 2006
Available: Paper in PDF

Ann Bies, Stephanie Strassel, Haejoong Lee, Kazuaki Maeda, Seth Kulick, Yang Liu, Mary Harper, Matthew Lease
Linguistic Resources for Speech Parsing
LREC 2006: Fifth International Conference on Language Resources and Evaluation
Available: Paper in PDF

Christopher Cieri
Linguistic Resources, Development and Evaluation
Chapter 8 in Laila Dybkjær, Holmer Hemsen and Wolfgang Minker (eds.) Evaluation of Text and Speech Systems, Kluwer Academic Publishers
Available: Forthcoming

Christopher Cieri, Mark Liberman, Victoria Arranz and Khalid Choukri
Linguistic Data Resources
Chapter 3 in Tanja Schultz and Katrin Kirchhoff (eds.) Multilingual Speech Processing, Elsevier, Academic Press, ISBN 13: 978-0-12-088501-5. April 2006.
Available: Elsevier's Page

Christopher Cieri
What is Quality? Invited Talk at the Workshop on Quality Assurance and Quality Measurement for Language and Speech Resources
LREC 2006: Fifth International Conference on Language Resources and Evaluation
Available: Presentation Slides

Christopher Cieri, Mark Liberman
More Data and Tools for More Languages and Research Areas: A Progress Report on LDC Activities
LREC 2006: Fifth International Conference on Language Resources and Evaluation
Available: Paper in PDF, Presentation Slides

Christopher Cieri, Walt Andrews, Joseph P. Campbell, George Doddington, Jack Godfrey, Shudong Huang, Mark Liberman, Alvin Martin, Hirotaka Nakasone, Mark Przybocki, Kevin Walker
The Mixer and Transcript Reading Corpora: Resources for Multilingual, Crosschannel Speaker Recognition Research
LREC 2006: Fifth International Conference on Language Resources and Evaluation
Available: Paper in PDF, Presentation Slides

Ryan Gabbard, Seth Kulick, Mitchell Marcus
Fully Parsing the Penn Treebank
HLT-NAACL, 2006
Available: Paper in PDF

David Graff, Tim Buckwalter, Hubert Jin, Mohamed Maamouri
Lexicon Development for Varieties of Spoken Colloquial Arabic
LREC 2006: Fifth International Conference on Language Resources and Evaluation
Available: Paper in PDF

Yang Jin, Ryan McDonald, Kevin Lerman, Mark Mandel, Steven Carroll, Mark Y Liberman, Fernando Pereira, Raymond Winters, Peter White
Automated recognition of malignancy mentions in biomedical literature
Open Access: BMC Bioinformatics 7:492
Available: Paper in PDF

Xiaoyi Ma
Champollion: A Robust Parallel Text Sentence Aligner
LREC 2006: Fifth International Conference on Language Resources and Evaluation
Available: Paper in PDF

Xiaoyi Ma, Christopher Cieri
Corpus Support for Machine Translation at LDC
LREC 2006: Fifth International Conference on Language Resources and Evaluation
Available: Paper in PDF

Mohamed Maamouri, Ann Bies, Seth Kulick
Diacritization: A Challenge to Arabic Treebank Annotation and Parsing
Machine Translation SIG of the British Computer Society Conference
Available: Paper in PDF

Mohamed Maamouri, Ann Bies, Tim Buckwalter, Mona Diab, Nizar Habash, Owen Rambow, Dalila Tabessi
Developing and Using a Pilot Dialectal Arabic Treebank
LREC 2006: Fifth International Conference on Language Resources and Evaluation
Available: Paper in PDF

Kazuaki Maeda, Christopher Cieri, Kevin Walker
Low-cost Customized Speech Corpus Creation for Speech Technology Applications
LREC 2006: Fifth International Conference on Language Resources and Evaluation

Kazuaki Maeda, Haejoong Lee, Julie Medero, Stephanie Strassel
A New Phase in Annotation Tool Development at the Linguistic Data Consortium: The Evolution of the Annotation Graph Toolkit
LREC 2006: Fifth International Conference on Language Resources and Evaluation

Mark Mandel
Integrated Annotation of Biomedical Text: Creating the PennBioIE Corpus
Presented at Text Mining, Ontologies and Natural Language Processing in Biomedicine, Manchester, UK, March 20-21, 2006
Available: Abstract, Presentation Slides in PDF

Ryan McDonald, Kevin Lerman, and Fernando Pereira
Multilingual Dependency Parsing with a Two-Stage Discriminative Parser
Computational Natural Language Learning (CoNLL-X), 2006
Available: Paper in PDF

Julie Medero, Kazuaki Maeda, Stephanie Strassel, Christopher Walker
An Efficient Approach for Gold-Standard Annotation: Decision Points for Complex Tasks
LREC 2006: Fifth International Conference on Language Resources and Evaluation
Available: Paper in PDF

Stephanie Strassel, Andrew W. Cole
Corpus Development and Publication
LREC 2006: Fifth International Conference on Language Resources and Evaluation
Available: Paper in PDF, Poster in PPT

Stephanie Strassel, Christopher Cieri, Andy Cole, Denise DiPersio, Mark Liberman, Xiaoyi Ma, Mohamed Maamouri, Kazuaki Maeda
Integrated Linguistic Resources for Language Exploitation Technologies
LREC 2006: Fifth International Conference on Language Resources and Evaluation
Available: Paper in PDF, Presentation Slides

Jiahong Yuan, Mark Liberman, Christopher Cieri
Towards an Integrated Understanding of Speaking Rate in Conversation
The Ninth International Conference on Spoken Language Processing (Interspeech 2006 - ICSLP), Pittsburgh, Pennsylvania
Available: Paper in PDF, Presentation Slides

2005

Ann Bies, Seth Kulick, Mark Mandel
Parallel Entity and Treebank Annotation
Presented at Frontiers in Corpus Annotation II: Pie in the Sky, ACL 2005 workshop, Ann Arbor, June 29, 2005
Available: Paper in PDF

Violetta Cavalli-Sforza, Mohamed Maamouri
Extensions to Histogram-Based Student Modeling Approach to Facilitate Reading in Morphologically Complex Languages
AIED: International Conference on Artificial Intelligence in Education
Available: Paper in PDF

Christopher Cieri
HLT Evaluation: The Role of Data Centers
ELRA HLT Evaluation Workshop, Malta, December 2005
Available: Presentation Slides

Christopher Cieri
Modeling Phonological Variation in Multidialectal Italy
University of Pennsylvania, Doctoral Dissertation, May 2005
Available: PDF from ProQuest

Yang Jin, Ryan T. McDonald, Kevin Lerman, Mark A. Mandel, Mark Y. Liberman, Fernando Pereira, R. Scott Winters, Peter S. White
Identifying and Extracting Malignancy Types in Cancer Literature
Presented at BioLink 2005: ISMB/ACL, Detroit, June 24, 2005
Available: Paper in PDF

Mohamed Maamouri
Arabic Literacy
Lemma, 11,16 in Encyclopedia of Arabic Language and Linguistics (EALL). Vol 2
Available: Paper in PDF

Ryan McDonald, Fernando Pereira, Seth Kulick, Scott Winters, Yang Jin, and Peter White
Simple Algorithms for Complex Relation Extraction with Applications to Biomedical IE
43rd Annual Meeting of the Association for Computational Linguistics, 2005
Available: Paper in PDF

2004

Tim Buckwalter (2004)
Issues in Arabic Orthography and Morphology Analysis
Proceedings of the Workshop on Computational Approaches to Arabic Script-based Languages, COLING 2004, Geneva, August 28, 2004.
Available: Paper in PDF

Christopher Cieri, Joseph P. Campbell, Hirotaka Nakasone, David Miller, Kevin Walker
The Mixer Corpus of Multilingual, Multichannel Speaker Recognition Data
LREC 2004: Fourth International Conference on Language Resources and Evaluation, Lisbon
Available: Paper in PDF, Poster in PowerPoint Format

Christopher Cieri, Mark Liberman
Progress Report from the Linguistic Data Consortium: recent activities in resource creation and distribution and the development of tools and standards
LREC 2004: Fourth International Conference on Language Resources and Evaluation, Lisbon
Available: Paper in PDF, Presentation Slides

Christopher Cieri, David Miller, Kevin Walker
The Fisher Corpus: a Resource for the Next Generations of Speech-to-Text
LREC 2004: Fourth International Conference on Language Resources and Evaluation, Lisbon
Available: Paper in PDF, Presentation Slides

George Doddington, Alexis Mitchell, Mark Przybocki, Lance Ramshaw, Stephanie Strassel, Ralph Weischedel
Automatic Content Extraction (ACE) program - task definitions and performance measures
LREC 2004: Fourth International Conference on Language Resources and Evaluation
Available: Paper in PDF

Shudong Huang, Stephanie Strassel, Alexis Mitchell, Zhiyi Song
Shared Resources for Multilingual Information Extraction and Challenges in Named Entity Annotation
IJCNLP-04 Workshop on Named Entity Recognition for NLP Applications, Hainan Island, China, March 2004
Available: Paper in PDF

Seth Kulick, Ann Bies, Mark Liberman, Mark Mandel, Ryan McDonald, Martha Palmer, Andrew Schein, Lyle Ungar, Scott Winters, Pete White
Integrated Annotation for Biomedical Information Extraction
Presented at HLT/NAACL Workshop BioLink 2004, Boston, May 2-7, 2004
Available: Paper in PDF, Presentation Slides

Mohamed Maamouri and Ann Bies (2004)
Developing an Arabic Treebank: Methods, Guidelines, Procedures, and Tools
Proceedings of the Workshop on Computational Approaches to Arabic Script-based Languages, COLING 2004, Geneva, August 28, 2004.
Available: Paper in PDF

Mohamed Maamouri, Tim Buckwalter, and Christopher Cieri (2004)
Dialectal Arabic Telephone Speech Corpus: Principles, Tool Design, and Transcription Conventions
Paper presented at the NEMLAR International Conference on Arabic Language Resources and Tools, Cairo, Sept. 22-23, 2004.
Available: Paper in PDF, Presentation Slides.

Mohamed Maamouri, Ann Bies, Tim Buckwalter, and Wigdan Mekki (2004)
The Penn Arabic Treebank: Building a Large-Scale Annotated Arabic Corpus
Paper presented at the NEMLAR International Conference on Arabic Language Resources and Tools, Cairo, Sept. 22-23, 2004.
Available: Paper in PDF

Mohamed Maamouri, David Graff, Hubert Jin, Christopher Cieri, and Tim Buckwalter (2004)
Dialectal Arabic Orthography-based Transcription and CTS Levantine Arabic Collection.
Paper presented at the Parallel STT-NA Tracks Session of the EARS RT-04 Workshop, Palisades IBM Executive Center, New York, Nov. 10, 2004.
Available: Paper in Word format

Ryan McDonald, R. Scott Winters, Mark Mandel, Yang Jin, Peter S. White, Fernando Pereira
An entity tagger for recognizing acquired genomic variations in cancer literature
Bioinformatics 20:3249-3251
Available: Paper in PDF

Kazuaki Maeda and Stephanie Strassel (2004)
Annotation Tools for Large-Scale Corpus Development: Using AGTK at the Linguistic Data Consortium.
LREC 2004: Fourth International Conference on Language Resources and Evaluation
Available: Paper in PDF

Mike Maxwell
From Legacy Lexicon to Archivable Resource
First Steps for Language Documentation of Minority Languages: Workshop on Computational Linguistic Tools for Morphology, Lexicon and Corpus Compilation, LREC 2004
Available: Paper in PDF

Douglas Oard, Dagobert Soergel, G. Craig Murray, David Doermann, Jianqiang Wang, Bhuvana Ramabhadran, Martin Franz, James Mayfield and Samuel Gustman, Stephanie Strassel
Building an Information Retrieval Test Collection for Spontaneous Conversational Speech
27th Annual International ACM SIGIR Conference (SIGIR2004), Sheffield, England, July 2004
Available: Paper in PDF

Stephanie Strassel
Linguistic Resources for Effective, Affordable, Reusable Speech-to-Text
LREC 2004: Fourth International Conference on Language Resources and Evaluation
Available: Paper in PDF

Colin Warner, Ann Bies, Christine Brisson, Justin Mott
Addendum to the Penn Treebank II Style Bracketing Guidelines: BioMedical Treebank Annotation
November 2004
Available: Paper in PDF, Paper as web page, Paper in plain text

2003

Steven Bird and Gary Simons (2003)
Seven Dimensions of Portability for Language Documentation and Description
Language 79, 557-582.
Available: Paper in PDF

Steven Bird and Gary Simons (2003)
Extending Dublin Core Metadata to support the description and discovery of language resources
Computers and the Humanities 37, 375-388.
Available: Paper in PDF

Christopher Cieri, Stephanie Strassel
Robust Sociolinguistic Methodology: Tools, Data and Best Practices
NWAV 32, Philadelphia, 2003
Available: Presentation Slides

Christopher Cieri, Mike Maxwell, Stephanie Strassel
Core Linguistic Resources for the World's Languages
ELSNET, ENABLER, ICWLR Joint Workshop, Paris, 2003
Available: Presentation Slides

Baden Hughes and Steven Bird (2003)
Grid-Enabling Natural Language Engineering By Stealth
Proceedings of the Workshop on The Software Engineering and Architecture of Language Technology Systems (SEALTS)
Available: arXiv.org

Seth Kulick, Mark Liberman, Martha Palmer, and Andrew Schein
Shallow Semantic Annotation of Biomedical Corpora for Information Extraction
ISMB Special Interest Group Meeting on Text Mining (BioLink). June 2003. Brisbane, Australia
Available: Paper in PDF, Presentation Slides

Mike Maxwell
Incremental Grammar Development using Finite State Tools
Proceedings of the Workshop on Finite-State Methods in Natural Language Processing, EACL 10, Budapest, 13-14 April 2003.
Available: Paper in PDF

Gary Simons and Steven Bird (2003)
The Open Language Archives Community: An infrastructure for distributed archiving of language resources
Literary and Linguistic Computing 18 (in press)
Available: arXiv.org

Gary Simons and Steven Bird (2003)
Building an Open Language Archives Community on the OAI Foundation
Library Hi Tech 21, 210-218, Special Issue on Open Archives Initiative Metadata Harvesting.
Available: Paper in PDF

Stephanie Strassel, David Miller, Kevin Walker, Christopher Cieri (2003)
Shared Resources for Robust Speech-to-Text Technology
Eurospeech 2003
Available: Paper in PDF

Stephanie Strassel (2003)
Corpus Creation for Disfluency Research
Disfluency in Spontaneous Speech Conference, Gothenburg, Sweden
Available: Abstract in PDF, Presentation Slides in PDF

Stephanie Strassel, Alexis Mitchell, Shudong Huang (2003)
Multilingual Resources for Entity Extraction
41st Annual Meeting of the Association for Computational Linguistics Workshop on Multilingual and Mixed-language Named Entity Recognition: Combining Statistical and Symbolic Models, Sapporo, Japan
Available: Paper in PDF

Stephanie Strassel, Mike Maxwell, Christopher Cieri (2003)
Linguistic Resource Creation for Research and Technology Development: A Recent Experiment
Association for Computing Machinery Transactions on Asian Language Information Processing (TALIP). Volume 2, Issue 2, 101-117
Available: Paper in PDF

2002

Steven Bird, Kazuaki Maeda, Xiaoyi Ma, Haejoong Lee, Beth Randall, and Salim Zayat (2002)
TableTrans, MultiTrans, InterTrans and TreeTrans: Diverse Tools Built on the Annotation Graph Toolkit
Proceedings of the Third International Conference on Language Resources and Evaluation
Available: arXiv.org

Christopher Cieri, Stephanie Strassel, David Graff, Nii Martey, Kara Rennert and Mark Liberman (2002)
Corpora for Topic Detection and Tracking
James Allan, ed. Topic Detection and Tracking: Event-based Information Organization, Kluwer International Series on Information Retrieval, Bruce Croft, series editor, Boston, Kluwer Academic Publishers.

Christopher Cieri, David Miller, Kevin Walker (2002)
Research Methodologies, Observations and Outcomes in (Conversational) Speech Data Collection
HLT 2002 The Human Language Technologies Conference, San Diego, CA, March 2002
Available: Notebook Paper.

Christopher Cieri, Stephanie Strassel, William Labov
Sharable Resources for Sociolinguistic Research
NWAV31, Stanford, 2002
Available: Presentation Slides

Scott Cotton and Steven Bird (2002)
An Integrated Framework for Treebanks and Multilayer Annotations
Proceedings of the Third International Conference on Language Resources and Evaluation
Available: arXiv.org

Xiaoyi Ma, Haejoong Lee, Steven Bird and Kazuaki Maeda (2002)
Models and Tools for Collaborative Annotation
Proceedings of the Third International Conference on Language Resources and Evaluation
Available: arXiv.org

Mohamed Maamouri, Christopher Cieri
Resources for Arabic Natural Language Processing
International Symposium on Processing Arabic, Tunis, April 2002
Available: Presentation Slides

Kazuaki Maeda, Steven Bird, Xiaoyi Ma, and Haejoong Lee (2002)
Creating Annotation Tools with the Annotation Graph Toolkit
Proceedings of the Third International Conference on Language Resources and Evaluation
Available: arXiv.org

Mike Maxwell, Gary Simons, and Larry Hayashi (2002)
A Morphological Glossing Assistant
Proceedings of the International LREC Workshop on Resources and Tools in Field Linguistics
Available: Paper in PDF

Mike Maxwell (2002)
Resources for Morphology Learning and Evaluation
LREC 2002: Third International Conference on Language Resources and Evaluation vol. III, 967-974
Available: Paper in PDF

2001

Christopher Cieri, David Graff, David Miller, Kevin Walker (2001)
Resources and Infrastructure to Support Robust, Omnipresent ASR
Communicator, SPINE, ROAR Workshop, Orlando, November 2001
Available: Presentation Slides.

Christopher Cieri, Andy Cole, Dave Graff, Nii Martey, Stephanie Strassel, Cristina Tofan (2001)
SPINE 2001 Data Preparation and Annotation and the SPINE Corpora
Communicator, SPINE, ROAR Workshop, Orlando, November 2001
Available: Presentation Slides.

Christopher Cieri and Steven Bird (2001)
Annotation Graphs, Annotation Servers and Multi-Modal Resources: Infrastructure for Interdisciplinary Education, Research and Development
Proceedings of the Association for Computational Linguistics: Workshop on Sharing Tools & Resources, Toulouse, July 2001
Available: Paper in PDF, Presentation Slides.

Lea Christiansen, Christopher Cieri, Kathleen Egan, Anita Kulman, Milton Paul (2001)
Getting SMART about Authoring
CALICO 2001, University of Central Florida, Orlando, March 2001
Available: Presentation Slides.

David Miller, Christopher Cieri and Kevin Walker (2001)
Switchboard Cellular Resources for Speaker Recognition
Speaker Recognition Workshop, Maritime Institute of Technology and Graduate Studies, Linthicum MD, March 2001
Available: Presentation Slides.

Stephanie Strassel, Christopher Cieri and Steven Bird (2001)
Shared Resources and Community Building for Corpus Linguistics and Language Teaching
Corpus Linguistics and Language Teaching Workshop, Boston, MA, March 2001
Available: Presentation Slides.

Stephanie Strassel and Christopher Cieri
Data and Annotations for SocioLinguistics: A Corpus-Based Approach to Sociolinguistic Research
Penn Linguistic Colloquium, Philadelphia, PA, March 2001
Available: Presentation Slides.

Steven Bird and Mark Liberman (2001)
A formal framework for linguistic annotation
Speech Communication 33(1,2), pp 23-60.
Available: arXiv.org

Steven Bird, Gary Simons and Chu-Ren Huang (2001)
The Open Language Archives Community and Asian Language Resources
Proceedings of the Workshop on Language Resources in Asia, 6th Natural Language Processing Pacific Rim Symposium (NLPRS), Tokyo, November 2001.
Available: arXiv.org

Steven Bird and Gary Simons (2001)
The OLAC Metadata Set and Controlled Vocabularies
Proceedings of the ACL Workshop on Sharing Tools and Resources for Research and Education, Toulouse, July 2001, pp 7-18.
Available: arXiv.org

Kazuaki Maeda, Steven Bird, Xiaoyi Ma and Haejoong Lee (2001)
The Annotation Graph Toolkit: Software Components for Building Linguistic Annotation Tools
Proceedings of HLT 2001 The Human Language Technologies Conference, San Diego, CA, March 2001
Available: Paper in PDF

Kazuaki Maeda and Steven Bird (2001)
A Framework for Annotating Animal Bioacoustic Data
The 142nd Meeting of the Acoustical Society of America, Chicago, June 2001
Available: Presentation Slides (Powerpoint).

2000

Steve Cassidy and Steven Bird (2000)
Querying databases of annotated speech
Proceedings of the Eleventh Australasian Database Conference
Available: Paper in PDF

Christopher Cieri (2000)
Multiple Annotation of Reuseable Data Resources: Corpora for Topic Detection and Tracking
In Rajman, M. and J. C. Chappelier, eds. (2000) Actes des 5es Journées internationales d'analyse statistique des données textuelles, volume 1, École Polytechnique Fédérale de Lausanne
Available: Paper in PDF

Christopher Cieri (2000)
Issues and Tools for Annotating a Corpus of Sociolinguistic Field Data
Linguistic Exploration Workshop in conjunction with the Linguistic Society of America Annual Meeting, Chicago, January 2000
Available: Presentation Slides

Christopher Cieri, David Graff, Nii Martey, Stephanie Strassel (2000)
The TDT-3 Text and Speech Corpus
Presented at the Topic Detection and Tracking Workshop, Vienna, Virginia, February 28 - March 1, 2000.
Available: Paper in PostScript

Christopher Cieri, Dave Graff, Mark Liberman, Nii Martey and Stephanie Strassel (2000)
Large Multilingual Broadcast News Corpora for Cooperative Research in Topic Detection and Tracking: The TDT2 and TDT3 Corpus Efforts
In Proceedings of the Second International Language Resources and Evaluation Conference, Athens, Greece, May 2000.
Available: Paper in PDF

Christopher Cieri and Mark Liberman (2000)
Issues in Corpus Creation and Distribution: the Evolution of the Linguistic Data Consortium
In Proceedings of the Second International Language Resources and Evaluation Conference, Athens, Greece, May 2000.
Available: Paper in PDF

David Graff and Steven Bird (2000)
Many uses, many annotations for large speech corpora: Switchboard and TDT as case studies
2nd Language Resources and Evaluation Conference (LREC 2000) Athens, Greece, May 2000
Available: Paper in PDF, Paper in PostScript

Dave Graff, Stephanie Strassel and Christopher Cieri (2000)
Resources, New and Forthcoming, from LDC
Presented at the 2000 Speech Transcription Workshop, University of Maryland, May 16-19, 2000.
Available: Presentation Slides

Stephanie Strassel, Dave Graff, Nii Martey and Christopher Cieri (2000)
Quality Control in Large Annotation Projects Involving Multiple Judges: The Case of the TDT Corpora.
In Proceedings of the Second International Language Resources and Evaluation Conference, Athens, Greece, May 2000.
Available: Paper in PDF

1999

Steven Bird and Mark Liberman (1999)
A Formal Framework for Linguistic Annotation
Technical Report MS-CIS-99-01 - Department of Computer and Information Science, University of Pennsylvania
(expanded from version presented at ICSLP-98, Sydney)
Available: Paper in PDF

Steven Bird and Mark Liberman (1999)
Annotation graphs as a framework for multidimensional linguistic data analysis
Towards Standards and Tools for Discourse Tagging -- Proceedings of the Workshop, Somerset, NJ: Association for Computational Linguistics
Available: Paper in PDF

Steven Bird (1999)
Multidimensional exploration of online linguistic field data
Proceedings of the 29th Annual Meeting of the Northeast Linguistics Society, University of Massachusetts at Amherst.
Available: Paper in PDF

Steven Bird and Stephanie Strassel (1999)
Annotated Corpora in Linguistic Research
North American Symposium on Corpora in Linguistics and Language Teaching, University of Michigan, May 21, 1999.
Available: Presentation Slides

Alexandra Canavan, Kevin Walker, David Graff and Christopher Cieri (1999)
Telephone Speech Corpora: New Needs, Languages, Methods and Technology
Presented at the Hub-5 Conversational Speech Understanding (LVCSR) Workshop, Maritime Institute of Technology and Graduate Studies, Linthicum Heights, Maryland, June 1999.
Available: Presentation Slides

Christopher Cieri, David Graff, Mark Liberman, Nii Martey, Stephanie Strassel (1999)
The TDT-2 Text and Speech Corpus
Presented at the DARPA Broadcast News Workshop, Washington, DC, February 1999.
Available: Paper in PDF

Christopher Cieri (1999)
This Ain't Your Father's Digital Data: Another Perspective on Legal Information
Presented at CALI 1999 - The Conference for Law School Computing, Eugene, Oregon, June 1999.
Available: Presentation Slides, Video in RealMedia

Xiaoyi Ma and Mark Liberman (1999)
BITS: A Method for Bilingual Text Search over the Web
Machine Translation Summit VII, September 13th, 1999, Kent Ridge Digital Labs, National University of Singapore
Available: Paper in Postscript, Paper in PDF

Xiaoyi Ma (1999)
Parallel Text Collections at the Linguistic Data Consortium
Machine Translation Summit VII, September 13th, 1999, Kent Ridge Digital Labs, National University of Singapore
Available: Paper in Postscript

Stephanie Strassel (1999)
Corpus Creation and Quality Control at the LDC
Presented at the Corpus of Spoken Dutch Workshop; Tilburg, Netherlands; November 12, 1999.
Available: Presentation Slides

Stephanie Strassel and Christopher Cieri (1999)
Corpus Sociolinguistics: Issues, Data and Tools
Presented at NWAVE-28, York University, Toronto, Ontario, October 1999.
Available: Presentation Slides

1998

Steven Bird and Mark Liberman (1998)
Towards a Formal Framework for Linguistic Annotations
Proceedings of the 5th International Conference on Spoken Language Processing.
Available: Paper in PDF

Christopher Cieri and David Graff (1998)
Topic Detection and Tracking Corpora
Presented at TREC/SDR Conference, Gaithersburg, Maryland, November 1998.
Available:

David Graff and Christopher Cieri (1998)
Update on Lexical Resources and Projects at the Linguistic Data Consortium
Presented at the Ninth Hub-5 Conversational Speech Recognition (LVCSR) Workshop, Maritime Institute of Technology and Graduate Studies, Linthicum Heights, Maryland, September 1998.
Available:

Mark Liberman and Christopher Cieri (1998)
The Creation, Distribution and Use of Linguistic Data
Proceedings of the First International Conference on Language Resources and Evaluation, Granada, Spain, May 1998.
Available: Paper in PDF

Undated