LDC Papers

 

2013

Violetta Cavalli-Sforza, Hind Saddiki, Karim Bouzoubaa, Lahsen Abouenour, Mohamed Maamouri, Emily Goshey
Bootstrapping a WordNet for an Arabic Dialect from Other WordNets and Dictionary Resources
AICCSA 2013: 10th ACS/IEEE International Conference on Computer Systems and Applications, Fes/Ifrane, May 27-30
Available: Paper in PDF

Christopher Cieri
Sharing, Structuring and Processing Data: Part 1: Advantages and Challenges
Sharing, Structuring and Processing Data Workshop
NWAV42: New Ways of Analyzing Variation, Pittsburgh, October 17-20
Available: Slides in PDF, Video clip

Ramy Eskander, Nizar Habash, Ann Bies, Seth Kulick, Mohamed Maamouri
Automatic Correction and Extension of Morphological Annotations
ACL 2013: 51st Annual Meeting of the Association for Computational Linguistics: Human Language Technologies, Sofia, August 4-9
The 7th Linguistic Annotation Workshop & Interoperability with Discourse
Available: Paper in PDF

Seth Kulick, Ann Bies, Justin Mott, Mohamed Maamouri, Beatrice Santorini, Anthony Kroch
Using Derivation Trees for Informative Treebank Inter-Annotator Agreement Evaluation
NAACL 2013: Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Atlanta, June 9-15
Available: Paper in PDF

Mark Liberman, Jiahong Yuan, Andreas Stolcke, Wen Wang, Vikramjit Mitra
Using Multiple Versions of Speech Input in Phone Recognition
ICASSP 2013: 38th International Conference on Acoustics, Speech, and Signal Processing, Vancouver, May 26-31
Available: Paper in PDF

Vikramjit Mitra, Wen Wang, Andreas Stolcke, Hosung Nam, Colleen Richey, Jiahong Yuan, Mark Liberman
Articulatory Trajectories for Larger-Vocabulary Speech Recognition
ICASSP 2013: 38th International Conference on Acoustics, Speech, and Signal Processing, Vancouver, May 26-31
Available: Paper in PDF

Neville Ryant, Mark Liberman, Jiahong Yuan
Speech Activity Detection on YouTube Using Deep Neural Networks
Interspeech 2013: 14th Annual Conference of the International Speech Communication Association, Lyon, August 25-29
Available: Paper in PDF

Neville Ryant, Mark Liberman, Jiahong Yuan
Automating Phonetic Measurement: The Case of Voice Onset Time
ICA 2013: 21st International Congress on Acoustics, Montreal, June 2-7
Available: Paper in PDF

Neville Ryant, Jiahong Yuan, Mark Liberman
Scale-Space Expansion of Acoustic Features Improves Speech Event Detection
ICASSP 2013: 38th International Conference on Acoustics, Speech, and Signal Processing, Vancouver, May 26-31
Available: Paper in PDF

Wen Wang, Andreas Stolcke, Jiahong Yuan, Mark Liberman
A Cross-language Study on Automatic Speech Disfluency Detection
NAACL-HLT 2013: The Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Atlanta, June 9-15
Available: Paper in PDF

Jiahong Yuan, Neville Ryant, Mark Liberman, Andreas Stolcke, Vikramjit Mitra, Wen Wang
Automatic Phonetic Segmentation Using Boundary Models
Interspeech 2013: 14th Annual Conference of the International Speech Communication Association, Lyon, August 25-29
Available: Paper in PDF

2012

Eleftheria Ahtaridis, Christopher Cieri, Denise DiPersio
LDC Language Resource Papers: Building a Bibliographic Database
LREC 2012: 8th International Conference on Language Resources and Evaluation, Istanbul, May 21-27
Available: Paper in PDF, Poster in PDF

Ann Bies, Denise DiPersio, Mohamed Maamouri
Linguistic Resources for Arabic Machine Translation: The Linguistic Data Consortium (LDC) Catalog
In Abdelhadi Soudi, et al., Challenges for Arabic Machine Translation
Available: John Benjamins Publishing Company

Christopher Cieri, Marian Reed, Denise DiPersio, Mark Liberman
Twenty Years of Language Resource Development and Distribution: A Progress Report on LDC Activities
LREC 2012: 8th International Conference on Language Resources and Evaluation, Istanbul, May 21-27
Available: Slides in PDF

Christopher Cieri, Malcah Yaeger-Dror
Toward the Harmonization of Metadata Practice for Spoken Languages Resources
LREC 2012: 8th International Conference on Language Resources and Evaluation, Istanbul, May 21-27
SpeechCorpora 2012: Workshop on Best Practices for Speech Corpora in Linguistic Research
Available: Paper in PDF

Jennifer Garland, Stephanie Strassel, Safa Ismael, Zhiyi Song, Haejoong Lee
Linguistic Resources for Genre-Independent Language Technologies: User-Generated Content in BOLT
LREC 2012: 8th International Conference on Language Resources and Evaluation, Istanbul, May 21-27
Available: Paper in PDF, Slides in PDF

David Graff, Mohamed Maamouri
Developing LMF-XML Bilingual Dictionaries for Colloquial Arabic Dialects
LREC 2012: 8th International Conference on Language Resources and Evaluation, Istanbul, May 21-27
Available: Paper in PDF, Poster in PDF

Stephen Grimes, Katherine Peterson, Xuansong Li
Automatic Word Alignment Tools to Scale Production of Manually Aligned Parallel Texts
LREC 2012: 8th International Conference on Language Resources and Evaluation, Istanbul, May 21-27
Available: Paper in PDF

Seth Kulick, Ann Bies, Justin Mott
Further Developments in Treebank Error Detection Using Derivation Trees
LREC 2012: 8th International Conference on Language Resources and Evaluation, Istanbul, May 21-27
Available: Paper in PDF, Poster in PDF

Seth Kulick, Ann Bies, Justin Mott
Using Supertags and Encoded Annotation Principles for Improved Dependency to Phrase Structure Conversion
NAACL-HLT 2012: The 2012 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Montreal, June 3-8
Available: Paper in PDF, Poster in PDF

Xuansong Li, Stephanie M. Strassel, Heng Ji, Kira Griffitt, Joe Ellis
Linguistic Resources for Entity Linking Evaluation: from Monolingual to Cross-lingual
LREC 2012: 8th International Conference on Language Resources and Evaluation, Istanbul, May 21-27
Available: Paper in PDF

Xuansong Li, Stephanie Strassel, Stephen Grimes, Safa Ismael, Mohamed Maamouri, Ann Bies, Nianwen Xue
Parallel Aligned Treebanks at LDC: New Challenges Interfacing Existing Infrastructures
LREC 2012: 8th International Conference on Language Resources and Evaluation, Istanbul, May 21-27
Available: Paper in PDF

Xiaoyi Ma
LDC Forced Aligner
LREC 2012: 8th International Conference on Language Resources and Evaluation, Istanbul, May 21-27
Available: Paper in PDF, Poster in PDF

Mohamed Maamouri, Ann Bies, Seth Kulick
Expanding Arabic Treebank to Speech: Results from Broadcast News
LREC 2012: 8th International Conference on Language Resources and Evaluation, Istanbul, May 21-27
Available: Paper in PDF, Poster in PDF

Mohamed Maamouri, Wajdi Zaghouani, Violetta Cavalli-Sforza, David Graff, Mike Ciul
Developing ARET: An NLP-based Educational Tool Set for Arabic Reading Enhancement
NAACL-HLT 2012: Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Montreal, June 3-8
Available: Paper in PDF

Zhiyi Song, Safa Ismael, Stephen Grimes, David Doermann, Stephanie Strassel
Linguistic Resources for Handwriting Recognition and Translation Evaluation
LREC 2012: 8th International Conference on Language Resources and Evaluation, Istanbul, May 21-27
Available: Paper in PDF, Poster in PDF

Stephanie Strassel, Amanda Morris, Jonathan Fiscus, Christopher Caruso, Haejoong Lee, Paul Over, James Fiumara, Barbara Shaw, Brian Antonishek, Martial Michel
Creating HAVIC: Heterogeneous Audio Visual Internet Collection
LREC 2012: 8th International Conference on Language Resources and Evaluation, Istanbul, May 21-27
Available: Paper in PDF, Poster in PDF

Stephanie Strassel, Kevin Walker, Karen Jones, Dave Graff, Christopher Cieri
New Resources for Recognition of Confusable Linguistic Varieties: The LRE11 Corpus
Odyssey 2012: The Speaker and Language Recognition Workshop, Singapore, June 25-28
Available: Paper in PDF, Slides in PDF

Kevin Walker, Stephanie Strassel
The RATS Radio Traffic Collection System
Odyssey 2012: The Speaker and Language Recognition Workshop, Singapore, June 25-28
Available: Paper in PDF

Jonathan Wright, Kira Griffitt, Joe Ellis, Stephanie Strassel, Brendan Callahan
Annotation Trees: LDC's Customizable, Extensible, Scalable Annotation Infrastructure
LREC 2012: 8th International Conference on Language Resources and Evaluation, Istanbul, May 21-27
Available: Paper in PDF

Wajdi Zaghouani
RENAR: A Rule-Based Arabic Named Entity Recognition System
ACM Transactions on Asian Language Information Processing: Volume 11 Issue 1, March 2012
Available: Paper in PDF

Wajdi Zaghouani, Abdelati Hawwari, Mona Diab
A Pilot PropBank Annotation for Quranic Arabic
NAACL-HLT 2012: Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Montreal, June 3-8
Available: Paper in PDF

2011

Handbook of Natural Language Processing and Machine Translation: DARPA Global Autonomous Language Exploitation
With contributions from LDC: Ann Bies, Christopher Caruso, Christopher Cieri, Denise DiPersio, Lauren Friedman, Meghan Lammie Glenn, Stephen Grimes, Gary Krug, Seth Kulick, Haejoong Lee, Xuansong Li, Xiaoyi Ma, Mohamed Maamouri, Kazuaki Maeda, Andrea Mazzucchi, Robert Parker, Heather Simpson, Zhiyi Song, Stephanie Strassel, Kevin Walker, Dalal Zakhary
With contributions from University of Pennsylvania: Mitchell Marcus (School of Arts and Sciences)
Available: Springer

Seth Kulick, Ann Bies, Justin Mott
Using Derivation Trees for Treebank Error Detection
ACL-HLT 2011: 49th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies, Portland, Oregon, June 19-24
Available: Paper in PDF

Xuansong Li, Joe Ellis, Kira Griffitt, Stephanie Strassel, Robert Parker, Jonathan Wright
Linguistic Resources for 2011 Knowledge Base Population Evaluation
TAC 2011: Proceedings of the Fourth Text Analysis Conference, Gaithersburg, Maryland, November 14-15
Available: Paper in PDF

Jiahong Yuan, Mark Liberman
Automatic Detection of "g-dropping" in American English Using Forced Alignment 
ASRU 2011: Automatic Speech Recognition and Understanding Workshop, Hawaii, December 11-15
Available: Paper in PDF

Jiahong Yuan, Mark Liberman
Automatic Measurement and Comparison of Vowel Nasalization Across Languages 
ICPhS XVII 2011: 17th International Congress of Phonetic Sciences, Hong Kong, August 17-21
Available: Paper in PDF

Jiahong Yuan, Mark Liberman
Variation in American English: A Corpus Approach
Journal of Speech Sciences 1 (2): 35-46, 2011
Available: Paper in PDF

2010

Christopher Cieri, Mark Liberman
Adapting to Trends in Language Resource Development: A Progress Report on LDC Activities 
LREC 2010: 7th International Conference on Language Resources and Evaluation, Valletta, May 17-23
Available: Paper in PDF, Slides in PDF

Marianna Di Paolo, Malcah Yaeger-Dror, Christopher Cieri, Stephanie Strassel, Zsuzsanna Fagyal
Towards Best Practices in Sociophonetics Workshop
NWAV39: New Ways of Analyzing Variation, San Antonio, November 4-6

 Christopher Cieri, Stephanie Strassel
Towards Best Practices in Sociophonetics: Robust, Digital, Empirical, Reproducible, Sociolinguistic, Methodology
Available: Slides in PDF
 Zsuzsanna Fagyal, Malcah Yaeger-Dror
Analyzing Rhythm I
Available: Slides in PDF
 Malcah Yaeger-Dror, Zsuzsanna Fagyal
Analyzing "Timing" 2
Available: Slides in PDF

Denise DiPersio
Some Implications of US Initiatives for "Fair Research" and Open Access on the Development and Distribution of Language Resources
LREC 2010: 7th International Conference on Language Resources and Evaluation, Valletta, May 17-23
Workshop on Legal Issues for Language Resources
Available: Slides in PDF

Stephen Grimes, Xuansong Li, Ann Bies, Seth Kulick, Xiaoyi Ma, Stephanie Strassel
Creating Arabic-English Parallel Word-Aligned Treebank Corpora at LDC
LREC 2010: 7th International Conference on Language Resources and Evaluation, Valletta, May 17-23
Workshop on Language Resources and Human Language Technologies for Semitic Languages
Available: Paper in PDF

Seth Kulick, Ann Bies
A Treebank Query System Based on an Extracted Tree Grammar
NAACL-HLT 2010: The 11th Annual Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Los Angeles, June 1-6
Available: Paper in PDF

Seth Kulick, Ann Bies
A TAG-derived Database for Treebank Search and Parser Analysis
TAG+10: 10th International Workshop on Tree Adjoining Grammars and Related Formalisms, New Haven, CT, June 10-12
Available: Paper in PDF

Seth Kulick, Ann Bies, Mohamed Maamouri
Consistent and Flexible Integration of Morphological Annotation in the Arabic Treebank
LREC 2010: 7th International Conference on Language Resources and Evaluation, Valletta, May 17-23
Available: Paper in PDF, Poster in PDF

Meghan Lammie Glenn, Stephanie M. Strassel, Haejoong Lee, Kazuaki Maeda, Ramez Zakhary, Xuansong Li
Transcription Methods for Consistency, Volume and Efficiency
LREC 2010: 7th International Conference on Language Resources and Evaluation, Valletta, May 17-23
Workshop on Language Resources and Human Language Technologies for Semitic Languages
Available: Paper in PDF

Xuansong Li, Niyu Ge, Stephen Grimes, Stephanie M. Strassel, Kazuaki Maeda
Enriching Word Alignment with Linguistic Tags
LREC 2010: 7th International Conference on Language Resources and Evaluation, Valletta, May 17-23
Available: Paper in PDF, Slides in PDF

Xuansong Li, Stephanie Strassel, Stephen Grimes, Safa Ismael, Xiaoyi Ma, Niyu Ge, Ann Bies, Nianwen Xue, Mohamed Maamouri
Parallel Aligned Treebank Corpora at LDC: Methodology, Annotation and Integration
TLT9: The 9th International Workshop on Treebanks and Linguistic Theories, University of Tartu, December 2
AEPC: Workshop on Annotation and Exploitation of Parallel Corpora
Available: Paper in PDF

Mark Liberman
The Future of Computational Linguistics: or, What Would Antonio Zampolli Do?
Antonio Zampolli Prize speech
LREC 2010: 7th International Conference on Language Resources and Evaluation, Valletta, May 17-23
Available: Slides in PDF, Antonio Zampolli Prize Information: Prof. Antonio Zampolli Prize

Xiaoyi Ma
Toward a Name Entity Aligned Bilingual Corpus
LREC 2010: 7th International Conference on Language Resources and Evaluation, Valletta, May 17-23
Workshop on Methods for the Automatic Acquisition of Language Resources and Their Evaluation Methods
Available: Paper in PDF

Mohamed Maamouri, Ann Bies, Seth Kulick, Wajdi Zaghouani, Dave Graff, Mike Ciul
From Speech to Trees: Applying Treebank Annotation to Arabic Broadcast News
LREC 2010: 7th International Conference on Language Resources and Evaluation, Valletta, May 17-23
Available: Paper in PDF, Poster in PDF

Kazuaki Maeda, Haejoong Lee, Stephen Grimes, Jonathan Wright, Robert Parker, David Lee, Andrea Mazzucchi
Technical Infrastructure at Linguistic Data Consortium: Software and Hardware Resources for Linguistic Data Creation
LREC 2010: 7th International Conference on Language Resources and Evaluation, Valletta, May 17-23
Available: Paper in PDF

Mark Mandel
Conomastics: The Naming of Science Fiction Conventions
ADS: American Dialect Society Annual Meeting, Baltimore, MD, January 7-9
Available: Slides in PDF, Slides with notes

Paul McNamee, Hoa Trang Dang, Heather Simpson, Patrick Schone, Stephanie M. Strassel
An Evaluation of Technologies for Knowledge Base Population
LREC 2010: 7th International Conference on Language Resources and Evaluation, Valletta, May 17-23
Available: Slides in PDF

Heather Simpson, Stephanie Strassel, Robert Parker, Paul McNamee
Wikipedia and the Web of Confusable Entities: Experience from Entity Linking Query Creation for TAC 2009 Knowledge Base Population
LREC 2010: 7th International Conference on Language Resources and Evaluation, Valletta, May 17-23
Available: Paper in PDF

Zhiyi Song, Stephanie Strassel, Gary Krug, Kazuaki Maeda
Enhanced Infrastructure for Creation and Collection of Translation Resources
LREC 2010: 7th International Conference on Language Resources and Evaluation, Valletta, May 17-23
Available: Paper in PDF

Stephanie Strassel, Dan Adams, Henry Goldberg, Jonathan Herr, Ron Keesing, Daniel Oblinger, Heather Simpson, Robert Schrag, Jonathan Wright
The DARPA Machine Reading Program - Encouraging Linguistic and Reasoning Research with a Series of Reading Tasks
LREC 2010: 7th International Conference on Language Resources and Evaluation, Valletta, May 17-23
Available: Paper in PDF

Wajdi Zaghouani, Bruno Pouliquen, Mohamed Ebrahim, Ralf Steinberger
Adapting a Resource-light Highly Multilingual Named Entity Recognition System to Arabic
LREC 2010: 7th International Conference on Language Resources and Evaluation, Valletta, May 17-23
Available: Paper in PDF, Slides in PDF

Wajdi Zaghouani, Ralf Steinberger, Bruno Pouliquen
A Resource-light Arabic Named Entity Recognition System
GURT 2010: Georgetown University Round Table, Arabic Language and Linguistics, Georgetown, March 12-14
Available: Slides in PDF

Wajdi Zaghouani, Mona Diab, Aous Mansouri, Sameer Pradhan, Martha Palmer
The Revised Arabic PropBank
ACL-HLT 2010: 48th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies, Uppsala, July 11-16
The LAW IV: Proceedings of the Fourth Linguistic Annotation Workshop
Available: Paper in PDF

2009

Steven Bird, Ewan Klein, Edward Loper
Natural Language Processing with Python (book chapter)
O'Reilly Media Inc., 2009
Available: Book in HTML

Christopher Cieri
Models of Phonological Variation for Multi-dialectal Communities: the Case of L'Aquila
NWAV 38: New Ways of Analyzing Variation, Ottawa, October 22-25
Available: Slides in PDF

Christopher Cieri, Stephanie Strassel
Closer Still to a Robust, All Digital, Empirical, Reproducible Sociolinguistic Methodology
NWAV 38: New Ways of Analyzing Variation, Ottawa, October 22-25
Available: Slides in PDF

Seth Kulick, Ann Bies
Treebank Analysis and Search Using an Extracted Tree Grammar
TLT8: 8th International Workshop on Treebanks and Linguistic Theories, Milan, December 3-5
Available: Paper in PDF

Catherine Lai, Steven Bird
Querying Linguistic Trees
Journal of Logic, Language, and Information, Volume 18, 2009
Available: Paper in PDF

Mohamed Maamouri
LDC Arabic Reading Tools: "Read to Succeed"
ACTFL 2009: Arabic SIG Meeting, San Diego, CA, November 21
Available: Slides in PDF

Mohamed Maamouri, Ann Bies, Seth Kulick
Creating a Methodology for Large-Scale Correction of Treebank Annotation: The Case of the Arabic Treebank
MEDAR 2009: 2nd International Conference on Arabic Language Resources and Tools, Cairo, April 22-23
Available: Paper in PDF, Slides in PDF

Niklas Paulsson, Khalid Choukri, Djamel Mostefa, Denise DiPersio, Meghan Glenn, Stephanie Strassel
A Large Arabic Broadcast News Speech Data Collection
MEDAR 2009: 2nd International Conference on Arabic Language Resources and Tools, Cairo, April 22-23
Available: Paper in PDF, Poster in PDF

Stephanie Strassel
Linguistic Resources for Arabic Handwriting Recognition
MEDAR 2009: 2nd International Conference on Arabic Language Resources and Tools, Cairo, April 22-23
Available: Paper in PDF

2008

Chomicha Bendahman, Meghan Lammie Glenn, Djamel Mostefa, Niklas Paulsson, Stephanie Strassel
Quick Rich Transcriptions of Arabic Broadcast News Speech Data
LREC 2008: 6th International Conference on Language Resources and Evaluation, Marrakech, May 28-30
Available: Paper in PDF

Linda Brandschain, Christopher Cieri, David Graff, Abby Neely, Kevin Walker
Speaker Recognition: Building the Mixer 4 and 5 Corpora
LREC 2008: 6th International Conference on Language Resources and Evaluation, Marrakech, May 28-30
Available: Paper in PDF, Poster in PDF

Christopher Cieri, Stephanie Strassel, Meghan Glenn, Reva Schwartz, Wade Shen, Joseph Campbell
Bridging the Gap between Linguists and Technology Developers: Large-Scale, Sociolinguistic Annotation for Dialect and Speaker Recognition
LREC 2008: 6th International Conference on Language Resources and Evaluation, Marrakech, May 28-30
Available: Paper in PDF

Lauren Friedman, Stephanie Strassel
Identifying Common Challenges for Human and Machine Translation: A Case Study from the GALE Program
AMTA 2008: The 8th Conference of the Association for Machine Translation in the Americas, Waikiki, October 21-25
Available: Paper in PDF

Lauren Friedman, Stephanie Strassel, Meghan Lammie Glenn
Explicit and Implicit Requirements of Technology Evaluations: Implications for Test Data Creation
LREC 2008: 6th International Conference on Language Resources and Evaluation, Marrakech, May 28-30
Available: Paper in PDF

Lauren Friedman, Stephanie Strassel, Haejoong Lee
A Quality Control Framework for Gold Standard Reference Translations: The Process and Toolkit Developed for GALE
Translating & The Computer 30 (hosted by EAMT): London, November 19-20
Available: Paper in PDF

Ryan Gabbard, Seth Kulick
Construct State Modification in the Arabic Treebank
ACL-HLT 2008: 46th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies, Columbus, Ohio, June 16-18
Available: Paper in PDF

Meghan Lammie Glenn, Stephanie Strassel, Lauren Friedman, Haejoong Lee, Shawn Medero
Management of Large Annotation Projects Involving Multiple Human Judges: a Case Study of GALE Machine Translation Post-editing
LREC 2008: 6th International Conference on Language Resources and Evaluation, Marrakech, May 28-30
Available: Paper in PDF

Mohamed Maamouri, Ann Bies, Seth Kulick
Enhanced Annotation and Parsing of the Arabic Treebank
INFOS 2008: Cairo, March 27-29
Available: Paper in PDF

Mohamed Maamouri, Ann Bies, Seth Kulick
Enhancing the Arabic Treebank: A Collaborative Effort toward New Annotation Guidelines
LREC 2008: 6th International Conference on Language Resources and Evaluation, Marrakech, May 28-30
Available: Paper in PDF, Poster in PDF

Mohamed Maamouri, Seth Kulick, Ann Bies
Diacritic Annotation in the Arabic Treebank and Its Impact on Parser Evaluation
LREC 2008: 6th International Conference on Language Resources and Evaluation, Marrakech, May 28-30
Available: Paper in PDF, Poster in PDF

Kazuaki Maeda, Haejoong Lee, Shawn Medero, Julie Medero, Robert Parker, Stephanie Strassel
Annotation Tool Development for Large-Scale Corpus Creation Projects at the Linguistic Data Consortium
LREC 2008: 6th International Conference on Language Resources and Evaluation, Marrakech, May 28-30
Available: Paper in PDF

Kazuaki Maeda, Xiaoyi Ma, Stephanie Strassel
Creating Sentence-Aligned Parallel Text Corpora from a Large Archive of Potential Parallel Text using BITS and Champollion
LREC 2008: 6th International Conference on Language Resources and Evaluation, Marrakech, May 28-30
Available: Paper in PDF

Martha Palmer, Olga Babko-Malaya, Ann Bies, Mona Diab, Mohamed Maamouri, Aous Mansouri, Wajdi Zaghouani
A Pilot Arabic Propbank
LREC 2008: 6th International Conference on Language Resources and Evaluation, Marrakech, May 28-30
Available: Paper in PDF

Marian Reed, Denise DiPersio, Christopher Cieri
The Linguistic Data Consortium Member Survey: Purpose, Execution and Results
LREC 2008: 6th International Conference on Language Resources and Evaluation, Marrakech, May 28-30
Available: Paper in PDF, Slides in PDF

Gary Simons, Steven Bird
Toward a Global Infrastructure for the Sustainability of Language Resources
PACLIC 2008: 22nd Pacific Asia Conference on Language, Information and Computation, Cebu City, Philippines, November 20-22
Available: Paper in PDF

Zhiyi Song, Stephanie Strassel
Entity Translation and Alignment in the ACE-07 ET Task
LREC 2008: 6th International Conference on Language Resources and Evaluation, Marrakech, May 28-30
Available: Paper in PDF

Stephanie Strassel, Lauren Friedman, Safa Ismael, Linda Brandschain
New Resources for Document Classification Analysis and Translation Technologies
LREC 2008: 6th International Conference on Language Resources and Evaluation, Marrakech, May 28-30
Available: Paper in PDF

2007

Steven Bird, Haejoong Lee
Graphical Query for Linguistic Treebanks
PACLING 2007: 10th Conference of the Pacific Association for Computational Linguistics, Melbourne, September 19-21
Available: Paper in PDF

Christopher Cieri
Phonological Variation in Multi-Dialectal Italy: distinguishing e from ?
NWAV 36: Philadelphia, October 11-14
Available: Slides in PDF

Christopher Cieri, Linda Corson, David Graff, Kevin Walker
Resources for New Research Directions in Speaker Recognition: The Mixer 3, 4 and 5 Corpora
Interspeech 2007: 10th International Conference on Spoken Language Processing, Antwerp, August 27-31
Available: Paper in PDF, Slides in PDF

Christopher Cieri, Stephanie Strassel, Meghan Lammie Glenn, Lauren Friedman
Linguistic Resources in Support of Various Evaluation Metrics
MT Summit XI: Workshop on Automatic Procedures in MT Evaluation, Copenhagen, September 9-14
Available: Slides in PDF

Kuzman Ganchev, Koby Crammer, Fernando Pereira, Gideon Mann, Kedar Bellare, Andrew McCallum, Steven Carroll, Yang Jin, Peter White
Penn/UMass/CHOP Biocreative II Systems
Biocreative II [In Press]
Available: Paper in PDF

Kuzman Ganchev, Fernando Pereira, Mark Mandel, Steven Carroll, Peter White
Semi-automated Named Entity Annotation
Linguistic Annotation Workshop 2007 [In Press]
Available: Paper in PDF

2006

Olga Babko-Malaya, Ann Bies, Ann Taylor, Szuting Yi, Martha Palmer, Mitch Marcus, Seth Kulick, Libin Shen
Issues in Synchronizing the English Treebank and PropBank
COLING-ACL 2006: Frontiers in Linguistically Annotated Corpora, A Merged Workshop with 7th International Workshop on Linguistically Interpreted Corpora (LINC-2006) and Frontiers in Corpus Annotation III, Sydney, July 22-23
Available: Paper in PDF

Ann Bies, Stephanie Strassel, Haejoong Lee, Kazuaki Maeda, Seth Kulick, Yang Liu, Mary Harper, Matthew Lease
Linguistic Resources for Speech Parsing
LREC 2006: 5th International Conference on Language Resources and Evaluation, Genoa, May 22-28
Available: Paper in PDF

Steven Bird, Yi Chen, Susan Davidson, Haejoong Lee, Yifeng Zheng
Designing and Evaluating an XPath Dialect for Linguistic Queries
ICDE 2006: 22nd International Conference on Data Engineering, Atlanta, April 3-8
Available: Paper in PDF

Christopher Cieri
Linguistic Resources, Development and Evaluation
Chapter 8 in Laila Dybkjaer, Holmer Hemsen and Wolfgang Minker (eds.), Evaluation of Text and Speech Systems
Available: Springer Publishers

Christopher Cieri
What is Quality? Invited Talk at the Workshop on Quality Assurance and Quality Measurement for Language and Speech Resources
LREC 2006: 5th International Conference on Language Resources and Evaluation, Genoa, May 22-28
Available: Slides in PDF

Christopher Cieri, Walt Andrews, Joseph P. Campbell, George Doddington, Jack Godfrey, Shudong Huang, Mark Liberman, Alvin Martin, Hirotaka Nakasone, Mark Przybocki, Kevin Walker
The Mixer and Transcript Reading Corpora: Resources for Multilingual, Crosschannel Speaker Recognition Research
LREC 2006: 5th International Conference on Language Resources and Evaluation, Genoa, May 22-28
Available: Paper in PDF, Slides in PDF

Christopher Cieri, Mark Liberman
More Data and Tools for More Languages and Research Areas: A Progress Report on LDC Activities
LREC 2006: 5th International Conference on Language Resources and Evaluation, Genoa, May 22-28
Available: Paper in PDF, Slides in PDF

Christopher Cieri, Mark Liberman, Victoria Arranz, Khalid Choukri
Linguistic Data Resources
Chapter 3 in Tanja Schultz and Katrin Kirchhoff (eds.) Multilingual Speech Processing, Elsevier Academic Press, ISBN 13: 978-0-12-088501-5. April 2006.
Available: Elsevier Publishers

Ryan Gabbard, Seth Kulick, Mitchell Marcus
Fully Parsing the Penn Treebank
HLT-NAACL 2006: Human Language Technology conference - North American chapter of the Association for Computational Linguistics, New York City, June 4-9
Available: Paper in PDF

David Graff, Tim Buckwalter, Hubert Jin, Mohamed Maamouri
Lexicon Development for Varieties of Spoken Colloquial Arabic
LREC 2006: 5th International Conference on Language Resources and Evaluation, Genoa, May 22-28
Available: Paper in PDF

Yang Jin, Ryan McDonald, Kevin Lerman, Mark Mandel, Steven Carroll, Mark Y. Liberman, Fernando Pereira, Raymond Winters, Peter White
Automated Recognition of Malignancy Mentions in Biomedical Literature
Open Access: BMC Bioinformatics 7:492
Available: Paper in PDF

Xiaoyi Ma
Champollion: A Robust Parallel Text Sentence Aligner
LREC 2006: 5th International Conference on Language Resources and Evaluation, Genoa, May 22-28
Available: Paper in PDF

Xiaoyi Ma, Christopher Cieri
Corpus Support for Machine Translation at LDC
LREC 2006: 5th International Conference on Language Resources and Evaluation, Genoa, May 22-28
Available: Paper in PDF

Mohamed Maamouri, Ann Bies, Tim Buckwalter, Mona Diab, Nizar Habash, Owen Rambow, Dalila Tabessi
Developing and Using a Pilot Dialectal Arabic Treebank
LREC 2006: 5th International Conference on Language Resources and Evaluation, Genoa, May 22-28
Available: Paper in PDF

Mohamed Maamouri, Ann Bies, Seth Kulick
Diacritization: A Challenge to Arabic Treebank Annotation and Parsing
HCI 06: Machine Translation SIG of the British Computer Society Conference, London, September 11-16
Available: Paper in PDF

Kazuaki Maeda, Christopher Cieri, Kevin Walker
Low-cost Customized Speech Corpus Creation for Speech Technology Applications
LREC 2006: 5th International Conference on Language Resources and Evaluation, Genoa, May 22-28
Available: Paper in PDF

Kazuaki Maeda, Haejoong Lee, Julie Medero, Stephanie Strassel
A New Phase in Annotation Tool Development at the Linguistic Data Consortium: The Evolution of the Annotation Graph Toolkit
LREC 2006: 5th International Conference on Language Resources and Evaluation, Genoa, May 22-28

Mark Mandel
Integrated Annotation of Biomedical Text: Creating the PennBioIE Corpus
Text Mining, Ontologies and Natural Language Processing in Biomedicine, Manchester, March 20-21
Available: Abstract in PDF, Slides in PDF

Ryan McDonald, Kevin Lerman, Fernando Pereira
Multilingual Dependency Parsing with a Two-Stage Discriminative Parser
CoNLL-X: Computational Natural Language Learning, New York City, June 8-9
Available: Paper in PDF

Julie Medero, Kazuaki Maeda, Stephanie Strassel, Christopher Walker
An Efficient Approach for Gold-Standard Annotation: Decision Points for Complex Tasks
LREC 2006: 5th International Conference on Language Resources and Evaluation, Genoa, May 22-28
Available: Paper in PDF

Stephanie Strassel, Andrew W. Cole
Corpus Development and Publication
LREC 2006: 5th International Conference on Language Resources and Evaluation, Genoa, May 22-28
Available: Paper in PDF, Poster in PDF

Stephanie Strassel, Christopher Cieri, Andy Cole, Denise DiPersio, Mark Liberman, Xiaoyi Ma, Mohamed Maamouri, Kazuaki Maeda
Integrated Linguistic Resources for Language Exploitation Technologies
LREC 2006: 5th International Conference on Language Resources and Evaluation, Genoa, May 22-28
Available: Paper in PDF, Slides in PDF

Jiahong Yuan, Mark Liberman, Christopher Cieri
Towards an Integrated Understanding of Speaking Rate in Conversation
Interspeech 2006: The 9th International Conference on Spoken Language Processing, Pittsburgh, September 17-21
Available: Paper in PDF, Slides in PDF

2005

Ann Bies, Seth Kulick, Mark Mandel
Parallel Entity and Treebank Annotation
ACL 2005: 43rd Annual Meeting of the Association for Computational Linguistics, Frontiers in Corpus Annotation II: Pie in the Sky workshop, Ann Arbor, June 29
Available: Paper in PDF

Violetta Cavalli-Sforza, Mohamed Maamouri
Extensions to Histogram-Based Student Modeling Approach to Facilitate Reading in Morphologically Complex Languages
AIED 2005: International Conference on Artificial Intelligence in Education, Amsterdam, July 18-22
Available: Paper in PDF

Christopher Cieri
HLT Evaluation: The Role of Data Centers
HLT Evaluation Workshop (ELRA): Sliema, Malta, December 1-2
Available: Slides in PDF

Christopher Cieri
Modeling Phonological Variation in Multidialectal Italy
University of Pennsylvania, Doctoral Dissertation, May 2005
Available: PDF from ProQuest

Meghan Lammie Glenn, Stephanie Strassel
Linguistic Resources for Meeting Speech Recognition
MLMI 2005: Machine Learning for Multimodal Interaction, Edinburgh, July 11-13
Available: Paper in PDF

Jerry Goldman, Steve Renals, Steven Bird, Franciska de Jong, Marcello Federico, Carl Fleischhauer, Mark Kornbluh, Lori Lamel, Douglas Oard, Claire Stewart, Richard Wright
Transforming Access to the Spoken Word
International Journal on Digital Libraries 5, 287-298, 2005
Available: Paper in PDF

Yang Jin, Ryan T. McDonald, Kevin Lerman, Mark A. Mandel, Mark Y. Liberman, Fernando Pereira, R. Scott Winters, Peter S. White
Identifying and Extracting Malignancy Types in Cancer Literature
BioLink 2005: A Joint Meeting of ISMB: BioLINK SIG on Text Data Mining and ACL Workshop on Linking Biological Literature, Ontologies and Databases: Mining Biological Semantics, Detroit, June 24
Available: Paper in PDF

Jachym Kolar, Jan Svec, Stephanie Strassel, Christopher Walker, Dagmar Kozlakova, Josef Psutka
Czech Spontaneous Speech Corpus with Structural Metadata
Interspeech 2005: The 8th International Conference on Spoken Language Processing, Lisbon, September 4-8
Available: Paper in PDF

Mohamed Maamouri
Arabic Literacy
Lemma, 11, 16 in Encyclopedia of Arabic Language and Linguistics (EALL). Vol 2
Available: Paper in PDF

Ryan McDonald, Fernando Pereira, Seth Kulick, Scott Winters, Yang Jin, Peter White
Simple Algorithms for Complex Relation Extraction with Applications to Biomedical IE
ACL 2005: 43rd Annual Meeting of the Association for Computational Linguistics, Ann Arbor, June 29
Available: Paper in PDF

2004

Tim Buckwalter
Issues in Arabic Orthography and Morphology Analysis
COLING 2004: 20th International Conference on Computational Linguistics, Geneva, August 28
Computational Approaches to Arabic Script-based Languages Workshop
Available: Paper in PDF

Christopher Cieri, Joseph P. Campbell, Hirotaka Nakasone, David Miller, Kevin Walker
The Mixer Corpus of Multilingual, Multichannel Speaker Recognition Data
LREC 2004: 4th International Conference on Language Resources and Evaluation, Lisbon, May 24-30
Available: Paper in PDF, Poster in PDF

Christopher Cieri, Mark Liberman
Progress Report from the Linguistic Data Consortium: Recent Activities in Resource Creation and Distribution and the Development of Tools and Standards
LREC 2004: 4th International Conference on Language Resources and Evaluation, Lisbon, May 24-30
Available: Paper in PDF, Slides in PDF

Christopher Cieri, David Miller, Kevin Walker
The Fisher Corpus: a Resource for the Next Generations of Speech-to-Text
LREC 2004: 4th International Conference on Language Resources and Evaluation, Lisbon, May 24-30
Available: Paper in PDF, Slides in PDF

George Doddington, Alexis Mitchell, Mark Przybocki, Lance Ramshaw, Stephanie Strassel, Ralph Weischedel
Automatic Content Extraction (ACE) Program - Task Definitions and Performance Measures
LREC 2004: 4th International Conference on Language Resources and Evaluation, Lisbon, May 24-30
Available: Paper in PDF

Shudong Huang, Stephanie Strassel, Alexis Mitchell, Zhiyi Song
Shared Resources for Multilingual Information Extraction and Challenges in Named Entity Annotation
IJCNLP-04: 1st International Joint Conference on Natural Language Processing, Hainan Island, China, March 22-24
Named Entity Recognition for NLP Applications Workshop
Available: Paper in PDF

Seth Kulick, Ann Bies, Mark Liberman, Mark Mandel, Ryan McDonald, Martha Palmer, Andrew Schein, Lyle Ungar, Scott Winters, Pete White
Integrated Annotation for Biomedical Information Extraction
HLT/NAACL 2004: Human Language Technology Conference of the North American Chapter of the Association for Computational Linguistics, Boston, May 2-7
BioLink Workshop
Available: Paper in PDF, Slides in PDF

Mohamed Maamouri, Ann Bies
Developing an Arabic Treebank: Methods, Guidelines, Procedures, and Tools
COLING 2004: 20th International Conference on Computational Linguistics, Geneva, August 28
Computational Approaches to Arabic Script-based Languages Workshop
Available: Paper in PDF

Mohamed Maamouri, Ann Bies, Tim Buckwalter, Wigdan Mekki
The Penn Arabic Treebank: Building a Large-Scale Annotated Arabic Corpus
NEMLAR International Conference on Arabic Language Resources and Tools, Cairo, September 22-23
Available: Paper in PDF

Mohamed Maamouri, Tim Buckwalter, Christopher Cieri
Dialectal Arabic Telephone Speech Corpus: Principles, Tool Design, and Transcription Conventions
NEMLAR International Conference on Arabic Language Resources and Tools, Cairo, September 22-23
Available: Paper in PDF, Slides in PDF

Mohamed Maamouri, David Graff, Hubert Jin, Christopher Cieri, Tim Buckwalter
Dialectal Arabic Orthography-based Transcription and CTS Levantine Arabic Collection
EARS RT-04 Workshop, Parallel STT-NA Tracks Session, Palisades IBM Executive Center, New York, November 10
Available: Paper in PDF

Kazuaki Maeda, Stephanie Strassel
Annotation Tools for Large-Scale Corpus Development: Using AGTK at the Linguistic Data Consortium
LREC 2004: 4th International Conference on Language Resources and Evaluation, Lisbon, May 24-30
Available: Paper in PDF

Mike Maxwell
From Legacy Lexicon to Archivable Resource
LREC 2004: 4th International Conference on Language Resources and Evaluation, Lisbon, May 24-30
First Steps for Language Documentation of Minority Languages: Computational Linguistic Tools for Morphology, Lexicon and Corpus Compilation Workshop
Available: Paper in PDF

Ryan McDonald, R. Scott Winters, Mark Mandel, Yang Jin, Peter S. White, Fernando Pereira
An Entity Tagger for Recognizing Acquired Genomic Variations in Cancer Literature
Bioinformatics 20:3249-3251
Available: Paper in PDF

Douglas Oard, Dagobert Soergel, G. Craig Murray, David Doermann, Jianqiang Wang, Bhuvana Ramabhadran, Martin Franz, James Mayfield, Samuel Gustman, Stephanie Strassel
Building an Information Retrieval Test Collection for Spontaneous Conversational Speech
SIGIR 2004: 27th Annual International ACM SIGIR (Special Interest Group on Information Retrieval) Conference, Sheffield, July 25-29
Available: Paper in PDF

Stephanie Strassel
Linguistic Resources for Effective, Affordable, Reusable Speech-to-Text
LREC 2004: 4th International Conference on Language Resources and Evaluation, Lisbon, May 24-30
Available: Paper in PDF

Colin Warner, Ann Bies, Christine Brisson, Justin Mott
Addendum to the Penn Treebank II Style Bracketing Guidelines: BioMedical Treebank Annotation
November, 2004
Available: Paper in PDF, Paper in plain text

2003

Steven Bird, Gary Simons
Seven Dimensions of Portability for Language Documentation and Description
Language 79, 557-582

Steven Bird, Gary Simons
Extending Dublin Core Metadata to Support the Description and Discovery of Language Resources
Computers and the Humanities 37, 375-388

Christopher Cieri, Mike Maxwell, Stephanie Strassel
Core Linguistic Resources for the World's Languages
ACL 2003 Resources Information Infrastructure Workshop: ELSNET, ENABLER, ICWLR Joint Workshop, Paris, August 28-29
Available: Slides in PDF

Christopher Cieri, Stephanie Strassel
Robust Sociolinguistic Methodology: Tools, Data and Best Practices
NWAV 32: New Ways of Analyzing Variation, Philadelphia, October
Available: Slides in PDF

Baden Hughes, Steven Bird
Grid-Enabling Natural Language Engineering By Stealth
Proceedings of the Workshop on The Software Engineering and Architecture of Language Technology Systems (SEALTS)
Available: arXiv.org

Seth Kulick, Mark Liberman, Martha Palmer, Andrew Schein
Shallow Semantic Annotation of Biomedical Corpora for Information Extraction
ISMB 2003: 11th International Conference on Intelligent Systems for Molecular Biology, Brisbane, June 29-July 3
Special Interest Group Meeting on Text Mining (BioLink)
Available: Slides in PDF

Mike Maxwell
Incremental Grammar Development using Finite State Tools
EACL 10: 10th Conference of the European Chapter of the Association for Computational Linguistics, Budapest, April 12-17
Finite-State Methods in Natural Language Processing Workshop, April 13-14
Available: Paper in PDF

Gary Simons, Steven Bird
The Open Language Archives Community: An Infrastructure for Distributed Archiving of Language Resources
Literary and Linguistic Computing 18 (in press)
Available: arXiv.org

Gary Simons, Steven Bird
Building an Open Language Archives Community on the OAI Foundation
Library Hi Tech 21, 210-218, Special Issue on Open Archives Initiative Metadata Harvesting

Stephanie Strassel
Corpus Creation for Disfluency Research
DiSS 2003: Disfluency in Spontaneous Speech Conference, Gothenburg, September 5-8
Available: Abstract in PDF, Slides in PDF

Stephanie Strassel, Mike Maxwell, Christopher Cieri
Linguistic Resource Creation for Research and Technology Development: A Recent Experiment
Association for Computing Machinery Transactions on Asian Language Information Processing (TALIP), Volume 2, Issue 2, 101-117
Available: Paper in PDF

Stephanie Strassel, David Miller, Kevin Walker, Christopher Cieri
Shared Resources for Robust Speech-to-Text Technology
Eurospeech 2003: Geneva, September 1-4
Available: Paper in PDF

Stephanie Strassel, Alexis Mitchell, Shudong Huang
Multilingual Resources for Entity Extraction
ACL 2003: 41st Annual Meeting of the Association for Computational Linguistics, Sapporo, July 7-12
Multilingual and Mixed-language Named Entity Recognition Workshop: Combining Statistical and Symbolic Models
Available: Paper in PDF

2002

Steven Bird, Kazuaki Maeda, Xiaoyi Ma, Haejoong Lee, Beth Randall, Salim Zayat
TableTrans, MultiTrans, InterTrans and TreeTrans: Diverse Tools Built on the Annotation Graph Toolkit
LREC 2002: The 3rd International Conference on Language Resources and Evaluation, Las Palmas, Canary Islands, May 27 - June 2
Available: arXiv.org

Christopher Cieri, David Miller, Kevin Walker
Research Methodologies, Observations and Outcomes in (Conversational) Speech Data Collection
HLT 2002: The Human Language Technologies Conference, San Diego, March 24-27
Available: Paper in PDF

Christopher Cieri, Stephanie Strassel
The DASL Project: a Case Study in Data Re-Annotation and Re-Use
LREC 2002: The 3rd International Conference on Language Resources and Evaluation, Las Palmas, Canary Islands, May 27 - June 2
Available: Paper in PDF

Christopher Cieri, Stephanie Strassel, David Graff, Nii Martey, Kara Rennert, Mark Liberman
Corpora for Topic Detection and Tracking
James Allan, ed. Topic Detection and Tracking: Event-based Information Organization, Kluwer International Series on Information Retrieval, Bruce Croft, series editor, Kluwer Academic Publishers, Boston

Christopher Cieri, Stephanie Strassel, William Labov
Sharable Resources for Sociolinguistic Research
NWAV31: New Ways of Analyzing Variation, Stanford, October
Available: Slides in PDF

Scott Cotton, Steven Bird
An Integrated Framework for Treebanks and Multilayer Annotations
LREC 2002: The 3rd International Conference on Language Resources and Evaluation, Las Palmas, Canary Islands, May 27 - June 2
Available: arXiv.org

Xiaoyi Ma, Haejoong Lee, Steven Bird, Kazuaki Maeda
Models and Tools for Collaborative Annotation
LREC 2002: The 3rd International Conference on Language Resources and Evaluation, Las Palmas, Canary Islands, May 27 - June 2
Available: arXiv.org

Mohamed Maamouri, Christopher Cieri
Resources for Arabic Natural Language Processing
International Symposium on Processing Arabic, Tunis, April
Available: Slides in PDF

Kazuaki Maeda, Steven Bird, Xiaoyi Ma, Haejoong Lee
Creating Annotation Tools with the Annotation Graph Toolkit
LREC 2002: The 3rd International Conference on Language Resources and Evaluation, Las Palmas, Canary Islands, May 27 - June 2
Available: arXiv.org

Mike Maxwell, Gary Simons, Larry Hayashi
A Morphological Glossing Assistant
LREC 2002: The 3rd International Conference on Language Resources and Evaluation, Las Palmas, Canary Islands, May 27 - June 2
International LREC Workshop on Resources and Tools in Field Linguistics
Available: Paper in PDF

Mike Maxwell
Resources for Morphology Learning and Evaluation
LREC 2002: The 3rd International Conference on Language Resources and Evaluation, Las Palmas, Canary Islands, May 27 - June 2, vol. III, 967-974
Available: Paper in PDF

Horacio Saggion, Dragomir Radev, Simone Teufel, Wai Lam, Stephanie M. Strassel
Developing Infrastructure for the Evaluation of Single and Multi-document Summarization Systems in a Multi-lingual Environment
LREC 2002: The 3rd International Conference on Language Resources and Evaluation, Las Palmas, Canary Islands, May 27 - June 2
Available: Paper in PDF

2001

Steven Bird, Mark Liberman
A Formal Framework for Linguistic Annotation
Speech Communication 33 (1, 2), pp 23-60
Available: arXiv.org

Steven Bird, Gary Simons
The OLAC Metadata Set and Controlled Vocabularies
ACL-EACL 2001: The 39th Annual Meeting of the Association for Computational Linguistics, Toulouse, July 5-6, pp 7-18
Sharing Tools & Resources Workshop
Available: arXiv.org

Steven Bird, Gary Simons, Chu-Ren Huang
The Open Language Archives Community and Asian Language Resources
NLPRS 2001: 6th Natural Language Processing Pacific Rim Symposium, Tokyo, November 27-30
Workshop on Language Resources in Asia
Available: arXiv.org

Christopher Cieri, Steven Bird
Annotation Graphs, Annotation Servers and Multi-Modal Resources: Infrastructure for Interdisciplinary Education, Research and Development
ACL-EACL 2001: The 39th Annual Meeting of the Association for Computational Linguistics, Toulouse, July 5-6
Sharing Tools & Resources Workshop
Available: Paper in PDF, Slides in PDF

Kazuaki Maeda, Steven Bird
A Framework for Annotating Animal Bioacoustic Data
The 142nd Meeting of the Acoustical Society of America, Chicago, June 4-8

Kazuaki Maeda, Steven Bird, Xiaoyi Ma, Haejoong Lee
The Annotation Graph Toolkit: Software Components for Building Linguistic Annotation Tools
HLT 2001: The Human Language Technologies Conference, San Diego, March
Available: Paper in PDF

Stephanie Strassel, Christopher Cieri
Data and Annotations for SocioLinguistics: A Corpus-Based Approach to Sociolinguistic Research
PLC 25: Penn Linguistic Colloquium, Philadelphia, March 3-4
Available: Slides in PDF

Stephanie Strassel, Christopher Cieri, Steven Bird
Shared Resources and Community Building for Corpus Linguistics and Language Teaching
Corpus Linguistics and Language Teaching Workshop, Boston, March 23-25

2000

Steve Cassidy, Steven Bird
Querying Databases of Annotated Speech
11th Australasian Database Conference, Canberra, January 31 - February 2
Available: Paper in PDF

Christopher Cieri
Multiple Annotation of Reusable Data Resources: Corpora for Topic Detection and Tracking
In Rajman, M. and J. C. Chappelier, eds. (2000) Actes des 5es Journees internationales d'analyse statistique des donnees textuelles, volume 1, Ecole Polytechnique Federale de Lausanne
Available: Paper in PDF

Christopher Cieri
Issues and Tools for Annotating a Corpus of Sociolinguistic Field Data
Linguistic Society of America Annual Meeting, Chicago, January 6-9
Linguistic Exploration Workshop
Available: Slides in PDF

Christopher Cieri, Dave Graff, Mark Liberman, Nii Martey, Stephanie Strassel
Large Multilingual Broadcast News Corpora for Cooperative Research in Topic Detection and Tracking: The TDT2 and TDT3 Corpus Efforts
LREC 2000: 2nd International Language Resources and Evaluation Conference, Athens, May 31 - June 2
Available: Paper in PDF

Christopher Cieri, David Graff, Nii Martey, Stephanie Strassel
The TDT-3 Text and Speech Corpus
TDT 2000: Topic Detection and Tracking Workshop, Vienna, Virginia, February 28 - March 1
Available: Paper in PDF

Christopher Cieri, Mark Liberman
Issues in Corpus Creation and Distribution: the Evolution of the Linguistic Data Consortium
LREC 2000: 2nd International Language Resources and Evaluation Conference, Athens, May 31 - June 2
Available: Paper in PDF

David Graff, Steven Bird
Many Uses, Many Annotations for Large Speech Corpora: Switchboard and TDT as Case Studies
LREC 2000: 2nd International Language Resources and Evaluation Conference, Athens, May 31 - June 2
Available: Paper in PDF

Dave Graff, Stephanie Strassel, Christopher Cieri
Resources, New and Forthcoming, from LDC
2000 Speech Transcription Workshop: College Park, Maryland, May 16-19
Available: Slides in PDF

Stephanie Strassel, Dave Graff, Nii Martey, Christopher Cieri
Quality Control in Large Annotation Projects Involving Multiple Judges: The Case of the TDT Corpora
LREC 2000: 2nd International Language Resources and Evaluation Conference, Athens, May 31 - June 2
Available: Paper in PDF

1999

Steven Bird
Multidimensional Exploration of Online Linguistic Field Data
NELS 29: 29th Annual Meeting of the Northeast Linguistics Society, University of Massachusetts at Amherst
Available: Paper in PDF

Steven Bird, Mark Liberman
A Formal Framework for Linguistic Annotation
Technical Report MS-CIS-99-01, Department of Computer and Information Science, University of Pennsylvania
(expanded from version presented at ICSLP-98, Sydney)
Available: Paper in PDF

Steven Bird, Mark Liberman
Annotation Graphs as a Framework for Multidimensional Linguistic Data Analysis
Towards Standards and Tools for Discourse Tagging Workshop, Somerset, NJ
Association for Computational Linguistics
Available: Paper in PDF

Steven Bird, Stephanie Strassel
Annotated Corpora in Linguistic Research
North American Symposium on Corpora in Linguistics and Language Teaching, University of Michigan, May 21
Available: Slides in PDF

Alexandra Canavan, Kevin Walker, David Graff, Christopher Cieri
Telephone Speech Corpora: New Needs, Languages, Methods and Technology
Hub-5 Conversational Speech Understanding (LVCSR) Workshop, Maritime Institute of Technology and Graduate Studies, Linthicum Heights, Maryland, June
Available: Slides in PDF

Christopher Cieri
This Ain't Your Father's Digital Data: Another Perspective on Legal Information
CALI 1999: The Conference for Law School Computing, Eugene, Oregon, June 17-19
Available: Slides in PDF

Christopher Cieri, David Graff, Mark Liberman, Nii Martey, Stephanie Strassel
The TDT-2 Text and Speech Corpus
DARPA Broadcast News Workshop, Washington, DC, February
Available: Paper in PDF

Xiaoyi Ma, Mark Liberman
BITS: A Method for Bilingual Text Search over the Web
Machine Translation Summit VII: Singapore, September 13-17
Available: Paper in PDF

Xiaoyi Ma
Parallel Text Collections at the Linguistic Data Consortium
Machine Translation Summit VII: Singapore, September 13-17
Available: Paper in PDF

Stephanie Strassel
Corpus Creation and Quality Control at the LDC
Corpus of Spoken Dutch Workshop: Tilburg, November 12
Available: Slides in PDF

Stephanie Strassel, Christopher Cieri
Corpus Sociolinguistics: Issues, Data and Tools
NWAV 28: Toronto, October
Available: Slides in PDF

1998

Steven Bird, Mark Liberman
Towards a Formal Framework for Linguistic Annotations
ICSLP 1998: 5th International Conference on Spoken Language Processing, Sydney, November 30 - December 4
Available: Paper in PDF

Christopher Cieri, David Graff
Topic Detection and Tracking Corpora
TREC/SDR Conference, Gaithersburg, Maryland, November

David Graff, Christopher Cieri
Update on Lexical Resources and Projects at the Linguistic Data Consortium
9th Hub-5 Conversational Speech Recognition (LVCSR) Workshop, Maritime Institute of Technology and Graduate Studies, Linthicum Heights, Maryland, September

Mark Liberman, Christopher Cieri
The Creation, Distribution and Use of Linguistic Data
LREC 1998: 1st International Conference on Language Resources and Evaluation, Granada, Spain, May 28-30
Available: Paper in PDF