LDC Papers

 

2024

Galit Agmon, Sameer Pradhan, Sharon Ash, Naomi Nevler, Mark Liberman, Murray Grossman, Sunghye Cho
Automated Measures of Syntactic Complexity in Natural Speech Production: Older and Younger Adults as a Case Study
Journal of Speech, Language, and Hearing Research
Available: Abstract

Ann Bies, Jennifer Tracey, Ann O'Brien, Song Chen, Stephanie Strassel
Spanless Event Annotation for Corpus-Wide Complex Event Understanding
LREC-COLING 2024: The 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation
Turin, May 20-24
Available: Paper in PDF

Song Chen, Jennifer Tracey, Ann Bies, Stephanie Strassel
Schema Learning Corpus: Data and Annotation Focused on Complex Events
LREC-COLING 2024: The 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation
Turin, May 20-24
Available: Paper in PDF

Sunghye Cho, Christopher Olm, Sharon Ash, Sanjana Shellikeri, Galit Agmon, Katheryn Cousins, David Irwin, Murray Grossman, Mark Liberman, Naomi Nevler
Automatic classification of AD pathology in FTD phenotypes using natural speech
Alzheimer's & Dementia: The Journal of the Alzheimer's Association
Available: Paper in PDF

Anna Seo Gyeong Choi, Jin-seo Kim, Seo-hee Kim, Min Seok Back, Sunghye Cho 
Crosslinguistic Acoustic Feature-based Dementia Classification Using Advanced Learning Architectures
Fifth Workshop of Resources and Processing of linguistic, para-linguistic and extra-linguistic Data from people with various forms of cognitive/psychiatric/developmental impairments
LREC-COLING 2024: The 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation
Turin, May 21 
Available: Paper in PDF

Denise DiPersio
Selling Personal Information: Data Brokers and the Limits of US Regulation
LEGAL2024 Workshop Legal and Ethical Issues in Human Language Technologies
LREC-COLING 2024: The 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation
Turin, May 20
Available: Paper in PDF

Karen Jones, Kevin Walker, Christopher Caruso, Stephanie Strassel
MAGLIC: The Maghrebi Language Identification Corpus
Odyssey 2024: The Speaker and Language Recognition Workshop 
Quebec, June 18-21
Available: Paper in PDF

Jin-seo Kim, Anna Seo Gyeong Choi and Sunghye Cho
KoFREN: Comprehensive Korean Word Frequency Norms Derived from Large Scale Free Speech Corpora 
LREC-COLING 2024: The 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation
Turin, May 20-24
Available: Paper in PDF

Naomi Nevler, Sunghye Cho, Katheryn Cousins, Sharon Ash, Christopher Olm, Sanjana Shellikeri, Galit Agmon, Carmen Gonzalez-Recober, Sharon Xie, Megan Barker, Masood Manoochehri, Corey Mcmillan, David Irwin, Lauren Massimo, Laynie Dratch, Gayathri Cheran, Edward D Huey, Stephanie Cosentino, Vivianna Van Deerlin, Mark Liberman, Murray Grossman
Changes in Digital Speech Measures in Asymptomatic Carriers of Pathogenic Variants Associated With Frontotemporal Degeneration 
Neurology: Volume 102, Number 2
Available: Abstract

Massimo Poesio,  Maciej Ogrodniczuk, Vincent Ng, Sameer Pradhan, Juntao Yu, Nafise Sadat Moosavi, Silviu Paun, Amir Zeldes, Anna Nedoluzhko, Michal Novák, Martin Popel,  Zdeněk Žabokrtský, Daniel Zeman
Universal Anaphora: The First Three Years
LREC-COLING 2024: The 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation
Turin, May 20-24
Available: Paper in PDF

Sameer S. Pradhan, Ronald A. Cole, Wayne H. Ward
My Science Tutor (MyST)–A Large Corpus of Children’s Conversational Speech
LREC-COLING 2024: The 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation
Turin, May 20-24
Available: Paper in PDF

Hongzhi Xu, Jingxia Lin, Sameer Pradhan, Mitchell Marcus, Ming Liu
Annotating Chinese Word Senses with English WordNet: A Practice on OntoNotes Chinese Sense Inventories 
LREC-COLING 2024: The 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation
Turin, May 20-24
Available: Paper in PDF

Yilun Zhu, Siyao Peng, Sameer Pradhan, Amir Zeldes
SPLICE: A Singleton-Enhanced PipeLIne for Coreference REsolution
LREC-COLING 2024: The 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation
Turin, May 20-24
Available: Paper in PDF

2023

Sunghye Cho, Meredith Cola, Azia Knox, Maggie Rose Pelella, Alison Russell, Aili Hauptmann, Maxine Covello, Christopher Cieri, Mark Liberman, Robert T Schultz, Julia Parish-Morris
Sex differences in the temporal dynamics of autistic children’s natural conversations
Molecular Autism 14, Article number: 13, 2023
Available: Paper in PDF

Carmen Gonzalez‐Recober, Sunghye Cho, Mark Liberman, Murray Grossman, Naomi Nevler
Application of advanced language technologies in analysis of category naming fluency task in healthy participants
Alzheimer's & Dementia, Volume 19, Issue S2

Carmen Gonzalez-Recober, Naomi Nevler, Sanjana Shellikeri, Katheryn A. Q. Cousins, Emma Rhodes, Mark Liberman, Murray Grossman, David Irwin, Sunghye Cho
Comparison of category and letter fluency tasks through automated analysis
Frontiers in Psychology, Volume 14, 2023
Available: Paper in PDF

Sanjana Shellikeri, Sunghye Cho, Sharon Ash, Carmen Gonzalez-Recober, Corey T. Mcmillan, Lauren Elman, Colin Quinn, Defne A. Amado, Michael Baer, David J. Irwin, Lauren Massimo, Christopher A. Olm, Mark Liberman, Murray Grossman, Naomi Nevler
Digital markers of motor speech impairments in spontaneous speech of patients with ALS-FTD spectrum disorders
Amyotrophic Lateral Sclerosis and Frontotemporal Degeneration Volume 25, 2024 - Issue 3-4
Available: Abstract

Sunny X. Tang, Katrin Hänsel, Yan Cong, Amir H. Nikzad, Aarush Mehta, Sunghye Cho, Sarah Berretta, Leily Behbehani, Sameer Pradhan, Majnu John, Mark Liberman
Latent Factors of Language Disturbance and Relationships to Quantitative Speech Features
Schizophrenia Bulletin, March 2023
Available: Paper in PDF

Jiahong Yuan, Wei Lai, Christopher Cieri, Mark Liberman 
Using Forced Alignment for Phonetics Research
Chinese Language Resources, Springer, pp 289-301
Available: Abstract 

 

2022

Michael Arrigo, Stephanie Strassel, Nolan King, Thao Tran, Lisa Mason 
CAMIO: A Corpus for OCR in Multiple Languages
LREC 2022: 13th Edition of the Language Resources and Evaluation Conference
Marseille, June 20-25
Available: Paper in PDF

Martijn Bartelds, Wietse de Vries, Faraz Sanal, Caitlin Richter, Mark Liberman, Martijn Wieling 
Neural representations for modeling variation in English speech
Journal of Phonetics, (92) May 2022
Available: Paper in PDF

Sunghye Cho, Galit Agmon, Sanjana Shellikeri, Katheryn A. Cousins, Sharon Ash, David J. Irwin, Meredith Spindler, Andres F. Deik, Lauren B. Elman, Colin Quinn, Mark Liberman, Murray Grossman, Naomi Nevler
Prosodic characteristics of prepausal words produced by patients with neurodegenerative disease
Proceedings of Speech Prosody 2022, 120-124
Available: Abstract

Christopher Cieri, Mark Liberman, Sunghye Cho, Stephanie Strassel, James Fiumara, Jonathan Wright
Reflections on 30 Years of Language Resource Development and Sharing
LREC 2022: 13th Edition of the Language Resources and Evaluation Conference
Marseille, June 20-25
Available: Paper in PDF

Denise DiPersio
Data Protection, Privacy and US Regulation
LREC 2022: 13th Edition of the Language Resources and Evaluation Conference
Marseille, June 20-25
Legal2022: Legal and Ethical Issues in Human Language Technologies Workshop
Available: Paper in PDF

James Fiumara, Christopher Cieri, Mark Liberman, Chris Callison-Burch (UPenn), Jonathan Wright, Robert Parker
The NIEUW Project: Developing Language Resources through Novel Incentives
LREC 2022: 13th Edition of the Language Resources and Evaluation Conference
Marseille, June 20-25
NIDCP2022: The 2nd Workshop on Novel Incentives in Data from People: models, implementations, challenges and results 
Available: Paper in PDF

Karen Jones, Kevin Walker, Christopher Caruso, Jonathan Wright, Stephanie Strassel
WeCanTalk: A New Multi-language, Multi-modal Resource for Speaker Recognition 
LREC 2022: 13th Edition of the Language Resources and Evaluation Conference
Marseille, June 20-25
Available: Paper in PDF

Jianjing Kuang, May Pik Yu Chan, Nari Rhee, Mark Liberman, Hongwei Ding
The mapping between syntactic and prosodic phrasing in English and Mandarin
Interspeech 2022: 23rd Annual Conference of the International Speech Communication Association
Incheon, September 18-22
Available: Paper in PDF
 
Danni Ma, Neville Ryant, Mark Liberman
Inferring pitch from coarse spectral features
183rd Meeting of the Acoustical Society of America
Nashville, December 5-9
Available: Paper in PDF
 
Sameer Pradhan, Mark Liberman
GRAIL—Generalized Representation and Aggregation of Information Layers
LREC 2022: 13th Edition of the Language Resources and Evaluation Conference
Marseille, June 20-25
LAW-XVI: The Sixteenth Linguistic Annotation Workshop
Available: Paper in PDF

Jennifer Tracey, Ann Bies, Jeremy Getman, Kira Griffitt, Stephanie Strassel 
A Study in Contradiction: Data and Annotation for AIDA Focusing on Informational Conflict in Russia-Ukraine Relations 
LREC 2022: 13th Edition of the Language Resources and Evaluation Conference
Marseille, June 20-25
Available: Paper in PDF

Jennifer Tracey, Owen Rambow, Claire Cardie, Adam Dalton, Hoa Trang Dang, Mona Diab, Bonnie Dorr, Louise Guthrie, Magdalena Markowska, Smaranda Muresan, Vinodkumar Prabhakaran, Samira Shaikh, Tomek Strzalkowski
BeSt: The Belief and Sentiment Corpus 
LREC 2022: 13th Edition of the Language Resources and Evaluation Conference
Marseille, June 20-25
Available: Paper in PDF

Jonathan Wright
Global TIMIT Thai and /aj/ raising 
NWAV-AP 7: The 7th meeting of the New Ways of Analyzing Variation – Asia Pacific Conference
Bangkok, December 14-16
Available: Slides in PDF

Juhong Zhan, Yue Jiang, Christopher Cieri, Mark Liberman, Jiahong Yuan, Yiya Chen, Odette Scharenborg
Using Mixed Incentives to Document Xi’an Guanzhong
LREC 2022: 13th Edition of the Language Resources and Evaluation Conference
Marseille, June 20-25
NIDCP2022: The 2nd Workshop on Novel Incentives in Data from People: models, implementations, challenges and results 
Available: Paper in PDF

2021

Kenneth Church, Mark Liberman
The Future of Computational Linguistics: On Beyond Alchemy
Frontiers in Artificial Intelligence 4 (2021)
Available: Paper in PDF

Sunghye Cho, Naomi Nevler, Natalia Parjane, Christopher Cieri, Mark Y. Liberman, Murray Grossman, Katheryn A. Q. Cousins 
Automated analysis of digitized letter fluency data
Frontiers in Psychology 12, 654214
Available: Paper in PDF

Christopher Cieri, James Fiumara, Jonathan Wright
Using Games to Augment Corpora for Language Recognition and Confusability 
Interspeech 2021: 22nd Annual Conference of the International Speech Communication Association, August 30-September 3
Available: Paper in PDF

Aditya Joglekar, Seyed Omid Sadjadi, Meena Chandra-Shekar, Christopher Cieri, John H.L. Hansen
Fearless Steps Challenge Phase-3 (FSC P3): Advancing SLT for Unseen Channel and Mission Data Across NASA Apollo Audio 
Interspeech 2021: 22nd Annual Conference of the International Speech Communication Association, August 30-September 3
Available: Paper in PDF

Danni Ma, Neville Ryant, Mark Liberman
Probing Acoustic Representations for Phonetic Properties
ICASSP 2021: International Conference on Acoustics, Speech and Signal Processing
Virtual Conference, June 6-11
Available: Paper in PDF

Natalia Parjane, Sunghye Cho, Sharon Ash, Katheryn A. Q. Cousins, Sanjana Shellikeri, Mark Liberman, Leslie M. Shaw, David J. Irwin, Murray Grossman, Naomi Nevler
Digital speech analysis in progressive supranuclear palsy and corticobasal syndromes 
Journal of Alzheimer's Disease 82(1), 33-45

Neville Ryant, Prachi Singh, Venkat Krishnamohan, Rajat Varma, Kenneth Church, Christopher Cieri, Jun Du, Sriram Ganapathy, Mark Liberman
The Third DIHARD Diarization Challenge
Interspeech 2021: 22nd Annual Conference of the International Speech Communication Association, August 30-September 3
Available: Paper in PDF

Sunny X Tang, Reno Kriz, Sunghye Cho, Suh Jung Park, Jenna Harowitz, Raquel E. Gur, Mahendra T. Bhati, Daniel H. Wolf, João Sedoc, Mark Y. Liberman
Natural language processing methods are sensitive to sub-clinical linguistic differences in schizophrenia spectrum disorders
NPJ Schizophrenia 7, no. 15 
Available: Paper in PDF 

Jonathan Wright, Robert Parker, Jeremy Zehr, Neville Ryant, Mark Liberman, Christopher Cieri, James Fiumara
High quality recordings and transcriptions of speech via remote platforms
ASA 2021: 181st Meeting of the Acoustical Society of America, November 29-December 3, 2021
Available: Poster in PDF

2020

Martijn Bartelds, Caitlin Richter, Mark Liberman, Martijn Wieling
A new acoustic-based pronunciation distance measure
Frontiers in Artificial Intelligence 3 (2020): 39
Available: Paper in PDF

Sunghye Cho, Mark Y. Liberman, Christopher Cieri, Neville Ryant, Meredith Cola, Victoria Petrulla, Lisa Yankowitz, Juhi Pandey, Robert Schultz, Julia Parish-Morris
Talking more and pausing less: Girls with ASD behave differently during brief natural conversation 
INSAR 2020: International Society for Autism Research Annual Meeting
Virtual Meeting, June 3
Available: Poster in PDF 

Sunghye Cho, Naomi Nevler, Sanjana Shellikeri, Natalia Parjane, David J. Irwin, Neville Ryant, Sharon Ash, Christopher Cieri, Mark Liberman, Murray Grossman
Automated lexical and acoustic analysis of young and older healthy adults
AAIC 2020: Alzheimer's Association International Conference 
Virtual Conference, July 27-31
Available: Poster in PDF 

Sunghye Cho, Naomi Nevler, Sanjana Shellikeri, Natalia Parjane, David J. Irwin, Neville Ryant, Sharon Ash, Christopher Cieri, Mark Liberman, Murray Grossman
Lexical and acoustic characteristics of young and older healthy adults
Journal of Speech, Language, and Hearing Research 64(2), 302-314

Sunghye Cho, Naomi Nevler, Sanjana Shellikeri, Sharon Ash, Mark Liberman, Murray Grossman
Automatic Classification of Primary Progressive Aphasia Patients Using Lexical and Acoustic Features 
LREC 2020: 12th Edition of the Language Resources and Evaluation Conference
Marseille, May 11-16 
3rd RaPID Workshop: Resources and Processing of Linguistic, Para-linguistic and Extra-linguistic Data from People with Various Forms of Cognitive/Psychiatric/Developmental Impairments
Available: Paper in PDF 

Sunghye Cho, Naomi Nevler, Sharon Ash, Mark Liberman, Murray Grossman
Automatic Analysis of Lexical Features in Speech of Patients with Primary Progressive Aphasia 
AAN 2020: American Academy of Neurology Annual Meeting
Toronto, April 25-May 1
Available: Slides in PDF 

Sunghye Cho, Naomi Nevler, Sharon Ash, Sanjana Shellikeri, David J. Irwin, Lauren Massimo, Katya Rascovsky, Christopher Olm, Murray Grossman, Mark Liberman
Automated analysis of lexical features in Frontotemporal Degeneration
Cortex: 137, 215-231

Sunghye Cho, Sanjana Shellikeri, Sharon Ash, Mark Liberman, Murray Grossman, Naomi Nevler 
Automatic analysis of natural speech in patients with Alzheimer's disease 
SNL 2020: Society for the Neurobiology of Language Annual Meeting
Virtual, October 21-24
Available: Poster in PDF

Christopher Cieri
Stretching Disciplinary Boundaries in Language Resource Development and Use: a Linguistic Data Consortium Position Paper
LREC 2020: 12th Edition of the Language Resources and Evaluation Conference
Marseille, May 11-16
Available: Paper in PDF

Christopher Cieri, James Fiumara
LanguageARC – a tutorial
LREC 2020: 12th Edition of the Language Resources and Evaluation Conference
Marseille, May 11-16
CLLRD Workshop: Citizen Linguistics in Language Resource Development
Available: Paper in PDF

Christopher Cieri, James Fiumara, Stephanie Strassel, Jon Wright, Denise DiPersio, Mark Liberman
A Progress Report on Activities at the Linguistic Data Consortium Benefitting the LREC Community
LREC 2020: 12th Edition of the Language Resources and Evaluation Conference
Marseille, May 11-16
Available: Paper in PDF

Dana Delgado, Kevin Walker, Stephanie Strassel, Karen Jones, Christopher Caruso, David Graff
The SAFE-T Corpus: A New Resource for Simulated Public Safety Communications
LREC 2020: 12th Edition of the Language Resources and Evaluation Conference
Marseille, May 11-16
Available: Paper in PDF

James Fiumara, Christopher Cieri, Jonathan Wright, Mark Liberman
LanguageARC: Developing Language Resources Through Citizen Linguistics
LREC 2020: 12th Edition of the Language Resources and Evaluation Conference
Marseille, May 11-16
CLLRD Workshop: Citizen Linguistics in Language Resource Development
Available: Paper in PDF

Daniel Jaquette, Christopher Cieri, Denise DiPersio
Related Works in the Linguistic Data Consortium Catalog
LREC 2020: 12th Edition of the Language Resources and Evaluation Conference
Marseille, May 11-16
Available: Paper in PDF          

Karen Jones, Stephanie Strassel, Kevin Walker, Jonathan Wright
Call My Net 2: A New Resource for Speaker Recognition
LREC 2020: 12th Edition of the Language Resources and Evaluation Conference
Marseille, May 11-16
Available: Paper in PDF

Justin Mott, Ann Bies, Stephanie Strassel, Jordan Kodner, Caitlin Richter, Hongzhi Xu, Mitch Marcus
Morphological Segmentation for Low Resource Languages
LREC 2020: 12th Edition of the Language Resources and Evaluation Conference
Marseille, May 11-16
Available: Paper in PDF

Naomi Nevler, Sharon Ash, Corey McMillan, Lauren Elman, Leo McCluskey, David J. Irwin, Sunghye Cho, Mark Liberman, Murray Grossman
Automated Analysis of Natural Speech in Amyotrophic Lateral Sclerosis  
AAN 2020: American Academy of Neurology Annual Meeting
Toronto, April 25-May 1
Available: Slides in PDF  

Naomi Nevler, Sharon Ash, Sunghye Cho, Sanjana Shellikeri, David J. Irwin, Mark Liberman, Murray Grossman 
A longitudinal study of automated acoustic speech marker in FTD & PPA. 
AAIC 2020: Alzheimer's Association International Conference 
Virtual Conference, July 27-31
Available: Poster in PDF

Naomi Nevler, Sharon Ash, Sunghye Cho, Sanjana Shellikeri, Natalia Parjane, David J. Irwin, Mark Liberman, Murray Grossman
Automated semantic speech analysis in AD and lvPPA
AAIC 2020: Alzheimer's Association International Conference 
Virtual Conference, July 27-31
Available: Poster in PDF 

Naomi Nevler, Sharon Ash, Corey T. McMillan, Lauren Elman, Leo McCluskey, David J Irwin, Sunghye Cho, Mark Y Liberman, Murray Grossman
Automated analysis of natural speech in amyotrophic lateral sclerosis spectrum disorder 
Neurology: 95(12), e1629 - e1639

Natalia Parjane, Sharon Ash, Sunghye Cho, Mark Liberman, Murray Grossman, Naomi Nevler
Acoustic Prosodic Measures in Natural Speech of Progressive Supranuclear Palsy and Corticobasal Spectrum Disorders 
AAN 2020: American Academy of Neurology Annual Meeting
Toronto, April 25-May 1
Available: Slides in PDF 

Natalia Parjane, Sharon Ash, Sunghye Cho, Sanjana Shellikeri, Mark Liberman, Murray Grossman, Naomi Nevler
Acoustic measures in natural speech of progressive supranuclear palsy and corticobasal spectrum disorders
AAIC 2020: Alzheimer's Association International Conference 
Virtual Conference, July 27-31
Available: Poster in PDF 

Sanjana Shellikeri, Sunghye Cho, Sharon Ash, Natalia Parjane, Lauren Elman, Corey McMillan, Murray Grossman, Naomi Nevler
Longitudinal changes of automated speech measures in natural connected speech in ALS
AAIC 2020: Alzheimer's Association International Conference 
Virtual Conference, July 27-31
Available: Poster in PDF 

Sanjana Shellikeri, Sunghye Cho, Erica Howard, David Irwin, Murray Grossman, Naomi Nevler
Automatic analysis of natural speech in Lewy body spectrum disorders with Alzheimer's disease co-pathology
SNL 2020: Society for the Neurobiology of Language Annual Meeting
Virtual, October 21-24
Available: Poster in PDF

Sunny Tang, Sunghye Cho, Reno Kriz, Olivia Franco, Jenna Harowitz, Suh Jung Park, Raquel Gur, Lyle Ungar, Mahendra Bhati, Christian Kohler, Monica Calkins, Daniel Wolf, João Sedoc, Mark Liberman
Language and Communication in Psychosis: Digital Tools as Novel Opportunities for Biomarker and Intervention 
ASCP 2020: American Society of Clinical Psychopharmacology Annual Meeting
Virtual Conference, May 29-30
Available: Poster in PDF 

Jennifer Tracey, Stephanie Strassel
Basic Language Resources for 31 Languages (Plus English): The LORELEI Representative and Incident Language Packs
LREC 2020: 12th Edition of the Language Resources and Evaluation Conference
Marseille, May 11-16
1st Joint SLTU and CCURL Workshop: Spoken Language Technologies for Under-resourced languages and Collaboration and Computing for Under-Resourced Languages
Available: Paper in PDF

2019

Sunghye Cho, Mark Liberman, Yong-cheol Lee
Automatic Detection of Prosodic Focus in American English
Interspeech 2019: 20th Annual Conference of the International Speech Communication Association
Graz, September 15-19
Available: Paper in PDF, Poster in PDF

Sunghye Cho, Mark Liberman, Neville Ryant, Keith Bartley, Robert T. Schultz, Julia Parish-Morris
Machine Learning Classification of Natural Conversational Utterances Using Acoustic Features Drawn from Children with ASD and Typical Controls
INSAR 2019: International Society for Autism Research
Montreal, May 1-4
Available: Poster in PDF

Sunghye Cho, Mark Liberman, Neville Ryant, Meredith Cola, Robert T. Schultz, Julia Parish-Morris
Automatic detection of Autism Spectrum Disorder in children using acoustic and text features from brief natural conversations
Interspeech 2019: 20th Annual Conference of the International Speech Communication Association
Graz, September 15-19
Available: Paper in PDF

Christopher Cieri, Denise DiPersio
The Linguistic Data Consortium: Developing and Sharing Resources for Indigenous Languages
University of Pennsylvania, Philadelphia, October 26
Available: Slides in PDF
 
Christopher Cieri, Denise DiPersio, James Fiumara
LDC Data Clinic
NWAV48: New Ways of Analyzing Variation
Eugene, October 10-12
Available: Slides in PDF, Video

Christopher Cieri, Jonathan Wright, James Fiumara, Alex Shelmire and Mark Liberman
LanguageARC: Using Citizen Science to Augment Sociolinguistic Data Collection and Coding
NWAV48: New Ways of Analyzing Variation
Eugene, October 10-12
Available: Poster in PDF

Denise DiPersio, Christopher Cieri
The Linguistic Data Consortium: Developing and Distributing Language Resources4All
LT4All: International Conference on Language Technologies for All: Enabling Linguistic Diversity and Multilingualism Worldwide
UNESCO Headquarters, Paris, December 4-6
 
Denise DiPersio, Daniel Jaquette
The LDC Catalog: A Curated Repository of Language Resources
RDA P13: Research Data Alliance 13th Plenary Meeting
Philadelphia, April 2-4
Available: Poster in PDF

Julia Parish-Morris, Sunghye Cho, Mark Liberman, Neville Ryant, Keith Bartley, Meredith Cola, Samantha Plate, Lisa Yankowitz, Victoria Petrulla, Amanda Riiff, Casey Zampella, John Herrington, Evangelos Sariyanidi, Birkan Tunc, Elizabeth Kim, Ashley de Marchena, Juhi Pandey, Robert T. Schultz
‘Autistic’-Sounding: A Latent Class Linear Mixed Modeling Approach to Parsing Heterogeneity in Children’s Natural Conversations using Acoustic Properties of Speech
INSAR 2019: International Society for Autism Research
Montreal, May 1-4
Available: Poster in PDF

Neville Ryant, Kenneth Church, Christopher Cieri, Alejandrina Cristia, Jun Du, Sriram Ganapathy, Mark Liberman
The Second DIHARD Diarization Challenge: Dataset, task, and baselines
Interspeech 2019: 20th Annual Conference of the International Speech Communication Association
Graz, September 15-19
Available: Paper in PDF

 

2018

Elika Bergelson, Kenneth Church, Alejandrina Cristia, Jun Du, Sriram Ganapathy, Sanjeev Khudanpur, Diana Kowalski, Mahesh Krishnamoorthy, Rajat Kulshreshta, Mark Liberman, Yu-Ding Lu, Matthew Maciejewski, Florian Metze, Jan Profant, Neville Ryant, Lei Sun, Yu Tsao, Zhou Yu
Enhancement and Analysis of Conversational Speech: JSALT 2017 
ICASSP 2018: International Conference on Acoustics, Speech and Signal Processing
Calgary, April 15-20
Available: Paper in PDF
 
Nattanun Chanchaochai, Christopher Cieri, Japhet Debrah, Hongwei Ding, Yue Jiang, Sishi Liao, Mark Liberman, Jonathan Wright, Jiahong Yuan, Juhong Zhan, Yuqing Zhan
GlobalTIMIT: Acoustic-Phonetic Datasets for the World's Languages
Interspeech 2018: 19th Annual Conference of the International Speech Communication Association
Hyderabad, September 2-6
Available: Paper in PDF
 
Christopher Cieri, James Fiumara
LingoBoingo: Joining Forces to Promote Games for Linguistic Research and Technology Development
LREC 2018: 11th Edition of the Language Resources and Evaluation Conference
Miyazaki, May 7-12
Games4NLP Workshop: Games and Gamification for Natural Language Processing
Available: Slides in PDF

Christopher Cieri, James Fiumara, Mark Liberman, Chris Callison-Burch, Jonathan Wright
Introducing NIEUW: Novel Incentives and Workflows for Eliciting Linguistic Data
LREC 2018: 11th Edition of the Language Resources and Evaluation Conference
Miyazaki, May 7-12
Available: Paper in PDF
 
Christopher Cieri, Mark Liberman, Stephanie Strassel, Denise DiPersio, Jonathan Wright, Andrea Mazzucchi, James Fiumara
From ‘Solved Problems’ to New Challenges: A Report on LDC Activities
LREC 2018: 11th Edition of the Language Resources and Evaluation Conference
Miyazaki, May 7-12
Available: Paper in PDF
 
Denise DiPersio
A US Perspective on Selected Legal and Ethical Issues Affecting the Development of Language Resources and Related Technology
LREC 2018: 11th Edition of the Language Resources and Evaluation Conference
Miyazaki, May 7-12
Legal Issues & Ethics Workshop
Available: Paper in PDF, Slides in PDF
 
Jeremy Getman, Joe Ellis, Stephanie Strassel, Zhiyi Song, Jennifer Tracey
Laying the Groundwork for Knowledge Base Population: Nine Years of Linguistic Resources for TAC KBP
LREC 2018: 11th Edition of the Language Resources and Evaluation Conference
Miyazaki, May 7-12
Available: Paper in PDF
 
Kira Griffitt, Jennifer Tracey, Ann Bies, Stephanie Strassel
Simple Semantic Annotation and Situation Frames: Two Approaches to Basic Text Understanding in LORELEI
LREC 2018: 11th Edition of the Language Resources and Evaluation Conference
Miyazaki, May 7-12
Available: Paper in PDF

Tim O’Gorman, Michael Regan, Kira Griffitt, Ulf Hermjakob, Kevin Knight, Martha Palmer
AMR Beyond the Sentence: the Multi-sentence AMR corpus
COLING 2018: 27th International Conference on Computational Linguistics
Santa Fe, August 20-26
Available: Paper in PDF

Zhiyi Song, Ann Bies, Justin Mott, Xuansong Li, Stephanie Strassel, Christopher Caruso
Cross-Document, Cross-Language Event Coreference Annotation Using Event Hoppers 
LREC 2018: 11th Edition of the Language Resources and Evaluation Conference
Miyazaki, May 7-12
Available: Paper in PDF
 
Jennifer Tracey, Stephanie Strassel
VAST: A Corpus of Video Annotation for Speech Technologies
LREC 2018: 11th Edition of the Language Resources and Evaluation Conference
Miyazaki, May 7-12
Available: Paper in PDF, Poster in PDF
 
Jiahong Yuan, Wei Lai, Chris Cieri, Mark Liberman 
Using Forced Alignment for Phonetics Research
Chinese Language Resources and Processing: Text, Speech and Language Technology, Springer
Available: Paper in PDF
 

2017 

Nancy Minyanou, Leila Bateman, Christopher Cieri, Mark Liberman, Neville Ryant, Jessica Brown, Elizabeth Kim, Zach Dravis, Emily Ferguson, Keith Bartley, Alison Pomykacz, Juhi Pandey, Ashley de Marchena, Robert Schultz, and Julia Parish-Morris
Introducing a Novel Community-Based Assessment Tool: The Computerized Social Affective Language Task (C-SALT)
IMFAR 2017: International Meeting on Autism Research, San Francisco, May 10-13
Available: Poster in PDF

Julia Parish-Morris, Mark Liberman, Christopher Cieri, John Herrington, Benjamin Yerys, Leila Bateman, Joseph Donaher, Emily Ferguson, Juhi Pandey, Robert Schultz
Linguistic Camouflage in Girls with Autism Spectrum Disorder
Molecular Autism, published 30 September 2017
Available: Paper in PDF

Karen Jones, Stephanie Strassel, Kevin Walker, David Graff, Jonathan Wright
Call My Net Corpus: A Multilingual Corpus for Evaluation of Speaker Recognition Technology
Interspeech 2017: 18th Annual Conference of the International Speech Communication Association, Stockholm, August 20-24
Available: Paper in PDF

Stephanie Strassel, Ann Bies and Jennifer Tracey
Situational Awareness for Low Resource Languages: the LORELEI Situation Frame Annotation Task
SMERP 2017: First Workshop on Exploitation of Social Media for Emergency Relief and Preparedness, Aberdeen, April 9
Available: Paper in PDF

Jiahong Yuan, Hongwei Ding, Sishi Liao, Yuqing Zhan, Mark Liberman
Chinese TIMIT: A TIMIT-Like Corpus of Standard Chinese
O-COCOSDA 2017: Oriental COCOSDA international conference, Seoul, November 1-3
Available: Paper in PDF

2016

Ann Bies, Zhiyi Song, Jeremy Getman, Joe Ellis, Justin Mott, Stephanie Strassel, Martha Palmer, Teruko Mitamura, Marjorie Freedman, Heng Ji, Tim O'Gorman
A Comparison of Event Representations in DEFT
NAACL HLT 2016: 15th Annual Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies
The 4th Workshop on EVENTS: Definition, Detection, Coreference, and Representation
San Diego, June 17
Available: Paper in PDF

Christopher Cieri
Novel Incentives in Language Resource Development
LREC 2016 Workshop: Novel Incentives for Collecting Data and Annotation from People: types, implementation, tasking requirements, workflow and results
Portoroz, May 28
Available: Paper in PDF, Slides in PDF  

Christopher Cieri, Mike Maxwell, Stephanie Strassel, Jennifer Tracey 
Selection Criteria for Low Resource Language Programs 
LREC 2016: 10th Edition of the Language Resources and Evaluation Conference
Portoroz, May 23-28
Available: Paper in PDF, Poster in PDF

Denise DiPersio, Christopher Cieri
Trends in HLT Research: A Survey of LDC's Data Scholarship Program
LREC 2016: 10th Edition of the Language Resources and Evaluation Conference
Portoroz, May 23-28
Available: Paper in PDF, Slides in PDF

Denise DiPersio, Christopher Cieri, Daniel Jaquette
Data Management Plans and Data Centers
LREC 2016: 10th Edition of the Language Resources and Evaluation Conference
Portoroz, May 23-28
Available: Paper in PDF, Poster in PDF

Joe Ellis, Jeremy Getman, Neil Kuster, Zhiyi Song, Ann Bies, Stephanie M. Strassel
Overview of Linguistic Resources for the TAC KBP 2016 Evaluations: Methodologies and Results 
TAC KBP 2016 Workshop: National Institute of Standards and Technology
Gaithersburg, November 14-15 
Available: Paper in PDF

Kira Griffitt, Stephanie Strassel 
The Query of Everything: Developing Open-Domain, Natural-Language Queries for BOLT Information Retrieval 
LREC 2016: 10th Edition of the Language Resources and Evaluation Conference
Portoroz, May 23-28
Available: Paper in PDF

Nancy Ide, Keith Suderman, James Pustejovsky, Marc Verhagen, Christopher Cieri, Eric Nyberg
The Language Application Grid and Galaxy
LREC 2016: 10th Edition of the Language Resources and Evaluation Conference
Portoroz, May 23-28
Available: Paper in PDF, Slides in PDF

Karen Jones, Stephanie Strassel, Kevin Walker, David Graff and Jonathan Wright
Multi-language Speech Collection for NIST LRE
LREC 2016: 10th Edition of the Language Resources and Evaluation Conference
Portoroz, May 23-28
Available: Paper in PDF

Jianjing Kuang, Yixuan Guo, Mark Liberman
Voice quality as a pitch-range indicator
Speech Prosody 2016: 8th Speech Prosody Conference, Boston, May 31-June 3
Available: Paper in PDF

Jianjing Kuang, Mark Liberman
Pitch-range perception: the dynamic interaction between voice quality and fundamental frequency
Interspeech 2016: 17th Annual Conference of the International Speech Communication Association, San Francisco, September 8-12
Available: Paper in PDF

Jianjing Kuang, Mark Liberman
The effect of vocal fry on pitch perception
ICASSP 2016: the 41st IEEE International Conference on Acoustics, Speech and Signal Processing, Shanghai, China, March 20-25
Available: Paper in PDF

Seth Kulick, Ann Bies
Rapid Development of Morphological Analyzers for Typologically Diverse Languages
LREC 2016: 10th Edition of the Language Resources and Evaluation Conference
Portoroz, May 23-28
Available: Paper in PDF

Wei Lai, Jiahong Yuan, Ya Li, Xiaoying Xu, Mark Liberman
Prosodic Strength Intrinsic to Lexical Items: A Corpus Study of Tone Reduction in Tone4+Tone4 Words in Mandarin Chinese
ISCSLP 2016: The 10th International Symposium on Chinese Spoken Language Processing
Tianjin, October 17-20
Available: Paper in PDF

Wei Lai, Jiahong Yuan, Ya Li, Xiaoying Xu, Mark Liberman
The rhythmic constraint on prosodic boundaries in Mandarin Chinese based on corpora of silent reading and speech perception
Interspeech 2016: 17th Annual Conference of the International Speech Communication Association
San Francisco, September 8-12
Available: Paper in PDF

Xuansong Li, Martha Palmer, Nianwen Xue, Lance Ramshaw, Mohamed Maamouri, Ann Bies, Kathryn Conger, Stephen Grimes, Stephanie Strassel
Large Multi-lingual, Multi-level, Multi-genre Annotation Corpus
LREC 2016: 10th Edition of the Language Resources and Evaluation Conference
Portoroz, May 23-28
Available: Paper in PDF

Xuansong Li, Jennifer Tracey, Stephen Grimes, Stephanie Strassel
Uzbek-English and Turkish-English Morpheme Alignment Corpora 
LREC 2016: 10th Edition of the Language Resources and Evaluation Conference
Portoroz, May 23-28
Available: Paper in PDF

Justin Mott, Ann Bies, Zhiyi Song, Stephanie Strassel
Parallel Chinese-English Entities, Relations and Events Corpora 
LREC 2016: 10th Edition of the Language Resources and Evaluation Conference
Portoroz, May 23-28
Available: Paper in PDF

Julia Parish-Morris, Christopher Cieri, Mark Liberman, Leila Bateman, Emily Ferguson, Robert T. Schultz
Building Language Resources for Exploring Autism Spectrum Disorders
LREC 2016: 10th Edition of the Language Resources and Evaluation Conference
Portoroz, May 23-28
Available: Paper in PDF, Slides in PDF

Julia Parish-Morris, Christopher Cieri, Mark Liberman, Neville Ryant, Robert T. Schultz
Linguistic Markers of Autism Spectrum Disorder: Classification Sensitivity and Specificity of Language Produced During Clinical Evaluation
IMFAR 2016: International Meeting for Autism Research, Baltimore, May 11-14
Available: Slides in PDF 

Julia Parish-Morris, Mark Liberman, Neville Ryant, Christopher Cieri, Leila Bateman, Emily Ferguson, Robert T. Schultz
Exploring Autism Spectrum Disorders Using HLT
CLPsych 2016: The Third Computational Linguistics and Clinical Psychology Workshop
San Diego, June 16
Available: Paper in PDF

Stephanie Strassel, Jennifer Tracey
LORELEI Language Packs: Data, Tools, and Resources for Technology Development in Low Resource Languages
LREC 2016: 10th Edition of the Language Resources and Evaluation Conference
Portoroz, May 23-28
Available: Paper in PDF

Neville Ryant, Mark Liberman
Automatic Analysis of Phonetic Speech Style Dimensions
Interspeech 2016: 17th Annual Conference of the International Speech Communication Association, San Francisco, September 8-12
Available: Paper in PDF
 
Neville Ryant, Mark Liberman
Large-scale analysis of Spanish /s/-lenition using audiobooks
ICA 2016: 22nd International Congress on Acoustics
Buenos Aires, Argentina, September 5-9
Available: Paper in PDF 

Ian Soboroff, Kira Griffitt, Stephanie Strassel
The BOLT IR Test Collections of Multilingual Passage Retrieval from Discussion Forums
SIGIR 2016: 39th Annual International ACM Special Interest Group on Information Retrieval Conference, Pisa, Italy, July 17-21
Available: Paper in PDF, Poster in PDF

Zhiyi Song, Ann Bies, Stephanie Strassel, Joe Ellis, Teruko Mitamura, Hoa Dang, Yukari Yamakawa, Sue Holm
Event Nugget and Event Coreference Annotation
NAACL HLT 2016: 15th Annual Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies
The 4th Workshop on EVENTS: Definition, Detection, Coreference, and Representation
San Diego, June 17
Available: Paper in PDF

Jiahong Yuan, Mark Liberman
Phoneme, Phone Boundary, and Tone in Automatic Scoring of Mandarin Proficiency
Interspeech 2016: 17th Annual Conference of the International Speech Communication Association, San Francisco, September 8-12
Available: Paper in PDF

Jiahong Yuan, Xiaoying Xu, Wei Lai, Mark Liberman
Pauses and Pause Fillers in Mandarin Monologue Speech: The Effects of Sex and Proficiency
Speech Prosody 2016: 8th Speech Prosody Conference, Boston, May 31-June 3
Available: Paper in PDF

2015

Ann Bies
Balancing the Existing and the New in the Context of Annotating Non-Canonical Language
NAACL HLT: Conference of the North American Chapter of the Association for Computational Linguistics - Human Language Technologies
LAW IX: The 9th Linguistic Annotation Workshop, Denver, June 5
Available: Paper in PDF

Christopher Cieri, Denise DiPersio
A License Scheme for a Global Federated Language Service Infrastructure
WLSI 2015: The Second International Workshop on Worldwide Language Service Infrastructure, Kyoto, January 22-23
Available: Paper in PDF, Slides in PDF 

Joe Ellis, Jeremy Getman, Dana Fore, Neil Kuster, Zhiyi Song, Ann Bies, Stephanie M. Strassel
Overview of Linguistic Resources for the TAC KBP 2015 Evaluations: Methodologies and Results
TAC KBP Workshop 2015: National Institute of Standards and Technology
Gaithersburg, November 16-17
Available: Paper in PDF

Jianjing Kuang, Mark Liberman
The Effect of Spectral Slope on Pitch Perception
Interspeech 2015: 16th Annual Conference of the International Speech Communication Association, Dresden, Germany, September 6-10
Available: Paper in PDF 

Teruko Mitamura, Yukari Yamakawa, Susan Holm, Zhiyi Song, Ann Bies, Seth Kulick, Stephanie Strassel   
Event Nugget Annotation: Processes and Issues
NAACL HLT 2015: Conference of the North American Chapter of the Association for Computational Linguistics - Human Language Technologies
3rd Workshop on EVENTS: Definition, Detection, Coreference, and Representation
Denver, June 4
Available: Paper in PDF 

Vinodkumar Prabhakaran, Tomas By, Julia Hirschberg, Owen Rambow, Samira Shaikh, Tomek Strzalkowski, Jennifer Tracey, Michael Arrigo, Rupayan Basu, Micah Clark, Adam Dalton, Mona Diab, Louise Guthrie, Anna Prokofieva, Stephanie Strassel, Gregory Werner, Janyce Wiebe and Yorick Wilks
A New Dataset and Evaluation for Belief/Factuality
*SEM 2015: Fourth Joint Conference on Lexical and Computational Semantics
Denver, June 4-5
Available: Paper in PDF

Zhiyi Song, Ann Bies, Stephanie Strassel, Tom Riese, Justin Mott, Joe Ellis, Jonathan Wright, Seth Kulick, Neville Ryant and Xiaoyi Ma
From Light to Rich ERE: Annotation of Entities, Relations, and Events
NAACL HLT 2015: 14th Annual Conference of the North American Chapter of the Association for Computational Linguistics - Human Language Technologies
3rd Workshop on EVENTS: Definition, Detection, Coreference, and Representation
Denver, June 4
Available: Paper in PDF 

Mariona Taulé, M Antonia Martí, Ann Bies, Aina Garí, Montserrat Nofre, Zhiyi Song, Stephanie Strassel and Joe Ellis
Spanish Treebank Annotation of Informal Non-Standard Web Text
NLPIT 2015 1st International Workshop on Natural Language Processing for Informal Text, Rotterdam, June 23
Available: Paper in PDF

Jiahong Yuan, Mark Liberman
Investigating Consonant Reduction in Mandarin Chinese with Improved Forced Alignment
Interspeech 2015: 16th Annual Conference of the International Speech Communication Association, Dresden, September 6-10
Available: Paper in PDF

Jiahong Yuan, Xiaoying Xu, Wei Lai, Weiping Ye, Xinru Zhao, Mark Liberman
Sentence Selection for Automatic Scoring of Mandarin Proficiency
SIGHAN 2015: The 8th SIGHAN Workshop on Chinese Language Processing at ACL-IJCNLP 2015, Beijing, July 30-31
Available: Paper in PDF

Yong-cheol Lee, Bei Wang, Sisi Chen, Martine Adda-Decker, Angélique Amelot, Satoshi Nambu, Mark Liberman
A Crosslinguistic Study of Prosodic Focus
ICASSP 2015: the 40th IEEE International Conference on Acoustics, Speech and Signal Processing, Brisbane, April 19-24
Available: Paper in PDF

2014

Jacqueline Aguilar, Charley Beller, Paul McNamee, Benjamin Van Durme, Stephanie Strassel, Zhiyi Song, Joe Ellis
A Comparison of the Events and Relations Across ACE, ERE, TAC-KBP, and FrameNet Annotation Standards
ACL 2014: 52nd Annual Meeting of the Association for Computational Linguistics
Baltimore, June 22-27
2nd Workshop on Events: Definition, Detection, Coreference, and Representation
Available: Paper in PDF, Poster in PDF

Ann Bies, Justin Mott, Seth Kulick, Jennifer Garland, Colin Warner
Incorporating Alternate Translations into English Translation Treebank
LREC 2014: 9th Edition of the Language Resources and Evaluation Conference
Reykjavik, May 26-31
Available: Paper in PDF, Poster in PDF 

Ann Bies, Zhiyi Song, Mohamed Maamouri, Stephen Grimes, Haejoong Lee, Jonathan Wright, Stephanie Strassel, Nizar Habash, Ramy Eskander, Owen Rambow
Transliteration of Arabizi into Arabic Orthography: Developing a Parallel Annotated Arabizi-Arabic Script SMS/Chat Corpus
EMNLP 2014: Conference on Empirical Methods in Natural Language Processing
Doha, October 25-29
ANLP Workshop: Arabic Natural Language Processing Workshop
Available: Paper in PDF

Steven Bird, Lauren Gawne, Katie Gelbart, Isaac McAlister
Collecting Bilingual Audio in Remote Indigenous Villages
COLING 2014: 25th International Conference on Computational Linguistics
Dublin, August 23-29
Available: Paper in PDF

Steven Bird, Florian R. Hanke, Oliver Adams, Haejoong Lee
Aikuma: A Mobile App for Collaborative Language Documentation
ACL 2014: 52nd Annual Meeting of the Association for Computational Linguistics
Baltimore, June 22-27
ComputEL Workshop: Use of Computational Methods in the Study of Endangered Languages
Available: Paper in PDF

Steven Bird, Haejoong Lee
Computational support for early elicitation and classification of tone
Language Documentation and Conservation 8, 453-461
Available: Paper in PDF

Christopher Cieri, Denise DiPersio
Intellectual Property Rights Management with Web Service Grids
COLING 2014: 25th International Conference on Computational Linguistics
Dublin, August 23-29
OIAF4HLT Workshop: Open Infrastructures and Analysis Frameworks for HLT
Available: Paper in PDF

Christopher Cieri, Denise DiPersio, Mark Liberman, Andrea Mazzucchi, Stephanie Strassel, Jonathan Wright
New Directions for Language Resource Development and Distribution
LREC 2014: 9th Edition of the Language Resources and Evaluation Conference
Reykjavik, May 26-31
Available: Paper in PDF

Christopher Cieri, Malcah Yaeger-Dror
Challenges and Alternative Data Sets 
Methods in Dialectology XV: 15th Triennial Conference on Regional, Historical and Social Language Variation, Groningen, August 11-15
Available: Slides in PDF

Christopher Cieri, Mark Liberman
Dimensions of Speaker Recognition Research Data
LSA 2014: 88th Annual Meeting of the Linguistic Society of America
Minneapolis, January 2-5
Available: Slides in PDF

Joe Ellis, Jeremy Getman, Stephanie M. Strassel
Overview of Linguistic Resources for the TAC KBP 2014 Evaluations: Planning, Execution, and Results 
TAC KBP 2014 Workshop: National Institute of Standards and Technology
Gaithersburg, November 17-18
Available: Paper in PDF

David Graff, Kevin Walker, Stephanie Strassel, Xiaoyi Ma, Karen Jones, Ann Sawyer 
The RATS Collection: Supporting HLT Research with Degraded Audio Data 
LREC 2014: 9th Edition of the Language Resources and Evaluation Conference
Reykjavik, May 26-31
Available: Paper in PDF, Slides in PDF

Nancy Ide, James Pustejovsky, Christopher Cieri, Eric Nyberg, Denise DiPersio, Keith Suderman, Marc Verhagen, Di Wang, Jonathan Wright
The Language Application Grid
LREC 2014: 9th Edition of the Language Resources and Evaluation Conference
Reykjavik, May 26-31
Available: Paper in PDF

Seth Kulick, Ann Bies, Justin Mott
Inter-annotator Agreement for ERE Annotation
ACL 2014: 52nd Annual Meeting of the Association for Computational Linguistics
Baltimore, June 22-27
2nd Workshop on EVENTS: Definition, Detection, Coreference, and Representation
Available: Poster in PDF

Seth Kulick, Ann Bies, Justin Mott, Anthony Kroch, Mark Liberman, Beatrice Santorini
Parser Evaluation Using Derivation Trees: A Complement to evalb
ACL 2014: 52nd Annual Meeting of the Association for Computational Linguistics
Baltimore, June 22-27
Available: Paper in PDF

Penny Labropoulou, Christopher Cieri and Maria Gavrilidou
Developing a Framework for Describing Relations among Language Resources
LREC 2014: 9th Edition of the Language Resources and Evaluation Conference
Reykjavik, May 26-31
Available: Paper in PDF

Mohamed Maamouri, Ann Bies, Seth Kulick, Michael Ciul, Nizar Habash and Ramy Eskander
Developing an Egyptian Arabic Treebank: Impact of Dialectal Morphology on Annotation and Tool Development
LREC 2014: 9th Edition of the Language Resources and Evaluation Conference
Reykjavik, May 26-31
Available: Paper in PDF, Poster in PDF

Joseph Mariani, Christopher Cieri, Gil Francopoulo, Patrick Paroubek, Marine Delaborde
Facing the Identification Problem in Language-Related Scientific Data Analysis
LREC 2014: 9th Edition of the Language Resources and Evaluation Conference
Reykjavik, May 26-31
Available: Paper in PDF

Neville Ryant, Malcolm Slaney, Mark Liberman, Elizabeth Shriberg, Jiahong Yuan
Highly Accurate Mandarin Tone Classification in the Absence of Pitch Information
Speech Prosody 2014: 7th Speech Prosody Conference, Dublin, May 20-23
Available: Paper in PDF 

Neville Ryant, Jiahong Yuan, Mark Liberman
Mandarin Tone Classification without Pitch Tracking
ICASSP 2014: 39th IEEE International Conference on Acoustics, Speech, and Signal Processing, Florence, May 4-9
Available: Paper in PDF

Zhiyi Song, Stephanie Strassel, Haejoong Lee, Kevin Walker, Jonathan Wright, Jennifer Garland, Dana Fore, Brian Gainor, Preston Cabe, Thomas Thomas, Brendan Callahan, Ann Sawyer 
Collecting Natural SMS and Chat Conversations in Multiple Languages: The BOLT Phase 2 Corpus 
LREC 2014: 9th Edition of the Language Resources and Evaluation Conference
Reykjavik, May 26-31
Available: Paper in PDF, Poster in PDF

Andreas Stolcke, Neville Ryant, Vikramjit Mitra, Jiahong Yuan, Wen Wang, Mark Liberman
Highly Accurate Phonetic Segmentation Using Boundary Correction Models and System Fusion
ICASSP 2014: 39th IEEE International Conference on Acoustics, Speech, and Signal Processing, Florence, May 4-9
Available: Paper in PDF

Jonathan Wright
RESTful Annotation and Efficient Collaboration
LREC 2014: 9th Edition of the Language Resources and Evaluation Conference
Reykjavik, May 26-31
Available: Paper in PDF

Jiahong Yuan, Neville Ryant, Mark Liberman
Automatic Phonetic Segmentation in Mandarin Chinese: Boundary Models, Glottal Features and Tone
ICASSP 2014: 39th IEEE International Conference on Acoustics, Speech, and Signal Processing, Florence, May 4-9
Available: Paper in PDF

Jiahong Yuan, Mark Liberman
F0 Declination in English and Mandarin Broadcast News Speech
Speech Communication Journal, Volume 65
Available: Paper in PDF

2013

Violetta Cavalli-Sforza, Hind Saddiki, Karim Bouzoubaa, Lahsen Abouenour, Mohamed Maamouri, Emily Goshey
Bootstrapping a WordNet for an Arabic Dialect from Other WordNets and Dictionary Resources
AICCSA 2013: 10th ACS/IEEE International Conference on Computer Systems and Applications
Fes, May 27-30
Available: Paper in PDF

Christopher Cieri
Sharing, Structuring and Processing Data: Part 1: Advantages and Challenges
Sharing, Structuring and Processing Data Workshop
NWAV42: New Ways of Analyzing Variation, Pittsburgh, October 17-20
Available: Slides in PDF, Video clip

Joe Ellis, Jeremy Getman, Justin Mott, Xuansong Li, Kira Griffitt, Stephanie M. Strassel, Jonathan Wright
Linguistic Resources for 2013 Knowledge Base Population Evaluations
TAC KBP 2013 Workshop: National Institute of Standards and Technology
Gaithersburg, November 18-19
Available: Paper in PDF

Ramy Eskander, Nizar Habash, Ann Bies, Seth Kulick, Mohamed Maamouri
Automatic Correction and Extension of Morphological Annotations
ACL 2013: 51st Annual Meeting of the Association for Computational Linguistics: Human Language Technologies, Sofia, August 4-9
The 7th Linguistic Annotation Workshop & Interoperability with Discourse
Available: Paper in PDF

Seth Kulick, Ann Bies, Justin Mott, Mohamed Maamouri, Beatrice Santorini, Anthony Kroch
Using Derivation Trees for Informative Treebank Inter-Annotator Agreement Evaluation
NAACL 2013: Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Atlanta, June 9-15
Available: Paper in PDF

Mark Liberman, Jiahong Yuan, Andreas Stolcke, Wen Wang, Vikramjit Mitra
Using Multiple Versions of Speech Input in Phone Recognition
ICASSP 2013: 38th International Conference on Acoustics, Speech, and Signal Processing
Vancouver, May 26-31
Available: Paper in PDF

Vikramjit Mitra, Wen Wang, Andreas Stolcke, Hosung Nam, Colleen Richey, Jiahong Yuan, Mark Liberman
Articulatory Trajectories for Larger-Vocabulary Speech Recognition
ICASSP 2013: 38th International Conference on Acoustics, Speech, and Signal Processing
Vancouver, May 26-31
Available: Paper in PDF

Neville Ryant, Mark Liberman, Jiahong Yuan
Speech Activity Detection on YouTube Using Deep Neural Networks
Interspeech 2013: 14th Annual Conference of the International Speech Communication Association, Lyon, August 25-29
Available: Paper in PDF

Neville Ryant, Mark Liberman, Jiahong Yuan
Automating Phonetic Measurement: The Case of Voice Onset Time
ICA 2013: 21st International Congress on Acoustics, Montreal, June 2-7
Available: Paper in PDF

Neville Ryant, Jiahong Yuan, Mark Liberman
Scale-Space Expansion of Acoustic Features Improves Speech Event Detection
ICASSP 2013: 38th International Conference on Acoustics, Speech, and Signal Processing
Vancouver, May 26-31
Available: Paper in PDF

Wen Wang, Andreas Stolcke, Jiahong Yuan, Mark Liberman
A Cross-language Study on Automatic Speech Disfluency Detection
NAACL-HLT 2013: The Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Atlanta, June 9-15
Available: Paper in PDF

Jiahong Yuan, Neville Ryant, Mark Liberman, Andreas Stolcke, Vikramjit Mitra, Wen Wang
Automatic Phonetic Segmentation Using Boundary Models
Interspeech 2013: 14th Annual Conference of the International Speech Communication Association, Lyon, August 25-29
Available: Paper in PDF

2012

Eleftheria Ahtaridis, Christopher Cieri, Denise DiPersio
LDC Language Resource Papers: Building a Bibliographic Database
LREC 2012: 8th International Conference on Language Resources and Evaluation
Istanbul, May 21-27
Available: Paper in PDF, Poster in PDF

Ann Bies, Denise DiPersio, Mohamed Maamouri
Linguistic resources for Arabic machine translation, The Linguistic Data Consortium (LDC) Catalog
In Abdelhadi Soudi, et al., Challenges for Arabic Machine Translation
Available: John Benjamins Publishing Company

Christopher Cieri, Marian Reed, Denise DiPersio, Mark Liberman
Twenty Years of Language Resource Development and Distribution: A Progress Report on LDC Activities
LREC 2012: 8th International Conference on Language Resources and Evaluation
Istanbul, May 21-27
Available: Paper in PDF

Christopher Cieri, Malcah Yaeger-Dror
Toward the Harmonization of Metadata Practice for Spoken Languages Resources
LREC 2012: 8th International Conference on Language Resources and Evaluation
Istanbul, May 21-27
SpeechCorpora 2012: Workshop on Best Practices for Speech Corpora in Linguistic Research
Available: Paper in PDF

Joe Ellis, Xuansong Li, Kira Griffitt, Stephanie M. Strassel, & Jonathan Wright
Linguistic Resources for 2012 Knowledge Base Population Evaluations
TAC KBP 2012 Workshop: National Institute of Standards and Technology
Gaithersburg, November 5-6
Available: Paper in PDF

Jennifer Garland, Stephanie Strassel, Safa Ismael, Zhiyi Song, Haejoong Lee
Linguistic Resources for Genre-Independent Language Technologies: User-Generated Content in BOLT
LREC 2012: 8th International Conference on Language Resources and Evaluation
Istanbul, May 21-27
Available: Slides in PDF

David Graff, Mohamed Maamouri
Developing LMF-XML Bilingual Dictionaries for Colloquial Arabic Dialects
LREC 2012: 8th International Conference on Language Resources and Evaluation
Istanbul, May 21-27
Available: Paper in PDF, Poster in PDF

Stephen Grimes, Katherine Peterson, Xuansong Li
Automatic Word Alignment Tools to Scale Production of Manually Aligned Parallel Texts
LREC 2012: 8th International Conference on Language Resources and Evaluation
Istanbul, May 21-27
Available: Paper in PDF

Seth Kulick, Ann Bies, Justin Mott
Further Developments in Treebank Error Detection Using Derivation Trees
LREC 2012: 8th International Conference on Language Resources and Evaluation
Istanbul, May 21-27
Available: Paper in PDF, Poster in PDF

Seth Kulick, Ann Bies, Justin Mott
Using Supertags and Encoded Annotation Principles for Improved Dependency to Phrase Structure Conversion
NAACL-HLT 2012: The 2012 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Montreal, June 3-8
Available: Paper in PDF, Poster in PDF

Xuansong Li, Stephanie M. Strassel, Heng Ji, Kira Griffitt, Joe Ellis
Linguistic Resources for Entity Linking Evaluation: from Monolingual to Cross-lingual
LREC 2012: 8th International Conference on Language Resources and Evaluation
Istanbul, May 21-27
Available: Paper in PDF

Xuansong Li, Stephanie Strassel, Stephen Grimes, Safa Ismael, Mohamed Maamouri, Ann Bies, Nianwen Xue
Parallel Aligned Treebanks at LDC: New Challenges Interfacing Existing Infrastructures
LREC 2012: 8th International Conference on Language Resources and Evaluation
Istanbul, May 21-27
Available: Paper in PDF

Xiaoyi Ma
LDC Forced Aligner
LREC 2012: 8th International Conference on Language Resources and Evaluation
Istanbul, May 21-27
Available: Paper in PDF, Poster in PDF

Mohamed Maamouri, Ann Bies, Seth Kulick
Expanding Arabic Treebank to Speech: Results from Broadcast News
LREC 2012: 8th International Conference on Language Resources and Evaluation
Istanbul, May 21-27
Available: Paper in PDF, Poster in PDF

Mohamed Maamouri, Wajdi Zaghouani, Violetta Cavalli-Sforza, David Graff, Mike Ciul
Developing ARET: An NLP-based Educational Tool Set for Arabic Reading Enhancement
NAACL-HLT 2012: Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Montreal, June 3-8
Available: Paper in PDF

Zhiyi Song, Safa Ismael, Stephen Grimes, David Doermann, Stephanie Strassel
Linguistic Resources for Handwriting Recognition and Translation Evaluation
LREC 2012: 8th International Conference on Language Resources and Evaluation
Istanbul, May 21-27
Available: Paper in PDF, Poster in PDF

Stephanie Strassel, Amanda Morris, Jonathan Fiscus, Christopher Caruso, Haejoong Lee, Paul Over, James Fiumara, Barbara Shaw, Brian Antonishek, Martial Michel
Creating HAVIC: Heterogeneous Audio Visual Internet Collection
LREC 2012: 8th International Conference on Language Resources and Evaluation
Istanbul, May 21-27
Available: Paper in PDF, Poster in PDF

Stephanie Strassel, Kevin Walker, Karen Jones, Dave Graff, Christopher Cieri
New Resources for Recognition of Confusable Linguistic Varieties: The LRE11 Corpus
Odyssey 2012: The Speaker and Language Recognition Workshop, Singapore, June 25-28
Available: Paper in PDF, Slides in PDF

Kevin Walker, Stephanie Strassel
The RATS Radio Traffic Collection System
Odyssey 2012: The Speaker and Language Recognition Workshop, Singapore, June 25-28
Available: Paper in PDF

Jonathan Wright, Kira Griffitt, Joe Ellis, Stephanie Strassel, Brendan Callahan
Annotation Trees: LDC's Customizable, Extensible, Scalable Annotation Infrastructure
LREC 2012: 8th International Conference on Language Resources and Evaluation
Istanbul, May 21-27
Available: Paper in PDF

Wajdi Zaghouani
RENAR: A Rule-Based Arabic Named Entity Recognition System
ACM Transactions on Asian Language Information Processing: Volume 11 Issue 1, March 2012
Available: Paper in PDF

Wajdi Zaghouani, Abdelati Hawwari, Mona Diab
A Pilot PropBank Annotation for Quranic Arabic
NAACL-HLT 2012: Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Montreal, June 3-8
Available: Paper in PDF

2011

Handbook of Natural Language Processing and Machine Translation: DARPA Global Autonomous Language Exploitation
With contributions from LDC: Ann Bies, Christopher Caruso, Christopher Cieri, Denise DiPersio, Lauren Friedman, Meghan Lammie Glenn, Stephen Grimes, Gary Krug, Seth Kulick, Haejoong Lee, Xuansong Li, Xiaoyi Ma, Mohamed Maamouri, Kazuaki Maeda, Andrea Mazzucchi, Robert Parker, Heather Simpson, Zhiyi Song, Stephanie Strassel, Kevin Walker, Dalal Zakhary
With contributions from University of Pennsylvania: Mitchell Marcus (School of Arts and Sciences)
Available: Springer

Seth Kulick, Ann Bies, and Justin Mott
Using Derivation Trees for Treebank Error Detection
ACL-HLT 2011: 49th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies, Portland, June 19-24
Available: Paper in PDF

Xuansong Li, Joe Ellis, Kira Griffitt, Stephanie Strassel, Robert Parker, Jonathan Wright
Linguistic Resources for 2011 Knowledge Base Population Evaluation
TAC 2011: Proceedings of the Fourth Text Analysis Conference
Gaithersburg, November 14-15
Available: Paper in PDF

Jiahong Yuan, Mark Liberman
Automatic Detection of "g-dropping" in American English Using Forced Alignment 
ASRU 2011: Automatic Speech Recognition and Understanding Workshop
Hawaii, December 11-15
Available: Paper in PDF

Jiahong Yuan, Mark Liberman
Automatic Measurement and Comparison of Vowel Nasalization Across Languages 
ICPhS XVII 2011: 17th International Congress of Phonetic Sciences
Hong Kong, August 17-21
Available: Paper in PDF

Jiahong Yuan, Mark Liberman
Variation in American English: A Corpus Approach
Journal of Speech Sciences 1 (2): 35-46, 2011
Available: Paper in PDF

2010

Christopher Cieri, Mark Liberman
Adapting to Trends in Language Resource Development: A Progress Report on LDC Activities 
LREC 2010: 7th International Conference on Language Resources and Evaluation
Valletta, May 17-23
Available: Paper in PDF, Slides in PDF

Marianna Di Paolo, Malcah Yaeger-Dror, Christopher Cieri, Stephanie Strassel, Zsuzsanna Fagyal
Towards Best Practices in Sociophonetics Workshop
NWAV39: New Ways of Analyzing Variation, San Antonio, November 4-6

Christopher Cieri, Stephanie Strassel
Towards Best Practices in Sociophonetics: Robust, Digital, Empirical, Reproducible, Sociolinguistic, Methodology
Towards Best Practices in Sociophonetics Workshop 
NWAV39: New Ways of Analyzing Variation, San Antonio, November 4-6
Available: Slides in PDF

Zsuzsanna Fagyal, Malcah Yaeger-Dror
Analyzing Rhythm I
Towards Best Practices in Sociophonetics Workshop 
NWAV39: New Ways of Analyzing Variation, San Antonio, November 4-6
Available: Slides in PDF

Malcah Yaeger-Dror, Zsuzsanna Fagyal
Analyzing "Timing" 2
Towards Best Practices in Sociophonetics Workshop 
NWAV39: New Ways of Analyzing Variation, San Antonio, November 4-6
Available: Slides in PDF

Denise DiPersio
Some Implications of US Initiatives for "Fair Research" and Open Access on the Development and Distribution of Language Resources
LREC 2010: 7th International Conference on Language Resources and Evaluation
Valletta, May 17-23
Workshop on Legal Issues for Language Resources
Available: Slides in PDF

Meghan Lammie Glenn, Stephanie M. Strassel, Haejoong Lee, Kazuaki Maeda, Ramez Zakhary, Xuansong Li
Transcription Methods for Consistency, Volume and Efficiency 
LREC 2010: 7th International Conference on Language Resources and Evaluation
Valletta, May 17-23
Workshop on Language Resources and Human Language Technologies for Semitic Languages
Available: Paper in PDF

Stephen Grimes, Xuansong Li, Ann Bies, Seth Kulick, Xiaoyi Ma, Stephanie Strassel
Creating Arabic-English Parallel Word-Aligned Treebank Corpora at LDC
LREC 2010: 7th International Conference on Language Resources and Evaluation
Valletta, May 17-23
Workshop on Language Resources and Human Language Technologies for Semitic Languages
Available: Paper in PDF

Seth Kulick, Ann Bies
A Treebank Query System Based on an Extracted Tree Grammar
NAACL-HLT 2010: The 11th Annual Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies
Los Angeles, June 1-6
Available: Paper in PDF

Seth Kulick, Ann Bies
A TAG-derived Database for Treebank Search and Parser Analysis
TAG+10: 10th International Workshop on Tree Adjoining Grammars and Related Formalisms, New Haven, June 10-12
Available: Paper in PDF

Seth Kulick, Ann Bies, Mohamed Maamouri
Consistent and Flexible Integration of Morphological Annotation in the Arabic Treebank
LREC 2010: 7th International Conference on Language Resources and Evaluation
Valletta, May 17-23
Available: Paper in PDF, Poster in PDF

Xuansong Li, Niyu Ge, Stephen Grimes, Stephanie M. Strassel, Kazuaki Maeda
Enriching Word Alignment with Linguistic Tags
LREC 2010: 7th International Conference on Language Resources and Evaluation
Valletta, May 17-23
Available: Paper in PDF, Slides in PDF

Xuansong Li, Stephanie Strassel, Stephen Grimes, Safa Ismael, Xiaoyi Ma, Niyu Ge, Ann Bies, Nianwen Xue, Mohamed Maamouri
Parallel Aligned Treebank Corpora at LDC: Methodology, Annotation and Integration
TLT9: The 9th International Workshop on Treebanks and Linguistic Theories
University of Tartu, December 2
AEPC: Workshop on Annotation and Exploitation of Parallel Corpora
Available: Paper in PDF

Mark Liberman
The Future of Computational Linguistics: or, What Would Antonio Zampolli Do?
Antonio Zampolli Prize speech
LREC 2010: 7th International Conference on Language Resources and Evaluation
Valletta, May 17-23
Available: Slides in PDF, Antonio Zampolli Prize Information: Prof. Antonio Zampolli Prize

Xiaoyi Ma
Toward a Name Entity Aligned Bilingual Corpus
LREC 2010: 7th International Conference on Language Resources and Evaluation
Valletta, May 17-23
Workshop on Methods for the Automatic Acquisition of Language Resources and Their Evaluation Methods
Available: Paper in PDF

Mohamed Maamouri, Ann Bies, Seth Kulick, Wajdi Zaghouani, Dave Graff, Mike Ciul
From Speech to Trees: Applying Treebank Annotation to Arabic Broadcast News
LREC 2010: 7th International Conference on Language Resources and Evaluation
Valletta, May 17-23
Available: Paper in PDF, Poster in PDF

Kazuaki Maeda, Haejoong Lee, Stephen Grimes, Jonathan Wright, Robert Parker, David Lee, Andrea Mazzucchi
Technical Infrastructure at Linguistic Data Consortium: Software and Hardware Resources for Linguistic Data Creation
LREC 2010: 7th International Conference on Language Resources and Evaluation
Valletta, May 17-23
Available: Paper in PDF

Mark Mandel
Conomastics: The Naming of Science Fiction Conventions
ADS: American Dialect Society Annual Meeting, Baltimore, January 7-9
Available: Slides in PDF, Slides with notes

Paul McNamee, Hoa Trang Dang, Heather Simpson, Patrick Schone, Stephanie M. Strassel
An Evaluation of Technologies for Knowledge Base Population
LREC 2010: 7th International Conference on Language Resources and Evaluation
Valletta, May 17-23
Available: Slides in PDF

Heather Simpson, Stephanie Strassel, Robert Parker, Paul McNamee
Wikipedia and the Web of Confusable Entities: Experience from Entity Linking Query Creation for TAC 2009 Knowledge Base Population
LREC 2010: 7th International Conference on Language Resources and Evaluation
Valletta, May 17-23
Available: Paper in PDF

Zhiyi Song, Stephanie Strassel, Gary Krug, Kazuaki Maeda
Enhanced Infrastructure for Creation and Collection of Translation Resources
LREC 2010: 7th International Conference on Language Resources and Evaluation
Valletta, May 17-23
Available: Paper in PDF

Stephanie Strassel, Dan Adams, Henry Goldberg, Jonathan Herr, Ron Keesing, Daniel Oblinger, Heather Simpson, Robert Schrag, Jonathan Wright
The DARPA Machine Reading Program - Encouraging Linguistic and Reasoning Research with a Series of Reading Tasks
LREC 2010: 7th International Conference on Language Resources and Evaluation
Valletta, May 17-23
Available: Paper in PDF

Kevin Walker, Christopher Caruso, Denise DiPersio
Large Scale Multilingual Broadcast Data Collection to Support Machine Translation and Distillation Technology Development
LREC 2010: 7th International Conference on Language Resources and Evaluation
Valletta, May 17-23
Available: Paper in PDF 

Liu Yi, Pascale Fung, Yang Yongsheng, Denise DiPersio, Meghan Lammie Glenn, Stephanie M. Strassel, Christopher Cieri
A Very Large Scale Mandarin Chinese Broadcast Collection for the GALE Program
LREC 2010: 7th International Conference on Language Resources and Evaluation
Valletta, May 17-23
Available: Paper in PDF 

Wajdi Zaghouani, Bruno Pouliquen, Mohamed Ebrahim, Ralf Steinberger
Adapting a Resource-light Highly Multilingual Named Entity Recognition System to Arabic
LREC 2010: 7th International Conference on Language Resources and Evaluation
Valletta, May 17-23
Available: Paper in PDF, Slides in PDF

Wajdi Zaghouani, Ralf Steinberger, Bruno Pouliquen
A Resource-light Arabic Named Entity Recognition System
GURT 2010: Georgetown University Round Table, Arabic Language and Linguistics
Georgetown, March 12-14
Available: Slides in PDF

Wajdi Zaghouani, Mona Diab, Aous Mansouri, Sameer Pradhan, Martha Palmer
The Revised Arabic PropBank
ACL-HLT 2010: 48th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies, Uppsala, July 11-16
The LAW IV: Proceedings of the Fourth Linguistic Annotation Workshop
Available: Paper in PDF

2009

Steven Bird, Ewan Klein, Edward Loper
Natural Language Processing with Python (book chapter)
O'Reilly Media Inc., 2009

Christopher Cieri
Models of Phonological Variation for Multi-dialectal Communities: the Case of L'Aquila
NWAV 38: New Ways of Analyzing Variation, Ottawa, October 22-25
Available: Slides in PDF

Christopher Cieri, Stephanie Strassel
Closer Still to a Robust, All Digital, Empirical, Reproducible Sociolinguistic Methodology
NWAV 38: New Ways of Analyzing Variation, Ottawa, October 22-25
Available: Slides in PDF

Seth Kulick, Ann Bies
Treebank Analysis and Search Using an Extracted Tree Grammar
TLT8: 8th International Workshop on Treebanks and Linguistic Theories
Milan, December 3-5
Available: Paper in PDF

Catherine Lai, Steven Bird
Querying Linguistic Trees
Journal of Logic, Language, and Information, Volume 18, 2009
Available: Paper in PDF

Mohamed Maamouri
LDC Arabic Reading Tools: "Read to Succeed"
ACTFL 2009: Arabic SIG Meeting, San Diego, CA, November 21
Available: Slides in PDF

Mohamed Maamouri, Ann Bies, Seth Kulick
Creating a Methodology for Large-Scale Correction of Treebank Annotation: The Case of the Arabic Treebank
MEDAR 2009: 2nd International Conference on Arabic Language Resources and Tools
Cairo, April 22-23
Available: Paper in PDF, Slides in PDF

Niklas Paulsson, Khalid Choukri, Djamel Mostefa, Denise DiPersio, Meghan Glenn, Stephanie Strassel
A Large Arabic Broadcast News Speech Data Collection
MEDAR 2009: 2nd International Conference on Arabic Language Resources and Tools
Cairo, April 22-23
Available: Paper in PDF, Poster in PDF

Stephanie Strassel
Linguistic Resources for Arabic Handwriting Recognition
MEDAR 2009: 2nd International Conference on Arabic Language Resources and Tools
Cairo, April 22-23
Available: Paper in PDF

2008

Chomicha Bendahman, Meghan Lammie Glenn, Djamel Mostefa, Niklas Paulsson, Stephanie Strassel
Quick Rich Transcriptions of Arabic Broadcast News Speech Data
LREC 2008: 6th International Conference on Language Resources and Evaluation
Marrakech, May 28-30
Available: Paper in PDF

Linda Brandschain, Christopher Cieri, David Graff, Abby Neely, Kevin Walker
Speaker Recognition: Building the Mixer 4 and 5 Corpora
LREC 2008: 6th International Conference on Language Resources and Evaluation
Marrakech, May 28-30
Available: Paper in PDF, Poster in PDF

Christopher Cieri, Stephanie Strassel, Meghan Glenn, Reva Schwartz, Wade Shen, Joseph Campbell
Bridging the Gap between Linguists and Technology Developers: Large-Scale, Sociolinguistic Annotation for Dialect and Speaker Recognition
LREC 2008: 6th International Conference on Language Resources and Evaluation
Marrakech, May 28-30
Available: Paper in PDF

Lauren Friedman, Stephanie Strassel
Identifying Common Challenges for Human and Machine Translation: A Case Study from the GALE Program
AMTA 2008: The 8th Conference of the Association for Machine Translation in the Americas, Waikiki, October 21-25
Available: Paper in PDF

Lauren Friedman, Stephanie Strassel, Meghan Lammie Glenn
Explicit and Implicit Requirements of Technology Evaluations: Implications for Test Data Creation
LREC 2008: 6th International Conference on Language Resources and Evaluation
Marrakech, May 28-30
Available: Paper in PDF

Lauren Friedman, Stephanie Strassel, Haejoong Lee
A Quality Control Framework for Gold Standard Reference Translations: The Process and Toolkit Developed for GALE
Translating & The Computer 30 (hosted by EAMT): London, November 19-20
Available: Paper in PDF

Ryan Gabbard, Seth Kulick
Construct State Modification in the Arabic Treebank
ACL-HLT 2008: 46th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies, Columbus, Ohio, June 16-18
Available: Paper in PDF

Meghan Lammie Glenn, Stephanie Strassel, Lauren Friedman, Haejoong Lee, Shawn Medero
Management of Large Annotation Projects Involving Multiple Human Judges: a Case Study of GALE Machine Translation Post-editing
LREC 2008: 6th International Conference on Language Resources and Evaluation
Marrakech, May 28-30
Available: Paper in PDF

Mohamed Maamouri, Ann Bies, Seth Kulick
Enhanced Annotation and Parsing of the Arabic Treebank
INFOS 2008: Cairo, March 27-29
Available: Paper in PDF

Mohamed Maamouri, Ann Bies, Seth Kulick
Enhancing the Arabic Treebank: A Collaborative Effort toward New Annotation Guidelines
LREC 2008: 6th International Conference on Language Resources and Evaluation
Marrakech, May 28-30
Available: Paper in PDF, Poster in PDF

Mohamed Maamouri, Seth Kulick, Ann Bies
Diacritic Annotation in the Arabic Treebank and Its Impact on Parser Evaluation
LREC 2008: 6th International Conference on Language Resources and Evaluation
Marrakech, May 28-30
Available: Paper in PDF, Poster in PDF

Kazuaki Maeda, Haejoong Lee, Shawn Medero, Julie Medero, Robert Parker, Stephanie Strassel
Annotation Tool Development for Large-Scale Corpus Creation Projects at the Linguistic Data Consortium
LREC 2008: 6th International Conference on Language Resources and Evaluation
Marrakech, May 28-30
Available: Paper in PDF

Kazuaki Maeda, Xiaoyi Ma, Stephanie Strassel
Creating Sentence-Aligned Parallel Text Corpora from a Large Archive of Potential Parallel Text using BITS and Champollion
LREC 2008: 6th International Conference on Language Resources and Evaluation
Marrakech, May 28-30
Available: Paper in PDF

Martha Palmer, Olga Babko-Malaya, Ann Bies, Mona Diab, Mohamed Maamouri, Aous Mansouri, Wajdi Zaghouani
A Pilot Arabic Propbank
LREC 2008: 6th International Conference on Language Resources and Evaluation
Marrakech, May 28-30
Available: Paper in PDF

Marian Reed, Denise DiPersio, Christopher Cieri
The Linguistic Data Consortium Member Survey: Purpose, Execution and Results
LREC 2008: 6th International Conference on Language Resources and Evaluation
Marrakech, May 28-30
Available: Paper in PDF, Slides in PDF

Gary Simons, Steven Bird
Toward a Global Infrastructure for the Sustainability of Language Resources
PACLIC 2008: 22nd Pacific Asia Conference on Language, Information and Computation, Cebu City, Philippines, November 20-22
Available: Paper in PDF

Heather Simpson, Christopher Cieri, Kazuaki Maeda, Kathryn Baker, Boyan Onyshkevych
Human Language Technology Resources for Less Commonly Taught Languages: Lessons Learned Toward Creation of Basic Language Resources
LREC 2008: 6th International Conference on Language Resources and Evaluation
Marrakech, May 28-30
SALTMIL Workshop: Free/Open-Source Language Resources for the Machine Translation of Less-Resourced Languages
Available: Paper in PDF, Slides in PDF

Zhiyi Song, Stephanie Strassel
Entity Translation and Alignment in the ACE-07 ET Task
LREC 2008: 6th International Conference on Language Resources and Evaluation
Marrakech, May 28-30
Available: Paper in PDF

Stephanie Strassel, Lauren Friedman, Safa Ismael, Linda Brandschain
New Resources for Document Classification Analysis and Translation Technologies
LREC 2008: 6th International Conference on Language Resources and Evaluation
Marrakech, May 28-30
Available: Paper in PDF

2007

Steven Bird, Haejoong Lee
Graphical Query for Linguistic Treebanks
PACLING 2007: 10th Conference of the Pacific Association for Computational Linguistics
Melbourne, September 19-21
Available: Paper in PDF

Christopher Cieri
Phonological Variation in Multi-Dialectal Italy: distinguishing e from ɛ
NWAV 36: Philadelphia, October 11-14
Available: Slides in PDF

Christopher Cieri, Linda Corson, David Graff, Kevin Walker
Resources for New Research Directions in Speaker Recognition: The Mixer 3, 4 and 5 Corpora
Interspeech 2007: 8th International Conference on Spoken Language Processing, Antwerp, August 27-31
Available: Paper in PDF, Slides in PDF

Christopher Cieri, Stephanie Strassel, Meghan Lammie Glenn, Lauren Friedman
Linguistic Resources in Support of Various Evaluation Metrics
MT Summit XI: Workshop on Automatic Procedures in MT Evaluation
Copenhagen, September 9-14
Available: Slides in PDF

Kuzman Ganchev, Koby Crammer, Fernando Pereira, Gideon Mann, Kedar Bellare, Andrew McCallum, Steven Carroll, Yang Jin, Peter White
Penn/UMass/CHOP Biocreative II Systems
Biocreative II
Available: Paper in PDF

Kuzman Ganchev, Fernando Pereira, Mark Mandel, Steven Carroll, Peter White
Semi-automated Named Entity Annotation
Linguistic Annotation Workshop 2007 
Available: Paper in PDF

2006

Olga Babko-Malaya, Ann Bies, Ann Taylor, Szuting Yi, Martha Palmer, Mitch Marcus, Seth Kulick, Libin Shen
Issues in Synchronizing the English Treebank and PropBank
COLING-ACL 2006: Frontiers in Linguistically Annotated Corpora, A Merged Workshop with 7th International Workshop on Linguistically Interpreted Corpora (LINC-2006) and Frontiers in Corpus Annotation III, Sydney, July 22-23
Available: Paper in PDF

Ann Bies, Stephanie Strassel, Haejoong Lee, Kazuaki Maeda, Seth Kulick, Yang Liu, Mary Harper, Matthew Lease
Linguistic Resources for Speech Parsing
LREC 2006: 5th International Conference on Language Resources and Evaluation
Genoa, May 22-28
Available: Paper in PDF

Steven Bird, Yi Chen, Susan Davidson, Haejoong Lee, Yifeng Zheng
Designing and Evaluating an XPath Dialect for Linguistic Queries
ICDE 2006: 22nd International Conference on Data Engineering, Atlanta, April 3-8
Available: Paper in PDF

Christopher Cieri
Linguistic Resources, Development and Evaluation
Chapter 8 in Laila Dybkjaer, Holmer Hemsen and Wolfgang Minker (eds.) Evaluation of Text and Speech Systems
Available: Springer Publishers

Christopher Cieri
What is Quality? Invited Talk at the Workshop on Quality Assurance and Quality Measurement for Language and Speech Resources
LREC 2006: 5th International Conference on Language Resources and Evaluation
Genoa, May 22-28
Available: Slides in PDF

Christopher Cieri, Walt Andrews, Joseph P. Campbell, George Doddington, Jack Godfrey, Shudong Huang, Mark Liberman, Alvin Martin, Hirotaka Nakasone, Mark Przybocki, Kevin Walker
The Mixer and Transcript Reading Corpora: Resources for Multilingual, Crosschannel Speaker Recognition Research
LREC 2006: 5th International Conference on Language Resources and Evaluation
Genoa, May 22-28
Available: Paper in PDF, Slides in PDF

Christopher Cieri, Mark Liberman
More Data and Tools for More Languages and Research Areas: A Progress Report on LDC Activities
LREC 2006: 5th International Conference on Language Resources and Evaluation
Genoa, May 22-28
Available: Paper in PDF, Slides in PDF

Christopher Cieri, Mark Liberman, Victoria Arranz, Khalid Choukri
Linguistic Data Resources
Chapter 3 in Tanja Schultz and Katrin Kirchhoff (eds.) Multilingual Speech Processing, Elsevier Academic Press, ISBN 13: 978-0-12-088501-5. April 2006.
Available: Elsevier Publishers

Ryan Gabbard, Seth Kulick, Mitchell Marcus
Fully Parsing the Penn Treebank
HLT-NAACL 2006: Human Language Technology conference - North American chapter of the Association for Computational Linguistics, New York City, June 4-9
Available: Paper in PDF

David Graff, Tim Buckwalter, Hubert Jin, Mohamed Maamouri
Lexicon Development for Varieties of Spoken Colloquial Arabic
LREC 2006: 5th International Conference on Language Resources and Evaluation
Genoa, May 22-28
Available: Paper in PDF

Yang Jin, Ryan McDonald, Kevin Lerman, Mark Mandel, Steven Carroll, Mark Y Liberman, Fernando Pereira, Raymond Winters, Peter White
Automated Recognition of Malignancy Mentions in Biomedical Literature
Open Access: BMC Bioinformatics 7:492
Available: Paper in PDF

Xiaoyi Ma
Champollion: A Robust Parallel Text Sentence Aligner
LREC 2006: 5th International Conference on Language Resources and Evaluation
Genoa, May 22-28
Available: Paper in PDF

Xiaoyi Ma, Christopher Cieri
Corpus Support for Machine Translation at LDC
LREC 2006: 5th International Conference on Language Resources and Evaluation
Genoa, May 22-28
Available: Paper in PDF

Mohamed Maamouri, Ann Bies, Tim Buckwalter, Mona Diab, Nizar Habash, Owen Rambow, Dalila Tabessi
Developing and Using a Pilot Dialectal Arabic Treebank
LREC 2006: 5th International Conference on Language Resources and Evaluation
Genoa, May 22-28
Available: Paper in PDF

Mohamed Maamouri, Ann Bies, Seth Kulick
Diacritization: A Challenge to Arabic Treebank Annotation and Parsing
HCI 06: Machine Translation SIG of the British Computer Society Conference
London, September 11-16
Available: Paper in PDF

Kazuaki Maeda, Christopher Cieri, Kevin Walker
Low-cost Customized Speech Corpus Creation for Speech Technology Applications
LREC 2006: 5th International Conference on Language Resources and Evaluation
Genoa, May 22-28
Available: Paper in PDF

Kazuaki Maeda, Haejoong Lee, Julie Medero, Stephanie Strassel
A New Phase in Annotation Tool Development at the Linguistic Data Consortium: The Evolution of the Annotation Graph Toolkit
LREC 2006: 5th International Conference on Language Resources and Evaluation
Genoa, May 22-28

Mark Mandel
Integrated Annotation of Biomedical Text: Creating the PennBioIE Corpus
Text Mining, Ontologies and Natural Language Processing in Biomedicine
Manchester, March 20 - 21
Available: Abstract in PDF, Slides in PDF

Ryan McDonald, Kevin Lerman, Fernando Pereira
Multilingual Dependency Parsing with a Two-Stage Discriminative Parser
CoNLL 2006: Computational Natural Language Learning, New York City, June 8-9
Available: Paper in PDF

Julie Medero, Kazuaki Maeda, Stephanie Strassel, Christopher Walker
An Efficient Approach for Gold-Standard Annotation: Decision Points for Complex Tasks
LREC 2006: 5th International Conference on Language Resources and Evaluation
Genoa, May 22-28
Available: Paper in PDF

Stephanie Strassel, Andrew W. Cole
Corpus Development and Publication
LREC 2006: 5th International Conference on Language Resources and Evaluation
Genoa, May 22-28
Available: Paper in PDF, Poster in PDF

Stephanie Strassel, Christopher Cieri, Andy Cole, Denise DiPersio, Mark Liberman, Xiaoyi Ma, Mohamed Maamouri, Kazuaki Maeda
Integrated Linguistic Resources for Language Exploitation Technologies
LREC 2006: 5th International Conference on Language Resources and Evaluation
Genoa, May 22-28
Available: Paper in PDF, Slides in PDF

Jiahong Yuan, Mark Liberman, Christopher Cieri
Towards an Integrated Understanding of Speaking Rate in Conversation
Interspeech 2006: The 9th International Conference on Spoken Language Processing
Pittsburgh, September 17-21
Available: Paper in PDF, Slides in PDF

2005

Ann Bies, Seth Kulick, Mark Mandel
Parallel Entity and Treebank Annotation
ACL 2005: 43rd Annual Meeting of the Association for Computational Linguistics, Frontiers in Corpus Annotation II: Pie in the Sky workshop, Ann Arbor, June 29
Available: Paper in PDF

Violetta Cavalli-Sforza, Mohamed Maamouri
Extensions to Histogram-Based Student Modeling Approach to Facilitate Reading in Morphologically Complex Languages
AIED 2005: International Conference on Artificial Intelligence in Education, Amsterdam, July 18-22
Available: Paper in PDF

Christopher Cieri
HLT Evaluation: The Role of Data Centers
HLT Evaluation Workshop (ELRA): Sliema, Malta, December 1-2
Available: Slides in PDF

Christopher Cieri
Modeling Phonological Variation in Multidialectal Italy
University of Pennsylvania, Doctoral Dissertation, May 2005
Available: PDF from ProQuest

Meghan Lammie Glenn, Stephanie Strassel
Linguistic Resources for Meeting Speech Recognition
MLMI 2005: Machine Learning for Multimodal Interaction, Edinburgh, July 11-13
Available: Paper in PDF

Jerry Goldman, Steve Renals, Steven Bird, Franciska de Jong, Marcello Federico, Carl Fleischhauer, Mark Kornbluh, Lori Lamel, Douglas Oard, Claire Stewart, Richard Wright
Transforming Access to the Spoken Word
International Journal on Digital Libraries 5, 287-298, 2005.
Available: Paper in PDF

Yang Jin, Ryan T. McDonald, Kevin Lerman, Mark A. Mandel, Mark Y. Liberman, Fernando Pereira, R. Scott Winters, Peter S. White
Identifying and Extracting Malignancy Types in Cancer Literature
BioLink 2005: A Joint Meeting of ISMB: BioLINK SIG on Text Data Mining and ACL Workshop on Linking Biological Literature, Ontologies and Databases: Mining Biological Semantics, Detroit, June 24
Available: Paper in PDF

Jachym Kolar, Jan Svec, Stephanie Strassel, Christopher Walker, Dagmar Kozlakova, Josef Psutka
Czech Spontaneous Speech Corpus with Structural Metadata
Interspeech 2005: The 8th International Conference on Spoken Language Processing, Lisbon, September 4-8
Available: Paper in PDF

Mohamed Maamouri
Arabic Literacy
Lemma, 11, 16 in Encyclopedia of Arabic Language and Linguistics (EALL). Vol 2
Available: Paper in PDF

Ryan McDonald, Fernando Pereira, Seth Kulick, Scott Winters, Yang Jin, Peter White
Simple Algorithms for Complex Relation Extraction with Applications to Biomedical IE
ACL 2005: 43rd Annual Meeting of the Association for Computational Linguistics, Ann Arbor, June 29
Available: Paper in PDF

2004

Tim Buckwalter
Issues in Arabic Orthography and Morphology Analysis
COLING 2004: 20th International Conference on Computational Linguistics, Geneva, August 28
Computational Approaches to Arabic Script-based Languages Workshop
Available: Paper in PDF

Christopher Cieri, Joseph P. Campbell, Hirotaka Nakasone, David Miller, Kevin Walker
The Mixer Corpus of Multilingual, Multichannel Speaker Recognition Data
LREC 2004: 4th International Conference on Language Resources and Evaluation, Lisbon, May 24-30
Available: Paper in PDF, Poster in PDF

Christopher Cieri, Mark Liberman
Progress Report from the Linguistic Data Consortium: Recent Activities in Resource Creation and Distribution and the Development of Tools and Standards
LREC 2004: 4th International Conference on Language Resources and Evaluation, Lisbon, May 24-30
Available: Paper in PDF, Slides in PDF

Christopher Cieri, David Miller, Kevin Walker
The Fisher Corpus: a Resource for the Next Generations of Speech-to-Text
LREC 2004: 4th International Conference on Language Resources and Evaluation, Lisbon, May 24-30
Available: Paper in PDF

George Doddington, Alexis Mitchell, Mark Przybocki, Lance Ramshaw, Stephanie Strassel, Ralph Weischedel
Automatic Content Extraction (ACE) Program - Task Definitions and Performance Measures
LREC 2004: 4th International Conference on Language Resources and Evaluation, Lisbon, May 24-30
Available: Paper in PDF

Shudong Huang, Stephanie Strassel, Alexis Mitchell, Zhiyi Song
Shared Resources for Multilingual Information Extraction and Challenges in Named Entity Annotation
IJCNLP-04: 1st International Joint Conference on Natural Language Processing, Hainan Island, China, March 22-24
Named Entity Recognition for NLP Applications Workshop
Available: Paper in PDF

Seth Kulick, Ann Bies, Mark Liberman, Mark Mandel, Ryan McDonald, Martha Palmer, Andrew Schein, Lyle Ungar, Scott Winters, Pete White
Integrated Annotation for Biomedical Information Extraction
HLT/NAACL 2004: Human Language Technology Conference of the North American Chapter of the Association for Computational Linguistics, Boston, May 2-7
BioLink Workshop
Available: Paper in PDF, Slides in PDF

Mohamed Maamouri, Ann Bies
Developing an Arabic Treebank: Methods, Guidelines, Procedures, and Tools
COLING 2004: 20th International Conference on Computational Linguistics, Geneva, August 28
Computational Approaches to Arabic Script-based Languages Workshop
Available: Paper in PDF

Mohamed Maamouri, Ann Bies, Tim Buckwalter, Wigdan Mekki
The Penn Arabic Treebank: Building a Large-Scale Annotated Arabic Corpus
NEMLAR International Conference on Arabic Language Resources and Tools, Cairo, September 22-23
Available: Paper in PDF

Mohamed Maamouri, Tim Buckwalter, Christopher Cieri
Dialectal Arabic Telephone Speech Corpus: Principles, Tool Design, and Transcription Conventions
NEMLAR International Conference on Arabic Language Resources and Tools, Cairo, September 22-23
Available: Paper in PDF, Slides in PDF

Mohamed Maamouri, David Graff, Hubert Jin, Christopher Cieri, Tim Buckwalter
Dialectal Arabic Orthography-based Transcription and CTS Levantine Arabic Collection
EARS RT-04 Workshop, Parallel STT-NA Tracks Session, Palisades IBM Executive Center, New York, November 10
Available: Paper in PDF

Kazuaki Maeda, Stephanie Strassel
Annotation Tools for Large-Scale Corpus Development: Using AGTK at the Linguistic Data Consortium
LREC 2004: 4th International Conference on Language Resources and Evaluation, Lisbon, May 24-30
Available: Paper in PDF

Mike Maxwell
From Legacy Lexicon to Archivable Resource
LREC 2004: 4th International Conference on Language Resources and Evaluation, Lisbon, May 24-30
First Steps for Language Documentation of Minority Languages: Computational Linguistic Tools for Morphology, Lexicon and Corpus Compilation Workshop
Available: Paper in PDF

Ryan McDonald, R. Scott Winters, Mark Mandel, Yang Jin, Peter S. White, Fernando Pereira
An Entity Tagger for Recognizing Acquired Genomic Variations in Cancer Literature
Bioinformatics 20:3249-3251
Available: Paper in PDF

Douglas Oard, Dagobert Soergel, G. Craig Murray, David Doermann, Jianqiang Wang, Bhuvana Ramabhadran, Martin Franz, James Mayfield, Samuel Gustman, Stephanie Strassel
Building an Information Retrieval Test Collection for Spontaneous Conversational Speech
SIGIR 2004: 27th Annual International ACM SIGIR (Special Interest Group on Information Retrieval) Conference, Sheffield, July 25-29
Available: Paper in PDF

Stephanie Strassel
Linguistic Resources for Effective, Affordable, Reusable Speech-to-Text
LREC 2004: 4th International Conference on Language Resources and Evaluation, Lisbon, May 24-30
Available: Paper in PDF

Colin Warner, Ann Bies, Christine Brisson, Justin Mott
Addendum to the Penn Treebank II Style Bracketing Guidelines: BioMedical Treebank Annotation
November, 2004
Available: Paper in PDF, Paper in plain text

2003

Steven Bird, Gary Simons
Seven Dimensions of Portability for Language Documentation and Description
Language 79, 557-582

Steven Bird, Gary Simons
Extending Dublin Core Metadata to Support the Description and Discovery of Language Resources
Computers and the Humanities 37, 375-388

Christopher Cieri, Mike Maxwell, Stephanie Strassel
Core Linguistic Resources for the World's Languages
ACL 2003 Resources Information Infrastructure Workshop: ELSNET, ENABLER, ICWLR Joint Workshop, Paris, August 28-29
Available: Slides in PDF

Christopher Cieri, Stephanie Strassel
Robust Sociolinguistic Methodology: Tools, Data and Best Practices
NWAV 32: New Ways of Analyzing Variation, Philadelphia, October
Available: Slides in PDF

Baden Hughes, Steven Bird
Grid-Enabling Natural Language Engineering By Stealth
Proceedings of the Workshop on The Software Engineering and Architecture of Language Technology Systems (SEALTS)
Available: arXiv.org

Seth Kulick, Mark Liberman, Martha Palmer, Andrew Schein
Shallow Semantic Annotation of Biomedical Corpora for Information Extraction
ISMB 2003: 11th International Conference on Intelligent Systems for Molecular Biology, Brisbane, June 29-July 3
Special Interest Group Meeting on Text Mining (BioLink)
Available: Slides in PDF

Mike Maxwell
Incremental Grammar Development using Finite State Tools
EACL 10: 10th Conference of the European Chapter of the Association for Computational Linguistics, Budapest, April 12-17
Finite-State Methods in Natural Language Processing Workshop, April 13-14
Available: Paper in PDF

Gary Simons, Steven Bird
The Open Language Archives Community: An Infrastructure for Distributed Archiving of Language Resources
Literary and Linguistic Computing 18 
Available: Paper in PDF

Gary Simons, Steven Bird
Building an Open Language Archives Community on the OAI Foundation
Library Hi Tech 21, 210-218, Special Issue on Open Archives Initiative Metadata Harvesting
Available: Paper in PDF

Stephanie Strassel
Corpus Creation for Disfluency Research
DiSS 2003: Disfluency in Spontaneous Speech Conference, Gothenburg, September 5-8
Available: Abstract in PDF, Slides in PDF

Stephanie Strassel, Mike Maxwell, Christopher Cieri
Linguistic Resource Creation for Research and Technology Development: A Recent Experiment
Association for Computing Machinery Transactions on Asian Language Information Processing (TALIP), Volume 2, Issue 2, 101-117
Available: Paper in PDF

Stephanie Strassel, David Miller, Kevin Walker, Christopher Cieri
Shared Resources for Robust Speech-to-Text Technology
Eurospeech 2003: Geneva, September 1-4
Available: Paper in PDF

Stephanie Strassel, Alexis Mitchell, Shudong Huang
Multilingual Resources for Entity Extraction
ACL 2003: 41st Annual Meeting of the Association for Computational Linguistics, Sapporo, July 7-12
Multilingual and Mixed-language Named Entity Recognition Workshop: Combining Statistical and Symbolic Models
Available: Paper in PDF

2002

Steven Bird, Kazuaki Maeda, Xiaoyi Ma, Haejoong Lee, Beth Randall, Salim Zayat
TableTrans, MultiTrans, InterTrans and TreeTrans: Diverse Tools Built on the Annotation Graph Toolkit
LREC 2002: The 3rd International Conference on Language Resources and Evaluation, Las Palmas, Canary Islands, May 27 - June 2
Available: arXiv.org

Christopher Cieri, David Miller, Kevin Walker
Research Methodologies, Observations and Outcomes in (Conversational) Speech Data Collection
HLT 2002: The Human Language Technologies Conference, San Diego, March 24-27
Available: Paper in PDF

Christopher Cieri, Stephanie Strassel
The DASL Project: a Case Study in Data Re-Annotation and Re-Use
LREC 2002: The 3rd International Conference on Language Resources and Evaluation, Las Palmas, Canary Islands, May 27 - June 2
Available: Paper in PDF

Christopher Cieri, Stephanie Strassel, David Graff, Nii Martey, Kara Rennert, Mark Liberman
Corpora for Topic Detection and Tracking
James Allan, ed. Topic Detection and Tracking: Event-based Information Organization, Kluwer International Series on Information Retrieval, Bruce Croft, series editor, Kluwer Academic Publishers, Boston

Christopher Cieri, Stephanie Strassel, William Labov
Sharable Resources for Sociolinguistic Research
NWAV31: New Ways of Analyzing Variation, Stanford, October
Available: Slides in PDF

Scott Cotton, Steven Bird
An Integrated Framework for Treebanks and Multilayer Annotations
LREC 2002: The 3rd International Conference on Language Resources and Evaluation, Las Palmas, Canary Islands, May 27 - June 2
Available: arXiv.org

Xiaoyi Ma, Haejoong Lee, Steven Bird, Kazuaki Maeda
Models and Tools for Collaborative Annotation
LREC 2002: The 3rd International Conference on Language Resources and Evaluation, Las Palmas, Canary Islands, May 27 - June 2
Available: arXiv.org

Mohamed Maamouri, Christopher Cieri
Resources for Arabic Natural Language Processing
International Symposium on Processing Arabic, Tunis, April
Available: Slides in PDF

Kazuaki Maeda, Steven Bird, Xiaoyi Ma, Haejoong Lee
Creating Annotation Tools with the Annotation Graph Toolkit
LREC 2002: The 3rd International Conference on Language Resources and Evaluation, Las Palmas, Canary Islands, May 27 - June 2
Available: arXiv.org

Mike Maxwell, Gary Simons, Larry Hayashi
A Morphological Glossing Assistant
LREC 2002: The 3rd International Conference on Language Resources and Evaluation, Las Palmas, Canary Islands, May 27 - June 2
International LREC Workshop on Resources and Tools in Field Linguistics
Available: Paper in PDF

Mike Maxwell
Resources for Morphology Learning and Evaluation
LREC 2002: The 3rd International Conference on Language Resources and Evaluation, Las Palmas, Canary Islands, May 27 - June 2, vol. III, 967-974
Available: Paper in PDF

Horacio Saggion, Dragomir Radev, Simone Teufel, Wai Lam, Stephanie M. Strassel
Developing Infrastructure for the Evaluation of Single and Multi-document Summarization Systems in a Multi-lingual Environment
LREC 2002: The 3rd International Conference on Language Resources and Evaluation, Las Palmas, Canary Islands, May 27 - June 2
Available: Paper in PDF

2001

Steven Bird, Mark Liberman
A Formal Framework for Linguistic Annotation
Speech Communication 33 (1, 2), pp 23-60
Available: arXiv.org

Steven Bird, Gary Simons
The OLAC Metadata Set and Controlled Vocabularies
ACL-EACL 2001: The 39th Annual Meeting of the Association for Computational Linguistics, Toulouse, July 5-6, pp 7-18
Sharing Tools & Resources Workshop
Available: arXiv.org

Steven Bird, Gary Simons, Chu-Ren Huang
The Open Language Archives Community and Asian Language Resources
NLPRS 2001: 6th Natural Language Processing Pacific Rim Symposium, Tokyo, November 27-30
Workshop on Language Resources in Asia
Available: arXiv.org

Christopher Cieri, Steven Bird
Annotation Graphs, Annotation Servers and Multi-Modal Resources: Infrastructure for Interdisciplinary Education, Research and Development
ACL-EACL 2001: The 39th Annual Meeting of the Association for Computational Linguistics, Toulouse, July 5-6
Sharing Tools & Resources Workshop
Available: Paper in PDF, Slides in PDF

Kazuaki Maeda, Steven Bird
A Framework for Annotating Animal Bioacoustic Data
The 141st Meeting of the Acoustical Society of America, Chicago, June 4-8

Kazuaki Maeda, Steven Bird, Xiaoyi Ma, Haejoong Lee
The Annotation Graph Toolkit: Software Components for Building Linguistic Annotation Tools
HLT 2001: The Human Language Technologies Conference, San Diego, March
Available: Paper in PDF

Stephanie Strassel, Christopher Cieri
Data and Annotations for SocioLinguistics: A Corpus-Based Approach to Sociolinguistic Research
PLC 25: Penn Linguistic Colloquium, Philadelphia, March 3-4
Available: Slides in PDF

Stephanie Strassel, Christopher Cieri, Steven Bird
Shared Resources and Community Building for Corpus Linguistics and Language Teaching
Corpus Linguistics and Language Teaching Workshop, Boston, March 23-25

2000

Steve Cassidy, Steven Bird
Querying Databases of Annotated Speech
11th Australasian Database Conference, Canberra, January 31 - February 2
Available: Paper in PDF

Christopher Cieri
Multiple Annotation of Reusable Data Resources: Corpora for Topic Detection and Tracking
In Rajman, M. and J. C. Chappelier, eds. (2000) Actes des 5es Journées internationales d'analyse statistique des données textuelles, volume 1, École Polytechnique Fédérale de Lausanne
Available: Paper in PDF

Christopher Cieri
Issues and Tools for Annotating a Corpus of Sociolinguistic Field Data
Linguistic Society of America Annual Meeting, Chicago, January 6-9
Linguistic Exploration Workshop
Available: Slides in PDF

Christopher Cieri, Dave Graff, Mark Liberman, Nii Martey, Stephanie Strassel
Large Multilingual Broadcast News Corpora for Cooperative Research in Topic Detection and Tracking: The TDT2 and TDT3 Corpus Efforts
LREC 2000: 2nd International Language Resources and Evaluation Conference, Athens, May 31 - June 2
Available: Paper in PDF

Christopher Cieri, David Graff, Nii Martey, Stephanie Strassel
The TDT-3 Text and Speech Corpus
TDT 2000: Topic Detection and Tracking Workshop, Vienna, Virginia, February 28 - March 1
Available: Paper in PDF

Christopher Cieri, Mark Liberman
Issues in Corpus Creation and Distribution: the Evolution of the Linguistic Data Consortium
LREC 2000: 2nd International Language Resources and Evaluation Conference, Athens, May 31 - June 2
Available: Paper in PDF

David Graff, Steven Bird
Many Uses, Many Annotations for Large Speech Corpora: Switchboard and TDT as Case Studies
LREC 2000: 2nd International Language Resources and Evaluation Conference, Athens, May 31 - June 2
Available: Paper in PDF

Dave Graff, Stephanie Strassel, Christopher Cieri
Resources, New and Forthcoming, from LDC
2000 Speech Transcription Workshop: College Park, Maryland, May 16-19
Available: Slides in PDF

Stephanie Strassel, Dave Graff, Nii Martey, Christopher Cieri
Quality Control in Large Annotation Projects Involving Multiple Judges: The Case of the TDT Corpora
LREC 2000: 2nd International Language Resources and Evaluation Conference, Athens, May 31 - June 2
Available: Paper in PDF

1999

Steven Bird
Multidimensional Exploration of Online Linguistic Field Data
NELS 29: 29th Annual Meeting of the Northeast Linguistics Society, University of Massachusetts at Amherst
Available: Paper in PDF

Steven Bird, Mark Liberman
A Formal Framework for Linguistic Annotation
Technical Report MS-CIS-99-01, Department of Computer and Information Science, University of Pennsylvania
(expanded from version presented at ICSLP-98, Sydney)
Available: Paper in PDF

Steven Bird, Mark Liberman
Annotation Graphs as a Framework for Multidimensional Linguistic Data Analysis
Towards Standards and Tools for Discourse Tagging Workshop, Somerset, NJ
Association for Computational Linguistics
Available: Paper in PDF

Steven Bird, Stephanie Strassel
Annotated Corpora in Linguistic Research
North American Symposium on Corpora in Linguistics and Language Teaching, University of Michigan, May 21
Available: Slides in PDF

Alexandra Canavan, Kevin Walker, David Graff, Christopher Cieri
Telephone Speech Corpora: New Needs, Languages, Methods and Technology
Hub-5 Conversational Speech Understanding (LVCSR) Workshop, Maritime Institute of Technology and Graduate Studies, Linthicum Heights, Maryland, June
Available: Slides in PDF

Christopher Cieri
This Ain't Your Father's Digital Data: Another Perspective on Legal Information
CALI 1999: The Conference for Law School Computing, Eugene, Oregon, June 17-19
Available: Slides in PDF

Christopher Cieri, David Graff, Mark Liberman, Nii Martey, Stephanie Strassel
The TDT-2 Text and Speech Corpus
DARPA Broadcast News Workshop, Washington, DC, February
Available: Paper in PDF

Xiaoyi Ma, Mark Liberman
BITS: A Method for Bilingual Text Search over the Web
Machine Translation Summit VII: Singapore, September 13-17
Available: Paper in PDF

Xiaoyi Ma
Parallel Text Collections at the Linguistic Data Consortium
Machine Translation Summit VII: Singapore, September 13-17
Available: Paper in PDF

Stephanie Strassel
Corpus Creation and Quality Control at the LDC
Corpus of Spoken Dutch Workshop: Tilburg, November 12
Available: Slides in PDF

Stephanie Strassel, Christopher Cieri
Corpus Sociolinguistics: Issues, Data and Tools
NWAV 28: Toronto, October
Available: Slides in PDF

1998

Steven Bird, Mark Liberman
Towards a Formal Framework for Linguistic Annotations
ICSLP 1998: 5th International Conference on Spoken Language Processing, Sydney, November 30 - December 4
Available: Paper in PDF

Christopher Cieri, David Graff
Topic Detection and Tracking Corpora
TREC/SDR Conference, Gaithersburg, Maryland, November

David Graff, Christopher Cieri
Update on Lexical Resources and Projects at the Linguistic Data Consortium
9th Hub-5 Conversational Speech Recognition (LVCSR) Workshop, Maritime Institute of Technology and Graduate Studies, Linthicum Heights, Maryland, September

Mark Liberman, Christopher Cieri
The Creation, Distribution and Use of Linguistic Data
LREC 1998: 1st International Conference on Language Resources and Evaluation, Granada, Spain, May 28-30
Available: Paper in PDF