June 2011 Newsletter

Friday, June 17, 2011

New Corpora

2006 NIST Spoken Term Detection Development Set

Datasets for Generic Relation Extraction (reACE) 

English Gigaword Fifth Edition 

Announcements

LDC at ACL: June 20-22, 2011

ACL has returned to North America and LDC is taking this opportunity to interact with top HLT researchers in beautiful Portland, OR.  LDC’s exhibition table will feature information on new developments at the consortium and will also be the go-to point for exciting new, green giveaways.

LDC’s Seth Kulick will be presenting research on ‘Using Derivation Trees for Treebank Error Detection’ (S-66) during Monday’s evening poster session (20 June, 6.00 – 8.30 pm). The abstract for this paper, coauthored by LDCers Ann Bies and Justin Mott, is as follows:

This work introduces a new approach to checking treebank consistency. Derivation trees based on a variant of Tree Adjoining Grammar are used to compare the annotation of word sequences based on their structural similarity. This overcomes the problems of earlier approaches based on using strings of words rather than tree structure to identify the appropriate contexts for comparison. We report on the result of applying this approach to the Penn Arabic Treebank and how this approach leads to high precision of error detection.

We hope to see you there.

LDC is now on your favorite Social Networks (Facebook, LinkedIn and RSS, oh my!)

Over the past few months, LDC has responded to requests from the community to increase its online presence. We are happy to announce that LDC now has its very own Facebook page, LinkedIn profile (independent of the University of Pennsylvania) and blog, which provides an RSS feed for LDC newsletters. Please visit LDC on our various profiles and let us know what you think!