Penn CREF 2017 Workshop

International Workshop on Language Resource Construction: Theory, Methodology and Applications

Co-organized by LDC and Peking University, this two-day workshop brought together speakers from across China and from LDC to discuss the opportunities and challenges of constructing language resources more efficiently in the wake of 30 years of activity in the field and the advent of deep learning methods. The workshop took place on November 5-6, 2017 in Beijing, China and was part of the collaborations fostered by support from the Penn China Research and Engagement Fund (Penn CREF).

photograph of the workshop participants

Preface

Driven by strong demand for natural language information processing applications, the construction of natural language and knowledge resources has developed rapidly over the past 30 years and has accumulated a wealth of data. It is now time to consolidate and synthesize the existing theories and methods of language resource construction and to provide new guidance for future research. The current boom in "deep learning" methods has sparked debate about the meaning and value of traditional language resource data processing, and about the opportunities and challenges of constructing language resources more efficiently. To address these issues, the Key Laboratory of Computational Linguistics (Peking University), the Department of Language & Literature and the Center for Chinese Linguistics (Peking University), and the Linguistic Data Consortium (University of Pennsylvania) co-organized this workshop and invited experts in related research areas to present results and proposals and to promote the future development of language resource construction.

The following topics were presented in talks and round-table discussions: (1) theory, methodology and applications of language resource construction; (2) language resources and deep learning; and (3) applications of language resources in technological, scientific, educational and clinical areas.

Day One Presentations, November 5, 2017: 9:20-16:50
Location: Lee Shau-Kee Academy of Humanities, Peking University

Language is a Complex System: the Resource Construction Based on Language Cognitive Behavior
Baoya Chen, Peking University

Chinese Deep Semantic Representation and the Resource Construction
Zhifang Sui, Peking University

Cross Lingual Knowledge Graph Building
Juanzi Li, Tsinghua University

Semantic Annotation: A Frame-based Constructional Approach
Meichun Liu, City University of Hong Kong

Aggregation of Linguistic Annotation and Neuro-Cognitive Data: A Proposal
Chu-Ren Huang, Hong Kong Polytechnic University

Novel Incentives and Engineering Unique Workflows
Chris Cieri, LDC

A Semantic Knowledge Base Construction and its Content-based Computing
Yulin Yuan, Peking University

Task Oriented Corpus Construction
Chengqing Zong, National Laboratory of Pattern Recognition, Institute of Automation, Chinese Academy of Sciences

Deep Learning and Automatic Poetry Writing
Maosong Sun, Tsinghua University

From Knowledge Graph to Event Evolutionary Graph
Ting Liu, Harbin Institute of Technology

Neural Machine Translation for Low-Resource Languages
Yang Liu, Tsinghua University

Automatic Text Generation: Resources, Models and Challenges
Xiaojun Wan, Peking University

Work Progress on Chinese Dependency Treebanking: Annotation Guideline, Method and Platform
Min Zhang and Zhenghua Li, Soochow University

Chinese FrameNet Project for NLP
Ru Li, Shanxi University

Entity Hypernym Acquisition and Hierarchy Construction in BigCilin
Bing Qin, Harbin Institute of Technology

Construction of Chinese Abstract Meaning Representation Corpus with Concept-to-word Alignment
Bin Li and Weiguang Qu, Nanjing University

Round-Table Discussion 17:00-18:20

Language Resources Construction with Deep Learning
Weidong Zhan and Baobao Chang, Peking University

Day Two Presentations, November 6, 2017: 9:00-17:10
Location: Penn Wharton China Center, Beijing

Spoken Language Resource and Phonetic Research at CASS
Aijun Li, Chinese Academy of Social Sciences

Multilingual Speech Database Building and Cross-linguistic Prosodic Research
Hongwei Ding, Shanghai Jiao Tong University

Survey of Local Putonghua: A Proposal
Wen Cao, Beijing Language and Culture University

Using Large Speech Corpora for Phonetic Research
Jiahong Yuan, LDC

Acoustical Analysis and Automatic Assessment of Pathological Speech of Cantonese
Tan Lee, Chinese University of Hong Kong

Construction of Tibetan Spoken Language Database and its Study Based on Deep Learning
Longbiao Wang, Tianjin University

Deep Learning for Speech Signal Processing
Jun Du, University of Science and Technology of China

Study on Intelligent Technology for Chinese Pronunciation Teaching
Jinsong Zhang, Beijing Language and Culture University

Chinese Lexical Chunks
Endong Xun, Beijing Language and Culture University

The Scale and Quality of Corpus -- Thoughts and Insights from Construction of Corpora for Textbooks
Xinchun Su, Xiamen University

On the Use of Corpora in Cognitive Studies of Language
Xiaolin Zhou, Peking University

Towards a Description of Chinese Semantic Primitives for Understanding and Computing
Yang Liu, Peking University

Research on the Modern Chinese Function Word Usage Knowledge Base and its Applications
Hongying Zan, Zhengzhou University

Using Machine Learning Methods to Discriminate Translation Styles
Yue Jiang and Juhong Zan, Xi’an Jiaotong University

Project for the Protection of Language Resources of China and Application of Dialect Culture
Zhiyun Cao, Beijing Language and Culture University

An Overview of Resources Construction in Beijing Advanced Innovation Center for Language Resources
Erhong Yang, Beijing Language and Culture University

Introduction to LDC: 25 Years and Counting
Denise DiPersio, LDC
Available: Slides in PDF

Challenges and Opportunities in Human Language Science and Technology
Mark Liberman, LDC

Round-Table Discussion 17:20-18:20

Language Resources: Cooperation and Win-Win
Zhifang Sui, Peking University, and Mark Liberman, LDC

Workshop Organizers

Linguistic Data Consortium (University of Pennsylvania)
Key Laboratory of Computational Linguistics (Peking University)
Department of Language & Literature and the Center for Chinese Linguistics (Peking University)