| PRICING:
1999 Commercial Members: $0
1999 Non-Profit Members: $1,460
Non-Members: $14,600
Introduction
Topic Detection and Tracking (TDT) refers to automatic techniques for
finding topically related material in streams of data such as newswire
and broadcast news. The TDT2 corpus was created to support three TDT2
tasks: find topically homogeneous sections (segmentation), detect the
occurrence of new events (detection), and track the reoccurrence of
old or new events (tracking). For further information on TDT2 please
visit our TDT2
Information Pages.
Data
The TDT2 Audio Corpus contains a total of 1,036 waveform files. Each
file is a complete single-channel recording of a 30- or 60-minute
broadcast, which has been digitized at a sample rate of 16 kHz using
16-bit samples.
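As a rough guide to storage requirements, the figures above imply the following uncompressed sizes per recording. This is only a sketch: it assumes raw, uncompressed PCM and ignores any file header or compression, neither of which is specified here. The function name raw_pcm_bytes is illustrative.

```python
# Assumption: raw uncompressed PCM; header bytes and compression are ignored.
SAMPLE_RATE_HZ = 16_000   # 16 kHz sample rate
BYTES_PER_SAMPLE = 2      # 16-bit samples
CHANNELS = 1              # single-channel recording


def raw_pcm_bytes(minutes: int) -> int:
    """Return the size in bytes of the raw audio for a broadcast of the given length."""
    seconds = minutes * 60
    return seconds * SAMPLE_RATE_HZ * BYTES_PER_SAMPLE * CHANNELS


for minutes in (30, 60):
    size = raw_pcm_bytes(minutes)
    print(f"{minutes}-minute broadcast: {size:,} bytes ({size / 1e6:.1f} MB)")
```

Under these assumptions a 30-minute file holds about 57.6 MB of audio and a 60-minute file about 115.2 MB. The corpus total depends on the mix of 30- and 60-minute recordings, which is not broken down here.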
The four broadcast sources represented in the corpus are as follows:
Source  Program             Format/frequency
----------------------------------------------------------------------
ABC     World News Tonight  "traditional" network news, 30 minutes/day
CNN     Headline News       continuous news summaries, up to 4
                            30-minute samples/day
PRI     The World           "in-depth" radio news, 60 minutes/weekday
VOA     varied              60-minute news programs, up to 2/day
Updates
There are no updates at this time.
Copyright |