


Spanish Newswire Text, Volume 2

Item Name: Spanish Newswire Text, Volume 2
Authors: David Graff and Gustavo Gallegos
LDC Catalog No.: LDC99T41
ISBN: 1-58563-162-0
Data Type: text
Data Source(s): newswire
Project(s): GALE, TIDES
Application(s): information retrieval, language modeling
Language(s): Spanish
Language ID(s): spa
Distribution: 1 CD
Member fee: $0 for 1999 members
Non-member Fee: US$1500.00
Reduced-License Fee: US$750.00
Extra-Copy Fee: US$150.00
Non-member License: yes
Online documentation: yes
Licensing Instructions: Subscription Members, Standard Members, Non-Members
Citation: David Graff and Gustavo Gallegos. 1999. Spanish Newswire Text, Volume 2. Linguistic Data Consortium, Philadelphia.

Introduction

This release of Spanish newswire contains data from the following sources:

  • Agence France Presse (January 13, 1996--December 13, 1998)
  • Associated Press Worldstream (December 1, 1995--August 31, 1998)
  • El Norte (January 1, 1997--December 31, 1998)

Data

The data are released in a consistent format: SGML tagging with the ISO-8859-1 (Latin-1) 8-bit character set. Our general strategy for SGML tagging is as follows:

All document units (articles) are bounded by the tags <DOC> and </DOC>, and within each unit the text content of the article is bounded by <TEXT> and </TEXT>. Each <DOC> tag is followed by a <DOCID> tag that provides a unique identifying string for that article. Other tags within the DOC unit (but outside TEXT) preserve additional information that was received with the article (e.g., headline, dateline, byline, keywords). The inventory and nature of this additional information varies from one source to the next, and in some cases from one article to the next, so the set of SGML tags used to preserve it varies accordingly. Within TEXT units, tagging is kept to a minimum and typically consists only of paragraph tags.
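A minimal Python sketch of reading one of these files follows. It assumes that every tag is written as a paired <TAG>...</TAG> element (including DOCID) and that paragraphs are marked with <P>; the exact tag spellings in the released files are not specified here, and the names `Article` and `parse_articles` are hypothetical. The key points are decoding the bytes as ISO-8859-1 rather than UTF-8, and pulling only DOCID and TEXT out of each DOC unit, since the other tags vary by source.

```python
import re
from dataclasses import dataclass, field

# Assumption: tags appear as paired <TAG>...</TAG>; DOCID and P spellings are illustrative.
DOC_RE = re.compile(r"<DOC>(.*?)</DOC>", re.DOTALL)
DOCID_RE = re.compile(r"<DOCID>\s*(.*?)\s*</DOCID>", re.DOTALL)
TEXT_RE = re.compile(r"<TEXT>(.*?)</TEXT>", re.DOTALL)
PARA_RE = re.compile(r"</?P>", re.IGNORECASE)


@dataclass
class Article:
    docid: str
    paragraphs: list[str] = field(default_factory=list)


def parse_articles(raw: bytes) -> list[Article]:
    """Split a Latin-1 SGML file into articles with their DOCID and paragraphs."""
    # Each byte in ISO-8859-1 maps to exactly one character, so decoding never fails.
    text = raw.decode("iso-8859-1")
    articles = []
    for doc in DOC_RE.findall(text):
        id_match = DOCID_RE.search(doc)
        text_match = TEXT_RE.search(doc)
        body = text_match.group(1) if text_match else ""
        # Split on <P>/</P>, drop empty chunks, and normalize internal whitespace.
        chunks = PARA_RE.split(body)
        paragraphs = [" ".join(c.split()) for c in chunks if c.strip()]
        articles.append(Article(id_match.group(1) if id_match else "", paragraphs))
    return articles


if __name__ == "__main__":
    # Hypothetical sample document; the DOCID value is made up for illustration.
    sample = (
        "<DOC>\n<DOCID>EXAMPLE-0001</DOCID>\n<HEADLINE>Ejemplo</HEADLINE>\n"
        "<TEXT>\n<P>\nPrimer párrafo.\n</P>\n<P>\nSegundo párrafo.\n</P>\n</TEXT>\n</DOC>\n"
    ).encode("iso-8859-1")
    for art in parse_articles(sample):
        print(art.docid, art.paragraphs)
```

Source-specific tags such as HEADLINE are simply skipped here. A fuller reader would collect whatever tags appear between DOCID and TEXT into a per-article dictionary instead of assuming a fixed set.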

Updates

There are no updates at this time.

Copyright



Contact: ldc@ldc.upenn.edu

(c) 1992-2008 Linguistic Data Consortium, University of Pennsylvania. All Rights Reserved.