Introduction
This publication is a release of the JURIS (Justice Department
Retrieval and Inquiry System) data collection, which
was made available to the Linguistic Data
Consortium (LDC) by the U.S. Department of Justice.
The texts date from the 1700s to
the early 1990s.
Data
There are 1,664 individual text files in the corpus,
1,011 on the first CD-ROM and 653 on the second. The
original archive consisted of 219 files ranging in
size from less than 1 MB to nearly 70 MB. In
order to make the data more accessible for research
use, we chose to divide the larger files into pieces,
such that the average file size was about 2 MB when
uncompressed (the largest uncompressed file size is
about 4.5 MB). The files were divided at document
boundaries, so every file contains only whole
documents.
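The division described above can be illustrated with a minimal sketch. It is
not the procedure actually used to prepare the corpus, which is not documented
here. It assumes the documents of one large file have already been separated
into byte strings (the document delimiter is not specified in this
description), and the function name and the target_bytes parameter are
hypothetical.

```python
from typing import Iterable, Iterator, List


def split_at_document_boundaries(
    documents: Iterable[bytes], target_bytes: int
) -> Iterator[List[bytes]]:
    """Greedily group whole documents into pieces of roughly target_bytes.

    Hypothetical helper: a document is never cut, so a piece may exceed
    target_bytes when a single document is larger than the target.
    """
    piece: List[bytes] = []
    piece_size = 0
    for doc in documents:
        # Close the current piece if adding this document would pass the
        # target, but never emit an empty piece.
        if piece and piece_size + len(doc) > target_bytes:
            yield piece
            piece = []
            piece_size = 0
        piece.append(doc)
        piece_size += len(doc)
    # Emit whatever is left after the last document.
    if piece:
        yield piece
```

Because a piece is closed only between documents, a single long document can
push a piece past the target, which is consistent with a largest file that is
well above the average size.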
The corpus contains a total of 694,667 document units,
which can be categorized to some extent
by content. The following is a
partial list of categories
drawn from the JURIS documentation contained in the
corpus. The terminology and organization of
categories are those used in the JURIS documentation:
- Case Law
- Executive Order
- Regulations
- Federal Register
- Statutory Law
- Administrative Law
- International Agreements
- Freedom of Information Act and related documents
- Indian Law
- Tax Law
- Brief
Updates
There are no updates at this time.
Copyright
Portions © 1998 Trustees of the University of Pennsylvania