|Archive Name:||ACDC project|
|Host Institution:||The Computational Processing of Portuguese project (MCT / SINTEF)|
|Country:||Portugal / Norway|
|Contact Person:||Diana Santos|
If the archive has a catalog in a standardized format, what fields does it
contain? If not, what contextual information about the resources are
collected? What other information would you like to collect if you could?
For each corpus, the catalog records its source (owners, compilers, etc.) and its original encoding options. It then gives the size in tokens and types, together with other easily computed figures, in tabular form. It also records the version of the original corpus, the version of the encoded corpus, and the date of the encoding.
If the electronic catalog conforms to some standard, please tell
us the name of the standard.
To what extent have the archived materials been cataloged?
If there is an online public access catalog, please give its URL.
3.1 What geographical regions and languages are covered?
|Main Regions Covered:||Americas, Europe|
|Approx Number of Languages:||1|
3.2 Please give impressionistic estimates of the archive holdings for each of the data types.
Please list any other data types which are not included above,
or any other comments on the archive holdings:
The archive holds text corpora, which are generally considered to be more than simple texts, especially in our case, since they are syntactically annotated. Still, I used the "Texts" category because the others seemed even less appropriate.
What proportion of the holdings are unique to
the archive and not available elsewhere?
a significant amount
To what extent are the archive holdings published
electronically, where "published" means that there is
a well-defined procedure such that
anyone at all can get a standard copy of the data,
either on digital media or over the internet?
To what extent are the archive holdings accessible over the web?
Is permission required before materials can be accessed?
Is there any fee for materials?
How are author and/or editor defined for the electronic publications?
Is there a bibliographical citation method?
The authors of the original corpora are credited in the form in which the corpora themselves name them, and they are publicly acknowledged. Our project introduces no other authors or editors.
Do the electronic publications have ISBN numbers?
What plans are there to expand the electronic publication of archive holdings?
The archive follows a single criterion: it includes only materials that can be made electronically available.
Who is the legal owner of archived materials?
The legal owners are (supposedly) the owners of the corpora, who simply gave us the right to distribute them.
Beyond legal ownership,
are there any asserted or perceived moral rights concerning
Do the holders of the archive see the original speakers or
their representatives as controlling publication?
In the few cases of literary texts, the original writers and/or translators had to give permission.
In cases where no electronic publication is planned, why is this so?
(e.g. funding, licensing, technical know-how, lack of interest).
Is any of the data in a proprietary format (e.g. MS Word)? If so,
are there plans to transfer it to an open standard (e.g., XML)?
All distributed data is in plain HTML. The corpora themselves are encoded in a "proprietary" format (that of the IMS Corpus Workbench, University of Stuttgart) for technical reasons.
Do you have any other comments about digital archives of
language material, or on this survey?
In addition to archiving corpora, we also maintain a large catalogue of materials by others concerning Portuguese: at www.portugues.mct.pt you can find URLs mentioning, describing or containing resources for the computational processing of Portuguese, organized under the following main classifiers: Corpora, Lexicons, Tools, Teaching materials, Texts in Portuguese, Media, and Others. I would therefore suggest that you separate the cataloguing issue from the archiving one (in fact, the project I have been describing in this survey would be better characterized as DISTRIBUTING rather than ARCHIVING). One can catalogue resources produced by others and, beyond merely archiving them, also redistribute them.