Writing a Corpus Cookbook

Martin Wynne
Oxford Text Archive
Arts and Humanities Data Service
13 Banbury Road
Oxford OX2 6NN
martin@ota.ahds.ac.uk

This paper discusses the issues arising from the planning of a 'guide to good practice' in 'developing linguistic corpora' for UK academics, a project currently being undertaken at the Oxford Text Archive (OTA).

The OTA is a long-established facility supported by Oxford University, and based within the Humanities Computing Unit. Since 1996, the OTA has been working as a Service Provider for the national Arts and Humanities Data Service (AHDS) to support academics working in all areas of literary and linguistic studies in the UK. The OTA seeks to address three key areas:

· Collect, catalogue, preserve, and redistribute digital resources of interest to those working in literary and linguistic studies within the UK's Higher Education and FE communities.
· Develop appropriate licensing conditions and technical mechanisms for the effective distribution of such resources.
· Promote good practice in the creation and use of such resources in both research and teaching.

Following a study of the OTA's subject coverage and Collections Policy, it was decided that additional support should be offered in the area of Linguistics. Martin Wynne has been appointed to the post of Information Officer (Linguistics) with a brief to improve the OTA's provision for linguistics and to raise the profile of the OTA within this sector of the academic community. One of the key activities to be undertaken is the publication of an AHDS Guide to Good Practice covering the development of linguistic corpora.

This paper reports on the experiences so far in planning and writing this book. The following questions are addressed:

1. What is the guide for?
2. Who is the guide for?
3. What is the UK academic linguistics community?
4. What is good/best practice?
5. How should the OTA service develop?

The fourth point above is addressed in the most detail.
Specifically, the question of what is best practice in digitising and/or encoding language corpora is considered. Can software- and hardware-dependent solutions be considered acceptable? Is the TEI the solution? How can we preserve the integrity of the texts? In view of the diversity of resources that are being, and will be, developed, the advantages of an open, eclectic and non-prescriptive approach are considered.