While studying empirical methods in linguistics, I was taught: "If it happens once, you don't know anything. If it happens twice, it suggests further investigation. If it happens three or more times, then you have something to write about!" Finding multiple occurrences in the data that substantiate your claims is therefore an essential part of analytical rigor, and concordance tools provide the means to do just that. The term concordance usually refers to an alphabetical index of all the words in a text or corpus of texts, showing every occurrence of each word in its context. Here, I extend the term to include indexes of more than just words: you can "concord" on any potentially recurring object in the corpus or its analyses (for example, part of speech, gloss, syntactic construction, lemma, morpheme, string of characters, case role, or phone). Concordances are essentially filters or queries. What is unique to them is that they are applied to the corpus of texts itself rather than to some higher-level analysis produced by the linguist (for example, the lexicon and all of its entries). This characteristic is what makes the concordance invaluable for empirical linguistics.
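To make the extended sense of "concording" concrete, here is a minimal sketch in Python. It is an illustration only, not the implementation used by any of the tools discussed in this abstract. The token dictionaries, the `form` and `pos` field names, the `concordance` function, and the toy corpus are all assumptions made for the example. The point is that the same indexing routine serves for words or for any other recurring annotation, simply by changing the key.

```python
from collections import defaultdict

def concordance(tokens, key=lambda tok: tok["form"], width=2):
    """Index every occurrence of each key value, with `width` tokens of context.

    `tokens` is a list of dicts; `key` selects the object to concord on
    (word form by default, but any annotation field will do).
    """
    index = defaultdict(list)
    for i, tok in enumerate(tokens):
        left = " ".join(t["form"] for t in tokens[max(0, i - width):i])
        right = " ".join(t["form"] for t in tokens[i + 1:i + 1 + width])
        index[key(tok)].append((i, left, tok["form"], right))
    # Sort the keys so the index reads alphabetically, as in a printed concordance.
    return dict(sorted(index.items()))

# Hypothetical toy corpus: one sentence with a part-of-speech annotation.
corpus = [
    {"form": "the", "pos": "DET"},
    {"form": "dog", "pos": "NOUN"},
    {"form": "saw", "pos": "VERB"},
    {"form": "the", "pos": "DET"},
    {"form": "cat", "pos": "NOUN"},
]

by_word = concordance(corpus)                             # concord on word forms
by_pos = concordance(corpus, key=lambda tok: tok["pos"])  # concord on part of speech

for position, left, form, right in by_pos["NOUN"]:
    print(f"{position:>3}  {left:>10} [{form}] {right}")
```

Each entry records the position of the occurrence together with its left and right context, so `by_word["the"]` lists both occurrences of the word, while `by_pos["NOUN"]` lists every noun in context regardless of its form.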
Historically, concordances have been printed indexes of well-known literature such as the Bible or the works of Shakespeare. Similarly, most traditional computational concordance tools look through a given corpus of texts in search of a particular phenomenon and then generate a separate and static file with a list of occurrences and any relevant associated data (reference, interlinear annotations). In contrast, using current relational and object-oriented databases, a concordance can be a view of the corpus data instances themselves rather than copies collected in a separate file. This has a number of significant advantages for empirical linguistic analysis including:
My presentation will focus on how a number of SIL tools employ the power of interactive concordances. LinguaLinks (http://www.sil.org/lingualinks) uses a robust object-oriented data model to provide easy and interactive concordance creation on a number of different linguistic objects. Speech Manager (http://www.sil.org/computing/speechtools/speechmanager.htm) utilizes a relational data model to provide interactive concordances for phonological analysis. We are currently working on a successor to LinguaLinks called FieldWorks that will provide better performance and a more complete data model for morphology, syntax and discourse analysis.
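The difference between a static list of occurrences and a concordance that is a view of the corpus data can be sketched with a small relational example. This is a hedged illustration using Python's standard `sqlite3` module, and it does not reflect the actual data model of LinguaLinks, Speech Manager, or FieldWorks. The `token` table, the `concordance` view, the column names, and the invented word forms are all assumptions made for the example.

```python
import sqlite3

# In-memory database standing in for a corpus store (hypothetical schema).
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE token (
    id       INTEGER PRIMARY KEY,
    text_ref TEXT,      -- which text the token belongs to
    position INTEGER,   -- position of the token within that text
    form     TEXT,      -- the word form
    gloss    TEXT       -- interlinear gloss
);
-- The concordance is a view over the corpus rows, not a copy of them.
CREATE VIEW concordance AS
    SELECT form, text_ref, position, gloss
    FROM token;
""")

# Invented toy data; the third gloss contains a deliberate typo.
conn.executemany(
    "INSERT INTO token (text_ref, position, form, gloss) VALUES (?, ?, ?, ?)",
    [("T1", 1, "kata", "word"), ("T1", 2, "mesa", "table"), ("T2", 1, "kata", "wrd")],
)

def occurrences(form):
    """Return every occurrence of `form` as seen through the concordance view."""
    return conn.execute(
        "SELECT text_ref, position, gloss FROM concordance "
        "WHERE form = ? ORDER BY text_ref, position",
        (form,),
    ).fetchall()

# A static concordance amounts to a snapshot like this one.
snapshot = occurrences("kata")

# Correct the gloss in the corpus itself.
conn.execute("UPDATE token SET gloss = 'word' WHERE text_ref = 'T2' AND position = 1")

# The view reflects the correction; the snapshot does not.
live = occurrences("kata")
print("snapshot:", snapshot)
print("live:    ", live)
```

The snapshot plays the role of the separate, static file: once the gloss is corrected in the corpus, the snapshot is out of date, whereas querying the view again returns the current state of the data.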
Barlow, Michael. Web site: Corpus Linguistics. http://www.ruf.rice.edu/~barlow/corpus.html. Includes a list of various text corpora available for research as well as a list of concordance tools.
Simons, Gary F. 1994. Conceptual modeling versus visual modeling: a technological key to building consensus. SIL. http://www.sil.org/cellar/ach94/ach94.html.