Entity Definitions for Oncology

Annotators' home
Oncology annotators' page


"Gene"

The definition of the "Gene" category is on its own page. (As of early July 2003, this definition does not mention the distinction between genomic material and gene products described below.)

There are three buttons in WordFreak under the "Gene" entity:

Gene-Gene/RNA °

"Gene-Gene/RNA" is for genes and RNA elements (see the Definition).

Gene-Protein °

"Gene-Protein" is for the non-genomic downstream products of genes and RNA elements. "XXX mRNA" and "XXX transcript" are both Gene/RNA; tag the whole phrase, e.g.,

Gene-generic °

Usually there is no problem deciding which tag to use. But the same name or symbol can be used for a gene and for the protein it encodes. "Generic" is just for those times when either

  1. it's not clear whether the reference is genomic (Gene/RNA) or proteomic (Protein), or
  2. the author was evidently referring to both types together.
Most of the time you will use either Gene/RNA or Protein. The need for Generic is fairly uncommon.
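
The choice among the three buttons can be summarized as a small lookup. This is only an illustrative sketch: the function name `choose_gene_button` and its `reference` argument are hypothetical and not part of WordFreak; only the button names come from the list above.

```python
def choose_gene_button(reference: str) -> str:
    """Map an annotator's reading of a mention to a WordFreak "Gene" button.

    `reference` is a hypothetical label for how the mention reads in context:
    "genomic", "proteomic", "both" (the author evidently means both types
    together), or "unclear" (the text does not settle the question).
    """
    buttons = {
        "genomic": "Gene-Gene/RNA",
        "proteomic": "Gene-Protein",
        "both": "Gene-generic",
        "unclear": "Gene-generic",
    }
    if reference not in buttons:
        raise ValueError(f"unknown reading: {reference!r}")
    return buttons[reference]


print(choose_gene_button("genomic"))
print(choose_gene_button("unclear"))
```

Generic is reached only through the two cases listed above, which matches the expectation that it is needed fairly rarely.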

Gene names that include other gene names

Some gene names include another gene in their name, such as "p53 tumor suppressor gene". With these, only tag the longest name, the one that includes the other(s). See the general rule on tags within tags.

"X gene", "X protein", "kinase X", etc., vs. "pseudogene"

When you see a phrase like "the p53 gene" or "an N-ras protein", don't include the word "gene" or "protein" in the tag. It tells you whether to use the Gene/RNA button or the Protein button, and once you've done that the word itself is superfluous. The same goes for "oncogene".
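
As a rough sketch of this trimming rule, the function below drops a trailing "gene", "protein", or "oncogene" from a candidate phrase and leaves "pseudogene" in place, in line with the exception described next. The name `trim_gene_phrase` and the assumption that the phrase arrives without its determiner ("the", "an") are illustrative only.

```python
# Words that only signal which button to use, so they stay outside the tag.
REDUNDANT_HEADS = {"gene", "protein", "oncogene"}


def trim_gene_phrase(phrase: str) -> str:
    """Return the part of `phrase` that would be tagged.

    Assumes the determiner has already been removed.
    "pseudogene" is deliberately absent from REDUNDANT_HEADS, so it is kept.
    """
    words = phrase.split()
    if len(words) > 1 and words[-1].lower() in REDUNDANT_HEADS:
        words = words[:-1]
    return " ".join(words)


for phrase in ["p53 gene", "N-ras protein", "XXX pseudogene"]:
    print(phrase, "->", trim_gene_phrase(phrase))
```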

But a pseudogene is not a gene, so if you're tagging "XXX" include "pseudogene" as well: [2004-10-14]

"X Y-ase" is often an enzyme (a Y-ase) that acts on X, so it is safest to include the last word in the tag if it is an enzyme name. But if the "Y-ase" precedes the "X", as in "kinase STK15", you can pretty well tell that it is redundant and explanatory-- "STK15 (which, by the way, is a kinase)"-- so do not include it in the tagged string: [2004-10-14]


"Malignancy"

(Subcategories replacing information-gathering mode [2004-03-25])

Overview

Most of the tags described below are for specific attributes of Malignancy, analogous to "Variation-state-original". But the Malignancy-type tag has a special purpose: to capture the diagnostic name of a malignancy. Unlike the more specific attributes like clinical stage, a Malignancy-type tag can have other tags inside it. In other words, tag-within-tag is allowed (and fairly common) when the outer tag is Malignancy-type.

Adjectives

Many of these attributes can be expressed as adjectives. Tag such adjectives as well. [2004-08-17]

Malignancy-Type

This is how clinicians name the different types of cancer. As you can imagine, just like genes, there are different ways to name a single malignancy type: morphologic features, histological observation, anatomical location, the name(s) of the discoverer or patients, and many more. These criteria are not mutually exclusive. "Leukemia" could be considered as either an anatomical or a histological type-name, but either way it's a Malignancy-type. "Squamous cell carcinoma" and "Ewing's sarcoma" are made up of a cancer name and a modifier; the unmodified name by itself ("carcinoma", "sarcoma") would be a Malignancy-type, but we don't tag it within the more detailed name; we tag the full phrase.

We don't have a list of names of Malignancy-type, and we probably never will have a complete one. This is something like information-gathering, but with some restrictions. With your bio or medical backgrounds, you will probably be able to recognize what is meant as the name of the cancer -- the Malignancy-type -- most of the time. Tag it. But we are restricting it: no prepositions. If you see "cancer of the lung", the Malignancy-type ends at "of": it's just "cancer". Tag "lung" separately as Malignancy-site.
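
The "no prepositions" restriction can be sketched as follows. This is a simplification under stated assumptions: the preposition and determiner lists are short illustrative samples, the function name `split_type_and_site` is hypothetical, and whatever follows the preposition is returned only as a candidate for Malignancy-site, which an annotator would still need to confirm.

```python
PREPOSITIONS = {"of", "in", "on", "at", "from", "to", "with", "for", "by"}
DETERMINERS = {"the", "a", "an"}


def split_type_and_site(phrase: str):
    """Cut a candidate Malignancy-type at the first preposition.

    Returns (type_text, site_candidate); site_candidate is None when the
    phrase contains no preposition.
    """
    words = phrase.split()
    for i, word in enumerate(words):
        if word.lower() in PREPOSITIONS:
            type_text = " ".join(words[:i])
            rest = [w for w in words[i + 1:] if w.lower() not in DETERMINERS]
            return type_text, " ".join(rest)
    return phrase, None


print(split_type_and_site("cancer of the lung"))
print(split_type_and_site("squamous cell carcinoma"))
```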

In the list below, the text you would tag as Malignancy-type is italicized. When you see a name that you think should qualify as a type but doesn't fit any of the criteria in this list -- morphology, histology, anatomy, or eponymy -- tag it and mention it on the onco-list mailing list.

Adjectives [2004-08-17]

Tag adjectival forms of Malignancy-type as well:

The tumor was composed of carcinomatous, sarcomatous, and transitional elements in the frontal wall of the uterine body and therefore was diagnosed as a carcinosarcoma. (PMID 11520156)

Not Malignancy-type [2004-08-05] °

Malignancy-Developmental-State

The scope of Malignancy-Developmental-State has been significantly modified with the addition of quantitative tagging, as we begin to annotate Survival-status. See Developmental-state. This older definition is being retained for reference and comparison until all files annotated under this definition have been reannotated under the new one, with the interim label Developmental-state. When that is done we will rename all the Developmental-state references to Malignancy-developmental-state, reflecting the entity's conceptual association with Malignancy. [2005-02-11] °

This represents different developmental timelines of the malignancy's host (in the sense that a parasite lives in a host): an individual (usually a patient), a cell line, or a tissue. The values of this attribute can be at different levels of specificity.

Development of person or tissue:

  • on lifespan scale:
    • embryonic
    • fetal
    • pediatric
    • child
    • juvenile
    • adult
  • more specific and precise:
    • 3-year-old
    • 9 months
    • older than 5 months
    • 2 days growth

Development of cell line:

  • passage 23
  • early passage

Some comparative words can also be values of this attribute since they provide the timeline information in relative terms; e.g.:

  • older
  • younger
  • longer

Malignancy-Clinical-Stage

We use this attribute for three distinct types of mention:

Staging systems

Tumors are usually staged clinically by researchers. Staging evaluates the extent of a cancer within the body, especially whether the disease has spread from the original site to other parts of the body, and this attribute captures that evaluation. There are different staging systems for different kinds of tumors.

There are three staging systems used for neuroblastoma: the Evans System, the St. Jude System, and the International Staging System. We may see any of these; any of these would be tagged as Clinical Stage. Other systems are used for other kinds of cancer. Tag them all, not just for neuroblastoma.

The International Neuroblastoma Staging System (INSS) is now universally used to stage neuroblastoma:

The older Evans system for neuroblastoma:

Besides the specific terms used in specified staging systems, some general terms can also be used to state the clinical stage of the tumor, and so should be treated as the values of this attribute as well, such as

Premalignant conditions [2004-09-23]

An annotator asked:

In source_file_3806_22708 (PMID 9718653) should MEN type 2B syndrome be tagged as a malignancy? I ask because the definition for it* stated that it is characterized by the 100% incidence of medullary thyroid carcinoma.

* in the NCI Metathesaurus

I referred the question to the domain experts. They decided that all premalignant conditions should be tagged as Clinical-stage, restricting Malignancy-type to "established cancer names". At some point in the future we may develop a separate way of annotating references to premalignant conditions, but this will do for now.

"Benign", "malignant" °

When the words "benign" and "malignant" describe a cancer or a tumor, tag them as Clinical-stage:

This includes their use in terms such as "malignant neuroblastoma", which is a Malignancy-type, so we will have tag-within-tag:

* malignant neuroblastoma
  Clin-stg-
  --------type-----------
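
One way to picture this tag-within-tag case is as character spans. The sketch below is illustrative only: the `Tag` class, the label spellings, and `nesting_allowed` are hypothetical, and the check covers only the rule that the outer tag must be Malignancy-type (other exceptions described on this page are not modeled).

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Tag:
    start: int   # offset of the first character
    end: int     # offset one past the last character
    label: str


def nesting_allowed(outer: Tag, inner: Tag) -> bool:
    """True if `inner` lies inside `outer` and `outer` is a Malignancy-type."""
    contained = outer.start <= inner.start and inner.end <= outer.end
    return contained and outer.label == "Malignancy-type"


text = "malignant neuroblastoma"
stage = Tag(0, len("malignant"), "Malignancy-clinical-stage")
mtype = Tag(0, len(text), "Malignancy-type")

print(text[stage.start:stage.end], "|", text[mtype.start:mtype.end])
print(nesting_allowed(mtype, stage))
```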

But do not tag them when they are not describing a cancer or a tumor, e.g.:

K-ras mutation analysis seems to be a powerful tool to determine the malignant potential of cystic pancreatic tumors before and after surgery. (PMID 9671070)
Here the word "malignant" refers to a process: "malignant potential" means the ability or probability of a tumor to become malignant.

Malignancy-Histology [2004-08-04] °

This attribute specifies cell and/or tissue type(s) affected by benign or malignant tumors. It includes nothing below the cell level (subcellular components such as "nucleus" or "prokaryon", which we do not tag) and nothing above the tissue level (body structures such as "eye" or body regions such as "arm", both of which are Malignancy-site). [2004-10-19]

The terms are the same terms that are used for healthy cells. For example:

This tumor is composed of glial cells with low level differentiation.

Here "glial cells" is the phrase specifying the cell type making up the tumor, and so will be tagged as Malignancy-histology. The tag should include the word "cells", as indicated by the boldface type.

This attribute is also commonly used in naming the tumor, so Histology strings often appear as part or all of a Malignancy-type. Since Malignancy-type strings can include tagged strings of other types, such Malignancy-types will have (at least) two tags: the whole string tagged as Malignancy-type, the histological description tagged as Histology, and possibly other descriptors such as Site or Developmental-state.

In the following examples, we would tag the complete string to the left of the dash as Malignancy-type and the underlined part as Malignancy-histology, whether it is just part of the Malignancy-type (e.g., #11) or all of it. (The italicized text after the dash is the definition of the term, not part of the text.) Where the histological description consists of more than one word, tag them as a single string inside the longer Malignancy-type string, not two separate strings (see last two examples).

  1. adenoma -- epithelial cells/tissue of glandular origin (benign tumor)
  2. carcinoma -- epithelial cells/tissue
  3. glioma -- glial cells
  4. leukemia -- blood-forming tissue
  5. lymphoma -- lymphocytes
  6. melanoma -- melanocytes
  7. neuroblastoma -- neuroblasts
  8. retinoblastoma -- retinoblasts
  9. sarcoma -- connective or supportive tissue
  10. squamous cell carcinoma -- squamous cells/epithelial tissue
  11. chronic myelogenous leukemia* -- myeloid cells/blood forming tissue (not "chronic myelogenous leukemia")
  12. acute lymphoblastic leukemia -- lymphoblasts/blood forming tissue
* chronic myelogenous leukemia
          -------------------- Malignancy-histology
  ---------------------------- Malignancy-type
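
Diagrams like the one above can be produced mechanically from the tagged substrings. The helper below is a minimal sketch; `render_tags` is a hypothetical name, and it assumes each tagged substring occurs once in the text so that `str.find` locates the intended span.

```python
def render_tags(text: str, tags, indent: int = 2) -> str:
    """Draw one dashed underline per (substring, label) pair beneath `text`."""
    lines = ["* " + text]
    for substring, label in tags:
        start = text.find(substring)
        if start < 0:
            raise ValueError(f"{substring!r} not found in {text!r}")
        lines.append(" " * (indent + start) + "-" * len(substring) + " " + label)
    return "\n".join(lines)


print(render_tags(
    "chronic myelogenous leukemia",
    [
        ("myelogenous leukemia", "Malignancy-histology"),
        ("chronic myelogenous leukemia", "Malignancy-type"),
    ],
))
```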

We will tag all references to cell type as Malignancy-histology, whether or not they actually are in a description of a malignancy. Even if, for example, "epithelial cells" appear in a sentence also mentioning "adenoma", both terms should be tagged as Malignancy-histology. (See discussion under Malignancy-site.)

Adjectives [2004-08-17]

(A list to be added to.) Tag as Malignancy-histology:

Malignancy-Site [2004-08-04]

This attribute specifies the body part(s) affected by a malignancy, including organs, parts of organs, and body systems as well as terms like "leg" and "elbow" that refer to sections of the body. Terms referring to type of tissue should be tagged as Malignancy-histology.

Like Malignancy-histology, Malignancy-site is frequently used for naming Malignancy-type; in fact, all the body parts mentioned in the tumor names are the sites of the (not necessarily primary) tumors, and so are tagged with this attribute. Examples of this kind include the following (attribute values are in boldface):

Tag body part names in references to metastases. Although metastasis references are not Malignancy-type, we are tagging body part names wherever they occur:

Sometimes a part of the body may be referred to with the word "area" or "region". It may be redundant, or the authors may be referring to a larger region than just the body part name that modifies it. Don't try to guess or figure it out or look it up, but just include it in the tagged string. But if "area" (or similar word) is accompanied by an identifier, the phrase probably refers to a very specific section of the body part mentioned or being discussed, so include the identifier as well.

Conflicts in tagging Malignancy-site

(See below for terms that refer simultaneously to a cell or tissue type and to an organ or system of the body.)

Multiple body parts may be mentioned in conjunction, possibly referring either to a single value or to different values depending upon the context. For example, one abstract may always speak of "tumors of the head and neck", while another abstract may start off discussing "tumors of the head and neck" and later go on to separate discussions of "tumors of the head" and "tumors of the neck". Rather than read the whole abstract to decide whether such a conjoined mention at the beginning should be treated as one Site or as two, you should tag "head and neck" as a single Malignancy-site. (Actually, there aren't many Sites that are conjoined in this way; maybe the only other set is "small and large intestine".)

In a coordination like "tumors of the head and of the neck", where the second conjunct has its own preposition, tag the Sites separately.
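
The two coordination cases can be told apart by whether the second conjunct repeats the preposition. The sketch below handles only the pattern with "of" and a single "and"; the function name `split_conjoined_sites` and the regular expressions are illustrative assumptions, and the input is taken to be the text that follows the Malignancy-type.

```python
import re

# Second conjunct carries its own preposition: "of the head and of the neck".
SEPARATE = re.compile(r"(?:of\s+)?(?:the\s+)?(.+?)\s+and\s+of\s+(?:the\s+)?(.+)")
# Anything else is kept as one Site: "of the head and neck".
SINGLE = re.compile(r"(?:of\s+)?(?:the\s+)?(.+)")


def split_conjoined_sites(phrase: str):
    """Return the list of strings to tag as Malignancy-site."""
    match = SEPARATE.fullmatch(phrase)
    if match:
        return [match.group(1), match.group(2)]
    match = SINGLE.fullmatch(phrase)
    return [match.group(1)] if match else []


print(split_conjoined_sites("of the head and neck"))
print(split_conjoined_sites("of the head and of the neck"))
```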

Note that "cancer of the neck" is not a Malignancy-type because of the "no prepositions" rule for that attribute. But in such expressions, do tag "cancer" by itself as Malignancy-type:

We will tag all references to location in the body as Malignancy-site, whether or not they actually are in a description of a malignancy, and even if they appear to be redundant with another mention in the sentence. (The same holds for Malignancy-histology: if "epithelial cells" appear in a sentence also mentioning "adenoma", both terms should be tagged as Malignancy-histology.) In

The patient presented at the ER with a sprained left ankle, but examination and tests revealed osteosarcoma in the left tibia.
tag as follows:

We are doing this for several reasons.

Adjectives [2004-08-17]

(A list to be added to.) Tag as Malignancy-site:

(Histology or Site?) [2004-09-13]

It is not always immediately clear whether to tag a reference as Histology or as Site.

Systems: Our domain experts have decided that references to a system of tissues in the body, such as "musculature" or "autonomic [nervous system]", should be tagged as Site rather than Histology.

Similarly, adjectives referring to a system (e.g., "neural") should generally be tagged as Site (compare "neuronal", which is Histology):

But if an adjective like this modifies an explicitly histological term like "cell" or "tissue", the authors' intention is clearly Histology, not Site, and you should tag it accordingly (archive): [2004-12-02]

Both at once: A text string may refer both to cell or tissue type and to an organ or system of the body. For example, "lymphoma" refers to both lymphocytes (cell type, so Histology) and the lymphatic system (system, therefore Site). In order to avoid double tagging, we will tag such strings only as Malignancy-histology, which carries more specific information than Malignancy-site. The histology implies the site, but not necessarily vice versa.
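
The precedence rules in this subsection can be sketched as a small function. The name `histology_or_site` and its boolean arguments are hypothetical; they stand for judgments the annotator has already made about the string.

```python
from typing import Optional


def histology_or_site(
    is_cell_or_tissue_type: bool,
    is_organ_or_system: bool,
    modifies_cell_or_tissue: bool = False,
) -> Optional[str]:
    """Pick a single tag for a string that might be Histology, Site, or both."""
    # An adjective attached to "cell" or "tissue" is clearly histological.
    if modifies_cell_or_tissue:
        return "Malignancy-histology"
    # Histology wins when both readings apply, to avoid double tagging.
    if is_cell_or_tissue_type:
        return "Malignancy-histology"
    if is_organ_or_system:
        return "Malignancy-site"
    return None


print(histology_or_site(True, True))          # e.g. "lymphoma"
print(histology_or_site(False, True))         # e.g. "musculature"
print(histology_or_site(False, True, True))   # e.g. "neural" modifying "cell"
```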

Malignancy-Differentiation [2004-08-04]

This attribute shows the degree of tumor cell differentiation. At the early stage of normal development, cells within a particular tissue are often similar in appearance and function, a condition that is described as "undifferentiated". As development proceeds, they often change in appearance, behavior, and/or molecular characteristics, including the ability to evolve into two or more distinct cell subtypes. Many tumor cells, however, don't follow the normal developmental process, but stop differentiating at some point. This attribute indicates where that point is by specifying the degree of tumor cell differentiation.

Differentiation status of a tumor is often described roughly with phrases like

Pathologists also have a number of numerical grading systems to describe the degree of differentiation of tumor cells more precisely, with different systems used for different kinds of tumors. In most of these systems, lower grades describe well-differentiated tumors and higher grades poorly differentiated ones. Both the descriptive phrases and the systematic grade levels should be tagged as Malignancy-differentiation.

Malignancy-Heredity-Status [2004-08-04]

A malignancy can be partially or fully heritable or can appear spontaneously without any similar family history. This attribute describes whether the malignancy under discussion has hereditary properties, that is, whether it can be transmitted from parent to child by information contained in the genes. The most common descriptions of this attribute are

NOTE: "congenital" does not refer to Malignancy-heredity-status. (A newborn child is a nine-month-old organism, and in that period can have developed a sporadic malignancy unrelated to the parents' germ plasm.)

(For brevity, "status" is omitted on the button in WordFreak, and probably in most of our discussions both oral and written.)


"Variation"

This category includes six tags:

Variations are extremely complex entities, actually involving a relationship between these components. Although there is a proposed standard notation to describe them, it is hardly ever used, and the literature contains a great many different ways of describing them.

Here are some examples of the categories that we are now using to describe Variation. These lists are not exhaustive; they keep growing as we look at files and you ask questions.

Variation-type °

Specifies the kind of change in the genomic material in a particular instance of variation, or a particular group of instances. [2004-08-18]

Alternate names

Besides synonyms, there are many ways of referring to mutations. People may refer to any of these with or without the word "mutation". Someone may say something like "the transition". And so on.

Sometimes the name is used in an adjectival form, as in "point mutational activities". In this case we would tag "point mutational" even though "mutational" is grammatically an adjective. [2003-07-31]

Not Variation-type

Variation-location

The place within the genomic material where the change occurs. Most often the location is within a gene: [2004-08-18]

But it can also be a larger unit, such as a chromosome or chromosome arm:

The location may also be included in a single string of notation together with the original and altered states, as above or in the next section.

Genes as locations

[2003-07-23] [2005-05-12]° Sometimes the variation location can be a gene, when the entire gene rather than a part of it is the object of the variation. Just in such special cases, we double-tag the gene both as gene/RNA and as location. (See WordFreak instructions on double-tagging and clicking vs. dragging.)

NOTE: This does not apply to expressions like "a deletion mutation in the K-ras gene at codon 5", where the variation is specified as affecting a specific section of the gene rather than the whole gene. (See the discussion in the notes from 2003-08-19.) Even if no specific location within the gene is mentioned, as in "point mutations of the p53 gene", do not tag the gene as Location unless the entire gene is affected, which rules out double-tagging with most Variation-types. ("Deletion", unlike most other types, can refer to any scale in the genome, from a single nucleotide to a chromosome.) [2005-04-29]°

 * deletion of the K-ras gene
   type----        G/RNA
                   loc--

 * translocation of the H-ras gene to location such-and-such
   type---------        G/RNA         loc-------------------
                        loc--
Such double-tagging can be called for with at least the following variation types:

Ranges as locations

[2003-07-23] A location can be specified as a range, like "codons 18 through 20".

And sometimes this range is stated in terms of genes: "from gene A to gene B", "between genes A and B", etc. In such cases, we will tag the range as a location AND we will tag the genes within it (as "gene/RNA"). (This forms an exception to our usual rule against tags within tags.)
 * from gene A   to gene B
            G/R         G/R
   Loc--------------------

 * between genes A   and B
                G/R     G/R
   Loc--------------------

Variation-state-original, Variation-state-altered, Variation-state-generic

[2003-12-18] A variation is a change from one state of the genome to another. We have separate tags for the original and the altered states, as well as a "state-generic" tag for use when it isn't clear from the notation and the immediate text whether a state is original or altered. (See an example here on the mailing list archives.)

The states may be expressed in prose, as in "change of glycine to alanine", or as a formula that shows the two states linked by an arrow or similar marker. Such a formula may also include the location, as several of these examples do. The original state is shown here in red italics, the altered state in green italics, and the location in blue (not italic).

Variation-event

[2004-04-05] This category refers to the variation as a whole. It is similar in concept to the un-subdivided "Variation" category we began with, but its scope is limited to names or terms that refer to a whole variation, not long strings of text that describe it. (See Variation-Event Introduction for a fuller explanation.)

We use the Variation-event tag in two circumstances.

  1. Frequently a variation or group of variations is described in specific detail in one or two sentences and is subsequently referred to with a phrase like "the mutation" or "this deletion" or "these point mutations"; or the reference may precede the description, either in the title or in the text. As long as the reference is to a variation event that is specified in terms of location, type, and/or state, tag it as a Variation-event, excluding determiners (the, this, these, that, those, ...). If it refers to a group of variation events, tag it only if they are described as a group sharing at least one kind of specification.

  2. Some genomic variations are common enough or important enough in research to have names of their own. Down's syndrome (trisomy 21) is so widespread that the name is familiar to many laypeople. Others that we have encountered in this project are

[2003-07-23] Important note: Use this tag only when there is at least some specific information about the variation: at least a location, type, or (any kind of) state. Do not include the specific information in the tagged text. It doesn't even have to be in the immediate vicinity, as long as it clearly applies to the text you're tagging as a variation event. Some examples:

Tagging specific types of variation

[2003-12-18] Some types of variation are more complex than others, or raise questions about how to tag them. Here are some specifics.

Translocations

These are a complex type of variation, in which pieces of chromosomes get swapped around. Most of them involve a single exchange between two chromosomes:

wild type:  chromosome A:  aaaaaaaaaaaaaaaaaaaaaaAAAAA
            chromosome B:      bbbbbbbbbbbbbbbbbbBBBBBBBBB
variation:  chromosome A:  aaaaaaaaaaaaaaaaaaaaaaBBBBBBBBB
            chromosome B:      bbbbbbbbbbbbbbbbbbAAAAA

There's a fairly standard notation for these; e.g.,

        t(1;15)(p36.3;q24.2)
That is:

Now, the original and altered states are implicit in this notation rather than explicit. There are two locations (here, 1p36.3 and 15q24.2), but they're not "before" and "after". But in annotating translocations we will tag one of the locations as "state-original" and the other as "state-altered". It doesn't theoretically matter which is tagged as which, but for consistency's sake let's tag the one mentioned first as original.

So we would tag this piece of notation as

        t(1;15)(p36.3;q24.2)
like this:

t               var-type
1  (+) p36.3    var-state-orig
15 (+) q24.2    var-state-alt

with each of the two states being a two-part chain. When this annotation is transferred into the database it will be transformed into a more accurate description of the translocation, but there's no need for us to complicate your work by adding new sets of buttons for different types of variation.
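
The convention above can be sketched in code. This is not the project's actual database transformation, only an illustration under stated assumptions: the regular expression covers just the simple two-chromosome form shown in the example, and the function name `parse_translocation` and the dictionary layout are hypothetical.

```python
import re

# Matches only the simple form t(A;B)(bandA;bandB), e.g. t(1;15)(p36.3;q24.2).
TRANSLOCATION = re.compile(r"t\((\w+);(\w+)\)\(([pq][\d.]+);([pq][\d.]+)\)")


def parse_translocation(notation: str) -> dict:
    """Split a translocation formula into the tags described above.

    The first-mentioned chromosome and band are labeled state-original and the
    second state-altered, following the consistency convention in this section.
    """
    match = TRANSLOCATION.fullmatch(notation.strip())
    if match is None:
        raise ValueError(f"unsupported notation: {notation!r}")
    chrom1, chrom2, band1, band2 = match.groups()
    return {
        "var-type": "t",
        "var-state-orig": (chrom1, band1),   # two-part chain: 1 (+) p36.3
        "var-state-alt": (chrom2, band2),    # two-part chain: 15 (+) q24.2
    }


print(parse_translocation("t(1;15)(p36.3;q24.2)"))
```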

Deletions

Deletions can be described with more or less detail and in different ways. Sometimes the states will be specified, other times they will not be. Here are some examples:

The key decision here is to distinguish whether a specification of nucleotides, base pairs, or amino acids constitutes a state or a location. If the text identifies them by symbol or name, they're a state; if only by address, so to speak, they're a location.

Not Tagged [2004-08-18]

One type of question that keeps coming up is "Should we tag XYZ as...?" Here are some types of term for which the answer is always No.


CHANGE NOTES



2005-11-30