CYP Guidelines





ABBREVIATIONS: GENERAL NOTES [2003-11-24]

Name + (Abbreviation)

Often we see a long name followed by an abbreviation in parentheses, or after a dash. If the name refers to a kind of entity that is being tagged in your domain, tag the name and the abbreviation, separately. They are two names for the same entity. Don't tag the parentheses:

monoclonal gammopathy of undetermined significance (MGUS)

[2003-12-11] When considering how much of a long name to tag, take the abbreviation into account. The fact that the authors used an abbreviation for a phrase, even if they invented it, is very strong motivation to tag the whole phrase that it stands for, even if it includes modifiers or other terms that we would otherwise exclude. For example:

The selective serotonin reuptake inhibitors (SSRIs)--paroxetine, sertraline, and fluvoxamine--have elimination half-lives of 15-26 hours.

Depending on domain-specific guidelines, a modifier such as "selective" might normally be excluded from the tagged string. But here it is obviously part of the phrase, represented by the first "S" in "SSRIs". That's sufficient reason to consider "selective" part of the full name of the entity.
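The judgment above rests on matching the letters of the abbreviation back to the words of the phrase. As an illustration only, the sketch below automates the easy case under an assumption the guidelines do not make: each letter of the abbreviation is the initial of one word, with a few short function words (a hypothetical `SKIPPABLE` set) allowed in between. It returns the shortest matching run of words, which for "SSRIs" pulls in "selective".

```python
from typing import Optional

# Hypothetical set of short words that may sit inside a long form without
# contributing a letter to the abbreviation (an assumption of this sketch).
SKIPPABLE = {"of", "the", "and", "for", "in", "to"}


def long_form(preceding_text: str, abbrev: str) -> Optional[str]:
    """Return the shortest run of words, ending just before a parenthesized
    abbreviation, whose initials spell that abbreviation; None if none does."""
    # Drop a trailing plural "s" ("SSRIs" -> "SSRI") before comparing.
    target = (abbrev[:-1] if abbrev.endswith("s") else abbrev).lower()
    words = preceding_text.split()
    # Walk backwards so the first match found is also the shortest one.
    for start in range(len(words) - 1, -1, -1):
        candidate = words[start:]
        initials = "".join(
            w[0].lower() for w in candidate if w.lower() not in SKIPPABLE
        )
        if initials == target:
            return " ".join(candidate)
    return None
```

For abbreviations built from inner letters, such as "CYP" for "cytochrome P450", the sketch returns `None` and the decision stays with the annotator.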

Name + (Abbreviation) + More Name

But sometimes such a name + abbreviation is part of a longer name that continues after the parenthesized abbreviation. In such cases we do not try to separate out the abbreviation, since that would create a tag-within-a-tag situation, which we try to avoid. So we just tag the longer name:

P-glycoprotein (P-gp)-related compounds

CHAINS

Chaining is a way of annotating a piece of text that consists of more than one piece, such as "CYP1A2" in "CYP1A1/2", or "androgen receptors" in "androgen and estrogen receptors". See Chaining in WordFreak for a general discussion and the restrictions, then come back here for a few more examples. [2005-01-26]

Coordination in split references: [2005-01-26]

We chain only for conjunctive constructions, i.e., where the text uses "and" or "or" (or another word, or punctuation, used similarly) to join two or more similar elements, and some or all of the elements share a part. (In our documentation we use "(+)" to separate the links of a chain.)

  1. androgen and estrogen receptors
    The shared element, "receptors", is on the right of the conjoined section.

    • "androgen  (+)  receptors": tag as a chain
    • "estrogen receptors": single string

  2. organic acids and salts
    Here the shared element is on the left.

    • "organic acids": single string
    • "organic  (+)  salts": chain

  3. P450IA1 or IA2   =   "P450IA1 or P450IA2"
    The links of the chain are not complete tokens. You will need to adjust the tokenization to split "P450IA1" into "P450" and "IA1" so you can chain "P450" with "IA2".

    • "P450IA1": single string
    • "P450  (+)  IA2": chain

  4. CYP1A1/2   =   "CYP1A1 or CYP1A2"
    Like the previous example but with a slash instead of "or".

    • "CYP1A1": tag as a string
    • "CYP1A  (+)  2": tag as a chain

  5. H-, K-, and N-ras
    Three conjoined elements, but the principle is the same as #3, splitting "N-ras" into two tokens. Be sure to capture the hyphens (but only one per chain): capture "H-ras", not "Hras" or "H--ras".

    • "H-  (+)  ras": chain (or "H  (+)  -ras")
    • "K-  (+)  ras": chain (likewise)
    • "N-ras": single string

  6. organic and inorganic acids and salts
    Two shared elements, one on each end! Fortunately, this sort of thing is rare.

    • "organic  (+)  acids": chain
    • "organic  (+)  salts": chain
    • "inorganic acids": single string
    • "inorganic  (+)  salts": chain

Note: WordFreak will allow you to make a chain with some links in one sentence and some in another. It shouldn't allow this, in the same way as we don't allow (and WF doesn't allow) any single solid tag to span more than one sentence. So do NOT use the chaining tool to make chains across sentences.
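One way to picture a chain is as a single entity whose text is a list of separate pieces. The sketch below is a hypothetical data model, not WordFreak's internal format: each link is a pair of character offsets into the text, `display` renders the chain with the " (+) " separator used in this document, and `within_one_sentence` checks the rule above that a chain must not span sentences.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Chain:
    """Hypothetical model of one chained annotation: a single entity whose
    text is made of several non-adjacent pieces (character offsets)."""
    label: str                      # e.g. "CYP" or "Substance"
    links: List[Tuple[int, int]]    # (start, end) offsets, in text order

    def display(self, text: str) -> str:
        # Join the pieces with " (+) ", the separator these guidelines use.
        return " (+) ".join(text[start:end] for start, end in self.links)

    def within_one_sentence(self, sentence_ends: List[int]) -> bool:
        """True if every link lies in the same sentence.

        sentence_ends holds each sentence's end offset, in increasing order."""
        def sentence_of(offset: int) -> int:
            return next(
                (i for i, end in enumerate(sentence_ends) if offset < end),
                len(sentence_ends),
            )
        sentences = {
            sentence_of(pos) for start, end in self.links for pos in (start, end - 1)
        }
        return len(sentences) == 1
```

Because links are plain offsets, a split inside a token (as in example 3, where "P450IA1" must be divided into "P450" and "IA1") is just a matter of choosing offsets; no tokenizer is modeled here.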

Complex Chains:

Some chaining situations can be even more complex:

cytochrome P450 11 beta/18/19 -hydroxylase

Here we would have 3 chains:

  1. cytochrome P450 11 beta  (+)  -hydroxylase
  2. cytochrome P450  (+)  18  (+)  -hydroxylase
  3. cytochrome P450  (+)  19 -hydroxylase
(02/04)

Chains with embedded abbreviations:

When abbreviations are embedded in a split coordination, we do not skip the abbreviated part during chaining. The abbreviated part is included in the chain as per existing abbreviation rules.

cytochromes P450 (CYP) 1A1, 1A2, and 1B1

We could chain "1A2" and "1B1" with the left-hand part of the name, and the chaining tool would let us skip over "(CYP)". However, to stay consistent with existing practice, we treat the abbreviation as embedded in each case:

  1. cytochromes P450 (CYP) 1A1 [single tag]
  2. cytochromes P450 (CYP)  (+)  1A2 [chain]
  3. cytochromes P450 (CYP)  (+)  1B1 [chain]
(02/04)

CYTOCHROME P450

We will tag the names of cytochrome P450 enzymes as such when they occur in the text. The purpose of this annotation is to supply machine learning algorithms with enough information about the variant spellings of the enzyme, e.g., CYP450, CYP4503a, P450, etc.

A quantitative assessment of the levels of cytochromes P-450 b and P-450 c in....
                                           -----------             -------

Tag "cytochromes P-450 b" as CYP. Chain "cytochromes (+) P-450 c" and also tag it as CYP.

Isozyme/Isoform and abbreviated CYP entities:

Often "cytochrome P450" is accompanied by the word "isozyme" or "isoform". Isozymes are any of two or more chemically distinct but functionally similar enzymes. We tag the entire string as CYP:

cytochrome P450 (CYP) 4A isoforms (4A1, 4A2, 4A3, 4A8)

  1. "cytochrome P450 (CYP) 4A isoforms" is one entity, tagged as CYP. If it were just "cytochrome P450 (CYP)" we would tag two strings, "cytochrome P450" and the parenthesized "CYP" (but not the parentheses), following the standard procedure for "NAME (ABBREVIATION)". But since the parenthesized abbreviation is embedded in a longer name, we don't try to extract it. See the rule under "Name + (Abbreviation) + More Name" above.

  2. "4A1", "4A2", "4A3", and "4A8" are all separate references, each tagged as CYP. So there are five strings to tag here, all as CYP; and no chaining is required:
    • cytochrome P450 (CYP) 4A isoforms
    • 4A1
    • 4A2
    • 4A3
    • 4A8
(01/04)

Pseudo CYPs:

Often you will find entity references which have "cyp 450" embedded within them. These entities might look like CYPs but are actually complexes and hence tagged as Substance.

THESE ARE SUBSTANCES, NOT CYPs. In all of these cases, "cyp 450" (or equivalent) is part of a modifier of the last word in the expression, and the last word is the kind of thing referred to. This is standard in English. For example,

(12/03)

Not surprisingly, many of the names on the above list are synonyms. Here is a refreshingly clear [to a layman like me -- M.A.Mandel] explanation of what NADPH-cytochrome P450 reductase is, why its name includes "P450", and why it isn't a CYP450 enzyme. [2005-01-26]

Non-obvious CYP enzymes:

Many CYP450 enzymes are known by a number of names that don't include "cyp" or "450".

  1. The compounds were all shown to exhibit weak inhibitory activity against rat testes P450 17 (17,20-lyase), indicating good selectivity towards P450arom.

    There are 3 CYP tags in this sentence:

    • "P450 17" is a CYP
    • "17,20-lyase" is tagged as CYP even though it is not in the CYPxxx format. You will find it in the CYP synonyms list.
    • "P450arom" is a CYP

  2. "Cholesterol 7 alpha-hydroxylase" is a synonym for CYP 7

  3. "Cholesterol side chain cleavage" is a cytochrome P450 enzyme, also called "P450scc".

To help you keep track of these, we have a synonyms page, which includes some discussion of non-obvious synonyms and of non-obvious non-synonyms (terms that look like CYP450 names but aren't). Whenever you spot a new member of one of these categories, please post it to the mailing list. Note: It is important to verify that a non-obvious name really refers to a CYP enzyme; some of these reference tools can help.
(01/04, [2004-08-10])

Local Co-reference:

When a general expression (such as "enzyme" or "carcinogen-activating enzymes") refers to a CYP that is named in the SAME SENTENCE, tag the expression as CYP. Otherwise, tag it as Substance and put the CYP name in the comment field.

  1. We investigated the effect of the steroid hormone dehydroepiandrosterone (DHEA) on the hepatic expression and activity of carcinogen-activating enzymes, the cytochromes P450 (CYP) 1A1, 1A2 and 1B1, in Sprague-Dawley rats.

    Here "carcinogen-activating enzymes" refers to a list of CYP enzymes and is in the same sentence as the CYP enzymes. Hence it should be tagged as CYP. The enzyme names should be treated as above:

    • cytochromes P450 (CYP) 1A1 - single string
    • cytochromes P450 (CYP) (+) 1A2 - chain
    • cytochromes P450 (CYP) (+) 1B1 - chain

    On the other hand, the same phrase occurs in the title of the abstract:

    Dehydroepiandrosterone inhibits the expression of carcinogen-activating enzymes in vivo.

    Here "carcinogen-activating enzymes" is tagged as Substance, because it does not refer to any CYP enzymes in the same sentence.

  2. CYP 4A plays an important role in metabolizing drug arbitor. This enzyme also...

    Again the word "enzyme" is tagged as Substance since the CYP it is referring to (CYP 4A) is not in the same sentence. Comment: CYP 4A

    (02/04)
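The same-sentence test reduces to a two-way decision. The sketch below assumes the annotator has already identified which CYP the general expression refers to; the function name `label_general_reference` is hypothetical, and a plain substring check stands in for "is the CYP named in this sentence?".

```python
from typing import Optional, Tuple


def label_general_reference(sentence: str, cyp_name: str) -> Tuple[str, Optional[str]]:
    """Return (label, comment) for a general expression that refers to a CYP.

    Assumes cyp_name (the CYP being referred to) is already known; the
    substring test is a crude stand-in for "named in the same sentence"."""
    if cyp_name in sentence:
        return "CYP", None            # same sentence: tag as CYP, no comment
    return "Substance", cyp_name      # elsewhere: Substance, CYP name as comment
```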

SUBSTANCE

Some generalizations [2004-09-22]

This list of some kinds of terms that we tag as Substance is based on the original list that came out of the ChemFest meeting held on 2003-04-03, before annotation began in this project.

Genes vs. Proteins

Very commonly a gene and a protein are called by the same name. The oncology annotators tag both genes and proteins, distinguishing them where possible, but in CYP we tag proteins as substances and we don't tag genes at all.

  1. "the K-ras gene" - no tag because CYP doesn't tag genes
  2. "the K-ras protein" - tag the phrase, "K-ras protein", with the Substance label. You need to include the word "protein" to make it clear that this reference is to a Substance.
(09/03)

Organs, Species, Cells, etc.:

We do not tag organ names, body parts, species, micro-organisms, cell types, tissues, plasma, blood, etc., as Substance. We are mainly interested in the names of enzymes, drugs, organic molecules, and other chemicals, and we tag these as Substance.

  1. gastric proton pump  inhibitor

    "Gastric" here means of or relating to the stomach. So we do NOT tag "gastric". But we tag "proton pump" as Substance and "inhibitor" separately as a Substance. See Split vs. solid.

  2. "human cytochrome P450 enzyme"

    Again we do not tag "human". "cytochrome P450 enzyme" is tagged as CYP.

Note: We have established two exceptions to this rule. Like much else in this project, it is subject to reconsideration as new situations arise. If in doubt, ask.

(07/03)

Parts of a molecule or site name:

A part or chunk of a molecule, such as a region, a site, a radical, or a single atom, is not a "chemical", and therefore not a "Substance", unless it's out there on its own, no longer part of what it used to belong to. For example:

But in general how do we know? The use of "the" and a specifying term ("the XXX nitrogen") is a clue that the author is referring to a specific nitrogen atom in the molecular structure, not to the substance nitrogen.

This does require some domain knowledge, but these clues should help. If you're fairly sure that a reference like this is to a particular part of a molecule rather than to a substance, don't tag it. But don't break your head trying to figure out such a situation. If it looks like a substance and you can't figure out fairly quickly whether this test applies, go ahead and tag it.
(07/03)

Substance modifiers:

Often you will encounter substance names with different types of modifiers, either preceding the substance name or following it. Some of them are listed here.

  1. Origin modifiers:

    These are modifiers that provide information about the origin of a substance. We do NOT tag such modifiers.

    Different natural and synthetic organic hydroperoxides have been found to stimulate TBARS formation in human term placental mitochondria.

    There are two modified substance terms here:

    • natural organic hydroperoxides
    • synthetic organic hydroperoxides

    If we tagged them, the first would be a chain and the second a single string. But in both cases we tag only "organic hydroperoxides"; and since it occurs only once here, we tag it only once. "Organic" describes the composition of the substance ("inorganic" would qualify in the same way). But "synthetic" and "natural" tell about the origin of it; exclude them.
    (04-04-25)

  2. Process modifiers:

    These are modifiers that give information about the procedures or processes performed on the substance. We do NOT tag them.

    Immature rat Sertoli cells synthesize and secrete a protein species which has immunological similarity with chicken egg thiamin carrier protein (TCP) as assessed by immunocytochemical localization, liquid phase radioimmunoassay (RIA), immunoprecipitation of [35S]-methionine incorporated newly synthesized proteins by polyclonal antibodies (pAbs) to chicken TCP and tryptic peptide mapping of iodinated immunoprecipitated proteins.

    • iodinated immunoprecipitated proteins

      Tag "proteins" as a substance and do not tag "iodinated" and "immunoprecipitated". Iodination and immunoprecipitation are laboratory techniques and hence will not be tagged.

    • tryptic peptide

      Tag only "peptide" as a Substance. In "tryptic peptide mapping", "tryptic" modifies the whole process, [tryptic (peptide mapping)]: peptide mapping done with trypsin, where "tryptic" means "relating to trypsin or to its action". It describes the process, not the peptide.

    (10/03)

General substance names:

If the general word is not part of a more specific phrase, then:

I. If it refers in context to a specific entity or group of entities (not necessarily a small group) - Tag it.

II. If it's completely general and refers to all the things of its kind - Don't tag.

  1. We have investigated whether pyrene, a major compound of ....
    Tag "compound" as Substance. (Also tag "pyrene" itself as Substance.)

  2. The six compounds tested are ICR 191, 170, 292, 372, 191-OH, and 170-OH.
    Tag "compounds" as Substance, with the comment "ICR 191, 170, 292, 372, 191-OH and 170-OH" -- or, for brevity, "ICR 191, ... 170-OH" would be enough. (Also tag the individual compound names as Substance, which will require chaining to join the last five of them with "ICR".)

  3. This phenomenon has been reported only once previously and, despite its potential clinical importance, nitromethane does not appear in published lists of substances that interfere with the Jaffe reaction.
    Here "substances" is very general and hence should not be tagged.

(09/03)

Substance name embedded in procedure name:

Substance names may also appear as part of a procedure name or lab technique. In such cases we just go ahead and tag the Substance name.

  1. "DNA analysis" - tag "DNA" as Substance
  2. "RNA-assay methods" - tag "RNA" as Substance
  3. "enzyme-linked immunosorbent assay (ELISA)" - tag "enzyme" as Substance

(10/03)

Complexes:

On many occasions, substances exist as complexes. In such situations, do not try to tag each Substance within the complex separately. Rather tag the entire complex as one string.

  1. "orphenadrine complex": Tag the whole thing as one Substance.
  2. "orphenadrine metabolite complex": Same as above.
  3. "NADPH-Cytochrome c Reductase" is tagged together and NOT "NADPH" and "cytochrome c Reductase" separately.

Tag each of these expressions, and in general "XYZ complex", as a single substance.

(11/03)

Abbreviations:

General abbreviation rules are given under ABBREVIATIONS: GENERAL NOTES above.

Abbreviations vs. Quant Name:

Sometimes abbreviations can be confused with quant names.

The EDHF-mediated dilations initiated by ACh and histamine, as well as K(ATP) activation by cromakalim, were blocked by mepacrine, a nonselective phospholipase A2 inhibitor. Mepacrine did not alter K(Ca) activation by compound NS-1619.

Here "K(ATP)" and "K(Ca)" might be confused with "K(i)", which is an inhibition constant and hence a quant name. But "K(ATP)" and "K(Ca)" are not related to "K(i)".

Tag "K", "ATP", and "Ca" as Substance names in these expressions.
(10/03)

Abbreviations that help:

Sometimes abbreviations are helpful in making a decision as to whether a particular phrase should be tagged as a substance or not. See the discussion in the General Guidelines for Entity Annotation.

  1. flavin-containing mono-oxygenase form 3 (FMO3)

    Something like this long term could go either way if it didn't have the abbreviation after it. It might be a name, or it might be just a description. But the abbreviation clinches it: it tells us that, at least in this text, "flavin-containing mono-oxygenase form 3" is functioning as a name. So we tag it as a Substance, and we tag the abbreviation as well.

  2. The objective of the current study was to monitor the effects of cotinine and cigarette smoke (CS) on the formation of O6MeG in target tissues of mice during the acute phase of NNK treatment.

    Here "cigarette smoke" is a Substance, as validated by the abbreviation "CS". Tag them both as Substance.

  3. Characterization and hormonal modulation of immunoreactive thiamin carrier protein (TCP) in immature rat Sertoli cells in culture.

    We tag only "thiamin carrier protein" as a Substance and do not tag "immunoreactive". This is because the abstract abbreviates "thiamin carrier protein" as "TCP", and the abbreviation does not cover "immunoreactive".
    (11/03)

Although we generally do not tag organ names, body parts, species, cell types, etc., if any of these appear as part of the abbreviated form, we include it in the tagged text.

  1. human placental lactogen (HPL)

    Here we see that lactogen is the substance of interest. We usually would not tag "human" or "placental". But the abbreviated form HPL tells us that the entire string is treated together as a name. So we tag "human placental lactogen" and its abbreviation "HPL" each as Substance.

  2. human chorionic gonadotropin (hCG)

    Similarly tag "human chorionic gonadotropin" and "hCG" as Substance.

Symbols vs. Abbreviations:

When abbreviations of Substance names are reduced to symbols, we generally do not tag these symbols.

  • NNK-induced lung tumorigenesis is thought to involve O6-methylguanine (O6MeG) formation, leading to GC-->AT transitional mispairing and an activation of the K-ras proto-oncogene in the A/J mouse.

    Here "GC" refers to guanine and cytosine and "AT" refers to adenine and thymine, all of which are nucleobases (the bases of nucleic acids). But they have been reduced to symbols in this instance. Hence we do not tag them.
    (11/03)

    Also in general if we encounter a construction of the type
        <element symbol>   <hyphen>   <name of chemical process>
    then we do not tag the element symbol. For instance, in "P-oxidation" and "N-methylation", we do not tag "P" or "N" as Substance.

    Note: This does not mean that we stop tagging "DNA" and "RNA" as substance. These are abbreviations of "deoxyribonucleic acid" and "ribonucleic acid" respectively which are specific names. Hence we continue tagging them as Substance.
    (12/03)

    We will also continue to tag "K(ATP)" and "K(Ca)", meaning "ATP-sensitive potassium channel" and "calcium-sensitive potassium channel" respectively. (See above.) [2004-08-25]
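The element-symbol pattern above is regular enough to sketch as a check. This assumes the process word ends in "ation" (as in "oxidation", "methylation", or "O-deethylation"); that suffix test is an assumption for illustration, and real process names vary more. A match means the symbol is not tagged; "K(ATP)" and "DNA" do not match and keep their Substance tags.

```python
import re

# Assumed pattern: a one- or two-letter element symbol, a hyphen, and a
# process word ending in "ation". A miss means "check by hand", not "tag".
SYMBOL_PROCESS = re.compile(r"^(?P<symbol>[A-Z][a-z]?)-(?P<process>[a-z]+ation)$")


def untagged_symbol(term: str) -> bool:
    """True if term looks like <element symbol>-<process>, so the
    symbol part should not be tagged as Substance."""
    return SYMBOL_PROCESS.match(term) is not None
```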

    Substance name + (Abbreviation) + Substance continued:

    See the general rule on embedded abbreviations, "Name + (Abbreviation) + More Name", above.

    Examples:

    1. ACTH receptor (ACTH-R) mRNA
      sssssssssssssssssssssssssss
    2. clausenamide (CLA) enantiomers
      ssssssssssssssssssssssssssssss

    Substance + Process + (Abbreviation):

    When a phrase consists of a Substance name followed by a process name and then an abbreviation of the combined Substance and process, we tag only the Substance name and do not tag the abbreviation. And we never tag just part of an abbreviation.

    1. Resveratrol slightly inhibited ethoxyresorufin O-deethylation (EROD) activity in human liver microsomes.

      Tag only "ethoxyresorufin" as Substance since we are not tagging processes. Do not attempt to tag any portion of "EROD".

      Similarly:

    2. Aromatase activity (AA)
      sssssssss
    3. uroporphyrinogen oxidation (UROX)
      ssssssssssssssss

    (11/03)

    Enantiomers:

    Enantiomers can be confusing:

    4-OH-CLA was the major metabolite of (+)-3R, 4S, 5S, 6R-CLA [(+)-CLA], while 7-OH-CLA was the major one of (-)-3S, 4R, 5R, 6S-CLA [(-)-CLA].

    Enantiomers are mirror-image isomers, like a left and right glove. The really ugly notation above refers to two substances, each followed by an abbreviation; it is NOT a split reference for four substances! (The abbreviations are in square brackets rather than parentheses because they include parentheses.)

    ("4-OH-CLA" and "7-OH-CLA" are also Substances.) (02/04, [2004-08-12])

    Meaningful biochemical class:

    When a string refers to a biochemically coherent category, the entire string is tagged as a Substance.

    1. Competition between steroids and benzodiazepines for hepatic clearance enzymes may affect half lives of both drugs and hormones

      Here "clearance enzymes" is a class of enzymes whose role is to remove drugs and other compounds from the body. Hence we tag "clearance enzymes" as Substance and exclude "hepatic" as an organ specifier.

    2. Voriconazole is a triazole antifungal agent.

      Tag "triazole antifungal agent", as well as "voriconazole", as Substance.

    3. Other similar examples:
      • Class III antiarrhythmic drugs
      • antiarrhythmic agent
      • immunosuppressive drug
      • antiepileptic drugs
      • drug metabolic enzymes

      All these refer to meaningful biochemical categories and are tagged as Substance.

    (09/03)

    Split vs. Solid:

    When we have a string of two or more words, where the last word (which is generally in English the "head" of the expression, the word that can stand for all the rest of it) and the rest of the string can both be considered substances in some way, our general criterion for whether to tag them separately or together is whether the entire string refers to a (bio)chemically coherent category. If so, we tag the whole string as one; if not, we tag that last word separately.

    The "separate" words usually refer to uses or effects, like "inhibitor" or "substrate", while the "solid" words refer to classes like "oxide", "isomer", "mRNA", "-related compound".

    Some "split" expression end words

    Terms that end in one of these words describe a behavior or a use of what they refer to -- a CYP inhibitor inhibits CYP enzymes -- rather than forming a (biochemically) coherent category. These terms, therefore, should be tagged as separate strings rather than being combined with the Substance/CYP names that precede them, e.g., "CYP inhibitor". See the examples below. [2004-09-22]

    In either case, tag the string(s) according to the meaning of the entire string. If the text talks about "a CYP4A2 derivative" which is not itself a CYP enzyme, tag "derivative" as Substance, not CYP. [2004-09-23]

    1. cyp-450 inducers
      ccccccc ssssssss
    2. 14 alpha-DM inhibitors
      sssssssssss ssssssssss
    3. P-glycoprotein substrate
      ssssssssssssss sssssssss

    Some "solid" expression end words

    A term ending in one of these generally refers to a meaningful class, so the whole term is tagged as a single entity. For example, "thyroid hormone" or "hexane isomer" would both be tagged as a Substance. See the examples below. [2004-09-22]

    E.g.:
    1. anti-cyp2E1 IgG
      sssssssssssssss
    2. steroid hormone
      sssssssssssssss
    3. CYP-450B2 isozymes
      cccccccccccccccccc
    4. ACTH-R cDNA probe
      sssssssssssssssss
    (01/04)
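The split/solid decision is driven by the last word of the term. A minimal sketch, assuming hypothetical word lists built only from the examples in this section (the guidelines' own lists are longer): split heads are tagged apart from their modifiers, solid heads keep the whole term together, and any other head word is left to the annotator. It decides only where the tag boundaries fall, not whether each string is CYP or Substance, and it does not model the conditional treatment of "derivative".

```python
from typing import List, Optional

# Partial, illustrative word lists drawn from the examples above.
SPLIT_HEADS = {"inducer", "inducers", "inhibitor", "inhibitors",
               "substrate", "substrates"}
SOLID_HEADS = {"hormone", "hormones", "isomer", "isomers", "isozyme",
               "isozymes", "igg", "probe", "mrna", "oxide"}


def strings_to_tag(phrase: str) -> Optional[List[str]]:
    """Return the separate strings to tag for a multi-word term,
    or None when the head word is in neither list."""
    *modifiers, head = phrase.split()
    if not modifiers:
        return [phrase]                       # single word: nothing to split
    key = head.lower()
    if key in SPLIT_HEADS:
        return [" ".join(modifiers), head]    # behavior/use word: tag apart
    if key in SOLID_HEADS:
        return [phrase]                       # coherent class: one string
    return None                               # unknown head: annotator decides
```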

    Note on antibodies

    In general antibodies are represented in full as

        (Polyclonal/Monoclonal) {host name} anti-{antigen name} Ig{immunoglobulin type}
    

    -- for example, in

         polyclonal goat anti-human tpA IgG

    the host is "goat", the antigen is "human tpA", and the immunoglobulin type is G.

    Antibodies can be polyclonal or monoclonal; these adjectives, if present, tell us about the class of antibody. Hence we treat this as a solid expression and tag the entire expression -- in this case, "polyclonal goat anti-human tpA IgG" -- as substance. [2004-11-18]
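The antibody template can be written as a regular expression. The pattern below is an assumption that covers only the exact template shape, with the clonality and host optional; it shows which part of the string plays which role. Whatever the parse finds, the tag still covers the whole expression as a single Substance.

```python
import re
from typing import NamedTuple, Optional

# Assumed encoding of:
#   (Polyclonal/Monoclonal) {host name} anti-{antigen name} Ig{type}
ANTIBODY = re.compile(
    r"^(?:(?P<clonality>polyclonal|monoclonal)\s+)?"
    r"(?:(?P<host>\S+)\s+)?"
    r"anti-(?P<antigen>.+?)\s+"
    r"Ig(?P<type>[A-Z])$",
    re.IGNORECASE,
)


class Antibody(NamedTuple):
    clonality: Optional[str]
    host: Optional[str]
    antigen: str
    ig_type: str


def parse_antibody(text: str) -> Optional[Antibody]:
    """Split an antibody name into its template parts, or return None."""
    m = ANTIBODY.match(text.strip())
    if m is None:
        return None
    return Antibody(m["clonality"], m["host"], m["antigen"], m["type"])
```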

QUANTITY NAME

    Formulas and Abbreviations:

    Tag only formulas and abbreviations as Quantitative-name. Sometimes you will see them pluralized, possibly with an apostrophe; include the plural marker in the tagged string (see General Guidelines for Entity Annotation) [2004-10-07]:

    Do not tag phrases, such as the following, let alone longer expressions.

    Note: This rule applies only to Quantitative-name and does not apply to Quantitative-units. See Abbreviations in Quant Units. (10/03)

    Counts and Statistical measures:

    We do not tag names that give a count or a measure of the reliability of a statistical value. The most common is "p", a measure of statistical significance:

    "p" is also often given as less than a particular value:
    • was significantly (P < 0.05) more abundant
    • (n = 10 regions, p <.001)
    (All these examples are real.)

    Note that "p" can occur together with "n", as in the last quote. Neither is a measurement of quantity: "n" is a count, and "p" is a measure of the reliability of a statistical value, so both are easy to distinguish from actual quantitative measurements. We do not tag these "n"s and "p"s as quantitative entities. (10/03)

    Ratios:

    In tagging complex quantitative names, such as "V(max)/K(m)", don't tag "V(max)" and "K(m)" separately; tag the whole expression as a unit. Later in the project, either in CYP entity annotation or in Prop-banking, a type of annotation which we have not yet begun, we will annotate the relations between the name, value, and unit of a quantitative measurement. And this term refers not to "V(max)" or to "K(m)", but to the ratio between them, so that's what we call the quant-name. (02/04)

QUANTITY VALUE

    Quantities vs. Counts:

    We are only including measurements, not countable numbers. The example that prompted this distinction was "10 volunteers". You don't measure volunteers or tests or tumors (at least in text like "we studied 18 tumors"), you count them. These are NOT quantities, and we won't tag them for quant-value, -unit, or -name.

    Note that this is not a distinction between integers and real numbers. "7 days" is a measurement, like "15 min" or "4.3 sec". You can have 7.5 days, but you sure don't want to have 7.5 volunteers! (10/03)

    Cardinals vs. Ordinals:

    We don't tag ordinal numbers ("first, second, eighty-eighth") as quantitative entities, only cardinal numbers.

    We do not tag any part of the following kind of expressions:

    They would take us too far from the basic quantitative-measurement set of
        <name>   <number>   <unit>.
    (02/04)

    Equality/Inequality Signs:

    When a quantitative measurement is expressed with an inequality symbol (">", "<", ">=", or "<=", with or without a slash to represent the "or"), include the inequality sign in the quantitative-value tag.

    ThioTEPA (50 micro M) inhibited the formation rates of 8-hydroxyefavirenz and 8,14-dihydroxyefavirenz from efavirenz (10 micro M) by >/=60% (HLMs) and >70% (CYP2B6), with Ki values < 4 micro M.

    ">/=60%", ">70%", and "< 4" are all tagged as Quant Values.

    We do not tag the equal sign by itself; e.g., in "Ki = 4 microM" the quant-value is just "4".
    (10/03)
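To make the sign rule concrete, the sketch below picks out quantitative values so that an inequality sign (including the slash forms) stays with the number while a bare equal sign is left out. The regular expression is a hypothetical illustration, not a complete extractor: it would also match numbers the guidelines exclude, such as p-values and counts, and some numbers inside chemical names.

```python
import re
from typing import List

# Assumed pattern: optional ">", "<", ">=", "<=", ">/=", or "</=", optional
# space, then a number with an optional "%". A bare "=" is never captured.
# The look-arounds keep digits inside names such as "CYP2B6" from matching.
QUANT_VALUE = re.compile(r"(?:[<>](?:/?=)?\s*)?(?<![\w.])\d+(?:\.\d+)?%?(?![\w-])")


def quant_values(text: str) -> List[str]:
    """Return candidate quant-value strings, signs included."""
    return [m.group(0) for m in QUANT_VALUE.finditer(text)]
```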

    Verbal and Mixed Equality/Inequality:

    Equality and inequality can be expressed verbally as well as symbolically:

    In the first two examples the expression of inequality ("over", "less than"), coming before the number, is part of the value; in the others it is part of the relationship between the measurements being compared. Do not tag the part that is shown in brackets, but do tag all the rest of what is shown as quantitative-value.

    Sometimes words and symbols are mixed:

    (Both examples are from PMID 9224809.) [2004-08-24]

    Approximations:

    Like expressions of inequality, expressions of approximation should be included in the quantitative-value tag:

    {2004-05-13}

    Range:

    When a quantitative value/measurement is represented as a range, the entire range should be tagged as quant value:

    (09/03)

    Sometimes ranges are represented verbally rather than in mathematical symbols. In such cases, tag the verbal range as quant value.

    (02/04)

    Dimensionless Quantities:

    Percentages, "n-fold" words, and the like are quantitative values with no unit. In terms of measurement, these are dimensionless numbers. In "14 mg" or "1.8 pmol", there is a unit (milligrams or picomoles), but a percentage is a ratio of two numbers that have the same dimensionality. In "a twofold increase from 3 mg to 6 mg", the increase is

    (6 mg) / (3 mg) = 2

    -- no dimension, no unit, just a value: "2", or "twofold", or "200%". Often there will be no clear Quant-name to assign to these; don't try to force one.

    More examples:

    (09/03), {2004-05-13}

QUANTITY UNITS

    Quantity units are often expressed as a combination of several unit-terms and exponents. A number attached to the name of a unit is an exponent: "mm2" is mm², "square millimeters", and "min-1" or "min(-1)" is min⁻¹, "per minute". All terms from the first division (the first "per" or "/" or negative exponent) onward, inclusive, are part of the denominator.

    The control value of 4 nmol min-1 mg-1

    Here

    nmol min-1 mg-1

    should be tagged as Quant Unit. It is meant to be understood as

    nmol min⁻¹ mg⁻¹
    (with the numbers written as superscript exponents),

    read as

    nanomoles per minute per milligram ,

    and interpreted as

         nanomoles
     ------------------
     minute · milligram
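The denominator rule can be sketched as a small splitter. This is a hypothetical helper written for illustration, assuming whitespace-separated unit terms and negative exponents written as "-1" or "(-1)"; it drops the exponent's magnitude and records only which side each term is on. Every term from the first "per", "/", or negative exponent onward goes to the denominator, including a trailing substance or organ name.

```python
import re
from typing import List, Tuple

# Assumed notation for a negative exponent: "min-1" or "min(-1)".
NEG_EXP = re.compile(r"^(?P<base>.+?)(?:\(-\d+\)|-\d+)$")


def split_unit(unit: str) -> Tuple[List[str], List[str]]:
    """Split a Quant Unit string into (numerator terms, denominator terms)."""
    numerator: List[str] = []
    denominator: List[str] = []
    in_denominator = False  # flips once, at the first division
    for token in unit.split():
        if token == "per":
            in_denominator = True
            continue
        for i, part in enumerate(token.split("/")):
            if i > 0:
                in_denominator = True  # anything after a slash divides
            if not part:
                continue
            match = NEG_EXP.match(part)
            if match:
                in_denominator = True  # a negative exponent also divides
                part = match["base"]   # this sketch drops the exponent itself
            (denominator if in_denominator else numerator).append(part)
    return numerator, denominator
```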

    Abbreviations in Quantitative-units:

    In Quant Units, tag abbreviations as if they were spelled out in full:

    Tag both "min" and "minutes" as Quant Units.

    Similarly:

    Tag both "yrs" and "years" as Quant Units.

    Also, if a Quant-unit appears as <Quant Unit> + <Abbreviation of Quant Unit> then tag both the Quant Unit and the Abbreviation as separate Unit tags. This is similar to the Substance Name + Abbreviation rule.

    (02/04)

    Units with substance name/quantity value embedded:

    When the name of a CYP enzyme or Substance, or a Quantitative-value, appears within a Quantitative-unit string, the name or value is not tagged separately. We follow the no-tags-within-tags rule and tag the entire string as Quant Unit.

    1. B(max) of 3.14 +/- 0.26 pmol/mg protein
      "pmol/mg protein" is tagged as Quant Unit. Again, "protein" is not tagged as Substance here.

    2. pressure of 760 mmHg
      "mmHg" is tagged as Quant Unit; "Hg" is not tagged separately as Substance.

    3. 10.7 pmols/min/10 microg protein

    (10/03)

    Be sure to distinguish between a substance (Substance or CYP) that is embedded at the end of the denominator and a substance that simply comes after the units as the substance being measured:

    As a rule we do not tag names of organs or body parts, but we make an exception for Quant Unit tagging. If an organ name is embedded within the units, either inside the string or at the end of the denominator (but see next paragraph), we tag the entire unit, including the organ name, as quant unit.

    1. 82 ng/10(8) cells
      "cells" is at the end of the denominator of a quantitative unit. The number "10(8)" (i.e., 10⁸) is also in the denominator, and so is also included in the unit.

    2. 3 microM/g liver
      "liver" is at the end of the denominator string.

    3. 10 mcg insulin/ml
      "insulin" is embedded in the unit string.

    (02/04)

CHANGE NOTES

    2004-09-22: Added "Some generalizations" section under SUBSTANCES, a form of the list from the ChemFest.

    Moved "derivative" from "split" status, where it had been marked with "??", to "solid" status. This accords with both its conceptual similarity to "precursor" and "residue", expressing a chemical relationship to the substance whose name modifies the word, and annotation practice as shown by the database.

    2004-09-23: [see also 2004-09-22] Corrected split/solid status of "derivative" to match description and practice: solid if and only if the modifier is not tagged as a Substance or a CYP.

    2004-10-13: Adjusted highlighting in examples of Substance + Process + (Abbreviation), "Aromatase activity (AA)" and "uroporphyrinogen oxidation (UROX)", to clearly exclude process name.

    2005-01-26