Food and Administration Substance Registration System Standard Operating Procedure

1 Title: Substance Definition Manual Version: 5c Effective Date: June 10, 2007 Supersedes: Version 5b Approved By: SRS Chemistry Review Committee; FDA Data Standards Council's Vocabulary Standards Working Group

1. Purpose...... 1 a. Defining a Substance ...... 1 b. Modification of Text for Computer Use...... 1 2. Structure Set Up and Layout ...... 3 i. Example: ISOETHARINE ...... 4 3. Molecules Containing Chains...... 4 i. Example: IOPHENDYLATE ...... 4 4. Abbreviations ...... 4 i. Example: ACETOPHENONE...... 4 ii. Example: TRIFLOCIN...... 5 5. Functional Groups ...... 5 a. NITRO Groups and NITRATES ...... 5 i. Example: NITROBENZENE ...... 5 ii. Example: NITRATION...... 5 b. NITROGEN OXIDES ...... 6 i. Example: PYRIDINE 1-OXIDE ...... 6 c. OXIMES...... 6 i. Example: 2-PROPANONE, O-METHYLOXIME...... 6 d. AZIDES...... 6 i. Example: 4-AZIDOBENZAMINE...... 6 e. DIAZO and AZO Compounds...... 6 i. Example: 5-DIAZOURACIL ...... 7 ii. Example: D&C Red No. 33...... 7 f. CYANIDES and ISOCYANIDES ...... 7 i. Example: BENZOYL CYANIDE ...... 7 ii. Example: t-BUTYLISOCYANIDE...... 8 g. SULFOXIDES...... 8 i. Example: ESOMEPRAZOLE ...... 8 6. Mixtures and Salts ...... 8 a. Types of Salts ...... 9 i. Simple Salt Example: MAGNESIUM HYDROXIDE ...... 9 ii. Quaternary Ammonium Compound Example: TETRAETHYLAMMONIUM BROMIDE ...... 9 iii. Salt Formed by Reaction between Organic Base and an Acid Example: GLYCINE HYDROCHLORIDE...... 9 b. Types of Mixtures...... 9 i. Example of an ALL OF Mixture: AMYL NITRILE...... 9 7. Salts and Compounds with Ionic Bonds...... 10 i. Example: SODIUM ACETATE...... 10 ii. Example: POTASSIUM CHLORIDE...... 10 iii. Example: MAGNESIUM PHOSPHATE ...... 10 8. Salts Formed by the Reaction of Ammonia with Acids ...... 11 i. Example: AMMONIUM CHLORIDE ...... 11 ii. Example: AMMONIUM LACTATE ...... 11

1 This guide was formerly titled the Structure Drawing Guide. 9. Salts Formed by the Reaction of Nitrogenous Organic Bases with Acids...... 11 i. Example: TRIETHYLAMINE HYDROCHLORIDE...... 11 ii. Example: AMINOGUANIDINE SULFATE ...... 12 iii. Example: TETRAETHYLAMMONIUM CHLORIDE ...... 12 10. Inner Salts...... 12 i. Example: R-5-CHLORO-2-METHYL-3-(4-SULFOBUTYL)-BENZOTHIAZOLIUM INNER SALT 12 11. Bracket Usage for Salts and Solvates ...... 13 i. Example: 1:1 Compound to Salt Ratio...... 13 ii. Example: 1:2 Compound to Salt Ratio...... 13 iii. Example: LOSOXANTRONE HYDROCHLORIDE (DIHYDOCHLORIDE HEMIHYDRATE) ...... 13 12. Salts Containing a Complex Substance as a Counterion ...... 14 i. Example: FENETHAZINE TANNATE...... 14 13. Substance Of Unknown Or Variable Stoichiometry...... 14 i. Example: VIOMYCIN SULFATE...... 15 14. Partially Ionized Acids ...... 15 i. Example: 5-SULFOSALICYLATE, MONOSODIUM SALT...... 16 ii. Example: Citrate Ion with Single Negative...... 16 iii. Example: MONOSODIUM CITRATE ...... 17 iv. Example: MONOSODIUM GLUTAMATE ...... 17 v. Example: 1H-IMIDAZOLE-4-METHANOL, DIHYDROGEN PHOSPHATE (ESTER), DISODIUM SALT ...... 17 vi. Example: ALENDRONATE SODIUM ...... 17 15. Complex Inorganic and Organic Compounds ...... 18 a. Complex Inorganics...... 19 i. Example: TALC ...... 19 b. Complex Organics...... 19 16. Botanicals ...... 19 i. Example: FENNEL OIL...... 20 17. Foods ...... 20 i. Example ABALONE ...... 22 ii. Example BEEF...... 22 iii. Example SQUID, BOILED...... 23 18. Coordination compounds or complexes (Draft Proposed)...... 23 i. Example: PLATINATE(2-), TETRACHLORO-, DIPOTASSIUM , (SP-4-1)-...... 24 ii. Example: CISPLATIN and TRANSPLATIN ...... 24 iii. Example: DEXORMAPLATIN ...... 24 iv. Example: GADOLINATE(2-), (N,N-BIS)2-(BIS(((CARBOXY- .KAPPA.O)METHYL)AMINO-.KAPPA.N)ETHYL)GLYCINATO(5-)-.KAPPA.N,.KAPPA.O)-, DISODIUM ...... 25 v. Example: IRON SUCROSE (Proposed Not Official) ...... 25 19. Organometalic Compounds ...... 26 i. Example: 2-.BETA.-CARBOMETHOXY-3-.BETA.-(4- TRIMETHYLSTANNYLPHENYL) ...... 26 a. Nitrogen, Oxygen, and Fluorine Bonded to Metallic Centers...... 26 i. Example: TRIPHENYLTIN HYDROXIDE ...... 26 b. Other Nonmetals Bonded to Metallic Centers...... 26 i. Example: THIMEROSAL...... 26 20. Tautomerization and Resonance Form Presentation...... 27 a. Proton-Shift Tautomerization ...... 27 b. Ketones...... 27 i. Example: ACETOPHENONE...... 27 c. Carbonyl Compounds ...... 27 i. Example: PHENOBARBITAL ...... 27 d. Phenols ...... 27

Version 5c ii 5/13/2007 i. Example: PHENOL...... 28 e. Oximes...... 28 i. Example: BENZALDOXIME...... 28 f. Guanidines ...... 28 i. Example: 1-PHENYL-3-(4-PHENYL-2-THIAZOLYL) GUANIDINE ...... 28 ii. Example: 1-PHENYL-3-(4-PHENYL-2-THIAZOLYL)GUANIDINE ...... 29 iii. Example: TRAMAZOLINE...... 29 g. Tetrazoles ...... 29 i. Example: 5-(3-PYRIDYL)TETRAZOLE ...... 29 h. Pyrithione ...... 29 i.Thiosulfate and Thiosulfite...... 29 i. Example: SODIUM THIOSULFATE...... 30 j. Purines and Pyrimidines...... 30 i. Example: 6-THIOPURINE ...... 30 ii. Example: 2-THIOURACIL...... 30 k. Tetracyclines ...... 30 i. Example: TETRACYCLINE...... 31 l. Anhydrotetracyclines...... 31 i. Example: CHELOCARDIN HYDROCHLORIDE...... 31 m. Rubicin Antibiotics...... 31 i. Example: ANNAMYCIN...... 32 n. Resonance-Stabilized Ions...... 32 i. Example. PEMIROLAST POTASSIUM ...... 32 ii. Example: PHENOBARBITAL SODIUM...... 33 o. Other Types of Tautomerization...... 33 p. Phenolphthalein and Similar Compounds ...... 33 i. Example: PHENOLPHTHALEIN...... 33 ii. Example: PHENOLPHTHALEIN, SODIUM...... 34 21. ...... 34 a. E and Z Isomerism (Proposed Not Official) ...... 34 i. Example: NITROFURAZONE ...... 35 ii. Example: RIFAPENTINE...... 35 iii. Example CROTAMITON ...... 35 b. General Rules for Tetrahedral Stereochemistry...... 36 i. Example: 4-(R-2-(ETHYLAMINO)-1-HYDROXYETHYL)BENZENE-1,2-DIOL...... 37 ii. Example: PROPOXPHENE ...... 37 iii. Example: MEPIVACAINE ...... 37 iv. Example: SANTONIN ...... 38 v. Example: ...... 38 22. Representation of Stereochemical Classes ...... 38 a. Completely Defined ...... 39 i. Example: PREDNISOLONE...... 39 b. Racemic Substances (Optically Inactive) or Substances with Undefined or Unknown Stereochemistry ...... 39 1. A single chiral center...... 39 i. Example: ISOPROTERENOL HYDROCHLORIDE ...... 40 ii. Example: DROFENINE ...... 40 iii. Example DOXPICOMINE (-)-isomer absolute configuration undefined...... 40 2. Multiple chiral centers...... 41 i. Example: HYDROCHLORIDE ...... 41 ii. Example: BENZQUINAMIDE ...... 41 iii. Example: SARPICILLIN HYDROCHLORIDE ...... 42 iv. Example: ENISOPROST...... 42 v. Example: HEXESTROL (Meso compound) ...... 43 c. Chiral Substances (Optically Active) with Undetermined Configuration...... 44 i. Example: LEVOPHACETOPERANE HYDROCHLORIDE ...... 44

Version 5c iii 5/13/2007 d. Spiro Compounds ...... 44 i. Example: HEPTELIDIC ACID...... 44 e. Special Stereochemical Cases...... 45 i. Example: (+)-CAMPHENE...... 46 ii. Example: (+)-...... 46 iii. Example: PENTAZOCINE LACTATE ...... 46 f. Axial ...... 47 i. Example (S)-1,3-DIMETHYLALLENE...... 47 ii. Example ADENALLENE (+/-)...... 48 23. Isotopically Labeled Compounds ...... 48 a. Position of Substitution Known...... 48 i. Example: IODOANTIPYRINE I...... 48 b. Labeled compound with the position of isotopic atom is unknown ...... 48 i. Example: SODIUM, I-131, Position of Substitution Not Specified...... 49 ii. Example: C-14 CHOLESTEROL; Position of Substitution Not Specified...... 49 iii. Example: DEUTERIUM OXIDE...... 49 24. Substituted Glycerine Compounds (Guidance needs to be developed) ...... 49 i. Example: CALCIUM GLYCEROPHOSPHATE...... 49 25. Carbohydrates and Other Saccharides...... 50 i. Example: DEXTROSE...... 50 ii. Example: D-FRUCTOSE ...... 50 iii. Example LACTOSE, ANHYDROUS...... 51 26. Peptides/Proteins...... 51 a. Description Field...... 52 b. Structure Field and Complete Polypeptide/Protein Substance Definitions...... 53 i. Example: COSYNTROPIN...... 53 ii. Example: BIVALIRUDIN...... 54 iii. Example: ENFUVIRTIDE ...... 55 iv. Example: LUTEINIZING HORMONE ...... 55 v. Example: INSULIN HUMAN...... 56 vi. Example: SOMATROPIN...... 57 vii. Example: EFALIZUMAB...... 57 viii. Example: PEGINTERFERON ALFA-2A...... 59 ix. Example. INFLIXIMAB ...... 60 x. Example: TOSITUMOMAB I-131 ...... 61 c. Salts of peptides and proteins: ...... 63 i. Example: INSULIN HUMAN ZINC ...... 63 ii. Example: GONADORELIN ACETATE...... 64 27. Oligonucleotides (Proposed Not official)...... 65 i. Example: DNA sequence d(AGTC) ...... 66 ii. Example: FOMIVIRSEN SODIUM...... 66 28. Polymers (Proposed Not Official) ...... 67 a. Structure Drawing...... 67 b. Description Field ...... 67 i. Example. Low Density POLYETHYLENE...... 69 ii. Example. High Density POLYETHYLENE ...... 70 iii. Example. MICROCRYSTALLINE CELLULOSE...... 71 iv. Example. POWDERED CELLULOSE...... 72 v. Example. CELLULOSE ACETATE...... 74 vi. Example. CELLACEFATE...... 75 vii. Example. POLOXAMER 124 ...... 77 viii. Example. ACRYLONITRILE- BUTADIENE-STYRENE TERPOLYMER ...... 78 ix. Example: PHENOL FORMALDEHYDE RESINS (Condensation polymers) ...... 80 x. Example: NAPTHOL FORMALDEHYDE RESIN...... 81 xi. Example: 6,6-NYLON...... 83 xii. Example:.HEPARIN (Not Official)...... 83

Version 5c iv 5/13/2007 xiii. Example: POLACRILIN...... 86 29. Active Moieties ...... 87 i. Example: ALENDRONATE SODIUM active moiety is ALENDRONIC ACID...... 88 ii. Example: CHELOCARDIN HYDROCHLORIDE active moiety is CHELOCARDIN ...... 88

Version 5c v 5/13/2007 Version 5c vi 5/13/2007 1. Purpose

This guide is used to standardize the entry of substances into the Food and Drug Administration (FDA) Substance Registration System (SRS). The primary purpose of this guide is to prevent duplicate entries of a single substance. Conventions for drawing structures and for organizing the characteristics of substances are included. The guide also provides limited aesthetic guidelines for the structures as they are intended to be shared with other databases and may be used in professional publications. a. Defining a Substance

In the SRS, a substance is defined using two fields: Structure and Identifying Description. The preferred method to define a structure is using a two-dimensional chemical structure. Chemical structures are easily searched and stereochemistry is readily apparent to the user. Many substances cannot be described completely or at all by a single structure or sets of structures; they require other forms of representation. The Identifying Description Field is used for substances that are not easily defined using structures. Data is captured in the Identifying Description Field using a hierarchy of tags. Different classes of substances, such as polymers and peptides, have specific sets of tags used to describe substances. These tags are explained in detail under the topics in which they apply. Tags are also used to provide information that allows the classification of substances that cannot be completely described by the structure alone.2

For chemical substances, the FDA SRS differentiates substances at the molecular level. Different salts and solvates of a drug are defined as different substances. However, different tautomeric forms are considered the same substance. This guide includes conventions for standardizing the representation of tautomers in order to provide the user with a consistent view and to aid in searching. The SRS allows the registration of stereoisomers through the use of wedge bonds, the chiral flag and Identifying Description tags. Duplicate structures are permitted in the SRS only if the Identifying Description Fields differentiate the substances.

Substances can be mixtures of other substances. Mixtures that contain greater than ten chemically defined substances are considered complex substances and chemical structures are not drawn. Substances that form non-covalent interactions with other added substances are not new substances or mixtures of substances; they are defined as separate substances. Mixtures are either related compounds or compounds that are isolated together. Examples: Beta-cyclodextrin complexed with is described in the SRS system as two separate substances and is not considered a new substance (nor is it treated as a mixture). Synthetic conjugated estrogens are considered a mixture since the substances are closely related to one another.

The Description Field is used to add additional information about a substance or to describe substances for which a structure is not drawn. The Description Field is tagged with xml tags that are defined throughout the guide.3 The Description Field is part of a substance definition and the combination of Structure and Identifying Description must be unique for a given substance. If multiple sets of description tags are used to define a substance the order of the set of tags are alphabetical according to the primary tag of the set (i.e. tags precede tags). The primary tag is the initial tag that defines the type of information defined USING xml tags. The following is a list of primary tags:

BOTANICAL COMPLEX_SUBSTANCE

FOOD NUCLEIC_ACID POLYMER PROTEIN STEREOCHEMISTRY b. Modification of Text for Computer Use

2 Examples in this document include the Identifying Description field only when they contain data. 3 XML stands or Extended Markup Language and is a computer format used for internet pages.

The rules listed below are used both for Chemical Names, that are both synonyms and preferred terms and the Identifying Description Field.

a. Convert text to uppercase.

Benzene

becomes

BENZENE

b. Convert square brackets, [ and ], and braces, { and }, to parentheses, ( and ).

4H-[1,3]oxathiolo[5,4-b]pyrrole

becomes

4H-(1,3)OXATHIOLO(5,4-B)PYRROLE

c. Spell out Greek letters. Delimit the letter's name with periods. androstan-17-one, 3-hydroxy-, (3α,5α)-

becomes

ANDROSTAN-17-ONE, 3-HYDROXY-, (3.ALPHA.,5.ALPHA.)-

d. Convert an arrow, →, to the two character sequence hyphen/greater than sign, ->.

D-Glucose, O-4,6-dideoxy-4-[[(1S,4R,5S,6S)-4,5,6-trihydroxy-3- (hydroxymethyl)-2-cyclohexen-1-yl]amino]-α-D-glucopyranosyl-(1→4)-O-α- D-glucopyranosyl-(1→4)-

becomes

D-GLUCOSE, O-4,6-DIDEOXY-4-(((1S,4R,5S,6S)-4,5,6-TRIHYDROXY-3- (HYDROXYMETHYL)-2-CYCLOHEXEN-1-YL)AMINO)-.ALPHA.-D-GLUCOPYRANOSYL- (1−>4)-O-.ALPHA.-D-GLUCOPYRANOSYL-(1−>4)-

e. Do not use the stereochemical descriptors d and l (for dextro and levo). Replace d with (+), l with (-), and dl with (+/-).

Camphor, d-

becomes

CAMPHOR, (+)-

f. Represent primes using single quotes. Do not use double quotes.

' is used for a single prime. '' is used for a double prime

Version 5c 2 5/13/2007 ''' is used for a triple prime '''' is used for a quadruple prime

Technetium-99Tc, chloro[bis[(1,2-cyclohexanedione oxime oximato-κO)(1-)][[1,2-cyclohexanedione (oxime-κO)oximato](2- )]methylborato(2-)-κN,κN',κN'',κN''',κN'''',κN''''']-, (TPS- 7-1-232'4'54)-

becomes

TECHNETIUM-99TC, CHLORO(BIS((1,2-CYCLOHEXANEDIONE OXIME OXIMATO-.KAPPA.O)(1-))((1,2-CYCLOHEXANEDIONE (OXIME- .KAPPA.O)OXIMATO)(2-))METHYLBORATO(2-)- .KAPPA.N,.KAPPA.N',.KAPPA.N'',.KAPPA.N''',.KAPPA.N'''',.KAP PA.N''''')-, (TPS-7-1-232'4'54)-

g. Enter superscripts using the term "SUP" separated from the other descriptors by a space. Place "SUP" and the superscripted information in parentheses.

1,3,5,7-Tetraazatricyclo[3.3.1.13,7]decane

becomes

1,3,5,7-TETRAAZATRICYCLO(3.3.1.1(SUP 3,7))DECANE

h. Subscripts are entered using the term "SUB" separated from the other descriptors by a space. "SUB" and the subscripted information are placed in parentheses.

(15R)-15-methylprostaglandin E2

becomes

(15R)-15-METHYLPROSTAGLANDIN E(SUB 2)

i. Convert (±) to (+/-).

1,2-Benzenediol, 4-[1-hydroxy-2-[(1-methylethyl)amino]butyl]-, hydrochloride, (R*,S*)- (±)-

becomes

1,2-BENZENEDIOL, 4-(1-HYDROXY-2-((1-METHYLETHYL)AMINO)BUTYL)-, HYDROCHLORIDE, (R*,S*)-(+/-)-

j. Convert superscript on radioisotopes to normal text in front of atom symbol 131I becomes 131I. This is an exception to the superscript rule above.

2. Structure Set Up and Layout

Version 5c 3 5/13/2007 Structures are drawn using the landscape format. Atoms should not overlap when hydrogens are displayed on terminal carbon atoms and on heteroatoms. When using MDL software, terminal hydrogens and hydrogens on heteroatoms are not drawn but are displayed when the option is set to hetero-, terminal.4

i. Example: ISOETHARINE

CH3 OH O NH CH3 N CH OH 3 O OH O

Correct Hydrogens Displayed Incorrect Hydrogens Not displayed

3. Molecules Containing Chains

Each atom in a substance containing a long chain is explicitly drawn. Brackets containing repeating units are not used for substances other than polymers and simple salts.5

i. Example: IOPHENDYLATE

I I O

OCH3 O CH3 8 CH CH O 3 3

Correct Incorrect

4. Abbreviations

Abbreviations for chemical groups, such as acetyl, ethyl, isopropyl, methyl, and phenyl, are not used. Explicitly draw each non-hydrogen atom of every chemical group.

i. Example: ACETOPHENONE

4 When the structure is displayed with a 1-cm bond length and a 12 pt arial font. 5 Polymers typically do not have a defined number of repeating units but a range of repeating units. See section 27 on Polymers.

Version 5c 4 5/13/2007

O

O CH3

Ph Me Correct Incorrect

Condensed groups, such as CO2H, COOH, NO2, SO3H, and CF3, are not used. Explicitly draw each atom and bond.

ii. Example: TRIFLOCIN

O OH F CO H F H H 2 N F C N F 3 N N

Correct Incorrect

5. Functional Groups a. NITRO Groups and NITRATES

Nitro groups and nitrates are drawn with a positive charge on the nitrogen and a negative charge on one of the oxygens. They are not drawn with five bonds to the nitrogen.

i. Example: NITROBENZENE

O O + N N O O

Correct Incorrect

ii. Example: NITRATION

O O

+ N N O O O O

Correct Incorrect

Version 5c 5 5/13/2007 b. NITROGEN OXIDES

Nitrogen oxides are drawn with a positive charge on the nitrogen and a negative charge on the oxygen.

i. Example: PYRIDINE 1-OXIDE

O O + N N

Correct Incorrect

c. OXIMES

Oximes are drawn without a charge on any atom.

i. Example: 2-PROPANONE, O-METHYLOXIME

O N CH3

CH CH 3 3

d. AZIDES

The azido group is drawn with all three nitrogen atoms in a straight line. The terminal nitrogen has a negative charge and the middle nitrogen has a positive charge.

i. Example: 4-AZIDOBENZAMINE

NH2 NH2 N N + N N N N

Correct Incorrect e. DIAZO and AZO Compounds

Version 5c 6 5/13/2007

The terminal diazo group is drawn with the carbon bonded to the two nitrogen atoms in a linear configuration. The terminal nitrogen has a negative charge and the middle nitrogen has a positive charge.

i. Example: 5-DIAZOURACIL

N O N O

NH NH + N N N N O O

Correct Incorrect

Diazo or azo dyes are normally thought to exist entirely in an E or trans configuration and are drawn as such unless information is provided otherwise. A Z or cis configuration usually forces the aromatic rings out of the plane and is usually a “high energy” configuration. These dyes are also drawn as diazo and not as hydrazone tautomers.

ii. Example: D&C Red No. 33

O O O O O O O O S S S S O O O O N N NH OH N 2 NH2 O NH + + Na Na 2 2

Correct Incorrect

f. CYANIDES and ISOCYANIDES

Compounds with a cyano or an isocyanide group are drawn with the nitrogen and carbon atoms of the cyano group in line with the attachment atom.

i. Example: BENZOYL CYANIDE

Version 5c 7 5/13/2007

O O N N

Correct Incorrect

ii. Example: t-BUTYLISOCYANIDE

CH 3 CH 3CH 3 3CH + C CH N + 3 CH N C 3

Correct Incorrect

g. SULFOXIDES

The sulfoxide group is drawn in a charge-separated configuration to allow for the assignment of chirality. A single bond is drawn between the sulfur and the oxygen atoms. A negative charge is placed on the oxygen atom and a positive charge on the sulfur atom.

i. Example: ESOMEPRAZOLE

H Chiral H Chiral N O N O + S N S N CH CH 3 N 3 N O CH3 O CH3

CH O CH CH O CH 3 3 3 3

Correct Incorrect

6. Mixtures and Salts

Substances that contain multiple molecules or ions are presented as either a salt or a mixture. For a salt, all the components are stored in a single record. A mixture is represented by two or more substance records that are brought together to define a new substance. The registration system has a module design for entering mixtures. When a mixture is viewed, each component is listed in a table and each substance is displayed separately.

For the SRS, a simple salt is defined as an ionic compound of fixed, undefined, or variable stoichiometry. This includes simple salts, quaternary ammonium compounds, and salts formed by reaction between organic bases and acids.

Version 5c 8 5/13/2007 a. Types of Salts

i. Simple Salt Example: MAGNESIUM HYDROXIDE

2+ Mg OH 2

ii. Quaternary Ammonium Compound Example: TETRAETHYLAMMONIUM BROMIDE

+ 3CH N CH3 Br CH CH 3 3

iii. Salt Formed by Reaction between Organic Base and an Acid Example: GLYCINE HYDROCHLORIDE6

O

2NH ClH OH

b. Types of Mixtures

Three types of mixtures are possible: “ALL OF”, “ANY OF”, and “ONE OF”. An “ALL OF” mixture is used when two or more substances are always present. An “ANY OF” mixture is used when at least one component is present, but there is the possibility that many if not all the components are present. An 'ONE OF' mixture is used when only one of the components is present.

i. Example of an ALL OF Mixture: AMYL NITRILE

6 A salt formed by the reaction of an organic base with an acid is drawn as the organic base and the acid form of the counter ion, not as two ions (for more examples, see section 7 on Salts Formed by the Reaction of Nitrogenous Organic Bases with Acids).

Version 5c 9 5/13/2007

United States Pharmacopeia (USP) defines Amyl Nitrate as a mixture of the nitrite esters of 3- methyl-1-butanol and 2-methyl-1-butanol.

Mixture Type: “ALL OF”

CH O O 3 CH N 3 CH O O CH 3 3 N

α-Lactose β-Lactose

7. Salts and Compounds with Ionic Bonds

Ionic bonds are drawn by displaying a charge on the appropriate atoms.

i. Example: SODIUM ACETATE

O O O

+ Na (0) CH O Na CH O CH OH Na 3 3 3 Correct Incorrect Incorrect

For simple salts, place the cation on the left and the anion on the right. When the stoichiometry is 1:1, simply place the components next to each other.

ii. Example: POTASSIUM CHLORIDE

+ K Cl

Indicate other stoichiometric ratios by placing brackets around any ion that is repeated. Place the repeat count at the bottom right hand corner of the brackets.

iii. Example: MAGNESIUM PHOSPHATE

Version 5c 10 5/13/2007

O 2+ Mg O P O 3 O 2

8. Salts Formed by the Reaction of Ammonia with Acids

A salt formed by the reaction of ammonia with an acid is represented as the ammonium ion and the conjugate base of the acid.

i. Example: AMMONIUM CHLORIDE

+ NH Cl NH ClH 4 3

Correct Incorrect

ii. Example: AMMONIUM LACTATE

O O

+ NH4 OH OH

O OH NH3 CH CH 3 3

Correct Incorrect

9. Salts Formed by the Reaction of Nitrogenous Organic Bases with Acids

A salt formed by the reaction of a nitrogenous organic base with an acid is represented as the free base and the acid form of the counter ion.

i. Example: TRIETHYLAMINE HYDROCHLORIDE

Version 5c 11 5/13/2007

H + N CH CH3 H3C 3 CH N ClH 3 Cl CH CH 3 3

Correct Incorrect

ii. Example: AMINOGUANIDINE SULFATE

O H O H NH N 2NH N 2 NH OHS O NH2 OH S OH 2 + NH O NH O 2

Correct Incorrect

The nitrogen atom in a quaternary ammonium compound is drawn with a positive charge . iii. Example: TETRAETHYLAMMONIUM CHLORIDE

CH3 CH3

+ CH H CH N 3 N 3

3CH 3CH

CH Cl CH ClH 3 3

Correct Incorrect

10. Inner Salts

Inner salts are substances in which positively and negatively charged groups are contained within the same molecule.

i. Example: R-5-CHLORO-2-METHYL-3-(4-SULFOBUTYL)-BENZOTHIAZOLIUM INNER SALT

Version 5c 12 5/13/2007 Chiral O Cl OH Cl Chiral O S O O S O

+ H H3C N 3CH N

CH S CH S 3 3

Correct Incorrect

11. Bracket Usage for Salts and Solvates

Salts and solvates contain two or more molecular entities in the structure field. If the ratio of the components is 1:1, they are placed side by side. If the ratio is not 1:1, brackets are used to indicate the ratio of the components. Brackets are always labeled with whole numbers or S-text. Brackets are not labeled with fractions. For hemihydrates and similar situations use the ratio with the lowest whole numbers.

i. Example: 1:1 Compound to Salt Ratio

S NH2 Cl H N H N 2

ii. Example: 1:2 Compound to Salt Ratio

Chiral

NH2 O S O CH3 ClH 3CH S O 2 O NH 2

iii. Example: LOSOXANTRONE HYDROCHLORIDE (DIHYDOCHLORIDE HEMIHYDRATE)

Version 5c 13 5/13/2007

H N OH NN ClH OH 4 2

OH O NH OH N H 2

12. Salts Containing a Complex Substance as a Counterion

When a substance consists of a well-defined component and a complex counterion, the resulting salt is treated as a mixture.7 The mixture consists of one record for the well-defined component and one record, for the complex counter ion.

i. Example: FENETHAZINE TANNATE

Mixture Type: “ALL OF”

Fenethazine tannate is drawn as a mixture of fenethazine and a tannic acid. This represents a mixture of a mixture (tannic acid)

CH CH 3 N 3

N

S

FENETHAZINE

Tannic acid: See section 15.

13. Substance Of Unknown Or Variable Stoichiometry

7 For an example of a salt with a well-defined counterion, see section on Salts Formed by the Reaction of Nitrogenous Organic Bases with Acids.

Version 5c 14 5/13/2007

For substances with undefined stoichiometry, enclose the undefined component in brackets. Attach the word “UNDEFINED” to each bracket as S-text.8

For substances with two or more components, the main component is not placed in brackets. Only the minor component, which typically is a solvate or counter ion, is bracketed.

i. Example: VIOMYCIN SULFATE

OH Chiral H N NH OH O O NH O H O N NH HSO OH O NH 2 O NH O UNDEFINED NH NH 2 O H NH2 O H N

OH N NH2 H

14. Partially Ionized Acids

If a molecule contains multiple ionizable groups and is only partially ionized, the group that is the strongest acid is ionized first. The order of ionization for acidic functional groups is listed in the following table in order decreasing acidity:

Functional Group Acidity Table -OSO3H ,–SO3H -OSO2H -SO2H -OPO3H2 -PO3H2 -CO2H Thiophenol - (-OPO3H) - (-PO3H) Phthalimide, CO3H (peracetyl), .alpha.carbon-hydrogen to nitro group -SO2NH2 (sulfonamide)

8 The use of "S-text", which is a text associated with the structure, is used in a limited number of situations. S-text is always attached to generic brackets.

Version 5c 15 5/13/2007 -OBO2H2 -BO2H2 phenol SH (aliphatic) - (-OBO2H) - (-BO2H) Cyclopentadiene -CONH2 (amide) imidazole -OH (aliphatic ) .alpha.carbon-hydrogen to keto group .alpha.carbon-hydrogen to acetyl ester group sp Carbon hydrogen .alpha.carbon-hydrogen to sulfone group alpha.carbon-hydrogen to sulfoxide group -NH2 (amine) benzyl hydrogen sp2-Carbon hydrogen sp3-Carbon hydrogen

i. Example: 5-SULFOSALICYLATE, MONOSODIUM SALT

OH O OH O

+ + OH Na O Na

OOS OOS O OH

Correct Incorrect

Salts of partially ionized acids are drawn with either all of the same functional groups completely ionized or completely unionized. The extent of ionization is sufficient to account for all the counterions (metal ions; non-hydrogen ions) present in a substance. Hydrogen ions from the extended periodic table are added to balance the charges in the substance. The only exceptions to these rules are the case where all the acidic functional groups are equivalent prior to ionization. For these substances, ionization only occurs to the extent necessary to account for all of the counterions. This exception only applies when all of the same functional groups are equivalent. (see ALENDRONATE SODIUM and MONOSODIUM CITRATE).

When a compound contains two or more of the same functional groups that are not equivalent, two or more structures are possible when it is partially ionized.

ii. Example: Citrate Ion with Single Negative

Citric acid contains three carboxylate groups. The two end groups are equivalent; the central one is not.

Version 5c 16 5/13/2007 OOHO OOHO

HOO HOO H

O OH O O

All of the carboxylate groups in citric acid would be ionized and two hydrogen ion superatoms, H+, from the MDL extended periodic table are added to charge balance the structure.

iii. Example: MONOSODIUM CITRATE

OOHO OOHO

+ + O O Na H+ OH OH Na 2 O O O O

Correct Incorrect

iv. Example: MONOSODIUM GLUTAMATE

O O Chiral O O Chiral

O O OH O

+ Na H+ + NH2 NH Na 2 Correct Incorrect

v. Example: 1H-IMIDAZOLE-4-METHANOL, DIHYDROGEN PHOSPHATE (ESTER), DISODIUM SALT

OO OO

H P P N + N + O O Na O O Na H+ 2 2

N N

Correct Incorrect

vi. Example: ALENDRONATE SODIUM

Version 5c 17 5/13/2007 When acidic functional groups are equivalent before ionization, ionization only precedes to the extent necessary to account for all of the counterions. No H+ superatoms are added to the structure.

+ + Na Na O O O O P P OH OH NH 2NH 2 OH O OH OH H+ P P OH OH O OH2 O OH2 3 3

Correct Incorrect

15. Complex Inorganic and Organic Compounds

Tags for Complex Substances

Unpurified or partially purified animal products, plant products that are not botanicals, substances that contain more than 10 active components or chemicals that cannot be represented by a two-dimensional structure are entered using the Identifying Description field. Complex substances can also be mixtures of several well defined active components in a complex matrix of inactive components. The following tags are used to describe complex substances.

Parent tags indicating that the substance is complex and does not have a structure. The type of material that the substance is derived from. Examples: PLANT, ANIMAL, MINERAL, FUNGUS, BACTERIA The genus of a given animal, fungi, or bacteria. The species of a given plant, animal, fungi, or bacteria. Subspecies or variant from which the species is prepared. The part of a plant or animal that is used to prepare the substance or contains the substance. The part is always singular. Examples: PANCREAS, URINE, THYROID The author of the classification of a particular species. Example: LINNAEUS. The genus, species, subspecies, and author are obtained from either the United States Pharmacopeia (USP) or the USDA's Integrated Taxonomic Information System (ITIS). The molecular formula of a substance. This is used for inorganic substances. Molecular formulas are listed using the Hill Order with an underscore “_” between elements. The Hill order is C followed by H, the rest of the elements are listed in alphabetical order. All letters are capitalized. A separate formula is written for each independent molecular fragment. Multiple fragments in an entity are enclosed by parenthesis followed by the number of

Version 5c 18 5/13/2007 multiples. X following a parenthesis represents unknown or variable. Example: Quindecamine acetate which is a dihydrate would have the molecular formula C30_H38_N4.(C2_H4_O2)2.(H2_O)2. Any information about process that resulted in the substance. Examples: EXTRACT, OIL, WATER EXTRACT, ETHANOL EXTRACT, TOLUENE EXTRACT, GLYCERIN EXTRACT, METHANOL EXTRACT Information about a substance used to distinguish it from a closely related substance. Example: Details on a purification process. a. Complex Inorganics

Many inorganic substances exist as complex crystalline or amorphous structures that cannot be easily represented using MDL software. The following is an example of how tags are used for complex inorganic substances.

i. Example: TALC

MINERAL MG_SI4_O10.(OH)2.(H2_O)X

b. Complex Organics

i. Example: THYROID

ANIMAL SUS SCOFA THYROID DRIED POWDER

16. Botanicals

Botanicals are represented in the SRS solely through the use of the Identifying Description Field which includes the Latin trinomial (genus, species, and variety), the author (e.g. Linnaeus, abbreviated L.), the plant part, and the process (e.g. 'extract'). If only the genus is known, the designation 'SPP' is inserted in the species tags to show general applicability to all species under that genus. If the variety, plant part, or process is unknown or not applicable, then they may be omitted. Latin binomials, trinomials, and authors are validated against accepted entries in the United States Pharmacopeia (USP) or the United States

Version 5c 19 5/13/2007 Department of Agriculture's (USDA) Integrated Taxonomic Information System (ITIS) available at http://www.itis.usda.gov/. All plant parts are validated against listed terms in the Plant Part monograph in the Center of Drug Evaluation and Research (CDER) Data Standards Manual available at http://www.fda.gov/cder/dsm/drg/drg00924.htm. Particular extract types (e.g., 'water', 'ethanol') are not permitted for botanicals that originate from within FDA, but shall be permitted for botanicals that originate from a non-FDA source (e.g., International Conference on Harmonisation of Technical Requirements for Registration of Pharmaceuticals for Human Use (ICH), European Medicines Agency (EMEA). When the extract type must be represented, it is inserted prior to the word 'extract' (e.g., 'water extract'). Only the following extract solvents are allowed in SRS botanical nomenclature: ethanol, ether, glycerin, toluene, water, ammonia, methanol, and acetone. All plant-derived complex substances would be considered a botanical. Any characterized substances that are present, or are possibly present, in a botanical are purposely excluded from SRS botanical definition.

Parent tags indicate the substance is a botanical. The genus of a given plant. The species of a given plant. Subspecies or variant from which the species is prepared. The part of a plant that is used to prepare the substance or contains the substance. The part is always singular. All plant parts are validated against listed terms in the Plant Part monograph in the CDER Data Standards Manual available at http://www.fda.gov/cder/dsm/drg/drg00924.htm. The author of the classification of a particular species. LINNAEUS. The genus. species, subspecies and author are derived from either the United States Pharmacopeia (USP) or the USDA's Integrated Taxonomic Information System (ITIS). The full name of the author is spelled out if available due to inconsistencies across sources. Any information about process that resulted in the substance. EXTRACT, OIL, WATER EXTRACT, ETHANOL EXTRACT, TOLUENE EXTRACT, GLYCERIN EXTRACT, METHANOL EXTRACT Information about a substance used to distinguish it from a closely related substance.

i. Example: FENNEL OIL

FOENICULUM VULGARE MILLER. FRUIT OIL

17. Foods

Food indexing is based on the LanguaL international thesaurus for the description of foods. Information on LanguaL may be found at http://www.langual.org

Authoritative sources for information on foods are: Herbst, Sharon Tyler. FOOD LOVER'S COMPANION: COMPREHENSIVE DEFINITIONS OF OVER 6000 FOOD, DRINK, AND CULINARY TERMS .Third Edition. Hauppauge, NY: Barron's Educational Series, c.2001.

Version 5c 20 5/13/2007 US Department of Agriculture Food Safety Inspection Service (FSIS) FACT SHEETS http://www.fsis.usda.gov

Every food should be indexed with as much detail as is given in the food name. This might include extent of heat treatment, cooking method, or preservation method. Every food substance/ingredient must have a unique string of LanguaL codes in order to be assigned a UNII code.

The Substance Registration System indexes only food substances/ingredients, not food products (also known as "finished foods"). Examples of food substances/ingredients are "milk", "egg", "pineapple", or "cherry". Examples of food products are "eggnog" or "fruitcake". Food substances/ingredients are the building blocks for food products. Thus, "peanut", "salt", and "sugar" would be indexed in the SRS but "peanut butter" would not.

It is possible that a food may already be in the dictionary as a complex substance or botanical; an example would be some vegetable oils or some seeds or spices. For foods originating from plants it may also be worthwhile to check the description field for all substances of the same genus to determine if the food has already been entered as a complex substance. To obtain the genus of a plant, the United States Department of Agriculture's (USDA) Integrated Taxonomic Information System (ITIS), available at http://www.itis.usda.gov/, should be used. It can be searched by common name. Care should be taken to ensure that the complex substance or botanical refers to the same part of the plant that the food is derived. If the food is already entered as a complex substance or botanical the food tags should be added to the existing entry.

The following tags are used to define foods:

First tag in description field indicates that the description contains a food. Indicates a common name that Langual has assigned to a particular food. Indicates a Langual facet of the food name at the FPT, or product type, level. Example: SEAFOOD OR SEAFOOD PRODUCT Facet term used to describe a given food. Code associated with the given Langual facet term. Indicates a Langual facet of the food name at the FSO, or food source, level. Example: ABALONE Indicates a Langual facet of the food name at the FPP, or part of plant or animal, level. Example: MEAT PART Indicates a Langual facet of the food name at the FPN, or extent of heat treatment, level. Indicates a Langual facet of the food name at the FKM, or cooking method, level.

Indicates a Langual facet of the food name at the FTR, or treatment applied, level. Example: IRRADIATION.

Indicates a Langual facet of the food name at the FPV or preservation level. Example: DRIED.

Version 5c 21 5/13/2007

ABALONE SEAFOOD OR SEAFOOD PRODUCT A0267 ABALONE B1408 MEAT PART C0103 ii. Example BEEF BEEF MEAT OR MEAT PRODUCT (FROM MAMMAL) A0150 CATTLE B1161 SKELETAL MEAT PART C0175

Version 5c 22 5/13/2007

iii. Example SQUID, BOILED SQUID, BOILED SEAFOOD OR SEAFOOD PRODUCT A0267 SQUID B1205 MEAT PART C0103 FULLY HEAT-TREATED F0014 BOILED G0014

18. Coordination compounds or complexes (Draft Proposed)

Version 5c 23 5/13/2007 Coordination compounds are defined using the structure field and S-Text which is associated with the bracket enclosing the coordination compound. In cases where it is customary to use a coordination or dative bond, a single bond is drawn and the valence of the coordinating atom is adjusted as necessary.9 The coordination compound is placed in brackets and the brackets labeled using S-Text with a polyhedral . Polyhedral descriptors are described in the Chemical Abstracts Service (CAS) Index guide.

i. E xample: PLATINATE(2-), TETRACHLORO-, DIPOTASSIUM , (SP-4-1)-

SP-4-1 Cl

+ 2+ K Cl Pt Cl 2 Cl

The brackets and polyhedral descriptor are required to differentiate two isomers that differ only in their configuration around the central atom. The placement of ligands around the central atom is for aesthetic purposes only—MDL software does not provide a mechanism for indicating stereochemistry around the central atom.10 The polyhedral descriptor is added using S-Text and is placed next to the upper right hand corner of the brackets. The S-text field FDAREG_SGROUP is used to store the bracket information .

ii. Example: CISPLATIN and TRANSPLATIN

SP-4-2 (IV) SP-4-1 (IV) NH NH3 3 2+ 2+ (IV) Cl Pt Cl Cl Pt NH3 (IV) Cl NH3

Cis Trans

Without the brackets and coordination descriptors, the MDL software would treat these compounds as duplicates.

Wedge bonds are only used for tetrahedral stereochemistry. Tetrahedral stereochemistry on ligands is treated the same as in non-coordination compounds. The Chiral flag only applies to tetrahedral .

iii. Example: DEXORMAPLATIN

9 The MDL software does not include a coordination bond (dative bond). 10 MDL’s software only handles stereochemistry for the following central atoms: C, N, O, S (sulfate and sulfoxide), P, Si, and pentavalent N and P (3 or 4 attachments with one double bond). Only tetrahedral stereochemistry can be represented.

Version 5c 24 5/13/2007

Chiral OC-6-22 Cl H2 (IV) N Cl

4+ Pt

(IV) N Cl H2 Cl

When the arrangement of atoms and/or the chirality around the central atom is not specified in the source, no polyhedral descriptor is used and no brackets are drawn around the coordination compound.

iv. Example: GADOLINATE(2-), (N,N-BIS)2-(BIS(((CARBOXY-.KAPPA.O)METHYL)AMINO- .KAPPA.N)ETHYL)GLYCINATO(5-)-.KAPPA.N,.KAPPA.O)-, DISODIUM

O

(IV) O O O N

3+ + O O Gd O O Na 2 (IV) (IV) N O N

O

v. Example: IRON SUCROSE (Proposed Not Official)

Both the structure field and Complex Substance tags will be used to describe this colloidal substance.

OH OH OH O O + 3+ 2- OH O Na Fe O OH OH2 2 5 8 3 X OH OH OH OH Y

MINERAL< SPECIES>< FORMULA>(NA2_FE5_O8_(OH).(H2_O)3)X.(C12_H22_O11)YPOLYNUCLEAR IRON (III) HYDROXIDE/SUCROSE COMPLEX; MW: 34000 TO 60000

Version 5c 25 5/13/2007 19. Organometalic Compounds

Carbon to metal sigma bonds are drawn as covalent bonds.

i. Example: 2-.BETA.-CARBOMETHOXY-3-.BETA.-(4- TRIM ETHYLSTANNYLPHENYL)TROPANE

O CH3 Chiral O

CH3

N Sn CH3

CH3 CH 3 a. Nitr ogen, Oxygen, and Fluorine Bonded to Metallic Centers

Nitrogen-metal, oxygen-metal, and fluorine-metal bonds are always treated as ionic bonds.

i. Example: TRIPHENYLTIN HYDROXIDE

+ Sn

OH b. Othe r Nonmetals Bonded to Metallic Centers

The type of bond between metal and nonmetallic elements (excluding nitrogen, oxygen, and fluorine) is determined by the elements position in the periodic table. Transition metal-nonmetal (except mercury) bonds and aluminum-nonmetal bonds are drawn as ionic bonds. The following metals are drawn with covalent bonds to nonmetals: mercury, gallium, germanium, indium, tin, antimony, thallium, lead, bismuth, and polonium.

i. Example: THIMEROSAL

O + O Na + O Na O Hg CH S 3 CH3 S 2+ 2CH Hg Correct Incorrect

Version 5c 26 5/13/2007

20. Tautomerization and Resonance Form Presentation

In most instances, the SRS does not allow the registration of different tautomeric or resonance forms of the same compound. To maintain a more consistent set of structures, use the following conventions.11 a. Proto n-Shift Tautomerization

The following categories include tautomers that interconvert by proton transfer. b. Ke tones

The keto-form is drawn when there is potential keto-enol tautomerism.

i. Example: ACETOPHENONE

O OH

CH3 CH2

Correct Incorrect

c. Carbonyl Compounds

In general, carbonyl compounds that tautomerize are drawn in the form containing the carbonyl group, not th e form containing the hydroxyl group.

i. Example: PHENOBARBITAL

CH CH O 3 O 3 CH O 3 NH N N O N O OH N O H H OH N OH

Correct Incorrect Incorrect

d. Ph enols

Phenols are drawn in the hydroxyl form, not the keto form.

11 When the structure of a compound prevents the preferred tautomer from being formed, the compound is drawn in the alternate form.

Version 5c 27 5/13/2007 i. Example: PHENOL

OH O O

Correct Incorrect Incorrect e. Oximes

Oximes are drawn in the hydroxyl form, not the nitroso form.

i. Example: BENZALDOXIME

OH O N N

Correct Incorrect

f. Guan idines

In acyclic guanidines, the double bond is placed between the central carbon and the least substituted nitrogen.

i. Example: 1-PHENYL-3-(4-PHENYL-2-THIAZOLYL) GUANIDINE

H H N N N

NH S

If all nitrogens have substituents, draw the double bond to the nitrogen with the lowest priority as determined using Cahn-Ingold-Prelog sequence rules.12

12 Cahn-Ingold-Prelog sequence rules are outlined in the following papers: R.S. Cahn, C.K. Ingold and V. Prelog, Angew. Chem. 78, 413–447 (1966) , Angew. Chem. Internat. Ed. Eng. 5, 385–415, 511 (1966); and V. Prelog and G. Helmchen , Angew. Chem. 94, 614–631 (1982) , Angew. Chem. Internat. Ed. Eng. 21, 567–583 (1982). 1996 , 68, 2203

Version 5c 28 5/13/2007 ii. Example: 1-PHENYL-3-(4-PHENYL-2-THIAZOLYL)GUANIDINE

CH3 N

CH3 3CH N N H H

If a guanidine is incorporated into a ring, the double bond is placed inside the ring.

iii. Example: TRAMAZOLINE

N H N

N H

g. Tetrazoles

In tetrazoles, the hydrogen is placed at the one position on the tetrazole ring.

i. Example: 5-(3-PYRIDYL)TETRAZOLE

N N N N N N N N NH H N

Correct Incorrect

h. Pyrithione

Pyrithione and pyrithione salts are drawn in the thiol-nitroxide form, not the thione-hydroxylamine tautomer.

O OH + N SH N S

Correct Incorrect

i.Thiosulfate and Thiosulfite

Thiosulfate and thiosulfite are represented with a sulfur-sulfur single bond and a hydrogen (or negative charge) on the outer sulfur.

Version 5c 29 5/13/2007 i. Example: SODIUM THIOSULFATE

O O + + Na O S S Na O S S 2 2 O O Correct Incorrect j. Purines and Pyrimidines

All purines and pyrimidines are drawn as Watson-Crick like forms with the keto (thione)-tautomer form assumed and a hydrogen atom at the one position in pyrimidines and at the nine position in purines.

i. Example: 6-THIOPURINE

S SH S H N N N NH N NH

N N N N H N N H

Correct Incorrect Incorrect

ii. Example: 2-THIOURACIL

O O OH OH

NH NH N N

N SH N S N SH N S H H

Correct Incorrect Incorrect Incorrect

k. Tetracyclines

Tetracyclines can tautomerize into numerous isomers. In the SRS, tetracylcines are drawn from the following sub-structure:

Version 5c 30 5/13/2007 OH O OH O O OH

NH2

OH HH N CH CH 3 3

i. Example: TETRACYCLINE

OH O OH O O Chiral OH O O O O Chiral OH OH

NH2 NH2

OH OH H H H H OH CH3 N OH CH3 N CH CH CH CH 3 3 3 3

Correct Incorrect

l. Anhydrotetracyclines

5a-6-Anhydrotetracyclines are drawn from the following sub-structure:

OH OH O O

i. Example: CHELOCARDIN HYDROCHLORIDE

OHOH O OO Chiral OHOH O O OH Chiral OH OH 3CH 3CH CH3 ClH CH3 ClH

OH O H H CH NH CH NH 3 2 3 2

Correct Incorrect

m. Rubicin Antibiotics

Version 5c 31 5/13/2007

Antibiotics in the rubicin family are drawn from the following sub-structure:

O

OOH

i. Example: ANNAMYCIN

O OH O OH Chiral OH O O OH Chiral

OH OH

H H O OH O I OH O O I

O OH O OH

CH OH CH OH 3 3

Correct Incorrect

n. Resonance-Stabilized Ions

Tautomers are distinct isomers that interconvert, typically by the migration of a hydrogen atom. The same ion is produced regardless of which tautomer losses a proton. The ion produced has a resonance form that corresponds to each tautomer. Draw the resonance form that corresponds to tautomer that is drawn using the conventions listed above.

i. Example. PEMIROLAST POTASSIUM

Version 5c 32 5/13/2007

When PEMIROLAST POTASSIUM loses a proton to form an anion, the anion has the following two resonance forms:

O O N N NN + K + N N N N K N N

N N

CH CH 3 . 3 Correct Incorrect

The structure on the left is the correct anion to enter as it corresponds to the following PEMIROLAST tautomer:

O H N N

N N N

N

CH 3

The negative charge in the anion and the hydrogen in the free base are both placed at the one position on the tetrazole ring.

When chirality is created in a symmetrical molecule by the loss of a proton, the stereochemistry is not described in either the structure or description field.

ii. Example: PHENOBARBITAL SODIUM

CH O 3 CH O 3

+ Na N + Na NH

O N O H O N O

Correct Incorrect o. Other Types of Tautomerization

Some types of tautomerization involve more than just a proton shift. The SRS software does not recognize these tautomers as the same substance.13 p. Phenolphthalein and Similar Compounds

The cyclic ester is drawn for the neutral compounds. The carboxylic acid salt is drawn for the anion.

i. Example: PHENOLPHTHALEIN

13 The SRS registrars are responsible for ensuring that only one tautomer is entered into the system.

Version 5c 33 5/13/2007 O O OH O OH OH

OH O

Correct Incorrect

ii. Example: PHENOLPHTHALEIN, SODIUM

O + Na O + O Na O OH O

O OH

Correct Incorrect

21. Stereochemistry

MDL software has the capability to represent stereochemistry around a double bond and around certain tetrahedral stereocenters. a. E and Z Isomerism (Proposed Not Official)

Molecules that have a double bond are drawn with the substituents arranged according to the double bond’s stereochemistry. Substances that are known to contain both E and Z isomers are handled as “ALL OF” mixtures.

CH CH 3 3 CH 3CH 3

3CH 3CH

Cis (Z) Trans (E)

Version 5c 34 5/13/2007 If the configuration of the double bond is unspecified or unknown in the source document, it is represented by the MDL double either bond..

i. Example: NITROFURAZONE

O

O N NH2 + N N O H O

Identifying Description

ii. Example: RIFAPENTINE

CH CH Chiral 3CH CH3 3 3 O O OH OH O 3CH OH CH OH 3 3CH NH

H3C O O N N O OH O N CH3

Identifying Description

If a substance is known to be a mixture of Z and E isomers both isomers will be drawn as individual substances and an “ALL OF” mixture will be created to represent the mixture.

iii. Example CROTAMITON

Crotamiton is an “ALL OF” mixture of CIS and TRANS isomers. Both isomers are drawn as separate substances.

Version 5c 35 5/13/2007 Mixture type: “All Of”

CH3 O

N

CH3 CH 3

O

3CH N

CH3 CH 3

b. General Rules for Tetrahedral Stereochemistry

Chiral atoms are indicated by the presence of a solid or dashed wedge bond.14 The wider end of the bond is always drawn away from the chiral center. A solid wedge implies that the group attached to the chiral center is projected towards the viewer coming out of the plane formed by the chiral center. A dashed wedge implies the group is projected behind the plane, away from the viewer.

A hydrogen atom is usually not displayed on a chiral center.15 Their presence is inferred. The MDL software locates this "inferred" hydrogen in the larger angle formed by the central atom and the two substituents that are in the plane of the paper.16

Larger Angle H B A C B A C Smaller Angle

If the substituent with the wedge bond is also in the larger angle, the hydrogen is given the opposite orientation. D D H

B B A C A C

14 Racemates with only one chiral center do not contain wedge bonds at the chiral center. 15 In the examples below, hydrogen atoms are included for clarity. 16 See Appendix A, Chemical Representation in the MDL ISIS™/Base Database Maintenance Manual.

Version 5c 36 5/13/2007

If the substituent with the wedge bond is in the smaller angle, the hydrogen is given the same orientation.

H H

B B = B A C A C A C D D D

The wedge bond is placed in the larger angle whenever possible.

i. Example: 4-(R-2-(ETHYLAMINO)-1-HYDROXYETHYL)BENZENE-1,2-DIOL

OH Chiral H H Chiral OH N OH N CH3 CH3 OH OH OH

Correct Incorrect

If a chiral center has nonhydrogen atoms attached, a solid wedge is drawn to one atom and dashed wedge bond to another atom; the two other atoms are attached to the center using a normal bond

ii. Example: PROPOXPHENE

Chiral

O 3CH O

3CH N CH CH 3 3

When a chiral center is part of a ring, draw the wedge bond(s) to the group(s) outside the ring.

iii. Example: MEPIVACAINE

Version 5c 37 5/13/2007

Chiral Chiral

CH3 CH3 O CH3 O CH3

N N N N H H

CH3 CH3

Correct Incorrect

When a chiral center is located at the junction of two rings, an explicit hydrogen is added to indicate the configuration.

iv. Example: SANTONIN

Chiral Chiral

CH CH H 3 3

CH O 3 3 CH O O O H

CH3 CH3

O O

Correct Incorrect

A wedge bond should not be drawn to an adjacent chiral center. To prevent this from occurring, add an explicit hydrogen or reposition the wedge on a different bond.

v. Example: METHYLPHENIDATE

OO Chiral OO Chiral CH CH H 3 H 3 N N

H

Correct Incorrect

22. Representation of Stereochemical Classes

Substances are classified as follows: (1) , (2) racemates, (3) substances with no or partial information about stereochemical orientation and (4) substances which are chiral but the absolute stereochemistry is not known. Substances can have single or multiple stereogenic centers. The following substances are drawn as single substances: (1) racemates, (2) substances which contain no or partial

Version 5c 38 5/13/2007 information on stereochemical orientation and (3) chiral substances where the absolute stereochemistry is not known. Substances that are known to contain several diastereoisomers are drawn as mixtures.

The following tags are used in the Description Field to describe racemic substances and those substances that have unknown or incompletely described stereochemistry. The Substance Registration System (SRS) defines a racemic substance as any substance that contains significant amounts of both enantiomers. A significant amount is 5% of one enantiomer. Substances that contain greater than 95% of a single enantiomer are considered to be a single chiral substance. The Description Field and tags refer to all of the undefined stereochemistry present in a structure or molecular fragment, not to the individual chiral atoms or centers within the structure.

Tags that surround information about stereochemistry. : Indicates information about the optical activity in the substance. An empty field indicates that the optical activity is unknown; (+) indicates that the substance is dextrorotary; (-) indicates that the substance is levorotary; and (+/-) indicates that the substance does not rotate polarized light (used for racemates and meso compounds). Indicates the type of stereochemistry that is not completely defined by the structure. Values include UNKNOWN; RACEMIC; RELATIVE; MESO; AXIAL, UNKNOWN; AXIAL, RACEMIC; AXIAL, R; AXIAL, S. Examples below illustrate the use of each of these terms. Any comments that could help distinguish the isomers. a. Completely Defined Enantiomer

For a chiral molecule in which the configuration of all stereocenters is known, the chiral flag is attached to the structure. The Identifying Description Field does not contain any stereochemistry tags.

i. Example: PREDNISOLONE

OHChiral O

CH3 OH OH

CH3 H

H H

O

Identifying Description is left blank.

b. Racemic Substances (Optically Inactive) or Substances with Undefined or Unknown Stereochemistry

1. A single chiral center

Substances that contain a single chiral atom that is unspecified or known to be racemic are drawn using a regular (non-wedge) single bond . The Identifying Description Field distinguishes the two cases.

Version 5c 39 5/13/2007 A single structure is drawn for a racemate with one chiral center. Wedge bonds are not used. The chiral center is linked to its neighbors by single bonds.17 The tags contain the value “(+/-)” and the tags contain the value RACEMIC.

i. Example: ISOPROTERENOL HYDROCHLORIDE

OH

CH3

OH N CH H 3 OH ClH

Identifying Description (+/- )RACEMIC

A single structure is drawn for substances that contain a single chiral center where the stereochemistry is unknown. The tags are left blank. The tags contain the value “UNKNOWN”. Normal single bonds are drawn at the stereogenic atom.

ii. Example: DROFENINE

O N CH3 O CH 3

Identifying Description UNKNOWN

A single structure is drawn containing normal single bonds for substances that are known to be chiral but where the absolute stereochemistry is not defined. The tag contains the direction in which plane polarized light is rotated, i.e., either “(+)” or “(-)”. The tag contains the value “UNKNOWN” .

iii. Example DOXPICOMINE (-)-isomer absolute configuration undefined.

17 In cases where phosphorus is the chiral atom, a double bond may also be present.

Version 5c 40 5/13/2007 N O

O

N CH CH 3 3

Identifying Description OPT_ACT>(- )UNKNOWN

2. Multiple chiral centers

The same tags used to describe substances with a single chiral center are used to describe substances with multiple chiral centers. The tags refer to the entire structure not individual chiral centers. A single structure is drawn for a racemate that contains more than one chiral center and in which the relative configuration of all the centers is known. Wedge bonds are drawn at each . A chiral flag is not used in the structure diagram. The tags contain the value “(+/-)” and the tags contain the value “RACEMIC”.18

i. Example: PENTAZOCINE HYDROCHLORIDE

CH3

N CH3

CH3 ClH

CH3 OH

Identifying Description (+/- )RACEMIC

Where there is no information on relative stereochemistry (the synthesis or purification strategy used to prepare the substance is not known) a single structure is drawn that uses normal single bonds at each chiral center and tags in the Identifying Description Field indicate the lack of knowledge. The tags do not have a value and the tags contain the value “UNKNOWN”

ii. Example: BENZQUINAMIDE

2-(acetyloxy)-N,N-diethyl-1,3,4,6,7,11b-hexahydro-9,10-dimethoxy-2H-benzo[a]quinolizine-3- carboxamide

18 In the literature, a chemical structure drawn in this format may represent a single enantiomer. In the SRS, this structure represents a racemate or a chiral compound in which the absolute configuration has not been determined.

Version 5c 41 5/13/2007 USAN does not indicate the configuration at any of the chiral centers.

O

3CH O O

N CH3 O 3CH N CH3 CH 3 O

Identifying Description UNKNOWN

Substances in which the stereochemistry is partially defined consists of a single record in which all known stereochemistry is drawn and normal bonds are used at chiral centers in which the absolute configuration is unknown. The chiral flag is used to indicate that the stereochemistry defined in the drawing is absolute. tags do not have a value and the tags contain the value “UNKNOWN”.

iii. Example: SARPICILLIN HYDROCHLORIDE

Methoxymethyl (2S,5R,6R)-6-(2,2-dimethyl-5-oxo-4-phenyl-1-imidazolidinyl)-3,3-dimethyl-7- oxo-4-thia-1-azabicyclo(3.2.0)heptane-2-carboxylate hydrochloride

The Chemical Abstract name provides absolute stereochemistry for all the chiral centers except the 4 position on the imidazole ring. The name does not indicate if this is a single enantiomer or a mixture of . All the known chirality is drawn and the chiral flag is used. The tags refer to the chiral atoms in the substance in which the stereochemistry is not defined.

Chiral

O H NH S N CH3 ClH CH N 3 CH CH 3 O 3 CH3 O O O

Identifying Description UNKNOWN

Both stereoisomers are drawn as an “ALL OF” mixture for substances known to contain two different diastereoisomers or contain racemic mixtures of two diastereoisomers.

iv. Example: ENISOPROST

Version 5c 42 5/13/2007

11,16-dihydroxy-16-methyl-9-oxo-, Prosta-4,13-dien-1-oic acid methyl ester, (4Z,11.alpha.,13E)- (+/-)

USAN indicates the relative stereochemistry between all the chiral centers on the cyclopentane ring. The substance contains two diastereomeric racemates and is represented by an “ALL OF” mixture.

O CH O 3

O

3CH OH

CH3 HO

Identifying Description (+/- )RACEMIC

O CH O 3

O

3CH OH

CH3 OH

Identifying Description (+/- )RACEMIC

v. Example: HEXESTROL (Meso compound)

MESO-3,4-BIS(P-HYDROXYPHENYL)-N-HEXANE

3CH OH OH CH 3

Identifying Description (+/- )MESO

Version 5c 43 5/13/2007

c. Chiral Substances (Optically Active) with Undetermined Configuration

This section covers the following: (1) substances that are optically active and that have not been completely characterized and (2) optically active mixtures of diastereomers. These substances with undetermined composition are divided into two groups: (1)substances with more than one chiral center in which the relative configuration of all the centers is known but the absolute configuration is not and (2) substances that contain more than one chiral center in which the configuration of one or more chiral centers is unknown.

i. Example: LEVOPHACETOPERANE HYDROCHLORIDE

The Chemical Abstract name indicates that this compound is chiral and that the absolute stereochemistry is either R,R or S,S. The substance is drawn as an R,R isomer. The tags indicate RELATIVE

O ClH

3 CH O H H N

Identifying Description (- )RELATIVE

d. Spiro Compounds

Molecules containing two rings joined at a single carbon are called spiro compounds. The atom at the ring junction is connected to four other atoms and may be chiral. When it is chiral, the wedge bonds are placed in a ring. The solid and dashed wedge bonds are placed in the smaller of the two rings whenever possible.

i. Example: HEPTELIDIC ACID

Version 5c 44 5/13/2007

O Chiral O H O

O H

CH CH OH 3 3

e. Special Stereochemical Cases

Due to the algorithm that MDL uses to determine the absolute configuration of a chiral center, caution must be taken when adding wedge bonds19. Two situations in which error can arise are 3-membered rings and bridgehead chiral centers.

The MDL software interprets the following epoxides as enantiomers.

O O 3CH

CH 3

To avoid this problem, the wedge bond is drawn to a group outside the 3-membered ring.

CH3 CH3

O O

3CH 3CH

O O CH3 CH3 CH 3CH 3 CH3 CH3 O O O O CH 3 CH3 O O Correct Incorrect

Molecules with ring systems that contain a bridge are drawn "flattened," not in a 3D perspective. (See Section 21b)

19 See section 21b on the General Rules for Tetrahedral Stereochemistry.

Version 5c 45 5/13/2007

i. Example: (+)-CAMPHENE

CH3 Chiral CH3

CH3 CH3

CH CH 2 2

Correct Incorrect

When there is stereochemistry at the bridgehead, the wedge bond is drawn within a ring, not to an exocyclic atom or group.

ii. Example: (+)-CAMPHOR

CH3 Chiral CH3 Chiral

O O CH3 Chiral CH3 CH3 O CH3 CH3 CH3

CH3 H H

Correct Incorrect Incorrect: Opposite Enantiomer

If two molecular entities in a single structure field contain chiral centers that are not completely defined by the structure field two sets of stereochemistry tags are necessary. The first set of stereochemistry tags refer to the higher molecular weight species. The comments contain the name of the fragment.

iii. Example: PENTAZOCINE LACTATE

Version 5c 46 5/13/2007

The structure field contains the following chemical drawing.

OH OH H3C CH3 O

N CH3

CH3

CH3 HO

The identifying description tags contain the following tags.

(+/- )RACEMICPENTAZOCINE< /STEREOCHEMISTRY>UNKN OWN LACTATE

f.

MDL database software is unable to recognize chirality resulting from restricted rotation about sp2 hybiridized carbons. As a result, compounds like (S)-1,3-Dimethylallene are not recognized as chiral.20

H

H

Description tags are used to distinguish the stereoisomers. Consult the CAS Index guide for the methodology used to determine R or S chirality.

i. Example (S)-1,3-DIMETHYLALLENE

20 If presented in a journal, this is a valid structure. However, the SRS software is not able to interpret the chirality.

Version 5c 47 5/13/2007

(-)AXIAL, S

ii. Example ADENALLENE (+/-)

2NH N N N OH N

(+/-)AXIAL, RACEMIC

23. Isotopically Labeled Compounds a. Position of Substitution Known

When the position of isotopic labeling is known, the compound is drawn with the appropriate isotope at that position.

i. Example: IODOANTIPYRINE I

CH3 N 3CH N

131 O I

b. Labeled compound with the position of isotopic atom is unknown

Version 5c 48 5/13/2007 When a compound is labeled, but the position of the labeled atom is unknown, the label is drawn unattached to the molecule, but linked to an asterisk(s). The substance could be a pure compound or a mixture. It might have more than one label attached.

i. Example: ROSE BENGAL SODIUM, I-131, Position of Substitution Not Specified

I I

O O O + Na 2 I I O

Cl 131 * I O

Cl Cl

Cl

ii. Example: C-14 CHOLESTEROL; Position of Substitution Not Specified

* 14 CH Chiral C 3 CH3 * * CH * 3 H CH3

CH3 CH3

H H

OH

iii. Example: DEUTERIUM OXIDE

Deuterium is represented by the superatom D, an isotopically labeled hydrogen is not searchable in the database.

O D D

24. Substituted Glycerine Compounds (Guidance needs to be developed)

There are a number of substances that contain an ester or ether of glycerine. When no information is available on which hydroxyl group undergoes substitution, it is drawn as an “ANY OF” mixture.

i. Example: CALCIUM GLYCEROPHOSPHATE

Version 5c 49 5/13/2007 The commercial product is a mixture of calcium β- and (+/-)-α-glycerophosphates.

Mixture Type: “ANY OF” Comment [s1]: This is not the best example as this should be an “ALL OF” mixture. OH OH O O 2+ 2+ OH O P O Ca O P O Ca OH O O

25. Carbohydrates and Other Saccharides

Monosaccharide and the terminal sugar in many polysaccharides are in equilibrium between a chain form and one or more cyclic isomers. These saccharides are represented as a single structure. When multiple cyclic forms are possible, the pyranose isomer is depicted. The anomeric carbon on the terminal end of a monosaccharide or polysaccharide is drawn without any indication of chirality unless the anomeric configuration is explicitly specified.

i. Example: DEXTROSE

Dextrose as defined in USP can be either glucose monohydrate or glucose anhydrous, and is drawn as an “ANY OF” mixture.

Mixture Type: “ANY OF” Chiral

O OH OH

OH OH

OH Chiral

O OH OH OH 2

OH OH

OH

ii. Example: D-FRUCTOSE

Version 5c 50 5/13/2007

Although fructose is a mixture of furanose and pyranose isomers, it is drawn as a single pyranose sugar.

O OH Chiral OH

OH OH

OH

iii. Example LACTOSE, ANHYDROUS

OH O OHChiral

OH OO OH H OH HO OH

OH 26. Peptides/Proteins.

Since most proteins and peptides are inherently heterogeneous, the SRS system will not attempt to capture all aspects of heterogeneity. Heterogeneity can arise from a wide variety of post-translational modifications. The SRS system will only attempt to capture end-group modification (e.g., amidation, acetylation), glycosylation in a general sense, and other modifications that are essential for activity. For example the SRS will capture γ−carboxylation that is essential for enzymatic activity in a variety of clotting factors, but it will not attempt to capture the extent of C-terminal lysine heterogeneity in antibodies. The assignment of a UNII to protein substances does not imply that two substances with the same UNII are either biosimilar or therapeutically equivalent.

Peptides containing three or less amino acid residues are drawn showing all atoms and will not require an entry into the description field. Peptides/proteins containing up to 100 residues will have an entry in the structure field and will be drawn using the standard single letter amino acid abbreviations for standard amino acids.21 All peptides/proteins with well defined sequences that are greater than three amino acids will be described using the description field tags. The SRS description field does not indicate if a peptide/protein was chemically synthesized, naturally derived, or produced using a recombinant expression system. For glycoproteins, the SRS does not indicate the extent of glycosylation - only that a protein is glycosylated and the type of glycosylation in a general sense. The SRS does not indicate the state of aggregation or self-aggregation due to noncovalent interactions. For peptides up to 100 amino acids in length, the structure field indicates intra- or intermolecular disulfide linkages, whereas the description field does not. For protein conjugates, the conjugate is drawn in the structure field (e.g., PEG, toxin) and

21 Peptides and oligonucleotides are an exception to the rule that abbreviations are not used for chemical groups.

Version 5c 51 5/13/2007 includes the first atom of the protein to which it is linked but the protein is described in the description field.

The sequence of a protein or peptide is the most important feature in distinguishing one from another. Prior to entry of a new substance, search the description field using the protein/peptide’s sequence. For monoclonal antibodies sequences outside of the complementarity determining regions (CDR) of the antibody may be common so the entire sequence should be included in the search. An effort should be made to determine information for all of the following fields, particularly when a partial sequence or no sequence is available: , , , , . a. Description Field. Defined tags for peptides/proteins:

Parent tags that indicate the substance is a protein or peptide. Indicates whether the sequence is “COMPLETE”, “PARTIAL”, OR “UNKNOWN.” Indicates the number of subunits present in a protein. A subunit is defined as a complete linear sequence of amino acids linked through normal peptide bonds. Peptides or proteins linked through disulfide or other covalent bonds are considered separate subunits. Examples: Somatropin contains one subunit, insulin contains two subunits, and most monoclonal antibodies contain four subunits. Indicates the beginning of the description for a given subunit. Subunits are listed in order of decreasing length of their amino-acid sequence. If a protein contains two identical subunits, the information is added twice to the description field. Numeric tag identifying the subunit, starting with 1. Second tag between tags indicates the length of the subunit in number of amino acid residues. This tag is used for a quality control check. When the sequence is either unknown or partial, the length tag is left blank. Third tag between tags containing the sequence using the standard twenty single-letter amino acid codes. Sequences are listed from the N- terminal residue to the C-terminal residue. X is used to indicate the presence of a non-standard amino- acid or an unknown amino acid. X is identified in the tags. Fourth tag between tags indicating the presence of an N-terminal modification. Use the following entries: PYROGLU, ACETYL, FORMYL, MYRISTOYL. A blank entry indicates either the absence of an N-terminal modification or that no information is available. Last tag between tags indicating the presence of a C-terminal modification. Use the following entries: AMIDE, ETHYL ESTER, HYDRAZIDE. A blank entry indicates either the absence of a C-terminal modification or that no information is available. Tags used to describe any modified amino-acids present in the sequence or any modifications of a protein that are essential for activity. Entry of modification begins with the first subunit and first residue. If the position of a modification is not specified, the modification is listed following specified modifications. Modifications are delimited by semi-colons, data elements within a modification are delimited by a comma. The extent of conjugation should be listed if known. A blank field indicates there are no modifications that are essential for activity present in a protein, or that modifications are not known to be present in a protein. Examples: 1_14=D-PHE, indicates that the fourteenth residue on the first subunit is a D- residue; CONJUGATE=OZOGAMICIN, 4 CONJUGATES PER MOLECULE indicates that four ozogamicin molecules are conjugated to each protein; the location of the conjugation is not known or can vary. Indicates whether a protein is glycosylated. The field will contain the type of glycosylation. The accepted values are YEAST, YEAST (HUMANIZED), INSECT, PLANT, MAMMALIAN, MAMMALIAN (-FUCOSE) or HUMAN, when a protein is glycosylated. The type of glycosylation can be determined from the cells that the protein is produced in. Recombinant proteins produced in CHO, VERO, or MOUSE cell lines or derived from mammalian tissues will use the term MAMMALIAN. Proteins produced in human cell lines or derived from human tissues will use the

Version 5c 52 5/13/2007 term HUMAN. If the field is blank, the protein is either not glycosylated or if it is not known whether the protein is glycosylated. The extent and sites of glycosylation will not be indicated. The SRS will also not indicate whether glycosylation is biantennary, triantennary etc., or the extent of sialylation. Proteins that contain a different sites or extent of glycosylation are not distinguished from one another and are given the same Unique Identifier (UNII). Indicates whether a protein is pegylated. The field contains the value “YES” if a protein is known to be pegylated. A representation of the polyethylene glycol (PEG) chain and the linking group are drawn in the structure field. A blank field indicates that the protein is not pegylated or no information is available on pegylation.. Indicates the type of protein. Examples: MONOCLONAL ANTIBODY; ENZYME; FC FUSION PROTEIN; RECEPTOR, SOLUBLE; HORMONE. Indicates the subtype of the protein, i.e. the subtype of antibody, the name of an enzyme or ligand, or the description of heavy and light chains. Example: IGG1, KAPPA, IGG4, LAMDA, .ALPHA.-GLUCOSIDASE, TNF-RECEPTOR-FC FUSION, GROWTH FACTOR Indicates the species of origin of the protein. This is not the species of the cell-line from which it is produced. Examples: HUMAN; MURINE; CHIMERIC, HUMAN/MURINE; HUMANIZED The ligand or natural substrate to which the antibody or enzyme binds. For human protein targets the NCBI GENEBANK id number is used followed by the common name or title for the receptor or ligand from GENEBANK. This can be found by searching the GENE database on Entrez: http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?db=gene&cmd=Retrieve&dopt=full_report&list_uids=3643 . For non-human targets a common name of the ligand can be used. Refers to the human gene that the peptide or protein was derived from. This field can be filled in for even truncated peptides or proteins. The gene ID will be obtained from NCBI genebank. The entry should consist of the NCBI genebank id number followed by the common name also from genebank Comments used to distinguish one protein from another when the tags above are not sufficient.

b. Structure Field and Complete Polypeptide/Protein Substance Definitions

In the structure field, peptides are displayed with 15 amino acids per line and are divided into groups of five.

i. Example: COSYNTROPIN

Structure Field

NH2 SWYMS E H F R G K VP G KChiral

COOH K R VPR KYV P

Description Field COMPLETE 1 1 24 SYSMEHFRWGKPVGKKRRPVKVYP < N_TERMINAL_MOD>

Version 5c 53 5/13/2007 HORMONE CORTICOTROPIN HUMAN 4158, MC2R MELANOCORTIN 2 RECEPTOR (ADRENOCORTICOTROPIC HORMONE) 5443, POMC (ADRENOCORTICOTROPIN/ BETA-/ ALPHA-MELANOCYTE STIMULATING HORMONE/ BETA- MELANOCYTE STIMULATING HORMONE/ BETA-ENDORPHIN)

In the structure field, amino acids are assumed to be L unless otherwise specified. D-Amino acids are abbreviated using the three letter code preceded by D- (for instance, D-alanine is abbreviated D-Ala).

ii. Example: BIVALIRUDIN

Structure Field

D-Phe PFRPG G GDG NEG E IChiral

COOH P E E Y L

Description Field COMPLETE 1 1 20 XPRPGGGGNGDFEEIPEEYL 1_1=DPHE 2147, F2 COAGULATION FACTOR II (THROMBIN)

C-Terminal amides are abbreviated @-NH2 where @ represents a one letter amino acid code. They are created using the ISIS Draw residue tool. The amino acid is first expanded and the residue portion eliminated. An NH2 is then added. A new residue is then created given the abbreviation @-NH2. N- Terminal acetyl group is handled in a like manner. The new residue is given the abbreviation Ac-@.

Version 5c 54 5/13/2007

iii. Example: ENFUVIRTIDE

Structure Field

Ac-Y T SQL IEH SL I E S N QChiral

Q ELK NE Q E L E L DKAW

S LWN W F-NH 2

Description field COMPLETE 1 1 36 YTSLIHSLIEESQNQQEKNEQELLELDKWASLWNWF ACETYL AMIDE HIV-1 GP41

Other nonstandard amino acids are drawn expanded, showing all atoms, within the peptide chain.

iv. Example: LUTEINIZING HORMONE

5- Pyroglutamate does not have a standard single letter abbreviation and is drawn expanded. The letter X is used in the tags to indicate that pyroglutamate (PYROGLU) is not a standard amino acid.

Structure Field

Chiral H O N HYRW S GGL P -NH H 2 O

Version 5c 55 5/13/2007 Description field COMPLETE 1 1 10 XHWSYGLRPG PYROGLU AMIDE 1_1=PYROGLU HORMONE HUMAN 3973, LHCGR LUTEINIZING HORMONE/CHORIOGONADOTROPIN RECEPTOR

The SRS software indicates disulfide bonds by placing x-number (e.g. x2) above the cysteine residues.

v. Example: INSULIN HUMAN

Structure Field

x2 x3 x2 NH 2 G I V E Q C C TLS I C S Y Q Chiral

x1 L E N Y C N COOH

x3 NH 2 F V N Q H LSC GVHAEL L

x1 COOH Y L V C G E RTG F F Y P KT

Description field COMPLETE 2 1 30 FVNQHLCGSHLVEALYLVCGERGFFYTPKT 2 21 GIVEQCCTSICSLYQLENYCN

Version 5c 56 5/13/2007 HORMONE INSULIN HUMAN 3643, INSULIN RECEPTOR 3630, INSULIN

Peptides with more than 100 residues are defined using only the description field. There is no entry in the structure field.

vi. Example: SOMATROPIN

Structure Field: Blank

Description field COMPLETE 1 1 191 FPTIPLSRLFDNAMLRAHRLHQLAFDTYQEFEEAYIPKEQKYSFLQNPQ TSLCFSESIPTPSNREETQQKSNLELLRISLLLIQSWLEPVQFLRSVFANSLVYGASDSN VYDLLKDLEEGIQTLMGRLEDGSPRTGQIFKQTYSKFDTNSHNDDALLKNYGLLYCF RKDMDKVETFLRIVQCRSVEGSCGF 2690, GROWTH HORMONE RECEPTOR 2688, GROWTH HORMONE 1

vii. Example: EFALIZUMAB

Structure Field: Blank

Description field COMPLETE

Version 5c 57 5/13/2007 4 1 451 EVQLVESGGGLVQPGGSLRLSCAASGYSFTGHWMNWVRQAPGKGLE WVGMIHPSDSETRYNQKFKDRFTISVDKSKNTLYLQMNSLRAEDTAVYYCARGIYFY GTTYFDYWGQGTLVTVSSASTKGPSVFPLAPSSKSTSGGTAALGCLVKDYFPEPVTVS WNSGALTSGVHTFPAVLQSSGLYSLSSVVTVPSSSLGTQTYICNVNHKPSNTKVDKK VEPKSCDKTHTCPPCPAPELLGGPSVFLFPPKPKDTLMISRTPEVTCVVVDVSHEDPEV KFNWYVDGVEVHNAKTKPREEQYNSTYRVVSVLTVLHQDWLNGKEYKCKVSNKA LPAPIEKTISKAKGQPREPQVYTLPPSREEMTKNQVSLTCLVKGFYPSDIAVEWESNG QPENNYKTTPPVLDSDGSFFLYSKLTVDKSRWQQGNVFSCSVMHEALHNHYTQKSL SLSPGK 2 451 EVQLVESGGGLVQPGGSLRLSCAASGYSFTGHWMNWVRQAPGKGLE WVGMIHPSDSETRYNQKFKDRFTISVDKSKNTLYLQMNSLRAEDTAVYYCARGIYFY GTTYFDYWGQGTLVTVSSASTKGPSVFPLAPSSKSTSGGTAALGCLVKDYFPEPVTVS WNSGALTSGVHTFPAVLQSSGLYSLSSVVTVPSSSLGTQTYICNVNHKPSNTKVDKK VEPKSCDKTHTCPPCPAPELLGGPSVFLFPPKPKDTLMISRTPEVTCVVVDVSHEDPEV KFNWYVDGVEVHNAKTKPREEQYNSTYRVVSVLTVLHQDWLNGKEYKCKVSNKA LPAPIEKTISKAKGQPREPQVYTLPPSREEMTKNQVSLTCLVKGFYPSDIAVEWESNG QPENNYKTTPPVLDSDGSFFLYSKLTVDKSRWQQGNVFSCSVMHEALHNHYTQKSL SLSPGK 3 214 DIQMTQSPSSLSASVGDRVTITCRASKTISKYLAWYQQKPGKAPKLLI YSGSTLQSGVPSRFSGSGSGTDFTLTISSLQPEDFATYYCQQHNEYPLTFGQGTKVEIK RTVAAPSVFIFPPSDEQLKSGTASVVCLLNNFYPREAKVQWKVDNALQSGNSQESVT EQDSKDSTYSLSSTLTLSKADYEKHKVYACEVTHQGLSSPVTKSFNRGEC 4 214 DIQMTQSPSSLSASVGDRVTITCRASKTISKYLAWYQQKPGKAPKLLI YSGSTLQSGVPSRFSGSGSGTDFTLTISSLQPEDFATYYCQQHNEYPLTFGQGTKVEIK RTVAAPSVFIFPPSDEQLKSGTASVVCLLNNFYPREAKVQWKVDNALQSGNSQESVT EQDSKDSTYSLSSTLTLSKADYEKHKVYACEVTHQGLSSPVTKSFNRGEC MAMMALIAN

Version 5c 58 5/13/2007 MONOCLONAL ANTIBODY IGG1 KAPPA CHIMERIC, HUMAN/MURINE 3683, ITGAL INTEGRIN, ALPHA L (ANTIGEN CD11A (P180), LYMPHOCYTE FUNCTION-ASSOCIATED ANTIGEN 1; ALPHA POLYPEPTIDE)

viii. Example: PEGINTERFERON ALFA-2A

This substance is described using both polymer and protein tags. When two sets of tags are present the order is alphabetical according to the parent tags. No delimiter is added between the sets of tags. The molecular weight given in the polymer tags is for the polymer portion of the molecule and not the entire molecule. For details on the polymer tags, see section 27 on Polymers.

Structure

O Chiral

O n CH O N 3 H O H N O CH n O N * 3 H O

Description field

HOMOPOLYMER 1 1 HEAD- TAIL

Version 5c 59 5/13/2007 NUMBER AVERAGE 40000 COMPLETE 1 1 165 CDLPQTHSLGSRRTLMLLAQMRKISLFSCLKDRHDFGFPQEEFGNQFQ KAETIPVLHEMIQQIFNLFSTKDSSAAWDETLLDKFYTELYQQLNDLEACVIQGVGVT ETPLMKEDSILAVRKYFQRITLYLKEKKYSPCAWEVVRAEIMRSFSLSTNLQESLRSK E ONE PEG MOIETY PER MOLECULE YES CYTOKINE INTERFERON HUMAN 3455, IFNAR2 INTERFERON (ALPHA, BETA AND OMEGA) RECEPTOR 2 3440, IFNA2 INTERFERON, ALPHA 2

ix. Example. INFLIXIMAB

Structure Field: Blank

Description field PARTIAL 4 1

Version 5c 60 5/13/2007 EVKLEESGGGLVQPGGSMKLSCVASGFIFSNHWMNWVRQSPEKGLE WVAEIRSKSINSATHYAESVKGRFTISRDDSKSAVYLQMTDLRTEDTGVYYCSRNYY GSTYDYWGQGTTLTVS 2 EVKLEESGGGLVQPGGSMKLSCVASGFIFSNHWMNWVRQSPEKGLE WVAEIRSKSINSATHYAESVKGRFTISRDDSKSAVYLQMTDLRTEDTGVYYCSRNYY GSTYDYWGQGTTLTVS 3 DILLTQSPAILSVSPGERVSFSCRASQFVGSSIHWYQQRTNGSPRLLIKY ASESMSGIPSRFSGSGSGTDFTLSINTVESEDIADYYCQQSHSWPFTFGSGTNLEVK 4 DILLTQSPAILSVSPGERVSFSCRASQFVGSSIHWYQQRTNGSPRLLIKY ASESMSGIPSRFSGSGSGTDFTLSINTVESEDIADYYCQQSHSWPFTFGSGTNLEVK MONOCLONAL ANTIBODY IGG1, KAPPA CHIMERIC, HUMAN/MURINE 7124, TUMOR NECROSIS FACTOR

Radiolabeled Proteins will be handled like other radioactive compounds. If the site of attachment is not known the radioisotope will be drawn in the structure box and linked to an asterisk.

x. Example: TOSITUMOMAB I-131

131I *

Description field

Version 5c 61 5/13/2007 COMPLETE 4 1 451 QAYLQQSGAELVRPGASVKMSCKASGYTFTSYNMHWVKQTPRQ GLEWIGAIYPGNGDTSYNQKFKGKATLTVDKSSSTAYMQLSSLTSEDSAVYFCA RVVYYSNSYWYFDVWGTGTTVTVSGPSVFPLAPSSKSTSGGTAALGCLVKDYFP EPVTVSWNSGALTSGVHTFPAVLQSSGLYSLSSVVTVPSSSLGTQTYICNVNHKP SNTKVDKKAEPKSCDKTHTCPPCPAPELLGGPSVFLFPPKPKDTLMISRTPEVTCV VVDVSHEDPEVKFNWYVDGVEVHNAKTKPREEQYNSTYRVVSVLTVLHQDWL NGKEYKCKVSNKALPAPIEKTISKAKGQPREPQVYTLPPSRDELTKNQVSLTCLV KGFYPSDIAVEWESNGQPENNYKTTPPVLDSDGSFFLYSKLTVDKSRWQQGNVF SCSVMHEALHNHYTQKSLSLSPGK 2 451 QAYLQQSGAELVRPGASVKMSCKASGYTFTSYNMHWVKQTPRQ GLEWIGAIYPGNGDTSYNQKFKGKATLTVDKSSSTAYMQLSSLTSEDSAVYFCA RVVYYSNSYWYFDVWGTGTTVTVSGPSVFPLAPSSKSTSGGTAALGCLVKDYFP EPVTVSWNSGALTSGVHTFPAVLQSSGLYSLSSVVTVPSSSLGTQTYICNVNHKP SNTKVDKKAEPKSCDKTHTCPPCPAPELLGGPSVFLFPPKPKDTLMISRTPEVTCV VVDVSHEDPEVKFNWYVDGVEVHNAKTKPREEQYNSTYRVVSVLTVLHQDWL NGKEYKCKVSNKALPAPIEKTISKAKGQPREPQVYTLPPSRDELTKNQVSLTCLV KGFYPSDIAVEWESNGQPENNYKTTPPVLDSDGSFFLYSKLTVDKSRWQQGNVF SCSVMHEALHNHYTQKSLSLSPGK 3 220 QIVLSQSPAILSASPGEKVTMTCRASSSVSYMHWYQQKPGSSPKP WIYAPSNLASGVPARFSGSGSGTSYSLTISRVEAEDAATYYCQQWSFNPPTFGAG TKLELKRTVAAPSVFIFPPSDEQLKSGTASVVCLLNNFYPREAKVQWKVDNALQ SGNSQESVTEQDSKDSTYSLSSTLTLSKADYEKHKVYACEVTHQGLSSPVTKSFN R 4 220 QIVLSQSPAILSASPGEKVTMTCRASSSVSYMHWYQQKPGSSPKP WIYAPSNLASGVPARFSGSGSGTSYSLTISRVEAEDAATYYCQQWSFNPPTFGAG TKLELKRTVAAPSVFIFPPSDEQLKSGTASVVCLLNNFYPREAKVQWKVDNALQ SGNSQESVTEQDSKDSTYSLSSTLTLSKADYEKHKVYACEVTHQGLSSPVTKSFN R IODINE I 131 LABELED MAMMALIAN

Version 5c 62 5/13/2007 MONOCLONAL ANTIBODY IGG2A MOUSE 931, MS4A1 MEMBRANE-SPANNING 4-DOMAINS, SUBFAMILY A, MEMBER 1

c. Salts of peptides and proteins:

Because of multiple sites of ionization on most peptides and proteins, peptides and proteins will not be drawn as ionized moieties. Counterions, unless the valency is well defined will be drawn within “GENERIC” brackets and UNDEFINED will be entered as stext associated with the bracket. There will only be one UNII associated with each salt of undefined or variable stoichiometry. Metal salts will be drawn as charged entities. Acid salts such as acetates, hydrochlorides and sulfates will be drawn as uncharged acids. Substances will not be distinguished by the extent of crystallinity, or release characteristics. Prompt Insulin Human Zinc and Extended Insulin Human Zinc will be given the same UNII as Insulin Human Zinc.

i. Example: INSULIN HUMAN ZINC

Structure Field

NH2 G I V E Q C CSTLS I C Y Q Chiral

L E N Y C N COOH

NH2 F V N Q H LSC GVHAEL L

Y L CV G E RTG F F Y P KT COOH

2+ Zn UNDEFINED

Description field

COMPLETE 2 1 30 FVNQHLCGSHLVEALYLVCGERGFFYTPKT

Version 5c 63 5/13/2007 2 21 GIVEQCCTSICSLYQLENYCN HORMONE INSULIN HUMAN 3643, INSULIN RECEPTOR 3630, INSULIN

ii. Example: GONADORELIN ACETATE

Structure Field

Chiral

O H N HGW SY GLPR -NH2 H O

O

OH2 UNDEFINED 3CH OH UNDEFINED

Description field

COMPLETE 1 1 10 XHWSYGLRPG PYROGLU AMIDE 1_1=PYROGLU HORMONE

Version 5c 64 5/13/2007 2798, GNRHR GONADOTROPIN-RELEASING HORMONE RECEPTOR 2796, GNRH1 GONADOTROPIN-RELEASING HORMONE 1 (LUTEINIZING- RELEASING HORMONE)

27. Oligonucleotides (Proposed Not official)

Oligonucleotides with three or less nucleotides are drawn showing all atoms. Oligonucleotides up to 30 nucleotides long are drawn using a two letter code for the nucleoside and a one or two letter code for the phosphate linkage. The following abbreviations are used: dA = 2-deoxyadenosine, dC = deoxycytidine, dG =2-deoxyguanosine, dT = thymidine, p = (3'-5')phosphoryl linkage, and ps = (3'-5')phosphorothioyl linkage. The phosphate groups in the chain are negatively charged, so the appropriate cation is added in brackets next to the sequence.

Description Field.

Defined tags for nucleic acids.

First tags in description field indicate that substance is a nucleic acid. Indicates the number of subunits or strands present in the substance. Multiple strands can exist in a substance when two or more strands are based paired. Strands that are not based paired will be treated as separate substances. Indicates the beginning of the description for a given strand/subunit. Subunits are listed in order of decreasing length of nucleic acid sequence. If the lengths are the same then MW would be used. Each subunit is enclosed by subunit tags. Alphanumeric identifier of subunit group. Second tag between tags. Indicates the number of nucleic acid residues in a subunit. This tag is used for a quality control check. Third tag between tags. Contains the sequence abbreviated using the standard four letter DNA codes, I for inosine, and U for uracil. All non-standard bases are indicated by an X. <5_MODIFICATION> Fourth tag between tags. Describes any modification of the sugar at the 5’ end of the strand. All strands are assumed to end in a 5’ hydroxyl group. Examples: TRITYL, PHOSPHATE, and DIGOXIN.

Version 5c 65 5/13/2007 <3_MODIFICATION> Fifth tag between tags. Describes any modification of the sugar at the 3’ end of the strand. All strands are assumed to end in a 3’ hydroxyl group. Examples: PHOSPHATE, DIGOXIN, and ACETATE. Sixth tag between tags. Describes modifications of the sugar throughout the strand. Sugars are assumed to be .BETA.-D-deoxyribose. Each type of modification is listed only once and the residues that have the modification are then listed starting from the 5’ end; gaps are separated by commas. Modifications are listed in order of first appearance. Example: BETA.-D-RIBOSE= 1-5,8,10-16 means residues 1-5, 8 and 10-16 contain a ribose sugar. Seventh tag between tags. Describes modifications to the phosphate groups. Each linkage is associated with its 3’ nucleoside. Each type of modified linkage is listed only once starting from the 5’ end of the strand. Example: PHOSPHOTHIORATE= 1-6 means the linkages connecting residues 1-7 have a phosphothiorate linkage. For substances that contain phosphothiorate, methylphosphonate and other linkages that contain a chiral center, a single structure is drawn. Both and tags are used to describe these substances. Normal bonds are used where stereochemistry is not specified and the stereochemistry tags contain the value “UNKNOWN”. Eighth tag between tags. Describes modifications to the bases—represented by X in the tags. Each modification is listed only once. Examples: 2,6-DIAMINOPURINE= 4 this means that residue 4 contains the base 2,6-diaminopurine Comments used to distinguish one nucleic acid from another when the tags above are not sufficient.

Oligonucleotides with more than 30 residues are defined ONLY by descriptions.

i. Example: DNA sequence d(AGTC)

Structure Field

Chiral + 5' dA p dG p dT p dC 3' Na 3

Description Field 114AGTC<5_MODIFICATION>< /5_MODIFICATION>><3_MODIFICATION>

Phosphorothioate oligonucleotides are usually a mixture of many diastereomers since the configuration of the phosphorothioate groups is not controlled during synthesis. No wedge bonds are drawn to phosphorothioate groups:

O S ps = P R O O R 1 2

ii. Example: FOMIVIRSEN SODIUM

Version 5c 66 5/13/2007

Structure Field

5' dG ps dC ps dG ps dT ps dT ps dT ps dG ps dCChiral

+ ps dT ps dC ps dT ps dT ps dC ps dT ps dT ps Na 20 dC ps dT ps dT ps dG ps dC ps dG 3'

Description Field 1121GCGTTTGCTCTTCTTCTTGC<5_MODIFICATION><3_MODIFICATION> PHOSPHOTHIORATE=1- 20

28. Polymers (Proposed Not Official)

All polymers that contain known repeating units and/or substituents are defined using both structure and description fields. For each polymer as much detail as is available is added to the description field. Polymers that differ by molecular weight (MW) or distribution of molecular weights are considered different polymers. a. Structure Drawing

The structure field contains the smallest repeating unit that results from the polymerization of the monomer. MDL’s structure repeating unit (SRU) brackets, that are labeled with a single capital letter identifier (A, B, C, etc), are used for homopolymers and copolymers. Tags are used in the description field to indicate the orientation of polymerization for both homopolymers and copolymers.

End groups, if known, are drawn and connected to the ends of a polymer. End groups on copolymers that are not definitely connected to a given polymerized monomer are drawn and enclosed in generic brackets labeled END_GROUP using S-Text.

Cross-linking SRUs in copolymers are treated like other repeating units. When substitution on a polymer exists at multiple sites and is incomplete, the substituent is enclosed in generic brackets labeled R# using S- Text. R-groups are numbered in order of decreasing MW. The repeating unit contains an asterisk at the position of potential substitution. The substituent also contains an asterisk at the site of attachment (See examples #v, and vi).

b. Description Field

The description field contains the following tags:

Indicates that the substance is a polymer. Indicates the type of polymer or copolymer. Typical entries include: HOMOPOLYMER: A polymer that has a single structural repeating unit. This term is used when tacticity does not exist. HOMOPOLYMER UNKNOWN: A polymer that has a single structural repeating unit. This term is used when tacticity is possible, but the type of tacticity is unknown.

Version 5c 67 5/13/2007 HOMOPOLYMER ISOTACTIC, HOMOPOLYMER SYNTACTIC, HOMOPOLYMER ATACTIC: A polymer that has a single structural repeating unit where tacticity is known. COPOLYMER RANDOM: A polymer represented by multiple structural repeating units where the order of the repeating units is not controlled. COPOLYMER BLOCK: A polymer represented by multiple structural repeating units where blocks of identical repeating units exist. COPOLYMER GRAFT: A polymer represented by multiple structural repeating units where one polymer is attached to the main chain of another polymer. COPOLYMER UNKNOWN: A polymer represented by multiple structural repeating units where no information about connectivity between structure repeating units is known. Indicates the number of different structural repeating units in a polymer. For a homopolymer the value is one. Tags surrounding information about a given structural repeating unit of a polymer. . The value should be identical to the bracket identifier used in the structure field. Repeating units are labeled in alphabetical order starting from the highest MW structural repeating unit. The bracket identifier is a single capital letter associated with a bracket. Attached R-Groups are not initially considered in the ordering of the subunits. If two subunits have identical MW then the MW of attached R-Groups is taken into consideration. Ta gs used to describe the orientation of homopolymers and block copolymers. Examples: HEAD- TAIL, HEAD-HEAD, RANDOM, UNKNOWN Tags that surround the percentage tags of a given repeating unit of a polymer. The average percentage of a given repeating unit that defines a polymeric substance. The lowest percentage of a given repeating unit that defines a polymeric substance. The highest percentage of a given repeating unit that defines a polymeric substance. Tags that surround information about substituents added to a given SRU. The value should be identical to the bracket identifier used in the structure field. R-Groups are numbered using R# (R1,R2…). R1 has the largest MW, R2 the second largest MW, and so on. Type of limit used to describe the substitution. Examples: WEIGHT PERCENT, DEGREE OF SUBSTITUTION. The average degree of substitution. The lower limit of substitution for the R-Group. The highest limit of substitution for the R-Group. Degree of polymerization is defined as the average number of monomers which make up a given polymer. The average degree of polymerization for the defined polymer. The lower limit for the degree of polymerization. The higher limit for the degree of polymerization. Tags that provide information about the molecular weight of the polymer. Polydispersity is indicated by the ratio of the weight average to number average. Both values are added if known. Tags that surround information on the polymers MW. The type of MW reported for the polymer. Examples: NUMBER AVERAGE, WEIGHT AVERAGE, VISCOSITY AVERAGE. The average MW of the polymer. Polymers are considered different if their average molecular weights differ by a significant amount. The lowest limit of the average molecular weight that describes a polymer. The highest limit of the average molecular weight that describes a polymer.

Version 5c 68 5/13/2007 Tags that provide information about the distinguishing physical properties of a polymer. Distinguishing physical property. Examples:DENSITY, CRYSTALLINITY, VISCOSITY, and PERMEABILITY [Note-Care must be taken when using physical properties to define a polymer. The addition of a new physical property does not necessarily lead to a new substance. Physical properties should only be used when they distinguish one polymer from another.] A second physical property can be added with a second set of tags. The order should be alphabetical based on the distinguishing physical property. The average value for a physical property. The lower limit for a physical property. The upper limit for a physical property. Unit of measurement for a physical property. EXAMPLE: Density = G/ML General comments about a polymer that are important to the definition of a polymer. Comments should be used sparingly; only information essential to the definition of a polymer should be included.

In the description field tags, high and low limit tags either contain product specification limits or limits at the 95% confidence interval if a standard deviation is given (i.e., +/- 2 times the standard deviation). Example: MW of a given polymer is 5000 +/- 500. The tags contain the value 5000, the contains the value 4000, the contains the value 6000.

Particle size, percent water, and percent solvent are not distinguishing factors for polymers.

Care must be taken to ensure that two representations of the same polymer are not entered into the SRS. The SRS software allows registration of the two polymers below:

O + 3CH OOSO Na n O O + H3C OOSO Na nO

A SRS register must use substructure searches prior to entering any polymer that has two or more valid representations.

i. Example. Low Density POLYETHYLENE

Structure Field

* A *

Version 5c 69 5/13/2007 Description Field HOMOPOLYMER 1 A DENSITY 0.91 0.93 G/ML >

ii. Example. High Density POLYETHYLENE

Structure Field

* A *

Version 5c 70 5/13/2007 Description Field HOMOPOLYMER 1 A DENSITY .94 .97 G/CC

iii. Example. MICROCRYSTALLINE CELLULOSE

Degree of polymerization is less than 350.

Version 5c 71 5/13/2007 Structure Field

OH Chiral O * * O H A OH OH

Description Field HOMOPOLYMER 1 A HEAD-TAIL 350

iv. Example. POWDERED CELLULOSE

Version 5c 72 5/13/2007

Degree of polymerization is less than 2250 and more than 440.

Structure Field

OH Chiral O * * O H A OH OH

Description Field HOMOPOLYMER 1 A HEAD-TAIL 440 2250

Version 5c 73 5/13/2007

v. Example. CELLULOSE ACETATE

Cellulose acetate is defined using the USP criteria. The weight percent of acetate is not less than 29.0 percent and not greater than 44.8. Positions of potential substitution are indicated by an asterisk for both the polymer and the substituent. Hydrogen atoms are not listed but assumed to occupy positions that are not completely substituted. Polymers are drawn to retain as many of the atoms of the polymer as possible.

Structure Field

* Chiral O

O O * * O H A 3CH * O R1 O * *

Correct

* Chiral

O O * * O A * H 3CH O * * R1

Incorrect

Description Field HOMOPOLYMER 1 A

Version 5c 74 5/13/2007 HEAD-TAIL R1 WEIGHT PERCENT 29.0 44.8

vi. Example. CELLACEFATE

Version 5c 75 5/13/2007

Structure Field

* Chiral O O O O * * * O A H 3CH * OH

O R2 O * O * R1

Description Field HOMOPOLYMER 1 A HEAD-TAIL R1 WEIGHT PERCENT 30.0 36.0 R2 WEIGHT PERCENT 21.5 26.0

Version 5c 76 5/13/2007

vii. Example. POLOXAMER 124

Structure Field

OH O OH O O B A B CH 3

Description Field COPOLYMER, BLOCK 2 A HEAD-TAIL 51.4 55.2 20 B HEAD-TAIL 44.8

Version 5c 77 5/13/2007 48.6 12 NUMBER AVERAGE 2090 2360

viii. Example. ACRYLONITRILE- BUTADIENE-STYRENE TERPOLYMER

Structure Field N * A * * * B * * C

Description Field COPOLYMER, GRAFT 3 A

Version 5c 78 5/13/2007 40 60 B 5 30 C 15 30

Version 5c 79 5/13/2007 STYRENE AND ACRYLONITRILE ARE GRAFTED ONTO A BUTADIENE BACKBONE FREE STYRENE-ACRYLONITRILE COPOLYMER CHAINS ALSO PRESENT

ix. Example: PHENOL FORMALDEHYDE RESINS (Condensation polymers)

Phenol formaldehyde resins are usually three-dimensional network polymers. Only one structure is drawn for each condensation polymer. The polymer structure is simplified with only two positions of substitution indicated. Substitution of the methylene group will either be ortho to the primary functional group or at the position nearest to the primary functional group.

Structure Field

OH

* A *

Description Field CONDENSATION, NETWORK

Version 5c 80 5/13/2007 1 A FORMS A THREE DIMENSIONAL NETWORK, EXTENT OF NETWORK DEPENDS ON RATIO OF FORMALDEHYDE TO PHENOL; PHENOL PARA, META- POSITIONS CAN ALSO UNDERGO SUBSTITION

In cases where symmetrical substitution is not possible the methylene group is attached to the ortho position or to the nearest position of substitution. The polymer attachment site would be at the nearest site on the opposite side of the functional group.

x. Example: NAPTHOL FORMALDEHYDE RESIN

Version 5c 81 5/13/2007

Structure Field

* OH n *

Description Field CONDENSATION, NETWORK 1 A FORMS A THREE DIMENSIONAL NETWORK, EXTENT OF NETWORK DEPENDS ON RATIO OF FORMALDEHYDE TO NAPTHOL; OTHER POSITIONS ON RINGS CAN ALSO UNDERGO SUBSTITION

Version 5c 82 5/13/2007 xi. Example: 6,6-NYLON.

Each of the polymerized monomers is fully drawn out and linked together .

Structure Field

O O H N 2NH OH N N A H H O O Description Field CONDENSATION 1 A NUMBER AVERAGE 10,000

xii. Example:.HEPARIN (Not Official).

When two moieties can substitute on a given atom such as acetyl and sulfate groups on the amino-group of the sugar, both moieties will be drawn as defined repeating units (see B and C below). When the polymeric

Version 5c 83 5/13/2007 moiety and substituents within a bracket have the same overall molecular weight, the repeating unit that has the most positions of potential substitution will be given priority (see A, B below).

O O O * * S OH OH H O * O MOD_1 O O * OH * O n H O O H O * O O OH * O O N S OH O O n H * OH H * O O OH OH NH O * S OH O CH3 MOD_1 C A

O * S OH O MOD_1 O O * OH H O * O * O n O H O OH OH O N S OH * H O B

COPOLYMER, NATURAL PRODUCT, RANDOM 3 A

Version 5c 84 5/13/2007 B C NUMBER AVERAGE

Version 5c 85 5/13/2007 12,000 15,000 AVERAGE NEGATIVE CHARGE PER RESIDUE 2.3

xiii. Example: POLACRILIN

Polacrilin is a synthetic polymer consisting of acrylate residues crosslinked with divinylbenzene. The amount of divinylbenzene present is between 2 and 10%. In the structure field, divinylbenzene is represented by the para-moiety even though it is often a mixture of ortho, meta, and para isomers. The following statement “THE CROSSLINKING MOIETY DIVINYLBENZENE IS A MIXTURE OF THE META, ORTHO AND PARA ISOMERS” is placed in the comments. A similar statement is used if the crosslinking agent is a single isomer.

Structure Field

*

* * B * OH O * * A

Description Field COPOLYMER 2 A 2 10

Version 5c 86 5/13/2007 B 90 98 THE CROSSLINKING MOIETY DIVINYLBENZENE IS A MIXTURE OF THE META, ORTHO AND PARA ISOMERS

29. Active Moieties

Active moieties are included in the FDA/USP Substance Registration System (SRS). An active moiety is defined in 21CFR316.3 as “the molecule or ion, excluding those appended portions of the molecule that cause the drug to be an ester, salt (including a salt with hydrogen or coordination bonds), or other noncovalent derivative (such as a complex, chelate, or clathrate) of the molecule, responsible for the physiological or pharmacological action of the drug substance.”

Version 5c 87 5/13/2007

Active moieties will be added to the SRS without regard to the actual charged state of the molecule in-vivo. By convention in the FDA/USP SRS, the active moiety of a hydrochloride salt of a base will be the free base and not the protonated form of the base. Likewise, the active moiety of a metal acid salt will be the free acid. This definition applies when a molecule can be neutralized by addition or loss of a proton. For quaternary ammonium cations that contain pharmacological activity, the active moiety is the charged cation. This ensures there will be only one active moiety for each single drug substance.

Care should be taken to ensure that an ester is not essential for the pharmacological activity of a substance. An active moiety can be contained within either the alcohol or acidic portion of the molecule. The reviewing division within FDA should provide individual guidance for each ester to determine the active moiety or whether an active moiety exists.

i. Example: ALENDRONATE SODIUM active moiety is ALENDRONIC ACID

+ Na O O O OH P P OH OH NH 2NH 2 OH OH OH OH P P OH OH O O 2OH 3

ii. Example: CHELOCARDIN HYDROCHLORIDE active moiety is CHELOCARDIN

OHOH O O O Chiral OHOH O O O Chiral OH OH 3CH 3CH CH3 CH3

OH HCl OH H H CH NH CH NH 3 2 3 2

Version 5c 88 5/13/2007