Secondary structures can be recorded using several notations. Most
of the experimental and published secondary structure data are just
figures with bases shown to be either sensitive to or protected from
certain restriction enzymes or chemical agents. Because of its
simplicity we decided to use the "bracket" style to represent the
secondary structure of nucleic acids. We are aware that
this syntax cannot express in detail how susceptible individual
bases are to enzymatic cleavage etc. If this turns out to be a limitation
we can improve this part of IRESite. At the moment this is not
considered a drawback but rather one of the things which keep IRESite simple.
Please consider the following secondary structure example with its
corresponding bracket representation:
G | A
A | C
5'-AAAAA | CCCCCC-3'
The above structure can be represented using the bracket notation in the following string:
-----------^ the bulge (the protruding A base at position 9)
The following simple rule can be used even for highly branched structures:
a paired base on the left-hand side of the stem axis is represented by a left bracket, an unpaired base by a dot, and a paired base on the right-hand side of the stem by a right bracket.
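The rule can be illustrated with a short Python sketch that reads a bracket string and lists which positions pair with each other. This is only a minimal illustration, not code used by IRESite: the function name parse_dot_bracket and the sample string are hypothetical, and positions are assumed to be counted from 1 at the 5' end.

```python
def parse_dot_bracket(structure):
    """Return a sorted list of (i, j) base pairs (1-based) from a bracket string.

    Hypothetical helper for illustration only; it accepts just '(', ')' and '.'.
    """
    open_positions = []  # stack of positions of '(' still waiting for a partner
    pairs = []
    for pos, symbol in enumerate(structure, start=1):
        if symbol == "(":
            open_positions.append(pos)
        elif symbol == ")":
            if not open_positions:
                raise ValueError(f"unmatched ')' at position {pos}")
            # The most recently opened bracket is the partner of this one.
            pairs.append((open_positions.pop(), pos))
        elif symbol != ".":
            raise ValueError(f"unexpected symbol {symbol!r} at position {pos}")
    if open_positions:
        raise ValueError(f"unmatched '(' at position {open_positions[-1]}")
    return sorted(pairs)


# A made-up hairpin with a one-base bulge at position 3 (not the IRESite example).
print(parse_dot_bracket("((.((...))))"))
```

Bases that never appear in any returned pair are the unpaired ones (dots), so loops and bulges can be read off as the gaps between paired positions.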
As already explained, this notation cannot quantify the accessibility of bases to restriction enzymes and chemical agents, nor can it represent pseudoknots, chemical modifications such as methylation or pseudouridylation, atomic distances, etc. There are more sophisticated notations (for example RNAML), but it is not clear whether most users would benefit from such a syntax. It would certainly impose a serious burden on the curators and on any other users submitting new data. We are open to discussing this issue with you.
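Why pseudoknots do not fit into the bracket notation can be seen from a small Python sketch that writes a list of base pairs as a bracket string. It is an illustration under stated assumptions, not part of IRESite: the function name pairs_to_dot_bracket is hypothetical, pairs are given as 1-based (i, j) positions, and only one kind of bracket is available. As soon as two pairs cross (i < k < j < l), a single bracket type can no longer tell which bracket closes which, so the sketch refuses such input.

```python
def pairs_to_dot_bracket(length, pairs):
    """Write 1-based base pairs as a bracket string; reject crossing pairs.

    Hypothetical helper for illustration only.
    """
    structure = ["."] * length
    closing = []  # closing positions of the pairs that enclose the current one
    for i, j in sorted(pairs):
        if not (1 <= i < j <= length):
            raise ValueError(f"invalid pair ({i}, {j})")
        if structure[i - 1] != "." or structure[j - 1] != ".":
            raise ValueError(f"a base of pair ({i}, {j}) is already paired")
        # Forget enclosing pairs that were closed before position i.
        while closing and closing[-1] < i:
            closing.pop()
        # The new pair must close inside the innermost pair that is still open.
        if closing and j > closing[-1]:
            raise ValueError(f"pair ({i}, {j}) crosses another pair (pseudoknot)")
        closing.append(j)
        structure[i - 1] = "("
        structure[j - 1] = ")"
    return "".join(structure)


# Nested pairs can be written down; the pairs and length are made-up values.
print(pairs_to_dot_bracket(12, [(1, 12), (2, 11), (4, 10), (5, 9)]))
```

Calling the function with two crossing pairs, for example (1, 6) and (4, 9) in a sequence of length 9, raises a ValueError instead of producing a misleading string.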
Reduced representation of the sequence and structure.
The detailed structure of an IRES segment has been determined in only a few cases (Brown et al., 1991; Kieft et al., 1999; Odreman-Macchioli et al., 2001; Nishiyama et al., 2003; Spahn et al., 2004; Lukavsky and Puglisi, 2005). IRESite includes, and will continue to include, RNA secondary structures that were experimentally determined in the IRES region. We employ the simple yet sufficient bracket representation (Hofacker et al., 1994; for a brief comparison with other secondary structure notations see http://www.tbi.univie.ac.at/~ivo/RNA/RNAlib.html). We are aware of other, richer notations which can also represent atomic coordinates and base modifications (the most comprehensive would be the RNAML syntax; Waugh et al., 2002). However, the published data to be processed contain only limited information (whether an individual base is in a single- or double-stranded region). There is no information about, e.g., post-transcriptional modification of such bases. Further, the structural data to be processed are often partly predicted consensus structures whose prediction was constrained by experimentally determined criteria (whether base-paired or unpaired bases were susceptible to enzymatic cleavage or chemical modification), or were even derived by covariance analysis.
The only losses of information we expect at the moment are the following:
We cannot represent how likely or how strongly a base is within a single- or double-stranded region (e.g. the so-called breathing of structures that is sometimes reported).
We cannot represent some of the base modifications (sequence data are stored in the IRESite database using the extended IUPAC letter codes, but this is still not sufficient). Other databases such as RNABase (Murthy and Rose, 2003) are no further ahead in solving this particular problem (they use the 3-letter syntax of the PDB/NDB databases), while their input data originate from entirely different experiments.
We also doubt that secondary structure prediction programs could make use of the extra information about modified bases (Vienna RNA Package -- Hofacker et al., 1994; MFOLD -- Zuker, 1989, 2003; Thurner et al., 2004; Mathews et al., 2004). There are simply no calorimetric data describing the behaviour of such modified bases, and thus the calculations cannot account for them. Therefore, the user would in any case have to convert the representation to the more reduced format containing only information about paired/unpaired bases (the format we use right now). Nevertheless, we will evaluate the possibilities of including such additional data in the database after more structural records accumulate, and will improve the secondary structure representation if necessary.
Needless to say, most IRES-related publications lack any secondary structure data, and if they do contain some structures, these are mostly computer-based predictions (mostly MFOLD output). So, there are only a few structures (about 20 -- 40 expected) to be accumulated and analyzed. Still, we have to include these few structures in the database, as they are thought to be important for IRES activity and the goal of IRESite is to present all critical experimental data regardless of whether they are known in 10 or 4 000 cases.
Brown E. A., Day S. P., Jansen R. W., Lemon S. M. (1991) The 5' nontranslated region of hepatitis A virus RNA: secondary structure and elements required for translation in vitro. J. Virol. 65: 5828-5838
Hofacker I. L., Fontana W., Stadler P. F., Bonhoeffer S., Tacker M., Schuster P. (1994) Fast Folding and Comparison of RNA Secondary Structures. Monatshefte f. Chemie 125: 167-188
Kieft J. S., Zhou K., Jubin R., Murray M. G., Lau J. Y., Doudna J. A. (1999) The hepatitis C virus internal ribosome entry site adopts an ion-dependent tertiary fold. J. Mol. Biol. 292: 513-529
Lukavsky P. J., Puglisi J. D. (2005) Structure Determination of Large Biological RNAs. Methods Enzymol. 394: 399-416
Mathews D. H., Disney M. D., Childs J. L., Schroeder S. J., Zuker M., Turner D. H. (2004) Incorporating chemical modification constraints into a dynamic programming algorithm for prediction of RNA secondary structure. Proc. Natl. Acad. Sci. U.S.A. 101: 7287-7292
Murthy V. L., Rose G. D. (2003) RNABase: an annotated database of RNA structures. Nucleic Acids Res. 31: 502-504
Nishiyama T., Yamamoto H., Shibuya N., Hatakeyama Y., Hachimori Y., Uchiumi T., Nakashima N. (2003) Structural elements in the internal ribosome entry site of Plautia stali intestine virus responsible for binding with ribosomes. Nucleic Acids Res. 31: 2434-2442
Odreman-Macchioli F., Baralle F. E., Buratti E. (2001) Mutational Analysis of the Different Bulge Regions of Hepatitis C Virus Domain II and Their Influence on Internal Ribosome Entry Site Translational Ability. J. Biol. Chem. 276: 41648-41655
Spahn C. M., Jan E., Mulder A., Grassucci R. A., Sarnow P., Frank J. (2004) Cryo-EM visualization of a viral internal ribosome entry site bound to human ribosomes: the IRES functions as an RNA-based translation factor. Cell 118: 465-475
Thurner C., Witwer C., Hofacker I. L., Stadler P. F. (2004) Conserved RNA secondary structures in Flaviviridae genomes. J. Gen. Virol. 85: 1113-1124
Waugh A., Genderson P., Altman R., Brown J. W., Case D., Gautheret D., Harvey S. C., Leontis N., Westbrook J., Westhof E., Zuker M., Major F. (2002) RNAML: A standard syntax for exchanging RNA information. RNA 8: 707-717
Zuker M. (1989) On finding all suboptimal foldings of an RNA molecule. Science 244: 48-52
Zuker M. (2003) Mfold web server for nucleic acid folding and hybridization prediction. Nucleic Acids Res. 31: 3406-3415