<?xml version='1.0'?>
<!DOCTYPE art SYSTEM 'http://www.biomedcentral.com/xml/article.dtd'>
<art>
	<ui>1742-4682-3-28</ui>
	<ji>1742-4682</ji>
	<fm>
		<dochead>Research</dochead>
		<bibl>
			<title>
				<p>Indications that "codon boundaries" are physico-chemically defined and that protein-folding information is contained in the redundant exon bases</p>
			</title>
			<aug>
				<au id="A1" ca="yes">
					<snm>Biro</snm>
					<mnm>Charles</mnm>
					<fnm>Jan</fnm>
					<insr iid="I1"/>
					<email>jan.biro@sbcglobal.net</email>
				</au>
			</aug>
			<insg>
				<ins id="I1">
					<p>Homulus Foundation, San Francisco, CA 94105, USA</p>
				</ins>
			</insg>
			<source>Theoretical Biology and Medical Modelling</source>
			<issn>1742-4682</issn>
			<pubdate>2006</pubdate>
			<volume>3</volume>
			<issue>1</issue>
			<fpage>28</fpage>
			<url>http://www.tbiomed.com/content/3/1/28</url>
			<xrefbib>
				<pubidlist><pubid idtype="pmpid">16893453</pubid><pubid idtype="doi">10.1186/1742-4682-3-28</pubid>
				</pubidlist></xrefbib>
		</bibl>
		<history>
			<rec>
				<date>
					<day>16</day>
					<month>12</month>
					<year>2005</year>
				</date>
			</rec>
			<acc>
				<date>
					<day>07</day>
					<month>8</month>
					<year>2006</year>
				</date>
			</acc>
			<pub>
				<date>
					<day>07</day>
					<month>8</month>
					<year>2006</year>
				</date>
			</pub>
		</history>
		<cpyrt>
			<year>2006</year>
			<collab>Biro; licensee BioMed Central Ltd.</collab>
			<note>This is an Open Access article distributed under the terms of the Creative Commons Attribution License (<url>http://creativecommons.org/licenses/by/2.0</url>), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.</note>
		</cpyrt>
		<abs>
			<sec>
				<st>
					<p>Abstract</p>
				</st>
				<sec>
					<st>
						<p>Background</p>
					</st>
					<p>All the information necessary for protein folding is supposed to be present in the amino acid sequence. It is still not possible to provide specific <it>ab initio </it>structure predictions by bioinformatical methods. It is suspected that additional folding information is present in protein coding nucleic acid sequences, but this is not represented by the known genetic code.</p>
				</sec>
				<sec>
					<st>
						<p>Results</p>
					</st>
					<p>Nucleic acid subsequences comprising the 1st and/or 3rd codon residues in mRNAs express significantly higher free folding energy (FFE) than the subsequence containing only the 2nd residues (<it>p </it>&lt; 0.0001, <it>n </it>= 81). This periodic FFE difference is not present in introns. It is therefore a specific physico-chemical characteristic of coding sequences and might contribute to unambiguous definition of codon boundaries during translation. The FFEs of the 1st and 3rd residues are additive, which suggests that these residues contain a significant number of complementary bases, and this complementarity may contribute to selection for local RNA secondary structures in coding regions. This periodic, codon-related structure-formation of mRNAs indicates a connection between the structures of exons and the corresponding (translated) proteins. The folding energy dot plots of RNAs and the residue contact maps of the coded proteins are indeed similar. Residue contact statistics using 81 different protein structures confirmed that amino acids that are coded by partially reverse and complementary codons (Watson-Crick (WC) base pairs at the 1st and 3rd codon positions and translated in reverse orientation) are preferentially co-located in protein structures.</p>
				</sec>
				<sec>
					<st>
						<p>Conclusion</p>
					</st>
					<p>Exons are distinguished from introns, and codon boundaries are physico-chemically defined, by periodically distributed FFE differences between codon positions. There is a selection for local RNA secondary structures in coding regions and this nucleic acid structure resembles the folding profiles of the coded proteins. The preferentially (specifically) interacting amino acids are coded by partially complementary codons, which strongly supports the connection between mRNA and the corresponding protein structures and indicates that there is protein folding information in nucleic acids that is not present in the genetic code. This might suggest an additional explanation of codon redundancy.</p>
				</sec>
			</sec>
		</abs>
	</fm>
	<bdy>
		<sec>
			<st>
				<p>Background</p>
			</st>
			<p>The protein folding problem has been one of the grand challenges in computational molecular biology. The problem is to predict the native three-dimensional structure of a protein from its amino acid sequence. It is widely believed that the amino acid sequence contains all the information necessary to make up the correct three-dimensional structure, since protein folding is apparently thermodynamically determined; i.e., given a proper environment, a protein will fold up spontaneously. This is called Anfinsen's thermodynamic principle <abbrgrp><abbr bid="B1">1</abbr></abbrgrp>.</p>
			<p>The thermodynamic principle has been confirmed many times on many different kinds of proteins in vitro. Critics say that in vivo chemical conditions differ from those in vitro, that correct folding is determined by interactions with other molecules (chaperons, hormones, substrates, etc.), and that protein folding is much more complex than the renaturation of denatured poly-amino acids. The fact that many naturally-occurring proteins fold reliably and quickly to their native state, despite the astronomical number of possible configurations, has come to be known as Levinthal's Paradox <abbrgrp><abbr bid="B2">2</abbr></abbrgrp>.</p>
			<p>Anfinsen's principle was formulated in the 1960s using purely chemical experiments and a lot of intuition. Today, many sequences and structures are available to establish a logical and understandable link between sequence, structure and function. But it is still not possible to predict the structure (or a range of possible structures) correctly from the sequence alone, ab initio and in silico <abbrgrp><abbr bid="B3">3</abbr></abbrgrp>.</p>
			<p>There are two potential external sources of additional and specific protein folding information: (a) the chaperons (other proteins that assist in the folding of proteins and nucleic acids <abbrgrp><abbr bid="B4">4</abbr></abbrgrp>); and (b) the protein-coding nucleic acid sequences themselves (which are templates for protein synthesis, but are not defined as chaperons).</p>
			<p>The idea that the nucleotide sequence itself could modulate translation and hence affect co-translational folding and assembly of proteins has been investigated in a number of studies <abbrgrp><abbr bid="B5">5</abbr><abbr bid="B6">6</abbr><abbr bid="B7">7</abbr></abbrgrp>. Studies on the relationships between synonymous codon usage and protein secondary structural units are especially popular <abbrgrp><abbr bid="B8">8</abbr><abbr bid="B9">9</abbr><abbr bid="B10">10</abbr></abbrgrp>. The genetic code is redundant (61 codons code 20 amino acids) and as many as 6 synonymous codons can code the same amino acid (Arg, Leu, Ser). The "wobble" base has no effect on the meaning of most codons, but nevertheless codon usage (wobble usage) is not randomly defined <abbrgrp><abbr bid="B11">11</abbr><abbr bid="B12">12</abbr></abbrgrp> and there are well-known, stable species-specific differences in codon usage. It seems logical to search for some meaning (biological purpose) for the wobble bases and try to associate them with protein folding.</p>
			<p>Another observation concerning the code redundancy dilemma is that there is a widespread selection (preference) for local RNA secondary structure in protein coding regions <abbrgrp><abbr bid="B13">13</abbr></abbrgrp>. A given protein can be encoded by a large number of distinct mRNA species, potentially allowing mRNAs to optimize desirable RNA structural features simultaneously, in addition to their protein coding function. The immediate question is whether there is some logical connection between the possible optimal RNA structures and the possible optimal biologically active protein structures.</p>
		</sec>
		<sec>
			<st>
				<p>Methods</p>
			</st>
			<p>Single-stranded RNA molecules can form local secondary structures through the interactions of complementary segments. Watson-Crick (WC) base pair formation lowers the average free energy, d<it>G</it>, of the RNA, and the magnitude of the change is proportional to the number of base pairs formed. Therefore the free folding energy (FFE) is used to characterize the local complementarity of nucleic acids <abbrgrp><abbr bid="B13">13</abbr></abbrgrp>. The free folding energy is defined as FFE = (d<it>G</it><sub>shuffled </sub>- d<it>G</it><sub>native</sub>)/<it>L </it>&#215; 100, where <it>L </it>is the length of the nucleic acid, i.e., the free energy difference between native and shuffled (randomized) nucleic acids per 100 nucleotides. Higher positive values indicate a stronger bias toward secondary structure in the native mRNA, and negative values indicate a bias against secondary structure in the native mRNA.</p>
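			<p>As a minimal sketch of the FFE definition above (not the workflow actually used in this study), the calculation can be written in Python. The function name free_folding_energy, its parameters and the numeric values in the usage lines are illustrative assumptions; in practice the d<it>G </it>values would come from a folding tool such as mfold. Averaging over several shuffled sequences is also an assumption of the sketch.</p>

```python
from statistics import mean

def free_folding_energy(dg_native, dg_shuffled, length):
    """FFE = (dG_shuffled - dG_native) / L * 100.

    dg_native   -- lowest predicted dG of the native sequence (kcal/mol)
    dg_shuffled -- one dG, or a list of dG values, for shuffled sequences
                   (averaging several shuffles is an assumption of this sketch)
    length      -- sequence length L in nucleotides
    """
    if not length > 0:
        raise ValueError("sequence length must be positive")
    if isinstance(dg_shuffled, (int, float)):
        dg_shuffled = [dg_shuffled]
    return (mean(dg_shuffled) - dg_native) / length * 100.0

# Hypothetical values: the native sequence folds more stably (more negative dG)
# than its shuffled versions, so the FFE comes out positive.
ffe = free_folding_energy(dg_native=-95.0, dg_shuffled=[-80.0, -84.0], length=300)
print(round(ffe, 2))
```

			<p>With this sign convention a positive result means that the native sequence is more stably folded than its randomized counterparts, which is the bias toward secondary structure described above.</p>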
			<p>We used the nucleic acid secondary structure prediction tool <it>mfold </it><abbrgrp><abbr bid="B14">14</abbr></abbrgrp> to obtain d<it>G </it>values, and the lowest d<it>G </it>was used to calculate the FFE. <it>mfold </it>also provided the folding energy dot plots, which are very useful for visualizing the energetically most favored structures in a 2D matrix.</p>
			<p>Two Java tools were used: SeqX, to visualize the protein structures in 2D as amino acid residue contact maps <abbrgrp><abbr bid="B15">15</abbr></abbrgrp>; and SeqForm, to select sequence residues in predefined phases (every third in our case) <abbrgrp><abbr bid="B16">16</abbr></abbrgrp>. Structural data were downloaded from PDB <abbrgrp><abbr bid="B17">17</abbr></abbrgrp>, NDB <abbrgrp><abbr bid="B18">18</abbr></abbrgrp>, and the Integrated Sequence-Structure Database (ISSD) <abbrgrp><abbr bid="B19">19</abbr></abbrgrp>.</p>
			<p>Structures were generally randomly selected regarding species and biological function (a few exceptions are mentioned in the Results). Care was taken to avoid very similar structures in the selections. A propensity for alpha helices was monitored during selection and structures with very high and very low alpha helix contents were also selected to ensure a wide range of structural representations.</p>
			<p>Linear regression analyses and Student's <it>t</it>-tests were used for statistical analysis of the results.</p>
		</sec>
		<sec>
			<st>
				<p>Results</p>
			</st>
			<p>A selection of 81 different protein structures together with the corresponding protein and coding sequences was used for this study. These 81 proteins represented different (randomly selected) species and different (also randomly selected) protein functions and therefore the results might be regarded as more generally valid. The propensity for different secondary structure elements was recorded (as annotated in different databases). The proportion of alpha helices ranged from 0 to 90% in the 81 proteins and showed a significant negative correlation to the proportion of beta sheets (not shown). The coding sequences were phase separated by SeqForm into three subsequences, each containing only the 1<sup>st</sup>, 2<sup>nd </sup>and 3<sup>rd </sup>letters of the codons. Similar phase separation was made for intronic sequences immediately before and after the exon. There are, of course, no known codons in the intronic sequences, therefore we continued the same phase that we applied to the exon, assuming that this kind of selection is correct, and maintained the denotation of the phase even for non-coding regions. Subsequences corresponding to the 1<sup>st </sup>and 3<sup>rd </sup>codon letters in the coding regions had significantly higher FFEs than subsequences corresponding to the 2<sup>nd </sup>codon letters. No such difference was seen in non-coding regions (Figure <figr fid="F1">1A&#8211;C</figr>).</p>
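			<p>The phase separation described above can be illustrated with a short Python sketch. It is not the SeqForm implementation; the function names and the four-codon example sequence are hypothetical, and dropping an incomplete final codon is an assumption made only to keep the example simple.</p>

```python
def phase_subsequences(coding_seq):
    """Split a coding sequence into its 1st, 2nd and 3rd codon-position letters."""
    usable = len(coding_seq) - len(coding_seq) % 3   # drop an incomplete final codon
    seq = coding_seq[:usable]
    return {pos: seq[pos - 1::3] for pos in (1, 2, 3)}

def keep_positions(coding_seq, positions):
    """Keep only the given codon positions, e.g. (1, 3) removes every 2nd letter."""
    usable = len(coding_seq) - len(coding_seq) % 3
    codons = [coding_seq[i:i + 3] for i in range(0, usable, 3)]
    return "".join(codon[p - 1] for codon in codons for p in sorted(positions))

example = "AUGAAGCGUCCU"   # hypothetical sequence: AUG AAG CGU CCU
print(phase_subsequences(example))
print(keep_positions(example, (1, 3)))
```

			<p>The same slicing can be applied to a flanking non-coding sequence by continuing the reading phase of the exon, as was done here for the intronic sequences.</p>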
			<fig id="F1">
				<title>
					<p>Figure 1</p>
				</title>
				<caption>
					<p>Free folding energies in different codon residues</p>
				</caption>
				<text>
					<p><b>Free folding energies in different codon residues</b>. Free folding energies (FFE) were determined in phase-selected subsequences of 81 different genes. The original nucleic acids contained the intact three-letter codons (1<sup>st</sup>+2<sup>nd</sup>+3<sup>rd</sup>). Subsequences were constructed by periodic removal of one letter from the codon and maintaining the other two (1<sup>st</sup>+2<sup>nd</sup>, 1<sup>st</sup>+3<sup>rd</sup>, 2<sup>nd</sup>+3<sup>rd</sup>) or removing two letters and maintaining only one (1<sup>st</sup>, 2<sup>nd</sup>, 3<sup>rd</sup>). Distinction was made between exons (B and D) and the preceding (-1, A) and following (+1, C) sequences (introns). The d<it>G </it>values were determined by <it>mfold </it>and the FFE was calculated. Each bar represents the mean &#177; SEM, <it>n </it>= 81.</p>
				</text>
				<graphic file="1742-4682-3-28-1"/>
			</fig>
			<p>Higher FFEs in subsequences of 1<sup>st </sup>and 3<sup>rd </sup>codon residues than in the 2<sup>nd </sup>indicate the presence of a larger number of complementary bases at the right positions of these subsequences. However, this might be the case only because the first and last codon residues form simpler subsequences and contain longer repeats of the same nucleotide than the 2<sup>nd </sup>residues. This would not be surprising for the 3<sup>rd </sup>(wobble) base but would not be expected for the 1<sup>st </sup>residue, even though it is known that the central codon letters are the most important for distinguishing among amino acids (as shown in the <it>Common Periodic Table of Codons and Amino Acids </it><abbrgrp><abbr bid="B20">20</abbr></abbrgrp>). It is more significant that the FFEs for the 1<sup>st </sup>and 3<sup>rd </sup>residues are additive and together they represent the entire FFE of the intact mRNA (Figure <figr fid="F1">1D</figr>).</p>
			<p>There is a correlation between the protein structure and the FFEs associated with codon residues. This correlation is especially prominent when the FFE ratios are compared to the helix/sheet ratios (Figure <figr fid="F2">2</figr>).</p>
			<fig id="F2">
				<title>
					<p>Figure 2</p>
				</title>
				<caption>
					<p>FFE associated with codon positions vs. protein structure</p>
				</caption>
				<text>
					<p><b>FFE associated with codon positions vs. protein structure</b>. Free folding energies associated with 1<sup>st</sup>, 2<sup>nd </sup>and 3<sup>rd </sup>codon residues in 78 different mRNA sequences were calculated and compared to the helix/sheet ratios of the corresponding protein structures by linear regression analysis. Pink symbols represent the linear regression line.</p>
				</text>
				<graphic file="1742-4682-3-28-2"/>
			</fig>
			<p>The unique, codon-related FFE pattern and its correlation to alpha helix content suggested some similarity between protein structures and the possible structures of the coding sequences. This possibility was examined by visual comparison of 16 randomly selected protein residue contact maps and the energy dot plots of the corresponding RNAs. We could see similarities between the two different kinds of maps (Figure <figr fid="F3">3</figr>). However, this type of comparison is not quantitative and direct statistical evaluation is not possible.</p>
			<fig id="F3">
				<title>
					<p>Figure 3</p>
				</title>
				<caption>
					<p>Comparison of protein and corresponding mRNA structures</p>
				</caption>
				<text>
					<p><b>Comparison of protein and corresponding mRNA structures</b>. Residue contact maps (RCM) were obtained from the PDB files of protein structures using the SeqX tool (left triangles). Energy dot plots (EDP) for the coding sequences were obtained using the mfold tool (right triangles). The two kinds of maps were aligned along a common left diagonal axis to allow easy visual comparison of the different kinds of representation. The black dots in the RCMs indicate amino acids that are within 6&#197; of each other in the protein structure. The colored (grass-like) areas in the EDPs indicate the energetically most likely RNA interactions (color code in increasing order: yellow, green, red, black).</p>
				</text>
				<graphic file="1742-4682-3-28-3"/>
			</fig>
			<p>Another similar, but still not quantitative, comparison of protein and coding structures was performed on four proteins that are known to have very similar 3D structures although their primary structures (sequences), and their mRNA sequences, are less than 30% similar. These four proteins exemplify the fact that the tertiary structures of proteins are much more conserved than the amino acid sequences. We asked whether this is also true for the RNA structures and sequences. We found that there are signs of conservation even in the RNA secondary structure (as indicated by the energy dot plots) and there are similarities between the protein and nucleic acid structures (Figure <figr fid="F4">4</figr>). Comparisons of the protein residue contact map with the nucleic acid folding maps suggest similarities between the 3D structures of these different kinds of molecules. However, this is a semi-quantitative method.</p>
			<fig id="F4">
				<title>
					<p>Figure 4</p>
				</title>
				<caption>
					<p>Comparison of protein and mRNA secondary structures</p>
				</caption>
				<text>
					<p><b>Comparison of protein and mRNA secondary structures</b>. Residue contact maps (RCM) were obtained from the PDB files of 4 protein structures (1CBI, 1EIO, 1IFC, 1OPA) using the SeqX tool (left column). Energy dot plots (EDP) for the coding sequences were obtained using the mfold tool (right column). The left diagonal portions of these two kinds of maps are compared in the central part of the figure. Blue horizontal lines in the background correspond to the main amino acid co-location sites in the RCM. Intact RNA (123) as well as subsequences containing only the 1st and 3rd codon letters (13) are compared. The black dots in the RCMs indicate amino acids that are within 6&#197; of each other in the protein structure. The colored (grass-like) areas in the EDPs indicate the energetically most likely RNA interactions (color code in increasing order: yellow, green, red, black).</p>
				</text>
				<graphic file="1742-4682-3-28-4"/>
			</fig>
			<p>More direct statistical support might be obtained by analyzing and comparing residue co-locations in these structures. Assume that the structural unit of mRNA is a tri-nucleotide (codon) and the structural unit of the protein is the amino acid. The codon may form a secondary structure by interacting with other codons according to the WC base complementary rules, and contribute to the formation of a local double helix. The 5'-A1U2G3-3' sequence (Met, M codon) forms a perfect double string with the 3'-U3A2C1-5' sequence (His, H codon, reverse and complementary reading). Suboptimal complexes are 5'-A1X2G3-3' partially complemented by 3'-U3X2C1-5' (AAG, Lys; AUG, Met; AGG, Arg; ACG, Thr; and CAU, His; CUU, Leu; CGU, Arg; CCU, Pro, respectively).</p>
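			<p>The codon pairing in this paragraph can be checked with a small Python sketch based on the standard genetic code. The helper names are hypothetical, only Watson-Crick pairs are considered (G-U wobble pairing is ignored), and all codons are written 5' to 3'.</p>

```python
BASES = "UCAG"
# Standard genetic code in UCAG order; '*' marks stop codons.
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {a + b + c: AMINO[16 * i + 4 * j + k]
               for i, a in enumerate(BASES)
               for j, b in enumerate(BASES)
               for k, c in enumerate(BASES)}
PAIR = {"A": "U", "U": "A", "G": "C", "C": "G"}   # Watson-Crick pairs only

def reverse_complement(codon):
    """Perfect partner of a codon, written 5' to 3'."""
    return "".join(PAIR[base] for base in reversed(codon))

def partial_rc_partners(codon):
    """Partners complementary at the 1st and 3rd positions; the 2nd is free (X)."""
    return [PAIR[codon[2]] + x + PAIR[codon[0]] for x in BASES]

perfect = reverse_complement("AUG")
print("AUG", CODON_TABLE["AUG"], "pairs perfectly with", perfect, CODON_TABLE[perfect])
for middle in BASES:
    codon = "A" + middle + "G"
    print(codon, CODON_TABLE[codon])
for partner in partial_rc_partners("AUG"):
    print(partner, CODON_TABLE[partner])
```

			<p>Run on the example above, the sketch pairs AUG (Met) with CAU (His), and lists Lys, Met, Arg and Thr for the 5'-AXG-3' codons and His, Leu, Arg and Pro for their 1st and 3rd position partners, in agreement with the codon sets given in the text.</p>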
			<p>I searched for some pattern in the codons of co-locating amino acids and analyzed the frequencies of the 8 possible patterns in the 64 nucleic acid triplets (Figure <figr fid="F5">5</figr>). The codons were either complementary to each other in all three (-123-) or at least 2 (-12X-, -1X3-, -X23-) codon positions. In these latter cases the codon complementarity was partial, because complementarity was not required for one codon position (X). The complementary codons were translated in the same (5'&gt;3' &amp; 3'&gt;5', only complementary, C) or reversed and complementary (5'&gt;3' &amp; 5'&gt;3', RC) directions.</p>
			<fig id="F5">
				<title>
					<p>Figure 5</p>
				</title>
				<caption>
					<p>Amino acid pairs coded by complementary codons</p>
				</caption>
				<text>
					<p><b>Amino acid pairs coded by complementary codons</b>. Two optimal (perfect) and six suboptimal (partial) codon complementarity situations (codon codes) are listed. In the perfect complementarity situation a codon (AUG), which is transcribed from the sense (pos) DNA strand in the 5'&gt;3' direction, is complemented with the UAC codon that is transcribed from the antisense (neg) DNA strand in complementary (C-123) or reverse-complementary (RC-321) orientations. In the suboptimal codon codes, one codon residue is undefined (X) and may or may not be complemented in the corresponding codon on the negative strand in that residue position. Translation of the codon and its complementary pair will result in different amino acid pairs, depending on the codon pattern. This is illustrated in examples where the undefined X residues are uniformly replaced by A in the positive and U in the negative strands. For example, the meaning of 5'-AAG-3'/3'-UUC-5' (from the D_1X3/RC_3X1 codon code pattern) is that this codon pair will be translated into the amino acids Lys (K) and Leu (L) and will result in K&gt;&lt;L residue pairs. The letter -P at the end of a codon pattern indicates the presence (-P), in contrast to the non-presence (-N), of that particular codon code to determine a specific amino acid co-location in a concrete protein structure (this is used in Figure 6).</p>
				</text>
				<graphic file="1742-4682-3-28-5"/>
			</fig>
			<p>These perfectly or partially complementary codon patterns, read in direct or opposite directions, defined 8 different ways in which amino acids can be paired on the basis of their codon complementarities: the 8 possible amino acid &#8211; amino acid (or protein-protein) interaction codes.</p>
			<p>Our experiments with FFE indicate that local nucleic acid structures are formed under this suboptimal condition, i.e., when the 1<sup>st </sup>and 3<sup>rd </sup>codon residues are complementary but the 2<sup>nd </sup>is not. If this is the case, and there is a connection between nucleic acid and protein 3D structures, one might expect that the 4 amino acids coded by 5'-A1X2G3-3' codons will preferentially co-locate with 4 other amino acids coded by 3'-U3X2C1-5' codons. We have constructed 8 different complementary codon combinations and found that the codons of co-locating amino acids are often complementary at the 1<sup>st </sup>and 3<sup>rd </sup>positions and follow the D-1X3/RC-3X1 formula but not the 7 other formulae (Figure <figr fid="F6">6A&#8211;B</figr>). This means that amino acids that are coded by partially reverse and complementary codons (WC base pairs at the 1<sup>st </sup>and 3<sup>rd </sup>codon positions and translated in reverse orientation) are preferentially co-located in protein structures.</p>
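			<p>One way to picture how an amino acid pair might be scored as positive (P) or negative (N) under the D-1X3/RC-3X1 formula is sketched below in Python. This is an illustration under stated assumptions, not the SeqX procedure: the function name is hypothetical, and treating a pair as positive when any codon of one amino acid matches any codon of the other is an assumption, since the handling of synonymous codons is not specified here.</p>

```python
BASES = "UCAG"
# Standard genetic code in UCAG order; '*' marks stop codons.
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
PAIR = {"A": "U", "U": "A", "G": "C", "C": "G"}   # Watson-Crick pairs only

# Map each amino acid (one-letter code) to all of its codons.
CODONS_FOR = {}
for i, a in enumerate(BASES):
    for j, b in enumerate(BASES):
        for k, c in enumerate(BASES):
            CODONS_FOR.setdefault(AMINO[16 * i + 4 * j + k], []).append(a + b + c)

def follows_1x3_rc(aa1, aa2):
    """True if some codon of aa1 and some codon of aa2 (both written 5' to 3')
    are complementary at the 1st and 3rd positions in antiparallel alignment.
    The 'some codon' rule is an assumption of this sketch."""
    return any(PAIR[c1[0]] == c2[2] and PAIR[c1[2]] == c2[0]
               for c1 in CODONS_FOR[aa1] for c2 in CODONS_FOR[aa2])

# Lys/Leu and Met/His are the examples given in the text; Met/Trp is a
# hypothetical negative example (AUG versus UGG).
for pair in (("K", "L"), ("M", "H"), ("M", "W")):
    print(pair, "P" if follows_1x3_rc(*pair) else "N")
```

			<p>Because Watson-Crick pairing is symmetric, the test gives the same answer whichever amino acid of the pair is listed first.</p>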
			<fig id="F6">
				<title>
					<p>Figure 6</p>
				</title>
				<caption>
					<p>Complementary codes vs. amino acid co-locations</p>
				</caption>
				<text>
					<p><b>Complementary codes vs. amino acid co-locations</b>. A: The propensities for the 400 possible amino acid pairs were monitored in 81 different protein structures with the SeqX tool. The tool detected co-locations when two amino acids were closer than 6&#197; to each other (neighbors on the same strand were excluded). The total number of co-locations was 34,630. Eight different complementary codes were constructed for the codons (two optimal and six suboptimal). In the two optimal codes all three codon residues (123) were complementary (C) or reverse-complementary (RC) to each other. In the suboptimal codes only two of three codon residues were C or RC to each other (12, 13, 23), while the third was not necessarily complementary (X). (For example, complementary code RC_3X1 means that the first and third codon letters are always complementary (to D_1X3), but not the second, and the possible codons are read in reverse orientation). The 400 co-locations were divided into 20 subgroups corresponding to 20 amino acids (one of the co-locating pairs), each group containing 20 amino acids (corresponding to the other amino acids in each co-locating pair). If the codons of the amino acid pairs followed the predefined complementary code, the co-location was regarded as positive (P); if not, the co-location was regarded as negative (N). Each symbol represents the mean frequency of P or N co-locations corresponding to the indicated amino acid. Paired Student's <it>t</it>-test, <it>n </it>= 20. B: The ratio of positive (P) and negative (N) co-locations was calculated on data from (A). Each bar represents the mean &#177; SEM, <it>n </it>= 20.</p>
				</text>
				<graphic file="1742-4682-3-28-6"/>
			</fig>
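			<p>The co-location criterion in the Figure 6 legend (two amino acids closer than 6&#197;, with chain neighbors excluded) can be sketched in Python as follows. This is not the SeqX code. Using a single representative coordinate per residue, interpreting "neighbors" as residues adjacent in the sequence, and the toy coordinates are all assumptions of the sketch.</p>

```python
from math import dist   # Python 3.8 or later

def co_locations(residues, cutoff=6.0, min_separation=2):
    """Return (i, j, aa_i, aa_j) for residue pairs closer than `cutoff` angstroms.

    residues       -- list of (one_letter_code, (x, y, z)) in chain order; one
                      representative coordinate per residue is an assumption
    min_separation -- 2 skips residues adjacent in the sequence (assumed to be
                      the excluded 'neighbors')
    """
    pairs = []
    for i in range(len(residues)):
        for j in range(i + min_separation, len(residues)):
            (aa_i, xyz_i), (aa_j, xyz_j) = residues[i], residues[j]
            if cutoff > dist(xyz_i, xyz_j):
                pairs.append((i, j, aa_i, aa_j))
    return pairs

# Hypothetical four-residue chain with made-up coordinates (angstroms).
toy_chain = [("M", (0.0, 0.0, 0.0)), ("K", (3.8, 0.0, 0.0)),
             ("L", (5.0, 3.0, 0.0)), ("H", (20.0, 0.0, 0.0))]
print(co_locations(toy_chain))
```

			<p>Counting the returned pairs by amino acid identity would give a 20 by 20 co-location table of the kind summarized in Figure 6.</p>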
		</sec>
		<sec>
			<st>
				<p>Discussion</p>
			</st>
			<p>It is well known that coding and non-coding DNA sequences (exons/introns) are different and this difference is somehow related to the asymmetry of the codons, i.e. that the third codon letter (wobble) is less important in defining the meaning of the codon than the first and second letters. Many Markov models have been formulated to find this asymmetry and predict coding sequences (genes) de novo. These in silico methods work rather well but not perfectly and some scientists remain unconvinced that codon asymmetry explains the exon-intron differences satisfactorily.</p>
			<p>Another codon-related problem is that the well-known, non-overlapping, triplet codon translation process is extremely phase-dependent and there is theoretically no tolerance for any phase shift. There are famous examples of single nucleotide deletions that destroy the meaningful translation of a sequence and are incompatible with life. However, considering the magnitude and complexity of the eukaryotic proteome, the precision of translation is astonishingly good. Such physical precision is not possible without a massive and consistent physico-chemical underpinning. Therefore, discovery of the existence of secondary structure bias (folding energy differences) in coding regions of many organisms <abbrgrp><abbr bid="B13">13</abbr></abbrgrp> was very welcome because it clearly defined codon boundaries on a physico-chemical basis.</p>
			<p>Our experiments with free folding energy (FFE) confirmed that this bias exists. In addition, there is a very consistent and very significant pattern of FFE distribution along the nucleotide sequence. Comparing the FFEs of phase-selected subsequences, those subsequences comprising only the 1<sup>st </sup>or only the 3<sup>rd </sup>codon letters showed significantly higher FFE than those consisting only of the 2<sup>nd </sup>letters. This FFE difference was not present in the intronic sequences preceding and following the exons, but it was present in exons from different species. This is an interesting observation because these phenomena might not only distinguish between exons and introns on a physico-chemical basis, but might also clearly define the tri-nucleotide codons and thus the phase of the translation. This codon-related phase-specific variation in FFE may explain why mRNAs have greater negative free folding energies than shuffled or codon choice randomized sequences <abbrgrp><abbr bid="B21">21</abbr></abbrgrp>.</p>
			<p>Free folding energy in nucleic acids is always associated with WC base pair formation. A higher FFE indicates more WC pairs (presence of complementarity) and a lower FFE indicates fewer WC pairs (less complementarity). The FFEs in the 1<sup>st </sup>and 3<sup>rd </sup>codon positions were additive, while the 2<sup>nd </sup>letter did not contribute to the total FFE; the total FFE of the entire (intact) nucleic acid was the same as that of subsequences containing only the 1<sup>st </sup>and 3<sup>rd </sup>codon letters (2<sup>nd </sup>deleted). This indicates that local RNA secondary structure bias is caused by complementarity of the 1<sup>st </sup>and 3<sup>rd </sup>codon residues in local sequences. This partial, local complementarity is more optimal in reverse orientation of the local sequences, as expected with loop formations.</p>
			<p>FFEs are obtained by considering free folding energies of substrings that do not represent the real molecule: forcing nucleotides to be consecutive is an extreme methodological approach to measuring the structural features of coding sequences. However, this bioinformatical method has been successfully used by others <abbrgrp><abbr bid="B13">13</abbr><abbr bid="B21">21</abbr></abbrgrp>. In addition, the behaviors of the 1<sup>st</sup>, 2<sup>nd </sup>and 3<sup>rd </sup>codon bases separately are useful for showing that the 2<sup>nd </sup>codon position does not have the same significance in the codons as the other two positions <abbrgrp><abbr bid="B20">20</abbr></abbrgrp>. Intronic sequences do not contain codons and consequently they show no position-related periodic FFE variation.</p>
			<p>It is known that single-stranded RNA molecules can form local secondary structures through the interactions of complementary segments. The novel observation here is that these interactions preferentially involve the 1<sup>st </sup>and 3<sup>rd </sup>codon residues. This connection between the RNA secondary structure and codons immediately directs attention toward the question of protein folding and its long-suspected connection to RNA folding <abbrgrp><abbr bid="B22">22</abbr><abbr bid="B23">23</abbr></abbrgrp>.</p>
			<p>Only about one-third (20/64) of the genetic code is used for protein coding, i.e., there is a great excess of information in the mRNA. At the same time, the information carried by amino acids seems to be insufficient (as stated by some scientists) to complete unambiguous protein folding. Therefore, it is believed that the third codon residue (wobble base) contains information additional to that already present in the genetic code. A specialized database, the ISSD <abbrgrp><abbr bid="B19">19</abbr></abbrgrp>, was established in an effort to connect different features of protein structure to wobble bases <abbrgrp><abbr bid="B24">24</abbr></abbrgrp> with more or less success.</p>
			<p>We found a significant correlation between FFE ratios and the helix/sheet contents of protein structures. It was possible to make direct visual comparison of mRNA structures (as statistically predicted by mfold energy dot-plots) and protein structures (as 2D residue contact maps). This method suggests similarity between nucleic acid and protein structures.</p>
			<p>It is known that some complex protein structures are very similar even if there is less than 30% sequence similarity. It was interesting to see whether the same principle might apply to nucleic acids, and structural similarity might exist even when the sequence similarity is low. Furthermore, significant similarity between nucleic acid and protein structures might exist even without translational connection. Structure seems to be more preserved, even in nucleic acids, than sequence. However, although the matrix comparisons are suggestive, they remain semi-quantitative. Better support was necessary.</p>
			<p>A working hypotheses grew out of these observations, namely that (a) partial, local reverse-complementarity exists in nucleic acids that form the nucleic acid structure; (b) there is some degree of similarity between the folding of nucleic acids and proteins; (c) protein structure determines the amino acid co-locations; (4) in consequence, amino acids coded by interacting (partially reverse complementary) codons might show preferential co-locations in the protein structures. And it seems to be the case: codons that contain complementary bases at the 1<sup>st </sup>and 3<sup>rd </sup>positions and are translated in reverse orientation result in preferentially co-located (interacting) amino acids in the 3D protein structure. Other complementary residue combinations or translation in the same (not reverse) direction (as many as seven combinations in total) did not result in any preferentially co-locating subset of amino acid pairs.</p>
			<p>Construction of residue contact maps for protein structures and statistical evaluation of residue co-locations is a frequently used method for visualization and analysis of spatial connections among amino acids <abbrgrp><abbr bid="B25">25</abbr><abbr bid="B26">26</abbr><abbr bid="B27">27</abbr></abbrgrp>. The amino acid co-locations in real protein structures are clearly not random <abbrgrp><abbr bid="B28">28</abbr><abbr bid="B29">29</abbr></abbrgrp> and therefore residue co-location matrices are often used to assist in the prediction of novel protein structures <abbrgrp><abbr bid="B30">30</abbr><abbr bid="B31">31</abbr></abbrgrp>. We have carefully examined the physico-chemical properties of specifically interacting amino acids in and between protein structures, and concluded that these interactions follow the well-known physico-chemical rules of size, charge and hydrophobic compatibility (unpublished data), well in line with Anfinsen's prediction. The present study supports the conclusion that there is a previously unknown connection between the codons of specifically interacting amino acids: those codons are complementary at the 1<sup>st </sup>and 3<sup>rd </sup>(but not the 2<sup>nd</sup>) codon positions.</p>
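			<p>A residue contact map and the corresponding co-location counts can be illustrated as follows. This is a minimal sketch under stated assumptions, not the procedure used by SeqX: each residue is represented by a single point (for example its alpha carbon), two residues count as co-located when these points lie within a cutoff of 8 angstrom, and the coordinates and sequence are invented toy data. The function names and the cutoff value are hypothetical.</p>
			<p><![CDATA[
```python
import math


def contact_map(coords, cutoff=8.0):
    """Square boolean matrix: True where two different residues lie
    within `cutoff` (same length unit as the coordinates)."""
    n = len(coords)
    return [
        [i != j and math.dist(coords[i], coords[j]) <= cutoff for j in range(n)]
        for i in range(n)
    ]


def co_location_counts(sequence, contacts, min_separation=1):
    """Count contacts per unordered amino acid pair.
    `min_separation` skips residues closer than this along the chain."""
    counts = {}
    for i in range(len(sequence)):
        for j in range(i + min_separation, len(sequence)):
            if contacts[i][j]:
                pair = tuple(sorted((sequence[i], sequence[j])))
                counts[pair] = counts.get(pair, 0) + 1
    return counts


# Toy data (invented): four residues placed along a line
sequence = "ADKL"
coords = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (7.6, 0.0, 0.0), (20.0, 0.0, 0.0)]

contacts = contact_map(coords)
counts = co_location_counts(sequence, contacts)
print(counts)
```
]]></p>
			<p>In a real analysis such counts would be compared with the counts expected from the amino acid frequencies alone, so that preferential co-location can be distinguished from mere abundance.</p>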
			<p>The idea that sequence complementarity might explain the nature of specific protein-protein interactions is not new and was suggested as long ago as 1981 <abbrgrp><abbr bid="B32">32</abbr><abbr bid="B35">35</abbr><abbr bid="B36">36</abbr></abbrgrp>. I was never able to confirm experimentally my own original theory, which proposed perfect complementarity between the codons of interacting amino acids <abbrgrp><abbr bid="B32">32</abbr><abbr bid="B33">33</abbr></abbrgrp>, though others were more successful <abbrgrp><abbr bid="B34">34</abbr></abbrgrp>. The explanation offered here is that codon complementarity is suboptimal and does not involve the 2<sup>nd </sup>codon residue. Experimental in vitro confirmation is required to validate the present theoretical and in silico prediction.</p>
			<p><b>Availability: </b><url>http://www.janbiro.com/downloads</url>: SeqX, SeqForm.</p>
		</sec>
	</bdy>
	<bm>
		<ack>
			<sec>
				<st>
					<p>Acknowledgements</p>
				</st>
				<p>The author of this article (J.C.B.) believes that he was the first scientist to suggest the existence of a "proteomic code". The original idea was published in 1981 in Medical Hypotheses <abbrgrp><abbr bid="B32">32</abbr><abbr bid="B35">35</abbr><abbr bid="B36">36</abbr></abbrgrp>; the same journal also published some aspects of the recent concept of a "protein-protein interaction code" <abbrgrp><abbr bid="B37">37</abbr></abbrgrp>, which is further developed in this article.</p>
			</sec>
		</ack>
		<refgrp>
			<bibl id="B1">
				<title>
					<p>Studies on the gross structure, cross-linkages, and terminal sequences in ribonuclease</p>
				</title>
				<aug>
					<au>
						<snm>Anfinsen</snm>
						<fnm>CB</fnm>
					</au>
					<au>
						<snm>Redfield</snm>
						<fnm>RR</fnm>
					</au>
					<au>
						<snm>Choate</snm>
						<fnm>WI</fnm>
					</au>
					<au>
						<snm>Page</snm>
						<fnm>J</fnm>
					</au>
					<au>
						<snm>Carroll</snm>
						<fnm>WR</fnm>
					</au>
				</aug>
				<source>J Biol Chem</source>
				<pubdate>1954</pubdate>
				<volume>207</volume>
				<fpage>201</fpage>
				<lpage>210</lpage>
				<xrefbib>
					<pubid idtype="pmpid" link="fulltext">13152095</pubid>
				</xrefbib>
			</bibl>
			<bibl id="B2">
				<title>
					<p>How to fold graciously</p>
				</title>
				<aug>
					<au>
						<snm>Levinthal</snm>
						<fnm>C</fnm>
					</au>
				</aug>
				<source>Mossbauer Spectroscopy in Biological Systems: Proceedings of a Meeting held at Allerton House, Monticello, IL</source>
				<publisher>Urbana, IL: University of Illinois Press</publisher>
				<editor>Debrunner P, Tsibris JCM, Munck E</editor>
				<pubdate>1969</pubdate>
				<fpage>22</fpage>
				<lpage>24</lpage>
			</bibl>
			<bibl id="B3">
				<title>
					<p>ASTRO-FOLD: a combinatorial and global optimization framework for ab initio prediction of three-dimensional structures of proteins from the amino acid sequence</p>
				</title>
				<aug>
					<au>
						<snm>Klepeis</snm>
						<fnm>JL</fnm>
					</au>
					<au>
						<snm>Floudas</snm>
						<fnm>CA</fnm>
					</au>
				</aug>
				<source>Biophys J</source>
				<pubdate>2003</pubdate>
				<volume>85</volume>
				<fpage>2119</fpage>
				<lpage>2146</lpage>
			</bibl>
			<bibl id="B4">
				<title>
					<p>Molecular chaperones &#8211; cellular machines for protein folding</p>
				</title>
				<aug>
					<au>
						<snm>Walter</snm>
						<fnm>S</fnm>
					</au>
					<au>
						<snm>Buchner</snm>
						<fnm>J</fnm>
					</au>
				</aug>
				<source>Angew Chem Int Ed Engl</source>
				<pubdate>2002</pubdate>
				<volume>41</volume>
				<fpage>1098</fpage>
				<lpage>1113</lpage>
				<xrefbib>
					<pubidlist>
						<pubid idtype="doi">10.1002/1521-3773(20020402)41:7&lt;1098::AID-ANIE1098&gt;3.0.CO;2-9</pubid>
						<pubid idtype="pmpid" link="fulltext">12491239</pubid>
					</pubidlist>
				</xrefbib>
			</bibl>
			<bibl id="B5">
				<title>
					<p>Cotranslational folding of globin</p>
				</title>
				<aug>
					<au>
						<snm>Komar</snm>
						<fnm>AA</fnm>
					</au>
					<au>
						<snm>Kommer</snm>
						<fnm>A</fnm>
					</au>
					<au>
						<snm>Krasheninnikov</snm>
						<fnm>IA</fnm>
					</au>
					<au>
						<snm>Spirin</snm>
						<fnm>AS</fnm>
					</au>
				</aug>
				<source>J Biol Chem</source>
				<pubdate>1997</pubdate>
				<volume>272</volume>
				<fpage>10646</fpage>
				<lpage>10651</lpage>
				<xrefbib>
					<pubidlist>
						<pubid idtype="doi">10.1074/jbc.272.16.10646</pubid>
						<pubid idtype="pmpid" link="fulltext">9099713</pubid>
					</pubidlist>
				</xrefbib>
			</bibl>
			<bibl id="B6">
				<title>
					<p>Protein secondary structural types are differentially coded on messenger RNA</p>
				</title>
				<aug>
					<au>
						<snm>Thanaraj</snm>
						<fnm>TA</fnm>
					</au>
					<au>
						<snm>Argos</snm>
						<fnm>P</fnm>
					</au>
				</aug>
				<source>Protein Sci</source>
				<pubdate>1996</pubdate>
				<volume>5</volume>
				<fpage>1973</fpage>
				<lpage>1983</lpage>
				<xrefbib>
					<pubid idtype="pmpid" link="fulltext">8897597</pubid>
				</xrefbib>
			</bibl>
			<bibl id="B7">
				<title>
					<p>Protein structure and the sequential structure of mRNA: alpha-helix and beta-sheet signals at the nucleotide level</p>
				</title>
				<aug>
					<au>
						<snm>Brunak</snm>
						<fnm>S</fnm>
					</au>
					<au>
						<snm>Engelbrecht</snm>
						<fnm>J</fnm>
					</au>
				</aug>
				<source>Proteins</source>
				<pubdate>1996</pubdate>
				<volume>25</volume>
				<fpage>237</fpage>
				<lpage>252</lpage>
				<xrefbib>
					<pubidlist>
						<pubid idtype="doi">10.1002/(SICI)1097-0134(199606)25:2&lt;237::AID-PROT9&gt;3.3.CO;2-Y</pubid>
						<pubid idtype="pmpid">8811739</pubid>
					</pubidlist>
				</xrefbib>
			</bibl>
			<bibl id="B8">
				<title>
					<p>Studies on the relationships between the synonymous codon usage and protein secondary structural units</p>
				</title>
				<aug>
					<au>
						<snm>Gupta</snm>
						<fnm>SK</fnm>
					</au>
					<au>
						<snm>Majumdar</snm>
						<fnm>S</fnm>
					</au>
					<au>
						<snm>Bhattacharya</snm>
						<fnm>TK</fnm>
					</au>
					<au>
						<snm>Ghosh</snm>
						<fnm>TC</fnm>
					</au>
				</aug>
				<source>Biochem Biophys Res Commun</source>
				<pubdate>2000</pubdate>
				<volume>269</volume>
				<fpage>692</fpage>
				<lpage>696</lpage>
				<xrefbib>
					<pubidlist>
						<pubid idtype="doi">10.1006/bbrc.2000.2351</pubid>
						<pubid idtype="pmpid" link="fulltext">10720478</pubid>
					</pubidlist>
				</xrefbib>
			</bibl>
			<bibl id="B9">
				<title>
					<p>Second codon positions of genes and the secondary structures of proteins. Relationships and implications for the origin of the genetic code</p>
				</title>
				<aug>
					<au>
						<snm>Chiusano</snm>
						<fnm>ML</fnm>
					</au>
					<au>
						<snm>Alvarez-Valin</snm>
						<fnm>F</fnm>
					</au>
					<au>
						<snm>Di Giulio</snm>
						<fnm>M</fnm>
					</au>
					<au>
						<snm>D'Onofrio</snm>
						<fnm>G</fnm>
					</au>
					<au>
						<snm>Ammirato</snm>
						<fnm>G</fnm>
					</au>
					<au>
						<snm>Colonna</snm>
						<fnm>G</fnm>
					</au>
					<au>
						<snm>Bernardi</snm>
						<fnm>G</fnm>
					</au>
				</aug>
				<source>Gene</source>
				<pubdate>2000</pubdate>
				<volume>261</volume>
				<fpage>63</fpage>
				<lpage>69</lpage>
				<xrefbib>
					<pubidlist>
						<pubid idtype="doi">10.1016/S0378-1119(00)00521-7</pubid>
						<pubid idtype="pmpid" link="fulltext">11164038</pubid>
					</pubidlist>
				</xrefbib>
			</bibl>
			<bibl id="B10">
				<title>
					<p>The relationship between synonymous codon usage and protein structure in Escherichia coli and Homo sapiens</p>
				</title>
				<aug>
					<au>
						<snm>Gu</snm>
						<fnm>W</fnm>
					</au>
					<au>
						<snm>Zhou</snm>
						<fnm>T</fnm>
					</au>
					<au>
						<snm>Ma</snm>
						<fnm>J</fnm>
					</au>
					<au>
						<snm>Sun</snm>
						<fnm>X</fnm>
					</au>
					<au>
						<snm>Lu</snm>
						<fnm>Z</fnm>
					</au>
				</aug>
				<source>Biosystems</source>
				<pubdate>2004</pubdate>
				<volume>73</volume>
				<fpage>89</fpage>
				<lpage>97</lpage>
				<xrefbib>
					<pubidlist>
						<pubid idtype="doi">10.1016/j.biosystems.2003.10.001</pubid>
						<pubid idtype="pmpid" link="fulltext">15013221</pubid>
					</pubidlist>
				</xrefbib>
			</bibl>
			<bibl id="B11">
				<title>
					<p>Synonymous codon usage in bacteria</p>
				</title>
				<aug>
					<au>
						<snm>Ermolaeva</snm>
						<fnm>O</fnm>
					</au>
				</aug>
				<source>Curr Issues Mol Biol</source>
				<pubdate>2001</pubdate>
				<volume>3</volume>
				<fpage>91</fpage>
				<lpage>97</lpage>
				<xrefbib>
					<pubid idtype="pmpid">11719972</pubid>
				</xrefbib>
			</bibl>
			<bibl id="B12">
				<title>
					<p>Hidden messages in hidden sub-sequences: a study on collagens</p>
				</title>
				<aug>
					<au>
						<snm>Biro</snm>
						<fnm>JC</fnm>
					</au>
					<au>
						<snm>Biro</snm>
						<fnm>JM</fnm>
					</au>
					<au>
						<snm>Biro</snm>
						<fnm>AM</fnm>
					</au>
				</aug>
				<source>30th FEBS Congress &#8211; 9th IUBMB Conference, Budapest, Hungary, 2&#8211;7 July 2005</source>
				<pubdate>2005</pubdate>
				<note>abstract.</note>
			</bibl>
			<bibl id="B13">
				<title>
					<p>Widespread selection for local RNA secondary structure in coding regions of bacterial genes</p>
				</title>
				<aug>
					<au>
						<snm>Katz</snm>
						<fnm>L</fnm>
					</au>
					<au>
						<snm>Burge</snm>
						<fnm>CB</fnm>
					</au>
				</aug>
				<source>Genome Res</source>
				<pubdate>2003</pubdate>
				<volume>13</volume>
				<fpage>2042</fpage>
				<lpage>2051</lpage>
				<xrefbib>
					<pubidlist>
						<pubid idtype="pmcid">403678</pubid>
						<pubid idtype="pmpid" link="fulltext">12952875</pubid>
						<pubid idtype="doi">10.1101/gr.1257503</pubid>
					</pubidlist>
				</xrefbib>
			</bibl>
			<bibl id="B14">
				<title>
					<p>Mfold web server for nucleic acid folding and hybridization prediction</p>
				</title>
				<aug>
					<au>
						<snm>Zuker</snm>
						<fnm>M</fnm>
					</au>
				</aug>
				<source>Nucleic Acids Res</source>
				<pubdate>2003</pubdate>
				<volume>31</volume>
				<fpage>3406</fpage>
				<lpage>3415</lpage>
				<xrefbib>
					<pubidlist>
						<pubid idtype="pmcid">169194</pubid>
						<pubid idtype="pmpid" link="fulltext">12824337</pubid>
						<pubid idtype="doi">10.1093/nar/gkg595</pubid>
					</pubidlist>
				</xrefbib>
			</bibl>
			<bibl id="B15">
				<title>
					<p>SeqX: a tool to detect, analyze and visualize residue co-locations in protein and nucleic acid structures</p>
				</title>
				<aug>
					<au>
						<snm>Biro</snm>
						<fnm>JC</fnm>
					</au>
					<au>
						<snm>Fordos</snm>
						<fnm>G</fnm>
					</au>
				</aug>
				<source>BMC Bioinformatics</source>
				<pubdate>2005</pubdate>
				<volume>6</volume>
				<fpage>170</fpage>
				<url>http://www.janbiro.com/downloads</url>
				<xrefbib>
					<pubidlist>
						<pubid idtype="pmcid">1182355</pubid>
						<pubid idtype="pmpid" link="fulltext">16011796</pubid>
						<pubid idtype="doi">10.1186/1471-2105-6-170</pubid>
					</pubidlist>
				</xrefbib>
			</bibl>
			<bibl id="B16">
				<title>
					<p>SeqForm</p>
				</title>
				<aug>
					<au>
						<snm>Biro</snm>
						<fnm>JC</fnm>
					</au>
				</aug>
				<pubdate>2005</pubdate>
				<url>http://www.janbiro.com/downloads</url>
			</bibl>
			<bibl id="B17">
				<title>
					<p>The Protein Data Bank</p>
				</title>
				<aug>
					<au>
						<snm>Berman</snm>
						<fnm>HM</fnm>
					</au>
					<au>
						<snm>Westbrook</snm>
						<fnm>J</fnm>
					</au>
					<au>
						<snm>Feng</snm>
						<fnm>Z</fnm>
					</au>
					<au>
						<snm>Gilliland</snm>
						<fnm>G</fnm>
					</au>
					<au>
						<snm>Bhat</snm>
						<fnm>TN</fnm>
					</au>
					<au>
						<snm>Weissig</snm>
						<fnm>H</fnm>
					</au>
					<au>
						<snm>Shindyalov</snm>
						<fnm>IN</fnm>
					</au>
					<au>
						<snm>Bourne</snm>
						<fnm>PE</fnm>
					</au>
				</aug>
				<source>Nucleic Acids Res</source>
				<pubdate>2000</pubdate>
				<volume>28</volume>
				<fpage>235</fpage>
				<lpage>242</lpage>
				<url>http://www.pdb.org/</url>
				<xrefbib>
					<pubidlist>
						<pubid idtype="pmcid">102472</pubid>
						<pubid idtype="pmpid" link="fulltext">10592235</pubid>
						<pubid idtype="doi">10.1093/nar/28.1.235</pubid>
					</pubidlist>
				</xrefbib>
			</bibl>
			<bibl id="B18">
				<title>
					<p>The Nucleic Acid Database: a comprehensive relational database of three-dimensional structures of nucleic acids</p>
				</title>
				<aug>
					<au>
						<snm>Berman</snm>
						<fnm>HM</fnm>
					</au>
					<au>
						<snm>Olson</snm>
						<fnm>WK</fnm>
					</au>
					<au>
						<snm>Beveridge</snm>
						<fnm>DL</fnm>
					</au>
					<au>
						<snm>Westbrook</snm>
						<fnm>J</fnm>
					</au>
					<au>
						<snm>Gelbin</snm>
						<fnm>A</fnm>
					</au>
					<au>
						<snm>Demeny</snm>
						<fnm>T</fnm>
					</au>
					<au>
						<snm>Hsieh</snm>
						<fnm>SH</fnm>
					</au>
					<au>
						<snm>Srinivasan</snm>
						<fnm>AR</fnm>
					</au>
					<au>
						<snm>Schneider</snm>
						<fnm>B</fnm>
					</au>
				</aug>
				<source>Biophys J</source>
				<pubdate>1992</pubdate>
				<volume>63</volume>
				<fpage>751</fpage>
				<lpage>759</lpage>
				<url>http://ndbserver.rutgers.edu/index.html</url>
				<xrefbib>
					<pubidlist>
						<pubid idtype="pmcid">1262208</pubid>
						<pubid idtype="pmpid">1384741</pubid>
					</pubidlist>
				</xrefbib>
			</bibl>
			<bibl id="B19">
				<title>
					<p>ISSD Version 2.0: taxonomic range extended</p>
				</title>
				<aug>
					<au>
						<snm>Adzhubei</snm>
						<fnm>IA</fnm>
					</au>
					<au>
						<snm>Adzhubei</snm>
						<fnm>AA</fnm>
					</au>
				</aug>
				<source>Nucleic Acids Res</source>
				<pubdate>1999</pubdate>
				<volume>27</volume>
				<fpage>268</fpage>
				<lpage>271</lpage>
				<url>http://www.protein.bio.msu.su/issd/</url>
				<xrefbib>
					<pubidlist>
						<pubid idtype="pmcid">148153</pubid>
						<pubid idtype="pmpid" link="fulltext">9847198</pubid>
						<pubid idtype="doi">10.1093/nar/27.1.268</pubid>
					</pubidlist>
				</xrefbib>
			</bibl>
			<bibl id="B20">
				<title>
					<p>A common periodic table of codons and amino acids</p>
				</title>
				<aug>
					<au>
						<snm>Biro</snm>
						<fnm>JC</fnm>
					</au>
					<au>
						<snm>Benyo</snm>
						<fnm>B</fnm>
					</au>
					<au>
						<snm>Sansom</snm>
						<fnm>C</fnm>
					</au>
					<au>
						<snm>Szlavecz</snm>
						<fnm>A</fnm>
					</au>
					<au>
						<snm>Fordos</snm>
						<fnm>G</fnm>
					</au>
					<au>
						<snm>Micsik</snm>
						<fnm>T</fnm>
					</au>
					<au>
						<snm>Benyo</snm>
						<fnm>Z</fnm>
					</au>
				</aug>
				<source>Biochem Biophys Res Commun</source>
				<pubdate>2003</pubdate>
				<volume>306</volume>
				<fpage>408</fpage>
				<lpage>415</lpage>
				<xrefbib>
					<pubidlist>
						<pubid idtype="doi">10.1016/S0006-291X(03)00974-4</pubid>
						<pubid idtype="pmpid" link="fulltext">12804578</pubid>
					</pubidlist>
				</xrefbib>
			</bibl>
			<bibl id="B21">
				<title>
					<p>mRNA has greater negative folding free energies than shuffled or codon choice randomized sequences</p>
				</title>
				<aug>
					<au>
						<snm>Seffens</snm>
						<fnm>W</fnm>
					</au>
					<au>
						<snm>Digby</snm>
						<fnm>D</fnm>
					</au>
				</aug>
				<source>Nucleic Acids Res</source>
				<pubdate>1999</pubdate>
				<volume>27</volume>
				<fpage>1578</fpage>
				<lpage>1584</lpage>
				<xrefbib>
					<pubidlist>
						<pubid idtype="pmcid">148359</pubid>
						<pubid idtype="pmpid" link="fulltext">10075987</pubid>
						<pubid idtype="doi">10.1093/nar/27.7.1578</pubid>
					</pubidlist>
				</xrefbib>
			</bibl>
			<bibl id="B22">
				<title>
					<p>Tracing specific synonymous codon-secondary structure correlations through evolution</p>
				</title>
				<aug>
					<au>
						<snm>Oresic</snm>
						<fnm>M</fnm>
					</au>
					<au>
						<snm>Dehn</snm>
						<fnm>M</fnm>
					</au>
					<au>
						<snm>Korenblum</snm>
						<fnm>D</fnm>
					</au>
					<au>
						<snm>Shalloway</snm>
						<fnm>D</fnm>
					</au>
				</aug>
				<source>J Mol Evol</source>
				<pubdate>2003</pubdate>
				<volume>56</volume>
				<fpage>473</fpage>
				<lpage>484</lpage>
				<xrefbib>
					<pubidlist>
						<pubid idtype="doi">10.1007/s00239-002-2418-x</pubid>
						<pubid idtype="pmpid" link="fulltext">12664167</pubid>
					</pubidlist>
				</xrefbib>
			</bibl>
			<bibl id="B23">
				<title>
					<p>The base composition of the genes is correlated with the secondary structures of the encoded proteins</p>
				</title>
				<aug>
					<au>
						<snm>D'Onofrio</snm>
						<fnm>G</fnm>
					</au>
					<au>
						<snm>Ghosh</snm>
						<fnm>TC</fnm>
					</au>
					<au>
						<snm>Bernardi</snm>
						<fnm>G</fnm>
					</au>
				</aug>
				<source>Gene</source>
				<pubdate>2002</pubdate>
				<volume>300</volume>
				<fpage>179</fpage>
				<lpage>187</lpage>
				<xrefbib>
					<pubidlist>
						<pubid idtype="doi">10.1016/S0378-1119(02)01045-4</pubid>
						<pubid idtype="pmpid" link="fulltext">12468099</pubid>
					</pubidlist>
				</xrefbib>
			</bibl>
			<bibl id="B24">
				<title>
					<p>The relationship between synonymous codon usage and protein structure</p>
				</title>
				<aug>
					<au>
						<snm>Xie</snm>
						<fnm>T</fnm>
					</au>
					<au>
						<snm>Ding</snm>
						<fnm>D</fnm>
					</au>
				</aug>
				<source>FEBS Lett</source>
				<pubdate>1998</pubdate>
				<volume>434</volume>
				<fpage>93</fpage>
				<lpage>96</lpage>
				<xrefbib>
					<pubidlist>
						<pubid idtype="doi">10.1016/S0014-5793(98)00955-7</pubid>
						<pubid idtype="pmpid" link="fulltext">9738458</pubid>
					</pubidlist>
				</xrefbib>
			</bibl>
			<bibl id="B25">
				<title>
					<p>Distribution of amino acid residues and residue-residue contacts in molecular chaperons</p>
				</title>
				<aug>
					<au>
						<snm>Kumarevel</snm>
						<fnm>TS</fnm>
					</au>
					<au>
						<snm>Gromiha</snm>
						<fnm>MM</fnm>
					</au>
					<au>
						<snm>Ponnuswamy</snm>
						<fnm>MN</fnm>
					</au>
				</aug>
				<source>Prep Biochem Biotechnol</source>
				<pubdate>2001</pubdate>
				<volume>31</volume>
				<fpage>163</fpage>
				<lpage>183</lpage>
				<xrefbib>
					<pubidlist>
						<pubid idtype="doi">10.1081/PB-100103382</pubid>
						<pubid idtype="pmpid" link="fulltext">11426704</pubid>
					</pubidlist>
				</xrefbib>
			</bibl>
			<bibl id="B26">
				<title>
					<p>Comparison of helix interactions in membrane and soluble alpha-bundle proteins</p>
				</title>
				<aug>
					<au>
						<snm>Eilers</snm>
						<fnm>M</fnm>
					</au>
					<au>
						<snm>Patel</snm>
						<fnm>AB</fnm>
					</au>
					<au>
						<snm>Liu</snm>
						<fnm>W</fnm>
					</au>
					<au>
						<snm>Smith</snm>
						<fnm>SO</fnm>
					</au>
				</aug>
				<source>Biophys J</source>
				<pubdate>2002</pubdate>
				<volume>82</volume>
				<fpage>2720</fpage>
				<lpage>2736</lpage>
			</bibl>
			<bibl id="B27">
				<title>
					<p>Residue frequencies at protein-protein interfaces</p>
				</title>
				<aug>
					<au>
						<snm>Glaser</snm>
						<fnm>F</fnm>
					</au>
					<au>
						<snm>Steinberg</snm>
						<fnm>DM</fnm>
					</au>
					<au>
						<snm>Vakser</snm>
						<fnm>IA</fnm>
					</au>
					<au>
						<snm>Ben-Tal</snm>
						<fnm>N</fnm>
					</au>
				</aug>
				<source>Proteins Struct Funct Genet</source>
				<pubdate>2001</pubdate>
				<volume>43</volume>
				<fpage>89</fpage>
				<lpage>102</lpage>
				<xrefbib>
					<pubidlist>
						<pubid idtype="doi">10.1002/1097-0134(20010501)43:2&lt;89::AID-PROT1021&gt;3.0.CO;2-H</pubid>
						<pubid idtype="pmpid" link="fulltext">11276079</pubid>
					</pubidlist>
				</xrefbib>
			</bibl>
			<bibl id="B28">
				<title>
					<p>Amino acid pair interchanges at spatially conserved locations</p>
				</title>
				<aug>
					<au>
						<snm>Naor</snm>
						<fnm>D</fnm>
					</au>
					<au>
						<snm>Fisher</snm>
						<fnm>D</fnm>
					</au>
					<au>
						<snm>Jernigan</snm>
						<fnm>RL</fnm>
					</au>
					<au>
						<snm>Wolfson</snm>
						<fnm>H</fnm>
					</au>
					<au>
						<snm>Nussinov</snm>
						<fnm>R</fnm>
					</au>
				</aug>
				<source>J Mol Biol</source>
				<pubdate>1996</pubdate>
				<volume>256</volume>
				<fpage>924</fpage>
				<lpage>938</lpage>
				<xrefbib>
					<pubidlist>
						<pubid idtype="doi">10.1006/jmbi.1996.0138</pubid>
						<pubid idtype="pmpid" link="fulltext">8601843</pubid>
					</pubidlist>
				</xrefbib>
			</bibl>
			<bibl id="B29">
				<title>
					<p>Interchanges of spatially neighboring residues in structurally conserved environment</p>
				</title>
				<aug>
					<au>
						<snm>Azarya-Sprinzak</snm>
						<fnm>E</fnm>
					</au>
					<au>
						<snm>Naor</snm>
						<fnm>D</fnm>
					</au>
					<au>
						<snm>Wolfson</snm>
						<fnm>HJ</fnm>
					</au>
					<au>
						<snm>Nussinov</snm>
						<fnm>R</fnm>
					</au>
				</aug>
				<source>Protein Eng</source>
				<pubdate>1997</pubdate>
				<volume>10</volume>
				<fpage>1109</fpage>
				<lpage>1122</lpage>
				<xrefbib>
					<pubidlist>
						<pubid idtype="doi">10.1093/protein/10.10.1109</pubid>
						<pubid idtype="pmpid" link="fulltext">9488136</pubid>
					</pubidlist>
				</xrefbib>
			</bibl>
			<bibl id="B30">
				<title>
					<p>Prediction of protein residue contacts with a PDB-derived likelihood matrix</p>
				</title>
				<aug>
					<au>
						<snm>Singer</snm>
						<fnm>MS</fnm>
					</au>
					<au>
						<snm>Vriend</snm>
						<fnm>G</fnm>
					</au>
					<au>
						<snm>Bywater</snm>
						<fnm>RP</fnm>
					</au>
				</aug>
				<source>Protein Eng</source>
				<pubdate>2002</pubdate>
				<volume>15</volume>
				<fpage>721</fpage>
				<lpage>725</lpage>
				<xrefbib>
					<pubidlist>
						<pubid idtype="doi">10.1093/protein/15.9.721</pubid>
						<pubid idtype="pmpid" link="fulltext">12456870</pubid>
					</pubidlist>
				</xrefbib>
			</bibl>
			<bibl id="B31">
				<title>
					<p>Predicting inter-residue contacts using templates and pathways</p>
				</title>
				<aug>
					<au>
						<snm>Shao</snm>
						<fnm>Y</fnm>
					</au>
					<au>
						<snm>Bystroff</snm>
						<fnm>C</fnm>
					</au>
				</aug>
				<source>Proteins Struct Funct Genet</source>
				<pubdate>2003</pubdate>
				<volume>53</volume>
				<fpage>497</fpage>
				<lpage>502</lpage>
				<xrefbib>
					<pubidlist>
						<pubid idtype="doi">10.1002/prot.10539</pubid>
						<pubid idtype="pmpid" link="fulltext">14579339</pubid>
					</pubidlist>
				</xrefbib>
			</bibl>
			<bibl id="B32">
				<title>
					<p>Comparative analysis of specificity in protein-protein interactions. Part II: The complementary coding of some proteins as the possible source of specificity in protein-protein interactions</p>
				</title>
				<aug>
					<au>
						<snm>Biro</snm>
						<fnm>J</fnm>
					</au>
				</aug>
				<source>Med Hypotheses</source>
				<pubdate>1981</pubdate>
				<volume>7</volume>
				<fpage>981</fpage>
				<lpage>993</lpage>
				<xrefbib>
					<pubidlist>
						<pubid idtype="doi">10.1016/0306-9877(81)90094-3</pubid>
						<pubid idtype="pmpid" link="fulltext">7289918</pubid>
					</pubidlist>
				</xrefbib>
			</bibl>
			<bibl id="B33">
				<title>
					<p>Frequent occurrence of short complementary sequences in nucleic acids</p>
				</title>
				<aug>
					<au>
						<snm>Segersteen</snm>
						<fnm>U</fnm>
					</au>
					<au>
						<snm>Nordgren</snm>
						<fnm>H</fnm>
					</au>
					<au>
						<snm>Biro</snm>
						<fnm>JC</fnm>
					</au>
				</aug>
				<source>Biochem Biophys Res Commun</source>
				<pubdate>1986</pubdate>
				<volume>139</volume>
				<fpage>94</fpage>
				<lpage>101</lpage>
				<xrefbib>
					<pubidlist>
						<pubid idtype="doi">10.1016/S0006-291X(86)80084-5</pubid>
						<pubid idtype="pmpid" link="fulltext">3533060</pubid>
					</pubidlist>
				</xrefbib>
			</bibl>
			<bibl id="B34">
				<title>
					<p>Specific interactions between sense and complementary peptides: the basis for the proteomic code</p>
				</title>
				<aug>
					<au>
						<snm>Heal</snm>
						<fnm>JR</fnm>
					</au>
					<au>
						<snm>Roberts</snm>
						<fnm>GW</fnm>
					</au>
					<au>
						<snm>Raynes</snm>
						<fnm>JG</fnm>
					</au>
					<au>
						<snm>Bhakoo</snm>
						<fnm>A</fnm>
					</au>
					<au>
						<snm>Miller</snm>
						<fnm>AD</fnm>
					</au>
				</aug>
				<source>Chembiochem</source>
				<pubdate>2002</pubdate>
				<volume>3</volume>
				<fpage>136</fpage>
				<lpage>151</lpage>
				<xrefbib>
					<pubidlist>
						<pubid idtype="doi">10.1002/1439-7633(20020301)3:2/3&lt;136::AID-CBIC136&gt;3.0.CO;2-7</pubid>
						<pubid idtype="pmpid" link="fulltext">11921391</pubid>
					</pubidlist>
				</xrefbib>
			</bibl>
			<bibl id="B35">
				<title>
					<p>Comparative analysis of specificity in protein-protein interactions. Part I: A theoretical and mathematical approach to specificity in protein-protein interactions</p>
				</title>
				<aug>
					<au>
						<snm>Biro</snm>
						<fnm>J</fnm>
					</au>
				</aug>
				<source>Med Hypotheses</source>
				<pubdate>1981</pubdate>
				<volume>7</volume>
				<fpage>969</fpage>
				<lpage>979</lpage>
				<xrefbib>
					<pubidlist>
						<pubid idtype="doi">10.1016/0306-9877(81)90093-1</pubid>
						<pubid idtype="pmpid" link="fulltext">7289917</pubid>
					</pubidlist>
				</xrefbib>
			</bibl>
			<bibl id="B36">
				<title>
					<p>Comparative analysis of specificity in protein-protein interactions. Part III: Models of the gene expression based on the sequential complementary coding of some pituitary proteins</p>
				</title>
				<aug>
					<au>
						<snm>Biro</snm>
						<fnm>J</fnm>
					</au>
				</aug>
				<source>Med Hypotheses</source>
				<pubdate>1981</pubdate>
				<volume>7</volume>
				<fpage>995</fpage>
				<lpage>1007</lpage>
				<xrefbib>
					<pubidlist>
						<pubid idtype="doi">10.1016/0306-9877(81)90095-5</pubid>
						<pubid idtype="pmpid" link="fulltext">7289919</pubid>
					</pubidlist>
				</xrefbib>
			</bibl>
			<bibl id="B37">
				<title>
					<p>A novel intra-molecular protein-protein interaction code based on partial complementary coding of co-locating amino acids</p>
				</title>
				<aug>
					<au>
						<snm>Biro</snm>
						<fnm>JC</fnm>
					</au>
				</aug>
				<source>Med Hypotheses</source>
				<pubdate>2006</pubdate>
				<volume>66</volume>
				<fpage>137</fpage>
				<lpage>142</lpage>
				<xrefbib>
					<pubidlist>
						<pubid idtype="doi">10.1016/j.mehy.2005.07.014</pubid>
						<pubid idtype="pmpid" link="fulltext">16168570</pubid>
					</pubidlist>
				</xrefbib>
			</bibl>
		</refgrp>
	</bm>
</art>
