
Measuring the functional sequence complexity of proteins

Kirk K Durston1*, David KY Chiu2, David L Abel3 and Jack T Trevors4

Author Affiliations

1 Department of Biophysics, University of Guelph, Guelph, ON, N1G 2W1, Canada

2 Department of Computing and Information Science, University of Guelph, Guelph, ON, N1G 2W1, Canada

3 Program Director, The Gene Emergence Project, The Origin-of-Life Foundation, Inc., 113 Hedgewood Drive, Greenbelt, MD 20770-1610, USA

4 Department of Environmental Biology, University of Guelph, Guelph, ON, N1G 2W1, Canada

Theoretical Biology and Medical Modelling 2007, 4:47  doi:10.1186/1742-4682-4-47

Published: 6 December 2007

Background

Abel and Trevors have delineated three aspects of sequence complexity observed in biosequences such as proteins: Random Sequence Complexity (RSC), Ordered Sequence Complexity (OSC) and Functional Sequence Complexity (FSC). In this paper, we provide a method to measure functional sequence complexity.

Methods and Results

We have extended Shannon uncertainty by considering the data variable jointly with a functionality variable. The resulting unit of measure, which we call the Functional bit (Fit), is calculated from the sequence data together with the defined functionality variable. To demonstrate its relevance to functional bioinformatics, we developed a method to measure functional sequence complexity and applied it to 35 protein families. We also considered how the measure can be used to correlate functionality at the level of the whole molecule and of sub-molecular regions. Experimentally, we show that when the proposed measure is applied to the aligned protein sequences of ubiquitin, 6 of the 7 highest-value sites correlate with the binding domain.
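The per-site idea behind the Fit can be sketched as follows. This is a minimal sketch under stated assumptions, not the authors' implementation: it assumes a multiple sequence alignment of one protein family, a null (ground) state in which all 20 standard amino acids are equally likely (log2 20 ≈ 4.32 bits per site), and it estimates each site's functional uncertainty from the observed column frequencies, scoring Fits as the drop in uncertainty from that null state. The function names `site_fits`, `total_fits` and `top_sites`, the gap handling, and the toy alignment are hypothetical.

```python
import math
from collections import Counter

# Assumption: null (ground) state = 20 equiprobable amino acids per site.
NULL_STATE_BITS = math.log2(20)  # about 4.32 bits


def site_uncertainty(column, gap="-"):
    """Shannon uncertainty (bits) of one alignment column, ignoring gap characters."""
    residues = [r for r in column if r != gap]
    if not residues:
        # Hypothetical choice: an all-gap column is treated as the null state,
        # so it contributes 0 Fits.
        return NULL_STATE_BITS
    n = len(residues)
    counts = Counter(residues)
    return -sum((c / n) * math.log2(c / n) for c in counts.values())


def site_fits(alignment, gap="-"):
    """Per-site Fits: null-state uncertainty minus observed column uncertainty."""
    length = len(alignment[0])
    if any(len(seq) != length for seq in alignment):
        raise ValueError("all aligned sequences must have the same length")
    return [
        NULL_STATE_BITS - site_uncertainty([seq[i] for seq in alignment], gap)
        for i in range(length)
    ]


def total_fits(alignment, gap="-"):
    """Functional sequence complexity of the whole alignment, summed over sites."""
    return sum(site_fits(alignment, gap))


def top_sites(alignment, k, gap="-"):
    """Indices of the k sites with the highest Fit values (ties keep column order)."""
    fits = site_fits(alignment, gap)
    return sorted(range(len(fits)), key=lambda i: fits[i], reverse=True)[:k]
```

For example, with the hypothetical alignment `["MKV", "MKL", "MRI", "MKA"]`, the fully conserved first column scores about 4.32 Fits and `top_sites(alignment, 2)` returns `[0, 1]`. Ranking sites this way is the kind of analysis behind the ubiquitin result, although frequencies estimated from a handful of sequences are strongly biased; real use needs a large, well-curated alignment.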

Conclusion

For future extensions, measures of functional bioinformatics may provide a means to evaluate potential evolutionary pathways arising from effects such as mutations, as well as to analyze the internal structural and functional relationships within the 3-D structure of proteins.