That structure determines function is a central paradigm in protein biology. However, protein function is more directly related to cooperative effects at the residue and multi-residue scales, so representations based solely on atomic coordinates can be considered inadequate. Bridging the gap between atomic-level structure and protein-level function requires parameterizations of the protein structure (and of other physicochemical properties) over a quasi-continuous range of scales: from a simple collection of unrelated amino acid coordinates to the highly synergistic organization of the whole protein, and from a microscopic view in which each atom is fully resolved to a "macroscopic" description such as the one encoded in the three-dimensional protein shape.
Here we propose such a parameterization and study its relationship to the standard Euclidean description based on representative amino acid coordinates. The representation uses multipoles associated with the residue Cα coordinates as shape descriptors. We demonstrate that the multipoles can be used to describe protein shape quantitatively and to compare protein structures at various levels of detail. Specifically, we construct a (dis)similarity measure in multipolar configuration space and show how it can be used to compare a pair of proteins. We then test the parameterization on a benchmark set drawn from the protein kinase-like superfamily. We show that, when the biologically relevant portions of the proteins are retained, it robustly discriminates between the families in the set in a way not possible through sequence or conventional structural representations alone. Finally, we compare our representation with the Cartesian coordinate description and show that, as expected, the correlation between the two increases as the level of detail, measured by the highest rank of multipoles used in the representation, approaches the dimensionality of the fold space.
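The idea of rank-limited shape descriptors and a dissimilarity measure built on them can be illustrated with a minimal sketch. It is not the parameterization used in this work: as a simplifying assumption it replaces spherical multipoles with unnormalized Cartesian moment tensors of the Cα coordinates and keeps only their rotation-invariant Frobenius norms. The function names `moment_invariants` and `dissimilarity`, the scaling by the radius of gyration, and the Euclidean distance between descriptors are all hypothetical choices made for illustration.

```python
import numpy as np

def moment_invariants(ca_coords, max_rank=4):
    """Rotation- and translation-invariant shape descriptors, ranks 0..max_rank.

    Assumption: the rank-l Cartesian moment tensor M_l = sum_a x_a^(tensor power l)
    stands in for the rank-l multipole. Its squared Frobenius norm equals
    sum_{a,b} (x_a . x_b)**l, so it can be read off the Gram matrix.
    """
    x = np.asarray(ca_coords, dtype=float)
    x = x - x.mean(axis=0)                       # centre on the centroid
    rg = np.sqrt((x ** 2).sum(axis=1).mean())    # radius of gyration
    x = x / rg                                   # remove overall size (assumed choice)
    gram = x @ x.T                               # pairwise dot products x_a . x_b
    n = len(x)
    return np.array([(gram ** l).sum() / n ** 2 for l in range(max_rank + 1)])

def dissimilarity(ca_a, ca_b, max_rank=4):
    """Euclidean distance between the two descriptor vectors (assumed metric)."""
    return float(np.linalg.norm(
        moment_invariants(ca_a, max_rank) - moment_invariants(ca_b, max_rank)))

# Usage with synthetic coordinates (no real protein data is used here).
rng = np.random.default_rng(0)
coords = rng.normal(size=(50, 3))
q, _ = np.linalg.qr(rng.normal(size=(3, 3)))     # random orthogonal matrix
moved = coords @ q.T + 5.0                       # same shape, new pose
stretched = coords * np.array([3.0, 1.0, 1.0])   # elongated shape

print(dissimilarity(coords, moved))
print(dissimilarity(coords, stretched))
```

Raising `max_rank` adds finer detail to the descriptor, which mirrors the role played by the highest multipole rank in the text. Because these norms depend only on pairwise dot products, they cannot distinguish a structure from its mirror image and they discard the orientation information that full multipole coefficients retain.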
The results described here demonstrate how a granular description of protein structure can be achieved using multipolar coefficients. The description has the additional advantage of being immediately generalizable to any residue-specific property, thereby providing a unified framework for studying and comparing the spatial profiles of various protein properties.