Rossmann-Like Proteins: Function and Evolution Analysis of a Fifth of the Protein World
The classical Rossmann fold, also known as a doubly-wound three-layer α/β-sandwich, consists of two units (321456 topology) that form a single parallel sheet flanked by alpha-helices on both sides and contain a characteristic crossover between strands 3 and 4. We defined its core, the minimal Rossmann-like motif (RLM), as a unit of three beta-strands flanked by two alpha-helices, and identified all known protein structures containing the RLM. We show that RLM enzymes function predominantly in metabolism, covering 38% of reference metabolic pathways. Closely related RLM enzyme families can catalyze different reaction chemistries using similar folds; conversely, different RLM folds can converge on catalyzing the same reactions. RLM enzymes utilize ligands from 20 chemical superclasses of organic and inorganic compounds. Homologous RLM domains can exhibit diverging active sites that accommodate alternate ligands while retaining similar binding modes. The Rossmann fold is considered one of the most ancient folds: it utilizes iron-sulfur clusters as cofactors and is part of an ancient energy metabolism, the Wood-Ljungdahl pathway, used by the last universal common ancestor (LUCA). Our data suggest that the top three disease categories associated with mutations in RLM proteins are diseases of the endocrine system, diseases of the nervous system, and developmental anomalies.
Minimal Rossmann-like motif (RLM) definition
Papers and Data:
1) Kirill E. Medvedev, Lisa N. Kinch, R. Dustin Schaeffer, Jimin Pei, Nick V. Grishin (2020) A Fifth of the Protein World: Rossmann-like Proteins as an Evolutionarily Successful Structural Unit. Journal of Molecular Biology 433(4): 166788 PMID: 33387532
Data based on ECOD version 20200517 (develop275):
2) Kirill E. Medvedev, Lisa N. Kinch, R. Dustin Schaeffer, Nick V. Grishin (2019) Functional analysis of Rossmann-like domains reveals convergent evolution of topology and reaction pathways. PLoS Computational Biology 15(12): e1007569 PMID: 31869345
Data based on ECOD version 20181017 (develop214):
a) Data set of RLM-containing PDB entries. Each PDB entry is mapped to UniProtKB, the ECOD database, and an EC number (if available). Substrate and product information was retrieved from KEGG; modified residues, nucleic acid contacts, and polysaccharide contacts were retrieved from the Protein Data Bank.
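As a rough illustration of how one row of such a data set might be held in memory, the Python sketch below models an entry with the mappings listed above. The class name, field names, helper function, and placeholder identifiers are all hypothetical; the actual file layout and column names of the distributed data set are not described here and may differ.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class RlmEntry:
    """One RLM-containing PDB entry. All field names are hypothetical."""
    pdb_id: str
    uniprot_acc: Optional[str] = None     # UniProtKB mapping
    ecod_domain: Optional[str] = None     # ECOD mapping
    ec_number: Optional[str] = None       # None when no EC number is available
    substrates: List[str] = field(default_factory=list)  # from KEGG
    products: List[str] = field(default_factory=list)    # from KEGG
    modified_residues: bool = False       # from the Protein Data Bank
    nucleic_acid_contact: bool = False    # from the Protein Data Bank
    polysaccharide_contact: bool = False  # from the Protein Data Bank


def group_by_ec(entries: List[RlmEntry]) -> Dict[str, List[str]]:
    """Group PDB identifiers by EC number, skipping entries without one."""
    groups: Dict[str, List[str]] = {}
    for entry in entries:
        if entry.ec_number is not None:
            groups.setdefault(entry.ec_number, []).append(entry.pdb_id)
    return groups


# Placeholder records for illustration only; these are not real data.
example_entries = [
    RlmEntry(pdb_id="pdbA", ec_number="ec-1"),
    RlmEntry(pdb_id="pdbB", ec_number="ec-1"),
    RlmEntry(pdb_id="pdbC"),  # no EC number available
]
example_groups = group_by_ec(example_entries)
```

Keeping the EC number optional mirrors the "if available" qualifier above, so entries without an assigned EC number can still be listed but are left out of per-reaction groupings.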
Data based on ECOD version 20161205 (develop159):