RESEARCH ARTICLE Deep evolutionary analysis reveals the design principles of fold A glycosyltransferases Rahil Taujale1,2, Aarya Venkat3, Liang-Chin Huang1, Zhongliang Zhou4, Wayland Yeung1, Khaled M Rasheed4, Sheng Li4, Arthur S Edison1,2,3, Kelley W Moremen2,3, Natarajan Kannan1,3* 1Institute of Bioinformatics, University of Georgia, Athens, Georgia; 2Complex Carbohydrate Research Center, University of Georgia, Athens, Georgia; 3Department of Biochemistry and Molecular Biology, University of Georgia, Athens, Georgia; 4Department of Computer Science, University of Georgia, Athens, Georgia Abstract Glycosyltransferases (GTs) are prevalent across the tree of life and regulate nearly all aspects of cellular functions. The evolutionary basis for their complex and diverse modes of catalytic functions remain enigmatic. Here, based on deep mining of over half million GT-A fold sequences, we define a minimal core component shared among functionally diverse enzymes. We find that variations in the common core and emergence of hypervariable loops extending from the core contributed to GT-A diversity. We provide a phylogenetic framework relating diverse GT-A fold families for the first time and show that inverting and retaining mechanisms emerged multiple times independently during evolution. Using evolutionary information encoded in primary sequences, we trained a machine learning classifier to predict donor specificity with nearly 90% accuracy and deployed it for the annotation of understudied GTs. Our studies provide an evolutionary framework for investigating complex relationships connecting GT-A fold sequence, structure, function and regulation. *For correspondence:
[email protected] Introduction Competing interests: The Complex carbohydrates make up a large bulk of the biomass of any living cell and play essential authors declare that no roles in biological processes ranging from cellular interactions, pathogenesis, immunity, quality con- competing interests exist.