Rigorous analysis of the structures of thousands of plant proteins by Tetsuya Sakurai and colleagues from the RIKEN Center for Sustainable Resource Science has led to the construction of a database that will help scientists identify the functions of more plant genes.
Although the complete genomes of a number of plants have been sequenced and their genes identified, the functions of many of these genes remain unknown. Arabidopsis thaliana, also known as thale cress, is one such plant: a model species with a fully sequenced genome that is used routinely in plant research because of the genome's small size. Yet despite the entire genome structure of Arabidopsis having been known for 15 years, the functions of at least one third of its genes remain unclear. As these gaps in knowledge can hamper research, Sakurai and his colleagues from RIKEN and Tokyo University of Agriculture and Technology aimed to fill some of them by analyzing the structures of the proteins encoded by these 'unknown' genes.
"Each protein is folded, causing the linear chain of amino acids forming the protein to attain a defined three-dimensional structure," explains Sakurai. "Information about these folded protein structures can help to elucidate the function of the corresponding gene because the structures are highly specific and related to each protein's functions."
Using published information to target only non-redundant protein sequences, the researchers performed computational modeling to predict the physicochemical and structural properties of these proteins from the complete genomes of Arabidopsis and five other plants: soybean, poplar, rice, moss and alga.
The team then analyzed the features of the three-dimensional structures that are specific to proteins with particular functions. For example, for each protein they counted the number of transmembrane helices, the coiled structures characteristic of proteins that sit in cell membranes and act as receptors for external molecules or as channels that carry molecules across the membrane.
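A helix count of this kind can be roughly approximated with a classic hydropathy scan. The following is a minimal Python sketch, not the method the team used (the article does not name their prediction tools). It applies the Kyte-Doolittle hydropathy scale with a 19-residue window and a mean-hydropathy cutoff of 1.6, a common rule of thumb for spotting membrane-spanning stretches. The function name, the toy sequence and the choice to score unknown residues as zero are assumptions made for illustration.

```python
# Kyte-Doolittle hydropathy values for the 20 standard amino acids.
KD = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def count_candidate_tm_segments(seq, window=19, threshold=1.6):
    """Count runs of consecutive windows whose mean hydropathy exceeds threshold.

    Hypothetical helper for illustration: each run of hydrophobic windows
    is treated as one candidate membrane-spanning segment.
    """
    seq = seq.upper()
    if len(seq) < window:
        return 0
    # Assumption: residues outside the 20 standard letters score 0.0.
    scores = [KD.get(aa, 0.0) for aa in seq]
    count = 0
    in_segment = False
    for start in range(len(seq) - window + 1):
        mean = sum(scores[start:start + window]) / window
        hydrophobic = mean > threshold
        if hydrophobic and not in_segment:
            count += 1  # a new hydrophobic run begins here
        in_segment = hydrophobic
    return count


# Toy sequence (made up): two hydrophobic stretches separated by charged residues.
toy = "D" * 10 + "L" * 22 + "K" * 10 + "I" * 22 + "E" * 10
print(count_candidate_tm_segments(toy))  # → 2
```

A scan like this only flags candidate segments. Dedicated transmembrane predictors are considerably more accurate, so the sketch is meant to convey the idea of the count rather than reproduce the database's values.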
Further analysis of the protein structures allowed the team to pinpoint the regions most likely to be functional: over 52,000 such regions in proteins from the six plants. The results formed the basis for a new RIKEN database, called the Plant Protein Annotation Suite, or Plant-PrAS.
"Protein structural research in plants lags behind that in animals and bacteria with respect to the structural analysis of individual proteins and gene functional annotation," says Sakurai. "We developed the Plant-PrAS database to resolve such problems. It houses unique information about the plant proteome, which is downloadable and extensively searchable."
Kurotani, A., Yamada, Y., Shinozaki, K., Kuroda, Y. & Sakurai, T. "Plant-PrAS: a database of physicochemical and structural properties and novel functional regions in plant proteomes." Plant and Cell Physiology 56, e11 (2015). DOI: 10.1093/pcp/pcu176