The number of unique transmembrane (TM) protein structures has doubled in the last four years, a growth that can be attributed to the cryo-electron microscopy revolution. In addition, the AlphaFold2 (AF2) deep learning algorithm has provided a large number of high-quality predicted structures. However, when a specific protein family is the subject of a study, collecting the structures of its members remains highly challenging despite the existence of general and protein domain-specific databases.
We demonstrate this challenge and assess the applicability of automatic collection of protein structures using the ABC protein superfamily. We developed a pipeline to identify and classify transmembrane ABC protein structures and to determine their conformational states based on special geometric measures, conftors. This and similar processes require structure alignments with a run time of 1-10 s, which was feasible on the scale of experimental structures (n<300K). However, the ~100M theoretical, high-quality AF2 protein structures render the calculations challenging and require the reimplementation of various algorithms.
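The scaling problem can be illustrated with a back-of-the-envelope estimate. The sketch below is not part of the pipeline; it assumes, purely for illustration, that every structure is aligned once against a single reference on a single CPU core, and it uses only the run time range (1-10 s) and the dataset sizes (300K and ~100M) quoted above. The function names are hypothetical.

```python
# Rough cost estimate for aligning every structure in a dataset once.
# Assumption (not from the study): one alignment per structure, one CPU core.

SECONDS_PER_DAY = 24 * 60 * 60


def cpu_days(n_structures: int, seconds_per_alignment: float) -> float:
    """Total single-core run time, in days, for n_structures alignments."""
    return n_structures * seconds_per_alignment / SECONDS_PER_DAY


def estimate(n_structures: int, fast_s: float = 1.0, slow_s: float = 10.0):
    """Return (best case, worst case) CPU-days for the 1-10 s run time range."""
    return cpu_days(n_structures, fast_s), cpu_days(n_structures, slow_s)


experimental = estimate(300_000)      # upper bound on experimental structures
af2 = estimate(100_000_000)           # approximate number of AF2 models

print(f"experimental: {experimental[0]:.1f} to {experimental[1]:.1f} CPU-days")
print(f"AF2:          {af2[0]:.0f} to {af2[1]:.0f} CPU-days")
```

Under these assumptions the experimental set costs roughly 3.5 to 35 CPU-days, whereas the AF2 set costs roughly 1,160 to 11,600 CPU-days (about 3 to 32 CPU-years), a factor of more than 300. A pipeline that needs several alignments per structure would scale these figures accordingly.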
Since the AlphaFold database contains structure predictions only for single chains, we performed AF-Multimer predictions for human ABC half transporters, which function as dimers. Our AF2 predictions warn of a possibly ambiguous interpretation of some biochemical data regarding interaction partners and call for further experiments and experimental structure determination. In order to organize structural data and make novel structure predictions and their annotation available to the broader scientific community, we joined the 3D-Beacon Network community to develop data and API standards.