An international group of scientists has released a major update to HOCOMOCO, a database of models describing the nucleotide sequences of DNA regions that bind transcription factors. The database was created in 2013. The article was published in the journal Nucleic Acids Research.

HOCOMOCO stands for "Homo sapiens Comprehensive Model Collection." The database stores models of transcription factor binding sites. Each model is a mathematical representation of the DNA regions that a given transcription factor can bind. Transcription factors are proteins that suppress or, conversely, activate the work of various genes, and the human genome encodes more than one and a half thousand of them. The database stores binding site models for human and mouse orthologs, that is, genes in the two species that descend from a single gene in their common ancestor.
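The article does not say which mathematical form the models take. One widely used representation of a binding site is a position weight matrix (PWM), and the sketch below assumes that form. The aligned sites, the pseudocount and the uniform background frequency are toy assumptions for illustration, not data or parameters from HOCOMOCO.

```python
import math

# Hypothetical aligned binding sites (toy data, not taken from HOCOMOCO).
SITES = ["TGACTCA", "TGAGTCA", "TGACTCA", "TTACTCA"]
ALPHABET = "ACGT"


def build_pwm(sites, pseudocount=1.0, background=0.25):
    """Return one {nucleotide: log-odds score} dict per motif position."""
    length = len(sites[0])
    if any(len(site) != length for site in sites):
        raise ValueError("all sites must have the same length")
    # The pseudocount keeps unseen nucleotides from producing log(0).
    total = len(sites) + pseudocount * len(ALPHABET)
    pwm = []
    for pos in range(length):
        column = [site[pos] for site in sites]
        scores = {}
        for nt in ALPHABET:
            freq = (column.count(nt) + pseudocount) / total
            # Log-odds against an assumed uniform background.
            scores[nt] = math.log2(freq / background)
        pwm.append(scores)
    return pwm


def consensus(pwm):
    """Pick the highest-scoring nucleotide at each position."""
    return "".join(max(col, key=col.get) for col in pwm)


pwm = build_pwm(SITES)
print(consensus(pwm))
```

Positive scores mark nucleotides that occur at a position more often than the background would suggest, and negative scores mark nucleotides that are under-represented there.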

Researchers from all over the world use databases such as HOCOMOCO in their work. With models from the database it is possible, for example, to predict where transcription factors bind to the DNA chain, that is, the places in the genome where transcription factors influence gene expression. Based on the predicted binding sites, it is then possible to build models of regulatory networks that explain how genes are switched on and off under various conditions. Such networks are needed to understand the biological picture of gene expression in a particular process, for example in the development of cancer.
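As a rough illustration of binding site prediction, the sketch below slides a small scoring matrix along a DNA sequence and reports every window whose score reaches a threshold. The matrix values, the sequence and the threshold are all hypothetical, and only the forward strand is scanned; a realistic scan would also check the reverse complement and choose the threshold statistically.

```python
# Hypothetical 4-position log-odds matrix (toy values, not a HOCOMOCO model).
PWM = [
    {"A": -2.0, "C": -2.0, "G": -2.0, "T": 1.5},
    {"A": -2.0, "C": -2.0, "G": 1.5, "T": -2.0},
    {"A": 1.5, "C": -2.0, "G": -2.0, "T": -2.0},
    {"A": -2.0, "C": 1.5, "G": -2.0, "T": -2.0},
]


def scan(sequence, pwm, threshold):
    """Return (start, window, score) for every window scoring >= threshold.

    Assumes the sequence contains only the letters A, C, G and T.
    """
    width = len(pwm)
    hits = []
    for start in range(len(sequence) - width + 1):
        window = sequence[start:start + width]
        # The window score is the sum of per-position scores.
        score = sum(col[nt] for col, nt in zip(pwm, window))
        if score >= threshold:
            hits.append((start, window, score))
    return hits


hits = scan("CCTGACGGTGACAA", PWM, threshold=5.0)
for start, window, score in hits:
    print(start, window, score)
```

Here only the two exact occurrences of TGAC pass the threshold, because a single mismatch already lowers the window score to 2.5.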

To create the database, scientists collected the results of experiments on the interaction of DNA and transcription factors from several open databases. An important role in this work was played by the staff of the Institute of Information and Computing Technologies of the Siberian Branch of the Russian Academy of Sciences, led by Fyodor Kolpakov, head of the Bioinformatics Laboratory. Thanks to their work, a huge collection of protein-binding DNA fragments was assembled. Computational analysis of the DNA text was then used to find motifs in these fragments: short DNA sequences to which transcription factors bind. Before the motifs enter the final database, they are annotated: special tools are used to determine the structure and function of the corresponding proteins. Each motif is also assigned a reliability rating that shows how consistently the interaction between the motif-carrying DNA and the transcription factor is observed in experiments of various types.

But this is not the only check. Before entering the database, each model is tested in computational experiments to see how well it predicts the transcription factor's binding sites in DNA. The predictions are compared with real data obtained in the laboratory. Based on several different comparisons for each model, its accuracy, sensitivity and specificity are estimated. After all these procedures, the final entry for each model is added to the common database. The database is open, and scientists from all over the world can use its data to plan their experiments. Compared to the previous version, HOCOMOCO contains more models, and their accuracy and reliability have increased. In addition, special collections of models have been built to predict DNA-protein interactions in vivo and in vitro, as well as to predict individual variants in the genome that affect the binding of regulatory proteins.
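The article does not describe the benchmarking procedure in detail. As a minimal sketch of how sensitivity, specificity and accuracy are conventionally computed, the example below compares predicted binding calls with laboratory observations for the same set of DNA regions. The two lists are invented toy data.

```python
def evaluate(predicted, observed):
    """Compare boolean predictions with boolean experimental observations."""
    if len(predicted) != len(observed):
        raise ValueError("predicted and observed must have the same length")
    pairs = list(zip(predicted, observed))
    tp = sum(1 for p, o in pairs if p and o)          # bound, predicted bound
    tn = sum(1 for p, o in pairs if not p and not o)  # unbound, predicted unbound
    fp = sum(1 for p, o in pairs if p and not o)      # unbound, predicted bound
    fn = sum(1 for p, o in pairs if not p and o)      # bound, predicted unbound
    # Assumes at least one bound and one unbound region, so no division by zero.
    return {
        "sensitivity": tp / (tp + fn),  # share of real sites that were found
        "specificity": tn / (tn + fp),  # share of non-sites correctly rejected
        "accuracy": (tp + tn) / len(pairs),
    }


# Hypothetical calls for eight DNA regions (toy data).
predicted = [True, True, False, False, True, False, False, False]
observed = [True, False, False, False, True, True, False, False]

metrics = evaluate(predicted, observed)
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```

Sensitivity and specificity are reported separately because a model can score well on one while doing poorly on the other, for example by predicting binding almost everywhere.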

"We believe that HOCOMOCO is a reliable database that expands the possibilities of molecular biology and epigenetics. To expand and update it, our team studied data from 14,183 ChIP sequencing experiments and 2,554 HT-SELEX experiments, which made it possible to obtain more than 400 thousand candidate motifs. From these, 1,443 motifs were selected that characterize the DNA regions binding 949 human transcription factors and 720 of their mouse counterparts," says Vsevolod Makeev, Corresponding Member of the Russian Academy of Sciences, Head of the Laboratory of Systems Biology and Computational Genetics at the N.I. Vavilov Institute of General Genetics of the Russian Academy of Sciences, and Head of the Department of Bioinformatics and Systems Biology at MIPT.

The work involved scientists from the N.I. Vavilov Institute of General Genetics of the Russian Academy of Sciences, the Institute of Information and Computing Technologies (Novosibirsk), the Institute of Protein Research of the Russian Academy of Sciences, Lomonosov Moscow State University, MIPT, the Institute of Biochemistry and Genetics of the Ufa Federal Research Center of the Russian Academy of Sciences, Skoltech, the Institute of Information Transmission Problems of the Russian Academy of Sciences, NUST Sirius, Biosoft LLC (Novosibirsk), the Scientific Research Center of Biotechnology of the Russian Academy of Sciences, and the Institute of Fundamental Medicine and Biology (Kazan), as well as researchers from the USA and Canada.

Information provided by the MIPT press service

The information is taken from the portal "Scientific Russia" (https://scientificrussia.ru/)
