Welcome to the Bioinformatics Research Lab

The Bioinformatics Research Laboratory (BRL) is a joint effort of the Department of Computer Science and the Department of Biological Sciences at Missouri S&T. We are currently studying gene organization in Glycine max (soybean) using fully automated computational methods. One such method is based on the distinctive patterns of negative selection seen between gene family members.

Fast Automated Identification of Functionally Significant Inverted Repeats in Whole Genomes

When genomic sequences showing dyad symmetry (inverted repeats) are transcribed into RNA, they typically adopt hairpin-like, cloverleaf, or similar structures that act as recognition sites for proteins. Such structures are the precursors of important regulatory sequences such as microRNA (miRNA) and small interfering RNA (siRNA). Even more vital are sequences such as transfer RNA (tRNA) and ribosomal RNA (rRNA), which are involved in protein synthesis. However, the number of such sequences identified so far is only a fraction of the number of inverted repeats found in genomic DNA, which makes it difficult to claim that any given inverted repeat in a genome has functional significance. By collecting statistically significant features that distinguish the known set of functional RNAs from other inverted repeats in a genome, we can isolate and identify potentially functional dyad symmetries among the tens of thousands of inverted repeats in a genome.
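To make the notion of dyad symmetry concrete, the following minimal Python sketch scans a DNA string for perfect inverted repeats: a short stem followed, after a loop of bounded length, by the stem's reverse complement. It is an illustration only, not the lab's pipeline; the function names and the parameters stem_len, min_loop, and max_loop are assumptions made for this example, and a practical search would also allow mismatches and score candidates statistically.

```python
def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence (A/C/G/T only)."""
    complement = {"A": "T", "T": "A", "C": "G", "G": "C"}
    return "".join(complement[base] for base in reversed(seq))


def find_inverted_repeats(seq, stem_len=4, min_loop=3, max_loop=8):
    """Return (left_start, right_start, stem) for perfect inverted repeats.

    stem_len, min_loop and max_loop are illustrative defaults (assumptions),
    not values taken from the lab's method.
    """
    seq = seq.upper()
    hits = []
    for i in range(len(seq) - stem_len + 1):
        stem = seq[i:i + stem_len]
        # Skip windows containing ambiguous bases such as N.
        if set(stem) - set("ACGT"):
            continue
        target = reverse_complement(stem)
        for loop in range(min_loop, max_loop + 1):
            j = i + stem_len + loop  # start of the candidate right arm
            if seq[j:j + stem_len] == target:
                hits.append((i, j, stem))
    return hits


if __name__ == "__main__":
    # GGGC ... GCCC can pair to form a hairpin stem around the AAAA loop.
    print(find_inverted_repeats("GGGCAAAAGCCC"))
```

Even this naive scan reports many hits on real genomic sequence, which is why a statistical filter of the kind described above is needed to separate potentially functional repeats from the background.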

Automated Gene Family Identification using Computational Methods

A gene family is a set of genes defined by common ancestry (presumed homology). A significant proportion of the genes that make up a genome belong to larger families of related genes resulting from duplications of individual genes, genomic segments, or even whole genomes. The study of the molecular processes by which functional innovation occurs interests not only evolutionary biologists but also protein engineers and medical and agricultural biologists. A clearer understanding of the extent to which gene families contribute to the selected traits in our most important crop species can help guide decisions regarding future improvements.

A significant proportion of genes in plants are members of multigene families. However, only a fraction of these have been discovered and characterized. Research aimed at identifying specific gene families and their constituent members has increased significantly in the last few decades. Although experimental approaches generally produce the most reliable results, they are time-consuming and labor-intensive. Most strategies for gene family identification are computational approaches that take advantage of database mining and analysis tools to improve capability and efficiency in dealing with large amounts of sequence data.

Computational methods using EST datasets have been successful at identifying one or a few families at a time. What is needed is a less family-specific strategy that can identify many gene families at once. In our research, we are developing automated techniques that search the Glycine max (soybean) dbEST using seed genes to identify new gene families in soybean. Our method is based on Negative Selection Patterns (NSP). To verify the accuracy of our techniques, we are using the Arabidopsis genome, which is fully sequenced and publicly available.
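The text does not spell out the NSP method, so the following Python sketch is not that method. It only illustrates the general signal that negative (purifying) selection leaves between related coding sequences: differences tend to be synonymous (the encoded amino acid is unchanged) rather than nonsynonymous. The function name count_codon_changes and the toy sequences are assumptions for illustration; the sketch assumes two in-frame, gap-free, pre-aligned coding sequences and uses the standard genetic code.

```python
from itertools import product

# Standard genetic code, codons enumerated in TCAG order.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def count_codon_changes(cds_a, cds_b):
    """Count (synonymous, nonsynonymous) codon differences.

    Assumes both inputs are in-frame, gap-free coding sequences that are
    already aligned to each other (an assumption made for this sketch).
    """
    if len(cds_a) != len(cds_b) or len(cds_a) % 3 != 0:
        raise ValueError("sequences must be aligned, equal length, and in frame")
    synonymous = nonsynonymous = 0
    for i in range(0, len(cds_a), 3):
        codon_a = cds_a[i:i + 3].upper()
        codon_b = cds_b[i:i + 3].upper()
        if codon_a == codon_b:
            continue
        # Ignore codons with ambiguous bases (for example N).
        if codon_a not in CODON_TABLE or codon_b not in CODON_TABLE:
            continue
        if CODON_TABLE[codon_a] == CODON_TABLE[codon_b]:
            synonymous += 1
        else:
            nonsynonymous += 1
    return synonymous, nonsynonymous


if __name__ == "__main__":
    # Toy sequences: GCT -> GCC keeps alanine; AAA -> AGA changes K to R.
    print(count_codon_changes("ATGGCTAAA", "ATGGCCAGA"))
```

Raw counts like these are only a crude indicator. Established analyses normalize by the number of synonymous and nonsynonymous sites and correct for multiple substitutions before comparing rates.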