Automated Gene Family Identification Using Computational Methods
A significant proportion of genes in plants are members of multigene families. However, only a fraction have been discovered and characterized. Research aimed at identifying specific gene families and their constituent members has increased significantly in the last few decades. Although experimental approaches generally produce the most reliable results, they are time-consuming and labor-intensive. Most strategies for gene family identification are computational approaches that use database mining and analysis tools to handle large amounts of sequence data more capably and efficiently.
Computational methods using expressed sequence tag (EST) datasets have been successful at identifying one or a few families at a time. What is needed is a less family-specific strategy that can identify many gene families at once. In our research, we are developing automated techniques that search the Glycine max (soybean) dbEST using seed genes to identify new gene families in soybean. Our method is based on Negative Selection Patterns (NSP). To verify the accuracy of our techniques, we are using the Arabidopsis genome, which is fully sequenced and publicly available.