The Biosynthetic Gene Cluster Family database (BiG-FAM) is an online repository of gene cluster families (GCFs): "homologous" groups of biosynthetic gene clusters (BGCs) that putatively encode the production of similar specialized metabolites. Built from large-scale, global collections of BGCs identified in currently available genomes and metagenome-assembled genomes (MAGs), BiG-FAM provides an explorable "atlas" of microbial secondary metabolism for browsing and searching biosynthetic diversity across taxa. Users can also query putative BGCs to rapidly locate them on this diversity map and to better understand their novelty or (probable) function, based on their relationships with other known and predicted BGCs from publicly available data.
This database version (1.0.0) is based on a GCF clustering of 1,225,071 BGCs taken from multiple publicly available sources (details). The large-scale analysis was performed with the BiG-SLiCE software (version 1.0.0) using an arbitrarily chosen clustering threshold (T=900.0), and it produced 29,955 GCF models, each representing the distinct protein domain and sequence features shared by its member BGCs.
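To give a feel for how a distance threshold such as T=900.0 can separate "member of an existing GCF" from "potentially novel", here is a minimal Python sketch. It is not BiG-SLiCE code: the function name `nearest_gcf`, the centroid representation, the use of Euclidean distance, and the toy vectors are all assumptions made for illustration only.

```python
import math

# Threshold value quoted in the text; how it is applied below is an assumption.
THRESHOLD = 900.0

def nearest_gcf(query, centroids, threshold=THRESHOLD):
    """Return (gcf_id, distance, is_member) for the closest centroid.

    `centroids` is a hypothetical mapping of GCF id -> feature vector;
    Euclidean distance is assumed purely for illustration.
    """
    if not centroids:
        raise ValueError("no GCF centroids supplied")
    gcf_id, dist = min(
        ((gid, math.dist(query, vec)) for gid, vec in centroids.items()),
        key=lambda pair: pair[1],
    )
    # A query counts as a member only if it lies within the threshold.
    return gcf_id, dist, dist <= threshold

# Toy, made-up feature vectors (real models would have far more dimensions).
centroids = {
    "GCF_A": [0.0, 0.0, 0.0],
    "GCF_B": [1000.0, 0.0, 0.0],
}
result_near = nearest_gcf([300.0, 400.0, 0.0], centroids)
result_far = nearest_gcf([0.0, 5000.0, 0.0], centroids)
```

In this toy setup the first query falls within the threshold of its nearest centroid, while the second is assigned a nearest GCF but lies beyond the threshold, which is the kind of case one might flag as a candidate for novelty.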
Please read these important caveats before working with the data!