The secondary metabolism of bacteria, fungi, and plants is a rich source of bioactive compounds of potential pharmaceutical value, and comprises the biosynthetic pathways of many chemicals that have been, and still are, used in medicine, food manufacturing, and agriculture. The genes encoding the biosynthetic pathways responsible for the production of these metabolites are very often spatially clustered on the chromosome; these genomic loci are referred to as "biosynthetic gene clusters" (BGCs). This genetic architecture makes it possible to detect specialized metabolic capacities, in the form of known and unknown biosynthetic pathways, simply by locating their gene clusters.
As the cost of sequencing bacterial and fungal genomes has dropped (and as it has become possible to reconstruct large numbers of genomes from metagenomes), large numbers of BGCs can now be found in publicly available data. To analyze which BGCs are similar across organisms, several algorithms have been developed that group BGCs into "gene cluster families" (GCFs): groups of gene clusters from different genomes that are genetically similar and are involved in producing the same or similar compounds. Recently, the BiG-SLiCE algorithm was developed, which, for the first time, made it possible to reconstruct GCFs from all publicly available (meta)genomic data (ref).
The BiG-FAM database contains GCFs calculated by this tool from over 1.2 million genomes, and allows users to easily search and browse them to analyze patterns of biosynthetic diversity across taxa.
Additionally, it allows users to query their own BGCs against all GCFs contained in BiG-FAM, in order to see how their BGCs of interest are related to gene clusters from publicly available genomes.