Help

Do you have a question about the database? Feel free to send us an e-mail (please put [BiG-FAM HELP] in the subject line). Be sure to first check these Frequently Asked Questions to see whether your question is already answered there.

Purpose of the database

Example use cases

Exploring ranthipeptide BGC diversity

Ranthipeptides (previously known as "SCIFF peptides") are non-bacteriocin ribosomally synthesized and post-translationally modified peptides (RiPPs) prevalent in the class Clostridia (ref), although GC-content analysis indicates that their genes might have been horizontally transferred (ref). Recent analysis shows that these peptides play an important role in regulating cell populations, i.e. via a quorum-sensing mechanism (ref). During our previous effort to chart the global diversity of 1.2 million BGCs, we captured a large group (6,800) of putative ranthipeptide BGCs with diverse patterns of gene neighborhoods flanking the precursor peptides (ref).

To explore this diversity, we can use BiG-FAM’s "GCF search" function with the two signature domains of this BGC class (AS-TIGR03973 and Radical_SAM) as query baits (Panel A). The search returns 79 GCFs, each representing a distinct pattern of BGCs and their distribution across the taxonomy (Panel B). Following the link to a GCF’s detail page shows the taxonomy, nucleotide length, calculated radius and biosynthetic features shared by the BGCs within that GCF (Panel C). Furthermore, a comparative multi-gene visualization of those BGCs provides an all-in-one view of the diversity of gene neighborhoods flanking the ranthipeptide precursor genes (Panel D).

A. Clicking on the "GCF" page link (box 1) in the main menu opens an interface for searching GCFs by multiple criteria. In this case, we search for "bacterial GCFs harboring AS-TIGR03973 and Radical_SAM biosynthetic domains in at least ~80% of their BGCs" (box 2). B. After the filter is applied (box 3), BiG-FAM returns a list of 79 GCFs satisfying the criteria. C. Clicking on the "view" button of a GCF (box 4) takes users to a detail page that shows several statistics on the distribution of the GCF’s taxonomy, length, and features (domains). D. On the GCF detail page, users may also choose to view an "arrower" visualization of the BGCs (box 5), which in this case shows the neighboring biosynthetic genes (depicted as colored arrows) flanking the queried cysteine-rich precursor + rSAM gene pairs (blue boxes).
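The search criterion in Panel A can be read as a per-GCF prevalence filter. The Python sketch below illustrates that idea only; it is not BiG-FAM's implementation. The function name gcfs_with_domains, the input layout (a mapping from GCF id to the domain sets of its member BGCs) and the toy data are all hypothetical. It also assumes that each queried domain must individually reach the prevalence cutoff, which the page does not state explicitly.

```python
from typing import Dict, List, Set


def gcfs_with_domains(
    gcf_to_bgc_domains: Dict[str, List[Set[str]]],
    required: Set[str],
    min_fraction: float = 0.8,
) -> List[str]:
    """Return ids of GCFs in which every required domain occurs in at
    least `min_fraction` of the member BGCs (hypothetical helper)."""
    hits = []
    for gcf_id, bgcs in gcf_to_bgc_domains.items():
        if not bgcs:
            continue  # an empty family cannot satisfy a prevalence cutoff
        if all(
            sum(1 for domains in bgcs if dom in domains) / len(bgcs) >= min_fraction
            for dom in required
        ):
            hits.append(gcf_id)
    return hits


# Toy data for illustration only; these families are NOT taken from BiG-FAM.
toy = {
    "GCF_A": [{"AS-TIGR03973", "Radical_SAM"}] * 4 + [{"Radical_SAM"}],
    "GCF_B": [{"AS-TIGR03973", "Radical_SAM"}, {"Radical_SAM"}],
    "GCF_C": [{"PKS_KS"}],
}

print(gcfs_with_domains(toy, {"AS-TIGR03973", "Radical_SAM"}))
```

In the toy data only GCF_A passes: AS-TIGR03973 is present in 4 of its 5 BGCs (80%), while in GCF_B it reaches only 50%.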

GCF analysis on a newly sequenced Streptomyces

Recently, a draft genome has been published (ref) for Streptomyces tunisialbus, a new streptomycete species isolated from the rhizospheric soil of lavender plants (Lavandula officinalis) in Tunisia (ref).

To showcase how BiG-FAM can be used to assess biosynthetic novelty and capture distant relationships of newly sequenced BGCs, we downloaded the assembled genome from ENA (accession: OKRJ01) and uploaded it to the antiSMASH web server, which returned a unique job id ("bacteria/fungi-xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx"). Once the run has finished, this id can be used directly to perform GCF analysis in BiG-FAM (Panel A). The entire analysis of the 36 antiSMASH-predicted BGCs was completed in less than a minute, resulting in a summary table of the best BGC-to-GCF hit pairs (Panel B).
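Before pasting a job id into the query form, it can help to check that it has the shape described above. The Python sketch below does this under stated assumptions: each "x" in the template is taken to be a lowercase hexadecimal digit (UUID-style), and the prefix is either "bacteria" or "fungi". Neither assumption is confirmed by this page, and parse_job_id is a hypothetical helper, not part of antiSMASH or BiG-FAM.

```python
import re
from typing import Optional

# Assumed shape: <bacteria|fungi>-8-4-4-4-12 lowercase hex digits.
JOB_ID_PATTERN = re.compile(
    r"(bacteria|fungi)-[0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{12}"
)


def parse_job_id(job_id: str) -> Optional[str]:
    """Return the prefix ("bacteria" or "fungi") if the id matches the
    assumed pattern, otherwise None."""
    match = JOB_ID_PATTERN.fullmatch(job_id.strip())
    return match.group(1) if match else None


# Made-up ids for illustration only.
print(parse_job_id("bacteria-01234567-89ab-cdef-0123-456789abcdef"))
print(parse_job_id("bacteria-not-a-valid-id"))
```

A check of this kind only catches copy-and-paste mistakes. It cannot tell whether the job exists or has finished.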

One interesting BGC in this genome is the complete, 46.5-kilobase-pair Type-I PKS protocluster from "Region 15.1", which shows an overall low hit rate in both its ClusterBlast and KnownClusterBlast results. A quick look at the GCF analysis result for this BGC shows a significant hit to only one singleton GCF (Panel C), which on follow-up inspection turned out to come from the NCBI-submitted entry of the same genome (accession: GCA_900290435.1), suggesting the novelty of the PKS BGC in question.

Another useful feature is the "tracking" of biosynthetic domains of the query BGC across hundreds to thousands of distant BGCs, showing the domain architectural similarity shared between the genes (Panel D).

A. When users click on the "Query" section of the main menu (box 1), they are shown an input form for entering the job id of a finished antiSMASH or fungiSMASH run. After "Submit" is pressed, BiG-FAM immediately executes (or queues) the downloading, preprocessing and GCF matching of all BGCs (i.e. regions) included in the submitted run. B. A list then summarizes the best BGC-to-GCF pairings, with distances lower than 900 (the original threshold value) highlighted in green to indicate a good match to at least one GCF in the database. One query BGC, "Region 15.1", was selected for a detailed look (box 3), as mentioned in the main text. C. A list of the five best-matching GCFs and the distances of their models to the query BGC, showing an exact match (d = 0) to a singleton GCF from Streptomyces (GCF_24649, box 4), which turned out to be the same BGC from the same genome. D. The visualization of the second closest GCF on the list (GCF_06303 with d = 1,609, box 5) shows the co-occurrence of protein domains across the distantly related BGCs, where some similar but non-identical PKS genes (the longest multi-domain gene in each GCF) seem to act as an "anchor" that defines the GCF. While this group of anchor genes has a domain architecture similar to that of the PKS gene of the queried BGC (box 6), a quick BLASTp analysis against one example gene (box 7) shows only 52.63% similarity, suggesting that the BGC does not actually belong to the GCF.
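The way hits are read in Panels B and C can be summarized as "pick the GCF with the smallest distance, then compare that distance with the threshold". The Python sketch below illustrates this reading using the two distances quoted in the caption. The Hit type and the helper functions are hypothetical, and how BiG-FAM computes the distances is outside the scope of the sketch.

```python
from typing import List, NamedTuple

THRESHOLD = 900.0  # "original threshold value" quoted in the caption


class Hit(NamedTuple):
    gcf_id: str
    distance: float


def best_hit(hits: List[Hit]) -> Hit:
    """Return the hit with the smallest distance to the query BGC."""
    return min(hits, key=lambda h: h.distance)


def is_good_match(hit: Hit, threshold: float = THRESHOLD) -> bool:
    """A hit counts as a good match if its distance is lower than the threshold."""
    return hit.distance < threshold


# The two distances reported for "Region 15.1" in the caption.
region_15_1 = [Hit("GCF_06303", 1609.0), Hit("GCF_24649", 0.0)]

top = best_hit(region_15_1)
print(top.gcf_id, is_good_match(top))
print([h.gcf_id for h in region_15_1 if not is_good_match(h)])
```

Here GCF_24649 (d = 0) is the best hit and falls below the threshold, whereas GCF_06303 (d = 1,609) does not, which is consistent with the caption's conclusion that the query BGC does not belong to that family.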

Frequently Asked Questions (FAQs)

How were the GCFs hosted in BiG-FAM calculated?

How do I compare my own BGC against the GCFs in BiG-FAM?

How do I search for BGCs or GCFs with certain characteristics (protein domains, taxonomy, etc.)?

How is BiG-FAM related to antiSMASH, MIBiG, BiG-SCAPE and BiG-SLiCE?

Can I set up a copy of this database on my own (local) servers?

What is the privacy policy of antiSMASH concerning the sequence data used for query mode?

From which studies were the genomes and MAGs used in BiG-FAM sourced?

How do I (bulk) download GCF data from BiG-FAM?

How do I cite BiG-FAM?