Statistical Analytics

The Analytics tab decomposes potential confounders to answer one question: do regional differences in plasmid mobility reflect biology, or are they an artifact of sampling bias?

1. Matched Comparison

Question: Do mobility differences persist when comparing the same species across countries?

Method: Heatmap of conjugative % for each (species, country) pair. Green = more conjugative, red = less.

Key finding: Escherichia plasmids from China are 54% conjugative vs 35% from Germany. The gap persists within a single taxon, so it is not explained by taxonomic composition alone and points to a real biological or sampling-context difference.
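The matrix behind such a heatmap can be sketched in a few lines. This is a minimal illustration, not the tab's actual code: the record layout, the function name conjugative_pct, and the toy rows are all assumptions made for the example.

```python
from collections import defaultdict

# Hypothetical toy records: (taxon, country, mobility class). Not real data.
records = [
    ("Escherichia", "China", "conjugative"),
    ("Escherichia", "China", "mobilizable"),
    ("Escherichia", "Germany", "conjugative"),
    ("Escherichia", "Germany", "non-mobilizable"),
    ("Escherichia", "Germany", "non-mobilizable"),
    ("Klebsiella", "China", "conjugative"),
]

def conjugative_pct(records):
    """Return {(taxon, country): percent conjugative} for every observed pair."""
    totals = defaultdict(int)
    conj = defaultdict(int)
    for taxon, country, mobility in records:
        totals[(taxon, country)] += 1
        if mobility == "conjugative":
            conj[(taxon, country)] += 1
    return {key: 100.0 * conj[key] / totals[key] for key in totals}

matrix = conjugative_pct(records)
for (taxon, country), pct in sorted(matrix.items()):
    print(f"{taxon:12s} {country:8s} {pct:5.1f}%")
```

Reading across one taxon row then compares countries with taxonomy held fixed, which is the point of the matched comparison.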

2. Rarefaction Analysis

Question: Are mobility differences artifacts of sample size?

Method: Each country is subsampled to a common n (from 50 to 2,000 plasmids), with 50 bootstrap replicates at each n. Error bars show the standard deviation across replicates.

Interpretation: If curves for different countries converge at the same value → differences are noise. If they stabilize at different values → differences are real.
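A rough sketch of the rarefaction procedure follows. It assumes subsampling without replacement and uses made-up label lists; the function name rarefaction_curve, the seed, and the toy proportions are illustrative assumptions, not values from the dataset.

```python
import random
import statistics

def rarefaction_curve(labels, sizes, n_boot=50, seed=0):
    """labels: list of booleans (True = conjugative) for one country.
    Returns {n: (mean_pct, sd_pct)} for every n that fits in the sample."""
    rng = random.Random(seed)
    curve = {}
    for n in sizes:
        if n > len(labels):
            continue  # cannot draw n items without replacement
        pcts = []
        for _ in range(n_boot):
            sub = rng.sample(labels, n)          # one replicate of size n
            pcts.append(100.0 * sum(sub) / n)    # percent conjugative
        curve[n] = (statistics.mean(pcts), statistics.stdev(pcts))
    return curve

# Hypothetical countries with 1,000 plasmids each (toy proportions).
country_a = [True] * 540 + [False] * 460
country_b = [True] * 350 + [False] * 650
sizes = [50, 100, 200, 500, 2000]

curve_a = rarefaction_curve(country_a, sizes)
curve_b = rarefaction_curve(country_b, sizes)
for n in sorted(curve_a):
    print(n, curve_a[n], curve_b[n])
```

If the two mean values stay apart while the standard deviations shrink as n grows, the curves are stabilizing at different values rather than converging.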

3. XGBoost + SHAP

Question: What features best predict mobility?

Method: XGBoost classifier trained on 57,580 plasmids. Features: country, genus, host source, year, length, GC%, Inc groups.

Results: 82.6% accuracy. SHAP values show feature importance.

If “Country” still ranks high when taxonomy (genus), host source, and year are included as features → regional differences are not fully explained by those covariates.
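To make the idea of a SHAP value concrete, the sketch below computes exact Shapley attributions for a tiny hand-written scoring function by averaging marginal contributions over all feature orderings. It is not the trained XGBoost model and does not use the shap library, which relies on much faster tree-specific algorithms; toy_score, its coefficients, and the feature names are hypothetical.

```python
from itertools import permutations

def shapley_values(predict, x, baseline):
    """Exact Shapley attribution of predict(x) - predict(baseline).
    Features not yet added to a coalition keep their baseline value."""
    names = list(x)
    phi = {name: 0.0 for name in names}
    orders = list(permutations(names))
    for order in orders:
        current = dict(baseline)
        prev = predict(current)
        for name in order:
            current[name] = x[name]      # add this feature to the coalition
            new = predict(current)
            phi[name] += new - prev      # its marginal contribution
            prev = new
    return {name: total / len(orders) for name, total in phi.items()}

# Hypothetical toy score with one interaction term (illustration only).
def toy_score(f):
    return (0.2 * f["country_china"]
            + 0.3 * f["genus_escherichia"]
            + 0.1 * f["country_china"] * f["genus_escherichia"]
            + 0.05 * f["inc_f"])

x = {"country_china": 1, "genus_escherichia": 1, "inc_f": 1}
baseline = {"country_china": 0, "genus_escherichia": 0, "inc_f": 0}
phi = shapley_values(toy_score, x, baseline)
print(phi)
```

The attributions sum to the difference between the prediction and the baseline, and the interaction term is split evenly between the two features involved. That is why a high rank for country reflects its contribution alongside the other features rather than a causal effect.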

5. Simpson’s Paradox Detector

Question: Does the overall mobility trend for an Inc group reverse when stratified by species?

Method: For each Inc group with >50 plasmids, compare overall conjugative % to per-species conjugative %. Flag cases with >20 percentage point divergence.

Finding: No true paradoxes detected — regional mobility differences are consistent across species.
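The screening step described in the method can be sketched as follows. The row layout, the function name flag_divergent, and the toy counts are assumptions; flagged cases are only candidates, since a true Simpson's paradox also requires the direction of the trend to reverse.

```python
from collections import defaultdict

def flag_divergent(rows, min_plasmids=50, threshold_pp=20.0):
    """rows: iterable of (inc_group, species, is_conjugative).
    Returns (inc_group, species, overall_pct, species_pct) tuples where the
    per-species rate differs from the overall rate by more than threshold_pp."""
    by_inc = defaultdict(lambda: defaultdict(list))
    for inc, species, is_conj in rows:
        by_inc[inc][species].append(is_conj)
    flagged = []
    for inc, species_map in by_inc.items():
        all_flags = [v for flags in species_map.values() for v in flags]
        if len(all_flags) <= min_plasmids:
            continue  # only Inc groups with more than min_plasmids plasmids
        overall = 100.0 * sum(all_flags) / len(all_flags)
        for species, flags in species_map.items():
            pct = 100.0 * sum(flags) / len(flags)
            if abs(pct - overall) > threshold_pp:
                flagged.append((inc, species, overall, pct))
    return flagged

# Hypothetical toy rows (not real data).
rows = (
    [("IncX", "sp_A", True)] * 36 + [("IncX", "sp_A", False)] * 4
    + [("IncX", "sp_B", True)] * 8 + [("IncX", "sp_B", False)] * 32
    + [("IncY", "sp_A", True)] * 10   # too few plasmids, skipped
)
flagged = flag_divergent(rows)
for item in flagged:
    print(item)
```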

6. AMR Co-occurrence Network

Question: Which resistance genes travel together?

Method: Heatmap of pairwise gene co-occurrence on the same plasmid (minimum 100 co-occurrences). Reveals multi-drug resistance cassettes.
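A minimal sketch of the pairwise counting, assuming gene content is available as a mapping from plasmid ID to gene names. The function name cooccurrence and the toy plasmids are hypothetical, and the example lowers the threshold to 2 so the toy data produces output.

```python
from collections import Counter
from itertools import combinations

def cooccurrence(plasmid_genes, min_count=100):
    """plasmid_genes: {plasmid_id: iterable of gene names}.
    Returns {(gene_a, gene_b): n} for pairs seen on at least min_count plasmids."""
    counts = Counter()
    for genes in plasmid_genes.values():
        # Deduplicate and sort so each unordered pair has one canonical key.
        for pair in combinations(sorted(set(genes)), 2):
            counts[pair] += 1
    return {pair: n for pair, n in counts.items() if n >= min_count}

# Hypothetical toy input (not real data).
plasmid_genes = {
    "p1": ["sul1", "aadA2", "dfrA12"],
    "p2": ["sul1", "aadA2"],
    "p3": ["sul1", "blaOXA-1"],
}
pairs = cooccurrence(plasmid_genes, min_count=2)
print(pairs)
```

Each plasmid contributes at most one count per gene pair, so duplicated gene copies on a single plasmid do not inflate the co-occurrence figures.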

7. Integron & Gene Cassette Analysis

Question: What gene cassettes are carried by class 1 integrons?

Method: Identify plasmids with qacEdelta1 + sul1 (class 1 integron 3’ conserved segment). Map all genes within 5 kb of qacEdelta1.

Finding: 5,371 plasmids carry class 1 integrons. Top cassettes: sul1, aadA2, dfrA12, arr-3, catB3, blaOXA-1. 74% of these integrons sit on conjugative plasmids.
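The detection and neighbourhood steps might look like the sketch below. It assumes linear coordinates (wrap-around on circular plasmids is ignored) and measures distance as the gap between gene intervals; both choices, along with the function names and toy coordinates, are assumptions rather than the documented method.

```python
def gap(a, b):
    """Distance in bp between two (start, end) intervals; 0 if they overlap."""
    return max(0, max(a[0], b[0]) - min(a[1], b[1]))

def integron_neighbours(genes, window=5000):
    """genes: list of (name, start, end) on one plasmid.
    Returns sorted names of genes within `window` bp of qacEdelta1,
    or None if the plasmid lacks the qacEdelta1 + sul1 signature."""
    names = {name for name, _, _ in genes}
    if not {"qacEdelta1", "sul1"} <= names:
        return None
    anchors = [(s, e) for name, s, e in genes if name == "qacEdelta1"]
    near = set()
    for name, s, e in genes:
        if name == "qacEdelta1":
            continue
        if any(gap((s, e), anchor) <= window for anchor in anchors):
            near.add(name)
    return sorted(near)

# Hypothetical toy annotation (coordinates invented for illustration).
plasmid = [
    ("dfrA12", 1000, 1500),
    ("aadA2", 1600, 2400),
    ("qacEdelta1", 2500, 2850),
    ("sul1", 2900, 3740),
    ("blaCTX-M-15", 20000, 20900),
]
print(integron_neighbours(plasmid))
```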

8. Co-mobilization Analysis

Question: Which mobilizable plasmids are likely co-transferred with conjugative plasmids?

Findings:

  • 45% of mobilizable plasmids share a host with a conjugative plasmid

  • IncFII conjugative plasmids most frequently co-occur with ColRNAI mobilizable plasmids in the same host

  • ML predictor (77% accuracy) identifies relaxase type as the strongest predictor of co-mobilization

Relaxase x T4SS Compatibility

Heatmap of observed co-location vs literature-known compatibility:

  • MOBF → MPF_F (known)

  • MOBP → MPF_T, MPF_F (known)

  • MOBC → MPF_F (885 co-occurrences, not in standard rules — potential novel pathway)
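The comparison of observed pairs against the rules listed above can be sketched as a simple lookup. The rule set mirrors the three bullets in this section; the function name classify_pairs and the toy observations are hypothetical.

```python
from collections import Counter

# Known relaxase -> T4SS pairings, as listed in this section.
KNOWN = {("MOBF", "MPF_F"), ("MOBP", "MPF_T"), ("MOBP", "MPF_F")}

def classify_pairs(observations, known=KNOWN):
    """observations: iterable of (relaxase_type, mpf_type) co-locations.
    Returns {pair: (count, "known" | "not in rules")}."""
    counts = Counter(observations)
    return {
        pair: (n, "known" if pair in known else "not in rules")
        for pair, n in counts.items()
    }

# Hypothetical toy observations (not real counts).
observations = [
    ("MOBF", "MPF_F"), ("MOBF", "MPF_F"), ("MOBP", "MPF_T"),
    ("MOBC", "MPF_F"), ("MOBC", "MPF_F"),
]
result = classify_pairs(observations)
for pair, (n, status) in sorted(result.items()):
    print(pair, n, status)
```

Pairs labelled "not in rules" with high counts are the ones worth inspecting as candidate pathways.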

9. Retro-mobilization & HGT Routes

Question: How can non-mobilizable plasmids spread horizontally?

Method: For each of the 28,816 non-mobilizable plasmids, check whether it shares a bacterial host (same BioSample) with a conjugative or mobilizable plasmid, and assess alternative transfer routes.

Key findings:

  • Retro-mobilization: 9,555 (33%) share a host with a conjugative plasmid — the conjugative T4SS can pull non-mobilizable DNA back into the donor cell (reverse transfer)

  • Mobilizable relay: 10,736 (37%) have a mobilizable partner that could relay transfer

  • Transduction: 8,585 (30%) are small enough (<10 kb) to fit in phage capsids

  • AMR at risk: 2,226 non-mobilizable plasmids carry AMR genes AND have a conjugative partner — direct retro-mobilization risk for resistance dissemination

Conjugative partners: IncFIB/FII, IncL/M, IncI-gamma/K1, IncN are the most frequent “helper” plasmids co-existing with non-mobilizable ones.
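The host-sharing check behind these counts can be sketched by grouping plasmids by BioSample. The dictionary keys, the function name hgt_routes, and the toy records are assumptions for illustration; note that one plasmid can qualify for more than one route.

```python
from collections import defaultdict

def hgt_routes(plasmids, max_phage_len=10_000):
    """plasmids: list of dicts with keys id, biosample, mobility, length.
    Returns {plasmid_id: route flags} for non-mobilizable plasmids only."""
    by_sample = defaultdict(set)
    for p in plasmids:
        by_sample[p["biosample"]].add(p["mobility"])
    routes = {}
    for p in plasmids:
        if p["mobility"] != "non-mobilizable":
            continue
        partners = by_sample[p["biosample"]]  # mobility classes in same host
        routes[p["id"]] = {
            "retro_mobilization": "conjugative" in partners,
            "mobilizable_relay": "mobilizable" in partners,
            "transduction": p["length"] < max_phage_len,
        }
    return routes

# Hypothetical toy records (not real data).
plasmids = [
    {"id": "p1", "biosample": "S1", "mobility": "non-mobilizable", "length": 8000},
    {"id": "p2", "biosample": "S1", "mobility": "conjugative", "length": 90000},
    {"id": "p3", "biosample": "S2", "mobility": "non-mobilizable", "length": 45000},
    {"id": "p4", "biosample": "S2", "mobility": "mobilizable", "length": 6000},
    {"id": "p5", "biosample": "S3", "mobility": "non-mobilizable", "length": 3000},
]
routes = hgt_routes(plasmids)
for pid, flags in routes.items():
    print(pid, flags)
```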

10. blaKPC Transposon Context

Question: What genetic elements carry blaKPC, and are NTEKPC elements true transposons?

Method: Analyse 4,800+ blaKPC-carrying plasmids. Map genes within 5 kb of blaKPC. Cross-reference with Inc groups and mobility.

Key findings:

  • blaTEM is the most common gene near blaKPC (192 plasmids) — part of NTEKPC-IId structure

  • 59% of KPC plasmids are conjugative (self-transmissible)

  • IncFIB/FII dominate, followed by IncN and IncL/M

  • NTEKPC elements share structural features with classical transposons (IRs, TSDs, IS element associations), supporting their classification as true transposons

11. Integron ML & Transposon-AMR Correlations

Question: What predicts integron carriage? Which IS families co-occur with which resistance classes?

Method: Random Forest predicting class 1 integron carriage. IS family x AMR drug class co-occurrence heatmap from PGAP + AMR data.

Findings: Plasmid length, Inc group, and genus are the strongest predictors of integron carriage. IS26 co-occurs most strongly with aminoglycoside and beta-lactam resistance.