# Statistical Analytics

The Analytics tab provides confound decomposition to answer: **are regional differences in plasmid mobility biological or sampling bias?**

## 1. Matched Comparison

**Question**: Do mobility differences persist when comparing the same species across countries?

**Method**: Heatmap of conjugative % for each (species, country) pair. Green = more conjugative, red = less.

**Key finding**: *Escherichia* in China is 54% conjugative vs 35% in Germany. The effect persists within the same species, suggesting a real biological or sampling-context difference.

## 2. Rarefaction Analysis

**Question**: Are mobility differences artifacts of sample size?

**Method**: Each country subsampled to equal n (50 to 2000), bootstrapped 50 times. Error bars show standard deviation.

**Interpretation**: If curves for different countries converge at the same value → differences are noise. If they stabilize at different values → differences are real.

## 3. XGBoost + SHAP

**Question**: What features best predict mobility?

**Method**: XGBoost classifier trained on 57,580 plasmids. Features: country, genus, host source, year, length, GC%, Inc groups.

**Results**: 82.6% accuracy. SHAP values show feature importance. If "Country" ranks high after controlling for species, host source, and year → regional differences are independent.

## 4. Temporal Trends

**Question**: Are AMR genes and Inc groups increasing or declining over time?

**Method**: Line charts showing % prevalence per year (2010-2024), normalized by total plasmids submitted that year.

## 5. Simpson's Paradox Detector

**Question**: Does the overall mobility trend for an Inc group reverse when stratified by species?

**Method**: For each Inc group with >50 plasmids, compare overall conjugative % to per-species conjugative %. Flag cases with >20 percentage point divergence.

**Finding**: No true paradoxes detected; regional mobility differences are consistent across species.

## 6. AMR Co-occurrence Network

**Question**: Which resistance genes travel together?

**Method**: Heatmap of pairwise gene co-occurrence on the same plasmid (minimum 100 co-occurrences). Reveals multi-drug resistance cassettes.

## 7. Integron & Gene Cassette Analysis

**Question**: What gene cassettes are carried by class 1 integrons?

**Method**: Identify plasmids with qacEdelta1 + sul1 (class 1 integron 3' conserved segment). Map all genes within 5 kb of qacEdelta1.

**Finding**: 5,371 plasmids carry class 1 integrons. Top cassettes: sul1, aadA2, dfrA12, arr-3, catB3, blaOXA-1. 74% on conjugative plasmids.

## 8. Co-mobilization Analysis

**Question**: Which mobilizable plasmids are likely co-transferred with conjugative plasmids?

**Findings**:

- **45%** of mobilizable plasmids share a host with a conjugative plasmid
- IncFII conjugative plasmids most frequently carry ColRNAI mobilizable plasmids
- ML predictor (77% accuracy) identifies relaxase type as the strongest predictor of co-mobilization

### Relaxase x T4SS Compatibility

Heatmap of observed co-location vs literature-known compatibility:

- MOBF → MPF_F (known)
- MOBP → MPF_T, MPF_F (known)
- MOBC → MPF_F (885 co-occurrences, **not in standard rules**; potential novel pathway)

## 9. Retro-mobilization & HGT Routes

**Question**: How can non-mobilizable plasmids spread horizontally?

**Method**: For each of the 28,816 non-mobilizable plasmids, check whether they share a bacterial host (same BioSample) with conjugative or mobilizable plasmids, and assess alternative transfer routes.

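
The host-sharing check described in this method can be sketched in a few lines of plain Python. This is an illustrative sketch, not the tool's actual implementation: the record fields (`id`, `biosample`, `mobility`, `length`), the mobility labels, and the function name `classify_hgt_routes` are all assumptions made for the example, and the 10 kb size cutoff is the one this section quotes for transduction.

```python
from collections import defaultdict

def classify_hgt_routes(plasmids, max_transduction_bp=10_000):
    """Flag candidate transfer routes for each non-mobilizable plasmid.

    `plasmids` is an iterable of dicts with hypothetical keys:
    'id', 'biosample', 'mobility', 'length'.
    """
    # Collect the mobility classes observed in each BioSample (host).
    mobility_by_host = defaultdict(set)
    for p in plasmids:
        mobility_by_host[p["biosample"]].add(p["mobility"])

    routes = {}
    for p in plasmids:
        if p["mobility"] != "non-mobilizable":
            continue
        partners = mobility_by_host[p["biosample"]]
        routes[p["id"]] = {
            # A conjugative plasmid is present in the same host.
            "retro_mobilization": "conjugative" in partners,
            # A mobilizable plasmid is present in the same host.
            "mobilizable_relay": "mobilizable" in partners,
            # Small enough to be packaged by a phage (assumed cutoff).
            "transduction": p["length"] < max_transduction_bp,
        }
    return routes

# Made-up records for illustration only.
demo = [
    {"id": "pA", "biosample": "S1", "mobility": "conjugative", "length": 90_000},
    {"id": "pB", "biosample": "S1", "mobility": "non-mobilizable", "length": 4_000},
    {"id": "pC", "biosample": "S2", "mobility": "non-mobilizable", "length": 50_000},
    {"id": "pD", "biosample": "S2", "mobility": "mobilizable", "length": 6_000},
]
routes = classify_hgt_routes(demo)
```

A plasmid can satisfy more than one route at once, so the three flags are kept as independent booleans instead of a single label.
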

**Key findings**:

- **Retro-mobilization**: 9,555 (33%) share a host with a conjugative plasmid; the conjugative T4SS can pull non-mobilizable DNA back into the donor cell (reverse transfer)
- **Mobilizable relay**: 10,736 (37%) have a mobilizable partner that could relay transfer
- **Transduction**: 8,585 (30%) are small enough (<10 kb) to fit in phage capsids
- **AMR at risk**: 2,226 non-mobilizable plasmids carry AMR genes AND have a conjugative partner, a direct retro-mobilization risk for resistance dissemination

**Conjugative partners**: IncFIB/FII, IncL/M, IncI-gamma/K1, IncN are the most frequent "helper" plasmids co-existing with non-mobilizable ones.

## 10. blaKPC Transposon Context

**Question**: What genetic elements carry blaKPC, and are NTEKPC elements true transposons?

**Method**: Analyse 4,800+ blaKPC-carrying plasmids. Map genes within 5 kb of blaKPC. Cross-reference with Inc groups and mobility.

**Key findings**:

- blaTEM is the most common gene near blaKPC (192 plasmids), part of the NTEKPC-IId structure
- 59% of KPC plasmids are conjugative (self-transmissible)
- IncFIB/FII dominate, followed by IncN and IncL/M
- NTEKPC elements share structural features with classical transposons (IRs, TSDs, IS element associations), supporting their classification as true transposons

## 11. Integron ML & Transposon-AMR Correlations

**Question**: What predicts integron carriage? Which IS families co-occur with which resistance classes?

**Method**: Random Forest predicting class 1 integron carriage. IS family x AMR drug class co-occurrence heatmap from PGAP + AMR data.

**Findings**: Plasmid length, Inc group, and genus are the strongest predictors of integron carriage. IS26 co-occurs most strongly with aminoglycoside and beta-lactam resistance.
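
As a rough illustration of how the IS family x AMR drug class matrix behind such a heatmap could be tallied, the sketch below counts, for each pair, the number of plasmids that carry both. The field names (`is_families`, `amr_classes`), the function name `cooccurrence_counts`, and the demo records are hypothetical; the actual pipeline and its input schema are not described in this document.

```python
from collections import Counter
from itertools import product

def cooccurrence_counts(plasmids):
    """Count plasmids carrying each (IS family, AMR drug class) pair.

    `plasmids` is an iterable of dicts with hypothetical keys
    'is_families' and 'amr_classes', each a list of labels.
    """
    counts = Counter()
    for p in plasmids:
        # Deduplicate labels so each pair is counted once per plasmid.
        for pair in product(set(p["is_families"]), set(p["amr_classes"])):
            counts[pair] += 1
    return counts

# Made-up records for illustration only.
demo = [
    {"is_families": ["IS26", "IS26"], "amr_classes": ["aminoglycoside", "beta-lactam"]},
    {"is_families": ["IS26", "ISEcp1"], "amr_classes": ["beta-lactam"]},
    {"is_families": [], "amr_classes": ["tetracycline"]},
]
counts = cooccurrence_counts(demo)
```

Counting per plasmid, not per gene copy, keeps a plasmid with many copies of one IS element from dominating the matrix. Whether the real analysis makes the same choice is an assumption here.
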