Sequence Analysis
The Seq Analysis tab scans any DNA sequence for functional elements, cloning artifacts, and engineering indicators.
Input Methods
NCBI accession: Enter an accession (e.g., MH595533) and click “Fetch & Analyze”
File upload: Drag & drop a FASTA or GenBank file
Paste/Edit: Type or paste directly into the textarea, modify, then click “Analyze Sequence”
Analysis Pipeline
1. Restriction Site Mapping
Scans for 25 common Type II restriction enzymes including:
6-cutters: EcoRI, BamHI, HindIII, SalI, XhoI, NdeI, NcoI, XbaI, PstI, SacI, KpnI; 8-cutter: NotI
Type IIS (Golden Gate): BsaI, BbsI, BsmBI, SapI
Both strands are scanned (forward and reverse complement), which matters for the non-palindromic Type IIS sites
Hotspot detection: a 500 bp sliding window containing 3 or more restriction sites is flagged as a potential cloning junction.
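The two-strand scan and the hotspot rule can be sketched as follows. This is an illustration, not the tool's code: only five of the 25 enzymes are included, and the choice to start each 500 bp window at a site (rather than sliding by a fixed step) is an assumption.

```python
import re

# Small illustrative subset of the enzyme list (recognition sites, 5'->3').
ENZYMES = {
    "EcoRI": "GAATTC",
    "BamHI": "GGATCC",
    "HindIII": "AAGCTT",
    "NotI": "GCGGCCGC",
    "BsaI": "GGTCTC",  # Type IIS: non-palindromic, so the minus strand must be searched too
}
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def revcomp(seq: str) -> str:
    return seq.translate(COMPLEMENT)[::-1]


def find_all(seq: str, motif: str) -> list[int]:
    """0-based starts of every occurrence, overlapping ones included (lookahead)."""
    return [m.start() for m in re.finditer(f"(?={motif})", seq)]


def map_sites(seq: str) -> list[tuple[int, str, str]]:
    """Return sorted (position, enzyme, strand) tuples for sites on either strand."""
    seq = seq.upper()
    hits = []
    for name, site in ENZYMES.items():
        hits += [(pos, name, "+") for pos in find_all(seq, site)]
        rc = revcomp(site)
        if rc != site:  # palindromic sites were already found on the + strand
            hits += [(pos, name, "-") for pos in find_all(seq, rc)]
    return sorted(hits)


def hotspots(hits, window: int = 500, min_sites: int = 3) -> list[tuple[int, int]]:
    """Windows [start, start + window) that begin at a site and contain >= min_sites sites."""
    positions = [pos for pos, _, _ in hits]
    found = []
    for i, start in enumerate(positions):
        count = sum(1 for p in positions[i:] if p < start + window)
        if count >= min_sites:
            found.append((start, start + window))
    return found
```

A BsaI site written as `GAGACC` on the plus strand is reported with strand `"-"`, which is exactly the case a forward-only scan would miss.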
2. Promoter Prediction
Sigma-70 promoter consensus search:
-35 box: TTGAC[AT]
Spacer: 14-20 bp (optimal 17 bp)
-10 Pribnow box: TA[AT]AAT
Scored 0-100 based on consensus match and spacer length.
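A minimal sketch of the -35/spacer/-10 search, assuming boxes are matched with the degenerate patterns above and then scored against the strict consensus (TTGACA, TATAAT). The 70/30 split between box identity and spacer length is a hypothetical weighting; the tool's actual formula is not documented here.

```python
import re

MINUS35 = re.compile(r"(?=(TTGAC[AT]))")  # lookahead keeps overlapping -35 hits
MINUS10 = re.compile(r"TA[AT]AAT")


def consensus_matches(box: str, consensus: str) -> int:
    """Number of positions where a box agrees with the strict consensus."""
    return sum(a == b for a, b in zip(box, consensus))


def find_promoters(seq: str, min_spacer: int = 14, max_spacer: int = 20):
    """Return (pos_minus35, pos_minus10, spacer, score) for each candidate promoter."""
    seq = seq.upper()
    hits = []
    for m in MINUS35.finditer(seq):
        p35, box35 = m.start(), m.group(1)
        for spacer in range(min_spacer, max_spacer + 1):
            p10 = p35 + 6 + spacer
            box10 = seq[p10:p10 + 6]
            if MINUS10.fullmatch(box10):
                # Hypothetical weighting: 70 points for box identity, 30 for spacing.
                identity = (consensus_matches(box35, "TTGACA")
                            + consensus_matches(box10, "TATAAT")) / 12
                spacing = max(0.0, 1 - abs(spacer - 17) / 3)  # 1.0 at the optimal 17 bp
                hits.append((p35, p10, spacer, round(70 * identity + 30 * spacing)))
    return hits
```

Every spacer length in range is tried for each -35 box, so one -35 box can pair with more than one -10 box.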
3. Ribosome Binding Sites
Shine-Dalgarno sequence detection:
Strong: AAGGAG, AGGAGG
Moderate: AGGAG
Weak: GAGG
Each hit is then checked for an ATG start codon 5-13 bp downstream.
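One way to implement the tiered Shine-Dalgarno search is sketched below. Two details are assumptions: the 5-13 bp distance is measured from the 3' end of the SD motif to the A of ATG, and a region already matched by a stronger motif is not re-reported as a weaker one (AGGAG and GAGG occur inside the strong motifs).

```python
# Shine-Dalgarno motifs from the list above, strongest tier first.
SD_TIERS = [
    ("strong", ("AAGGAG", "AGGAGG")),
    ("moderate", ("AGGAG",)),
    ("weak", ("GAGG",)),
]


def find_rbs(seq: str, min_gap: int = 5, max_gap: int = 13):
    """Return sorted (sd_start, strength, motif, atg_start) for SD motifs followed by ATG."""
    seq = seq.upper()
    claimed: set[int] = set()  # positions already assigned to a stronger motif
    hits = []
    for strength, motifs in SD_TIERS:
        for motif in motifs:
            start = seq.find(motif)
            while start != -1:
                span = set(range(start, start + len(motif)))
                if not span & claimed:
                    claimed |= span
                    end = start + len(motif)
                    for gap in range(min_gap, max_gap + 1):
                        if seq[end + gap:end + gap + 3] == "ATG":
                            hits.append((start, strength, motif, end + gap))
                            break
                start = seq.find(motif, start + 1)
    return sorted(hits)
```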
4. Vector Backbone Signatures
Detects sequences from common cloning vectors:
| Signature | Source | Score impact |
|---|---|---|
| T7/T3/SP6 promoter | Synthetic only | +15 (strong) |
| lacZ alpha | Synthetic only | +15 (strong) |
| f1 origin | Synthetic only | +15 (strong) |
| CMV promoter | Synthetic only | +15 (strong) |
| ColE1/pBR322/pUC origin | Shared natural/synthetic | +3 (weak) |
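A signature lookup of this kind can be sketched as exact substring matching on both strands. The tool's actual signature sequences are not listed here; the sketch uses the commonly cited minimal T7 and SP6 promoter sequences, and it counts each signature once per sequence, as the "+15 each" rule suggests.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")

# Illustrative subset: signature name -> (sequence, points from the table above).
SIGNATURES = {
    "T7 promoter": ("TAATACGACTCACTATAG", 15),
    "SP6 promoter": ("ATTTAGGTGACACTATAG", 15),
}


def signature_points(seq: str) -> dict[str, int]:
    """Return {signature: points} for each signature found on either strand."""
    seq = seq.upper()
    found = {}
    for name, (motif, points) in SIGNATURES.items():
        rc = motif.translate(COMPLEMENT)[::-1]
        if motif in seq or rc in seq:
            found[name] = points
    return found
```

Real signature detection would more likely use alignment with a mismatch tolerance; exact matching keeps the sketch short.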
5. IS Element & Transposon Detection
Scans for terminal inverted repeats of common IS families:
IS1, IS26, IS903, IS10, IS3, IS5, ISEcp1
Tn3, Tn21, Tn1721
IS elements near RE hotspots reduce the engineering score, because a restriction site cluster at an IS element more likely marks a natural transposon boundary than a cloning junction.
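The inverted-repeat scan and the "near a hotspot" test can be sketched as below. This assumes exact TIR matches (real IS terminal repeats are often imperfect), a 3 kb maximum element length, and a 500 bp proximity margin; all three are illustrative choices, and the TIR in the usage check is a toy sequence, not a real IS family repeat.

```python
import re

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def revcomp(seq: str) -> str:
    return seq.translate(COMPLEMENT)[::-1]


def find_all(seq: str, motif: str) -> list[int]:
    return [m.start() for m in re.finditer(f"(?={motif})", seq)]


def candidate_elements(seq: str, tir: str, max_len: int = 3000) -> list[tuple[int, int]]:
    """Spans (start, end) with `tir` on the left and its reverse complement on the right."""
    seq = seq.upper()
    left = find_all(seq, tir)
    right = find_all(seq, revcomp(tir))
    spans = []
    for l in left:
        for r in right:
            end = r + len(tir)
            if r >= l + len(tir) and end - l <= max_len:  # non-overlapping, size-limited
                spans.append((l, end))
    return spans


def near_hotspot(span, hotspot_windows, margin: int = 500) -> bool:
    """True if either element boundary lies within `margin` bp of a hotspot window."""
    start, end = span
    return any(h_start - margin <= b <= h_end + margin
               for h_start, h_end in hotspot_windows for b in (start, end))
```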
6. Direct Repeats / Target Site Duplications
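Finds 4-12 bp direct repeats separated by 500-5000 bp. Such target site duplications (TSDs) arise when a transposon inserts into DNA: the staggered cut at the target site is filled in on both sides, leaving a short duplication flanking the insertion. They are hallmarks of natural mobile element activity.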
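A k-mer index is a straightforward way to find identical short repeats at a given separation. The sketch handles one repeat length per call (run it for each k from 4 to 12) and assumes the separation is measured from the end of the first copy to the start of the second. Note that 4-6 bp repeats will also occur by chance many times in a plasmid, so a practical filter would check only repeats flanking candidate element boundaries.

```python
from collections import defaultdict


def direct_repeats(seq: str, k: int, min_sep: int = 500, max_sep: int = 5000):
    """Return sorted (first_start, second_start, kmer) for identical k-mers with gap in range."""
    seq = seq.upper()
    index = defaultdict(list)
    for i in range(len(seq) - k + 1):
        index[seq[i:i + k]].append(i)  # positions are appended in increasing order
    pairs = []
    for kmer, positions in index.items():
        for n, a in enumerate(positions):
            for b in positions[n + 1:]:
                gap = b - (a + k)
                if gap > max_sep:
                    break  # every later position is even farther away
                if gap >= min_sep:
                    pairs.append((a, b, kmer))
    return sorted(pairs)
```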
7. NTEKPC Transposon Detection
For plasmids carrying blaKPC:
Tn4401: Classic ISKpn7-blaKPC-ISKpn6 structure
NTEKPC-IId: ISKpn27-blaKPC-tnpA structure
NTEKPC variants: Other non-Tn4401 contexts
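The three-way classification can be sketched as a check on the features immediately flanking blaKPC. The input format (an ordered list of feature names along the plasmid) and the label "NTEKPC (other)" are hypothetical; the flanking patterns are the ones listed above, accepted in either orientation because the feature list may run along the opposite strand.

```python
from typing import Optional

# Flanking patterns from the list above, as (upstream, downstream).
CONTEXTS = {
    "Tn4401": ("ISKpn7", "ISKpn6"),
    "NTEKPC-IId": ("ISKpn27", "tnpA"),
}


def classify_kpc_context(features: list[str]) -> Optional[str]:
    """Classify the blaKPC genetic context; None if no blaKPC feature is present."""
    idx = next((i for i, f in enumerate(features) if f.startswith("blaKPC")), None)
    if idx is None:
        return None
    up = features[idx - 1] if idx > 0 else None
    down = features[idx + 1] if idx + 1 < len(features) else None
    for name, (left, right) in CONTEXTS.items():
        if (up, down) in ((left, right), (right, left)):
            return name
    return "NTEKPC (other)"  # any other non-Tn4401 context
```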
8. Codon Usage Analysis
Per-window Codon Adaptation Index (CAI) relative to E. coli K-12. High CAI regions (>0.7) may indicate codon optimization. Low CAI regions (<0.2) indicate unusual codon usage.
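CAI is the geometric mean of each codon's relative adaptiveness w (its frequency divided by that of the most frequent synonymous codon). The E. coli K-12 weight table the tool uses is not given here, so this sketch derives weights from a caller-supplied set of reference coding sequences. The 0.5 pseudocount and the 300 bp window with a 150 bp step (both multiples of 3, so windows stay in frame) are assumptions. Met, Trp, and stop codons are excluded because they have no synonymous alternatives.

```python
import math
from collections import Counter

BASES = "TCAG"
CODONS = [a + b + c for a in BASES for b in BASES for c in BASES]
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"  # NCBI table 1
CODON_TO_AA = dict(zip(CODONS, AMINO))
SKIP = {"M", "W", "*"}  # no synonymous choice, so they carry no CAI signal


def codons_of(seq: str) -> list[str]:
    seq = seq.upper()
    return [seq[i:i + 3] for i in range(0, len(seq) - 2, 3)]


def relative_adaptiveness(reference_cds: list[str], pseudocount: float = 0.5) -> dict[str, float]:
    counts = Counter(c for cds in reference_cds for c in codons_of(cds) if c in CODON_TO_AA)
    weights = {}
    for aa in set(AMINO) - SKIP:
        synonyms = [c for c in CODONS if CODON_TO_AA[c] == aa]
        best = max(counts[c] + pseudocount for c in synonyms)
        for c in synonyms:
            weights[c] = (counts[c] + pseudocount) / best
    return weights


def cai(seq: str, weights: dict[str, float]) -> float:
    """Geometric mean of w over scorable codons; NaN if there are none."""
    logs = [math.log(weights[c]) for c in codons_of(seq) if c in weights]
    return math.exp(sum(logs) / len(logs)) if logs else float("nan")


def windowed_cai(seq: str, weights: dict[str, float], window: int = 300, step: int = 150):
    """(window_start, CAI) pairs; assumes seq starts in the reading frame."""
    return [(s, cai(seq[s:s + window], weights))
            for s in range(0, max(len(seq) - window, 0) + 1, step)]
```

Windows scoring above 0.7 or below 0.2 would then be flagged as described above.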
9. K-mer Naturalness
4-mer frequency analysis:
Entropy ratio: natural DNA tends to have higher 4-mer entropy than engineered DNA
Palindrome fraction: High palindrome content correlates with restriction site density
Perfect repeats: Long identical repeats (>20 bp) are common in synthetic constructs
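The three k-mer statistics can be sketched as below. The normalization of entropy by its maximum (2k bits, i.e. 8 bits for 256 possible 4-mers) is an assumption about what "entropy ratio" means; the tool may instead compare against the PLSDB baseline. "Palindrome" here means a k-mer equal to its own reverse complement, the sense relevant to restriction sites.

```python
import math
from collections import Counter

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def kmer_counts(seq: str, k: int = 4) -> Counter:
    seq = seq.upper()
    return Counter(seq[i:i + k] for i in range(len(seq) - k + 1))


def entropy_ratio(seq: str, k: int = 4) -> float:
    """Shannon entropy of the k-mer spectrum divided by its maximum, 2k bits."""
    counts = kmer_counts(seq, k)
    total = sum(counts.values())
    if total == 0:
        return 0.0
    h = -sum((n / total) * math.log2(n / total) for n in counts.values())
    return h / (2 * k)


def palindrome_fraction(seq: str, k: int = 4) -> float:
    """Fraction of k-mer occurrences that equal their own reverse complement."""
    counts = kmer_counts(seq, k)
    total = sum(counts.values())
    if total == 0:
        return 0.0
    pal = sum(n for kmer, n in counts.items() if kmer == kmer.translate(COMPLEMENT)[::-1])
    return pal / total


def has_perfect_repeat(seq: str, min_len: int = 21) -> bool:
    """True if some identical stretch of at least min_len bp (i.e. >20 bp) occurs twice."""
    seq = seq.upper()
    seen = set()
    for i in range(len(seq) - min_len + 1):
        window = seq[i:i + min_len]
        if window in seen:
            return True
        seen.add(window)
    return False
```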
10. Mobilization Assessment
Classifies plasmid transfer capability from CDS annotations:
| Category | Criteria | Example |
|---|---|---|
| Self-transmissible | Relaxase + T4SS | Large IncF conjugative plasmids |
| Mobilizable | Relaxase, no T4SS | Many small ColE1 plasmids |
| Mobilizable (MOB) | MOB genes, no T4SS | MH595533 (mobA + mobC) |
| Retro-mobilizable | Only oriT | Can only transfer with helper |
| Non-mobilizable | No transfer genes | Cryptic plasmids |
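The table reads naturally as a first-match decision list. In this sketch the four boolean flags are assumed to have been derived from the CDS annotations beforehand (how genes are assigned to each flag is not specified here), and the top-to-bottom precedence is an assumption; combinations the table does not cover, such as a T4SS without a relaxase, simply fall through to the later rows.

```python
def classify_mobilization(relaxase: bool, t4ss: bool, mob_genes: bool, orit: bool) -> str:
    """Return the first table row whose criteria are met."""
    if relaxase and t4ss:
        return "Self-transmissible"
    if relaxase:
        return "Mobilizable"
    if mob_genes:
        return "Mobilizable (MOB)"
    if orit:
        return "Retro-mobilizable"
    return "Non-mobilizable"
```

For MH595533, mobA + mobC set only `mob_genes`, which yields "Mobilizable (MOB)".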
Engineering Score
Composite score (0-100) calibrated against the PLSDB natural plasmid baseline:
Engineering scars (paired BsaI sites, MCS): +20 to +40 per scar
Synthetic-specific vector signatures: +15 each
Shared natural/synthetic signatures: +3 each
RE density above PLSDB 95th percentile: +5 to +15
Unexplained RE hotspots (not near IS elements): +5 to +15
IS elements present: -5 to -15 (natural indicator)
K-mer deviation: 0 to +20
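The composite can be sketched as a sum of the components above, clamped to 0-100. The `Evidence` container is hypothetical, and how the tool picks a value inside each range (e.g. +5 versus +15 for RE density) is not documented, so the sketch takes those values as inputs; clamping is likewise an assumption based on the stated 0-100 scale.

```python
from dataclasses import dataclass


@dataclass
class Evidence:
    scar_points: list[int]      # one entry per scar, each +20 to +40
    synthetic_signatures: int   # count, +15 each
    shared_signatures: int      # count, +3 each
    re_density_points: int      # 0, or +5 to +15 above the PLSDB 95th percentile
    hotspot_points: int         # 0, or +5 to +15 for unexplained hotspots
    is_penalty: int             # 0, or 5 to 15 (subtracted)
    kmer_points: int            # 0 to +20


def engineering_score(e: Evidence) -> int:
    raw = (sum(e.scar_points)
           + 15 * e.synthetic_signatures
           + 3 * e.shared_signatures
           + e.re_density_points
           + e.hotspot_points
           - e.is_penalty
           + e.kmer_points)
    return max(0, min(100, raw))  # keep the result on the 0-100 scale
```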
Interpretation
| Range | Classification | Typical examples |
|---|---|---|
| 0-19 | Natural | Wild-type resistance plasmids |
| 20-39 | Natural (resistance platform) | Multi-drug resistance plasmids with natural origins |
| 40-59 | Ambiguous | Plasmids with ColE1-type origins (shared with lab vectors) |
| 60-79 | Likely engineered | Plasmids with synthetic promoters or MCS |
| 80-100 | Engineered | Lab vectors (pUC19, pET, etc.) |
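Mapping a score to its band is a lookup on the lower bounds in the table; a minimal sketch:

```python
# Lower bound of each band from the interpretation table, highest first.
BANDS = [
    (80, "Engineered"),
    (60, "Likely engineered"),
    (40, "Ambiguous"),
    (20, "Natural (resistance platform)"),
    (0, "Natural"),
]


def classify_score(score: float) -> str:
    if not 0 <= score <= 100:
        raise ValueError(f"score must be in 0-100, got {score}")
    for lower, label in BANDS:
        if score >= lower:
            return label
    raise AssertionError("unreachable: the 0 band catches every valid score")
```

The case study's score of 45 falls in the 40-59 band.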
Case Study: MH595533 (pKPN535a)
This 14,873 bp IncQ1 plasmid carries blaKPC-2 and was isolated from Klebsiella pneumoniae.
Score: 45/100 (Ambiguous)
Why ambiguous, not engineered?
ColE1/pBR322 origins are shared between natural IncQ plasmids and lab vectors (+3 each, not +15)
No synthetic-specific signatures (no T7 promoter, no lacZ, no f1 ori)
No IS elements detected (the ISKpn27 copy flanking blaKPC is not matched by the scanned IS family signatures)
Direct repeats found (7 TSDs) — evidence of natural transposon activity
Mobilization: mobA + mobC present, classified as “Mobilizable (MOB)”