Sequence Analysis

The Seq Analysis tab scans any DNA sequence for functional elements, cloning artifacts, and engineering indicators.

Input Methods

  1. NCBI accession: Enter an accession (e.g., MH595533) and click “Fetch & Analyze”

  2. File upload: Drag & drop a FASTA or GenBank file

  3. Paste/Edit: Type or paste directly into the textarea, modify, then click “Analyze Sequence”

Analysis Pipeline

1. Restriction Site Mapping

Scans for 25 common Type II restriction enzymes including:

  • 6-cutters: EcoRI, BamHI, HindIII, SalI, XhoI, NdeI, NcoI, XbaI, PstI, SacI, KpnI, NotI

  • Type IIS (Golden Gate): BsaI, BbsI, BsmBI, SapI

  • Both forward and reverse complement orientations

Hotspot detection: 500 bp sliding windows with 3+ restriction sites indicate potential cloning junctions.

2. Promoter Prediction

Sigma-70 promoter consensus search:

  • -35 box: TTGAC[AT]

  • Spacer: 14-20 bp (optimal 17 bp)

  • -10 Pribnow box: TA[AT]AAT

Scored 0-100 based on consensus match and spacer length.

3. Ribosome Binding Sites

Shine-Dalgarno sequence detection:

  • Strong: AAGGAG, AGGAGG

  • Moderate: AGGAG

  • Weak: GAGG

Checks for downstream ATG start codon within 5-13 bp.

4. Vector Backbone Signatures

Detects sequences from common cloning vectors:

Signature

Source

Score impact

T7/T3/SP6 promoter

Synthetic only

+15 (strong)

lacZ alpha

Synthetic only

+15 (strong)

f1 origin

Synthetic only

+15 (strong)

CMV promoter

Synthetic only

+15 (strong)

ColE1/pBR322/pUC origin

Shared natural/synthetic

+3 (weak)

5. IS Element & Transposon Detection

Scans for terminal inverted repeats of common IS families:

  • IS1, IS26, IS903, IS10, IS3, IS5, ISEcp1

  • Tn3, Tn21, Tn1721

IS elements near RE hotspots reduce the engineering score (natural transposon boundaries, not cloning junctions).

6. Direct Repeats / Target Site Duplications

Finds 4-12 bp direct repeats separated by 500-5000 bp. These are created when a transposon inserts into DNA and are hallmarks of natural mobile element activity.

7. NTEKPC Transposon Detection

For plasmids carrying blaKPC:

  • Tn4401: Classic ISKpn7-blaKPC-ISKpn6 structure

  • NTEKPC-IId: ISKpn27-blaKPC-tnpA structure

  • NTEKPC variants: Other non-Tn4401 contexts

8. Codon Usage Analysis

Per-window Codon Adaptation Index (CAI) relative to E. coli K-12. High CAI regions (>0.7) may indicate codon optimization. Low CAI regions (<0.2) indicate unusual codon usage.

9. K-mer Naturalness

4-mer frequency analysis:

  • Entropy ratio: Natural DNA has higher entropy than engineered

  • Palindrome fraction: High palindrome content correlates with restriction site density

  • Perfect repeats: Long identical repeats (>20 bp) are common in synthetic constructs

10. Mobilization Assessment

Classifies plasmid transfer capability from CDS annotations:

Category

Criteria

Example

Self-transmissible

Relaxase + T4SS

Large IncF conjugative plasmids

Mobilizable

Relaxase, no T4SS

Many small ColE1 plasmids

Mobilizable (MOB)

MOB genes, no T4SS

MH595533 (mobA + mobC)

Retro-mobilizable

Only oriT

Can only transfer with helper

Non-mobilizable

No transfer genes

Cryptic plasmids

Engineering Score

Composite score (0-100) calibrated against the PLSDB natural plasmid baseline:

  • Engineering scars (paired BsaI sites, MCS): +20-40 per scar

  • Synthetic-specific vector signatures: +15 each

  • Shared natural/synthetic signatures: +3 each

  • RE density above PLSDB 95th percentile: +5-15

  • Unexplained RE hotspots (not near IS elements): +5-15

  • IS elements present: -5 to -15 (natural indicator)

  • K-mer deviation: +0-20

Interpretation

Range

Classification

Typical examples

0-19

Natural

Wild-type resistance plasmids

20-39

Natural (resistance platform)

Multi-drug resistance plasmids with natural origins

40-59

Ambiguous

Plasmids with ColE1-type origins (shared with lab vectors)

60-79

Likely engineered

Plasmids with synthetic promoters or MCS

80-100

Engineered

Lab vectors (pUC19, pET, etc.)

Case Study: MH595533 (pKPN535a)

This 14,873 bp IncQ1 plasmid carries blaKPC-2 and was isolated from Klebsiella pneumoniae.

Score: 45/100 (Ambiguous)

Why ambiguous, not engineered?

  • ColE1/pBR322 origins are shared between natural IncQ plasmids and lab vectors (+3 each, not +15)

  • No synthetic-specific signatures (no T7 promoter, no lacZ, no f1 ori)

  • No IS elements detected (the ISKpn27 flanking blaKPC uses a different signature)

  • Direct repeats found (7 TSDs) — evidence of natural transposon activity

  • Mobilization: mobA + mobC present, classified as “Mobilizable (MOB genes)”