Info

NCBI taxon id: 1100960 (NCBI; ENA; GoaT)
Order: Lepidoptera
Family: Oecophoridae
NCBI lineage: Eukaryota;Metazoa;Ecdysozoa;Arthropoda;Hexapoda;Insecta;Pterygota;Neoptera;Endopterygota;Lepidoptera;Glossata;Ditrysia;Gelechioidea;Oecophoridae;Oecophorinae;Crassa;
GoaT genome size (M): 600 (ancestor)
GoaT asm span (M): 397 (ancestor)
GoaT chr no.: 21 (ancestor)
GoaT haploid no.: 11 (ancestor)
GoaT ploidy: 2 (ancestor)
ToLID prefix: ilCraUnit

Specimens

Below is information about specimens collected for this species retrieved from the Sample Tracking System (STS).

tolid specimen_id gal collector_affiliation date_of_collection sex organism_part biosample biospecimen lifestage symbiont family order or group genus taxon_id scientific_name common_name tube_or_well_id

Spectra estimates

Below are estimates of genome size, repeat size and heterozygosity based on k-mer spectrum analysis with GenomeScope2.

source specimen k-mer k-cov haploid size repeat (%) heterozygosity (%) model fit (%) model error (%) histogram
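GenomeScope2 fits a mixture model to the k-mer histogram. The sketch below shows roughly where a haploid size figure comes from, using the simpler textbook estimate instead: total k-mers above an error cutoff, divided by the coverage of the main peak. The function name, the `min_cov` cutoff and the dictionary input are assumptions made for illustration. This is not GenomeScope2's algorithm.

```python
def naive_genome_size(histogram: dict[int, int], min_cov: int = 5) -> float:
    """Naive haploid genome size: total k-mers / coverage of the main peak.

    `histogram` maps a k-mer multiplicity (coverage) to the number of distinct
    k-mers observed that many times, i.e. the two columns of a k-mer
    histogram file. Multiplicities below `min_cov` (a hypothetical cutoff)
    are treated as sequencing errors and dropped.
    """
    kept = {cov: n for cov, n in histogram.items() if cov >= min_cov}
    if not kept:
        raise ValueError("no k-mers at or above the error cutoff")
    # Every distinct k-mer seen `cov` times contributes `cov` k-mer instances.
    total_kmers = sum(cov * n for cov, n in kept.items())
    # The most populated multiplicity approximates per-copy k-mer coverage.
    peak_cov = max(kept, key=lambda cov: kept[cov])
    return total_kmers / peak_cov


if __name__ == "__main__":
    # Toy histogram: an error tail at low coverage, main peak at 20x.
    hist = {1: 1000, 2: 200, 10: 50, 20: 100, 21: 90}
    print(naive_genome_size(hist))  # 219.5
```

In a heterozygous genome the tallest peak may be the heterozygous one, at half coverage. That would roughly double this naive estimate. GenomeScope2 avoids the problem by modelling both peaks.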

Sequence data


PacBio run data

Below are stats for each PacBio sequencing run collected for this species.

pipeline model specimen sample date instrument run id movie well movie length tag tag sequence library load name reads yield N50 PCR dups (%) filtered (%) A (%) C (%) G (%) T (%) sample accession run accession exp accession study accession species barcode
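The N50 and base-composition columns appear in the PacBio, ONT and Illumina tables. The minimal sketch below shows how they can be computed from read sequences. It makes two assumptions: reads are already loaded as Python strings, and the percentages are taken over A, C, G and T only. The portal's exact convention may differ. The function names are hypothetical.

```python
from collections import Counter


def n50(lengths: list[int]) -> int:
    """Largest length L such that sequences of length >= L hold at least half of all bases."""
    if not lengths:
        raise ValueError("N50 is undefined for an empty set")
    total = sum(lengths)
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        if 2 * running >= total:
            return length
    raise AssertionError("unreachable: the running sum ends at total")


def base_composition(reads: list[str]) -> dict[str, float]:
    """Percentage of A, C, G and T across all reads, ignoring other symbols."""
    counts = Counter()
    for read in reads:
        counts.update(read.upper())
    acgt = sum(counts[base] for base in "ACGT")
    if acgt == 0:
        raise ValueError("no A/C/G/T bases found")
    return {base: 100 * counts[base] / acgt for base in "ACGT"}


if __name__ == "__main__":
    reads = ["ACGTAC", "GGCC", "ATATATATAT"]
    print(n50([len(r) for r in reads]))  # 10
    print(base_composition(reads))       # {'A': 35.0, 'C': 20.0, 'G': 15.0, 'T': 30.0}
```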

ONT run data

Below are stats for each ONT sequencing run collected for this species.

pipeline model specimen sample date instrument run id flowcell type library name reads yield N50 A (%) C (%) G (%) T (%) sample accession species report

Illumina run data

Below are stats for each Illumina run collected for this species. Click on a row to see associated plots from samtools stats.

pipeline model source specimen date run id read pairs yield avg qual avg length mapped bases (%) dups (%) MQ0 (%) avg insert sample accession run accession exp accession study accession sample tag sequence tag2 sequence run status npg status species barcode

Cobionts

Below are results from a screen of the PacBio data using Mash screen against RefSeq assemblies. Only results with identity over 90% are displayed.

identity shared-hashes median-multiplicity p-value query info
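Mash screen reports `shared-hashes` as a fraction `x/y`: the share of the reference sketch's hashes that were found in the reads. As described for Mash screen, the identity column is estimated from this containment as (x/y)^(1/k). The sketch below recomputes that estimate and applies the 90% display cutoff. It assumes k = 21, the k-mer size of Mash's standard RefSeq sketches. It also assumes Mash's tab-separated screen output, with columns in the order shown in the header above. The function names and the example accessions are hypothetical.

```python
def mash_screen_identity(shared_hashes: str, k: int = 21) -> float:
    """Identity estimate from a Mash screen 'shared-hashes' value such as '912/1000'.

    Containment (shared / total) is converted to identity as containment ** (1 / k).
    """
    shared, total = (int(part) for part in shared_hashes.split("/"))
    if total <= 0:
        raise ValueError("total hash count must be positive")
    return (shared / total) ** (1.0 / k)


def filter_screen(lines: list[str], min_identity: float = 0.90) -> list[tuple[float, str]]:
    """Keep Mash screen rows whose identity exceeds `min_identity`.

    Each line is assumed tab-separated: identity, shared-hashes,
    median-multiplicity, p-value, query ID and an optional query comment.
    """
    hits = []
    for line in lines:
        fields = line.rstrip("\n").split("\t")
        identity, query_id = float(fields[0]), fields[4]
        if identity > min_identity:
            hits.append((identity, query_id))
    return hits


if __name__ == "__main__":
    print(mash_screen_identity("1000/1000"))  # 1.0
    # Hypothetical rows; 0.34 ** (1/21) is about 0.95 and 0.001 ** (1/21) about 0.72.
    rows = [
        "0.95\t340/1000\t4\t0\tGCF_0001\thypothetical bacterium",
        "0.72\t1/1000\t1\t0.9\tGCF_0002\thypothetical virus",
    ]
    print(filter_screen(rows))  # [(0.95, 'GCF_0001')]
```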

Species composition of the assembly, inferred with MarkerScan from the small subunit (SSU) rRNA sequences present in it.

specimen contig SSU length attributed taxonomy by SSU cluster

MarkerScan cobiont assemblies, obtained by separating out the reads assigned to each observed family (see above). These reads are both aligned to the original assembly and independently re-assembled. The quality of each assembly is assessed by its BUSCO completeness, its span and the number of reads it contains. See the MarkerScan documentation for more information.

specimen family classified reads original assembly re-assembly
count (%) BUSCO BUSCO contigs contig length number of reads BUSCO contigs contig length number of reads

Visualisation of a classification of the PacBio reads using a variational autoencoder on the k-mer counts.

specimen visualisation

Canonical tetranucleotide counts for each contig or scaffold reduced to two dimensions with UMAP to allow visualisation.
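A canonical tetranucleotide merges each 4-mer with its reverse complement, so a contig and its reverse complement have the same profile. Of the 256 possible 4-mers, this leaves 136 canonical ones: 120 complementary pairs plus 16 reverse-complement palindromes. The sketch below builds that 136-dimensional count vector for one sequence. Skipping 4-mers that contain ambiguous bases is an assumption. The UMAP reduction itself is not shown.

```python
from itertools import product

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    return seq.translate(_COMPLEMENT)[::-1]


def canonical(kmer: str) -> str:
    """Lexicographically smaller of a k-mer and its reverse complement."""
    return min(kmer, reverse_complement(kmer))


# Fixed ordering of the 136 canonical tetranucleotides (vector dimensions).
CANONICAL_4MERS = sorted({canonical("".join(p)) for p in product("ACGT", repeat=4)})


def tetranucleotide_counts(seq: str) -> list[int]:
    """Canonical 4-mer counts for one contig, skipping windows with non-ACGT bases."""
    seq = seq.upper()
    counts = dict.fromkeys(CANONICAL_4MERS, 0)
    for i in range(len(seq) - 3):
        kmer = seq[i:i + 4]
        if set(kmer) <= set("ACGT"):
            counts[canonical(kmer)] += 1
    return [counts[kmer] for kmer in CANONICAL_4MERS]


if __name__ == "__main__":
    print(len(CANONICAL_4MERS))  # 136
    # A sequence and its reverse complement give identical profiles.
    contig = "ACGTTGCANNGGATCC"
    print(tetranucleotide_counts(contig) == tetranucleotide_counts(reverse_complement(contig)))  # True
```

Stacking one vector per contig gives a matrix that a UMAP implementation can project to two dimensions, for example umap-learn's `umap.UMAP(n_components=2).fit_transform`. Normalising each row to frequencies first is a common but optional step.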

Features (colours represent quantile bins):

  • Hexamer: Estimated coding density (expected to be higher in microbes than in animals).
  • FastK: The median number of times each 60-mer in the sequence occurs across the whole assembly (illustrates repetitiveness)
  • Unique_15mers: Number of unique 15-mers per base pair (illustrates sequence diversity)
  • Is_Connected: Presence of at least one Hi-C connection to another scaffold (absence of connections can indicate contamination)
  • Connections_Base: Number of Hi-C connections per base pair
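Three of the features above reduce to simple counts and ratios. The sketch below shows one plausible reading of each:

- `Unique_15mers`: distinct 15-mers divided by sequence length, without canonicalisation.
- `Is_Connected`: whether any Hi-C read pair links the scaffold to a different scaffold.
- `Connections_Base`: Hi-C read pairs touching the scaffold, divided by its length.

The input shapes, the function names and the choice to count intra-scaffold pairs toward `Connections_Base` are all assumptions, not the portal's documented definitions.

```python
def unique_kmers_per_bp(seq: str, k: int = 15) -> float:
    """Distinct k-mers (ACGT-only windows, not canonicalised) per base pair."""
    seq = seq.upper()
    if not seq:
        raise ValueError("empty sequence")
    distinct = {
        seq[i:i + k]
        for i in range(len(seq) - k + 1)
        if set(seq[i:i + k]) <= set("ACGT")
    }
    return len(distinct) / len(seq)


def hic_features(name: str, length: int, pairs: list[tuple[str, str]]) -> tuple[bool, float]:
    """Is_Connected and Connections_Base for one scaffold.

    `pairs` holds the scaffolds that the two ends of each Hi-C read pair
    map to (a hypothetical input shape). Intra-scaffold pairs are counted
    toward the per-base rate, which is an assumption.
    """
    touching = [(a, b) for a, b in pairs if name in (a, b)]
    is_connected = any(a != b for a, b in touching)
    return is_connected, len(touching) / length


if __name__ == "__main__":
    # A low-complexity sequence has few distinct 15-mers per base.
    print(unique_kmers_per_bp("A" * 30))      # 1 distinct 15-mer over 30 bp
    pairs = [("scaf1", "scaf1"), ("scaf1", "scaf2"), ("scaf3", "scaf3")]
    print(hic_features("scaf1", 100, pairs))  # (True, 0.02)
    print(hic_features("scaf3", 50, pairs))   # (False, 0.02)
```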


Assemblies

In-progress assembly QC.

specimen asm asm version date contig N50 contigs largest contig scaffold N50 scaffolds largest scaffold length BUSCO BUSCO lineage BUSCO version merqury
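Contig and scaffold statistics differ because scaffolds contain gaps (runs of `N`) that join contigs together. The minimal sketch below splits scaffolds into contigs and reports the count, largest and N50 columns of the table above. It assumes that any run of one or more `N` marks a gap and that total length includes gap bases. BUSCO and Merqury scores require those tools and are not reproduced. The function names are hypothetical.

```python
import re


def n50(lengths: list[int]) -> int:
    """Largest length L such that sequences of length >= L hold at least half of all bases."""
    if not lengths:
        raise ValueError("N50 is undefined for an empty set")
    total = sum(lengths)
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        if 2 * running >= total:
            return length
    raise AssertionError("unreachable: the running sum ends at total")


def split_into_contigs(scaffold: str) -> list[str]:
    """Break a scaffold at every run of N (assumed to mark a gap)."""
    return [piece for piece in re.split(r"[Nn]+", scaffold) if piece]


def assembly_stats(scaffolds: list[str]) -> dict[str, int]:
    contigs = [c for s in scaffolds for c in split_into_contigs(s)]
    scaffold_lengths = [len(s) for s in scaffolds]
    contig_lengths = [len(c) for c in contigs]
    return {
        "contig N50": n50(contig_lengths),
        "contigs": len(contigs),
        "largest contig": max(contig_lengths),
        "scaffold N50": n50(scaffold_lengths),
        "scaffolds": len(scaffolds),
        "largest scaffold": max(scaffold_lengths),
        "length": sum(scaffold_lengths),  # includes gap (N) bases
    }


if __name__ == "__main__":
    # Contigs are ACGT (4), ACG (3) and AAAAAAAA (8); scaffolds are 11 and 8 bp.
    print(assembly_stats(["ACGTNNNNACG", "AAAAAAAA"]))
```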

Organelles

In-progress organelle results from MitoHiFi or Oatk.

specimen asm organelle date length genes frameshifts is circular seqs shared unique missing reference