University of Leicester

INPHARED_DATABASE

Download (4.22 GB)
dataset
posted on 2021-04-13, 07:44 authored by Andrew Millard, Ryan Cook

inphared.pl (INfrastructure for a PHAge REference Database) is a Perl script that downloads and filters phage genomes from GenBank to provide the most complete phage genome database possible.

Useful information, including viral taxonomy and bacterial host data, is extracted from the GenBank files and provided in a summary table. Genes are called on the genomes using Prokka, and this output is used to gather metrics that are summarised in the output files and to produce input files for vConTACT2.

The data provided cover all genomes up to January 2021. They can be downloaded so that users do not have to repeat the process of consistent gene calling on existing genomes.

The folder GenomesDB contains subfolders, each of which contains a subfolder named after the accession number of a phage.

Within each folder are the re-called genes in the following formats:

*.ffn

*.faa

Each folder also contains the complete genome (*.fna) and a GenBank file without any annotation (*.gbf).
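As an illustration of how the downloaded folder might be traversed, the following is a minimal Python sketch, not part of inphared.pl. It assumes only that the folder directly containing the files is named after the accession number and that the files carry the four suffixes listed above; the function names `index_genomes` and `missing_files` are hypothetical.

```python
from pathlib import Path

# File suffixes named in the dataset description.
SUFFIXES = (".ffn", ".faa", ".fna", ".gbf")


def index_genomes(root):
    """Map each accession folder name to the files found inside it.

    Assumption: the folder that directly contains the files is named
    after the accession number. The depth of nesting is not assumed,
    so the whole tree under `root` is searched.
    """
    index = {}
    for path in sorted(Path(root).rglob("*")):
        if path.is_file() and path.suffix in SUFFIXES:
            accession = path.parent.name
            index.setdefault(accession, {})[path.suffix] = path
    return index


def missing_files(index):
    """Return accessions that lack one or more of the expected suffixes."""
    return {
        acc: [s for s in SUFFIXES if s not in files]
        for acc, files in index.items()
        if any(s not in files for s in SUFFIXES)
    }
```

Calling `index_genomes("GenomesDB")` on the unpacked download would return a dictionary keyed by accession, and `missing_files` could then be used to flag folders where an expected file is absent.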

See https://github.com/RyanCook94/







Funding

The MRC Consortium for Medical Microbial Bioinformatics

Medical Research Council


CLIMB-BIG-DATA: A Cloud Infrastructure for Big-Data Microbial Bioinformatics

Medical Research Council

    Genetics and Genome Biology
