posted on 2008-01-23, 14:30 authored by Hong-Yu Ou, Rebecca J. Smith, Sacha Lucchini, Jay Hinton, Roy R. Chaudhuri, Mark Pallen, Michael R. Barer, Kumar Rajakumar
ArrayOme is a new program that calculates the size
of genomes represented by microarray-based
probes and facilitates recognition of key bacterial
strains carrying large numbers of novel genes.
Protein-coding sequences (CDS) that are contiguous
on annotated reference templates and classified
as ‘Present’ in the test strain by hybridization to
microarrays are merged into clusters (ICs). These ICs
are then extended to account for flanking intergenic
sequences. Finally, the lengths of all extended ICs
are summed to yield the ‘microarray-visualized
genome (MVG)’ size. We tested and validated
ArrayOme using both experimental and in silico-generated
genomic hybridization data. MVG sizing
of five sequenced Escherichia coli and Shigella
strains was 97–99% accurate relative to the true
genome sizes when the comprehensive
ShE.coli meta-array gene sequences (6239 CDS)
were used for in silico hybridization analysis. However,
the E. coli CFT073 genome size was underestimated
by 14% because this meta-array lacked probes for
many CFT073 CDS. ArrayOme permits rapid recognition
of discordances between genome sizes measured by pulsed-field
gel electrophoresis (PFGE) and MVG sizes, thereby enabling high-throughput
identification of strains rich in novel
genes. Gene discovery studies focused on these
strains will greatly facilitate characterization of the
global gene pool accessible to individual bacterial
species.
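The three-step sizing procedure described above (merge contiguous ‘Present’ CDS into ICs, extend each IC for flanking intergenic sequence, then sum the extended lengths) can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not ArrayOme's implementation: the `CDS` record, the use of a single linear reference template, the half-gap extension rule, and the 10% discordance cut-off in `is_discordant` are all hypothetical choices made for the example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CDS:
    """A protein-coding sequence on a linear reference template (1-based, inclusive)."""
    name: str
    start: int
    end: int
    present: bool  # hybridization call for this CDS in the test strain


def merge_into_clusters(cds_sorted):
    """Group runs of consecutive 'Present' CDS (in template order) into clusters.

    Returns a list of clusters, each a list of indices into cds_sorted.
    """
    clusters, current = [], []
    for i, cds in enumerate(cds_sorted):
        if cds.present:
            current.append(i)
        elif current:
            clusters.append(current)
            current = []
    if current:
        clusters.append(current)
    return clusters


def mvg_size(cds_list, template_length):
    """Sum the lengths of extended clusters to estimate the MVG size.

    HYPOTHETICAL extension rule: a cluster spans from the start of its first CDS
    to the end of its last CDS, then gains half of each flanking intergenic gap,
    or the whole gap when it abuts a template end.
    """
    cds_sorted = sorted(cds_list, key=lambda c: c.start)
    total = 0
    for cluster in merge_into_clusters(cds_sorted):
        first_i, last_i = cluster[0], cluster[-1]
        first, last = cds_sorted[first_i], cds_sorted[last_i]
        core = last.end - first.start + 1  # includes gaps between member CDS
        if first_i > 0:  # an 'Absent' CDS precedes the cluster: take half the gap
            left = max(0, first.start - cds_sorted[first_i - 1].end - 1) / 2
        else:  # cluster starts the template: take the whole leading gap
            left = first.start - 1
        if last_i < len(cds_sorted) - 1:  # an 'Absent' CDS follows: half the gap
            right = max(0, cds_sorted[last_i + 1].start - last.end - 1) / 2
        else:  # cluster ends the template: take the whole trailing gap
            right = template_length - last.end
        total += core + left + right
    return total


def percent_of_reference(estimate, reference):
    """Express an estimate as a percentage of a reference (true or PFGE) size."""
    return 100.0 * estimate / reference


def is_discordant(pfge_size, mvg, max_shortfall=0.10):
    """Flag a strain whose MVG size falls short of its PFGE size by more than
    max_shortfall; the 10% default is a hypothetical cut-off."""
    return (pfge_size - mvg) / pfge_size > max_shortfall


if __name__ == "__main__":
    # Toy 1,000 bp template with two clusters: [a] and [c, d].
    genes = [
        CDS("a", 1, 100, True),
        CDS("b", 151, 300, False),
        CDS("c", 351, 500, True),
        CDS("d", 521, 700, True),
        CDS("e", 801, 900, False),
    ]
    size = mvg_size(genes, template_length=1000)
    print(size)  # 550.0
    print(percent_of_reference(size, 1000))  # 55.0
    print(is_discordant(1000, size))  # True
```

Splitting each flanking gap in half is only one way to account for intergenic sequence: because the neighbouring CDS is ‘Absent’, it is unclear how much of that gap the test strain actually carries, so the rule used by ArrayOme may differ. In the same spirit, a CFT073-like case, where an MVG estimate of 86 units against a true size of 100 gives `percent_of_reference(86, 100) == 86.0`, corresponds to the 14% underestimate reported above.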