Generalization of DNA microarray dispersion properties: microarray equivalent of t-distribution

Novak, Jaroslav P.; Kim, Seon-Young; Xu, Jun; Modlich, Olga; Volsky, David J.; Honys, David; Slonczewski, Joan L.; Bell, Douglas A.; Blattner, Fred R.; Blumwald, Eduardo; Boerma, Marjan; Cosio, Manuel; Gatalica, Zoran; Hajduch, Marian; Hidalgo, Juan; McInnes, Roderick R.; Miller, Merrill C. 3rd; Penkowa, Milena; Rolph, Michael S.; Sottosanto, Jordan; St-Arnaud, Rene; Szego, Michael J.; Twell, David; Wang, Charles

journal contribution

Posted on 2006-10-03, 13:11.

Background: DNA microarrays are a powerful technology that can provide a wealth of gene expression data for disease studies, drug development, and a wide range of other investigations. Because of the large volume and inherent variability of DNA microarray data, many new statistical methods have been developed for evaluating the significance of observed differences in gene expression. However, until now little attention has been given to characterizing the dispersion of DNA microarray data.

Results: Here we examine the expression data obtained from 682 Affymetrix GeneChips® of 22 different types, and we demonstrate that the Gaussian (normal) frequency distribution is characteristic of the variability of gene expression values. However, typically 5 to 15% of the samples deviate from normality. Furthermore, it is shown that the frequency distributions of the difference of expression in subsets of ordered, consecutive pairs of genes (consecutive samples) in pair-wise comparisons of replicate experiments are also normal. We describe a consecutive sampling method, which is employed to calculate the characteristic function approximating the standard deviation, and show that the standard deviation derived from the consecutive samples is equivalent to the standard deviation obtained from individual genes. Finally, we determine the boundaries of probability intervals and demonstrate that the coefficients defining the intervals are independent of sample characteristics, variability of data, laboratory conditions and type of chips. These coefficients are very closely correlated with Student's t-distribution.

Conclusion: In this study we ascertained that the non-systematic variations possess a Gaussian distribution, determined the probability intervals and demonstrated that the K_α coefficients defining these intervals are invariant; these coefficients offer a convenient universal measure of data dispersion. The fact that the K_α distributions are so close to the t-distribution and independent of conditions and type of arrays suggests that the quantitative data provided by Affymetrix technology give a "true" representation of the physical processes involved in measuring RNA abundance.
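The consecutive sampling idea summarized in the abstract (ordering genes, pooling replicate differences into consecutive samples, and reading off a standard deviation and an interval coefficient) can be illustrated with a short Python sketch. It is not the authors' implementation: the function names `consecutive_sample_sd` and `interval_coefficient`, ordering genes by the mean of two replicates, non-overlapping windows of 50 genes, and the quantile-based estimate of K are all hypothetical choices made for illustration, and the data are simulated.

```python
"""Illustrative consecutive-sampling sketch (hypothetical; not the authors' code).

Assumptions: values are already log-transformed, two replicate arrays are
compared gene by gene, and the window size is an arbitrary choice.
"""
import math
import random
import statistics
from typing import List, Tuple


def consecutive_sample_sd(
    rep_a: List[float], rep_b: List[float], window: int = 50
) -> List[Tuple[float, float]]:
    """Return (mean expression level, SD of replicate differences) per window.

    Under independent, equal replicate noise, the SD of the difference is
    sqrt(2) times the per-replicate SD.
    """
    if len(rep_a) != len(rep_b):
        raise ValueError("replicates must contain the same number of genes")
    if window < 2:
        raise ValueError("window must hold at least two genes")
    # Pair each gene's average expression with its replicate difference.
    pairs = [((a + b) / 2.0, a - b) for a, b in zip(rep_a, rep_b)]
    # Order genes by average expression so each window spans a narrow range.
    pairs.sort(key=lambda p: p[0])
    windows = []
    # Non-overlapping consecutive windows; a short tail is discarded.
    for start in range(0, len(pairs) - window + 1, window):
        chunk = pairs[start:start + window]
        level = statistics.fmean([m for m, _ in chunk])
        sd = statistics.stdev([d for _, d in chunk])
        windows.append((level, sd))
    return windows


def interval_coefficient(diffs: List[float], alpha: float) -> float:
    """Empirical K: at most a fraction alpha of |d - mean| / SD exceeds K."""
    if not 0.0 < alpha < 1.0:
        raise ValueError("alpha must lie strictly between 0 and 1")
    mean = statistics.fmean(diffs)
    sd = statistics.stdev(diffs)
    scaled = sorted(abs(d - mean) / sd for d in diffs)
    # Smallest order statistic with at most alpha * n values strictly above it.
    idx = max(0, math.ceil((1.0 - alpha) * len(scaled)) - 1)
    return scaled[idx]


if __name__ == "__main__":
    random.seed(1)
    # Simulated replicates: a shared "true" level plus independent noise.
    true_levels = [random.gauss(8.0, 2.0) for _ in range(2000)]
    rep_a = [t + random.gauss(0.0, 0.2) for t in true_levels]
    rep_b = [t + random.gauss(0.0, 0.2) for t in true_levels]
    for level, sd in consecutive_sample_sd(rep_a, rep_b)[:5]:
        print(f"level {level:6.2f}  SD of difference {sd:.3f}")
    diffs = [a - b for a, b in zip(rep_a, rep_b)]
    k = interval_coefficient(diffs, alpha=0.05)
    z = statistics.NormalDist().inv_cdf(1.0 - 0.05 / 2.0)
    print(f"empirical K(0.05) = {k:.3f}; normal quantile = {z:.3f}")
```

Because the Python standard library has no Student's t quantile function, the sketch compares the empirical coefficient with the normal quantile, which is the large-sample limit of the t-distribution; for small consecutive samples a t quantile (for example from SciPy) would be the more appropriate reference.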

Citation: Biology Direct, 2006, 1:27
Version: VoR (Version of Record)
Published in: Biology Direct
Publisher: BioMed Central
eISSN: 1745-6150
Copyright date: 2006
Available date: 2006-10-03
Publisher DOI: https://doi.org/10.1186/1745-6150-1-27
Publisher version: http://www.biologydirect.com/content/1/1/27
Language: English
