CONSTRUCTION AND VERIFICATION OF MATHEMATICAL MODEL OF MASS SPECTROMETRY DATA

Małgorzata Plechawska-Wójcik

gosiap@cs.pollub.pl
Lublin University of Technology, Faculty of Electrical Engineering, Institute of Computer Science, Lublin (Poland)

Abstract

The article presents issues concerning the construction, adjustment and implementation of a mathematical model of mass spectrometry data based on Gaussians, Gaussian Mixture Models and the mean spectrum. Constructing the model is an essential step of the analysis and requires specifying many model parameters.


Keywords:

MALDI-TOF mass spectrometry, Gaussians, Gaussian Mixture Models, SVM-RFE classification

Published
2013-02-14

Plechawska-Wójcik, M. (2013). CONSTRUCTION AND VERIFICATION OF MATHEMATICAL MODEL OF MASS SPECTROMETRY DATA. Informatyka, Automatyka, Pomiary w Gospodarce i Ochronie Środowiska, 3(1), 9–14. https://doi.org/10.35784/iapgos.1430
