CONSTRUCTION AND VERIFICATION OF MATHEMATICAL MODEL OF MASS SPECTROMETRY DATA
Małgorzata Plechawska-Wójcik
gosiap@cs.pollub.pl
Lublin University of Technology, Faculty of Electrical Engineering, Institute of Computer Science, Lublin (Poland)
Abstract
The article addresses the construction, fitting, and implementation of a mathematical model of mass spectrometry data based on Gaussian functions, Gaussian mixture models, and the mean spectrum. Building this model is an essential step of the analysis and requires specifying many model parameters.
Keywords:
Maldi-Tof mass spectrometry, Gaussians, Gaussian Mixture Models, SVM-RFE classification
License
This work is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License.