CONSTRUCTION AND VERIFICATION OF MATHEMATICAL MODEL OF MASS SPECTROMETRY DATA

Małgorzata Plechawska-Wójcik

doi:10.35784/iapgos.1430

gosiap@cs.pollub.pl
Lublin University of Technology, Faculty of Electrical Engineering and Computer Science, Institute of Computer Science, Lublin (Poland)

Abstract

The article presents issues related to the construction, fitting, and implementation of a mathematical model of mass spectra based on normal distributions, mixtures of distributions, and the mean spectrum. This task is crucial for the analysis and also requires determining many model parameters.
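
The abstract does not spell out the fitting procedure, so the following is only a minimal sketch of the general idea, assuming that spectra share a common m/z grid, that the mean spectrum is their pointwise average, and that a one-dimensional Gaussian mixture is fitted to it with an intensity-weighted EM algorithm. The function names (`mean_spectrum`, `fit_gmm_em`), the synthetic peaks, and the initial values are all hypothetical and are not taken from the article.

```python
import numpy as np

def mean_spectrum(spectra):
    """Average intensities across spectra sampled on a common m/z grid."""
    return np.mean(np.asarray(spectra, dtype=float), axis=0)

def fit_gmm_em(mz, intensity, means, sigmas, n_iter=200):
    """Weighted EM for a 1-D Gaussian mixture; intensities act as point weights."""
    mz = np.asarray(mz, dtype=float)
    w = np.asarray(intensity, dtype=float)
    w = w / w.sum()                      # normalise weights to sum to 1
    means = np.asarray(means, dtype=float).copy()
    sigmas = np.asarray(sigmas, dtype=float).copy()
    alphas = np.full(means.size, 1.0 / means.size)
    for _ in range(n_iter):
        # E-step: responsibility of each component for each grid point
        z = (mz[:, None] - means) / sigmas
        dens = alphas * np.exp(-0.5 * z ** 2) / (sigmas * np.sqrt(2.0 * np.pi))
        resp = dens / np.maximum(dens.sum(axis=1, keepdims=True), 1e-300)
        # M-step: intensity-weighted updates of weights, means, and widths
        rw = resp * w[:, None]
        nk = np.maximum(rw.sum(axis=0), 1e-300)
        alphas = nk / nk.sum()
        means = (rw * mz[:, None]).sum(axis=0) / nk
        var = (rw * (mz[:, None] - means) ** 2).sum(axis=0) / nk
        sigmas = np.sqrt(np.maximum(var, 1e-12))
    return alphas, means, sigmas

def gauss(x, mu, sigma):
    """Normal density, used here only to build synthetic spectra."""
    return np.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * np.sqrt(2.0 * np.pi))

# Synthetic, noise-free example: two peaks, three spectra differing only in scale.
mz = np.linspace(1000.0, 1100.0, 1001)
shape = 0.6 * gauss(mz, 1030.0, 2.0) + 0.4 * gauss(mz, 1070.0, 3.0)
spectra = [scale * shape for scale in (0.8, 1.0, 1.2)]

avg = mean_spectrum(spectra)
alphas, means, sigmas = fit_gmm_em(mz, avg, means=[1020.0, 1080.0], sigmas=[5.0, 5.0])
```

In this sketch the number of components and the starting values are supplied by hand; choosing them for real data is part of the parameter-determination problem the abstract mentions.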

Keywords:

MALDI-TOF mass spectrometry, Gaussian distributions, Gaussian mixtures, SVM-RFE classification

Published

2013-02-14

Plechawska-Wójcik, M. (2013). KONSTRUKCJA I WERYFIKACJA MATEMATYCZNEGO MODELU DANYCH WIDM MASOWYCH. Informatyka, Automatyka, Pomiary W Gospodarce I Ochronie Środowiska, 3(1), 9–14. https://doi.org/10.35784/iapgos.1430