A DISTRIBUTED ALGORITHM FOR PROTEIN IDENTIFICATION FROM TANDEM MASS SPECTROMETRY DATA

Katarzyna ORZECHOWSKA; Tymon RUBEL; Robert KURJATA; Krzysztof ZAREMBA

doi:10.35784/acs-2022-10

Published: Jun 30, 2022

Issue Vol. 18 No. 2 (2022)


Authors

Katarzyna ORZECHOWSKA

orzechowska@ire.pw.edu.pl

Warsaw University of Technology, Institute of Radioelectronics and Multimedia Technology, Poland

Tymon RUBEL

trubel@ire.pw.edu.pl

Warsaw University of Technology, Institute of Radioelectronics and Multimedia Technology, Poland

Robert KURJATA

r.kurjata@ire.pw.edu.pl

Warsaw University of Technology, Institute of Radioelectronics and Multimedia Technology, Poland

Krzysztof ZAREMBA

k.zaremba@ire.pw.edu.pl

Warsaw University of Technology, Institute of Radioelectronics and Multimedia Technology, Poland

Abstract

Tandem mass spectrometry is an analytical technique widely used in proteomics for the high-throughput characterization of proteins in biological samples. Modern in-depth proteomic studies can require collecting millions of mass spectra representing short protein fragments (peptides). To identify the peptides, the measured spectra are most often scored against a database of amino acid sequences of known proteins. Because of the volume of input data and the size of proteomic databases, this is a resource-intensive task that requires an efficient and scalable computational strategy. Here, we present SparkMS, an algorithm for peptide and protein identification from mass spectrometry data explicitly designed to work in a distributed computational environment. To achieve the required performance and scalability, we use Apache Spark, a modern framework that is becoming increasingly popular not only in “big data” analysis but also in bioinformatics. This paper describes the algorithm in detail and demonstrates its performance on a large proteomic dataset. Experimental results indicate that SparkMS scales with the number of worker nodes and the increasing complexity of the search task. Furthermore, it exhibits a protein identification efficiency comparable to X!Tandem, a widely used proteomic search engine.

Keywords:

proteomics, mass spectrometry, distributed computing, Apache Spark

References

ORZECHOWSKA, K., RUBEL, T., KURJATA, R., & ZAREMBA, K. (2022). A DISTRIBUTED ALGORITHM FOR PROTEIN IDENTIFICATION FROM TANDEM MASS SPECTROMETRY DATA. Applied Computer Science, 18(2), 16–27. https://doi.org/10.35784/acs-2022-10
