Distributed Computing in a Pandemic: A Review of Technologies available for Tackling COVID-19 A Preprint Jamie J. Alnasir Department of Computing Imperial College London, UK
[email protected] May 15, 2021 Abstract The current COVID-19 global pandemic caused by the SARS-CoV-2 betacoronavirus has resulted in over a million deaths and is having a grave socio-economic impact, hence there is an urgency to find solutions to key research challenges. Some important areas of focus are: vaccine development, designing or repurposing existing pharmacological agents for treatment by identifying druggable targets, predicting and diagnosing the disease, and tracking and reducing the spread. Much of this COVID-19 research depends on distributed computing. In this article, I review distributed architectures — various types of clusters, grids and clouds — that can be leveraged to performthese tasks at scale, at high-throughput,with a high degree of parallelism, and which can also be used to work collaboratively. High-performance computing (HPC) clusters, which aggregate their compute nodes using high- bandwidth networking and support a high-degree of inter-process communication, are ubiquitous across scientific research — they will be used to carry out much of this work. Several bigdata processing tasks used in reducing the spread of SARS-CoV-2 require high-throughput approaches, arXiv:2010.04700v3 [cs.DC] 18 May 2021 and a variety of tools, which Hadoop and Spark offer, even using commodity hardware. Extremely large-scale COVID-19 research has also utilised some of the world’s fastest supercom- puters, such as IBM’s SUMMIT — for ensemble docking high-throughput screening against SARS- CoV-2 targets for drug-repurposing,and high-throughputgeneanalysis — andSentinel, an XPE-Cray based system used to explore natural products.