By Mohd Firdaus Raih
There has been no escaping the pervasive imagery of the ball-like shape with appendages that we know as the COVID-19 virus. Although we often refer to both the virus and the disease as COVID-19, the official scientific name for the virus is actually SARS-CoV-2. For us Malaysians, the shape of a single SARS-CoV-2 particle is rather reminiscent of a rambutan. The virus’ physical shape serves the sole purpose of delivering the genetic payload contained within, referred to as a genome, to the next host. Once inside, the virus hijacks the host cells to make more viruses using the information encoded in the genome.
A genome is the total genetic content of an organism. All living things have a genome and, although viruses are not considered alive, they too have genomes. The SARS-CoV-2 genome is composed of a chemical substance called RNA. It is very similar to DNA, which we are perhaps more familiar with because the majority of organisms use DNA as the material of their genomes. This is so well understood that we often use the term DNA loosely to refer to lineage or heritage in general. Several viruses, such as SARS-CoV-2 and HIV, use RNA and not DNA as their genomic material.
The information encoded within the genome can be extracted by a method called genome sequencing. The process of genome sequencing generates large amounts of data in the form of the letters A, C, G and T, strung along in various combinations. Each letter is referred to as a nucleotide and is the basic unit of a genome (nucleotides are also referred to as bases and base pairs, but we’ll not get into those differences here).
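To make the idea of a genome as a string of letters concrete, here is a minimal sketch in Python. It simply counts how often each letter appears in a sequence. The fragment used is made up for illustration and is not real SARS-CoV-2 data, and the function name nucleotide_counts is our own.

```python
from collections import Counter


def nucleotide_counts(sequence):
    """Count how often each letter appears in a sequence string."""
    return Counter(sequence.upper())


# Made-up toy fragment for illustration only, not real SARS-CoV-2 data.
toy_fragment = "ATGGCATTCGA"

counts = nucleotide_counts(toy_fragment)
print("Length:", len(toy_fragment))
for letter in "ACGT":
    print(letter, counts[letter])
```

A real genome would usually be read from a sequence file rather than typed in, but the principle of treating it as text is the same.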
The genome of SARS-CoV-2 contains nearly 30,000 nucleotides. Compared to other viruses, it is relatively large. However, compared to the genomes of living organisms, it is minuscule: humans have more than 3 billion nucleotides and many bacteria have millions; even the smallest known bacterial genome contains hundreds of thousands of nucleotides.
The sequence of these nucleotides in the genome determines the types of proteins that the virus needs in order to assemble into a complete virus inside the host cell and continue the vicious cycle of infecting the next host. Fit-for-purpose algorithms and specific software applications running on powerful computers, including supercomputers, process the sequence data in order to decipher the role each protein plays in the life cycle of the virus. This field, which utilizes information science and computation to study biological function, is called bioinformatics and computational biology.
In general, scientists analyse the genome sequences to identify variations, sometimes termed mutations. Such information can also be used to track the genetic footprint left behind by the virus as it is transmitted from host to host. In addition to using the genome as a molecular tracking system, analysis of the variations can also reveal to scientists the potential effects of the mutations on the virus’ capacity to infect its host and cause disease.
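As a rough illustration of how variations are identified, the sketch below compares a sample sequence against a reference, letter by letter, and reports each position where they differ. It is a simplification under stated assumptions: the two sequences are made up, they are assumed to be already aligned and of equal length, and the function name find_substitutions is our own. Real analyses rely on dedicated alignment software and also handle insertions and deletions.

```python
def find_substitutions(reference, sample):
    """Return (position, reference_letter, sample_letter) for each mismatch.

    Assumes both sequences are already aligned and of equal length.
    Positions are counted from 1.
    """
    if len(reference) != len(sample):
        raise ValueError("sequences must be aligned to the same length")
    return [
        (position, ref, alt)
        for position, (ref, alt) in enumerate(zip(reference, sample), start=1)
        if ref != alt
    ]


# Made-up toy sequences for illustration only, not real viral data.
toy_reference = "ATGGCATTCGA"
toy_sample = "ATGACATTCGT"

for position, ref, alt in find_substitutions(toy_reference, toy_sample):
    print(f"Position {position}: {ref} changed to {alt}")
```

Two viruses that share the same set of changes are likely to be closely related, which is the basis for using the genome as a tracking system.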
It was the discovery of a specific mutation profile that led scientists to classify the COVID-19 virus as a newly discovered (novel) coronavirus that is distinct from its genetic cousin, the SARS coronavirus. Those particular mutations point to SARS-CoV-2 being more adept at invading the human host, which perhaps makes it more transmissible than SARS. To date, more than 30,000 SARS-CoV-2 genomes have been sequenced throughout the world. The data from this global effort have been deposited in a database called GISAID (Global Initiative on Sharing All Influenza Data).
Genome sequences from Malaysia, generated by the Malaysia Genome Institute (MGI), the National Public Health Laboratory (MKAK), the Institute for Medical Research (IMR) and the University of Malaya (UM), have also been added to this global repository. With the currently available data, the viruses isolated in Malaysia can be classified into three lineages, or genetically distinct sets of virus types: lineages A, B and B.6. More than 40 lineages have currently been determined based on the mutations detected in the genomes.
Genetic mutations are expected. In fact, such variations are a factor that differentiates us as individuals. Mutations do not necessarily lead to sinister outcomes, as is perhaps often depicted in fiction. SARS-CoV-2 is known to be a relatively slow-mutating virus. Despite that, scientists are still keeping a close watch on the evolving virus in order to assess whether future mutations could make it significantly deadlier.
Although there is no direct evidence linking any of the observed mutations to the emergence of a more lethal virus, it is clear that some patients are more severely affected by an infection than others. For some individuals, the infection passes with no symptoms. Yes, it is true that many deaths have been in the elderly population, but the GISAID data have revealed that there are also many people in their 90s who have survived, or who have tested positive for COVID-19 without showing symptoms.
This points to the genetics and environmental circumstances of the individual hosts as being crucial factors that determine how each body responds to the infection or is affected by the virus. Therefore, the obvious next course of investigation to understand COVID-19 is to understand what sets all these individuals apart. To do that, scientists would have to gather massive amounts of data that include the genome sequences of individuals who have succumbed to COVID-19, together with those of individuals who have survived or are asymptomatic, for comparison.
But even that will not be enough. Scientists now understand that external and environmental pressures can also influence how genes behave. As a result, input data regarding personal habits, work and home environment, perhaps even diet, will need to be integrated and analysed together with the genetics. The use of deep learning or other artificial intelligence algorithms can facilitate the discovery of correlations between numerous factors and individual genetic profiles.
Armed with the knowledge of how certain patients can be more susceptible, or can be treated in some specific way, physicians may be able to come up with better and more precise clinical management strategies. Should vaccines or drugs be unavailable in the coming months or years, such an approach would be the best hope of enabling the global healthcare infrastructure to cope, and of allowing us to live with COVID-19 in our midst.
The writer is a bioinformatician at Universiti Kebangsaan Malaysia and President of the Malaysian Society for Bioinformatics and Computational Biology.