Human Genome Project: the race to sequence the human genome

Angana Borah
4 min read · Dec 12, 2020

You might have heard about the Human Genome Project, which for the first time gave us the ability to read nature’s complete genetic blueprint for building a human being. To begin with, let me give you a brief overview of genome sequencing and its importance. A genome is an organism’s complete set of DNA, and DNA is the molecular material inside each of our cells that defines how our bodies work. Our bodies’ entire program of development is encoded in our genome. When a cell divides, all of the DNA inside it has to be copied. If an error occurs during cell division, or if a cell is damaged, mutations (changes in our genome) can arise. Such changes can sometimes lead to abnormal cell division and further complications such as cancer. Genome sequencing, the process of determining the exact DNA sequence of an organism’s genome, is therefore considered to be at the heart of the genomics revolution, and it has been fundamental to many research areas, including cancer sequencing. The Human Genome Project (HGP) was the first large-scale initiative aimed at making human genome sequencing feasible and cheaper.

The HGP was first proposed in 1987 by the US Department of Energy (DOE). It started as a joint effort between the DOE and the National Institutes of Health (NIH), which signed a memorandum of understanding to coordinate all research and technical activities related to the human genome. The HGP officially began in 1990, with institutions in the US and many other countries taking part; among them was The Sanger Center in England, one of the largest genomics research institutes in the world. The HGP wasn’t a race when it started in the 1990s. Its two most important goals were sequencing the roughly 3 billion base pairs of the human genome (a base is a unit of DNA, which can be adenine (A), guanine (G), thymine (T), or cytosine (C)) and reducing the cost of sequencing.

In the initial years, researchers focused on creating “maps” for sequencing. Large chunks of DNA called Bacterial Artificial Chromosomes (BACs), each about 150,000 base pairs long, were sequenced individually, and the sequenced BACs were then pieced together to reconstruct the genome. In this way, BACs were mapped onto the genome. The difficulty was that it was hard to work out where each BAC belonged in the genome. Then, in 1995, The Institute for Genomic Research (TIGR) sequenced the first complete bacterial genome, about 1.8 million bases long, using whole genome shotgun sequencing. Instead of building maps, they took the whole genome, broke it into tiny pieces, sequenced those pieces at random, and then used an assembler (software) to put the pieces back together. Although this was not done on a human genome, it was a huge breakthrough for the project.

BAC (Bacterial Artificial Chromosomes) (Photo Courtesy: www.genome.gov)
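
To make the shotgun idea more concrete, here is a toy sketch in Python. It is not TIGR’s or Celera’s assembler. It assumes error-free reads of equal length, a short made-up sequence with no repeated stretches, and a simple greedy strategy (repeatedly merge the two fragments whose end and start overlap the most). The helpers `shotgun_reads`, `overlap` and `greedy_assemble` are hypothetical names chosen for illustration.

```python
import random

def shotgun_reads(genome, read_len, step, seed=0):
    """Cut overlapping, error-free fragments (reads) from the genome and
    shuffle them to mimic random fragmentation."""
    if len(genome) <= read_len:
        return [genome]
    starts = list(range(0, len(genome) - read_len + 1, step))
    if starts[-1] != len(genome) - read_len:
        starts.append(len(genome) - read_len)  # make sure the last bases are covered
    reads = [genome[s:s + read_len] for s in starts]
    random.Random(seed).shuffle(reads)  # the assembler never sees the original order
    return reads

def overlap(a, b, min_len):
    """Length of the longest suffix of a that equals a prefix of b,
    or 0 if no such overlap is at least min_len long."""
    for k in range(min(len(a), len(b)), min_len - 1, -1):
        if a.endswith(b[:k]):
            return k
    return 0

def greedy_assemble(reads, min_len=3):
    """Repeatedly merge the pair of fragments with the longest overlap."""
    frags = list(reads)
    while len(frags) > 1:
        best_k, best_i, best_j = 0, None, None
        for i, a in enumerate(frags):
            for j, b in enumerate(frags):
                if i != j:
                    k = overlap(a, b, min_len)
                    if k > best_k:
                        best_k, best_i, best_j = k, i, j
        if best_k == 0:
            break  # nothing overlaps any more: return the separate pieces (contigs)
        merged = frags[best_i] + frags[best_j][best_k:]
        frags = [f for idx, f in enumerate(frags) if idx not in (best_i, best_j)]
        frags.append(merged)
    return frags

# Made-up 36-base toy sequence (not real human DNA) with no repeated 4-base stretch.
genome = "ATGCGTACCTTAGGCATCGAATTCCGGATACAGTGA"
reads = shotgun_reads(genome, read_len=12, step=4)
contigs = greedy_assemble(reads)
print(len(reads), contigs == [genome])  # 7 True
```

With this made-up sequence, the 7 shuffled reads reassemble into the original string. If the sequence contained repeats longer than the overlaps between reads, greedy merging could join pieces in the wrong order, which is one reason repeats make real genome assembly so difficult.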

In 1998, Applied Biosystems built a new sequencing machine that used capillaries, automating much of the process. Around this time, Craig Venter and others left TIGR to form Celera Genomics, a private company that aimed to sequence the human genome using whole genome shotgun sequencing. This is when the real race began. Celera Genomics’ first accomplishment was the whole genome sequence of the fruit fly, roughly 130 million base pairs long, more than 70 times larger than the bacterial genome TIGR had sequenced. In response, the NIH and The Sanger Center joined forces in the race. Eventually, Celera Genomics and the NIH and Sanger Center team published two separate papers in February 2001 with similar results. Both estimated the number of genes in the human genome to be between 30,000 and 40,000. Both teams therefore achieved something that had long been hoped for. The HGP’s ultimate goal was to identify all human genes, determine their sequences, understand what they do, and use this knowledge to develop better treatments and improve human health.

Dr. Craig Venter (L) played an influential role in the Human Genome Project
(Photo Courtesy: www.dailymaverick.co)

Today, the estimate is that humans have around 20,000 to 23,000 genes. We have narrowed the range considerably, but we still aren’t sure of the exact number. It is worth mentioning that the HGP achieved its second goal as well: genome sequencing initially cost $10 per base, and today it has dropped to roughly $1 per 3 million bases. We have come a long way, but there are still many unanswered questions that the scientific community is working to solve. This field lies at the intersection of biology, computer science and statistics, so it is essential to understand the role each of them plays in sequencing. The movement that human genome sequencing started continues today through its many applications, such as Exome Sequencing, RNA-Sequencing, ChIP-Seq and MethylSeq, to name just a few.
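
The cost figures above can be checked with a few lines of arithmetic. This sketch takes the two quoted rates at face value, assumes a genome of about 3 billion bases, and treats the rate as cost per base of the finished genome (it ignores the fact that real sequencing reads each base many times over). `Fraction` is used so the results are exact rather than floating-point approximations.

```python
from fractions import Fraction

GENOME_BP = 3_000_000_000            # approximate size of the human genome, in bases
cost_then = Fraction(10)             # $10 per base, the early rate quoted above
cost_now = Fraction(1, 3_000_000)    # $1 per 3 million bases, today's rate quoted above

fold_drop = cost_then / cost_now     # how many times cheaper each base has become
genome_now = GENOME_BP * cost_now    # whole-genome cost at today's quoted rate

print(f"{int(fold_drop):,}-fold cheaper per base")  # 30,000,000-fold cheaper per base
print(f"about ${int(genome_now):,} per genome")     # about $1,000 per genome
```

In other words, the quoted drop is a 30-million-fold reduction per base, which at today’s rate works out to roughly $1,000 for a 3-billion-base genome.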

To learn more about the HGP, you can watch this short video. To understand the sequencing methods behind the two drafts, you can read these papers: Initial sequencing and analysis of the human genome (NIH and The Sanger Center) and The Sequence of the Human Genome (Celera Genomics).
