Decoding the Book of Life: Sequencing the Human Genome

    Written by: Dr Azra Raza
    Posted on: August 15, 2012

    The majority of non-infectious chronic human diseases can be traced to either congenital or acquired defects in our cells. Cancer is a prime example of such a disease, where exposure to toxins in the environment, combined with a predisposing susceptibility to retain such damage, can lead to the development of a cell which has managed to shut off the genes that control its death signals. This makes the cell essentially immortal, and it keeps expanding its population until it kills the host. Finding a cure for cancer would therefore be greatly facilitated if one could precisely define the genetic mutations that have converted a benign cell into a malignant one. While such information is available for a few genes, accurately defining the sequence of every single gene in a human cell had remained a dream until recently because of the overwhelming resources and manpower it would require. The start of this century saw all of that change in one dramatic sweep, and now, for the first time, cancer researchers like myself are starting to feel confident that within a short period of time, armed with this type of information about every individual cancer patient, we can convert this malignant disease into a chronic one that patients can live with and not die from.

    The United States has produced two dramatic maps. The first was the result of an expedition by Lewis and Clark two centuries ago as they traveled from the East Coast, charting the frontiers all the way to the Pacific, a map which expanded the imagination of Americans forever. The second, even more wondrous map, announced in 2000, affected all of mankind: it was the first survey of the entire human genome and represents the language in which the book of life is written.

    The functions performed by a cell depend on proteins. For example, hemoglobin is made up of a protein called globin attached to an iron-containing group (heme). The instruction for making globin is contained in two strands of DNA made up of four chemical bases (adenine, thymine, guanine and cytosine, or A, T, G and C) that occur repetitively in precise sequences; across the two strands, A always pairs with T and G always pairs with C. The DNA strands are woven together in a helical pattern and condense to form chromosomes present in the nucleus of a cell. A “gene” is the segment of DNA which contains the complete information for making a specific protein such as globin. Collectively, the complete set of DNA in a cell, including all of its genes, is called the genome, and precisely defining the exact sequence of its nucleotides is what we mean by Sequencing the Human Genome.
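
    The pairing rule is simple enough to express in a few lines of code. The following is only an illustrative sketch: the function names and the example sequence are made up for this purpose and are not taken from any sequencing software. Given one strand, it derives the partner strand using nothing but the A-T and G-C rule.

```python
# Pairing rule described above: A pairs with T, G pairs with C.
PAIRS = {"A": "T", "T": "A", "G": "C", "C": "G"}


def complement(strand: str) -> str:
    """Return the partner base for every base, in the same order."""
    return "".join(PAIRS[base] for base in strand.upper())


def reverse_complement(strand: str) -> str:
    """Return the partner strand read in its own direction.

    The two strands of the helix run in opposite directions, so the
    partner strand is conventionally written reversed.
    """
    return complement(strand)[::-1]


example = "ATGGTGCAC"  # hypothetical sequence, for illustration only
print(complement(example))          # TACCACGTG
print(reverse_complement(example))  # GTGCACCAT
```

    Because each base has exactly one partner, knowing one strand is enough to reconstruct the other, which is why a sequence is normally reported for a single strand.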

    In 1988, the National Institutes of Health (NIH) brought together a team of scientists to identify all the genes in human DNA, determine the sequences of the chemical base pairs that make up human DNA and store this information in public databases. The Government allocated $3 billion over 15 years for the project, and work began in earnest in the early 1990s. However, the techniques for sequencing the bases were exceedingly labor intensive and time consuming. In 1991, a brilliant NIH scientist, Dr. Craig Venter, proposed that instead of sequencing the entire human genome, only the genes that code for actual proteins should be sequenced, which would be much easier since genes make up only 2% of the DNA. Even within a gene, long segments called “introns” are removed and only the remaining segments called “exons” are spliced together to form a much reduced version of the gene which actually codes for the protein. Venter proposed that only exons should be sequenced. His suggestion was not taken seriously, so he quit his job at the NIH and began the human genome sequencing project with private funding. The race between the government and privately funded projects proved to be a dramatic catalyst for development of faster and better sequencing technology and, even more importantly, for refinement of the computational analysis which is the backbone of the project.
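
    To make the intron and exon idea concrete, here is a minimal sketch of splicing. The toy gene, its coordinates and the function name are all invented for illustration; real genes are thousands of bases long and real exon boundaries come from experimental annotation, not from a hand-written list.

```python
def splice(gene: str, exons: list[tuple[int, int]]) -> str:
    """Keep only the exon segments and join them in order.

    Each exon is a (start, end) pair of positions in the gene,
    with start included and end excluded.
    """
    return "".join(gene[start:end] for start, end in exons)


# Toy gene: exons in upper case, introns in lower case (hypothetical data).
gene = "ATGGCC" + "gtaagtttttag" + "TTTAAA" + "gtcccccag" + "GGGTGA"
exons = [(0, 6), (18, 24), (33, 39)]

mature = splice(gene, exons)
print(mature)
print(f"{len(mature)} of {len(gene)} bases remain after splicing")
```

    Only the upper-case segments survive, which is the sense in which sequencing exons alone would mean reading a much reduced version of each gene.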

    The first issue was to decide whose genome to sequence. Eventually, it was decided to sequence the DNA obtained from a diverse group of volunteers, whose identities were intentionally kept secret. Every part of the genome sequenced was made public immediately - in fact, new data on the genome was posted every 24 hours. The results were jointly announced by the government and private sources through President Clinton in a conference held in the East Room of the White House on June 26, 2000. Here are some astonishing facts that emerged:

    • The human genome contains 3 billion chemical nucleotide bases (A, C, T, and G)
    • The average gene consists of 3000 bases
    • Instead of the anticipated more than 100,000 genes, the total number turned out to be a paltry 30,000
    • Almost all (99.9%) nucleotide bases are exactly the same in all people (so much for racism!)
    • Chromosome 1 has the most genes (2968), and the Y chromosome has the fewest (231)
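
    A little arithmetic on the rounded figures above shows how they fit together. This is a back-of-the-envelope sketch, not a result from the project itself: the variable names are made up, and the inputs are the approximate numbers quoted in the list.

```python
# Rounded figures quoted in the list above.
GENOME_BASES = 3_000_000_000   # bases in the human genome
AVG_GENE_BASES = 3_000         # bases in an average gene
GENE_COUNT = 30_000            # approximate number of genes
IDENTICAL_FRACTION = 0.999     # share of bases identical in all people

# Total bases that fall inside genes, and their share of the genome.
bases_in_genes = AVG_GENE_BASES * GENE_COUNT
gene_share = bases_in_genes / GENOME_BASES

# Bases that can differ between two people.
differing_bases = round(GENOME_BASES * (1 - IDENTICAL_FRACTION))

print(f"Bases inside genes: {bases_in_genes:,}")
print(f"Share of the genome: {gene_share:.1%}")
print(f"Bases that may differ between two people: {differing_bases:,}")
```

    On these rounded inputs, genes account for only a few percent of the genome, the same order of magnitude as the 2% figure mentioned earlier, and the 0.1% that varies between people still amounts to millions of bases.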

    Anticipated Benefits:

    • Enhance our ability to precisely diagnose the genetic defect causing disease
    • Identify individuals with genetic predisposition to certain diseases and develop methods to prevent their occurrence
    • Use gene therapy
    • Design “custom drugs” based on genetic profiles for individual needs

    In 2010, Craig Venter reached another milestone by creating a “synthetic bacterium” in his lab (this led right-wing fundamentalists in the country to brand him a fiend trying to play God). One benefit of a capitalist system is that it leads to intense competition and accelerates technical progress. The Human Genome Project was conceived of in the 1990s with a view to completing it in 15 years at an expense of $3 billion. Instead of 15 years, it finally took 3 years to complete the project using the “shotgun sequencing” proposed by Venter, and instead of the $3 billion that it cost the government, Venter completely sequenced the human genome for $300 million. By 2013 we expect to have a portable cartridge that doubles as a USB drive and can be plugged into a computer to sequence a genome on the spot. This will yield the $1,000 genome in under an hour!

    Here is what Venter had to say: “Some have said to me that sequencing the human genome will diminish humanity by taking the mystery out of life. Poets have argued that genome sequencing is an example of sterilizing reductionism. Nothing could be further from the truth. The complexities and wonder of how the inanimate chemicals that are our genetic code give rise to the imponderables of the human spirit should keep poets and philosophers inspired for millennia.” Interestingly, Dr. Venter added “watermarks” - bits of non-functioning genetic code - to mark the new microbes. Built into the new synthetic bacterium is a non-coding sequence that spells out this line by James Joyce: “To live, to err, to fall, to triumph, and to re-create life out of life.”