Mr. Chairman, I welcome the opportunity to testify today before your subcommittee about Celera's progress in deciphering the human genome sequence and its relationship to healthcare and to the federally funded Human Genome Project. My name is J. Craig Venter and I am the President and Chief Scientific Officer of Celera Genomics, headquartered in Rockville, Maryland, with several additional locations in California.
In June of 1998, I testified before this Subcommittee about the impact of private sector developments on the federally funded Human Genome Project. PE Corporation and I had just launched Celera Genomics. Our goal was to build an information company to provide researchers in industry and academia with an integrated information and discovery system available on a subscription basis. At the time the federal human genome effort was scheduled to complete its task in 2005. Even scientists within the effort were worried 2005 was optimistic. Celera set out, using the new ABI PRISM® 3700 DNA Sequencers produced by PE Biosystems and the whole genome shotgun strategy developed by me and my colleagues at The Institute for Genomic Research (TIGR), to accelerate the completion of the human genome sequence to 2001.
Why? At Celera we have adopted the motto "Speed matters" because discovery can't wait. Since the Congress began funding the human genome effort, over 5 million Americans have died of cancer and over a million people have died because of adverse reactions to drugs. Many scientists associated with the federal effort said we could not accelerate the completion date for the human genome project. Some said the ABI 3700 would not work as hoped. Many of the same scientists who harshly criticized my proposal in 1994 to use the whole genome shotgun strategy on a bacterial chromosome said in 1998 that this same strategy would fail with larger and more complicated genomes like those of the human and the fruit fly, Drosophila melanogaster.
One of the witnesses on that day said, "Show me the data!" He predicted we would fail -- fail catastrophically. He was wrong -- and I am happy to again show the Subcommittee and the world the data.
On March 24, 2000 the genome sequence of Drosophila (a key model organism for biomedical researchers) was published in Science. Celera started its sequencing of this genome in May 1999. It was the product of the finest scientific collaboration I have ever participated in and included Dr. Gerry Rubin and the Berkeley Drosophila Genome Project members along with the European Drosophila Genome Project members. Over 40 scientists from around the world came to Celera in November of 1999 to participate in an annotation jamboree and begin the process of annotating the Drosophila genome. In total we identified 13,601 genes, of which only 2,500 were previously known. Our publications in Science had 240 authors from eight different countries (see attached list of authors). The gene sequence exceeds community quality standards. The genome sequence and annotation of the Drosophila genome published in Science is equal to, if not superior to, the quality of the recently published sequence of human Chromosome 22 and the C. elegans genome published in 1998. The Berkeley Drosophila Genome group will complete the final process of minor gap sequencing in a few months' time. We are confident that the final product will be the equal of the high quality standard established by TIGR in the 14 genomes and chromosomes that they have published in the scientific literature to date.
While the whole genome shotgun strategy clearly worked with Celera's data set alone, I believe that all of the collaborators in this project would say that the combination of Celera's whole genome shotgun strategy and the draft sequence from the BAC-by-BAC approach has led to a greater level of knowledge about the Drosophila genome and a higher quality sequence than either approach alone would have provided. Another testimonial to the quality of the result of the whole genome shotgun strategy in Drosophila is the fact that the NIH-funded effort to sequence the mouse has now adopted our technique, despite its strong initial protests that whole genome shotgun sequencing could not work. They realize that the strategy is faster, cheaper, and of equal or greater quality than the conventional approach.
At this juncture it is important that we discuss the different elements that go into evaluating the quality of a genome sequence. You have heard much and you will hear much more in the near future about finished sequence, complete sequence, and draft sequence. I would like to explain the process of creating sequence and the factors that should be evaluated in judging its quality and usefulness for researchers.
1. Library Construction - In order to sequence a genome it is broken up into smaller fragments that are easily sequenced. These fragments are then inserted into bacterial hosts or vectors that are used to replicate each fragment. This collection of bacterial clones with the fragments of the genome inside is a DNA library. If a library is poorly constructed the sequence will be poor. The whole genome shotgun strategy employs far fewer DNA libraries than the conventional BAC-by-BAC approach to genome sequencing; however, the libraries used must be of the highest quality. Libraries must have DNA segments of uniform size, and at least two libraries with different length segments must be made. One is 2,000 base pairs in length and the other is 10,000 base pairs. Once these segments have been obtained they are inserted into plasmids (i.e., DNA structures, separate from the bacterial genome, that can be replicated within the bacteria). These plasmids are then placed into bacteria that serve as vectors or hosts to the DNA.
2. Sequencing Phase - Electrophoresis is the method of separating DNA fragments. An electric current is passed through a medium containing a mixture of DNA, and DNA molecules of different sizes travel through the medium at different rates, depending on their electrical charge and size. Separation is based on these differences. In the past, large plates containing gels were the media through which the fragments traveled. In the ABI PRISM® 3700, gels are contained in very thin capillary tubes that allow fast sample processing and small sample volumes and eliminate manual gel pouring and sample loading tasks. Fluorescent dyes matched to each of the four letters of the genetic code are attached to these fragments. The fragments then flow out the end of the capillary tube, where each fluorescent dye is excited by a laser, and from the resulting signal the order of base pairs, or genetic letters, is determined. At this stage the genome is really a collection of fragments. Genes are not fully assembled, nor is their relative position within the genome well defined.
3. Assembly and Order - This is the most critical phase of the entire process. A genome sequence can only be considered to approach completion if it is accurate in the identification of the different base pairs and, most importantly, if they are in the proper order. The Drosophila genome is properly ordered -- that is, the different pieces that have been sequenced are assembled in the correct order -- and the sequence is highly accurate.
4. Annotation - Once the sequence is obtained with genes assembled and properly located within the genome, one can begin the process of identifying each gene and describing its function. The quality of the sequence and the ordering will determine how accurate the preliminary annotation will be. In the case of the C. elegans genome the initial annotation found over 18,000 genes. Just two weeks ago that number was revised to approximately 12,000. When Chromosome 22 was published 564 genes were identified. There are publications in press at this time that suggest that the number of genes on chromosome 22 is approximately 1,000. These statistics suggest that the public programs have fallen short in their annotation efforts.
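As a rough illustration of the assembly step described above, the following toy Python sketch merges overlapping fragments by greedy suffix-to-prefix matching. It is a simplification under stated assumptions and not Celera's assembler: the function names (`overlap`, `greedy_assemble`), the example reads, and the minimum overlap of 3 letters are all hypothetical, and the sketch ignores sequencing errors, repeats, and the paired-end information that the whole genome shotgun strategy relies on.

```python
def overlap(a, b, min_len):
    """Length of the longest suffix of `a` that equals a prefix of `b`.

    Returns 0 if no overlap of at least `min_len` letters exists.
    """
    for k in range(min(len(a), len(b)), min_len - 1, -1):
        if a.endswith(b[:k]):
            return k
    return 0


def greedy_assemble(reads, min_len=3):
    """Repeatedly merge the pair of fragments with the largest overlap.

    Hypothetical toy assembler: assumes error-free reads and that no read
    is wholly contained in another.
    """
    frags = list(reads)
    while len(frags) > 1:
        best_k, best_i, best_j = 0, None, None
        for i, a in enumerate(frags):
            for j, b in enumerate(frags):
                if i != j:
                    k = overlap(a, b, min_len)
                    if k > best_k:
                        best_k, best_i, best_j = k, i, j
        if best_k == 0:
            break  # no remaining overlaps: the fragments stay unordered
        merged = frags[best_i] + frags[best_j][best_k:]
        frags = [f for idx, f in enumerate(frags)
                 if idx not in (best_i, best_j)] + [merged]
    return frags


# Three made-up reads that tile a made-up 14-letter sequence.
reads = ["GTTAGC", "ATGCGTAC", "GTACGTTA"]
print(greedy_assemble(reads))  # ['ATGCGTACGTTAGC']
```

When no overlaps remain the function returns several pieces rather than one, which mirrors the distinction drawn above between an unordered collection of fragments and an ordered sequence.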
In January of 2000 Celera announced that it had unordered, but highly accurate, fragments covering 90% of the genome (including some of the public data). The public effort has announced that they are approximately two-thirds of the way to this same point today. This is the so-called draft sequence, a term introduced by the public effort but without scientific meaning. Celera has now reached a point in its program where we can assemble and order our data to produce a complete sequence. As was seen in the Drosophila collaboration, Celera's ordered data, combined with the public data, could produce a more accurate version of the human genome sequence faster than either data set could alone. One of the benefits of collaboration between Celera and the public effort would be just this. I will discuss this further later in my testimony.
With the emergence of Celera on the scene many in the public effort exhibited a new sense of urgency and a competitive drive. This is mostly for the good. We all benefit from the accelerated efforts. As I have said, at Celera we understand that speed matters. But, Mr. Chairman, I find myself in the peculiar position of warning you that in the race to complete a draft human sequence, the publicly funded Human Genome Program may be at a stage where quality and scientific standards are sacrificed for credit. On Monday it was reported in Time Magazine that the public effort was done -- and that the race to complete the genome sequence was over. I have read that Dr. Collins said that the draft human genome sequence they are about to announce has only a few gaps and is 99.9% accurate. However, analysis of the public data in GenBank reveals that it is an unordered collection of over 500,000 fragments of average size 8,000 base pairs. This means that the publicly funded program is nowhere close to being done.
Two years ago it was reported that Dr. Collins had said Celera would produce the "Mad Magazine" version of the human genome (USA Today, June 9, 1998). From its formation, Celera's goal has been to produce a high-quality human genome sequence that will stand the test of time, and we remain committed to that goal. The Subcommittee should work to guarantee that the federal effort continues to work towards an accurate, ordered, and well-annotated sequence. You should urge its investigators to keep their standards at the highest levels established in the genomics field and not rush to publish preliminary data for the sake of claiming priority. There is no example of the results of any genome sequence project being published in the scientific literature prior to meeting the established quality, order and completeness standards. It would be poor science policy and a terrible precedent for the young genomics field. At your previous hearing, Dr. Olsen of the University of Washington warned about a slippery slope of data quality if the established standards for completeness were compromised in any way simply to appear to win the genome race.
Mr. Chairman, since your earlier hearing, Celera has had many technical and scientific successes. We moved into our facilities in August of 1998. Since then we have constructed the world's largest sequencing facility and are especially pleased with the accuracy of the sequence from the individual DNA samples. An analysis of 8,000 samples of Drosophila sequence indicated that 7,992 had an accuracy of greater than 99.5%. The remaining 8 samples were greater than 98%. With individual sample accuracy at this high level we were able to assemble the non-repetitive portions of the fruit fly genome to an accuracy of greater than 99.99 percent. Because of our paired-end sequencing strategy we have discovered that we were able to use far fewer sequencing samples than we had originally planned. This gives us confidence that we will be able to have a high quality, accurate, and well ordered sequence for human with our current level of genome coverage.
Our data center became operational at the beginning of 1999. Our partner, Compaq Computers, has supplied us with about 800 Alpha EV6 and EV67 processors with 64-bit architecture and over 80 terabytes of storage for our data. Compaq tells us our computer center is comparable to those at the Department of Energy's Defense Laboratories -- Sandia and Lawrence Livermore. We have installed over 200 miles of fiber optic cable and 200 miles of copper cable to handle the data flow. This center was constructed not only for the essential task of assembling genomes using the whole genome shotgun strategy, but also for serving our customers and providing them with unprecedented computational power for their research and analysis.
We have also had many business successes, as I will touch on later in my testimony; however, I believe that we have to do a better job in communicating our business objectives to you and the public. For example, Mr. Chairman, your own local newspaper, The Los Angeles Times, has misunderstood and therefore misinterpreted our business model and objectives. The result is a great deal of confusion about Celera and our activities. Celera is the only genomics firm that is using its sequencing power to directly sequence the human genome. As an information company, Celera is designed to assist researchers rather than focus on the development of new pharmaceuticals. Another feature that distinguishes Celera from the business models of many of our competitors is that we provide our data and information without requiring database users to pay onerous royalties on the discoveries they make with our data (often referred to as reach-through rights), a requirement that is an inherent deterrent. We have already entered into third-party agreements that bind us to this.
How will the company build a sustainable business from its genomics and bioinformatic tools?
One of Celera's founding principles is that we will release the entire consensus human genome sequence freely to researchers on Celera's Internet site when it is completed. We believe that this is in the best interests of both science and our company, since it will allow researchers to advance science and medicine and at the same time be introduced to Celera's high quality data and software tools. We will place no restrictions on how scientists can use this data; they can publish research results derived from this data, or seek intellectual property protection on discoveries using this data. The only protection that we have indicated that we would seek is database protection, as exists in Europe, to inhibit other database companies from selling the Celera database.
Our goal is to make the complex, sometimes overwhelming, and ever-increasing volumes of biological information more accessible and useful to researchers in academia and industry. Toward that end, we are creating an unparalleled library of genomic information in our databases. Annotation of the data by Celera scientists using an array of bioinformatics tools will act as the platform for developing a range of products and services. We will offer these tools in a manner similar to the models used by other information companies, such as Lexis-Nexis, Bloomberg, and AOL. The need for services such as these will only increase as the volumes of information and the complex interrelated nature of that information increase. Pricing for subscriptions to this service will vary appropriately, depending on the product, the customer, and the application. We will provide value-added information to academics and other non-commercial researchers at reasonable rates, naturally bounded by those customers' resources and appraisals of the value-added.
Celera currently has five large pharmaceutical companies as database subscribers. The initial partners, Pharmacia Corporation, Novartis, and Amgen, provided Celera with input for improvements in our data delivery systems and software. Two additional pharmaceutical subscribers have joined us since we began sequencing the human genome in September of 1999 -- Pfizer and Takeda Chemical Industries, Ltd. of Japan. Celera began offering web-based access to its databases and tools in March 2000. This access should be ideal for academics and smaller biotech companies. These subscribers can have access to all Celera databases, tools, and annotation, including the human genome. Our goal is to have all major commercial life science companies and academic biomedical research institutions as subscribers in the future.
With this as context I would like to address the confusion that has arisen over the accessibility of our data, in particular the accessibility of our data on the human genome. We have reacted, and will continue to react, to claims that Celera intends to withhold information and delay progress, particularly when our fundamental mission is to accelerate the dissemination of high quality, accurate information. Let me emphasize -- our data on the human genome is currently available to those subscribing. Our vision is that the list of subscribers will be very long. Let me draw an analogy, Mr. Chairman. When you pick up a newspaper at your doorstep, you consider it quite accessible. You probably do not even remember that you are paying a subscription to have that access, and you certainly don't claim that the newspaper company publishing it is being secretive or restricting access to news about current events just because you pay a subscription fee.
When you understand that data accessibility is our business and that our commitment to make the genome freely available is integral to that business, you can also understand our consternation at the confusion created by the recent joint statement of President Clinton and Prime Minister Blair. Their statement, a simple re-statement of the existing policy for the publicly funded project, is certainly no obstacle to Celera's business model when extended to companies engaged in genomics. We issued the following at the time of their statement:
Celera Genomics welcomes the statement. Its own mission is completely consistent with the goals of assuring that the world's researchers have access to this important information to enable advances and discoveries that will improve the human condition. Since the announcement of Celera's formation we have made a clear commitment that upon our completion of the consensus human genome we would publish it in a peer-reviewed scientific journal and make it available to researchers for free.
Although the joint statement was on its face and in fact harmless, it did start a fall in the NASDAQ value -- a decline of over 17% and a loss of over $50 billion in market capitalization in the biotechnology sector in two days -- and this has continued to decline dramatically since that time.
Celera's Human Genome Database
On January 10, 2000 we announced that we had DNA sequence in our database covering 90 percent of the human genome. As a result of that extensive sequence coverage of the 23 pairs of human chromosomes and based on statistical analysis, we believed that greater than 97 percent of all human genes were represented in the Celera database. The sequence data, developed from randomly selected fragments of all human chromosomes, contained over 5.3 billion base pairs (letters of the human genetic code) at greater than 99 percent accuracy. The 5.3 billion base pairs represented 2.58 billion base pairs of unique sequence that had been calculated to cover 81 percent of an estimated genome size of 3.18 billion base pairs. These data, combined with all of the "finished" and "draft" human genome sequence data from the public databases, gave Celera coverage of 90 percent of the human genome. Since that announcement we have continued to increase the coverage of our data. Progress is such that we have modified our earlier estimated completion date of sometime before the end of 2001 to 2000.
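The coverage figures quoted above can be checked with a few lines of arithmetic. The Python sketch below uses only the numbers stated in the testimony; the function name `coverage_summary` and the label "redundancy" (raw bases divided by estimated genome size) are illustrative choices introduced here, not terms from the source.

```python
def coverage_summary(raw_bp, unique_bp, genome_bp):
    """Return (fraction of genome covered by unique sequence,
    raw bases per base of estimated genome size)."""
    unique_fraction = unique_bp / genome_bp
    redundancy = raw_bp / genome_bp  # hypothetical label, see lead-in
    return unique_fraction, redundancy


# Figures as stated in the testimony (January 10, 2000 announcement).
unique_fraction, redundancy = coverage_summary(
    raw_bp=5.3e9,      # total base pairs sequenced
    unique_bp=2.58e9,  # base pairs of unique sequence
    genome_bp=3.18e9,  # estimated genome size
)

print(f"{unique_fraction:.1%}")  # 81.1%, matching the stated 81 percent
print(f"{redundancy:.2f}")       # 1.67 raw bases per genome base
```

The 90 percent figure cannot be reproduced this way, because it also depends on how much additional unique sequence the public data contributed, which the testimony does not state.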
Celera's approach to sequencing the genome entails sequencing the entire genome of a number of different people. It differs from that of the public effort in that the public project makes a single composite genome from portions of a number of different people. Celera's approach allows us to build a database for studying the genetic variations between individuals at the same time we are deciphering the consensus human genome.
Key to our progress at Celera Genomics is that we have assembled an exceptional group of employees. We have excellent technical people operating our sequencing factory. Our biologists, software engineers, information technologists, mathematicians and bioinformaticians are some of the finest to be found anywhere. This talent has not gone unnoticed. For example, RhoBio S.A., a joint venture between Rhône-Poulenc Agro and Biogemma, has formed a 3-year agreement with Celera AgGen, our agricultural business, to use expression studies to discover genes related to traits of importance in maize. Similarly, we have entered into a 3-year gene discovery agreement with Rhône-Poulenc Rorer (RPR), the pharmaceutical subsidiary of Rhône-Poulenc, S.A.
We have also organized a team for discovering new genes in humans. This activity has also been the subject of confusion. Some have even said that Celera's current activities are different from those described earlier to this Subcommittee. That is not true. Since its founding we have said that Celera will seek to develop on its own 100-300 medically important genes for use by pharmaceutical and biotechnology companies from among the 100,000 human genes. We will give preference in licensing these potential therapeutic targets to our subscribers and we will license them on a non-exclusive basis. As I said at the earlier hearing, we are not attempting to patent the human genome, any of its chromosomes, or any random sequence. Celera announced last fall that the company had filed 6,500 provisional patent applications. This was the basis of charges in the Los Angeles Times that I had misled the Subcommittee. Those leveling the charges did so apparently because they were not familiar with a provisional patent application. A provisional application serves to notify the Patent Office that a discovery has been made in the event that there are other patent applications for the same discovery. A patent will not be issued on the discovery unless an actual patent application is filed within one year of the provisional application filing. During this twelve-month period, Celera will decide with its pharmaceutical partners which genes are medically important enough to warrant patent applications. This approach is similar to the research strategy taken by pharmaceutical companies. In their drug development process they start with thousands of compounds and reduce the number to a few promising compounds as more information is gained. Likewise, Celera will look at thousands of genes before determining which have the greatest relevance for human health and are most likely to be developed into commercial products by pharmaceutical companies.
Other companies have different intellectual property strategies and I cannot speak for them, Mr. Chairman, but I urge you to consider that changes to patent law have to be considered in the context of what they will do to pharmaceutical companies' efforts at drug discovery.
Celera endorses the Patent and Trademark Office's recently announced position, which is supported by centuries of precedent. The fundamental patent requirements of utility, novelty, and non-obviousness are complete and effective protections against the fear, which has been propagated, that the human genome will be patented or that the revolution will be slowed. Celera does not believe the genome or other mere products of nature can be patented, and our publication commitments demonstrate that we will not try to patent it. Consistent with long-established principles of patent law, we do expect that patents and other protections for subsequent inventions using the genome alphabet and showing utility, novelty, and non-obviousness are not only appropriate, but required to assure that incentives continue to fuel the genomic revolution.
Let me take a moment to review for the Subcommittee why we are even discussing patenting human genes. Pharmaceutical and biotech companies use these genes as the direct means of producing drugs such as insulin and as targets to develop drugs. The cost of taking a single drug through the Food and Drug Administration approval process can range from $300 million to $800 million. Having patents on the drugs allows the company a period of time when they can exclusively use these patented discoveries for commercial purposes. This provides them a period in which to try to recover their drug development costs. This rationale for patenting is one that is fully accepted and supported by the NIH. Recently, during the FY2000 budget considerations, Dr. Harold Varmus, past Director of NIH, explained to the U.S. Senate Appropriations Committee the importance of patenting to assure commercial availability of these scientific discoveries to the general public. He said:
...patenting of newly isolated genes whose functions and medical importance are identifiable at the time of patenting can be a spur to the development of the next steps that would benefit the public, and we believe that has been the case in the instance of several recently cloned genes.
However, Celera and many of our pharmaceutical partners are very concerned that the patenting of random genome and EST fragments by many companies and research institutions will restrict their access to key targets required for drug development. An important aspect of Celera's policies is the nonexclusive licensing of drug targets.
How does Celera respond to the concerns of scientists who worry that patenting gene sequences and putting such basic information in private hands will discourage research outside of the drug companies that own the rights to the information? Under the US and European patent systems, researchers are free to conduct basic research for non-commercial purposes on others' patented discoveries. While some hypothesize that patents on genes will generally inhibit research, the facts indicate otherwise. For example, a patent was granted on the BRCA1 gene associated with breast cancer in 1993. Since that time, over 721 basic research papers have been published on the BRCA1 gene, and tens of further patent applications on important inventions, including genetic tests related to the BRCA1 gene, have been filed by individuals in universities and companies. Also, Celera's policy of licensing genes on a non-exclusive basis will assure that gene discoveries are available to many -- not just one.
I would like to address one more topic that has been the subject of confusion before closing. It is Celera's willingness to collaborate on the sequencing of the human genome with the public effort. Prior to announcing the formation of Celera I met with then-Director Harold Varmus and Dr. Collins. I made the same offer of collaboration to them as I did to Dr. Rubin for the Drosophila genome. Whatever the reasons, it did not come to pass as the Drosophila collaboration did. We also tried to form a collaboration with the Department of Energy, the founding agency for the U.S. Human Genome Project. The NIH and Wellcome Trust objected and the effort was sidelined. Recently, we received a much-publicized letter from the NIH and Wellcome Trust apparently calling an end to further discussions. I stated in my letter of response dated March 7, 2000 (attached) that we continue to be interested in pursuing good faith discussions toward collaboration. While both Celera and the public effort can achieve our shared goal of producing an accurate, ordered version of the genome on our own, I believe the collaboration on Drosophila proved we can produce that product faster and better by working together.
When PE Corporation and I announced the creation of Celera in May 1998, it was based on a shared vision of sequencing the human genome as the basis of accelerating a revolution in biology and health care. Financed exclusively by private investment, we brought together unique technologies and capabilities within a start-up enterprise to pursue this seemingly impossible goal. With hundreds of others joining in this effort, Celera already has exceeded its own expectations and continues to evolve as a participant in this exciting revolution.
It has been Celera's consistent belief that the sequencing of the human genome is the first, not the last, chapter of this revolution. The final chapter will entail a complete understanding of life's processes, such that disease and illness finally can be treated and cured directly at the source. We envision a day when medical treatments involving the likes of radiation and chemical poisons, with their insidious side effects and trial-and-error uncertainty, are considered medieval anachronisms.
This day will not come tomorrow or during the next year. Nor will any one person, company, or organization bring this day about alone. Revolution requires far more than one soldier. Celera's business model acknowledges this and, rather than internalizing the task ahead, centers on a philosophy of facilitating others in the revolution. There will be almost limitless opportunities ahead, and we abhor any notion that we can or should "go it alone".
Celera looks forward to several roles in the revolution, but foremost are those of instigator and facilitator. This philosophy underlies Celera's most fundamental mission -- to discover and disseminate genomic, proteomic and related information. We believe this mission can be pursued in a way that serves both science and our business. In fact, we believe that entrepreneurial efforts such as Celera are the best way to progress. Speed matters -- discovery can't wait.
Testimony by Neal Lane before the Subcommittee on Energy & Environment, Committee on Science, United States House of Representatives - April 6, 2000
Subcommittee on Energy and Environment
J. Craig Venter, Ph.D. Subcommittee On Energy And Environment