PREPARED STATEMENT OF
J. CRAIG VENTER, Ph.D.
PRESIDENT AND CHIEF SCIENTIFIC OFFICER
CELERA GENOMICS, A PE CORPORATION BUSINESS
BEFORE THE SUBCOMMITTEE ON ENERGY AND ENVIRONMENT
U.S. HOUSE OF REPRESENTATIVES COMMITTEE ON SCIENCE
April 6, 2000
Mr. Chairman, I welcome the opportunity to
testify today before your subcommittee about Celera's progress in
deciphering the human genome sequence and its relationship to healthcare and to
the federally funded Human Genome Project. My name is J. Craig Venter and
I am the President and Chief Scientific Officer of Celera Genomics
headquartered in Rockville, Maryland with several additional locations in
California. In June of 1998, I testified before this Subcommittee about
the impact of private sector developments on the federally funded Human Genome
Project. PE Corporation and I had just launched Celera Genomics.
Our goal was to build an information company to provide researchers in industry
and academia with an integrated information and discovery system available on a
subscription basis. At the time the federal human genome effort was
scheduled to complete its task in 2005. Even scientists within the effort
were worried 2005 was optimistic. Celera set out, using the new ABI
PRISM® 3700 DNA Sequencers produced by PE Biosystems and the whole genome
shotgun strategy developed by me and my colleagues at The Institute for Genomic
Research (TIGR), to accelerate the completion of the human genome sequence
to 2001. Why? At Celera we have adopted the motto "Speed
matters" because "Discovery can't wait." Since the
Congress began funding the human genome effort over 5 million Americans have
died of cancer and over a million people have died because of adverse reactions
to drugs. Many scientists associated with the federal effort said we
could not accelerate the completion date for the human genome project.
Some said the ABI 3700 would not work as hoped. Many of the same
scientists who harshly criticized my proposal in 1994 to use the whole genome
shotgun strategy on a bacterial chromosome said in 1998 that this same
strategy would fail with larger and more complicated genomes like the human
and the fruit fly, Drosophila melanogaster. One of the witnesses on that day
said, "Show me the data!" He predicted we would fail--fail
catastrophically. He was wrong--and I am happy to again show
the Subcommittee and the world the data.
On March 24, 2000 the genome sequence of
Drosophila (a key model organism for biomedical researchers) was published in
Science. Celera started its sequencing of this genome in May 1999.
It was the product of the finest scientific collaboration I have ever
participated in and included Dr. Gerry Rubin and the Berkeley Drosophila Genome
Project members along with the European Drosophila Genome Project
members. Over 40 scientists from around the world came to Celera in
November of 1999 to participate in an "annotation jamboree" and
begin the process of annotating the Drosophila genome. In total we identified
13,601 genes of which only 2,500 genes were previously known. Our
publications in Science had 240 authors from eight different countries (see
attached list of authors). The gene sequence exceeds community quality
standards. The quality of the genome sequence and annotation of the Drosophila genome
published in Science is equal to, if not superior to, that of the
recently published sequence of human Chromosome 22 and the C. elegans genome
published in 1998. The Berkeley Drosophila Genome group will complete the final
process of minor gap sequencing in a few months' time. We are confident
that the final product will be the equal of the high quality standard
established by TIGR in the 14 genomes and chromosomes that they have published
in the scientific literature to date.
While the whole genome shotgun strategy
clearly worked with Celera's data set alone, I believe that all of the
collaborators in this project would say that the combination of Celera's
whole genome shotgun strategy and the draft sequence from the BAC-by-BAC
approach has led to a greater level of knowledge about the Drosophila genome
and a higher quality sequence than either approach alone would have
provided. Another testimonial to the quality of the result of the whole
genome shotgun strategy in Drosophila is the fact that the NIH-funded effort to
sequence the mouse has now adopted our technique, despite their strong initial
protests that whole genome shotgun sequencing could not work. They realize that
the strategy is faster, cheaper, and of equal or greater quality, than the
conventional approach.
At this juncture it is important that we
discuss the different elements that go into evaluating the quality of a genome
sequence. You have heard much and you will hear much more in the near
future about "finished sequence," "complete sequence," and
"draft sequence." I would like to explain the process of
creating sequence and the factors that should be evaluated in judging its
quality and usefulness for researchers.
1. Library Construction - In order
to sequence a genome it is broken up into smaller fragments that are easily
sequenced. These fragments are then inserted into bacterial hosts or
vectors that are used to replicate each fragment. This collection of
bacterial clones with the fragments of the genome inside is a DNA
library. If a library is poorly constructed the sequence will be
poor. The whole genome shotgun strategy employs far fewer DNA libraries
than the conventional BAC-by-BAC approach to genome sequencing; however, the
libraries used must be of the highest quality. Libraries must have DNA
segments of uniform size and at least two libraries with different length
segments must be made. One is 2,000 base pairs in length and the other is
10,000 base pairs. Once these segments have been obtained they are inserted
into plasmids (i.e., DNA structures, separate from the bacterial genome, that
can be replicated within the bacteria). These plasmids are then placed into
bacteria that serve as vectors or hosts to the DNA.
2. Sequencing Phase - Electrophoresis is the
method of separating DNA fragments. An electric current is passed through a
medium containing a mixture of DNA, and DNA molecules of different sizes travel
through the medium at different rates, depending on their electrical charge and
size. Separation is based on these differences. In the past large plates
containing gels were the media through which the fragments traveled. In
the ABI PRISM® 3700, the gels are contained in very thin capillary tubes that
allow fast sample processing and small sample volumes and
eliminate manual gel pouring and sample loading tasks. Fluorescent dyes
matched to each of the four letters of genetic code are attached to these
fragments. The fragments then flow out the end of the capillary tube,
where each fluorescent dye is excited by a laser, and from the resulting signal
the order of base pairs or genetic letters is determined. At this stage the genome is
really a collection of fragments. Genes are not fully assembled nor is
their relative position within the genome well defined.
3. Assembly and Order - This is the most
critical phase of the entire process. A genome sequence can only be
considered to approach completion if it is accurate in the identification of
the different base pairs and, most importantly, if they are in the proper
order. The Drosophila genome is properly ordered--that is, the
different pieces that have been sequenced are assembled in the correct
order--and the sequence is highly accurate.
4. Annotation - Once the sequence is
obtained with genes assembled and properly located within the genome, one can
begin the process of identifying each gene and describing its function.
The quality of the sequence and the ordering will determine how accurate the
preliminary annotation will be. In the case of the C. elegans genome the
initial annotation found over 18,000 genes. Just two weeks ago that
number was revised to approximately 12,000. When Chromosome 22 was
published 564 genes were identified. There are publications in press at
this time that suggest that the number of genes on chromosome 22 is
approximately 1,000. These statistics suggest that the public programs
have fallen short in their annotation efforts.
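To make the assembly and ordering step (item 3 above) concrete, here is a minimal toy sketch of how overlapping fragments can be merged back into one ordered sequence. It is an illustration only and is not Celera's assembler: the function names (overlap, greedy_assemble), the min_len parameter, and the example fragments are all hypothetical, and a real genome adds repeats, sequencing errors, and paired-end constraints that this sketch ignores.

```python
def overlap(a, b, min_len=3):
    """Length of the longest suffix of a that equals a prefix of b.

    Returns 0 if the longest such overlap is shorter than min_len.
    """
    max_len = min(len(a), len(b))
    for n in range(max_len, min_len - 1, -1):
        if a.endswith(b[:n]):
            return n
    return 0


def greedy_assemble(fragments, min_len=3):
    """Repeatedly merge the pair of fragments with the largest overlap.

    Toy greedy strategy (an assumption for illustration): stops when no
    pair overlaps by at least min_len, returning the remaining pieces.
    """
    frags = list(dict.fromkeys(fragments))  # drop exact duplicates, keep order
    while len(frags) > 1:
        best_n, best_i, best_j = 0, None, None
        for i, a in enumerate(frags):
            for j, b in enumerate(frags):
                if i != j:
                    n = overlap(a, b, min_len)
                    if n > best_n:
                        best_n, best_i, best_j = n, i, j
        if best_n == 0:
            break  # nothing left to merge: the pieces stay unordered
        merged = frags[best_i] + frags[best_j][best_n:]
        frags = [f for k, f in enumerate(frags) if k not in (best_i, best_j)]
        frags.append(merged)
    return frags


# Hypothetical fragments cut from the made-up sequence ATGCGTACGTTAGC.
pieces = ["GTTAGC", "ATGCGTAC", "GTACGTTA"]
assembled = greedy_assemble(pieces)
```

With these three made-up fragments the sketch returns a single ordered sequence. Fragments that share no overlap would remain as separate pieces, which is the difference between an unordered collection and an ordered assembly.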
In January of 2000 Celera announced that it
had unordered, but highly accurate, fragments covering 90% of the genome
(including some of the public data). The public effort has announced that
it is approximately two-thirds of the way to this same point today. This is
the so-called "draft sequence," a term introduced by the public
effort but without scientific meaning. Celera has now reached a point in
its program where we can assemble and order our data to produce a complete
sequence. As was seen in the Drosophila collaboration, Celera's
ordered data, combined with the public data could produce a more accurate
version of the human genome sequence faster than either data set could
alone. This would be one of the benefits of collaboration between Celera and
the public effort. I will discuss this further later in my
testimony.
With the emergence of Celera on the scene
many in the public effort exhibited a new sense of urgency and a competitive
drive. This is mostly for the good. We all benefit from the
accelerated efforts. As I have said, at Celera we understand that
"speed matters." But, Mr. Chairman, I find myself in the peculiar
position of warning you that in the race to complete a draft human sequence,
the publicly funded Human Genome Program may be at a stage where quality and
scientific standards are sacrificed for credit. On Monday it was reported
in Time Magazine that the public effort was done--and that the race
to complete the genome sequence was over. I have read that Dr. Collins
said that the draft human genome sequence they are about to announce has only a
few gaps and is 99.9% accurate. However, analysis of the public data in
GenBank reveals that it is an unordered collection of over 500,000 fragments of
average size 8,000 base pairs. This means that the publicly funded
program is nowhere close to being done.
Two years ago it was reported that Dr.
Collins had said Celera would produce the "Mad Magazine version" of the human
genome (USA Today, June 9, 1998). From its formation, Celera's goal
has been to produce a high-quality human genome sequence that will stand the
test of time, and we remain committed to that goal. The Subcommittee should
work to guarantee that the federal effort continues to work towards an
accurate, ordered, and well-annotated sequence. You should urge its
investigators to keep their standards at the highest levels established in the
genomics field and not rush to publish preliminary data for the sake of
claiming priority. There is no example of the results of any genome
sequence project being published in the scientific literature prior to meeting
the established quality, order and completeness standards. It would be
poor science policy and a terrible precedent for the young genomics
field. At your previous hearing, Dr. Olsen of the University of
Washington warned about a "slippery slope" of data quality if the
established standards for completeness were compromised in any way simply to
appear to win the genome race.
Mr. Chairman, since your earlier hearing,
Celera has had many technical and scientific successes. We moved into our
facilities in August of 1998. Since then we constructed the world's
largest sequencing facility and are especially pleased with the accuracy of the
sequence from the individual DNA samples. An analysis of 8000 samples of
Drosophila sequence indicated that 7992 had an accuracy of greater than
99.5%. The remaining 8 samples were greater than 98%. With
individual sample accuracy at this high level we were able to assemble the
non-repetitive portions of the fruit fly genome to an accuracy of greater than
99.99 percent. Because of our paired-end sequencing strategy we have
discovered that we were able to use far fewer sequencing samples than we had
originally planned. This gives us confidence that we will be able to have
a high quality, accurate, and well ordered sequence for human with our current
level of genome coverage.
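The sample figures quoted above can be checked with a short calculation. This is a sketch only: the counts and accuracy floors are taken from this statement, while the function names and the idea of computing a lower bound on mean per-sample accuracy are illustrative assumptions, not a description of Celera's quality analysis.

```python
# Figures quoted in the statement for the Drosophila sample analysis.
TOTAL_SAMPLES = 8000
SAMPLES_ABOVE_99_5 = 7992
REMAINING_SAMPLES = TOTAL_SAMPLES - SAMPLES_ABOVE_99_5  # the 8 samples above 98%


def share(count, total):
    """Fraction of samples in a group."""
    return count / total


def mean_accuracy_floor(groups):
    """Lower bound on mean accuracy from (count, accuracy floor) pairs.

    Hypothetical helper: each group is assumed to sit exactly at its
    stated floor, so the true mean can only be higher.
    """
    total = sum(count for count, _ in groups)
    return sum(count * floor for count, floor in groups) / total


high_quality_share = share(SAMPLES_ABOVE_99_5, TOTAL_SAMPLES)
floor = mean_accuracy_floor([(SAMPLES_ABOVE_99_5, 0.995), (REMAINING_SAMPLES, 0.98)])
```

Under these assumptions, 99.9 percent of the samples exceed 99.5 percent accuracy, and the mean per-sample accuracy is at least about 99.5 percent. The higher assembled accuracy quoted above comes from overlapping reads covering the same bases, which this per-sample calculation does not model.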
Our data center became operational at the
beginning of 1999. Our partner, Compaq Computers, has supplied us
with about 800 Alpha EV6 and EV67 processors with 64-bit architecture and over
80 terabytes of storage for our data. Compaq tells us our computer center
is comparable to those at the Department of Energy's Defense
Laboratories--Sandia and Lawrence Livermore. We have installed over 200
miles of fiber optic cable and 200 miles of copper cable to handle the data
flow. This center was constructed not only for the essential task of
assembling genomes using the whole genome shotgun strategy, but also for
serving our customers and providing them with unprecedented computational power
for their research and analysis.
We have also had many business successes,
as I will touch on later in my testimony; however, I believe that we have to do
a better job in communicating our business objectives to you and the
public. For example, Mr. Chairman, your own local newspaper, The Los
Angeles Times, has misunderstood and therefore misinterpreted our business model
and objectives. The result is a great deal of confusion about Celera and
our activities. Celera is the only genomics firm that is using its sequencing
power to directly sequence the human genome. As an information company, Celera
is designed to assist researchers rather than focus on the development of new
pharmaceuticals. Another feature of Celera that distinguishes us from the
business models of many of our competitors is that we provide our data and
information without the inherent deterrent of requiring database users to pay
onerous royalties on the discoveries they make with our data (often referred to
as "reach-through rights"). We have already entered into
third-party agreements that bind us to this.
How will the company build a sustainable
business from its genomics and bioinformatic tools?
One of Celera's founding principles is
that we will release the entire consensus human genome sequence freely to
researchers on Celera's Internet site when it is completed. We
believe that this is in the best interests of both science and our company,
since it will allow researchers to advance science and medicine and at the same
time be introduced to Celera's high quality data and software tools.
We will place no restrictions on how scientists can use this data; they can
publish research results derived from this data, or seek intellectual property
protection on discoveries using this data. The only protection that we have
indicated that we would seek is database protection, as exists in Europe, to
inhibit other database companies from selling the Celera database.
Our goal is to make the complex, sometimes
overwhelming, and ever-increasing volumes of biological information more
accessible and useful to researchers in academia and industry. Toward
that end, we are creating an unparalleled library of genomic information in our
databases. Annotation of the data by Celera scientists using an array of
bioinformatics tools will act as the platform for developing a range of
products and services. We will offer these tools in a manner similar to
the models used by other information companies, such as Lexis-Nexis, Bloomberg,
and AOL. The need for services such as these will only increase as the
volumes of information and the complex interrelated nature of that information
increase. Pricing for subscriptions to this service will vary
appropriately, depending on the product, the customer, and the
application. We will provide value-added information to academics and
other non-commercial researchers at reasonable rates, naturally bounded by
those customers' resources and appraisals of the value-added.
Celera currently has five large
pharmaceutical companies as database subscribers. The initial partners,
Pharmacia Corporation, Novartis, and Amgen, provided Celera with input for
improvements in our data delivery systems and software. Two additional
pharmaceutical subscribers have joined us since we began sequencing the human
genome in September of 1999--Pfizer and Takeda Chemical Industries, Ltd. of
Japan. Celera began offering web-based access to its databases and tools
in March 2000. This access should be ideal for academics and smaller
biotech companies. These subscribers can have access to all Celera
databases, tools, and annotation, including the human genome. Our goal is
to have all major commercial life science companies and academic biomedical
research institutions as subscribers in the future.
With this as context I would like to
address the confusion that has arisen over the accessibility of our data, in
particular the accessibility of our data on the human genome. We have reacted and
will continue to react to claims that Celera intends to withhold information
and delay progress, particularly when our fundamental mission is to accelerate
the dissemination of high quality, accurate information. Let me
emphasize--our data on the human genome is currently available to those
subscribing. Our vision is that the list of subscribers will be very
long. Let me draw an analogy, Mr. Chairman. When you pick up a
newspaper at your doorstep, you consider it quite accessible. You
probably do not even remember that you are paying a subscription to have that
access and you certainly don't claim that the newspaper company publishing
it is being secretive or restricting access to news about current events just
because you pay a subscription fee.
When you understand that data accessibility
is our business and that our commitment to make the genome freely available is
integral to that business you can also understand our consternation at the
confusion created by the recent joint statement of President Clinton and Prime
Minister Blair. Their statement, a simple re-statement of the existing
policy for the publicly funded project, when extended to companies engaged in
genomics is certainly no obstacle to Celera's business model. We
issued the following at the time of their statement:
Celera Genomics welcomes the
statement. Its own mission is completely consistent with the goals of
assuring that the world's researchers have access to this important
information to enable advances and discoveries that will improve the human
condition. Since the announcement of Celera's formation we have made
a clear commitment that upon our completion of the consensus human genome we
would publish it in a peer-reviewed scientific journal and make it available to
researchers for free.
Although the joint statement was on its
face and in fact harmless, it did start a fall in the NASDAQ value--a
decline of over 17% and a loss of over $50 billion in market capitalization in the
biotechnology sector in two days--and the decline has continued
dramatically since that time.
Celera's Human Genome Database
On January 10, 2000 we announced that we
had DNA sequence in our database covering 90 percent of the human genome. As a
result of that extensive sequence coverage of the 23 pairs of human chromosomes
and based on statistical analysis, we believed that greater than 97 percent of
all human genes were represented in the Celera database. The sequence data,
developed from randomly selected fragments of all human chromosomes, contained
over 5.3 billion base pairs (letters of the human genetic code) at greater than
99 percent accuracy. The 5.3 billion base pairs represented 2.58 billion base
pairs of unique sequence that had been calculated to cover 81 percent of an
estimated genome size of 3.18 billion base pairs. These data, combined with all
of the "finished" and "draft" human genome sequence data from the public
databases, gave Celera coverage of 90 percent of the human genome. Since
that announcement we have continued to increase the coverage of our data.
Progress is such that we have modified our earlier estimated completion date of
sometime before the end of 2001 to 2000.
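The coverage figures in the January 10, 2000 announcement follow from simple arithmetic, sketched below. The three input numbers come from this statement; the function names and the "raw depth" calculation are illustrative assumptions. The 90 percent figure cannot be reproduced this way because it also depends on the public data, whose size is not given here.

```python
# Figures quoted in the statement (January 10, 2000 announcement).
RAW_BASE_PAIRS = 5.3e9         # total sequenced base pairs
UNIQUE_BASE_PAIRS = 2.58e9     # base pairs of unique sequence
ESTIMATED_GENOME_SIZE = 3.18e9 # estimated genome size in base pairs


def coverage_fraction(unique_bp, genome_bp):
    """Share of the estimated genome represented by unique sequence."""
    return unique_bp / genome_bp


def raw_depth(raw_bp, genome_bp):
    """Raw sequenced base pairs per base pair of genome (hypothetical helper)."""
    return raw_bp / genome_bp


coverage = coverage_fraction(UNIQUE_BASE_PAIRS, ESTIMATED_GENOME_SIZE)
depth = raw_depth(RAW_BASE_PAIRS, ESTIMATED_GENOME_SIZE)
```

Here coverage works out to about 0.81, matching the 81 percent quoted above, and the raw data amount to roughly 1.7 times the estimated genome size.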
Celera's approach to sequencing the
genome entails sequencing the entire genome of a number of different
people. It differs from that of the public effort in that the public
project makes a single composite genome from portions of a number of different
people. Celera's approach allows us to build a database for studying
the genetic variations between individuals at the same time we are deciphering
the consensus human genome.
Key to our progress at Celera Genomics is
that we have assembled an exceptional group of employees. We have
excellent technical people operating our sequencing factory. Our
biologists, software engineers, information technologists, mathematicians and
bioinformaticians are some of the finest to be found anywhere. This
talent has not gone unnoticed. For example, RhoBio S.A., a
joint venture between Rhône-Poulenc Agro and Biogemma, has formed a 3-year
agreement with Celera AgGen, our agricultural business, to use expression
studies to discover genes related to traits of importance in maize.
Similarly, we have entered into a 3-year gene discovery agreement with
Rhône-Poulenc Rorer (RPR), the pharmaceutical subsidiary of
Rhône-Poulenc, S.A.
We have also organized a team for
discovering new genes in humans. This activity has also been the subject
of confusion. Some have even said that Celera's current activities
are different than those described earlier to this Subcommittee. That is
not true. Since its founding we have said that Celera will seek to
develop on its own 100-300 medically important genes for use by pharmaceutical
and biotechnology companies from among the 100,000 human genes. We will
give preference in licensing these potential therapeutic targets to our
subscribers and we will license them on a non-exclusive basis. As I said
at the earlier hearing, we are not attempting to patent the human genome, any
of its chromosomes, or any random sequence. Celera announced last
fall that the company had filed 6,500 provisional patent applications.
This was the basis of charges in the Los Angeles Times that I had misled the
Subcommittee. Those leveling the charges did so apparently because they
were not familiar with what a provisional patent application is. A
provisional application serves to notify the Patent Office that a discovery has
been made in the event that there are other patent applications for the same
discovery. A patent will not be issued on the discovery unless an actual
patent application is filed within one year of the provisional application
filing. During this twelve-month period, Celera will decide with its
pharmaceutical partners which genes are medically important enough to file
patent applications. This approach is similar to the research strategy taken by
pharmaceutical companies. In their drug development process they start
with thousands of compounds and reduce the number to a few promising compounds
as more information is gained. Likewise, Celera will look at thousands of
genes before determining which have the greatest relevance for human health and
are most likely to be developed into commercial products by pharmaceutical
companies. Other companies have different intellectual property
strategies and I cannot speak for them, Mr. Chairman, but I urge you to
consider that changes to patent law have to be considered in the context of
what they will do to pharmaceutical companies' efforts at drug
discovery.
Celera endorses the Patent and Trademark
Office's recently announced position, which is supported by centuries of
precedent. Fundamental patent requirements of utility, novelty, and
non-obviousness are a complete and effective answer to the fear being propagated
that the human genome will be patented or that the revolution will be
slowed. Celera does not believe the genome or other mere products of
nature can be patented, and our publication commitments demonstrate that we
will not try to so patent it. Consistent with long-established principles
of patent law, we do expect that patents and other protections for subsequent
inventions using the genome alphabet and showing utility, novelty, and
non-obviousness are not only appropriate, but required to assure that
incentives continue to fuel the genomic revolution.
Let me take a moment to review for the
Subcommittee why we are even discussing patenting human genes.
Pharmaceutical and biotech companies use these genes as the direct means of
producing drugs such as insulin and as targets to develop drugs.
The cost of taking a single drug through the Food and Drug Administration
approval process can range from $300 million to $800 million. Having patents on
the drugs allows the company a period of time when they can exclusively use
these patented discoveries for commercial purposes. This provides them a
period in which to try and recover their drug development costs. This rationale
for patenting is one that is fully accepted and supported by the NIH.
Recently, during the FY2000 budget considerations, Dr. Harold Varmus, past
Director of NIH, explained to the U.S. Senate Appropriations Committee the
importance of patenting to assure commercial availability of these scientific
discoveries to the general public.
He said: "...patenting of newly isolated genes whose functions and
medical importance are identifiable at the time of patenting can be a spur to
the development of the next steps that would benefit the public, and we believe
that has been the case in the instance of several recently cloned
genes."
However, Celera and many of our
pharmaceutical partners are very concerned that the patenting of random genome
and EST fragments by many companies and research institutions will restrict
their access to key targets required for drug development. An important
aspect of Celera's policies is the nonexclusive licensing of drug
targets.
How does Celera respond to the concerns of
scientists who worry that patenting gene sequences and putting such basic
information in private hands will discourage research outside of the drug
companies that own the rights to the information? Under the US and
European patent systems, researchers are free to conduct basic research for
non-commercial purposes on others' patented discoveries. While some
hypothesize that patents on genes will generally inhibit research, the facts
indicate otherwise. For example, a patent was granted on the BRCA1 gene
associated with breast cancer in 1993. Since that time, over 721 basic
research papers have been published on the BRCA1 gene, and tens of further
patent applications on important inventions, including genetic tests related to
the BRCA1 gene, have been filed by individuals in universities and
companies. Also, Celera's policy of licensing genes on a
non-exclusive basis will assure that gene discoveries are available to
many--not just one.
I would like to address one more topic that
has been the subject of confusion before closing. It is Celera's
willingness to collaborate on the sequencing of the human genome with the
public effort. Prior to announcing the formation of Celera I met with
then-Director Harold Varmus and Dr. Collins. I made the same offer of
collaboration to them as I did to Dr. Rubin for the Drosophila genome.
Whatever the reasons, it did not come to pass as the Drosophila collaboration
did. We also tried to form a collaboration with the Department of Energy,
the founding agency for the U.S. Human Genome Project. The NIH and
Wellcome Trust objected and the effort was sidelined. Recently, we
received a much-publicized letter from the NIH and Wellcome Trust apparently
calling an end to further discussions. I stated in my letter of response
dated March 7, 2000 (attached) that we continue to be interested in pursuing
good faith discussions toward collaboration. While both Celera and the
public effort can achieve our shared goal of producing an accurate ordered
version of the genome on our own, I believe the collaboration on Drosophila
proved we can produce that product faster and better by working
together.
Conclusion
When PE Corporation and I announced the
creation of Celera in May 1998, it was based on a shared vision of sequencing
the human genome as the basis of accelerating a revolution in biology and
health care. Financed exclusively by private investment, we brought
together unique technologies and capabilities within a start-up enterprise to
pursue this seemingly impossible goal. With hundreds of others joining in
this effort, Celera already has exceeded its own expectations and continues to
evolve as a participant in this exciting revolution.
It has been Celera's consistent belief
that the sequencing of the human genome is the first, not the last, chapter of
this revolution. The final chapter will entail a complete understanding
of life's processes, such that disease and illness finally can be treated and
cured directly at the source. We envision a day when medical treatments
involving the likes of radiation and chemical poisons, with their insidious
side effects and trial-and-error uncertainty, are considered medieval
anachronisms.
This day will not come tomorrow or during
the next year. Nor will any one person, company, or organization
facilitate this day. Revolution requires far more than one soldier.
Celera's business model acknowledges this and, rather than internalizing the
task ahead, centers on a philosophy of facilitating others in the
revolution. There will be almost limitless opportunities ahead, and we
abhor any notion that we can or should "go it alone".
Celera looks forward to several roles in
the revolution, but foremost are those of instigator and facilitator. This
philosophy underlies Celera's most fundamental mission -- to discover and
disseminate genomic, proteomic and related information. We believe this
mission can be pursued in a way that serves both science and our
business. In fact, we believe that entrepreneurial efforts such as Celera
are the best way to progress. Speed matters--discovery
can't wait.