DNA-Encoded Compound Libraries as Open Source: A Powerful Pathway to New Drugs
Graphical Abstract
“… We envisioned an iterative system where a unique DNA tag identifier that encoded the event was appended to each newly formed molecule. These vast collections of molecules are known today as DNA-encoded chemical libraries (DECLs), and allow scientists to do selections on the benchtop that previously required access to large and complex high-throughput screening centers …” Read more in the Guest Editorial by Richard A. Lerner and Sydney Brenner.
At about the same time that libraries of biological molecules were enjoying success, chemists began to construct vast libraries of organic molecules by using methods such as “split and pool” synthesis, in a field that became known as combinatorial chemistry. The problem, however, was that although these libraries could be very large, they were not easily deconvoluted to determine the nature of the active molecules. Thus, it is one thing to build large libraries of molecules that can be used to bind to targets, but it is quite another matter to identify the active molecules. A library of large size is useful for covering chemical space, but in chemistry large size is the enemy of identification!
In 1992, we were, of course, mindful of advances in libraries of biological molecules and were thinking about how synthetic small molecules in organic libraries could be made to enjoy some of the advantages of biological molecules. It occurred to us that if we could link the diversity of chemical synthesis to the power of genetics, we could obviate some of the problems with combinatorial libraries of organic molecules. Thus, we envisioned an iterative system where a unique DNA tag identifier that encoded the event was appended to the newly formed molecule for each step in the chemical synthesis. In the end, one had a vast collection of molecules where each one carried a unique DNA tag that encoded its synthetic history. This was the first time in organic chemistry that organic molecules carried information beyond their intrinsic structures. It was of special importance that the molecules in the library contained information capable of replication, so that it could be retrieved when only a few molecules were present. While we referred to these libraries as “encoded combinatorial libraries,” today they are most often referred to as DNA-encoded chemical libraries (DECLs).
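To make the encoding idea concrete, the following is a minimal Python sketch, not the authors' method. It assumes a toy three-cycle split-and-pool synthesis in which every building block is assigned a fixed-length DNA “codon”; the building-block names, the four-base codons, and the functions split_and_pool and decode are all hypothetical. The tag appended at each step records which building block was added, so reading the tag back recovers the molecule's synthetic history, and the library size is the product of the building-block counts per cycle.

```python
from itertools import product

# Hypothetical codebook: one short DNA "codon" per building block, per cycle.
# Real libraries use longer, error-tolerant codes; these are illustrative only.
CODEBOOK = [
    {"A1": "ACGT", "A2": "TGCA"},                # cycle 1 building blocks
    {"B1": "GGAT", "B2": "CCTA", "B3": "ATTG"},  # cycle 2 building blocks
    {"C1": "TTAC", "C2": "CAGG"},                # cycle 3 building blocks
]
CODON_LEN = 4  # assumed fixed codon length


def split_and_pool(codebook):
    """Enumerate every (tag -> molecule) pair a split-and-pool synthesis yields.

    At each cycle the pool is split, each portion receives one building block
    and its matching codon is appended, then the portions are pooled again.
    """
    library = {}
    for choice in product(*(cycle.items() for cycle in codebook)):
        blocks = tuple(name for name, _ in choice)
        tag = "".join(codon for _, codon in choice)
        library[tag] = blocks
    return library


def decode(tag, codebook, codon_len=CODON_LEN):
    """Recover the synthetic history from a tag by reading it codon by codon."""
    history = []
    for i, cycle in enumerate(codebook):
        codon = tag[i * codon_len:(i + 1) * codon_len]
        reverse = {c: name for name, c in cycle.items()}  # codon -> block name
        history.append(reverse[codon])
    return tuple(history)


library = split_and_pool(CODEBOOK)
print(len(library))                      # 12 = 2 x 3 x 2 building blocks
print(decode("ACGTCCTACAGG", CODEBOOK))  # ('A1', 'B2', 'C2')
```

Because the size multiplies with each cycle, adding cycles or building blocks grows the library very quickly, while decoding stays a simple lookup per codon.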
Much has happened since our initial studies. The requirement for orthogonal and compatible chemistries has been greatly simplified by the use of enzymes to construct the DNA tag. Major advances in next-generation DNA sequencing and information technology have allowed scientists to identify hits from even the largest libraries. But the most important advance is that methods to construct diverse DNA-tagged organic compounds in the library in a way that does not disrupt the DNA continue to emerge from the synthetic organic community. If inventing new DNA-compatible reactions continues to capture the attention of chemists, there is little doubt that the nature and diversity of the libraries will continue to improve. In these enhanced libraries, we expect that the efficiency of drug discovery will scale with library size.
With these advances in hand, the DECL field has exploded with the requisite participation of academic laboratories as well as biotechnology and pharmaceutical companies. Recently, the Fifth International Symposium on DNA-Encoded Chemical Libraries was held at ETH Zurich, where one could assess the state of the field. First, virtually every major pharmaceutical company and biotechnology company was present, thus giving witness to the fact that DNA-encoded chemical libraries have arrived as a key component of the armamentarium of drug discovery. Many useful hits for important targets were described, and some of the discovered molecules have already entered the clinic.
Much time was spent comparing high-throughput screening (HTS) to the DECL approach, both in the presentations and in informal discussions. There are, of course, the obvious comparisons of cost, infrastructure needs, library size, and the chemical space that is addressed. But, most importantly, the question of who is enabled by DNA-encoded chemical libraries was raised. Clearly, big and small pharmaceutical companies can benefit tremendously. Nevertheless, a somewhat surprising but related aspect of this question concerns how scientists at academic institutions can benefit from DECLs. The magnitude of the problem is exemplified by considering the National Institutes of Health (NIH), which spends over 30 billion (30×10⁹) dollars a year, much of it used to discover and analyze molecules that could be important to disease. While the NIH understands that discovery is the cornerstone of disease control and prevention, the people who pay the bills wish to see these discoveries quickly turned into drugs, and in many cases this means small molecules. The problem is that most biologists do not have access to large compound libraries. Even if pharmaceutical companies were to make their HTS libraries available, the standard academic has no way to use them efficiently. Thus, we are spending billions of dollars to discover new proteins, many of which could be drug targets, but we have no general way to bridge the gap between the biology and the chemistry. This bridging is essential to efficiently generate new drugs from the knowledge generated by the academic community. Fortunately, because of the advent of DECLs, one doesn't need a building full of robots to enable biologists. Rather, an Eppendorf tube containing a trillion (10¹²) different DNA-encoded molecules and a PCR machine will suffice. Also, the usual secrecy barriers that complicate a wide distribution of compound libraries are overcome.
However, in the case of DNA-encoded compound libraries, only the provider has the code, so the nature of the “hits” can only be uncovered after information has been transferred from the screening scientist to the provider of the library.
Thus, we envision a system that operates as follows. A pharmaceutical or biotech company provides an encoded library, but not the code, to a researcher. The researcher does a binding assay on the benchtop and amplifies the DNA tags of the bound molecules by PCR to read the code in the form of DNA sequences. These sequences are returned to the owner of the library, who now knows the nature of the molecules bound as well as some of their structure–activity relationships from truncated molecules. If the discovered molecules look interesting, then everyone wins. The pharmaceutical company gets a drug lead and the researcher gets a tool compound to help further dissect the role of the target molecule in health and disease. Of course, such collaborations will be accompanied by license relationships so that both partners share in a commercial success. In the end, the problem reduces to the question of how creators of huge libraries will make them an open source for academia.
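The exchange described above can be sketched as follows. This is a minimal Python illustration under stated assumptions, not a description of any existing service: the class LibraryProvider, the function researcher_selection, the tag sequences, the compound identifiers, and the min_reads cutoff are all hypothetical, and ranking hits by raw read count is a simplification of how enrichment is actually scored (real analyses typically compare against a no-target control). What it shows is the division of knowledge: the researcher handles only DNA sequences, while the structure code never leaves the provider.

```python
from collections import Counter


class LibraryProvider:
    """Hypothetical library owner: holds the private tag-to-structure code."""

    def __init__(self, code):
        self._code = dict(code)  # tag sequence -> structure identifier (private)

    def decode_hits(self, read_counts, min_reads=2):
        """Translate sequencing counts returned by a researcher into ranked hits.

        Sequences below the (assumed) min_reads cutoff, or absent from the
        code, are ignored. Hits are ranked by raw read count, a simplification.
        """
        hits = [
            (self._code[tag], n)
            for tag, n in read_counts.items()
            if n >= min_reads and tag in self._code
        ]
        return sorted(hits, key=lambda hit: hit[1], reverse=True)


def researcher_selection(sequencing_reads):
    """Researcher side: count the DNA sequences recovered after binding and PCR.

    The researcher sees only sequences, never the structures they encode.
    """
    return Counter(sequencing_reads)


# Hypothetical library code known only to the provider.
provider = LibraryProvider({
    "ACGTGGAT": "cmpd-001",
    "TGCACCTA": "cmpd-002",
    "ACGTATTG": "cmpd-003",
})
reads = ["ACGTGGAT"] * 5 + ["TGCACCTA"] + ["ACGTATTG"] * 3 + ["GGGGGGGG"] * 2
counts = researcher_selection(reads)     # only this leaves the researcher's lab
print(provider.decode_hits(counts))      # [('cmpd-001', 5), ('cmpd-003', 3)]
```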
We suggest that it may be important to have an intermediary agency (or company?) that acts like a clearing house and facilitates the interaction between the creator of the library and the academic user. In this way, the great gains in efficiency that DECLs bring to drug discovery are not lost to endless bureaucracies. Also, a central provider can annotate the libraries in terms of overall efficiency and usefulness for a given class of targets. There is little loss and potentially much gain for the creator of a DECL in making it available to an intermediary agency. This is because the therapeutic space now studied will be very large and contain biologically validated targets that the library generator did not have, and, most importantly, did not even contemplate!
This focus on making DECLs generally available must not weaken the drive to create new libraries with greater access to chemical space. It is critically important that the synthetic organic community continues to invent new DNA-sparing reactions that can be used to increase the chemical diversity of the libraries.
As the use of DECLs continues its rapid expansion, some questions remain. Now that several groups have successfully created truly giant libraries, we need to understand more about how to measure the importance of increased chemical diversity as we move from libraries containing millions of molecules to those containing 100 trillion or more. What is the trade-off between ever-larger libraries and smaller libraries that contain chemical matter superior from the point of view of drug discovery? We need to learn more about how DECLs can not only assist in target-based drug discovery but also help in the process of lead discovery. One way that this has already occurred is when truncated molecules in the library are selected. All large libraries contain a variety of truncated molecules because not all synthetic schemes go to completion. Rather than being a nuisance, these truncated molecules can give information as to the active component(s) of the synthetic molecules. When the binding efficiency of truncated molecules is compared to that of the holo-molecule they are part of, one gets a structure–activity relationship that informs about the importance of various functional groups.
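The truncate comparison can be sketched in a few lines. This minimal Python example is illustrative only: it assumes an enrichment score is available both for the holo-molecule and for each truncate lacking one building block, represents molecules as tuples of hypothetical building-block names, and uses the fractional drop in enrichment as a simple proxy for a building block's importance (an assumption made for illustration, not a standard metric). The function name truncate_sar and all scores are hypothetical.

```python
def truncate_sar(enrichment, full):
    """Compare each one-block truncate's enrichment with that of its parent.

    `enrichment` maps a tuple of building blocks to an (assumed) enrichment
    score; `full` is the complete (holo) molecule. Returns, for each building
    block, the fraction of the parent's enrichment lost when it is missing.
    """
    parent = enrichment[full]
    losses = {}
    for i, block in enumerate(full):
        truncate = full[:i] + full[i + 1:]  # same molecule minus one block
        if truncate in enrichment:
            losses[block] = 1.0 - enrichment[truncate] / parent
    return losses


# Hypothetical enrichment scores for a holo-molecule and its three truncates.
scores = {
    ("A1", "B2", "C2"): 40.0,
    ("B2", "C2"): 36.0,  # lacks A1
    ("A1", "C2"): 4.0,   # lacks B2
    ("A1", "B2"): 38.0,  # lacks C2
}
losses = truncate_sar(scores, ("A1", "B2", "C2"))
ranking = sorted(losses, key=losses.get, reverse=True)
print(ranking)  # ['B2', 'A1', 'C2']
```

In this toy case, removing B2 costs most of the binding signal, which would flag the B2 building block as carrying the functional groups most important for activity.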
In summary, by linking the power of genetics and the power of chemistry a new field has emerged. Basically, scientists can now do selections on the benchtop that previously required access to large and complex high-throughput screening centers. General access to DECLs will almost certainly increase the efficiency of drug discovery by the academic scientific community. As an established field, the DECL approach will continue to move from the art of the doable to the art of the desirable.