Volume 4, Issue 3 e317
REVIEW ARTICLE
Open Access

Deep learning for predicting synergistic drug combinations: State-of-the-arts and future directions

Yu Wang

Yu Wang

Putuo District Ganquan Street Community Healthcare Center, Shanghai, China

Search for more papers by this author
Junjie Wang

Junjie Wang

Department of Medical Informatics, School of Biomedical Engineering and Informatics, Nanjing Medical University, Nanjing, China

Search for more papers by this author
Yun Liu

Corresponding Author

Yun Liu

Department of Medical Informatics, School of Biomedical Engineering and Informatics, Nanjing Medical University, Nanjing, China

Department of Information, the First Affiliated Hospital, Nanjing Medical University, Nanjing, China

Correspondence

Yun Liu, Department of Information, the First Affiliated Hospital, Nanjing Medical University, Nanjing, 210029, China.

Email: [email protected]

Search for more papers by this author
First published: 17 June 2024

Abstract

Combination therapy has emerged as an efficacy strategy for treating complex diseases. Its potential to overcome drug resistance and minimize toxicity makes it highly desirable. However, the vast number of potential drug pairs presents a significant challenge, rendering exhaustive clinical testing impractical. In recent years, deep learning-based methods have emerged as promising tools for predicting synergistic drug combinations. This review aims to provide a comprehensive overview of applying diverse deep-learning architectures for drug combination prediction. This review commences by elucidating the quantitative measures employed to assess drug combination synergy. Subsequently, we delve into the various deep-learning methods currently employed for drug combination prediction. Finally, the review concludes by outlining the key challenges facing deep learning approaches and proposes potential challenges for future research.

1 INTRODUCTION

The practice of combining multiple pharmaceutical agents for disease treatment, known as drug combination therapy, is commonly employed for complicated diseases including cancer,1-3 hypertension4, 5 and infectious diseases.6-8 Compared to monotherapy, drug combination therapy offers several advantages such as improved efficacy and reduced side effects and toxicity. However, it is important to note that not all drug combinations exhibit synergistic effects; in fact, some combinations may show antagonistic effects.9 For example, remdesivir with amodiaquine to treat coronavirus disease 2019 shows a strong antagonistic effect.10

Early studies on screening drug combination synergy typically rely on wet-lab experiments, which are characterized by their time-consuming nature and considerable cost implications.11 Furthermore, drug combination trials carry inherent risks of adverse side effects or potentially harmful reactions. Consequently, preclinical strategies have emerged as vital tools for identifying and assessing drug combinations within the context of cancer research.

With the accumulation of large-scale datasets, computational approaches have emerged as an alternative for efficiently screening and prioritizing candidate drug pairs. For example, Li et al.12 proposed a random forests-based method for predicting synergistic anticancer combinations. Meng et all.13 trained an extreme gradient boosting (XGBoost) to predict the synergistic drug combinations using the features by runing struc2vec on protein interaction network. Sidorov et al.14 also proposed an XGBoost-based model, but it trains a unique model for each cell line.

Deep learning-based methods have garnered significant attention across various research domains in bioinformatics, encompassing areas such as drug-target interaction,15, 16 hydrogel design and optimization,17 Organoids18 and drug design.19 These methods employ diverse architectures, including fully connected neural networks (FCNN), convolutional neural networks (CNNs),20 Transformer networks21 and Graph Neural Networks (GNNs),22 which have demonstrated their efficacy in extracting meaningful features from raw data and effective modelling complex and high-dimensional datasets. Consequently, the prediction of drug combination synergy through deep learning approaches has emerged as a promising and advancing trend in this field.

This review comprehensively examines recent applications in the realm of deep learning-based drug combination synergy prediction. We start with an abridged overview of the synergy metrics employed in the drug combination synergy prediction task, alongside the datasets utilized for model development. Subsequently, the review delves into the latest developments in deep learning-based drug combination synergy prediction models. Finally, we outline the limitations and future research.

2 METRICS OF SYNERGISTIC EFFECT IN DRUG COMBINATIONS PREDICTION TASK

In deep learning-based drug combination synergy prediction models, synergy scores serve as the gold standard (label). However, a central challenge lies in the lack of consensus on a universal synergy criterion for characterizing the degree of synergistic effects. While several metrics exist, they each present distinct underlying assumptions and may not comprehensively capture the complexity of drug combinations. Among the most widely used metrics are the Loewe additivity model,23 Bliss independence model,24 highest single agents (HSAs),25 and zero interaction potency (ZIP).26

The Loewe additivity model assumes no self-interaction between drugs and interchangeability within a combination. According to this model, for two drugs a and b, the expected effect under Loewe additivity satisfies the equation:
a A + b B = 1 $$\begin{equation} \def\eqcellsep{&}\begin{array}{*{20}{c}} {\frac{a}{A} + \ \frac{b}{B} = \ 1\ } \end{array} \end{equation}$$ ()
where A and B represent the equivalent doses of drug A and drug B alone that produce the expected effect.
The Bliss operates under the assumption that the relative effect of each drug is independent of the other. The expected effect E A B ${{E}_{AB}}$ is thus articulated as:
E A B = E A + E B E A B $$\begin{equation} \def\eqcellsep{&}\begin{array}{*{20}{c}} {\ {{E}_{AB}} = {{E}_A}\ + {{E}_B} - {{E}_{AB}}} \end{array} \end{equation}$$ ()
where E A ${{E}_A}$ and E B ${{E}_B}$ denote the monotherapy effect of drugs A and B, respectively.
The drug combination synergy effect is determined by the maximum effect of each drug under the HSA metric. Thus the expected effect E A B ${{E}_{AB}}$ can be given by:
E A B = max ( E A , E B ) $$\begin{equation} \def\eqcellsep{&}\begin{array}{*{20}{c}} {\ {{E}_{AB}} = {\mathrm{max}}({{E}_A}\ ,{{E}_B})} \end{array} \end{equation}$$ ()

The expected effect under the ZIP metric is based on the concept that two drugs do not potentiate each other. Except for these metrics, there also exist other metrics including Combination Index (CI),27 S score28 and ComboScore.29 Even though these metrics can provide a view to quantify synergy, no consensus exists. Besides, most of them do not distinguish the potency and efficacy. Recently, multi-dimensional Synergy of Combinations (MuSyC) aims to provide a more consistent and unbiased metric and model with the synergy by enhancing potency and/or efficacy of the single agents.30, 31 Following the MuSyC idea, SynBa uses the Baysian approach to estimate the synergistic.

As deep learning-based models are increasingly applied in real-world settings, the development of a standardized synergy metric becomes even more critical. While efforts towards unifying existing frameworks are ongoing, achieving consensus remains an unresolved issue. In the meantime, it is imperative that deep learning models be evaluated across multiple metrics to demonstrate their efficacy and reliability for future applications.

3 DEEP LEARNING BASED DRUG COMBINATION PREDICTION

The schematic representation of drug combination synergy prediction is illustrated in Figure 1. The foundational pipeline comprises four distinct stages: dataset preparation, input data processing, deployment of deep learning models, and model evaluation. In the dataset preparation stage, several crucial tasks are undertaken, including the selection of appropriate synergy score metrics and the filtering of the dataset to ensure its relevance and quality. Subsequently, careful consideration is given to the input data for the deep learning models, with an emphasis on capturing the intricate interactions between drugs and their effects. The design of the deep learning models constitutes a critical aspect of the process, requiring meticulous attention to detail to effectively capture the underlying complexities of drug combinations. Finally, the evaluation stage involves assessing the performance of the model using multiple metrics and methodologies to ensure robustness and reliability. we present a detailed investigation in terms of input data, datasets, models and evaluation in Figure 4.

Details are in the caption following the image
Schematic view of the generic pipeline of deep learning-based models.

3.1 Datasets

The mechanisms underpinning deep learning-based drug combination prediction models are inherently data-driven, with their performance contingent upon the availability of large-scale datasets. In Table 1, we enumerate several datasets for reference. Except for these six datasets, there also exist some databases such as FORCINA, CLOUD, etc. It should be noted that these datasets show a significant overlap. These datasets play a crucial role in supplying the necessary volume and diversity of data essential for training and validating the predictive algorithms embedded within deep learning models. The accessibility of these drug synergy combination datasets enables the development and refinement of deep learning-based models, thereby advancing research in the field.

TABLE 1. Summary of drug synergy combination datasets.
Dataset # of drugs # of cell lines # of drug pairs # of combination References
DrugComb 8397 2040 74421 751498 32
O'Neil 38 39 583 22737 33
DrugCombDB 2887 124 448555 6055926 34
ALMANAC 104 60 5232 304549 29
SYNERGxDB 1977 151 22507 536596 35

It is important to note that the aforementioned drug synergy combination datasets primarily stem from in-vitro experiments and emphasize synthetic compound drugs. However, recent efforts have introduced in-silico models, such as CDCDB, curated from sources like ClinicalTrials.gov.36 Additionally, the NPCDR dataset offers comprehensive molecular regulation data pertaining to natural product-based drug combinations across various disease cell lines and model organisms37. These diverse datasets contribute to a more comprehensive understanding of drug synergy.

3.2 Input data

For deep learning-based models to accurately predict the synergy effects, it is imperative to incorporate relevant drug and cell features into the models. A thorough examination of the features utilized in recent models reveals that they can be categorized into two distinct classes: biological attributes and biological knowledge.

Biological attributes pertain to the inherent characteristics of drugs and cells that directly impact their interactions and subsequent effects. For drugs, common biological attributes include SMILES representations, drug structures, descriptors, and fingerprints. Conversely, for cell lines, key biological attributes encompass gene expression profiles, genomic mutations (MUT), copy number variations (CNV), and other genomic data.

In the domain of deep learning-based models, a variety of methods have been developed that leverage biological attributes to enhance predictive accuracy. DeepSynergy,38 for example, utilizes molecular fingerprints and descriptors to characterize drugs, while gene expression data is employed to profile cell lines. This methodology has consistently demonstrated competitive performance across diverse scenarios. The approach adopted by DeepSynergy has been mirrored in other models such as MatchMaker39 and SDCNet,40 which also rely on similar input data for predictive purposes. Subsequent to DeepSynergy's establishment of this approach, there has been a shift in focus towards innovative representation of drug information. Some recent studies have eschewed the traditional reliance on fingerprints and descriptors, instead opting to model drug structures as graphs. These studies harness Graph Neural Networks (GNNs) to distil structural information from these graph representations. DeepDDS41 and AttenSyn42 exemplify this novel methodology, showcasing the potential of GNNs in capturing the intricate relationships within molecular structures to refine drug synergy predictions. Beyond drug feature exploration, research has also expanded to incorporate multi-omic data for cell line characterization. SynergyX, for instance, integrates six types of omic data, demonstrating enhanced predictive performance.43 The findings from this study suggest that utilizing two or three types of omic data can yield robust results.43 In light of the diverse drug features and varied cell line representations, CCSynergy introduces various levels of bioactivity profiles, providing a foundation for future research endeavours.44

On the other hand, biological knowledge encompasses the vast array of information that has been accumulated through scientific research and experimentation. This includes but is not limited to, known drug-target interactions, and protein-protein interaction networks. Early works like TranSynergy45 and GraphSynergy46 use the PPI network to model the drug and cell line representations. KGANSynergy47 uses multiple drug-related knowledge graphs to construct protein-related knowledge graphs for representing the drug and cell line. Except for the exploration of the external resources, some works also model the drug combination into a heterogeneous graph, such as Hypergraphsynergy,48 CGMS49 and HypertranSynergy50.

3.3 Basic neural networks

Based on the related drugs and cell line information, modelling the intricate relationships via the neural networks to effectively predict the synergistic effects is a pivotal pursuit and challenging task. This survey aims to review the prevalent deep learning-based methodologies and categorize their underlying neural network architectures into distinct classes: FCNNs, RNNs, CNNs, GNNs, Attention and Transformer. Figure 2 presents the representative structures of these fundamental neural network types.

Details are in the caption following the image
Basic neural networks for drug combination synergy prediction task.

FCNNs are typically characterized by a set of learned weights and biases applied across all input features, which can be succinctly represented by the matrix multiplication formula y = W x + b $y\ = \ Wx + b$ , where W , b $W,b$ denote the trainable weight matrix and bias vector, x represents the input features.

CNNs aim to capture the local patterns via convolutional operations. The convolutional operation applies several filters or kernels to input data to extract and represent spatial information in data. The convolution can be formulated as:
f g t = τ Ω g τ f t + τ d τ $$\begin{equation} \def\eqcellsep{&}\begin{array}{*{20}{c}} {\left( {f*g} \right)\ \left( t \right) = \mathop \smallint \limits_{\tau \in {{\Omega}}} g\left( \tau \right)f\left( {t + \tau } \right)d\tau \ } \end{array} \end{equation}$$ ()
where f ( · ) $f( \cdot )$ and g ( · ) $g( \cdot )$ represent the feature and parametric kernel function. Ω ${\mathrm{\Omega \ }}$ represents a neighbourhood in a given space.
RNNs possess a distinct characteristic compared to FCNNs and CNNs which incorporate feedback loops. This allows the network to utilize its output at a prior time step t 1 $t - 1$ as additional input for subsequent time steps t $t$ . As a consequence, RNNs demonstrate remarkable proficiency in handling sequential data. Given the input x t ${{x}_t}$ at current time t $t$ and the corresponding hidden state h t 1 ${{h}_{t - 1}}$ which from the previous time t 1 $t - 1$ output. The mathematical formulation for the update of the hidden state in an RNN is given by:
h t = t a n h W x t , h t 1 + b $$\begin{equation} \def\eqcellsep{&}\begin{array}{*{20}{c}} {\ {{h}_t} = \ tanh\left( {W\left[ {{{x}_t},{{h}_{t - 1}}} \right] + b} \right)} \end{array} \end{equation}$$ ()
where W $W\ $ and b $\ b$ are the trainable weights and bias, which are shared temporally.
The concept of Attention has initially been utilized in natural language processing tasks, allowing for explicit modelling of long-range dependencies in sequences and handling information derived from diverse representation spaces. An attention function maps queries ( Q $Q$ ), keys ( K $K$ ), and values ( V $V$ ) to an output as follows:
A t t e n t i o n Q , K , V = s o f t m a x Q K T d V $$\begin{equation} \def\eqcellsep{&}\begin{array}{*{20}{c}} {\ Attention\ \left( {Q,K,V} \right) = \ softmax\left( {\frac{{Q{{K}^T}}}{{\sqrt d }}} \right)V} \end{array} \end{equation}$$ ()
where d $\sqrt d $ is the dimensionality of the query vector.
Building on the foundation of attention mechanisms, Vaswani et al.21 introduced the Transformer architecture, which exhibits exceptional capability in capturing long-range dependencies and contextual relationships within sequential data. The Transformer leverages an encoder-decoder structure. Given an input sequences, the encoder generates a contextualized representation, while the decoder utilizes this representation to generate the output sequences. Both the encoder and decoder comprise multiple identical layers. The core component of the Transformer is the multi-head attention mechanism (MHA). The MHA breaks down the input into multiple attention ‘heads’, each of which focuses on different subspace. These heads operate in parallel, focusing on extracting distinct features from the input. Following individual attention computations within each head, the resulting outputs are concatenated to form a comprehensive representation. Given the number of head as h $h$ , the MHA can be formulated as:
M H A Q , K , V = C o n c a t h e a d 1 , , h e a d h w h e r e ; h e a d i = A t t e n t i o n Q i , K i , V i $$\begin{equation} \def\eqcellsep{&}\begin{array}{*{20}{c}} {MHA\ \left( {Q,K,V} \right) = Concat\left( {hea{{d}_1}, \ldots ,hea{{d}_h}} \right)}\\ {where;{{head}_i} = \ Attention\left( {{{Q}_i},\ {{K}_i},\ {{V}_i}} \right)} \end{array} \end{equation}$$ ()
where Q i = X W Q i ${{Q}_i} = \ X\ {{W}_{{{Q}_i}}}$ , K i = X W K i ${{K}_i} = \ X\ {{W}_{{{K}_i}}}$ , and V i = X W V i ${{V}_i} = \ X\ {{W}_{{{V}_i}}}$ denote the respective subspaces defined by the weight matrices W Q i ${{W}_{{{Q}_i}}}$ , W K i ${{W}_{{{K}_i}}}$ and W V i ${{W}_{{{V}_i}}}$ .
GNNs are designed to process graph-structured data. The foundational idea of GNN is to generate node or edge representations by aggregating information from neighbouring nodes. This is achieved by the message-passing function via an iterative process. Given a graph as G = ( V , E ) $G\ = \ ( {V,\ E} )$ , this process can be formulated as follows:
h i = ϕ x i , v j N v i ψ x i , x j , e i j $$\begin{equation} \def\eqcellsep{&}\begin{array}{*{20}{c}} {\ {{h}_i} = \ \phi \left( {{{x}_i},\ {{ \oplus }_{{{v}_j}\ \in N\left( {{{v}_i}} \right)}}\ \psi \left( {{{x}_i},\ {{x}_j},\ {{e}_{ij}}} \right)} \right)} \end{array} \end{equation}$$ ()
where $ \oplus $ is the permutation-invariant aggregation operator, ϕ ( · ) $\phi ( \cdot )$ and ψ ( · ) $\psi ( \cdot )$ are typically implemented as neural networks, which aggregate information from neighbouring nodes N ( v i ) $N( {{{v}_i}} )$ .

3.4 Deep learning based models

In the context of employing deep learning techniques, it is essential to contemplate two principal elements: the backbone architecture of the model and the training objective that guides the learning process. Recent research has witnessed a proliferation of models utilizing backbone architectures such as FCNN, GNN, and Transformers. Regarding the training objectives, single-task learning (STL) and multi-task learning (MTL) have emerged as prevalent strategies. STL models are specifically tailored to optimize the prediction of drug combination synergy. Conversely, MTL models are trained to simultaneously achieve performance on multiple related tasks. These tasks may encompass not only the prediction of synergy but also other pertinent objectives such as drug-drug interaction classification51 and single-drug response prediction.52

The selection of the backbone architecture in deep learning models is primarily influenced by the nature of the drug features and cell line representation. Specifically, when dealing with one-dimensional (1D) data such as molecular fingerprints and gene expression profiles, FCNN often emerge as the architecture of choice. DeepSynergy38 is a representative model that utilizes FCNN and employs concatenated features of drug pairs and cell lines as input. Similarly, AuDNNsynergy53 adopts a comparable FCNN-based network, which incorporates three autoencoders to extract latent features. Principal Component Analysis precedes the input into the FCNN to ensure that the data is represented in a lower-dimensional space, as demonstrated in SynPred.54 KGE-DC55 employs a three-layer FCNN to predict drug combination synergy by integrating information from knowledge graphs (KGs) about drugs and cell lines with molecular fingerprints and gene expression data. Recent models like CCSynergy44 and SynPathy56 also utilize FCNNs but integrate a broader range of drug and cell features to improve prediction accuracy. In contrast to the DeepSynergy architecture, which treats drug pair and cell line features as a single entity, MatchMaker39 introduces a novel FCNN model that uses two subnetworks to individually model the interactions of each drug with the cell line. This design principle has been adopted by MARSY52 and MTLSynergy.57 However, this architecture must satisfy the permutation invariant requirement to ensure consistent predictions regardless of the input order.

Beyond the 1D input data, drug structures and biological knowledge are also utilized as inputs in deep learning models, with GNNs being employed to model these data structures effectively. Drug structures are commonly represented as graphs, where nodes correspond to atoms and edges, represent chemical bonds. This graph representation allows for the application of various GNNs to extract features from the molecular structure, which are crucial for understanding the pharmacological properties of drugs. Studies such as DeepDDS,41 AttenSyn42 and GTextSyn58 have leveraged GNNs to capture these structural features. In contrast to the representation of drug structures, some studies have attempted to design cell line-specific graphs. For instance, MGAE-DC models each cell line as three distinct graphs based on the interactions between drugs.59 This model employs a graph encoder-decoder architecture to capture drug embeddings that are common across different cell lines. Similarly, SDCNet40 constructs unique relational graphs for each cell line and utilizes a relational GCN to learn invariant patterns of drug combinations.

In addition to constructing individual graphs for each cell line, drug combination synergy data can be conceptualized as a graph where drugs and cell lines serve as nodes. CGMS treats this graph as a heterogeneous graph and employs the heterogeneous graph attention network to generate graph embeddings.49 HypergraphSynergy48 and HypertranSynergy50 model this graph as a hypergraph and utilize different hypergraph neural networks to model the synergy prediction task. Furthermore, by incorporating external biological knowledge, early works such as GraphSynergy46 and PRODeepSyn,60 which utilize PPI networks to represent drug and cell line features, have shown comparable performance.

The Transformer architecture has demonstrated its effectiveness across various domains. DTSyn introduces a dual Transformer encoder framework designed to extract correlations among drugs, targets, and cell lines.61 DFFNDDS employs a pre-trained chemical language model that utilizes the Transformer as its foundational element to distil drug features from the SMILES.62 DeepTraSynergy also harnesses the power of Transformers as feature extractors.63 SynergyX takes advantage of the Transformer's capability to model cross-modal biological knowledge.43 Inspired by the success of large language models (LLMs) in natural language processing, several studies have extended the application of LLMs to the domain of drug combination synergy prediction. CancerGPT,64 for example, utilizes LLMs to predict the synergy, demonstrating the potential of transfer learning from language models to biomedical applications. SynerGPT introduces a GPT model to learn drug synergy functions.65

In Figure 3 , we present a detailed report on the parameter count and inference times for a selection of representative models, including DeepSynergy, DeepDDS-GAT, DeepDDS-GCN, HypergraphSynergy, DFFNDDS, SynegyX, MF- SynDCP and PermuteDDS.66 These metrics were obtained using an Intel(R) Xeon(R) Gold 6226R CPU @ 2.90 GHz in conjunction with a GeForce RTX 3090 graphics card. It is important to note that this analysis does not encompass all existing models.

Details are in the caption following the image
The inference times and model parameters for some deep learning-based methods.
Details are in the caption following the image
Summary of deep learning-based approaches.

Figure 3 presents a comparative analysis of the efficiency and complexity of the aforementioned models, as measured by their parameter count and inference time, respectively. The ordinate (y-axis) delineates the parameter count for each model, with each represented as a discrete point on the graph. Adjacent to each point, the inference time is annotated, providing a direct indication of the model's operational latency when processing new data. Upon examination of the figure, it is evident that the parameter count varies substantially, extending from approximately 5.86 million parameters for the DeepDDS-GAT model to a more considerable 175 million parameters for the SynegyX model. Con- currently, the inference times exhibit a similarly broad spectrum, with the MFSynDCP model demonstrating the most expeditious performance at roughly 0.12 ms, while the PermuteDDS model exhibits the longest latency, at 38.46 ms.

3.5 Evaluation

The evaluation of model performance is a crucial step in assessing the efficacy of these models. While k-fold cross-validation (CV) techniques are commonly employed, certain studies have adopted specialized CV strategies to better simulate real-world scenarios.67-74 These strategies include:
  • Leave Cell Line Out (LCO): This strategy involves excluding specific cell lines from the training data, enabling assessment of the model's performance across different diseases. LCO variants may exclude a certain percentage of specific cell lines or leave out one cell line entirely.48, 41
  • Leave Drug Out (LDO): In this strategy, a single drug is omitted from the training dataset, allowing evaluation of the model's predictive capabilities for the excluded drug. Like LCO, LDO variants may exclude a certain percentage of specific drugs or leave out one drug entirely.
  • Leave Drugs Out (LDSO): Similar to LDO, LDPO extends the evaluation to drug pairs, excluding certain drug combinations from the training set to assess the model's robustness.
  • Leave Drug Combination Out (LDCO): This strategy involves removing specific drug combinations to evaluate the model's ability to predict the synergy of new drug combinations.
  • Leave Tissue Out: By excluding specific tissue types, this approach tests the model's generalizability to different biological contexts.

By simulating challenges such as predicting outcomes in new diseases or with new drugs, these specialized CV strategies better prepare models for clinical use and drug development.

In addition to CV strategies, other evaluation approaches include independent testing,48, 50 case studies39, 43 and explainable experiments.41, 43, 45, 46 Among these experiments, Interpretability is crucial for understanding why a model makes a particular decision. However, there are limitations in current approaches. Some studies evaluate interpretability only through examples, and evaluations often rely on visualizing attention weights43 or Euclidean distances via dimensionality reduction methods,59 lacking a unified standard for systematically assessing interpretability.

4 CHALLENGES AND FUTURE DIRECTIONS

While current deep learning-based models have demonstrated commendable performance in drug combination synergy prediction, it is essential to recognize the limitations inherent in these models. One such limitation pertains to the input features, where the majority of existing models rely on 1D and 2D information, with the 3D structural in the formation of drugs being notably underutilized. Research has indicated that the three-dimensional conformations of drugs are crucial for understanding Drug-Drug Interactions.75

Furthermore, while trained chemical models have been employed with notable success in drug-related tasks, the specialized models such as GraphMVP,76 MolCLR77 and UNI-MOL78 outperform traditional fingerprint and chemical descriptor-based approaches, as well as SMILES-derived representations. These models offer a more nuanced understanding of molecular structures and their pharmacological properties.

In addition to drug features, the representation of cell line features warrants reevaluation. It has been observed that many models exhibit subpar performance under the LCO strategy,49, 59, 54 which suggests a need to rethink the way cell lines are characterized within these models. The LCO strategy is particularly revealing, as it tests the model's generalised ability for real-world applications.

LLMs have demonstrated remarkable efficacy across a spectrum of natural language processing tasks. This success has catalyzed growing interest in the fields of bioinformatics and biomedicine, where researchers are integrating the sophisticated reasoning capabilities of LLMs with biomedical data.79, 80 Notably, initiatives like CancerGPT have begun exploring the application of LLMs for drug combination synergy prediction.64 Concurrently, the evolution of multi-modal LLMs (MLLMs) has extended the capacity to comprehend information spanning various modalities.81 83 Despite these advancements, the potential of MLLMs in the context of drug combination synergy prediction remains largely untapped.

AUTHOR CONTRIBUTIONS

Yu Wang did a literature search. Junjie Wang and Yun Liu conceived and designed this study. Junjie Wang oversaw the drafting and editing. All authors reviewed the manuscript.

ACKNOWLEDGEMENTS

We thank the reviewers who provided helpful comments on earlier drafts of the manuscript. This work was supported by the National Key R&D Program of China (Grant No.2023YFC3605800) and the National Natural Science Foundation of China (NSFC, Grant Nos. 62271174 and 62102191).

    CONFLICT OF INTEREST STATEMENT

    The authors declare no conflict of interest.

    ETHICS STATEMENT

    This article does not contain any studies with human participants or animals performed by any of the authors.

    DATA AVAILABILITY STATEMENT

    No data are available.

      The full text of this article hosted at iucr.org is unavailable due to technical difficulties.