Mothra: Multi-objective de novo Molecular Generation Using Monte Carlo Tree Search

Introduction

Challenges in Drug Development

The paper describes drug development as a long and expensive process: reaching approval for clinical use takes 12 to 15 years and costs USD 2.6 billion on average.

The process runs through target validation, compound screening, lead optimization, preclinical testing, phase 1, 2, and 3 trials, and finally launch approval.

The authors note that more than 90% of candidates fail between the preclinical candidate and clinical trial stages, and that 90% of these failures stem from insufficient clinical efficacy, toxicity, or poor drug-like properties.

A drug candidate must therefore satisfy many requirements at once, including its ADMET (absorption, distribution, metabolism, excretion, toxicity) profile and its binding affinity to the target protein.

Conventional high-throughput screening (HTS) depends on existing compound libraries, so it struggles to cover the vast chemical space, estimated at about $10^{60}$ molecules.

Deep learning-based molecular generative models emerged to overcome this limitation. VAE-based, GAN-based, genetic algorithm-based, and reinforcement learning-based methods have been proposed, but the authors point out that most of them are formulated as single-objective optimization.

The Need for Multi-Objective Optimization

In real drug discovery, the paper explains, several evaluation indices such as drug efficacy, safety, and production cost must be optimized simultaneously.

Previous studies converted the multi-objective problem into a single objective, the desirability score (DScore), through linear summation or multiplier-adjusted multiplication.

The authors identify the following limitations of this approach.

Weights between different evaluation indices have to be tuned. For example, a QED score of “0.5” and a docking score of “0.5” mean different things, so a simple weighted sum is not appropriate.
Objectives in a trade-off relationship cannot all be maximized or minimized through simple summation or multiplication.

To address this, the paper proposes a multi-objective optimization (MOO) approach based on the Pareto frontier.

The paper describes the Pareto front as the set of solutions that are optimal with respect to all objective functions, in the sense that improving any one objective of a solution on the front necessarily worsens at least one other objective.

Mothra's Proposal

The paper proposes Mothra, a de novo molecular generation model that uses SMILES-based Pareto multi-objective MCTS.

The following three objectives are used as evaluation functions.

Docking score (based on SBMolGen)
QED score
Estimated toxicity probability (based on eToxPred)

In addition, SAscore is used as a threshold to filter out molecules that are difficult to synthesize.

The authors argue that, because information about the molecules generated in the MCTS simulation step is fed back into the search tree, the method is well suited to exploring structurally similar molecules.

Method

Mothra Overview

The paper describes Mothra as a molecular optimizer based on Pareto Monte Carlo Tree Search.

The structure generator is the RNN-based generator used in ChemTS and MERMAID, and the exploration system is based on multi-objective MCTS (MOMCTS).

Pareto optimization is introduced to steer the direction of exploration without computing a DScore, and the NSGA-II-based pygmo library is used to evaluate whether a molecule belongs to the Pareto front.

Figure 1. Workflow of Mothra. Subfigures located near the step diagram show the contents of each step. A node in a search tree corresponds to a SMILES character. This workflow consists of four steps: selection, expansion, simulation, and backpropagation. (I) Selection step: Choose a leaf node considering the current Pareto front. (II) Expansion step: Add a child node to a selected node. (III) Simulation step: Complete the substrings of molecules and evaluate their rewards. In addition, update the Pareto front. (IV) Backpropagation step: Feedback rewards to the nodes on the path.

Figure 1 shows how the four steps of MCTS are applied to SMILES-based molecular generation. Each node corresponds to one character of the SMILES vocabulary, and the path from the root node to a leaf node forms a partial SMILES string.

The Four Steps of MCTS

The paper defines the four steps of MCTS for Mothra as follows.

Selection: Each node in the search tree holds one character (an element or a structure) of the SMILES vocabulary. An expandable node is selected according to the tree policy, taking the Pareto front into account. The path from the root node to the selected leaf node represents a substring of the SMILES string currently being explored.
Expansion: One new node is added as a child of the selected node. The added node appends a new character to the end of the substring in accordance with the SMILES grammar.
Simulation: The pretrained RNN acts as the default policy and completes the molecule. The completed SMILES string is checked for validity as a molecule, and its reward vector is computed and compared with the vectors in the Pareto front.
Backpropagation: The computed reward vector is propagated to every parent node on the path used in the selection step.
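The mapping between a tree path and a partial SMILES string can be sketched as follows. This is a minimal illustration, not the authors' implementation: the `Node` class and the `prefix` helper are hypothetical names, and the only detail taken from the summary is that each node holds one vocabulary character and that `&` marks the start of a string.

```python
START = "&"  # start symbol of the SMILES vocabulary


class Node:
    """One search-tree node holding a single SMILES vocabulary character."""

    def __init__(self, token, parent=None):
        self.token = token
        self.parent = parent
        self.children = []

    def add_child(self, token):
        # Expansion: attach one new child that extends the substring by one character.
        child = Node(token, parent=self)
        self.children.append(child)
        return child


def prefix(node):
    """Join the tokens from the root down to `node`, dropping the start symbol."""
    tokens = []
    while node is not None:
        tokens.append(node.token)
        node = node.parent
    return "".join(reversed(tokens)).lstrip(START)


root = Node(START)
leaf = root.add_child("C").add_child("C").add_child("O")
partial_smiles = prefix(leaf)  # the substring a default policy would complete
```

In this sketch the simulation step would hand `partial_smiles` to the RNN, which keeps appending characters until it emits the end symbol.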

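According to the paper, Mothra differs from conventional MCTS in two main ways.

The reward used in backpropagation is a multi-dimensional vector rather than a scalar value.
A Pareto front engine is applied in the simulation step, and the Pareto front it maintains is then used in the selection step to choose a leaf node.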

Pareto Front

The paper explains that Mothra adopts MOO and uses Pareto dominance to define the relationship between two solution points in reward space.

Given two reward vectors $r_x = \{r_{x1}, r_{x2}, \dots, r_{xd}\}$ and $r_y = \{r_{y1}, r_{y2}, \dots, r_{yd}\}$, point $x$ is said to Pareto-dominate $y$ if $r_{xi} \geq r_{yi}$ for all $i = 1, \dots, d$ (the usual strict definition additionally requires $r_{xi} > r_{yi}$ for at least one $i$).

A point that is not dominated by any other point is called a non-dominated point, and the set of non-dominated points is defined as the Pareto front.

\[P_A = \{r \in A : \nexists\, r' \in A \;\; s.t. \;\; r' \succ r\}\]

Here $A$ is the set of reward vectors and $P_A$ is the Pareto front of $A$. The notation $r' \succ r$ means that $r'$ Pareto-dominates $r$.
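The definition of $P_A$ can be written down directly. The sketch below is a plain-Python illustration, not the pygmo-based routine the paper uses; it assumes every objective is maximized and uses the strict form of dominance (at least one component strictly better). The function names and the sample rewards are made up for the example.

```python
from typing import List, Sequence, Tuple

Reward = Tuple[float, ...]


def dominates(rx: Sequence[float], ry: Sequence[float]) -> bool:
    """True if rx is at least as good as ry everywhere and strictly better somewhere."""
    return all(a >= b for a, b in zip(rx, ry)) and any(a > b for a, b in zip(rx, ry))


def pareto_front(points: List[Reward]) -> List[Reward]:
    """Keep the points of A that no other point of A dominates (the set P_A)."""
    return [r for r in points if not any(dominates(other, r) for other in points)]


# Hypothetical reward vectors (docking reward, QED reward, toxicity reward).
rewards = [(0.9, 0.2, 0.5), (0.4, 0.8, 0.5), (0.3, 0.1, 0.4), (0.9, 0.2, 0.4)]
front = pareto_front(rewards)
```

The quadratic scan is fine for a handful of points; a dedicated non-dominated sorting routine is the better choice for large sets.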

Hyper-volume Indicator

Points within the Pareto front have no intrinsic priority among themselves, so the paper introduces the hyper-volume indicator to impose an ordering.

\[HV(A; z) = \mu(\{x \in \mathbb{R}^d : \exists\, r \in A \;\; s.t. \;\; r \succeq x \succeq z\})\]

Here $z$ is the reference point and $\mu$ is the Lebesgue measure on $\mathbb{R}^d$.
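For two objectives the hyper-volume is simply the area of the union of rectangles spanned by the reference point and each reward vector. The following sketch computes it with a sweep over the points; it assumes maximization and a two-dimensional reward space, and `hypervolume_2d` is a name chosen for this example rather than anything from the paper.

```python
def hypervolume_2d(points, z=(0.0, 0.0)):
    """Area dominated by `points` and bounded below by the reference point z."""
    # Only points strictly better than z in both coordinates enclose any area.
    pts = sorted((p for p in points if p[0] > z[0] and p[1] > z[1]), reverse=True)
    area, best_y = 0.0, z[1]
    for x, y in pts:  # x is non-increasing along the sweep
        if y > best_y:  # dominated points add nothing
            area += (x - z[0]) * (y - best_y)
            best_y = y
    return area


front = [(3.0, 1.0), (2.0, 2.0), (1.0, 3.0)]
hv = hypervolume_2d(front)
hv_with_dominated = hypervolume_2d(front + [(1.0, 1.0)])
```

Adding the dominated point `(1.0, 1.0)` leaves the value unchanged, which is exactly the situation in which a hyper-volume-based score stops discriminating between candidates.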

Figure 2. Hyper-volume in a two-dimensional reward space, where the reference point z is the original point, and the purple part is the hypervolume. (a) The X marks represent points belonging to the Pareto front. Black indicates points that contribute to expanding the hyper-volume, and red indicates points that do not. (b) The perspective projection on the Pareto front.

In Figure 2(a), black X marks are Pareto front points that contribute to expanding the hyper-volume, and red X marks are points that do not. Panel (b) shows the perspective projection onto the Pareto front.

In MOMCTS, a node that does not contribute to expanding the hyper-volume receives a penalty proportional to the distance between its point and the projected point.

Pareto MCTS

The paper uses Pareto MOMCTS for multi-objective search and states that it differs from conventional MCTS in two respects.

Use of a multi-dimensional reward vector
Computation of the Pareto front (using the NSGA-II-based pygmo library)

Pareto rank could also be used to order the nodes in the Pareto front, but the paper notes that it requires keeping all nodes and therefore has a high computation cost.

Mothra therefore combines the hyper-volume indicator with a projected distance penalty.

The cumulative reward is defined as follows.

\[r_s \leftarrow \frac{1}{n_s + 1}(n_s \times r_s + r_u)\]

Here $r_u$ is the reward of the new evaluation and $n_s$ is the number of visits to state $s$.
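The update is a running mean applied to each component of the reward vector. A minimal sketch, assuming the visit count is incremented right after the update (the function name `backpropagate` and the sample rewards are illustrative):

```python
def backpropagate(r_s, n_s, r_u):
    """Return the updated (cumulative reward, visit count) for one node."""
    updated = [(n_s * rs_i + ru_i) / (n_s + 1) for rs_i, ru_i in zip(r_s, r_u)]
    return updated, n_s + 1


# Two hypothetical evaluations passing through the same node.
r, n = [0.0, 0.0, 0.0], 0
for r_u in ([0.5, 0.5, 1.0], [1.0, 0.5, 0.0]):
    r, n = backpropagate(r, n, r_u)
```

After the two updates `r` holds the component-wise average of the two reward vectors and `n` equals 2.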

The Upper Confidence Bound (UCB) $\overline{r}_s$, which balances exploitation and exploration, is defined as follows.

\[\overline{r}_s = \sum_{i=1}^{d} \left( r_{s;i} + \sqrt{c_i \ln(n_{\text{parent}}) / n_s} \right)\]

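Here $c_i$ is the exploration vs. exploitation parameter for the $i$-th component of the reward vector. Since $U(s)$ treats $\overline{r}_s$ as a point in reward space, each summand is presumably meant as the $i$-th component of that point.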
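The exploration bonus can be sketched per component as below. This assumes a component-wise reading of the bound, so that the result is still a point in reward space; summing the returned list gives the scalar written in the formula. The function name and arguments are illustrative, and `n_s` is assumed to be at least 1.

```python
import math


def ucb_vector(r_s, n_s, n_parent, c):
    """Component-wise upper confidence bound for a node visited n_s times."""
    return [
        r_i + math.sqrt(c_i * math.log(n_parent) / n_s)
        for r_i, c_i in zip(r_s, c)
    ]


rarely_visited = ucb_vector([0.5, 0.2], n_s=1, n_parent=10, c=[1.0, 1.0])
often_visited = ucb_vector([0.5, 0.2], n_s=5, n_parent=10, c=[1.0, 1.0])
```

With the same mean reward, the rarely visited node gets the larger bound in every component, which is what pushes the search toward less explored branches.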

The upper bound $U(s)$ based on the hyper-volume indicator is as follows.

\[U(s) = V(\overline{r}_s) = HV(P \cup \{\overline{r}_s\}; z)\]

Here $P$ is the Pareto front and $z$ is the reference point of the hyper-volume indicator. $U(s)$ provides a scalar evaluation of node $s$, but the paper notes a limitation: whenever a point in the Pareto front dominates $\overline{r}_s$, the value stays constant, because a dominated point adds no hyper-volume.

To compensate, Mothra uses $W(s)$, which adds a projected distance penalty.

\[W(s) = U(s) - \|\overline{r}_s^p - \overline{r}_s\|_2\]

Here $\overline{r}_s^p$ is the point obtained by projecting $\overline{r}_s$ onto the upper bound of the Pareto front.
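The penalty term can be illustrated under some simplifying assumptions that are not taken from the paper: the reference point is the origin, all reward components are positive, and the projection follows the ray from the origin through $\overline{r}_s$ until it leaves the region dominated by the front (the staircase boundary). The paper's exact projection surface may differ. The function names are hypothetical.

```python
import math


def projection_penalty(r, front):
    """Distance from r to its projection along the ray from the origin onto the
    boundary of the region dominated by `front`; 0.0 if r is not dominated."""
    if not front:
        return 0.0
    # Largest t such that t * r is still weakly dominated by some front point.
    t = max(min(p_i / r_i for p_i, r_i in zip(p, r)) for p in front)
    if t <= 1.0:
        return 0.0
    return (t - 1.0) * math.sqrt(sum(r_i * r_i for r_i in r))


def w_value(u_value, r, front):
    """W(s) = U(s) minus the projected distance penalty; U(s) is passed in."""
    return u_value - projection_penalty(r, front)


front = [(4.0, 2.0), (2.0, 4.0)]
inside = projection_penalty((1.0, 1.0), front)   # dominated, projected to (2, 2)
outside = projection_penalty((3.0, 3.0), front)  # not dominated, no penalty
```

Two dominated nodes that share the same $U(s)$ are thus ranked by how far they sit below the front, so the one closer to the front is preferred.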

Mothra Algorithm

Algorithm 1 shows the overall search procedure of Mothra. Its steps can be read as follows.

In the initialization step, the search tree $T_0$ starts with only the root node $v_0$, and the Pareto front $P$ is initialized to the empty set.
The first two lines of the main loop (TreePolicy, DefaultPolicy) perform selection and expansion followed by simulation, which yields a new reward $r_u$.
In the if block, when the new reward $r_u$ is not dominated by any point in the current Pareto front, the existing points that $r_u$ dominates are removed and $r_u$ is added to the Pareto front.
The inner while loop corresponds to the backpropagation step: it updates the cumulative reward, increments the visit count, and moves to the parent node.
TreePolicy is the function that performs selection at internal nodes: among the child nodes it picks the one with the largest $W(v')$ via argmax, which determines the leaf node to expand.
DefaultPolicy is the policy used in the simulation step: the RNN generates and appends new characters to the SMILES fragment $S$ extracted from the current path until a valid molecule is completed, and the reward for that molecule is returned.
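The Pareto front update in the if block can be sketched as a small incremental routine. This is a plain-Python illustration of the described logic, not the authors' code; the names `update_front` and `dominates` are hypothetical, maximization is assumed, and skipping exact duplicates is a choice made for this example.

```python
def dominates(a, b):
    """a is at least as good as b everywhere and strictly better somewhere."""
    return all(x >= y for x, y in zip(a, b)) and any(x > y for x, y in zip(a, b))


def update_front(front, r_u):
    """Return the front after observing r_u; unchanged if r_u is dominated."""
    if r_u in front or any(dominates(p, r_u) for p in front):
        return front
    # r_u enters the front and evicts every point it dominates.
    return [p for p in front if not dominates(r_u, p)] + [r_u]


front = []  # the search starts with an empty Pareto front
for r_u in [(0.5, 0.5), (0.6, 0.6), (0.7, 0.1), (0.1, 0.1)]:
    front = update_front(front, r_u)
```

In the full loop, each `r_u` would come from the default policy, and the backpropagation step would follow regardless of whether the front changed.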

RNN Training

The paper uses the RNN from ChemTS as the de novo molecular structure generator.

The RNN architecture is as follows.

81-dimensional embedding layer
Two 256-dimensional GRU layers
Activation function: hyperbolic tangent (tanh)

The RNN is kept fixed in its pretrained state throughout the ligand search.

The training hyperparameters were the Adam optimizer, a learning rate of 0.01, a batch size of 256, and 100 epochs.

Dataset

RNN training used about 250,000 ligand-like molecules in SMILES format, sampled at random from the ZINC database. The paper states that this is the same dataset as in ChemTS.

The 3D structure of the target protein for ligand generation was obtained from the Protein Data Bank (PDB), and DDR1 kinase (PDB ID: 3ZOS) is used as the target.

Table 1 lists the SMILES vocabulary used. & is the start symbol and \n is the end symbol.

Objective Functions

The paper sets the docking score, QED, and toxicity probability as reward functions.

Docking Score Reward

The docking score evaluates the binding energy between a molecule and the target protein. Lower binding energy means stronger binding, so the score is transformed so that a larger reward is better.

\[r_{\text{docking}} = -\frac{(\text{DS}(S) - \text{DS}_{\text{BASELINE}}) \times 0.1}{1 + |(\text{DS}(S) - \text{DS}_{\text{BASELINE}}) \times 0.1|}\]

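Here $\text{DS}(S)$ is the current docking score and $\text{DS}_{\text{BASELINE}}$ is the per-protein base score (0 in this study). The paper describes this as a monotone-increasing function inspired by SBMolGen. As written, the reward lies in $(-1, 1)$ and increases as the docking score decreases, that is, as binding becomes stronger.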
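The transformation is easy to check numerically. The sketch below implements the formula as stated, with the baseline of 0 and the factor 0.1 from the text; the function name and the sample docking scores are illustrative.

```python
def docking_reward(ds, baseline=0.0, scale=0.1):
    """Map a docking score in kcal/mol (lower is better) to a reward in (-1, 1)."""
    x = (ds - baseline) * scale
    return -x / (1.0 + abs(x))


strong = docking_reward(-10.0)   # strong binder, positive reward
neutral = docking_reward(0.0)    # at the baseline, reward is zero
weak = docking_reward(10.0)      # worse than the baseline, negative reward
```

The saturating shape means that very favorable docking scores cannot swamp the other two rewards, which all live on a comparable scale.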

QED Reward

QED rates the drug-likeness of a generated molecule between 0 and 1. A larger value indicates a more drug-like molecule.

\[r_{\text{QED}} = \text{QED}(S)\]

Toxicity Reward

The eToxPred system estimates the toxicity probability of a generated molecule between 0 and 1. A larger value indicates higher toxicity.

\[r_{\text{tox}} = 1 - P_{\text{etoxpred}}(S)\]

Here $P_{\text{etoxpred}}(S)$ is the toxicity probability estimated by eToxPred.

SAscore Filtering

SAscore rates synthetic accessibility on a scale from 1 to 10, where a larger value means the molecule is harder to synthesize. In this study SAscore is used as a threshold-based filter rather than as a final evaluation, with the threshold set to 3.5.
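Putting the three rewards and the filter together might look like the sketch below. It takes precomputed raw scores as plain numbers, so no docking or prediction tool is called. Two details are assumptions made for illustration: molecules strictly above the threshold are rejected, and a rejected molecule is signalled by returning `None`. The summary does not say how the search treats filtered molecules.

```python
SA_THRESHOLD = 3.5  # threshold stated in the text


def reward_vector(docking_score, qed, tox_prob, sa_score, ds_baseline=0.0):
    """Return (r_docking, r_QED, r_tox), or None when the molecule is filtered out."""
    if sa_score > SA_THRESHOLD:  # assumption: strictly above the threshold is rejected
        return None
    x = (docking_score - ds_baseline) * 0.1
    r_docking = -x / (1.0 + abs(x))
    return (r_docking, qed, 1.0 - tox_prob)


kept = reward_vector(docking_score=-10.0, qed=0.75, tox_prob=0.25, sa_score=2.0)
dropped = reward_vector(docking_score=-10.0, qed=0.75, tox_prob=0.25, sa_score=4.0)
```

All three components are oriented so that larger is better, which is what the Pareto dominance test expects.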

Docking Simulation

AutoDock Vina was used to evaluate binding affinity. After the protein structure was obtained from the PDB, hydrogen atoms were added with AutoDock tools, and the binding pocket of the target protein was defined as a rectangular prism centered on the ligand registered in the PDB and covering the whole ligand.

Because a docking simulation has to be run for every generated compound, the exhaustiveness option of AutoDock Vina was set to 1.

Open Babel was used for conversion to 3D conformers, and the lowest-energy conformer or isomer was selected for each ligand.
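The box definition and the Vina call can be sketched as follows. The box is taken here as the axis-aligned bounding box of the ligand atoms, centered on its midpoint, with an optional padding; whether the paper centers on the midpoint or the centroid, and whether it pads the box, is not stated, so both are assumptions. The file names are placeholders, and the command is only assembled, not executed.

```python
def docking_box(coords, padding=0.0):
    """Axis-aligned box (center, size) covering all ligand atom coordinates."""
    mins = [min(c[i] for c in coords) for i in range(3)]
    maxs = [max(c[i] for c in coords) for i in range(3)]
    center = [(lo + hi) / 2 for lo, hi in zip(mins, maxs)]
    size = [hi - lo + 2 * padding for lo, hi in zip(mins, maxs)]
    return center, size


def vina_args(receptor, ligand, center, size, exhaustiveness=1):
    """Build an AutoDock Vina command line as a list of arguments."""
    args = ["vina", "--receptor", receptor, "--ligand", ligand,
            "--exhaustiveness", str(exhaustiveness)]
    for axis, c, s in zip("xyz", center, size):
        args += [f"--center_{axis}", f"{c:.3f}", f"--size_{axis}", f"{s:.3f}"]
    return args


# Hypothetical ligand coordinates in angstroms and placeholder file names.
center, size = docking_box([(0.0, 0.0, 0.0), (2.0, 4.0, 6.0)])
command = vina_args("receptor.pdbqt", "ligand.pdbqt", center, size)
```

The resulting list could be passed to `subprocess.run` once real PDBQT files are available. An exhaustiveness of 1 trades docking accuracy for throughput, which matches the need to dock every generated compound.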

Evaluation Metrics

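The paper computes the following metrics to evaluate the molecular generative model.

Duplication Ratio: the fraction of generated molecules that are the same molecule even though their SMILES representations differ
Novelty: the fraction of valid, unique molecules that do not appear in the training dataset
Internal Diversity: a score derived from the average Tanimoto similarity among the generated molecules (commonly one minus the mean similarity, so that a higher value means a more diverse set)
Uniqueness: the fraction of valid, unique molecules among all generated molecules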
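A few of these metrics can be sketched without any cheminformatics dependency. The sketch assumes that the generated strings are already valid, canonical SMILES and that each fingerprint is given as a set of on-bit indices. Internal diversity is taken as one minus the mean pairwise Tanimoto similarity, which is a common convention and may not match the paper's exact formula. All names and data are illustrative.

```python
from itertools import combinations


def tanimoto(a: set, b: set) -> float:
    """Tanimoto (Jaccard) similarity between two sets of fingerprint bits."""
    union = len(a | b)
    return len(a & b) / union if union else 1.0


def uniqueness(generated):
    """Fraction of distinct molecules among all generated molecules."""
    return len(set(generated)) / len(generated)


def novelty(generated, training):
    """Fraction of distinct generated molecules absent from the training set."""
    unique = set(generated)
    return len(unique - set(training)) / len(unique)


def internal_diversity(fingerprints):
    """One minus the mean pairwise Tanimoto similarity (needs two or more molecules)."""
    pairs = list(combinations(fingerprints, 2))
    return 1.0 - sum(tanimoto(a, b) for a, b in pairs) / len(pairs)


generated = ["CCO", "CCO", "CCN", "c1ccccc1"]
training = ["CCO"]
fps = [{1, 2}, {2, 3}, {4}]  # toy fingerprints, not derived from real molecules
```

In practice the canonical SMILES and the fingerprints would come from a toolkit such as RDKit; the arithmetic of the metrics stays the same.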

Results

Experimental Setup

The paper validates Mothra's multi-objective molecular generation performance with DDR1 kinase (PDB ID: 3ZOS) as the target protein.

The docking score of Ponatinib, the cocrystal ligand of DDR1 kinase, is reported as -9.4 kcal/mol.

The main search ran for 14 days on a machine with two Intel Xeon E5-2680 v4 processors and four NVIDIA Tesla P100 GPUs.

Generated Molecule Distribution

Figure 3. Population of molecules produced for each metric for compound generation by targeting DDR1 kinase (PDBID: 3ZOS). Figure (a), (b), and (c) show the population of molecules in Docking score, QED, and toxicity probability, respectively.

Figure 3 shows the distribution of generated molecules for each objective function. The docking score axis shows raw values, so lower means stronger binding.

Because the distribution on each axis has a single peak with a broad base on both sides, the authors interpret this as Mothra having achieved a broad search.

Pareto Front Analysis

Figure 4. Scatter plot of Pareto front. Figures (b),(c), and (d) draw the relevance between the docking score and QED, QED and the toxicity probability, and toxicity probability and the docking score, respectively. Green crosses correspond to known molecules binding to the target protein registered in the ChEMBL database. The colors of dots correspond to timesteps. Pareto fronts were calculated when 100, 500, 1000, and 2664 molecules were generated.

Figure 4(a) plots the docking score, QED, and toxicity probability of the compounds optimized for DDR1 kinase in three-dimensional space.

Figure 4(b) to (d) project the three-dimensional point cloud onto two-dimensional planes to visualize the trade-off between each pair of objective functions.

The paper explains that the search progressed in the direction of increasing hyper-volume over time and produced better molecules, and it reports that some generated molecules outperformed known ligands registered in the ChEMBL database.

However, the Pareto front also contains weak Pareto optimal solutions, which score high on one index and low on another. The paper notes that such solutions can be removed by filtering after generation.

Generated Molecules and Docking Poses

Figure 5. Generated molecules with DDR1 kinase and their docking poses. Figure (a) shows the first generated molecule and Figure (b) to (d) show molecules in the last Pareto front with DDR1 kinase and their docking poses. The list [A, B, C] shows the evaluation score on each caption. A corresponds to the docking score [kcal/mol], B to the QED score, and C to the toxicity probability.

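Figure 5 presents molecular structures generated by Mothra together with their docking poses. The authors interpret the results as showing that Mothra generated molecules accurately in the optimal direction and that the molecules bind properly to the binding pocket of the target protein.

In particular, comparing Figure 5(b) and (c), the docking score and QED improved while the toxicity probability was maintained, which the paper takes as evidence that Mothra effectively captured the trade-off among the objective functions.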

Quantitative Metrics

Table 2. Metrics for Assessing the Molecular Generative Models

Table 2 compares Mothra with existing multi-objective molecular generative models.

Mothra's duplication ratio is reported as 0.054 ± 0.0041 and its novelty as 1.0 ± 0.0. Its internal diversity of 0.886 ± 0.00142 is the highest among the compared methods (MOO-DENOVO 0.733, DeLA-DrugSelf 0.84), which the authors interpret as Mothra generating diverse molecules.

Comparison with ChemTSv2

Figure 6. Relevance of the docking score with EGFR (target) protein and QED or toxicity index. Red points show the top 10 molecules in terms of DScore. Blue points show the other molecules.

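This experiment set EGFR as the target protein and also took several low-affinity proteins, such as ERBB2, into account. As additional objectives, solubility, permeability, metabolic stability, SAscore, and QED were to be maximized, and toxicity was to be minimized.

The paper points out that ChemTSv2 proposes molecules that lie off the estimated Pareto front: it fails to capture the Pareto front accurately and generates molecules biased toward the DScore reward. According to the authors, this behavior hides the trade-off relationships and can mislead the interpretation of results, and they argue that Mothra is the superior method.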

Discussion

MOO vs SOO

Figure 7. Distribution of compounds generated by single-objective optimization. The docking score toward DDR1 kinase was used as the objective function.

Figure 8. Distribution of compounds generated by MOO. The docking score toward DDR1 kinase, QED, and toxicity probability were used as the objective function.

To verify the effect of MOO, the paper runs a comparison against single-objective optimization (SOO). In SOO the objectives were affinity and a constant function; in MOO they were affinity, QED, and toxicity probability.

Figure 7 shows the SOO result: only affinity improves, while QED and toxicity probability remain at low values.

Figure 8 shows the MOO result: compounds that score well on all indices are generated.

The paper explains that MCTS by nature tends to dig deep into a local minimum. In SOO, small ligands rarely earn a binding affinity reward, so other leaves may never be selected.

In MOO, by contrast, other objectives can be taken into account at every search step, so the authors argue that compounds with the properties desired in drug discovery can be generated more effectively.

Number of Objective Functions

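In theory there is no limit on the number of objective functions that MOO can use, but the paper explains that with too many objectives it becomes difficult to search the vast chemical space efficiently.

Problems with four or more objectives are called “many-objective” problems because of their complexity. To avoid this regime, the paper applies threshold-based property constraints after the generation system.

The correlation between evaluation functions must also be considered, and the paper stresses that each axis of compound space should be independent. QED and toxicity were adopted as evaluation functions because their correlation coefficient, computed with ChEMBL as the population, was 0.24.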
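Checking whether two candidate evaluation functions are close to independent amounts to computing a correlation coefficient over a reference population. The sketch below uses the Pearson coefficient in plain Python; the paper does not state which coefficient it used, so that choice is an assumption, and the score lists are toy values rather than ChEMBL data.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient between two equally long score lists."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)


# Toy scores for four hypothetical molecules.
redundant = pearson([1.0, 2.0, 3.0, 4.0], [2.0, 4.0, 6.0, 8.0])
independent = pearson([1.0, 2.0, 3.0, 4.0], [1.0, -1.0, -1.0, 1.0])
```

A pair like `redundant` adds an axis without adding information, whereas a pair with a coefficient near zero, like `independent`, gives the Pareto front a genuinely new direction to trade off.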

Difference Between Chemical Space and the Generated Distribution

Figure 9. Distribution of compounds. The black and green hills indicate the distributions of ZINC and all generated molecules, respectively. The red dots indicate the molecules belonging to the Pareto front. The Docking Score is normalized (larger is better).

Figure 9 overlays the ZINC distribution (black), the distribution of generated molecules (green), and the molecules belonging to the Pareto front (red).

The authors interpret this as showing that, although Mothra's generator was trained on the ZINC distribution, the generated molecules form a distribution different from ZINC.

Because Mothra starts from a single root node, every generated molecule is discovered through the RNN and MOMCTS. The paper explains that MOMCTS uses information from the generated molecules to search general chemical space for the desired molecules.

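Limitations of the RNN Generator

The paper notes that the RNN generator learns only the SMILES grammar and therefore cannot learn the overall structure of a compound.

For chiral carbons, cases where the absolute configuration was generated explicitly and cases where it was not were both observed, and cases where @ was added at meaningless positions were also reported.

A transformer-based model could be considered as an alternative, but transformers are known to have difficulty recognizing chirality from string representations, so the paper leaves improving the generator as future work.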

Limitation

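The paper lists the following limitations.

Although there is no theoretical limit on the number of evaluation functions in MOO, even with three objectives as in this study more than 50 compounds can belong to the Pareto front. This growth of the Pareto frontier enlarges the solution space and lengthens the search time.
Reducing the number of evaluation functions reduces the number of compounds on the Pareto front, but risks leaving out evaluation items that matter in drug discovery.
Each axis must be independent, which imposes the constraint that the absolute value of the correlation coefficient between evaluation functions should be close to 0.
The RNN generator has limited ability to handle chirality, and the transformer alternative is also known to struggle with chirality in string representations, so improving the generator architecture remains future work.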

Conclusion

The paper proposes Mothra, a multi-objective molecular generation system based on Pareto optimization.

Whereas existing multi-objective molecular generative models stayed with linear combinations and did not handle Pareto optimization, the authors argue that Mothra effectively captures the Pareto frontier in reward space.

Compared with SOO, the MOO setting generated molecules in the desired direction, and the paper concludes that Mothra can be used in the early stages of drug discovery to generate seed compounds from structural information about the target protein.

To improve practicality, the authors suggest that users should be able to define objective functions flexibly according to their own needs.

Reference

Suzuki, Takamasa, et al. “Mothra: Multiobjective de novo molecular generation using monte carlo tree search.” Journal of Chemical Information and Modeling 64.19 (2024): 7291-7302.