RGFN: Synthesizable Molecular Generation Using GFlowNets

Introduction

The paper argues that generative models for small molecule discovery can greatly expand the searchable chemical space. Existing machine-learning approaches to small molecule generation, however, tend to propose candidate compounds with poor synthesizability, which makes experimental validation difficult.

To address this, the authors propose Reaction-GFlowNet (RGFN), which operates directly in the space of chemical reactions. They claim that the proposed set of reactions and building blocks yields a search space orders of magnitude larger than existing screening libraries while keeping synthesis cost low. They also show that the approach scales to large fragment libraries.

Contributions

The paper's main contributions are as follows.

RGFN, an extension of the GFlowNet framework that operates in the action space of chemical reactions.
A curated set of cheap, accessible chemical building blocks and high-yield reactions, which ensures the synthesizability of generated molecules out of the box.
An action embedding technique that provides scalability to large fragment libraries.
Experimental evidence of effectiveness across a range of oracles, including pretrained proxy models and GPU-accelerated docking.

Method

Generative Flow Networks

The paper describes GFlowNets as amortized variational inference algorithms trained to sample compositional objects from an unnormalized target distribution. A GFlowNet aims to sample objects from a set of terminal states $\mathcal{X}$ in proportion to a reward function $R: \mathcal{X} \to \mathbb{R}^+$.

A GFlowNet is defined on a pointed directed acyclic graph (DAG) $G = (S, A)$. Here $s \in S$ is a state, and $a = s \to s' \in A$ is an edge along which applying an action in state $s$ leads to state $s'$.

A non-negative flow function is defined on the edges and states of the DAG, $F(s \to s')$ and $F(s)$, with $F(x) = R(x)$ for every terminal state $x \in \mathcal{X}$. A well-trained GFlowNet satisfies the following flow-matching constraint.

\[F(s) = \sum_{(s'' \to s) \in A} F(s'' \to s) = \sum_{(s \to s') \in A} F(s \to s')\]

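In this equation, $F(s)$ is the total flow through state $s$; the first sum is the flow entering $s$ and the second sum is the flow leaving $s$. This condition guarantees that when a trajectory starts at $s_0$ and is sampled according to the forward policy $P_F$, where $P_F(s' \mid s) = F(s \to s') / F(s)$, the probability of reaching a terminal state $x$ is proportional to its reward.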
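
To make the flow-matching constraint concrete, the following is a minimal sketch on a hand-built toy DAG. The graph, the edge flows, and the rewards are hypothetical values chosen for illustration, not taken from the paper, and the sketch assumes terminal states have no outgoing edges. It checks that inflow matches outflow (or the reward at terminal states) and then derives the terminal-state distribution induced by $P_F(s' \mid s) = F(s \to s') / F(s)$.

```python
from collections import defaultdict

# Hypothetical toy DAG: edge flows picked by hand so that flow matching holds.
edge_flow = {
    ("s0", "a"): 3.0,
    ("s0", "b"): 2.0,
    ("a", "x1"): 1.0,
    ("a", "x2"): 2.0,
    ("b", "x2"): 2.0,
}
reward = {"x1": 1.0, "x2": 4.0}  # R(x) for the two terminal states


def inflow_outflow(edge_flow):
    """Sum the flow entering and leaving every state."""
    inflow, outflow = defaultdict(float), defaultdict(float)
    for (src, dst), flow in edge_flow.items():
        outflow[src] += flow
        inflow[dst] += flow
    return inflow, outflow


def check_flow_matching(edge_flow, reward, source="s0", tol=1e-9):
    """Inflow must equal outflow at interior states and R(x) at terminal states."""
    inflow, outflow = inflow_outflow(edge_flow)
    for state in set(inflow) | set(outflow):
        if state == source:
            continue  # the source has no incoming flow to match
        target = reward[state] if state in reward else outflow[state]
        if abs(inflow[state] - target) > tol:
            return False
    return True


def terminal_probabilities(edge_flow, source="s0"):
    """Push probability mass from the source using P_F(s'|s) = F(s->s') / F(s)."""
    _, outflow = inflow_outflow(edge_flow)
    children = defaultdict(list)
    for (src, dst), flow in edge_flow.items():
        children[src].append((dst, flow))
    probs = defaultdict(float)

    def walk(state, mass):
        if not children[state]:  # no outgoing edges: terminal state
            probs[state] += mass
            return
        for nxt, flow in children[state]:
            walk(nxt, mass * flow / outflow[state])

    walk(source, 1.0)
    return dict(probs)


print(check_flow_matching(edge_flow, reward))
print(terminal_probabilities(edge_flow))
```

With these numbers the total flow leaving `s0` is 5, and the terminal probabilities come out at roughly 0.2 for `x1` and 0.8 for `x2`, which matches the reward ratio 1 : 4.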

Trajectory Balance

The paper describes the trajectory balance loss as a training objective known to improve credit assignment. Alongside the forward policy $P_F$, a backward policy $P_B$ and a scalar $Z_\theta$ are learned so that every trajectory $\tau = (s_0 \to s_1 \to \dots \to s_n = x)$ satisfies the following condition.

\[Z_\theta \prod_{t=1}^{n} P_F(s_t \mid s_{t-1}) = R(x) \prod_{t=1}^{n} P_B(s_{t-1} \mid s_t)\]

In this equation, $Z_\theta$ is a learnable estimate of the partition function, $P_F(s_t \mid s_{t-1})$ is the forward transition probability from state $s_{t-1}$ to $s_t$, and $P_B(s_{t-1} \mid s_t)$ is the corresponding backward transition probability.
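
The constraint above is usually turned into a loss by taking logarithms of both sides and squaring the difference. The sketch below shows that computation for a single trajectory. The function name and the example numbers are hypothetical, and a real implementation would compute these log-probabilities with a neural network and backpropagate through them.

```python
import math


def trajectory_balance_loss(log_z, log_pf_steps, log_pb_steps, reward):
    """Squared log-ratio between the two sides of the trajectory balance condition.

    log_z         -- log of the learned partition function estimate Z_theta
    log_pf_steps  -- log P_F(s_t | s_{t-1}) for each step of the trajectory
    log_pb_steps  -- log P_B(s_{t-1} | s_t) for each step of the trajectory
    reward        -- R(x) > 0 for the terminal state of the trajectory
    """
    forward_side = log_z + sum(log_pf_steps)
    backward_side = math.log(reward) + sum(log_pb_steps)
    return (forward_side - backward_side) ** 2


# Illustrative two-step trajectory: Z = 5, P_F = 0.6 then 1/3, P_B = 1 at both
# steps, and R(x) = 1, so that Z * prod(P_F) equals R(x) * prod(P_B).
balanced = trajectory_balance_loss(
    log_z=math.log(5.0),
    log_pf_steps=[math.log(0.6), math.log(1.0 / 3.0)],
    log_pb_steps=[0.0, 0.0],
    reward=1.0,
)
# Same trajectory with an overestimated partition function.
unbalanced = trajectory_balance_loss(
    log_z=math.log(8.0),
    log_pf_steps=[math.log(0.6), math.log(1.0 / 3.0)],
    log_pb_steps=[0.0, 0.0],
    reward=1.0,
)
print(balanced, unbalanced)
```

The loss is (numerically) zero when the condition holds and grows as the two sides drift apart, which is what drives $Z_\theta$, $P_F$, and $P_B$ toward consistency.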

Reaction-GFlowNet Generation Process

The paper explains that RGFN generates molecules by combining basic chemical fragments through a chain of reactions.

Figure 1: Illustration of RGFN sampling process. At the beginning, the RGFN selects an initial molecular building block. In the next two steps, a reaction and a proper reactant are chosen. Then the in silico reaction is simulated with RDKit's RunReactants functionality and one of the resulting molecules is selected. The process is repeated until the stop action is chosen. The obtained molecule is then evaluated using the reward function.

The figure above shows the RGFN sampling process.

In the first step, an initial building block is selected from the set of reactants or surrogate reactants.
In the second step, a reaction template (a reaction expressed as a graph transformation) is selected.
In the third step, another reactant is selected.
In the fourth step, the reaction is carried out in silico, and one of the resulting molecules is selected.
Finally, steps two through four are repeated until the stop action is chosen.

Once generation finishes, the molecule is evaluated with the reward function.
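
The control flow of these steps can be sketched with a toy stand-in for the chemistry. Everything here is hypothetical: the building blocks, their functional-group tags, the two "reactions", and the uniform random choices that stand in for the learned forward policy. A real implementation would apply reaction templates with RDKit, as the figure caption notes; this sketch only shows the loop structure.

```python
import random

# Hypothetical toy chemistry: a "molecule" is a tuple of block names, and each
# block carries made-up functional-group tags.
BLOCKS = {
    "A": {"amine"},
    "B": {"acid", "halide"},
    "C": {"amine", "halide"},
    "D": {"boronate"},
}
# reaction name -> (tag required on the current molecule, tag required on the partner)
REACTIONS = {"amide": ("amine", "acid"), "suzuki": ("halide", "boronate")}


def applicable_reactions(molecule):
    """Toy rule: only the most recently added block can react."""
    return [name for name, (need, _) in REACTIONS.items() if need in BLOCKS[molecule[-1]]]


def compatible_partners(reaction):
    partner_tag = REACTIONS[reaction][1]
    return [block for block, tags in BLOCKS.items() if partner_tag in tags]


def run_reaction(molecule, reaction, partner):
    """A real template may yield several products; this toy always yields one."""
    return [molecule + (partner,)]


def sample_molecule(rng, max_reactions=4):
    molecule = (rng.choice(sorted(BLOCKS)),)                  # step 1: initial block
    for _ in range(max_reactions):
        reaction = rng.choice(applicable_reactions(molecule) + ["stop"])  # step 2
        if reaction == "stop":
            break
        partner = rng.choice(compatible_partners(reaction))   # step 3: second reactant
        products = run_reaction(molecule, reaction, partner)  # step 4: in silico reaction
        molecule = rng.choice(products)                        # pick one product
    return molecule


rng = random.Random(0)
samples = [sample_molecule(rng) for _ in range(5)]
print(samples)
```

In RGFN each `rng.choice` call would instead sample from the corresponding head of the forward policy, and the returned molecule would then be scored by the reward function.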

Forward Policy

The forward policy $P_F$ uses a graph transformer model $f$ as its backbone. The graph transformer takes a molecular graph $m$ as input and outputs an embedding $f(m) \in \mathbb{R}^D$. It can also be conditioned on a reaction $r \in R$, written $f(m, r)$ (in this section $R$ denotes the set of reactions, not the reward function).

When selecting the initial building block, the probability of choosing the $i$-th fragment $m_i$ is defined as follows.

\[p(m_i \mid \varnothing) = \sigma^{\lvert M \rvert}(s)_i, \quad s_i = \mathrm{MLP}_M(f(\varnothing))_i\]

Here $\mathrm{MLP}_M : \mathbb{R}^D \to \mathbb{R}^{\lvert M \rvert}$ is a multi-layer perceptron with one output per building block in the library $M$, $\sigma^k$ is the softmax function over a logit vector of length $k$, and $\varnothing$ denotes the empty graph.

When selecting a reaction template, the probability of choosing the $i$-th reaction $r_i$ is defined as follows.

\[p(r_i \mid m) = \sigma^{\lvert R \rvert+1}(s)_i, \quad s_i = \mathrm{MLP}_R(f(m))_i\]

Here $\mathrm{MLP}_R$ outputs $\lvert R \rvert+1$ logits, the last of which corresponds to the stop action. The scores $s_i$ of reactions that cannot be applied are masked to $-\infty$.
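
Masking a logit to $-\infty$ before the softmax gives the corresponding action exactly zero probability while the remaining actions are renormalized. A minimal sketch, assuming at least one action (for example the stop action) stays unmasked; the function name and the example logits are illustrative only:

```python
import math


def masked_softmax(logits, valid):
    """Softmax over `logits`, with entries where `valid` is False set to -inf."""
    masked = [logit if ok else -math.inf for logit, ok in zip(logits, valid)]
    top = max(masked)  # subtract the maximum for numerical stability
    exps = [math.exp(value - top) for value in masked]  # exp(-inf) evaluates to 0.0
    total = sum(exps)
    return [e / total for e in exps]


# Three reaction logits followed by the stop logit; the second reaction is
# assumed to be inapplicable to the current molecule.
probs = masked_softmax([1.0, 2.0, 0.5, 0.0], [True, False, True, True])
print(probs)
```

Because the mask is applied to the logits rather than to the probabilities, no separate renormalization step is needed.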

The probability of choosing the second reactant $m_i$ is defined as follows.

\[p(m_i \mid m, r) = \sigma^{\lvert M \rvert}(s)_i, \quad s_i = \mathrm{MLP}_M(f(m, r))_i\]

At this step $\mathrm{MLP}_M$ is shared with the initial fragment selection, and fragments that are incompatible with reaction $r$ are filtered out.

Finally, the probability of selecting $m'_i$ from the set of molecules $M'$ produced by the reaction is defined as follows.

\[p(m'_i) = \sigma^{\lvert M' \rvert}(s)_i, \quad s_i = \mathrm{MLP}_{M'}(f(m'_i))\]

Here $\mathrm{MLP}_{M'} : \mathbb{R}^D \to \mathbb{R}$ is a module that scores the embedded molecule $m'_i$.

Backward Policy

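The paper explains that the backward policy is non-deterministic only in states where a molecule $m$ is obtained as the result of some reaction $r$. Let $T$ be the set of tuples $(r, m', m'')$ that can produce molecule $m$; the probability of choosing the $i$-th tuple is defined as follows.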

\[p((r_i, m'_i, m''_i) \mid m) = \sigma^{\lvert T \rvert}(s)_i, \quad s_i = \mathrm{MLP}_B(f(m'_i, r_i))\]

Here $\mathrm{MLP}_B : \mathbb{R}^D \to \mathbb{R}$, and $f$ is a backbone transformer model similar to the one used in the forward policy. To define $T$ properly, the number of reactions $k$ used to obtain $m$ is tracked implicitly, and $T$ contains only those tuples for which $m'$ can itself be recursively reconstructed in $k-1$ reactions.

Action Embedding

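The paper points out that the default $\mathrm{MLP}_M$ cannot exploit structural similarity between molecules when the building block library is large. When a molecule $m_i$ is selected in a trajectory, the training signal should also affect the selection probability of a structurally similar $m_j$, but the default $\mathrm{MLP}_M$ treats every building block as an independent output and ignores that similarity.

To address this, the authors embed the building blocks with a simple machine learning model $g$ and redefine the building block selection probability as follows.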

\[p(m_i \mid m, r) = \sigma^{\lvert M \rvert}(s)_i, \quad s_i = \phi(W f(m, r))^T g(m_i)\]

Here $\phi$ is an activation function (GELU in the paper) and $W \in \mathbb{R}^{D \times D}$ is a learnable linear layer. For $g$, the authors use a function that linearly embeds the MACCS fingerprint of molecule $m_i$ together with its index $i$.

The authors claim that this approach adds no cost at inference time, because the embeddings $g(m_i)$ can be cached, and that it substantially improves scaling to large fragment libraries.
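
The score $s_i = \phi(W f(m, r))^T g(m_i)$ can be sketched in plain Python as follows. This is an illustration under stated assumptions, not the authors' implementation: `embed_block` assumes that $g$ is a single linear map `V` applied to the fingerprint concatenated with a one-hot encoding of the index, the 3-bit "fingerprints" are made up (real MACCS keys are much longer), and all weights are hand-picked toy values.

```python
import math


def gelu(x):
    """Exact GELU: x times the standard normal CDF."""
    return 0.5 * x * (1.0 + math.erf(x / math.sqrt(2.0)))


def matvec(matrix, vector):
    return [sum(w * v for w, v in zip(row, vector)) for row in matrix]


def embed_block(fingerprint, index, n_blocks, V):
    """Assumed form of g(m_i): a linear map of [fingerprint ; one_hot(i)]."""
    one_hot = [1.0 if j == index else 0.0 for j in range(n_blocks)]
    return matvec(V, fingerprint + one_hot)


def block_probabilities(state_embedding, W, block_embeddings):
    """Softmax over s_i = gelu(W f(m, r))^T g(m_i)."""
    query = [gelu(h) for h in matvec(W, state_embedding)]
    scores = [sum(q * g for q, g in zip(query, emb)) for emb in block_embeddings]
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


# Toy setup with D = 2, three blocks, and 3-bit fingerprints. Blocks 0 and 1
# share a fingerprint, and the index part of V is zero, so structure alone
# determines the embedding.
fingerprints = [[1.0, 1.0, 0.0], [1.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
V = [[1.0, 0.0, -1.0, 0.0, 0.0, 0.0],
     [0.0, 1.0, 0.5, 0.0, 0.0, 0.0]]
W = [[1.0, 0.0], [0.0, 1.0]]

# g(m_i) does not depend on the current state, so it can be computed once and cached.
cached = [embed_block(fp, i, len(fingerprints), V) for i, fp in enumerate(fingerprints)]
probs = block_probabilities([0.5, -0.2], W, cached)
print(probs)
```

Blocks 0 and 1 receive the same probability because their fingerprints coincide, which is the kind of sharing between structurally similar fragments that an independent output per block cannot provide. Non-zero index weights in `V` would let the model still tell such blocks apart.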

Chemical Language

The paper selects 17 reactions and 350 building blocks. The selected reactions include amide bond formation, nucleophilic aromatic substitution, Michael addition, isocyanate-based urea synthesis, SuFEx, sulfonyl chloride substitution, alkyne-azide and nitrile-azide cycloadditions, esterification, Suzuki-Miyaura, Buchwald-Hartwig, and Sonogashira cross-couplings, amide reduction, and peptide terminal thiourea cyclization.

The selected reactions are generally robust and give high yields of 75 to 100%, so the authors argue that they provide a reliable synthesis pathway for every sampled molecule. Reactions are encoded as SMARTS templates, with 132 distinct SMARTS templates in total.

When building the building block database, only affordable reagents (at most USD 200 per gram) were considered. The reported average cost is USD 22.52 per gram, with a minimum of USD 0.023 and a maximum of USD 190.

Experiment

Setup

The paper evaluates RGFN using a range of biological oracles, listed below.

sEH proxy: an MPNN-based pretrained proxy trained to regress docking scores
Senolytic proxy: a GNN-based proxy (GNEprop) trained for senolytic classification
DRD2 proxy: an SVM classifier using ECFP6 fingerprints and a Gaussian kernel
GPU-accelerated docking: QuickVina 2 docking as implemented in Vina-GPU 2.1

The docking targets are soluble epoxide hydrolase (sEH), ATP-dependent Clp protease (ClpP), SARS-CoV-2 main protease (Mpro), and transducin β-like-related protein 1 (TBLR1). The authors argue that computing docking scores directly inside the training loop avoids the generalization failures of proxy models and gives flexibility in the choice of target.

State Space Size

The paper estimates the state space size as a function of the maximum number of reactions by sampling 1,000 random trajectories. Even with only the curated low-cost reactants, a maximum of 4 reactions gives a state space orders of magnitude larger than Enamine REAL (6.5B compounds). Increasing either the number of fragments or the maximum number of reactions enlarges the state space much further.

Comparison with Existing Methods

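The paper compares against GraphGA, SyntheMol, casVAE, FGFN (a fragment-based GFlowNet), and FGFN+SA (FGFN with SAScore added to the reward) as baselines. Of these, SyntheMol and casVAE are reaction-based methods, while the others do not explicitly enforce synthesizability.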

Figure 3: Distributions of rewards across different tasks.

The figure above shows the distribution of rewards found by each method on four oracles: the sEH proxy, the senolytic proxy, the DRD2 proxy, and ClpP docking. The paper reports that RGFN's average reward is somewhat lower than that of GraphGA, which does not enforce synthesizability, but that RGFN outperforms the reaction-based methods SyntheMol and casVAE. RGFN also attains average rewards similar to or higher than FGFN. In particular, on the sparse-reward senolytic discovery task, FGFN finds almost no high-reward molecules while RGFN finds a variety of senolytic candidates. The gap between RGFN and FGFN+SA is larger still, which the authors interpret as evidence that adding a synthesizability constraint to FGFN reduces its ability to find high-reward molecules.

Figure 4: Number of discovered modes as a function of normalized iterations. Log scale used.

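The figure above shows the number of discovered modes as a function of normalized iterations. A mode is defined as a molecule whose reward is at or above a threshold and whose Tanimoto similarity to every other mode is below 0.5; modes are computed with the Leader algorithm. The paper notes that FGFN is the strongest at mode discovery even though its average reward is lower, and suggests this may be because RGFN's relatively small number of fragments and reactions leads to somewhat lower sample diversity. RGFN nevertheless outperforms GraphGA, SyntheMol, and casVAE on all tasks, including senolytic discovery.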
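
The mode definition above can be sketched as a Leader-style greedy pass. The details here are assumptions made for illustration: fingerprints are represented as sets of on-bit indices, candidates are visited in order of decreasing reward, and the molecule names, rewards, and bit sets are made up. The paper's exact fingerprint type and visiting order may differ.

```python
def tanimoto(a, b):
    """Tanimoto similarity between two fingerprints given as sets of on-bits."""
    union = len(a | b)
    return len(a & b) / union if union else 1.0


def find_modes(candidates, reward_threshold, similarity_cutoff=0.5):
    """Greedy Leader-style selection.

    candidates -- iterable of (name, reward, fingerprint_set) tuples
    A candidate becomes a mode if its reward reaches the threshold and its
    similarity to every already accepted mode is below the cutoff.
    """
    modes = []
    eligible = [c for c in candidates if c[1] >= reward_threshold]
    for name, reward, fingerprint in sorted(eligible, key=lambda c: -c[1]):
        if all(tanimoto(fingerprint, mode[2]) < similarity_cutoff for mode in modes):
            modes.append((name, reward, fingerprint))
    return modes


# Hypothetical candidates: m2 is too similar to m1 (3 shared bits out of 5),
# and m4 falls below the reward threshold.
candidates = [
    ("m1", 8.2, {1, 2, 3, 4}),
    ("m2", 8.0, {1, 2, 3, 5}),
    ("m3", 7.5, {6, 7, 8}),
    ("m4", 6.0, {9, 10}),
]
modes = find_modes(candidates, reward_threshold=7.0)
print([name for name, _, _ in modes])
```

Counting the length of `modes` at successive training checkpoints would give a curve of the kind plotted in the figure.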

Table 1: Average values of synthesizability-related metrics for top-k modes.

The table above reports synthesizability-related metrics for the top-k modes. Molecular weight, QED, and SAScore are computed over the top-500 modes, and the AiZynthFinder retrosynthesis score over the top-100 modes. The paper reports that RGFN achieves synthesizability scores comparable to SyntheMol and casVAE and substantially better than GraphGA and FGFN. FGFN+SA, which adds SAScore to the reward, improves SAScore but not the AiZynthFinder score to any large degree, which the authors interpret as showing that SAScore alone is not enough to ensure synthesizability. In addition, all RGFN modes were inspected by an expert chemist and confirmed to be synthesizable.

Scaling to Larger Fragment Libraries

Figure 5: The number of discovered Murcko scaffolds with sEH proxy value above 7 (a) and 8 (b) as a function of fragment library size. We compare standard independent embeddings of fragment selection actions (blue) with our fingerprint-based embeddings (orange) that account for the fragments' chemical structure. The number of scaffolds is reported after 2k training iterations for 3 random seeds (the solid line is the median, while the shaded area spans from minimum to maximum values). We observe that our approach greatly outperforms independent embedding when scaling to a larger action space.

The figure above shows the number of discovered Murcko scaffolds as a function of fragment library size. The paper claims that, when scaling to a large action space, the fingerprint-based embedding converges much faster than the standard independent embedding and discovers more high-reward scaffolds. The gap between the two approaches widens as the library grows.

Examination of Generated Ligands

Figure 6: UMAP plot of chemical structures of top-500 modes generated for each target. RGFN generates sufficient chemical diversity to produce distinct clusters of compounds. See Appendix G for description of each target protein.

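The figure above is a UMAP visualization of the extended-connectivity fingerprints of the top-500 modes generated for each docking target. The ligands for each target form distinct clusters, which the paper takes as evidence that RGFN generates molecules with sufficient chemical diversity per target. The results for the sEH proxy and for sEH docking also differ structurally, which the authors interpret as a sign that the proxy model may not approximate the docking score well.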

Figure 7: Top docked RGFN ligands after filtering steps (blue) overlaid with the PDB-derived ligand (purple) for each of sEH, ClpP, and Mpro.

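The figure above overlays the top docked ligands generated by RGFN (blue) with the PDB-derived ligands (purple) in the binding pocket. The paper claims that the molecules generated by RGFN adopt realistic docking poses similar to those of known ligands while remaining structurally diverse.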

Limitations

The paper explicitly acknowledges the following limitations.

The current implementation uses only 17 reaction types and 350 building blocks, so it covers only part of the possible drug-like space. The scaling experiments show that the number of building blocks can be increased, but greater reaction diversity is still needed.
The current building blocks and reactions tend to produce linear, flat molecules. The authors expect that adding cyclization reactions such as peptide macrocyclization or ring-closing metathesis, as well as reactions that introduce sp3-hybridized atoms and stereochemistry, would improve shape diversity and potency.
RGFN provides the sequence from reactants to product, but it does not explicitly generate a synthetic route in the strict sense (reaction conditions, external reagents, catalysts, protecting group strategy, and so on).
Molecular docking, used as the scoring oracle, has limitations of its own. Docking scores correlate strongly with molecular weight and do not capture requirements such as drug-likeness, optimal MW, or ClogP, and the predicted binding affinities correlate only weakly with experimental values and depend heavily on the target binding site. As future directions, the authors propose combining RGFN with multi-fidelity frameworks, ensemble docking, MM-PBSA, FEP, or an active learning loop coupled to the wet lab.

Conclusion

The paper concludes that RGFN, a GFlowNet operating in the action space of chemical reactions, can generate synthesizable molecules from a state space orders of magnitude larger than existing screening libraries by using curated high-yield reactions and low-cost building blocks. The authors claim that RGFN achieves rewards close to those of GraphGA while substantially improving synthesizability, and that it outperforms the fragment-based GFlowNet (FGFN). They also state that the action embedding mechanism enables scaling to large building block spaces.

Looking ahead, the paper suggests that in an active learning pipeline coupled to the wet lab, RGFN could reduce reliance on inaccurate docking oracles, and that the ease of synthesis afforded by cheap fragments and high-yield reactions could make it a promising alternative to high-throughput screening.

Reference

Koziarski, Michał, et al. “RGFN: Synthesizable Molecular Generation Using GFlowNets.” Advances in Neural Information Processing Systems 37 (2024): 46908-46955.