SPARROW: An algorithmic framework for synthetic cost-aware decision making in molecular design

Introduction

The paper describes small molecule discovery as an iterative design–synthesize–test cycle, and identifies prioritization, i.e. deciding which candidate molecules to actually synthesize and evaluate in each cycle, as the key challenge.

The authors point out that existing prioritization methods fail to handle cost and utility simultaneously and accurately. Generative models often propose molecules that are hard to synthesize, and filters based on synthetic accessibility scores evaluate each compound individually, so they miss the cost savings that come from intermediates and starting materials shared across a batch synthesis. Cost-aware Bayesian optimization likewise assumes a single scalar cost per experiment, so it cannot express batch-level non-additive cost.

To address this, the paper proposes SPARROW (Synthesis Planning And Rewards-based Route Optimization Workflow), an algorithmic decision-making framework. SPARROW uses multi-objective optimization to prioritize candidate molecules and hypothetical synthetic routes simultaneously while balancing cost against utility.

The authors emphasize that SPARROW's novelty lies in its mathematical formulation of a graph-based optimization. They argue that this makes it possible to model quantitatively the part of simultaneous molecule and synthetic route selection that has so far been left to expert intuition.

The paper validates SPARROW's capabilities through three case studies, each demonstrating one of the following.

Balancing information gain against synthetic cost
Capturing the non-additivity of synthetic cost across a batch of molecules
Scaling to candidate libraries of hundreds of molecules

Method

Definition of Candidate Design Space

Figure 1: Overview of SPARROW and its role within the molecular design cycle. Each molecule in a candidate set, comprising molecular ideas from any combination of algorithmic or expert sources, is annotated with its anticipated properties and potential synthetic routes. SPARROW then weighs the utility of every candidate against their synthetic costs, not one-by-one but as a batch, and selects an optimal subset of candidates for synthesis and testing.

The paper explains that SPARROW performs its optimization on a retrosynthetic graph. The retrosynthetic graph is a directed bipartite graph made up of two kinds of nodes: reaction nodes and compound nodes.

The parents of a reaction node are its reactants, and its child is the reaction product.
The parents of a compound node are the reactions that produce it, and its children are the reactions that consume it.

The retrosynthetic graph is stored as a JSON file, and candidate reactions are collected from chemist-curated routes or from a retrosynthesis model. Candidate reactions do not need to be organized into valid synthetic routes beforehand; SPARROW's optimization constraints handle the selection of valid pathways automatically.

The optimization needs three further pieces of information: compound buyability and cost, the likelihood of reaction success, and reaction conditions. Each buyable compound is given a dummy parent reaction node, which reduces the choice between buying and synthesizing to the same reaction selection problem. Reaction conditions come from the ASKCOS context recommender, reaction success probabilities from the ASKCOS forward predictor model, and compound prices from the ChemSpace API.
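
The graph description above can be made concrete with a small data-structure sketch. The field names, the `buy_` prefix, and the toy compounds are illustrative assumptions, not SPARROW's actual JSON schema.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Reaction:
    rid: str              # reaction node id (hypothetical naming)
    reactants: tuple      # parent compound nodes
    product: str          # child compound node
    score: float = 1.0    # assumed likelihood of success
    price: float = 0.0    # only meaningful for dummy reactions
    dummy: bool = False


def add_dummy_reactions(reactions, prices):
    """Give every buyable compound a dummy parent reaction with no reactants."""
    dummies = [Reaction(f"buy_{cid}", (), cid, price=p, dummy=True)
               for cid, p in prices.items()]
    return list(reactions) + dummies


def parents_of_compound(reactions, cid):
    """Reactions that produce `cid` (the compound node's parents)."""
    return [r.rid for r in reactions if r.product == cid]


def children_of_compound(reactions, cid):
    """Reactions that consume `cid` (the compound node's children)."""
    return [r.rid for r in reactions if cid in r.reactants]


# Toy graph: intermediate I can be made from A and B, or bought directly.
graph = add_dummy_reactions(
    [Reaction("r1", ("A", "B"), "I", score=0.9),
     Reaction("r2", ("I",), "T", score=0.8)],
    prices={"A": 1.0, "B": 2.0, "I": 15.0},
)
print(parents_of_compound(graph, "I"))   # synthesis and purchase both appear
print(children_of_compound(graph, "I"))
```

Because the purchase of I is just another parent reaction of the compound node, choosing between buying and making I needs no special case.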

Decision Variables and Constraints

SPARROW optimizes over two kinds of binary decision variables: $c_j$ indicates whether compound node $j$ is selected, and $r_i$ indicates whether reaction node $i$ is selected.

Figure 2: SPARROW's problem formulation. A nonlinear objective function is defined that maximizes the expected reward per unit cost of selected candidates and routes. We currently simplify this into a tractable objective function that balances utility and cost through a weighted sum. Three constraints are included to ensure that selected compounds have reactions to produce them, selected reactions have reactants to run them, and cycles are forbidden.

Three constraints are introduced to guarantee valid synthetic routes.

Parent node constraint $c_j \geq r_i \;\forall j \in \mathcal{P}_i, i \in \mathcal{R}$: if a reaction is selected, all of its parent compounds, i.e. its reactants, must also be selected.
Compound node constraint $\sum_{i \in \mathcal{P}_j} r_i \geq c_j \;\forall j \in \mathcal{C}$: if a compound is selected, at least one of the parent reactions that produce it must be selected. For a buyable material, the dummy parent node is selected whenever SPARROW judges the purchase worthwhile, so the optimizer can choose the better of buying and synthesizing.
Cycle prevention constraint $\sum_{i \in \mathcal{Y}} r_i \leq \text{length}(\mathcal{Y}) - 1 \;\forall \mathcal{Y}$: no cycle $\mathcal{Y}$ in the retrosynthetic graph may be selected in its entirety. This excludes circular routes that do not originate from buyable compounds.
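
The three constraints can be read as a validity check on a given selection. The sketch below is a minimal illustration on a hypothetical toy graph; instead of enumerating cycles explicitly, it checks that the selected reactions admit a topological order, which should reject a selection whenever some cycle is selected in full. It is not how SPARROW itself encodes the constraints.

```python
from collections import deque

# rid -> (reactant compounds, product compound); toy data, all hypothetical
REACTIONS = {
    "buy_A": ((), "A"),
    "buy_B": ((), "B"),
    "r1": (("A", "B"), "T"),
    "r2": (("T",), "X"),
    "r3": (("X",), "T"),   # r2 and r3 form a cycle T -> X -> T
}


def is_valid(sel_compounds, sel_reactions):
    chosen = {rid: REACTIONS[rid] for rid in sel_reactions}
    # Parent node constraint: every reactant of a selected reaction is selected.
    for reactants, _ in chosen.values():
        if any(c not in sel_compounds for c in reactants):
            return False
    # Compound node constraint: every selected compound has a selected producer.
    produced = {product for _, product in chosen.values()}
    if any(c not in produced for c in sel_compounds):
        return False
    # Cycle prevention: the selected reactions must admit a topological order.
    successors = {a: [b for b in chosen if chosen[a][1] in chosen[b][0]]
                  for a in chosen}
    indegree = {b: 0 for b in chosen}
    for a in chosen:
        for b in successors[a]:
            indegree[b] += 1
    queue = deque(r for r, d in indegree.items() if d == 0)
    ordered = 0
    while queue:
        a = queue.popleft()
        ordered += 1
        for b in successors[a]:
            indegree[b] -= 1
            if indegree[b] == 0:
                queue.append(b)
    return ordered == len(chosen)


print(is_valid({"A", "B", "T"}, {"buy_A", "buy_B", "r1"}))  # grounded route
print(is_valid({"A", "T"}, {"buy_A", "r1"}))                # reactant B missing
print(is_valid({"T", "X"}, {"r2", "r3"}))                   # ungrounded cycle
```

The last call shows why the cycle constraint is needed: the selection {r2, r3} satisfies the first two constraints on its own, yet it never touches a purchasable compound.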

Linear Objective Function

The paper explains that the ideal problem formulation is a nonlinear objective that exactly captures the expected information gain per unit cost.

\[\arg\max_{\mathbf{c}, \mathbf{r}} \frac{\sum_{j \in \mathcal{T}} c_j U_j \prod_{i \in \mathcal{R}_j} L_i}{\text{cost}\left(\{\mathcal{R}_j \,\forall j \in \mathcal{T} : c_j = 1\}\right)}\]

Here $\mathcal{T}$ is the full set of candidate molecules, $U_j$ is the reward of candidate $j$, $\mathcal{R}_j$ is the set of reactions selected to produce candidate $j$, and $L_i$ is the probability that reaction $i$ succeeds. The expected reward of a candidate is its reward $U_j$ multiplied by the probability of successful synthesis, $\prod_{i \in \mathcal{R}_j} L_i$.
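
As a small numeric illustration of the numerator, the sketch below computes the expected reward of one candidate from its reward and the success likelihoods along its route. The numbers are made up, and `expected_reward` is a hypothetical helper name.

```python
import math


def expected_reward(reward, route_scores):
    """U_j times the product of L_i over the reactions in the route."""
    return reward * math.prod(route_scores)


# Same candidate (U = 0.8) reached by a two-step and a four-step route.
two_step = expected_reward(0.8, [0.9, 0.7])
four_step = expected_reward(0.8, [0.9, 0.7, 0.9, 0.7])
print(round(two_step, 3), round(four_step, 3))
```

Each additional uncertain step multiplies the expected reward down, which is why longer routes need a higher reward to stay competitive.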

However, this objective is nonlinear in the decision variables, which makes a global optimum hard to guarantee and the problem expensive to solve. SPARROW therefore adopts a scalarized linear objective function that maximizes cumulative reward while simultaneously minimizing synthetic cost and reaction failure risk.

\[\arg\min_{\mathbf{c}, \mathbf{r}} \quad -\lambda_1 \sum_{j \in \mathcal{T}} c_j U_j + \lambda_2 \sum_{i \in \mathcal{S}} D_i r_i + \lambda_3 \sum_{i \in \mathcal{R}} \min\{L_i^{-1}, 20\} r_i\]

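The first term maximizes the cumulative reward of the selected candidates. This formulation assumes rewards are independent, so the paper notes explicitly that marginal information gain, molecular diversity, and matched molecular pairs are not captured. The second term minimizes the purchase cost of starting materials, where $\mathcal{S}$ is the set of dummy reaction nodes and $D_i$ is the price of the starting material corresponding to dummy reaction $i$. The costs of solvents, reagents, and catalysts are not included. The third term minimizes a penalty on the selected reactions, which reduces both the total number of reaction steps and the failure risk: each selected reaction contributes $\min\{L_i^{-1}, 20\}$, so less likely reactions cost more, up to a cap of 20. The cost of each reaction is assumed to be constant and independent of the other reactions.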
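
The per-reaction penalty in the third term is easy to tabulate. The sketch below is a minimal illustration; treating a non-positive likelihood as the capped value is an assumption added here to avoid division by zero, not something stated in the paper.

```python
def reaction_penalty(score, cap=20.0):
    """Penalty min(1/L, cap) for a reaction with success likelihood L."""
    if score <= 0:          # assumption: treat L <= 0 as the capped penalty
        return cap
    return min(1.0 / score, cap)


for likelihood in (0.9, 0.5, 0.05, 0.01):
    print(likelihood, round(reaction_penalty(likelihood), 3))
```

Every reaction costs at least 1, so the term counts steps, and the cap keeps one very unlikely reaction from dominating the whole objective.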

$\boldsymbol{\lambda} = [\lambda_1, \lambda_2, \lambda_3]$ is the vector of weighting factors that set the relative importance of each objective. The paper explains that a scalarized objective was chosen over Pareto optimization because SPARROW ultimately uses only one solution. Running SPARROW repeatedly with different weighting factors approximates the Pareto front.
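
One way to turn several runs into an approximate Pareto front is to collect the outcome of each run and discard the dominated ones. The sketch below does this for two objectives (higher reward, lower cost); the run labels and numbers are hypothetical, not results from the paper.

```python
def pareto_front(solutions):
    """Keep solutions not dominated in (higher reward, lower cost)."""
    front = []
    for name, reward, cost in solutions:
        dominated = any(
            r >= reward and c <= cost and (r > reward or c < cost)
            for _, r, c in solutions
        )
        if not dominated:
            front.append(name)
    return front


# (label, cumulative reward, total cost) from hypothetical runs
runs = [
    ("lam1=1", 0.0, 0.0),
    ("lam1=10", 1.0, 5.4),
    ("lam1=20", 1.0, 6.0),   # same reward as lam1=10 at higher cost
    ("lam1=30", 1.6, 17.4),
]
print(pareto_front(runs))
```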

Optimization Solver

The linear optimization problem is solved with PuLP and the open-source COIN-OR Branch and Cut (CBC) solver. The solver's relative tolerance is set to $10^{-7}$ and its absolute tolerance to $10^{-9}$.
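
To see the scalarized objective at work without a solver dependency, the sketch below swaps PuLP and CBC for exhaustive enumeration on a six-reaction toy graph. Everything in it is an assumption for illustration: the graph, prices, scores, and rewards are made up, the reaction penalty is applied only to non-dummy reactions, and validity is checked by reachability from purchasable compounds, which stands in for the three constraints on this acyclic graph.

```python
from itertools import combinations

# rid -> (reactants, product); hypothetical toy graph with a shared intermediate I
REACTIONS = {
    "buy_A": ((), "A"), "buy_B": ((), "B"), "buy_C": ((), "C"),
    "r1": (("A", "B"), "I"),
    "r2": (("I",), "T1"),
    "r3": (("I", "C"), "T2"),
}
PRICE = {"buy_A": 1.0, "buy_B": 2.0, "buy_C": 10.0}   # D_i for dummy reactions
SCORE = {"r1": 0.9, "r2": 0.8, "r3": 0.5}             # L_i for real reactions
REWARD = {"T1": 1.0, "T2": 0.6}                       # U_j for candidates


def runnable(selected):
    """True if every selected reaction can run starting from purchases alone."""
    made, pending = set(), set(selected)
    progress = True
    while pending and progress:
        ready = {r for r in pending if all(c in made for c in REACTIONS[r][0])}
        made |= {REACTIONS[r][1] for r in ready}
        pending -= ready
        progress = bool(ready)
    return not pending


def objective(selected, lam):
    """Scalarized objective: -lam1 * reward + lam2 * material + lam3 * penalty."""
    made = {REACTIONS[r][1] for r in selected}
    reward = sum(u for target, u in REWARD.items() if target in made)
    material = sum(PRICE.get(r, 0.0) for r in selected)
    penalty = sum(min(1.0 / SCORE[r], 20.0) for r in selected if r in SCORE)
    return -lam[0] * reward + lam[1] * material + lam[2] * penalty


def solve(lam):
    """Enumerate all reaction subsets; the empty selection scores 0."""
    best = (0.0, frozenset())
    ids = sorted(REACTIONS)
    for k in range(1, len(ids) + 1):
        for combo in combinations(ids, k):
            if runnable(combo):
                value = objective(combo, lam)
                if value < best[0]:
                    best = (value, frozenset(combo))
    return best


for lam1 in (1, 10, 30):
    value, chosen = solve((lam1, 1, 1))
    targets = sorted(REACTIONS[r][1] for r in chosen if REACTIONS[r][1] in REWARD)
    print(lam1, targets, round(value, 3))
```

In this toy setting, making T1 and T2 together costs less than the sum of making each alone, because `buy_A`, `buy_B`, and `r1` are paid for once. Enumeration is exponential in the number of reactions, so it only serves to illustrate the objective; it is no substitute for an integer programming solver.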

Baseline

The paper compares SPARROW against three selection strategies that do not account for the non-additivity of batch synthetic cost.

For every baseline, a retrosynthetic search is run with ASKCOS for each candidate, the output routes are ranked by plausibility score, and the highest-scoring route whose starting materials are confirmed as buyable on ChemSpace is selected. Candidates are then selected by one of the following three metrics.

Reward only
Synthetic Accessibility (SA) score only
Combined score = reward − SA score
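
The three baseline rankings can be sketched as a top-k selection under different sort keys. The candidate names and values below are hypothetical, SA scores are assumed to run from 1 (easy) to 10 (hard), and whether either quantity is rescaled before being combined is not specified in this summary.

```python
# Hypothetical candidates: (name, reward in [0, 1], SA score)
CANDIDATES = [
    ("m1", 0.9, 3.2),
    ("m2", 0.6, 2.0),
    ("m3", 0.8, 2.1),
    ("m4", 0.3, 1.9),
]


def select(candidates, k, strategy):
    """Return the names of the top-k candidates under one baseline metric."""
    keys = {
        "reward": lambda c: c[1],            # higher reward first
        "sa": lambda c: -c[2],               # lower SA score first
        "combined": lambda c: c[1] - c[2],   # reward - SA score
    }
    ranked = sorted(candidates, key=keys[strategy], reverse=True)
    return [name for name, _, _ in ranked[:k]]


for strategy in ("reward", "sa", "combined"):
    print(strategy, select(CANDIDATES, 2, strategy))
```

Each candidate is scored in isolation here, so none of the three rankings can notice that two candidates share an intermediate or a starting material.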

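For each strategy, multiple solutions are obtained by varying the number of selected compounds. Compound prices were retrieved in March 2024 for the baselines and in October 2023 for SPARROW, but the paper states that the price differences are small and do not meaningfully affect the comparison. The total time cost of the baselines is about 60% lower than SPARROW's: the retrosynthesis time is the same, but the baselines have fewer paths that need price assignment, condition recommendation, and reaction scoring.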

Results

SPARROW builds a reaction network from the candidate target molecules and their synthetic routes, then uses graph-based optimization to select the molecules and routes that optimize the cumulative balance between synthetic cost and utility.

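Here utility means the value of evaluating a molecule's properties. The paper notes that the definition of utility depends on the application and the design stage, and may draw on predicted molecular property values, prediction uncertainty, or the expected contribution to improving a structure-property relationship. The candidate library must be supplied to SPARROW together with a reward value for each candidate molecule.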

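The reward obtained by selecting a molecule also depends on the success of every reaction step required for its synthesis. If any reaction in the route fails, no information is gained, so the paper formulates the expected reward as the product of the reward and the probability of successful synthesis, which reduces the problem to expected reward maximization.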

The paper tests SPARROW in three case studies.

Case 1: Balancing Cost and Utility

The first case uses 14 candidate ASCT2 inhibitors reported by Garibsingh et al. Each molecule's reward is obtained by linearly scaling its binding free energy to a value between 0 and 1.

Figure 3: Demonstration of SPARROW's ability to balance cost and reward on a 14-member candidate library of putative ASCT2 inhibitors. (A, B) Weighting factors were varied to arrive at Pareto fronts that depict the trade-offs between cumulative reward, starting material cost, number of reactions, and reaction scores. SPARROW identifies solutions with cheaper starting materials and higher reaction scores when compared to baselines. (C) Specific solutions for different weighting factors λ are summarized, with the complete set of selected routes for λ = [8, 1, 1] drawn.

By varying the weighting factors $\boldsymbol{\lambda}$, SPARROW obtains distinct Pareto fronts. Figure 3C shows that, for the same candidate library, the number of selected candidates ranges from 3 to 13 depending on $\boldsymbol{\lambda}$.

The sample routes show that SPARROW jointly prioritizes high reward, few-step syntheses, and cheap starting materials, and, where possible, exploits common starting materials and overlapping reaction steps to reduce the total cost of the batch synthesis.

Compared with the baselines, the number of reaction steps is similar, but SPARROW selects routes that use cheaper starting materials and have higher reaction scores. The paper interprets this as an advantage over the baselines in model confidence and likelihood of successful synthesis.

Case 2: Unifying Library-based and De Novo Design

The second case uses 121 candidate molecules proposed by the autonomous molecular discovery platform of Koscher et al. The candidates were generated by a graph completion model that simultaneously optimizes absorption wavelength, lipophilicity, and photo-oxidative stability, and each candidate's reward is derived from its non-dominated rank as $U = (14 - \text{rank})/13$.
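
The rank-to-reward mapping is a one-line linear rescaling in which rank 1 receives a reward of 1. The sketch below writes it with a generic `offset` parameter (a hypothetical name) so that the same function also covers the $U = (17 - \text{rank})/16$ form used in Case 3.

```python
def reward_from_rank(rank, offset):
    """U = (offset - rank) / (offset - 1), so rank 1 maps to a reward of 1."""
    return (offset - rank) / (offset - 1)


# Case 2 form: U = (14 - rank) / 13
print([round(reward_from_rank(rank, 14), 3) for rank in (1, 2, 7)])
# Case 3 form: U = (17 - rank) / 16
print([round(reward_from_rank(rank, 17), 3) for rank in (1, 2, 9)])
```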

Figure 4: Results of SPARROW applied to an autonomous molecular design cycle by Koscher et al. (A, B) SPARROW identifies solutions with higher rewards and lower costs when compared to baselines. (C) Increasing the reward weight λ1 increases both the cumulative reward and the cost of starting materials. (D) Larger λ1 values reduce the relative penalty associated with reaction costs, leading to solutions with more reactions and reactions with lower confidence scores. (E) SPARROW's downselection can be visualized as a network of selected and unselected nodes (λ1 = 5). All examples use λ2 = λ3 = 1.

The paper reports that SPARROW finds solutions with fewer reactions, higher reaction scores, and lower starting material costs than every baseline. Figures 4C and 4D show a clear trade-off: as $\lambda_1$ increases, cumulative reward and starting material cost rise together, the number of reactions grows, and the average reaction score falls. Figure 4E visualizes the candidates, intermediates, dummy nodes, and reaction nodes that SPARROW selects within the full retrosynthetic graph.

Figure 5: SPARROW's proposed routes for Case 2 with λ = [3, 1, 1]. SPARROW selects routes with overlapping reaction steps and intermediates that provide utility themselves. The balance of cost and rewards inherently enables SPARROW to simultaneously propose synthesizing some candidates and buying others that are commercially available. Molecules in blue are starting materials, and those in pink are targets.

Figure 5 shows the solution for $\boldsymbol{\lambda} = [3, 1, 1]$. The paper highlights two features.

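When the candidate set contains buyable compounds, SPARROW proposes purchasing some candidates directly, demonstrating that it weighs the value of buying against synthesizing.
Some candidate molecules themselves serve as intermediates in the generated routes, which the paper interprets as a formal implementation of the “test your intermediates” strategy often emphasized in medicinal chemistry.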

Although every candidate was proposed by a generative model, the paper argues that the selected routes show SPARROW can treat library-based and de novo designed molecules in a unified way.

Case 3: Optimizing over Large Candidate Sets

The third case uses 300 alectinib analogs proposed by the reaction rule-based generative model of Button et al. Rewards are assigned from each analog's rank by similarity to alectinib as $U = (17 - \text{rank})/16$. Because the candidates come from a reaction-based model, they were expected to be highly synthesizable, and ASKCOS identified routes to 215 of the 300 candidates within a 60-second expansion. Routes to the remaining 85 were not found, which the paper attributes to differences in reaction templates and starting material definitions.

Figure 6: Example set of synthetic routes selected by SPARROW for Case 3 using λ = [30, 1, 5]. Synthetic routes are grouped by shared starting materials. SPARROW illustrates that we may tolerate longer synthetic routes to candidates with higher rewards, demonstrating its ability to balance cost and reward. Common starting materials and commercially available candidates are used where possible.

The paper presents this case study as evidence of SPARROW's scalability. As in the earlier case studies, overlapping reaction steps and reused starting materials are identified, and the two longest synthetic routes lead to high-reward molecules. In other words, SPARROW tolerates longer routes for candidates whose value justifies the higher synthesis cost.

The full workflow took about 13 hours: 5 hours for retrosynthesis planning, 4 hours for buyability and cost lookup, and 4 hours for condition recommendation and scoring. The linear optimization problem itself is usually solved within seconds by PuLP and the CBC solver. In edge cases where the $\boldsymbol{\lambda}$ values differ by orders of magnitude, however, numerical instability can stretch the solve time to minutes or hours.

Discussion and Limitations

The paper summarizes SPARROW as a centralized framework for prioritizing synthesis candidates within the molecular design cycle, formalized as an optimization problem that maximizes expected information gain while minimizing synthesis cost. It accepts both model-based and chemist-defined routes, and it allows molecules from different sources to be compared on a common footing.

According to the paper, the three case studies demonstrate the following capabilities.

Balancing synthetic cost through batch effects
Integrating library-based and de novo designed molecules
Deprioritizing high-cost, impractical structures
Offering solutions with different cost/utility balances through adjustable weighting factors

The authors also state directions for future improvement explicitly.

A nonlinear objective function is being considered to relax the assumptions of the current linear objective.
The assumption of independent molecular utilities is acknowledged to be ill-suited to optimizing candidate diversity or identifying matched molecular pairs.
The assumption that reaction costs are constant and independent of other reactions does not adequately capture complex batch-level cost interactions such as reagent costs and compatibility with parallel, high-throughput, or automated synthesis.
Treating synthetic cost as an explicit constraint rather than a minimization objective may suit projects with a well-defined budget better, and could also ease the burden of tuning weighting factors.
Further study is needed on scaling the linear optimization to larger candidate sets.
The authors expect SPARROW's impact to be greatest when combined with advances in enumerative molecular design, retrosynthetic modeling, reaction success prediction, property prediction, and uncertainty quantification.

Summary

SPARROW is a framework that formalizes batch-level prioritization in the molecular design cycle as a linear optimization over a retrosynthetic graph. The paper argues that a scalarized objective covering cumulative reward, starting material cost, and reaction failure risk, together with three graph constraints, captures the non-additive cost of batch synthesis naturally.

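Across the case studies, SPARROW delivered solutions with lower starting material cost, higher reaction confidence, and a similar or smaller number of reactions than the reward-only, SA score-only, and combined score baselines. The paper also stresses that library-based candidates, de novo candidates, and purchasable building blocks can all be handled within the same framework.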

The remaining limitations, including utility independence, reaction cost independence, and weighting factor tuning, are stated explicitly, and the paper expects them to be addressed through nonlinear formulations and advances in reaction network modeling.

References

Fromer, Jenna C., and Connor W. Coley. “An algorithmic framework for synthetic cost-aware decision making in molecular design.” Nature Computational Science 4.6 (2024): 440-450.