“The main challenge of solving these problems is exploring the huge combinatorial search space of token sequences. Therefore, the solution lies in focusing the search on the most promising token sequences.”
– Felipe Leno da Silva
Reinforcement learning (RL) is an important branch of machine learning that enables artificial intelligence (AI) algorithms to make decisions in complex scenarios. RL models learn from the consequences of their actions, identifying which interactions with an environment yield rewards. The process is analogous to an athlete observing how other players behave after she attempts to score, then adjusting her positioning to optimize subsequent attempts.
Traditional RL methods learn a decision-making strategy that maximizes expected or average rewards. However, for some problem types, one might be interested in maximizing the best-case rewards, even if the strategy doesn’t perform as well on average—for example, trying to set a new high score at an arcade. An LLNL research team has developed a framework known as deep symbolic optimization (DSO) that adapts RL to learn these best-case rewards. In DSO, the team breaks down task solutions into sequences of discrete “tokens,” or building blocks. A sequence of tokens represents a possible solution to a symbolic optimization problem, and the goal is to find the sequence that optimizes a quality metric (i.e., the reward).
The original application of DSO was on a problem called symbolic regression, which aims to learn concise mathematical expressions to explain a scientific dataset. In symbolic regression, the token sequences represent mathematical expressions, and the reward is based on how well the expression fits the dataset. “Solving symbolic regression efficiently might support the discovery of equations to describe all sorts of natural phenomena for which we can generate experimental data—in addition to being an excellent benchmark problem for various relevant applications that can be modeled in a similar way,” says computational engineer Felipe Leno da Silva. “The main challenge of solving these problems is exploring the huge combinatorial search space of token sequences. Therefore, the solution lies in focusing the search on the most promising token sequences.”
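To make the token-sequence formulation concrete, the sketch below shows one common way such a scheme can work (this is an illustrative minimal example, not the DSO library's API; the token names, prefix-notation encoding, and reward function are assumptions): a sequence of discrete tokens is decoded into a mathematical expression, and its reward measures how well that expression fits the data.

```python
import numpy as np

def evaluate_prefix(tokens, x):
    """Evaluate a prefix-notation (Polish) token sequence at input array x.
    Consumes tokens from the front of the list as it recurses."""
    token = tokens.pop(0)
    if token == "add":
        return evaluate_prefix(tokens, x) + evaluate_prefix(tokens, x)
    if token == "mul":
        return evaluate_prefix(tokens, x) * evaluate_prefix(tokens, x)
    if token == "sin":
        return np.sin(evaluate_prefix(tokens, x))
    if token == "x":
        return x
    raise ValueError(f"unknown token: {token}")

def reward(tokens, x, y):
    """Reward: higher when the expression fits the data better.
    Here, 1 / (1 + normalized RMSE), a common bounded fitness measure."""
    y_hat = evaluate_prefix(list(tokens), x)  # copy: evaluation consumes tokens
    nrmse = np.sqrt(np.mean((y - y_hat) ** 2)) / (np.std(y) + 1e-12)
    return 1.0 / (1.0 + nrmse)

x = np.linspace(0.0, 1.0, 100)
y = np.sin(x) + x * x                          # data to be explained
tokens = ["add", "sin", "x", "mul", "x", "x"]  # encodes sin(x) + x*x
print(round(reward(tokens, x, y), 3))          # exact fit, so reward is 1.0
```

The search problem is then to find, among all valid token sequences, the one whose decoded expression maximizes this reward.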
Underlying DSO is a recurrent neural network (RNN) that generates tokens sequentially and learns to recognize promising token sequences. The algorithm’s “risk-seeking” policy evaluates each expression by its reward, then keeps only a top percentage of samples for subsequent training iterations, so the model learns to optimize best-case rather than average rewards. Furthermore, users can incorporate domain knowledge by specifying priors and constraints that, respectively, bias and prune the search space during training.
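The elite-filtering step at the heart of the risk-seeking policy can be sketched as follows (a minimal illustration, assuming a simple quantile threshold; the function name and epsilon value are hypothetical, not DSO's actual interface): only samples whose rewards fall in the top fraction contribute to the next gradient update.

```python
import numpy as np

def risk_seeking_filter(rewards, epsilon=0.05):
    """Keep only the top-epsilon fraction of sampled rewards.
    The policy gradient is then computed from these elite samples alone,
    steering training toward best-case rather than average performance."""
    rewards = np.asarray(rewards, dtype=float)
    threshold = np.quantile(rewards, 1.0 - epsilon)  # (1 - epsilon) quantile
    elite_mask = rewards >= threshold
    return elite_mask, threshold

# Example: rewards for a batch of 1000 sampled token sequences
rng = np.random.default_rng(0)
rewards = rng.random(1000)
mask, thr = risk_seeking_filter(rewards, epsilon=0.05)
print(mask.sum())  # roughly 50 elite samples retained
```

Because only the elite samples shape the gradient, the network is never penalized for the many low-reward sequences it inevitably samples while exploring; it is rewarded solely for pushing its best candidates higher.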
Brenden Petersen, who leads the DSO team and LLNL’s Optimization and Control group, explains, “DSO leverages the representational capacity of deep neural networks to navigate complex combinatorial search spaces. That is, it uses large models (with many thousands of parameters) to search the space of small objects (composed of tens of building blocks).”
The DSO problem-solving method has achieved state-of-the-art performance on symbolic regression when tested against baseline methods including Eureqa, a commercial product considered to be the gold standard for symbolic regression. More recently, a new version of the algorithm won the first-ever worldwide symbolic regression competition, held at the 2022 Genetic and Evolutionary Computation Conference.
These positive results have motivated further algorithmic developments. Leno da Silva states, “We’re exploring a fruitful research avenue in DSO performance by combining it with genetic algorithms and by improving other aspects of the framework, such as leveraging language models to provide better representations to the RNN.”
The major advantage of the DSO framework is its applicability beyond symbolic regression. Petersen points out, “Our framework applies to any discrete sequence optimization problem where the user may want to incorporate some knowledge into the search. We are already applying it to other tasks, such as finding interpretable RL policies or optimizing amino acid sequences.” Other active areas of research using DSO include healthcare decision making, power converter design and optimization, and antibody therapeutics development.
In one application, DSO also enables the learning of symbolic policies, which provide readability and interpretability for RL solutions. In other words, DSO provides RL solutions in a more trustworthy and verifiable form. For example, in a collaboration with the University of Vermont, the DSO team demonstrated an RL pipeline that can aid in treatment discovery for sepsis. Using a sepsis simulation dataset and approaching treatment as a sequential decision-making problem, researchers trained a model to decide which cytokine-mediating drugs to administer, at which doses, and at which clinically relevant time scales.
Building on the insight that natural language processing models handle sentences much as DSO handles token sequences, new research led by Leno da Silva leverages language models to learn symbolic optimization solutions more efficiently. The team significantly accelerated learning on a challenging antibody therapeutics development simulation, a result that could support breakthroughs in AI-assisted antibody optimization.
LLNL’s DSO research began during a project funded by the Laboratory Directed Research and Development (LDRD) program. Current research, including exploration of new applications, is supported by a combination of funding from LDRD, the National Institutes of Health, and the Advanced Research Projects Agency-Energy.
In addition to Petersen and Leno da Silva, the DSO team includes Mikel Landajuela, T. Nathan Mundhenk, Jiachen Yang, Chak Lee, Claudio Santiago, Ruben Glatt, Jacob Pettit, Andre Goncalves, Denis Vashchenko, and Dan Faissol, along with former LLNL staff Joanne Kim, Sookyung Kim, and Sam Nguyen. Data Science Summer Institute students Garrett Mulcahy, Haoyu Niu, and Milena Rmus also contributed. The team’s work has been widely recognized at top-tier machine learning conferences and a recent competition:
- SRBench Competition on Interpretable Symbolic Regression for Data Science 2022
  - Petersen et al.: 1st place, Real-World track
- Adaptive and Learning Agents Workshop (ALA) 2022
  - Leno da Silva et al.: Leveraging language models to efficiently learn symbolic optimization solutions
- International Conference on Learning Representations (ICLR) 2021
- International Conference on Machine Learning (ICML) 2021
  - Landajuela et al.: Discovering symbolic policies with deep reinforcement learning
  - Petersen et al.: Incorporating domain knowledge into neural-guided search via in situ priors and constraints
  - Pettit et al.: Learning sparse symbolic policies for sepsis treatment
- Conference on Neural Information Processing Systems (NeurIPS) 2021
- International Joint Conference on Artificial Intelligence (IJCAI) 2020