IBM Research and Science of Synthesis/Thieme Chemistry collaborate to accelerate discovery in organic chemistry
Stuttgart – In 2018 IBM launched the RXN for Chemistry cloud platform to help synthetic organic chemists predict the outcome of chemical reactions using an artificial intelligence (AI) model called the Molecular Transformer. Today the platform has run nearly 4 million predictions for 26,000 users, and groups around the world have published more than 100 papers using the same underlying technology.
High-quality datasets are a prerequisite for optimal prediction results. Until recently, RXN for Chemistry was trained on 2.5 million records taken from patents and from textbook-like chemical reactions. The baseline models achieved more than 90 percent accuracy when tested on chemistry representative of their training data, but the IBM team knew that, for the platform to become a standard tool in the discovery process, it would need to extend its coverage to additional reaction classes.
Earlier this year IBM Research and Thieme Chemistry incorporated expert synthesis data from Science of Synthesis, Thieme's curated digital publication source on organic chemistry, into RXN for Chemistry. Initial results show that, when tested on Science of Synthesis chemistry, the Thieme-trained models predict the correct reaction outcome twice as often as the baseline models.
Initial results of forward reaction prediction on two reactions from Science of Synthesis.
"The challenge for organic chemists is that there are hundreds of thousands of possible reactions of organic compounds. To address this, we used natural language processing models for all RXN prediction tasks. The RXN models have no built-in chemistry and are not based on codified rules. Every chemical prediction is based on the knowledge learned from the data during training. With AI, cloud and automation, today we can accelerate discovery in organic chemistry by a factor of ten," says Dr. Teodoro Laino, Distinguished Scientist at IBM Research Europe.
RXN for Chemistry applies neural machine translation models to predict the outcome of a chemical reaction, much in the same way an AI model translates text from one language to another. The goal of the platform is to make reliable predictions of reaction outcomes and thus improve synthesis planning in organic chemistry.
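To make the translation analogy concrete, here is a minimal sketch of how a reaction can be turned into a "sentence pair". It assumes that reactions are written as SMILES strings with reactants and products separated by `>>`, and that each side is split into atom-level tokens with a regular expression, as described in the published Molecular Transformer work. The regex, the function names `tokenize` and `to_translation_pair`, and the example reaction are illustrative assumptions, not IBM's code, and the translation model itself is not shown.

```python
import re

# Illustrative atom-level SMILES tokenizer (assumption: modeled on the regex
# described in the Molecular Transformer literature, not taken from RXN).
SMILES_TOKEN_PATTERN = re.compile(
    r"(\[[^\]]+]|Br?|Cl?|N|O|S|P|F|I|b|c|n|o|s|p|\(|\)|\.|=|#|-|\+|\\|/|:|~|@|\?|>|\*|\$|%[0-9]{2}|[0-9])"
)


def tokenize(smiles: str) -> list[str]:
    """Split a SMILES string into tokens: bracket atoms, elements, bonds, ring digits."""
    tokens = SMILES_TOKEN_PATTERN.findall(smiles)
    # Sanity check: the tokens must reassemble the input exactly,
    # otherwise some characters were not recognized by the pattern.
    if "".join(tokens) != smiles:
        raise ValueError(f"untokenizable SMILES: {smiles!r}")
    return tokens


def to_translation_pair(reaction_smiles: str) -> tuple[list[str], list[str]]:
    """Hypothetical helper: split 'reactants>>products' into (source, target) tokens,
    the same shape as a source/target sentence pair in machine translation."""
    reactants, products = reaction_smiles.split(">>")
    return tokenize(reactants), tokenize(products)


if __name__ == "__main__":
    # Esterification: ethanol + acetic acid -> ethyl acetate
    src, tgt = to_translation_pair("CCO.CC(=O)O>>CCOC(C)=O")
    print(src)  # ['C', 'C', 'O', '.', 'C', 'C', '(', '=', 'O', ')', 'O']
    print(tgt)  # ['C', 'C', 'O', 'C', '(', 'C', ')', '=', 'O']
```

In this framing, a forward-prediction model learns to "translate" the reactant token sequence into the product token sequence purely from examples, which is consistent with the statement that the RXN models have no built-in chemistry rules.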
Driving technical innovation with high-quality, diverse, and well-structured data
"Tools for translating from one language to another are only as good as the data on which the algorithms are trained," says Dr. Alain Vaucher, Research Scientist at IBM. "Our assumption is that this is also true for predicting chemical synthesis results: the results depend very much on the underlying data."
"We are pleased to be directly involved in this innovative project, which is of high importance for the chemistry community," says Dr. M. Fiona Shortt de Hernandez, Senior Director Product Management, Strategic Partnerships and Science of Synthesis at Thieme Chemistry. "Six highly renowned organic synthesis experts and their groups have agreed to test the retrained models. Together, this collaboration will help drive the development of state-of-the-art, custom-fit tools for organic chemists," Shortt affirms.
"Integrating the high-quality, curated data from Science of Synthesis will give us a unique opportunity to take the performance of RXN for Chemistry to an unprecedented level. I am very excited to share these preliminary results and curious to see how, in the coming months, they will lead to an improved AI experience for synthetic organic chemists. The collaboration with Thieme is an important landmark between AI solution providers and domain-specific data publishers, with important business opportunities for both," says Laino.
Would you be interested in using IBM RXN for Chemistry, trained on Science of Synthesis, as a cloud service if it should become available later? Please contact ibmrxn@thieme-chemistry.com.
IBM RXN for Chemistry is available for free at: https://rxn.res.ibm.com