March 12, 2024

Bioptimus: Biology’s Swiss Army Knife

Foundation models (FMs), such as large language models (LLMs), have been the catalyst for recent AI breakthroughs across natural language processing (NLP) and computer vision, yielding unprecedented performance in a wide range of applications.

They quickly caught the attention of the general public with the release of ChatGPT, where FMs demonstrated an impressive ability to extract deep concepts from raw data. To some academics, they even exhibit sparks of artificial general intelligence (AGI).

Now, imagine the FM for biology

Life is a remarkable biological organization system – an astonishing journey from the microscopic to the macroscopic. It emerges from atoms coming together to form molecules, which collaborate to construct the building blocks of cells; these cells, in turn, harmonize their functions to create tissues with specialized roles, creating the foundation for organs and organ systems.

Our mission is to build the reference multi-scale foundation model of biology, creating a universal computational model to capture the complexity of biology across scales in order to unlock new AI capabilities to propel advancements in fundamental science and innovations in biomedicine and biotechnology.

Foundation models are the future of biology

Biology and biomedical sciences are fields that are ripe for disruption. Technological advances such as DNA sequencing, proteomics and medical imaging have enabled the generation of massive datasets describing biological systems at various scales, from molecules to entire organisms. 

Building on this data, the scientific community has started to explore FMs in biology, with very promising early successes, including predicting the structure and functions of proteins. However, current FMs proposed for biological data remain limited in size and complexity. They are far from exhausting the benefits of scaling observed for natural language or images. Current models are also limited to a single biological scale, and little effort has yet gone into capturing the full multi-scale complexity of biology.

Therefore, there is immense value in being a “pure player” in FMs, especially for biology, due to the expertise and cost that are specific to training very large, multi-scale models. We’ve seen this with the progress of fundamental machine learning in recent years and with the industry's readiness to leverage “pure play” FMs to accelerate research.

Bioptimus: biology’s Swiss Army Knife

Enabling the scientific community to gain a holistic understanding of biology across scales is critical to unravel disease biology, discover new drugs and develop better diagnostic tools. At Bioptimus, we believe these FMs have the ability to capture how different scales of biology regulate and interact with each other, coming together to create biology’s ‘Swiss Army knife’.

For any scientist or organization studying biological systems, FMs like these will help to more efficiently map experimental and biomedical data at any scale and from any modality into a coherent computational representation. In turn, this representation will power numerous downstream applications such as predicting the evolution of a disease, or the response of a patient to an existing or new candidate treatment.

To make Bioptimus a reality, we have brought together a unique group of top engineers and research scientists, with vast experience at the interface of machine learning and biology from companies such as Google DeepMind, Novartis and Owkin. 

With this expertise, we’ll develop multi-scale FMs combining genomic, protein, tissue, and whole-body-level representations of patients. We will use these FMs to support industry partners and the entire research community to accelerate the next generation of novel therapeutics and biotechnological innovations, ultimately making a positive impact on humanity.

Code, Compute, Create

Leveraging attention-based models, we’ll create contextual representations of biological entities that interact with each other, and connect said representations across scales. This will function in the same way LLMs create the representation of a text from the representations of the words or tokens it contains and their interactions (see fig 1).
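To make the analogy concrete, here is a minimal sketch of scaled dot-product self-attention in plain Python. It is an illustration only, not Bioptimus's architecture: the function names (`softmax`, `self_attention`, `pool`), the toy embeddings, and the use of identity projections in place of learned query, key and value weights are all assumptions made for brevity.

```python
import math


def softmax(scores):
    """Turn raw scores into weights that sum to 1 (numerically stable)."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def self_attention(embeddings):
    """Scaled dot-product self-attention.

    Assumption: queries, keys and values are the embeddings themselves
    (identity projections). A trained model would learn these projections.
    """
    dim = len(embeddings[0])
    scale = math.sqrt(dim)
    contextual = []
    for query in embeddings:
        # How strongly this entity attends to every entity, itself included.
        scores = [
            sum(q * k for q, k in zip(query, key)) / scale
            for key in embeddings
        ]
        weights = softmax(scores)
        # Contextual representation: attention-weighted mix of all entities.
        contextual.append([
            sum(w * value[d] for w, value in zip(weights, embeddings))
            for d in range(dim)
        ])
    return contextual


def pool(vectors):
    """Mean-pool entity vectors into one vector for the enclosing unit."""
    count = len(vectors)
    return [sum(v[d] for v in vectors) / count for d in range(len(vectors[0]))]


# Hypothetical 2-dimensional embeddings for three interacting entities
# (words in a sentence, or by analogy residues in a protein).
tokens = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
contextual = self_attention(tokens)
sequence_repr = pool(contextual)
```

Under these assumptions, each row of `contextual` blends an entity with the entities it attends to, and `sequence_repr` summarizes the whole group. One plausible way to connect scales is to treat such a pooled vector as a single token at the next scale up, although the post does not specify how Bioptimus does this.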

The first step in creating these models is to source the right data from large datasets at each scale of biology, including DNA, RNA, protein, tissue and electronic health records. For this, we stand on the shoulders of giants. Thanks to key partnerships, we’ll benefit from seven years of Owkin's expertise and a team of 50+ partnership managers who have built strong relationships with 180 key opinion leaders (KOLs) and 30 top-tier institutions across the world. And this network keeps expanding.

The second step is leveraging MOSAIC, a $50 million initiative uniting top academic centers and industry partners to create the world’s largest spatial omics and multimodal dataset in oncology from 7,000 patients across seven cancer indications. Most importantly, the cornerstone of MOSAIC is spatial transcriptomics, part of the new wave of techniques that allow us to connect the fundamental scales of biology.

In order to achieve the computing power necessary to train and serve the FMs, Bioptimus will have access to an exclusive partnership between Owkin and Amazon Web Services (AWS). The collaboration provides improved volume and reliability of supply of best-in-class GPUs, and data scientists will benefit from enhanced workspace performance and data science tooling.

This solution will offer enhanced data security measures, scalable storage options, and robust computing capabilities, empowering scientists and researchers to access, analyze, and manage vast amounts of data efficiently and securely.

The way forward

The application of foundation models in biology is set to have a profound impact in science and society. We’re committed to unlocking the potential of foundation models in this field, helping us improve humanity through a better understanding of biology as a whole.

Through Bioptimus, the whole biology of a human, from proteins to tissues, will be encoded in the Bioptimus Multiscale Representation. By harnessing the power of foundation models and advanced algorithms trained on massive amounts of biological data across scales, we will quantitatively capture the laws of biology that have so far remained too complex to be properly understood.

Are you looking to experiment and build on top of FMs to create breakthroughs in drug discovery, disease understanding and diagnostics? Do you want to work with some of the best minds in science with unrivaled access to multilevel data? Then we want to hear from you! We are looking forward to connecting with the vibrant community of researchers and industry stakeholders.

To find out how you can be a part of Bioptimus, visit our website.
For more information on the launch of Bioptimus, view our press release here.
Author
The Bioptimus Team