A roadmap for safer, explainable protein-design AI
Protein language models are artificial intelligence tools that help engineer proteins with useful properties, including completely new structures never seen before in nature.
Key points
- Focus: protein language models, AI tools that help engineer proteins with useful properties, including completely new structures
- Editorial reading: this is science reporting; whenever possible, verify the cited primary source
The science-journalism coverage adds useful context, while the strongest evidential footing still comes from the underlying data, papers or institutional documentation.
Journal reference: Nature Machine Intelligence (2026).
Explainability through analysis of training data

The technology has huge potential for addressing global challenges, such as synthesizing enzymes that can absorb carbon dioxide from the atmosphere or building new catalysts.
In a new perspective paper published in Nature Machine Intelligence, researchers at the Center for Genomic Regulation (CRG) analyze how "explainable AI," the set of techniques that make a model's decisions interpretable, can be applied to protein language models. The authors write that if you want to understand why an AI model has made a predictive decision about what type of structure or properties a protein has, you first need to ask a series of more fundamental questions.
The first is what training data the model learned from, which, for example, can reveal whether the model has biases that don't account for human genetic diversity.
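The training-data question can be made concrete with a small audit. The sketch below is illustrative only and not from the paper: `audit_composition` and the toy records are hypothetical, standing in for the metadata of a real training corpus.

```python
from collections import Counter

def audit_composition(training_records, field, min_share=0.05):
    """Summarise how training sequences are distributed over a
    metadata field (e.g. source organism) and flag groups whose
    share falls below a threshold -- a crude proxy for dataset bias."""
    counts = Counter(rec[field] for rec in training_records)
    total = sum(counts.values())
    shares = {group: n / total for group, n in counts.items()}
    underrepresented = sorted(g for g, s in shares.items() if s < min_share)
    return shares, underrepresented

# Toy records standing in for a real training corpus.
records = [
    {"id": "P1", "organism": "E. coli"},
    {"id": "P2", "organism": "E. coli"},
    {"id": "P3", "organism": "E. coli"},
    {"id": "P4", "organism": "H. sapiens"},
]

shares, flagged = audit_composition(records, "organism", min_share=0.3)
# Here human sequences make up only 25% of the toy set, so they
# are flagged as under-represented relative to the 30% threshold.
```

In practice the same counting logic would run over millions of sequence records and fields such as organism, protein family or sequencing source, but the principle — explaining a model's behavior by interrogating what it was trained on — is the same.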
A smaller share of studies go a step further, using these insights as a "multitasker," reapplying learned signals to help annotate new proteins or predict additional properties. This is where the technology shifts from being a tool of efficiency to one that provides new insight.
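Reapplying learned signals often means reusing a model's internal representations for a new task. The toy sketch below, which is not from the paper, shows the simplest form of this: annotating a new protein by transferring the label of its nearest neighbour in embedding space. The 3-dimensional embeddings and function labels are hypothetical placeholders for real model representations.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def annotate_by_neighbour(query_embedding, reference):
    """Transfer the label of the most similar reference protein --
    a minimal example of reusing learned representations."""
    best_label, best_sim = None, -1.0
    for label, emb in reference.items():
        sim = cosine(query_embedding, emb)
        if sim > best_sim:
            best_label, best_sim = label, sim
    return best_label, best_sim

# Hypothetical embeddings of annotated proteins.
reference = {
    "kinase": [0.9, 0.1, 0.0],
    "protease": [0.0, 0.8, 0.2],
}

# An unannotated protein whose embedding sits close to the kinases.
label, sim = annotate_by_neighbour([0.85, 0.15, 0.0], reference)
```

Real pipelines replace the nearest-neighbour lookup with trained classifiers over high-dimensional embeddings, but the design choice is the same: the representation learned for one task carries signal that can be reused for another.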
Because this item comes through Phys.org Biology as science journalism, it should be treated as contextual reporting rather than primary evidence. Good science reporting can identify why a result matters, connect it to the wider literature and make technical work readable, but the decisive evidence remains in the original paper, dataset, mission release or technical record. That distinction is especially important when a story is later repeated by aggregators, because repetition increases visibility, not evidential strength.

Original source: Phys.org Biology