Unified sampling framework and experimental benchmarking of sequence- and structure-based protein models

Abstract

Generative models are increasingly used for protein design, but the lack of standardized evaluation frameworks limits comparison across model classes and hinders translation to experimental success. We developed a unified framework for sequence generation and benchmarking across multiple model types and tested it on Tobacco etch virus (TEV) protease. Our experiments revealed substantial variation in performance, with machine learning-designed libraries achieving higher hit rates than conventional methods. Structure-based models performed best overall. Commonly used selection metrics did not correlate strongly with experimental activity, underscoring the importance of experimental validation in protein model development.

Publication
In bioRxiv
Sam Berry
Postdoctoral Fellow

Postdoctoral fellow at the Wellcome Sanger Institute studying the mutational landscapes of proteins.