I build generative models for DNA design.

PhD researcher at UCL's Barnes Lab. Encode: AI for Science Fellow (Pillar VC × ARIA). Previously Tempus AI, Snorkel AI, Databricks.

Genomic AI has a usability problem. We have foundation models that can read and write DNA, but almost no one outside a handful of ML labs can use them. My research combines DNA language models with reinforcement learning to design synthetic biology constructs that work on the first try, and to make those models accessible to working biologists through human-readable interfaces. The long-term goal is a closed-loop system where models propose sequences, wet-lab experiments validate them, and the results train the next generation of models, collapsing the cost and timeline of biological engineering.

I think computational biology is approaching its ChatGPT moment: the point where interface design, not just model capability, determines who gets to use these tools. I'm building toward that.

Now

Last updated: April 2026.

Research

PlasmidLM

A conditional Mixture-of-Experts language model for plasmid DNA.

PlasmidLM is a generative model trained on hundreds of thousands of natural plasmid sequences. It uses a conditional MoE architecture to specialize across functional sequence classes (origins, selection markers, regulatory elements) and generates novel, biologically plausible constructs from natural-language or structural prompts.

PlasmidRL

GRPO post-training for biologically realistic DNA generation.

PlasmidRL applies GRPO (Group Relative Policy Optimization) reinforcement learning to PlasmidLM, using composite reward signals from sequence alignment, motif scoring, and structural priors. The resulting model's outputs exhibit emergent biological realism: replication origins in the right places, codon usage that matches host organisms, and regulatory architecture that holds up under expert review. Currently under review at ICML 2026.

ChatNAV

An end-to-end neoantigen vaccine design pipeline.

ChatNAV is an 11-module pipeline that takes patient sequencing data and outputs ranked, manufacturable mRNA vaccine candidates. It integrates variant calling, HLA typing, MHC binding prediction, structural scoring (PANDORA, AlphaFold2-Multimer), and polyepitope optimization behind a single FastAPI backend. Built to make personalized cancer vaccine design accessible to clinical research labs without bioinformatics infrastructure.

Closed-Loop Plasmid Engineering

Models that design DNA, robots that build it, data that trains the next model.

In collaboration with Twig Bio, this project pairs PlasmidLM-generated constructs with high-throughput expression assays (GFP fluorescence, AlphaLISA) to create a self-improving design loop. Currently the subject of a Google.org AI for Science Impact Challenge proposal.

Publications

Emergent Biological Realism in RL-Trained DNA Language Models

Under review at ICML 2026

GRPO post-training of PlasmidLM produces sequences with emergent structural and functional realism.

Designing Minimal E. coli Genomes Using Variational Autoencoders

Cell Systems (revision in progress)

PlasmidGPT

Earlier work on generative models for plasmid sequences.

Writing

View all posts →

Previously

Tempus AI

Machine Learning Scientist

Founding member of the generative AI team at one of the largest precision medicine companies in the US. Worked on LLM applications over clinical and genomic data.

Snorkel AI

Senior Machine Learning Engineer

Built synthetic data and RLHF infrastructure used by Fortune 500 customers to fine-tune and align production language models.

WEALTHAWK

Founder

Founded an AI-native lead development platform for wealth managers. Acquired by Praxis Solutions.

UC Berkeley

B.A. Data Science, minor in Bioengineering

Consulting

I work with teams on production ML systems — particularly around Databricks, MLflow, and LLM evaluation. Recent engagements include LLM-as-a-judge infrastructure and agentic application evaluation. If you're working on something hard in this space, get in touch.

Outside research: basketball, padel, and a standing interest in the London and Bay Area startup scenes.

Contact

me [at] mcclainthiel [dot] com