Alzheimer's research, Python, and the beauty (and frustration) of modelling biomarker trajectories
There’s a particular kind of frustration that lives inside yet another Python error at 11 PM.
The kind where you’ve been staring at the same plot for so long that the axes stop meaning anything.
I was sitting with a graph of amyloid accumulation rates, trying to make sense of a cloud of dots that looked more like a Jackson Pollock than a scientific finding, and I remember thinking: this is what research actually is, isn’t it? Not the clean figures you see in journal papers. This. The mess before the meaning. But that’s the beauty of it.
That moment, equal parts defeat and odd aliveness, came back in November of last year, just a couple of months into my time at the Elbert Lab at the University of Washington, under Dr. Donald Elbert and my mentor Dylan.
I’d come in expecting to learn a lot of new things, and since joining the lab last September, oh boy, have I.
I hadn’t quite expected to feel them.
The feeling
Alzheimer’s disease has a way of making itself personal before it makes itself scientific. Most people who’ve watched a grandparent or parent lose the thread of their own story don’t think about amyloid plaques or genetic risk variants.
They just notice that something is going, slowly and irreversibly, before the person is gone.
I think about that a lot when I’m working. The disease I study in the form of data points and differential equations is the same disease that rearranges families, quietly, for years.
I love the intersection of biology and mathematics, and I’ve fallen for it even harder since I started working in this lab. The idea that you can write an equation that describes something happening inside a living brain is incredibly powerful, and it’s a big part of why I pursue research at that intersection.
The research
My work in the Elbert Lab has two faces. On one side, there’s the more classical biological work: generating plots of β-amyloid aggregation, the process by which a normally harmless protein misfolds and clumps into plaques—the defining pathology of Alzheimer’s. I also model how β-amyloid concentrations in the brain’s blood vessels change in response to different antibody treatments. It’s painstaking, detail-oriented work, and it gave me an appreciation for just how slow and nonlinear biological systems are.
The brain does not behave like a textbook.
The computational project I’ve been building over the past six months, though, is where things got philosophically interesting.
Here’s the problem: Alzheimer’s disease unfolds over decades. The amyloid accumulation that eventually leads to dementia probably begins twenty years before any cognitive symptoms appear. If you want to understand how the disease progresses—how fast amyloid builds, when it plateaus, how that differs across people—you ideally want to follow individuals across that entire span of time. You want longitudinal data. But we don’t have decades to wait, and the people living with this disease don’t either.
So the question I started chasing was a strange one: if we can’t follow a brain for twenty years, can we reverse-engineer the timeline from a single snapshot?
The dataset I used, OASIS-3 (a large open-access neuroimaging cohort), gives us exactly that: snapshots. Hundreds of individuals, each measured at a moment in time, each carrying a different “current” level of amyloid burden (measured in a unit called Centiloids, derived from PET brain scans) and a corresponding estimate of how fast their amyloid is changing.
This is cross-sectional data. A crowd of strangers standing at different points along the same road, photographed once.
The insight, borrowed from Jack et al.’s 2013 framework, is that those strangers, collectively, can tell you something about the road itself. If you fit a smooth curve to the relationship between where someone is on the amyloid scale and how fast they’re moving, you can treat that curve as a kind of velocity field: a function that tells you, at any disease stage, how quickly things are progressing. I did this using spline models, fitting separate curves for people who carry the APOE4 gene variant (the most significant genetic risk factor for late-onset Alzheimer’s) and those who don’t.
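For the curious, here is roughly what that fitting step can look like in Python. This is a minimal sketch rather than the lab’s actual pipeline: the file name and the columns (centiloid, rate, apoe4) are stand-ins I made up for illustration, and the smoothing settings are just sensible defaults.

```python
import pandas as pd
from scipy.interpolate import UnivariateSpline

# Hypothetical table: one row per participant, with their amyloid burden
# (Centiloids), an estimate of its rate of change, and APOE4 carrier status.
# The file and column names are made up for this sketch, not OASIS-3's own.
df = pd.read_csv("amyloid_rates.csv")  # columns: centiloid, rate, apoe4

def fit_rate_curve(group: pd.DataFrame) -> UnivariateSpline:
    """Fit a smoothing spline mapping amyloid level -> accumulation rate."""
    ordered = group.sort_values("centiloid")
    # The smoothing factor s keeps the curve from chasing every noisy point.
    return UnivariateSpline(ordered["centiloid"], ordered["rate"], k=3, s=len(ordered))

# One rate curve per genotype group, assuming apoe4 is coded 0 (non-carrier) / 1 (carrier).
rate_curve = {carrier: fit_rate_curve(group) for carrier, group in df.groupby("apoe4")}
```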
Then came the part I found especially beautiful: treating that rate curve as a dynamic system and integrating it forward in time. Instead of asking “what is this person’s amyloid level?”, I asked myself, “if someone starts here and follows this rate function, where do they end up in five years? In twenty?” The math is not especially exotic. It’s essentially what physicists do when they reconstruct a trajectory from a velocity field. But applied to disease progression, it felt revelatory. Suddenly I had predicted longitudinal trajectories for APOE4-positive and APOE4-negative individuals, reconstructed from cross-sectional data, without a single participant being followed for more than a few years.
All of it reconstructed in the absence of decades-long longitudinal data.
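In code, the “integrate it forward” step is surprisingly small. Continuing from the hypothetical rate_curve dictionary above, and with an arbitrary starting point and time horizon, a sketch might look like this:

```python
import numpy as np
from scipy.integrate import solve_ivp

def simulate_trajectory(rate_fn, c0, years=20.0):
    """Integrate dC/dt = rate_fn(C) forward in time from a starting Centiloid value c0.

    rate_fn is a fitted rate curve (e.g. one of the splines above), treated
    here as a velocity field over the amyloid axis.
    """
    def ode(t, c):
        return [rate_fn(c[0])]  # autonomous: the rate depends only on the current level

    t_eval = np.linspace(0.0, years, 200)
    sol = solve_ivp(ode, (0.0, years), [c0], t_eval=t_eval)
    return sol.t, sol.y[0]

# Start both genotype groups at the same (arbitrary) baseline and compare the curves.
t, carrier_traj = simulate_trajectory(rate_curve[1], c0=10.0)
_, noncarrier_traj = simulate_trajectory(rate_curve[0], c0=10.0)
```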
What those trajectories showed was stark. APOE4 carriers accumulate amyloid faster, reach higher burdens, and do so earlier. The curves diverge in a way that makes the genetic risk factor legible not just as a statistical association but as a shape. A difference in the geometry of disease progression.
Next, I fed these trajectories into a model of microglial dynamics. Microglia are the brain’s resident immune cells, the ones that are supposed to clear amyloid debris and that increasingly fail to do so as Alzheimer’s disease advances. Using the Nelder-Mead optimization algorithm, I tuned the model’s parameters to match real patient data, asking: what values of microglial clearance capacity and immune activation best explain what we observe?
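Here is a deliberately simplified sketch of that fitting loop. The clearance function below is a toy stand-in I wrote for illustration, not the lab’s actual microglial model (which is a system of differential equations), and the “observed” data is synthetic so the example runs on its own; only the shape of the Nelder-Mead call mirrors what I actually do.

```python
import numpy as np
from scipy.optimize import minimize

def microglial_clearance(clearance_response, amyloid_traj, threshold=20.0):
    """Toy stand-in for a microglial model: predicted clearance along a trajectory.

    A single free parameter controls how aggressively microglia respond once
    amyloid passes an activation threshold. Purely illustrative.
    """
    activation = 1.0 / (1.0 + np.exp(-(amyloid_traj - threshold) / 5.0))
    return clearance_response * activation * amyloid_traj

def objective(params, amyloid_traj, observed):
    """Sum of squared errors between the model's prediction and the target data."""
    predicted = microglial_clearance(params[0], amyloid_traj)
    return np.sum((predicted - observed) ** 2)

# Synthetic target data so the sketch runs end to end; in the real project
# the target comes from patient-derived measurements.
rng = np.random.default_rng(0)
observed = microglial_clearance(0.15, carrier_traj) + rng.normal(0.0, 0.5, carrier_traj.shape)

# Nelder-Mead needs no gradients, which suits a model you can only evaluate numerically.
result = minimize(objective, x0=[0.05], args=(carrier_traj, observed), method="Nelder-Mead")
print(result.x)  # fitted clearance-response value for this genotype's trajectory
```

Running the same fit against the non-carrier trajectory and comparing the two fitted values is, in miniature, the comparison described below.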
The answer that came back was quiet but pointed. Even relatively small differences in the amyloid trajectories between APOE4 carriers and non-carriers produced noticeably different optimized parameter values, particularly in the parameters governing how aggressively microglia respond to amyloid at low levels and in their disease-activated state. The suggestion, tentative but real, is that the genetic risk encoded in APOE4 doesn’t just accelerate amyloid. It may ripple downstream into the brain’s immune response in ways we can now begin to quantify.
A leap of faith
I want to be honest about the seams in this work, because I think honesty is a big part of what makes research credible and interesting.
The jump from cross-sectional data to longitudinal data is a real leap of faith. When I treat a population snapshot as a proxy for individual progression, I’m assuming that differences across people at different disease stages approximate the changes that happen within a single person over time. Jack et al. themselves flag this: if the cohort doesn’t capture the full arc of disease, or if people at different stages differ in ways unrelated to progression, the reconstructed trajectories can be biased. Mine are model-based estimates, not measured histories. They should be interpreted that way.
The model fit is also imperfect, intentionally so. I’ve only tuned a single parameter, the microglial clearance response, while holding the rest constant. That’s a choice, not a limitation. I was interested in one specific mechanism, and overfitting an inherently noisy cross-sectional dataset to achieve a prettier curve would have told me less about the data, not more. Real data is messy. The model’s job isn’t to mirror the noise. It’s to capture the underlying shape.
Still, there are times when I look at the scatterplots and see all those dots that don’t quite line up the way I’d expect them to. Then I feel the gap between what the model says and what biology is actually doing, and I sit with that discomfort.
I think that’s a good thing to feel. Overconfidence is its own kind of error.
The impact of all this…
I’m an undergraduate student. I will not single-handedly cure Alzheimer’s disease with this research. The distances involved…between a spline fit and a clinical trial…between a parameter value and a real patient outcome…are enormous, and I hold no illusions about that.
But there’s something I keep coming back to. Something I couldn’t have articulated when I started: the value of understanding, even when action is far away. Every mechanism we map in this lab…every trajectory we reconstruct…every parameter we interrogate…each one narrows the space of what we don’t know, adding a brick to the bridge between the known and the unknown. Alzheimer’s is still mostly a space of what we don’t know, and it will stay that way for quite some time.
I think about the strangers in my dataset sometimes. People who sat in a scanner for an hour so that their data could end up in the hands of researchers they’ll probably never meet, contributing to knowledge and understanding they may never see. There’s something quietly profound about that chain…from patient to dataset to model to maybe, someday, something that helps someone else…and I think about it more the longer I do this research.
That’s what working on a problem this huge, at this early stage in my career, really feels like: small, careful, inexperienced, and somehow—improbably—worth it.
For now, I’ll keep building models that translate population-level imaging data into cell-level mechanisms, working to understand how genetic risk shapes neuroinflammation in Alzheimer’s, while I learn more about the disease…and about coding ;)
What does it mean to you to contribute to something you may never see completed?



