
A slower mind

I ran the same two prompts on two different AI systems this morning. First, a haiku about artificial intelligence. Then a ten-stanza poem in iambic pentameter. A parlor trick, the kind of thing you do when you’re testing something and want a clean finish line.

Qwen 3.6, running locally on my machine, took 80.9 seconds on the haiku. The poem took 1,190 seconds, nearly twenty minutes. Claude Sonnet 4.6, running on Anthropic’s servers somewhere I can’t see or touch, finished both in under twenty seconds.

The poetry doesn’t matter. What matters is what I actually want to use these tools for, and what happens to my thinking when the tool is fast enough to keep up.

What I’m actually trying to do

I’m a Director of Institutional Research. I study higher education, AI policy, learning outcomes, and the places where those things collide. Three weeks ago I started building a compounding research wiki, a living knowledge base I maintain through Claude Cowork, Anthropic’s agentic workspace. In three weeks it has absorbed thirty to forty articles and web references, plus literature review summaries across four or five related topics generated through Consensus, a peer-reviewed academic search tool. Every source is cross-referenced against everything that came before.

The ingestion works like this: every time I read a new paper, article, or report on AI in education, I run it through a custom skill called /ingest. The raw article goes in. Claude reads it, summarizes it, scours the existing wiki pages for connections, and sometimes surfaces something I hadn’t seen. The synthesis gets written into a new markdown page, the index updates, contradictions get flagged, cross-references get added. There’s a second track too, a clippings wiki for shorter reads and articles, running its own /ingest-clippings skill with its own index. My responses to what Claude surfaces get logged as separate thought pieces, also managed through Cowork. Everything touches Anthropic’s servers: the raw sources, the synthesis, the connections Claude draws between them, my own thinking as it develops. The entire knowledge infrastructure runs through Claude. That’s not incidental to how the system works. It’s the point.

Each new ingestion gets checked against everything already there. A paper arguing that AI tutoring systems reduce equity gaps has to be cross-referenced against the three earlier pages that said the opposite, the two that said it depends on implementation, and the one from last week that introduced a variable nobody had measured before. The /ingest skill does this. It reads the whole index, reasons across it, flags the tension, updates accordingly.
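The cross-referencing step can be pictured with a toy sketch. This is not the actual /ingest skill, which relies on Claude reading and reasoning over full pages. It only illustrates the bookkeeping: link a new page to earlier pages on the same topic and flag the ones that disagree. The `Page` fields, the `topic` and `stance` labels, and the `ingest` function are all hypothetical names invented for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class Page:
    title: str
    topic: str   # hypothetical tag, e.g. "ai-tutoring-equity"
    stance: str  # hypothetical label: "reduces", "widens", "depends"
    cross_refs: list[str] = field(default_factory=list)


def ingest(index: dict[str, Page], new_page: Page) -> list[str]:
    """Add new_page to the index; return titles of pages it is in tension with."""
    tensions = []
    for existing in index.values():
        if existing.topic != new_page.topic:
            continue
        # Same topic: cross-reference the two pages in both directions.
        existing.cross_refs.append(new_page.title)
        new_page.cross_refs.append(existing.title)
        # A different stance on the same topic is flagged as a tension.
        if existing.stance != new_page.stance:
            tensions.append(existing.title)
    index[new_page.title] = new_page
    return tensions


# Usage with made-up page titles.
index: dict[str, Page] = {}
ingest(index, Page("Study A", "ai-tutoring-equity", "widens"))
ingest(index, Page("Study B", "ai-tutoring-equity", "depends"))
flags = ingest(index, Page("Study C", "ai-tutoring-equity", "reduces"))
print(flags)
```

In the real workflow the comparison is a judgment made by the model rather than a string match, which is exactly the part a slower or weaker model struggles with.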

Anyone who’s been through a doctoral program knows the reading pace that gets drilled into you: scan, absorb, move, repeat. The wiki is built for that rhythm. At Qwen’s speed, a single ingestion (which Claude handles in two to three minutes) would eat the better part of a morning. Run it several times a week and the arithmetic turns ugly. This isn’t about waiting twenty minutes for a poem. It’s about whether a research practice holds together at all.
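The "better part of a morning" estimate can be checked with back-of-envelope arithmetic. The sketch below assumes an ingestion slows down by the same factor as the long poem did, which is an assumption rather than a measurement, and it treats Claude's "under twenty seconds" as exactly twenty, so the slowdown factor is a lower bound.

```python
QWEN_POEM_SECONDS = 1190   # measured local time for the ten-stanza poem
CLAUDE_POEM_SECONDS = 20   # "under twenty seconds", used here as an upper bound

# Lower-bound slowdown factor for a long generation task.
slowdown = QWEN_POEM_SECONDS / CLAUDE_POEM_SECONDS


def local_minutes(cloud_minutes: float, factor: float = slowdown) -> float:
    """Scale a cloud-side duration by the assumed slowdown factor."""
    return cloud_minutes * factor


# An ingestion takes two to three minutes on Claude.
low, high = local_minutes(2), local_minutes(3)
print(f"{low / 60:.1f} to {high / 60:.1f} hours per ingestion")
```

Under those assumptions a single ingestion lands at roughly two to three hours locally.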

What it actually does to my thinking

Before the wiki, I had Readwise.

Colleagues would send articles. I’d find things online. Everything went in, tagged and saved, with the intention of getting back to it. Sometimes I did. More often, weeks passed before I worked through a batch, and by then the ideas had shifted. Something urgent in January felt stale by March. In AI research especially, what’s true today can be wrong by next month. The pile kept growing. I kept feeling behind. Dozens of articles I’d never absorbed, each one nudging the next idea further away.

The problem wasn’t volume. It was timing. A good idea that doesn’t get processed doesn’t get delayed. It disappears. It never becomes the blog post. It never becomes the insight spoken at a meeting. It stays an intention, and then it stops being even that.

The wiki changed this, but not how I expected. I thought I was building a better filing system. What I built reasons across what’s stored fast enough that I can stay inside the thought. I ingest a new paper and within minutes I know where it fits, what it contradicts, what it confirms. More than once it has stopped me from publishing something, not because the idea was wrong, but because the ingestion surfaced a counterargument I hadn’t seen, or newer information that made the original framing incomplete. That’s not a filing system. That’s a thinking partner who reads faster than I do and remembers everything.

It also tells me which insights are still current. In a field where consensus shifts every few weeks, knowing that a piece of evidence is six months old and three papers have since complicated it isn’t a minor thing. It’s the difference between sounding informed and sounding behind.

Speed is not optional here, and I want to be honest about why. I suspect I have ADHD, and whether or not that’s true, I know this about myself: a delayed response is a lost thought. A tool that makes me wait twenty minutes doesn’t slow my thinking. It ends the session. By the time the answer arrives I’ve moved somewhere else, the thread is gone, whatever was alive in the moment is dead. Slow tools don’t just irritate me. They let things die.

The privacy pull

The wiki contains my thinking. Not just sources but my synthesis, my annotations, my developing arguments, my judgments about where the field is heading. When I send a new article for ingestion, I’m also sending the relevant wiki context Claude needs to reason across. It transits Anthropic’s servers in pieces, conversation by conversation, not as one giant upload. The markdown files stay local, in Obsidian, on my machine.

I use a Pro account. Anthropic’s policy is that they don’t train on Pro conversations by default. The data goes up, the response comes down, and they’re not supposed to be building models on my prompts. That’s meaningfully different from the free tier. I’ve read the policy. I’m not being naive.

But “not trained on” is not the same as “never stored.” There are retention windows, safety monitoring, and terms that can change. I trust Anthropic’s current policy. That trust is not control. It’s a bet on a company’s intentions, renewed silently every time I run /ingest.

I work in institutional research. Student records, outcomes data, internal reporting, all of it lives under FERPA and obligations I take seriously. I’ve built careful workflows to keep sensitive data out of prompts. That line I hold.

The wiki is different. It’s my own intellectual work, transiting someone else’s infrastructure piecemeal, indefinitely. One colleague knows the full scope of what I’m running. I’ve kept it mostly private, less out of embarrassment than because it’s personal the way a journal is personal. It’s where my thinking develops before it becomes anything public. Whether I’m comfortable with that transiting Anthropic’s servers is a real question. Not paranoid, just real. So I priced it out.

The honest arithmetic

To run a local model capable of what /ingest does (long-context reasoning, contradiction detection, synthesis across a growing index) you need something in the 70 billion parameter range. A seven billion parameter model won’t do it.

A single RTX 5090 tops out at around 45 tokens per second on a 32 billion parameter model. Push to 70 billion and the model spills into system RAM, dropping to one or two tokens per second, slower than typing. A dual RTX 5090 build starts around $7,600 for the GPUs alone and climbs past $10,000 fully built. A Mac Studio M4 Max with 128GB runs roughly $4,000 and handles the model size, but tops out at 40-60 tokens per second on larger models.
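To make those throughput figures concrete, here is a small sketch converting tokens per second into wall-clock time. The 2,000-token output length is a hypothetical stand-in for one synthesis page, not a measured value, and the calculation ignores the time spent reading the prompt and wiki context, so real runs would be slower.

```python
def seconds_to_generate(output_tokens: int, tokens_per_second: float) -> float:
    """Wall-clock seconds to emit output_tokens at a steady generation rate."""
    return output_tokens / tokens_per_second


OUTPUT_TOKENS = 2_000  # hypothetical length of one synthesis page

# Rates taken from the figures quoted above.
rates = {
    "32B on one RTX 5090": 45,
    "70B spilled to system RAM (fast end)": 2,
    "70B spilled to system RAM (slow end)": 1,
}

for label, rate in rates.items():
    minutes = seconds_to_generate(OUTPUT_TOKENS, rate) / 60
    print(f"{label}: {minutes:.1f} min")
```

At 45 tokens per second the page arrives in under a minute. At one or two tokens per second the same page takes somewhere between a quarter and a half of an hour, before any prompt processing.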

Eleven thousand dollars is findable. What isn’t findable is a local setup that actually does what I need.

The gap shows up exactly where my workflow needs it most: connection finding, nuance, synthesis across a long and growing context. I tested a local model seriously enough to know. The speed alone was intolerable, and for someone whose mind doesn’t wait well, that word is not hyperbole. Beyond speed, the quality wasn’t there. The model missed things, needed correction, and the time saved on inference got spent on fixes instead. I wasn’t extending my thinking. I was managing it.

The honest price of matching Claude locally may not exist at any consumer price point right now. I haven’t paid it. Not because I can’t. Because paying it wouldn’t get me what I need.

The choice I keep making

People worry, in a general way, about where their prompts go. They download Ollama, run a local model for a weekend, find it slower and less capable, and quietly go back to the fast thing. The discomfort gets filed under “something I should address eventually.”

I’ve done the same. For most of what I do, drafting, editing, structured data analysis, the practical risk of a cloud model is low enough that worrying costs more than it saves.

But the wiki is different. It’s my professional thinking, developing in real time, sent repeatedly to a system I don’t control. I haven’t resolved that tension. I’ve decided, repeatedly, that the cost of resolving it is higher than I’m willing to pay. Not in dollars, but in what I’d lose in the quality of my own thinking.

When you use a cloud model for serious intellectual work, you’re trusting a company with something genuinely valuable. Not just data, but thinking. The terms of that trust are set by someone else and can change without notice.

I kept using the cloud model. What I couldn’t afford was a slower mind.

The price of privacy, honestly calculated, isn’t ten thousand dollars. It’s the ideas that die in the waiting: the blog post that never gets written, the insight that never makes it to the meeting, the connection that was alive for thirty seconds before something else pulled my attention away. Most people can’t name that cost because they’ve never felt the other thing. I have. That’s why I keep paying the trust instead of the hardware.
