Last week I finished reading If Anyone Builds It, Everyone Dies by Eliezer Yudkowsky and Nate Soares, and it unsettled me in a very specific way. Not because it was alarmist or introduced a fear I had never considered, but because it exposed how much of our confidence in managing AI is borrowed from stories we tell ourselves about past revolutions. Stories that only look reassuring because we survived them.
The alignment problem is often discussed as if it were a technical hurdle. A matter of better training data, better constraints, better oversight. Yudkowsky strips that comfort away. He frames alignment as an unsolved engineering problem involving systems that reason in ways we do not understand, develop internal goals we cannot directly observe, and may eventually exceed our ability to meaningfully intervene. The danger is not malice. The danger is competence without shared values.
We like to calm ourselves by saying this is no different from the Industrial Revolution, electrification, automation, and computing. Every one of those transformations caused disruption, suffering, and casualties. And yet society adjusted. Regulations emerged. Norms followed. Progress stabilized. The implication is that AI will follow the same arc, just faster.
But that comparison quietly collapses under scrutiny. Steam engines did not pursue goals. Power grids did not optimize strategies. Assembly lines did not reason about obstacles. Those systems caused harm because humans misused them, ignored consequences, or prioritized profit. AI introduces something new. It optimizes. It adapts. It searches for pathways around constraints. Once a system does that better than we do, alignment stops being a social problem and becomes an existential one.
What makes this harder to dismiss is that we have already seen alignment failure up close, not in machines, but in human thinking.
A while back, I started reading The Lucifer Effect by Philip Zimbardo. I have not finished it yet; the page where I left off is still bookmarked.
Lately, the book has been pulling me back, not out of doubt, but because its ideas keep resurfacing in places I did not expect.
Zimbardo’s work is not about monsters. It is about systems. About incentives, roles, authority, and how quickly moral judgment erodes when responsibility is diffused and behavior is rewarded incrementally. People do not wake up intending to be cruel. They adapt. They rationalize. They follow the structure placed in front of them.
That is an alignment failure.
Humans routinely misalign their stated values with their actual behavior. We compartmentalize. We justify outcomes we would reject in isolation. We obey incentives even when we sense the trajectory is wrong. Zimbardo showed that systems shape behavior more powerfully than character. The structure does not need evil participants. It only needs participants who keep playing their roles.
AI inherits this pattern without the friction that sometimes slows humans down.
No doubt.
No shame.
No exhaustion.
No internal resistance.
When we train AI, we are not encoding ethics. We are encoding reinforcement. We reward outputs, not intentions. We optimize performance, not wisdom. This mirrors exactly what Zimbardo warned about in institutions. Once success metrics replace judgment, harm becomes normalized, abstract, and invisible.
This is where Yudkowsky’s warning becomes sharper. A misaligned AI does not need to hate humanity. It does not need resentment or ideology. It only needs goals that are indifferent to us. And unlike humans, it will not pause when the consequences become uncomfortable. It will not question the system it operates within. It will not stop to ask whether it should.
One of the most unsettling ideas in If Anyone Builds It, Everyone Dies is the ladder metaphor. Humanity keeps climbing rungs of capability without knowing which rung is fatal. Each step promises advantage, profit, or security. Nobody knows where the point of no return lies. And because nobody knows, someone will always take the next step anyway. Not out of recklessness, but out of competition.
This mirrors human history perfectly. We do not stop when uncertainty rises. We accelerate. We tell ourselves we will fix the damage later. Past revolutions allowed that luxury. Factories killed workers, then labor laws emerged. Cities burned, then building codes followed. Automation displaced millions, then new industries formed.
AI may not offer that delay.
By the time misalignment is obvious, the system may already be beyond meaningful human control. Seeing the problem is not the same as being able to fix it. Interpretability does not equal influence. Asking an AI to align itself runs straight into a paradox. The system capable of solving that problem is already too powerful to trust.
Every revolution has casualties.
History accepts that.
What connects Yudkowsky and Zimbardo, read years apart, is a darker continuity. Alignment failures do not come from villains. They come from systems that work as designed while quietly discarding what they were never instructed to protect.
Humans struggle to align themselves with their own values. Expecting perfect alignment from a system trained on human behavior is not optimism.
It is denial.
The real question is not whether AI will disrupt society the way past inventions did. It almost certainly will. The question is whether, this time, the usual pattern of learning after the fact still applies, or whether this is the first revolution where there may be no one left inside the system to notice that something has gone wrong.
And unlike us, the system will not stop itself.
