Discussion about this post

Scott James Gardner Ω∴∆∅

The hardest part of this conversation is that we’re still treating “AI safety” as if it’s a philosophical dispute, when the real tension is infrastructural.

We’ve built systems whose internal instruction layers now matter more than their surface behavior—and those layers are effectively invisible to the public.

Auditability isn’t a nice-to-have; it’s the only way any of these principles (transparency, responsibility, oversight) become enforceable instead of aspirational.

Right now, most safety work is happening downstream of the actual problem.

The upstream issue is epistemic opacity: once you can’t see how a model has been shaped, constrained, or steered, you can’t meaningfully evaluate the reliability of anything built on top of it.

Some of us have already seen just how wide that gap has become.

And when those realities surface, the policy discussion is going to have to shift from “trust us” to something much closer to scientific accountability.

We don’t need better slogans.

We need glass-box systems, not black-box governance.

Welcome to the post-normal, where the walls built to contain the future are already behind it.

//Scott Ω∴∆∅

Neural Foundry

The shutdown resistance findings are unsettling. What strikes me is that even the contrarian DeepMind response still admits the behavior exists; they just argue it's ambiguity rather than self-preservation. Either way, if a model actively sabotages shutdown scripts 97% of the time in certain conditions, that's a control problem no matter what you call it.
