9 Comments
Vera Horvath

Refreshing ideas, thanks!

Séb Krier

Thank you!

Rohit Krishnan

This was very good!

Séb Krier

Thanks Rohit :)

Steeven

This was super cool. Could you talk about differences in the agents themselves? Obviously if I'm the neighbor vs. the polluter and my agent and their agent are negotiating a price over the pollution, either of us stands to gain by having the better negotiation bot. In that case, does everyone have to have the same agent to prevent one side from constantly winning?

How would you draw the line between legitimate negotiation and prompt injection? We already have lobbyists and salespeople who are hired on the basis of getting better-than-normal negotiation outcomes, so what's to stop a large company from training an especially good bargaining model and using it against a consumer?

You could build stopgaps into the agent, e.g. notify me if the agent is about to give up more than X dollars or equivalent, but that re-introduces transaction costs (a rough sketch of this is at the end of this comment).

You could also mandate that only certain types of publicly available models could be used, but research could still be done on better prompts or on finding negotiation scenarios where the model is biased to reward one party over the other in ways we wouldn't expect.

You could also mandate shared values like honesty, which humans would actually vote on, and the models would attempt to fulfill those values over the course of negotiation.
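A minimal sketch of the notify-me-above-X-dollars stopgap mentioned above (the Offer structure, names, and numbers are hypothetical, just to make the idea concrete):

```python
# Hypothetical stopgap: the agent settles small concessions on its own and
# escalates anything above a user-set dollar limit back to the human.
# The Offer structure and the numbers are invented for illustration.
from dataclasses import dataclass

@dataclass
class Offer:
    description: str
    cost_to_me: float  # dollars the agent would give up by accepting

def review_offer(offer: Offer, concession_limit: float) -> bool:
    """Auto-accept small concessions; ask the user about big ones."""
    if offer.cost_to_me <= concession_limit:
        return True  # within the delegated budget, no human in the loop
    # Above the limit: notify the user and wait for an explicit decision,
    # which is exactly where the transaction cost comes back in.
    answer = input(f"Agent wants to concede ${offer.cost_to_me:.2f} "
                   f"for '{offer.description}'. Approve? [y/N] ")
    return answer.strip().lower() == "y"

# A $150 concession against a $100 limit triggers a prompt to the user.
if __name__ == "__main__":
    print(review_offer(Offer("noise abatement payment", 150.0), 100.0))
```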

Séb Krier

Thanks for the excellent comment! These are great questions.

On the difference in agents: yes, it's worth exploring whether this just leads to an arms race. It's not clear how material the difference is. One analogy is the difference between using a high-powered white-shoe lawyer and a lawyer from a regular firm. I would imagine that for many services/disputes the difference isn't huge - but in some very complex domains it might be. Where that is the case, in principle a successful negotiation capability could be quickly copied and offered by competitors, and the market for these agents could still act as a strong equalizer over time.

On legitimate negotiation vs. prompt injection: one operates 'within an agreed framework of rules' and the other doesn't. So e.g. lobbying today is fine provided it complies with various legal requirements, whereas bribery isn't. If we deem an activity to be problematic (e.g. some form of predatory use of bargaining agents against a consumer), then presumably you should be able to regulate it. But this should only come after negotiation/bargaining fails and market solutions are exhausted.

Another thing: the targeted consumer here can also quickly band together with other consumers, who would also have a lot to gain from preventing the predatory behaviour in general, which would make such a move by the large company less likely and more costly.

I like the shared values on honesty bit - I think in many situations we would expect people to be honest, so similarly, if someone instructs their agent to be dishonest, this should be 'discoverable' by other agents. Maybe targeted/narrow privacy-preserving audits could help here! Reputation systems help too; future negotiations would become much harder or more expensive for a dishonest agent, as other agents would demand stronger guarantees or refuse to engage at all.
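As a rough illustration of the reputation point (the scores, thresholds, and escrow multipliers here are entirely made up), the gating could look something like this:

```python
# Illustrative only: how a counterparty's reputation score might translate
# into the guarantees an agent demands before negotiating. The scores,
# thresholds, and escrow multipliers are made-up parameters.

def negotiation_terms(reputation: float) -> dict:
    """Map a reputation score in [0, 1] to required terms of engagement."""
    if reputation < 0.3:
        # Too risky: refuse to engage at all.
        return {"engage": False}
    if reputation < 0.7:
        # Engage, but demand a larger up-front escrow as a stronger guarantee.
        return {"engage": True, "escrow_multiplier": 2.0}
    # Trusted counterparty: standard terms.
    return {"engage": True, "escrow_multiplier": 1.0}

# An agent caught being dishonest sees its score fall and its costs rise.
for score in (0.9, 0.5, 0.1):
    print(score, negotiation_terms(score))
```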

Elizabeth McCabe

I really enjoyed this!

I am skeptical that mechanism design can do what's required of it here, though. It requires that outcomes be deterministic; e.g. you can't use it to make efficient prices that someone then chooses. It also creates a lot of friction if joining is voluntary. I explain this in a bit more depth here: https://open.substack.com/pub/elizabethnorahanne/p/ai-agent-economies-have-a-lying-problem?r=2kmhr6&utm_campaign=post&utm_medium=web

Do you have any other ways of incentivising truth-telling? In my mind it seems pretty crucial for solving externality problems.
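(To make the truth-telling piece concrete: a toy version of the textbook VCG / Clarke pivot mechanism for a binary pollute-or-abate decision, with invented valuations. Truthful reporting is a dominant strategy here, but note this is exactly the deterministic-outcome setting I'm pointing at.)

```python
# Toy Clarke pivot (VCG) mechanism for a binary decision ("pollute" vs
# "abate"). Valuations are invented. Each agent reports a value for each
# outcome; the mechanism picks the outcome with the highest reported total
# and charges each agent the externality it imposes on the others, which
# makes truthful reporting a dominant strategy.

OUTCOMES = ("pollute", "abate")

def clarke_pivot(reports):
    """reports: {agent_name: {outcome: reported_value}}"""
    totals = {o: sum(r[o] for r in reports.values()) for o in OUTCOMES}
    chosen = max(totals, key=totals.get)

    payments = {}
    for agent in reports:
        others = [r for name, r in reports.items() if name != agent]
        # Best total the others could get if this agent's report were ignored...
        best_without = max(sum(r[o] for r in others) for o in OUTCOMES)
        # ...minus what they actually get under the chosen outcome.
        actual = sum(r[chosen] for r in others)
        payments[agent] = best_without - actual  # the "Clarke tax"
    return chosen, payments

# The factory values polluting at 100; two neighbours each report a harm of 70.
reports = {
    "factory":    {"pollute": 100.0, "abate": 0.0},
    "neighbour1": {"pollute": -70.0, "abate": 0.0},
    "neighbour2": {"pollute": -70.0, "abate": 0.0},
}
# Chooses "abate"; each pivotal neighbour pays 30, the factory pays 0.
print(clarke_pivot(reports))
```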

David Duvenaud

I like this article, and the vision, but I'd like to pick at one part:

"An agent, no matter how personalized, cannot be a tool for committing crimes. Your agent cannot help you orchestrate fraud, DDOS a hospital, hire a hitman, or procure materials for a bioweapon, any more than your word processor can grant you immunity for writing a fraudulent cheque."

I realize you didn't mean it literally, but the first sentence is plainly false. And word processors don't grant you immunity, but they can help you commit crimes. What's your proposal? That states should insist on a certain form of liability? In the DDOS example, is that actually illegal? Even if so, will it be feasible to outlaw agent-enabled, widely destructive behavior?

I don't think this is a nitpick - I think making the agents lawful enough to avoid medium-scale terrorism (like DDOSes) is likely to also require crushing their ability to effectively advocate on users' behalf against the state. Or at least, states that don't mostly align their citizens' agents to the state might pay a big alignment tax, in the sense of having to tolerate all sorts of mischief and Goodharting of whatever systems they try to set up. I'm not sure though.

Matt Reardon

This is a vision of a slow-takeoff world where alignment *is* solved. Séb seems to confine catastrophic misalignment to scenarios with a singular superintelligence, but that isn't a necessary feature. Eight billion superintelligences nominally tasked with coordinating on our behalf can also coordinate on other things that entail cutting human values out of the picture.

It's true that in slow-takeoff worlds human interests can provide a Schelling point, but I think it's very reasonable to worry about how durable that is in a rapidly changing post-AGI future.
