4 Comments

The idiot in me can't help but hope this problem can be rephrased as equivalent to how political nepotism happens to the initially innocent ("vandals"). When the goal stays consistently aligned but the environment adapts based on previous behavior, the currently recommended behavior would change to minimize goal-acquisition and tolerate unintended side effects. This idea transfers to collaborative agents as well, including some deceptive agents ("deputies"). https://graymirror.substack.com/p/3-descriptive-constitution-of-the?s=r


“Luckily, most doctors are smarter than addicts”

Yikes. I dunno about that.


Are we assuming that the doctor is indifferent to whether we learn the diagnosis, or that it's actively trying to prevent us from learning it? If withholding the diagnosis is valuable to the doctor, then what's stopping it from bribing the detective (by letting it win more often than random chance) in exchange for keeping their shared source of randomness a secret?

Let's say the payout is $100 and the doctor assigns a value of $2 to keeping my diagnosis a secret. So the doctor chooses a deterministic pattern which contains no information about ground truth (which the detective can therefore predict perfectly), and the detective intentionally guesses wrong 49% of the time. Now they each expect to get $51 of utility out of each prediction: the detective wins 0.51 × $100 = $51, and the doctor gets 0.49 × $100 + $2 = $51.

I don't think there's an equilibrium, though. They are both better off colluding, but what's stopping the detective from just taking all of that "new" utility by guessing correctly more often? If the detective starts winning more than 52% of the time, the doctor will just start giving the real diagnosis. But between 50% and 52% I don't think there's a stable point for them to coordinate on. If they're true clones, though, they could "pre-determine" an arbitrary number in that range, like 51%, and be strictly better off than if they hadn't.
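To make that window concrete, here's a minimal sketch of the expected-utility arithmetic. The $100 payout and $2 secrecy value are the commenter's assumed numbers, and the function name is mine, not from the original post:

```python
# Payoff arithmetic for the collusion scenario described above.
# Assumptions (from the comment, not the original post): a $100 payout
# per round, and a $2 bonus to the doctor for keeping the diagnosis secret.

PAYOUT = 100.0   # prize to whichever side wins a round
SECRECY = 2.0    # doctor's bonus for keeping the diagnosis hidden

def colluding_payoffs(p_win):
    """Expected utilities per round when the doctor plays a predictable,
    information-free pattern (so the diagnosis stays secret) and the
    detective deliberately wins only with probability p_win."""
    detective = p_win * PAYOUT
    doctor = (1.0 - p_win) * PAYOUT + SECRECY
    return detective, doctor

# Baseline: no collusion, honest randomness, 50/50 outcomes, no secrecy bonus.
baseline = 0.5 * PAYOUT  # $50 for each side

d51, doc51 = colluding_payoffs(0.51)  # the comment's proposed split
d52, doc52 = colluding_payoffs(0.52)  # the doctor's indifference point

print(f"baseline: ${baseline:.0f} each")
print(f"p=0.51: detective ${d51:.2f}, doctor ${doc51:.2f}")  # both beat $50
print(f"p=0.52: detective ${d52:.2f}, doctor ${doc52:.2f}")  # doctor back at $50
```

At a 51% detective win rate both sides beat the $50 no-collusion baseline, and at 52% the doctor is exactly indifferent, which is why the unstable coordination window sits between 50% and 52%.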

