Inferring utility functions from locally non-transitive preferences

universalprior.substack.com

TL;DR: Fanboying JvN, then a nuts-and-bolts description of the von-Neumann-Morgenstern theorem. A connection to reward modeling, some simulations, and pretty figures!

I'm not sure I understand what the neural net stuff at the end is about. If you are doing noisy sorting, that doesn't require neural networks, and there are lots of known algorithms: https://gwern.net/Resorter#noisy-sorting (You are right that noisy comparison sorting turns out to be much the same, complexity wise, as noise-free comparison sorting.) You can also make sorting and ranking directly differentiable (https://gwern.net/GPT-2-preference-learning#grover-et-al-2019), which would probably be much more efficient than trying to brute force it by iterating a MLP (wouldn't that approach bloat any NN you tried to do that inside of as a component? You'd have to unroll it for a small fixed number of iterations, I guess, and weight-tie it).

I'm not sure I understand what the neural net stuff at the end is about. If you are doing noisy sorting, that doesn't require neural networks, and there are lots of known algorithms: https://gwern.net/Resorter#noisy-sorting (You are right that noisy comparison sorting turns out to be much the same, complexity wise, as noise-free comparison sorting.) You can also make sorting and ranking directly differentiable (https://gwern.net/GPT-2-preference-learning#grover-et-al-2019), which would probably be much more efficient than trying to brute force it by iterating a MLP (wouldn't that approach bloat any NN you tried to do that inside of as a component? You'd have to unroll it for a small fixed number of iterations, I guess, and weight-tie it).