Previously in this series: Cognitive Biases in Large Language Models, Drug addicts and deceptively aligned agents - a comparative analysis, Compute Governance: The Role of Commodity Hardware

TL;DR: In this project, we (Jacques Thibodeau, Logan Smith, Kyle McDonnell, Laria Reynolds, and yours truly) collected and analyzed existing AI alignment research, which we make publicly available. We found that the field is growing quickly, with several subfields emerging in parallel. For each subfield, we identified the prominent researchers, recurring topics, and different modes of communication. Furthermore, we found that a classifier trained on AI alignment research articles can detect relevant articles that we did not originally include in the dataset.