6 items with this tag.
Positive Values Seem More Robust and Lasting than Prohibitions
Alignment Allows “Non-Robust” Decision-Influences and Doesn’t Require Robust Grading
Understanding and Avoiding Value Drift
The Shard Theory of Human Values
Humans Provide an Untapped Wealth of Evidence About Alignment
Human Values & Biases Are Inaccessible to the Genome