7 items with this tag.
Gaming TruthfulQA: Simple Heuristics Exposed Dataset Weaknesses
Many Arguments for AI X-Risk Are Wrong
Dreams of AI Alignment: The Danger of Suggestive Names
Don’t Use the “Shoggoth” Meme to Portray LLMs
Inner and Outer Alignment Decompose One Hard Problem Into Two Extremely Hard Problems
Don’t Align Agents to Evaluations of Plans
Don’t Design Agents Which Exploit Adversarial Inputs