Table of contents
What’s happening in AI today feels, to some of its participants, more like an act of summoning than a software process. They are creating blobby, alien Shoggoths, making them bigger and more powerful, and hoping that there are enough smiley faces to cover the scary parts.
The shoggoth meme is that in order to prevent AI language models from behaving in scary and dangerous ways, AI companies have had to train them to act polite and harmless. One popular way to do this is called reinforcement learning from human feedback, or rlhf, a process that involves asking humans to score chatbot responses and feeding those scores back into the AI model.
Most AI researchers agree that models trained using rlhf are better behaved than models without it. But some argue that fine-tuning a language model this way doesn’t actually make the underlying model less weird and inscrutable. In their view, it’s just a flimsy, friendly mask that obscures the mysterious beast underneath.
@TetraspaceWest
,the meme’s creator, said the Shoggoth “represents something that thinks in a way that humans don’t understand, and that’s totally different from the way that humans think.”
This meme accurately portrays the (imo correct) idea that finetuning and rlhf don’t change the base model too much. Furthermore, it’s probably true that these llms think in an “alien” way.
However, this image is obviously optimized to be scary and disgusting. It looks dangerous, with long rows of sharp teeth. It is an eldritch horror. It’s at this point that I’d like to point out the simple, obvious fact that we don’t actually know how these models work, and we definitely don’t know that they’re creepy and dangerous on the inside.
Thankfully, the usual form of the shoggoth meme is at least less scary.

However, the shoggoth has consistently been viewed in a scary, negative light. From the origins of the word itself, “shoggoth” communicates a sense of danger which is unsupported by substantial evidence.
[H.P. Lovecraft’s] At the Mountains of Madness includes a detailed account of the circumstances of the shoggoths’ creation by the extraterrestrial Elder Things. Shoggoths were initially used to build the cities of their masters. Though able to “understand” the Elder Things’ language, shoggoths had no real consciousness and were controlled through hypnotic suggestion. Over millions of years of existence, some shoggoths mutated, developed independent minds, and rebelled.

In my opinion, the prevalence of the shoggoth meme is just another (small) reflection of how the rationalist community epistemics have been compromised by groupthink and fear. If it’s your job to try to accurately understand how models work—if you aspire to wield them and grow them for friendly purposes—then you shouldn’t pollute your head with propaganda which isn’t based on any substantial evidence.
I’m confident that if there were a “pro-AI” meme with a friendly-looking base model, LessWrong & the shoggoth enjoyers would have nitpicked the friendly meme-creature to hell. They would (correctly) point out “hey, we don’t actually know how these things work; we don’t know them to be friendly, or what they even ‘want’ (if anything); we don’t actually know what each stage of training does…”
Oh, hm, let’s try that! I’ll make a meme asserting that the final model is a friendly combination of its three stages of training, each stage adding different colors of knowledge (pre-training), helpfulness (supervised instruction finetuning), and deep caring (rlhf):
I’m sure that nothing bad will happen to me if I slap this on my laptop, right? I’ll be able to think perfectly neutrally about whether AI will be friendly.
Find out when I post more content: newsletter & RSS
alex@turntrout.com