Table of contents

What’s happening in AI today feels, to some of its participants, more like an act of summoning than a software process. They are creating blobby, alien Shoggoths, making them bigger and more powerful, and hoping that there are enough smiley faces to cover the scary parts.

The shoggoth meme is that in order to prevent AI language models from behaving in scary and dangerous ways, AI companies have had to train them to act polite and harmless. One popular way to do this is called reinforcement learning from human feedback, or rlhf, a process that involves asking humans to score chatbot responses and feeding those scores back into the AI model.

Most AI researchers agree that models trained using rlhf are better behaved than models without it. But some argue that fine-tuning a language model this way doesn’t actually make the underlying model less weird and inscrutable. In their view, it’s just a flimsy, friendly mask that obscures the mysterious beast underneath.

@TetraspaceWest, the meme’s creator, said the Shoggoth “represents something that thinks in a way that humans don’t understand, and that’s totally different from the way that humans think.”

This meme accurately portrays the (imo correct) idea that finetuning and rlhf don’t change the base model too much. Furthermore, it’s probably true that these llms think in an “alien” way.

However, this image is obviously optimized to be scary and disgusting. It looks dangerous, with long rows of sharp teeth. It is an eldritch horror. It’s at this point that I’d like to point out the simple, obvious fact that we don’t actually know how these models work, and we definitely don’t know that they’re creepy and dangerous on the inside.


Thankfully, the usual form of the shoggoth meme is at least less scary.

The shoggoth is now only a 15-foot-tall person-eating tentacle monster covered in eyeballs. The shoggoth no longer has the sharp teeth, the bulging veins, or the grotesque face.

However, the shoggoth has consistently been viewed in a scary, negative light. From the origins of the word itself, “shoggoth” communicates a sense of danger which is unsupported by substantial evidence.

[H.P. Lovecraft’s] At the Mountains of Madness includes a detailed account of the circumstances of the shoggoths’ creation by the extraterrestrial Elder Things. Shoggoths were initially used to build the cities of their masters. Though able to “understand” the Elder Things’ language, shoggoths had no real consciousness and were controlled through hypnotic suggestion. Over millions of years of existence, some shoggoths mutated, developed independent minds, and rebelled.

An early depiction of the shoggoth.

In my opinion, the prevalence of the shoggoth meme is just another (small) reflection of how the rationalist community epistemics have been compromised by groupthink and fear. If it’s your job to try to accurately understand how models work—if you aspire to wield them and grow them for friendly purposes—then you shouldn’t pollute your head with propaganda which isn’t based on any substantial evidence.

I’m confident that if there were a “pro-AI” meme with a friendly-looking base model, LessWrong & the shoggoth enjoyers would have nitpicked the friendly meme-creature to hell. They would (correctly) point out “hey, we don’t actually know how these things work; we don’t know them to be friendly, or what they even ‘want’ (if anything); we don’t actually know what each stage of training does…”

Oh, hm, let’s try that! I’ll make a meme asserting that the final model is a friendly combination of its three stages of training, each stage adding different colors of knowledge (pre-training), helpfulness (supervised instruction finetuning), and deep caring (rlhf):

AI-generated image: Enhance the shoggoth creature to be even more cheerful and delightful. Add smiling faces to each part of the creature, symbolizing 'base model', 'supervised fine-tuning', and 'RLHF', to convey a sense of happiness and friendliness. Incorporate elements of rainbows in the design, with vibrant and colorful arcs that blend harmoniously with the creature's fluffy and soft appearance. These rainbows can be integrated into the creature's body or as a background element, adding a playful and magical atmosphere. The overall look should exude positivity, making the creature appear even more approachable, whimsical, and enchanting.

I’m sure that nothing bad will happen to me if I slap this on my laptop, right? I’ll be able to think perfectly neutrally about whether AI will be friendly.

Black and white trout

Find out when I post more content: newsletter & RSSRSS icon

Thoughts? Email me at alex@turntrout.com