punctilio (n.): precise observance of formalities.
The best typography package for English.
    import { transform } from 'punctilio'

    transform('"It\'s a beautiful thing, the destruction of words..." -- 1984')
    // → “It’s a beautiful thing, the destruction of words…” — 1984
As far as I can tell, punctilio is the most reliable and feature-complete. I originally built punctilio’s logic for TurnTrout.com. I wrote and sharpened the core RegExes sporadically over several months, exhaustively testing edge cases. Eventually, I decided to spin off the functionality into its own package.
I tested punctilio 0.4 against smartypants 0.2.2, tipograph 0.7.4, and smartquotes 2.3.2. These other packages have spotty feature coverage and inconsistent impact on text. For example, smartypants mishandles quotes after em dashes (though quite hard to see in GitHub’s font) and lacks multiplication sign support.
| Input | smartypants | punctilio |
| --- | --- | --- |
| She said--"Hi!" | She said—”Hi!” (✗) | She said—“Hi!” (✓) |
| 5x5 | 5x5 (✗) | 5×5 (✓) |
I basically graded all libraries on a subset of my unit tests, selected to represent a wide range of features.
As far as I can tell, punctilio’s only missing feature is non-English quote support. I don’t have a personal reason to use non-English localization, but feel free to make a pull request!
Works with HTML DOMs via separation boundaries
Other typography libraries either transform plain strings or operate on AST nodes individually (retext-smartypants can’t map changes back to HTML). But real HTML has text spanning multiple elements. If you concatenate text from <em>Wait</em>..., transform it, then try to split it back, you’ve lost track of where </em> belonged.
punctilio introduces separation boundaries. First, insert a “separator” character (default: U+E000) at each element boundary before transforming (like at the start and end of an <em>). Every RegEx allows this character mid-pattern without breaking matches. For example, .[SEP].. still becomes …[SEP]. punctilio validates the output by ensuring the separator count remains the same.
    import { transform, DEFAULT_SEPARATOR } from 'punctilio'

    transform(`"Wait${DEFAULT_SEPARATOR}"`)
    // → `“Wait”${DEFAULT_SEPARATOR}`
    // The separator doesn’t block the information that this should be an end-quote!
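To make the "separator allowed mid-pattern" idea concrete, here is a minimal toy sketch of a single separator-tolerant rule. It is not punctilio's source: the names `SEP`, `ELLIPSIS`, and `toyEllipsis` are invented for illustration, and the real RegExes are presumably more involved.

```javascript
// Toy illustration only: this is NOT punctilio's actual implementation.
const SEP = "\uE000"; // private-use character, same code point as the default described above

// Match three periods, allowing any number of separators between them.
// Group 1 holds separators after the first period, group 2 those after the second.
const ELLIPSIS = new RegExp(`\\.(${SEP}*)\\.(${SEP}*)\\.`, "g");

function toyEllipsis(text) {
  // Emit the ellipsis, then re-emit every captured separator after it.
  const result = text.replace(ELLIPSIS, (_match, a, b) => `…${a}${b}`);

  // Validate as described above: the separator count must not change.
  const count = (s) => s.split(SEP).length - 1;
  if (count(result) !== count(text)) {
    throw new Error("separator count changed during transform");
  }
  return result;
}

toyEllipsis(`.${SEP}..`); // → `…${SEP}`
```

Capturing the separators and writing them back after the replacement is one simple way to keep the count stable. Where the separators should land relative to the new character is a design choice this sketch makes arbitrarily.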
To use this with a DOM, a walker tracks which text node each segment came from, inserts separators between them, transforms the combined string, then splits on separators to update each node. Use the separator option if U+E000 conflicts with your content. For an example of how to integrate this functionality, see my website’s code.
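The join, transform, and split steps can be sketched without a real DOM by treating an array of strings as the text of consecutive text nodes. Everything here is hypothetical: `transformSegments` is not part of punctilio's API, and `toyQuotes` is a stand-in transform so the example runs without installing anything. In practice you would pass punctilio's `transform` instead.

```javascript
// Hypothetical helper, not part of punctilio's API.
// `segments` stands in for the text of consecutive DOM text nodes.
function transformSegments(segments, transformFn, separator = "\uE000") {
  for (const segment of segments) {
    if (segment.includes(separator)) {
      throw new Error("separator conflicts with content; choose another one");
    }
  }
  const combined = segments.join(separator);
  const pieces = transformFn(combined).split(separator);
  if (pieces.length !== segments.length) {
    throw new Error("separator count changed during transform");
  }
  return pieces; // pieces[i] is the new text for the i-th node
}

// Stand-in transform: curls straight double quotes, looking past separators
// to decide whether a quote opens or closes.
function toyQuotes(text, separator = "\uE000") {
  let previous = ""; // last non-separator character seen
  let output = "";
  for (const ch of text) {
    if (ch === separator) {
      output += ch;
      continue;
    }
    if (ch === '"') {
      output += previous === "" || /\s/.test(previous) ? "“" : "”";
    } else {
      output += ch;
    }
    previous = ch;
  }
  return output;
}

// Text nodes of: "<em>Wait</em>" she said.
transformSegments(['"', "Wait", '" she said.'], toyQuotes);
// → ["“", "Wait", "” she said."]
```

Because each output piece lines up with its input segment by index, the walker only needs to remember the order in which it visited the text nodes.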
punctilio doesn’t enable all transformations by default. Fractions and degrees tend to match too aggressively (perfectly applying the degree transformation requires semantic meaning). Superscript letters and punctuation ligatures have spotty font support—on GitHub, the readme’s font doesn’t even support the example superscript! Furthermore, ligatures = true can change the meaning of text by collapsing question and exclamation marks.
I open source my website infrastructure and article edit histories at alexander-turner/TurnTrout.com. I license the repository under CC BY-SA 4.0, which means you can share and adapt the site as long as you provide attribution and distribute any derivative works under the same license.
When I started writing in 2018, I didn’t include alt text. Over the years, over 500 un-alt’ed images piled up. These (mostly) aren’t simple images of geese or sunsets. Most of my images are technical, from graphs of experimental results to hand-drawn AI alignment comics. Describing these assets was a major slog, so I turned to automation.
To implement accessibility best practices, I needed alt text that didn’t describe the image so much as communicate the information the image is supposed to communicate. None of the scattershot AI projects I found met the bar, so I wrote my own package.
alt-text-llm is an AI-powered tool for generating and managing alt text in Markdown files. Originally developed for this website, alt-text-llm streamlines the process of making web content accessible. The package detects assets missing alt text, suggests context-aware descriptions, and provides an interactive reviewing interface in the terminal.
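As a rough illustration of the detection step, the sketch below scans Markdown for inline images whose alt text is empty. It is an assumption about how such detection could work, not alt-text-llm's actual logic: the `IMAGE` pattern and `findImagesMissingAlt` are invented names, and the sketch ignores HTML `<img>` tags and reference-style images.

```javascript
// Hypothetical sketch; alt-text-llm's real detection logic may differ.
// Matches inline Markdown images: ![alt](src "optional title")
const IMAGE = /!\[([^\]]*)\]\(\s*([^)\s]+)[^)]*\)/g;

function findImagesMissingAlt(markdown) {
  const missing = [];
  markdown.split("\n").forEach((line, index) => {
    for (const match of line.matchAll(IMAGE)) {
      const [, alt, src] = match;
      // Treat whitespace-only alt text as missing.
      if (alt.trim() === "") {
        missing.push({ line: index + 1, src });
      }
    }
  });
  return missing;
}

const sample = [
  "Intro paragraph.",
  "![](maze.png)",
  "![A goose wearing a hat](goose.jpg)",
  '![ ](plot.svg "Loss curve")',
].join("\n");

findImagesMissingAlt(sample);
// → [{ line: 2, src: "maze.png" }, { line: 4, src: "plot.svg" }]
```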
Generating alt text for maze diagrams from Understanding and Controlling a Maze-solving Policy Network. alt-text-llm displays the surrounding text (above the image), the image itself in the terminal using imgcat, and the LLM-generated alt suggestion. The user interactively edits or approves the text.
In the end, I generated over 550 high-quality alt-text suggestions for about $12.50 using Gemini 2.5 Pro. With alt-text-llm, I addressed hundreds and hundreds of alt-less images: detecting them; describing them; reviewing them; and lastly applying my finalized alts to the original Markdown files. turntrout.com is now friendlier to the millions of people who browse the web with the help of screen readers.
Dataset contamination is bad for several reasons. Most obviously, when benchmarks are included in AI training data, those benchmarks no longer measure generalization—the AI may have been directly taught the answers. Even more concerningly, if your data promote negative “stereotypes” about AIs, they might become self-fulfilling prophecies, training future models to exhibit those same behaviors.
In the Claude 4 system card, Anthropic revealed that approximately 250,000 transcripts from their alignment faking paper had been scraped from the public web and included in their pretraining data. This caused an early model to hallucinate details from the paper’s fictional scenarios, forcing Anthropic to implement unique mitigations. Speculatively, this kind of misalignment data could degrade the alignment of any models trained thereafter.
Data scraping practices are a serious problem. The tool we are currently releasing will not stop state-of-the-art actors. Since I wanted to at least mitigate the problem, I put out a bounty for a simple, open source tool to harden data against scraping. The tool is now ready: easy-dataset-share. In less than 30 minutes and at a cost of $0, you can deploy a download portal with basic protections against scrapers, serving a canary-tagged dataset with modest protections against AI training.
easy-dataset-share will not stop sophisticated scrapers
Sophisticated scraping operations can bypass Cloudflare Turnstile for about $0.001 per trial (via e.g. CapSolver). The robots.txt and Terms of Service are not technically binding and rely on the good faith of the user, although the ToS does provide limited legal deterrence. Canary strings can be stripped from documents. Overall, this tool is just a first step towards mitigating dataset contamination. We later discuss improvements which might protect against sophisticated actors.
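To show why canary strings only help against cooperative actors, here is a minimal sketch of tagging, detecting, and stripping a canary. The `CANARY` value and the three function names are made up for illustration. They are not easy-dataset-share's format or API.

```javascript
// Hypothetical canary value for illustration only; a real deployment should
// use a fresh, unique identifier rather than this placeholder.
const CANARY = "CANARY-EXAMPLE-00000000-0000-0000-0000-000000000000";

// Append the canary once so that well-behaved data pipelines can filter the document.
function addCanary(text, canary = CANARY) {
  return text.includes(canary) ? text : `${text}\n\n${canary}\n`;
}

// What a cooperative training pipeline would check before ingesting a document.
function hasCanary(text, canary = CANARY) {
  return text.includes(canary);
}

// What an uncooperative scraper can do just as easily.
function stripCanary(text, canary = CANARY) {
  return text.split(canary).join("");
}

const tagged = addCanary("Benchmark question 1: ...");
hasCanary(tagged);              // → true
hasCanary(stripCanary(tagged)); // → false
```

Detection and removal are equally cheap, which is why the canary is a courtesy signal and not an enforcement mechanism.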
One command to set up your shell, editor, and secret management.
My .dotfiles repository provides comprehensive development environment setup. With this command, I quickly personalize any shell, even if I’m just visiting over SSH for a few hours.
Fish shell with autocomplete, syntax highlighting, and the tide theme,
neovim via LazyVim, providing a full IDE experience,
tmux with automatic session saving and restoration,
envchain for hardware-encrypted secret management via macOS Secure Enclave or Linux gnome-keyring (no more plaintext API keys in configuration files),
Open source AI tool setup,
autojump for quick directory navigation,
Reversible file deletion by default via trash-put instead of rm,
git aliases and other productivity shortcuts, and—drum roll—
goosesay, because every terminal needs more geese.
     ______________________________________
    / Find out just what any people will   \
    | quietly submit to and you have the   |
    | exact measure of the injustice and   |
    | wrong which will be imposed on them. |
    \ --- Frederick Douglass               /
     --------------------------------------
    [ASCII-art goose]
Each time I open the fish shell, a rainbow goose blurts out an interesting phrase. I spent several hours to achieve this modern luxury.
    if status is-interactive
        fortune 5% computers 5% linuxcookie 2% startrek 88% wisdom | cowsay -f ~/.dotfiles/apps/goose.cow | lolcat -S 6
    end
The way this works is that:
I sample a saying by calling the fortune command,
I pipe the saying into goosesay (my variant of the cow in the original cowsay),
The lolcat command splays the text ’cross the rainbow.
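The percentages in the fortune call weight which collection a saying is drawn from. The sketch below mimics that weighting only. It is an illustration with invented names (`SOURCES`, `pickSource`), not fortune's implementation, and it stops before the cowsay and lolcat stages.

```javascript
// Illustrative only: mirrors the weights passed to fortune above.
const SOURCES = [
  { name: "computers", weight: 5 },
  { name: "linuxcookie", weight: 5 },
  { name: "startrek", weight: 2 },
  { name: "wisdom", weight: 88 },
];

// Pick a source with probability proportional to its weight.
// `random` is injectable so the choice can be made deterministic in tests.
function pickSource(sources, random = Math.random) {
  const total = sources.reduce((sum, s) => sum + s.weight, 0);
  let threshold = random() * total;
  for (const s of sources) {
    threshold -= s.weight;
    if (threshold < 0) return s.name;
  }
  return sources[sources.length - 1].name; // guard against rounding at the upper edge
}

pickSource(SOURCES, () => 0.5); // → "wisdom"
```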