Punctilio for meticulous typography
punctilio (n.): precise observance of formalities.
Pretty good at making your text pretty. The most feature-complete and reliable English typography package. punctilio transforms plain ASCII into typographically correct Unicode, even across HTML element boundaries.
Smart quotes · Em / en dashes · Ellipses · Math symbols · Legal symbols · Arrows · Primes · Fractions · Superscripts · Ligatures · Non-breaking spaces · HTML-aware · Bri’ish localisation support
import { transform } from 'punctilio'
transform('"It\'s a beautiful thing, the destruction of words..." -- 1984')
// → “It’s a beautiful thing, the destruction of words…”—1984
Format-faithful: Text→text, Markdown→Markdown, HTML→HTML. Transform typography, preserve structure.
npm install punctilio
Why punctilio?
As far as I can tell, punctilio is the most reliable and feature-complete English typography package. I built punctilio for my website. I wrote[1] and sharpened the core RegExes sporadically over several months, exhaustively testing edge cases. Eventually, I decided to spin off the functionality into its own package.
I tested punctilio 1.2.9 against smartypants 0.2.2, tipograph 0.7.4, smartquotes 2.3.2, typograf 7.6.0, and retext-smartypants 6.2.0.[2] These other packages have spotty feature coverage and inconsistent impact on text. For example, smartypants mishandles quotes after em dashes (though this is quite hard to see in GitHub’s font) and lacks multiplication sign support.
| Input | smartypants | punctilio |
|---|---|---|
| 5x5 | 5x5 (✗) | 5×5 (✓) |
My benchmark.mjs measures how well libraries handle a wide range of scenarios. The benchmark normalizes stylistic differences (e.g. non-breaking vs regular space, British vs American dash spacing) for fair comparison.
| Package | Passed (of 159) |
|---|---|
| punctilio | 154 (97%) |
| tipograph | 92 (58%) |
| typograf | 74 (47%) |
| smartquotes | 72 (45%) |
| smartypants | 68 (43%) |
| retext-smartypants | 65 (41%) |
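To illustrate the kind of normalization the benchmark description mentions, here is a minimal sketch. It is not taken from benchmark.mjs: the function name `normalizeForComparison` and the two rules it applies are assumptions chosen for illustration. It maps non-breaking spaces to regular spaces and British-style spaced en dashes to unspaced em dashes, so that two outputs differing only in those conventions compare as equal.

```typescript
// Hypothetical helper (not from benchmark.mjs): erase purely stylistic
// differences before comparing two libraries' outputs.
function normalizeForComparison(text: string): string {
  return text
    .replace(/\u00A0/g, " ") // non-breaking space -> regular space
    .replace(/ \u2013 /g, "\u2014"); // spaced en dash -> unspaced em dash
}

const americanOutput: string = "word\u2014word"; // unspaced em dash
const britishOutput: string = "word\u00A0\u2013 word"; // nbsp + spaced en dash

console.log(
  normalizeForComparison(americanOutput) === normalizeForComparison(britishOutput)
); // true
```

A real benchmark would likely need more rules than these two; the point is only that comparison happens after stylistic choices are factored out.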
| Feature | Example | punctilio | smartypants | tipograph | smartquotes | typograf |
|---|---|---|---|---|---|---|
| Smart quotes | "hello" → “hello” | ✓ | ✓ | ✓ | ✓ | ✓ |
| Leading apostrophe | 'Twas → ’Twas | ✓ | ✗ | ✗ | ◐ | ✗ |
| Em dash | -- → — | ✓ | ✓ | ✗ | ✗ | ✓ |
| En dash (ranges) | 1-5 → 1–5 | ✓ | ✗ | ✓ | ✗ | ✗ |
| Minus sign | -5 → −5 | ✓ | ✗ | ✓ | ✗ | ✗ |
| Ellipsis | ... → … | ✓ | ✓ | ✓ | ✗ | ✓ |
| Multiplication | 5x5 → 5×5 | ✓ | ✗ | ✗ | ✗ | ◐ |
| Math symbols | != → ≠ | ✓ | ✗ | ◐ | ✗ | ◐ |
| Legal symbols | (c) 2004 → © 2004 | ✓ | ✗ | ◐ | ✗ | ✓ |
| Arrows | -> → → | ✓ | ✗ | ◐ | ✗ | ◐ |
| Prime marks | 5'10" → 5′10″ | ✓ | ✗ | ✓ | ✓ | ✗ |
| Degrees | 20 C → 20 °C | ✓ | ✗ | ✗ | ✗ | ✓ |
| Fractions | 1/2 → ½ | ✓ | ✗ | ✗ | ✗ | ✓ |
| Superscripts | 2nd → 2ⁿᵈ | ✓ | ✗ | ✗ | ✗ | ✗ |
| English localization | American / British | ✓ | ✗ | ✗ | ✗ | ✗ |
| Ligatures | ?? → ⁇ | ✓ | ✗ | ✓ | ✗ | ✗ |
| Non-English quotes | „Hallo” | ✗ | ✗ | ✓ | ✗ | ◐ |
| Non-breaking spaces | Chapter 1 | ✓ | ✗ | ✗ | ✗ | ✓ |
Known limitations of punctilio
| Pattern | Behavior | Notes |
|---|---|---|
| '99 but 5' clearance | 5' not converted to 5′ | Leading apostrophe is indistinguishable from an opening quote without semantic understanding |
| «Bonjour» | Not spaced to « Bonjour » | French localization not supported |
Test suite
Setting aside the benchmark, punctilio’s test suite includes 1,100+ tests at 100% branch coverage, including edge cases derived from competitor libraries (smartquotes, retext-smartypants, typograf) and the Standard Ebooks typography manual. I also verify that all transformations are stable when applied multiple times.
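The stability property mentioned above (applying a transform twice gives the same result as applying it once) can be checked mechanically. The sketch below is an assumption-laden illustration, not punctilio's test code: `toyTypography`, `addSpaceAfterComma`, and `findUnstableInputs` are hypothetical names, and the toy transforms only stand in for a real typography pass.

```typescript
type Transform = (text: string) => string;

// Toy stand-in for a typography transform (NOT punctilio's implementation).
const toyTypography: Transform = (text) =>
  text.replace(/\.\.\./g, "\u2026").replace(/--/g, "\u2014");

// A deliberately unstable transform for contrast: every pass adds a space.
const addSpaceAfterComma: Transform = (text) => text.replace(/,/g, ", ");

// Returns the inputs for which a second pass changes the first pass's output.
function findUnstableInputs(transform: Transform, inputs: string[]): string[] {
  return inputs.filter((input) => {
    const once = transform(input);
    return transform(once) !== once;
  });
}

const samples: string[] = ["Wait... -- what", "a,b", "plain text"];
console.log(findUnstableInputs(toyTypography, samples)); // []
console.log(findUnstableInputs(addSpaceAfterComma, samples)); // [ 'a,b' ]
```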
Works with HTML DOMs via separation boundaries
Perhaps the most innovative feature of the library is that it properly handles DOMs! (This means it’ll also work on Markdown: convert to HTML, transform with punctilio, convert back to Markdown.)
Other typography libraries take one of two approaches, both with drawbacks.
- String-based libraries (like `smartypants`) transform plain text but are unaware of HTML structure. If you concatenate text from `<em>Wait</em>...`, transform it into `Wait…`, and then try to convert back—you’ve lost track of where the `</em>` belongs.
- AST-based libraries (like `rehype-retext`) process each text node individually, preserving structure but losing cross-node information. A quote that opens inside `<em>"Wait</em>` and closes outside it `..."` spans two text nodes. Processed independently, the library can’t tell whether the final `"` is opening or closing, because it never sees both at once.
punctilio introduces separation boundaries to get the best of both worlds:
- Flatten the parent container’s contents to a string, delimiting element boundaries with a private-use Unicode character (`U+E000`) to avoid unintended matches.
- Every RegEx allows (and preserves) these characters, treating them as boundaries of a “permeable membrane” through which contextual information flows. For example, `.U+E000..` still becomes `…U+E000`.
- Rehydrate the HTML AST. For all k, set element k’s text content to the segment starting at separator occurrence k.
import { transform, DEFAULT_SEPARATOR } from 'punctilio'
transform(`"Wait${DEFAULT_SEPARATOR}"`)
// → `“Wait”${DEFAULT_SEPARATOR}`
// The separator doesn’t block the information that this should be an end-quote!
For rehype / unified pipelines, use the built-in plugin, which handles the separator logic automatically:
import { unified } from 'unified'
import rehypeParse from 'rehype-parse'
import rehypeStringify from 'rehype-stringify'
import rehypePunctilio from 'punctilio/rehype'

// process() is asynchronous and resolves to a file holding the output HTML
unified()
  .use(rehypeParse, { fragment: true }) // parse as a fragment, without <html>/<body> wrappers
  .use(rehypePunctilio)
  .use(rehypeStringify)
  .process('<p><em>"Wait</em>..." -- she said</p>')
// → <p><em>“Wait</em>…”—she said</p>
// The opening quote inside <em> and the closing quote outside it
// are both resolved correctly across the element boundary.
For Markdown ASTs via remark, use remarkPunctilio, which applies the same separator technique to preserve inline element boundaries, or use transformMarkdown for a simpler Markdown-to-Markdown pipeline.
For manual DOM walking or custom transforms, use transformElement from punctilio/rehype.
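The flatten, transform, and rehydrate steps described in this section can be sketched without any library. This is a simplified illustration under stated assumptions, not punctilio's implementation: text nodes are modeled as a plain array of strings, the input is assumed not to contain U+E000 already, and `ellipsisAcrossSeparators` and `transformTextNodes` are hypothetical names. The toy transform handles only ellipses and emits any separators it consumed after the ellipsis, matching the `.U+E000..` example above.

```typescript
const SEP: string = "\uE000"; // private-use character marking element boundaries

// Toy transform: three dots become an ellipsis even when separators sit
// between the dots. Consumed separators are re-emitted after the ellipsis.
function ellipsisAcrossSeparators(flat: string): string {
  return flat.replace(/\.(\uE000*)\.(\uE000*)\./g, "\u2026$1$2");
}

// Flatten the text nodes, transform the flat string, then split it back so
// that segment k becomes the new content of text node k.
function transformTextNodes(
  nodes: string[],
  transform: (flat: string) => string
): string[] {
  const flat = nodes.join(SEP);
  const segments = transform(flat).split(SEP);
  if (segments.length !== nodes.length) {
    throw new Error("transform added or removed a separator");
  }
  return segments;
}

// The three dots are split across two text nodes: "Wait." and ".."
const result: string[] = transformTextNodes(["Wait.", ".."], ellipsisAcrossSeparators);
console.log(JSON.stringify(result)); // ["Wait…",""]

// Processed on its own, the first node never sees the other two dots.
console.log(ellipsisAcrossSeparators("Wait.")); // Wait.
```

The length check is the important design choice: because every rule preserves the separators, the segment count after transformation must equal the number of original text nodes, which is what makes rehydration possible.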
Options
punctilio doesn’t enable all transformations by default. Fractions and degrees tend to match too aggressively (applying the degree transformation perfectly requires semantic understanding). Superscript letters and punctuation ligatures have spotty font support. Furthermore, ligatures: true can change the meaning of text by collapsing question and exclamation marks.
transform(text, {
punctuationStyle: 'american' | 'british' | 'none', // default: 'american'
dashStyle: 'american' | 'british' | 'none', // default: 'american'
symbols: true, // ellipsis, math, legal, arrows
collapseSpaces: true, // normalize whitespace
fractions: false, // 1/2 → ½
degrees: false, // 20 C → 20 °C
superscript: false, // 1st → 1ˢᵗ
ligatures: false, // ??? → ⁇, ?! → ⁈, !? →
})
- Fully general prime mark conversion (e.g. `5'10"` → `5′10″`) requires semantic understanding to distinguish from closing quotes (e.g. `"Term 1"` should produce closing quotes). `punctilio` counts quotes to heuristically guess whether the matched number is at the end of a quote (if not, it requires a prime mark). Other libraries like `tipograph` 0.7.4 use simpler patterns that make more mistakes.
- The `american` style follows the Chicago Manual of Style:
  - Periods and commas go inside quotation marks (“Hello,” she said.)
  - Unspaced em-dashes between words (word—word)
- The `british` style follows Oxford style:
  - Periods and commas go outside quotation marks (“Hello”, she said.)
  - Spaced en-dashes between words (word – word)
- Setting either style to `none` skips the entire transform category: `punctuationStyle: 'none'` preserves straight quotes, apostrophes, and prime marks; `dashStyle: 'none'` preserves all hyphens, number ranges, date ranges, and minus signs.
- `punctilio` is idempotent by design: `transform(transform(text))` always equals `transform(text)`. If performance is critical, set `checkIdempotency: false` to skip the verification pass.
- When `useModifierLetterApostrophe` is enabled, apostrophes use Modifier Letter Apostrophe (U+02BC) while closing single quotes use Right Single Quotation Mark (U+2019), producing semantically distinct codepoints in most cases. Contractions (don’t), possessives (dog’s), and leading abbreviations (’twas, ’99) all output U+02BC. Bare trailing possessives like `dogs'` remain ambiguous. This option is off by default since U+02BC may not be matched by downstream RegEx patterns that expect standard quote characters.
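The quote-counting idea from the first note above can be illustrated with a deliberately small sketch. This is an assumption about the general shape of such a heuristic, not punctilio's actual rule set: the function name `convertDoubleQuoteAfterDigit` is hypothetical, and the sketch only decides what to do with a straight double quote that directly follows a digit. If an odd number of straight double quotes precede it, a quotation is presumably still open and the mark closes it; otherwise the mark is treated as a double prime.

```typescript
// Hypothetical heuristic, far simpler than a production implementation.
function convertDoubleQuoteAfterDigit(text: string): string {
  return text.replace(
    /(\d)"/g,
    (_match: string, digit: string, offset: number): string => {
      // Count the straight double quotes that appear before this match.
      const before = text.slice(0, offset);
      const precedingQuotes = (before.match(/"/g) || []).length;
      // Odd count: an earlier quote is still open, so this mark closes it.
      const mark = precedingQuotes % 2 === 1 ? "\u201D" : "\u2033";
      return digit + mark;
    }
  );
}

console.log(convertDoubleQuoteAfterDigit('"Term 1"')); // "Term 1”
console.log(convertDoubleQuoteAfterDigit("5'10\"")); // 5'10″
```

The opening quote and the single-quote foot mark are left untouched here; a full transform would handle those in separate passes.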
This website
I’ve made 5,496 commits. That’s over halfway to being over 9,000!
This site is one of my most heartfelt works of art. I’ve passionately optimized its design while obsessively testing—for example, 100% TypeScript branch coverage, 100% Python line coverage, and hundreds of visual regression tests.
I open source my website infrastructure and article edit histories at alexander-turner/TurnTrout.com. I license the repository under CC BY-SA 4.0, which means you can share and adapt the site as long as you provide attribution and distribute any derivative works under the same license.
You can locally serve the site by running:
SITE_DIR=/tmp/TurnTrout.com
git clone https://github.com/alexander-turner/TurnTrout.com.git "$SITE_DIR" --depth 1
cd "$SITE_DIR"
yes | pnpm install --frozen-lockfile
pnpm dev
Automatic alt text generation
Install with pip install alt-text-llm.
When I started writing in 2018, I didn’t include alt text. Over the years, over 500 un-alt’ed images piled up. These (mostly) aren’t simple images of geese or sunsets. Most of my images are technical, from graphs of experimental results to hand-drawn AI alignment comics. Describing these assets was a major slog, so I turned to automation.
To implement accessibility best practices, I needed alt text that didn’t describe the image so much as communicate the information the image is supposed to communicate. None of the scattershot AI projects I found met the bar, so I wrote my own package.
alt-text-llm is an AI-powered tool for generating and managing alt text in Markdown files. Originally developed for this website, alt-text-llm streamlines the process of making web content accessible. The package detects assets missing alt text, suggests context-aware descriptions, and provides an interactive reviewing interface in the terminal.

alt-text-llm displays the surrounding text (above the image), the image itself in the terminal using imgcat, and the LLM-generated alt suggestion. The user interactively edits or approves the text.
In the end, I generated over 550 high-quality alt-text suggestions for about $12.50 using Gemini 2.5 Pro. With alt-text-llm, I addressed hundreds and hundreds of alt-less images: detecting them; describing them; reviewing them; and lastly applying my finalized alts to the original Markdown files. turntrout.com is now friendlier to the millions of people who browse the web with the help of screen readers.
If you want to improve accessibility for your content, go ahead and check out my repository!
Protect datasets from scrapers
Install with pip install easy-dataset-share.
I helped fund this project. Here’s the introduction to an article I wrote:
Dataset contamination is bad for several reasons. Most obviously, when benchmarks are included in AI training data, those benchmarks no longer measure generalization—the AI may have been directly taught the answers. Even more concerningly, if your data promote negative “stereotypes” about AIs, they might become self-fulfilling prophecies, training future models to exhibit those same behaviors.
In the Claude 4 system card, Anthropic revealed that approximately 250,000 transcripts from their alignment faking paper had been scraped from the public web and included in their pretraining data. This caused an early model to hallucinate details from the paper’s fictional scenarios, forcing Anthropic to implement unique mitigations. Speculatively, this kind of misalignment data could degrade the alignment of any models trained thereafter.
Data scraping practices are a serious problem. The tool we are currently releasing will not stop state-of-the-art actors. Since I wanted to at least mitigate the problem, I put out a bounty for a simple, open source tool to harden data against scraping. The tool is now ready: easy-dataset-share. In less than 30 minutes and at a cost of $0, you can deploy a download portal with basic protections against scrapers, serving a canary-tagged dataset with modest protections against AI training.
easy-dataset-share will not stop sophisticated scrapers
Sophisticated scraping operations can bypass Cloudflare Turnstile for about $0.001 per trial (via e.g. CapSolver). The robots.txt and Terms of Service are not technically binding and rely on the good faith of the user, although the ToS does provide limited legal deterrence. Canary strings can be stripped from documents. Overall, this tool is just a first step towards mitigating dataset contamination. We later discuss improvements which might protect against sophisticated actors.
Automated setup
One command to set up your shell, editor, and secret management.
My .dotfiles repository provides comprehensive development environment setup. With this command, I quickly personalize any shell—even if I’m just visiting with SSH for a few hours.
- Fish shell with autocomplete, syntax highlighting, and the `tide` theme,
- `neovim` via LazyVim, providing a full IDE experience,
- `tmux` with automatic session saving and restoration,
- `envchain` for hardware-encrypted secret management via macOS Secure Enclave or Linux gnome-keyring—no more plaintext API keys in configuration files,
- Open source AI tool setup,
- `autojump` for quick directory navigation,
- Reversible file deletion by default via `trash-put` instead of `rm`,
- `git` aliases and other productivity shortcuts, and—drum roll—`goosesay`, because every terminal needs more geese.
______________________________________
/ Find out just what any people will \
| quietly submit to and you have the |
| exact measure of the injustice and |
| wrong which will be imposed on them. |
\ --- Frederick Douglass /
--------------------------------------
\
\
\ ___
.´ ""-⹁
_.-´) e _ '⹁
'-===.<_.-´ '⹁ \
\ \
; \
; \ _
| '⹁__..--"" ""-._ _.´)
/ ""-´ _>
: -´/
; .__< __)
\ '._ .__.-' .-´
'⹁_ '-⹁__.-´ /
'-⹁__/ ⹁ _.´
____< /'⹁__/_.""
.´.----´ | |
.´ / | |
´´-/ ___| ;
<_ /
`.'´
Each time I open the fish shell, a rainbow goose blurts out an interesting phrase. I spent several hours to achieve this modern luxury.
if status is-interactive
fortune 5% computers 5% linuxcookie 2% startrek 88% wisdom | cowsay -f ~/.dotfiles/apps/goose.cow | lolcat -S 6
end
The way this works is that:
- I sample a saying by calling the `fortune` command,
- I pipe the saying into `goosesay` (my variant of the cow in the original `cowsay`),
- The `lolcat` command splays the text ’cross the rainbow.
Minor contributions
SCSS linting rule
I contributed a rule to stylelint-scss. I ran into the following issue:
- I defined a CSS `--property`.
- I defined the `--property` using the SCSS variable `$var`.
- In this specific context, Sass will not interpolate `$var`, which means the final CSS contains the literal “$var.”
To fix the problem, $var must be interpolated into #{$var}. My custom-property-no-missing-interpolation rule catches and automatically fixes this mistake.
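To make the fix concrete, here is a rough sketch of the detect-and-fix idea as a standalone function. It is not the stylelint-scss rule's implementation: `interpolateCustomProperties` is a hypothetical name, and the line-by-line regular expression approach is an assumption that ignores multi-line declarations and other edge cases a real linter works from a parsed stylesheet to handle.

```typescript
// Hypothetical fixer: wrap bare Sass variables in custom property values
// with #{...}, leaving already-interpolated variables alone.
function interpolateCustomProperties(scss: string): string {
  return scss
    .split("\n")
    .map((line: string): string => {
      // Only lines that declare a custom property (--name: value) qualify.
      const declaration = /^(\s*--[\w-]+\s*:)(.*)$/.exec(line);
      if (declaration === null) {
        return line;
      }
      const head: string = declaration[1];
      const value: string = declaration[2];
      const fixed = value.replace(
        /(#\{\s*)?\$([\w-]+)/g,
        (match: string, alreadyOpen: string | undefined): string =>
          alreadyOpen ? match : "#{" + match + "}"
      );
      return head + fixed;
    })
    .join("\n");
}

const source: string = ["a {", "  --accent: $var;", "  color: $var;", "}"].join("\n");
console.log(interpolateCustomProperties(source));
// a {
//   --accent: #{$var};
//   color: $var;
// }
```

Ordinary declarations such as `color: $var` are left untouched, since Sass evaluates variables there without interpolation.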
Find out when I post more content: newsletter & rss
alex@turntrout.com (PGP)
Footnotes
1. While Claude is the number one contributor to this repository, that’s because Claude helped me port my existing code and added some features. The core regular expressions (e.g. dashes, quotes, multiplication signs) are human-written and were quite delicate. Those numerous commits don’t show in this repo’s history.
2. The Python libraries I found were closely related to the JavaScript packages. I tested them and found similar scores, so I don’t include separate Python results.