The Bad Toupee Fallacy: Why You're (Likely) Wrong About AI-Generated Writing
You don't notice what you don't notice.
People love to complain about AI-generated writing. It’s bland and lacks finesse; it’s impersonal, generic—and just plain boring. Read a few paragraphs and you are either already asleep or wishing you were. Right?
Well, not so fast. If you have faith in your ability to differentiate between the stuff large language models (LLMs) such as ChatGPT spout and the words and sentences produced by human intelligence, you might have succumbed to the so-called “bad toupee fallacy”, which goes something like this: “I’ve noticed that some people are wearing bad toupees. Therefore, I’m always able to spot when someone is wearing a toupee.”
The problem, of course, is that there may be plenty more toupees out there in the wild that elude detection. You may only have developed a knack for spotting the bad ones, and you probably have no idea whether they constitute a small or a large fraction of the total.
It’s a type of selection bias that can cloud our judgment when we try to distinguish between categories within the same domain (e.g., human writing vs. writing by LLMs, or human hair vs. toupees). Here are three examples to illustrate the fallacy:
1. "There’s no such thing as a good plastic surgery—everyone who gets it looks weird."
You're only noticing the poor surgeries. Successful ones may be undetectable and therefore not part of your data set. How can you be sure that the most attractive movie star you know hasn’t had a ton of plastic surgery?
2. "Online scams are easy to detect."
You see the clumsy, obvious scam emails. The well-crafted ones that trick people are less likely to be flagged or discussed. How do you know that you’re not being scammed right now?
3. "Fake luxury watches are easy to spot."
You’re basing this on poorly made counterfeits. The high-quality fakes pass for real and never enter your sample, precisely because you don’t realize they’re fake. Perhaps you’re wearing one right now?
The point is basically that you don’t notice what you don’t notice—by definition. Therefore, you can’t know the true size of the category you are talking about, which should preclude you from making categorical statements such as “All X are Y”.
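If you prefer to see the logic in numbers, here is a toy simulation of the fallacy (all the rates are made up purely for illustration): suppose 10% of people wear a toupee, and only the worst-made 20% of toupees are obvious enough to catch your eye. An observer who reasons only from the toupees they actually spot will conclude that every toupee looks bad, even though the great majority passed by unnoticed.

```python
import random

# Toy model of the bad-toupee fallacy. All numbers are invented
# for illustration, not estimates of real toupee statistics.
random.seed(42)

N = 100_000          # people observed
spotted = 0          # toupees the observer notices (the bad ones)
unnoticed = 0        # toupees that pass for real hair

for _ in range(N):
    wears_toupee = random.random() < 0.10      # 10% wear a toupee
    if wears_toupee:
        looks_bad = random.random() < 0.20     # only 20% are detectably bad
        if looks_bad:
            spotted += 1
        else:
            unnoticed += 1

print(f"Toupees spotted:   {spotted}")
print(f"Toupees unnoticed: {unnoticed}")
# Every toupee in the observer's sample is a bad one, so "all toupees
# look bad" feels true -- yet roughly 80% slipped past undetected.
```

Run it and the observer's sample contains only bad toupees, while the unnoticed pile is about four times larger. The sample tells you nothing about the true prevalence.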
What put me onto this quirky fallacy was a post by full-time editor and writer Sean Kernan: 13 Signs You Used ChatGPT To Write That. Kernan claims to be able to detect AI-generated writing “from a mile away” and provides what he sees as a list of 13 unmistakable signs. One giveaway is supposedly the use of “em” dashes—which I’m using here—as opposed to no dashes at all, or the shorter “en” dashes or hyphens more common in non-American styles of writing. Another is apparently the frequent use of colons in titles. A third is a general lack of depth.
Convinced yet? Me neither.
Perhaps Kernan is indeed able to spot examples of bad writing by LLMs. But how many good ones go unnoticed? He cannot possibly know how large the category of LLM writing really is, and so he is in no position to make such sweeping generalizations. Perhaps his favorite writer uses ChatGPT all the time—how would he know? He can’t and he doesn’t.
We should keep in mind that some people are experts at prompt engineering, the art and skill of getting an LLM to give you exactly what you want. You can ask it to be less generic, less hedging, less formulaic. You can ask it to write in the style of your favorite novelist. You can ask it to insert references to culture or history or science that seem human and apropos of the point you’re trying to make.
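To make that concrete, here is a minimal sketch of what such steering can look like in practice, using OpenAI's official Python SDK (v1+). The model name and the instruction wording are my own illustrative choices, not a recipe from anyone quoted here, and the script assumes an OPENAI_API_KEY is set in your environment.

```python
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

# Illustrative style instructions: steer the model away from the
# generic, hedging, formulaic register that critics complain about.
response = client.chat.completions.create(
    model="gpt-4o",  # illustrative choice; any capable chat model works
    messages=[
        {
            "role": "system",
            "content": (
                "You are a sharp essayist. Avoid hedging, stock phrases, "
                "and formulaic transitions. Vary sentence length. Work in "
                "one apt allusion to history or science."
            ),
        },
        {
            "role": "user",
            "content": "Write 150 words on why selection bias fools smart people.",
        },
    ],
)

print(response.choices[0].message.content)
```

A few sentences of instruction like these can change the output's register dramatically, which is exactly why "I can always tell" is such a shaky claim.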
I therefore largely agree with philosopher Victor Kumar’s more measured view in his recent piece: Why Aren’t You Using ChatGPT?
Kumar puts it well:
“Assessments [of ChatGPT’s purported uselessness] litter magazine essays and social media, but they reveal less about the tool than the authors: they're confessing that they don’t know how to ask it good questions.”
I’m sorry to be so blunt, but people who claim always to be able to detect LLM writing seem a tad full of themselves. Perhaps they are only judging from text produced by earlier LLMs, which indeed had several flaws. Or maybe they are simply the Luddites of our time, doing whatever they can to impugn a new and eerily superior technology they fear might rob them of their livelihood and sense of purpose.
My guess is that it’s a bit of both, and I can certainly relate to the feeling that something precious was lost when OpenAI launched ChatGPT in late 2022. Overcome by this feeling, one is tempted to console oneself by looking for reasons why AI-generated text will always be subpar. It’s a way of coping, of making sense of a scary and unfamiliar situation.
But this feeling of anxiety should not lead us into fallacy land. Instead, we should try to ally with our new writing coaches and make the best of the unwritten chapters before us—whether wearing a toupee or not.
Take care,
E