More Me Than Me
Dear Linguists: Help Me Understand LLMs' Flow and Fluency
I studied linguistics in the ’90s. It was a kinetic, high-energy time. Government and Binding theory dominated, with its research centered on the development of principles and parameters—an attempt to uncover the universal principles underlying human linguistic ability. This approach was pared back in the mid-’90s with the publication of Chomsky’s new theory, The Minimalist Program. A reaction to the wild growth of principles and parameters, the Minimalist Program sought to strip syntactic theory down to only what was “computationally necessary”.
On the West Coast, where I was studying, was a group of contrarians who rejected Chomsky and were developing competing syntactic theories. Their main calling card was that their theories were computationally tractable, i.e., they could be implemented by a computer. (Hey, you may be thinking: isn’t that an awful lot like “computationally necessary”? I can only tell you what they told me—hell no.)
Similar ferment was alive in the study of meaning. Most undergrads began their study of semantics in the Montague tradition, an approach that emphasizes the analysis of snippets of language and their formalization with the help of small, toy models. This work was often complemented by two theories developed in the 1980s—Hans Kamp’s Discourse Representation Theory and Irene Heim’s File Change Semantics—both of which view meaning as dynamic, a process.
As with syntax, there were many competing frameworks developed in parallel. Among the alternatives are Lexical Semantics, focused on the meaning of words; Conceptual Semantics, a theory of structured mental representations; and Relevance Theory, which leans into pragmatics and inference.
Since I graduated, “big theory” has hardly shifted; the ideas that defined the ’90s still hold sway today. What I can’t figure out is why the arrival of generative AI hasn’t led to a Gestalt shift. LLMs ought to be a windfall for syntax and semantics; instead we appear to be trapped in epicycles, still orbiting established theories.
On the syntax side, Chesi’s overview is a clear guide to the internal debate unfolding in that field right now. Although it is titled Is This the End of (Generative) Linguistics as We Know It?, its answer is, unsurprisingly, no. Chesi sees the potential for a renaissance in the study of syntax, not its end. He suggests that the solutions include formalizing minimalist proposals, adopting open evaluation suites, and using LLMs to stress-test theoretical claims.
Finding a tidy overview of what is happening in the study of meaning is harder. One of the most influential viewpoints is that the output of generative AI is meaningless and so irrelevant to semantic theorizing. Emily Bender et al. argue that because models lack grounding in the real world—i.e., experiences that connect words to things, concepts, and situations—as well as any intention to communicate, their output is nonsense.
Although influential, this view isn’t universal. LLMs are emerging as tools for testing semantic and pragmatic theories, but most of that work looks as if it aims to understand very specific phenomena. Here’s what I mean:
Evidence that LLMs do not understand Grice’s Maxim of Manner, which is the good advice to be clear and concise in all that you say, avoiding unnecessary verbiage and padding.
Lots of probing of how well LLMs reason deductively, including the meaning of logical connectives such as and, or, and not, as well as logical inference patterns.
Testing of scope ambiguities and whether LLMs make the same judgment calls about them as humans.
And look, don’t get me wrong—this is a rich and fascinating vein of research. It offers concrete insight into pockets of meaning in language: maxims, scope ambiguity, logical inference—these all matter.
It’s just that none of this gives me the kind of insight I would hope for when such a weird, wonderful, and alien interlocutor shows up at my doorstep. It doesn’t tell me why chatbot prose is (often) coherent and accessible and, most vexingly of all, why it improves my first drafts. Current linguistic research not only doesn’t tell me, it can’t. The frameworks I learned in the ’90s—generative grammar, truth-conditional semantics, and others—were built to explain competence on idealized fragments, not whatever it is that LLMs are doing.
Put simply, here’s what I would love the post-’90s “next big thing in linguistics” to explain:
Chatbots smoothly correct grammar, spelling, and other mechanics of writing. What I don’t understand is why they are so capable of flow. When I ask for a rewrite of an awkwardly phrased sentence, the chatbot makes all kinds of adjustments that invariably lead to a better expression of my thought — my own thought! I don’t think it can just be that we consistently prefer what is statistically most likely to follow the last token. Why do its rewrites often sound more authentically like my voice? How can we put this lightning in a bottle and help writers everywhere sound like themselves, but better?
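For concreteness, here is a minimal sketch of what “pick what is statistically most likely to follow the last token” literally means, using the small, open GPT-2 model from the Hugging Face transformers library and plain greedy decoding. The prompt string is just an illustration, and none of this is a claim about how any particular chatbot rewrites prose; production systems sample rather than always taking the top token and are further shaped by instruction tuning.

```python
# A toy illustration of greedy next-token decoding with GPT-2.
# (An assumption for illustration: real chatbots use larger models,
# sampling, and instruction tuning on top of this bare mechanism.)
import torch
from transformers import GPT2LMHeadModel, GPT2Tokenizer

tokenizer = GPT2Tokenizer.from_pretrained("gpt2")
model = GPT2LMHeadModel.from_pretrained("gpt2")
model.eval()

# An arbitrary, made-up prompt purely for demonstration.
prompt = "The awkward sentence was rewritten so that it"
input_ids = tokenizer.encode(prompt, return_tensors="pt")

with torch.no_grad():
    for _ in range(12):
        logits = model(input_ids).logits   # scores over the whole vocabulary at each position
        next_id = logits[0, -1].argmax()   # take the single most likely next token
        input_ids = torch.cat([input_ids, next_id.view(1, 1)], dim=1)

print(tokenizer.decode(input_ids[0]))
```

Even a loop this crude tends to produce locally grammatical continuations, which is part of what makes “it’s just statistics” feel both true and deeply unsatisfying as an answer to the question above.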
Although Bender et al. and many others argue that the output of LLMs is meaningless, it sure doesn’t feel that way. If this is a mass delusion event — and not like the one that Warzel warns us about — then what is going on? Hand-waving about imputing meaning where there is none doesn’t help. That’s just a restatement of the phenomenon. I want to know why I am under this spell and how I get out!
Writers, teachers, and other users: What puzzles about language and AI do you most want to see answered? What questions would a new linguistic theory answer for you? And linguist friends—help me out. What am I missing? Are amazing things happening and I can’t see them?

My coursework in linguistics happened during the early 1970s, just as transformational grammar was sprouting wings. During my doctoral work in language and literacy, most of what I found useful then, and still find useful today, emerged from Michael Halliday and functional grammar. More recently, Richard Hudson’s Word Grammar, rooted in default inheritance, has helped me think about bot speak (his 2008 book). Charles Fillmore’s early work on frame theory is useful as well.

LLMs operate via syntactic parsing to some degree, but the real magic, I think, comes from the training methods. Picture a bot scanning 500 different streams of text within a functional genre (say, the discipline of history) simultaneously for recurrent linguistic patterns. For 24 hours the bot trains on millions of texts running in parallel streams and finds that historians signal levels of confidence in information depending on whether an analysis is done using primary documents vs. secondary documents. Different language patterns show up in historical writing that extend beyond a single sentence. The bot writes one sentence at a time, but a word selected for a slot in medias res has affordances that will impact word choices in sentences that appear later in the text. It’s not really generating sentence after sentence as separate entities, but as parts of a larger text structure. Cognitive verbs function differently depending on subject matter. For example, analysis as a mental protocol in the context of poetry is associated with much different words, phrases, sentences, and text structure than in, say, conducting an autopsy.

Of course bots produce meaningful texts. How do I know? I can make sense of them. The difference is the bot can’t. That doesn’t make them meaningless. Here is where Charles Fillmore comes in handy. I have also found tremendous help from Otto Jespersen’s book from the 1920s on the philosophy of grammar. Words do have fixed meanings available to all of us and to bots. Words also have private lives inside unique utterances (Bakhtin) with centripetal force. The longer a chat goes on, turn after turn, the more attuned the bot’s output becomes to the specific idiolect of the user. That’s why bots can take a muddy, half-formed idea and figure out what you might be trying to say.