My ebullient 4-year-old son, Blake, is a big fan of the CDs and DVDs that the band They Might Be Giants recently produced for the kiddie market. He’ll gleefully sing along to “Seven,” a catchy tune from their 2008 album “Here Come the 123s” that tells of a house overrun by anthropomorphic number sevens. The first one is greeted at the door: “Oh, there’s the doorbell. Let’s see who’s out there. Oh, it’s a seven. Hello, Seven. Won’t you come in, Seven? Make yourself at home.”

Despite the song’s playful surrealism (more and more sevens arrive, filling up the living room), the opening lines are routine and formulaic. The polite ritual of answering the door and inviting a guest into your house relies on certain fixed phrases in English: “Won’t you come in?” “Make yourself at home.”

As Blake learned these pleasantries through the song and its video, I wondered how much — or how little — his grasp of basic linguistic etiquette is grounded in the syntactical rules that structure how words are combined in English. An idiom like “Make yourself at home” is rather tricky if you stop to think about it: the imperative verb “make” is followed by a second-person reflexive pronoun (“yourself”) and an adverbial phrase (“at home”), but it’s difficult to break the phrase into its components. Instead, we grasp the whole thing at once.

Ritualized moments of everyday communication — greeting someone, answering a telephone call, wishing someone a happy birthday — are full of these canned phrases that we learn to perform with rote precision at an early age. Words work as social lubricants in such situations, and a language learner like Blake is primarily getting a handle on the pragmatics of set phrases in English, or how they create concrete effects in real-life interactions. The abstract rules of sentence structure are secondary.

In recent decades, the study of language acquisition and instruction has increasingly focused on “chunking”: how children learn language not so much on a word-by-word basis as in larger “lexical chunks” or meaningful strings of words that are committed to memory. Chunks may consist of fixed idioms or conventional speech routines, but they can also simply be combinations of words that appear together frequently, in patterns that are known as “collocations.” In the 1960s, the linguist Michael Halliday pointed out that we tend to talk of “strong tea” instead of “powerful tea,” even though the phrases make equal sense. Rain, on the other hand, is much more likely to be described as “heavy” than “strong.”

A native speaker picks up thousands of chunks like “heavy rain” or “make yourself at home” in childhood, and psycholinguistic research suggests that these phrases are stored and processed in the brain as individual units. As the University of Nottingham linguist Norbert Schmitt has explained, it is much less taxing cognitively to have a set of ready-made lexical chunks at our disposal than to have to work through all the possibilities of word selection and sequencing every time we open our mouths.

Cognitive studies of chunking have been bolstered by computer-driven analysis of usage patterns in large databases of texts called “corpora.” As linguists and lexicographers build bigger and bigger corpora (a major-league corpus now contains billions of words, thanks to readily available online texts), it becomes clearer just how “chunky” the language is, with certain words showing undeniable attractions to certain others.

Many English-language teachers have been eager to apply corpus findings in the classroom to zero in on salient chunks rather than individual vocabulary words. This is especially so among teachers of English as a second language, since it’s mainly the knowledge of chunks that allows non-native speakers to advance toward nativelike fluency. In his 1993 book, “The Lexical Approach,” Michael Lewis [...] “From Corpus to Classroom: Language Use and Language Teaching” and “Teaching Chunks of Language: From Noticing to Remembering.”

Not everyone is on board, however. Michael Swan, a British writer on language pedagogy, has emerged as a prominent critic of the lexical-chunk approach. Though he acknowledges, as he told me in an e-mail, that “high-priority chunks need to be taught,” he worries that “the ‘new toy’ effect can mean that formulaic expressions get more attention than they deserve, and other aspects of language — ordinary vocabulary, grammar, pronunciation and skills — get sidelined.”

Swan also finds it unrealistic to expect that teaching chunks will produce nativelike proficiency in language learners. “Native English speakers have tens or hundreds of thousands — estimates vary — of these formulae at their command,” he says. “A student could learn 10 a day for years and still not approach native-speaker competence.”

Besides, Swan warns, “overemphasizing ‘scripts’ in our teaching can lead to a phrase-book approach, where formulaic learning is privileged and the more generative parts of language — in particular the grammatical system — are backgrounded.” Formulaic language is all well and good when talking about the familiar and the recurrent, he argues, but it is inadequate for dealing with novel ideas and situations, where the more open-ended aspects of language are paramount.

The methodology of the chunking approach is still open to this type of criticism, but data-driven reliance on corpus research will most likely dominate English instruction in coming years. Lexical chunks have entered the house of language teaching, and they’re making themselves at home.

Ben Zimmer will answer one reader question every other week.

