Why isn't language more iconic?
If non-arbitrariness helps with word learning, why are languages still mostly arbitrary?
Last month, I wrote about iconicity: cases where the form of a word (how it’s spoken or signed) resembles its meaning. This includes obvious examples, such as onomatopoeia (e.g., “meow”), as well as less obvious examples like ideophones. As I mentioned in that article, one account of why some words are iconic (as opposed to arbitrary) is that iconicity appears to scaffold word learning: in general, iconic words seem to be learned earlier and more easily.
This observation flips the initial question on its head: if iconicity is so helpful, why isn’t language even more iconic? Why do we bother with arbitrary symbols in the first place? Answering that question is an important step toward addressing a broader, more fundamental question in cognitive science: why do languages look the way that they do?
There are a few potential accounts in the literature, though no definitive answers. In this post, I’ll work through each of those accounts in turn. Importantly, these accounts are not mutually exclusive; they just place different emphases on different mechanisms.
Arbitrariness through conventionalization
One account comes from thinking about how communicative signals are conventionalized over time, and in particular through repeated interactions within a particular community.
Suppose you’re playing a game of Pictionary with a friend, and you have to convey the concept “cartoon” through a drawing. Perhaps you draw something like a rather detailed cartoon bunny, and your friend eventually guesses the correct answer. Now suppose you’ve got to communicate the same concept to your friend later in the game: you could draw something equally detailed, or you could simplify your initial drawing, perhaps by identifying a few key features (e.g., floppy ears). Over multiple iterations, you and your friend might converge on an increasingly conventionalized representation of the “cartoon” concept. By the end of the game, a newcomer might have a hard time determining what this representation is meant to convey—but because you and your friend have a shared communicative history, the symbol means something to both of you.
In fact, that’s exactly what Garrod et al. (2007) found in an experiment that followed roughly this design. Pairs of participants were tasked with communicating various concepts (from a small set of 16 concepts total) to each other by drawing them, as in the game Pictionary. Crucially, participants communicated about the same concept multiple times over the course of the experiment, allowing researchers to ask how both graphical depictions and communicative success evolved over the course of multiple interactions. One such example is depicted in the figure below.

Perhaps unsurprisingly, accuracy improved over the course of the study. What’s more notable is that the graphical complexity of participants’ depictions—measured using something called Perimetric Complexity—decreased, in line with the example provided above. That is, pairs of participants seemed to converge on simpler depictions over repeated interactions. The only condition in which this didn’t occur was when participants were not given feedback about whether their partner guessed successfully: in that condition, graphical complexity actually increased.
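To make the complexity measure a little more concrete, here is a minimal sketch in Python. It assumes one common definition of perimetric complexity (perimeter squared divided by ink area) applied to a binary pixel grid; whether Garrod et al. used exactly this normalization is an assumption on my part, and the function name and toy grids are purely illustrative.

```python
def perimetric_complexity(grid):
    """Perimeter squared over ink area for a binary grid (1 = ink, 0 = blank).

    Assumed definition: the perimeter is the number of ink-cell edges that
    border a blank cell or the edge of the grid (4-neighbourhood).
    """
    rows, cols = len(grid), len(grid[0])
    area = 0
    perimeter = 0
    for r in range(rows):
        for c in range(cols):
            if not grid[r][c]:
                continue
            area += 1
            for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                nr, nc = r + dr, c + dc
                inside = 0 <= nr < rows and 0 <= nc < cols
                if not inside or not grid[nr][nc]:
                    perimeter += 1
    if area == 0:
        raise ValueError("grid contains no ink")
    return perimeter ** 2 / area


# Toy drawings (hypothetical): one solid blob vs. two thin, separate strokes.
solid = [[1, 1, 1, 1] for _ in range(4)]
striped = [[1, 0, 1, 0] for _ in range(4)]

if __name__ == "__main__":
    print(perimetric_complexity(solid))
    print(perimetric_complexity(striped))
```

Under this definition, a drawing made of many thin or separate strokes scores higher than a compact one with the same footprint, which matches the intuition that detailed early drawings are more "complex" than the pared-down ones that emerge later.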
Drawing on the semiotic theories of C.S. Peirce, the authors differentiated between iconic signs (in which there is a salient perceptual relation between the sign and signified) and symbolic signs (in which the relation is arbitrary).[1] Symbols relate to their referents not through some inherent property of the symbol’s form, but because of their habitual or conventional use as signifying that referent. The authors write:
Over an extended sequence of interactions, we hypothesize there is a shift of the locus of information from the sign itself to the communicators' representations of the sign's usage. At the beginning of the ‘pictionary’ experiment communicators need to be able to recognize the object from the sign and they can only do this through the visual resemblance between sign and object. So, the sign functions iconically. Sometime later in the interaction, the same concept needs to be communicated. Now, a much-reduced drawing serves to secure recognition. What differs is the shared experience of the interaction that the participants now possess…The more structured the interaction history behind the sign, the more symbolic it can become.
If we had to construct a novel communication system every time we interacted with someone, that system might have to be very iconic. Fortunately, however, communication is not one-shot: each interaction inherits structure from the shared history of interactions, either between a specific pair of interlocutors or among community members more broadly. That means that in principle, the form of a sign can be decoupled from its meaning.
Why might such decoupling be useful? One obvious answer is efficiency: producing a fully iconic representation of a referent is difficult and time-consuming, whereas a simpler, conventionalized representation is much easier and faster to produce. All else equal, it stands to reason that interlocutors would prefer an efficient communicative system over an inefficient one. Indeed, there is ample evidence (including a more recent paper led by Robert Hawkins) that repeated interactions with the same partner lead to increasingly efficient pictorial communicative systems.
Another (complementary) answer is that this decoupling allows for other hallmarks of linguistic structure to emerge, such as compositionality. For example, a 2019 study led by Yasamin Motamedi asked participants to communicate about various concepts (e.g., a firefighter or barber) using their hands. Crucially, concepts varied along two dimensions: their function (i.e., whether the concept was a person, a location, an object, or an action) and their theme (e.g., food vs. religion). This experimental manipulation produced a range of concepts for each theme (e.g., the “food” theme included chef, restaurant, frying pan, and to cook) and for each function (e.g., “person” could include chef, vicar, and more).
From an experimental standpoint, the differentiation by both function and theme was very important: there were now multiple semantic axes along which to identify a referent. In principle, then, it might be important to develop signals that differentiate among the functions within a theme (e.g., chef vs. to cook), beyond simply differentiating between broad thematic categories (e.g., religion vs. food).
Indeed, that’s exactly what participants did. Over the course of multiple interactions, communicative systems both decreased in complexity (as in previous work) and increased in compositionality: that is, rather than using a single, continuous gesture to refer to a concept like chef, participants decomposed the concept into its theme (food) and function (person). The result was a system that looked a lot more like language than mere pantomime: regular markers for different functional and thematic axes, which could be flexibly combined to produce novel symbols. For example, rather than having a specific marker for vicar, the marker for person could be combined with the marker for religion to refer to that concept.
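The combinatorial logic described above can be sketched in a few lines of Python. This is only an illustration: the marker labels (`T:food`, `F:person`, and so on) are hypothetical placeholders rather than the gestures participants actually produced, and the sketch uses only the two themes and four functions named in this post.

```python
# Hypothetical marker labels: placeholders, not the gestures from the study.
THEME_MARKERS = {"food": "T:food", "religion": "T:religion"}
FUNCTION_MARKERS = {
    "person": "F:person",
    "location": "F:location",
    "object": "F:object",
    "action": "F:action",
}

# Concepts mentioned in the text, described as (theme, function) pairs.
CONCEPTS = {
    "chef": ("food", "person"),
    "restaurant": ("food", "location"),
    "frying pan": ("food", "object"),
    "to cook": ("food", "action"),
    "vicar": ("religion", "person"),
}


def compose(theme, function):
    """Build a signal by pairing a theme marker with a function marker."""
    return (THEME_MARKERS[theme], FUNCTION_MARKERS[function])


# A holistic system needs one distinct signal per (theme, function) meaning;
# a compositional system needs only one marker per theme and per function.
holistic_signals_needed = len(THEME_MARKERS) * len(FUNCTION_MARKERS)
compositional_markers_needed = len(THEME_MARKERS) + len(FUNCTION_MARKERS)

if __name__ == "__main__":
    for concept, (theme, function) in CONCEPTS.items():
        print(concept, "->", compose(theme, function))
    print("holistic signals:", holistic_signals_needed)
    print("compositional markers:", compositional_markers_needed)
```

With two themes and four functions, the holistic system needs eight signals while the compositional one needs six markers. The gap widens as themes are added, since one count grows multiplicatively and the other additively.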
Strikingly, this evolution of systematic compositionality is conceptually similar (if on a much smaller scale) to what’s been observed in emerging sign languages, such as Nicaraguan Sign Language (NSL).[2] Even the earliest generations of NSL, and even the individualized home sign systems that preceded them, were much more systematic than, say, spontaneous gestures produced by hearing participants; NSL signers rapidly converged upon a compositional system that enabled greater productivity and communicative efficiency.
Conventionalization, then, can produce communicative systems that are both more efficient and more compositional than novel, purely iconic ones. These conventionalized systems are also much more arbitrary, even if some of their signs could be traced to iconic origins. Thus, arbitrariness is not directly “selected for”, but rather emerges as a result of several other selection pressures (i.e., for efficiency and compositionality) and a relaxing of another pressure (i.e., for iconicity).
Arbitrariness, flexibility, and abstraction
A related (though distinct) account is that arbitrariness unlocks additional flexibility in signaling: forms no longer have to correspond directly to their meanings. This flexibility is important for a number of reasons, foremost among them that it allows signs to convey more abstract meanings like “happiness” or “idea”. The latter argument was presented in a 2018 article by Gary Lupyan and Bodo Winter, and hinges on a couple of key premises.
The first is that language is more abstract than you’d think. “Abstract” words are generally defined as those referring to meanings that can’t be experienced directly; this is in contrast to “concrete” words, which refer to things or actions in reality, i.e., things you can experience directly. Of course, this difference is not binary: concreteness and abstractness likely exist on a continuum, with some meanings being more concrete than others. Researchers like Marc Brysbaert have created massive datasets of human judgments about the concreteness of words in English and other languages; these datasets allow other researchers to compare the concreteness of words like “table” vs. “justice”.
In their study, Lupyan and Winter started by considering canonically abstract words like “justice” or “freedom”. Because these are classic examples of abstract words, we can use them as a reference point to gauge how abstract other words are. A surprisingly large number of words are rated as more abstract than those canonical examples, and those words tend to be quite frequent. In fact, if words are selected randomly from an actual corpus of language use:
…we discover that we have a 59% chance of selecting a word that is above the median level of abstractness (M = 2.15). Example words in this part of the concrete/abstract distribution are extrovert, uncomfortable, innovating, immodest and flamboyant.
And by the time one has selected five words at random, there is a very good chance (>75%) that at least one of them is at least as abstract as a word like “freedom”. That is, lots of words are actually very abstract, or at least rated as such, especially when we consider “function” words like “the”. Because some of those abstract words also happen to be used very frequently (especially words like “the”!), we encounter abstract concepts all the time.
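As a rough sanity check on the five-word figure, here is a small Python sketch of the arithmetic. It assumes each word is drawn independently with the same probability of being at least as abstract as “freedom”; that independence assumption, and the per-word probability it implies, are mine and are not reported in the paper.

```python
def prob_at_least_one(p, n):
    """Probability that at least one of n independent draws is a 'hit'."""
    return 1 - (1 - p) ** n


def min_per_draw_prob(target, n):
    """Smallest per-draw probability for which n draws reach the target."""
    return 1 - (1 - target) ** (1 / n)


if __name__ == "__main__":
    p = min_per_draw_prob(0.75, 5)
    print(f"per-word probability needed: {p:.3f}")
    print(f"check over five draws: {prob_at_least_one(p, 5):.3f}")
```

Under those assumptions, a better-than-75% chance over five draws corresponds to a per-word probability of a little over 0.24. In other words, roughly one in four words encountered in running text would need to be at least as abstract as “freedom”.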
The second premise is that iconicity limits abstraction. In a purely iconic language, each sign should resemble its meaning in some way. But if abstract meanings are defined as those that can’t be experienced directly, how could we refer to them iconically? The authors write:[3]
To illustrate, consider again the word fun. Despite being abstract, one can imagine ways in which this word could be more iconic. In a signed modality, this could take the form of imitating a prototypical activity such as dancing (a student suggested ‘jazz hands’). In the vocal modality, the iconicity could incorporate phonological characteristics common to laughs or cheers. Note, however, that in doing so, the word form necessarily resembles a particular type of fun rather than a more abstract and generalizable idea of fun. This is because iconic depiction is always selective—only particular aspects of a word's meaning are expressed iconically.
In their view, iconicity ties a word too closely to a specific, concrete meaning, preventing people from talking about more general, abstract meanings.[4] This argument is hard to prove, but there is some empirical evidence consistent with it. For example, words that are rated as particularly iconic also tend to be rated as more concrete: that is, iconic words are less abstract. Iconic words also exhibit less contextual diversity in their usage, i.e., they are more closely tied to specific contexts.
Together, these premises point to an intriguing explanation for why language is so arbitrary: people use language to talk about abstract concepts, and arbitrariness gives language the necessary flexibility to do that. In this account, arbitrariness is more explicitly a design feature that’s positively “selected for”, as it confers a direct benefit (in contrast with emerging as a byproduct of other selection pressures).
Wherefore arbitrariness?
We began with the question of why language isn’t more iconic, given iconicity’s apparent benefits for language acquisition. I’ve presented two distinct (but compatible) views: first, that arbitrariness emerges as a byproduct of other processes, such as conventionalization and a pressure for efficiency and compositionality; and second, that arbitrariness unlocks additional flexibility in signaling and allows speakers to communicate about more abstract concepts that would be harder to express in a purely iconic language.
Questions about why language looks the way it does are, of course, very difficult to answer. We can’t turn back the clock and run experiments that test how different factors affect the distribution of languages and language features we see in the world. Instead, we have to rely on convergent evidence from computational models, laboratory experiments, and empirical descriptions of how language is used. This isn’t so different from addressing “why” questions in other domains, such as evolutionary biology (e.g., about the evolutionary function of specific traits), though the inherent fuzziness of constructs in language and cognition might make it even more challenging.
Still, keeping these caveats in mind, both accounts do seem compelling. They don’t explain everything—where does our drive for compositionality and abstraction come from, for instance?—but they do situate the notion of arbitrariness within a network of plausible causal factors. That seems like a property of a good explanation to me.
[1] Peirce also described indexical signs, in which the sign relates to the referent in some causal way (e.g., smoke is an index of fire).
[2] Or ISN in Spanish (Idioma de Señas de Nicaragua).
[3] Note that I’ve removed the linked citations from this quote.
[4] A counterargument to this would be that people reliably use concrete language to talk about abstract concepts. This is the premise of conceptual metaphor theory, which holds that much of our understanding of more abstract concepts is grounded (metaphorically) in embodied experience.