The Evolutionary Mystery Of How Humans First Came To Talk

It is estimated that human beings first began to speak somewhere between 20,000 and 200,000 years ago. If this seems to you like an absurdly unspecific window of time, then you wouldn’t be wrong.

When it comes to analyzing and interpreting the facts of our pre-history (i.e. the age before Homo sapiens sapiens, aka modern man, ever set foot on this earth), it quickly becomes apparent that we will always have far more questions than answers.

However, just because we don’t know something for sure, doesn’t mean that we can’t give it our best educated guess.

Don’t worry, I won’t be the one making any guesses. I’m just here to explain some things and, hopefully, to entertain you along the way…

A bit of prehistory background

It was in 1856, in a cave in the Neander Valley, in Germany, that quarry workers first stumbled upon the remains of an ancient, never-before-encountered type of human skull. The skull they discovered was larger than that of modern man, suggesting a larger brain, although likely one not quite as efficient (1).

Further evidence discovered within and around the surrounding area of the cave pointed towards a few additional conclusions, namely, that this species of man — later given the name Homo neanderthalensis — wore clothes, engaged in small battles or wars with neighboring peoples, participated in communal activities and organizations, shaped tools for construction, cared for the sick and dying members of his family and/or tribe, and even buried the dead, marking their graves with headstones, and likely engaging in some form of religious ritual. All things, bear in mind, that point towards a certain level of (relatively) sophisticated communication.

However, it wasn’t until a man by the name of Philip Lieberman began to study the physiology of Neanderthal Man that we understood just how sophisticated this speech might have been.

Indeed, Lieberman was able to establish that sounds such as the “ee” in “knee” and the “oo” in “zoo” would have been physically impossible for Homo neanderthalensis to produce, severely limiting their capacity for any sort of precise pronunciation (2). The use of advanced human language would, therefore, be attributed instead to the species of man that displaced Neanderthal Man roughly 30,000 years ago: the Cro-Magnons.

The advent of the Cro-Magnon people gave rise to the earliest signs of modern civilization in Europe, bringing with it vast improvements in the development of handheld tools, the routine harnessing of fire, and, rather pleasantly, the first form of artwork ever recorded to exist.

What’s more, one key physiological difference (the repositioning of the tongue to allow for the descent of the larynx, and the lengthening of the neck) meant that this species of man would likely not only have looked and walked like us, but also have talked in an almost identical manner to the way we do now.

Hence, well-articulated speech and language were born. Unfortunately, this is where things begin to get slightly bizarre…

There exists a big mystery surrounding the evolution of the human capacity for language: how exactly did different peoples, from all over the world, all spontaneously begin to speak at roughly the same time?

More specifically, did this remarkably sudden evolution occur as the result of rapid social learning, or possibly the process of natural selection? If the latter, then what and where, exactly, is the mechanism that would have been selected for? There are three primary approaches to answering these questions.

1. The Sociocultural Hypothesis

The Sociocultural Hypothesis for the evolution of human language suggests that the capacity for humans to communicate via spoken language arises as a result of empiricism and social learning. It proposes that language is an external phenomenon, and in no way, shape, or form a genetic one (3).

“The presumption was that our minds at birth were blank slates on to which the rules and quirks of our native languages were written.”

— Bill Bryson, Mother Tongue

This theory of the evolution of human language largely follows a framework originally put forth by a psychologist named Vygotsky.

Vygotsky’s sociocultural theory (SCT) primarily suggests that, although biological factors are responsible for the mechanisms that permit the learning of language (e.g. the descent of the larynx within the throat), it is the sociocultural setting itself that acts as the sole determinant for the development of sophisticated speech (4).

In this sense, language is not viewed as unique, in that there is no domain-specific mechanism behind it. This perspective places language in the same category as other cognitive or behavioural phenomena that are also established as domain-general, such as voluntary attention, long-term planning, intentional memory, and logical thought (5).

In other words, these things exist purely because we have learned how to perform them.

The rules of grammar and syntax are, therefore, as proposed by the sociocultural hypothesis for language, learned only as a result of conditioning and association by the human brain via domain-general mechanisms.

To recap, here is what the Sociocultural Theory suggests:

that language is a product of social learning that takes place via the senses and an individual’s external experience of the world
that there are no features universal to all languages
that there is no innate genetic knowledge of language
and that there exists no domain-specific mechanism for the ability to communicate

Criticisms of the SCT

There are issues with this theory, however, and critics are quick to point out that because cultural evolution does not take place in a biological vacuum, it then becomes impossible to separate the language phenomenon from its physiological constraints (6).

Further to this, it seems highly implausible that there would exist no innate appreciation for language. After all, we understand from countless research studies that children of completely different cultural backgrounds, be it Chinese, Norwegian, English or Arabic, all begin to speak (or attempt to, at least) in more or less the same systematic way (7).

They might start with the word “me,” transition to “me want,” and, as they continue to grow, end up forming something close to a full sentence, along the lines of “me want this food now.” Thus, it makes sense that many historians and psychologists alike take issue with the sociocultural theory as the be-all and end-all explanation for the evolution of human language.

And so the mystery continues…

2. The Nativist Perspective

In stark contrast to the SCT, the Nativist Perspective maintains that, in order for humans to have developed the capacity for language, some innate, biological endowment of this ability must have occurred.

Thus, instead of language acquisition existing as a one-way process (i.e. information from the external linguistic environment going into the brain), the Nativist Hypothesis suggests that language acquisition is a two-way process, whereby speech content in the environment interacts with innate grammar in the brain to produce the human faculty for language.

“People know how to talk in more or less the same sense that spiders know how to spin webs.”

— Steven Pinker, The Language Instinct

Noam Chomsky, a renowned linguist, famously argued not only that certain structural facets of speech must be genetic, but also that all languages must share particular universal characteristics that are acquired via domain-specific mechanisms within the brain (8).

Chomsky found evidence to support the notion that language — and subsequently a Universal Grammar (UG) — is “programmed,” so to speak, based on the fact that by the end of the first month of a child’s life, they have already formed a natural preference for speech sounds, over all other types of sound, regardless of what language is being spoken and whether or not it is native to them.

But, if language is to be considered a domain-specific mechanism, then where exactly in the brain does it exist?

Research suggests that it resides in Broca’s and Wernicke’s areas. These are the areas of the brain where, respectively, the production of coherent speech and the comprehension of language have been found to occur.

We have a domain-specific mechanism, ladies and gentlemen. I repeat, we have a domain-specific mechanism.

Or, at least, we believe we do, for now.

The Nativist Perspective doesn’t quite solve all the confusion, however, because within this perspective there are two main schools of thought that remain at odds with one another. These are, notably, The By-Product Hypothesis and The Adaptationist Hypothesis.

2.1 The By-Product Hypothesis

The By-Product Hypothesis differs dramatically from the Sociocultural Hypothesis, in that it implicates the human capacity for language as a by-product or co-occurrence of the process of natural selection.

Stephen Jay Gould was one of the first to expand upon this, pointing out that, although the development of a larger brain in humans was necessary for the biological capacity for language acquisition, it makes the most sense for language to have arisen as a natural result of the complexities involved in the operation of the brain, rather than as a direct result of natural selection itself (9).

What’s more is that Gould believed language first developed not as a means of communicating with others, but as a way to maintain an inner dialogue with oneself.

In this sense, he states that, somewhere along the line, a function shift must have occurred so that the original evolutionary function of the brain could be surpassed, and the spandrel of human language (i.e. the offshoot consequence of having a bigger brain) could later be “co-opted” for cooperative purposes (10).

The concept of functional shift is not a novel one by any means, and was first outlined by Darwin as a way to account for the incipient stages of useful structures that could not be explained by continuous evolution. (A prime example of this shift can be seen in the early development of the animal wing: since no flight benefit can be pinpointed for its earliest stages, it’s theorized that the original structure must have served some other primary function before later being co-opted for aerodynamic advantage (11).)

However, the By-Product Hypothesis goes one step further, by rejecting Darwin’s stipulation that these spandrels must have offered some form of utility.

Instead, it’s argued that, because multicellular organisms are intrinsically complex, primary adaptations must give rise to a variety of chain-reaction consequences (i.e. spandrels), with the frequency and complexity of these by-products increasing with the level of sophistication of the organism in question (12).

Said differently: the human, being one of the most complex organisms out there, must therefore find within its evolutionary history, a series of highly impressive spandrels that exceed any and all functional relation to their primary adaptations.

Whew…does your brain hurt yet?

Here’s a recap of what exactly the BPH proposes:

that there are universal features shared by all languages
there is some form of innate genetic knowledge of language
there exists a domain-specific mechanism for the ability to communicate
that language is exaptive (meaning a by-product of a naturally selected mechanism), not adaptive (meaning primarily selected for in and of itself)
that language as a by-product exists as a natural result of human complexity
and that language evolved first for inner thought, and was only later co-opted in order to communicate with others

Criticisms of the BPH

When it comes to anything pre-history related, it might be better for us all to just go ahead and accept that nothing will truly ever be as straightforward as we might like it. Naturally, then, it will come as no surprise to you that there are a few key problems with the By-Product Hypothesis.

First and foremost, the observation has been made that there are currently no other known processes for the development of organic, complex, and functional structures, aside from primary natural selection. In other words, the presumption that the human capacity for language (an organic, complex, and domain-specific, functional structure) developed as a miraculous and spontaneous by-product of the natural selection for a bigger brain, can more or less be categorized as wishful thinking.

Realistically, the only evolutionary by-products we know of to have occurred spontaneously are entirely non-organic, non-complex characteristics, such as having a belly button, or the fact that bones and teeth are white, both of which are completely useless features.

It makes sense, then, why the idea that we, humans, randomly got lucky enough to be the first, and last, recipients of a by-product as complex and useful as language, doesn’t exactly hold up to a high degree of scientific reasoning.

We are left, then, with only one theory still standing.

2.2 The Adaptationist Hypothesis

Although both the By-Product Hypothesis and the Adaptationist Hypothesis implicate natural selection in the evolution of the faculty of language, the latter maintains that, for this notion to have any validity, language must have resulted as a direct consequence of adaptive selection for the purposes of communication (13).

In addition, the Adaptationist Hypothesis indicates that, just as the neural mechanisms for vision arose as an adaptive response to stimuli of the visual environment, so it follows that the formation of a Universal Grammar would arise as an adaptive response to stimuli of the linguistic one (14).

The phenomenon highlighted within the literature as being the most plausible for supporting this process is referred to as the Baldwin effect.

The Baldwin effect opposes both the Sociocultural and the By-Product Hypotheses by proposing that natural selection may very well have acted on a trait that was first developed over the course of an individual’s lifespan.

The trait would gradually become encoded in the genetics of the organism’s descendants across many generations, until, finally, the original environmental stimuli were no longer needed. Individuals who acquired the trait faster would, in turn, have a selective advantage, thus facilitating the natural selection of language (15).

We’re nearly at the end, thanks for hanging in there with me!

Here’s a recap of exactly what the AH proposes:

there exists a domain-specific mechanism for the ability to communicate
that language is adaptive, not exaptive, and therefore comes as a result of primary natural selection

The talking takeaway

Interestingly, some further support for the Adaptationist Hypothesis arises when the maladaptive qualities related to the human capacity for language are considered.

As we already know, in order for humans to have gained the ability to speak, the tongue had to move from its original location, allowing the larynx to descend in the throat. While these evolutionary changes were needed for spoken language to emerge, they also meant that food must pass by the larynx, which, safe to say, substantially increased the risk of choking to death for early, Heimlich maneuver-less humans (16).

This fact, therefore, implies that the adaptive benefits of language development would need to have significantly outweighed the disadvantages where survivability was concerned.

So, is it actually possible that the need for humans to talk to each other somehow managed to outweigh the need for us to be able to eat without choking to death?

I suppose you’ll have to answer that question for yourself, because right now, if it could, science would be shrugging.

Alexandra Walker-Jones — February 2021

Text References: