How language families are identified: The comparative method

Indo-European language family
https://en.wikipedia.org/wiki/Indo-European_vocabulary

Semitic/Afroasiatic language family
https://en.wikipedia.org/wiki/Semitic_languages
https://en.wikipedia.org/wiki/Afroasiatic_languages

So we’ve looked at this map of the world’s major language families, and I just said “Look at how all these far-flung languages come from a common ancestor.” But it does lead to the question of how anybody knows this.
Certainly, it would not be obvious that many of these very distant languages have any kind of shared history at all.
Now, one of the most basic methods that has been used to identify language families is called the comparative method.
So here, let’s take a look using the example of the Indo-European language family.
And here’s an example of some of these giant tables that we see put together.
And so just imagine giant poster-sized sheets of paper, just with columns and columns.
And what you do in the comparative method is essentially make each row a word, and each column a different language that you think might be related somehow.
Of course if you put a language that’s not related, you’ll see that it’s not going to match up.
It’s always going to be an exception.
But when you put languages that actually are related together in a table like this, they start to show some very surprising correspondences.
Now, words that have the same ancestor, which you could call sister words or cousin words, are called cognates, meaning “born together”.
And so these are called cognate sets: you’re trying to establish sets of words that can be said to descend from the same word.
So ignore that column on the left there, because that’s showing the results of the work here.
But just imagine starting to put together a table like this, where you see, OK, here’s the English word ‘mother’, and of course it’s showing that it comes from Old English mōdor.
OK, so you have mōdor.
And then in Gothic, an East Germanic language, mōdar.
And then you see Latin māter, and Ancient Greek mḗtēr, as we see in the name of the goddess Demeter, and in the word ‘metropolis’, meaning a mother city.
So you start to find these patterns.
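The kind of table described here can be sketched as a small data structure. This is only an illustration: the Python names (`COGNATE_TABLE`, `row`, `column`) are made up for the example, and the only forms filled in are ones quoted in this lecture.

```python
# Each row is a meaning and each column a language, as in the lecture's table.
# Only forms quoted in the lecture are included; all names are illustrative.
COGNATE_TABLE = {
    "mother": {
        "Old English": "mōdor",
        "Gothic": "mōdar",
        "Latin": "māter",
        "Ancient Greek": "mḗtēr",
    },
    "two": {
        "Old English": "twā",
        "Latin": "duo",
        "Sanskrit": "dvā́",
    },
}


def row(meaning):
    """Return the cognate set for one meaning (one row of the table)."""
    return COGNATE_TABLE[meaning]


def column(language):
    """Return every form recorded for one language (one column of the table)."""
    return {
        meaning: forms[language]
        for meaning, forms in COGNATE_TABLE.items()
        if language in forms
    }


print(row("mother"))
print(column("Latin"))
```

Reading across a row gives one cognate set. Reading down a column gives everything recorded for one language, which is where gaps become visible.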
And as with anything in the world of statistics, when you find just a few of them, you can think maybe these are simply coincidences.
After all, coincidences do happen.
And when you find words that appear to be related but are not, they’re called “false cognates”.
But when you see enough of them, it becomes harder to imagine that it is just random coincidence, because the coincidences just start lining up too much.
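To see why a few matches prove little while many matches are hard to dismiss, here is a rough back-of-the-envelope sketch. Every number in it is an assumption chosen only for illustration: a 5% chance that any one pair of unrelated words happens to look alike, 100 words compared, and resemblances treated as independent. None of these are measured values.

```python
from math import comb


def prob_at_least(k, n, p):
    """Chance of k or more accidental matches among n compared words,
    assuming each match happens independently with probability p."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))


# Assumed, illustrative values (not measurements).
N_WORDS = 100
P_CHANCE = 0.05

for k in (3, 10, 30):
    print(k, "or more chance matches:", prob_at_least(k, N_WORDS, P_CHANCE))
```

Under these assumed numbers, a handful of look-alikes is entirely expected by chance, while dozens of them would be extremely unlikely. That is the intuition behind asking for many correspondences rather than a few.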
It’s not only that you see a parallel, but that the parallels follow a consistent pattern, where certain sounds in one language are matched with certain sounds in other languages.
For example, here you see for both mother and father, Gothic goes mōdar, fadar, so the ‘th’ sound in English corresponds to the d in Gothic, the t in Latin, the t in Greek, and the t in Sanskrit as well.
And you can continue with these other major languages that are all related in the Indo-European family.
And you can just imagine the excitement there would have been in the early days when these correspondences were being worked out, because it really does seem amazing that languages assumed to be completely different, English on the one hand, and Latin, Greek, and Sanskrit, three of the great languages of the ancient world, on the other, all seem to be related.
So you have Sanskrit, the language of ancient India, and Greek and Latin, languages of the ancient Mediterranean, all seeming to share these correspondences.
So yeah, you can see the same pattern continue, even with brother and sister as well.
So here, for brother, you notice the t again: the ‘th’ sound in English corresponds to the t.
And you also see that the f in Latin corresponds to the b in English and Gothic, the two Germanic languages here.
You see the pʰ in Greek, the bʰ in Sanskrit.
If you just see one of these, you wonder, but the pattern holds: when you see a bʰ in Sanskrit, it’s going to be an f in Latin.
You see this repeated over and over, and you can start to imagine the sorts of sound changes that might have happened.
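The "repeated over and over" step can be sketched as a simple tally. This is a minimal illustration, not how historical linguists actually automate the work. The initial sounds are picked out by hand, and the rows for "foot" and "bear (carry)" are standard textbook cognate sets added here, not ones quoted in the lecture. All Python names are hypothetical.

```python
from collections import Counter

LANGUAGES = ("English", "Gothic", "Latin", "Greek", "Sanskrit")

# Initial sound of each cognate, picked out by hand, in LANGUAGES order.
INITIAL_SOUNDS = {
    "father": ("f", "f", "p", "p", "p"),
    "foot": ("f", "f", "p", "p", "p"),  # added example, not from the lecture
    "brother": ("b", "b", "f", "pʰ", "bʰ"),
    "bear (carry)": ("b", "b", "f", "pʰ", "bʰ"),  # added example
}


def recurring_correspondences(sound_rows, minimum=2):
    """Return correspondence patterns seen in at least `minimum` cognate sets."""
    counts = Counter(sound_rows.values())
    return {pattern: n for pattern, n in counts.items() if n >= minimum}


for pattern, n in recurring_correspondences(INITIAL_SOUNDS).items():
    print(dict(zip(LANGUAGES, pattern)), "seen in", n, "sets")
```

A pattern that shows up in only one row could be an accident. A pattern that keeps recurring across unrelated meanings is what suggests a regular sound change.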
So you start to reconstruct backwards.
And it is surprising how much of this whole idea of proto-languages, with Proto-Indo-European as the shared ancestor language, is entirely reconstructed.
There’s no direct evidence for it existing as a language.
But you start with these cognate sets, this table of correspondences, and then you start to rewind the changes that might have happened, and imagine what the earlier language would have had to be like in order to lead to this particular pattern of daughter languages.
And that’s where you can end up creating this column, after a lot of discussion and trying different hypotheses and so on, and the debates on many of the details are still ongoing.
So it’s proposed that the original Proto-Indo-European form was something like méH₂tēr. This is described as “h number 2” because there are a few different h’s that are reconstructed for PIE, and the exact details of them are disputed.
If you take this as the protoform, then you can actually lay out the changes that happened to gradually end up with the words that we know exist in the later languages.
For example, the t here: we say that t becomes a th for English, it becomes a d for Gothic, and so on.
And you see here for father, it’s proposed that the earlier form had a p.
So the p became an f in the Germanic languages, English and Gothic.
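The idea of proposing a protoform and then running the changes forward to check it can be sketched like this. It is a toy model under loud assumptions: it looks only at the first consonant of the word, it ignores the surrounding sounds that real sound laws depend on, and the rule table and all names (`SOUND_CHANGES`, `reflex`, `hypothesis_fits`) are invented for illustration.

```python
# Toy rules for word-initial consonants only; real sound changes depend on
# the surrounding sounds and are far more detailed than this.
SOUND_CHANGES = {
    "Germanic": {"p": "f", "bʰ": "b"},
    "Latin": {"bʰ": "f"},
    "Greek": {"bʰ": "pʰ"},
    "Sanskrit": {},  # keeps both sounds unchanged in this toy model
}


def reflex(proto_sound, branch):
    """Predict what a proto sound becomes in a branch (unchanged if no rule)."""
    return SOUND_CHANGES[branch].get(proto_sound, proto_sound)


# Initial sounds actually observed in the attested languages.
OBSERVED = {
    "father": {"Germanic": "f", "Latin": "p", "Greek": "p", "Sanskrit": "p"},
    "brother": {"Germanic": "b", "Latin": "f", "Greek": "pʰ", "Sanskrit": "bʰ"},
}

# The hypothesis under test: the proposed proto sound for each word.
PROTO_INITIAL = {"father": "p", "brother": "bʰ"}


def hypothesis_fits(meaning):
    """Check that the proposed proto sound predicts every observed reflex."""
    return all(
        reflex(PROTO_INITIAL[meaning], branch) == sound
        for branch, sound in OBSERVED[meaning].items()
    )


print(hypothesis_fits("father"), hypothesis_fits("brother"))
```

If a proposed proto sound plus the rules fails to predict an attested form, either the reconstruction or a rule has to be revised, which is the testing loop the lecture describes.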
And right here you see that in Russian, in Slavic, the word has no p at all.
It really is hard to believe that you can work all this out by just looking at a few words.
But as you keep adding more words, with these giant tables going on and on, and more and more words showing repeated patterns, you start to get a picture that you can actually test to see whether it really makes sense.
And eventually you get so comprehensive a picture, with so many different correspondences and a clear system of which sound must have changed to which, that you can start to get the whole tree worked out.
OK, these were the same language at some point, and then they branched.
One language made this change, another language didn’t.
It made another change.
And you start to see how this whole tree forms.
And you continue to test that against the actual evidence you have, the only real evidence you have, which is the evidence of actual known languages, through written records and so on.
And from there, you can really reconstruct a very probable picture.
We have to say that we don’t really know, since there’s no direct evidence, but it seems to be a highly probable picture of some earlier form of a language that would have had many of these features.
And so we see this in many different areas.
The vocabulary can be taken from all over the language.
Here are examples for the pronouns.
You see I is matched with ik in Gothic, with egō in Latin and in Greek as well, with ahám in Sanskrit, and so on.
So the correspondences are really quite surprising, and they just go on and on.
One of my favorite examples is the number two, because you really see how in so many different languages, the number two has this kind of common form.
And even as it’s spelled in English, you have two, or Old English twā, and you can see Latin duo.
When you look at the way the words are pronounced now, you think two and duo, OK, they’re quite different.
But looking into the past of each word, you have twā, with the w actually pronounced, and you have duo, and then in Sanskrit it’s dvā́.
So you can see how these could possibly be related together.
And then it’s all traced back to this reconstructed Proto-Indo-European dwóH₁, along with a neuter form and different gender forms as well.
So these tables go on and on, and you eventually build a picture of what is almost certainly going to be this language family.
And that is a picture of the Indo-European language family.
And really, that’s the main method for reconstructing these language families, any of them.
I can show another example from the orange here, the Afro-Asiatic language family, and here one of the main sub-branches of it is the Semitic language family, best known for including Arabic and Hebrew.
So here we see some examples of shared words.
For example, you have bayt and bet.
You see all these words that are something like bet, which is traced back to this Proto-Semitic word bayt.
And this is even the origin of the letter beta, which is our letter b: it was originally a little picture of a house, a little floor plan of a house, and its name comes from this word.
That’ll be a whole other story, the origin of our writing system.
And you see, for example, the word šalām here: Arabic salām, Hebrew šālôm, and you see it in the ancient Akkadian language as well.
Then there’s Ge’ez, the language of ancient Ethiopia, Mehri səlōm, a language of South Arabia, and Maltese, a later language.
So all of them sharing consistent sound patterns that can be traced back to this Proto-Semitic word.
And then, of course, with Semitic being a major branch of Afro-Asiatic, you can take Semitic as just one branch here, and combine it with other languages, including Ancient Egyptian, which appears to be a related language.
Although the further out you get into these bigger families, the harder it is to make clear correspondences.
But here you see Chadic, Cushitic, Omotic, and Berber, which are further groups of languages spread across north, west-central, and northeast Africa.
And especially Ancient Egyptian and Semitic, which is a fascinating combination.
So here you can see regular patterns, some of these quite difficult to pronounce.
For example, here we have ‘die’, mwt.
Well, there it’s also mwt.
Līs, lsn for tongue.
In some of these cases, the changes are much bigger, so it requires going much deeper to try to figure out the match.
And as well, there’s also the issue that for many of these words, there isn’t always a matching word.
Because in many cases, the word itself can change.
We can start to use a different word for the same thing.
A famous example is the word ‘dog’ that came into English.
It’s not really used anywhere else.
In other languages, it would have been the word ‘hound’.
In Old English, we had the word hund, the ancestor of ‘hound’, and you see it’s related to canis and kunós, these other words for dog.
But in English, we started using the word ‘dog’.
OK, so that completely replaced the word ‘hund’.
There’s no genetic relationship, there’s no correspondence between the English word ‘dog’ and these other words like hund.
So that’s just a completely different word.
So that’s where you see the blanks in this table: there isn’t always a matching word there.
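Blanks like these can be represented explicitly, so that you can count how much evidence each column really has. The sketch below uses only forms mentioned in this lecture. The table layout, the use of `None` for a blank, and the `attested` helper are all assumptions made for illustration.

```python
# None marks a blank: the language has no cognate for that meaning.
# English "dog" is not cognate with hund/canis/kunós, so that cell is blank.
TABLE = {
    "mother": {
        "English": "mother",
        "Old English": "mōdor",
        "Latin": "māter",
        "Greek": "mḗtēr",
    },
    "dog": {
        "English": None,
        "Old English": "hund",
        "Latin": "canis",
        "Greek": "kunós",
    },
}


def attested(table, language):
    """Count the meanings for which a language has a cognate in the table."""
    return sum(1 for forms in table.values() if forms.get(language) is not None)


for language in ("English", "Old English", "Latin", "Greek"):
    print(language, attested(TABLE, language), "of", len(TABLE))
```

A column with very few filled cells offers very little to test sound correspondences against, so a count like this is a quick sanity check on how strong the evidence for a given language is.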
But when you put it all together, you eventually get enough links to think that this probably is related.
Although you can see with Omotic here, you start to wonder, do you really have enough?
You have one word.
Based on this chart, I would not be very convinced about Omotic being part of the family.
But you start to see that some languages are easier than others to show the parallel.
And based on this chart, it’s not quite as clear and convincing as what we saw in the earlier tables.
But if you just keep adding more of these cognate sets, building an even clearer picture of the sound correspondences, you can start to get a picture of an earlier protoform of the language that could plausibly have led, through regular sound changes and splits along the way as people diverge and languages split, to the present languages we know.
And that’s really the foundation of the evidence for why languages are considered to be part of families.
