“. . . no matter which language you take, salt is close to seawater.” Above: salt farmers harvesting salt, Pak Thale, Ban Laem, Phetchaburi, Thailand. Photo: J. J. Harrison

When a student of German offers you ein Gift, it’s fairly safe to say that he doesn’t mean to threaten you with poison. A newcomer to Italian might wonder why lamb’s wool is considered morbido (it means “soft”), while in Spanish, the beginner who has misspoken now confesses to being embarazada: pregnant.

These false cognates (falsche Freunde, faux amis, or “false friends”) abound between any two languages, particularly those that are closely related. The bane of the language-learner, however, is a goldmine for linguists, cultural evolutionists, and computer scientists, a group of whom will meet at SFI Aug. 27–28, 2018. Given the messy state of linguistic affairs, they ask, is it possible to quantitatively encode “meaning” independent of any particular language?

“If I tell you all of the contexts in which a given word fits, you probably have a pretty good sense of what its ‘meaning’ is,” explains SFI External Professor Tanmoy Bhattacharya (Los Alamos National Laboratory), a co-organizer of the working group. But knowing a word’s contexts is a far cry from quantifying linguistically salient measures of distance between meanings. Nevertheless, “It seems that all over the world, people have similar notions of [linguistic] distance. For example, if you didn’t have a word for ‘salt’ and wanted to use a word, you say ‘salt’ is made by ‘drying seawater.’ And it turns out that no matter which language you take, salt is close to seawater. What can that tell us about how we divide the world into pieces?”
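To make the idea of “distance between meanings” concrete, here is a minimal sketch in Python of one common way to approximate it: describe each word by the other words it appears with, then compare those descriptions using cosine similarity. This is an illustration only, not the working group’s method. The tiny corpus and the function names context_vector and cosine_similarity are invented for this example.

```python
from collections import Counter
from math import sqrt

# Toy corpus invented for illustration; it is NOT data from the working group.
CORPUS = [
    "people get salt by drying seawater",
    "salt comes from seawater near the coast",
    "farmers harvest salt from seawater ponds",
    "the student reads a book in the library",
    "a book sits on the library shelf",
]


def context_vector(word, sentences):
    """Count the words that share a sentence with `word` (hypothetical helper)."""
    counts = Counter()
    for sentence in sentences:
        tokens = sentence.split()
        if word in tokens:
            counts.update(t for t in tokens if t != word)
    return counts


def cosine_similarity(a, b):
    """Return the cosine of the angle between two count vectors (1.0 = same direction)."""
    dot = sum(a[k] * b[k] for k in a.keys() & b.keys())
    norm_a = sqrt(sum(v * v for v in a.values()))
    norm_b = sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


salt = context_vector("salt", CORPUS)
seawater = context_vector("seawater", CORPUS)
book = context_vector("book", CORPUS)

# In this toy corpus, "salt" shares far more context with "seawater" than with "book".
print("salt vs. seawater:", cosine_similarity(salt, seawater))
print("salt vs. book:    ", cosine_similarity(salt, book))
```

With real data, the same comparison would run over millions of sentences in many languages, and the open question raised by the working group is whether the resulting distances look alike from one language to the next.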

Translation may be one place to start. Rather than going the language-class route, scientists are interested in removing any kind of cultural bias: simply feeding large linguistic datasets to a computer, telling it only which sentences translate to which, and testing what the machine learns. The computer takes, say, a German sentence, changes it into an intermediate representation of zeroes and ones, and carries that over into another language, such as English.

What happens, then, when the German-to-English translation is halted halfway and redirected to French?

“It may turn out this is a pretty good translation from German to French,” Bhattacharya notes. “The zeroes and ones could be powerful enough to go from any language to any other language.” In fact, recent work by Google scientists found evidence for such an interlingua.

Among the organizers of the working group is External Professor George Starostin (Higher School of Economics, Moscow), a curator of the Evolution of Human Languages (EHL) Project, SFI’s long-standing project for investigating deep-level historical connections between the many linguistic families of the world. “The study of semantic shifts — subtle, gradual changes in word meanings that accumulate over time — is just as important for unraveling language history as is the study of sound change, which has traditionally dominated historical linguistics,” says Starostin. “In addition, historical databanks on semantic shifts, built up in the process of our investigation into the distant past of modern languages, may help shed valuable light on certain universal or culturally conditioned properties of the human mind — letting us understand which types of meanings are more commonly connected in the brain, bringing us even closer to the construction of a universal semantic meta-language.” In between elements of historical and synchronic research, the working group will be using semantic networks and aspects of cultural evolution to, among other things, predict true cognates and break down how meanings shift over time.

Is the goal to perfect Google Translate? Not exactly: the aim reaches far beyond. “This internal set of zeroes and ones is probably a representation of meaning,” says Bhattacharya. “This is an idea that we want to follow.”

External Professors Eric Smith (Tokyo Earth-Life Science Institute) and Peter Stadler (Leipzig University) are also organizing the working group.
