Scholars seek a lingua franca for linguistics research

August 20, 2015

Over time, English has swirled into dialects so different that speakers from the same country cannot always understand each other. Similarly, linguists – as they have catalogued words, spellings, pronunciations, and meanings – have stylized their individual academic databases to suit the needs of their own research.

In an age of computational linguistics, that can be a problem. Computers offer vastly improved capabilities for finding patterns and connections. But while human brains are good at smoothing over minor inconsistencies, computers tend to be very literal.

And data that can’t be understood can’t be part of the conversation. “Because of the large quantities of data that can be brought to bear on a problem, for many studies occasional data quality issues are not fatal,” explains SFI Professor Tanmoy Bhattacharya, who leads SFI’s linguistics program. But, he says, “the next advance in linguistics will need to understand weak signals or complicated histories deep in the data, and in these situations data issues will be very important. We will need to understand how the data being used are selected, curated, and presented.”

Further, language databases will need to adopt coding conventions that allow them to talk to one another. “We need to develop a lingua franca for all linguistics databases to speak,” he says. “Whatever way databases organize their own data, or speak their own internal dialect, we should be able to translate them all into something universally understandable and answer queries using the same code all others use.”

Bhattacharya, SFI Distinguished Fellow Murray Gell-Mann, and longtime SFI collaborator George Starostin are hosting an invitation-only working group this week at SFI to address this challenge. Conventional and computational linguists will evaluate existing relevant online and offline databases, explore optimal data formats, and discuss – perhaps even establish – the most useful programmed analysis tools for historical linguistics research.

“What is going to come of this is the preparation to enable the next big advance in computational linguistics,” Bhattacharya says.
