Lost in Translation at the Border – USC Viterbi | School of Engineering – USC Viterbi School of Engineering

Many asylum seekers face long waits in Mexico due to a shortage of translators for Indigenous language speakers. Photo/iStock.

Imagine fleeing persecution at home, surviving a difficult journey, arriving in a new country to seek asylum, only to be turned away at the border because nobody speaks your language. This is the reality for hundreds of migrants coming into the United States from remote areas of Central America who do not speak common languages, such as Spanish or Portuguese.

A shortage of translators for Indigenous asylum seekers speaking traditional languages means many must wait for months or even years in Mexico to apply for asylum, creating a long backlog in an already overwhelmed immigration system.

Katy Felkner is developing a machine translation system for Mexican and Central American Indigenous languages to help asylum seekers at the border. Photo/Katy Felkner.

"The U.S. immigration system is set up to handle English and Spanish," said Katy Felkner, a Ph.D. student in computer science at the USC Viterbi School of Engineering, "but there are several hundreds of people a year who are minority language speakers, in particular, speaking Indigenous languages from Mexico and Central America, who are not able to access any of the resources and legal aid that exists for Spanish-speaking migrants."

In other cases, people are unable to explain the threats to their lives in their hometowns, which could be the basis for asylum. When migrants cannot understand or be understood, there is no way to establish the threat to their safety during a credible fear interview conducted by the U.S. Department of Homeland Security.

The statistics are staggering: asylum-seeking immigrants without a lawyer prevailed in only 13 percent of their cases, while those with a lawyer prevailed in 74 percent of their cases, according to a study in the Fordham Law Review.

Felkner, who conducts her research at the USC Information Sciences Institute (ISI) under Jonathan May, a research associate professor, is working on developing a solution: a machine translation system for Mexican and Central American Indigenous languages that can be used by organizations providing legal aid to refugees and asylum-seekers.

"People are being directly adversely impacted because there aren't interpreters available for their languages in legal aid organizations," said Felkner. "This is a concrete and immediate way that we can use natural language processing for social good."

Felkner is currently working on a system for K'iche', a Guatemalan language, which is one of the 25 most common languages spoken in immigration court in recent years, according to The New York Times.

"We're trying to provide a rough translation system to allow nonprofits and NGOs that don't have the resources to hire interpreters to provide some level of legal assistance and give asylum seekers a fair chance to get through that credible fear interview," said Felkner.

Felkner's interest in languages began during her undergraduate degree at the University of Oklahoma, where she earned a dual degree in computer science and letters, with a focus on Latin. During her first year of college, she worked on a project called the Digital Latin Library, writing Python code to create digital versions of ancient texts.

"That's what got me thinking about language technology," said Felkner. "I taught myself some basics of natural language processing and ended up focusing on machine translation because I think it's one of the areas with the most immediate human impact, and also one of the most difficult problems in this area."

While Felkner and May are currently focused on developing a text-to-text translator, the end goal, years from now, is a multilingual speech-to-speech translation system: the lawyer would speak English or Spanish, and the system would automatically translate into the asylum seeker's Indigenous language, and vice versa.

Translation systems are trained using parallel data: in other words, they learn from seeing translation pairs, or the same text in both languages, at the sentence level. But there is very little parallel data in Indigenous languages, including K'iche', despite it being spoken by around one million people.
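To make the idea of sentence-level parallel data concrete, here is a minimal Python sketch. It is only an illustration, not the format Felkner's system uses: the `SentencePair` class, the `build_parallel_corpus` function, and the English and Spanish example sentences are all assumptions made up for this example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SentencePair:
    """One training example: the same sentence in two languages."""
    source: str
    target: str


def build_parallel_corpus(source_lines, target_lines):
    """Pair up two line-aligned lists of sentences (hypothetical helper)."""
    if len(source_lines) != len(target_lines):
        raise ValueError("source and target must have the same number of lines")
    pairs = []
    for src, tgt in zip(source_lines, target_lines):
        src, tgt = src.strip(), tgt.strip()
        # Skip pairs where either side is empty after trimming.
        if src and tgt:
            pairs.append(SentencePair(src, tgt))
    return pairs


# Illustrative English-Spanish data; a real corpus would hold thousands of pairs.
english = ["The boy kicked the ball.", "The girl reads a book."]
spanish = ["El niño pateó la pelota.", "La niña lee un libro."]

corpus = build_parallel_corpus(english, spanish)
for pair in corpus:
    print(f"{pair.source}  |||  {pair.target}")
```

Each pair is one example the model can learn from, which is why the number of aligned sentences is the figure that matters.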

That's because parallel data only exists when there is a compelling reason to translate into or out of that language. "Essentially," said Felkner, "if it's commercially viable (Disney dubbing films from English to Spanish, for instance) or stemming from a religious motivation."

In many cases, due to the influence of missionaries throughout Latin America, the only parallel data source (the same text in both languages) is the Bible, which doesn't give researchers much to work with.

"Imagine you're an English speaker trying to learn Spanish, but the only Spanish you're ever allowed to see is the New Testament," said Felkner. "It would be quite difficult."

That's bad news for the data-hungry deep learning models used by language translation systems that take a quantity-over-quality approach.

"The models have to see a word, phrase, grammatical construction a bunch of times to see where it's likely to occur and what it corresponds to in the other language," said Felkner. "But we don't have this for K'iche' and other extremely low resource Indigenous languages."

The numbers speak for themselves. From English to K'iche', Felkner has roughly 15,000 sentences of parallel data, and 8,000 sentences for Spanish to K'iche'. By contrast, the Spanish to English model she trained for some baseline work had 13 million sentences of training data.

"We're trying to work with essentially no data," said Felkner. "And this is the case for pretty much all low-resource languages, even more so in the Americas."

One tactic in existing low-resource work uses closely related, higher resource languages as a starting point: for instance, to translate from English into Romanian, you would start training the model in Spanish.

But since Indigenous languages of the Americas developed separately from Europe and Asia, the majority are low resource, and most of them are "extremely low resource," a term Felkner coined to describe a language with fewer than around 30,000 sentences of parallel data.
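The gap between these data sets is easier to see with a little arithmetic. The Python sketch below applies the roughly 30,000-sentence threshold to the approximate corpus sizes quoted in this article. The function name, the constant, and the dictionary labels are hypothetical, and the threshold is approximate rather than a hard cutoff.

```python
# Approximate threshold from the article; treated here as a simple cutoff.
EXTREMELY_LOW_RESOURCE_THRESHOLD = 30_000


def is_extremely_low_resource(num_parallel_sentences,
                              threshold=EXTREMELY_LOW_RESOURCE_THRESHOLD):
    """Return True if a language pair falls under the threshold."""
    return num_parallel_sentences < threshold


# Approximate sentence counts quoted in the article.
corpora = {
    "English-K'iche'": 15_000,
    "Spanish-K'iche'": 8_000,
    "Spanish-English": 13_000_000,
}

baseline = corpora["Spanish-English"]
for name, size in corpora.items():
    label = "extremely low resource" if is_extremely_low_resource(size) else "high resource"
    # How many times larger the Spanish-English baseline is than this corpus.
    ratio = baseline / size
    print(f"{name}: {size:,} sentences ({label}); baseline is {ratio:,.0f}x this size")
```

By these figures, the Spanish to English baseline has several hundred times more training data than either K'iche' pair.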

"We're really trying to push the lower bound on how little data you can have to successfully train a machine translation system," said Felkner.

But Felkner, with her background in linguistics, was undeterred. Over the past two years, she has worked on creating language data for the models using some tricks of the trade in natural language processing.

One tactic involves teaching the model to complete the abstract task of translation and then setting it to work on the specific language in question. "It's the same principle as learning to drive a bus by learning to drive a car first," said Felkner.

To do this, Felkner took an English to Spanish model, and then fine-tuned it for K'iche' to Spanish. It turned out that this approach, called transfer learning, showed promise even in an extremely low resource case. "That was very exciting," said Felkner. "The transfer learning approach and pre-training from a not-closely-related language had never really been tested in this extremely low resource environment, and I found that it worked."

She also tapped into another resource: using grammar books published by field linguists in the mid-to-late '70s to generate plausible synthetic data that can be used to help the models learn. Felkner is using the grammar books to write rules that will help her construct syntactically correct sentences from the dictionaries. The technical term for this is bootstrapping or data augmentation or, colloquially, "fake it 'til you make it."
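As a rough illustration of rule-based generation from a dictionary, the Python sketch below builds synthetic sentence pairs from a tiny bilingual word list and one grammar rule (subject, verb, object, with Spanish articles agreeing in gender). It is not Felkner's code: the word lists, the single rule, and the use of English and Spanish in place of K'iche' are all assumptions chosen so the example stays small and checkable.

```python
from itertools import product

# Toy bilingual dictionary: (English, Spanish, Spanish grammatical gender).
SUBJECTS = [("boy", "niño", "m"), ("girl", "niña", "f")]
OBJECTS = [("ball", "pelota", "f"), ("book", "libro", "m")]
VERBS = [("sees", "ve"), ("kicks", "patea")]

# Toy grammar rule: the Spanish definite article must agree with the noun's gender.
ARTICLE = {"m": "el", "f": "la"}


def generate_pairs():
    """Expand the template 'subject verb object' over every word combination."""
    pairs = []
    for (s_en, s_es, s_g), (v_en, v_es), (o_en, o_es, o_g) in product(
        SUBJECTS, VERBS, OBJECTS
    ):
        english = f"the {s_en} {v_en} the {o_en}"
        spanish = f"{ARTICLE[s_g]} {s_es} {v_es} {ARTICLE[o_g]} {o_es}"
        pairs.append((english, spanish))
    return pairs


synthetic = generate_pairs()
for english, spanish in synthetic:
    print(f"{english}  |||  {spanish}")
```

Some outputs, such as "the girl kicks the book," are grammatical but odd in meaning, which is acceptable for data meant to teach only grammar.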

"We use this as pre-training data, to essentially teach the models the basics of grammar," said Felkner. "Then, we can save our real data, such as the Bible parallel data, for the fine-tuning period when it will learn what's semantically meaningful, or what actually makes sense."

Finally, she's testing a technique that involves parsing nouns in the English and K'iche' sides of the Bible, replacing them with other nouns, and then using a set of rules to correctly inflect the sentences for grammar.

For example, if the training data has the sentence "the boy kicked the ball," the researchers could use this approach to generate sentences like "the girl kicked the ball," "the doctor kicked the ball," and "the teacher kicked the ball," which can all become training data.
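The noun-swapping idea can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the researchers' implementation: it uses an English-Spanish pair instead of K'iche', a hand-written substitute list in place of a real parser, and a single agreement rule (the Spanish article follows the new noun's gender). The function name and the gender chosen for each substitute are hypothetical.

```python
import re

# Replacement nouns: (English, Spanish, Spanish grammatical gender).
SUBSTITUTES = [
    ("girl", "niña", "f"),
    ("doctor", "doctor", "m"),
    ("teacher", "maestra", "f"),
]

# Inflection rule for this toy example: article agrees with the noun's gender.
ARTICLE = {"m": "el", "f": "la"}


def substitute_noun(pair, old_en, old_es, substitutes):
    """Swap one noun on both sides of a sentence pair and fix the Spanish article."""
    english, spanish = pair
    augmented = []
    for new_en, new_es, gender in substitutes:
        new_english = re.sub(rf"\b{re.escape(old_en)}\b", new_en, english)
        # Replace the article together with the noun so the two stay in agreement.
        new_spanish = re.sub(
            rf"\b(?:el|la) {re.escape(old_es)}\b",
            f"{ARTICLE[gender]} {new_es}",
            spanish,
        )
        augmented.append((new_english, new_spanish))
    return augmented


original = ("the boy kicked the ball", "el niño pateó la pelota")
augmented = substitute_noun(original, "boy", "niño", SUBSTITUTES)
for english, spanish in augmented:
    print(f"{english}  |||  {spanish}")
```

One real sentence pair yields three additional pairs here. A real system would need far richer inflection rules than a single article swap.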

"The idea is to use these synthetically generated examples to essentially build a rough version of the system, so that we can get a lot of use out of the small amount of real data that we do have, and fine-tune it to exactly where we want it to be," said Felkner.

Working in extremely low-resource language translation is not easy, and it can be frustrating at times, admits Felkner. But the challenge, and the potential to change lives, drive her to succeed. Her work is being noticed, too: she was recently awarded a National Science Foundation Graduate Research Fellowship to continue working on the border translation project.

Within the next year, she plans to undertake a field trip to observe how legal aid organizations are working at the border, and where her system could fit into their workflow. She is also working on a demo website for the system, which she hopes to unveil in 2023, and once developed, she hopes the system could one day be applied to other Indigenous languages.

"Hill climbing on high resource languages can make your Alexa, Google Home or Siri understand you better, but it's not transformative in the same way," said Felkner. "I'm doing this work because it has an immediate humanitarian impact. As JFK once said, we choose to go to the moon not because it is easy, but because it is hard. I often think the things that are worth doing are difficult."

Published on August 24th, 2022

Last updated on August 24th, 2022
