Tarun Dam
professor of chemistry; director of the biochemistry and molecular biology bachelor's program
When Dam got KC's email, he saw the opportunity to use the power of machine learning to make the process of glycobiology more efficient.
Working with collaborators from Wichita State University, Kansas State University, the University of Houston, and Soka University in Tokyo, Japan, Dam and KC began studying the glycosylation of an amino acid known as asparagine. Glycans that attach to asparagine are called N-linked glycans, and they can only attach to asparagine if it has two specific amino acids on its right-hand side. The first can be any of the 20 common amino acids except proline, and the second must be either serine or threonine.
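The sequence rule described above can be written as a short pattern search. The following is a minimal sketch in Python, not the LMNglyPred method: the function name `find_candidate_sites` and the example sequence are hypothetical, chosen only for illustration. It flags asparagines that satisfy the rule, which marks them as candidates only; matching the pattern does not by itself mean a glycan is attached.

```python
import re

# The 20 common amino acids minus proline (P), in one-letter codes.
NOT_PROLINE = "ACDEFGHIKLMNQRSTVWY"

# Asparagine (N), then any residue except proline, then serine (S) or
# threonine (T). The lookahead keeps the match one character wide, so
# overlapping candidates such as "NNST" are all reported.
PATTERN = re.compile(rf"N(?=[{NOT_PROLINE}][ST])")


def find_candidate_sites(sequence: str) -> list[int]:
    """Return 1-based positions of asparagines that satisfy the rule."""
    return [match.start() + 1 for match in PATTERN.finditer(sequence.upper())]


if __name__ == "__main__":
    # Hypothetical sequence, not a real protein.
    example = "MNGTANPSNNSTQ"
    print(find_candidate_sites(example))
```

In the example sequence, the asparagine followed by proline is skipped, while the two adjacent asparagines near the end are both reported because each has a valid pair of residues to its right.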
With funding from the National Science Foundation, Dam, KC, and their collaborators developed LMNglyPred, a deep-learning-based approach to predict N-linked glycosylated sites in human proteins using embeddings from a pre-trained language model.
"Now, without even doing any experiment, I can go to a computer scientist like Dukka, and if I give them a known protein, they can predict and say, 'Okay, you have five asparagines in that protein. Three will have sugar chains in this location. Two will not.' Without any experiment! That's the power of machine learning and of this collaboration."
Like all good teachers, KC explains his part of the project using a familiar metaphor.
"We cannot for sure say, 'This is where the needle is,'" KC says. "But we can really narrow that search space, so that instead of looking at everything, we can just search the area where it's most likely going to be."
Dukka KC
professor of computer science; associate dean for research in the College of Computing
KC says the fields of computer science and glycobiology have been collaborating for nearly two decades, but using deep learning tools and large language models in this work is very new, so new that KC thinks he, Dam, and their collaborators may be the first to have used a language model to predict glycosylation. The implications for glycobiologists like Dam, and for the medical field as a whole, are potentially immense.
Yet for all the giant leaps made by large language models and other artificial intelligence tools in recent years, KC is quick to acknowledge the foundation on which his work rests.
"It's a loop, right?" says KC. "Experimentalists like Tarun generate all this data, which we use to train our model. We then use the protein language model to inform more and better experiments. So their data helps our model get better, and our model helps their experiments get better. It's a loop."
Dam says he and KC have "huge ideas" for further collaboration on proteins, specifically those with biomarkers for cancer.
"Tarun only cares about sugars," KC says with a chuckle, teasing Dam like an old friend. "Not to diminish anything about sugar, but there are 400 post-translational modifications, and glycosylation is only one of those 400. But when I talk to Tarun, he makes it sound like that ornament is the only important thing in the world."
Dam gives the good-natured ribbing right back to KC. "Yes, yes, he studies the other ornaments, too," Dam says with mock dismissal. "But no other modifications affect 70 percent of proteins. Only glycosylation affects the protein from birth to death. I try to convince Dukka to do more with glycobiology. It's so vast, and so important."
KC laughs, then concedes that he and Dam are considering exploring the "cross-talk between phosphorylation and O-GlcNAc." His explanation of the importance of these glycosylations to biology and human health elicits a nod of admiration from his friend and colleague.
"Dukka is not a glycobiologist, but he understands the significance of it almost like he is one," says Dam. "It's fun working with him. We respect each other's expertise. Both of our labs are doing work that is significant, and that significance will only grow as our collaboration continues."
KC agrees. "We have other interests of our own, of course, but we found some common interests: these sugars that kind of bind us."
Michigan Technological University is a public research university founded in 1885 in Houghton, Michigan, and is home to more than 7,000 students from 55 countries around the world. Consistently ranked among the best universities in the country for return on investment, Michigan's flagship technological university offers more than 120 undergraduate and graduate degree programs in science and technology, engineering, computing, forestry, business and economics, health professions, humanities, mathematics, social sciences, and the arts. The rural campus is situated just miles from Lake Superior in Michigan's Upper Peninsula, offering year-round opportunities for outdoor adventure.
"What we do, what we are: it's all because of proteins."
"In bioinformatics, we often say we are not trying to find a needle in a haystack. We are trying to find the few spots in the haystack where needles are most likely going to be."
More here:
Research in Focus: Needles, Haystacks, and Sugar Chains - Michigan Technological University