Using AI to map research in the School of Arts & Sciences – Penn Today

When Colin Twomey became interim executive director of the Data Driven Discovery Initiative (DDDI) last summer, he says, his background in behavioral ecology meant that he had a good idea of the data science needs for his own field and some idea for biology, genetics, and evolution. However, with DDDI serving as the hub for data science education and research across the School of Arts & Sciences, Twomey says he found his understanding of the needs for chemistry, sociology, and other fields to be lacking.

To tackle the problem, he followed his instinct as an ecologist: map out the system and get a big-picture view before digging into the details. What resulted is a work-in-progress map intended to capture all published research by current faculty in SAS, including their work before coming to Penn, encompassing research that spans several decades. It uses the same technology as ChatGPT and similar large language models (LLMs).

"I really think of it as like a Google Maps for research. It gives you a very fast way to get oriented to a really big and complex research environment like Penn," Twomey says. He built what he calls the University Atlas Project, or uAtlas for short, during his personal time, and it's just one of the ways Penn is leading in data-driven research, teaching, and applications.

At first glance, it might look like a single-cell atlas to a scientist or an abstract design to an artist. While the map is still being worked on, each of the more than 40,000 dots is a different publication by a professor, color-coded by department, and zooming in shows labels for 240 topics. Each department is assigned a specific color. Red is economics. Highlighter-orange is chemistry. Pastel yellow is psychology. Robin's-egg blue is Africana studies. Hot pink is cinema and media studies, and so forth.

The spatial arrangement shows how thematically similar each paper is in relation to another and illustrates the interdisciplinary pursuits of Penn faculty. "There's all sorts of really unexpected overlaps, and it also doesn't put anyone into a box," Twomey says.

The Department of Physics and Astronomy shows up as very broad, Twomey says. "It has its tendrils into everything, which is kind of amazing; it really does accommodate a very broad range of interests, from social sciences and psychology to chemistry."

The multicolored pattern of dots around labels such as "inequality," "bioethical dilemmas," and "COVID-19 impact" shows how researchers in psychology, sociology, political science, philosophy, economics, Africana studies, and more are leading on the great challenges of our time.

The map is also searchable by name, which shows the varied interests and cross-disciplinary work of Penn faculty. For example, the spread-out clusters for physics professor Vijay Balasubramanian reflect his interests in string theory and neuroscience.

Users can also adjust the view to show only works published before or after a selected year. Twomey was struck by a bridge of green dots, for earth and environmental science, connecting hard science subjects, and specifically the topic of past climate variability, to the social sciences. The bridge labeled "climate communication," Twomey says, didn't start appearing until after about 2004, pointing to research led by Michael Mann.

Twomey says the tool has been useful to him in identifying what is going on in different departments. He says it can also help faculty identify potential collaborators and help prospective graduate students and postdocs determine with whom they want to work. "My other hope for this is that, once you do this for long enough, you get these pictures of where the University is evolving over time, where research has moved," Twomey says.

Bhuvnesh Jain, the Walter H. and Leonore C. Annenberg Professor in the Natural Sciences and faculty co-director of DDDI, says he loves that Twomey's map is both sophisticated (in its use of an LLM to embed research papers onto an abstract space) and visually intuitive.

"The map transcends discipline and sub-discipline labels and shows how closely connected a lot of our work is," Jain says, adding that he had fun brainstorming with Twomey on the applications of this tool. "I am confident that the users will range from incoming Penn undergraduates to the deans of our schools, who will be able to rapidly visualize the hubs of activity, the interconnections of different research efforts, and the growth areas in different fields."

To build this map, Twomey began by figuring out the affiliations of SAS faculty, which he says was a challenge because the data live in many places across the University. He then used Python to distill the data and a large language model to map the semantic content of each publication into a high-dimensional embedding space. But Twomey says visualizing hundreds of dimensions simultaneously is impractical, so the final map compresses data into a two-dimensional representation that best preserves the relationships between papers that address similar topics.
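For readers curious what that embed-then-compress step looks like in practice, here is a minimal Python sketch. It is an illustration under stated assumptions, not Twomey's code: the `embed` function is a toy word-count stand-in for a real LLM embedding, the four titles are made up, and principal component analysis (computed by power iteration) stands in for whatever dimensionality-reduction method uAtlas actually uses, which the article does not specify.

```python
import math

def embed(text, vocab):
    """Toy stand-in for an LLM embedding: a unit-length word-count vector."""
    words = text.lower().split()
    vec = [float(words.count(w)) for w in vocab]
    norm = math.sqrt(sum(x * x for x in vec)) or 1.0
    return [x / norm for x in vec]

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def top_direction(rows, iters=300):
    """Leading principal direction of already-centered rows, by power iteration."""
    dim = len(rows[0])
    v = [1.0 + i for i in range(dim)]  # fixed, non-uniform start keeps runs repeatable
    scale = math.sqrt(dot(v, v))
    v = [x / scale for x in v]
    for _ in range(iters):
        scores = [dot(r, v) for r in rows]
        # w = X^T (X v): one multiplication by the (unnormalized) covariance matrix
        w = [sum(s * r[j] for s, r in zip(scores, rows)) for j in range(dim)]
        norm = math.sqrt(dot(w, w))
        if norm == 0.0:
            break
        v = [x / norm for x in w]
    return v

def project_2d(vectors):
    """Compress high-dimensional vectors to 2D along the top two principal directions."""
    dim = len(vectors[0])
    mean = [sum(v[j] for v in vectors) / len(vectors) for j in range(dim)]
    centered = [[v[j] - mean[j] for j in range(dim)] for v in vectors]
    first = top_direction(centered)
    # Remove the first direction from every row so the next search finds a new one
    residual = []
    for row in centered:
        s = dot(row, first)
        residual.append([x - s * f for x, f in zip(row, first)])
    second = top_direction(residual)
    return [(dot(row, first), dot(row, second)) for row in centered]

# Hypothetical paper titles, invented for this illustration
titles = [
    "climate variability in past ice cores",
    "past climate variability and ocean sediment",
    "string theory and black holes",
    "black holes in string theory",
]
vocab = sorted({word for t in titles for word in t.lower().split()})
coords = project_2d([embed(t, vocab) for t in titles])

for title, (x, y) in zip(titles, coords):
    print(f"{x:+.3f} {y:+.3f}  {title}")
```

In the printed coordinates, the two climate titles land closer to each other than to either string theory title, which is the property the map relies on: papers on similar topics stay near one another after compression. A production pipeline would swap in real model embeddings with hundreds of dimensions and likely a nonlinear reduction method.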

He next used the programming language Elixir to build a custom web server so the map would appear on a user-friendly website. Twomey then used an LLM again to add the research topics, choosing a labeling system that he felt was neither too dense nor too sparse, so "it's not overwhelming but still gives you enough waypoints."

To date, the map captures most but not all School of Arts & Sciences faculty as Twomey continues to work on the project. He also notes that some data from indexes like Google Scholar and OpenAlex may be incorrect, meaning a professor may be incorrectly attached to a paper or a publication year may be wrong, so additional validation is needed. Twomey's goal is to eventually include research from graduate students and postdocs as well and to expand beyond SAS.

"The School of Arts and Sciences has 28 departments and 34 centers, and seeing how all those intersect is super fascinating, but that's just one piece, one school," Twomey says. "I want to have this Penn-wide and even scale it beyond Penn in the future."
