Opinion: The Rise of the Data Physicist – American Physical Society

In the search for new physics, a new kind of scientist is bridging the gap between theory and experiment.

By Benjamin Nachman | October 13, 2023

Traditionally, many physicists have divided themselves into two tussling camps: the theorists and the experimentalists. Albert Einstein theorized general relativity, and Arthur Eddington observed it in action as bending starlight; Murray Gell-Mann and George Zweig thought up the idea of quarks, and Henry Kendall, Richard Taylor, Jerome Friedman, and their teams detected them.

In particle physics especially, the divide is stark. Consider the Higgs boson, proposed in 1964 and discovered in 2012. Since then, physicists have sought to scrutinize its properties, but theorists and experimentalists don't share Higgs data directly, and they've spent years arguing over what to share and how to format it. (There's now some consensus, although the going was rough.)

But there's a missing player in this dichotomy. Who, exactly, is facilitating the flow of data between theory and experiment?

Traditionally, the experimentalists filled this role, running the machines and looking at the data. But in high-energy physics and many other subfields, there's too much data for this to be feasible. Researchers can't just eyeball a few events in the accelerator and come to conclusions; at the Large Hadron Collider, for instance, about a billion particle collisions happen per second, which sensors detect, process, and store in vast computing systems. And it's not just quantity. All this data is outrageously complex, made more so by simulation.

In other words, these experiments produce more data than anyone could possibly analyze with traditional tools. And those tools are imperfect anyway, requiring researchers to boil down many complex events into just a handful of attributes (say, the number of photons at a given energy). A lot of science gets left out.

In response to this conundrum, a growing movement in high-energy physics and other subfields, like nuclear physics and astrophysics, seeks to analyze data in its full complexity, to let the data speak for itself. Experts in this area are using cutting-edge data science tools to decide which data to keep and which to discard, and to sniff out subtle patterns.

Machine learning, in particular, has allowed scientists to do what they couldn't before. For example, in the hunt for new particles, like those that might comprise dark matter, physicists don't look for single, impossible events. Instead, they look for events that happen more often than they should. This is a much harder task, requiring data-parsing at herculean scales, and machine learning has given physicists an edge.

Nowadays, the experimentalists who manage the control rooms of particle accelerators are seldom the ones developing the tools of machine learning. The former are certainly experts; they run colliders, after all. But in projects of such monumental scale, nobody can do it all, and specialization reigns. After the machines run, the data people step in.

The data people aren't traditional theorists, and they're not traditional experimentalists (though many identify as one or the other). But they're here already, straddling different camps and fields, proving themselves invaluable to physics.

For now, this scrappy group has no clear name. They are data scientists or specialized physicists or statisticians, and they are chronically interdisciplinary. It's high time we recognize this group as distinct, with its own approaches, training regimens, and skills. (It's worth noting, too, data physics' discreteness from computational physics. In computational physics, scientists use computing to cope with resource limitations; in data physics, scientists deal with data randomness, making statistics, or what you might call "phystatistics," a more vital piece of the equation.)

Naming delivers clout and legitimacy, and it shapes how future physicists are educated and funded. Many fields have fought to earn this recognition, like biological physics, sidelined for decades as an awkward meeting of two unlike sciences and now a full-fledged and vibrant subfield.

It's the data wranglers' turn. I propose that we give these specialists a clear identity: the data physicists. Unlike a traditional experimentalist, a data physicist probably won't have much hands-on experience with instrumentation. They probably won't spend time soldering together detector parts, a typical experience for experimentalists-in-training. And unlike a theorist, they may not have much experience with first-principles physics calculations, outside of coursework.

But the data physicist does have the core skills to understand and interrogate data, complete with a strong foundation in data science, statistics, and machine learning, as well as the computational and theoretical background to relate this data to underlying physical properties.

The data physicists have their work cut out for them, given the enormous amount of data being churned out by experiments in and beyond high-energy physics. Their efforts will, in turn, improve the development of new experimentation methods, which are today often developed from simpler, synthetic datasets that don't map perfectly to the real world.

But this data will go underutilized without a skilled cohort of scientists who can deftly handle it with new tools, like machine learning. In this sense, I'm not merely arguing for name recognition. We need to identify and then train the next generation, to tackle the data we have right now.

How? First, we need the right degrees: Universities should develop programs explicitly for data physicists in graduate school. I expect the data physicist to have a strong physics background and extensive training in statistics, data science, and machine learning. Take my own path as a starting point: I studied computational aspects of particle theory as a master's student and took many courses in statistics as a PhD student, which led to naturally interdisciplinary research between physics and statistics/machine learning and between theorists and experimentalists.

The right education is a start, but the field also needs tenure-track positions and funding. There are promising signs, including new federal funding to help institutions launch Artificial Intelligence Institutes dedicated to advancing this research. But while investments like this fuel interdisciplinary research, they don't support new faculty (not directly, at least). And if you're not at one of the big institutions that receive these funds, you're out of luck.

This is where small-scale funding must step in, including money for individual research groups, rather than for particular experiments. This is easier said than done, because a typical group grant, which a PI uses to fund themselves and a student or postdoc, forces applicants to adhere to the traditional divide: theory or experiment, or hogwash. The same goes for the Department of Energy's prestigious Early Career Award: there is no box to check for interdisciplinary data physics.

As tall an order as this funding is, it could be easier to achieve than a change in attitude. Physicists might well be famous for many of humanity's greatest discoveries, but they're also notorious for their exclusionary, if not outright purist, suspicion of interdisciplinary science. Physics that borrows tools and draws inspiration from other fields (from cells in biological physics, say, or from machine learning in data science) is often dressed down as "not real physics." This is wrong, of course, but it's also a bad strategy: A great way to lose brilliant physicists is to scoff at them.

Not all are skeptical; far more, in fact, are excited. Within APS, the Topical Group on Data Science (GDS) is growing rapidly and might soon become a Division on Data Science, a reflection of the field's growing role in physics. My own excitement about working directly with data inspired me to become an experimentalist myself, although I realize now how restrictive that label was.

As available data grows, so does our need for data physicists. Let's start by calling them what they are. But then let's do the hard work: educating, training, and funding this brilliant new generation.

Benjamin Nachman is a Staff Scientist at Berkeley Lab, where he leads the Machine Learning for Fundamental Physics Group, and a Research Affiliate at the UC Berkeley Institute for Data Science. He is also a Secretary of the APS Topical Group on Data Science.

The author wishes to thank the Editor, Taryn MacKinney, for her work on this article, and David Shih for coining the term 'data physicist' at a recent Particle Physics Community Planning Exercise.
