
New MS in Electrical Engineering and Computer Science Offers … – Chapman University: Happenings

Olivia Chilvers, Tally Holcombe and Noah Fuery are among the students pioneering the MS in electrical engineering and computer science program (MSEECS) at Chapman University's Fowler School of Engineering.

"One thing that stood out to me about this program is the foundation in ethics and leadership," says Chilvers, a computer science major who expects to graduate with a bachelor's degree in spring 2024 and will be pursuing the MSEECS integrated track. "I like that it intertwines a humanities aspect to STEM classes."

The MSEECS is a traditional full-time program, with an integrated option for current Chapman undergraduate students. It has three main parts: computing systems, electrical systems, and intelligent systems and data science. The program incorporates ethical engineering, entrepreneurial thinking, leadership skills and communication.

"The new MSEECS is in response to a growing demand in the community for professionals with specialized knowledge in fast-evolving fields of engineering centered around computing and intelligent systems," says Professor Thomas Piechota, who teaches in Fowler School of Engineering and Schmid College of Science and Technology.

Fuery, who expects to graduate in spring 2024 with a computer science degree, will be taking the integrated route for his MSEECS. He thinks incorporating ethics is very smart and innovative.

"The controversies and debates surrounding AI are some of the most important topics students can discuss and learn about at university," he says. "This master's program will allow the students to engage with these controversial discussions about artificial intelligence."

Holcombe, a biological sciences major and spring 2023 graduate who will be in the program's first cohort in fall 2023, discovered computer science entering her junior year at Chapman. Her instructors' hands-on approach built her interest and confidence in the subject.

"I know how engaging the classes I've already had were, and I wanted to stay in an institution where I know that's what the professors are like," she says.

Chilvers, who is helping to make software tools as a Boeing intern, said she hadn't intended to go to graduate school, but the new program intrigued her. She liked the opportunity to go deeper into topics and stay at Chapman.

She wants to continue at Boeing and thinks the program will make her an even stronger candidate in the eyes of employers.

"I think as a female it is a great accomplishment to be in a master's program, especially in STEM," she says.

Like Chilvers, Fuery thinks the program will be an asset to his career.

"I am interested in other positions concerning topics of cyber security and cloud computing, but I am primarily focused on trying to become a software engineer," Fuery says. "I hope to use this master's degree to propel me in searching for a job, but I am keeping open the possibility of pursuing a Ph.D. after obtaining a master's degree."

Chilvers enjoys developing software and using automation to help avoid human error in tedious tasks.

"Regarding AI, it's important to know how it interacts with humans or in parallel," she says.

Fuery says there are many opportunities in the new program for adding to his knowledge in computer science and electrical engineering.

"The wonderful faculty and staff of the Fowler School of Engineering, and all of Chapman in general, have led me to pursue a master's degree at Chapman," he says.

Students and faculty in the program can use what donor Nvidia Corp. calls the first community-operated supercomputer in the nation, which also happens to live at Chapman.

Prospective students can go here for a calendar of information sessions.


‘Topping off’ Amy Gutmann Hall | Penn Today – Penn Today

Two years after the project ceremonially broke ground at 34th and Chestnut streets, members of the Penn community gathered on Wednesday for the topping off of Amy Gutmann Hall. A time-honored tradition in construction, the signing and placement of the final wood panel signaled the completion of the new School of Engineering and Applied Science building's frame.

A hub for data science on campus and for the Philadelphia community when it officially opens next summer, Amy Gutmann Hall "will embolden interdisciplinary work in a field that is transforming all facets of engineering education, and of course research and innovation," said Penn Engineering's Nemirovsky Family Dean Vijay Kumar.

The new facility, with next-generation hybrid classrooms and laboratories, will be equipped to support exploration that advances graphics and perception, privacy and security, computational social science, data-driven medical diagnostics, scientific computing, and machine learning. "It will also allow for the development of safe, explainable, and trustworthy artificial intelligence," said Kumar.

Eighty-two truckloads of mass timber, a more sustainable and efficient product than steel or concrete, have been used to construct the 116,000-square-foot, six-story building. Philadelphia's tallest new mass timber structure, Amy Gutmann Hall will evoke a warm, welcoming environment with its exposed wood throughout its spaces.

"The building is not so much built as it is engineered and then prefabricated with extraordinary precision," said President Liz Magill. She noted how the techniques used to create the new building relied heavily on advanced computation and data, "which is precisely the kind of work that this building will foster when it's completed."

"The building reflects the use, and the use helped determine the building," Magill said.

Amy Gutmann Hall, designed by Lake|Flato and KSS Architects, currently under construction led by Gilbane Building Company, and named for Penn's longest-serving president, has been made possible thanks to a transformative $25 million commitment to Penn Engineering in 2019 from Harlan Stone, a University Trustee, member of Penn Engineering's Board of Advisors, and chair of the school's Technical Advisory Board. Stone, a School of Arts & Sciences alumnus and Penn Engineering parent, said at the gathering that he imagines the new building as a place that will produce "new ideas, methodologies, and paradigms of how data can impact humanity for good."

After partaking in a celebratory toast, the crowd cheered as a crane erected the wooden panel, which had been signed by those who "took a very bold idea and made it a compelling reality," said Magill.

"We together celebrate this milestone in the creation of Amy Gutmann Hall," Magill said. "A testament to the belief that collaborative research and learning can solve some of the world's most urgent problems. Within this building, may the insights that we gain through data science help us harness new knowledge and understanding to create a better world. I know we all cannot wait to see these innovations come to life."


Biden-Harris Administration announces up to $7 million through … – National Oceanic and Atmospheric Administration

Today, the Department of Commerce and NOAA announced a $7 million funding opportunity through President Biden's Investing in America agenda to establish a new multi-university Data Assimilation Consortium that will improve weather predictions. As the climate crisis contributes to worsening extreme weather events affecting Americans nationwide, this investment will give Americans the information and tools they need to prepare and stay safe.

Funded by President Biden's Inflation Reduction Act, this award will provide up to $7 million over three years beginning in fiscal year 2024 for a new consortium focused on numerical weather prediction. This new consortium will bolster NOAA forecast models, provide strategic workforce development in data assimilation and enhance long-term partnerships between NOAA and those working in academia, government and the broader weather enterprise. NOAA is currently soliciting collaborative proposals for the project.

"Millions of Americans are right now experiencing the impacts of extreme weather, made worse by the climate crisis, which threatens people's safety and economic security of the nation," said U.S. Secretary of Commerce Gina Raimondo. "Under President Biden's leadership, we are updating our weather infrastructure so that communities can take actions to protect lives and maintain resilience in the face of extreme weather events, including those brought on by climate change."

In recent years, the U.S. has experienced an unprecedented rate of billion-dollar weather and climate disasters, as well as increasing extreme heat events. Hundreds of lives have been lost and trillions of dollars in damages amassed as a result of severe storms, hurricanes, floods, drought, heat waves, wildfires and winter storms. During the first half of 2023, there were 12 confirmed weather/climate disaster events with losses exceeding $1 billion, and already this year, disaster events have resulted in the deaths of at least 100 people.

"The need for actionable weather information never ends, and neither do our efforts to make that information as accurate as possible," said NOAA Administrator Rick Spinrad, Ph.D. "This new consortium funded by President Biden's Investing in America agenda will help us stay on the cutting edge and help continue innovation needed for more precise forecasts."

Data assimilation advancements are crucial for improving our forecasts because they ensure that the forecast models start with the most accurate and best-understood initial state of the planet. This is done by using high-quality, real-world observed conditions, the most current science and our next generation weather models to increase the accuracy of forecasts. Earth system observations come from instruments on a variety of platforms, including satellites, radars, aircraft-based instruments, ships, buoys, uncrewed platforms and land-based surface stations.

Data assimilation allows these observations to be blended with forecast model output to create high-quality analyses to initialize our weather and climate forecast models. Advancing our data assimilation capabilities will help maximize the value provided by NOAA's global observing system.

This new funding will aid in the strategic partnership between NOAA and the broader weather enterprise to develop the Unified Forecast System. This comprehensive, open-source Earth modeling system accelerates the transition of research successes to operations by incorporating innovations from across the enterprise and across several independent forecast models into a single seamless system.

The multi-university consortium will also advance data assimilation research and education, and foster collaboration, student training and exchange of experts between NOAA and the Joint Center for Satellite Data Assimilation, academic partners including minority-serving institutions, and international institutions such as the Met Office in the UK and its Met Office Academic Partnership (MOAP) universities. The Data Assimilation Consortium's collaborations will be facilitated through a new Transatlantic Data Science Academy (TDSA) that is being set up jointly by the Met Office and NOAA.

There is currently a shortage of data assimilation scientists across the U.S., and the extreme weather associated with the climate crisis is increasing demand for this occupation now more than ever. This new funding will invest in the training and workforce development we need in this important area of forecasting.

"Advancing data assimilation is essential to improving our forecasts," said Assistant Secretary of Commerce for Environmental Observation and Prediction Michael C. Morgan, Ph.D. "This funding will help us train and develop a new and diverse cohort of data assimilation scientists for the future and will help us broaden access to this area of weather prediction."

More information on the Notice of Funding Opportunity is available online.

More information about the Inflation Reduction Act can be found on the NOAA website.


Platform Science and Uptake Partner to Bring Predictive … – PR Newswire

As the newest addition to the industry-leading Virtual Vehicle platform, Uptake's data analytics will help keep drivers safely on the road and prevent unplanned roadside breakdowns

SAN DIEGO and CHICAGO, July 27, 2023 /PRNewswire/ -- Platform Science, the leading connected vehicle platform, and Uptake, a leader in predictive analytics software-as-a-service (SaaS), today announced a partnership to extend data-driven insights and predictive maintenance capabilities to some of the country's largest vehicle fleets.

Platform Science's Virtual Vehicle platform offers workflow, navigation, telematics, and a range of other solutions to the most innovative fleets and commercial vehicles. Now, it is adding Uptake's predictive analytics to the Safety and Maintenance category of its growing ecosystem of third-party applications.

"When fleet managers tell us they want new capabilities for optimizing and streamlining their business, we find the best-in-class solution," said Joe Jumayao, Vice President of Business Development at Platform Science. "Fleet managers don't want to lose time waiting on parts and repairs. They want to keep trucks rolling and deliveries arriving on time. Uptake will help them do that."

"We are thrilled to partner with the leader in connected vehicles and join the ecosystem that puts the most vital information right at the fingertips of fleet managers," said Kayne Grau, CEO of Uptake. "This partnership comes during peak season for fleets, when drivers and technicians are often at their busiest. We're excited to extend our capabilities to Platform Science's large network for a meaningful impact on the efficiency of the transportation industry."

Fleets using Uptake via the Virtual Vehicle platform will be equipped with real-time and historical data to better inform maintenance strategies and predict vehicle needs. These insights can be leveraged to maximize mechanics' time, improve labor effectiveness and streamline operations. Users will also get improved visibility into the condition of their vehicles, which is critical to their longevity. The end result will be as much as a 20% reduction in downtime and more impactful maintenance operations for optimal fleet performance.

In collaboration with Platform Science, Uptake will be able to provide comprehensive data insights to some of the largest truck fleets in the country. These insights enable fleet managers to transition from calendar-based preventive maintenance schedules to a dynamic predictive maintenance strategy. The transition will enable fleets to save on repair costs by an average of 12% and minimize delays by helping them prevent unplanned roadside breakdowns and catastrophic failures. Uptake users will also have access to Platform Science's expert support team, who are available 24 hours a day to answer questions.

Uptake Fleet is available to Platform Science users today through the Virtual Vehicle platform. To learn more about the partnership, visit https://www.platformscience.com/appsintegrations/uptake-fleet.

About Platform Science

Platform Science is transforming transportation technology by empowering enterprise fleets with a unified, user-friendly technology platform. Platform Science makes it easy to develop, deploy and manage mobile devices and applications on commercial vehicles, giving fleets an edge in efficiency, flexibility, visibility, and productivity. The customizable platform delivers an unlimited canvas to fleets and developers seeking to innovate and create new solutions as customers' needs, businesses and industries evolve. Platform Science was named by Fast Company as one of the World's Most Innovative Companies for 2022. Platform Science was ranked #2 in the FreightTech 25 Awards by industry news leader, FreightWaves, for both 2022 and 2023. For more information, please visit http://www.platformscience.com

About Uptake

Uptake is a leader in predictive analytics software-as-a-service (SaaS), working to translate data into smarter operations. Driven by industrial data science, Uptake enables and delivers actionable insights that predict truck and component failure, optimize parts and maintenance strategies, and visualize cost information, with more than 45 patents, almost 200 data science models and recognition by Gartner, Verdantix, the World Economic Forum, CNBC, and Forbes. Uptake is based in Chicago. To stay up-to-date on what we're doing, visit us at http://www.uptake.com and follow us on LinkedIn and Instagram.

Media Contact

Platform Science[emailprotected]

Uptake[emailprotected]

SOURCE Uptake


Podcast: All Things Data with Guest Yabebal Fantaye – Newsroom … – University of St. Thomas Newsroom

In the ever-evolving technology landscape, data analytics and data strategy continue to play a larger role in economics and business models. Director of the Center for Applied Artificial Intelligence at the University of St. Thomas, Dr. Manjeet Rege, co-hosts the All Things Data podcast with adjunct professor and Innovation Fellow Dan Yarmoluk. The podcast provides insight into the significance of data science as it relates to business models, business economics, and delivery systems. Through informative conversations with leading data scientists, business model experts, technologists, and futurists, Rege and Yarmoluk discuss how to utilize, harness, and deploy data science and data-driven strategies, and enable digital transformations.

Rege and Yarmoluk spoke with Yabebal Fantaye about 10 Academy's mission to increase access to data science opportunities globally by identifying, training and launching the careers of young people in Africa. Fantaye is from Ethiopia, has a doctorate in astrophysics and is now the scientific director and co-CEO at 10 Academy. 10 Academy is an African educational start-up that offers a three-month program for high-potential recent university graduates from Africa (no work experience) and gets them into global AI/Web3 jobs in six months. 10 Academy is a not-for-profit community-owned initiative that has been designed to scale across the continent and get thousands of brilliant people into work each year. Here are some highlights from their conversation.

Q. In terms of background of the participants, do you expect them to have an undergraduate degree in the STEM field?

A. There's no requirement in terms of which degree, but we require that they are familiar with certain technical skills. Our focus is being able to place them at the end of the program, and we know we won't be able to teach everything in three months, so we have prerequisites. They can self-learn on their own before, but once they come, they must pass a test and a one-week assessment that demonstrates they have the prerequisites: a programming language, basic statistics, basic mathematical skills. That's where we start.

Q. In terms of job placements, do you see these participants getting placed worldwide?

A. It is worldwide. We have people sitting in Ethiopia, Nigeria, Kenya, etc., but working in different parts of the world: the U.S., U.K., Brazil, Canada, all over Europe, basically wherever there is work. Most companies that have hired one or two of our trainees have come back to hire more. In one case, a U.K. company has hired 12 of our trainees; they see the value of the discipline, hard work, and staffing. We are seeing a lot of repeat clients, but of course are also looking to penetrate new markets, new clients.

Listen to their conversation here:


Fusemachines Appoints Nate Rackiewicz as Head of Data and … – Newswire

New Role to Bolster Development, Implementation and Delivery of Robust Data, Analytics and Innovation Strategies

NEW YORK, July 28, 2023 (Newswire.com) - Fusemachines Inc., a leading provider of enterprise AI products and solutions, today announced the onboarding of Nate Rackiewicz as the new Executive Vice President and Head of Data and Analytics, North America. Rackiewicz's leadership portfolio includes a roster of successful tenures at leading companies such as HBO, Take-Two Interactive, Gannett and A+E Networks. In his new role, Rackiewicz will play a pivotal part in harnessing the power of data-driven intelligence to develop innovative solutions that empower Fusemachines and its customers to thrive in an increasingly data-centric world.

Rackiewicz brings more than two decades of analytics, data engineering and AI experience to Fusemachines. His leadership will be integral to Fusemachines' growth as the organization rolls out cutting-edge AI and data offerings.

"We're excited to have Nate join our executive leadership team as we fortify our commitment towards data-powered advanced-AI solutions," saidSameer Maskey, CEO and founder, Fusemachines. "His deep knowledge and expertise in data and analytics will help strengthen the very core of our AI offerings."

"I am thrilled to be a part of Fusemachines an organization that has spent the last decade making significant contributions to the field of AI and data science,"saidNate Rackiewicz,EVP and Head of Data and Analytics, Fusemachines. "As someone who has witnessed the company's unique approach in helping businesses unleash their data and analytics potential firsthand, I know the team's dedication towards customer excellence. I am, therefore, excited to join them in propelling more organizations towards a data-powered future."

Rackiewicz's background spans media, entertainment, IT, research and advertising industries. Previously, he served as the Chief Data Officer at Gannett where he built the global Enterprise Data Management Center of Excellence (COE). At Take-Two Interactive, he led the establishment of a global data science and advanced analytics center of excellence, devising new revenue and growth opportunities. He has also led impactful consumer data and analytics efforts at organizations such as A+E Networks and HBO, further solidifying his reputation as a visionary in the field.

To learn about Fusemachines, visit http://www.fusemachines.com.

Source: Fusemachines Inc.


NASA Awards Contract for Earth Science Data Collection and … – PR Newswire

WASHINGTON, July 26, 2023 /PRNewswire/ -- NASA has awarded a sole-source contract to Columbia University, New York, to operate the agency's Socioeconomic Data and Application Center's (SEDAC) Distributed Active Archive Center (DAAC).

The cost-no-fee contract supports the integration of socioeconomic and Earth science data used for Earth science research as well as the production of data products and applications used by the broader science community and educational institutions.

The basic period of performance is scheduled to begin Aug. 1, 2023, with four option periods. The total potential contract value is nearly $30 million if all options are exercised.

SEDAC synthesizes Earth science and socioeconomic data and information in ways useful to a wide range of decision-makers and other applied users, providing an "information gateway" between the socioeconomic and Earth science data and information domains. SEDAC has extensive holdings related to population, sustainability, and geospatial data and provides access to many multilateral environmental agreements.

For information about NASA and agency programs, visit the NASA home page.

SOURCE NASA


Unearthing Our Past, Predicting Our Future: Scientists Discover the … – SciTechDaily

Using AI to analyze X-ray images and genetic sequences, a joint research team from The University of Texas at Austin and the New York Genome Center has identified the genes that dictate skeletal proportions. The findings, besides revealing our evolutionary history, have implications for predicting risks of musculoskeletal diseases like arthritis and back pain. Credit: The University of Texas at Austin

By leveraging artificial intelligence to scrutinize tens of thousands of X-ray pictures and genetic sequences, a team of researchers from The University of Texas at Austin and the New York Genome Center has successfully identified the genes that shape our skeletons, from the width of our shoulders to the length of our legs.

This groundbreaking study, which was published as the cover article in the journal Science, not only sheds light on our evolutionary history but also paves the way for a future where physicians could more accurately assess a patient's likelihood of suffering from ailments like back pain or arthritis later in life.

"Our research is a powerful demonstration of the impact of AI in medicine, particularly when it comes to analyzing and quantifying imaging data, as well as integrating this information with health records and genetics rapidly and at large scale," said Vagheesh Narasimhan, an assistant professor of integrative biology as well as statistics and data science, who led the multidisciplinary team of researchers to provide the genetic map of skeletal proportions.

Humans are the only large primates to have longer legs than arms, a change in the skeletal form that is critical in enabling the ability to walk on two legs. The scientists sought to determine which genetic changes underlie anatomical differences that are clearly visible in the fossil record leading to modern humans, from Australopithecus to Neanderthals. They also wanted to find out how these skeletal proportions allowing bipedalism affect the risk of many musculoskeletal diseases, such as arthritis of the knee and hip, conditions that affect billions of people in the world and are the leading causes of adult disability in the United States.

The researchers used deep learning models to perform automatic quantification on 39,000 medical images to measure distances between shoulders, knees, ankles, and other points in the body. By comparing these measurements to each person's genetic sequence, they found 145 points in the genome that control skeletal proportions.

"Our work provides a road map connecting specific genes with skeletal lengths of different parts of the body, allowing developmental biologists to investigate these in a systematic way," said Tarjinder (T.J.) Singh, the study's co-author and associate member at NYGC and assistant professor in the Columbia University Department of Psychiatry.

The team also examined how skeletal proportions associate with major musculoskeletal diseases and showed that individuals with a higher ratio of hip width to height were more likely to develop osteoarthritis and pain in their hips. Similarly, people with higher ratios of femur (thigh bone) length to height were more likely to develop arthritis in their knees, knee pain, and other knee problems. People with a higher ratio of torso length to height were more likely to develop back pain.

"These disorders develop from biomechanical stresses on the joints over a lifetime," said Eucharist Kun, a UT Austin biochemistry graduate student and lead author on the paper. "Skeletal proportions affect everything from our gait to how we sit, and it makes sense that they are risk factors in these disorders."

The results of their work also have implications for our understanding of evolution. The researchers noted that several genetic segments that controlled skeletal proportions overlapped more than expected with areas of the genome called human accelerated regions. These are sections of the genome shared by great apes and many vertebrates but are significantly diverged in humans. This provides a genomic rationale for the divergence in our skeletal anatomy.

One of the most enduring images of the Renaissance, Leonardo da Vinci's The Vitruvian Man, contained similar conceptions of the ratios and lengths of limbs and other elements that make up the human body.

"In some ways, we're tackling the same question that da Vinci wrestled with," Narasimhan said. "What is the basic human form and its proportion? But we are now using modern methods and also asking how those proportions are genetically determined."

Reference: "The genetic architecture and evolution of the human skeletal form" by Eucharist Kun, Emily M. Javan, Olivia Smith, Faris Gulamali, Javier de la Fuente, Brianna I. Flynn, Kushal Vajrala, Zoe Trutner, Prakash Jayakumar, Elliot M. Tucker-Drob, Mashaal Sohail, Tarjinder Singh and Vagheesh M. Narasimhan, 21 June 2023, Science. DOI: 10.1126/science.adf8009

In addition to Kun and Narasimhan, the co-authors are Tarjinder Singh of the New York Genome Center and Columbia University; Emily M. Javan, Olivia Smith, Javier de la Fuente, Brianna I. Flynn, Kushal Vajrala, Zoe Trutner, Prakash Jayakumar and Elliot M. Tucker-Drob of UT Austin; Faris Gulamali of the Icahn School of Medicine at Mount Sinai; and Mashaal Sohail of the Universidad Nacional Autónoma de México.

The research was funded by the Allen Institute, Good Systems, the Ethical AI research grand challenge at UT Austin, and the National Institutes of Health, with graduate student fellowship support provided by the National Science Foundation and UT Austin's provost's office.


Quilter Investors adds data scientist and KPMG analyst to … – Citywire

Quilter Investors has bolstered its responsible investment team with two hires.

Jonathan de Pasquallie joins the investment firm from KPMG, where he worked as a responsible investment analyst in the Crown Dependencies.

In this role, he provided ESG due diligence while helping investors develop their climate risk and decarbonisation strategies.

At Quilter, De Pasquallie will enhance ESG integration and reporting and help lead active ownership activities for clients.

Chris Wu joins the firm as a responsible investment quantitative analyst. He spent more than a year as a research officer at the London School of Economics and more than two years as a research fellow in health data science at the University of Leeds. He was also an assistant professor of urban planning at Chongqing University in China.

Marisol Hernandez, head of responsible investing at Quilter Investors, said the hires will help the firm navigate increasing regulatory requirements and manage portfolios in line with both their needs and the rule book.

At the beginning of July, the investment firm expanded its manager research team with the hire of Malachi Ferguson from Liontrust Asset Management, where he was an analyst in the multi-asset team covering equities, fixed income, sustainable and alternative investments.


A universal null-distribution for topological data analysis | Scientific … – Nature.com

The distribution of persistent cycles

Let \(\mathcal{S}\) be a d-dimensional metric measure space, and let \(\textbf{X}_n = (X_1,\ldots,X_n)\in \mathcal{S}^n\) be a sequence of random variables (points), whose joint probability law is denoted by \(\mathbb{P}_n\). Let \(\mathbb{P}= (\mathbb{P}_n)_{n=1}^\infty\), and denote \(\mathbb{S}=(\mathcal{S},\mathbb{P})\), which we refer to as the sampling model. Fix a filtration type \(\mathcal{T}\) (e.g., Čech or Rips) and a homological degree \(k>0\), and consider the k-th noise persistence diagram \(\textrm{dgm}_k^{\textbf{N}}(\textbf{X}_n;\mathcal{T})\), which in short we denote by \(\textrm{dgm}_k\). We study the distribution of the random persistence values \(\left\{\pi(p)\right\}_{p\in \textrm{dgm}_k}\) (where \(\pi(p) = \textrm{death}(p)/\textrm{birth}(p)\)), and refer to them as the \(\pi\)-values of the diagram. Theoretical analysis shows that the largest \(\pi\)-value (\(\textrm{death}/\textrm{birth}\) ratio) of points in the noise (\(\textrm{dgm}_k^{\textbf{N}}\)) is \(o((\log n)^{1/k})\)26, while the \(\pi\)-values of the signal features (\(\textrm{dgm}_k^{\textbf{S}}\)) are \(\Theta(n^{1/d})\)33. Thus, the \(\pi\)-values provide a strong separation (asymptotically) between signal and noise in persistence diagrams. We stress that in this paper we study the entire ensemble of persistence values, not only the maximal ones.

We begin by considering the case where \(\mathbb{P}_n\) is a product measure, and the points \(X_1,\ldots,X_n\) are iid (independent and identically distributed). Given \(\textrm{dgm}_k\) as defined above, denote the empirical measure of all \(\pi\)-values

$$\begin{aligned} \Pi_{n} = \Pi_{n}(\mathbb{S},\mathcal{T},k) := \frac{1}{|\textrm{dgm}_k|}\sum_{p\in \textrm{dgm}_k}\delta_{\pi(p)}, \end{aligned}$$

where \(\delta_x\) is the Dirac delta-measure at x. In Fig. 2 we present the CDF of \(\Pi_{n}\) for the Čech complex with various choices of \(\mathbb{S}\) and k. Similar plots are available for the Rips complex in Sect. 3 of the Supplementary Information. We observe that if we fix d (the dimension of \(\mathcal{S}\)), \(\mathcal{T}\), and k, then the resulting CDF depends on neither the space \(\mathcal{S}\) nor the distribution \(\mathbb{P}_n\). This leads to our first conjecture.

Figure 2: The distribution of \(\pi\)-values in the Čech complex. We take the empirical CDFs of the \(\pi\)-values (log-scale), computed from various iid samples. The legend format is \(\mathcal{T}/\mathbb{P}/d/k\), where \(\mathcal{T}\) is the complex type, \(\mathbb{P}\) is the probability distribution, d is the dimension of the sampling space, and k is the degree of homology computed. By box, torus, sphere, projective(-plane), and Klein(-bottle) we refer to the uniform distribution on the respective space or its most natural parametrization, while normal and cauchy refer to the corresponding non-uniform distributions. See Methods and Sect. 3 in the Supplementary Information for further details.

Conjecture 1. Fix \(d,\mathcal{T},\) and \(k>0\). For any \(\mathbb{S}\in \mathcal{I}_d\),

$$\begin{aligned} \lim_{n\rightarrow \infty}\Pi_{n} = \Pi^*_{d,\mathcal{T},k}, \end{aligned}$$

where \(\Pi^*_{d,\mathcal{T},k}\) is a probability distribution on \([1,\infty)\).

The precise notion of convergence and the extent of the class \(\mathcal{I}_d\) are to be determined as future work. We conjecture that \(\mathcal{I}_d\) is quite large. In our experiments, the space \(\mathcal{S}\) varied across a wide range of manifolds and other spaces. The distribution \(\mathbb{P}_n\) is continuous and iid, but otherwise fairly generic (possibly even without moments; see the Cauchy example in Fig. 2). We name this phenomenon weak universality, since on one hand the limit is independent of \(\mathbb{S}\) (hence, universal), while on the other hand it does depend on \(d,\mathcal{T},k\) and the iid assumption. This is in contrast to the results we discuss next.

The following procedure was discovered partly by chance. While non-intuitive, the results are striking. Given a random persistence diagram \(\textrm{dgm}_k\), for each \(p\in \textrm{dgm}_k\) apply the transformation

$$\begin{aligned} \ell(p) := A\log\log(\pi(p)) + B, \end{aligned}$$

(1)

where

$$\begin{aligned} A = \left\{\begin{array}{ll}1 & \mathcal{T}=\text{Rips},\\ 1/2 & \mathcal{T}=\check{\textrm{C}}\text{ech},\end{array}\right. \qquad B = -\lambda - A\bar{L}, \end{aligned}$$

(2)

and where \(\bar{L} = \frac{1}{|\textrm{dgm}_k|} \sum_{p\in \textrm{dgm}_k}\log\log(\pi(p))\) and \(\lambda\) is the Euler-Mascheroni constant (=0.5772156649\(\ldots\)). We refer to the set \(\left\{\ell(p)\right\}_{p\in \textrm{dgm}_k}\) as the \(\ell\)-values of the diagram. In Fig. 3 we present the empirical CDFs of the \(\ell\)-values, as well as the kernel density estimates for their PDFs, for all the iid samples that were included in Fig. 2. The plots for the Rips complex are similar, and can be found in Sect. 3 of the Supplementary Information. We observe that all the different settings \((\mathbb{S},\mathcal{T},k)\) yield exactly the same distribution under the transformation given by (1). We refer to this phenomenon as strong universality.
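To make the transformation concrete, here is a minimal sketch of Eqs. (1)-(2) (ours, for illustration; it is not the authors' code). It assumes the noise diagram is given as an array of finite (birth, death) pairs, and the function name and array layout are illustrative only.

import numpy as np

EULER_MASCHERONI = 0.5772156649

def ell_values(dgm, filtration="rips"):
    """Map a persistence diagram to its ell-values: ell(p) = A*loglog(pi(p)) + B."""
    pi = dgm[:, 1] / dgm[:, 0]                      # pi-values: death/birth ratios (> 1)
    loglog_pi = np.log(np.log(pi))
    A = 1.0 if filtration == "rips" else 0.5        # A = 1 for Rips, 1/2 for Cech
    B = -EULER_MASCHERONI - A * loglog_pi.mean()    # B = -lambda - A * Lbar, Eq. (2)
    return A * loglog_pi + B

# Toy example with three noise points.
dgm = np.array([[0.20, 0.25], [0.10, 0.30], [0.50, 0.80]])
print(ell_values(dgm))

By construction, the sample mean of the returned \(\ell\)-values is exactly \(-\lambda\).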

Figure 3: The distribution of \(\ell\)-values. (top) All the iid samples included in Fig. 2 (26 curves). (bottom) A selection of non-iid and real-data point clouds. The left column shows the empirical CDF, and the middle column is the kernel density estimate for the PDF, with the LGumbel distribution shown as the dashed line. The right column shows the QQ plots compared to this distribution.

While strong universality for iid point-clouds is by itself a very unexpected and useful behavior, a natural question is how generally it applies in other scenarios. In Fig. 3 we also include a selection of non-iid samples and real data (see Experimental results and Methods for details). While the distribution of the \(\pi\)-values for these models is vastly different from the iid case, all of these examples exhibit the same strong universality behavior.

To summarize, our experiments strongly indicate that persistent \(\ell\)-values have a universal limit for a wide class of sampling models \(\mathbb{S}\), denoted by \(\mathcal{U}\). For our main conjecture, we consider the empirical measure of all \(\ell\)-values,

$$\begin{aligned} \mathcal{L}_n = \mathcal{L}_n(\mathbb{S},\mathcal{T},k) := \frac{1}{|\textrm{dgm}_k|}\sum_{p\in \textrm{dgm}_k}\delta_{\ell(p)}. \end{aligned}$$

Conjecture 2. For any \(\mathbb{S}\in \mathcal{U}\), \(\mathcal{T}\), and \(k\ge 1\),

$$\begin{aligned} \lim_{n\rightarrow \infty}\mathcal{L}_n = \mathcal{L}^*, \end{aligned}$$

where \(\mathcal{L}^*\) is independent of \(\mathbb{S}\), \(\mathcal{T}\), and k.

Observe that in this conjecture, the only dependence on the distribution generating the point-cloud is in the value of B in (2) (similar to the role the mean and the variance play in the central limit theorem). In Sect. 5 of the Supplementary Information, we examine the value of B for different iid settings. As suggested by Conjecture 1, our experiments confirm that the value of B (for the iid case) depends on \(d,\mathcal{T},k\), but is otherwise independent of \(\mathbb{S}\). Revealing the exact relationship between all parameters remains future work.

We note that models with homogeneous spacing between the points, such as perturbed lattice models or repulsive point processes, do not follow Conjecture 2. See Sect. 2.4 in the Supplementary Information.

A natural question is whether the observed limiting distribution \(\mathcal{L}^*\) is a familiar one, and in particular, if it has a simple expression. Surprisingly, it seems that the answer might be yes. We denote the left-skewed Gumbel distribution by \(\textrm{LGumbel}\), whose CDF and PDF are given by

$$\begin{aligned} F(x) = 1-e^{-e^x},\quad \text{and}\quad f(x) = e^{x-e^{x}}. \end{aligned}$$

(3)

The expected value of this distribution is \(-\lambda\), where \(\lambda\) is the Euler-Mascheroni constant appearing in (2). In Fig. 3, the black dashed lines represent the CDF and PDF of the LGumbel distribution. In addition, the right column presents the QQ-plots of all the different models compared to the LGumbel distribution. These plots provide very strong evidence for the validity of our final conjecture.
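As a quick consistency check (our own derivation, not stated in the excerpt): if \(X\) has the CDF in (3), then \(Y=-X\) has CDF \(e^{-e^{-y}}\), i.e., \(Y\) is a standard (right-skewed) Gumbel variable with mean \(\lambda\), so

$$\begin{aligned} \mathbb{E}[X] = -\mathbb{E}[Y] = -\lambda, \qquad \text{while}\qquad \frac{1}{|\textrm{dgm}_k|}\sum_{p\in \textrm{dgm}_k}\ell(p) = A\bar{L} + B = -\lambda, \end{aligned}$$

so the centering \(B = -\lambda - A\bar{L}\) in (2) matches the empirical mean of the \(\ell\)-values to the mean of the conjectured limit.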

Conjecture 3. \(\mathcal{L}^* = \textrm{LGumbel}\).

The Gumbel distribution often emerges as the limit of extreme value distributions (i.e., of minima or maxima). We wish to emphasize that the limiting LGumbel distribution in Conjecture 3 describes the entire ensemble of all \(\ell\)-values, and so it does not describe any extreme values. Consequently, while there may exist an indirect connection to extreme value theory, it is by no means evident or straightforward. The appearance of the LGumbel distribution in this context is therefore quite surprising.

The plots in Fig. 3 exhibit a remarkable fit with the LGumbel distribution, with one minor exception observed in the deviation of the tails of the QQ-plots. This slight deviation can be attributed to slow convergence rates. Existing theoretical results26 indicate that the largest \(\pi\)-value tends to infinity with n, but its growth rate is logarithmic. Consequently, the largest \(\ell\)-value is of order \(\log\log\log(n)\), and so the tails in particular will exhibit a slow rate of convergence. Since the point-clouds we are able to process consist of at most millions of points, our ability to accurately capture the tail of the distribution is quite limited. It is important to note that this limitation holds regardless of the validity of Conjecture 3. While the limiting distribution has non-compact support, our experiments cover a restricted range of death/birth values. Therefore, if we draw the QQ-plot against any other distribution with non-compact support, we will observe similar deviations.

We present a large body of experimental evidence collected to support our conjectures. As the results do not seem to be significantly impacted by the choice of n, we leave this detail to the Supplementary Information (Section 3). Complementing the statistical plots presented here and in the Supplementary Information, we also performed a Kolmogorov-Smirnov goodness-of-fit test for our entire collection of point-clouds. The details are provided in Sect. 3.4 of the Supplementary Information. The test did not detect a significant difference between the distribution of \(\ell\)-values and the LGumbel distribution in any of the point-clouds, providing further support for the validity of Conjectures 1, 2, and 3.
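A minimal sketch of such a goodness-of-fit check (an illustration under our own assumptions, not the authors' code): scipy's gumbel_l distribution has exactly the CDF and PDF of Eq. (3), so the test reduces to a single call. Here synthetic LGumbel draws stand in for real \(\ell\)-values, and the 0.05 cutoff is just an illustrative choice.

import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
ells = stats.gumbel_l.rvs(size=2000, random_state=rng)  # stand-in for computed ell-values

ks_stat, p_value = stats.kstest(ells, "gumbel_l")       # compare against Eq. (3)
print(f"KS statistic = {ks_stat:.4f}, p-value = {p_value:.3f}")
if p_value > 0.05:
    print("No significant deviation from the LGumbel distribution detected.")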

We began by considering samples from the uniform distribution on various compact manifolds with diverse geometry and topology. Next, we tested non-uniform distributions in \(\mathbb{R}^d\), by taking products of well-known distributions on \(\mathbb{R}\). We attempted to test a wide range of settings. The beta distribution has a compact support \([0,1]\), while the normal and Cauchy distributions are supported on \(\mathbb{R}\). The standard normal distribution is an obvious choice, and Cauchy was chosen as a heavy-tailed distribution without moments. Finally, we considered more complex models: sampling from the configuration space of a closed five-linkage, and stratified spaces (intersecting manifolds of different dimensions). The results for many of the experiments are in Figs. 2 and 3 (see Sect. 3 of the Supplementary Information for the complete set of experiments). All of the iid sampling models we tested support Conjectures 1, 2, and 3.

To better understand the extent of universality, as well as to consider more realistic models, we tested more complex cases. We tested two vastly different models: sampling points from the path of a d-dimensional Brownian motion, and a discrete-time sample of the Lorenz dynamical system, a well-studied chaotic system. The results in Fig. 3 confirm that these non-iid models exhibit strong universality as well. Surprisingly, the results for the Brownian motion demonstrate the best fit with the LGumbel distribution among all the settings we tested (see Figs. 10 and 11 in the Supplementary Information). This could be related to the fractal, or self-similarity, behavior of the Brownian motion, but remains a topic for future study.

The most important test for Conjectures 2 and 3 is with real-world data. We tested three different examples (see Methods for more details). (1) Natural images: We sampled \(3\times 3\) patches from natural gray-scale images taken from the van Hateren and van der Schaaf dataset34. We applied the dimension reduction procedure proposed by Lee et al.35, which results in a point-cloud on a 7-dimensional sphere embedded in \(\mathbb{R}^8\). We tested both the 7-dimensional point-cloud, as well as its lower-dimensional projections. (2) Audio recording: We applied the time-delay embedding transformation36 to an arbitrary speech recording to create a d-dimensional point-cloud. (3) Sentence embeddings: We used a pretrained sentence transformer37 to convert the entire text of a book into a 384-dimensional point-cloud. Our experiments, using both the Čech and Rips complexes, show a remarkable match to the universal distribution (see Fig. 3 for a subset).

Based on Conjectures 2 and 3, we present a hypothesis testing framework for individual cycles in persistence diagrams. We address finite and infinite cycles separately.

Given a persistence diagram \(\textrm{dgm}= \left\{p_1,\ldots,p_m\right\}\), our goal is to determine for each point \(p_i\) whether it is signal or noise. This can be modelled as a multiple hypothesis testing problem, where the i-th null hypothesis, denoted \(H_0^{(i)}\), is that \(p_i\) is a noisy cycle. Assuming Conjectures 2 and 3, we can formalize the null hypothesis in terms of the \(\ell\)-values (1) as

$$\begin{aligned} H_0^{(i)}: \ell(p_i) \sim \textrm{LGumbel}. \end{aligned}$$

In other words, cycles that deviate significantly from the LGumbel distribution should be declared as signal. If the observed persistence \(\ell\)-value is x, then its corresponding p-value is computed via

$$\begin{aligned} p\text{-value}_i = \mathbb{P}\left(\ell(p_i) \ge x \;\middle|\; H_0^{(i)}\right) = e^{-e^x}. \end{aligned}$$

(4)

Since we are testing multiple cycles simultaneously, we applied the Bonferroni correction to the p-values, which sufficed for our experiments. The signal part of a diagram (for significance level \(\alpha\)) can thus be recovered via

$$\begin{aligned} \textrm{dgm}_k^{\textbf{S}}(\alpha) = \left\{ p\in \textrm{dgm}_k : e^{-e^{\ell(p)}} < \frac{\alpha}{|\textrm{dgm}_k|}\right\}. \end{aligned}$$
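The following is a minimal, self-contained sketch of this test (ours, for illustration; the diagram layout and names are assumptions), combining the transformation (1)-(2), the p-values (4), and the Bonferroni threshold above.

import numpy as np

LAMBDA = 0.5772156649  # Euler-Mascheroni constant

def signal_cycles(dgm, alpha=0.05, A=1.0):
    """Return (signal rows of dgm, all p-values); A = 1 for Rips, 1/2 for Cech."""
    loglog_pi = np.log(np.log(dgm[:, 1] / dgm[:, 0]))        # loglog of death/birth ratios
    ells = A * loglog_pi + (-LAMBDA - A * loglog_pi.mean())  # Eqs. (1)-(2)
    p_values = np.exp(-np.exp(ells))                          # Eq. (4)
    return dgm[p_values < alpha / len(dgm)], p_values         # Bonferroni-corrected test

dgm = np.array([[0.20, 0.25], [0.10, 0.30], [0.50, 0.80]])    # toy diagram
signal, p_values = signal_cycles(dgm)
print(signal, p_values)

In practice, dgm would be produced by a persistent homology library such as Ripser or GUDHI.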

Computing persistent homology for an entire filtration is often intractable. The common practice is to fix a threshold \(\tau\) and compute \(\textrm{dgm}_k(\tau)\) for the partial filtration. This often introduces cycles that are infinite, i.e., born prior to \(\tau\) but dying after \(\tau\). The question we address here is how to efficiently determine whether such cycles are statistically significant. Let \(p=(\textrm{b},\textrm{d})\in \textrm{dgm}_k(\tau)\) be an infinite cycle, i.e., \(\textrm{b}\le \tau\) is known and \(\textrm{d}>\tau\) is unknown. While we do not know \(\ell(p)\), we observe that \(\pi(p)> \tau/\textrm{b}\), which gives an upper bound for the p-value,

$$\begin{aligned} p\text{-value}_i < p\text{-value}_i(\tau) := e^{-e^{\ell(\tau/\textrm{b})}}, \end{aligned}$$

where, with a slight abuse of notation, \(\ell(x)\) denotes the transformation (1) applied directly to a death/birth ratio x. If \(p\text{-value}_i(\tau)\) is below the required significance value (e.g., \(\alpha/|\textrm{dgm}_k(\tau)|\)), we can declare p as significant, despite not knowing the true death time. Otherwise, we can determine the minimal value \(\tau^*\) required so that \(p\text{-value}_i(\tau^*)\) is below the significance value. We then compute \(\textrm{dgm}_k(\tau^*)\), and if the cycle represented by p remains infinite (i.e., \(\textrm{d} > \tau^*\)), we declare it significant. We observe that for measuring significance, we do not need to know the exact value of \(\textrm{d}\), only whether it is smaller or larger than \(\tau^*\), and we need only to compute the filtration up to \(\tau^*\), rather than the actual death time \(\textrm{d}\). The key point is that the death time \(\textrm{d}\) may be much larger than \(\tau^*\).

The procedure we just described works well for studying a single infinite cycle. However, it is likely that \(\textrm{dgm}_k(\tau)\) contains multiple infinite cycles. Moreover, increasing the threshold may result in new infinite cycles emerging as well. We therefore propose the iterative procedure described in Algorithm 1. Briefly, at every step the algorithm picks one infinite cycle and chooses the next threshold \(\tau\) so that we can determine whether it is significant or not. The value \(\pi_{\min}(x)\) in Algorithm 1 is the minimum \(\pi\)-value required so that the resulting p-value (4) is smaller than x. Formally,

$$\begin{aligned} \pi_{\min}(x) = \ell^{-1}\left( F^{-1}\left( 1-x\right)\right) = \ell^{-1}(\log\log(1/x)), \end{aligned}$$

where F is the CDF of the LGumbel distribution. In the algorithm, we choose the earliest-born infinite cycle (\(\min(I)\)), while we could have chosen the latest-born (\(\max(I)\)), or any intermediate value. This choice represents a trade-off between the number of iterations needed and the overestimation of \(\tau\). Choosing the earliest-born cycle results in the smallest threshold, but with potentially more iterations, while choosing the last cycle will have fewer iterations with a possible overestimation of \(\tau\).

Algorithm 1 Finding the threshold for infinite cycles

$$\begin{aligned}
& \tau \leftarrow \tau_{0} \\
& \mathbf{do} \\
& \quad D \leftarrow \textrm{dgm}_{k}(\tau) \\
& \quad I \leftarrow \left\{ \text{b} : (\text{b},\text{d}) \in D,\ \text{d} = \infty,\ \text{and}\ \tau/\text{b} < \pi_{\min}\left(\alpha/\left|D\right|\right) \right\} \\
& \quad \tau \leftarrow \begin{cases} \min(I)\cdot \pi_{\min}\left(\alpha/\left|D\right|\right) & I \ne \emptyset \\ \tau & I = \emptyset \end{cases} \\
& \mathbf{while}\ \left|I\right| > 0 \\
& \mathbf{return}\ \tau
\end{aligned}$$
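A minimal Python sketch of Algorithm 1 (our illustration, not the authors' implementation). It assumes a hypothetical helper compute_dgm(tau) that returns the (birth, death) pairs of the filtration truncated at tau, with death = inf for cycles still alive at tau; in practice this could wrap a library such as Ripser or GUDHI with a distance threshold. Estimating \(\bar{L}\) from the finite points only is also our assumption.

import numpy as np

LAMBDA = 0.5772156649  # Euler-Mascheroni constant

def pi_min(x, A, L_bar):
    # Invert ell (Eqs. (1)-(2)): the smallest pi-value whose p-value (4) is below x.
    B = -LAMBDA - A * L_bar
    return np.exp(np.exp((np.log(np.log(1.0 / x)) - B) / A))

def infinite_cycle_threshold(compute_dgm, tau0, alpha=0.05, A=1.0):
    """Increase the threshold tau until every infinite cycle is decidable."""
    tau = tau0
    while True:
        dgm = compute_dgm(tau)                     # (birth, death) pairs; death may be inf
        finite = dgm[np.isfinite(dgm[:, 1])]       # assumes at least one finite cycle
        L_bar = np.log(np.log(finite[:, 1] / finite[:, 0])).mean()
        cutoff = pi_min(alpha / len(dgm), A, L_bar)
        # infinite cycles whose lower bound tau/b on the pi-value is not yet conclusive
        I = [b for b, d in dgm if np.isinf(d) and tau / b < cutoff]
        if not I:
            return tau
        tau = min(I) * cutoff                      # next threshold, as in Algorithm 1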

We present two examples for our hypothesis testing framework. In all our experiments, we set the desired significance level to be \(\alpha = 0.05\).

Computing p-values: We begin with a toy example, sampling 1000 points on an 8-shape (a wedge of circles) in \(\mathbb{R}^2\) (see Fig. 4), where we vary the width of the neck. We expect one cycle to always be significant (the outer one), but the significance of the second cycle depends on the width of the neck. For each width value we computed the persistence diagram and checked how many cycles were significant (i.e., with a p-value below the Bonferroni-corrected significance level); the results are summarized in Fig. 4.

Figure 4: Computing p-values for the 8-shape (a wedge of circles). (top-left) Persistence diagrams for two instances of the 8-shape with different neck gaps, with the significance lines shown for \(\alpha = 0.05\) (the dotted and dashed lines correspond to \(W=0.1\) and \(W=0.4\), respectively). (top-right) The average number of signal cycles detected in 100 repetitions, as a function of the neck gap. (bottom) In green we show significant cycles. On the left (\(W=0.1\)) we see two significant cycles (p-values = 0.0005, 0.012), and on the right (\(W=0.4\)) only the outer cycle is significant (p-value = 0.0015).

Next, we apply this method to a real-world dataset, specifically the van Hateren natural image database mentioned earlier. The main claim by de Silva & Carlsson30 is that the space of \(3\times 3\) patches has a 3-circle structure in \(\mathbb{R}^8\), leading to the conclusion that the patches are concentrated around a Klein bottle38. This was supported by five relatively long 1-cycles in the persistence diagram computed over the patches. To provide quantitative statistical support for this claim, we randomly selected a subset of the patches, processed them30, and computed p-values for all cycles (using the Rips complex). We repeated this experiment for varying numbers of patches, and computed the average number of detected signal cycles over 250 trials. The results are presented in Fig. 5. Firstly, we observe that there exists a single 1-cycle that is nearly always detected (the primary circle), while other cycles appear as we increase the sample size. Secondly, we observe that the fifth cycle is intermittently detected. Plotting a 2-dimensional projection of the points, we see that this cycle contains very sparse areas, increasing its birth time and consequently its p-value.

To conclude, using this approach, we are able to correctly detect the signal cycles discovered by de Silva & Carlsson30, as well as quantitatively declare the significance level for each cycle.

Figure 5: Testing the 3-circle model in the natural image patches. (left) Four different 2-dimensional projections of the 8-dimensional patches point-cloud. The projection on the top-right shows a 1-cycle that is quite thin and contains large gaps. (right) The p-value curve for the patches dataset as a function of the number of samples. We observe that we do not detect all 5 cycles in 100% of the cases. This is most likely due to the thin cycle.

Infinite cycles: To test Algorithm 1, we used a point-cloud on a 2-dimensional torus in \(\mathbb{R}^3\). Computing the p-values of the two signal 1-cycles requires a massive complex and becomes computationally intractable for even a few thousand points. Using Algorithm 1, the filtration size is incrementally increased until the signal is detected. At 50k points, this saves approximately 80% of the edges that would otherwise be needed. This ratio would significantly increase for higher-dimensional simplices. See Sect. 7 in the Supplementary Information for the complete details.
