
#HackInSAS: Utilizing AI/ML and Data Science to Address Today’s … – EPAM

In April, EPAM's data science experts participated in SAS's 2023 Global Hackathon and received multiple awards: as a global technology winner for natural language processing (NLP) and as a regional winner for EMEA. With global participation from the brightest data scientists and technology enthusiasts, SAS hackathons look to tackle some of the most challenging, real-world business and humanitarian issues by applying data science, AI and open source cloud solutions. Teams have just one month to define a problem, collect the data and deliver a PoC. This year, two EPAM teams participated in the event.

Let's dive deeper into these two exciting innovations.

Social Listening for Support Services in Case of Disasters

EPAM Senior Data Scientists Leonardo Iheme and Can Tosun partnered with Linktera to create a tool to help decision makers in disaster relief coordination centers make data-driven, informed decisions. The solution harnesses the power of NLP and image analysis to turn the disruption of a natural disaster into actionable insights for coordination centers.

Just prior to the hackathon, Turkey's magnitude 7.8 earthquake struck, raising serious questions about how to improve the response to natural disasters. We wanted to help and worked with Linktera in Turkey to do just that. The goal is to streamline the decision-making process with data-backed insights, so that resources can be allocated effectively for rapid response to critical situations.

Mining, Categorizing & Validating Social Media with NLP

This concept mines social media platforms, like Twitter, for real-time data, transforming this wealth of information into insights for disaster relief. In today's connected world, social media has become an essential tool for communication, where people share their experiences, seek help and stay informed. The system features advanced algorithms to filter out misinformation and validate crucial details. The goal is to not only provide verified information to coordination centers but also paint a clearer picture of the situation on the ground.

Applying NLP, we analyzed 140,000 tweets from Turkey's earthquake to identify the intent of help requests and classify them into relevant categories. To pinpoint the location of those in need, we used named entity recognition to extract addresses from tweets. We then used the Google Maps API to convert these textual descriptions into precise coordinates for mapping.
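The article does not specify which NER model or geocoding client the team used, so the sketch below only illustrates the general pattern: spaCy's small multilingual model for entity extraction and the public Google Maps Geocoding API endpoint for coordinates. The tweet text, model choice and API key are placeholders, not part of the actual pipeline.

```python
# Minimal sketch: extract location-like entities from a tweet and geocode them.
# Assumes spaCy's multilingual model (run: python -m spacy download xx_ent_wiki_sm)
# and a Google Maps Geocoding API key; the team's real models are not specified.
import requests
import spacy

nlp = spacy.load("xx_ent_wiki_sm")  # lightweight multilingual NER (labels: LOC, ORG, PER, MISC)

def extract_locations(tweet_text):
    """Return location-like entity spans found in a tweet."""
    doc = nlp(tweet_text)
    return [ent.text for ent in doc.ents if ent.label_ == "LOC"]

def geocode(address, api_key):
    """Convert a textual address into (lat, lng) via the Google Geocoding API."""
    resp = requests.get(
        "https://maps.googleapis.com/maps/api/geocode/json",
        params={"address": address, "key": api_key},
        timeout=10,
    )
    results = resp.json().get("results", [])
    if not results:
        return None
    loc = results[0]["geometry"]["location"]
    return loc["lat"], loc["lng"]

tweet = "We are trapped under rubble near Ataturk Bulvari, Antakya. Please send help!"
for place in extract_locations(tweet):
    print(place, geocode(place, api_key="YOUR_API_KEY"))  # hypothetical key
```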

Assessing Infrastructure Damage with Machine Learning

After the earthquake, satellite image providers quickly made their highly valuable resources publicly available. This is a vital data source to validate and enrich complex social media data, which can help to understand the full extent of the disaster, the infrastructure damage, areas impacted, collapsed buildings and blocked routes that would otherwise hinder emergency response. Using advanced CV techniques, we compared satellite imagery before and after the earthquake. This methodology involves precise location identification, preprocessing of the satellite images and identifying damaged structures using advanced object detection and segmentation models.

To identify buildings within the geospatial data, we utilized a pretrained deep learning object detection model, the open-source YOLOv5 architecture, which offers high accuracy and efficiency in detecting structures. Additionally, the team leveraged the latest segmentation model from Meta AI to delineate buildings and assess the degree of damage. This empowers stakeholders to make informed decisions based on our satellite image processing, which displays the locations and percentage of damage to detected buildings and identifies blocked roads.
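As a rough illustration of the detection step only, the following sketch loads generic pretrained YOLOv5 weights via torch.hub and compares detections on hypothetical before/after tiles. The team's fine-tuned building detector and the Meta AI segmentation stage (presumably Segment Anything) are not reproduced here.

```python
# Minimal sketch: detect structures in pre- and post-event satellite tiles with a
# pretrained YOLOv5 model, then compare counts as a crude damage proxy.
# Generic COCO weights ('yolov5s') are an assumption; the team's actual model is not public here.
import torch

model = torch.hub.load("ultralytics/yolov5", "yolov5s", pretrained=True)

def detect_boxes(image_path, conf_threshold=0.4):
    """Run YOLOv5 on one image tile and return detections above a confidence threshold."""
    results = model(image_path)
    df = results.pandas().xyxy[0]   # columns: xmin, ymin, xmax, ymax, confidence, class, name
    return df[df["confidence"] >= conf_threshold]

before = detect_boxes("tile_before_earthquake.png")   # hypothetical file names
after = detect_boxes("tile_after_earthquake.png")

# A structure detected before the event but missing (or heavily altered) afterwards is a
# candidate for collapse; a segmentation model could then delineate each footprint to
# estimate the percentage of damage and flag blocked roads.
print(f"Detections before: {len(before)}, after: {len(after)}")
```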

Building Data Visualizations for Disaster Relief

You need the right tools to make sense of the data. The PoC includes a comprehensive, user-friendly dashboard that helps disaster coordination centers streamline their decision-making process and communicate effectively among various teams. The dashboard aggregates the outputs of the NLP and image analysis techniques into a single platform with an interactive map that pinpoints the locations of affected areas, their specific requirements and relevant labels, allowing decision makers to quickly assess and prioritize their response. It also includes a layer dedicated to satellite imagery to visualize and assess the extent of the damage.
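The article does not name the dashboarding technology behind the PoC (SAS hackathon entries typically build on SAS tooling, but that is not stated here), so the snippet below only illustrates the mapping idea with the open-source folium library and made-up help-request records.

```python
# Illustrative sketch only: pin geocoded help requests on an interactive map.
# The records below are invented outputs of the NLP + geocoding steps.
import folium

help_requests = [
    {"lat": 36.2021, "lon": 36.1600, "category": "trapped under rubble"},
    {"lat": 37.0662, "lon": 37.3833, "category": "needs medical supplies"},
]

m = folium.Map(location=[36.9, 36.8], zoom_start=7)
for req in help_requests:
    folium.Marker(
        location=[req["lat"], req["lon"]],
        popup=req["category"],
        icon=folium.Icon(color="red"),
    ).add_to(m)

m.save("disaster_dashboard.html")  # open in a browser to explore the pins
```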

We hope that this design concept extends beyond Turkey's earthquake to all natural disaster relief efforts.

Mobility Insights Heidelberg: A Digital Twin to Model Urban Traffic Flow

I joined EPAM Data Scientists Denis Ryzhov and Dzmitry Vabishchewich, alongside Digital Agentur Heidelberg and Fujitsu, to develop a digital twin of Heidelberg, Germany. Heidelberg is a popular tourist destination that attracts 14 million visitors annually, almost 100 times its population. Predicting traffic and pedestrian flow using data from IoT monitoring devices, while considering the impact of weather, is key to running the city smoothly and effectively. These predictions can enhance tourist experiences and safety, prevent accidents, improve planning for road closures for major events and help with future city planning and development. This is an active area of interest for many cities, but most don't have the in-house data science and technology expertise to accomplish this task alone.

The Impact of Weather on Traffic Flow

For the first modeling initiative, the team wanted to understand and predict how weather patterns impact the flow of traffic in the city. The IoT sensor data came from a central traffic light control system, cameras on traffic lights, parking garage sensors, bicycle count sensors and pedestrian count sensors. The team used time, weather and city event data to generate a decision tree, then further improved the model by partitioning the data and applying gradient boosting.
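As a hedged sketch of this modeling approach, the scikit-learn example below fits a decision tree and then a gradient-boosted model on placeholder time, weather and event features; the actual Heidelberg sensor data, file names and feature engineering are assumptions, not the team's pipeline.

```python
# Minimal sketch: predict traffic counts from time, weather and event features,
# starting with a decision tree and improving with gradient boosting.
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeRegressor
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.metrics import mean_absolute_error

df = pd.read_csv("heidelberg_traffic.csv")   # hypothetical IoT sensor export
features = ["hour", "weekday", "temperature_c", "precipitation_mm", "event_flag"]  # event_flag: 0/1
X_train, X_test, y_train, y_test = train_test_split(
    df[features], df["vehicle_count"], test_size=0.2, random_state=42
)

tree = DecisionTreeRegressor(max_depth=6).fit(X_train, y_train)
gbm = GradientBoostingRegressor(n_estimators=300, learning_rate=0.05).fit(X_train, y_train)

for name, model in [("decision tree", tree), ("gradient boosting", gbm)]:
    print(name, "MAE:", mean_absolute_error(y_test, model.predict(X_test)))
```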

Predicting Parking Space Availability

For the team's second modeling initiative, the goal was to add a predictive parking availability function to guide motorists to parking spaces. By the end of the hackathon, the models were available on the city's website to provide long-term predictions of parking availability. The problem was modeled in parallel in Python and SAS Model Builder, using random forests. The short-term model learns patterns of occupancy from lag features. Further improvement was achieved by extrapolating a curve in response to unprecedented patterns, and the model improved again when weather and event data were added.
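A minimal sketch of the short-term idea follows, assuming 15-minute occupancy sampling and placeholder column names; the parallel SAS Model Builder workflow and the curve-extrapolation refinement are not shown.

```python
# Minimal sketch: learn parking occupancy patterns from lag features with a random forest.
import pandas as pd
from sklearn.ensemble import RandomForestRegressor

df = pd.read_csv("parking_occupancy.csv", parse_dates=["timestamp"])  # hypothetical garage export
df = df.sort_values("timestamp")

# Lag features: occupancy 1 hour, 1 day and 1 week earlier (15-minute sampling assumed).
for name, steps in [("lag_1h", 4), ("lag_1d", 96), ("lag_1w", 672)]:
    df[name] = df["occupancy"].shift(steps)
df = df.dropna()

features = ["lag_1h", "lag_1d", "lag_1w", "temperature_c", "is_event_day"]
model = RandomForestRegressor(n_estimators=200, random_state=42)
model.fit(df[features], df["occupancy"])

latest = df[features].iloc[[-1]]
print("Predicted occupancy next interval:", model.predict(latest)[0])
```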

Forecasting Short-Term Traffic

For the third modeling initiative, the team looked at traffic flows in the city and generated short-term predictions of traffic over the next three hours. The team performed a successful experiment in creating a model to predict traffic at each sensor location within the city, demonstrating that high-quality models can be obtained for multiple locations. This stream was particularly challenging due to gaps in the data, which we overcame by carefully selecting the analysis tools and techniques and filling gaps where necessary. The team used a light gradient boosting machine (LGBM), a tree-based model well suited to time series problems, for which lag features and rolling-window statistics are the quickest and most effective inputs to implement.
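The LightGBM sketch below illustrates lag features and rolling-window statistics for a three-hour-ahead forecast at a single hypothetical sensor; the sensor data, feature choices and gap handling are assumptions for illustration, not the team's actual pipeline.

```python
# Minimal sketch: LightGBM on lag and rolling-window features to forecast traffic +3 hours.
import pandas as pd
import lightgbm as lgb

df = pd.read_csv("sensor_42_counts.csv", parse_dates=["timestamp"])  # hypothetical hourly counts
df = df.sort_values("timestamp").set_index("timestamp")

# Lag features and rolling-window statistics on hourly counts.
for lag in (1, 2, 3, 24):
    df[f"lag_{lag}h"] = df["count"].shift(lag)
df["roll_mean_6h"] = df["count"].shift(1).rolling(6).mean()
df["roll_std_6h"] = df["count"].shift(1).rolling(6).std()

# Target: traffic three hours ahead; gaps in the raw data are simply dropped in this sketch.
df["target_3h"] = df["count"].shift(-3)
df = df.dropna()

features = [c for c in df.columns if c.startswith(("lag_", "roll_"))]
model = lgb.LGBMRegressor(n_estimators=500, learning_rate=0.05)
model.fit(df[features], df["target_3h"])
print("Forecast (vehicles/hour, +3h):", model.predict(df[features].iloc[[-1]])[0])
```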

Conclusion

EPAM is proud of our hackathon teams and their award recognition from SAS. The teams delivered exciting PoC innovations using the latest AI technologies and data science practices to address today's most challenging, real-world business and humanitarian issues. As always, we hope to inspire and foster technology innovation, and we look forward to another great competition in next year's #HackInSAS.

Here is the original post:

#HackInSAS: Utilizing AI/ML and Data Science to Address Today's ... - EPAM


Oritain raises $57M for its forensic, big-data science approach to tracking the origin of goods – TechCrunch

Image Credits: Haje Kamps / TechCrunch

Global supply chains have made the world smaller by enabling us to have virtually anything we want at the tap of a finger. But when it comes to things like verifying a physical object's origin or its composition, those same fragmented chains and the many steps between a producer and a buyer can be an expensive minefield for the companies working within them.

Enter forensic traceability, a method that uses both data and forensic science to find the true origins of a product, and a New Zealand startup called Oritain, which has raised $57 million to continue developing its forensic traceability business, which includes not just technology for testing a product's origins and composition but also a growing fingerprint database used to help in the identification process.

Highland Europe is leading the round, with previous backer Long Ridge also participating. Oritain is not disclosing its valuation, except to confirm that it is higher than before on the back of an annual growth rate of over 90% and customer retention of over 100%.

Oritain currently provides SaaS-based tooling for researching the origins of food and textiles, and it works with 100 companies, specifically big multinationals like Nescafé, Lacoste, Supima and Primark, which use it to ensure they are getting what they are expecting, that customers are getting what they're intending to send, and to help them stay in compliance with their own ESG policies and the growing number of regulations aimed at creating rewards and penalties for violating best practices around sourcing and producing goods.

While a lot of consumer packaged goods today look like they are already traced to the hilt through barcodes, Bluetooth tags and other tracking technologies, Oritain's approach is a very clever complement that gets to the heart of the object itself.

The company operates from the premise that an object and the materials in it essentially tell a story. That story can be uncovered using methods not unlike those that forensics experts use in a crime lab, said CEO Grant Cochrane in an interview.

Cotton grown in one part of the world will look different in a lab, under a microscope, to cotton grown somewhere else. Factors like changes in the environment play a more significant part than you might realize in how that cotton grows and looks. Events like earthquakes and fires, as well as simply normal soil conditions in a particular locale, all can be traced by closely examining the material and comparing it against other material and other data.

"It's based on causality," Cochrane said. "That's how our science works. Environmental change is very much our friend."

Taking a wide variety of measurements, Oritain then compares these measurements to other information it has about a location (the claimed origin) to determine whether that material (cotton in this example, but it could be coffee or something else) really came from the location its sender said it did.

The measurements are also recorded in its massive database. This in turn can be used to help identify or confirm the actual origin of future objects that might bear some of the same characteristics.

The science of this is solid enough that experts appearing in court in counterfeit cases have used Oritain's data as their evidence, and it's held up, noted Cochrane.

"Oritain's forensic science can take a commodity sample and tell you precisely where in the world it comes from," said Jacob Bernstein, a partner at Highland Europe, in a statement. "Does this cocoa come from a deforested National Park? Is this cotton from where my supplier says it is? Is this coffee truly Brazilian, as the label says? This groundbreaking technology is a dream solution for sourcing and sustainability leaders at the world's largest brands who can finally get to grips with the authenticity of their supply chains. We are immensely proud to be partnering with the Oritain team to revolutionize origin verification."

Excerpt from:

Oritain raises $57M for its forensic, big-data science approach to tracking the origin of goods - TechCrunch


Science for the future | UNC-Chapel Hill – The University of North Carolina at Chapel Hill

Marcel Ravidat could never have imagined what lay ahead when he and his dog, Robot, came across a foxhole in the southwestern French countryside in 1940. The tunnel was rumored to be a secret passage to a nearby manor, and the curious 18-year-old returned with three friends to investigate further. After days of digging, they found an ancient masterpiece 50 feet underground: the cave art of Lascaux. Giant paintings and carvings of animals depict a variety of scenes throughout the chambers.

To data scientists, sites like Lascaux are full of not only artifacts, but raw data. While a spreadsheet of numbers may come to mind when one thinks of data, its definition is much broader: factual information used as the basis for reasoning, discussion or calculation.

The people who created the work almost 20,000 years ago lit the cave with fireplaces and sandstone lamps fueled by animal fat. They used their hands, brushes and hollow bones to apply the paint. Scenes give insight into the fauna of the time, depicting some species that have long been extinct. The red, yellow and black pigments were prepared by mixing or heating minerals like hematite, iron oxyhydroxides, charcoal and manganese oxide. The closest known source of this type of manganese oxide was about 150 miles away, leading to the conclusion that these people conducted trade or used supply routes.

All of this is data and it gives valuable insight into the life of humans in Europe during the Upper Paleolithic era.

"Data has been here forever. Before we humans were here, there was data about the universe," says Jay Aikat, vice dean of the UNC School of Data Science and Society and research professor of computer science. "It's a matter of us paying attention to the data and what we're doing with it to advance humankind."

That advancement is at the forefront of SDSS. Launched in 2022, the school is centered around progressing the field of data science and understanding how it impacts society.

As a hub of the technology and biotech industry, the Research Triangle is a fitting spot for a new school of data science. With around 4,000 tech companies, some of the fastest growing segments in the Triangle are in areas like analytics, nanotechnology and wearables. Giants like Vinfast, Wolfspeed and Apple as well as a myriad of smaller companies and startups are keen to hire graduates with strong data literacy.

For better or worse, data is pervasive in every area of life. Just like researchers' understanding of the life of the Lascaux artists based on artifacts, data scientists can analyze behavior through modern technology. Our physical activity, internet search queries, purchasing and television preferences and driving tendencies are all tracked through means like smart devices, financial transactions and security cameras.

"Data has a persuasive force. It has the ability to make arguments seem more plausible, more impactful, more strong," says Stan Ahalt, dean of SDSS and professor of computer science. "People who use data have a persuasive podium. That podium can be used for very positive things, but it also could be used to distort things in a certain way."

It seems like the saying 'there's data supporting this' has become the new 'I saw it on TV' or 'I read it on the internet.' Because of this persuasive force, Aikat says it's imperative that students have a holistic understanding of data.

"How we collect the data, how we analyze the data, are we thinking about privacy and security as we're collecting data? All of those things matter," Aikat says. "So, we need to make sure that students are data literate. All our students are data literate."

SDSS will launch its online Master of Applied Data Science program in January 2024. Graduates will gain general skills in programming, statistics, mathematics, and data management, ethics, and governance, as well as specialized skills in machine learning, visualization and communication. This program will be followed by undergraduate and graduate degrees and a certificate program for working professionals.

The goal is to give students multiple avenues to incorporate data science courses into their degree, according to Aikat. This cross-disciplinary focus also drives how SDSS approaches research.

"We're not just a silo as a school," she says. "The success of the school really depends on having very strong collaboration with many different units across campus for UNC as a whole to be a data science powerhouse."

The concept for SDSS began moving forward in 2016, when Carolina geneticist and former vice chancellor for research Terry Magnuson charged the first committee to start thinking big about data science. As the founder of UNC-Chapel Hill's genetics department, Magnuson has decades of experience building research units, engaging with industry partners and pushing science policy at local and national levels.

SDSS is currently in discussion with faculty across campus and industry partners to pinpoint research concentrations within the school. Broadly speaking, research will be centered around how data science can be used to solve social issues.

"At the core of UNC is Service to the State, and the UNC School of Data Science and Society very much sees that [] as part of our mission," Aikat says. "All of the research that we're doing is focused on problem areas. How can we take what we've learned and apply that to social problems?"

Ahalt stresses that data training is important for students in any field from the hard sciences to art history and everything in between.

"I'm increasingly convinced that any disciplinary area is going to be impacted by data increasingly as time goes by," Ahalt says. "We haven't been confronted with the volume of data that is now occurring in many industries. And so, as a consequence, I think any student in any degree is going to make themselves much more future-proofed by having the ability to use data in a very facile way."

View original post here:

Science for the future | UNC-Chapel Hill - The University of North Carolina at Chapel Hill


HawkEye 360 Receives $58 Million in Series D-1 Funding to … – PR Newswire

Round led by funds and accounts managed by BlackRock will expand HawkEye 360's leadership delivering radio frequency geospatial intelligence to government customers

HERNDON, Va., July 13, 2023 /PRNewswire/ -- HawkEye 360 Inc., the world's leading defense technology company for space-based radio frequency (RF) data and analytics, announced today it has closed $58 million in new funding. The funding will be used to develop new space systems and expand analytics that support high-value defense missions. This Series D-1 round was led by funds and accounts managed by BlackRock (NYSE: BLK) with additional funding provided by Manhattan Venture Partners and existing investors including Insight Partners, NightDragon, Strategic Development Fund (SDF), Razor's Edge, Alumni Ventures, and Adage Capital.

"HawkEye 360 continues to make the world a safer place through advanced RF analytics including addressing maritime, environmental, and national security needs," said HawkEye 360 CEO John Serafini. "We've learned much over the past four years delivering data to the most demanding customers in the world. We'll use this funding to drive our next steps in innovation. It speaks volumes that these leading investment firms are confident in the future of RF geospatial intelligence as a critical defense technology."

"We invest in first-class startups that have proven innovative technology, where we can come alongside to accelerate their growth," said Matt Singer, Managing Director, BlackRock. "Governments and commercial customers are asking for better intelligence and, with its full chain of control from orbit to analytics, Hawkeye 360 is leading the way for this new category of RF space-based data."

HawkEye 360 has 21 satellites in orbit and plans to move to a new Block 3 satellite architecture starting with Cluster 14 and beyond. The company is also investing further in artificial intelligence, data fusion, and multi-intelligence orchestration to better extract value from the large amount of RF data being collected. The goal is to simplify analysis for its customers.

"HawkEye 360 has disrupted what used to be a static defense intelligence domain," said Jared Carmel, Managing Partner and General Partner of Manhattan Venture Partners. "The company is the quintessential example of how a commercial operation could service the intelligence needs of the U.S. and our allies. They have built a growing market with government customers and are proof that private-sector innovation and leadership will help enable peace through strength, deter future conflict, and ensure global stability."

WilmerHale acted as legal counsel for HawkEye 360 in connection with the transaction.

Goodwin Procter LLP acted as legal counsel for BlackRock in connection with the transaction.

For more information about HawkEye 360, please visit www.he360.com.

About HawkEye 360

HawkEye 360 is a defense technology leader providing global knowledge of human activity and trends derived from revolutionary radio frequency (RF) geospatial intelligence. The company's innovative space-based technology was developed to detect, characterize, and geolocate a broad range of RF signals. These RF data and analytics provide an information advantage allowing analysts to detect the first glimpse of suspicious behavior, trace the first sign of adversarial activity, and reveal the first sighting of ships attempting to vanish. HawkEye 360's RF intelligence presents a quicker grasp of critical events and patterns of life, providing early warnings to drive tip-and-cue efforts, and providing leaders the insights needed to make decisions with confidence. HawkEye 360 is headquartered in Herndon, Virginia.

SOURCE HawkEye 360

View original post here:

HawkEye 360 Receives $58 Million in Series D-1 Funding to ... - PR Newswire


What prevents us from reusing medical real-world data in research … – Nature.com

The main tasks in facilitating, or even enabling, the reuse of medical RWD in a research context are to promote interoperability, harmonization, data quality, and ensure privacy, to optimize the retrieval and management of patient consent, and to establish rules for data use and access12,13. These measures aim to address the various challenges of scientifically reusing routine clinical data described below.

Personal, i.e. non-anonymized medical data, is inherently sensitive1,17,22. As a result, uncertainties in MDS project preparation and execution arise for all roles involved in performing research on medical RWD, i.e. for patients, researchers and governing entities. The patients may lack trust in research using their personal data. Concerns about data misuse, becoming completely transparent and data leakage - especially in the case of long-term storage - can result in the patients overprotecting their own data and not giving their consent for its reuse in research23,24,25. On the other hand, it has also been shown that most EU citizens support secondary use of medical data if it serves further common good24. So, convincing patients about the social expediency of MDS can decrease their ambivalence and avoid overprotection. This can be achieved, for example, by reporting on MDS success stories13. A second important aspect is patient empowerment by informing patients about the processing and use of their data through open scientific communication and enabling their active engagement in the form of a dynamic consent management12,23.

However, there are also concerns on the part of the researcher resulting e.g. from a lack of explicit training in a complex landscape of ethical and legal requirements. These could be mitigated by discussions in interdisciplinary team meetings but differences in the daily work routine make it difficult to arrange such meetings8,9,18,21. As a consequence of unresolved concerns, researchers could delay or even cancel their MDS projects. Moreover, even governing entities such as data protection officers and ethics committees exhibit a certain level of uncertainty regarding permissible practices in MDS. They tend to overprotect the rights of the patients whose medical data is to be used while underestimating the necessity of reusing medical RWD for research purposes9,23,26,27. This leads to restrictive policies hindering scientific progress.

In general, education is a promising approach to address the uncertainties mentioned above. Technical training for medical researchers and governing entities as well as ethical and legal training for technical experts can increase confidence in project-related decision making1,18,23,24,27,28. The same effect can be achieved by developing MDS guidelines and actionable data protection concepts (DPC)13,14,15,16. A good example is the DPC of the MI-I that was developed in collaboration with the German working group of medical ethics committees (AK-EK)12. Figure1 summarizes the sources and consequences of the aforementioned uncertainties that lead to significant challenges in the reuse of medical RWD. Each source of uncertainty is associated with the roles it affects and possible measures to mitigate its impact. The challenges posed by these uncertainties are discussed in more detail below.

Sources and consequences of uncertainties that lead to significant challenges in the reuse of medical RWD. The sources of uncertainties are individually assigned to the roles they affect and possible measures to counteract them.

As mentioned above, the complex legal landscape resulting from various intervening laws contributes significantly to the uncertainty surrounding the reuse of medical RWD. At the European level, the General Data Protection Regulation (GDPR) holds substantial influence over the legal framework. In general, it prohibits the processing of health-related personal data (GDPR Art. 9 (1)) unless the informed consent of every affected person is given (GDPR Art. 9 (2a)) or a scientific exemption is present (GDPR Art. 9 (2j)). The latter is the case if the processing is in the public interest, secured by data protection measures, and adequately justified by a sufficient scientific goal. However, substantiating the presence of such a scientific exemption poses significant challenges29,30. Similarly, or even more difficult, is obtaining informed consent of patients after they have left the clinics. As such, both GDPR-based possibilities to justify the secondary use of RWD in research are difficult to implement in practice26,29. If the processing is legally based on the scientific exemption, GDPR Art. 89 further mandates the implementation of appropriate privacy safeguards supported by technical and organizational measures. Additionally, it stipulates that only the data necessary for the project should be utilized (principle of data minimization)30,31. This ensures the protection of sensitive personal data, but also introduces further challenges for the researchers.

The situation becomes further complicated due to the GDPR allowing for various interpretations by the data protection laws of EU member states30,31. Moreover, there are country-specific regulations, such as job-specific laws, that impact the legal framework of MDS31. This complex scenario poses particular challenges for international MDS projects29. As a result, identifying the correct legal basis and implementing appropriate data protection measures becomes exceptionally difficult29,30. This task, crucial in the preparation of clinical data set compilation, necessitates not only technical and medical expertise but also a comprehensive understanding of legal aspects. Thus, a well-functioning interdisciplinary team or researchers with broad training are essential.

Analyses of the current legal framework for data-driven medical research suggest that this framework is remote from practice and thus inhibits scientific progress31,32. To address these limitations, certain legal amendments or substantial infrastructure enhancements are necessary. Particularly, the infrastructure should focus on incorporating components and tools that facilitate semi-automated data integration and data anonymization. Although the current legal framework permits physicians to access, integrate, and anonymize data from their own patients, they often lack the technical expertise and time to effectively carry out these tasks. By implementing an infrastructure that enables semi-automated data integration and anonymization, researchers would be able to legally utilize valuable medical RWD without imposing additional workload on physicians29,30. Attaining a fully automated solution is not feasible since effective data integration and anonymization, leading to meaningful data sets, necessitate manual parameter selection by a domain expert. Nonetheless, by prioritizing maximal automation and specifically assigning domain experts to handle the manual steps in the process, rapid and compliant access to medical RWD, along with reduced uncertainties for researchers, can be achieved.

Not only the legal framework, but also ethical considerations can cause uncertainties. These can affect the patients and researchers but, in the context of an MDS project, especially the ethics committees as they have to judge whether a project is ethically justifiable. There are a variety of ethical principles to be taken into account for such a decision. These principles encompass patient privacy, data ownership, individual autonomy, confidentiality, necessity of data processing, non-maleficence and beneficence1,33. Considered jointly, they result in a trade-off to be made between the preservation of ethical rights of treated patients and the beneficence of the scientific project15,18,26. Criticism often arises concerning the prevailing trade-off in favor of patients privacy, where ethics committees tend to overprotect patient data23,27. What is frequently overlooked is the ethical responsibility to share and reuse medical RWD to advance medical progress in diagnoses and treatment. Thus, a consequence of overprotecting data is suboptimal patient care which is, in turn, unethical1,9,26. Measures to prevent overprotection are increasing the awareness of its risks through education, as well as the development of clear ethical regulations and guidelines28. To facilitate the latter, the data set compilation process for medical RWD should be simplified, e.g. by standardization of processes and data formats because its current complexity challenges the creation of regulations and guidelines17.

Many of the mentioned concerns related to legal and ethical requirements occur during project planning and design. Here a variety of decisions are made regarding the composition of the RWD set and its processing. These affect all subsequent project steps, but must be determined at an early stage if the project framework necessitates approvals from governing entities. This is because the governing entities require all planned processing steps to be documented in a study plan, serving as the foundation for their decision-making process. This results in long project planning phases due to uncertainties in a complex multi-player environment13,14,15,16,21. Additionally, creating a strict study plan usually works for clinical trials, but in data science, meaningful results often require more flexibility. For instance, it might be necessary to redesign the project plan throughout data processing. Therefore, project frameworks that show researchers how to reshape their project in specific cases would be much better suited for secondary use of medical RWD25,34. Taking it a step further, a general guideline or regulation on how to conduct MDS projects would decrease planning time and the risk of errors, both of which are higher if each project is designed individually14. To already now minimize the uncertainties in project planning and, thereby, the duration of the planning phase, research teams should communicate intensely and collaboratively plan their tasks9,18. Since this is a challenging task in a highly interdisciplinary environment, early definition of structures, binding deadlines, and clear assignment of responsibilities, such as designating a person responsible for timely data provision in each department, are crucial8,14.

As mentioned in the introduction to this section 3.1, dynamical consent management allowing the patients to effectively give and withdraw their consent at any point in time is a crucial measure to foster patient empowerment. As a result, it also leads to more acceptance of MDS by the affected individuals. Furthermore, in section 3.1.1 the informed patient consent was mentioned as a possible legal justification for processing personal sensitive data. However, the traditional informed consent requires patients to explicitly consent to the specific processing of their data. This means their consent is tied to a specific project35,36. For retrospective projects such a consent cannot be obtained during the patients stay at the hospital because the project idea does not exist at that time. Hence, the researcher would have to retrospectively contact all patients whose data is needed for the project, describe the project objective and methodology to them and then ask for their consent. This requires great effort, is, itself, questionable in terms of data protection and even not feasible if the patients are deceased. Making clinical data truly reusable in a research context, therefore, requires a broad consent in which the patients generally agree to the secondary use of their data in ethically approved research contexts. Furthermore, the retrieval of such a broad consent must be integrated into daily clinical routine and the consent management needs to be digitized. Otherwise, the information about the patient consent status might not be easily retrievable for the researcher8,18,21,37.

Previous research has documented that most patients are willing to share their data and even perceive sharing their medical data as a common duty38. Therefore, it is highly likely that extensively introducing a broad consent such as the one developed by the MI-I in Germany into clinical practice, combined with a fully digital and dynamic consent management, would have a significant positive impact on the feasibility of MDS projects39. It would allow patients to actively determine which future research projects may use their data.

When describing the challenges resulting from balancing benefits and harms in MDS projects, some measures were suggested that require technical solutions. One example of this is the implementation of data protection measures like data access control, safe data transfer, encryption, or de-identification20. However, there are not only technical solutions but also challenges, as shown in Fig. 2.

Technical challenges of curating medical RWD sets and possible measures for improvement.

One category of technical challenges results from the specificities of medical data outlined in section 2. Medical RWD is characterized by a higher level of heterogeneity regarding data types and feature availability than data from any other scientific field18,19,26. Thus, compiling usable medical data sets from RWD requires the technical capabilities of skillful data integration, type conversion and data imputation. However, heterogeneity is not restricted to data formats. A common problem is differences in the primary purpose of data acquisition or primary care leading to different data formats and standards being used8. This results in different physicians, clinical departments, or clinical sites not necessarily using the same data scales or units, syntax, data models, ontology, or terminology. Hence, it is difficult to decide which standards to use in an MDS project. A subsequent challenge arising from this lack of interoperability is the conversion between standards that potentially leads to information loss19,26,40. Last but not least, heterogeneity is also reflected in different identifiers being used in different sites. This challenges the linkage of related medical records, which may even become impossible once the data is de-identified41. Promising and important measures to meet the challenges concerning heterogeneity are the development, standardization, harmonization and, eventually, deployment of conceptual frameworks, data models, formats, terminologies, and interfaces8,13,14,16,42. An example illustrating the feasibility and effectiveness of these measures is the widely used DICOM standard for Picture Archiving and Communications systems (PACS)18. Similar effects are expected from the deployment of the HL7 FHIR standard for general healthcare related data that is currently being developed43. However, besides appreciating the benefits of new approaches, the potential of already existing standards like the SNOMED CT terminology should not be neglected. It still has limitations, such as its complexity challenging the identification of respectively fitting codes and its incompleteness partly requiring to add own codes. On the other hand, SNOMED CT is already very comprehensive. Once its practical applicability is improved, SNOMED CT could be introduced as an obligatory standard in medical data systems fostering interoperability13,16,42.
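To make the interoperability discussion concrete, the snippet below shows what a single HL7 FHIR Observation resource carrying a SNOMED CT code can look like, written as a plain Python dictionary. The patient reference, timestamp and values are illustrative examples, not an excerpt from the MI-I core data set.

```python
# Illustrative only: one HL7 FHIR Observation with a SNOMED CT coding, as a Python dict.
import json

observation = {
    "resourceType": "Observation",
    "status": "final",
    "code": {
        "coding": [{
            "system": "http://snomed.info/sct",
            "code": "386725007",          # SNOMED CT concept for body temperature
            "display": "Body temperature",
        }]
    },
    "subject": {"reference": "Patient/example-123"},   # hypothetical patient id
    "effectiveDateTime": "2023-07-01T08:30:00+02:00",
    "valueQuantity": {
        "value": 38.4,
        "unit": "°C",
        "system": "http://unitsofmeasure.org",
        "code": "Cel",
    },
}

print(json.dumps(observation, indent=2, ensure_ascii=False))
```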

Another significant technical challenge is the fact that a majority of medical RWD is typically available in a semi-structured or unstructured format, while the application of most machine learning algorithms necessitates structured data8,19,42,44. Primary care documentation often relies on free text fields or letters because they can capture all real-world contingencies while structured and standardized data models cannot. Additionally documenting the cases in a structured way, is too time-consuming for clinical practice. So, the primary clinical systems mainly contain semi-structured or unstructured RWD7,13,23. To increase the amount of available structured data, automated data structuring using Natural Language Processing (NLP) is a possible solution. However, it is not easy to implement for various reasons. Among them are the already mentioned inconsistent application of terms and abbreviations in medical texts and the requirement to manually structure some free text data sets to get annotated training data13,42.
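To make the gap between free text and structured data concrete, the toy sketch below extracts a few fields from an invented clinical note with regular expressions. Real clinical NLP requires trained models and annotated corpora; the note, patterns and mini-lexicon here are purely illustrative.

```python
# Toy sketch of the structuring problem: pull structured fields out of an unstructured note.
import re

note = "Pat. presents with chest pain since 2 days. BP 145/90 mmHg, HR 92 bpm. Hx of T2DM."

condition_lexicon = {"T2DM": "type 2 diabetes mellitus", "CHF": "congestive heart failure"}  # tiny demo lexicon

structured = {
    "blood_pressure": re.search(r"BP\s*(\d{2,3}/\d{2,3})", note).group(1),
    "heart_rate_bpm": int(re.search(r"HR\s*(\d{2,3})", note).group(1)),
    "conditions": [full for abbr, full in condition_lexicon.items() if abbr in note],
}
print(structured)
# {'blood_pressure': '145/90', 'heart_rate_bpm': 92, 'conditions': ['type 2 diabetes mellitus']}
```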

Workflows in primary care settings not only lead to predominantly semi-structured or unstructured documentation of medical cases, but also greatly influence the design of clinical data management systems. In primary care and administrative contexts, such as accounting, clinical staff typically need a comprehensive overview of all data pertaining to an individual patient or case. As a result, clinical data management systems have been developed with a case- or patient-centric design that presents data in a transaction-oriented manner. However, this design is at odds with the need for query-driven extract-transform-load (ETL) processes when accessing data for MDS projects. These projects typically require only a subset of the available data features, but for a group of patients8,26. Developing a functional ETL pipeline is further complicated by the overall lack of accessible interfaces to the data management systems and the fragmented distribution of data across various clinical departments systems8,13.
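As a hedged illustration of such a query-driven ETL step, the pandas sketch below selects only the features a hypothetical MDS project needs for a defined cohort and applies a crude pseudonymization; the table and column names are placeholders, not part of any actual hospital system.

```python
# Minimal sketch: from a case-centric export, keep only the needed feature subset
# for a defined patient cohort, then pseudonymize the identifier.
import pandas as pd

cases = pd.read_csv("clinical_cases_export.csv")          # hypothetical case-centric dump
cohort_ids = set(pd.read_csv("cohort_patient_ids.csv")["patient_id"])

needed_features = ["patient_id", "age", "sex", "diagnosis_code", "lab_creatinine"]

research_table = (
    cases.loc[cases["patient_id"].isin(cohort_ids), needed_features]
         .drop_duplicates()
)

# Crude de-identification: replace the identifier with a project-specific pseudonym.
research_table["patient_id"] = pd.factorize(research_table["patient_id"])[0]
research_table.to_csv("mds_project_dataset.csv", index=False)
```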

This means the design of primary clinical systems could be improved significantly if it allowed for more flexibility, i.e. support patient- and case-centricity for primary care as well as data-centricity for secondary use. Moreover, the system design should comply with data specifications and developed standards rather than requiring the data to be created according to system specifications13. However, a complete redesign of primary clinical systems is most likely not feasible. An alternative solution is creating clinical data repositories in the form of data lakes or data warehouses that extract and transform medical RWD from primary systems and make it usable for research45,46. In this context, the use of standardized platforms and frameworks such as OMOP or i2b2 further increases the interoperability of the collected data47. In Germany, the MI-I established DIC and MeDIC whose goal is the creation of such data repositories for the medical RWD gathered at German university hospitals. As a common standard they agreed on the HL7 FHIR based MI-I core data set (CDS)48. Because this is work in progress and the data repositories are populated with data from primary clinical systems, the DIC and MeDIC still need to address the challenges identified in this comment paper to create FAIR data repositories for research.

See the article here:

What prevents us from reusing medical real-world data in research ... - Nature.com


How to Start Data Science with Python | by AI News | Jul, 2023 – Medium

Photo by Fotis Fotopoulos on Unsplash

In today's data-driven world, data science has emerged as a crucial skillset for professionals across various industries. While artificial intelligence (AI) often steals the limelight, data science remains the foundation of AI and plays a pivotal role in extracting meaningful insights from vast amounts of data. Whether you aspire to work in technology, finance, healthcare, or any other field, having a strong foundation in data science is essential. This article will serve as a comprehensive guide to help you get started on your data science journey using Python as your primary tool.

Before diving into the technical aspects, it is important to grasp the fundamental concepts of data science. Data science involves extracting, transforming, and analyzing data to gain valuable insights and make informed decisions. It encompasses various techniques, such as statistical analysis, machine learning, data visualization, and predictive modeling.

Python has emerged as one of the most popular programming languages for data science due to its simplicity, versatility, and extensive libraries. It provides a user-friendly interface, making it accessible to both beginners and experienced programmers. Python's rich ecosystem includes libraries like NumPy, Pandas, and Matplotlib, which offer powerful tools for data manipulation, analysis, and visualization.

To embark on your data science journey with Python, you need to set up your development environment. Follow these steps:

a. Install Python: Visit the official Python website (python.org) and download the latest version of Python compatible with your operating system. The website provides detailed installation instructions.

b. Choose an Integrated Development Environment (IDE): IDEs such as Jupyter Notebook, Spyder, and PyCharm are widely used for data science projects. Select an IDE that suits your preferences and install it on your machine.
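Once Python and an IDE are installed, a quick sanity check that the core libraries mentioned above import and run might look like the following (install them first with pip if they are missing).

```python
# Quick check that the core data science libraries are installed and working.
# If needed:  pip install numpy pandas matplotlib
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

data = pd.DataFrame({"x": np.arange(10), "y": np.arange(10) ** 2})
print(data.describe())                        # basic summary statistics with Pandas

plt.plot(data["x"], data["y"], marker="o")    # simple visualization with Matplotlib
plt.title("If this window opens, your setup works")
plt.show()
```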

Go here to read the rest:

How to Start Data Science with Python | by AI News | Jul, 2023 - Medium


U.S. companies are on a hiring spree for A.I. jobs, and they pay an average of $146,000 – CNBC

The U.S. is leading the way in artificial intelligence jobs, and many of them easily pay six figures, according to new data from the global job search platform Adzuna.

There were 7.6 million open jobs in the U.S. in June, according to the Adzuna database, with a growing share calling for AI skills: 169,045 jobs in the U.S. cited AI needs, and 3,575 called for generative AI work in particular.

Some of the most common jobs include tech roles you'd expect, like software engineer, product designer, deep learning architect and data scientist. But one fast-growing role where there's "absolutely a shortage" of qualified applicants is tax manager, says James Neave, Adzuna's head of data science.

Accounting and consulting firms are increasingly looking to hire accountants and tax managers with the right AI skills to make the business more efficient using large language models, he says.

The average tax manager job that'll use AI pays $100,445 a year, according to Adzuna data. Overall, the average job using the skill pays $146,244.

AI jobs have been around for decades but exploded in recent months as ChatGPT entered the scene in late 2022. Ever since, companies hoping to stay on the cusp of emerging technology have been desperate to hire people with skills to build the technology and implement it into their workflows, such as by building chatbots to improve customer service or processing large amounts of data to make business decisions.

One recent survey of LinkedIn's Top Companies found nearly 70% say AI is already helping them be faster and smarter, and another 32% say they expect to see larger gains from using it in the coming years. Companies like EY explicitly listed AI as one of their top three hiring priorities, while Wells Fargo and Kaiser Permanente are implementing it across their workflows.

"It's a super important skillset employers are looking for, across all industries," Jay Shankar, vice president of global talent acquisition at Amazon Web Services, told CNBC Make It in April. "AI is practically everywhere now and to me, if there's one technical skill you want to learn, that's the area to focus on."

Those interested in building their generative AI skills can look into certification and training courses online, from the University of Michigan, Coursera and other e-learning platforms.

It could be a reliable skill in an unpredictable market that's seen some big-name tech layoffs in recent months, Neave says. People looking for high-paying roles should keep an eye on AI in the second half of the year in particular, he says.

And the U.S. seems to be leading the way in building its AI talent pool: While the U.S. advertised nearly 173,000 open AI roles in June, India posted 25,900 jobs and the UK listed 16,825 jobs during the same period.


See more here:

U.S. companies are on a hiring spree for A.I. jobsand they pay an average of $146,000 - CNBC


Multi Million Dollar Gifts Received By Tulane, Florida Atlantic, And University Of Wisconsin-Milwaukee – Forbes


This week saw historic private gifts received by three universities for special health or science initiatives. Announcing the receipt of multi-million dollar donations were the University of Wisconsin, Milwaukee; Tulane University; and Florida Atlantic University.

University of Wisconsin, Milwaukee

The University of Wisconsin, Milwaukee (UWM) announced the largest gift in its history - a $20 million donation from the Zilber Family Foundation to UWM's Joseph J. Zilber College of Public Health.

The gift will fund two endowments: the Zilber Faculty Excellence Fund and the Vera Zilber Student Program Fund. The endowments will enable an expansion of research and will increase both graduate-level and undergraduate scholarships.

"The Zilber Family Foundation's landmark gift is a vote of confidence in UWM and the college's future," said UWM Chancellor Mark Mone in a news release. "It reinforces our pillars of faculty excellence, top-tier research, student access and achievement. This extraordinary gift significantly enhances our ability to recruit and retain top faculty members while accelerating and supporting the students who will become our nation's public health leaders. Today is one of the highlights of my career, and it's due to the partnership that we have here."

The Zilber Family Foundation was founded in 1961 by Joseph J. Zilber and his wife, Vera, owners of Zilber Ltd., a residential and commercial real estate firm. Its mission is to improve wellbeing by investing in nonprofit organizations that help ensure personal safety; increase social and economic opportunities; and improve quality of life in neighborhoods.

In 2007, Joe Zilber donated $10 million to support the development of the UW-Milwaukee Graduate College of Public Health, which is Wisconsin's only accredited school of public health, one of 56 accredited public health schools in the country.

Tulane University

Tulane University reported it had received a $12.5 million donation from long-time supporters Libby and Robert Alexander to support a university-wide data science initiative. As a result of the gift, Tulane's Data Hub will be renamed the Connolly Alexander Institute for Data Science. Connolly is the family name of Libby Alexander, who is a Tulane graduate and member of the Board of Tulane.

The new institute will enable Tulane students across all disciplines to understand how data shapes our environment, to think critically about data-based arguments and to use data in their studies and careers, according to the university.

"Understanding data in 2023 is as fundamental a skill as reading, writing and arithmetic, and its role in society will only grow in the coming years, especially with the emergence of artificial intelligence," the Alexanders said in the university's release. "Whatever their majors, Tulane students must know how to navigate data, and integrating data science across the curricula will cultivate their data literacy. We are thrilled to play a role in Tulane's data-centered evolution."

Libby Alexander earned her bachelor's degree from Tulane in 1984. She and her husband, Robert, helped build Connolly, Inc. into a leading payment integrity firm. Libby ultimately became CEO of the company and later served as vice chairman of the board of Cotiviti, Inc., the successor company of Connolly, Inc. Robert, who majored in computer science at Boston University, ran his own computer company before joining Connolly, Inc. as its Chief Information Officer.

"If anyone understands the importance of data management and analytics to the present and to the future, it's Libby and Robert Alexander," Tulane President Michael Fitts said. "Through their expertise and generosity, they have been instrumental in furthering and developing Tulane's strategy for implementing data literacy and data science at every level of the university. We're extremely grateful that the business success they achieved is now helping to drive this initiative at Tulane. Our students will reach new heights academically thanks to the support of this amazing Tulane couple."

"The Alexander gift will allow us to hire more professors, instructional designers and data scientists," said Tulane's Data Hub Executive Director, Patrick Button. "They can help us offer substantial additional programming and services, including establishing a data help desk for students and faculty, providing hands-on support for instructors developing new courses, and facilitating research collaborations," he added.

Florida Atlantic University

Florida Atlantic University received an $11.5 million gift from Ann and John Wood of the FairfaxWood Scholarship Foundation. It will establish the FairfaxWood Health and Innovation Technology Initiative, focused on building multi-disciplinary teams that will research the causes and treatment of amyloidosis, a rare disease caused by a buildup of abnormal amyloid fibrils in the body. Amyloidosis can affect different organs including the heart, brain, kidneys, spleen and other parts of the body.

A portion of the new gift will be used to establish an endowed FairfaxWood Chair of Clinical Neurosciences, who will direct discovery-to-cure initiatives for amyloidosis and, eventually, other disorders.

The $11.5 million gift from the Wood family is their fourth contribution to FAU's College of Medicine and follows a $28 million donation in 2022 that funds medical education scholarships in memory of their son.

"Philanthropy has an increasingly important role in advancing science and supporting vital research initiatives that have implications for people not just locally but across the globe, especially when it involves an illness or condition that is complex, multifactorial and difficult to treat," said Ann and John Wood. "Amyloidosis, in particular, is a disease that has personally impacted our family, and why we decided to invest in this initiative to usher in a new era to treat this disease, hopefully find a cure, and most importantly, provide patients with hope."

John and Ann Wood established Pres-T-Con, a prestressed concrete business in Trinidad. The firm built bridges, piers and cruise ship facilities throughout the Caribbean. The couple continued to operate the firm from Boca Raton until they sold it in 2005.

"We are eternally grateful to Ann and John Wood for their vision, generosity and continued support of our medical school through this extraordinary gift," said FAU President Stacy Volnick in the university's announcement of the gift. "The FairfaxWood Health & Innovation Technology Initiative will transform the way our researchers and clinicians study and treat amyloidosis and other serious medical conditions that require a synergistic approach to improve health and quality of life."

I am president emeritus of Missouri State University. After earning my B.A. from Wheaton College (Illinois), I was awarded a Ph.D. in clinical psychology from the University of Illinois in 1973. I then joined the faculty at the University of Kentucky, where I progressed through the professorial ranks and served as director of the Clinical Psychology Program, chair of the department of psychology, dean of the graduate school, and provost. In 2005, I was appointed president of Missouri State University. Following retirement from Missouri State in 2011, I became senior policy advisor to Missouri Governor Jay Nixon. Recently, I have authored two books: Degrees and Pedigrees: The Education of America's Top Executives (2017) and Coming to Grips With Higher Education (2018), both published by Rowman & Littlefield.

Continued here:

Multi Million Dollar Gifts Received By Tulane, Florida Atlantic, And University Of Wisconsin-Milwaukee - Forbes


How CA fights wildfires with analytics and high tech – CalMatters

In summary

As nights warm and droughts intensify, past models predicting fire behavior have become unreliable. So California is working with analysts and tapping into new technology to figure out how to attack wildfires. Gleaned from military satellites, drones and infrared mapping, the information is spat out in real time and triaged by a fire behavior analyst.

Cal Fire Battalion Chief Jon Heggie wasn't expecting much to worry about when a late summer fire erupted north of Santa Cruz, home to California's moist and cool asbestos forests. "This place doesn't burn," he thought, with just three notable fires there in 70 years.

Heggie's job was to predict for the crews where the wildfire might go and when, working through calculations based on topography, weather and fuels, the immutable basics. For fire behavior analysts like Heggie, predictable and familiar are manageable, while weird and unexpected are synonyms for danger.

But that 2020 fire was anything but predictable.

Around 3 a.m. on Aug. 16, ominous thunder cells formed over the region. Tens of thousands of lightning strikes rained down, creating a convulsion of fire that became the CZU Lightning Complex.

By noon there were nearly two dozen fires burning, and not nearly enough people to handle them. Flames were roaring throughout the Coast Range in deep-shaded forests and waist-high ferns in sight of the Pacific Ocean. No one had ever seen anything like it. The blaze defied predictions and ran unchecked for a month. The fire spread to San Mateo County, burned through 86,000 acres, destroyed almost 1,500 structures and killed a fleeing resident.

"It was astonishing to see that behavior and consumption of heavy fuels," Heggie said. "Seeing the devastation was mind-boggling. Things were burning outside the norm. I hadn't seen anything burn that intensely in my 30 years."

Almost as troubling was what this fire didn't do: it didn't back off at night.

"We would have burning periods increase in the afternoon, and we saw continuous high-intensity burns in the night," Heggie said. "That's when we are supposed to make up ground. That didn't happen."

That 2020 summer of fires, the worst in California history, recalibrated what veteran firefighters understand about fire behavior: Nothing is as it was.

Intensified by climate change, especially warmer nights and longer droughts, California's fires often morph into megafires, and even gigafires covering more than a million acres. U.S. wildfires have been four times larger and three times more frequent since 2000, according to University of Colorado researchers. And other scientists recently predicted that up to 52% more California forest acreage will burn in summertime over the next two decades because of the changing climate.

As California now heads into its peak time for wildfires, even with last year's quiet season and the end of its three-year drought, the specter of megafires hasn't receded. Last winter's record rains, rather than tamping down fire threats, have promoted lush growth, which provides more fuel for summer fires.

Cal Fire officials warn that this year's conditions are similar to the summer and fall of 2017, when a rainy winter was followed by one of the state's most destructive fire seasons, killing 47 people and destroying almost 11,000 structures.

It's not just the size and power of modern wildfires, but their capricious behavior that has confounded fire veterans: the feints and shifts that bedevil efforts to predict what a fire might do and then devise strategies to stop it. It's a dangerous calculation: In the literal heat of a fire, choices are consequential. People's lives and livelihoods are at stake.

Cal Fire crews now often find themselves outflanked. Responding to larger and more erratic and intense fires requires more personnel and equipment. And staging crews and engines where flames are expected to go has been thrown off-kilter.

"We live in this new reality," Gov. Gavin Newsom said at a recent Cal Fire event, "where we can't necessarily attach ourselves to some of the more predictive models of the past because of a world that is getting a lot hotter, a lot drier and a lot more uncertain because of climate change."

Cal Fire has responded by tapping into all the new technology, such as drones, military satellites, infrared images and AI-assisted maps, that can be brought to bear during a fire. Commanders now must consider a broader range of possibilities so they can pivot when the firefront shifts in an unexpected way. The agency also has beefed up its ability to fight nighttime fires with a new fleet of Fire Hawk helicopters equipped to fly in darkness.

The state has thrown every possible data point at the problem with its year-old Wildfire Threat and Intelligence Integration Center, which pulls information from dozens of federal, state and private sources to create a minute-by-minute picture of conditions conducive to sparking or spreading fires.

"We're enlisting cutting-edge technology in our efforts to fight wildfires," Newsom said, "exploring how innovations like artificial intelligence can help us identify threats quicker and deploy resources smarter."

The 2017 Thomas Fire stands as an example of what happens when a massive fire, ignited after a rainy winter, veers and shifts in unexpected ways.

The blaze in coastal Ventura and Santa Barbara counties struck in December, when fire season normally has quieted down. Fire veterans knew fall and winter fires were tamed by a blanket of moist air and fog.

But that didn't happen.

"We were on day five or six, and the incident commander comes to me and asks, 'Are we going to have to evacuate Carpinteria tonight?'" said Cal Fire Assistant Chief Tim Chavez, who was the fire behavior analyst for the Thomas Fire. "I looked at the maps and we both came to the conclusion that Carpinteria would be fine, don't worry. Sure enough, that night it burned into Carpinteria and they had to evacuate the town."

Based on fire and weather data and informed hunches, no one expected the fire to continue advancing overnight. And, as the winds calmed, no one predicted the blaze would move toward the small seaside community of 13,000 south of Santa Barbara. But high temperatures, low humidity and a steep, dry landscape that hadn't felt flames in more than 30 years drew the Thomas Fire to the coast.

The sudden shift put the town in peril. Some 300 residents were evacuated in the middle of the night as the blaze moved into the eastern edge of Carpinteria.

In all, the fire, which was sparked by power lines downed by high winds, burned for nearly 40 days, spread across 281,000 acres, destroyed more than 1,000 homes and other buildings and killed two people, including a firefighter. At the time, it was the largest wildfire in California's modern history; now, just six years later, it ranks at number eight.

The unforeseen assault on Carpinteria was an I-told-you-so from nature, the sort of humbling slap-down that fire behavior analysts in California are experiencing more and more.

"I've learned more from being wrong than from being right," Chavez said. "You cannot do this job and not be surprised by something you see. Even the small fires will surprise you sometimes."

Scientists say the past 20 years have brought a profound and perhaps irreversible shift in the norms of wildfire behavior and intensity. Fires burn along the coast even when there are no desert winds to drive them, fires refuse to lie down at night, and fires have pierced the so-called Redwood Curtain, burning 97% of California's oldest state park, Big Basin Redwoods.

The changes in wildfires are driven by an array of factors: a megadrought, the driest period recorded in the Western U.S. in the past 1,200 years; the loss of fog along the California coast; and stubborn nighttime temperatures that propel flames well into the night.

Higher temperatures and longer dry periods are linked to worsening fires in Western forests, with an eightfold increase from 1985 to 2017 in severely burned acreage, according to a 2020 study. "Warmer and drier fire seasons corresponded with higher severity fire," the researchers wrote, suggesting that climate change will contribute to increased fire severity in future decades.

"What we are seeing is a dramatic increase in extreme fire behavior," Heggie said. "When you have a drought lasting 10 years, devastating the landscape, you have dead fuel loading and available fuel for when these fires start. That's the catalyst for megafire. That's been the driving force for change in fire behavior."

About 33% of coastal summer fog has vanished since the turn of the century, according to researchers at UC Berkeley. That blanket of cool, moist air that kept major fires out of coastal areas can no longer be relied upon to safeguard California's redwood forests.

Firefighters are losing another ally, too, with the significant increase in overnight temperatures. Nighttime fires were about 28% more intense in 2020 than in 2003. And there are more of them: 11 more flammable nights every year than 40 years ago, an increase of more than 40%.

The upshot is that fires are increasingly less likely to lie down at night, when fire crews could work to get ahead of the flames. The loss of those hours to perform critical suppression work and the additional nighttime spread gives California crews less time to catch up with fast-moving blazes.

Fire whirls and so-called "firenados" are also a more common feature of erratic fire behavior. These twisting vortices of flame, heat and wind, spun up by high winds, can rise in columns hundreds of feet high.

Firenados are more than frightening to behold: They spread embers and strew debris for miles and make already dangerous fires all the more risky. One was spotted north of Los Angeles last summer.

Fires are really changing, and its a combination of all kinds of different changes, said Jennifer Balch, director of the Environmental Data Science Innovation & Inclusion Lab at the University of Colorado Boulder and a longtime fire researcher who tracks trends that drive wildfires.

"We're losing fog. We're seeing drier conditions longer and later into the season. And so what that means for California right now is, under these record heat waves, we're also now butting up against the Santa Ana wind conditions," she said. "I think we're loading the dice in a certain direction."

Among the many specialists at work are fire behavior analysts, who are responsible for predicting a fire's daily movements for the incident commander. As a fire rages, Cal Fire analysts get their information in an avalanche of highly technical data, including wind force and direction, temperature and humidity, the shape and height of slopes, the area's burn history, which fuels are on the ground and, in some cases, how likely they are to burn.

Gleaned from satellites, drones, planes, remote sensors and computer mapping, the information is spat out in real time and triaged by the fire behavior analyst, who often uses a computer program to prepare models to predict what the fire is likely to do.
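
To give a flavor of what such a program does, here is a minimal, purely illustrative sketch of how a few of those inputs might be folded into a single rate-of-spread number. It is not Cal Fire's software or the Rothermel equations that underpin most operational tools; the function name, coefficients and units are hypothetical stand-ins for far more detailed physics and fuel models.

    # Illustrative only: a toy stand-in for how analyst software might combine
    # wind, slope and fuel moisture into a rate-of-spread estimate.
    # Operational models use far more physics and far more inputs.
    def toy_rate_of_spread(wind_kph, slope_pct, fuel_moisture_pct, base_m_per_min=1.0):
        wind_factor = 1.0 + 0.05 * wind_kph            # stronger wind pushes fire faster
        slope_factor = 1.0 + 0.02 * max(slope_pct, 0)  # fire runs uphill
        moisture_factor = max(0.1, 1.0 - fuel_moisture_pct / 30.0)  # wet fuel slows spread
        return base_m_per_min * wind_factor * slope_factor * moisture_factor

    # A gusty afternoon on a 20% slope with very dry fuels
    ros = toy_rate_of_spread(wind_kph=40, slope_pct=20, fuel_moisture_pct=5)
    print(f"~{ros:.1f} m/min, roughly {ros * 60 * 24 / 1000:.0f} km of spread in 24 hours")

The analyst's judgment sits on top of calculations like this one, repeated across thousands of points on a landscape and rerun as the weather changes.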

That information is synthesized and relayed quickly to fire bosses. Laptops and hand-held computers are ubiquitous on modern firelines, replacing the time-honored practice of spreading a dog-eared map on the hood of a truck.

"On a typical day I would get up at 4:30 or 5," said Chavez, who has served as a fire behavior analyst for much of his career. "We get an infrared fire map from overnight aircraft, and that tells us where the fire is active. Other planes fly in a grid pattern and we look at those still images. I might look at computer models, fire spread models, and the weather forecast. There's other data that tells you what fuels are in the area. You plug all that in to see where the fire will be 24 hours from now."

At the fire camp's 8 a.m. briefing, "you get two minutes to tell people what to watch out for," he said. Throughout the day, Chavez says he monitors available data and hitches a helicopter ride to view the fire from the air. At another meeting at 5 p.m., he and other officers prepare the next day's incident action plan. Then he's back to collating more weather and fire data. The aim is to get to bed before midnight.

The importance of the fire behavior analyst's job is reflected by the sophistication of the tools available: real-time NOAA satellite data, weather information from military flights, radar, computer-generated maps showing a 100-year history of previous burns in the area as well as the current fuel load and its combustibility, airplane and drone surveillance and AI-enabled models of future fire movements. Aircraft flying over fires provide more detail, faster, about what's inside fire plumes, critical information for fire bosses.

In California, the National Guard is entering the fourth year of an agreement to share non-classified information pulled from military satellites that scan for heat signatures from the boost phase of ballistic missiles. When those heat images are associated with wildfires, the agency's FireGuard system can transmit detailed information to Cal Fire every 15 minutes.

Meteorologist Craig Clements, director of the Fire Weather Research Laboratory at San Jose State University, has chased fires for a decade.

"We can pull up on a fire, and the radar starts spinning and you're peering into a plume within four minutes," Clements said. "It gives us information about the particles inside, the structure of it."

Fire behavior decisions are not totally reliant on outside data inputs. Seasoned fire commanders remain firmly committed to a reliable indicator: the hair on the back of their necks.

Fireline experience and hard-earned knowledge still count when formulating tactics. But it's a measure of how norms have shifted that even that institutional knowledge can fail.

Perhaps the biggest leap is applying artificial intelligence to understand fire behavior. Neil Sahota, an AI advisor to the United Nations and a lecturer at UC Irvine, is developing systems to train a computer to review reams of data and come to a predictive conclusion.

The idea is not to replace fire behavior analysts and jettison their decades of fireline experience, Sahota said, but to augment their work and, mostly, to move much faster.

"We can crunch billions of different data points in near real time, in seconds," he said. "The challenge is, what's the right data? We may think there are seven variables that go into a wildfire, for example. AI may come back saying there are thousands."

In order for their information to be useful, computers have to be taught: What's the difference between a Boy Scout campfire and a wildfire? How to distinguish between an arsonist starting a fire and a firefighter setting a backfire with a drip torch?
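
A rough sketch of what that teaching looks like in practice, assuming labeled historical heat detections are available: the feature names, thresholds and data below are synthetic and invented for illustration, not drawn from any Cal Fire or vendor system.

    # Hypothetical example: train a classifier to separate routine heat sources
    # from wildfire starts using labeled detections. All data here is synthetic.
    import numpy as np
    from sklearn.ensemble import RandomForestClassifier
    from sklearn.model_selection import train_test_split

    rng = np.random.default_rng(0)
    n = 1000
    X = np.column_stack([
        rng.uniform(0, 50, n),      # detection footprint (hypothetical hectares)
        rng.uniform(300, 1200, n),  # brightness temperature (K)
        rng.uniform(0, 100, n),     # wind speed at detection (kph)
    ])
    # Synthetic labels: large, hot, windy detections are treated as "wildfire" (1)
    y = ((X[:, 0] > 10) & (X[:, 1] > 600) & (X[:, 2] > 20)).astype(int)

    X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)
    model = RandomForestClassifier(n_estimators=100, random_state=0).fit(X_train, y_train)
    print(f"Accuracy on held-out synthetic detections: {model.score(X_test, y_test):.2f}")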

"We can crunch billions of different data points in near real time, in seconds. The challenge is, whats the right data?"

Despite the dizzying speed at which devices have been employed on the modern fireline, most fire behavior computer models are still based on algorithms devised by Mark Finney, a revered figure in the field of fire science.

Working from the Missoula Fire Sciences Laboratory in Montana, Finney has studied fire behavior through observation and, especially, by starting all manner of fires in combustion chambers and in the field. In another lab in Missoula, scientists bake all types of wood in special ovens to determine how fuels burn at different moisture levels.

Still, Finney is unimpressed by much of the sophisticated technology brought to bear on wildfires as they burn. He said it provides only an illusion of control.

"Once you are in a position to have to fight these extreme fires, you've already lost," he said. "Don't let anybody kid you, we do not suppress these fires, we don't control them. We wait for the weather."

The Missoula research group developed the National Fire Danger Rating System in 1972, which is still in place today. Among the fire behavior tools Finney designed is the FARSITE system, a simulation of fire growth invaluable to frontline fire bosses.
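
FARSITE itself propagates an elliptical fire front across real fuel, weather and terrain data; the toy grid simulation below is only meant to convey the basic idea of stepping a fire forward across a landscape, with every parameter chosen arbitrarily.

    # A toy cellular-automaton fire-growth simulation: not FARSITE, just the
    # general idea of advancing a fire across a gridded landscape step by step.
    import random

    SIZE, STEPS, SPREAD_PROB = 25, 12, 0.35
    UNBURNED, BURNING, BURNED = 0, 1, 2

    grid = [[UNBURNED] * SIZE for _ in range(SIZE)]
    grid[SIZE // 2][SIZE // 2] = BURNING  # single ignition in the centre

    random.seed(1)
    for _ in range(STEPS):
        new = [row[:] for row in grid]
        for r in range(SIZE):
            for c in range(SIZE):
                if grid[r][c] == BURNING:
                    new[r][c] = BURNED
                    for dr, dc in ((-1, 0), (1, 0), (0, -1), (0, 1)):
                        rr, cc = r + dr, c + dc
                        if 0 <= rr < SIZE and 0 <= cc < SIZE and grid[rr][cc] == UNBURNED:
                            # In a real model this probability would depend on
                            # fuel, wind direction and slope, not one constant.
                            if random.random() < SPREAD_PROB:
                                new[rr][cc] = BURNING
        grid = new

    burned = sum(cell != UNBURNED for row in grid for cell in row)
    print(f"Cells touched by fire after {STEPS} steps: {burned} of {SIZE * SIZE}")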

Finney and colleagues are working on a next-generation version of the behavior prediction system, which is now undergoing real-world tests.

"This equation has an awful lot of assumptions in it," he said. "We're getting there. Nature is a lot more complicated. There are still a number of mysteries on fire behavior. We don't have a road map to follow that tells us that this is good enough."

By far the best use of the predictive tools that he and others have developed is to learn how to avoid firestarts, he said, by thinning and clearing forests to reduce threat.

"I would love to tell you that the key to solving these problems is more research. But if we just stopped doing research and just use what we know, we'd be a lot better off."

Still, research about fire behavior races on, driven by the belief that you can't fight an enemy you don't understand.

Mike Koontz is on the frontlines of that battle, tucked into a semicircle of supercomputers. Koontz leads a team of researchers in Boulder, Colo., studying a new, volatile and compelling topic: California megafires.

"We began to see a clear uptick in extreme fire behavior in California since the 2000s," said Koontz, a postdoctoral researcher with the Earth Lab at the University of Colorado Boulder. "We keyed in on fires that moved quickly and blew up over a short period of time. California is a trove of extreme fires," he said.

Koontz is using supercomputers to scrape databases, maps and satellite images and apply the data to an analytical framework of his devising. The team tracks significant fires that grow unexpectedly, and layers in weather conditions, topography, fire spread rates and other factors.
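
In spirit, that screening step can be as simple as flagging any fire whose single-day growth crosses a threshold, as in the hypothetical sketch below; the column names, numbers and cutoff are invented, not the Earth Lab's actual pipeline.

    # A hedged sketch of the screening described: from a table of daily fire
    # sizes, flag incidents whose one-day growth exceeded a threshold.
    import pandas as pd

    daily = pd.DataFrame({
        "fire_id": ["A", "A", "A", "B", "B", "B"],
        "day":     [1, 2, 3, 1, 2, 3],
        "acres":   [500, 900, 1200, 400, 9000, 30000],
    })

    daily = daily.sort_values(["fire_id", "day"])
    daily["growth"] = daily.groupby("fire_id")["acres"].diff()

    BLOWUP_ACRES_PER_DAY = 5000  # illustrative threshold
    blowups = daily[daily["growth"] > BLOWUP_ACRES_PER_DAY]
    print(blowups[["fire_id", "day", "growth"]])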

What comes out is a rough sketch of the elements driving California's fires to grow so large. The next hurdle is to get the information quickly into the hands of fire commanders, Koontz said.

The goal: if not a new bible for fighting fires, at least an updated playbook.

Here is the original post:

How CA fights wildfires with analytics and high tech - CalMatters

Read More..

Science’s gender gap: the shocking data that reveal its true extent – Nature.com

Women in science publish less, collaborate less and are cited less, but why? Credit: Getty

Equity for Women in Science: Dismantling Systemic Barriers to Advancement. Cassidy R. Sugimoto and Vincent Larivière. Harvard Univ. Press (2023).

Are gender inequities slowly disappearing in the natural and social sciences? Those who argue so might point, say, to election to prestigious academies, such as the US National Academy of Sciences and the American Academy of Arts and Sciences. Until the 2000s, women were under-represented, but in the past 20 years, women have been advantaged relative to similarly credentialed men in psychology, economics and mathematics. Equity for Women in Science is a convincing reply to those who advance such arguments. Less overt, all but invisible, gender gaps are still with us.

Every scientist experiences advantages and disadvantages, acceptances and rejections, citations and a lack thereof. In most societies men are advantaged relative to women, but at the individual level, advantage varies and is subject to multiple influences, including skin colour and socioeconomic status. The combined subtlety and variability of privilege make it difficult to observe and document the aggregate imbalances. That requires sophisticated and ingenious efforts.

Cassidy Sugimoto and Vincent Larivière, information scientists at the Georgia Institute of Technology in Atlanta and the University of Montreal in Canada, respectively, have the sophistication and ingenuity, and have put in the effort. Their book summarizes scientometric and bibliometric analyses, conducted by themselves and their colleagues, of the influence of gender on outcomes in academia. The analyses show who publishes, who gets credit, who gets funding, who has job mobility, who collaborates and whose work is cited.

The book isn't all numbers. Besides copious amounts of data, the book provides revealing vignettes of the experiences of women in science, along with telling examples of institutional practices, both past and present. Nature, for instance, used the phrase "men of science" in its mission statement until the year 2000, and did not have a female editor-in-chief until 2018. My favourite example, also from 2018, concerns Donna Strickland, a physicist at the University of Waterloo in Canada, who received a share of the Nobel prize in physics that year for her work on short-pulsed lasers. At the time, she was an associate professor whose Wikipedia entry had just been rejected on the grounds that she didn't meet the online encyclopedia's notability criteria.

But the numbers do speak volumes. Sugimoto and Larivière's global analyses show that although gender inequity occurs everywhere, there are interesting differences by country. For example, the proportion of female authors on papers varies, even among economically advantaged countries. Japan has lower rates than China (17% vs 26%), whereas China and Germany show similar rates. They are all lower than the world average of 31%.

Donna Strickland's Wikipedia page was initially rejected despite her later winning the Nobel prize. Credit: Jonathan Nackstrand/AFP/Getty

Some of Sugimoto and Larivière's analyses are straightforward data mining, such as those documenting the common observation that female researchers, on average, publish less than male researchers do. (Non-binary status cannot be detected from their byline analysis). Without controls, on average, women published 20% fewer papers than men (3.2 versus 4 overall) between 2008 and 2020. That difference was reduced to 7% (3.9 versus 4.2), however, when the productivity analysis was restricted to a group of (presumably younger) researchers who published their first article in 2008. Younger people published more, with women increasing their production more than men. It is hard to pinpoint the driver for this, given the breadth of societal changes in the past two decades. Bibliometric analyses can reveal only so much: the trend might be explained by there being more women in almost every field now, more attention being given to gender gaps, more efforts to support women in science, more hiring of women at research institutions, or some combination of those and other factors.
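
The cohort restriction is easy to picture with a toy table: compute the gender gap in average output for all authors, then again only for those whose first paper appeared in 2008. The numbers below are invented, chosen merely to echo the reported 20% and 7% magnitudes, and the column names are hypothetical.

    # Toy illustration of a cohort-restricted productivity comparison.
    # Data and column names are made up for demonstration.
    import pandas as pd

    authors = pd.DataFrame({
        "gender":     ["w", "w", "w", "m", "m", "m"],
        "first_year": [1998, 2008, 2008, 1995, 2008, 2008],
        "papers":     [2.5, 3.8, 4.0, 4.5, 4.1, 4.3],
    })

    overall = authors.groupby("gender")["papers"].mean()
    cohort = authors[authors["first_year"] == 2008].groupby("gender")["papers"].mean()

    print("Overall gap:", f"{1 - overall['w'] / overall['m']:.0%}")
    print("2008-cohort gap:", f"{1 - cohort['w'] / cohort['m']:.0%}")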

But why does the gap in paper publication exist in the first place? The authors investigated the role of parenting, using data from an as-yet unpublished paper. It involved an international survey sent out to 1.5 million potential participants. Of these, 10,400 (fewer than 1%) yielded usable data. The representativeness of that sample is unclear, but the conclusions make sense. The authors found that the extent to which being a parent affected productivity depended more on how much time someone spent actively parenting than on how many children they had: if you leave the active parenting to someone else, it doesn't matter whether you have one child or five. According to various studies, women are on average more engaged in parenting than are men, especially "invisible" parenting: being on call, planning, monitoring children's emotional well-being and so on.

Sugimoto and Larivière address issues (collaborations, mobility, funding) that contribute to women's disadvantages relative to men's. A recent study in Nature demonstrated a deeper problem: women who appeared in progress reports for physics grants as doing equal work to men were nevertheless less likely to appear as authors on papers emanating from those grants (M. B. Ross et al. Nature 608, 135–145; 2022). The more important the paper, the less likely women were to be included. Data from fields such as economics also suggest that women's contributions are undervalued compared with men's, even when they publish equally well in high-impact journals.

Perhaps the most important chapter of the book investigates disparities in citation rates. As the authors point out, ideas cannot change a field if people do not pay attention to them. Men are cited more than women are. People who believe that the present system is largely meritocratic would see citations as a reasonable proxy of an article's quality and importance. Does that mean that women just do lower-quality work?

Sugimoto and Larivière break things down by a journal's impact factor to address this possibility. (The impact factor is the average number of times that articles published in a journal are cited.) Papers with men as first authors have at most a tiny citation advantage over those with female first authors for publications with impact factors of 1 or below. As impact factor increases, so do both the number of citations and the disparity. The average number of citations jumps from 2, for journals with an impact factor of 1.75 to 2, to 4 when the factor is above 2. At that impact factor, men have 0.5 citations more than women on average, compared with 0.1 below that factor. Women simply do not reap the same rewards as men.
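
The underlying breakdown is a standard grouped comparison: bin papers by journal impact factor, then compare mean citations for male and female first authors within each bin. The sketch below uses made-up numbers and hypothetical column names purely to show the shape of the analysis.

    # Sketch of a citation comparison by impact-factor bin. Synthetic data only.
    import pandas as pd

    papers = pd.DataFrame({
        "first_author_gender": ["m", "w", "m", "w", "m", "w", "m", "w"],
        "impact_factor":       [0.8, 0.9, 1.9, 1.8, 2.5, 2.6, 3.5, 3.4],
        "citations":           [1.0, 1.0, 2.1, 2.0, 4.4, 3.9, 6.2, 5.6],
    })

    papers["if_bin"] = pd.cut(papers["impact_factor"], bins=[0, 1, 2, float("inf")],
                              labels=["<=1", "1-2", ">2"])
    gap = (papers.groupby(["if_bin", "first_author_gender"], observed=True)["citations"]
                 .mean().unstack())
    gap["men_minus_women"] = gap["m"] - gap["w"]
    print(gap)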

More interpretation of the importance of the citation disparity and the other disparities documented in the research would have been welcome. A sceptic might note, for example, that although women publishing in high-impact journals are cited considerably less frequently than men publishing in the same journals, they are still cited much more often than men or women in journals with lower impact factors. Sugimoto and Larivière briefly bring in the concept of the accumulation of (dis)advantage, how small advantages and disadvantages compound over time to produce notable effects, but they could have spelled out its applicability at greater length and shown its effects. The original insight that advantage compounds over time, similar to compound interest on an investment or debt, was from the sociologists Robert Merton and Harriet Zuckerman. They in turn cite a considerably older, biblical source: Matthew 25:29, "to every one who has, more will be given." Computer simulations show that small consistent differences in treatment add up to substantial changes in career trajectories.
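
The compound-interest analogy is easy to make concrete. In the minimal simulation below, two hypothetical careers differ only in a small, consistent annual "return" on accumulated reputation; the rates are arbitrary assumptions, not estimates from the book.

    # Arbitrary illustration of cumulative advantage: a three-point difference
    # in annual compounding produces a large gap over a 30-year career.
    def reputation_after(years, annual_return):
        value = 1.0
        for _ in range(years):
            value *= 1.0 + annual_return
        return value

    advantaged = reputation_after(30, 0.08)
    disadvantaged = reputation_after(30, 0.05)
    print(f"After 30 years: {advantaged:.1f} vs {disadvantaged:.1f}, "
          f"a {advantaged / disadvantaged:.1f}x gap")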

Equity for Women in Science is primarily a compendium of the authors' compelling research. It is weakest in its contextualization of that research. Since their work consists mainly of non-experimental analyses of large-scale patterns in publication, funding and migration between institutions, it does not directly address the underlying mechanisms. The book sparsely and selectively samples the large literature on the socio-psychological, organizational and institutional mechanisms that contribute to gender disparities, and interventions that can address them effectively.

Similarly, the authors do not tie together how they think the different components of the scientific enterprise interact. They eschew a large model that would show how, for example, funding and collaboration interact to affect academic careers. For readers with their own theories, the rich array of data could provide a testing ground, even if it does not provide new insights. For those who want to challenge their beliefs in science as a gender-fair enterprise, the data amply serve that purpose.

Excerpt from:

Science's gender gap: the shocking data that reveal its true extent - Nature.com

Read More..