
What is Data Mining? – SearchBusinessAnalytics

What is data mining?

Data mining is the process of sorting through large data sets to identify patterns and relationships that can help solve business problems through data analysis. Data mining techniques and tools enable enterprises to predict future trends and make more-informed business decisions.

Data mining is a key part of data analytics overall and one of the core disciplines in data science, which uses advanced analytics techniques to find useful information in data sets. At a more granular level, data mining is a step in the knowledge discovery in databases (KDD) process, a data science methodology for gathering, processing and analyzing data. Data mining and KDD are sometimes referred to interchangeably, but they're more commonly seen as distinct things.

Data mining is a crucial component of successful analytics initiatives in organizations. The information it generates can be used in business intelligence (BI) and advanced analytics applications that involve analysis of historical data, as well as real-time analytics applications that examine streaming data as it's created or collected.

Effective data mining aids in various aspects of planning business strategies and managing operations. That includes customer-facing functions such as marketing, advertising, sales and customer support, plus manufacturing, supply chain management, finance and HR. Data mining supports fraud detection, risk management, cybersecurity planning and many other critical business use cases. It also plays an important role in healthcare, government, scientific research, mathematics, sports and more.

Data mining is typically done by data scientists and other skilled BI and analytics professionals. But it can also be performed by data-savvy business analysts, executives and workers who function as citizen data scientists in an organization.

Its core elements include machine learning and statistical analysis, along with data management tasks done to prepare data for analysis. The use of machine learning algorithms and artificial intelligence (AI) tools has automated more of the process and made it easier to mine massive data sets, such as customer databases, transaction records and log files from web servers, mobile apps and sensors.

The data mining process can be broken down into these four primary stages:

Various techniques can be used to mine data for different data science applications. Pattern recognition is a common data mining use case that's enabled by multiple techniques, as is anomaly detection, which aims to identify outlier values in data sets. Popular data mining techniques include the following types:

Data mining tools are available from a large number of vendors, typically as part of software platforms that also include other types of data science and advanced analytics tools. Key features provided by data mining software include data preparation capabilities, built-in algorithms, predictive modeling support, a GUI-based development environment, and tools for deploying models and scoring how they perform.

Vendors that offer tools for data mining include Alteryx, AWS, Databricks, Dataiku, DataRobot, Google, H2O.ai, IBM, Knime, Microsoft, Oracle, RapidMiner, SAP, SAS Institute and Tibco Software, among others.

A variety of free open source technologies can also be used to mine data, including DataMelt, Elki, Orange, Rattle, scikit-learn and Weka. Some software vendors provide open source options, too. For example, Knime combines an open source analytics platform with commercial software for managing data science applications, while companies such as Dataiku and H2O.ai offer free versions of their tools.
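As a rough illustration of how such libraries are used in practice, the hypothetical sketch below uses scikit-learn (one of the open source tools named above) to cluster a small synthetic data set and flag outliers; the data and feature names are invented for demonstration only.

```python
# A minimal sketch of two common data mining tasks -- clustering (pattern
# discovery) and anomaly detection -- using scikit-learn, one of the open
# source libraries mentioned above. The synthetic customer data is invented.
import numpy as np
from sklearn.cluster import KMeans
from sklearn.ensemble import IsolationForest

rng = np.random.default_rng(42)
# Fake customer features: [annual_spend, visits_per_month]
transactions = np.vstack([
    rng.normal([500, 4], [50, 1], size=(100, 2)),    # regular customers
    rng.normal([5000, 20], [300, 2], size=(20, 2)),  # high-value customers
])

# Pattern discovery: group customers into two segments.
segments = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(transactions)

# Anomaly detection: flag outlying records (label -1 means outlier).
outliers = IsolationForest(contamination=0.05, random_state=0).fit_predict(transactions)

print("segment counts:", np.bincount(segments))
print("outliers flagged:", int((outliers == -1).sum()))
```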

In general, the business benefits of data mining come from the increased ability to uncover hidden patterns, trends, correlations and anomalies in data sets. That information can be used to improve business decision-making and strategic planning through a combination of conventional data analysis and predictive analytics.

Specific data mining benefits include the following:

Ultimately, data mining initiatives can lead to higher revenue and profits, as well as competitive advantages that set companies apart from their business rivals.

Here's how organizations in some industries use data mining as part of analytics applications:

Data mining is sometimes viewed as being synonymous with data analytics. But it's predominantly seen as a specific aspect of data analytics that automates the analysis of large data sets to discover information that otherwise couldn't be detected. That information can then be used in the data science process and in other BI and analytics applications.

Data warehousing supports data mining efforts by providing repositories for the data sets. Traditionally, historical data has been stored in enterprise data warehouses or smaller data marts built for individual business units or to hold specific subsets of data. Now, though, data mining applications are often served by data lakes that store both historical and streaming data and are based on big data platforms like Hadoop and Spark, NoSQL databases or cloud object storage services.
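For a sense of what serving a mining workload from a data lake can look like, here is a minimal, hypothetical PySpark sketch; the storage paths, table layout, and column names are assumptions made for illustration, not a reference architecture.

```python
# Hypothetical sketch of preparing data-lake data for a downstream mining or
# BI job with Spark. The paths and column names are invented.
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.appName("data-lake-mining").getOrCreate()

# Historical transaction data landed in the lake as Parquet files.
orders = spark.read.parquet("s3a://example-lake/orders/")  # hypothetical path

# A simple aggregation that a mining or analytics job might consume.
daily_revenue = (
    orders
    .groupBy(F.to_date("order_ts").alias("order_date"))
    .agg(F.sum("amount").alias("revenue"), F.count("*").alias("order_count"))
)
daily_revenue.write.mode("overwrite").parquet("s3a://example-lake/marts/daily_revenue/")
```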

Data warehousing, BI and analytics technologies began to emerge in the late 1980s and early 1990s, providing an increased ability to analyze the growing amounts of data that organizations were creating and collecting. The term data mining was in use by 1995, when the First International Conference on Knowledge Discovery and Data Mining was held in Montreal.

The event was sponsored by the Association for the Advancement of Artificial Intelligence (AAAI), which also held the conference annually for the next three years. Since 1999, the conference -- popularly known by its year, as in KDD 2021 -- has been organized primarily by SIGKDD, the special interest group on knowledge discovery and data mining within the Association for Computing Machinery.

A technical journal, Data Mining and Knowledge Discovery, published its first issue in 1997. Initially a quarterly, it's now published bimonthly and contains peer-reviewed articles on data mining and knowledge discovery theories, techniques and practices. Another publication, the American Journal of Data Mining and Knowledge Discovery, was launched in 2016.

Link:

What is Data Mining? - SearchBusinessAnalytics

Read More..

Main Page | Data Mining and Machine Learning

Data Mining and Machine Learning: Fundamental Concepts and Algorithms

Second Edition

Mohammed J. Zaki and Wagner Meira, Jr

Cambridge University Press, March 2020

ISBN: 978-1108473989

The fundamental algorithms in data mining and machine learning form the basis of data science, utilizing automated methods to analyze patterns and models for all kinds of data in applications ranging from scientific discovery to business analytics. This textbook for senior undergraduate and graduate courses provides a comprehensive, in-depth overview of data mining, machine learning and statistics, offering solid guidance for students, researchers, and practitioners. The book lays the foundations of data analysis, pattern mining, clustering, classification and regression, with a focus on the algorithms and the underlying algebraic, geometric, and probabilistic concepts. New to this second edition is an entire part devoted to regression methods, including neural networks and deep learning.

This second edition has the following new features and content:

New part five on regression: contains chapters on linear regression, logistic regression, neural networks (multilayer perceptrons), deep learning (recurrent and convolutional neural networks), and regression assessment.

Expanded material on ensemble models in chapter 24.

Math notation has been clarified, and important equations are now boxed for emphasis throughout the text.

Geometric view emphasized throughout the text, including for regression.

Errors from the first edition have been corrected.

You can find here the online book, errata, table of contents and resources like slides, videos and other materials for the new edition.

A description of the first edition is also available.

Mohammed J. Zaki, Rensselaer Polytechnic Institute, New York

Mohammed J. Zaki is Professor of Computer Science at Rensselaer Polytechnic Institute, New York, where he also serves as Associate Department Head and Graduate Program Director. He has more than 250 publications and is an Associate Editor for the journal Data Mining and Knowledge Discovery. He is on the Board of Directors for Association for Computing Machinery's Special Interest Group on Knowledge Discovery and Data Mining (ACM SIGKDD). He has received the National Science Foundation CAREER Award, and the Department of Energy Early Career Principal Investigator Award. He is an ACM Distinguished Member, and IEEE Fellow.

Wagner Meira, Jr, Universidade Federal de Minas Gerais, Brazil

Wagner Meira, Jr is Professor of Computer Science at Universidade Federal de Minas Gerais, Brazil, where he is currently the chair of the department. He has published more than 230 papers on data mining and parallel and distributed systems. He was leader of the Knowledge Discovery research track of InWeb and is currently Vice-chair of INCT-Cyber. He is on the editorial board of the journal Data Mining and Knowledge Discovery and was the program chair of SDM'16 and ACM WebSci'19. He has been a CNPq researcher since 2002. He has received an IBM Faculty Award and several Google Faculty Research Awards.

Read more here:

Main Page | Data Mining and Machine Learning

Read More..

Is it Time to Decolonize Global Health Data? – Research Blog – Duke University

In the digital age, we are well-acquainted with data, a crouton-esque word tossed into conversations, ingrained in the morning rush like half-caf cappuccinos and spreadsheets. Conceptually, data feels benign, necessary, and totally absorbed into the zeitgeist of the 21st century (alongside Survivor, smartphones, and Bitcoin). Data conjures up the census; white-coat scientists and their clinical trials; suits and ties; NGO board meetings; pearled strings of binary code; bar graphs, pie charts, scatter plots, pictographs, endless Excel rows and columns, and more rows and more columns.

However, within the conversation of global health, researchers and laymen alike would more often than not describe data collection, use, and sharing as critical for resource mobilization, disease monitoring, surveillance, prevention, treatment, etc. (Look at measles eradication! Polio! Malaria! Line graphs A, B, and C!)

Thanks to the internet, extracting health data is also faster, easier, and more widespread than ever. We have grown increasingly concerned, and rightfully so, about data ownership and data sovereignty.

Who is privy to data? Who can possess it? Can you possess it? As you can see, the conversation quickly becomes convoluted, philosophical even.

Dr. Wendy Prudhomme O'Meara, moderator of the "Data as a Commodity" seminar on Sept. 29 and associate professor at Duke University Medical School in the Division of Infectious Diseases, discussed bioethical complexities of data creation and ownership within global health partnerships.

"We can see that activities where data is being collected in one place, removed from the context, and value being extracted from it for personal or financial benefit has very strong parallels to the kind of resource extraction and exploitation that characterized colonization," she said in her introductory remarks.

Data, like other raw materials (i.e. coffee, sugar, tobacco, etc.), can be extracted, often disproportionately, from lower-middle income countries (LMICs) at the expense of the local populations. This reinforces unequal power dynamics and harkens to the tenets of colonialism and imperialism.

This observation is exemplified by panelist Thiago Hernandes Rocha's research, which focuses on public policy evaluation and data mining. He acknowledges that global health research, in general, should prioritize the health improvements of the studied community rather than publications or grant funding. This may seem somewhat obvious to you; however, though academic competition often fosters nuances in the field, it also contributes to the commercialization of global health. Don't be shy; everyone, point your finger at Big Pharma!

Though Dr. Rocha's data mining technique refers to pattern-searching and analysis of dense data sets, I find "mining" to be an apt analogy for the exploitative potential of data extraction and research partnerships between higher income countries and LMICs.

Consider the British diamond industry in Cape Colony, South Africa, and the parallels between past colonial mineral extraction and current global health data extraction. Imagine taking a pickaxe to the earth.

Now consider the environmental ramifications of mining, and who they disproportionately affect. Consider the lingering social and economic inequalities. Of course, data is not a mine of diamonds (as your Hay Day farm might suggest) nor is it ivory or rubber or timber. It's less tangible (you can't necessarily hold it or physically possess it) and, therefore, its extraction also feels less tangible, even though this process can have very concrete consequences.

Data as a power dynamic is a rather recent characterization in academic discourse. Researchers and companies alike have pushed the open data movement to increase data availability to all people for all uses. You can see how, in a utopian society, this would be fantastic. Think of the transparency! I'm sure you can also see how, in our non-utopian society, this can be exploited.

Dr. Bethany Hedt-Gauthier, a Harvard University biostatistician and seminar speaker, described herself as "pro open data in a world without power dynamics," an amendment critical to understanding research as a commodity itself.

She justified her stance by referencing the systematic review of authorship in collaborative health research in Africa that she conducted in collaboration with others in the field. They found that even when sub-Saharan African populations were the main sites of study, when partnered with high-income, elite institutions (like Duke or Harvard), the African authors were significantly less likely to be first or senior authors despite the comparable number of academics on both sides of the partnership. To what can we attribute this discrepancy?

Dr. Hedt-Gauthier describes forms of capital that contribute to this issue, from cultural capital (i.e. credentials) to symbolic capital (i.e. legitimacy) to financial capital; however, she poses colonialism (and its continuity in socioeconomic and political power dynamics today) as the root of this incongruity from which the aforementioned forms of capital bud and flower like poisonous oleander. In recent years, institutions, including Duke, have increased efforts to decolonize global health to achieve greater equity, equal participation, and better health outcomes overall.

Dr. Hedt-Gauthier briefly chronicled some of her own research in Rwanda at the start of the COVID-19 pandemic. Within her research partnerships, she recollected slowing down, thoughtfully engaging in two-way dialogue, and posing questions like the following: Who is involved [in the partnership]? Are all parties equally represented in paper authorship? If not, how can we share resources to ensure this? How can we assure that the people involved in the generation of data are also involved in the interpretation of its results? Who has access to data? What does co-authorship look like?

Investing time and energy into multi-country databases, funding collaborative research infrastructures, removing barriers within academia, and training researchers are just some of the methods proposed by the speakers to facilitate equitable partnerships, data sharing and use, and continued global health decolonization.

Dr. Osondu Ogbuoji, the final panelist, puts it best: "We should ensure that the people in the room having the discussion about what values the data has should be as diverse as possible and ideally should have all the stakeholders. In our own research, sometimes we think we have an idea of what data to collect, but then we talk to the country partners and they have a totally different idea."

Though the question of data ownership may feel lofty or intangible, though data legality is confusing, though you may feel yourself adrift in the debate of commodity and capital, the speakers have thrown you a buoy. Grab on, and understand that, generally:

It is necessary to engage with data in a communicative and critical manner; it is necessary to build research partnerships that are synergistic and reciprocal; and, finally, it is necessary to approach global health via these partnerships to advance the field towards greater equity.

Post by Alex Clifford, Class of 2024

Watch the recorded seminar here: https://www.youtube.com/watch?v=wRmFzif8a1c

View original post here:

Is it Time to Decolonize Global Health Data? - Research Blog - Duke University

Read More..

11 of the Best Process Mining Tools and Software – Solutions Review

The editors at Solutions Review have compiled the following list to spotlight some of the top process mining tools and software solutions for companies to consider.

Regardless of the industry, almost every business can benefit from process mining tools. Gartner, for example, defines process mining as tools that "discover, monitor, and improve real processes (not assumed processes) by extracting knowledge from event logs readily available in today's information systems." Process mining is sometimes a built-in capability of a broader Business Process Management (BPM) solution suite, but it can also be found as standalone software.
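To make the definition concrete, the short sketch below shows the core idea behind process discovery, counting which activity directly follows which in an event log; it uses pandas with an invented log and is not drawn from any of the vendor tools profiled below.

```python
# A minimal sketch of the core idea behind process mining -- discovering the
# real process from an event log. The log format (case ID, activity,
# timestamp) is the conventional one; the data here is invented.
import pandas as pd

log = pd.DataFrame({
    "case_id":  ["c1", "c1", "c1", "c2", "c2", "c2", "c3"],
    "activity": ["Create order", "Approve", "Ship",
                 "Create order", "Approve", "Reject", "Create order"],
    "timestamp": pd.to_datetime([
        "2022-01-01 09:00", "2022-01-01 10:00", "2022-01-02 08:00",
        "2022-01-03 09:00", "2022-01-03 09:30", "2022-01-03 11:00",
        "2022-01-04 09:00",
    ]),
})

# Directly-follows relation: for each case, count which activity follows which.
log = log.sort_values(["case_id", "timestamp"])
log["next_activity"] = log.groupby("case_id")["activity"].shift(-1)
dfg = (log.dropna(subset=["next_activity"])
          .groupby(["activity", "next_activity"])
          .size()
          .sort_values(ascending=False))
print(dfg)  # edge frequencies of the discovered process map
```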

However, choosing the right process mining capabilities for your company can be complicated. It requires in-depth research and often comes down to more than just the solution and its technical capabilities. To make your search a little easier, our editors have profiled some of the best process mining tools and software in one place. The editors have listed the companies in alphabetical order.

Description: Appian is a leading low-code platform provider that allows both experienced and citizen developers to build process-centric and case-centric applications with the ability to monitor and improve business processes in response to changing needs. With Appians process mining capabilities, organizations can integrate data from multiple systems, identify process bottlenecks, develop purpose-built dashboards for specific analysis needs, predict process behaviors, design optimized workflows, maintain compliance with process standards, reduce operational costs, and more.

Description: Bizagi is a leader in digital business process automation software. The vendor offers three tiers of solutions, including Bizagi Engine, Bizagi Studio, and Bizagi Modeler. Bizagi's process mining capabilities are included via the Enterprise model of Bizagi Modeler, which equips companies with the process mining tools they need to understand their processes. Other capabilities available with Bizagi Modeler Enterprise include value-chain diagrams, Single Sign-On, model sharing, private cloud storage, real-time notifications, and more.

Description: Bonitasoft develops BPM software for developers to build business applications that adapt to real-time changes, UI updates, and more. With Bonitasoft, users can automate, model, and monitor business processes to streamline operations. The software automatically checks for errors and highlights them before users save their business model. Companies can use Bonitasoft's AI-powered process mining algorithms to analyze data, improve visibility, identify patterns, track performance indicators, define business operating models, predict issues, and create opportunities for improvement.

Description: Celonis is a global provider of execution management solutions that help companies improve how they run their business processes. With Celonis' suite of process and task mining capabilities, companies across industries can improve visibility into their operations, identify bottlenecks, and streamline efficiencies. Those capabilities, powered by machine learning and the industry-standard process query language (PQL), include analytic visualizations, drag-and-drop customization tools, task mining, extensible data models, multi-event logs, best-practice benchmarking, and tools for identifying processes that could benefit from automation.

Description: Fluxicon is a process mining solution provider for business process managers and consultants. The company's process mining product, Disco, can help users reduce costs, improve quality, compare processes beyond KPIs, and create high-level models of their processes. Its capabilities include process map animations, detailed statistics, interactive charts, automated process discovery, user-friendly log filters for drilling deeper into data, project management, performance filters, and multiple options for importing and exporting data.

Description: The IBM Process Mining product suite uses data-driven process insights to help companies across markets improve processes and make faster, more informed decisions. IBM's process mining tools can be applied in use cases like intelligent automation, customer onboarding, procure-to-pay (P2P), accounts payable, IT incident management, and order-to-cash. Features include automated robotic process automation (RPA) generation, fact-based process models, AI-powered process simulations, conformance checking, task mining, and seamless integrations with leading software such as SAP, Oracle, and other IBM products.

Description: The iGrafx Business Transformation Platform enables transformation by connecting strategy to execution while mitigating risk, ensuring compliance, and providing a framework for governance, resiliency, business continuity, and continuous improvement. iGrafx's process mining technology combines artificial intelligence (AI) and machine learning to help users capture up-to-date details about how particular processes are working, identify opportunities for improvement, standardize processes, and assess which strategies could benefit from a shared services approach.

Description: Pega offers a Business Process Management tool developed on Java and OOP concepts. The platform allows users to assemble an executable business application using visual tools. With Pega's process mining software, companies can identify bottlenecks, analyze the history of a process, optimize for the future, monitor the impact of changes, deploy actual self-optimizing workflows, and integrate with Pega's low-code platform. These features can also enable users of all skill levels to improve how they discover, analyze, and apply insights into company workflows.

Description: Signavio, an SAP company, is a leading provider of BPM solutions, offering an integrated software solution that allows you to model, analyze, optimize, and execute business processes and decisions on one platform. The company's collaborative process mining solution, SAP Signavio Process Intelligence, equips companies with the capabilities needed for achieving improvements across system landscapes, data sources, and departments. Those capabilities include performance monitoring tools, advanced process mining algorithms, process data management, multiple integration options, process modeling, and more.

Description: Software AG offers business process management tools that provide the control needed to improve every business process's speed, visibility, consistency, and agility while minimizing costs and increasing standardization. With ARIS, Software AG's Business Process Transformation solution suite, businesses can simulate process optimization scenarios, identify anomalies, find opportunities for improvements, and get a deeper view of their company processes. Use cases for these capabilities include order-to-cash, service management, procure-to-pay, supply chain management, and more.

Description: UiPath is a global provider of an end-to-end automation platform that combines robotic process automation (RPA) with artificial intelligence, process mining, and cloud technologies to help companies scale their digital business operations. Its process mining product suite comprises process analytics, app templates, automated alerts, built-in data transformation, intelligent software robots, task mining, and other capabilities for analyzing data from business applications. UiPath is available as a cloud-based and on-premises solution.

William Jepma is an editor, writer, and analyst at Solutions Review who aims to keep readers across industries informed and excited about the newest developments in Enterprise Resource Planning (ERP), Customer Relationship Management (CRM), Business Process Management (BPM), and Marketing Automation. You can connect with him on LinkedIn or reach him via email at wjepma@solutionsreview.com.

Read more from the original source:

11 of the Best Process Mining Tools and Software - Solutions Review

Read More..

Enhancing Enrollment in Biomarker-Driven Oncology and Rare Disease Trials – Applied Clinical Trials Online

Integrated approaches can help enhance recruitment plans.

Oncology accounts for 27% of all clinical trials conducted since 2017. Compared to other therapeutic areas, oncology trials are more resource-intensive and less efficient, requiring an average of 16 clinical sites to enroll a median of just 31 patients per study. In fact, average enrollment duration for oncology trials is two times longer than for all diseases combined (22 months vs. 11 months).1

Yet research interest remains high, with oncology accounting for 45% of all planned studies from Q4 2021 to Q4 2024. Nearly half of these are Phase II studies and almost 40% include countries in North America, while 35% are being conducted in Asia.2 As of July 2022, among the 19,700 new drugs in the pipeline, 6,731 (34%) are for cancer. This robust development activity spans over 20,000 organizations across 116 countries. Interestingly, 72% of these oncology trials are sponsored by companies outside of the top 50 pharmaceutical organizations and 67% are for rare and orphan disease indications.1

Among the more than 1,500 potential oncology biomarkers that have been identified in the preclinical setting, approximately 700 are involved in the active or planned clinical trials in oncology. Over 60% of these studies are for immuno-oncology drugs, with the remainder for targeted therapies.3

The rise of personalized medicine has been driven by biomarkers, which have enabled researchers to understand the science behind mechanism of action and have been used to target recruitment. More than one-third of all drugs approved by FDA since 2000 have been personalized medicines, demonstrating that biomarker-driven approaches help optimize treatment impact and improve patient outcomes.4 In fact, a recent analysis of 9,704 development programs from 2011 to 2020 found that trials employing preselection biomarkers have a two-fold higher likelihood of approval, driven by a nearly 50% Phase II success rate.3

The value of biomarkers is not limited to the clinical trial setting. Rather, biomarkers play a critical role throughout drug discovery and development, bridging preclinical and clinical studies. Incorporating biomarkers into programs requires careful choreography, from collecting biological samples and analyzing them in decentralized or specialty labs to generating data that will be integrated with other clinical information to support decision-making. It may also require a broad spectrum of logistics and laboratory management capabilities for handling a range of sample types.

Table 1 below provides a sampling of FDA-approved biomarker-driven therapies. A key challenge of integrating biomarkers into development programs is selecting the right biomarker. Often, the frequency of the biomarker of interest is very low. The same or similar biomarker may be present in multiple tumor types at varying frequencies, as is the case with HER2 amplifications in breast and gastric cancer. Biomarker frequency may also differ among races and ethnicities, and it may also change as the disease progresses. For example, EGFR exon 20 T790M alterations increase in frequency in patients with non-small cell lung cancer who have become resistant to previous lines of therapy. Consequently, selecting the right biomarker is akin to finding a needle in a haystack.

Precision for Medicine was involved in an oncology cell therapy study, where eligibility was based on the expression of two biomarkers. The first biomarker was expression of human leukocyte antigen (HLA)-A*02:01 and the second was a tumor type expressing a certain cell receptor on at least 80% of cancer cells. Precision for Medicine performed an analysis and found that the prevalence of HLA-A*02:01 varied among geographic regions, with a prevalence of 38.5% to 53.8% in Europe and 16.8% to 47.5% in North America (see Figure 1 below). Based on this finding, we recommended conducting this clinical trial in Europe.

Analysis of the expression of the cell receptor of interest showed that expression levels varied not only by tumor type, but even by subtype or demographic (see Table 2 below).

We used these findings to project the number of patients and samples that would need to be screened in order to enroll 36-40 study participants. Our assumptions were that 30% of patients screened would have HLA-A*02:01 expression, 10% of those patients would have at least 80% biomarker expression, and 50% of those would meet all the inclusion criteria for the study.

Based on these assumptions, it was determined that HLA analysis would need to be performed on approximately 2,500 blood samples and immunohistochemistry would need to be performed on about 750 tumor tissue samples to reach the enrollment target.
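As a rough cross-check of those figures, the short calculation below applies the stated assumptions to the enrollment target; the variable names are ours, and the output is only a back-of-the-envelope estimate.

```python
# Back-of-the-envelope check of the screening estimate described above,
# using the article's stated assumptions (variable names are ours).
hla_positive_rate = 0.30      # patients screened with HLA-A*02:01 expression
receptor_high_rate = 0.10     # of those, fraction with >=80% biomarker expression
meets_criteria_rate = 0.50    # of those, fraction meeting all inclusion criteria

overall_yield = hla_positive_rate * receptor_high_rate * meets_criteria_rate  # 0.015

for enroll_target in (36, 40):
    blood_samples = enroll_target / overall_yield       # HLA typing on blood samples
    tissue_samples = blood_samples * hla_positive_rate  # IHC only on HLA-positive patients
    print(f"target {enroll_target}: ~{blood_samples:,.0f} blood samples, "
          f"~{tissue_samples:,.0f} tumor tissue samples")
# -> roughly 2,400-2,700 blood samples and 720-800 tissue samples, consistent
#    with the ~2,500 and ~750 figures cited in the text.
```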

To increase the efficiency of this study, the Precision for Medicine team implemented various strategies for streamlining the recruitment process:

As oncology clinical research evolves toward personalized treatment of patients in niche populations, a biomarker-driven approach to drug discovery and development is required. With new biological targets frequently having a low level of prevalence, it is important for researchers and developers to look for more innovative approaches to patient identification.

Esther Mahillo, Vice President, Operational Strategy and Feasibility, Precision for Medicine

Read more here:

Enhancing Enrollment in Biomarker-Driven Oncology and Rare Disease Trials - Applied Clinical Trials Online

Read More..

NanoString and Visiopharm Announce Collaboration to Co-develop Integrated Workflows for GeoMx and AtoMx Spatial Biology Solutions – Business Wire

SEATTLE--(BUSINESS WIRE)--NanoString Technologies, Inc. (NASDAQ: NSTG), a leading provider of life science tools for discovery and translational research, and Visiopharm, a world leader in AI-driven digital pathology software, today announced a collaboration to accelerate the discovery of novel biomarkers and drug targets using the latest spatial imaging and machine learning technologies. Together, NanoString and Visiopharm are developing integrated workflows leveraging the multiplexing capability of the GeoMx Digital Spatial Profiler (DSP) and the AI-driven image analysis capabilities of Visiopharm. NanoString's new cloud informatics platform, the AtoMx Spatial Informatics Platform, will enhance the integration by providing scalable computing power with worldwide access.

Integrating Visiopharm's Oncotopix Discovery software into the GeoMx DSP workflow will enhance and simplify sample processing for translational research applications. Researchers can leverage Visiopharm's AI algorithms to analyze four-color fluorescent images generated on GeoMx as well as associated images using hematoxylin and eosin (H&E) staining, to better understand the number and type of cells within DSP profiling regions. The combination of these technologies is expected to accelerate biomarker discovery and validation with in situ whole transcriptome and high-plex protein analysis.

Additionally, NanoString and Visiopharm will maintain file format compatibility that enables researchers to analyze GeoMx DSP whole-slide images with the Visiopharm software, providing access to a rich toolset for pathologic and spatial analysis including cell counts, phenotype mapping, distance measures, and more, which can be used to enhance and validate the high-plex spatial molecular data.
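For readers unfamiliar with this kind of image analysis, the sketch below shows a generic cell-counting step, thresholding a stain channel and counting labeled regions inside a region of interest, using scikit-image on synthetic data; it is not Visiopharm's or NanoString's software and only illustrates the general approach.

```python
# A generic, illustrative cell-counting step: threshold a nuclear-stain
# channel, label connected regions, and count them within a region of
# interest. The synthetic image stands in for real microscopy data.
import numpy as np
from skimage import draw, filters, measure

# Build a fake fluorescence channel with a few bright "nuclei".
image = np.zeros((200, 200), dtype=float)
for r, c in [(40, 50), (120, 80), (150, 160), (60, 140)]:
    rr, cc = draw.disk((r, c), radius=8)
    image[rr, cc] = 1.0
image += np.random.default_rng(0).normal(0, 0.05, image.shape)  # camera noise

# Segment and count nuclei inside a rectangular region of interest.
roi = image[30:170, 30:170]
mask = roi > filters.threshold_otsu(roi)
labels = measure.label(mask)
print("cells detected in ROI:", labels.max())
```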

"At NanoString, our vision is to map the universe of biology, and we are excited this strategic collaboration will allow us to offer our researchers new insights by combining the results from whole-slide image analysis and high-plex spatial molecular analysis," said John Gerace, chief commercial officer, NanoString. "Our compatible solutions will enable a new frontier in discovery and provide deeper biological insights advancing the field of digital pathology."

"Visiopharm is committed to transforming pathology through AI-based tissue mining, and this collaboration will combine the power of 20 years of expertise in pathology image analysis with high-plex spatial molecular analyses," said Louise Armstrong, chief commercial officer, Visiopharm. "Spatial biology is one of the fastest growing areas of research, and this collaboration will enable a deeper understanding of the spatial high-plex and whole-transcriptome data that GeoMx delivers."

Arutha Kulasinghe, Ph.D., National Health and Medical Research Council Research Fellow, University of Queensland, will showcase integrated workflows at the Society for Immunotherapy of Cancer (SITC) annual meeting in November 2022. He will present predictive signatures for immunotherapy response identified in head and neck cancer utilizing high-plex spatial profiling.

About Visiopharm A/S

Visiopharm is a world leader in AI-driven precision pathology software. Visiopharm's pioneering image analysis tools support thousands of scientists, pathologists, and image analysis experts in academic institutions, the biopharmaceutical industry, and diagnostic centers. AI-based image analysis and tissue mining tools support research and drug development research worldwide, while CE-IVD APPs support primary diagnostics. With highly advanced and sophisticated artificial intelligence and deep learning, Visiopharm delivers tissue data mining tools, precision results, and workflows. Visiopharm was founded in 2002 and is privately owned. The company operates internationally with over 780 user accounts and countless users in more than 40 countries. The company headquarters are in Denmark's Medicon Valley, with offices in Sweden, England, Germany, Netherlands, and the United States. https://visiopharm.com

About NanoString Technologies, Inc.

NanoString Technologies, a leader in spatial biology, offers an ecosystem of innovative discovery and translational research solutions, empowering our customers to map the universe of biology. The GeoMx Digital Spatial Profiler, cited in more than 160 peer-reviewed publications, is a flexible and consistent solution combining the power of whole tissue imaging with gene expression and protein data for spatial whole transcriptomics and proteomics from one FFPE slide. The CosMx Spatial Molecular Imager (SMI) is an FFPE-compatible, single-cell imaging platform powered by spatial multiomics enabling researchers to map single cells in their native environments to extract deep biological insights and novel discoveries from one experiment. The AtoMx Spatial Informatics Platform (SIP) is a cloud-based informatics solution with advanced analytics and global collaboration capabilities, enabling powerful spatial biology insights anytime, anywhere. The CosMx SMI and AtoMx SIP platforms are expected to launch in 2022. At the foundation of our research tools is our nCounter Analysis System, cited in more than 6,200 peer-reviewed publications, which offers a secure way to easily profile the expression of hundreds of genes, proteins, miRNAs, or copy number variations, simultaneously with high sensitivity and precision. For more information, visit http://www.nanostring.com.

Read more:

NanoString and Visiopharm Announce Collaboration to Co-develop Integrated Workflows for GeoMx and AtoMx Spatial Biology Solutions - Business Wire

Read More..

Crypto Miners Bought Their Own Power Plant. It’s a Climate Disaster. – Earthjustice

It's a June morning in 2022, so early that most homes on Seneca Lake in upstate New York are filled with only the murmur of water lapping on the wooded shores. But in Yvonne Taylor's house, there's the hum of grassroots organizing to fight one of the biggest new threats to climate.

Tap, tap, tap.

Taylor is reaching out on Facebook to a stranger in Pennsylvania who posted about the cryptocurrency mining industry coming to her town.

"We'd love to talk with you about this," Taylor writes. "We're being impacted by Bitcoin mining in our community too, and [we] are forming a national group of people who are experiencing the harmful effects of this industry."

The harm from certain types of cryptocurrency is that the production of new virtual coins, known as mining, requires a shocking amount of electricity consumption. When that power is produced with fossil fuels, it creates a lot of local pollution and climate emissions.

Bitcoin mining is so energy intensive its driving demand for new fossil fuel plants or giving old plants a new life.

At Seneca Lake, a private equity firm bought the once-mothballed Greenidge coal plant in 2014 and converted it to a fracked gas plant. In 2020, that firm started a commercial cryptocurrency mining operation by plugging thousands of computers directly into the plant to mine Bitcoin. The move turned out to be a blunder. An industry that had flown below the radar, too novel to regulate, suddenly stepped into a community with deep experience fending off environmental threats.

Taylor, a speech therapist whose family has lived on the lake for seven generations, first got mobilized around banning fracking in the region. Then, when a company came up with a scheme to store 88 million gallons of liquified petroleum gas in salt caverns along the lake, she and others beat that back with Earthjustice's legal help.

Yvonne Taylor, photographed in Seneca Lake, where her family has lived for seven generations.

Lauren Petracca for Earthjustice

The lake, Taylor says, "has really been the only constant I've ever had in an otherwise very tumultuous life. I am as fierce about protecting it as a mama bear would be about her cub."

So, in 2020, when Taylor realized what was going on up at the local power plant and learned that global Bitcoin mining uses more electricity than some medium-sized European nations, she knew exactly whom to turn to for help.

She called Earthjustice.

New Fight, Old Foe

Taylor's tip-off made its way to Mandy DeRoche, a new deputy managing attorney at Earthjustice.

A former securities and commercial litigator experienced with corporate disclosures, DeRoche had just the right skills to tackle a complex new climate threat.

Taylor and other local partners brought DeRoche up to date on their environmental watchdog efforts. In 2017, the Greenidge power station restarted as a gas-fired plant. It operated sporadically for a few years, providing power to the grid at times of peak demand.

Then, the watchdogs noticed unusual moves afoot at the plant. They learned of permit applications to construct buildings to house computers for a data center and to operate "behind the meter," meaning the power wouldn't go to the grid for public use but directly to this data center.

But this was no ordinary data center.

In 2020, the power plant ramped up operations. Those nearby began to hear a low droning noise, described by one resident as the sound of "a jet that never lands." The noise came from fans cooling computers. Air pollution levels jumped.

Residents were stunned and scrambled to understand what exactly had moved into town.

Mandy DeRoche, left, deputy managing attorney in the Coal Program, talks with Earthjustice senior attorney Meagan Burton during a staff meeting in New York City.

Kholood Eid for Earthjustice

DeRoche knew from her earlier career where to get better information than that provided by the traditional environmental regulatory system. The buyers of the power station, Greenidge Generation LLC, were going public through a complex reverse merger. That meant they would have to make disclosures to the Securities and Exchange Commission and investors.

The details dispelled any hopes that the mining was just a side hustle. The plant operated for just 48 days in 2019, producing the equivalent carbon emissions of roughly 7,700 gas-powered cars driven for a year. The next year, the plant operated 343 days and emitted the equivalent of more than 44,500 cars. By the end of 2020, the company was running approximately 6,900 miners. More mining machines have been added since as the company builds up to a planned 32,500 machines.

The plant's air permit, a replica of the one issued when it was powering local homes and businesses in prior decades, gave the plant's new investors significant runway to pollute just to mine cryptocurrency for themselves. The company also had ambitions to scale this model elsewhere.

Earthjustice had spent decades shutting down more than 100 coal plants. DeRoche glimpsed the outlines of a new industry that could raise plants back from the dead and also increase the operations of other fossil-fueled plants across the country.

"Greenidge Generation LLC gave other retired, retiring, or peaking plants a roadmap of how to come back online or pollute more, and how to recruit investors, how to go public on NASDAQ," she says.

Shot Heard Round the Blockchain

DeRoche and the Sierra Club Atlantic Chapter sent a letter in 2021 to the New York State Department of Environmental Conservation pointing out that if the type of energy-draining mining seen at Greenidge took off and was powered by fossil fuels, the state had no hope of meeting its newly mandated climate emission cuts. The agency could, the letter noted, reject the power plants air permit, which was coming up for renewal.

Her phone promptly blew up with calls from journalists drawn to crypto mining controversy. Bitcoin, the oldest and most famous cryptocurrency, inspires fervent followers and relentless critics.

DeRoche refused to be drawn in. "Crypto is a new and shiny thing that brings press attention, but our focus remains on pollution and energy use," says DeRoche. "We see a power plant running all the time that wasn't before. We don't support power plants coming back from the dead, or operating any more than they absolutely need to."

Practically, that means Earthjustice's concern is limited to a specific type of cryptocurrency mining called proof-of-work that is used principally by Bitcoin. Many other coins use far less energy.
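For readers unfamiliar with why proof-of-work is so energy hungry, the toy sketch below shows the brute-force hash search at its core; it is a simplified illustration, not Bitcoin's actual block format or difficulty, and real mining hardware performs this search trillions of times per second.

```python
# Toy illustration of proof-of-work: keep hashing candidate blocks until the
# hash clears a difficulty target. Not Bitcoin's real format or difficulty.
import hashlib

def mine(block_data: str, difficulty: int):
    """Find a nonce whose SHA-256 hash of (data + nonce) starts with
    `difficulty` hex zeros. Expected work grows ~16x per extra zero."""
    nonce = 0
    prefix = "0" * difficulty
    while True:
        digest = hashlib.sha256(f"{block_data}{nonce}".encode()).hexdigest()
        if digest.startswith(prefix):
            return nonce, digest
        nonce += 1

nonce, digest = mine("example block", difficulty=5)
print(f"found nonce {nonce:,} -> {digest}")
```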

The narrow focus still unleashed intense pushback from Bitcoin believers. Local watchdogs faced threats from people who are "almost evangelical about proof-of-work crypto mining," says Taylor. "We became actually very fearful for our safety. We have installed an extensive security system in our home as a result of this."

The media moment also brought forth new information and allies. Journalists and local partners surfaced other mining outfits after operations surged in the U.S. following a ban in China, sending miners scurrying to find cheap, fast energy. Residents in other communities dealing with cryptocurrency mining operations began reaching out to Seneca Lake activists and Earthjustice.

Bitcoin mining machines in a warehouse at the Whinstone US Bitcoin mining facility in Rockdale, Texas, the largest in North America. Operations like this one have been boosted by China's intensified crypto crackdown that has pushed the industry west.

Mark Felix / AFP via Getty Images

Many miners are located in states where Earthjustice has experience fighting dirty power plants, including Kentucky, Indiana, Montana, Pennsylvania, and New York.

Most miners are plugging directly into electric grids, some of which are very dirty, like Kentucky's, which is approximately 70% coal-powered. Miners can often get sweetheart prices from utilities through power purchasing agreements or through preferential rates. Earthjustice is beginning to challenge these deals, which leave ordinary people and local businesses with higher electricity bills and more pollution.

In addition, crypto mining has been ramping up quickly in oil and gas fields, where miners bring shipping containers filled with computers right up to wellheads.

A Slowdown Showdown

Sensing legislators and regulators needed more time and information, Earthjustice and allies pushed New York state to pass a partial moratorium.

The idea: hold off on permitting crypto mining at fossil fuel plants for two years while the state conducts a study on the environmental effects of cryptocurrency mining, particularly with an eye to meeting a 2019 state law called the Climate Leadership and Community Protection Act. The CLCPA commits New York to serious greenhouse gas reductions.

The crypto mining industry would have none of it. "They retained nearly every lobbying firm in Albany," recounts Earthjustice's Liz Moran, who had the job of going toe-to-toe with that army of suits.

"I heard from some legislative offices that they would hear from a lobbyist representing a crypto company at least three times daily," says Moran. "That was intimidating."

The only way to beat them back, she realized, was with people power.

Earthjustice policy advocate Elizabeth Moran, photographed in the New York State Capitol in Albany.

Patrick Dodson for Earthjustice

She arranged for grassroots advocates from groups like Taylor's Seneca Lake Guardian, Fossil Free Tompkins, Committee to Preserve the Finger Lakes, and many others to travel to Albany, or join calls or virtual meetings, to share their personal stories. Then, in the final days, they ramped up calls to key legislators around the clock. Support for the moratorium would flip and flip back.

The fight went down to the final minutes of New York's legislative session, finally passing around 2:30 a.m. on June 3.

"It was David vs. Goliath. It really felt like the little guys won here," says Moran.

Back at the Lake

Victory wasn't complete for residents around Seneca Lake or around the state, however. Governor Hochul needed to sign the bill (as of press time, she still had not). Regardless, the bill would not directly impact Greenidge because it exempts miners with permit applications that predate any moratorium.

But great news came when the state decided on June 30 to deny Greenidge's Title V air permit. Statewide advocates had pumped up the volume, submitting some 4,000 comments, 98% of them against renewing the permit, including Earthjustice's own 57-page technical and legal comments.

"My phone started lighting up with 'Title V air permit denied' messages. I literally dropped my phone," says Taylor. The outburst startled her partner. "I said, 'We did it, we did it, they denied the permit!' We both jumped up and down hugging each other and laughing for a bit."

Greenidge is challenging the denial of the air permit; Earthjustice and the state environmental agency will defend it. Greenidge continues to operate in the meantime, but the exposure and containment of a pernicious new industry has begun. And more challenges by Earthjustice are coming.

Visit link:

Crypto Miners Bought Their Own Power Plant. It's a Climate Disaster. - Earthjustice

Read More..

North America and Europe Debt Collection Software Market Report 2022: Rising Automation in the Debt Collection Process Drives Growth -…

DUBLIN--(BUSINESS WIRE)--The "North America and Europe Debt Collection Software Market Forecast to 2028 - COVID-19 Impact and Regional Analysis By Component, Deployment Type, Organization Size, and Industry Vertical" report has been added to ResearchAndMarkets.com's offering.

The NA and EU debt collection software market size is expected to grow from US$ 2,386.4 million in 2022 to US$ 4,148.3 million by 2028; the market is estimated to grow at a CAGR of 9.7% during 2022-2028.
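Those figures are consistent with the standard compound-annual-growth-rate formula, as the quick check below shows (the calculation is ours, not part of the report).

```python
# Sanity check of the stated figures using the standard CAGR formula:
# CAGR = (end_value / start_value) ** (1 / years) - 1
start, end, years = 2386.4, 4148.3, 6  # US$ millions, 2022 -> 2028
cagr = (end / start) ** (1 / years) - 1
print(f"implied CAGR: {cagr:.1%}")  # ~9.7%, matching the report
```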

The emergence of big data analytics and predictive analytics, along with digital multi-channel communications and customer-centric approaches, creates ample demand for debt collection software during the forecast period.

Information is the most powerful weapon when it comes to debt collection. Big data analytics can help to get the most relevant information about debtors. Simple information such as demographics and behavioral aspects such as the time a debtor answers a call can significantly affect how a debt collection call is handled. Big Data allows data collection and segregation with a laser-sharp focus related to a single debtor.

Big data opens up possibilities such as speech analysis to confirm current collection efforts. Voice analytics makes it possible to hear 100% of every call, a feat that is impossible, or at least not recommended, for humans. Input from speech analytics can contribute significantly to training and operational efficiency savings. Furthermore, predictive analytics, a form of advanced analytics, brings breakthroughs to collections. It combines various techniques such as data mining, machine learning, artificial intelligence, and statistical modeling to predict future events. It may sound cryptic, but it has been successful in debt collection.

WNS, a business process management provider, used predictive analytics to help a leading utility company increase its debt collections by 50%. A more focused strategy for delinquency management was developed based on the predictive analytics findings. Thus, the emergence of big data and predictive analytics will drive the debt collection software market growth during the forecast period.
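As a hypothetical illustration of the kind of predictive model described above, the sketch below scores synthetic accounts for delinquency risk with a simple logistic regression; the features, data, and thresholds are invented and are not drawn from WNS or the report.

```python
# Hypothetical sketch: score accounts for likelihood of delinquency so
# collectors can prioritize outreach. Features and data are invented.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(1)
n = 1000
# Features: days past due, outstanding balance, prior missed payments
X = np.column_stack([
    rng.integers(0, 120, n),
    rng.uniform(100, 10_000, n),
    rng.integers(0, 6, n),
])
# Synthetic label: delinquency risk rises with days past due and missed payments
logits = 0.03 * X[:, 0] + 0.8 * X[:, 2] - 3.0
y = rng.random(n) < 1 / (1 + np.exp(-logits))

X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)
model = make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000))
model.fit(X_train, y_train)

# Rank the riskiest accounts for early, targeted contact.
risk_scores = model.predict_proba(X_test)[:, 1]
print("top decile risk threshold:", np.quantile(risk_scores, 0.9).round(3))
```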

Moreover, businesses communicate with their customers through multiple digital channels such as SMS, email, IVR, and WhatsApp. Financial institutions must understand this and develop a communication strategy for each channel. Customers should experience the same high level of service regardless of the communication channel. This multi-channel approach greatly extends to collection spaces where customers prefer to be contacted by SMS or email rather than by phone or letter. The debt collection process should include a multi-channel approach to reach customers according to their channel preferences. Furthermore, customers are accustomed to personalizing services and offers when accessing products and services. A one-size-fits-all approach is no longer optimal and can alienate potential customers.

Additionally, it is important to offer customers self-service options; research shows that 57% of people prefer self-service channels. Customer self-service options and empathy are paramount when designing a customer-centric process for collection. Defaulting borrowers can use their channel of choice to settle their debt problems. Thus, the emerging trend of digital multi-channel communications and a customer-centric approach is driving demand in the debt collection software market during the forecast period.

The debt collection software market is analyzed on the basis of component, deployment type, organization size, and industry vertical.

Market Dynamics

Market Drivers

Market Restraints

Market Opportunities

Future Trends

Key Topics Covered:

1. Introduction

2. Key Takeaways

3. Research Methodology

4. NA and EU Debt Collection Software Market Landscape

5. NA and EU Debt Collection Software Market - Key Industry Dynamics

6. NA and EU Debt Collection Software Market Overview

7. Debt Collection Software Market Analysis - By Component

8. Debt Collection Software Market Analysis - By Deployment Type

9. Debt Collection Software Market Analysis - By Organization Size

10. Debt Collection Software Market Analysis - By Industry Vertical

11. Debt Collection Software Market Analysis - By Region

12. Impact of COVID-19 Pandemic on Debt Collection Software Market

13. Industry Landscape

14. Company Profiles

15. Appendix

Companies Mentioned

For more information about this report visit https://www.researchandmarkets.com/r/ep0lzj

View post:

North America and Europe Debt Collection Software Market Report 2022: Rising Automation in the Debt Collection Process Drives Growth -...

Read More..

Managing Biospecimens in Cell and Gene Therapy Trials – Applied Clinical Trials Online

Pursuing new tools and capabilities in sample logistics, storage, and data analysis.

One of the most powerful and significant developments in medical therapies in the past decade has been the maturing of cell and gene therapy (CGT) treatments. Cell therapies such as CAR-T and TCR-T offer transformative outcomes for challenging diseases. Recent cell therapy approvals and the growing number of clinical trials are accelerating the process from discovery through clinical trials to commercial manufacturing and delivery.1

Gene therapies can provide significant, and possibly curative, benefits to patients who have genetic or acquired diseases. Through the direct expression of a therapeutic protein or by restoring the expression of an underexpressed protein, gene therapy uses vectors to deliver gene-based drugs and therapeutic loads to patients.

While the approvals for treatments for rare diseases are certainly early wins, the impact of gene therapies will significantly expand as approved treatments are administered to larger patient groups and studies expand to address diseases that are broader reaching, for example, with treatments for multiple myeloma, leukemia, and other forms of cancer.2

Compared to more standard biopharmaceutical clinical trials, there are unique challenges associated with managing biospecimen samples during trials of CGTs. One of the most critical challenges is properly and safely managing the specimens taken from each patient, since those specimens can actually be used to create the therapy and must be returned with absolute safety to the patient for treatment.

CGT trials, while following their own specific workflows, are generally carried out in similar stages. Cell therapy can be allogeneic, when it is produced from cells that are collected from a healthy donor and shipped to a clinical site to treat a patient. Alternatively, it can be autologous, where the biological material comes from the patient, is transferred to a biomanufacturing site for genetic modification, and is then returned and administered to the same patient the sample was taken from.

Although each trial is unique, the major workflow steps are relatively similar. First, genetic or cell-based disease states and potential therapeutic pathways are identified by researchers and the trial design begins to be developed.

Once a trial is designed, trial participants need to be identified. Unlike other biopharmaceutical trials, CGT trials tend to have much smaller patient populations. Biospecimen cells are collected from these patients and need to be transported under the most stringent safety and cryopreservation conditions in coordination with regulatory requirements.3 This includes having well-established cold chain logistics that manage and document each specimens condition to ensure that no temperature-related degradation occurs.

Each specimen is then used to biomanufacture the therapeutic cells, either through modification of the cell genome (for cell therapy) or through creation of the viral vectors to deliver gene-based drugs and therapeutic loads to the patients. These temperature-sensitive therapies must then be carefully thawed, with minimum impact on viability and functionality, to be delivered to the patient by the clinical trial team at the investigative sites.

In addition, as part of the clinical trial, portions of the specimens need to be set aside, before and after biomanufacturing, for various testing requirements. Tests like qPCR, ELISA, flow cytometry, and others are critical to conducting the analysis of the therapeutic steps being studied, so proper storage (short-term and long-term) needs to be fully managed.4-5

Proper management of aliquoting biological samples is also a critical element in biospecimen management for these trials in order to mitigate risk of cell deterioration from freeze-thaw cycles. Many research centers require aliquoting to generate sub-samples for distribution to third-party laboratories and clinical partners. Since the source biospecimens from each patient are so much smaller in actual quantity, extraordinary care is needed at every step of biospecimen management not to lose any biological material.

Finally, biospecimen management for CGT trials must include support for stable, multiyear storage. The administration of gene therapy products carries the possibility of delayed biological events, which demands data collection over a longer period. The therapeutic changes that patients experience due to genomic modification often need to be tracked for 10 to 15 years, so long-term cryogenic storage of the modified cells is a critical element of supporting the trial.6

As CGT trials expand, it is important for the industry to investigate and fully understand the best practices researchers should follow for managing living/active samples during CGT clinical trials, especially given some of the unique processes described earlier.

Along with management during the active trial, long-term biorepository and sample management is also crucial to ensure that biospecimens remain secure and safely stored. These best practices include having a thorough appreciation of the regulatory factors to consider when managing the type of data generated by these trials.

Due to the sensitive nature of this personalized kind of medicine, researchers need to work very closely with regulatory agencies to fully understand and plan the trials according to established protocols. Gene therapy developers have access to expedited approval pathways such as Regenerative Medicine Advanced Therapy (RMAT) designation in the US, PRIority MEdicines (PRIME) designation in the EU, and SAKIGAKE designation in Japan.

Leading biorepository providers across the globe are responding to the heightened complexities and risks of biospecimen management in CGT trials. They are building on established sample logistics, storage, and management tools to address these unique challenges more fully. Concerns include:

Preserving the specimens from the collection of cells from patients through cell therapy manufacturing and delivery back to them. Unlike other treatment regimens, the specimen is also the therapeutic pathway; its safe preservation, management, and storage are critical to the progress of the clinical trial and the ultimate demonstration that the therapy can be successfully applied.

Rigorous cold chain transport logistics to ensure cryopreservation at multiple stages. This requires detailed, multifactor tracking of each sample so that it is clearly and permanently associated with a specific patient at every point of exchange and every process step, including ancillary steps associated with clinical testing and long-term storage.

Detailed familiarity with requirements and compliance: All clinical practitioners and personnel from biorepository logistics and storage organizations need to be thoroughly grounded in the strict protocols established for each trial and demonstrate how their procedures comply. Since CGTs are so new, and patient risk is elevated, biorepository operations have a special duty to manage any biospecimen safety concerns.

The relative newness of CGT programs at major life sciences research institutions has led, in some instances, to a preference for keeping all aspects of clinical trials within a single research organization or network of researchers.7,8

Because the target patient population in a given geographical region can be as small as one-tenth the number of patients participating in traditional clinical trials, trials often need to be conducted at multiple locations worldwide, or patients need to travel to receive the clinical treatment. As a result, researchers tend to set up their own biorepositories and biospecimen management systems and to model industry best practices to maintain sample integrity and traceability across the sample management ecosystem.

While the desire for comprehensive control would seem to make sense, there are distinct advantages to working with expert biorepository operations to manage all key aspects of biospecimen capture, transport, and storage. Working with leading service providers that follow GxP (good practice) quality guidelines and regulations helps assure proper storage of the viral vectors and cells, with the traceability and consistency required.

These include creating rigorous chain-of-custody procedures with advanced biospecimen digital documentation and tracking tools that tightly associate each specimen with its source patient throughout every exchange. They have established meticulous biospecimen collection and registration procedures that are flexible enough to adapt to specific clinical trial procedures, patient populations, manufacturing locations and regulatory protocols.

Biorepositories are also customizing their cold-storage logistics and transport procedures to support CGT biospecimens. This includes tracking shipments in real time using automated tracing software and GPS-based tools, as well as working with clinical researchers to minimize transport times from the site of specimen collection to therapeutic manufacturing and specimen storage locations.

Leading biorepositories have also made extensive investments in large-scale, custom-built cryogenic storage facilities. These facilities offer storage temperatures ranging from cryogenic (-196 °C) and ultralow (-80 °C) to refrigerated and controlled room temperature storage, giving researchers much greater flexibility. These specialized systems are continuously monitored and typically include multiple backup systems to prevent accidental specimen loss due to outside events or local power failures. Some of the leading biorepository providers have sited these state-of-the-art facilities in multiple locations worldwide to enable CGT trials with global footprints and patient populations to work with a single biospecimen management provider.

There are biorepository and biospecimen logistics providers who are actively investing in more robust, expert biospecimen data management systems. Just like their operational procedures, these tools have been developed and continually improved through hands-on experience managing millions of samples for a wide range of research applications.

These material management systems give instant digital access to comprehensive data of biospecimens that have been transported and stored, allowing researchers to efficiently manage biospecimen inventory, submit work requests, and generate standard or custom reports.

In addition, they are further developing these systems to satisfy evolving regulatory requirements for CGT clinical trial data management, since the conditions of the biospecimens at all stages of transport, biomanufacture, and storage need to be exhaustively documented.

Working with expert biorepository organizations and outsourcing biospecimen collection, cold transport logistics, and both short- and long-term storage provides CGT researchers with a proven resource that can help prevent error, protect patient safety, and make CGT trials more efficient.

An effective path forward would include increased collaboration between professional biorepository operations and clinical trial researchers. By fostering true working partnerships, these experts can educate researchers on the risks associated with imperfect or poorly planned biospecimen logistics; especially with multisite trials, they can work to refine standardized procedures and tools used to collect, secure, document, and transport specimens to reduce the risk of error or inefficiencies.

Streamlining and standardizing biospecimen management can ultimately help reduce costs, minimize rework and simplify many clinical trial management tasks often carried out by researchers. The advanced data management capabilities also provide a powerful foundation for conducting advanced data mining and analysis of trial and biospecimen data to augment other research results.

Most importantly, working with biorepository experts to handle these critical tasks will free researchers to focus their valuable time and efforts on advancing the science of CGTs.

Navjot Kaur, PhD, Business Segment Manager, and Radha Krishnan, Global Director, Biorepository Operations and Strategy; both with Avantor

See more here:

Managing Biospecimens in Cell and Gene Therapy Trials - Applied Clinical Trials Online


A mixed distribution to fix the threshold for Peak-Over-Threshold wave height estimation | Scientific Reports – Nature.com

This section introduces Extreme Value Theory and presents the methodology proposed in this work.

Extreme Value Theory (EVT) is associated with the maximum of a sample, \(M_n = \max(X_1, \ldots, X_n)\), where \((X_1, \ldots, X_n)\) is a set of independent random variables with common distribution function \(F\). In this case, the distribution of the maximum observation is given by \(\Pr(M_n < x) = F^n(x)\). The hypothesis of independence when the \(X\) variables represent the wave height over a given threshold is quite acceptable because, for oceanographic data, it is common to adopt a POT scheme that selects extreme wave height events that are approximately independent25. Also, in26, the authors state that "the maximum wave heights in successive sea states can be considered independent, in the sense that the maximum height is dependent only on the sea state parameters and not on the maximum height in adjacent sea states". This variable \(M_n\) is described by one of the three following distributions: Gumbel, Fréchet, or Weibull.

One methodology in EVT is to consider wave height time series with the annual maximum (AM) approach27, where \(X\) represents the wave height collected over regular periods of one year, and \(M_n\) is formed by the maximum value of each year. The statistical behaviour of the AM can be described by the distribution of the maximum wave height in terms of the Generalized Extreme Value (GEV) distribution:

$$ G(x) = \begin{cases} \exp\left\{ -\left[ 1 + \xi \left( \frac{x - \mu}{\sigma} \right) \right]^{-\frac{1}{\xi}} \right\}, & \xi \ne 0, \\ \exp\left\{ -\exp\left( -\frac{x - \mu}{\sigma} \right) \right\}, & \xi = 0, \end{cases} $$

(1)

where:

$$ 0 < 1 + \xi \left( \frac{x - \mu}{\sigma} \right), $$

(2)

where \(-\infty< \mu < \infty\), \(\sigma > 0\) and \(-\infty< \xi < \infty\). As can be seen, the model has three parameters: location (\(\mu\)), scale (\(\sigma\)), and shape (\(\xi\)).
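For concreteness, Eq. (1) can be evaluated directly. The following is a minimal sketch in Python; the function name and parameter values are illustrative placeholders, not part of the original work.

```python
import numpy as np

def gev_cdf(x, mu, sigma, xi):
    """GEV distribution function G(x) from Eq. (1);
    mu, sigma and xi are the location, scale and shape parameters."""
    if xi == 0.0:
        return np.exp(-np.exp(-(x - mu) / sigma))
    t = 1.0 + xi * (x - mu) / sigma
    t = np.maximum(t, 1e-12)  # keep the argument inside the support given by Eq. (2)
    return np.exp(-t ** (-1.0 / xi))

# Arbitrary illustrative parameters, not fitted values.
print(gev_cdf(np.array([4.0, 6.0, 8.0]), mu=5.0, sigma=1.0, xi=0.1))
```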

The return values, corresponding to the return period \(T_p\), are obtained by inverting Eq. (1):

$$ z_p = \begin{cases} \mu - \frac{\sigma}{\xi}\left[ 1 - \left\{ -\log(1 - p) \right\}^{-\xi} \right], & \xi \ne 0, \\ \mu - \sigma \log\left\{ -\log(1 - p) \right\}, & \xi = 0, \end{cases} $$

(3)

where \(G(z_p) = 1 - p\). Then, \(z_p\) will be exceeded on average once every \(1/p\) years, which corresponds to \(T_p\).
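The return level in Eq. (3) follows directly from the fitted GEV parameters. A minimal sketch, assuming illustrative (not fitted) parameter values:

```python
import numpy as np

def gev_return_level(p, mu, sigma, xi):
    """Return level z_p from Eq. (3): exceeded on average once every 1/p years."""
    if xi == 0.0:
        return mu - sigma * np.log(-np.log(1.0 - p))
    return mu - (sigma / xi) * (1.0 - (-np.log(1.0 - p)) ** (-xi))

# 100-year return level (p = 0.01) for arbitrary illustrative parameters.
print(gev_return_level(0.01, mu=5.0, sigma=1.0, xi=0.1))
```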

The alternative method in the EVT context is the Peak-Over-Threshold (POT) method, where all values over a threshold predefined by the user are selected to be statistically described, instead of only the maximum values28,29. The POT method has become a standard approach for these predictions13,29,30. Furthermore, several improvements over the basic approach have since been proposed by various authors19,31,32.

The POT method is based on the fact that if the AM approach uses a GEV distribution (Eq. 1), the peaks over a high threshold should follow the related approximate distribution: the Generalized Pareto Distribution (GPD). The GPD fitted to the tail of the distribution gives the conditional non-exceedance probability \(P(X \le x \mid X > u)\), where \(u\) is the threshold level. The conditional distribution function can be calculated as:

$$ P(X \le x \mid X > u) = \begin{cases} 1 - \left( 1 + \xi^{*} \left( \frac{x - u}{\sigma^{*}} \right) \right)^{-\frac{1}{\xi^{*}}}, & \xi^{*} \ne 0, \\ 1 - \exp\left( -\frac{x - u}{\sigma^{*}} \right), & \xi^{*} = 0. \end{cases} $$

(4)

There is consistency between the GEV and GPD models, meaning that the parameters can be related as \(\xi^{*} = \xi\) and \(\sigma^{*} = \sigma + \xi (u - \mu)\). The parameters \(\sigma\) and \(\xi\) are the scale and shape parameters, respectively. When \(\xi \ge 0\), the distribution is referred to as long tailed; when \(\xi < 0\), it is referred to as short tailed. The methods used to estimate the parameters of the GPD and the selection of the threshold will now be discussed.
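The mapping between the two parameterizations, and the conditional distribution in Eq. (4), can be sketched as follows. The helper names and numbers are hypothetical; the relations \(\xi^{*} = \xi\) and \(\sigma^{*} = \sigma + \xi(u-\mu)\) are taken from the statement above.

```python
import numpy as np

def gpd_params_from_gev(mu, sigma, xi, u):
    """Map GEV parameters to the GPD parameters above a threshold u."""
    return sigma + xi * (u - mu), xi  # (sigma*, xi*)

def gpd_cdf(x, u, sigma_star, xi_star):
    """Conditional non-exceedance probability P(X <= x | X > u) from Eq. (4)."""
    y = (x - u) / sigma_star
    if xi_star == 0.0:
        return 1.0 - np.exp(-y)
    return 1.0 - (1.0 + xi_star * y) ** (-1.0 / xi_star)

# Hypothetical GEV parameters and threshold.
sigma_star, xi_star = gpd_params_from_gev(mu=5.0, sigma=1.0, xi=0.1, u=7.0)
print(gpd_cdf(8.5, u=7.0, sigma_star=sigma_star, xi_star=xi_star))
```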

The use of the GPD for modelling the tail of the distribution is also justified by asymptotic arguments in14. In that work, the author notes that it is usually more convenient to interpret extreme value models in terms of return levels rather than individual parameters. In order to obtain these return levels, the exceedance rate of the threshold has to be determined as \(P(X>u)\). In this way, using Eq. (4) (\(P(X>x \mid X>u) = P(X>x)/P(X>u)\)) and considering that \(z_N\) is exceeded on average once every \(N\) observations, we have:

$$ P(X>u)\left[ 1 + \xi^{*}\left( \frac{z_N - u}{\sigma^{*}} \right) \right]^{-\frac{1}{\xi^{*}}} = \frac{1}{N}. $$

(5)

Then, the \(N\)-year return level \(z_N\) is obtained as:

$$ z_N = u + \frac{\sigma^{*}}{\xi^{*}}\left[ \left( N \, P(X>u) \right)^{\xi^{*}} - 1 \right]. $$

(6)
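A short sketch of Eq. (6), using a hypothetical exceedance rate \(P(X>u)\) and placeholder GPD parameters; the \(\xi^{*} = 0\) branch is the limiting form \(z_N = u + \sigma^{*} \log(N\,P(X>u))\), which is not written out in Eq. (6) but follows as \(\xi^{*} \to 0\).

```python
import numpy as np

def gpd_return_level(N, u, sigma_star, xi_star, p_exceed):
    """N-observation return level z_N from Eq. (6); p_exceed is P(X > u)."""
    if xi_star == 0.0:
        return u + sigma_star * np.log(N * p_exceed)  # limiting form as xi* -> 0
    return u + (sigma_star / xi_star) * ((N * p_exceed) ** xi_star - 1.0)

# Hypothetical values: 1% of observations exceed u = 7.0 m.
print(gpd_return_level(N=10_000, u=7.0, sigma_star=1.2, xi_star=0.1, p_exceed=0.01))
```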

Many techniques have been proposed for the estimation of the parameters of the GEV and GPD. In19, the authors applied the maximum likelihood (ML) methodology described in14. However, the use of this methodology for two-parameter distributions (e.g., Weibull or Gamma) has a very important drawback: these distributions are very sensitive to the distance between the high threshold (\(u_2\)) and the first peak16. For this reason, ML could be used with a two-parameter distribution when \(u_2\) reaches a peak; as this peak is excluded, the first value of the exceedance is as far from \(u_2\) as possible. A solution would be to use the three-parameter Weibull and Gamma distributions. However, ML estimation of such distributions is very difficult, and the algorithms usually fit two-parameter distributions inside a discrete range of location parameters33.

As stated before, in this paper we present a new methodology to model this kind of time series, considering not only the extreme values but also the rest of the observations. In this way, instead of selecting the maximum values per period (usually a year), or defining thresholds in the distribution of these extreme wave heights, which has an appreciable subjective component, we model the distribution of all wave heights, considering that it is a mixture formed by a normal distribution and a uniform distribution. The motivation is that the uniform distribution is associated with regular wave height values, which contaminate the normal distribution of extreme values. This theoretical mixed distribution is then used to fix the threshold for the estimation of the POT distributions. Thus, the determination of the threshold is done in a much more objective and probabilistic way.

Let us consider a sequence of independent random variables \((X_1, \ldots, X_n)\) of wave height data. These data follow an unknown continuous distribution. We assume that this distribution is a mixture of two independent distributions: \(Y_1 \sim N(\mu, \sigma)\) and \(Y_2 \sim U(\mu - \delta, \mu + \delta)\), where \(N(\mu, \sigma)\) is a Gaussian distribution, \(U(\mu - \delta, \mu + \delta)\) is a uniform distribution, \(\mu > 0\) is the common mean of both distributions, \(\sigma\) is the standard deviation of \(Y_1\), and \(\delta\) is the radius of \(Y_2\), with \(\mu - \delta > 0\). Then \(f(x) = \gamma f_1(x) + (1-\gamma) f_2(x)\), where \(\gamma\) is the probability that an observation comes from the normal distribution, and \(f(x)\), \(f_1(x)\) and \(f_2(x)\) are the probability density functions (pdf) of \(X\), \(Y_1\) and \(Y_2\), respectively.

For the estimation of the values of the four above-mentioned parameters (\(\mu, \sigma, \delta, \gamma\)), the standard statistical theory considers the least squares methods, the method of moments and the maximum likelihood (ML) method. In this context, Mathiesen et al.13 found that the least squares methods are sensitive to outliers, although Goda34 recommended this method with modified plotting position formulae.

Clauset et al.35 show that methods based on least-squares fitting for the estimation of probability-distribution parameters can have many problems and that the results are usually biased. These authors propose the ML method for fitting parametrized models such as power-law distributions to observed data, given that ML provably gives accurate parameter estimates in the limit of large sample size36. The ML method is commonly used in multiple applications, e.g. in metocean applications25, due to its asymptotic properties of being unbiased and efficient. In this regard, White et al.37 conclude that ML estimation outperforms the other fitting methods, as it always yields the lowest variance and bias of the estimator. This is not unexpected, as the ML estimator is asymptotically efficient37,38. Also, Clauset et al.35 show, among other properties, that under mild regularity conditions the ML estimate \(\hat{\alpha}\) converges almost surely to the true \(\alpha\) when estimating the scaling parameter \(\alpha\) of a power law for continuous data. It is asymptotically Gaussian, with variance \((\alpha-1)^2/n\). However, the ML estimators do not achieve these asymptotic properties unless they are applied to large sample sizes. Hosking and Wallis39 showed that the ML estimators are non-optimal for sample sizes up to 500, with higher bias and variance than other estimators, such as moments and probability-weighted-moments estimators.

Deluca and Corral40 also presented the estimation of a single parameter \(\alpha\) associated with a truncated continuous power-law distribution. In order to find the ML estimator of the exponent, they proceed by directly maximizing the log-likelihood \(l(\alpha)\). The reason is practical, since their procedure is part of a more general method, valid for arbitrary distributions \(f(x)\), for which the derivative of \(l(\alpha)\) can be challenging to evaluate. They note that one needs to be cautious in the maximization algorithm when the value of \(\alpha\) is very close to one, and replace \(l(\alpha)\) by its limit at \(\alpha = 1\).

Furthermore, the use of ML estimation for two-parameter distributions such as the Weibull and Gamma distributions has the drawback16 previously discussed. Besides, ML estimation is known to provide poor results when the maximum is at the limit of the interval of validity of one of the parameters. On the other hand, the estimation of the GPD parameters is the subject of ongoing research; a quantitative comparison of recent methods for estimating the parameters was presented by Kang and Song41. In our case, having to estimate four parameters, we have decided to use the method of moments for its analytical simplicity, as it is an estimation method based on relating sample and population moments. Besides, it provides adequate estimates in multi-parameter settings and with limited samples, as shown in this work.

Considering \(\phi\) as the pdf of a standard normal distribution \(N(0,1)\), the pdf of \(Y_1\) is defined as:

$$ f_1(x) = \frac{1}{\sigma}\,\phi(z_x), \quad z_x = \frac{x - \mu}{\sigma}, \quad x \in \mathbb{R}. $$

(7)

The pdf of \(Y_2\) is:

$$ f_2(x) = \frac{1}{2\delta}, \quad x \in (\mu-\delta, \mu+\delta). $$

(8)

Consequently, the pdf of \(X\) is:

$$ f(x) = \gamma f_1(x) + (1 - \gamma) f_2(x), \quad x \in \mathbb{R}. $$

(9)
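The mixture density of Eqs. (7)-(9) can be written down directly. A minimal sketch with arbitrary parameter values, chosen only so that \(\mu - \delta > 0\) holds:

```python
import numpy as np
from scipy.stats import norm

def mixture_pdf(x, mu, sigma, delta, gamma):
    """Mixture density f(x) = gamma*f1(x) + (1 - gamma)*f2(x) from Eq. (9)."""
    f1 = norm.pdf(x, loc=mu, scale=sigma)                             # Eq. (7)
    f2 = np.where(np.abs(x - mu) < delta, 1.0 / (2.0 * delta), 0.0)   # Eq. (8)
    return gamma * f1 + (1.0 - gamma) * f2

# Arbitrary illustrative parameters satisfying mu - delta > 0.
x = np.linspace(0.0, 10.0, 5)
print(mixture_pdf(x, mu=3.0, sigma=1.5, delta=2.0, gamma=0.3))
```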

To infer the values of the four parameters of the wave height time series (\(\mu\), \(\sigma\), \(\delta\), \(\gamma\)), we define, for any random variable symmetric with respect to the mean \(\mu\), with pdf \(g\) and finite moments, a set of functions of the form:

$$ U_k(x) = \int_{|t - \mu| \ge x} |t - \mu|^k \, g(t)\,dt, \quad x \ge 0, \quad k = 1,2,3,\ldots, $$

(10)

or because of its symmetry:

$$ U_k(x) = 2\int_{x+\mu}^{\infty} (t - \mu)^k \, g(t)\,dt, \quad k = 1,2,3,\ldots. $$

(11)

These functions are well defined whenever the corresponding moments of the variable \(x\) exist, because:

$$ U_k(x) < \int_{-\infty}^{\infty} |t - \mu|^k \, g(t)\,dt < \infty, \quad k = 1,2,3,\ldots. $$

(12)

In particular, for the normal and uniform distributions, all the moments are finite, and the same holds for all the \(U_k(x)\) functions. For each pair of values \(x\) and \(k\), this function measures the bilateral tail, beyond the value \(x\), of the moment of order \(k\) with respect to the mean of the variable. It is, therefore, a generalization of the concept of a probability tail, which is obtained for \(k = 0\).
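The tail-moment functions \(U_k(x)\) of Eqs. (10)-(11) can be checked by direct numerical integration for any symmetric density \(g\). The sketch below uses the standard normal density only as an illustration; the function names are placeholders.

```python
import numpy as np
from scipy.integrate import quad
from scipy.stats import norm

def u_k(x, k, g, mu):
    """Bilateral tail moment U_k(x) of order k for a symmetric pdf g (Eq. 11)."""
    value, _ = quad(lambda t: (t - mu) ** k * g(t), mu + x, np.inf)
    return 2.0 * value

# Example: U_1(1.0) for the standard normal density (mu = 0);
# by Eq. (18) below, this should equal 2*phi(1.0).
print(u_k(1.0, k=1, g=norm.pdf, mu=0.0), 2.0 * norm.pdf(1.0))
```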

Now, if we denote the corresponding moments for the distributions \(Y_1\) and \(Y_2\) by \(U_{k,1}(x)\) and \(U_{k,2}(x)\), it is verified that:

$$ U_k(x) = \gamma U_{k,1}(x) + (1 - \gamma) U_{k,2}(x). $$

(13)

Then, to calculate the function \(U_k(x)\), we just need to calculate the functions \(U_{k,1}(x)\) and \(U_{k,2}(x)\).

From the definition of \(f_2(x)\) and \(U_k(x)\), if \(\mu > \delta\):

$$ U_{k,2}(x) = 2 \int_{\mu+x}^{\mu+\delta} (t-\mu)^k \frac{1}{2\delta}\,dt = \left. \frac{(t-\mu)^{k+1}}{(k+1)\delta} \right|_{\mu+x}^{\mu+\delta} = \frac{\delta^{k+1} - x^{k+1}}{(k+1)\delta}, $$

(14)

then,

$$ U_{k,2}(x) = \begin{cases} \frac{\delta^{k+1} - x^{k+1}}{(k+1)\delta}, & 0 \le x \le \delta, \\ 0, & x > \delta. \end{cases} $$

(15)

From the definition of \(f_1(x)\) and \(U_k(x)\), we have:

$$ U_{k,1}(x) = \frac{2}{\sigma}\int_{\mu+x}^{\infty} (t-\mu)^k \, \phi\left( \frac{t-\mu}{\sigma} \right) dt. $$

(16)

Let the variable \(u\) be defined as \(u = \frac{t-\mu}{\sigma}\); then:

$$ U_{k,1}(x) = 2\int_{\frac{x}{\sigma}}^{\infty} (u\sigma)^k \phi(u)\,du = \sigma^k \, \Upsilon_k\!\left( \frac{x}{\sigma} \right), $$

(17)

where \(\Upsilon_k(x) = 2 \int_{x}^{\infty} u^k \phi(u)\,du\). \(\Upsilon_k(z)\) is the \(U_k\) function calculated for an \(N(0,1)\) distribution, which will then be evaluated for \(k = 1, 2, 3\).

The following equations are verified:

$$ \Upsilon_1(x) = 2 \int_{x}^{\infty} u\,\phi(u)\,du = 2\phi(x), $$

(18)

$$ \Upsilon_2(x) = 2 \int_{x}^{\infty} u^2\phi(u)\,du = 2\left( 1 - \Phi(x) + x\phi(x) \right), $$

(19)

$$ \Upsilon_3(x) = 2 \int_{x}^{\infty} u^3\phi(u)\,du = 2\left( 2 + x^2 \right)\phi(x), $$

(20)

where \(\Phi\) is the cumulative distribution function (CDF) of the \(N(0,1)\) distribution. The proof is included below.

The three equations can be obtained using integration by parts, but it is easier to differentiate the functions \(\Upsilon_k(x)\) to check the result. From the definition of the functions, for each value of \(k\), we have:

$$ \Upsilon_k'(x) = \frac{\partial \Upsilon_k(x)}{\partial x} = -2x^k\phi(x). $$

(21)

Taking into account that \(\frac{\partial \phi(x)}{\partial x} = -x\phi(x)\) and \(\frac{\partial \Phi(x)}{\partial x} = \phi(x)\):

$$ \frac{\partial\, 2\phi(x)}{\partial x} = -2x\phi(x) = \Upsilon_1'(x), $$

(22)

$$ \frac{\partial \left( 2\left( 1 - \Phi(x) + x\phi(x) \right) \right)}{\partial x} = 2\left( -\phi(x) + \phi(x) - x^2\phi(x) \right) = -2x^2\phi(x) = \Upsilon_2'(x), $$

(23)

$$ \frac{\partial \left( 2\left( 2 + x^2 \right)\phi(x) \right)}{\partial x} = 2\left( 2x\phi(x) - \left( 2 + x^2 \right)x\phi(x) \right) = -2x^3\phi(x) = \Upsilon_3'(x). $$

(24)

Therefore, the left and right sides of the previous equations differ by, at most, a constant. To verify that they are equal, we check the value \(x = 0\):

$$ \Upsilon_1(0) = 2 \int_0^{\infty} u\,\phi(u)\,du = \sqrt{\frac{2}{\pi}}, $$

(25)

$$ \Upsilon_2(0) = 2 \int_0^{\infty} u^2\phi(u)\,du = 1, $$

(26)

$$ \Upsilon_3(0) = 2 \int_0^{\infty} u^3\phi(u)\,du = 2\sqrt{\frac{2}{\pi}}, $$

(27)

which match the right-hand sides of Eqs. (18)-(20):

$$ \Upsilon_1(0) = 2\phi(0) = \sqrt{\frac{2}{\pi}}, $$

(28)

$$ \Upsilon_2(0) = 2\left( 1 - \Phi(0) \right) = 1, $$

(29)

$$ \Upsilon_3(0) = 2\,(2)\,\phi(0) = 2\sqrt{\frac{2}{\pi}}. $$

(30)
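As a numerical sanity check of the closed forms in Eqs. (18)-(20), they can be compared against direct quadrature; this check is illustrative only and is not part of the authors' derivation.

```python
import numpy as np
from scipy.integrate import quad
from scipy.stats import norm

def upsilon_numeric(x, k):
    """Upsilon_k(x) = 2 * integral from x to infinity of u^k * phi(u) du."""
    value, _ = quad(lambda u: u ** k * norm.pdf(u), x, np.inf)
    return 2.0 * value

def upsilon_closed(x, k):
    """Closed forms of Upsilon_k(x) from Eqs. (18)-(20)."""
    phi, Phi = norm.pdf(x), norm.cdf(x)
    if k == 1:
        return 2.0 * phi
    if k == 2:
        return 2.0 * (1.0 - Phi + x * phi)
    if k == 3:
        return 2.0 * (2.0 + x ** 2) * phi
    raise ValueError("closed form implemented only for k = 1, 2, 3")

for k in (1, 2, 3):
    print(k, upsilon_numeric(0.7, k), upsilon_closed(0.7, k))
```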

Substituting these results in Eq. (17) we have:

Read the original:

A mixed distribution to fix the threshold for Peak-Over-Threshold wave height estimation | Scientific Reports - Nature.com
