Category Archives: Machine Learning

Build private and secure enterprise generative AI apps with Amazon Q Business and AWS IAM Identity Center … – AWS Blog

As of April 30, 2024 Amazon Q Business is generally available. Amazon Q Business is a conversational assistant powered by generative artificial intelligence (AI) that enhances workforce productivity by answering questions and completing tasks based on information in your enterprise systems. Your employees can access enterprise content securely and privately using web applications built with Amazon Q Business. The success of these applications depends on two key factors: first, that an end-user of the application is only able to see responses generated from documents they have been granted access to, and second, that each users conversation history is private, secure, and accessible only to the user.

Amazon Q Business operationalizes this by validating the identity of the user every time they access the application so that the application can use the end-users identity to restrict tasks and answers to documents that the user has access to. This outcome is achieved with a combination of AWS IAM Identity Center and Amazon Q Business. IAM Identity Center stores the user identity, is the authoritative source of identity information for Amazon Q Business applications, and validates the users identity when they access an Amazon Q Business application. You can configure IAM Identity Center to use your enterprise identity provider (IdP)such as Okta or Microsoft Entra IDas the identity source. Amazon Q Business makes sure that access control lists (ACLs) for enterprise documents being indexed are matched to the user identities provided by IAM Identity Center, and that these ACLs are honored every time the application calls Amazon Q Business APIs to respond to user queries.

In this post, we show how IAM Identity Center acts as a gateway to steer user identities created by your enterprise IdP as the identity source, for Amazon Q Business, and how Amazon Q Business uses these identities to respond securely and confidentially to user queries. We use an example of a generative AI employee assistant built with Amazon Q Business, demonstrate how to set it up to only respond using enterprise content that each employee has permissions to access, and show how employees are able to converse securely and privately with this assistant.

The following diagram shows a high-level architecture of how the enterprise IdP, IAM Identity Center instance, and Amazon Q Business application interact with each other to enable an authenticated user to securely and privately interact with an Amazon Q Business application using an Amazon Q Business web experience from their web browser.

When using an external IdP such as Okta, users and groups are first provisioned in the IdP and then automatically synchronized with the IAM Identity Center instance using the SCIM protocol. When a user starts the Amazon Q Business web experience, they are authenticated with their IdP using single sign-on, and the tokens obtained from the IdP are used by Amazon Q Business to validate the user with IAM Identity Center. After validation, a chat session is started with the user.

The sample use case in this post uses an IAM Identity Center account instance with its identity source configured as Okta, which is used as the IdP. Then we ingest content from Atlassian Confluence. The Amazon Q Business built-in connector for Confluence ingests the local users and groups configured in Confluence, as well as ACLs for the spaces and documents, to the Amazon Q Business application index. These users from the data source are matched with the users configured in the IAM Identity Center instance, and aliases are created in Amazon Q Business User Store for correct ACL enforcement.

To implement this solution for the sample use case of this post, you need an IAM Identity Center instance and Okta identity provider as identity source. We provide more information about these resources in this section.

An Amazon Q Business application requires an IAM Identity Center instance to be associated with it. There are two types of IAM Identity Center instances: an organization instance and an account instance. Amazon Q Business applications can work with either type of instance. These instances store the user identities that are created by an IdP, as well as the groups to which the users belong.

For production use cases, an IAM Identity Center organization instance is recommended. The advantage of an organization instance is that it can be used by an Amazon Q Business application in any AWS account in AWS Organizations, and you only pay once for a user in your company, if you have multiple Amazon Q Business applications spread across several AWS accounts and you use organization instance. Many AWS enterprise customers use Organizations, and have IAM Identity Center organization instances associated with them.

For proof of concept and departmental use cases, or in situations when an AWS account is not part of an AWS Organization and you dont want to create a new AWS organization, you can use an IAM Identity Center account instance to enable an Amazon Q Business application. In this case, only the Amazon Q Business application configured in the AWS account in which the account instance is created will be able to use that instance.

Amazon Q Business implements a per-user subscription fee. A user is billed only one time if they are uniquely identifiable across different accounts and different Amazon Q Business applications. For example, if multiple Amazon Q Business applications are within a single AWS account, a user that is uniquely identified by an IAM Identity Center instance tied to this account will only be billed one time for using these applications. If your organization has two accounts, and you have an organization-level IAM Identity Center instance, a user who is uniquely identified in the organization-level instance will be billed only one time even though they access applications in both accounts. However, if you have two account-level IAM Identity Center instances, a user in one account cant be identified as the same user in another account because there is no central identity. This means that the same user will be billed twice. We therefore recommend using organization-level IAM Identity Center instances for production use cases to optimize costs.

In both these cases, the Amazon Q Business application needs to be in the same AWS Region as the IAM Identity Center instance.

If you already use an IdP such as Okta or Entra ID, you can continue to use your preferred IdP with Amazon Q Business applications. In this case, the IAM Identity Center instance is configured to use the IdP as its identity source. The users and user groups from the IdP can be automatically synced to the IAM Identity Center instance using SCIM. Many AWS enterprise customers already have this configured for their IAM Identity Center organization instance. For more information about all the supported IdPs, see Getting started tutorials. The process is similar for IAM Identity Center organization instances and account instances.

The following screenshot shows the IAM Identity Center application configured in Okta, and the users and groups from the Okta configuration assigned to this application.

The following screenshot shows the IAM Identity Center instance user store after configuring Okta as the identity source. Here the user and group information is automatically provisioned (synchronized) from Okta into IAM Identity Center using the System for Cross-domain Identity Management (SCIM) v2.0 protocol.

Complete the following steps to create an Amazon Q Business application and enable IAM Identity Center:

For more information about Amazon Q Business retrievers, refer to Creating and selecting a retriever for an Amazon Q Business application.

The following instructions demonstrate how to configure the Confluence data source. These may differ for other data sources.

After the application is created, you will see the application settings page, as shown in the following screenshot.

To illustrate how you can build a secure and private generative AI assistant for your employees using Amazon Q Business applications, lets take a sample use case of an employee AI assistant in an enterprise corporation. Two new employees, Mateo Jackson and Mary Major, have joined the company on two different projects, and have finished their employee orientation. They have been given corporate laptops, and their accounts are provisioned in the corporate IdP. They have been told to get help from the employee AI assistant for any questions related to their new team member activities and their benefits.

The company uses Confluence to manage their enterprise content. The sample Amazon Q application used to run the scenarios for this post is configured with a data source using the built-in connector for Confluence to index the enterprise Confluence spaces used by employees. The example uses three Confluence spaces: AnyOrgApp Project, ACME Project Space, and AJ-DEMO-HR-SPACE. The access permissions for these spaces are as follows:

Lets look at how Mateo and Mary experience their employee AI assistant.

Both are provided with the URL of the employee AI assistant web experience. They use the URL and sign in to the IdP from the browsers of their laptops. Mateo and Mary both want to know about their new team member activities and their fellow team members. They ask the same questions to the employee AI assistant but get different responses, because each has access to separate projects. In the following screenshots, the browser window on the left is for Mateo Jackson and the one on the right is for Mary Major. Mateo gets information about the AnyOrgApp project and Mary gets information about the ACME project.

Mateo chooses Sources under the question about team members to take a closer look at the team member information, and Mary choosing Sources under the question for new team member onboarding activities. The following screenshots show their updated views.

Mateo and Mary want to find out more about the benefits their new job offers and how the benefits are applicable to their personal and family situations.

The following screenshot shows that Mary asks the employee AI assistant questions about her benefits and eligibility.

Mary can also refer to the source documents.

The following screenshot shows that Mateo asks the employee AI assistant different questions about his eligibility.

Mateo looks at the following source documents.

Both Mary and Mateo first want to know their eligibility for benefits. But after that, they have different questions to ask. Even though the benefits-related documents are accessible by both Mary and Mateo, their conversations with employee AI assistant are private and personal. The assurance that their conversation history is private and cant be seen by any other user is critical for the success of a generative AI employee productivity assistant.

If you created a new Amazon Q Business application to try out the integration with IAM Identity Center, and dont plan to use it further, unsubscribe and remove assigned users from the application and delete it so that your AWS account does not accumulate costs.

To unsubscribe and remove users go to the application details page and select Manage access and subscriptions.

Select all the users, and then use the Edit button to choose Unsubscribe and remove as shown below.

Delete the application after removing the users, going back to the application details page and selecting Delete.

For enterprise generative AI assistants such as the one shown in this post to be successful, they must respect access control as well as assure the privacy and confidentiality of every employee. Amazon Q Business and IAM Identity Center provide a solution that authenticates each user and validates the user identity at each step to enforce access control along with privacy and confidentiality.

To achieve this, IAM Identity Center acts as a gateway to sync user and group identities from an IdP (such as Okta), and Amazon Q Business uses IAM Identity Center-provided identities to uniquely identify a user of an Amazon Q Business application (in this case, an employee AI assistant). Document ACLs and local users set up in the data source (such as Confluence) are matched up with the user and group identities provided by IAM Identity Center. At query time, Amazon Q Business answers questions from users utilizing only those documents that they are provided access to by the document ACLs.

If you want to know more, take a look at the Amazon Q Business launch blog post on AWS News Blog, and refer to Amazon Q Business User Guide. For more information on IAM Identity Center, refer to the AWS IAM Identity Center User Guide.

Abhinav Jawadekar is a Principal Solutions Architect in the Amazon Q Business service team at AWS. Abhinav works with AWS customers and partners to help them build generative AI solutions on AWS.

Venky Nagapudi is a Senior Manager of Product Management for Q Business, Amazon Comprehend and Amazon Translate. His focus areas on Q Business include user identity management, and using offline intelligence from documents to improve Q Business accuracy and helpfulness.

Originally posted here:
Build private and secure enterprise generative AI apps with Amazon Q Business and AWS IAM Identity Center ... - AWS Blog

Snapchat introduces augmented reality (AR) & machine learning tools catered to brands and advertisers – afaqs!

Snapchat has introduced AR Extensions, enabling advertisers to embed AR Lenses and filters across the platform's diverse ad formats. This integration extends to Dynamic Product Ads, Snap Ads, Collection Ads, Commercials, and Spotlight Ads, offering advertisers a comprehensive toolkit to engage users with immersive AR experiences.

The company also revealed its upcoming launch of a sports channel on Snapchat called the 'Snap Sports Network'. This channel will spotlight sports such as dog surfing, extreme ironing, water bottle flipping, and more. The channel will be hosted by Snap Stars.

Additionally, Snapchat is broadening its collaboration with Live Nation through the introduction of a new Snap Nation Public Profile. This profile will showcase exclusive behind-the-scenes content from concerts, expanding the platform's engagement with the music industry. The company plans to curate stories from Live Nation concerts and festivals, integrating public posts from users to enhance the overall experience.

Originally posted here:
Snapchat introduces augmented reality (AR) & machine learning tools catered to brands and advertisers - afaqs!

Pico Launches Machine Learning and AI Capabilities in Corvil Analytics 10.0 Software Release – Yahoo Finance

Pico

NEW YORK, April 23, 2024 (GLOBE NEWSWIRE) -- Pico, a leading global provider of mission-critical technology services, software, data and analytics for the financial markets community, today announced the general availability of its Corvil Analytics 10.0 software release. This latest version leverages groundbreaking internal research in machine learning (ML) and artificial intelligence (AI) to enable proactive notification and natural language descriptions, correlating performance-impacting events, unusual events and extreme events that influence trading outcomes and infrastructure performance.

Corvil Analytics is widely deployed across the financial services community, delivering crucial performance and actionable business insights for trading infrastructure and IT operations and teams focused on trade reconciliation and regulatory compliance. Additionally, data scientists and quantitative analysts are equipped with advanced tools for in-depth data analysis and operational support. A single Corvil Analytics appliance deployed in these environments scales to provide up to 7.5 million data points every day. Corvil Analytics 10.0 applies our research into ML / AI techniques relevant to the data patterns, volume of analytics and the performance challenges in the financial services sector. The resulting innovation is a new capability in 10.0 that automatically identifies, correlates and narrates the underlying cause of the most significant business-impacting events.

The Corvil Analytics 10.0 release represents a significant milestone in our continuous research and innovation on the platform. Years of research in ML / AI techniques by our data science team has delivered the capability to automatically detect business-impacting events in trading infrastructure and corporate IT infrastructure, said Ken Jinks, Managing Director, Product Management at Pico. And proactively detecting these events in real-time is only the first step. Highlighting other correlated events and describing the events in natural language enables all users to quickly understand and communicate root cause analysis to all stakeholders, resulting in decisive corrective action.

Corvil Analytics 10.0 is a major software release that also introduces:

Enhanced User Experience Smarter analytics tooltips, flexible comparison of business time periods, and a new customer portal that offers access to software updates, documentation, support tickets, webinars, and knowledge base articles.

Reduced Cost of Ownership Enhanced configuration capabilities enable easy Corvil setup, higher accuracy, and lower cost of ownership.

Advanced Timestamp Options Corvil now supports start-of-frame timestamps addressing the need of trading applications where specific timing is critical for real-time performance analytics.

Story continues

Corvil Analytics is trusted by the worlds largest banks, exchanges, electronic market makers, quantitative hedge funds, data service providers and brokers. With a twenty-plus-year legacy, the Corvil Analytics platform continues to improve its ability to extract and correlate technology and transaction performance intelligence from dynamic network environments. This release of Corvil Analytics 10.0 continues our investment in the platform, focusing on the unique patterns of data in the financial services markets to intelligently identify events of interest, improve the user experience, and lower the cost of ownership/configuration in these complex environments.

The Corvil Analytics 10.0 release will be available to download and deploy starting May 1st via the Pico Client Portal.

Register nowto learn more about Corvil Analytics 10.0 in an upcoming webinar hosted by Pico on May 2, 2024 at 10:00am EDT | 3:00pm BST.

About Pico Pico is a leading global provider of technology services for the financial markets community. Picos technology and services power mission-critical systems for global banks, exchanges, electronic trading firms, quantitative hedge funds, and financial technology service providers. Pico provides a best-in-class portfolio of innovative, transparent, low-latency markets solutions coupled with an agile and expert service delivery model. Instant access to financial markets is provided via PicoNet, a globally comprehensive network platform instrumented natively with Corvil to generate analytics and telemetry. Clients choose Pico when they want the freedom to move fast and create an operational edge in the fast-paced world of financial markets.

To learn more about Pico, please visit https://www.pico.net

Contact info: Pico Press Office pr@pico.net

See the rest here:
Pico Launches Machine Learning and AI Capabilities in Corvil Analytics 10.0 Software Release - Yahoo Finance

Machine learning approach predicts heart failure outcome risk – HealthITAnalytics.com

April 22, 2024 -Researchers from the University of Virginia (UVA) have developed a machine learning tool designed to assess and predict adverse outcome risks for patients with advanced heart failure with reduced ejection fraction (HFrEF), according to a recent study published in the American Heart Journal.

The research team indicated that risk models for HFrEF exist, but few are capable of addressing the challenge of missing data or incorporating invasive hemodynamic data, limiting their ability to provide personalized risk assessments for heart failure patients.

Heart failure is a progressive condition that affects not only quality of life but quantity as well, explained Sula Mazimba, MD, an associate professor of medicine at UVA and cardiologist at UVA Health, in the news release. "All heart failure patients are not the same. Each patient is on a spectrum along the continuum of risk of suffering adverse outcomes. Identifying the degree of risk for each patient promises to help clinicians tailor therapies to improve outcomes.

Outcomes like weakness, fatigue, swollen extremities and death are of particular concern for heart failure patients, and the risk model is designed to stratify the risk of these events.

The tool was built using anonymized data pulled from thousands of patients enrolled in heart failure clinical trials funded by the National Institutes of Health (NIH) National Heart, Lung and Blood Institute (NHLBI).

Patients in the training and validation cohorts were categorized into five risk groups based on left ventricular assist device (LVAD) implantation or transplantation, rehospitalization within six months of follow-up and death, if applicable.

To make the model robust in the presence of missing data, the researchers trained it to predict patients risk categories using either invasive hemodynamics alone or a feature set incorporating noninvasive hemodynamics data.

Prediction accuracy for each category was determined separately using area under the curve (AUC).

Overall, the model achieved high performance across all five categories. The AUCs ranged from 0.896 +/- 0.074 to 0.969 +/- 0.081 for the invasive hemodynamics feature set and 0.858 +/- 0.067 to 0.997 +/- 0.070 for the set incorporating all features.

The research team underscored that the inclusion of hemodynamic data significantly aided the models performance.

This model presents a breakthrough because it ingests complex sets of data and can make decisions even among missing and conflicting factors, said Josephine Lamp, a doctoral researcher in the UVA School of Engineerings Department of Computer Science. It is really exciting because the model intelligently presents and summarizes risk factors reducing decision burden so clinicians can quickly make treatment decisions.

The researchers have made their tool freely available online for researchers and clinicians in the hopes of driving personalized heart failure care.

In pursuit of personalized and precision medicine, other institutions are also turning to machine learning.

Last week, a research team from Clemson University shared how a deep learning tool can help researchers better understand how gene-regulatory network (GRN) interactions impact individual drug response.

GRNs map the interactions between genes, proteins and other elements. These insights are crucial for exploring how genetic variations influence a patients phenotypes such as drug response. However, many genetic variants linked to disease are in areas of DNA that dont directly code for proteins, creating a challenge for those investigating the role of these variants in individual health.

The deep learning-based Lifelong Neural Network for Gene Regulation (LINGER) tool helps address this by using single-cell multiome data to predict how GRNs work, which can shed light on disease drivers and drug efficacy.

Read the original post:
Machine learning approach predicts heart failure outcome risk - HealthITAnalytics.com

The best free AI courses (and whether AI ‘micro-degrees’ and certificates are worth it) – ZDNet

Generative AI is an astonishing technology that is not only here to stay but will impact all sectors of work and business, and it has already made unprecedented inroads into our daily lives.

We all have a lot to learn about it. Spewing out a few prompts to ChatGPT may be easy, but before you can turn all these new capabilities into productive tools, you need to grow your skills. Fortunately, there are a wide range of classes that can help.

Also: Want to work in AI? How to pivot your career in 5 steps

Many companies and schools will try to sell you on their AI education programs. But as I'll show in the following compendium of great resources, you can learn a ton about AI and even get some certifications -- all for free.

Speaking personally, I have to say that this has been really cool. I don't normally get a lot of time to hang out and watch stuff. But because I've been going hands-on with AI for you all here on ZDNET, I've had the excuse opportunity to watch a bunch of these videos. Sitting there on the couch, cup of coffee in one hand, doggo on the lap, and able to legitimately claim, "I'm working."

Also:The best AI image generators: Tested and reviewed

I have taken at least one class from each of the providers below, and they've all been pretty good. Obviously, some teachers are more compelling than others, but it's been a very helpful process. When working on AI projects for ZDNET, I've also sometimes gone back and taken other classes to shore up my knowledge and understanding.

So, I recommend you take a quick spin through my short reviews, possibly dig deeper into the linked-to articles, and bookmark all of these, because they're valuable resources. Let's get started.

LinkedIn Learning is one of the oldest online learning platforms, established in 1995 as Lynda.com. The company offers an enormous library of courses on a broad range of topics. There is a monthly fee, but many companies and schools have accounts for all their employees and students.

Also: Two powerful LinkedIn Premium features that make the subscription worth it

LinkedIn Learning (and Lynda.com, which is what it started) has probably been the one online education site I've used more than any other. I've used it regularly since at least the end of the 1990s. For years, I paid for a membership. Then I got a membership as an alum of my grad school, which is how I use it now. So many courses on so many topics, it's a great go-to learning resource.

I took two classes on LinkedIn Learning. Here's my testimonial on one of them.

I also took the two-hour Machine Learning with Python: Foundations course, which had a great instructor -- Prof. Frederick Nwanganga -- who was previously unknown to me. I have to hand it to LinkedIn. They choose people who know how to teach.

I learned a lot in this course, especially about how to collect and prepare data for machine learning. I also was able to stretch my Python programming knowledge, specifically about how a machine learning model can be built in Python. In just two hours, I felt like I got a friendly and comprehensive brain dump.

You can read more here:How LinkedIn's free AI course made me a better Python developer.

Since there are so many AI courses, you're bound to find a helpful series. To get you started, I've picked three that might open some doors:

It's worth checking with your employer, agency, or school to see if you qualify for a free membership. Otherwise, you can pay by month or year (the by-year option is about half price).

Amazon puts the demand in infrastructure on demand. Rather than building out their own infrastructure, many companies now rely on Amazon to provide scalable cloud infrastructure on demand. Nearly every aspect of IT technology is available for rent from Amazon's wide range of web services. This also includes a fairly large spectrum of individual AI services from computer vision to human-sounding speech to Bedrock, which "makes LLMs from Amazon and leading AI startups available through an API."

Amazon also offers a wide range of training courses for all these services. Some of them are available for free, while others are available via a paid subscription. Here are three of the free courses you can try out:

In addition to classes located on Amazon's sites, they also have quite a few classes on YouTube. I spent a fun and interesting weekend gobbling up theGenerative AI Foundations series, which is an entire playlist of cool stuff to learn about AI.

If you're using or even just considering AWS-based services, these courses are well worth your time.

IBM, of course, is IBM. It led the AI pack for years with its Watson offerings. Its generative AI solution is called Watsonx. It focuses on enabling businesses to deploy and manage both traditional machine learning and generative AI, tailored to their unique needs.

Also:Have 10 hours? IBM will train you in AI fundamentals - for free

The company's SkillsBuild Learning classes offer a lot, providing basic training for a few key IT job descriptions -- including cybersecurity specialist, data analyst, user experience designer, and more.

Right now, there's only one free AI credential, but it's one that excited a lot of our readers. That's the AI Fundamentals learning credential, which offers six courses. You need to be logged in to follow the link. But registration is easy and free. When you're done, you get an official credential, which you can list on LinkedIn. After I took the course, I did just that -- and, of course, I documented it for you:

See: How to add a new credential to your LinkedIn profile, and why you should

My favorite was the AI Ethics class, which is an hour and 45 minutes. Through real-world examples you'll learn about AI ethics, how they are implemented, and why AI ethics are so important in building trustworthy AI systems.

DeepLearning is an education-focused company specializing in AI training. The company is constantly adding new courses that provide training, mostly for developers, in many different facets of AI technology. It partnered with OpenAI (the makers of ChatGPT) to create a number of pretty great courses.

I took the ChatGPT Prompt Engineering for Developers course below, which was my first detailed introduction to the ChatGPT API. If you're at all interested in how coders can use LLMs like ChatGPT, this course is worth your time. Just the interspersing of traditional code with detailed prompts that look more like comments than commands can help you get your head around integrating these two very different styles of coding.

Read more: I took this free AI course for developers in one weekend and highly recommend it

Three courses I recommend you check out are:

With AI such a hot growth area, I never cease to be amazed at the vast quantity of high-value courseware available for free. Definitely bookmark DeepLearning and keep checking back as it adds more courses.

Udemy is a courseware aggregator that publishes courses produced by individual trainers. That makes course style and quality a little inconsistent, but the rating system does help the more outstanding trainers rise to the top. Udemy has a free trial, which is why it's on this list.

Read more:I'm a ChatGPT pro but this quick course taught me new tricks, and you can take it for free

I spent some time in Steve Ballinger'sComplete ChatGPT Course For Work 2023 (Ethically)! and found it quite helpful. Clocking in at a little over two hours, it helps you understand how to balance ChatGPT with your work processes, while keeping in mind the ethics and issues that arise from using AI at work.

It also sells a $20/mo all-you-can-eat plan, as well as its courses individually. I honestly can't see why anyone would buy the courses individually, since most of them cost more for one course than the entire library does on a subscription.

Also:I'm taking AI image courses for free on Udemy with this little trick - and you can too

Here are three courses you might want to check out:

One of the more interesting aspects of Udemy is that you may find courses on very niche applications of AI, which might not suit vendors offering a more limited selection of mainstream courses. If you have a unique application need, don't hesitate to spend some extra time searching for just the right course.

Google's Grow With Google program offers a fairly wide range of certificate programs, which are normally run through Coursera. Earning one of those certificates often requires paying a subscription fee. But we're specifically interested in one Grow With Google program, which is aimed at teachers, and does not involve any fees.

The Generative AI for Educators class, developed in concert with MIT's Responsible AI for Social Empowerment and Education, is a 2-hour program designed to help teachers learn about generative AI, and how to use it in the classroom.

Also:Google and MIT launch a free generative AI course for teachers

Generative AI is a big challenge in education because it can provide amazing support for students and teachers and, unfortunately, provide an easy way out for students to cheat on their assignments. So a course that can help teachers come up to speed on all the issues can be very powerful.

The course provides a professional development certificate on completion, and this one is free.

I've been working with AI for a very long time. I conducted one of the first-ever academic studies of AI ethics as a thesis project way back in the day. I created and launched an expert system development environment before the first link was connected on the World Wide Web. I did some of the first research of AI on RISC-based computing architectures (the chips in your phone) when RISC processors were the size of refrigerators.

Also:Six skills you need to become an AI prompt engineer

When it comes to the courses and programs I'm spotlighting here, there's no way I could take all of them. But I have taken at least one course from each vendor, in order to test them out and report back to you. And, given my long background in the world of AI, this is a topic that has fascinated and enthralled me for most of my academic and professional career.

With all that, I will say that the absolute high point was when I could get an AI to talk like a pirate.

Let's be clear: A micro-degree is not a degree. It's a set of courses with a marketing name attached. Degrees are granted by accredited academic institutions, accredited by regional accrediting bodies. I'm not saying you won't learn anything in those programs. But they're not degrees and they may cost more than just-as-good courses that don't have a fancy marketing name attached.

Yes, but how much value they have depends on your prospective employer's perspective. A certificate says you completed some course of study successfully. That might be something of value to you, as well. You can set a goal to learn a topic, and if you get a credential, you can be fairly confident you achieved some learning. Accredited degrees, by contrast, are an assurance that you not only learned the material, but did so according to some level of standard and rigor common to other accredited institutions.

Also:How to write better ChatGPT prompts in 5 steps

My advice: If you can get a certificate, and the price for getting it doesn't overly stretch your budget, go ahead and get it. It still is a resume point. But don't fork over bucks on the scale of a college tuition for some promise that you'll get qualified for a job faster and easier than, you know, going to college.

See original here:
The best free AI courses (and whether AI 'micro-degrees' and certificates are worth it) - ZDNet

Microsoft wants to bolster the manufacturing process of future Surface devices with AI and machine learning – Windows Central

What you need to know

Microsoft is seemingly placing all its bets on generative AI. As you might have noticed, the tech giant has ramped up its efforts in the category and virtually integrated the technology across most of its products and services.

Now, the company shared a detailed blog post highlighting how its Microsoft Surface and Azure team used Azure's high-performance computing technology to revolutionize the product design process of manufacturing Surface products while simultaneously saving time and cost.

According to Microsoft's Principal Engineer and structural designer, Prasad Raghavendra, the company integrated Abaqus, "a Finite Element Analysis (FEA) software," into Azure HPC in 2016. Abaqus helped the company solve many issues and fully transition "product-level structural simulations for Surface Pro 4 and the original Surface laptopto Azure HPC from on-premises servers."

Raghavendra indicates the availability of Azure HPC for structural simulations using Abaqus, which has completely revolutionized the product design process for Surface devices. It translated design concepts created in digital computer-aided design (CAD) systems into the FEA model.

This made it easier for analysts to use FEA models to run numerous tests in different reliability conditions in a virtual environment rather than physically going through the entire process step-by-step. Consequently, the team ran hundreds of simulations to determine the feasibility of proposed design ideas and solutions. This ability made narrowing down potential design ideas easier, which were turned into prototypes for further scrutiny.

Reliability and customer satisfaction remain a top priority for the Microsoft Surface team. To scale greater heights, Microsoft intends to continue using digital prototypes (FEA model) for simulation runs on Azure HPC clusters. Microsoft seeks to leverage machine learning and AI in product manufacturing and developing future Surface devices.

Microsoft unveiled its new lineup of business-focusedSurface devices in March, including the Surface Pro 10 and Surface Laptop 6. The entries will ship with Intel Core Ultra, new NPUs, and display upgrades. The company is potentially leaning toward AI PCs featuring a dedicated Copilot button.

All the latest news, reviews, and guides for Windows and Xbox diehards.

That aside, Microsoft's Windows and Surface engineering department has a new boss. When Panos Panay left the company and later joined Amazon, his role split into two. Pavan Davuluri took over the Surface wing, while Mikhail Parakhin handled everything Windows-related.

However, normalcy seems to have been restored at the company. Pavan Davuluri is now in charge of both Windows and Surface engineering. Microsoft also started selling replacement parts for Surface PCs, including screens, kickstands, batteries, SSDs, and more directly from the Microsoft Store.This strategy is designed to improve the repairability of Surface devices.

Continued here:
Microsoft wants to bolster the manufacturing process of future Surface devices with AI and machine learning - Windows Central

Machine-learning prediction of a novel diagnostic model using mitochondria-related genes for patients with bladder … – Nature.com

The diagnosis of BC represents a pivotal medical challenge, encompassing the application of various methods29. Presently, diagnostic approaches for BC include clinical symptom analysis, urine testing, imaging examinations, and tissue biopsies30,31. Nonetheless, these methods exhibit limitations in terms of early detection, accuracy, and invasiveness. While clinical symptom analysis and urine testing can capture potential BC symptoms and cellular information, their specificity and sensitivity need improvement to mitigate the risk of misdiagnosis or missed diagnosis. Although imaging techniques offer insights into tumor location and size, their efficacy in detecting early lesions remains constrained, often demanding prolonged time and considerable costs. Conversely, tissue biopsies, the "gold standard" for diagnosing BC, entail invasive procedures that cause patient discomfort and carry risks of complications. Furthermore, a reliable non-invasive method for early BC screening is lacking32,33. Hence, a pressing need arises to research and develop innovative technologies and methods, such as the integration of machine learning with transcriptome sequencing. This integration holds the promise to enhance the accuracy and early detection rate of BC diagnosis, ultimately offering improved medical services to patients. Overall, while the field of BC diagnosis confronts several challenges, it concurrently provides an opportunity to explore inventive diagnostic strategies and methodologies. Thus, identification of novel sensitive biomarker is very important the clinical prognosis of BC patients.

The critical role of mitochondria within cells goes beyond energy production, encompassing various biological processes, including cell survival, apoptosis, and signal transduction. Consequently, mitochondria may play a central role in tumor development, including BC. Several studies suggest a potential link between mitochondrial dysfunction and BC34,35. Tumor tissues from BC patients might exhibit abnormalities in mitochondrial function, including mitochondrial DNA mutations, alterations in mitochondrial membrane potential, and increased oxidative stress. These alterations have the potential to impair mitochondrial energy production and disrupt apoptotic pathways, thereby promoting the survival and proliferation of cancer cells. Moreover, BC progression is intricately connected to changes in metabolic pathways, which may also be associated with mitochondrial dysfunction. Some research suggests that tumor cells tend to favor glycolysis for energy production over oxidative phosphorylation. This shift in metabolic pathways, known as the " Warburg effect," could be influenced by changes in mitochondrial function6,36. In this study, we analyzed GSE13507 datasets and identified 752 DE-MRGs in BC patients. Through functional correlation analysis of 752 DE-MRGs, we have revealed their potential roles in the progression of BC. The analysis results indicated that these DE-MRGs were primarily involved in biological processes related to pattern specification, cell fate commitment, and transcription regulator complexes, which are closely associated with cell development and gene regulation. Additionally, KEGG pathway analysis has uncovered associations between these genes and neurodegenerative diseases (such as Huntington's disease, Parkinson's disease, Alzheimer's disease), cellular energy metabolism (oxidative phosphorylation), as well as metabolic pathways (such as valine, leucine, and isoleucine degradation, and the citrate cycle). Furthermore, the DO analysis indicated a correlation between these DE-MRGs and diseases such as muscular disorders, myopathy, muscle tissue diseases, and inherited metabolic disorders. In conclusion, the 752 DE-MRGs may participate in diverse biological processes and pathways during the progression of BC. These processes encompass cell development, gene regulation, energy metabolism, and neurodegenerative diseases. These findings suggested the intricate involvement of these genes in BC development, potentially influencing tumor growth, progression, metabolic anomalies, and associations with other diseases.

Machine learning combined with transcriptomic data offers several advantages in the screening of tumor biomarkers compared to traditional methods37. First, machine learning can handle high-dimensional transcriptomic data by extracting essential features to accurately identify gene expression patterns relevant to tumors. Second, machine learning can capture intricate nonlinear relationships and interactions among genes, unveiling molecular mechanisms underlying tumor development, which traditional methods may overlook. Moreover, machine learning enables personalized biomarker selection, tailoring diagnostic and treatment plans based on patients' transcriptomic data, thus enhancing precision38,39. In the realm of large-scale data analysis, machine learning's efficient processing capabilities are better equipped to uncover crucial information hidden within extensive datasets, providing timely decision support. Simultaneously, machine learning techniques rapidly generate predictive models, expediting decision-making processes with increased efficiency compared to traditional methods. Furthermore, machine learning can uncover novel biological insights, offering clues to new mechanisms of tumor development and guiding further research and therapeutic strategies40,41. Overall, the amalgamation of machine learning and transcriptomic data in tumor biomarker screening offers advantages by delivering more accurate, comprehensive, and personalized information, thereby revolutionizing tumor diagnosis and treatment. In this study, we performed LASSO and SVM-RFE, and identified four critical diagnostic genes, including GLRX2, NMT1, OXSM and TRAF3IP3. Then, we used the above four genes and developed a novel diagnostic model. Its diagnostic value was further confirmed in GSE13507, GSE3167 and GSE37816 datasets. For BC, these findings hold significant clinical implications and potential application value. Firstly, the identification of these four diagnostic genes suggests their potential pivotal role in early detection and confirmation of BC. Secondly, the development of a novel diagnostic model held the promise of providing a more precise and reliable means of diagnosing BC, thus aiding healthcare professionals in better assessing disease progression and treatment strategies. Furthermore, our findings offered substantial support for the investigation of the molecular mechanisms underpinning BC. It has the potential to uncover the latent mechanistic roles of these diagnostic genes in the progression of BC. In summary, this research paved the way for new approaches to early detection and diagnosis of BC, providing valuable insights for the advancement of precision medicine and personalized treatment.

GLRX2 is a protein closely associated with mitochondrial function and redox balance. It belongs to the glutaredoxin family of proteins, whose members exhibit redox activity within cells, aiding in the maintenance of cellular redox states and thereby sustaining normal biological functions42,43. GLRX2 is primarily localized within mitochondria, allowing it to play a crucial role in regulating mitochondrial redox balance and other mitochondrial functions. Its structural features enable it to switch between oxidized and reduced forms, participating in redox reactions. As a member of the mitochondrial glutaredoxin family, GLRX2 is involved in ensuring proper protein folding, redox state, and related biological functions within mitochondria, contributing to the maintenance of normal mitochondrial functions, including energy production processes such as ATP synthesis44,45. To data, the potential function of GLRX2 in BC was rarely reported. In this study, we found that GLRX2 was highly expressed in BC specimens. The low GLRX2 expression group exhibited an activation trend in several biological processes and diseases, including asthma, drug metabolism (via the cytochrome P450 pathway), IgA production in the intestinal immune network, xenobiotic metabolism (via the cytochrome P450 pathway), systemic lupus erythematosus, and viral myocarditis. These findings suggested a potential association between low GLRX2 expression and the aberrant activation of these biological processes, as well as the development of multiple diseases. However, further research was required to confirm specific mechanisms and interrelationships. These discoveries contributed to a deeper understanding of GLRX2's roles in biology and disease development. In addition, we found that the levels of GLRX2 were positively associated with NK cells activated and Plasma cells. The study found a positive connection between GLRX2 levels and activated NK cells as well as plasma cells, suggesting that GLRX2 might play a role in boosting NK cell activity and contributing to immune responses. Additionally, the link between GLRX2 and plasma cells hinted at its potential involvement in regulating immune reactions and inflammation. These findings could point towards GLRX2 as a potential biomarker for monitoring immune system activity and response. Further research was needed to fully comprehend the mechanisms underlying these associations.

TRAF3IP3 is a gene that encodes a protein which plays a significant role in various cellular functions, including signal transduction, apoptosis (programmed cell death), and inflammation in biological processes46,47. AIP1 typically interacts with proteins like TRAF3 (Tumor Necrosis Factor Receptor-Associated Factor 3) and RIP1 (Receptor-Interacting Protein 1), participating in the regulation of multiple signaling pathways. Among these, TRAF3 is a signaling molecule that plays a critical role in immune responses mediated by Toll-like receptors, RIG-I-like receptors, and other receptors. AIP1's interaction with TRAF3 may play an important role in regulating these immune signaling pathways48,49. Furthermore, AIP1 is believed to have a significant role in the pathway of apoptosis. Apoptosis is a programmed cell death that cells regulate to maintain the normal development and function of tissues and organs. AIP1 may influence intracellular signal transduction and impact the regulation of apoptotic pathways. In recent years, several studies have reported the potential function of TRAF3IP3 in several types of tumors. For instance, Lin et al. reported that high TRAF3IP3 levels in glioma are linked to poorer survival, possibly due to its role in promoting glioma growth through ERK signaling. TRAF3IP3 might serve as a prognostic biomarker for glioma50. However, the function of TRAF3IP3 in BC has not been investigated. In this study, we observed that TRAF3IP3 expression was distinctly decreased in BC specimens suggesting it as a tumor promotor in BC. Moreover, we found that TRAF3IP3 may play a role in regulating immune responses, antigen processing and presentation, cell adhesion, and chemokine signaling. These findings indicated that TRAF3IP3 could have significant functions in modulating immunity and cellular communication during the development of BC.

NMT1 is a gene that encodes a protein. The protein encoded by NMT1 plays a crucial role in cellular processes involving protein modification and signal transmission51,52. Belonging to the acyltransferase enzyme family, the protein produced by NMT1 is primarily responsible for attaching myristic acid molecules to amino acid residues of other proteins, a process known as N-myristoylation. This common cellular protein modification, N-myristoylation, affects protein localization, interactions, and function. Specifically, NMT1 catalyzes the N-myristoylation reaction, linking myristic acid molecules to amino acid residues of target proteins. This modification can impact various cellular processes, including signal transduction, apoptosis, and proteinprotein interactions53,54. NMT1's role in these processes is likely associated with regulating the function, stability, and localization of specific proteins. Previously, several studies have reported that NMT1 served as a tumor promotor in several tumors. For instance, Deng et al. showed that blocking N-myristoyltransferase at the genetic level breast cancer cell proliferation, migration, and invasion were all inhibited by NMT1 through the stress-activated c-Jun N-terminal kinase pathway55. In BC, elevated NMT1 expression was found to be inversely correlated with overall survival, indicating that NMT1 overexpression is associated with a poor prognosis. Moreover, increased levels of NMT1 were observed to facilitate cancer progression while simultaneously inhibiting autophagy both in vitro and in vivo56. Based on our findings, a comprehensive analysis suggested that NMT1 may have a multifaceted role in BC. Elevated NMT1 expression could be linked to interactions involving the extracellular matrix and neuroactive ligand receptor pathways, implying a potential involvement of NMT1 in tumor cell interactions with the extracellular matrix and neuro-pathways. Conversely, reduced NMT1 expression may relate to metabolic pathways (ascorbate and aldarate metabolism, starch and sucrose metabolism) and the TGF-beta signaling pathway, indicating that NMT1 might influence tumor cell metabolism and growth regulation. In this study, we also found that NMT1 was highly expressed in BC specimens and its knockdown suppressed the proliferation of BC cells, which was consistent with previous findings.

However, there were several limitations in this study. Firstly, the GEO datasets were the primary resources for our clinical data. The majority of its patients are either White, Black, or Latinx. Our results should not be generalized to patients of different races without further investigation. The current research was motivated by the statistical analysis of previously collected data; nevertheless, an optimum threshold must be established before the findings may be applied clinically. Secondly, more experiments are needed to determine the role of these essential diagnostic genes and their protein expression levels in the etiology and development of BC.

See the rest here:
Machine-learning prediction of a novel diagnostic model using mitochondria-related genes for patients with bladder ... - Nature.com

Topics – The ultimate guide to machine learning – Charity Digital News

Machine learning picked the TV show you watched last night. It likely picked the music that youre currently playing. It almost certainly led you to the current article youre reading. All media recommendations, in fact, are based on machine learning and most uses of artificial intelligence (AI) involve machine learning in some form.

As explained by MIT Sloan professor, Thomas W Malone: In the last five or 10 years, machine learning has become a critical way, arguably the most important way, most parts of AI are done. Most advances in AI, including generative AI, depend on elements of machine learning.

With the growing ubiquity of machine learning, and misinterpretations that follow confusing terminology, we thought wed write an article that makes everything as simple as possible. So, with that in mind, lets start from the top, with definitions of AI and machine learning.

Skip to: What is artificial intelligence?

Skip to: The different branches of AI

Skip to: The definition of machine learning

Skip to: How machine learning actually works

Skip to: Real-world examples of machine learning

To define ML, you need to first define AI.

AI works by using iterative, fast processing, and intelligent algorithms, married with huge amount of data. The tech learns automatically from patterns or features of the data and uses that information to improve processing and algorithms.

AI acts as a simulation of human intelligence in machines that are programmed to think like humans.Indeed, AI refers to any machine that exhibits traits associated with a human mind, such as learning and problem-solving.

There are myriad different branches of AI. Neural networks, for example, effectively learn through external inputs, relaying information between each unit of input. Neural networks are made up of interconnected units, which allowrepeat processes to find connections and derive meaning from previously meaningless data.

Neural networks are a form of machine that takes inspiration from the workings of the human brain. Examples of a neural network includesales forecasting, industrial process control, customer research, data validation, and eventargeted marketing.

Deep learning uses extensive neural networks with various layers of processing units. Deep learning utilises the vast advances of computing power and training techniques to learn complicated patterns, employing massive data sets.Face ID authenticationis an often-cited example of deep learning, with biometric tech employing a deep learning framework to detect features from users faces and match them with previous records.

Natural learning processing is a commonly used for of AI. Natural learning processing relies on the ability of computers to analyse, understand, and generate human language particularly around speech. The most common form is chatbots. Natural learning processing, at more evolved stages, allows humans to communicate with computers using normal language and ask them to perform certain tasks.

Expert systems use AI to mimic the behaviour of humans or organisations that possess specific knowledge or experience. Expert systems are not designed to replace particular roles but assist complex decisions. Expert systems aid decision-making processes by using data, in-depth knowledge, alongside facts and heuristics.

Expert systems are typically employed in technical vocations, such as science, mechanics, mathematics, and medicine. They are used toidentify cancerin early stages, for example, or toalert dentists to unknown organic molecules.

Fuzzy logic is a rule-based system that aids decision-making. Fuzzy logic uses data, experience, and knowledge to advance decision-making and assess how true something might be on a scale of 0-1. Fuzzy logic answers a question with a number, such as 0.4 or 0.8, and aims to overcome the binary human response of true and false and give degrees of truth over vague concepts.

The application of fuzzy logic appears in low-level machines to perform simplistic tasks, such as controlling exposure in cameras and defining the timing of washing machines.

These are the main areas of AI, especially in relation to machine learning. That might seem like a lot to take in, and the definitions can be hard to absorb. Thats why we created an effective glossary, which you can access here: A glossary of artificial intelligence terms and definitions.

All of the above branches of AI all closely relate to machine learning. Neural networks are a type of machine learning process. Deep learning is a subset of machine learning. Natural learning processing combines machine learning models with computational linguistics. Expert systems provides a different model to machine learning, with a stricter set of rules. Fuzzy logic is a method of machine learning that has been developed to extract patterns from a data set.

So machine learning is pivotal to understanding so many areas of AI. But what is machine learning? Machine learning was defined in the 1950s by Arthur Samuel: The field of study that gives computers the ability to learn without being explicitly programmed. The absence of explicit rules, or the expectation that the rules will evolve, is the core element of machine learning. Machine learning asks computers to program themselves.

It starts, as everything concerning AI starts, with data. Machine learning focusses on using that data and complex algorithms to allow AI systems to imitate the way humans learn, gradually improving accuracy. As ever with AI, the more data, the better the results. Machine learning is an analytic model, which allows software applications to become more accurate at predicting outcomes.

Machine learning, as highlighted by MIT Sloan, can be descriptive (explain what happened), predictive (predict what happens), and prescriptive (explain how to make something happen).

There are three subcategories of machine learning. The first is supervised machine learning, which refers to models trained on labelled data sets. These grow more accurate over time, though AI drift remains a concern. Supervised models are the most common form of machine learning.

The second is unsupervised machine learning. Unsupervised models look for patterns in data that are specially not labelled. Unsupervised models find patterns that people are not looking to find, which can provide unexpected insight. Unsupervised models are often used in sales and marketing to find opportunities that have been missed, or to provide options for engagement.

The third, and final, is reinforcement machine learning. Such models rely on trial and error, allowing the models to grow through establishing a reward system. Reinforcement learning can train autonomous vehicles, for example, by alerted them to the right, and wrong, decisions. Reinforcement models are built on the basic premise of positive/negative reinforcement.

The above offers an overview of machine learning, providing you with an overarching glimpse of how it works. But how does it actually work? What are the technical processes that inform machine learning? We cover that next, so its about to get a bit technical.

Machine learning, as shown above, has so many applications. And while models are often trained for various purposes, and require different forms of training, some elements inform most machine learning models. Below is a step-by-step guide to how machine learning actually works.

Data collection: The first step is always data collection. Data is typically gathered from various sources, which could be structured or unstructured. The best outcomes typically rely on reliable data, which means data that is clean, concise, accurate, and enriched.

Data processing and standardisation: The data is processed to ensure its suitable for analysis. That process depends on effectively cleaning the data, as above, then transforming it through data normalisation, rendering data standard and scaled, and splitting into relevant training sets.

Training: The model, selected from one of the three mentioned above, is then trained on the clean and standardised data. The algorithm, during training, will likely adjust its parameters to minimise the difference between predictions and actual outcomes in the training data. That process often involves optimisation techniques, such as gradient descent and stochastic gradient descent.

Evaluation: The models performance is evaluated using the testing data. The testing phase assesses how the model generalises to unseen data, whether it can make predictions, decisions, or suggestions, depending on the desired outcome of the model, as explained above.

Fine-tuning: The results may lead to changes. That might mean adjusting hyperparameters or suggesting different features to improve performance. Changing hyperparameters can be done manually, or through automated means, and is usually conducted iteratively. These typically follow common techniques, such as Bayesian optimisation or grid search. Once the fine-tuning is complete, the model will likely go through the evaluation phase again to check success.

Deployment: Once the machine learning model is trained and evaluated satisfactorily, it can be deployed to make predictions or decisions on new, unseen data. That might involve integrating the model into software applications or systems where it can automate tasks or provide insights.

Re-evaluation: Machine learning is always an iterative process. Models need to be continually refined and improved, based on feedback and new data. Models can also be subject to AI drift, which can reduce the accuracy and validity of the model. Taking an iterative approach to evaluation helps the machine learning model remain effective and relevant over time.

So, we know the definition of machine learning, the various subcategories, and how machine learning models work in practice. Now lets look at some real-world examples of machine learning, with reference to the application and some familiar examples that you likely know.

Machine learning can recognise patterns, objects, and images. One common approach is through the use of convolutional neural networks, which are perfect for recognition, classification, and segmentation. The tech works by utilising neurons to automatically learn features from the images, enabling them to identify objects with high accuracy.

Real-world applications include facial recognition, a controversial piece of tech prone to practicing (and furthering) bias. Medical imaging relies on machine learning image analysis. And, on many smart phones, deciphering people by their faces also relies on image detection.

Machine learning can predict and suggest items based on preferences and user behaviour. The systems. One common approach is through content-based filtering, in which recommendations are made based on the characteristic of items and users historical preferences. In the simplest terms, the machine takes all your data, as well as the data of all potential suggestions, and simply aligns recommendations based on successful patterns in the past.

There are some obvious real-world applications, such as Netflix and YouTube suggestions, the information that appears on your social media feeds, product recommendations on almost any shopping website, the next song that plays on Spotify, and so on.

Machine learning enhances the capabilities of chatbots, making them more intelligent, more reactive, and more capable of providing an adequate response to user queries. Chatbots use natural language processing, and natural language understanding, along with intent recognition and sentiment analysis to allow the chatbot to respond in the best way. Machine learning enhances through personalisation, allowing chatbots to learn through each interaction.

Real-world applications includeWaterAids Sellu, which provides an immersive experience and gives insight into the work of the charity. Another is Is This OK?, result of a partnership betweenRunaway HelplineandChildline, withfunding provided byChildren in Need, which provides support and useful information for teens that are feeling pressured and confused.

Machine learning algorithms analyse text data from social media, customer reviews, or surveys to determine the sentiment (positive, negative, or neutral) associated with a particular topic, product, or service. That informs recommendations, as above, but can also provide valuable information to organisations, such as customer opinions, market trends, and brand reputation. Sentiment analysis is often used for inform decision-making.

Link:
Topics - The ultimate guide to machine learning - Charity Digital News

Practical approaches in evaluating validation and biases of machine learning applied to mobile health studies … – Nature.com

In this section, we first describe how Ecological Momentary Assessments work and how they differentiate from assessments that are collected within a clinical environment. Second, we present the studies and ML use cases for each dataset. Next, we introduce the non-ML baseline heuristics and explain the ML preprocessing steps. Finally, we describe existing train-test-split approaches (cross-validation) and the splitting approaches at the user- and assessment levels.

Within this context, ecological means within the subjects natural environment", and momentary within this moment" and ideally, in real time16. Assessments collected in research or clinical environments may cause recall bias of the subjects answers and are not primarily designed to track changes in mood or behavior longitudinally. Ecological Momentary Assessments (EMA) thus increase validity and decrease recall bias. They are suitable for asking users in their daily environment about their state of being, which can change over time, by random or interval time sampling. Combining EMAs and mobile crowdsensing sensor measurements allows for multimodal analyses, which can gain new insights in, e.g., chronic diseases8,15. The datasets used within this work have EMA in common and are described in the following subsection.

From ongoing projects of our team, we are constantly collecting mHealth data as well as Ecological Momentary Assessments6,17,18,19. To investigate how the machine learning performance varies based on the splits, we wanted different datasets with different use cases. However, to increase comparability between the use cases, we created multi-class classification tasks.

We train each model using historical assessments, the oldest assessment was collected at time tstart, the latest historical assessment at time tlast. A current assessment is created and collected at time tnow, a future assessment at time tnext. Depending on the study design, the actual point of time tnext may be in some hours or in a few weeks from tnow. For each dataset and for each user, we want to predict a feature (synonym, a question of an assessment) at time tnext using the features at time tnow. This feature at time tnext is then called the target. For each use case, a model is trained using data between tstart and tlast, and given the input data from tnow, it predicts the target at tnext. Figure1 gives a schematic representation of the relevant points of time tstart,tlast,tnow, and tnext.

At time tstart, the first assessment is given; tlast is the last known assessment used for training, whereas tnow is the currently available assessment as input for the classifier and the target is predicted at time ttext.

To increase comparability between the approaches, we used the same model architecture with the same pseudo-random initialisation. The model is a Random Forest classifier with 100 trees and the Gini impurity as the splitting criterion. The whole coding was in Python 3.9, using mostly scikit-learn, pandas and Jupyter Notebooks. Details can be found on GitHub in the supplementary material.

For all datasets that we used in this study, we have ethical approvals (UNITI No. 20-1936-101, TYT No. 15-101-0204, Corona Check No. 71/20-me, and Corona Health No. 130/20-me). The following section provides an overview of the studies, the available datasets with characteristics, and then describes each use case in more detail. An brief overview is given in Table1 with baseline statistics for each dataset in Table2.

To provide some more background info about the studies: The analyses happen with all apps on the so-called EMA questionnaires (synonym: assessment), i.e., the questionnaires that are filled out multiple times in all apps and the respective studies. This can happen several times a day (e.g., for the tinnitus study TrackYourTinnitus (TYT)) or at weekly intervals (e.g., studies in the Corona Health (CH) app). Nevertheless, the analysis happens on the recurring questionnaires, which collect symptoms over time and in the real environment through unforeseen (i.e., random) notifications.

The TrackYourTinnitus (TYT) dataset has the most filled-out assessments with more than 110,000 questionnaires as by 2022-10-24. The Corona Check (CC) study has the most users. This is because each time an assessment is filled out, a new user can optionally be created. Notably, this app has the largest ratio of non-German users and the youngest user group with the largest standard deviation. The Corona Health (CH) app with its studies Mental health for adults, adolescents and physical health for adults has the highest proportion of German users because it was developed in collaboration with the Robert Koch Institute and was primarily promoted in Germany. Unification of treatments and Interventions for Tinnitus patients (UNITI) is a European Union-wide project, which overall aim is to deliver a predictive computational model based on existing and longitudinal data19. The dataset from the UNITI randomized controlled trial is described by Simoes et al.20.

With this app, it is possible to record the individual fluctuations in tinnitus perception. With the help of a mobile device, users can systematically measure the fluctuations of their tinnitus. Via the TYT website or the app, users can also view the progress of their own data and, if necessary, discuss it with their physician.

The ML task at hand is a classification task with target variable Tinnitus distress at time tnow and the questions from the daily questionnaire as the features of the problem. The targets values range in [0,1] on a continuous scale. To make it a classification task, we created bins with step size of 0.2 resulting in 5 classes. The features are perception, loudness, and stressfulness of tinnitus, as well as the current mood, arousal and stress level of a user, the concentration level while filling out the questionnaire, and perception of the worst tinnitus symptom. A detailed description of the features was already done in previous works21. Of note, the time delta of two assessments of one user at tnext and tnow varies between users. Its median value is 11 hours.

The overall goal of UNITI is to treat the heterogeneity of tinnitus patients on an individual basis. This requires understanding more about the patient-specific symptoms that are captured by EMA in real time.

The use case we created at UNITI is like that of TYT. The target variable encumbrance, coded as cumberness, which was also continuously recorded, was divided into an ordinal scale from 0 to 1 in 5 steps. Features also include momentary assessments of the user during completion, such as jawbone, loudness, movement, stress, emotion, and questions about momentary tinnitus. The data was collected using our mobile apps7. Here, of note: on average, the median time gap between two assessment is 24 hours for each user.

At the beginning of the COVID-19 pandemic, it was not easy to get initial feedback about an infection, given the lack of knowledge about the novel virus and the absence of widely available tests. To assist all citizens in this regard, we launched the mobile health app Corona Check together with the Bavarian State Office for Health and Food Safety22.

The Corona Check dataset predicts whether a user has a Covid infection based on a list of given symptoms23. It was developed in the early pandemic back in 2020 and helped people to get quick estimate for an infection without having an antigen test. The target variable has four classes: First, suspected coronavirus (COVID-19) case", second, symptoms, but no known contact with confirmed corona case", third, contact with confirmed corona case, but currently no symptoms", and last, neither symptoms nor contact".

The features are a list of Boolean variables, which were known at this time to be typically related with a Covid infection, such as fever, a sore throat, a runny nose, cough, loss of smell, loss of taste, shortness of breath, headache, muscle pain, diarrhea, and general weakness. Depending on the answers given by a user, the application programming interface returned one of the classes. The median time gap of two assessments for the same user is 8 hours on average with a much larger standard deviation of 24.6 days.

The last four use cases are all derived from a bigger Covid-related mHealth project called Corona Health6,24. The app was developed in collaboration with the Robert Koch-Institute and was primarily promoted in Germany, it includes several studies about the mental or physical health, or the stress level of a user. A user can download the app and then sign up for a study. He or she will then receive a baseline one-time questionnaire, followed by recurring follow-ups with between-study varying time gaps. The follow-up assessment of CHA has a total of 159 questions including a full PHQ9 questionnaire25. We then used the nine questions of PHQ9 as features at tnow to predict the level of depression for this user for tnext. Depression levels are ordinally scaled from None to Severe in a total of 5 classes. The median time gap of two assessments for the same user is 7.5 days. That is, the models predict the future in this time interval.

Similar to the adult cohort, the mental health of adolescents during the pandemic and its lock-downs is also captured by our app using EMA.

A lightweight version of the mental health questionnaire for adults was also offered to adolescents. However, this did not include a full PHQ9 questionnaire, so we created a different use case. The target variable to be classified on a 4-level ordinal scale is perceived dejection coming from the PHQ instruments, features are a subset of quality of live assessments and PHQ questions, such as concernment, tremor, comfort, leisure quality, lethargy, prostration, and irregular sleep. For this study, the median time gap of two follow up assessments is 7.3 days.

Analogous to the mental health of adults, this study aims to track how the physical health of adults changes during the pandemic period.

Adults had the option to sign up for a study with recurring assessments asking for their physical health. The target variable to be classified asks about the constraints in everyday life that arise due to physical pain at tnext. The features for this use case include aspects like sport, nutrition, and pain at tnow. The median time gap of two assessments for the same user is 14.0 days.

This additional study within the Corona Health app asks users about their stress level on a weekly basis. Both features and target are assessed on a five-level ordinal scale from never to very often. The target asks for the ability of stress management, features include the first nine questions of the perceived stress scale instrument26. The median time gap of two assessments for the same user on average is 7.0 days.

We also want to compare the ML approaches with a baseline heuristic (synonym: Baseline model). A baseline heuristic can be a simple ML model like a linear regression or a small Decision Tree, or alternatively, depending on the use case, it could also be a simple statement like The next value equals the last one". The typical approach for improving ML models is to estimate the generalization error of the model on a benchmark data set when compared to a baseline heuristic. However, it is often not clear, which baseline heuristic to consider, i.e.: The same model architecture as the benchmark model, but without tuned hyperparameters? A simple, intrinsically explainable model with or without hyperparameter tuning? A random guess? A naive guess, in which the majority class is predicted? Since we have approaches on a user-level (i.e., we consider users when splitting) and on an assessment-level (i.e., we ignore users when splitting), we also should create baseline heuristics on both levels. We additionally account for within-user variance in Ecological Momentary Assessments by averaging a users previously known assessments. Previously known here means that we calculate the mode or median of all assessments of a user that are older than the given timestamp. In total, this leads to four baseline heuristics (user-level latest, user-level average, assessment-level latest, assessment-level average) that do not use any machine learning but simple heuristics. On the assessment-level, the latest known target or the mean of all known targets so far is taken to predict the next target, no matter of the user-id of this assessment. On the user-level, either the last known, or median, or mode value of this user is taken to predict the target. This, in turn, leads to a cold-start problem for users that appear for the first time in a dataset. In this case, either the last known, or mode, or median of all assessments that are known so far are taken to predict the target.

Before the data and approaches could be compared, it was necessary to homogenize them. In order for all approaches to work on all data sets, at least the following information is necessary: Assessment_id, user_id, timestamp, features, and the target. Any other information such as GPS data, or additional answers to questions of the assessment, we did not include into the ML pipeline. Additionally, targets that were collected on a continuous scale, had to be binned into an ordinal scale of five classes. For an easier interpretation and readability of the outputs, we also created label encodings for each target. To ensure consistency of the pre-processing, we created helper utilities within Python to ensure that the same function was applied on each dataset. For missing values, we created a user-wise missing value treatment. More precisely, if a user skipped a question in an assessment, we filled the missing value with the mean or mode (mode = most common value) of all other answers of this user for this assessment. If a user had only one assessment, we filled it with the overall mean for this question.

For each dataset and for each script, we set random states and seeds to enhance reproducibility. For the outer validation set, we assigned the first 80 % of all users that signed up for a study to the train set, the latest 20% to the test set. To ensure comparability, the test users were the same for all approaches. We did not shuffle the users to simulate a deployment scenario where new users join the study. This would also add potential concept drift from the train to the test set and thus improve the simulation quality.

For the cross-validation within the training set, which we call internal validation, we chose a total of 5 folds with 1 validation fold. We then applied the four baseline heuristics (on user level and assessment level with either latest target or average target as prediction) to calculate the within-train-set performance standard deviation and the mean of the weighted F1 scores for each train fold. The mean and standard deviation of the weighted F1 score are then the estimator of the performance of our model in the test set.

We call one approach superior to another if the final score is higher. The final score to evaluate an approach is calculated as:

$${f}_{1}^{final}={f}_{1}^{test}-alpha {sigma }left({f}_{1}^{train}right)$$

(1)

If the standard deviation between the folds during training is large, the final score is lower. The test set must not contain any selection bias against the underlying population. The pre-factor of the standard deviation is another hyperparameter. The more important model robustness for the use case, the higher should be set.

Within cross-validation, there exist several approaches on how to split up the data into folds and validate them, such as the k-fold approach with k as the number of folds in the training set. Here, k1 folds form the training folds and one fold is the validation fold27. One can then calculate k performance scores and their standard deviation to get an estimator for the performance of the model in the test set, which itself is an estimator for the models performance after deployment (see also Fig.2).

Schematic visualisation of the steps required to perform a k-fold cross-validation, here with k=5.

In addition, there exist the following strategies: First, (repeated) stratified k-fold, in which the target distribution is retained in each fold, which can also be seen in Fig.3. After shuffling the samples, the stratified split can be repeated3. Second, leave-one-out cross-validation28, in which the validation fold contains only one sample while the model has been trained on all other samples. And third, leave-p-out cross-validation, in which (left(begin{array}{c}n\ pend{array}right)) train-test-pairs are created with n equals number of assessments (synonym sample)29.

While this approach retains the class distribution in each fold, it still ignores user groups. Each color represents a different class or user id.

These approaches, however, do not always focus on samples that might belong to our mHealth data peculiarities. To be more specific, they do not account for users (syn. groups, subjects) that generate daily assessments (syn. samples) with a high variance.

To precisely explain the splitting approaches, we would like to differentiate between the terms folds and sets. We call a chunk of samples (synonym: assessments, filled-out questionnaires) a set on the outer split of the data, for which we cut-off the final test set. However, within the training set, we then split further to create training and validation folds. That is, using the term fold, we are in the context of cross validation. When we use the term set, then we are in the outer split of the ML pipeline. Figure4 visualizes this approach. Following this, we define 4 different approaches to split the data. For one of them we ignore the fact that there are users, for the other three we do not. We call these approaches user-cut, average-user, user-wise and time-cut. All approaches have in common that the first 80 % of all users are always in the training set and the remaining 20 % are in the test set. A schematic visualization of the splitting approaches is shown in Fig.5. Within the training set, we then split on user-level for the approaches user-cut, average-user and user-wise, and on assessment-level for the approach time-cut.

In the second step, users are ordered by their study registration time, with the initial 80 % designated as training users and the remaining 20 % as test users. Subsequently, assessments by training users are allocated to the training set, and those by test users to the test set. Within the training set, user grouping dictates the validation approach: group-cross-validation is applied if users are declared as a group, otherwise, standard cross-validation is utilized. We compute the average f1 score, ({f}_{1}^{train}), from training folds and the f1 score on the test set, ({f}_{1}^{test}). The standard deviation of ({f}_{1}^{train},sigma ({f}_{1}^{train})), indicates model robustness. The hyperparameter adjusts the emphasis on robustness, with higher values prioritizing it. Ultimately, ({f}_{1}^{final}), which is a more precise estimate if group-cross-validation is applied, offers a refined measure of model performance in real-world scenarios.

Yellow means that this sample is part of the validation fold, green means it is part of a training fold. Crossed out means that the sample has been dropped in that approach because it does not meet the requirements. Users can be sorted by time to accommodate any concept drift.

In the following section, we will explain the splitting approaches in more detail. The time-cut approach ignores the fact of given groups in the dataset and simply creates validation folds based on the time the assessments arrive in the database. In this example, the month, in which a sample was collected, is known. More precisely, all samples from January until April are in the training set while May is in the test set. The user-cut approach shuffles all user ids and creates five data folds with distinct user-groups. It ignores the time dimension of the data, but provides user-distinct training and validation folds, which is like the GroupKFold cross-validation approach as implemented in scikit-learn30. The average-user approach is very similar to the user-cut approach. However, each answer of a user is replaced by the median or mode answer of this user up to the point in question to reduce within-user-variance. While all the above-mentioned approaches require only one single model to be trained, the user-wise approach requires as many models as distinct users are given in the dataset. Therefore, for each user, 80 % of his or her assessments are used to train a user-specific model, and the remaining 20% of the time-sorted assessments are used to test the model. This means that for this approach, we can directly evaluate on the test set as each model is user specific and we solved the cold-start problem by training the model on the first assessments of this user. If a user has less than 10 assessments, he or she is not evaluated on that approach.

Approval for the UNITI randomized controlled trial and the UNITI app was obtained by the Ethics Committee of the University Clinic of Regensburg (ethical approval No. 20-1936-101). All users read and approved the informed consent before participating in the study. The study was carried out in accordance with relevant guidelines and regulations. The procedures used in this study adhere to the tenets of the Declaration of Helsinki. The Track Your Tinnitus (TYT) study was approved by the Ethics Committee of the University Clinic of Regensburg (ethical approval No. 15-101-0204). The Corona Check (CH) study was approved by the Ethics Committee of the University of Wrzburg (ethical approval no. 71/20-me) and the universitys data protection officer and was carried out in accordance with the General Data Protection Regulations of the European Union. The procedures used in the Corona Health (CH) study were in accordance with the 1964 Helsinki declaration and its later amendments and was approved by the ethics committee of the University of Wrzburg, Germany (No. 130/20-me). Ethical approvals include secondary use. The data from this study are available on request from the corresponding author. The data are not publicly available, as the informed consent of the participants did not provide for public publication of the data.

Further information on research design is available in theNature Portfolio Reporting Summary linked to this article.

See the article here:
Practical approaches in evaluating validation and biases of machine learning applied to mobile health studies ... - Nature.com

Application of power-law committee machine to combine five machine learning algorithms for enhanced oil recovery … – Nature.com

This study combines the predictions of five machine learning models by means of the PLCM method to increase the generalization of the model in the context of EOR screening. This study not only assesses the individual machine learning methods in predicting the most suitable EOR techniques, but also takes benefit from the PLCM method optimized by the PSO to increase the prediction accuracy, for the first time in the context of EOR screening. In this manner, the predictive tool is not limited to only one data-driven model, but also takes advantage of the strength points of different types of machine learning algorithms. Figure1 shows the flowchart of this study. First, the required dataset to build and evaluate the utilized models is collected. Then, the data is preprocessed, which includes encoding the textual data into numeric values and normalizing the variables into [0,1]. Then, the individual machine learning models are trained. The hyperparameters of the models are tuned using a grid search with fivefold cross-validation. After training the individual models, their outputs are combined using the PLCM method optimized by the PSO algorithm. Then, the performance of the utilized methods is compared in terms of quantitative and visual evaluation metrics. The metrics, including the accuracy, precision, recall, F1-score, confusion matrix, precision-recall curve, and Receiver Operating Characteristic (ROC) curve to analyze their ability to handle the class imbalance issue. In the end, a feature importance analysis is conducted to find out the most influential input variables on the prediction of suitable EOR techniques. Another specialty of this study is that it uses a more comprehensive dataset than those in the previous studies, which increases the generalization of the developed model.

General flowchart of the study.

In this study, a dataset including 2563 EOR projects (available in Supplementary Information) from 23 different countries applied to sandstone, carbonate, limestone, dolomite, unconsolidated sandstone, and conglomerate reservoirs was collected from the literature5,20,21,22,23,24,25,26,27 to develop the screening methods. The utilized variables include the formation type, porosity (%), permeability (mD), depth (ft), viscosity (cP), oil gravity (API), temperature (F), and the production mechanism before conducting EOR. The EOR techniques include injection of steam, hydrocarbon miscible, hydrocarbon immiscible, CO2 miscible, CO2 immiscible, carbonated water, low-salinity water, CO2 foam, nitrogen miscible, nitrogen immiscible, micellar polymer, surfactant/polymer, surfactant, cyclic steam drive, steam-assisted gas drive (SAGD), liquefied petroleum gas (LPG) miscible, in-situ combustion, polymer, alkaline/surfactant/polymer (ASP), hot water, microbial, air-foam, hydrocarbon miscible-WAG, and hydrocarbon immiscible-WAG. Table 2 reports the statistical analysis of the variables. Since formation is a categorical feature, it was converted to numerical values. Among fifteen different formation types, sandstone, carbonate, and dolomite are the most prevalent formation types with 45%, 10%, and 10% of the total data, respectively. To assess the accuracy of the developed models on unseen data, 85% of the data was used for training and the remaining 15% was used as blind test cases, and fivefold cross-validation is used for hyperparameter tuning. It is common to divide the dataset with a ratio of 70:15:15 as training, validation, and testing subsets. The validation subset is commonly used for tuning the hyperparameters of the models. Nonetheless, in the current study, 5-Fold cross validation was used to tune the hyperparameters, which does not require putting aside a portion of the data for validation. In this technique, the training subset is divided into K (5 in this study) non-overlapping folds. Then, the model is trained and validated K times with the fixed hyperparameters. One of the folds is used for validation and the others for training. Finally, the validation score is calculated as the average of scores over K repetitions. This is repeated for all configurations of the hyperparameters and the set of hyperparameters with the highest cross-validation score is selected. Thereby, as we did not need a separate validation subset, all samples, except for the testing subset, were used for training (85%).

One of the crucial steps before moving to model development is data preprocessing. One type of preprocessing is to encode textual values to numerical values, which is called label encoding. For example, the formation type, previous production mechanism, and EOR techniques are textual features, which were encoded as numbers. Another preprocessing step is scaling the data into similar intervals since the scale of the features differ significantly. For example, viscosity is in the order of 106, while porosity is in the order of tens. In this study, the features were normalized into [0,1] interval using ((X - X_{min } )/(X_{max } - X_{min } )), where (X_{min }) and (X_{max }) are the minimum and maximum of the features in the training subset.

ANN is a learning algorithm that is inspired by the human brain. ANN can figure out the relationship between the inputs and outputs without the need for complex mathematical or computational methods. Among the various types of ANN, the Multilayer Perceptron (MLP-ANN) stands out as the most commonly used28,29,30. The MLP includes three layers, namely input, hidden, and output layers31,32, as illustrated in Fig.2. As shown, each layer consists of computational units known as neurons. The number of neurons in the input and output layers is the same as the dimension of the input and output variables, respectively. The number of hidden layers and their size should be determined by trial and error. Each neuron is connected to all neurons of the previous layers, which represents a unique linear combination of the data coming in from previous layer. The linear combination takes place using a set of weights. For example, (W_{xh}) represents the set of weights mapping the inputs to the hidden layers, and (W_{ho}) represents the set of weights mapping the hidden neurons to the output layer. Another critical aspect of an ANN model is the activation function, which receives the results of the linear combination, known as activations, and determines the activation of each neuron. Including hidden layers with non-linear activation functions in an ANN empowers it to capture non-linear dependencies. The weights are learned during the training phase of the model, which is the ultimate goal of the training process. Using these weights, the outputs, represented by (hat{y}), are calculated by the feed-forward process as below.

$$hat{y} = fleft( {mathop sum limits_{i = 1} W_{ij} x_{i} + b_{j} } right),$$

(1)

where f isthe activation function; (b_{j}) is the hidden layer bias; (x_{i}) is theinput for the ith variable; and, (W_{ij}) is theconnection weight between the ith input and jth neuron.

Schematic structure of an ANN.

The learning process in an ANN is actually adjusting the weights and biases in the hidden layers using the backpropagation algorithm to minimize the loss function between the predicted and actual values28,33. In a multiclass classification problem, the outputs are converted to one-hot encoded vectors, where all elements of the vectors are zeros except for the element corresponding to that specific sample class. To handle multiclass classification, the categorical cross entropy is used as the loss function, which is defined as follows.

$$CCEleft( W right) = mathop sum limits_{i = 1}^{C - 1} y_{i} log left( {hat{y}_{i} } right),$$

(2)

where y denotes the vector of actual outputs and C is the number of classes. Each output in a multiclass problem is a vector of probabilities for each class. The probabilities are calculated using the Softmax activation function. To minimize the loss function, the gradient of the loss with respect to the weights and biases must be calculated and back propagated to all layers to update the weights. Given the gradient of the loss function, the weights can be updated as follows.

$$W^{t + 1} = W^{t} - eta nabla_{W} CCE,$$

(3)

where (W^{t + 1}) and (W^{t}) are the new and current weights, (eta) is the learning rate, and (nabla_{W} CCE) is the gradient of the loss function calculated by an optimization algorithm, such as Adam, Stochastic Gradient Descent (SGD), RMSprop, Adagrad, Momentum, Nestrov and Accelerated Gradient34,35.

ANNs offer a variety of hyperparameters that can be tuned to optimize the models performance. It includes options for controlling model structure, learning rates, and regularization. Furthermore, ANNs incorporate class weights into the loss function, addressing the problem of class-imbalance, which is useful for the problem understudy. It also supports multiclass classification. Accordingly, one of the utilized methods in this study is the ANN.

According to the explanations, the control parameters of the ANN are the number of hidden layers, number of neurons in the hidden layers, activation functions, the optimizer, and learning rate, which should be fine-tuned to achieve a satisfactory performance.

CatBoost is a gradient-boosting tree construction method36, which makes use of both symmetric and non-symmetric construction methods. In CatBoost, a tree is learned at each iteration with the aim of reducing the error made by previous trees. Figure3 shows the process of CatBoost tree building. In this figure, the orange and blue circles represent a dataset with two classes. The process starts with a simple initial model, assigning the average of the entire dataset to a single leaf node. Then, the misclassified samples (enlarged circles in Fig.3) are identified and new trees are added based on the gradient boosting approach. Afterward, the predictions are updated to the combination of the predictions made by all trees. By adding new trees at each iteration, the number of misclassified samples decreases. Adding the trees continues until either the minimum number of samples required for splits or the maximum depth of the trees is reached. For categorical features, the CatBoost algorithm employs a symmetric splitting method for each feature. Then, based on the type of the feature, it chooses one of the split methods for each feature to create a new branch for each category37.

Schematic of the CatBoost tree construction.

Considering a training dataset with (N) samples, where (X) is the matrix of inputs ((x_{1} ,; ldots ,;x_{N})) and (y) is the vector of outputs ((y_{1} ,; ldots ,;y_{N})), the goal is to find a mapping function, (f(X)), from the inputs to the outputs. Here, (f(X)) is the boosted trees. Just like the ANN, the CatBoost needs a loss function ((L(f))) to be minimized to perform the optimal tree building strategy.

Now, the learning process entails minimizing the (L(f)).

$$f^{*} (X) = arg ;mathop {min }limits_{f} L;(f) = arg ;mathop {min }limits_{f} mathop sum limits_{i = 1}^{N} L;(y_{i} ,;hat{y}_{i} ),$$

(4)

If the algorithm entails M gradient boosting steps, a new estimator hm can be added to the model.

$$f_{m + 1} ;(x_{i} ) = f_{m} ;(x_{i} ) + h_{m} ;(x_{i} ),$$

(5)

where (f_{m + 1} ;(x_{i} )) is the new model, and (h_{m} ;(x_{i} )) is the newly added estimator. The new estimator is determined by employing the gradient boosting algorithm, where the steepest descent obtains (h_{m} = - ;alpha_{m} g_{m}) where (alpha_{m}) is the step length and (g_{m}) is the gradient of the loss function.

Now, the addition of a new tree/estimator can be accomplished by

$$f_{m + 1} (x) = f_{m} (x) + left( {arg mathop {min }limits_{{h_{m} in H}} left[ {mathop sum limits_{i = 1}^{N} Lleft( {y_{i} , ;f_{m} (x_{i} ) + h_{m} (x_{i} ) } right)} right]} right);(x),$$

(6)

$$f_{m + 1} (x) = f_{m} (x) - alpha_{m} g_{m} .$$

(7)

By taking benefit from the gradient boosting approach, the ensemble of decision trees built by the CatBoost algorithm often leads to a high prediction accuracy. The CatBoost also uses a strategy known as ordered boosting to improve the efficacy of its gradient-boosting process. In this type of boosting, a specific order is used to train the trees, which is determined by their feature importance. This prioritizes the most informative features, resulting in more accurate models38. The algorithm offers a wide range of regularization methods, such as depth regularization and feature combinations, which helps prevent overfitting. This is specifically useful when dealing with complex datasets.

The CatBoost offers a range of control parameters to optimize the structure of the model. These parameters include the number of estimators, maximum depth of the trees, maximum number of leaves, and regularization coefficients. These control parameters are optimized in this study to obtain the best performance from the model.

KNN is a non-parametric learning algorithm proposed by Fix and Hodges39. This algorithm does not have a training step and determines the output of a sample based on the output of the neighboring samples10. The number of neighbors is denoted by K. With K=1, the label of the sample is as of the nearest sample. As the name of this algorithm implies, the K nearest neighbors are found based on the distance between the query sample and all samples in the dataset. Euclidean, Minkowski, Chebyshev, and Manhattan distances are some common distance measures. The Minkowski distance is a generalization of the Euclidean and the Manhattan distance with (p = 2) and (p = 1), respectively. p is the penalty term in Lp norm, which can be a positive integer. The distance between the samples greatly depends on the scale of the features. Therefore, feature scaling is of great importance40. After finding the K nearest samples to the new sample (query), its label is determined using Eq.(8).

$$hat{f}(x_{q} ) leftarrow {text{arg }};mathop {max }limits_{c in C} mathop sum limits_{i = 1}^{K} delta (c, ;f(x_{i} )), quad delta (a,;b) = 1 quad {text{if}};; a = b.$$

(8)

where (x_{q}) is the new sample, (f(x_{i} )) is the label of the ith neighboring sample, C denotes the number of classes, and (delta (a,;b)) is the Kronecker delta which is 1 if (a = b) and 0 otherwise. An extension to KNN is the distance-weighted KNN, where the inverse of the distances between the samples are used as the weights. In this manner, the prediction for the query sample will be

$$hat{f}(x_{q} ) leftarrow {text{arg }};mathop {max }limits_{c in C} mathop sum limits_{i = 1}^{K} w_{i} delta (c,; f(x_{i} )),quad delta (a,;b) = 1 quad {text{if}} ;;a = b,$$

(9)

where (w_{i}) is the inverse of the distance between the query sample and sample i, (w_{i} = 1/D(x_{q} ,;x_{i} )). Consequently, the closer neighbors will have a higher impact on the predicted label.

One distinctive feature of KNN that sets it apart from other machine learning methods is its ability to handle incomplete observations and noisy data41. This technique enables the identification of significant patterns within noisy data records. Another advantage of KNN is that it does not require any training and building and the model optimization can be done quite quickly. According to the above explanations, the controlling parameters of KNN are the number of neighbors (K), using/not using distance weighting, penalty terms, and the algorithm used to compute the nearest neighbors.

SVM is a binary classification algorithm introduced by Cortes and Vapink42. SVM can be implemented to solve problems with linear or non-linear behavior43,44. However, non-linear data should be mapped into a higher-dimensional space to make it linearly separable. This technique is called the kernel trick. The classification is done by a decision boundary which has the maximum margin from both classes. Figure4 shows the schematic of an SVM classifier for a binary classification task. The margins are constructed by finding the support vectors in each class and drawing the hyperplanes from the support vectors45. The hyperplanes are shown by dashed lines and the decision boundary is drawn between them. In this figure, the green circles represent the positive (+1) and the blue circles represent the negative (1) classes. The circles on the hyperplanes are the support vectors. The decision boundary with the maximum margin from the classes results in the highest generalization.

Schematic of a binary SVM.

By considering the mapping function (emptyset (X)) and inputs (X) and outputs (y), the equation of the decision boundary can be written as follows46:

$$W^{T} emptyset (X) + b = 0,$$

(10)

where W is the weight parameters and b is the bias term. The smallest perpendicular distance between the hyperplanes is known as the margin, which is double the distance between the support vectors and the decision boundary. Assuming that the data is separated by two hyperplanes with margin (beta), after rescaling W and b by (beta /2) in the equality, for each training example we have

$$y_{i} left[ {W^{T} emptyset (x_{i} ) + b} right] ge 1,quad i = left{ {1,;2, ldots ,;M} right}.$$

(11)

For every support vector ((X_{s} , ;y_{s})) the above inequality is an equality. Thereby, the distance between each support vector and the decision boundary, r, is as follows

$$r = frac{{y_{s} (W^{T} X_{s} + b)}}{left| W right|} = frac{1}{left| W right|},$$

(12)

where (left| W right|) is the L2 norm of the weights. Therefore, the margin between the two hyperplanes becomes (frac{2}{left| W right|}). The goal is to maximize (frac{2}{left| W right|}), which is equivalent to minimizing (frac{1}{2}W^{T} W). Consequently, the optimization problem of the SVM is:

$$begin{gathered} arg ;mathop {min }limits_{W,b} frac{1}{2}W^{T} W, hfill \ subject; to ;y_{i} left[ {W^{T} emptyset (x_{i} ) + b} right] ge 1,quad {text{for}};;i = 1,; ldots ,;M. hfill \ end{gathered}$$

(13)

Nonetheless, to increase the generalization of the model and avoid overfitting, slack variables ((xi))46,47 are used (see Fig.3), which allow the model to have some miss-classified samples during training. This approach is known as the soft margin approach. Now, the optimization problem becomes

$$begin{gathered} arg ;mathop {min }limits_{W,b} left( {frac{1}{2}W^{T} W + cmathop sum limits_{i} xi_{i} } right), hfill \ subject; to; y_{i} left[ {W^{T} emptyset (x_{i} ) + b} right] ge 1 - xi_{i} ,quad {text{for}};;i = 1,; ldots ,;M. hfill \ end{gathered}$$

(14)

where c is a regularization factor that controls the weight of the slack variables in the loss function. Equation(14) is a dual optimization problem, which is solved using the Lagrange approach. The Lagrange approach converts a dual-optimization problem to a standard one by incorporating the equality and inequality constraints to the loss function. Thereby, Eq.(14) becomes

$$begin{gathered} L(W,;b,;alpha ) = frac{1}{2}W^{T} W - mathop sum limits_{i = 1}^{M} alpha_{i} left[ {y_{i} left( {W^{T} emptyset (X_{i} ) + b} right) - 1} right], hfill \ subject; to ;;0 le alpha_{i} le c,quad i = 1,; ldots ,;M. hfill \ end{gathered}$$

(15)

where (alpha_{i})s are Lagrange multipliers. To minimize the above loss function, its derivatives with respect to W and b are set equal to zero. By doing this, we obtain (W = sumnolimits_{i = 1}^{M} {alpha_{i} y_{i} emptyset (X_{i} )}) and (sumnolimits_{i = 1}^{M} {alpha_{i} y_{i} = 0}). Plugging these back into the Lagrange gives the dual formulation.

$$begin{gathered} arg ;mathop {max }limits_{alpha } - frac{1}{2}mathop sum limits_{i,j = 1}^{M} alpha_{i} alpha_{j} y_{i} y_{j} emptyset (X_{i} )emptyset (X_{j} ) + mathop sum limits_{i = 1}^{M} alpha_{i} , hfill \ subject;; to; mathop sum limits_{i = 1}^{M} alpha_{i} y_{i} = 0, ;;0 le alpha_{i} le c, ;;i = 1,; ldots ,;M. hfill \ end{gathered}$$

(16)

Equation(16) is solved using a Quadratic Programming solver to obtain the Lagrange multipliers (alpha_{i}). (alpha_{i}) is non-zero only for the support vectors. Parameter b does not appear in the dual formulation, so it is determined separately from the initial constraints. Calculating (emptyset (X_{i} )emptyset (X_{j} )) is computationally expensive since it requires two mapping operations and one multiplication, especially if the data is high-dimensional. To tackle this problem, the Kernel trick is introduced, where (emptyset (X_{i} )emptyset (X_{j} )) is represented as a kernel function (K(X_{i} ,;X_{j} )) based on the Mercers Theorem48. Finally, after determining the Lagrange multipliers, the prediction for a new sample z is calculated as follows

$$y = signleft( {mathop sum limits_{i = 1}^{n} alpha_{i} y_{i} K(X_{i,} z) + b} right).$$

(17)

The kernel function should be determined by trial and error. Some of the commonly used kernels are the linear, polynomial, and radial basis function (RBF) kernels.

SVM is one of the most successful machine learning algorithms in hand-written digit recognition49,50. SVMs can handle high-dimensional data, making them suitable for tasks with a large number of features. Because of taking benefit from the maximum margin theory and slack variables, SVMs are resistant to overfitting. One special feature of the SVMs, making them different than other artificial intelligence tools, is the kernel trick that enables SVMs to solve different kinds of non-linear classification problems. The convex nature of the loss function of the SVM leads to a convex optimization problem, which ensures converging to a global optimum. Finally, memory efficiency due to using only support vectors to construct the model and ability to handle class-imbalance by incorporating the class weights to the loss function are two other advantages of the SVMs making them suitable for the EOR screening problem in this study.

According to above explanations, some of the most important control parameters of the SVM are the kernel function, regularization factor (c), the degree of polynomial kernels, the intercept of polynomial kernels (coef0), and class weights. Class weights are used to tackle the class-imbalance issue by giving larger weights to rare classes in calculating the loss function.

Since SVM is a binary classifier, to perform multi-class classification, one-to-rest or one-to-one approaches are used. In this study, the one-to-rest approach is used, where (C) SVM models are trained. Each SVM model predicts membership of the samples in one of the C classes.

In the context of machine learning, Random Forest (RF) is an ensemble learning technique that builds a multitude of decision trees during training and combines their outputs to make more accurate and robust predictions51. RF is a supervised learning method, suitable for classification and regression tasks. Each tree in the forest is constructed independently, using a random subset of the features and samples with replacement from the training data52. This randomness adds diversity to the decision-making process, preventing the model from too much focusing on idiosyncrasies in the data. An RF takes a random approach to selecting a subset of input variables/features (controlled by the maximum number of features), and performs the optimal split to divide a node based on a split criterion. Avoiding tree pruning ensures maximal tree growth. As a result, a multitude of trees are constructed, and the model employs a voting mechanism to determine the most prevalent class in a classification task.

Each tree makes its own prediction, and the final decision is determined by the majority voting paradigm. This approach not only enhances the prediction accuracy of the model but also makes it stronger against overfitting. Figure5 shows the schematic of a random forest where n trees are used to make a prediction. Each subset is randomly selected from the dataset and divided into two parts, including the bag and out-of-bag (OOB) parts. The data in each bag is used to build a tree and the data in OOB is used to test that tree. The OOB subset serves as an ongoing and unbiased estimation of the general prediction error, predating the verification of prediction accuracy through the independent testing subset for the aggregated results. When (X) is inputted to the ensemble, each tree provides a separate output ((o_{1} ,; ldots , ;o_{n})). In the end, the ultimate class of the inputs is determined by the same approach given in Eq.(8).

Schematic of the random forest tree construction.

The RF produces competing results to boosting and bagging, without any alteration to the training set. It minimizes the bias by incorporating a random sample predictor before each node segmentation. The RF model can handle high-dimensional data, without need for feature selection. Its implementation in Python is relatively straightforward, boosting training speeds and easy parallelization. Given these advantages, it is becoming increasingly popular among data scientists52,53.

According to the above explanations, the control parameters of a random forest are the split criterion, maximum depth of trees, the number of estimators, and the maximum number of features. These control parameters are fine-tuned to achieve the best performance. There is also another control parameter, which is the minimum number of samples required to split a node, but it is not investigated in this study.

A committee machine is a technique to merge the output of a multitude of predictive models to come up with a single prediction33. The benefit of this technique is to take advantage of the results of different alternatives for modeling a particular problem, instead of using only one model. The individual models are selected in such a way that at least one model from each type of machine learning models is included. Thereby, we can take benefit from the strength points of different types of learning algorithms. By using the PLCM technique, the chance of overfitting can be lowered33. There are two main approaches to combine the output of individual models, namely the static and dynamic approaches. In the static method, a linear combination of the individual outputs is used to get the ultimate output, while the dynamic approach uses a non-linear combination of the outputs. In this study, the dynamic approach with a power-law model is used to accomplish the integration task. Equation(18) shows the power-law model.

$$y = mathop sum limits_{i = 1}^{5} alpha_{i} y_{i}^{{beta_{i} }} ,$$

(18)

where (y) is the ultimate output, (alpha_{i}) and (beta_{i}) are the coefficients that must be optimized to achieve the goal of the power-law committee machine, and (y_{i}) is the output of the (i)-th individual predictive model. In this study, the coefficients of the power-law model ((alpha_{i}) and (beta_{i})) are optimized by the PSO algorithm to achieve a satisfactory integration of the outputs. The PSO is described in the following subsection.

Kennedy and Eberhart54 introduced the PSO as a population-based optimization algorithm. This algorithm starts solving the problem with random solutions65. Each solution in this algorithm is known as a particle, where a swarm is composed of a multitude of particles. The particles change their position in the solution space by a specified velocity which is updated at each iteration. The particles position determines the solution found by the particle. When the position of the particle changes, a new solution is obtained. The following equations give the updating formulae for the velocity and position of a particle

$$v_{i} (t + 1) = omega v_{i} (t) + c_{1} r_{1} (x_{best,i} (t) - x_{i} (t)) + c_{2} r_{2} (x_{best,g} (t) - x_{i} (t)),$$

(19)

$$x_{i} (t + 1) = x_{i} (t) + v_{i} (t + 1),$$

(20)

where (x_{i}) and (v_{i}) are the position and velocity of particle (i), respectively, (t) is the iteration number, (omega) is the inertia coefficient, (c_{1}) and (c_{2}) are the self-learning and social-learning coefficient, respectively, (r_{1}) and (r_{2}) are two random numbers, (x_{best,i}) is the best solution found by the particle, and (x_{best,g}) is the global best solution. The values of the (x_{best,i}) and (x_{best,g}) are obtained by evaluating the objective function. In this study, the objective function is the negative of prediction accuracy by the PLCM method. The velocity and position of the particles are updated until the algorithm reaches the stopping criterion. The parameters used in Eq.(19) are determined based on the work by Poli et al.56, where (omega ,) (c_{1} ,) and (c_{2}) are set at 0.7298, 1.49618, and 1.49618, respectively.

The PSO is one of the most commonly used optimization algorithms in petroleum engineering57,58,59,60. Among different metaheuristic optimization algorithms, the PSO has shown a better performance compared to the most of other optimization algorithms, such as the genetic algorithm and simulated annealing. The PSO has shown the ability to reach better optimal solutions and faster convergence to similar results than its rivals in many applications61. Thereby, this algorithm is used in this study to optimize the coefficients of the PLCM method.

After describing the tools used in this study, it is necessary to define the evaluation metrics, which are required to evaluate the performance of the proposed method. These metrics include the quantitative and visual indicators that are described in the following subsection.

In this study, quantitative and visual evaluation metrics are used to assess the performance of the proposed method. These metrics include the accuracy, precision, recall, F1-score, confusion matrix, Receiver Operating Characteristic (ROC) curve, and precision-recall curve.

Accuracy is the total number of correct predictions divided by the total number of data points. In binary classification, accuracy is defined as the number of true positives (TP) divided by the number of samples (accuracy = frac{TP}{N}), where N is the total number of data points/samples.

Precision is the portion of positive predictions that are actual positives. Precision focuses on the accuracy of positive predictions. For a binary classification precision is defined as (Precision = frac{TP}{{TP + FP}}), where FP is the number of false positives, which means that the prediction by the model is positive, whereas the actual label of the sample is negative.

Recall gives the portion of the positive samples that are identified as positives. Recall focuses on how well the model captures positive instances. In other words, it is the ratio of true positives to all positive samples in the dataset defined as ({text{Re}} call = frac{TP}{{TP + FN}}), where FN is the number of false negative predictions defined as the samples which are incorrectly classified as negative.

The inverse of the harmonic average of the recall and precision multiplied by 2 is known as F1-Score. F1-Score is defined in Eq.(21).

$$F1{ - }Score = 2frac{PR}{{P + R}},$$

(21)

where P and R are the precision and recall, respectively. A good classifier should have high values of precision and recall, which indicates a high F1-Score.

In multi-class classification, as the problem in this study, each metric is calculated for individual classes and averaged across all classes to obtain a single value. In this manner, each time, one of the classes is considered positive, and other classes are assumed as negative.

In a multiclass problem, the confusion matrix is a (C times C) matrix, where the rows represent the actual class and the columns represent the predicted class of the samples. The values on the main diagonal of the matrix show the number of correct predictions (true positives), and off-diagonal values show the number of incorrect predictions (false positives). The sum of the values on the main diagonal of the matrix divided the total number of samples gives the accuracy, as described above. Also, the diagonal value for each class if divided by the sum of all values in each column gives the class-specific precision, and if divided by the sum of all values in each row gives the class-specific recall.

Excerpt from:
Application of power-law committee machine to combine five machine learning algorithms for enhanced oil recovery ... - Nature.com