
Podcast: A Sleep Scientist on New Technology and Sleep as a Social Justice Issue – InventUM | University of Miami Miller School of Medicine

Reading Time: 2 minutes

While the study of sleep is a relatively new field, researchers over the past two decades have revealed the powerful effects of sleep or the lack of it on overall health and well-being. But who are these slumber scientists, who research a realm that begins when their patients slip into a dream state?

Meet Azizi Seixas, Ph.D., associate professor of psychiatry and behavioral sciences at the University of Miami Miller School of Medicine. Dr. Seixas dedicates his career to advancing our knowledge of sleep to improve the health of communities, particularly those that are underserved. His motivation to specialize in sleep health was spurred by his own struggles and those he witnessed while growing up in a low-income inner city.

"I know exactly how the lack of sleep can have a deleterious effect on your health, your livelihood and every other facet," said Dr. Seixas, who is also the interim chair of the Department of Informatics and Health Data Science and associate director of the Center for Translational Sleep and Circadian Sciences.

On the latest edition of Inside U Miami Medicine, Dr. Seixas shares his journey from growing up in Jamaica to becoming a faculty member at prestigious academic institutions in the U.S. Much of Dr. Seixas' work focuses on disparities in health outcomes between ethnic and socioeconomic groups, disparities that are correlated with differences in quantity and quality of sleep.

"People believe that sleep is a luxury," he said. "We believe sleep is a social justice issue."

That's why he and his team are committed to translational research and creating solutions that improve the health of these communities. One such innovation is the MILBOX, a project that provides participants with a variety of in-home and wearable technology that acts as a remote health monitoring system.

Tune in to the episode to hear from Dr. Seixas about these inventions and more. Click here to listen on Spotify, or search Inside U Miami Medicine wherever you listen to podcasts.


Synthetic financial data: banking on it – FinTech Magazine

By banking on synthetic financial data, banks can tackle head-on the challenge highlighted by Gartner that, by 2030, 80% of heritage financial services firms will go out of business, become commoditised or exist only formally but without being able to compete effectively.

"A pretty dire prophecy, but nonetheless realistic, with small neobanks and big tech companies eyeing their market. Survivalist banks and financial institutions need a strategy in which creating, using, and sharing synthetic financial data is a key component," says Tobias Hann, CEO of MOSTLY AI.

Banks and financial institutions are aware of their data and innovation gaps, and AI-generated synthetic data is one area they're investing in to gain a competitive edge. Synthetic financial data is generated by AI that's trained on real-world data. The resulting synthetic data looks, feels and means the same as the original. It's a perfect proxy for the original, since it contains the same insights and correlations, plus it's completely privacy-secure.
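A common way to make the "same insights and correlations" claim testable is train-synthetic-test-real (TSTR): fit a model on synthetic rows, fit another on real rows, and score both on a real hold-out set. The sketch below is a minimal illustration of that idea; the Gaussian-mixture "generator", the column meanings and the default-risk label are assumptions for demonstration, not MOSTLY AI's actual method.

```python
# Minimal TSTR sketch: a GaussianMixture stands in for the deep generative
# models described in the article; columns and labels are illustrative only.
import numpy as np
from sklearn.mixture import GaussianMixture
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split
from sklearn.metrics import roc_auc_score

rng = np.random.default_rng(0)
# Toy "real" banking data: [income, balance, tenure_years] and a default label.
X_real = rng.normal(loc=[50_000, 12_000, 6], scale=[15_000, 8_000, 3], size=(5_000, 3))
y_real = (X_real[:, 1] < 5_000).astype(int)  # crude proxy for default risk

X_train, X_test, y_train, y_test = train_test_split(X_real, y_real, test_size=0.3, random_state=0)

# Fit a simple generator per class on real data, then sample synthetic rows.
synth_X, synth_y = [], []
for label in (0, 1):
    gm = GaussianMixture(n_components=3, random_state=0).fit(X_train[y_train == label])
    samples, _ = gm.sample(2_000)
    synth_X.append(samples)
    synth_y.append(np.full(2_000, label))
X_synth, y_synth = np.vstack(synth_X), np.concatenate(synth_y)

# Compare a model trained on real data vs. one trained on synthetic data,
# both evaluated on the same real hold-out set.
auc_real = roc_auc_score(
    y_test, RandomForestClassifier(random_state=0).fit(X_train, y_train).predict_proba(X_test)[:, 1])
auc_synth = roc_auc_score(
    y_test, RandomForestClassifier(random_state=0).fit(X_synth, y_synth).predict_proba(X_test)[:, 1])
print(f"AUC trained on real: {auc_real:.3f}  |  trained on synthetic: {auc_synth:.3f}")
```

If the two scores are close, the synthetic table has retained the signal the downstream model needs.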

Easy-to-deploy data science use cases in banking demonstrate clear value from the adoption of synthetic financial data, including advanced analytics, AI, and machine learning; data sharing; and software testing.

AI and machine learning unlock a range of business benefits for retail banks. These include advanced analytics, which improve customer acquisition by optimising the marketing engine with hyper-personalised messages and precise next-best actions. Intelligence from the very first point of contact increases customer lifetime value. "Since synthetic financial data is GDPR compliant, yet contains the intelligence of the original data, no customer consent is needed to harness its power," says Hann.

Synthetic financial data also lowers operating costs when decision-making in acquisition and servicing is supported with well-trained machine learning algorithms. In addition, underserved customer segments can get the credit they need by fixing embedded biases via data synthesisation, and it facilitates mass-market AI explainability, which is increasingly demanded by tech-savvy customers.

Open financial data is the ultimate form of data sharing. According to McKinsey, economies embracing financial data sharing could see GDP gains of 1-5% by 2030, with benefits flowing to consumers and financial institutions. More data means better operational performance, better AI models, more powerful analytics, and enhanced customer-centric digital banking products.

One of the most common data sharing use cases is connected to developing and testing digital banking apps and products. Banks accumulate tons of apps, continuously developing them, onboarding new systems, and adding new components. Manually generating test data for such complex systems is a hopeless task, and many revert to the risky use of production data for testing.

Generally, manual test data generation tools miss most of the business rules and edge cases that are vital for robust testing practices.

To put it simply, it's impossible to develop intelligent banking products without intelligent test data. The same goes for testing AI and machine learning models. Testing those models with synthetically simulated edge cases is extremely important to do when developing from scratch and when recalibrating models to avoid drifting.

Not all synthetic data generators are created equal. It's important to select the right synthetic data vendor who can match the financial institution's needs. If a synthetic data generator is inaccurate, the resulting synthetic datasets can lead your data science team astray. If it's too accurate, the generator overfits or learns the training data too well and could accidentally reproduce some of the original information from the training data.

Open-source options are also available. However, the control over quality is fairly low. Until a global standard for synthetic financial data is in place, it's important to proceed with caution when selecting vendors. Opt for synthetic data companies with extensive experience in dealing with sensitive financial data and know-how when it comes to integrating synthetic data successfully within existing infrastructures.

"Our team at MOSTLY AI has seen large banks and financial organisations from up close. We know that synthetic financial data will be the data transformation tool that will change the financial data landscape forever, enabling the flow and agility necessary for creating competitive digital services," concludes Hann.


Faculty approve the creation of statistics major – The Middlebury Campus

The Mathematics Department, soon to be called the Department of Mathematics and Statistics, will now be home to a new statistics major. On Friday, April 7, Middlebury faculty voted to approve the new major with a vote of 68 to 26. The faculty discussed the proposal for over an hour before the vote.

Psychology Professor Mike Dash is on the Educational Affairs Committee (EAC) and has been working closely with the Mathematics Department on its proposal for the new major ahead of the faculty vote. The faculty vote was supposed to take place in February but was delayed until April to allow the EAC to write up a formal recommendation and the Mathematics Department to revise its proposal based on feedback it received from the EAC.

"Students should be able to graduate with a major in statistics starting next academic year," Dash said.

Alex Lyford is one of the mathematics professors who worked on the proposal.

"Students interested in majoring in statistics or wanting to learn more about our statistics and data science offerings can reach out to any of our statisticians: Professors Lyford, Tang, Malcolm-White or Peterson," Lyford said on behalf of the whole department.

"We hope that this major and the courses within it allow students at Middlebury to explore how data, probabilistic thinking and mathematics can help us solve some of the world's most challenging and interesting problems," Lyford said.

[Figure: flowchart of the courses required for the newly approved statistics major]

Lily Jones '23 is an online editor and senior writer.

She previously served as a Senior News Writer and SGA Correspondent.

Jones is double majoring in Philosophy and Political Science. She also is an intern for the Rohatyn Center for Global Affairs and on the ultimate frisbee team.


Why semantics matter in the modern data stack – VentureBeat


Most organizations are now well into re-platforming their enterprise data stacks to cloud-first architectures. The shift in data gravity to centralized cloud data platforms brings enormous potential. However, many organizations are still struggling to deliver value and demonstrate true business outcomes from their data and analytics investments.

The term "modern data stack" is commonly used to define the ecosystem of technologies surrounding cloud data platforms. To date, the concept of a semantic layer hasn't been formalized within this stack.

When applied correctly, a semantic layer forms a new center of knowledge gravity that maintains the business context and semantic meaning necessary for users to create value from enterprise data assets. Further, it becomes a hub for leveraging active and passive metadata to optimize the analytics experience, improve productivity and manage cloud costs.

Wikipedia describes the semantic layer as a business representation of data that lets users interact with data assets using business terms such as product, customer or revenue to offer a unified, consolidated view of data across the organization.


The term was coined in an age of on-premise data stores, a time when business analytics infrastructure was costly and highly limited in functionality compared to today's offerings. While the semantic layer's origins lie in the days of OLAP, the concept is even more relevant today.

While the term "modern data stack" is frequently used, there are many representations of what it means. In my opinion, Matt Bornstein, Jennifer Li and Martin Casado from Andreessen Horowitz (A16Z) offer the cleanest view in "Emerging Architectures for Modern Data Infrastructure."

I will refer to this simplified diagram based on their work below:

This representation tracks the flow of data from left to right. Raw data from various sources moves through ingestion and transport services into core data platforms that manage storage, query and processing, and transformation prior to being consumed by users in a variety of analysis and output modalities. In addition to storage, data platforms offer SQL query engines and access to artificial intelligence (AI) and machine learning (ML) utilities. A set of shared services cuts across the entire data processing flow at the bottom of the diagram.

A semantic layer is implicit any time humans interact with data: It arises organically unless there is an intentional strategy implemented by data teams. Historically, semantic layers were implemented within analysis tools (BI platforms) or within a data warehouse. Both approaches have limitations.

BI-tool semantic layers are use-case-specific; multiple semantic layers tend to arise across different use cases, leading to inconsistency and semantic confusion. Data warehouse-based approaches tend to be overly rigid and too complex for business users to work with directly; work groups end up extracting data to local analytics environments, again leading to multiple disconnected semantic layers.

I use the term "universal semantic layer" to describe a thin, logical layer sitting between the data platform and analysis and output services that abstracts the complexity of raw data assets so that users can work with business-oriented metrics and analysis frameworks within their preferred analytics tools.

The challenge is how to assemble the minimum viable set of capabilities that gives data teams sufficient control and governance while delivering end-users more benefits than they could get by extracting data into localized tools.

The set of transformation services in the A16Z data stack includes the metrics layer, data modeling, workflow management, and entitlements and security services. When implemented, coordinated and orchestrated properly, these services form a universal semantic layer that delivers important capabilities.

Let's step through each transformation service with an eye toward how they must interact to serve as an effective semantic layer.

Data modeling is the creation of business-oriented, logical data models that are directly mapped to the physical data structures in the warehouse or lakehouse. Data modelers or analytics engineers focus on three important modeling activities:

Making data analytics-ready: Simplifying raw, normalized data into clear, mostly de-normalized data that is easier to work with.

Definition of analysis dimensions: Implementing standardized definitions of hierarchical dimensions that are used in business analysis; that is, how an organization maps months to fiscal quarters to fiscal years.

Metrics design: Logical definition of key business metrics used in analytics products. Metrics can be simple definitions (how the business defines revenue or ship quantity). They can be calculations, like gross margin ([revenue-cost]/revenue). Or they can be time-relative (quarter-on-quarter change).

I like to refer to the output of semantic layer-related data modeling as a semantic model.
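As a rough illustration of what such a semantic model can look like, here is a minimal sketch of a metrics registry covering the three kinds of definitions above. The table and column names (a "sales" fact with "revenue" and "cost" columns) are assumptions for demonstration, not any particular vendor's metric-layer syntax.

```python
# Minimal sketch of a semantic-model metrics registry; names are illustrative.
from dataclasses import dataclass

@dataclass
class Metric:
    name: str
    sql_expression: str      # logical definition, mapped to physical columns
    description: str = ""

SEMANTIC_MODEL = {
    "revenue": Metric("revenue", "SUM(sales.revenue)", "Simple definition"),
    "gross_margin": Metric(
        "gross_margin",
        "(SUM(sales.revenue) - SUM(sales.cost)) / SUM(sales.revenue)",
        "Calculated metric",
    ),
    "revenue_qoq": Metric(
        "revenue_qoq",
        "SUM(sales.revenue) / LAG(SUM(sales.revenue)) OVER (ORDER BY fiscal_quarter) - 1",
        "Time-relative metric (quarter-on-quarter change)",
    ),
}

def describe(metric_name):
    """Return a human-readable summary of a curated metric."""
    m = SEMANTIC_MODEL[metric_name]
    return f"{m.name}: {m.sql_expression}  -- {m.description}"

print(describe("gross_margin"))
```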

The metrics layer is the single source of metrics truth for all analytics use cases. Its primary function is maintaining a metrics store that can be accessed from the full range of analytics consumers and analytics tools (BI platforms, applications, reverse ETL, and data science tools).

The term "headless BI" describes a metrics layer service that supports user queries from a variety of BI tools. This is the fundamental capability for semantic layer success: if users are unable to interact with a semantic layer directly using their preferred analytics tools, they will end up extracting data into their tool using SQL and recreating a localized semantic layer.

Additionally, metrics layers need to support four important services:

Metrics curation: Metrics stewards will move between data modeling and the metrics layer to curate the set of metrics provided for different analytics use cases.

Metrics change management: The metrics layer serves as an abstraction layer that shields the complexity of raw data from data consumers. As a metrics definition changes, existing reports or dashboards are preserved.

Metrics discoverability: Data product creators need to easily find and implement the proper metrics for their purpose. This becomes more important as the list of curated metrics grows to include a broader set of calculated or time-relative metrics.

Metrics serving: Metrics layers are queried directly from analytics and output tools. As end users request metrics from a dashboard, the metrics layer needs to serve the request fast enough to provide a positive analytics user experience.

Transformation of raw data into an analytics-ready state can be based on physical materialized transforms, virtual views based on SQL or some combination of those. Workflow management is the orchestration and automation of physical and logical transforms that support the semantic layer function and directly impact the cost and performance of analytics.

Performance: Analytics consumers have a very low tolerance for query latency. A semantic layer cannot introduce a query performance penalty; otherwise, clever end users will again go down the data extract route and create alternative semantic layers. Effective performance management workflows automate and orchestrate physical materializations (creation of aggregate tables) as well as decide what and when to materialize. This functionality needs to be dynamic and adaptive based on user query behavior, query runtimes and other active metadata.

Cost: The primary cost tradeoff for performance is related to cloud resource consumption. Physical transformations executed in the data platform (ELT transforms) consume compute cycles and cost money. End user queries do the same. The decisions made on what to materialize and what to virtualize directly impact cloud costs for analytics programs.

Analytics performance-cost tradeoff becomes an interesting optimization problem that needs to be managed for each data product and use case. This is the job of workflow management services.

Transformation-related entitlements and security services relate to the active application of data governance policies to analytics. Beyond cataloging data governance policies, the modern data stack must enforce policies at query time, as metrics are accessed by different users. Many different types of entitlements may be managed and enforced alongside (or embedded in) a semantic layer.

Access control: Proper access control services ensure all users can get access to all of the data they are entitled to see.

Model and metrics consistency: Maintaining semantic layer integrity requires some level of centralized governance of how metrics are defined, shared and used.

Performance and resource consumption: As discussed above, there are constant tradeoffs being made on performance and resource consumption. User entitlements and use case priority may also factor into the optimization.

Real-time enforcement of governance policies is critical for maintaining semantic layer integrity.

Layers in the modern data stack must seamlessly integrate with other surrounding layers. The semantic layer requires deep integration with its data fabric neighbors, most importantly the query and processing services in the data platform and the analysis and output tools.

A universal semantic layer should not persist data outside of the data platform. A coordinated set of semantic layer services needs to integrate with the data platform in a few important ways:

Query engine orchestration: The semantic layer dynamically translates incoming queries from consumers (using the metrics layer logical constructs) to platform-specific SQL (rewritten to reflect the logical to physical mapping defined in the semantic model).

Transform orchestration: Managing performance and cost requires the capability to materialize certain views into physical tables. This means the semantic layer must be able to orchestrate transformations in the data platform.

AI/ML integration: While many data science activities leverage specialized tools and services accessing raw data assets directly, a formalized semantic layer creates the opportunity to provide business vetted features from the metrics layer to data scientists and AI/ML pipelines.

Tight data platform integration ensures that the semantic layer stays thin and can operate without persisting data locally or in a separate cluster.
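A minimal sketch of the query engine orchestration described above might look like the following: a logical metric request is rewritten into platform SQL using the logical-to-physical mapping held in the semantic model. The table, join and column names are illustrative assumptions, not a specific product's dialect.

```python
# Hedged sketch of "headless BI" query rewriting: logical metric + dimensions
# in, platform-specific SQL out. Physical names are illustrative placeholders.
LOGICAL_TO_PHYSICAL = {
    "revenue": "SUM(f.revenue_amt)",
    "gross_margin": "(SUM(f.revenue_amt) - SUM(f.cost_amt)) / SUM(f.revenue_amt)",
    "fiscal_quarter": "d.fiscal_quarter",
    "product": "p.product_name",
}

def rewrite_query(metric, dimensions):
    """Turn a logical request like ('gross_margin', ['fiscal_quarter']) into SQL."""
    select_dims = ", ".join(LOGICAL_TO_PHYSICAL[d] for d in dimensions)
    measure = LOGICAL_TO_PHYSICAL[metric]
    return (
        f"SELECT {select_dims}, {measure} AS {metric}\n"
        "FROM fact_sales f\n"
        "JOIN dim_date d ON f.date_key = d.date_key\n"
        "JOIN dim_product p ON f.product_key = p.product_key\n"
        f"GROUP BY {select_dims}"
    )

print(rewrite_query("gross_margin", ["fiscal_quarter", "product"]))
```

The consuming BI tool never sees the physical schema; it only asks for metrics and dimensions, which keeps the layer thin and the mapping centrally governed.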

A successful semantic layer, including a headless BI approach to implementing the metrics layer, must be able to support a variety of inbound query protocols including SQL (Tableau), MDX (Microsoft Excel), DAX (Microsoft Power BI), Python (data science tools), and RESTful interfaces (for application developers) using standard protocols such as ODBC, JDBC, HTTP(s) and XMLA.

Leading organizations incorporate data science and enterprise AI into everyday decision-making in the form of augmented analytics. A semantic layer can be helpful in successfully implementing augmented analytics.

The A16Z model implies that organizations could assemble a fabric of home-grown or single-purpose vendor offerings to build a semantic layer. While certainly possible, success will be determined by how well-integrated individual services are. As noted, even if a single service or integration fails to deliver on user needs, localized semantic layers are inevitable.

Furthermore, it is important to consider how vital business knowledge gets sprinkled across data fabrics in the form of metadata. The semantic layer has the advantage of seeing a large portion of active and passive metadata created for analytics use cases. This creates an opportunity for forward-thinking organizations to better manage this knowledge gravity and better leverage metadata for improving the analytics experience and driving incremental business value.

While the semantic layer is still emerging as a technology category, it will clearly play an important role in the evolution of the modern data stack.

This article is a summary of my current research around semantic layers within the modern, cloud-first data stack. I'll be presenting my full findings at the upcoming virtual Semantic Layer Summit on April 26, 2023.

David P. Mariani is CTO and cofounder of AtScale, Inc.



Rao honored with Long Service Award // Mizzou Engineering – University of Missouri College of Engineering

April 11, 2023

Praveen Rao has been honored with a Long Service Award from PLOS One, a peer-reviewed open access scientific journal. Rao, an associate professor of electrical engineering and computer science, has been an academic editor on the journal's editorial board for more than five years.

Rao directs the Scalable Data Science (SDS) Lab at Mizzou, where his research focuses on big data management, data science, health informatics and cybersecurity. He is also director of graduate studies for the PhD in Informatics program.

Last month, he was tapped to be an associate editor for a newly approved journal under the umbrella of the Association for Computing Machinery (ACM). ACM Transactions on Probabilistic Machine Learning will publish work around probabilistic methods that learn from data to improve performance on decision making or prediction tasks under uncertainty. Rao has been a senior member of ACM since 2020.


Neural network based integration of assays to assess pathogenic … – Nature.com

A vector representation of the SBRL assays that preserves species discrimination

The CDC SBRL dataset contains more than 30 different assays that include tests to determine substrate utilization and catalytic activities. Prior to the advent of DNA sequencing, these phenotypic assays were the only method available for bacterial species identification among bacteria that had similar gram staining and colony morphology. The dataset was narrowed down to focus on eight assays that had measurements listed in them for at least 80% of the strains (Table 1).

To determine if these eight assays can differentiate between various types of bacteria, a Uniform Manifold Approximation and Projection (UMAP) dimension reduction was performed to visualize the dataset (Fig. 2A). Every point in the plot was a bacterial strain. The clusters that were formed based on the results from the selected eight assays belonged to bacteria with the same species names, suggesting the machine-learning approach to use the SBRL results to aggregate similar bacteria together can recapitulate the observations of human microbiologists that were made over the course of decades. The subset of assays that the computer scientists used maintained discriminative power across species.

Exploratory data analysis discovered that the SBRL dataset discriminates between different bacterial species. (A) 2D UMAP was performed on the SBRL assays, followed by k-means clustering to assign cluster labels to the bacterial samples. Every point in the plot is a bacterial sample. The points form groups in the UMAP, suggesting that the SBRL assays can aggregate similar bacteria together. The colors in the figure are the k-means labels. (B) The neural network model pushes the samples from the same bacteria species closer together. An example output of two species, Vibrio parahaemolyticus and Yersinia enterocolitica, is shown in the UMAP before and after training to show clusters are refined by the model. We quantified how well the samples from the same species are clustered together before and after the training and found the normalized mutual information went from 0.65 to 0.74.
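For readers who want to reproduce this style of exploratory analysis, a minimal sketch follows. It uses randomly generated placeholder data in place of the SBRL table and the umap-learn and scikit-learn packages; it is not the authors' code, only the same UMAP + k-means + normalized mutual information pattern.

```python
# Sketch of the exploratory pipeline: UMAP on 8 assay columns, k-means on the
# embedding, and NMI against species labels. Data here is a random placeholder.
import numpy as np
import umap
from sklearn.cluster import KMeans
from sklearn.metrics import normalized_mutual_info_score

rng = np.random.default_rng(0)
n_strains, n_species = 500, 14
species = rng.integers(0, n_species, n_strains)          # placeholder species labels
X = rng.integers(0, 2, (n_strains, 8)).astype(float)     # placeholder 8 binary assay results

embedding = umap.UMAP(n_components=2, random_state=42).fit_transform(X)
cluster_labels = KMeans(n_clusters=n_species, random_state=42, n_init=10).fit_predict(embedding)

nmi = normalized_mutual_info_score(species, cluster_labels)
print(f"NMI between k-means clusters and species labels: {nmi:.2f}")
```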

The next challenge was to develop a vector representation for the assays that would be useful to downstream machine learning models. Two solutions were investigated to address this limitation, both of which integrated the data based on species identification. The first method computed the percent of species that have a positive signal from the assay, henceforth referred to as pps (percent positive signal). PPS was considered a positive control, as it enhanced the pathogenicity assays with the SBRL dataset but did so without the use of machine learning. The second method used a neural network embedding model (NNEM) to create bacterial species vectors using the data from the biochemical assays, henceforth referred to as vectorization. Given that we only used data from eight assays and wanted to remain comparable to the PPS, we did not choose to change the dimensionality from eight. The model simply transformed the representation of the eight assays into an eight-dimensional vector per species. This process involved feeding the various bacterial strains and their biochemical characteristics into the NNEM as input, then asking the model to predict the species name for each strain based on the assays. As Fig. 2A showed, this should be possible for the model. The architecture of the neural network model is shown in Supplementary Fig. 3. As the model was trained to predict the species name for each strain, it created distinct vectors for each species, and these new distinct vectors represented the species for downstream analyses. This learned vector representation of the SBRL biochemical assays was then integrated into our pathogenic models at the species level. In a sense, this approach combined very old data with very new algorithms to enhance the predictive power of machine learning models trained to predict pathogenic potential. We observed that after the NNEM training, the Vibrio parahaemolyticus strains and Yersinia enterocolitica strains from the initial panel of 40 formed tighter clusters (Fig. 2B). We quantified how much the NNEM helped the strains that belong to the same species cluster together and found an improvement in the normalized mutual information7, a metric used to measure how well groups cluster, from 0.65 to 0.74. It should be noted that we do not claim that the NNEM can distinguish between strains perfectly, as can be seen from the normalized mutual information scores. Namely, if it were perfect, NMI = 1. We instead used the vectorization to provide a species prior so that our machine learning models, trained only on pathogenicity assays, could benefit from the additional context.
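A hedged sketch of what such an embedding model could look like is shown below. The paper's architecture is given in its Supplementary Fig. 3 and is not reproduced here, so the layer sizes, optimizer settings and placeholder data are assumptions; the point is the pattern of training a species classifier on the eight assay inputs and keeping an eight-dimensional hidden representation as the species vector.

```python
# Hedged NNEM sketch: an MLP predicts species from 8 assay values; the 8-unit
# hidden activations serve as strain embeddings, averaged per species.
import torch
import torch.nn as nn

n_assays, n_species = 8, 14

class NNEM(nn.Module):
    def __init__(self):
        super().__init__()
        self.encoder = nn.Sequential(nn.Linear(n_assays, 32), nn.ReLU(), nn.Linear(32, 8))
        self.classifier = nn.Linear(8, n_species)   # predicts the species name

    def forward(self, x):
        z = self.encoder(x)                          # 8-dimensional strain embedding
        return self.classifier(z), z

model = NNEM()
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
loss_fn = nn.CrossEntropyLoss()

X = torch.randn(200, n_assays)                       # placeholder assay matrix
species_ids = torch.randint(0, n_species, (200,))    # placeholder species labels

for _ in range(100):
    logits, _ = model(X)
    loss = loss_fn(logits, species_ids)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()

# Species-level vector = mean embedding of that species' strains.
with torch.no_grad():
    _, Z = model(X)
    species_vectors = torch.stack([Z[species_ids == s].mean(dim=0) for s in range(n_species)])
print(species_vectors.shape)   # torch.Size([14, 8])
```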

Previously, the PathEngine platform2 was developed to evaluate results of four phenotypic assays that measure pathogenic potential of a blinded set of 40 bacterial strains. These four pathogenicity assays would reasonably be expected to be associated with bacterial pathogenicity due to known biological mechanisms8,9. The host immune activation assay detected activation of NF-κB signaling in Jurkat T lymphocytes to capture the presence of pathogen-associated molecular patterns (PAMPs)10,11. The AMR assay was used to discover antibiotic resistance, providing an indication of whether any instance of infection could be efficiently treated12. The host adherence assay measured the ability of bacteria to bind to host cells, a crucial step for pathogens to establish an infection13,14. Lastly, the host toxicity assay detected host cell death induced by the bacteria to measure the cytotoxicity of these strains15,16. The data produced by the assays were used to train ML models to predict a strain's pathogenic potential from these properties. Traditionally, an expert would review the data and make a pathogenic call based on their interpretation of the data. Here, the model learns the features from each assay and then combines those features into an ensemble model that makes a pathogenic call. The model from each assay as well as the ensemble is compared to the friend-or-foe designation provided by NIST. Details can be found in our prior work2. The CDC SBRL dataset contains some of the same species as the bacteria used for PathEngine analysis. It was therefore hypothesized that by integrating the SBRL data with the results of the four pathogenicity phenotypic assays, the models would have more context about each species and achieve better performance.

However, the SBRL data was not easily integrated with the results of the pathogenicity assays, since none of the actual strains tested for pathogenic potential were present in the SBRL dataset. The two representations described in the previous section were therefore integrated at the species level, rather than at the level of actual strains. In other words, every strain was supplemented with SBRL data that was represented through pps or vectorization. Having established two ways to integrate the SBRL biochemical data with results from the pathogenicity assays, we then performed three tests to evaluate if, and how much, the integration of the SBRL biochemical data impacted the ML results. A total of 22 bacterial strains that belong to 14 unique species were enriched with the SBRL data based on the species names. Note that we had 40 strains to use without integrating with the SBRL data but only 22 left after the integration, as the remaining species were not in the SBRL dataset (Supplementary Table 1). With many fewer strains for training and testing, the accuracy of the ML models to predict pathogenic potential was expected to be lower than we had in the original PathEngine paper2, as smaller dataset sizes are generally understood to result in lower performance for this sort of model. For each assay, we tested a model with 10-fold cross validation that used either (1) the pathogenicity assays only, (2) the pathogenicity assay combined with the pps, or (3) the pathogenicity assay combined with the vector representation created by the NNEM. These models were used to test how well the PathEngine predictions matched the pathogenicity designations provided by NIST. We used balanced accuracy as the metric to ensure that the performance was not biased towards the majority class and henceforth refer to this metric as accuracy. The possibility that the observed prediction improvement was due entirely to the removal of less well-understood bacterial strains from the analysis was precluded by including a control condition of prediction from the assays without SBRL vectors, as well as with SBRL pps. Any and all improvement can thus be attributed to the vector representation we developed.
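The three-way comparison can be sketched as follows, with a random forest standing in for the classifier (the text does not commit to one here) and placeholder arrays for the pathogenicity features, the two SBRL representations and the NIST friend-or-foe labels.

```python
# Sketch of the 10-fold balanced-accuracy comparison across the three feature
# sets; all arrays are placeholders, not the paper's data.
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(0)
n_strains = 22
X_assay = rng.normal(size=(n_strains, 6))        # placeholder pathogenicity assay features
pps = rng.random((n_strains, 8))                 # placeholder percent-positive-signal columns
species_vecs = rng.normal(size=(n_strains, 8))   # placeholder NNEM species vectors
y = np.array([0] * 11 + [1] * 11)                # placeholder friend (0) / foe (1) labels

feature_sets = {
    "assay only": X_assay,
    "assay + pps": np.hstack([X_assay, pps]),
    "assay + NNEM vectors": np.hstack([X_assay, species_vecs]),
}

for name, X in feature_sets.items():
    scores = cross_val_score(
        RandomForestClassifier(random_state=0), X, y,
        cv=10, scoring="balanced_accuracy",
    )
    print(f"{name}: balanced accuracy = {scores.mean():.2f}")
```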

For the immune activation assay, adding the pps increased the ML accuracy by up to 24% (Fig. 3A,B). When the vector representations were used instead of the average values, the accuracy improved from 51 to 85% (Fig. 3A,C).

Incorporation of information from SBRL enhanced the predictions of pathogenic potential of the immune activation assay by up to 34%. (A) Ten-fold cross validation of an ML model with immune activation assay data alone, (B) with the percent positive signal (pps) and (C) with the NNEM of SBRL data. The results went from 51%, to 75%, to 85% balanced accuracy, respectively.

For the AMR assay, pps increased the accuracy by 2% (Fig. 4A,B) and the vectors improved the accuracy by 8% (Fig. 4C). For the adherence assay, pps increased the accuracy by 2% (Fig. 5A,B) and the vectors improved the accuracy by 7% (Fig. 5C). The toxicity assay is the only exception where the performance decreased when the SBRL representations were included (Supplementary Fig. 1A-C).

Incorporation of information from SBRL enhanced the predictions of pathogenic potential of the AMR assay by up to 8%. (A) Ten-fold cross validation of an ML model with AMR assay data alone, (B) with the percent positive signal (pps) and (C) with the NNEM of SBRL data. The results went from 61%, to 63%, to 69% balanced accuracy, respectively.

Incorporation of information from SBRL enhanced the predictions of pathogenic potential of the adherence assay by up to 7%. (A) Ten-fold cross validation of an ML model with adherence assay data alone, (B) with the percent positive signal (pps) and (C) with the NNEM of SBRL data. The results went from 58%, to 60%, to 65% balanced accuracy, respectively.

In order to investigate the cause for the decrease in performance of the toxicity assay predictions, all the predictions were grouped into four prediction classes (− predicted as −, − predicted as +, + predicted as −, and + predicted as +). Namely, each bacterial observation was classified as either non-pathogenic (−) or pathogenic (+). DAPI signals from the toxicity assay showed that the host cell death induced by the bacteria can be distinguished between different prediction classes (Supplementary Fig. 2A). After integrating the DAPI signal and the SBRL assays, we observed that the signals were masked by the presence of the SBRL assays and stayed flat throughout the time course. The assays were not as distinct between different classes as before (Supplementary Fig. 2B). Similar observations were seen when the SBRL vectors were incorporated (Supplementary Fig. 2C).

As each assay reveals different aspects of bacterial pathogenicity8,9, we combined predictions from the best performing model from each of the four assays to make a final threat assessment call. Using the models trained without using the SBRL vectors for the ensemble, we achieved accuracy of 70%, precision of 86%, recall of 73% and F1 of 79%. When the SBRL vectors were included, the ensemble performance achieved accuracy of 79%, precision of 90%, recall of 82% and F1 of 86% (Table 2). These results confirmed that adding the SBRL data provided useful context about the bacterial species for the ML models and thus improved the pathogenicity predictions.

To understand which SBRL assays were useful for the model predictions, we annotated each assay based on a literature review and also quantified assay importance with data-driven approaches. The assays are listed and annotated with their relevance for threat assessment in Table 1. For the data-driven approaches, we first examined the signals for the four prediction classes. If the − as − (non-pathogens predicted to be non-pathogens) and + as + (pathogens predicted to be pathogens) classes had dramatically different signals, it suggested that the assay is likely useful for threat assessment.

Consistent with the literature designation, MacConkey Agar (MacC) and Salmonella Shigella (SS) Agar are most relevant for threat assessment, as they have the most pronounced difference between the − as − and + as + classes (Fig. 6A). This is consistent with established microbiological understanding. Specifically, growth on MacConkey agar and SS agar are highly associated with pathogenicity, because most Enterics will grow on these agars. These assays are what have always been used to separate coliform bacteria from other similar bacteria. The re-discovery of these markers by computer scientists with no training in microbiology is a testament to the usefulness of a data-driven approach. It gives us confidence that heretofore unrecognized markers of pathogenicity will be similarly detectable. Supplementary Table 1 lists all the species used in this assay. Details of these strains and associated tags have been described previously2. The rest of the assays used were not as distinguishable as MacC and SS between the − as − and + as + classes but did show noticeable differences to be considered as assays useful for threat assessment, as supported by the literature (Table 1). To quantify the importance, we performed drop-assay tests where we dropped one assay at a time and compared the change in the model performance to the baseline where no assay was dropped. The change in the performance quantified the importance of the assay. We found the majority of the assays have positive importance for predicting pathogenicity, with the exception of lead acetate paper (TSI: H2S paper) and oxidase tests (O) (Fig. 6B).
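The drop-assay test described above can be sketched as a simple loop. The assay names beyond those mentioned in the text, the placeholder feature table and labels, and the classifier choice are all illustrative assumptions.

```python
# Sketch of drop-one-assay importance: drop each SBRL assay, re-run cross
# validation, and report the change from the all-assay baseline.
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(0)
assay_columns = ["MacC", "SS", "oxidase", "TSI_H2S_paper"] + [f"assay_{i}" for i in range(5, 9)]
X = pd.DataFrame(rng.integers(0, 2, (22, len(assay_columns))), columns=assay_columns)  # placeholder
y = np.array([0] * 11 + [1] * 11)   # placeholder friend/foe labels

def balanced_acc(features, labels):
    return cross_val_score(
        RandomForestClassifier(random_state=0), features, labels,
        cv=10, scoring="balanced_accuracy",
    ).mean()

baseline = balanced_acc(X, y)
for assay in assay_columns:
    dropped = balanced_acc(X.drop(columns=[assay]), y)
    importance = baseline - dropped   # positive: accuracy falls when the assay is removed
    print(f"{assay}: importance = {importance:+.3f}")
```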

Comparison of threat designations of the SBRL assays based on literature and the contribution determined by the models. (A) Data-driven qualitative assessment of threat relevance of the SBRL assays based on ML predictions. Non-pathogenic strains are annotated as − and pathogenic strains as +. The predictions belong to four groups: − predicted to be −, − predicted to be +, + predicted to be −, and + predicted to be +. SS and MacC are the most useful assays, as their − predicted to be − and + predicted to be + groups are differentiable. (B) The quantitative measurement of the assay contribution by determining the changes in performance when each assay is dropped one by one. If an assay is dropped and the accuracy decreases, the assay gets a positive importance score, and vice versa.


Predictive Analytics Helps Everyone in the Enterprise | NASSCOM … – NASSCOM Community

Gartner has predicted that "predictive and prescriptive analytics will attract 40% of net new enterprise investment in the overall business intelligence and analytics market." Why the focus on predictive analytics? It's simple! Investment in predictive analytics benefits everyone in the organization, including business users and team members, data scientists and the organization in general.

When an enterprise selects an assisted predictive modeling solution, it can satisfy the needs of business users, IT, and data scientists and achieve impressive results for the organization.

In this article, we review some of the many benefits of predictive analytics:

General Benefits

Bringing together data from across the enterprise to use for analytics ensures that your organization is considering all available information when it makes a decision. By analyzing historical data and using it to test theories and to hypothesize, the business can determine the best alternative and better understand the outcome before it decides on a direction, thereby avoiding missteps. Predictive analytics provides support for data-driven, fact-based decisions and enables insight, perspective and clarity for improved business agility and efficiency. Decisions are made on a more timely basis, problem solving is easier and the business can avoid re-work and damaging missteps in the market.

Business Users/Citizen Data Scientists

Assisted Predictive Modeling with Augmented Data Science and Machine Learning allows business users without data science or analytical skills to apply predictive analytics to any use case using forecasting, regression, clustering and other techniques, with auto-recommendations and guidance to suggest appropriate analytical techniques. With self-serve predictive analytics tools, business users can leverage sophisticated predictive techniques with auto-recommendations to choose the right kind of predictive algorithm or technique for the best results. Team members can bridge the gap in data science skills so they don't have to wait for IT or data scientists to help them produce a report or perform analytics. Instead, they can use assisted predictive modeling to improve business agility and align processes, activities and tasks with business objectives and goals.

Data Scientists

Data scientists spend much of their day addressing requests from management and business users instead of focusing on strategic initiatives, where data analytics must be 100% accurate to ensure appropriate strategy. With sophisticated predictive analytics tools that are founded on the latest, most effective algorithms and analytical techniques, data scientists can spend less time coding, creating queries and pulling data together manually, and less time slogging through complex systems and solutions to achieve their goals. The ability to combine data science skills with simple, easy-to-use tools and sophisticated features and functionality will make your data scientists and business analysts more productive and effective. Data scientists can create and re-purpose analytical models and focus on strategic initiatives.

These are just a few of the many benefits of predictive analytics.


Spring Tack Faculty Lecture to challenge our notions of AI – William & Mary

Should we be frightened or excited by the rise of artificial intelligence? According to Dan Runfola, associate professor of applied science and data science at William & Mary, the answer is both.

Runfola will address the AI revolution in the spring 2023 Tack Faculty Lecture, "Everything Is AI-awesome: Rise of the Machines," on May 2 at 7 p.m. in the Sadler Center's Commonwealth Auditorium. The event is free and open to the public with a reception to follow, and attendees are asked to RSVP.

"At William & Mary, we have this wonderful nexus of individuals who care not only about the models, but also about how we are going to use these algorithms in practice," said Runfola, whose research takes place at the intersection of deep learning and satellite imagery analysis.

Runfola compared the age of AI to the Industrial Revolution, with its potential to disrupt almost every job, including his own.

"The number of jobs at which humans are better than AI is going to go down, which leads us to a more fundamental question: How do we handle reallocation of wealth in a society where entire sectors that used to be dominated by human labor no longer require human attention?"

Because research in this field is frequently conducted in commercial settings, commercial applications have been a key driver of innovation, explained Runfola, who is currently the principal investigator of the Geospatial Evaluation and Observation Lab. "Here at William & Mary, we're seeing students consider implications well outside of commercial opportunities. Rather than asking how do we make models more accurate, they are focusing on how to ensure that a particular modeling strategy will not result in some populations being left behind. This raises fundamentally different questions about data collection and modeling."

The group of individuals controlling the creation of algorithms is also relatively small. As reported by Runfola, over the past 20 years much of AI research has happened within the private sector.

So, what comes next? Runfola is optimistic about a future in which AI can be a copilot for our everyday lives, taking on an increasingly broad spectrum of tasks.

"Today, with the right prompts you can ask a generative algorithm to write a poem; tomorrow, an AI might decide to read you a poem it wrote because you were looking glum," said Runfola, with little doubt that such technology can be created in the near term. "That could be a beautiful thing. But on the other side, if we don't put safeguards, these same technologies could be used in detrimental ways: 'I see you're looking a little tired today, Dan. Would you like me to order you a bottle of wine from our online store?'"

But is it all doom and gloom? Some may find it refreshing that Runfola also mentioned AI's potential to do incredibly helpful things for our society, making us happier and more productive. His lecture promises to challenge the audience's notions of AI and provide a glimpse (with live examples) into what the future will look like.

"It is up to us to determine whether the implications of these changes will lead to a more vibrant world, or to one in which power is consolidated in an exceptionally small number of hands," concluded Runfola.

The Tack Faculty Lecture Series is made possible through a generous commitment by Martha '78 and Carl Tack '78. Initially launched in 2012, the Tacks' commitment has created an endowment for the series of speakers from the W&M faculty.

Editor's note: Data is one of four cornerstone initiatives in W&M's Vision 2026 strategic plan. Visit the Vision 2026 website to learn more.

Antonella Di Marzio, Senior Research Writer


CNN vs. GAN: How are they different? – TechTarget

Convolutional neural networks (CNNs) and generative adversarial networks (GANs) are examples of neural networks -- a type of deep learning algorithm modeled after how the human brain works.

CNNs, one of the oldest and most popular of the deep learning models, were introduced in the 1980s and are often used in visual recognition tasks.

GANs are relatively newer. Introduced in 2014, GANs were one of the first deep learning models used for generative AI.

CNNs are sometimes used within GANs to generate and discern visual and audio content.

"GANs are essentially pairs of CNNs hooked together in an 'adversarial' way, so the difference is one of approach to output or insight creation, albeit there exists an inherent underlying similarity," said John Blankenbaker, principal data scientist at SSA & Company, a global management consulting firm. "How they answer a given question, however, is slightly different."

For example, CNNs might try to determine if a picture contains a cat -- a recognition task -- while GANs will try to make a picture of a cat, a generation task. In both cases, the networks are building up a representation of what makes a picture of a cat distinctive.

Let's look deeper into CNNs and GANs.

History. French computer scientist Yann LeCun, a professor at New York University and chief AI scientist at Meta, invented CNNs in the 1980s when he was a researcher at the University of Toronto. His aim was to improve the tools for recognizing handwritten digits by using neural networks. Although his work on optical character recognition was seminal, it stalled due to limited training data sets and computing power.

Interest in the technique exploded after 2010, following the introduction of ImageNet -- a large, labeled database of images -- and the launch of its annual ImageNet Large Scale Visual Recognition Challenge (ILSVRC). One of the most promising entries in the inaugural year of the competition was the AlexNet model based on CNNs, which was optimized for GPUs. Its success demonstrated that CNNs could efficiently scale to achieve good performance on even the largest image databases.

How they work. "CNNs are designed to use data with spatial structure such as images or video," said Donncha Carroll, a partner at Lotis Blue Consulting who leads the firm's Data Science Center of Excellence.

The convolutional neural network is composed of filters that move across the data and produce an output at every position. For example, a convolutional neural network designed to recognize animals in an image would activate when it recognizes legs, a body or a head.

It's also important to note that CNNs are designed to recognize the lines, edges and textures in patterns near each other, said Blankenbaker. "The 'C' in CNNs stands for convolutional, which means that we are processing something where the idea of neighborhood is important -- such as, for example, pixels around a given pixel or signal values slightly before and after a given moment."
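A minimal PyTorch sketch makes the structure concrete: convolutional filters slide over neighborhoods of pixels, pooling shrinks the activation maps, and a linear head turns the resulting features into class scores. The layer sizes below are illustrative, not any particular published architecture.

```python
# Tiny illustrative CNN: local filters -> pooled feature maps -> classifier.
import torch
import torch.nn as nn

class TinyCNN(nn.Module):
    def __init__(self, n_classes: int = 2):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(3, 16, kernel_size=3, padding=1),   # filters respond to local edges/textures
            nn.ReLU(),
            nn.MaxPool2d(2),
            nn.Conv2d(16, 32, kernel_size=3, padding=1),  # deeper filters combine parts (legs, head, ...)
            nn.ReLU(),
            nn.MaxPool2d(2),
        )
        self.classifier = nn.Linear(32 * 16 * 16, n_classes)

    def forward(self, x):
        x = self.features(x)
        return self.classifier(x.flatten(start_dim=1))

# A batch of 64x64 RGB images -> "cat" / "not cat" logits.
logits = TinyCNN()(torch.randn(8, 3, 64, 64))
print(logits.shape)   # torch.Size([8, 2])
```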

History. GANs were invented by American computer scientist Ian Goodfellow, currently a research scientist at DeepMind, when he was working at Google Brain from 2014 to 2016.

GANs, as noted, are a type of deep learning model used to generate images of numbers and realistic-looking faces. The field exploded once researchers discovered it could be applied to synthesizing voices, drugs and other types of images. GANs and their variations were heralded by CNN inventor LeCun as "the most interesting idea of the last 10 years in machine learning."

How they work. The term adversarial comes from the two competing networks creating and discerning content -- a generator network and a discriminator network. For example, in an image-generation use case, the generator network creates new images that look like faces. In contrast, the discriminator network tries to tell the difference between authentic and generated images. The discriminator performance data then helps to train the overall system.

One important distinction between CNNs and GANs, Carroll said, is that the generator in GANs reverses the convolution process. "Convolution extracts features from images, while deconvolution expands images from features."
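The pairing can be sketched in a few lines of PyTorch: a generator built from transposed ("de")convolutions that expands a noise vector into an image, and a CNN discriminator that scores images as real or generated. Sizes and layer counts here are illustrative assumptions, not a specific published GAN.

```python
# Illustrative GAN pair: transposed-convolution generator + CNN discriminator.
import torch
import torch.nn as nn

generator = nn.Sequential(                       # noise vector -> 32x32 image
    nn.ConvTranspose2d(64, 128, kernel_size=4, stride=1),              # 1x1 -> 4x4
    nn.ReLU(),
    nn.ConvTranspose2d(128, 64, kernel_size=4, stride=2, padding=1),   # 4x4 -> 8x8
    nn.ReLU(),
    nn.ConvTranspose2d(64, 32, kernel_size=4, stride=2, padding=1),    # 8x8 -> 16x16
    nn.ReLU(),
    nn.ConvTranspose2d(32, 3, kernel_size=4, stride=2, padding=1),     # 16x16 -> 32x32
    nn.Tanh(),
)

discriminator = nn.Sequential(                   # 32x32 image -> real/fake logit
    nn.Conv2d(3, 32, kernel_size=4, stride=2, padding=1),
    nn.LeakyReLU(0.2),
    nn.Conv2d(32, 64, kernel_size=4, stride=2, padding=1),
    nn.LeakyReLU(0.2),
    nn.Flatten(),
    nn.Linear(64 * 8 * 8, 1),
)

noise = torch.randn(16, 64, 1, 1)
fake_images = generator(noise)                   # the generation task
scores = discriminator(fake_images)              # the recognition task
print(fake_images.shape, scores.shape)           # [16, 3, 32, 32], [16, 1]
```

In training, the discriminator's loss on real versus generated batches is what pushes the generator to produce more convincing images.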

Here is a rundown of the chief differences between CNNs and GANs and their respective use cases.

Although GANs are getting a lot of the attention lately, CNNs continue to be used under the hood -- that is, within GANs for generating and discerning authenticity. Indeed, Pierre Custeau, CTO of ToolsGroup, a supply chain planning and optimization firm, considers the two neural networks to be complementary in terms of function. "Since CNNs are so effective at image processing, both the generator and discriminator networks are by default CNNs," he said.

It is important to note that CNNs and GANs only tend to be combined in one way, said Matthew Mead, CTO at IT consultancy SPR.

"GANs typically work with image data and can use CNNs as the discriminator. But this doesn't work the other way around, meaning a CNN cannot use a GAN," Mead said.


Early GANs generated relatively simple, low-resolution faces. One of the reasons interest in GANs has grown is the dramatic decline in cost per unit of compute, which has enabled teams to build more complex neural networks, Carroll pointed out. Advancements in hardware, software and neural network design have also fueled the growth of other generative AI models like transformers, variational autoencoders and diffusion.

Blankenbaker cautions against getting caught up in the latest model rather than focusing on specific goals and the underlying data. "We see too many companies getting excited about the buzzwords and trying to fit a square peg into a round hole, resulting in overspending on overkill solutions," Blankenbaker said.

"One of the biggest challenges is always the data quality itself for training the models, especially when we're talking about business-specific solutions instead something as generic as a cat," he said.


Building a Precise Assistive-Feeding Robot That Can Handle Any … – Stanford HAI

Eating a meal involves multiple precise movements to bring food from plate to mouth.

We grasp a fork or spoon to skewer or scoop up a variety of differently shaped and textured food items without breaking them apart or pushing them off our plate. We then carry the food toward us without letting it drop, insert it into our mouths at a comfortable angle, bite it, and gently withdraw the utensil with sufficient force to leave the food behind. And we repeat that series of actions until our plates are clear three times a day.

For people with spinal cord injuries or other types of motor impairments, performing this series of movements without assistance can be nigh on impossible, meaning they must rely on caregivers to feed them. This reduces individuals' autonomy while also contributing to caregiver burnout, says Jennifer Grannen, graduate student in computer science at Stanford University.

One alternative: robots that can help people with disabilities feed themselves. Although there are already robotic feeding devices on the market, they typically make pre-programmed movements, must be precisely set up for each person and each meal, and bring the food to a position in front of a persons mouth rather than into it, which can pose problems for people with very limited movement, Grannen says.

A research team in Dorsa Sadigh's ILIAD lab, including Grannen and fellow computer science students Priya Sundaresan, Suneel Belkhale, Yilin Wu, and Lorenzo Shaikewitz, hopes to make robot-assistive feeding more comfortable for everyone involved. The team has now developed several novel robotic algorithms for autonomously and comfortably accomplishing each step of the feeding process for a variety of food types. One algorithm combines computer vision and haptics to evaluate the angle and speed at which to insert a fork into a food item; another uses a second robotic arm to push food onto a spoon; and a third delivers food into a person's mouth in a way that feels natural and comfortable.

"The hope is that by making progress in this domain, people who rely on caregiver assistance can eventually have a more independent lifestyle," Sundaresan says.

Food items come in a range of shapes and sizes. They also vary in their fragility or robustness. Some (such as tofu) break into pieces when skewered too firmly; others that are harder (such as raw carrots) require a firm skewering motion.

To successfully pick up diverse items, the team fitted a robot arm with a camera to provide visual feedback and a force sensor to provide haptic feedback. In the training phase, they offered the robot a variety of fare including foods that look the same but have differing levels of fragility (e.g., raw versus cooked butternut squash) and foods that feel soft to the touch but are unexpectedly firm when skewered (e.g., raw broccoli).

To maximize successful pickups with minimal breakage, the visual system first homes in on a food item and brings the fork in contact with it at an appropriate angle using a method derived from prior research. Next, the fork gently probes the food to determine (using the force sensor) if it is fragile or robust. At the same time, the camera provides visual feedback about how the food responds to the probe. Having made its determination of fragility/robustness using both visual and tactile cues, the robot chooses between and instantaneously acts on one of two skewering strategies: a faster more vertical movement for robust items, and a gentler, angled motion for fragile items.
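In pseudocode terms, the decision described above might look like the sketch below. The thresholds, the ProbeReading fields and the strategy names are hypothetical stand-ins used to illustrate the vision-plus-haptics choice, not the lab's actual implementation.

```python
# Hedged sketch of the skewering decision: probe, read force and visual
# deformation, then pick one of two strategies. All values are illustrative.
from dataclasses import dataclass

@dataclass
class ProbeReading:
    peak_force_newtons: float     # from the wrist force sensor during the gentle probe
    visual_deformation: float     # 0..1 estimate of how much the item deformed on camera

FORCE_THRESHOLD_N = 1.5           # hypothetical tuning constants
DEFORMATION_THRESHOLD = 0.3

def choose_skewer_strategy(reading: ProbeReading) -> str:
    """Combine haptic and visual cues; rely on whichever cue is unambiguous."""
    haptics_say_robust = reading.peak_force_newtons > FORCE_THRESHOLD_N
    vision_says_fragile = reading.visual_deformation > DEFORMATION_THRESHOLD
    if haptics_say_robust and not vision_says_fragile:
        return "fast_vertical_skewer"      # firm items such as raw carrot
    return "gentle_angled_skewer"          # fragile items such as tofu

print(choose_skewer_strategy(ProbeReading(peak_force_newtons=2.1, visual_deformation=0.1)))
```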

The work is the first to combine vision and haptics to skewer a variety of foods and to do so in one continuous interaction, Sundaresan says. In experiments, the system outperformed approaches that don't use haptics, and also successfully retrieved ambiguous foods like raw broccoli and both raw and cooked butternut squash. "The system relies on vision if the haptics are ambiguous, and haptics if the visuals are ambiguous," Sundaresan says. Nevertheless, some items evaded the robot's fork. "Thin items like snow peas or salad leaves are super difficult," she says.

She appreciates the way the robot pokes its food just as people do. "Humans also get both visual and tactile feedback and then use that to inform how to insert a fork," she says. In that sense, this work marks one step toward designing assistive-feeding robots that behave in ways that feel familiar and comfortable to use.

Existing approaches to assistive feeding often require changing utensils to deal with different types of food. "You want a system that can acquire a lot of different foods with a single spoon rather than swapping out what tool you're using," Grannen says. But some foods, like peas, roll away from a spoon while others, like jello or tofu, break apart.

Grannen and her colleagues realized that people know how to solve this problem: They use a second arm holding a fork or other tool to push their peas onto a spoon. So, the team set up a bimanual robot with a spoon in one hand and a curved pusher in the other. And they trained it to pick up a variety of foods.

As the two utensils move toward each other on either side of a food item, a computer vision system classifies the item as robust or fragile and learns to notice when the item is close to breaking. At that point, the utensils stop moving toward one another and start scooping upward, with the pusher following and rotating toward the spoon to keep the food in place.

"This is the first work to use two robotic arms for food acquisition," Grannen says. She's also interested in exploring other bimanual feeding tasks such as cutting meat, which involves not only planning how to cut a large piece of food but also how to hold it in place while doing a sawing motion. "Soup, too, is an interesting challenge," she says. "How do you keep the spoon from spilling, and how do you tilt the bowl to retrieve the last few drops?"

Once food is on a fork or spoon, the robot arm needs to deliver it to a person's mouth in a way that feels natural and comfortable, Belkhale says. Until now, most robots simply brought food to just in front of a person's mouth, requiring the person to lean forward or crane their neck to retrieve the food from the spoon or fork. "But that's a difficult movement for people who are completely immobile from the neck down or for people with other types of mobility challenges," he says.

To solve that problem, the Stanford team developed an integrated robotic system that brings food all the way into a person's mouth, stops just after the food enters the mouth, senses when the person takes a bite, and then removes the utensil.

The system includes a novel piece of hardware that functions like a wrist joint, making the robot's movements more human-like and comfortable for people, Belkhale says. In addition, it relies on computer vision to detect food on the utensil; to identify key facial features as the food approaches the mouth; and to recognize when the food has gone past the plane of the face and into the mouth.

The system also uses a force sensor that has been designed to make sure the entire process is comfortable for the person being fed. Initially, as the food comes toward the mouth, the force sensor is very reactive to ensure that the robot arm will stop moving when the utensil contacts a person's lips or tongue. Next, the sensor registers the person taking a bite, which serves as a signal for the robot to begin withdrawing the utensil, at which point the force sensor needs to be less reactive so that the robot arm will exert sufficient force to leave the food in the mouth as the utensil retreats. "This integrated system can switch between different controllers and different levels of reactivity for each step," Belkhale says.
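A rough sketch of that controller switching is shown below; the force thresholds and the arm/force_sensor interfaces are hypothetical placeholders used only to illustrate the three phases the paragraph describes.

```python
# Hedged sketch of bite-transfer phases with different force reactivity.
APPROACH_STOP_FORCE_N = 0.5   # very reactive: stop the instant lips/tongue are touched
BITE_FORCE_N = 2.0            # a bite registers as a clear force spike on the utensil
WITHDRAW_MAX_FORCE_N = 4.0    # less reactive: tolerate the drag needed to leave food behind

def feed_one_bite(arm, force_sensor):
    # Phase 1: approach into the mouth, stopping on the slightest contact.
    arm.move_toward_mouth(stop_if_force_exceeds=APPROACH_STOP_FORCE_N)

    # Phase 2: wait for the person to take a bite.
    while force_sensor.read() < BITE_FORCE_N:
        pass

    # Phase 3: withdraw with a higher force tolerance so the food stays in the mouth.
    arm.retract(stop_if_force_exceeds=WITHDRAW_MAX_FORCE_N)
```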

Algorithms combine haptics and computer vision to evaluate how to insert a fork into a person's mouth naturally and comfortably.

There's plenty more work to do before an ideal assistive-feeding robot can be deployed in the wild, the researchers say. For example, robots need to do a better job of picking up what Sundaresan calls "adversarial food groups," such as very fragile or very thin items. There's also the challenge of cutting large items into bite-sized pieces, or picking up finger foods. Then there's the question of what's the best way for people to communicate with the robot about what food they want next. For example, should the users say what they need next, should the robot learn the human's preferences and intents over time, or should there be some form of shared autonomy?

A bigger question: Will all of the food acquisition and bite transfer steps eventually occur together in one system? "Right now, we're still at the stage where we work on each of these steps independently," Belkhale says. "But eventually, the goal would be to start fitting them together."

