Category Archives: Data Science

Five Key Trends in AI and Data Science for 2024 From MIT Sloan Management Review – Yahoo Finance

CAMBRIDGE, Mass., Jan. 9, 2024 /PRNewswire/ -- MIT Sloan Management Review reveals the insights of more than 500 senior data and technology executives in "Five Key Trends in AI and Data Science for 2024," a part of its AI in Action series.

Artificial intelligence and data science became front-page news in 2023 thanks to generative AI, state coauthors Thomas H. Davenport, the President's Distinguished Professor of Information Technology and Management at Babson College and a fellow of the MIT Initiative on the Digital Economy, and Randy Bean, an industry thought leader who currently serves as innovation fellow, data strategy, for global consultancy Wavestone.

To find out what might keep it on the front page in 2024, Davenport and Bean conducted three surveys over the past several months, involving more than 500 executives closest to companies' data science and AI strategies, to bring to light what organizations are thinking and doing.

"Data science is increasingly critical to every organization. But it's not a static discipline, and organizations need to continually adjust data science skills and processes to get the full value from data, analytics, and AI," said Davenport.

"Expect 2024 to be a year of transformation and change driven by adoption of AI and a reshaping of the data, analytics, and AI leadership role within leading companies," added Bean. "With 33% of midsize to large organizations having appointed or in search of a chief AI officer, and with 83.2% of leading companies having a chief data and analytics officer in place today, it is inevitable that we will witness consolidation of roles, restructuring of responsibilities, elimination of some positions, and some critical rethinking of data and AI leadership expectations during the course of 2024."

"Five Key Trends in AI and Data Science for 2024"culls the surveys to identify developing issues that should be on every leader's radar screen this year:


Generative AI sparkles but needs to deliver value. Survey responses suggest that although excitement is high, the value of generative AI has not been delivered. Large percentages of respondents believe the technology has the potential to be transformational; 80% in one survey said they believe it will transform their organizations, and 64% in another survey said it is the most transformational technology in a generation. A large majority of survey takers are also increasing investment in the technology.

Data science is shifting from artisanal to industrial. Companies are investing in platforms, processes and methodologies, feature stores, machine learning operations (MLOps) systems, and other tools to increase productivity and deployment rates. Automation is helping to increase productivity and enable broader data science participation.

Two versions of data products will dominate. Eighty percent of data and technology leaders in one survey said that their organizations were using or considering the use of data products and product management. But they mean two different things by "data products." Just under half (48%) of respondents said that they include analytics and AI capabilities in the concept of data products. Some 30% view analytics and AI as separate from data products and presumably reserve that term for reusable data assets alone. What matters is that an organization is consistent in how it defines and discusses data products.

Data scientists will become less sexy. The proliferation of roles, such as data engineers, that can address pieces of the data science problem, along with the rise of citizen data science, in which savvy businesspeople create models or algorithms themselves, is causing the star power of data scientists to recede.

Data, analytics, and AI leaders are becoming less independent. In 2023, increasing numbers of organizations cut back on the proliferation of technology and data "chiefs," including chief data and analytics officers (and sometimes chief AI officers). The functions performed by data and analytics executives haven't gone away; rather, they're increasingly being subsumed within a broader set of technology, data, and digital transformation functions managed by a "supertech leader" who usually reports to the CEO. In 2024, expect to see more of these overarching tech leaders who have all the capabilities to create value from the data and technology professionals reporting to them.

The MIT Sloan Management Review article "Five Key Trends in AI and Data Science for 2024" publishes at 8 a.m. ET on Jan. 9, 2024. This column is part of the series AI in Action.

About the Authors
Thomas H. Davenport is the President's Distinguished Professor of Information Technology and Management at Babson College, a fellow of the MIT Initiative on the Digital Economy, and senior adviser to the Deloitte Chief Data and Analytics Officer Program. He is coauthor of All-In On AI: How Smart Companies Win Big With Artificial Intelligence (HBR Press, 2023) and Working With AI: Real Stories of Human-Machine Collaboration (MIT Press, 2022). Randy Bean is an industry thought leader, author, founder, and CEO and currently serves as innovation fellow, data strategy, for global consultancy Wavestone. He is the author of Fail Fast, Learn Faster: Lessons in Data-Driven Leadership in an Age of Disruption, Big Data, and AI (Wiley, 2021).

About MIT Sloan Management Review
MIT Sloan Management Review is an independent, research-based magazine and digital platform for business leaders published at the MIT Sloan School of Management. MIT SMR explores how leadership and management are transforming in a disruptive world. We help thoughtful leaders capture the exciting opportunities and face down the challenges created as technological, societal, and environmental forces reshape how organizations operate, compete, and create value.


Tess Woods, Tess@TessWoodsPR.com, 617-942-0336


View original content to download multimedia: https://www.prnewswire.com/news-releases/five-key-trends-in-ai-and-data-science-for-2024-from-mit-sloan-management-review-302029337.html

SOURCE MIT Sloan Management Review

Read more:

Five Key Trends in AI and Data Science for 2024 From MIT Sloan Management Review - Yahoo Finance

Five Key Trends in AI and Data Science for 2024 From MIT Sloan Management Review – PR Newswire


Visit link:

Five Key Trends in AI and Data Science for 2024 From MIT Sloan Management Review - PR Newswire

The Future Of Clinical Data Science Is Closer Than It Appears – Clinical Leader

By Clinical Leader Editorial Staff

The pharmaceutical and biotech industries have experienced multiple sea changes in the last few years, as emerging science promises advanced therapeutics that require new, complex clinical trial designs. But has the technology that supports clinical research kept up with the science behind it? In 2021, Patrick Nadolny, global head of clinical data management at Sanofi, made several predictions about clinical trial technology in Designing a Data Management Strategy for the Future. Clinical Leader editorial staff recently caught up with Nadolny to revisit his predictions, examine current trends in clinical trial technology, and imagine what innovations will shape the industry in the next few years.

Nadolny predicted that data management would evolve into clinical data science due to the influx of data sources and emerging protocols with new trial designs. This evolution has already begun, and its progress hinges on four main pillars: risk-based methodologies, AI, complex protocol designs, and decentralized clinical trials (DCTs).

How Are Risk-Based Methodologies Changing the Trial Life Cycle?

First, risk-based methodologies have transformed study conduct. The 2016 ICH E6 revision on good clinical practice demonstrated the urgency of adopting and adapting to risk-based study monitoring across functions, which is anticipated to be reinforced in the upcoming revision.[1] Going beyond study conduct, the ICH E8 revision advocates for quality by design, focusing on both critical quality factors and operational feasibility.[2] This requires clinical data scientists' earlier involvement to design appropriate data collection and review strategies. Likewise, the EMA's recent reflection paper on the use of AI in clinical research is fully risk-based, providing early insights on risk levels and validation needs for AI solutions during the drug development lifecycle.[3]

Moreover, today's clinical trials generate enormous amounts of data, with a growing proportion being eSource, and Nadolny states that current technology is not as efficient as required in managing the 5Vs of clinical data (volume, velocity, variety, veracity, and value), but it's progressing in the right direction.[4] Also, risk-based approaches interconnect with the other three pillars. For example, companies may turn to AI to manage the large amounts of data generated from these studies to identify risks or automate repetitive activities. Likewise, these protocols are complex and often employ DCTs or hybrid technology in conjunction with risk-based approaches.

"In the past, companies used the same processes for multiple studies," Nadolny explained. "But now, the methods developed for one study may not translate to the next. Several years ago, choosing a trial design was like finding a recipe in a cookbook. But now, companies are given a list of ingredients and must decide the best way to combine them to maximize each element while creating a unified whole."

What Is The Role Of AI?

AI is also revolutionizing the pharmaceutical and biotech industries. Nadolny explained that although AI is not a new technology, the advent of generative AI platforms like ChatGPT has accelerated investment and interest.

"AI, especially generative AI, can be useful across the entire lifespan of a trial, from recruitment to post-study data management," Nadolny stated. Previously, AI utility was limited to simpler tasks such as identifying data patterns or reading medical images, but generative AI could ultimately also create study plans, read protocols, and suggest potential root causes of a study problem. Additionally, it can assist recruitment by better identifying potential participants. With DCTs, it can review data and identify complex data anomalies from wearable technologies. AI solutions will develop rapidly in the next few years as large and small pharmaceutical and biotech companies discover ways to automate or radically transform their processes to improve study timelines.

According to Nadolny, there are not enough qualified people in the industry to manage the vast volume of information generated by today's studies, and AI is necessary to create insights from billions of data points and various data sources. Many processes could be automated to reduce workload and inefficiencies in clinical trials, expediting data management. However, AI may not be the best solution for every study currently underway. For ongoing large or long-term trials, such as many oncology studies, integrating AI after the study starts would not necessarily save time or effort due to the complexity of transitioning to a new model. However, for new studies, implementing this technology from the beginning can accelerate timelines and enable new processes that companies can use as a template for future studies.

How Will Protocols Become Patient-Driven?

Meanwhile, complex protocol designs present more challenges to clinical data scientists. Umbrella, basket, and adaptive trial designs are just a few study protocols that can accelerate drug development but create operational complexities for data management. For example, evaluating one therapy across several indications simultaneously in a basket study is more efficient than examining one indication at a time but adds complexity to data collection across multiple medical conditions. Likewise, adaptive study design improves the predictability of study outcomes by allowing sponsors to adjust dosages and timing based on individual participant responses to the IP. However, collecting and managing the interconnected web of data these studies generate is an intricate process that today's platforms aren't fully equipped for. Too often, information is siloed, and separate systems must be integrated and reconfigured for each design adaptation, adding time to the study.

Operationally complex protocol designs may also result from the desire to meet requirements from regulatory bodies, such as ensuring patient diversity and patient centricity. Nadolny emphasizes that being patient-driven is a complex issue and is not the same as being patient-centric. For example, decentralized clinical trial procedures appear patient-centric because they allow subjects to participate remotely. However, mandating telehealth technology or wearable devices may burden some participants who would rather go to a traditional clinical setting to receive care.

On the other hand, a truly patient-driven trial would be much more complex. A patient-driven trial considers these factors and creates a flexible operational design that best fits each participant's lifestyle. Subjects would choose between participating in the trial remotely, in-person, or a hybrid mix. However, this hypothetical trial design is not yet possible to deploy efficiently with today's technology because it would create protocols that are too complex to pragmatically operationalize. The push for greater patient centricity and growing recruitment needs may drive the industry toward achieving highly adaptable, customizable trials. Nadolny predicts that technology will adapt to make such trials possible in the next two to five years.

Are Fully Decentralized Trials Possible?

In addition to meeting patients' needs, the DCT trend that took off during the COVID-19 pandemic shows no signs of slowing down. Nadolny expects DCTs to continue to rise in popularity in response to other types of emergencies, such as wars or natural disasters, which can otherwise halt studies. By decentralizing trials and running global studies, companies can pivot when factors beyond their control shut down sites. However, Nadolny points out that currently, no single platform can run a fully decentralized pivotal clinical trial, and DCT technology is often a patchwork of integrated solutions. He expects the industry to invest heavily in creating new systems to accommodate the unique needs of DCTs.

"The pandemic forced the industry to rethink how we work to become more resilient and adaptable," Nadolny explained. "We've learned to balance the risk of implementing new technology against the risk of doing nothing. The industry is changing, if slowly. There's a divide between the old ways and the new, and we're still coping with legacy systems while investing in the future."

In his 2021 predictions, Nadolny stated that data managers weren't fully utilizing emerging technologies because decentralized workflows and shifting protocol designs were still very new, and users faced challenges adjusting to the new normal. Currently, however, data management is catching up with industry trends.

"Everything that's happened in the past few years has forced us to adapt and maximize all our solutions," Nadolny explained. "Data management has evolved significantly. However, we're still putting patches on things and learning what we can leverage regarding new protocol designs and technologies. We still have room to improve, especially regarding DCT support, but we're moving in the right direction."

What Is The Future Of Data Management?

In 2021, Nadolny stated that clinical data management needed to evolve into clinical data science. That evolution is still necessary and is ongoing. As risk-based methodologies, AI, complex protocols, and DCTs continue to shape the industry, data management platforms must adapt to meet their needs. In addition, resiliency to emergencies has become an imperative to build into daily operations. Therefore, technology must adapt to ongoing clinical research changes at the study, country, site, and even patient level. At the same time, technology should allow for greater patient-driven solutions by giving subjects more opportunities to participate on their terms. Nadolny is optimistic that these changes will benefit companies, sites, and patients.

"The industry will continue to show resiliency as we walk the tightrope between adaptive, highly complex protocol designs and patient centricity," Nadolny states. "We'll also see the gap close between clinical research and regular standards of care so that we don't have different processes for running a study and caring for patients. The technology we need to cope with today's challenges is still emerging, but we're closer today than we were three years ago."

Here is the original post:

The Future Of Clinical Data Science Is Closer Than It Appears - Clinical Leader

Symposium highlights UGA's interdisciplinary AI and data science research and scholarship – University of Georgia

Ian Bogost, center, of Washington University in St. Louis speaks during a panel discussion at the AI and Data Science Across Disciplines Symposium on Nov. 30 at the University of Georgia Center for Continuing Education & Hotel. Meg Mittelstadt, left, director of UGA's Center for Teaching and Learning; Tianming Liu, right, Distinguished Research Professor in the School of Computing; and Youjin Kong (not pictured), assistant professor in the department of philosophy, joined Bogost in discussing advances in artificial intelligence. (Photo by Mike Wooten)

Faculty from across the University of Georgia campus gathered on Nov. 30 to discuss the expanding influence of artificial intelligence, share insights into their research and consider how AI may shape higher education and society in the future.

The university's inaugural Artificial Intelligence and Data Science Across Disciplines Symposium was hosted by the Institute for Artificial Intelligence with support from the Office of the Senior Vice President for Academic Affairs and Provost, the Office of Research and the Franklin College of Arts and Sciences.

"The symposium is part of a series of events geared toward bringing together the AI and data science faculty at UGA," said Khaled M. Rasheed, director of the Institute for Artificial Intelligence and a professor in the School of Computing. "It was an exciting opportunity for the AI community at UGA to connect and learn."

The symposium, held at the University of Georgia Center for Continuing Education & Hotel, showcased UGA's significant investments in the fields of artificial intelligence and data science. Those investments include an ambitious presidential interdisciplinary faculty hiring initiative that aims to recruit 70 faculty members with expertise in applying data science and artificial intelligence to some of society's most urgent challenges.

Rather than being housed exclusively in a single department, the majority of UGA's newly recruited faculty will focus on the fusion of data science and AI in cross-cutting areas such as infectious diseases, integrative precision agriculture, ethics, cybersecurity, resilient communities and the environment.

"The breadth of experience and expertise at the University of Georgia uniquely positions our institution to advance AI and data science scholarship and research," said Jeanette Taylor, the university's vice provost for academic affairs. "We are able to integrate perspectives from a diverse array of disciplines as we consider not only potential uses for AI but also the ethical and social questions that arise."

Ian Bogost, Barbara and David Thomas Distinguished Professor at Washington University in St. Louis with a dual appointment as professor and director of film and media studies and professor of computer science and engineering, provided the symposium's keynote address.

Bogost urged attendees to avoid viewing generative AI, such as ChatGPT and DALL-E, as a tool for process optimization at the expense of imagination.

"AI works best for me when I use it to extend my imagination," he said.

The symposium also featured two lightning rounds of brief talks by UGA faculty members from a wide range of disciplines. Faculty highlighted their use of AI and data science in research topics such as crop modeling and assessment, physics-informed machine learning for infectious disease forecasting, data science in advanced manufacturing and AI's integration into society.

A panel discussion closed the symposium. Participants examined the impact of AI and ChatGPT on teaching and learning at UGA, industries that stand to benefit from AI and the ethics of AI in research and society, among other topics.

Building upon the momentum of the symposium, UGA's Office of Research will host an AI Team Acceleration Event on Feb. 5 at the Delta Innovation Hub. This event will include presentations from research teams funded by Presidential Interdisciplinary Seed Grants and an overview of major university resources available to research teams.

UGA is now gathering input from faculty regarding potential interdisciplinary research collaborations. The Office of Research will filter those responses through AI to identify affinity groups faculty can join, and the AI Team Acceleration Event will include time for those groups to meet and begin discussions of possible research projects.

See original here:

Symposium highlights UGAs interdisciplinary AI and data science research and scholarship - University of Georgia

Examining the Influence Between NLP and Other Fields of Study – Towards Data Science

MOTIVATION

A fascinating aspect of science is how different fields of study interact and influence each other. Many significant advances have emerged from the synergistic interaction of multiple disciplines. For example, quantum mechanics is a theory that coalesced Planck's idea of quantized energy levels, Einstein's photoelectric effect, and Bohr's model of the atom.

The degree to which the ideas and artifacts of a field of study are helpful to the world is a measure of its influence.

Developing a better sense of the influence of a field has several benefits, such as understanding what fosters greater innovation and what stifles it, what a field has success at understanding and what remains elusive, or who are the most prominent stakeholders benefiting and who are being left behind.

Mechanisms of field-to-field influence are complex, but one notable marker of scientific influence is citations. The extent to which a source field cites a target field is a rough indicator of the degree of influence of the target on the source. We note here, though, that not all citations are equal, and citation counts are subject to various biases. Nonetheless, meaningful inferences can be drawn at an aggregate level; for example, if the proportion of citations from field x to a target field y has markedly increased as compared to the proportion of citations from other fields to the target, then it is likely that the influence of y on x has grown.

WHY NLP?

While studying influence is useful for any field of study, we focus on Natural Language Processing (NLP) research for one critical reason.

NLP is at an inflection point. Recent developments in large language models have captured the imagination of the scientific world, industry, and the general public.

Thus, NLP is poised to exert substantial influence despite significant risks. Further, language is social, and its applications have complex social implications. Therefore, responsible research and development requires engagement with a wide swathe of literature (arguably, more so for NLP than for other fields).

By tracing hundreds of thousands of citations, we systematically and quantitatively examine broad trends in the influence of various fields of study on NLP and NLPs influence on them.

We use Semantic Scholar's field-of-study attribute to categorize papers into 23 fields, such as math, medicine, or computer science. A paper can belong to one or many fields. For example, a paper that targets a medical application using computer algorithms might be in both medicine and computer science. NLP itself is an interdisciplinary subfield of computer science, machine learning, and linguistics. We categorize a paper as NLP when it is in the ACL Anthology, which is arguably the largest repository of NLP literature (albeit not a complete set of all NLP papers).
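To make the categorization concrete, here is a minimal sketch of that labeling logic in Python; the record layout, key names, and sample IDs are hypothetical illustrations, not the authors' actual pipeline.

# Hypothetical sketch of the paper-labeling step described above.
acl_anthology_ids = {"P19-1001", "2020.acl-main.1"}  # sample IDs for illustration

def label_paper(paper: dict) -> set[str]:
    """Return the set of field labels for one paper record."""
    fields = set(paper.get("fieldsOfStudy") or [])  # a paper may belong to many fields
    if paper.get("acl_id") in acl_anthology_ids:    # NLP := appears in the ACL Anthology
        fields.add("NLP")
    return fields

paper = {"acl_id": "P19-1001", "fieldsOfStudy": ["Computer Science", "Linguistics"]}
print(label_paper(paper))  # {'Computer Science', 'Linguistics', 'NLP'}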

Read this article:

Examining the Influence Between NLP and Other Fields of Study - Towards Data Science

Best Data Analytics Courses in the USA to Enroll in 2024 – Analytics Insight

Data analytics is a rapidly growing field that requires a combination of technical and analytical skills. With so many courses available online, it can be challenging to choose the right one. For individuals looking for the best data analytics courses in the USA in 2024, programs typically cover data interpretation, website tracking codes, marketing campaigns, and program management. Emphasis is often placed on integrating analytics into various business facets. The USA, a premier destination for international students, boasts globally ranked universities and diverse study locations. A master's degree follows the completion of an undergraduate program. In this article, we will explore the best data analytics courses in the USA to enroll in 2024.

University: Canisius University

Campus location: Buffalo, USA

Duration: 1-3 years

Tuition fee: US$910/per credit

Course Description: With the MS in Data Analytics in the USA program at Canisius University, set off on a life-changing adventure. The curriculum lasts one to three years and is based in Buffalo, USA. It offers a thorough education in the ever-evolving subject of data analytics for US$910 per credit.

Enroll now

University: Pacific Lutheran University

Campus location: Tacoma, USA

Duration: 9-21 months

Tuition fee: US$1,104 / per credit

Course Description:

Pacific Lutheran University offers a dynamic Master of Science in Marketing Analytics in Tacoma, USA. This 9-21 month program equips students with strategic insights. The tuition is USD 1,104 per credit, ensuring a high-quality education in the heart of the Pacific Northwest.

Enroll now

University: Drew University

Campus location: Madison, USA

Duration: 1-2 years

Tuition fee: US$ 22,248 / per credit

Course Description: Drew University offers a cutting-edge Master of Science in Data Science program in Madison, USA. Designed to be completed in 1-2 years, this advanced degree provides a comprehensive exploration of data science, equipping students with the skills and knowledge needed for success in this dynamic field.

Enroll now

University: Alliant International University

Campus Location: San Diego, USA

Duration: 1 year

Tuition fee: US$ 768 / per credit

Course Description: Join the MS program in Healthcare Analytics at Alliant International University in San Diego, USA, to start a life-changing adventure. Immerse yourself in cutting-edge insights at a lively campus in only one year, opening the door to a fascinating career in the rapidly changing field of healthcare analytics.

Enroll now

University: Illinois Institute of Technology

Campus Location: Chicago, USA

Duration: 2 years

Tuition fee: US$ 1,712 / per credit

Course Description:

With the Illinois Institute of Technology's Master of Science in Sustainability Analytics and Management, set off on a life-changing adventure. This two-year, STEM-designated program in the heart of Chicago gives students access to cutting-edge perspectives. It is a forward-thinking investment in a sustainable future, valued at US$1,712 per credit.

Enroll now

University: Southern Methodist University

Campus Location: Dallas, USA

Duration: 2 years

Tuition fee: US$ 74,000

Course Description:

The MS in Applied Statistics and Data Analytics at Southern Methodist University, located in Dallas, USA, spans two years. This program promises a comprehensive exploration of statistical methodologies and data analytics, preparing students for impactful roles in the dynamic field of data science.

Enroll now

University: Mercyhurst University

Campus Location: Erie, USA

Duration: 2 years

Tuition fee: USD 33,000 / per year

Course Description:

Pursue a Master of Science in Applied Intelligence at Mercyhurst University in the United States and set off on a revolutionary adventure. The two-year curriculum, which is based in Erie, provides a dynamic combination of theoretical knowledge and real-world application in the constantly changing field of applied intelligence.

Enroll now


Read the rest here:

Best Data Analytics Courses in the USA to Enroll in 2024 - Analytics Insight

Custom Object Detection: Exploring Fundamentals of YOLO and Training on Custom Data – Towards Data Science

Deep learning has made huge progress over the last decade, and while early models were hard to understand and apply, modern frameworks and tools allow everyone with a bit of code understanding to train their own neural network for computer vision tasks.

In this article, I will thoroughly demonstrate how to load and augment data as well as bounding boxes, train an object detection algorithm, and eventually see how accurately we're able to detect objects in the test images. While the available tool kits have become much easier to use over time, there are still a few pitfalls you might run into.
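For a rough sense of what that workflow can look like in code, here is a minimal sketch using the Ultralytics YOLO package; the pretrained weights, dataset YAML path, and hyperparameters are illustrative assumptions, not the exact setup used in this article.

# Minimal sketch of training and running a YOLO detector on custom data
# with the Ultralytics package (pip install ultralytics).
from ultralytics import YOLO

# Start from pretrained weights; "yolov8n.pt" is the small "nano" model.
model = YOLO("yolov8n.pt")

# dataset.yaml points to the train/val image folders and class names;
# Ultralytics applies its default augmentations during training.
model.train(data="dataset.yaml", epochs=50, imgsz=640)

# Evaluate on the validation split, then predict on a test image.
metrics = model.val()
results = model("test_images/sample.jpg")
results[0].show()  # visualize the predicted bounding boxes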

Computer vision is both a very popular and, even more, a broad field of research and application. Advances that have been made in deep learning, especially over the last decade, tremendously accelerated our understanding of deep learning and its broad potential of usage.

Why do we see those advances right now? As Francois Chollet (the creator of the Keras library) describes it, we witnessed an increase in the computational capabilities of CPUs by a factor of roughly 5,000 between 1990 and 2010. Investments in GPUs have pushed research even further.

In general, we see three essential tasks that are related to CV:

More here:

Custom Object Detection: Exploring Fundamentals of YOLO and Training on Custom Data - Towards Data Science

Streamline Data Pipelines: How to Use WhyLogs with PySpark for Data Profiling and Validation – Towards Data Science

Components of whylogs

Let's begin by understanding the important characteristics of whylogs.

This is all we need to know about whylogs. If you're curious to know more, I encourage you to check the documentation. Next, let's set things up for the tutorial.

We'll use a Jupyter notebook for this tutorial. To make our code work anywhere, we'll use JupyterLab in Docker. This setup installs all needed libraries and gets the sample data ready. If you're new to Docker and want to learn how to set it up, check out this link.

Start by downloading the sample data (CSV) from here. This data is what we'll use for profiling and validation. Create a data folder in your project root directory and save the CSV file there. Next, create a Dockerfile in the same root directory.

This Dockerfile is a set of instructions to create a specific environment for the tutorial. Let's break it down:

By now your project directory should look something like this.

Awesome! Now, let's build a Docker image. To do this, type the following command in your terminal, making sure you're in your project's root folder.

This command creates a Docker image named pyspark-whylogs. You can see it in the Images tab of your Docker Desktop app.

Next step: let's run this image to start JupyterLab. Type another command in your terminal.

This command launches a container from the pyspark-whylogs image. It makes sure you can access JupyterLab through port 8888 on your computer.

After running this command, you'll see a URL in the logs that looks like this: http://127.0.0.1:8888/lab?token=your_token. Click on it to open the JupyterLab web interface.

Great! Everything's set up for using whylogs. Now, let's get to know the dataset we'll be working with.

We'll use a dataset about hospital patients. The file, named patient_data.csv, includes 100k rows with these columns:

As for where this dataset came from, don't worry. It was created by ChatGPT. Next, let's start writing some code.

First, open a new notebook in JupyterLab. Remember to save it before you start working.

We'll begin by importing the needed libraries.

Then, we'll set up a SparkSession. This lets us run PySpark code.
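A minimal sketch of these two steps, assuming whylogs was installed with Spark support (pip install "whylogs[spark]"):

# Imports for the tutorial; the two collect_* helpers come from
# whylogs' PySpark integration.
from pyspark.sql import SparkSession
from whylogs.api.pyspark.experimental import (
    collect_column_profile_views,
    collect_dataset_profile_view,
)

# Create (or reuse) a local SparkSession.
spark = SparkSession.builder.appName("whylogs-profiling").getOrCreate()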

After that, we'll make a Spark dataframe by reading the CSV file. We'll also check out its schema.

Next, let's peek at the data. We'll view the first row in the dataframe.
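Something like the following should do it; the file path assumes the data folder created earlier.

# Read the sample data into a Spark dataframe and inspect the schema.
df = spark.read.csv("data/patient_data.csv", header=True, inferSchema=True)
df.printSchema()

# Peek at the first row; vertical=True prints one column per line.
df.show(n=1, vertical=True)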

Now that we've seen the data, it's time to start data profiling with whylogs.

To profile our data, we will use two functions. First, there's collect_column_profile_views. This function collects detailed profiles for each column in the dataframe. These profiles give us stats like counts, distributions, and more, depending on how we set up whylogs.

Each column in the dataset gets its own ColumnProfileView object in a dictionary. We can examine various metrics for each column, like their mean values.

whylogs will look at every data point and statistically decide whether or not that data point is relevant to the final calculation

For example, let's look at the average height.

Next, we'll also calculate the mean directly from the dataframe for comparison.
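A sketch of both calculations, using the height column as an example; the metric path follows whylogs' distribution metric API.

# Per-column profiles: a dict mapping column name -> ColumnProfileView.
column_views_dict = collect_column_profile_views(df)

# Mean height according to the whylogs profile...
print(column_views_dict["height"].get_metric("distribution").mean.value)

# ...and the mean computed directly in PySpark, for comparison.
from pyspark.sql.functions import mean
df.select(mean("height")).show()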

But profiling columns one by one isn't always enough. So, we use another function, collect_dataset_profile_view. This function profiles the whole dataset, not just single columns. We can combine it with Pandas to analyze all the metrics from the profile.

We can also save this profile as a CSV file for later use.
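A sketch of the dataset-level profiling and export; the output file name is an assumption.

# Profile the entire dataset in one pass.
profile_view = collect_dataset_profile_view(input_df=df)

# Turn every metric into a Pandas dataframe, then persist it as CSV.
profile_df = profile_view.to_pandas()
profile_df.to_csv("/home/jovyan/profile_summary.csv")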

The folder /home/jovyan in our Docker container is from Jupyter's Docker Stacks (ready-to-use Docker images containing Jupyter applications). In these Docker setups, 'jovyan' is the default user for running Jupyter. The /home/jovyan folder is where Jupyter notebooks usually start and where you should put files to access them in Jupyter.

And that's how we profile data with whylogs. Next, we'll explore data validation.

For our data validation, we'll perform these checks:

Now, let's start. Data validation in whylogs starts from data profiling. We can use the collect_dataset_profile_view function to create a profile, like we saw before.

However, this function usually makes a profile with standard metrics like average and count. But what if we need to check individual values in a column, as opposed to constraints that can be checked against aggregate metrics? That's where condition count metrics come in. It's like adding a custom metric to our profile.

Let's create one for the visit_date column to validate each row.
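The condition on the next line relies on a date-parsing predicate that was not shown; a minimal version, assuming visit dates are expected in YYYY-MM-DD format, might look like this.

# Imports for condition count metrics, plus a helper predicate.
from datetime import datetime
from whylogs.core.relations import Predicate
from whylogs.core.metrics.condition_count_metric import Condition

def check_date_format(value: str) -> bool:
    # True if the value parses as a YYYY-MM-DD date, False otherwise.
    try:
        datetime.strptime(value, "%Y-%m-%d")
        return True
    except (ValueError, TypeError):
        return False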

visit_date_condition = {"is_date_format": Condition(Predicate().is_(check_date_format))}

Once we have our condition, we add it to the profile. We use a standard schema and add our custom check.

Then we re-create the profile with both standard metrics and our new custom metric for the visit_date column.
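A sketch of that schema wiring, following module paths from recent whylogs 1.x releases (they may differ across versions):

from whylogs.core.schema import DeclarativeSchema
from whylogs.core.resolvers import STANDARD_RESOLVER
from whylogs.core.specialized_resolvers import ConditionCountMetricSpec

# Standard metrics everywhere, plus our condition on visit_date.
schema = DeclarativeSchema(STANDARD_RESOLVER)
schema.add_resolver_spec(
    column_name="visit_date",
    metrics=[ConditionCountMetricSpec(visit_date_condition)],
)

# Re-profile the dataset so it includes the condition count metric.
profile_view = collect_dataset_profile_view(input_df=df, schema=schema)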

With our profile ready, we can now set up our validation checks for each column.
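The builder used below can be assembled along these lines; the specific constraint factories and thresholds are assumptions based on the checks described in this section.

from whylogs.core.constraints import ConstraintsBuilder
from whylogs.core.constraints.factories import (
    condition_meets,
    greater_than_number,
    no_missing_values,
)

builder = ConstraintsBuilder(dataset_profile_view=profile_view)
# Every visit_date should satisfy the custom date-format condition.
builder.add_constraint(condition_meets(column_name="visit_date", condition_name="is_date_format"))
# Weights should be positive, and visit_date should never be missing.
builder.add_constraint(greater_than_number(column_name="weight", number=0))
builder.add_constraint(no_missing_values(column_name="visit_date"))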

constraints = builder.build()
constraints.generate_constraints_report()

We can also use whylogs to show a report of these checks.
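whylogs ships a notebook visualizer for exactly this purpose; a short sketch:

from whylogs.viz import NotebookProfileVisualizer

visualization = NotebookProfileVisualizer()
visualization.constraints_report(constraints, cell_height=300)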

It'll be an HTML report showing which checks passed or failed.

Here's what we find:

Let's double-check these findings in our dataframe. First, we check the visit_date format with PySpark code.
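One way to do this, assuming the YYYY-MM-DD format from earlier; to_date returns null when a value does not match the pattern.

from pyspark.sql.functions import col, to_date, when

# Label each row by whether visit_date parses, then count the groups.
df.withColumn(
    "null_check",
    when(to_date(col("visit_date"), "yyyy-MM-dd").isNull(), "null").otherwise("not_null"),
).groupBy("null_check").count().show(truncate=False)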

+----------+-----+
|null_check|count|
+----------+-----+
|not_null  |98977|
|null      |1023 |
+----------+-----+

It shows that 1,023 out of 100,000 rows don't match our date format. Next, the weight column.
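A quick aggregation over non-positive weights:

# Count rows with a weight of zero (or below).
df.filter(col("weight") <= 0).groupBy("weight").count().show(truncate=False)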

+------+-----+
|weight|count|
+------+-----+
|0     |2039 |
+------+-----+

Again, our findings match whylogs: almost 2,000 rows have a weight of zero. And that wraps up our tutorial. You can find the notebook for this tutorial here.

Here is the original post:

Streamline Data Pipelines: How to Use WhyLogs with PySpark for Data Profiling and Validation - Towards Data Science

Harness the Data Tsunami: Master the Waves of Data Science in 2024 – Medium

Data science is no longer just a hot buzzword; it is the key to unlocking hidden insights, driving profitable decisions, and innovating across industries. But with the explosion of data science courses, choosing the right one can feel like navigating a jungle of options. Fear not, intrepid data explorer! This blog is your trusty map, guiding you through the exciting landscape of data science education and helping you find the perfect course to catapult your career into the stratosphere.

First things first: Why enroll in a data science course?

You can try to learn it all on your own, sifting through blogs, tutorials, and mountains of documentation. But a good data science course offers much more:

Now, let us explore the diverse terrain of data science courses:

Finding your perfect match:

The ideal course depends on your learning style, budget, and career goals. Consider these factors:

Pro tips for choosing the right data science course:

Remember, the journey to data science mastery is yours to own. Choose a course that fuels your passion, ignites your curiosity, and equips you with the skills to conquer the data deluge.

Ready to embark on your data science adventure? Start exploring, ask questions, and find the course that aligns with your unique path. And remember to have fun on the way!

See the original post:

Harness the Data Tsunami: Master the Waves of Data Science in 2024 - Medium

Does Using an LLM During the Hiring Process Make You a Fraud as a Candidate? – Towards Data Science

Employers, ditch the AI detection tools and ask one important question instead.

I saw a post on LinkedIn from the Director of a Consulting Firm describing how he assigned an essay about model drift in machine learning systems to screen potential candidates.

Then, based on criteria he established from his intuitions (you can smell it), he used four different AI detectors to confirm that the applicants used ChatGPT to write their responses to the essay.

The criteria for suspected bot-generated essays were:

One criterion notably missing: accuracy.

The rationale behind this is that using AI tools is an attempt to subvert the candidate selection process. Needless to say, the comments are wild (and very LinkedIn-core).

I can appreciate that argument, even though I find his methodology less than rigorous. It seems like he wanted to avoid candidates who would copy and paste a response directly from ChatGPT without scrutiny.

However, I think this post raises an interesting question that we as a society need to explore: is using an LLM to help you write cheating during the hiring process?

I would say it is not. Here is the argument for why using an LLM to help you write is just fine and why it should not exclude you as a candidate.

As a bonus for the Director, I'll include a better methodology for filtering candidates based on how they use LLMs and AI tools.

Excerpt from:

Does Using an LLM During the Hiring Process Make You a Fraud as a Candidate? - Towards Data Science