Category Archives: Data Science
Fine Tuning: What is it? What is it used for in AI? – DataScientest
By allowing AI models to specialize in specific tasks, Fine-Tuning maximizes their performance. This technique is at the heart of the AI revolution, enabling the technology to be deployed in a huge variety of fields.
New developments are expected in this field in the future. Multi-task fine-tuning will enable pre-trained models to evolve into architectures capable of adapting to multiple tasks simultaneously, optimizing efficiency in real-world scenarios requiring diverse skills.
Likewise, methods could become more dynamic to allow continuous adjustment of models as new data becomes available. This would eliminate the need to start the whole process from scratch.
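As a toy illustration of the idea (not DataScientest's material, and far simpler than fine-tuning a neural network), the Python sketch below starts from hypothetical "pre-trained" weights of a one-variable linear model and continues gradient descent on a small, synthetic task-specific dataset instead of re-initialising. The weights, data, learning rate and step count are assumptions chosen only for the example.

```python
def mse(params, data):
    """Mean squared error of the linear model y = w * x + b on (x, y) pairs."""
    w, b = params
    return sum((w * x + b - y) ** 2 for x, y in data) / len(data)


def fine_tune(params, data, lr=0.01, steps=200):
    """Continue plain gradient descent from existing params rather than from scratch."""
    w, b = params
    n = len(data)
    for _ in range(steps):
        # Gradients of the MSE with respect to w and b
        grad_w = sum(2 * (w * x + b - y) * x for x, y in data) / n
        grad_b = sum(2 * (w * x + b - y) for x, y in data) / n
        w -= lr * grad_w
        b -= lr * grad_b
    return w, b


# Hypothetical weights "learned" earlier on a broad source task
pretrained = (2.0, 0.0)
# Small synthetic dataset for the new, more specific task
new_data = [(x, 2.2 * x + 0.5) for x in range(-5, 6)]

tuned = fine_tune(pretrained, new_data)
print(f"loss before: {mse(pretrained, new_data):.4f}, after: {mse(tuned, new_data):.4f}")
```

Because fine_tune takes existing parameters as input, it can simply be called again on the next batch of data, which mirrors the incremental adjustment described above.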
To master all the subtleties of Fine Tuning, you can choose DataScientest training courses. Through our various courses, you can quickly acquire real expertise in artificial intelligence.
Our Data Scientist and Machine Learning Engineer courses will teach you how to program in Python, master DataViz tools and techniques, Machine Learning and Data Engineering, as well as integration and continuous deployment practices for ML models.
These courses can be completed as continuing education or as an intensive BootCamp, and lead to Project Manager in Artificial Intelligence certification from the Collège de Paris, a certificate from Mines ParisTech PSL Executive Education and AWS Cloud Practitioner certification.
The Deep Learning course runs continuously over 10 weeks. It teaches you how to use the Keras and TensorFlow tools, and AI techniques such as Computer Vision and NLP.
Finally, to enable you to exploit the full potential of AIs such as DALL-E and ChatGPT, our Prompt Engineering & Generative AI training course extends over two days and will make you an expert in prompt writing and fine-tuning.
All our training courses take place entirely by distance learning via the web, and our organization is eligible for funding options. Discover DataScientest now!
TimeGPT vs TiDE: Is Zero-Shot Inference the Future of Forecasting or Just Hype? – Towards Data Science
This post was co-authored with Rafael Guedes.
Forecasting is one of the core domains of Artificial Intelligence (AI) in academic research and industrial applications. In fact, it is probably one of the most ubiquitous challenges we can find across all industries. Accurately predicting future sales volumes and market trends is essential for businesses to optimize their planning processes. This includes enhancing contribution margins, minimizing waste, ensuring adequate inventory levels, optimizing the supply chain, and improving decision-making overall.
Developing a forecast model represents a complex and multifaceted challenge. It requires a deep understanding of State-Of-The-Art (SOTA) forecasting methodologies and the specific business domain to which they are applied. Furthermore, the forecast engine will act as a critical infrastructure within an organization, supporting a broad spectrum of processes across various departments. For instance:
Recent advancements in forecasting have also been shaped by the successful development of foundational models across various domains, including text (e.g., ChatGPT), text-to-image (e.g., Midjourney), and text-to-speech (e.g., Eleven Labs). The wide
NVIDIA and HP Supercharge Data Science and Generative AI on Workstations – NVIDIA Blog
Coming to Z by HP AI Studio, NVIDIA CUDA-X Data Processing Libraries Boost Python Pandas Software for Millions of Data Scientists
HP Amplify: NVIDIA and HP Inc. today announced that NVIDIA CUDA-X data processing libraries will be integrated with HP AI workstation solutions to turbocharge the data preparation and processing work that forms the foundation of generative AI development.
Built on the NVIDIA CUDA compute platform, CUDA-X libraries speed data processing for a broad range of data types, including tables, text, images and video. They include the NVIDIA RAPIDS cuDF library, which accelerates the work of the nearly 10 million data scientists using pandas software by up to 110x using an NVIDIA RTX 6000 Ada Generation GPU instead of a CPU-only system, without requiring any code changes.
RAPIDS cuDF and other NVIDIA software will be available as part of Z by HP AI Studio on HP AI workstations to provide a full-stack development solution that speeds data science workflows.
"Pandas is the essential tool of millions of data scientists processing and preparing data for generative AI," said Jensen Huang, founder and CEO at NVIDIA. "Accelerating pandas with zero code changes will be a massive step forward. Data scientists can process data in minutes rather than hours, and wrangle orders of magnitude more data to train generative AI models."
"Data science provides the foundation for AI, and developers need fast access to software and systems to power this critical work," said Enrique Lores, president and CEO of HP Inc. "With the integration of NVIDIA AI software and accelerated GPU compute, HP AI workstations provide a powerful solution for our customers."
NVIDIA CUDA-X Speeds Data Science on HP Workstation Solutions
Pandas provides a powerful data structure, called DataFrames, which lets developers easily manipulate, clean and analyze tabular data.
The NVIDIA RAPIDS cuDF library accelerates pandas so that it can run on GPUs with zero code changes, rather than relying on CPUs, which can slow workloads as data size grows. RAPIDS cuDF is compatible with third-party libraries and unifies GPU and CPU workflows so data scientists can develop, test and run models in production seamlessly.
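For readers wondering what "zero code changes" can look like in practice, here is a sketch based on the cudf.pandas accelerator mode that RAPIDS documents: calling cudf.pandas.install() before pandas is imported routes supported operations to the GPU while the pandas code itself stays the same. The try/except fallback, the revenue_by_region function and the sample data are our own illustrative assumptions, and the announcement does not say whether HP AI Studio uses this exact mechanism. On a machine without cuDF, the same code simply runs on the CPU.

```python
# Optional: route pandas calls to the GPU when RAPIDS cuDF is installed.
try:
    import cudf.pandas

    cudf.pandas.install()  # must run before pandas is imported
    ACCELERATED = True
except ImportError:
    ACCELERATED = False  # fall back to ordinary CPU pandas

import pandas as pd


def revenue_by_region(df: pd.DataFrame) -> pd.DataFrame:
    """Ordinary pandas code; nothing here is cuDF-specific."""
    cleaned = df.dropna(subset=["revenue"])  # drop rows with missing revenue
    return (
        cleaned.groupby("region", as_index=False)["revenue"]
        .sum()
        .sort_values("revenue", ascending=False, ignore_index=True)
    )


if __name__ == "__main__":
    # Hypothetical sample data for illustration
    sales = pd.DataFrame({
        "region": ["north", "south", "north", "east", "south"],
        "revenue": [120.0, 80.0, None, 50.0, 45.0],
    })
    print("GPU-accelerated:", ACCELERATED)
    print(revenue_by_region(sales))
```

RAPIDS also describes running an unmodified script with python -m cudf.pandas script.py, or loading %load_ext cudf.pandas in a Jupyter notebook, so the install() call does not have to live in the code at all.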
As datasets continue to grow, RTX 6000 Ada Generation GPUs provide 48GB of memory per GPU to process large data science and AI workloads on Z by HP workstations. With up to four RTX 6000 GPUs, the HP Z8 Fury is one of the world's most powerful workstations for AI creation. The close collaboration between HP and NVIDIA allows data scientists to streamline development by working on local systems to process even large generative AI workloads.
Availability
NVIDIA RAPIDS cuDF for accelerated pandas with zero code changes is expected to be available on HP AI workstation solutions with NVIDIA RTX and GeForce RTX GPUs this month and on HP AI Studio later this year.
Best Free Resources to Learn Data Analysis and Data Science – KDnuggets
Sponsored Content
In my decade of teaching online, the most significant inspiration has been that online learning democratizes access to education globally. Regardless of your ethnic background, income level, and geographical location, as long as you can surf the web, you can find an ocean of free educational content to help you learn new skills.
This article introduces six top-notch, free data science resources ideal for aspiring data analysts, data scientists, or anyone aiming to enhance their analytical skills.
At 365 Data Science, we offer a range of expertly designed flashcard decks to teach fundamental data science concepts, including terms and glossaries for tools like Microsoft Excel, SQL, Python, and ChatGPT. Additionally, our flashcards cover math, statistics, probability, and machine learning, providing an excellent starting point for beginners by familiarizing them with the essential data science language.
Free Udemy Courses
Udemy is the go-to marketplace for online courses. Their content library includes over 100,000 titles on almost every topic, including data analytics and data science. They also offer free courses uploaded by authors eager to share their knowledge with the public at no cost. Use the filtering options when browsing the marketplace to discover unique learning materials.
Recently, 365 Data Science launched a series of statistics calculators designed for university students and practitioners eager to grasp statistical calculations' underlying mechanics and theory, beyond merely achieving results through tools like Excel or Python. These calculators enable users to input problem data for homework, exams, or practice, revealing each step necessary to arrive at the solution rather than only the outcome. Each calculator includes a detailed statistical article to explain the concept, offering an invaluable opportunity to learn through comprehension and application.
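To make the step-by-step idea concrete, the sketch below (not 365 Data Science's calculator code) prints each intermediate step of a 95% confidence interval for a mean. It assumes the normal (z) critical value from Python's statistics.NormalDist so that only the standard library is needed; for a sample as small as the demo's, a Student's t critical value (about 2.78 for 4 degrees of freedom) would be the more appropriate choice and would give a wider interval.

```python
import math
import statistics


def mean_confidence_interval(sample, confidence=0.95):
    """Return (lower, upper) for the mean, printing every intermediate step."""
    n = len(sample)
    mean = statistics.mean(sample)
    sd = statistics.stdev(sample)  # sample standard deviation (divides by n - 1)
    se = sd / math.sqrt(n)  # standard error of the mean
    # Two-sided critical value from the standard normal distribution (about 1.96 for 95%)
    z = statistics.NormalDist().inv_cdf(0.5 + confidence / 2)
    margin = z * se
    print(f"Step 1: n = {n}, sample mean = {mean:.4f}")
    print(f"Step 2: sample standard deviation s = {sd:.4f}")
    print(f"Step 3: standard error s / sqrt(n) = {se:.4f}")
    print(f"Step 4: critical value z = {z:.4f}")
    print(f"Step 5: margin of error z * SE = {margin:.4f}")
    lower, upper = mean - margin, mean + margin
    print(f"Result: {confidence:.0%} CI = ({lower:.4f}, {upper:.4f})")
    return lower, upper


if __name__ == "__main__":
    mean_confidence_interval([12, 15, 11, 14, 13])
```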
Utilize these complimentary statistics calculators to discover how to:
YouTube's data content creators deliver immense value, offering everything from concise tutorials and job-seeking advice in data science to comprehensive courses. If you're seeking exceptional, free data science learning resources, consider these top recommendations:
At 365 Data Science, we offer a wealth of free resources to our students. For those keen on exploring data science without financial constraints, download our complimentary course notes and career guides. You can access various topics at no cost, including Intro to Data Science, Statistics, Probability, Python, Machine Learning, Data Strategy, and others.
In addition to course notes, you can download 365's free Data Analyst Career Guide and Data Scientist Career Guide.
Google, Microsoft, Amazon, and other Big Tech organizations have shown increasing interest in providing free online courses to individuals worldwide.
For instance, you can sign up at no cost for Google's Data Analytics Professional Certification on Coursera, accessing courses for free until opting to pay a reasonable fee for a certificate of completion.
This comprehensive list of free data science learning resources is a testament to the quality content available online at no cost. The wealth of available educational resources will motivate future data analysts and scientists, enhancing global skillsets and career prospects, even for those from humble origins.
Garud Iyengar named Director of Columbia Data Science Institute – The American Bazaar
Indian American professor Garud Iyengar has been appointed as the next Avanessians Director of the Data Science Institute at Columbia University. He will begin as DSI Director on July 1.
Currently the Tang Family Professor in the Department of Industrial Engineering and Operations Research and Vice Dean of Research, Iyengar will play a key role in leading Columbia's AI initiative, identified as a key University priority.
"Iyengar brings a wealth of experience in academic leadership and long record of success in convening faculty from disparate fields to tackle pressing interdisciplinary challenges," Interim Provost Dennis A. Mitchell said in announcing Iyengar's appointment.
"As the Data Science Institute builds on its decade-plus record of remarkable success in advancing the frontiers of the field, developing collaborations across Columbia's many schools, and training the next generation of data science leaders, it is in exceptionally capable hands," he added.
"I am honored to lead the Data Science Institute as we continue to advance the frontiers of the field and train the next generation of data science leaders," said Iyengar. "I am excited to build upon the institute's decade-plus record of success and further strengthen its impact across Columbia's diverse schools," he added.
With artificial intelligence identified as one of the key priorities for the University in the coming years, under Iyengar's direction, DSI will partner with EVP for Research Jeannette Wing and Columbia Engineering Dean Shih-Fu Chang in leading Columbia's AI initiative, the announcement stated.
Iyengar has been a member of the Columbia Engineering faculty since 1998. He is currently the Senior Vice Dean for Research and Academic Programs, leading educational programs and large-scale interdisciplinary research at the school.
Iyengar has played a central role at DSI since its founding. He was Associate Director for Research from 2017 to 2019 and helped launch the Institute's PhD concentration, seed fund program, and postdoc program.
His own research has brought significant advances to the study of information, control, and optimization, and his current work addresses a broad range of domains, including cellular signaling, labor platforms, power networks, supply chains, and causal inference.
Iyengar received a B.Tech. in electrical engineering from the Indian Institute of Technology in 1993 and a PhD in electrical engineering from Stanford University in 1998. He is a member of Columbia's Data Science Institute.
CISS and Spark! partnership combines quantitative data and social science in research – The Daily Free Press
When Meghann Lucy, a graduate affiliate of the Center for Innovation in Social Science, or CISS, collected data on cases of hoarding, she wanted to identify the patterns of where these cases were most common, she said. The problem was, she said she did not know how to visually represent the data to do the analysis.
Lucy, a sixth-year Ph.D. candidate in sociology in the Graduate School of Arts and Sciences, received a grant from CISS for her research, but struggled with using mapping software and quantitative data analysis to make full use of her data, she said.
She said she decided to apply for the partnership program to get help from students in BU Spark!, BU's innovation and experiential learning lab for quantitative data-driven projects, to figure it out.
After working with the lab for a semester last spring, Spark! students analyzed the data Lucy had and visualized it using maps, some of which Lucy has since included in her dissertation, she said.
"[The Spark! team] had perspectives that I don't have," Lucy said. "I'm very much trained as a social scientist, I'm not trained as a computer scientist."
The partnership between Spark! and the Center for Innovation in Social Science (CISS) began in 2023 to apply data science and computational skills to social science research, according to the CISS website.
Seth Villegas, the program lead of the partnership, and a post-doctoral fellow in the Faculty of Computing and Data Sciences, said one challenge in social science research is that researchers do not have the technical skills necessary, because the technology is fairly new.
Villegas said the younger students trained in cutting-edge technology in Spark! can provide faculty and graduate students with technical support.
The CISS and Spark! partnership was made possible by funding from the College of Arts and Sciences dedicated to experiential learning, initiated by Dean Stan Sclaroff, according to the CISS website.
The interdisciplinary nature of the partnership aligns with Spark!'s goal of being the bridge between the computing and data sciences and the rest of the university, Ziba Cranmer, the director of Spark!, said.
Deborah Carr, the director of CISS, and a professor of sociology in the CAS, said CISS contributes to this partnership through bringing topics that students in Spark! may not be familiar with.
"What [CISS and Spark!] are doing is making science that neither [program] could have created on their own," Carr said.
The partnership solicits research proposals from faculty and graduate students, according to the CISS website.
Last year, four projects, including Lucy's, were chosen, according to the CISS website. This year, after receiving far more proposals than expected, nine projects were chosen in total and six were matched with undergraduate courses in the Faculty of Computing and Data Sciences, the website says.
One of the Spark!-funded projects of this year's grant is led by Ana Barun, a fifth-year Ph.D. student in the Graduate School of Arts and Sciences studying archaeology. Barun said she pursued the grant because she needed help analyzing data using geospatial technology, tools for geographic mapping and analysis.
"You don't learn that as an archaeologist and I just didn't have enough time to teach myself," Barun said. "The fact that they were even offering something like this in the first place is amazing, so why let the opportunity pass by?"
The archaeology field is constantly evolving with new technologies, Barun said. Due to the nature of working in a field with large datasets, Barun said, it's necessary to collaborate with people who can use these technologies.
The partnership will allow for research on crucial topics like racism, crime and immigration, Carr said.
Molly Richard, a postdoctoral associate at CISS, had a project that was matched with an undergraduate data visualization course in CDS.
Richard used census data to estimate the scale of so-called doubled-up homelessness, often referred to as couch surfing, Richard said.
Richard said their goal is to create a data visualization website where people can find data on the rate of doubled-up homelessness in their community each year.
The instructor for the data visualization course is Anthony Chamberas, an adjunct lecturer in CDS, who has a background in analytics and data with a focus on data visualization.
"I'm hoping it teaches [the students] how to work on a project that has had its share of challenges and difficulties," he said. "Data is never the way you want it to be when you're working with it."
He said when he was a student, he never learned to handle those difficulties because he was always given data that was clean and ready to go.
In the end, the goal is for CISS/Spark! to eventually have the resources to support all applicants, Cranmer said.
"I think that's really the main thing about being part of an academic university, is that you have people in different fields that can collaborate with you and help you," Barun said.
Visualizing Household Income from Government Sources: A Guided Walkthrough, by James Koh, PhD (Mar 2024) – Towards Data Science
Are you unsure about how to navigate through the sea of data on the internet, and about what can be done after obtaining relevant data from reliable sources?
Do you wonder how your household income compares with all others across your country? (I guess at least 90% of everyone out there, regardless of profession or interest.)
If your answer to either of the above is yes, this article is for you!
At the end of this guided walkthrough, you will be able to obtain the following yourself. By replacing just a single variable in the code below according to your household income, as indicated by ### just replace this with your data ###, you will be able to create a chart showing your own percentile relative to households in Singapore! (And with additional work, you can replace the dataframe using data specific to your own country.)
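The walkthrough's own code is not reproduced in this excerpt. As a rough, hypothetical sketch of the core calculation (without the chart), the Python below estimates a household's percentile from grouped income brackets, assuming households are spread evenly within each bracket. The bracket boundaries and shares are made-up placeholders, not Singapore statistics; replace them with figures from the official government tables, and set my_income to your own monthly household income.

```python
# Hypothetical placeholder data: (lower bound, upper bound, share of households in %)
# for monthly household income. These are NOT real statistics.
BRACKETS = [
    (0, 2000, 10.0),
    (2000, 4000, 15.0),
    (4000, 6000, 15.0),
    (6000, 8000, 12.0),
    (8000, 10000, 10.0),
    (10000, 15000, 18.0),
    (15000, 20000, 10.0),
    (20000, 40000, 10.0),
]


def income_percentile(income, brackets=BRACKETS):
    """Percentile of `income`, assuming households are spread uniformly within each bracket."""
    cumulative = 0.0
    for lower, upper, share in brackets:
        if income < upper:
            # Interpolate linearly inside the bracket that contains the income
            fraction = max(0.0, (income - lower) / (upper - lower))
            return cumulative + share * fraction
        cumulative += share
    return cumulative  # income at or above the top bracket


my_income = 9000  ### just replace this with your data ###

if __name__ == "__main__":
    print(f"Estimated percentile for a monthly household income of {my_income}: "
          f"{income_percentile(my_income):.1f}")
```

A bar or line chart of the cumulative shares, with a marker at the computed percentile, would be the natural next step for reproducing the visual described above.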
End-to-End NLP Project with Hugging Face, FastAPI, and Docker – Towards Data Science
This tutorial explains how to build a containerized sentiment analysis API using Hugging Face, FastAPI and Docker.
Many AI projects fail, according to various reports (e.g., Harvard Business Review). I speculate that part of the barrier to AI project success is the technical step from having built a model to making it widely available for others in your organization.
So how do you make your model easily available for consumption? One way is to wrap it in an API and containerize it so that your model can be exposed on any server with Docker installed. And that's exactly what we'll do in this tutorial.
We will take a sentiment analysis model from Hugging Face (an arbitrary choice just to have a model that's easy to show as an example), write an API endpoint that exposes the model using FastAPI, and then we'll containerize our sentiment analysis app with Docker. I'll provide code examples and explanations all the way.
The tutorial code has been tested on Linux, and should work on Windows too.
We will use the Pipeline class from Hugging Face's transformers library. See Hugging Face's tutorial for an introduction to the Pipeline if you're unfamiliar with it.
The pipeline makes it very easy to use models such as sentiment models. Check out Hugging Face's sentiment analysis tutorial for a thorough introduction to the concept.
You can instantiate the pipe with several different constructor arguments. One way is to pass in a type of task:
This will use Hugging Face's default model for the provided task.
Another way is to pass the model argument specifying which model you want to use. You don't
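The excerpt stops before its code, so the following is only a hedged sketch of the two constructor styles described above, using the transformers pipeline function. With model left as None, transformers falls back to its default model for the task; passing a Hub model id pins a specific checkpoint instead. The helper names build_sentiment_pipe and predict are our own: predict merely reshapes the pipeline output into a JSON-friendly dict of the kind a FastAPI endpoint could return, and it is not the tutorial's endpoint code.

```python
from typing import Any, Callable, Dict, List, Optional


def build_sentiment_pipe(model_name: Optional[str] = None):
    """Create a Hugging Face sentiment pipeline.

    With model_name=None, transformers uses its default model for the task;
    pass a Hub model id to pin a specific checkpoint instead.
    """
    from transformers import pipeline  # imported lazily so predict() works without it

    return pipeline("sentiment-analysis", model=model_name)


def predict(pipe: Callable[[str], List[Dict[str, Any]]], text: str) -> Dict[str, Any]:
    """Run one text through the pipe and return a JSON-friendly dict."""
    result = pipe(text)[0]  # a pipeline returns one dict per input text
    return {"text": text, "label": result["label"], "score": float(result["score"])}
```

For example, predict(build_sentiment_pipe(), "This tutorial was easy to follow.") downloads the default model on first use, while build_sentiment_pipe("distilbert/distilbert-base-uncased-finetuned-sst-2-english") selects that checkpoint explicitly (our example choice, not necessarily the tutorial's). Keeping the model-building step separate from predict also makes the response logic testable with a stand-in callable.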
Truman State University adds undergraduate option to Data Science Program – Kirksville Daily Express and Daily News
Truman State University
Truman State University's data science program continues to grow following the addition of a bachelor of science option starting this fall.
Last month, the Missouri Department of Higher Education and Workforce Development approved Truman's newest bachelor's program. Students can now get an undergraduate degree in data science. Truman is among the first institutions in Missouri to offer a bachelor's degree in this emerging career field.
Simply put, data science focuses on making meaning from information. Program participants learn how to collect and analyze data, as well as gain knowledge of techniques to effectively communicate insights that can be used to solve problems through informed decision making. Nearly every industry utilizes data to some extent, from businesses trying to effectively manage inventory and purchasing decisions to streaming platforms suggesting what to watch next.
"Everyone has tons of data and they don't know what to do with it," said Scott Alberts, chair of the Department of Statistics and Data Science. "This program focuses on making meaning from information, including use of tools such as distributed computing and machine learning. Those skills can be used in a wide array of career fields, making this a versatile and valuable degree."
Data science is a field that naturally fits with a liberal arts education. Practitioners draw heavily on critical thinking and problem-solving skills associated with the liberal arts. Currently, starting salaries for data scientists typically range from $70,000-80,000 per year.
The bachelor of science joins a growing stable of data science options at Truman. The University already offers an online master's degree, as well as a 15-credit certificate program for working adults seeking to add skills to enhance their careers. Because Truman has been making strides in data science for years, the course infrastructure to support a full undergraduate program is already in place.
Truman's Bachelor of Science in data science degree includes a minor or second major as part of the program. Complementary fields of study are computer science, mathematics and statistics, but data science students can have a concentration in most fields, including biology, health sciences, psychology or business, among others.
"The data science options at Truman have been designed to be stackable so students can tailor them to best fit their needs," said Hyun-Joo Kim, chair of the Department of Computer Science. "Every class gives you a new job skill."
Students can start working toward a Bachelor of Science degree in data science as early as this fall. More information about data science offerings through Truman can be found online or by contacting the Admissions Office at admissions@truman.edu or (660) 785-4114.
W&M, industry partnership leverages AI to support patients with chronic conditions – William & Mary
William & Mary data scientist Haipeng Chen believes in AI for social good. So, he is using his expertise to help deliver personalized and more accessible health care to patients with chronic conditions.
Chen, an assistant professor of data science, is leading a partnership between William & Mary and the health care technology company Generated Health. He and his team will develop synthetic patient data to help train a more autonomous version of the company's digital nurse, Florence.
"If we can have an AI system that can deliver automated, personalized management of patients, then we will relieve some of the growing pressures created by the accelerating prevalence of chronic conditions and workforce shortages," said Chen.
The contract with Generated Health, starting July 1 this year, will also cover the stipend of a graduate research assistant from Chen's lab.
This partnership is part of a growing portfolio of externally funded data science research at William & Mary. The data science program attracted over $2 million in research funding last year and is now extending its scope with projects supported by federal agencies and the private sector as well as pursuing technology transfer opportunities.
"As the disciplinary home of AI on campus, the data science unit is particularly interested in studying AI solutions as they impact the world," said Professor Anthony Stefanidis, data science program director. He described the research program as particularly focused on the intersections of data science and AI with location, health, information generation and dissemination, and large-scale experiments and simulations.
The data science program will be part of a proposed new school at William & Mary, which will expand, among other things, the university's focus on data fluency and data-intensive research by building on the strengths of existing programs.
According to a Generated Health press release, the digital nurse Florence has already managed over 25 million clinical conversations with 200,000 patients in three countries, delivering a better patient experience and improved clinical outcomes.
Chen said that Florence has been used to help chronic disease patients monitor and control their conditions.
"In many cases, patients can't get an appointment soon enough to get to know their condition better," said Chen. "Using AI, we can have an automated way to accelerate and augment the current health care system."
Chen and his team will be developing an AI diffusion model simulating real patient behavior, which will be used to train the nurse model combining generative AI and reinforcement learning.
The goal is to develop a next-generation digital nurse able to make effective decisions, learning from its interaction with an environment within a set of clinical rules and protocols that eliminate the risk of hallucination, that is, incorrect information presented as factual.
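The article does not describe the training setup in detail. Purely as a toy illustration of "learning from interaction within a set of rules" (and not Generated Health's or Chen's method), the sketch below uses an epsilon-greedy multi-armed bandit, one of the simplest forms of reinforcement learning. A hypothetical rule set removes forbidden actions before the agent ever chooses, and a made-up random simulator stands in for the learned patient-behavior model; all action names and engagement probabilities are invented for the example.

```python
import random

# Hypothetical actions for a toy "digital nurse" agent (illustrative names only)
ACTIONS = ["remind_medication", "ask_symptoms", "suggest_dose_change", "schedule_call"]
# Hypothetical protocol rule: actions the agent may never take on its own
FORBIDDEN = {"suggest_dose_change"}
# Made-up engagement probabilities used by the toy simulator
ENGAGE_PROB = {
    "remind_medication": 0.6,
    "ask_symptoms": 0.4,
    "suggest_dose_change": 0.9,
    "schedule_call": 0.3,
}


def simulated_patient(action, rng):
    """Toy stand-in for a learned patient simulator: 1 if the patient engages, else 0."""
    return 1 if rng.random() < ENGAGE_PROB[action] else 0


def train(episodes=2000, epsilon=0.1, seed=0):
    """Epsilon-greedy bandit restricted to rule-compliant actions."""
    rng = random.Random(seed)
    # Rules are applied before learning: forbidden actions are never candidates
    allowed = [a for a in ACTIONS if a not in FORBIDDEN]
    value = {a: 0.0 for a in allowed}  # estimated engagement rate per action
    counts = {a: 0 for a in allowed}
    for _ in range(episodes):
        if rng.random() < epsilon:
            action = rng.choice(allowed)  # explore
        else:
            action = max(allowed, key=value.get)  # exploit the current best estimate
        reward = simulated_patient(action, rng)
        counts[action] += 1
        value[action] += (reward - value[action]) / counts[action]  # running mean
    return value


if __name__ == "__main__":
    estimates = train()
    print("preferred action:", max(estimates, key=estimates.get))
    print(estimates)
```

The forbidden action has the highest made-up engagement rate, yet the agent never tries it, because the rule filter runs before any learning. That ordering, constraints first and optimization second, is the design point the sketch is meant to show.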
Chen's interest in health care is not new. While a postdoctoral fellow at Harvard, he started working on AI in the public health domain. At William & Mary, he and Associate Professor of Kinesiology Carrie Dolan are developing a project using data science to get timely vaccinations to rural communities in Kenya.
"I believe that AI should be used for the good: It's a kind of philosophical belief," said Chen. "Many people mostly care about the fancy techniques, but then at the end of the day what really makes AI useful is its application to domains related to society."
According to Chen, one advantage of applying AI to the medical domain is freeing up clinicians' time, helping alleviate the impact of workforce shortages in health care across the nation, currently estimated at 200,000 among nurses and 124,000 among physicians by the 2030s. Also, he sees AI as a support tool for auxiliary health care workers, helping remove barriers and create job opportunities.
"This collaboration is a very important piece of my general vision," he said. "I would be excited to see this system benefiting tens of thousands or even millions of patients around the world because that's one of the end goals for researchers in AI for social good."
Antonella Di Marzio, Senior Research Writer