Just about everyone agrees that data scientists and AI developers are the new superstars of the tech industry. But ask a group of CIOs to define the precise area of expertise for data science-related job titles, and discord becomes the word of the day.
As businesses seek actionable insights by hiring teams that include data analysts, data engineers, data scientists, machine learning engineers and deep learning engineers, a key to success is understanding what each role can and can't do for the business.
Read on to learn what your data science and AI experts can be expected to contribute as companies grapple with ever-increasing amounts of data that must be mined to create new paths to innovation.
In a perfect world, every company employee and executive works under a well-defined set of duties and responsibilities.
Data science isn't that world. Companies often structure their data science organization based on project need: Is the main problem maintaining good data hygiene? Or is there a need to work with data in a relational model? Perhaps the team requires someone who is an expert in deep learning and who understands infrastructure as well as data?
Depending on a company's size and budget, any one job title might be expected to own one or more of these problem-solving skills. Of course, roles and responsibilities will change with time, just as they've done as the era of big data evolves into the age of AI.
That said, it's good for a CIO and the data science team she or he manages to remove as much ambiguity as possible about the responsibilities of the most common roles: data analyst, data engineer, data scientist, machine learning engineer and deep learning engineer.
Teams that have the best understanding of how each role fits into the company's goals are best positioned to deliver a successful outcome. No matter the role, accelerated computing infrastructure is also key to powering success throughout the pipeline as data moves from analytics to advanced AI.
It's important to recognize the work of a data analyst, as these experts have been helping companies extract information from their data since long before the emergence of the modern data science and AI pipeline.
Data analysts use standard business intelligence tools like Microsoft Power BI, Tableau, Qlik and Yellowfin, along with Spark, SQL and other data analytics applications. Broad-scale data analytics can involve the integration of many different data sources, which increases the complexity of the work of both data engineers and data scientists. This is another example of how the work of these various specialists tends to overlap and complement each other.
Data analysts still play an important role in the business, as their work helps the business assess its success. A data engineer might also support a data analyst who needs to evaluate data from different sources.
Data scientists take things a step further so that companies can start to capitalize on new opportunities with recommender systems, conversational AI, and computer vision, to name a few examples.
A data engineer makes sense of messy data, and there's usually a lot of it. People in this role tend to be junior teammates who make data as nice and neat as possible for data scientists to use. This role involves a lot of data prep and data hygiene work, including lots of ETL (extract, transform, load) to ingest and clean data.
The data engineer must be good with data jigsaw puzzles. Formats change, standards change, even the fields a team is using on a webpage can change frequently. Datasets can have transmission errors, such as when data from one field is incorrectly entered into another.
When datasets need to be joined together, data engineers need to fix the data hygiene problems that occur when labeling is inconsistent. For example, if the day of the week is included in the source data, the data engineer needs to make sure that the same format is used to indicate the day, as Monday could also be written as Mon., or even represented by a number that could be one or zero depending on how the days of the week are counted.
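The day-of-week cleanup described above can be sketched in a few lines of Python. This is only an illustration, not a prescribed implementation: the helper name `normalize_day` and its `first_index` parameter are hypothetical, and the numbering convention (which number stands for Monday) is an assumption that has to be confirmed for each data source.

```python
# Hypothetical helper: map mixed day-of-week labels to one canonical format.
DAY_NAMES = ["Monday", "Tuesday", "Wednesday", "Thursday",
             "Friday", "Saturday", "Sunday"]


def normalize_day(value, first_index=0):
    """Return the full day name for labels such as 'Mon.', ' monday ', 0 or '1'.

    first_index is the number the source system uses for Monday
    (assumed here; it must be checked against each source).
    """
    if isinstance(value, int):
        offset = value - first_index
        if not 0 <= offset < 7:
            raise ValueError(f"day number out of range: {value!r}")
        return DAY_NAMES[offset]

    # Strip whitespace and a trailing period, then compare case-insensitively.
    text = str(value).strip().rstrip(".").lower()
    if text.isdigit():
        return normalize_day(int(text), first_index)

    # Accept any prefix of at least three letters: 'mon', 'thurs', 'monday'.
    for name in DAY_NAMES:
        if len(text) >= 3 and name.lower().startswith(text):
            return name
    raise ValueError(f"unrecognized day label: {value!r}")


# The same Monday, written three ways by three hypothetical sources.
print(normalize_day("Mon."))              # abbreviation with a period
print(normalize_day(0))                   # zero-based numbering
print(normalize_day(1, first_index=1))    # one-based numbering
```

Raising an error on unrecognized labels, rather than guessing, keeps bad values from slipping silently into a joined dataset.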
Expect your data engineers to be able to work freely with scripting languages like Python, and in SQL and Spark. They'll need programming skills to find problems and clean them up. Given that they'll be working with raw data, their work is important to ensuring your pipeline is robust.
If enterprises are pulling data from their data lake for AI training, this kind of rule-based cleanup can be done by a data engineer. More extensive feature engineering is the work of a data scientist. Depending on their experience and the project, some data engineers may support data scientists with initial data visualization graphs and charts.
Depending on how strict your company has been with data management, or if you work with data from a variety of partners, you might need a number of data engineers on the team. At many companies, the work of a data engineer often ends up being done by a data scientist, who preps her or his own data before putting it to work.
Data scientists experiment with data to find the secrets hidden inside. It's a broad field of expertise that can include the work of data analytics and data processing, but the core work of a data scientist is done by applying predictive techniques to data using statistical machine learning or deep learning.
For years, the IT industry has talked about big data and data lakes. Data scientists are the people who finally turn these oceans of raw data into information. These experts use a broad range of tools to conduct analytics, experiment, and build and test models to find patterns. To be great at their work, data scientists also need to understand the needs of the business they're supporting.
These experts use many tools and libraries, including NumPy, scikit-learn, RAPIDS, CUDA, SciPy, Matplotlib, pandas, Plotly, NetworkX, XGBoost, domain-specific libraries and many more. They need to have expertise in statistical machine learning, random forests, gradient boosting, packages, feature engineering, training, model evaluation and refinement, data normalization and cross-validation. The depth and breadth of these skills make it readily apparent why these experts are so highly valued at today's data-driven companies.
Data scientists often solve mysteries to get to the deeper truth. Their work involves finding the simplest explanations for complex phenomena and building models that are simple enough to be flexible yet faithful enough to provide useful insight. They must also avoid some perils of model training, including overfitting their data sets (that is, producing models that do not effectively generalize from example data) and accidentally encoding hidden biases into their models.
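To make the overfitting risk concrete, here is a minimal sketch using only the Python standard library. It is an illustration under stated assumptions, not a method from this article: the data is synthetic, the function names are hypothetical, and a one-nearest-neighbor "memorizer" stands in for any model flexible enough to fit noise. The memorizer scores perfectly on the data it was trained on, while a simpler threshold rule typically holds up better on data neither model has seen.

```python
import random


def make_noisy_data(n, rng, noise=0.2):
    """Synthetic data: label is 1 when x > 0.5, with a fraction of labels flipped."""
    data = []
    for _ in range(n):
        x = rng.random()
        label = int(x > 0.5)
        if rng.random() < noise:
            label = 1 - label  # simulate noisy, mislabeled examples
        data.append((x, label))
    return data


def nearest_neighbor_predict(train, x):
    """Flexible model: copy the label of the closest training point."""
    _, label = min(train, key=lambda point: abs(point[0] - x))
    return label


def threshold_predict(x):
    """Simple model: one rule that ignores the noise."""
    return int(x > 0.5)


def accuracy(predict, data):
    """Fraction of examples for which the prediction matches the label."""
    return sum(predict(x) == y for x, y in data) / len(data)


rng = random.Random(0)              # fixed seed so runs are repeatable
train = make_noisy_data(200, rng)
held_out = make_noisy_data(200, rng)


def memorizer(x):
    return nearest_neighbor_predict(train, x)


print("memorizer: train", accuracy(memorizer, train),
      "held-out", accuracy(memorizer, held_out))
print("threshold: train", accuracy(threshold_predict, train),
      "held-out", accuracy(threshold_predict, held_out))
```

The gap between training accuracy and held-out accuracy is the signal to watch; cross-validation repeats the same comparison over several different splits of the data.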
A machine learning engineer is the jack of all trades. This expert architects the entire process of machine and deep learning. They take AI models developed by data scientists and deep learning engineers and move them into production.
These unicorns are among the most sought-after and highly paid in the industry, and companies work hard to make sure they don't get poached. One way to keep them happy is to provide the right accelerated computing resources to help fuel their best work. A machine learning engineer has to understand the end-to-end pipeline, and they want to ensure that pipeline is optimized to deliver great results, fast.
It's not always intuitive, as machine learning engineers must know the apps, understand the downstream data architecture, and key in on system issues that may arise as projects scale. A person in this role must understand all the applications used in the AI pipeline, and usually needs to be skilled in infrastructure optimization, cloud computing, containers, databases and more.
To stay current, AI models need to be reevaluated to avoid what's called model drift, in which new data erodes the accuracy of the predictions. For this reason, machine learning engineers need to work closely with their data science and deep learning colleagues, who will need to reassess models to maintain their accuracy.
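One simple way to notice drift, sketched below, is to compare accuracy over a sliding window of recent predictions against the accuracy measured when the model was deployed. The `DriftMonitor` class, its parameters and the threshold values are hypothetical choices for illustration, and the approach assumes the true outcomes eventually become available to compare against.

```python
from collections import deque


class DriftMonitor:
    """Hypothetical helper: flag a model for review when recent accuracy drops."""

    def __init__(self, baseline_accuracy, window=100, tolerance=0.05):
        self.baseline_accuracy = baseline_accuracy  # accuracy at deployment time
        self.tolerance = tolerance                  # acceptable drop before review
        self.outcomes = deque(maxlen=window)        # keeps only the newest results

    def record(self, predicted, actual):
        """Store whether one prediction turned out to be correct."""
        self.outcomes.append(predicted == actual)

    def recent_accuracy(self):
        """Accuracy over the current window, or None if nothing is recorded."""
        if not self.outcomes:
            return None
        return sum(self.outcomes) / len(self.outcomes)

    def needs_review(self):
        """True when recent accuracy falls below baseline minus tolerance."""
        recent = self.recent_accuracy()
        return recent is not None and recent < self.baseline_accuracy - self.tolerance


monitor = DriftMonitor(baseline_accuracy=0.90, window=10, tolerance=0.05)

# Nine correct predictions and one miss: still in line with the baseline.
for predicted, actual in [(1, 1)] * 9 + [(1, 0)]:
    monitor.record(predicted, actual)
print(monitor.recent_accuracy(), monitor.needs_review())

# Three more misses push the oldest results out of the window.
for predicted, actual in [(1, 0)] * 3:
    monitor.record(predicted, actual)
print(monitor.recent_accuracy(), monitor.needs_review())
```

A flag from a monitor like this is a prompt for the data science team to reassess the model, not a replacement for that reassessment.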
A critical specialization for the machine learning engineer is the deep learning engineer. This person is a data scientist who is an expert in deep learning techniques. In deep learning, AI models learn and improve their own results through neural networks, layered structures loosely inspired by how the human brain processes information.
These computer scientists specialize in advanced AI workloads. Their work is part science and part art, shaping what happens in the black box of deep learning models. They do less feature engineering and far more math and experimentation. The push for explainable AI (XAI), meaning model interpretability and explainability, can be especially challenging in this domain.
Deep learning engineers will need to process large datasets to train their models before the models can be used for inference, where they apply what they've learned to evaluate new information. These engineers use libraries like PyTorch, TensorFlow and MXNet, and need to be able to build neural networks and have strong skills in statistics, calculus and linear algebra.
Given all of the broad expertise in these key roles, it's clear that enterprises need a strategy to help them grow their teams' success in data science and AI. Many new applications need to be supported, with the right resources in place to help this work get done as quickly as possible to solve business challenges.
Those new to data science and AI often choose to get started with accelerated computing in the cloud, and then move to a hybrid solution to balance the need for speed with operational costs. In-house teams tend to look like an inverted pyramid, with more analysts and data engineers funneling data into actionable tasks for data scientists, up to the machine learning and deep learning engineers.
Your IT paradigm will depend on your industry and its governance, but a great rule of thumb is to ensure your vendors and the skills of your team are well aligned. With a better understanding of the roles of a modern data team, and the resources they need to be successful, you'll be well on your way to building an organization that can transform data into business value.
ABOUT THE AUTHOR
By Scott McClellan, Head of Data Science, NVIDIA
Read more here:
The CIO's Guide to Building a Rockstar Data Science and AI Team | eWEEK - eWeek