This article is excerpted from the course "Fundamental Machine Learning," part of the Machine Learning Specialist certification program from Arcitura Education. It is the ninth part of the 13-part series, "Using machine learning algorithms, practices and patterns."
This article explores the numerical prediction and category prediction supervised learning techniques. These machine learning techniques are applied when the target whose value needs to be predicted is known in advance and some sample data is available to train a model. As explained in Part 4, these techniques are documented in a standard pattern profile format.
A data set may contain a number of historical observations (rows), amassed over a period of time, for which the numerical target value is known. An example is a data set of temperature readings and the number of ice creams sold, where the number of ice creams sold is the target variable. To obtain value from this data, a business use case might require predicting how much ice cream will be sold when the temperature is known in advance from the weather forecast. Because the target is numerical, supervised learning techniques that work with categorical targets cannot be applied (Figure 1).
The historical data is capitalized upon by first finding the independent variables that influence the target dependent variable and then quantifying this influence in a mathematical equation. Once the equation is established, the value of the target variable is predicted by inputting the values of the independent variables.
The data set is first scanned for the best independent variables by applying the associativity computation pattern, which measures the relationship between each independent variable and the dependent variable. Only the independent variables that are highly correlated with the dependent variable are kept. Next, linear regression is applied.
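The article does not say how the associativity computation pattern measures the relationship; for numerical variables, one common choice is the Pearson correlation coefficient. The following Python sketch assumes that choice and keeps only the features whose absolute correlation with the target reaches a cutoff. The function names, the 0.7 threshold, and the small ice cream data set are hypothetical, made up for illustration.

```python
import math

def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length numeric sequences.

    Raises ZeroDivisionError if either sequence is constant (correlation undefined).
    """
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)

def select_features(features, target, threshold=0.7):
    """Keep features whose absolute correlation with the target meets a (hypothetical) cutoff.

    `features` maps a feature name to its column of values.
    """
    return [name for name, column in features.items()
            if abs(pearson(column, target)) >= threshold]

# Hypothetical toy data: ice creams sold per day plus two candidate explanatory variables.
sales = [120, 150, 185, 210, 260]
features = {
    "temperature": [18, 21, 25, 28, 33],   # daily temperature in degrees Celsius
    "day_of_month": [3, 17, 9, 25, 11],    # unrelated to sales in this toy data
}

print(select_features(features, sales))
```

With this toy data, temperature is kept and the unrelated day-of-month column is dropped. A real workflow would also guard against constant columns, for which the correlation is undefined.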
Linear regression, often called least squares regression after the method used to fit it, is a statistical technique for predicting the values of a continuous dependent variable based on the values of an independent variable. The dependent and independent variables are also known as the response and explanatory variables, respectively. Linear regression assumes that the relationship between the response variable and the explanatory variables is linear. This linear relationship is represented by the line of best fit, also called the regression line: a straight line that passes as closely as possible to all points on the scatter plot (Figure 2).
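For a single explanatory variable x, the regression line and the standard least squares estimates of its intercept and slope can be written as follows, where y-hat is the estimated response, n is the number of observations, and x-bar and y-bar are the sample means. The beta symbols are introduced here for illustration; the article itself does not name the parameters. These estimates are the values that minimize the sum of squared errors (SSE).

```latex
\hat{y}_i = \hat{\beta}_0 + \hat{\beta}_1 x_i,
\qquad
\mathrm{SSE} = \sum_{i=1}^{n} \left(y_i - \hat{y}_i\right)^2

\hat{\beta}_1 = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}{\sum_{i=1}^{n} (x_i - \bar{x})^2},
\qquad
\hat{\beta}_0 = \bar{y} - \hat{\beta}_1 \bar{x}
```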
Linear regression model development starts by expressing the linear relationship. Once the mathematical form has been established, the next step is to estimate the parameters of the model via model fitting. Least squares estimation determines the line of best fit by minimizing the sum of squared errors (SSE). The last stage is to evaluate the model using either R squared or the mean squared error (MSE).
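The first two steps above (express the linear form, then fit it by least squares) can be sketched in plain Python for a single explanatory variable. This is a minimal illustration using the closed-form least squares slope and intercept; the function names and the temperature and sales figures are hypothetical, not data from the article.

```python
def fit_line(xs, ys):
    """Return (intercept, slope) of the least squares line for one explanatory variable."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Slope = covariance(x, y) / variance(x); the shared 1/n factor cancels out.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope

def predict(intercept, slope, x):
    """Estimated response for explanatory value x."""
    return intercept + slope * x

# Hypothetical data: temperature readings (Celsius) and ice creams sold on those days.
temperatures = [18, 21, 25, 28, 33]
ice_creams_sold = [120, 150, 185, 210, 260]

intercept, slope = fit_line(temperatures, ice_creams_sold)
forecast = predict(intercept, slope, 30)  # a hypothetical forecast temperature of 30 C
print(f"slope={slope:.2f}, intercept={intercept:.2f}, predicted sales at 30 C={forecast:.1f}")
```

In practice a library such as scikit-learn or statsmodels would normally do the fitting; the hand-written version only makes the arithmetic visible.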
MSE measures how close the line of best fit is to the actual values of the response variable. Because it is a straight line, the regression line cannot pass through every point; it approximates the actual values of the response variable with estimated values. The difference between the actual and the estimated value of the response variable is the error of estimation. For the best possible estimate of the response variable, the errors across all points, as summarized by the sum of squared errors, must be minimized; the line of best fit is the line that yields the smallest possible SSE. MSE is this SSE divided by the number of observations, so it expresses the average squared difference between the actual and estimated values of the response variable provided by the regression line (Figure 3).
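To make SSE and MSE concrete, this sketch fits the least squares line to hypothetical temperature and sales data, computes both measures, and then shows that nudging the slope away from its least squares value produces a larger SSE. The helper names and the data are illustrative assumptions.

```python
def fit_line(xs, ys):
    """Closed-form least squares (intercept, slope) for one explanatory variable."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    slope = (sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
             / sum((x - mean_x) ** 2 for x in xs))
    return mean_y - slope * mean_x, slope

def sse(actual, predicted):
    """Sum of squared errors between actual and estimated response values."""
    return sum((a - p) ** 2 for a, p in zip(actual, predicted))

def mse(actual, predicted):
    """Mean squared error: the SSE averaged over the number of observations."""
    return sse(actual, predicted) / len(actual)

# Hypothetical data: temperature (Celsius) and ice creams sold.
temperatures = [18, 21, 25, 28, 33]
ice_creams_sold = [120, 150, 185, 210, 260]

b0, b1 = fit_line(temperatures, ice_creams_sold)
best = [b0 + b1 * x for x in temperatures]
# A different straight line through the same data: same intercept, slope nudged by +1.
other = [b0 + (b1 + 1.0) * x for x in temperatures]

print("least squares line: SSE =", round(sse(ice_creams_sold, best), 2),
      "MSE =", round(mse(ice_creams_sold, best), 2))
print("nudged line:        SSE =", round(sse(ice_creams_sold, other), 2))
```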
The coefficient of determination, called R squared, is the proportion of the variation in the response variable that is explained by the explanatory variable. For a least squares fit with an intercept, its value ranges from 0 to 1. A value of 0 means the model explains none of the variation (it does no better than always predicting the mean of the response variable), while a value of 1 means the response variable is predicted without any error. A value in between gives the fraction of variation explained; for example, 0.8 means that 80% of the variation in the response variable is accounted for by the model.
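R squared is commonly computed as 1 minus the ratio of SSE to the total sum of squares (SST), which is the squared variation of the response around its mean. The sketch below, assuming hypothetical data and helper names, checks the two extremes described above (perfect predictions give 1, always predicting the mean gives 0) and then scores a fitted least squares line.

```python
def r_squared(actual, predicted):
    """Coefficient of determination: 1 - SSE / SST."""
    mean_actual = sum(actual) / len(actual)
    sse = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    sst = sum((a - mean_actual) ** 2 for a in actual)  # total variation around the mean
    return 1 - sse / sst

# Hypothetical data: temperature (Celsius) and ice creams sold.
temperatures = [18, 21, 25, 28, 33]
ice_creams_sold = [120, 150, 185, 210, 260]

# Closed-form least squares fit for one explanatory variable.
n = len(temperatures)
mx, my = sum(temperatures) / n, sum(ice_creams_sold) / n
b1 = (sum((x - mx) * (y - my) for x, y in zip(temperatures, ice_creams_sold))
      / sum((x - mx) ** 2 for x in temperatures))
b0 = my - b1 * mx
fitted = [b0 + b1 * x for x in temperatures]

# Always predicting the mean explains none of the variation.
mean_only = [my] * n

print("perfect:", r_squared(ice_creams_sold, ice_creams_sold))
print("mean only:", r_squared(ice_creams_sold, mean_only))
print("fitted line:", round(r_squared(ice_creams_sold, fitted), 4))
```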
When two or more explanatory variables are used simultaneously to predict the response variable, the technique is called multiple linear regression.
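For multiple linear regression, ordinary least squares can be solved through the normal equations (X^T X) b = X^T y, where X holds a leading column of ones for the intercept. The pure-Python sketch below assumes that approach and uses Gaussian elimination to stay dependency-free. The data are synthetic, generated from y = 2 + 3*x1 - x2 without noise, so the fit should recover those coefficients; the function name is hypothetical.

```python
def fit_multiple(rows, ys):
    """Ordinary least squares for y = b0 + b1*x1 + ... + bk*xk via the normal equations."""
    X = [[1.0] + [float(v) for v in row] for row in rows]  # leading 1 models the intercept
    n, k = len(X), len(X[0])
    # Normal equations A b = c with A = X^T X and c = X^T y.
    A = [[sum(X[i][p] * X[i][q] for i in range(n)) for q in range(k)] for p in range(k)]
    c = [sum(X[i][p] * ys[i] for i in range(n)) for p in range(k)]
    # Forward elimination with partial pivoting.
    for col in range(k):
        pivot = max(range(col, k), key=lambda r: abs(A[r][col]))
        A[col], A[pivot] = A[pivot], A[col]
        c[col], c[pivot] = c[pivot], c[col]
        for r in range(col + 1, k):
            factor = A[r][col] / A[col][col]
            for q in range(col, k):
                A[r][q] -= factor * A[col][q]
            c[r] -= factor * c[col]
    # Back substitution.
    b = [0.0] * k
    for r in range(k - 1, -1, -1):
        b[r] = (c[r] - sum(A[r][q] * b[q] for q in range(r + 1, k))) / A[r][r]
    return b

# Synthetic data generated from y = 2 + 3*x1 - x2 (no noise), so the fit is exact.
rows = [(1, 2), (2, 1), (3, 4), (4, 3), (5, 5)]
ys = [2 + 3 * x1 - x2 for x1, x2 in rows]

coefficients = fit_multiple(rows, ys)
print([round(v, 6) for v in coefficients])
```

For larger or ill-conditioned problems, a QR- or SVD-based solver such as `numpy.linalg.lstsq` is the usual choice instead of forming X^T X explicitly.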
The numerical prediction pattern can benefit from the application of the graphical summaries computation pattern, drawing a scatter plot to visually check whether a linear relationship exists between the response and explanatory variables (Figure 4).
There are cases where a business problem involves predicting a category -- such as whether a customer will default on their loan or whether an image shows a cat or a dog -- based on historical examples of defaulters and of cat and dog images, respectively. In this case, the categories (default/not default and cat/dog) are known in advance. However, because the target class is categorical, numerical prediction algorithms cannot be used to train a classification model (Figure 5).
Supervised machine learning techniques are applied by selecting a problem-specific machine learning algorithm and developing a classification model. This involves first using the known example data to train a model. The model is then fed new unseen data to find out the most appropriate category to which the new data instance belongs.
Different machine learning algorithms exist for developing classification models. For example, naive Bayes is probabilistic while K-nearest neighbors (KNN), support vector machine (SVM), logistic regression and decision trees are deterministic in nature. Generally, in the case of a binary problem -- cat or dog -- logistic regression is applied. If the feature space is n-dimensional (a large number of features) with complex interactions between the features, KNN is applied. Naive Bayes is applied when there is not enough training data or fast predictions are required, while decision trees are a good choice when the model needs to be explainable.
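Logistic regression is based on linear regression: it computes a linear combination of the features, as linear regression does, and passes the result through the logistic (sigmoid) function to produce a value between 0 and 1. It is therefore also considered a class probability estimation technique, since its objective is to estimate the probability of an instance belonging to a particular class.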
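The following sketch illustrates the idea for one explanatory variable: a linear score w*x + b is passed through the sigmoid to give a probability, and w and b are fitted by batch gradient descent on the log loss. Gradient descent is only one way to fit the model, and the learning rate, epoch count, and loan data below are hypothetical.

```python
import math

def sigmoid(z):
    """Logistic function, written to avoid overflow for large |z|."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def fit_logistic(xs, ys, learning_rate=0.5, epochs=5000):
    """Fit P(y=1 | x) = sigmoid(w*x + b) by batch gradient descent on the mean log loss."""
    w, b = 0.0, 0.0
    n = len(xs)
    for _ in range(epochs):
        # Linear score as in linear regression, squashed into a probability by the sigmoid.
        errors = [sigmoid(w * x + b) - y for x, y in zip(xs, ys)]
        w -= learning_rate * sum(e * x for e, x in zip(errors, xs)) / n
        b -= learning_rate * sum(errors) / n
    return w, b

# Hypothetical data: debt-to-income ratio and whether the borrower defaulted (1) or not (0).
ratios = [0.10, 0.20, 0.30, 0.45, 0.50, 0.70, 0.80, 0.90]
defaulted = [0, 0, 0, 1, 0, 1, 1, 1]

w, b = fit_logistic(ratios, defaulted)
for r in (0.15, 0.85):
    print(f"ratio={r:.2f}  P(default)={sigmoid(w * r + b):.2f}")
```

Classifying an instance then amounts to comparing its estimated probability with a threshold, commonly 0.5.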
KNN, an example of lazy learning and instance-based learning, is a black-box classification technique that classifies instances based on their similarity to a user-defined number (K) of stored examples, its nearest neighbors. No model is explicitly generated. Instead, the examples are stored as-is, and an instance is classified by first finding the K closest examples in terms of distance and then assigning the class held by the majority of those examples (Figure 6).
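A minimal KNN classifier needs no training step: it stores the examples and, at prediction time, ranks them by distance to the new instance and takes a majority vote among the K closest. Euclidean distance, K = 3, and the cat/dog measurements (weight in kg, ear length in cm) are assumptions for illustration, as is the function name.

```python
import math
from collections import Counter

def knn_predict(train_X, train_y, query, k=3):
    """Classify `query` by majority vote among its k nearest stored examples."""
    # "Lazy" learning: all work happens here, by measuring distance to every stored example.
    distances = sorted(
        (math.dist(x, query), label) for x, label in zip(train_X, train_y)
    )
    top_labels = [label for _, label in distances[:k]]
    return Counter(top_labels).most_common(1)[0][0]

# Hypothetical examples: (weight in kg, ear length in cm).
train_X = [(4.0, 7.5), (3.5, 6.8), (4.2, 7.0), (20.0, 12.0), (25.0, 11.0), (18.0, 13.0)]
train_y = ["cat", "cat", "cat", "dog", "dog", "dog"]

print(knn_predict(train_X, train_y, (5.0, 7.0)))
print(knn_predict(train_X, train_y, (22.0, 12.0)))
```

Here the weight feature spans a much larger range than ear length and therefore dominates the distance, which is why feature standardization is usually applied before KNN.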
Naive Bayes is a probability-based classification technique that predicts class membership based on the probabilities of feature values observed in the training data. This technique is used when a combination of a number of features, called evidence, affects the determination of the target class. Due to this characteristic, naive Bayes can take into account features that may be insignificant when considered on their own but that, when considered cumulatively, can significantly impact the probability of an instance belonging to a certain class.
All features are assumed to carry equal significance, and the value of one feature is assumed not to depend on the value of any other feature. In other words, the features are assumed to be independent of one another given the class. Naive Bayes serves as a baseline classifier for comparing more complex algorithms and can also be used for incremental learning, where the model is updated based on new example data without the need to regenerate the whole model from scratch.
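The sketch below is a categorical naive Bayes classifier for nominal features. It combines the class prior with the per-feature likelihoods by adding their logarithms (to avoid numerical underflow), which is exactly where the independence assumption is used. Laplace smoothing with alpha = 1 is an added assumption that keeps an unseen feature value from zeroing out a class; the loan data and function names are hypothetical.

```python
import math
from collections import Counter, defaultdict

def train_nb(rows, labels):
    """Count class frequencies and (feature index, value, class) co-occurrences."""
    class_counts = Counter(labels)
    feature_counts = Counter()
    feature_values = defaultdict(set)  # distinct values seen per feature index
    for row, label in zip(rows, labels):
        for i, value in enumerate(row):
            feature_counts[(i, value, label)] += 1
            feature_values[i].add(value)
    return class_counts, feature_counts, feature_values

def predict_nb(model, row, alpha=1.0):
    """Return the class with the highest log prior plus summed log likelihoods."""
    class_counts, feature_counts, feature_values = model
    total = sum(class_counts.values())
    best_label, best_score = None, -math.inf
    for label, count in class_counts.items():
        score = math.log(count / total)  # log prior P(class)
        for i, value in enumerate(row):
            # Laplace-smoothed P(value | class); independence lets the logs simply add up.
            numerator = feature_counts[(i, value, label)] + alpha
            denominator = count + alpha * len(feature_values[i])
            score += math.log(numerator / denominator)
        if score > best_score:
            best_label, best_score = label, score
    return best_label

# Hypothetical nominal features: (employment status, debt level) -> defaulted?
rows = [
    ("employed", "low"), ("employed", "low"), ("employed", "high"),
    ("unemployed", "high"), ("unemployed", "high"), ("unemployed", "low"),
]
labels = ["no", "no", "no", "yes", "yes", "yes"]

model = train_nb(rows, labels)
print(predict_nb(model, ("unemployed", "high")))
print(predict_nb(model, ("employed", "low")))
```

Because the model is nothing more than counts, new examples can be folded in by incrementing those counts, which is what makes naive Bayes suitable for incremental learning.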
A decision tree is a classification algorithm that represents a concept in the form of a hierarchical set of logical decisions with a tree-like structure that is used to determine the target value of an instance. [See discussion of decision trees in part 2 of this series.] Logical decisions are made by performing tests on the feature values of the instances in such a way that each test further filters the instance until its target value or class membership is known. A decision tree resembles a flowchart consisting of decision nodes, which perform a test on the feature value of an instance, and leaf nodes, also known as terminal nodes, where the target value of the instance is determined as a result of traversal through the decision nodes.
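The flowchart-like traversal can be sketched with a hand-built tree stored as nested dictionaries: each decision node tests one feature against a threshold, and each leaf node holds a class. The features, thresholds, and class names are hypothetical, and learning the splits from data (for example by information gain or Gini impurity) is not shown.

```python
# Hypothetical hand-built tree. A decision node tests a feature against a threshold;
# a leaf (terminal) node holds the class that the traversal ends in.
TREE = {
    "feature": "debt_to_income", "threshold": 0.4,
    "left": {"leaf": "no default"},
    "right": {
        "feature": "missed_payments", "threshold": 2,
        "left": {"leaf": "no default"},
        "right": {"leaf": "default"},
    },
}

def classify(node, instance):
    """Follow decision nodes from the root until a leaf is reached."""
    while "leaf" not in node:
        # Each test sends the instance down one branch, filtering it further.
        branch = "left" if instance[node["feature"]] <= node["threshold"] else "right"
        node = node[branch]
    return node["leaf"]

print(classify(TREE, {"debt_to_income": 0.6, "missed_payments": 3}))
print(classify(TREE, {"debt_to_income": 0.3, "missed_payments": 5}))
```

The path taken through the tree doubles as an explanation of the prediction, which is why decision trees suit cases where the model must be explainable.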
The category prediction pattern normally requires the application of a few other patterns. In the case of logistic regression and KNN, applying the feature encoding pattern ensures that all features are numerical, because these two algorithms work only with numerical features. For KNN, applying the feature standardization pattern ensures that large-magnitude features do not overshadow smaller-magnitude features when distances are measured. Naive Bayes requires the application of the feature discretization pattern, as naive Bayes works only with nominal features. KNN can also benefit from the application of the feature discretization pattern via a reduction in feature dimensionality, which contributes to faster execution and increased generalizability of the model.
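The three preparation patterns can be illustrated with simple, common stand-ins: z-score standardization for feature standardization, one-hot encoding for feature encoding, and equal-width binning for feature discretization. These are assumptions about how the patterns might be realized, not definitions from the article, and the data and function names are hypothetical.

```python
import statistics

def standardize(column):
    """Feature standardization: rescale to mean 0 and (population) standard deviation 1."""
    mean = statistics.fmean(column)
    std = statistics.pstdev(column)
    return [(v - mean) / std for v in column]

def one_hot(column):
    """Feature encoding: turn a nominal column into 0/1 indicator vectors."""
    categories = sorted(set(column))
    return [[1 if v == c else 0 for c in categories] for v in column]

def discretize(column, bins=3):
    """Feature discretization: equal-width binning into integer bin labels 0..bins-1."""
    lo, hi = min(column), max(column)
    width = (hi - lo) / bins
    # The maximum value would land in bin `bins`, so clamp it into the last bin.
    return [min(int((v - lo) / width), bins - 1) for v in column]

# Hypothetical loan-applicant columns.
incomes = [30000, 45000, 60000, 90000]
housing = ["rent", "own", "rent", "mortgage"]
ages = [22, 29, 35, 41, 58, 63]

print(standardize(incomes))
print(one_hot(housing))
print(discretize(ages, bins=3))
```

Both `standardize` and `discretize` assume the column is not constant; a constant column would cause a division by zero.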
The next article covers the category discovery and pattern discovery unsupervised learning patterns.
Read the original post:
2 supervised learning techniques that aid value predictions - TechTarget