Syllabus Point
- Describe types of algorithms associated with ML
Including:
- linear regression
- logistic regression
- K-nearest neighbour
Key Terms
- Bias: An intercept or offset term; it accounts for the fact that not all models pass through the origin (0,0)
- Feature: An input variable to a ML model - an example consists of one or more features
- Label: The answer or result portion of an example
- Example: The values of one row of features and possibly a label
- Parameter: The weights and biases that a model learns during training
- Weight: A value that a model multiplies by a feature value; it reflects that feature's influence on the prediction
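The key terms above fit together in a single prediction. A minimal sketch (the feature values, weights, and bias below are invented for illustration):

```python
# A model's prediction for one example: weighted sum of features plus bias.
features = [3.0, 2.0]   # inputs (e.g. two measured attributes of one example)
weights = [0.5, -1.0]   # learned parameters: one weight per feature
bias = 4.0              # learned intercept/offset

# prediction = w1*x1 + w2*x2 + b
prediction = sum(w * x for w, x in zip(weights, features)) + bias
print(prediction)  # 0.5*3.0 + (-1.0)*2.0 + 4.0 = 3.5
```

During training, the weights and bias (the parameters) are adjusted so that predictions move closer to the labels.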
Bias-Variance Trade-off
The bias-variance trade-off describes the relationship between a model's complexity and its prediction accuracy on new data.
Overview
- Bias: measures how far off a model's predictions are from the true values
- Variance: measures how much of those predictions fluctuate with different training data
Balance for Good Performance
- Achieving good prediction performance requires balancing bias against variance
- High bias (overly simple models) leads to underfitting and poor accuracy
- High variance (overly complex models) leads to overfitting and poor performance on new data
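The trade-off can be seen by fitting a too-simple and a too-complex model to the same data. A sketch using NumPy (the dataset, noise level, and polynomial degrees are arbitrary choices for illustration):

```python
import numpy as np

# Noisy samples of a nonlinear curve (illustrative data only).
rng = np.random.default_rng(0)
x = np.linspace(0, 3, 20)
y = np.sin(2 * x) + rng.normal(0, 0.2, size=x.size)

# High bias: a straight line cannot follow the curve (underfitting).
simple = np.polyval(np.polyfit(x, y, deg=1), x)
# High variance: a degree-8 polynomial chases the noise (overfitting).
flexible = np.polyval(np.polyfit(x, y, deg=8), x)

mse_simple = np.mean((y - simple) ** 2)
mse_flexible = np.mean((y - flexible) ** 2)
# The flexible model always fits the *training* data at least as well,
# but it is the one that generalises worse to unseen data.
print(mse_simple, mse_flexible)
```

Low training error alone is therefore not evidence of a good model; performance must be checked on data the model was not trained on.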
Linear Regression
A supervised learning algorithm used for predicting continuous values by learning from labelled datasets.
Overview
- Assumes there is a linear relationship (output changes at a constant rate as the input changes)
- Used for predicting values that follow a linear relationship (e.g. house prices, exam marks)
- Models the relationship between a dependent variable (also known as the target variable) and one or more independent variables (also known as features or predictors)
Residuals
- Residual = actual Y minus predicted Y (the prediction error for each data point)
- Small residuals mean the model is accurate
Advantages
- Simple to implement and interpret output coefficients
- Good for linear relationships
- Each variable's effect on the outcome can be seen
Disadvantages
- Outliers have big impact
- Assumes linear relationship and independence between attributes
- Struggles with complex, nonlinear data
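For one feature, the best-fit line has a closed-form least-squares solution. A minimal sketch in plain Python (the sample data is invented, roughly following y = 2x + 1 with noise):

```python
# Simple linear regression via the closed-form least-squares solution:
# slope = cov(x, y) / var(x), intercept = mean(y) - slope * mean(x)
xs = [1.0, 2.0, 3.0, 4.0]
ys = [3.1, 4.9, 7.2, 8.8]

mx = sum(xs) / len(xs)
my = sum(ys) / len(ys)
slope = (sum((x - mx) * (y - my) for x, y in zip(xs, ys))
         / sum((x - mx) ** 2 for x in xs))
intercept = my - slope * mx

predictions = [slope * x + intercept for x in xs]
residuals = [y - p for y, p in zip(ys, predictions)]  # actual minus predicted
print(round(slope, 2), round(intercept, 2))  # 1.94 1.15
```

Small residuals indicate a good fit; with least squares the residuals always sum to (approximately) zero, so their individual sizes, not their sum, measure accuracy.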
Logistic Regression
Logistic regression is a supervised ML algorithm that classifies data into discrete classes by modelling the relationship between one or more independent variables and the probability of an outcome.
Overview
- Used in predictive modelling, where the model estimates the mathematical probability that an instance belongs to a specific category
- Produces an s-shaped curve that maps values between 0 and 1 (sigmoid curve)
Use Cases
- Probability of heart attacks
- Probability of enrolling in university
- Identifying spam emails
Advantages
- Easier to implement than other methods of ML - training doesn't require high computational power
- Works well when the dataset is linearly separable - when straight line separates the two data classes
- Provides valuable insights - measures how relevant or appropriate an independent/predictor variable is + reveals direction of their relationship (positive or negative)
Sigmoid Curve/Function
- Maps any real-valued input to an output between 0 and 1
- Can use to predict probability
- Boundary/threshold is usually 0.5
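The sigmoid function and the 0.5 threshold can be sketched directly (the input values below are arbitrary examples):

```python
import math

def sigmoid(z):
    """Map any real number to a value in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))

def classify(z, threshold=0.5):
    """Turn the predicted probability into a discrete class label."""
    return 1 if sigmoid(z) >= threshold else 0

print(sigmoid(0))      # 0.5: inputs of 0 sit exactly on the usual boundary
print(classify(2.0))   # 1  (sigmoid(2.0) is about 0.88, above the threshold)
print(classify(-2.0))  # 0  (sigmoid(-2.0) is about 0.12, below the threshold)
```

Large positive inputs push the output towards 1 and large negative inputs towards 0, which is what produces the s-shaped curve.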
K-Nearest Neighbour
KNN (k-nearest neighbour) is a supervised learning algorithm used for classification and regression.
Overview
- Functions under the idea that "similar things exist in close proximity"
- It follows instance-based learning (no model is trained beforehand and all training data is stored)
- Non-parametric (makes no assumptions about underlying data distribution)
How It Works
- Store all available examples: keep every instance from the training dataset
- Pick K: the number of neighbouring points the algorithm considers when making a decision; K should be odd for classification to avoid tied votes
- Calculate distances: measure the similarity between the target point and every training point using a distance metric (e.g. Euclidean distance)
- Find the neighbours: select the K data points closest to the target
- Predict: for classification, the new point is assigned the most common class among its K neighbours; for regression, the value is the average of the K neighbours' values
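The steps above can be sketched as a short classifier in plain Python (the two-feature dataset and its "A"/"B" labels are hypothetical):

```python
import math
from collections import Counter

def knn_classify(train, query, k=3):
    """Classify `query` by majority vote among its k nearest training points.
    `train` is a list of (features, label) pairs; distance is Euclidean."""
    # Step 1-3: compute the distance from the query to every stored example.
    distances = sorted((math.dist(x, query), label) for x, label in train)
    # Step 4: take the k closest neighbours.
    top_k = [label for _, label in distances[:k]]
    # Step 5: majority vote decides the class.
    return Counter(top_k).most_common(1)[0][0]

# Hypothetical dataset: two clusters with two class labels.
train = [((1, 1), "A"), ((1, 2), "A"), ((2, 1), "A"),
         ((6, 6), "B"), ((7, 6), "B"), ((6, 7), "B")]
print(knn_classify(train, (2, 2)))  # "A": its 3 nearest neighbours are all class A
```

Note that all the work happens at prediction time: nothing is learned in advance, which is why KNN is described as instance-based and why prediction slows as the stored dataset grows.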
Use Cases
- Healthcare (predict a patient's diagnosis based on past cases)
- Finance (categorise a transaction as fraudulent or not)
- Recommender systems (based on users with similar tastes)
Advantages
- Simple to understand and implement
- No training phase
- Non parametric (makes no assumptions about data distribution)
- Can use with 2+ class labels
Disadvantages
- Computationally expensive
- Prediction slows as the dataset grows, since distances to every stored point must be computed
- Affected by choice of K
- Sensitive to irrelevant features
- No interpretable internal model
Summary
KNN is a supervised machine learning algorithm used for both classification and regression tasks. It is considered an instance-based learning algorithm, because it doesn't learn a model by training, and instead stores the training data to make predictions based on the similarity between data points. When making a prediction, KNN calculates the distance between a new input and all examples in the training dataset using a distance metric, then identifies the 'K' closest neighbours to make a decision. For classification, it assigns the most frequent class among the K neighbours, and for regression it averages the values of those neighbours. KNN demonstrates how automated decision-making can occur by analysing patterns in historical data to make predictions about new inputs.