AI: Making an ML Model

DAIM Team

Module 4: Part 1 Learning Objectives

  • Appreciate machine learning vs. AI
  • Understand the high-level steps needed to train a statistical model
  • Understand the rationale for “verifying your model findings” and how to use a test set
  • Understand dataset bias and clinical factors

What is machine learning?

  • TODO Understand what machine learning is at a broad level (LL2)

Big Data, Machine Learning, and Artificial Intelligence

  • Big data → ML/AI
  • The six Vs of big data: volume, variety, velocity, veracity, value, and variability
  • I would like to add vulnerability

Machine Learning vs. Artificial Intelligence

  • There are no set, global definitions for either term!
  • Machine learning often refers to a specific subset of statistical algorithms
    • Artificial neural networks
    • Decision trees
    • Support vector machines
    • And more - it is a loose category

Machine Learning vs. Artificial Intelligence

  • Different types of machine learning:
    • Supervised learning
    • Unsupervised learning
    • Reinforcement learning
  • Deep learning refers to neural networks with multiple layers
    • We will be building one in the workshop

Supervised / Unsupervised Learning

  • Supervised learning uses labelled data.
    • For instance, a radiologist has labelled CT head studies with different types of intracranial haemorrhage, and the ML system learns which scans contain which type of haemorrhage.

TODO: find image

Supervised / Unsupervised Learning

  • Unsupervised learning uses unlabelled data
    • For instance, an ML system is given a series of normal chest radiographs and learns how best to compress them and then reconstruct them.
    • This is known as an autoencoder.

TODO: find image

Appreciate

Right??

How to train an ML model

What steps are needed to train a machine learning model?

  • Collecting or identifying a dataset
  • Partitioning a dataset into groups
  • Preprocessing your dataset
  • Designing the architecture of the model
  • Training the model
  • Evaluating the model
  • Deploying the model

Machine Learning Process

The MLOps Lifecycle

  • MLOps = Machine Learning + Operations
  • Key stages include:
    • ETL pipeline: Extract, Transform, and Load data
    • Train the model: Run computations to optimize model performance
    • Deploy the model: Serve it to users in real-time, with continuous deployment

How do you split up a dataset?

  • A dataset is typically split into three parts
    1. Training data (training the model)
    2. Testing data (testing the model after training)
    3. Validation data (testing the model during training)
  • The reasons for this will be discussed over the course of these two seminars.
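A minimal sketch of a three-way split, assuming the dataset can be addressed by index. The function name `split_indices` and the 70/15/15 proportions are illustrative assumptions, not values from the course.

```python
import numpy as np

def split_indices(n_samples, val_fraction=0.15, test_fraction=0.15, seed=0):
    """Shuffle sample indices and split them into train/validation/test sets."""
    rng = np.random.default_rng(seed)
    indices = rng.permutation(n_samples)      # random order, each index once
    n_test = int(round(n_samples * test_fraction))
    n_val = int(round(n_samples * val_fraction))
    test_idx = indices[:n_test]
    val_idx = indices[n_test:n_test + n_val]
    train_idx = indices[n_test + n_val:]      # everything left over
    return train_idx, val_idx, test_idx

train_idx, val_idx, test_idx = split_indices(1000)
print(len(train_idx), len(val_idx), len(test_idx))
```

Shuffling before splitting keeps each part representative of the whole, and no index appears in more than one part.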

Bias in a clinical dataset

  • TODO - Understand dataset bias and clinical factors that can confound a dataset (LL4)
  • TODO - Use examples to demonstrate clinical bias (LL4)
  • TODO - Be able to list clinical uses for ML (LL4)

How to preprocess data for an ML model?

Data preprocessing

  • Data preprocessing refers to preparing the data in your dataset so that it suits the model you have chosen to train.
  • Different models require their input data in different formats
  • Data is usually passed to the model as a multidimensional array, such as a NumPy array
    • You may see these referred to as tensors

What kind of NumPy Array?

  • This will depend on the type of data being used.
  • For images, it is likely to be an array with a certain:
    • Dimensions (e.g. 256 x 256)
    • Number of channels (e.g. 1 or 3)
  • We will explore this further in the workshop.
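As a rough illustration, image arrays of the sizes above could look like this in NumPy. The pixel values are random placeholders, and the channels-last layout (height, width, channels) is an assumption; some frameworks put the channel dimension first.

```python
import numpy as np

rng = np.random.default_rng(0)

# Placeholder 256 x 256 greyscale image: 1 channel, integer pixel values 0-255.
grey = rng.integers(0, 256, size=(256, 256, 1), dtype=np.uint8)

# Placeholder colour image of the same size: 3 channels.
colour = rng.integers(0, 256, size=(256, 256, 3), dtype=np.uint8)

# Models often expect small floating-point values, so rescale to [0, 1].
grey_scaled = grey.astype(np.float32) / 255.0

# Images are usually passed in groups, which adds a leading dimension.
pair = np.stack([grey_scaled, grey_scaled])

print(grey.shape, colour.shape, pair.shape)
```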

What is data augmentation?

  • Datasets are often small, but machine learning models need a large amount of data to become robust to natural variations.
  • These variations include changes in brightness, contrast, rotation, etc.
  • They can be simulated in a dataset by randomly varying these parameters for each image.
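A minimal sketch of this idea, assuming images are NumPy arrays with values in [0, 1]. The function `augment` and its parameter ranges are hypothetical, and rotation is simplified to quarter turns to avoid extra dependencies; real pipelines typically use small-angle rotations.

```python
import numpy as np

def augment(image, rng):
    """Return a randomly altered copy of an image whose values lie in [0, 1]."""
    mean = image.mean()
    contrast = rng.uniform(0.8, 1.2)          # stretch or squeeze around the mean
    brightness = rng.uniform(-0.1, 0.1)       # shift all pixel values
    quarter_turns = int(rng.integers(0, 4))   # 0, 1, 2 or 3 turns of 90 degrees
    out = (image - mean) * contrast + mean
    out = out + brightness
    out = np.rot90(out, k=quarter_turns)
    return np.clip(out, 0.0, 1.0)             # keep values in the valid range

rng = np.random.default_rng(0)
image = rng.random((64, 64))                  # placeholder square image
augmented = [augment(image, rng) for _ in range(4)]
print([a.shape for a in augmented])
```

Each call produces a different variant of the same image, so one original can contribute several training examples.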

Example of data augmentation

Examples of different data augmentation techniques.

Going further than this…

  • In imaging, further data augmentation techniques can be applied to simulate anatomical variation
  • For instance, elastic deformation
    • This applies subtle image distortion to simulate data acquired from a different patient.

TODO find image

Data augmentation

  • There are many more techniques that can be applied to extend a small dataset
  • There are more techniques for other types of data (e.g. time series data).

Break!

The workshop task

  • In the workshop, we will be building a machine learning system to detect pneumonia on chest radiographs (XR).
  • This will demonstrate the process of training a model on a dataset.
  • The technique is a cornerstone of building neural networks that interpret images
    • It is not cutting edge, but will demonstrate first principles.

Machine Learning and Philosophy of Learning

  • Learning? Why do we learn?
  • Difference between learning and memorizing
  • What does it mean mathematically?
  • What does training mean for you and for a machine learning model?

Learning vs. Memorizing

Using an example of adding and the symbol +:

  • You learn to add and use the symbol +

  • This allows you to generalize rather than memorizing specific sums, like 1002 + 2003 = 3005

  • You train by practicing simple exercises like 1 + 1 = 2

  • Your teacher corrects you if you get 1 + 1 = 3

  • You are then examined on new material to ensure you can generalize and apply your learnings

  • Similarly, you use a training dataset to train the model so that it can learn and then apply what it has learned to new data.

What does learning mean for a computer?

\[ Y = f(x_1, x_2, x_3, ...) \]

  • Find the set of variables (x1, etc.) that minimises the outcome measure (Y) of a function (f(x))
    • Variables - the weights, biases, and other parameters of your network
    • f(x) - the loss function, which tells the model how incorrect it is by calculating the loss (Y)
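To make this concrete, here is a toy sketch with just two variables, a weight `w` and a bias `b`. The one-line model and the use of mean squared error as the loss are assumptions chosen for illustration; real networks have many more parameters and often use other loss functions.

```python
def predict(w, b, inputs):
    """A toy model: one weight and one bias."""
    return [w * x + b for x in inputs]

def loss(w, b, inputs, targets):
    """Mean squared error: the average squared gap between prediction and target."""
    predictions = predict(w, b, inputs)
    squared_errors = [(p - t) ** 2 for p, t in zip(predictions, targets)]
    return sum(squared_errors) / len(targets)

inputs = [0.0, 1.0, 2.0, 3.0]
targets = [1.0, 3.0, 5.0, 7.0]            # generated by 2 * x + 1

print(loss(0.0, 0.0, inputs, targets))    # poor parameters -> 21.0
print(loss(2.0, 1.0, inputs, targets))    # ideal parameters -> 0.0
```

Learning means searching for the values of `w` and `b` that make this number as small as possible.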

What does learning mean for a computer?

  • How do we know how to change the variables?
    • The optimiser acts like a teacher, with the goal of minimising the loss function

How does the optimiser make the network “learn”?

  • A common analogy is one of a ball rolling down a hill.
  • The height of the hill (z) represents the loss (error), and the two other directions (x1, x2) are model parameters

How does the optimiser make the network “learn”?

  • The “ball” (the model) rolls down the hill to find the values of the two parameters that minimise the loss.
  • This is called gradient descent.
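A minimal sketch of gradient descent with two parameters. The bowl-shaped loss below, with its lowest point at (3, -1), and the learning rate of 0.1 are made-up choices for illustration only.

```python
def loss(x1, x2):
    """A bowl-shaped 'hill' whose lowest point is at x1 = 3, x2 = -1."""
    return (x1 - 3.0) ** 2 + (x2 + 1.0) ** 2

def gradient(x1, x2):
    """The slope of the hill in each direction."""
    return 2.0 * (x1 - 3.0), 2.0 * (x2 + 1.0)

x1, x2 = 0.0, 0.0          # where the ball starts
learning_rate = 0.1        # how far the ball moves on each step

for step in range(100):
    g1, g2 = gradient(x1, x2)
    x1 -= learning_rate * g1   # step downhill, against the slope
    x2 -= learning_rate * g2

print(round(x1, 3), round(x2, 3), round(loss(x1, x2), 6))
```

Each step moves the parameters a small distance in the direction that reduces the loss most steeply, so the ball settles at the bottom of the bowl.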

How does the optimiser make the network “learn”?

  • The animation demonstrates the process for 2 parameters (x1, x2) in 2D.
  • Machine learning models often have hundreds of thousands to millions of parameters.
  • It’s difficult to visualise a ball rolling around in a million dimensional space!

Break!

How do we create an ML model?

Creating an ML model

  • Many ML algorithms are designed and trained with frameworks within Python
  • Frameworks are Python packages that allow for simplified use of high-level network building blocks.
    • This saves the user from dealing with low-level details.
    • This is called abstraction.
  • Popular frameworks include TensorFlow and PyTorch.

What are these “building blocks”?

  • Various layers are used to build a neural network.
  • A layer is a set of neurons with inputs/outputs and parameters which can be learned.
  • For simple networks, layers are connected to each other sequentially.
  • The presence of multiple layers is where the term deep learning originates.

What types of layer are there?

  • Dense layer
    • This is the most common layer.
    • Consists of neurons that are each connected to every neuron in the preceding and subsequent layers (hence Dense)
    • Therefore they take up lots of memory and computing power!
      • Think about how the number of connections between neurons grows as the number of neurons increases.
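A rough NumPy sketch of what a Dense layer computes. The layer sizes are arbitrary and the ReLU activation is an assumption; the point is that every input is connected to every neuron, so the weight matrix has one entry per connection.

```python
import numpy as np

def dense_forward(x, weights, biases):
    """Weighted sum of all inputs for each neuron, followed by ReLU (assumed)."""
    return np.maximum(0.0, x @ weights + biases)

rng = np.random.default_rng(0)
n_inputs, n_neurons = 4, 3

weights = rng.normal(size=(n_inputs, n_neurons))   # one weight per connection
biases = np.zeros(n_neurons)                       # one bias per neuron

x = rng.normal(size=(2, n_inputs))                 # two example inputs
output = dense_forward(x, weights, biases)

n_parameters = weights.size + biases.size          # 4 * 3 + 3
print(output.shape, n_parameters)
```

Doubling both the number of inputs and the number of neurons roughly quadruples the number of weights.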

Dense layers

TODO image of a Dense layer

Convolutional layers

  • Convolutional layers involve convolving a kernel with the input image
    • Look back to module 1 for a refresher.
  • However, the parameters of the kernel are learned
  • Multiple different kernels are used in the layer to evaluate different aspects
  • As convolution uses the same kernel across the whole image, convolutional layers have far fewer parameters than Dense layers.
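A minimal sketch of sliding a kernel over an image, written with plain loops for clarity. The fixed averaging kernel is a placeholder; in a convolutional layer these nine values would be learned. Like most deep learning frameworks, this sketch does not flip the kernel, which strictly makes it a cross-correlation.

```python
import numpy as np

def convolve2d(image, kernel):
    """Slide the kernel over the image, with no padding and a stride of 1."""
    kh, kw = kernel.shape
    out_h = image.shape[0] - kh + 1
    out_w = image.shape[1] - kw + 1
    out = np.zeros((out_h, out_w))
    for i in range(out_h):
        for j in range(out_w):
            patch = image[i:i + kh, j:j + kw]
            out[i, j] = np.sum(patch * kernel)   # same kernel at every position
    return out

image = np.arange(25, dtype=float).reshape(5, 5)   # placeholder 5 x 5 image
kernel = np.ones((3, 3)) / 9.0                     # placeholder averaging kernel

result = convolve2d(image, kernel)
print(result.shape, kernel.size)
```

The kernel has 9 values regardless of the image size, which is why the parameter count stays small.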

Convolutional layers

TODO image of a convolutional layer

Pooling layers

  • Pooling layers reduce the size of the image as it passes through the network
  • This is often done by a factor of 2 in height and width, e.g. (256, 256, 3) to (128, 128, 3)
  • This allows the model to “shed” less important, fine-detail information
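One common form is max pooling, which keeps only the largest value in each 2 x 2 block. The sketch below assumes a channels-last image with even height and width; the function name `max_pool_2x2` is illustrative.

```python
import numpy as np

def max_pool_2x2(image):
    """Keep the largest value in each non-overlapping 2 x 2 block of every channel."""
    height, width, channels = image.shape
    # Split rows and columns into (block index, position within block).
    blocks = image.reshape(height // 2, 2, width // 2, 2, channels)
    return blocks.max(axis=(1, 3))

image = np.random.default_rng(0).random((256, 256, 3))   # placeholder image
pooled = max_pool_2x2(image)
print(image.shape, pooled.shape)
```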

Why do we need to reduce the image size throughout the network?

  • When images are connected to Dense layers, they are flattened.
    • e.g. (256, 256) to (65536,)
  • Even small images will result in huge networks, as all neurons between Dense layers are connected
    • (256, 256) to a Dense layer with 256 neurons: 16.8 million parameters!
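The arithmetic behind that figure can be checked directly. The comparison with a 32 x 32 image, which is what remains after three halvings by pooling, is an added illustration.

```python
def dense_parameter_count(height, width, n_neurons):
    """Weights (one per input-neuron connection) plus one bias per neuron."""
    n_inputs = height * width          # pixels after flattening
    return n_inputs * n_neurons + n_neurons

full_size = dense_parameter_count(256, 256, 256)
after_pooling = dense_parameter_count(32, 32, 256)

print(full_size)       # 16777472, about 16.8 million
print(after_pooling)   # 262400
```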

What are the inputs to the network?

  • The input to neural networks which process images will be an image.
    • The image will have a fixed resolution and number of channels
    • e.g. (128, 128, 3)

What are the outputs?

  • The output will vary depending on the type of network:
    • Probabilities that certain features are present (e.g. pneumonia)
    • A processed image, showing the locations of objects/features
      • This is called semantic segmentation
    • It can even be the same image!
      • This is an autoencoder which is used for compression.
  • The desired output determines which architecture is appropriate.
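For the first case, one common approach (an assumption here, not something the workshop necessarily uses) is to pass the raw output of the final neuron through a sigmoid function, which squashes any number into the range 0 to 1 so that it can be read as a probability.

```python
import math

def sigmoid(raw_output):
    """Map any real number to a value between 0 and 1."""
    return 1.0 / (1.0 + math.exp(-raw_output))

# Hypothetical raw outputs from the final neuron of a network.
for raw in (-2.0, 0.0, 2.0):
    print(raw, round(sigmoid(raw), 3))
```

A raw output of 0 maps to 0.5; negative values map below 0.5 and positive values above it.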

How to train an ML model?

Batches of data

  • Data is fed to the network in batches
    • A batch is a collection of datapoints e.g. 16 images
  • The network makes a prediction for every item in the batch, and the errors are combined to establish the overall direction in which to change the network parameters
  • This gradually minimises the loss and therefore the error of the network

Epochs of training

  • Once every batch in the training dataset has been fed to the network, an epoch of training has been completed.
  • The network is then usually evaluated on the validation dataset.
  • Repeat… until a set number of epochs has been completed
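The batch and epoch structure can be sketched as two nested loops. Everything here is a toy assumption: the model is a single weight and bias, the data are synthetic points on the line y = 2x + 1, and the batch size, learning rate, and epoch count are arbitrary.

```python
import numpy as np

rng = np.random.default_rng(0)
x_train = rng.uniform(-1, 1, size=160)
y_train = 2.0 * x_train + 1.0              # synthetic training data
x_val = rng.uniform(-1, 1, size=40)
y_val = 2.0 * x_val + 1.0                  # synthetic validation data

w, b = 0.0, 0.0                            # parameters to be learned
learning_rate, batch_size, n_epochs = 0.1, 16, 50
val_losses = []

for epoch in range(n_epochs):
    order = rng.permutation(len(x_train))  # reshuffle the data every epoch
    for start in range(0, len(order), batch_size):
        idx = order[start:start + batch_size]        # one batch of 16 points
        error = (w * x_train[idx] + b) - y_train[idx]
        grad_w = 2.0 * np.mean(error * x_train[idx]) # direction averaged over the batch
        grad_b = 2.0 * np.mean(error)
        w -= learning_rate * grad_w
        b -= learning_rate * grad_b
    # End of epoch: evaluate on data not used for training.
    val_losses.append(np.mean((w * x_val + b - y_val) ** 2))

print(round(w, 3), round(b, 3), val_losses[0], val_losses[-1])
```

Plotting `val_losses` against the epoch index gives the kind of learning curve used to monitor training.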

The process of training

TODO image of loss against epoch index showing the learning process.

What is a hyperparameter?

  • Hyperparameters determine how your network learns and the network structure.
  • Any parameter that cannot be modified by the network optimiser during training can be seen as a hyperparameter
  • Tuning hyperparameters is important for extracting the best performance from your network.

What is a hyperparameter?

  • Examples include:
    • The number of neurons in each layer
    • The number of kernels in a convolutional layer
    • The learning rate of the optimiser
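As an illustration of why tuning matters, the sketch below runs gradient descent on a made-up one-parameter loss (x squared, lowest at x = 0) with two different learning rates. The function `descend` and both rates are hypothetical choices.

```python
def descend(learning_rate, n_steps=20, start=1.0):
    """Run gradient descent on loss = x ** 2, whose gradient is 2 * x."""
    x = start
    for _ in range(n_steps):
        x -= learning_rate * 2.0 * x
    return x

small = descend(0.1)   # modest steps: x shrinks towards 0
large = descend(1.1)   # oversized steps: x overshoots further each time

print(small, large)
```

With the smaller rate the parameter approaches the minimum; with the larger rate each step overshoots and the parameter moves further away.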

Conclusion

  • We have covered a lot of information in this lecture.
  • We will recap nearly all of it during the workshop.

Further resources

Thank you!

Any questions?