DAIM Seminar 1 Introduction

Let’s Do Digital Team

Introduction

  • Welcome to the first seminar of the Data Analytics in Medicine (DAIM) - Images course!
  • Today, we will be discussing:
    • The clinical benefits of learning to work with images
    • Programming resources that you can use throughout the course

Why programming in medicine?

  • The global market for AI in medicine has grown from
    • $0.2 billion in 2015 to $13.7 billion in 2023
  • “The usefulness of human–AI collaboration will likely depend on the specifics of the task and the clinical context.” - “AI in health and medicine”, in Nature
  • Clinicians will be, and should be, leading the deployment of these technologies

Why programming in medicine?

  • There is a risk of harm to patients if technology is deployed incorrectly or inappropriately
    • “…almost half recorded instances of potential patient harm linked to their systems.”
  • Involving doctors in development helps idea sharing and helps us advocate for patients.

Why programming in medicine?

  • Most translational bioengineering or informatics research will use some degree of programming in Python, MATLAB, or R
  • These skills make it possible to work across both clinical settings and academia

5-minute open discussion

  • What is the top reason that doctors should learn to program?
  • What are the disadvantages of programming for doctors?

Break!

Python Environments

  • We will be using Google Colab for the workshops of DAIM - Images.
  • Google Colab uses Jupyter Notebooks to run Python code
    • Easy to set up
    • Easy to use
    • Powerful
  • Many other biomedical coding resources use this environment, e.g. Kaggle, AlphaFold

Google Colab Window

TODO Use marked-up screenshot of first workshop

Using Google Colab

  • A notebook consists of cells
    • Code cells
    • Markdown cells
  • When you run the first code cell, the kernel (the Python process behind the notebook) starts
  • Variables persist between cells:
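A minimal sketch of this, assuming the two parts below are run as two separate code cells in the same notebook (the variable names and values are hypothetical):

```python
# Cell 1: define a variable (hypothetical example value)
heart_rate = 72

# Cell 2: a later cell can still use the variable defined in Cell 1,
# because both cells run in the same kernel
heart_rate_after_exercise = heart_rate + 40
print(heart_rate_after_exercise)
```

If the kernel is restarted, the variables are cleared and Cell 1 must be run again before Cell 2.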

Trying out Google Colab

  • We would recommend testing out Google Colab before the first workshop.
  • The default notebook from Google can be found here.

Resources to use

  • There are multiple resources that programmers often use when trying to solve a particular problem
    • Code documentation - reliable and accurate
    • Stack Overflow - real-world solutions with explanations
    • GitHub - talk with the developers

Code documentation

  • Documentation contains information about how to use the code in a library.
  • It should be the first place you go if you don’t understand how to use a function/class.
  • Contains information about arguments and use cases.
  • Search for the function (e.g. numpy.shape()) followed by “docs”.

Reading Documentation

The documentation for NumPy’s ‘shape’ function, a very commonly used NumPy function
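As a rough illustration of what that documentation describes, the sketch below calls numpy.shape() on a small array. The array and its dimensions are hypothetical and chosen only for the example.

```python
import numpy as np

# A hypothetical 2-row, 3-column array standing in for a tiny image
image = np.zeros((2, 3))

# np.shape returns the size of each dimension as a tuple
dims = np.shape(image)
print(dims)

# The same information is available as an attribute of the array itself
same_dims = image.shape
print(same_dims)
```

In a notebook, running help(np.shape) prints the same text that the online documentation page is built from.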

Stack Overflow

  • Stack Overflow is a question-and-answer website, launched in 2008, where programmers post problems and share solutions.
  • Stack Exchange hosts sites in the same format for other fields.

A Stack Overflow Window

A Stack Overflow question about the yield keyword in Python.
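For readers who have not met the keyword in that question, here is a minimal sketch of what yield does. The function name is hypothetical and is not taken from the question itself.

```python
def count_up_to(limit):
    """Yield the integers 1..limit one at a time."""
    n = 1
    while n <= limit:
        # yield hands back one value and pauses the function here
        # until the next value is requested
        yield n
        n += 1

# Collect every yielded value into a list
values = list(count_up_to(3))
print(values)
```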

GitHub

  • A site where code is developed and hosted
  • Built around Git repositories
    • Git is a version control tool that is worth learning
  • Many answers can be found there by:
    • Looking at source code
    • Reading GitHub issues

GitHub Code File

A GitHub code file for a simple CT segmentation algorithm.

GitHub Issues

A GitHub issue on the NumPy codebase.

Break and Questions!

Clinical Considerations

  • Some kinds of data are easier for computers to process than others.
  • How can we make clinical data easier to work with?

Clinical Considerations

  • Standardised data formats
    • This allows for interoperability between systems and facilitates research
    • DICOM is a successful example of this.
    • We will be discussing DICOM in Module 2 of this course.

Clinical Considerations

  • Structured data
    • Labelling data consistently
    • Human-labelled data is often variable
    • Use an encoding in your data collection protocol to make processing easier (e.g. “CXR” = 1)
scan = "CXR"
# "chest X-ray", "chestXR", "cxr", "chest film", etc.
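One way to handle this variability is to map each free-text variant to a single canonical label and then to its code. The sketch below is only illustrative: the synonym table, the function name encode_scan, and the choice to return None for unknown labels are all assumptions, and the only encoding taken from the slide is “CXR” = 1.

```python
# Encoding from the data collection protocol ("CXR" = 1 is from the slide)
SCAN_CODES = {"CXR": 1}

# Hypothetical table of free-text variants mapped to one canonical label
SCAN_SYNONYMS = {
    "cxr": "CXR",
    "chest x-ray": "CXR",
    "chestxr": "CXR",
    "chest film": "CXR",
}


def encode_scan(label):
    """Return the integer code for a free-text scan label, or None if unknown."""
    # Ignore surrounding whitespace and letter case before looking up the label
    key = label.strip().lower()
    canonical = SCAN_SYNONYMS.get(key)
    if canonical is None:
        return None
    return SCAN_CODES[canonical]


labels = ["chest X-ray", "chestXR", "cxr", "chest film", "CT head"]
codes = [encode_scan(s) for s in labels]
print(codes)
```

Returning None for an unrecognised label, rather than guessing, keeps unexpected entries visible so that they can be reviewed by a person.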

5-minute open discussion

  • Can you think of other considerations that can make clinical data easier to process?
  • Have you ever encountered any issues with this in your practice?

Thank you!

Any questions?