Introduction to Dask
0.1.0

Dask

In this tutorial, we’ll be using the Gadi HPC machine at NCI. A Python virtual environment will be provided for you during the session.

  • Task Graphs
  • Dask Arrays
    • Relationship to NumPy Arrays
    • When to Use Dask Arrays
    • Comparison between NumPy and Dask Array
    • Choosing Chunk Sizes
      • How does compute() work?
    • How does visualize() work?
  • Dask DataFrame
    • Dask DataFrame vs. Pandas DataFrame
    • When to Use Dask DataFrame
  • Dask Delayed
    • Key Concepts of Dask Delayed
    • How to Use Dask Delayed?
    • Key Advantages
  • Dask Futures
    • Key Concepts of Dask Futures
    • How to Use Dask Futures
    • Key Methods with Dask Futures

Distributed Dask

  • Dask Distributed
    • Local cluster
    • PBS cluster

Dask-ML

  • Dask ML
    • Dimensions of Scale
      • Challenge 1: Scaling Model Size
      • Challenge 2: Scaling Data Size
    • Dask and Scikit-Learn
  • Hyper Parameter Search
    • Implicit Usage in Dask-ML Operations
  • Parallel Prediction
  • Incremental Learning
  • Distributed Learning

GitHub Repo: https://github.com/NCI900-Training-Organisation/intro-to-dask.git


© Copyright 2025, National Computational Infrastructure.
