Introduction to Dask
0.1.0

This repository is a guide to understanding and using Dask, a parallel computing library for Python designed to scale data analysis and computation workflows from a single machine to a cluster.

Note

This project is under active development.

Contents

Topics Covered

Topics            Duration
Dask              1.5 Hours
Dask Distributed  1.5 Hours
Dask-ML           3 Hours

  • Prerequisite
  • Learning Outcomes
  • Modules
  • Python Virtual Environment
  • Dask
    • Task Graphs
    • Dask Arrays
      • Relationship to NumPy Arrays
      • When to Use Dask Arrays
      • Comparison between NumPy and Dask Arrays
      • Choosing Chunk Sizes
        • How does compute() work?
      • How does visualize() work?
    • Dask DataFrame
      • Dask DataFrame vs. Pandas DataFrame
      • When to Use Dask DataFrame
    • Dask Delayed
      • Key Concepts of Dask Delayed
      • How to Use Dask Delayed?
      • Key Advantages
    • Dask Futures
      • Key Concepts of Dask Futures
      • How to Use Dask Futures
      • Key Methods with Dask Futures
  • Distributed Dask
    • Dask Distributed
      • Local cluster
      • PBS cluster
  • Dask-ML
    • Dask ML
      • Dimensions of Scale
        • Challenge 1: Scaling Model Size
        • Challenge 2: Scaling Data Size
      • Dask and Scikit-Learn
    • Hyperparameter Search
      • Implicit Usage in Dask-ML Operations
    • Parallel Prediction
    • Incremental Learning
    • Distributed Learning
  • Reference
  • Contributors

© Copyright 2025, National Computational Infrastructure.