Parallel Prediction ---------------------- .. admonition:: Overview :class: Overview * **Tutorial:** 20 Minutes **Objectives:** - Learn how to do parallel prediction in Dask. `ParallelPostFit` in Dask-ML wraps around a scikit-learn model, enabling parallel prediction and transformation operations. While the model's training step remains on a single machine, the predict and transform methods are executed in parallel using Dask, which is especially useful for large datasets that exceed a single machine's memory. Here we a first generate a small training data .. code-block:: python :linenos: from sklearn.ensemble import GradientBoostingClassifier import sklearn.datasets import dask_ml.datasets from dask_ml.wrappers import ParallelPostFit X, y = sklearn.datasets.make_classification(n_samples=1000, random_state=0) and then wrap the `GradientBoostingClassifier` model using `ParallelPostFit`. .. code-block:: python :linenos: clf = ParallelPostFit(estimator=GradientBoostingClassifier()) clf.fit(X, y) The training occurs on a single node, while the prediction is distributed across the cluster, returning a Dask array. .. code-block:: python :linenos: X_big, _ = dask_ml.datasets.make_classification(n_samples=100000, chunks=10000, random_state=0) clf.predict(X_big) .. admonition:: Key Points :class: hint - `ParallelPostFit` can be used to parallelize prediction across a cluster.