
Dask Workshop

This workshop introduces Dask for scaling data analysis in Python. It begins with an overview of the fundamentals of parallel computing in Python and examines the technical limitations of NumPy and Pandas. After exploring the core Dask data structures, participants apply Dask arrays and dataframes in practice, using dashboard tools to monitor Dask workflows and measure performance.


Quansight

6 hours of instruction

PREREQUISITES

Participants should have prior experience using the Python language and, in particular, using standard Python tools for data analysis (notably NumPy, Pandas, Scikit-Learn, Jupyter). No prior exposure to Dask or to parallel computing is required.

LEARNING OBJECTIVES

Explain relevant parallel computing concepts in the context of data analysis pipelines.
Identify where in a data-processing pipeline parallelism is attainable or difficult.
Identify opportunities for parallel computation in existing Python data workflows.
Develop scalable Dask data pipelines to extend examples using Pandas/NumPy.
Select Dask data structures appropriate to a given compute-intensive scenario.
Construct scalable data analysis pipelines in Python using Dask from scratch.
Apply Dask dashboard tools to monitor performance of data analytics.
Use Dask diagnostic tools to assess and tune performance in applications.
Apply distinct schedulers appropriate to relevant hardware available.
Plan out & execute embarrassingly parallel Dask workflows on remote data.
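
As a rough illustration of the kind of exercise these objectives point to (it is not taken from the workshop materials), an existing NumPy computation might be extended with a Dask array along the following lines. The data, the chunk size, and all variable names are assumptions chosen for the example.

```python
import numpy as np
import dask.array as da

# Hypothetical data: one million samples held in memory as a NumPy array.
values = np.arange(1_000_000, dtype="float64")

# NumPy computes the mean eagerly, as a single in-memory operation.
numpy_mean = values.mean()

# Dask splits the same data into chunks of 100,000 elements and
# builds a lazy task graph instead of computing right away.
chunked = da.from_array(values, chunks=100_000)
lazy_mean = chunked.mean()

# Nothing runs until compute() is called; Dask arrays use the
# threaded scheduler by default.
dask_mean = lazy_mean.compute()

# The scheduler can be chosen per call, for example the
# single-threaded "synchronous" scheduler, which is handy for debugging.
debug_mean = lazy_mean.compute(scheduler="synchronous")

print("number of chunks:", chunked.numblocks)
print("results agree:", np.isclose(numpy_mean, dask_mean))
```

The chunk size here is arbitrary; in practice it would be tuned to the data and the hardware, which is the sort of decision the diagnostic and dashboard tools are meant to inform.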

About Instructor

Quansight

13 Courses
