Matthew Rocklin

Developer - Continuum Analytics, Inc.

Bio: Matthew is a graduate student in Computer Science at the University of Chicago. He is particularly interested in the interface of computational techniques and scientific applications. His background is in Physics, Astronomy, and Engineering. Specialties: scientific data analysis, numerical linear algebra, statistics and optimization, complex networks, and simple forms of distributed computing.

Dask Release 0.17.2

This work is supported by Anaconda Inc. and the Data Driven Discovery Initiative from the Moore Foundation. I’m pleased to announce the release of Dask version 0.17.2. This is a minor release with new features and stability improvements. This blogpost outlines notable changes since the 0.17.0 release on February 12th. You can conda install Dask: conda install dask […]

Craft Minimal Bug Reports

Following up on a post on supporting users in open source, this post lists some suggestions on how to ask a maintainer to help you with a problem. You don’t have to follow these suggestions. They are optional. They make it more likely that a project maintainer will spend time helping you. It’s important to remember that […]

Streaming in Python Prototype

This work is supported by Continuum Analytics and the Data Driven Discovery Initiative from the Moore Foundation. This blogpost is about experimental software. The project may change or be abandoned without warning. You should not depend on anything within this blogpost. This week I built a small streaming library for Python. This was originally an exercise to help me […]

Experiment with Dask and TensorFlow

This post briefly describes potential interactions between Dask and TensorFlow and then goes through a concrete example using them together for distributed training with a moderately complex architecture. This post was written in haste; see disclaimers below. This work was originally at matthewrocklin.com and is supported by Continuum Analytics and the XDATA Program as part of the […]

Write tests

Tests are important for community-driven open source software. This post contains brief reasons why you should test your code, particularly if you submit changes to existing open source projects. This work was originally at matthewrocklin.com and is supported by Continuum Analytics and the XDATA Program as part of the Blaze Project. Why we don’t test. A […]

Custom Parallel Algorithms on a Cluster with Dask

Summary: This post describes Dask as a computational task scheduler that fits somewhere on a spectrum between big data computing frameworks like Hadoop/Spark and task schedulers like Airflow/Celery/Luigi. We see how, by combining elements from both of these types of systems, Dask is able to handle complex data science problems particularly well. This post is […]

Dask Release 0.13.0

Summary: Dask just grew to version 0.13.0. This is a significant release for arrays, dataframes, and the distributed scheduler. This blogpost outlines some of the major changes since the last release on November 4th: Python 3.6 support; Algorithmic and API improvements for DataFrames; Dataframe to Array conversions for Machine Learning; Parquet support; Scheduling Performance and Worker […]

Dask and Celery

This post compares two Python distributed task processing systems, Dask.distributed and Celery. Disclaimer: technical comparisons are hard to do well. I am biased towards Dask and ignorant of correct Celery practices. Please keep this in mind. Critical feedback by Celery experts is welcome. Celery is a distributed task queue built in Python and heavily used […]

Dask for Institutions

Introduction: Institutions use software differently than individuals. Over the last few months I’ve had dozens of conversations about using Dask within larger organizations like universities, research labs, private companies, and non-profit learning systems. This post provides a very coarse summary of those conversations and extracts common questions. I’ll then try to answer those questions. Note: […]