5 strategies for converting Big Data into actionable insights

To turn raw data into actionable insights, integrate and analyze data from all data sources to reach better, optimized business decisions. The word “big” in big data refers to the huge volume of data involved. Big data technologies aim at storing, analyzing, querying, and updating large chunks of data […]

Data Science x Project Planning

A non-technical guide for beginners to the k-NN algorithm and its application to forecasting, from a project planning point of view. Introduction: The intended audience for this short blog post is data science practitioners who seek to implement predictive algorithms in a business-project-based setting, with special focus on presenting the work process flow. We will […]

Using Excel for Data Entry

This article shows you how to enter data so that you can easily open it in statistics packages such as R, SAS, SPSS, or jamovi (code or GUI steps below). Excel has some statistical analysis capabilities, but they often provide incorrect answers. For a comprehensive list of these limitations, see http://www.forecastingprinciples.com/paperpdf/McCullough.pdf and http://www.burns-stat.com/documents/tutorials/spreadsheet-addiction. Simple Data Sets: Most data sets are easy […]

Dask Release 0.17.2

This work is supported by Anaconda Inc. and the Data Driven Discovery Initiative from the Moore Foundation. I’m pleased to announce the release of Dask version 0.17.2. This is a minor release with new features and stability improvements. This blogpost outlines notable changes since the 0.17.0 release on February 12th. You can conda install Dask: conda install dask […]

The Compression of the Hype Cycle

I spend a lot of time thinking about hype cycles, across industries (Big Data/AI, IoT) and ecosystems (New York). Whether you use the Carlota Perez surge cycle (see this great Fred Wilson post) or the Gartner version, hype cycles convey the fundamental idea that technology markets don’t develop linearly, but instead go through phases of boom and bust before they […]

When shuffling large arrays, how much time can be attributed to random number generation?

It is well known that contemporary computers don’t like to access data in memory randomly, in an unpredictable manner. However, not all forms of random access are equally harmful. To randomly shuffle an array, the textbook algorithm, often attributed to Knuth, is simple enough: void swap(int[] arr, int i, int j) { int tmp = […]
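
Since the excerpt cuts off mid-function, here is a minimal sketch of what the textbook shuffle usually looks like in Java. It is an illustration, not the post's own code: the class name `ShuffleSketch`, the body of `swap`, and the use of `java.util.Random` as the random source are all assumptions.

```java
import java.util.Arrays;
import java.util.Random;

// Hypothetical class name; a sketch of the textbook (Fisher-Yates) shuffle.
public class ShuffleSketch {

    // Swap the elements at positions i and j (assumed body; the excerpt is truncated).
    public static void swap(int[] arr, int i, int j) {
        int tmp = arr[i];
        arr[i] = arr[j];
        arr[j] = tmp;
    }

    // Walk from the last index down to 1, swapping each position with a
    // uniformly chosen position at or before it. Each step costs one random
    // number and one access to an unpredictable memory location.
    public static void shuffle(int[] arr, Random rnd) {
        for (int i = arr.length - 1; i > 0; i--) {
            int j = rnd.nextInt(i + 1); // uniform in [0, i]
            swap(arr, i, j);
        }
    }

    public static void main(String[] args) {
        int[] data = {1, 2, 3, 4, 5, 6, 7, 8};
        shuffle(data, new Random());
        System.out.println(Arrays.toString(data));
    }
}
```

Passing the `Random` instance in as a parameter makes it straightforward to swap in a different generator when measuring how much of the running time comes from random number generation rather than from memory access.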