Has media reached a reality/complexity tipping point?

A landmark paper appeared last December in the National Science Review (summary). It describes the complex interdependencies among climate, consumption, population, demographics, inequality, economic growth, migration, and more. Written by an interdisciplinary team of 20 authors from organizations worldwide, including NASA and Johns Hopkins, the paper explains that it is impossible to understand these systems […]

Building an Interactive Web “mapp” with Shiny

The purpose of this post is to discuss the key elements in developing an interactive web application that displays data with a geographic component. I discuss developing an app using Shiny – a powerful R package. I briefly compare that process to building a similar product in Tableau. Rather than going line-by-line through code, I highlight […]

Exploratory Analysis – When to Choose R, Python, Tableau or a Combination

Not all data analysis tools are created equal. Recently, I started looking into data sets to compete in Go Code Colorado (check it out if you live in CO). The problem with such diversity in data sets is finding a way to quickly visualize the data and do exploratory analysis. While tools like Tableau make data visualization […]

Data Visualization – Part 3

What Type of Data Visualization Do You Choose (if any)? Determining whether or not you need a visualization is step one. While it seems silly, this is probably something everyone (including myself) should be doing more often. A lot of times, it seems like a great way to showcase the amount of work you have been […]

Plotting author statistics for Git repos using Git of Theseus

I spent a few days during the holidays fixing up a bunch of semi-dormant open source projects, and I have a couple of blog posts in the pipeline about various updates. First up, I made a number of fixes to Git of Theseus, a tool (written in Python) that generates statistics about Git repositories. I’ve written […]

What makes a data visualization successful?

Data visualizations can have very different goals and functions depending on the area of application. Success must therefore be measured against different quality criteria, depending on the task. When used in data analysis, success means that a data scientist can identify the structures and patterns that she needs for the next step. In reporting, […]

Word2Vec – the world of word vectors

Have you ever wondered how a chatbot can learn about the meaning of words in a text? Does this sound interesting? Well, in this blog we will describe a very powerful method, Word2Vec, that maps words to numbers (vectors) in order to easily capture and distinguish their meaning. We will briefly describe how Word2Vec works without going into many technical details. And although it was […]

Data Visualization – Part 2

A Quick Overview of the ggplot2 Package in R While it will be important to focus on theory, I want to explain the ggplot2 package because I will be using it throughout the rest of this series. Knowing how it works will keep the focus on the results rather than the code. It’s an incredibly […]

The retreat from religion is accelerating

This is an extended version of my article in the Scientific American blog. The data I used and all of my code are available in this Jupyter notebook. Secularization in the United States For more than a century religion in the United States has defied gravity. According to the Theory of Secularization, as societies become more modern, […]