How a Start in Reality Television Showed Me Success in Data Science

Fun trivia fact: I have a background in reality television. No, I wasn’t on camera, but I was one of the guys behind the scenes making sure all the magic happened. Somehow, now I am immersed in the data science and technology space. I’ve had people ask me, “How on earth did that happen? Do […]

For Machine Learning Beginners: A Source for Core Concepts

Solving machine learning problems requires a wide range of techniques and methods, some better suited than others. As a data scientist, it can be difficult to keep track of all of them and to choose which works best for a specific scenario. If you are starting out in this space, it helps to understand the […]

Google Makes AI Education Accessible and Free for Everyone

When people Google “resources to learn AI” or a variation of those keywords to educate themselves about the major developments in artificial intelligence, machine learning and related fields, countless blogs, courses and bootcamps appear in the search results. Paralysis by analysis ensues. What resource do you choose? What is the cost? What is the time […]

R Tip: Introduce Indices to Avoid for() Class Loss Issues

Here is an R tip. Use loop indices to avoid for()-loops damaging classes. Below is an R annoyance that occurs again and again: vectors lose class attributes when you iterate over them in a for()-loop.

    d <- c(Sys.time(), Sys.time())
    print(d)
    #> [1] "2018-02-18 10:16:16 PST" "2018-02-18 10:16:16 PST"
    for(di in d) {
      print(di)
    }
    #> [1] 1518977777
    #> [1] […]

McKinsey’s Succinct AI Guide Helps Executives Quickly Understand

Day in and day out, news headlines, blog articles, YouTube channels and more feature the different ways in which technology regularly impacts various aspects of business. A keyword running through all of the talk and noise is “AI.” The term “artificial intelligence” might just be the buzzword of 2018. As the AI race continues and grows […]

Principal Component Analysis Tutorial

The Problem: Imagine that you are a nutritionist trying to explore the nutritional content of food. What is the best way to differentiate food items? By vitamin content? Protein levels? Or perhaps a combination of both? Knowing the variables that best differentiate your items has several uses: 1. Visualization. Using the right variables to plot […]

Implementing a Principal Component Analysis (PCA) in Python, step by step

Sections
Introduction
Principal Component Analysis (PCA) Vs. Multiple Discriminant Analysis (MDA)
What is a “good” subspace?
Summarizing the PCA approach
Generating some 3-dimensional sample data
Why are we choosing a 3-dimensional sample?
1. Taking the whole dataset ignoring the class labels
2. Computing the d-dimensional mean vector
3. a) Computing the Scatter Matrix
3. […]
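
The first steps named in this outline (pooling the samples without their class labels, computing the d-dimensional mean vector, and computing the scatter matrix) can be sketched in plain Python. This is only an illustration under stated assumptions, not the article's own code: the function names `mean_vector` and `scatter_matrix`, the random sample data, and the class sizes are all hypothetical, and the scatter matrix is taken to be the usual sum of outer products of the mean-centered samples.

```python
import random


def mean_vector(samples):
    """Return the per-dimension mean of a list of equal-length samples."""
    n = len(samples)
    d = len(samples[0])
    return [sum(x[j] for x in samples) / n for j in range(d)]


def scatter_matrix(samples):
    """Return the d x d scatter matrix S = sum_k (x_k - m)(x_k - m)^T."""
    m = mean_vector(samples)
    d = len(m)
    S = [[0.0] * d for _ in range(d)]
    for x in samples:
        diff = [x[j] - m[j] for j in range(d)]
        # Accumulate the outer product of the centered sample with itself.
        for i in range(d):
            for j in range(d):
                S[i][j] += diff[i] * diff[j]
    return S


# Hypothetical 3-dimensional sample data for two classes (assumed values).
random.seed(0)
class_a = [[random.gauss(0, 1) for _ in range(3)] for _ in range(20)]
class_b = [[random.gauss(1, 1) for _ in range(3)] for _ in range(20)]

# Step 1: take the whole dataset, ignoring the class labels.
all_samples = class_a + class_b

# Step 2: the d-dimensional mean vector (here d = 3).
m = mean_vector(all_samples)

# Step 3a: the scatter matrix of the pooled data.
S = scatter_matrix(all_samples)
```

Pure Python keeps the sketch dependency-free; with larger data one would normally reach for a numerical library instead.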

Installing Jupyter with the PySpark and R kernels for Spark development

This is a quick tutorial on installing Jupyter and setting up the PySpark and R (IRkernel) kernels for Spark development. The prerequisite for following this tutorial is a deployed Hadoop/Spark cluster with the relevant services up and running (e.g. HDFS, YARN, Hive, Spark). In this tutorial I am using IBM’s Hadoop […]

Scikit-learn Tutorial: Statistical-Learning for Scientific Data Processing

Zip file for offline browsing: https://github.com/GaelVaroquaux/scikit-learn-tutorial/zipball/gh-pages

Statistical learning: Machine learning is a technique of growing importance, as the datasets that experimental sciences face are rapidly growing in size. The problems it tackles range from building a prediction function linking different observations, to classifying observations, to learning the structure in an unlabeled dataset. This tutorial […]