2017 Data Science in Review, Topic Modeling

This blog post is about topic modeling using data from this blog, opendatascience.com. Combining this data with the most visited articles of the year, we will identify the most popular topics of 2017. Last year, we did something similar with popular articles streamed through Twitter, using Non-Negative Matrix Factorization (NMF) to determine topics (article here, example visual below). […]
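
The NMF step can be sketched in a few lines. The block below is a minimal, dependency-free illustration, not the code behind the post. It builds a document-term count matrix V from a toy corpus made up for this example and factors it as V ≈ WH with the Lee and Seung multiplicative updates for squared error. It then reads each row of H as a topic, described by its highest-weighted terms. The helper names (`term_counts`, `nmf`, `top_words`) are hypothetical. In practice, a library implementation such as scikit-learn's `NMF`, often run on TF-IDF weights, is the more likely choice.

```python
import random

def matmul(A, B):
    """Multiply two matrices stored as lists of rows."""
    cols = list(zip(*B))
    return [[sum(a * b for a, b in zip(row, col)) for col in cols] for row in A]

def transpose(A):
    return [list(col) for col in zip(*A)]

def term_counts(docs):
    """Build a document-term count matrix V (docs x terms) and its vocabulary."""
    vocab = sorted({w for d in docs for w in d.split()})
    index = {w: i for i, w in enumerate(vocab)}
    V = []
    for d in docs:
        row = [0.0] * len(vocab)
        for w in d.split():
            row[index[w]] += 1.0
        V.append(row)
    return V, vocab

def nmf(V, k, iters=200, seed=0, eps=1e-9):
    """Factor V into non-negative W (docs x k) and H (k x terms) using
    Lee-Seung multiplicative updates for the squared-error objective."""
    rng = random.Random(seed)
    n, m = len(V), len(V[0])
    W = [[rng.random() for _ in range(k)] for _ in range(n)]
    H = [[rng.random() for _ in range(m)] for _ in range(k)]
    for _ in range(iters):
        # H <- H * (W^T V) / (W^T W H); eps avoids division by zero.
        Wt = transpose(W)
        num, den = matmul(Wt, V), matmul(matmul(Wt, W), H)
        H = [[H[i][j] * num[i][j] / (den[i][j] + eps) for j in range(m)] for i in range(k)]
        # W <- W * (V H^T) / (W H H^T)
        Ht = transpose(H)
        num, den = matmul(V, Ht), matmul(W, matmul(H, Ht))
        W = [[W[i][j] * num[i][j] / (den[i][j] + eps) for j in range(k)] for i in range(n)]
    return W, H

def reconstruction_error(V, W, H):
    """Squared Frobenius norm of V - WH."""
    WH = matmul(W, H)
    return sum((v - x) ** 2 for rv, rx in zip(V, WH) for v, x in zip(rv, rx))

def top_words(H, vocab, n=3):
    """Each row of H is a topic; its largest weights name the topic."""
    return [[vocab[j] for j in sorted(range(len(row)), key=lambda j: -row[j])[:n]]
            for row in H]

# Toy corpus (hypothetical) with two obvious themes.
docs = [
    "python pandas dataframe python",
    "pandas dataframe python",
    "basketball bracket tourney",
    "bracket basketball march tourney",
]
V, vocab = term_counts(docs)
W, H = nmf(V, k=2)
print(top_words(H, vocab))
```

With real articles, `k` (the number of topics) is a parameter to tune, and the topics found can depend on the random initialization (`seed`).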

Named Entity Recognition: Milestone Models, Papers and Technologies

Named Entity Recognition: Extracting named entities from text. Named Entity Recognition (NER), or entity extraction, is an NLP technique that locates and classifies the named entities present in text. NER classifies named entities into predefined categories such as the names of persons, organizations, locations, quantities, monetary values, specialized terms, product terminology and […]
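
As a concrete illustration of "locate and classify", here is a deliberately simple dictionary (gazetteer) tagger with one regular expression for monetary values. This is a minimal sketch under strong assumptions, not one of the milestone models the article covers. The `GAZETTEER` entries and the `MONEY` pattern are hypothetical. Statistical and neural NER models learn to recognize entities they have never seen, which a lookup table cannot do.

```python
import re

# Hypothetical gazetteer mapping surface strings to categories.
GAZETTEER = {
    "Open Data Science": "ORG",
    "NYC Data Science Academy": "ORG",
    "Christian Holmes": "PERSON",
    "New York": "LOC",
}
# Hypothetical pattern for dollar amounts such as $50 or $1,200.50.
MONEY = re.compile(r"\$\d+(?:,\d{3})*(?:\.\d+)?")

def extract_entities(text):
    """Return (start, end, surface, label) spans sorted by position.
    Longer gazetteer names are matched first; later matches may not overlap."""
    spans = []
    taken = [False] * len(text)

    def claim(start, end, surface, label):
        if not any(taken[start:end]):
            spans.append((start, end, surface, label))
            for i in range(start, end):
                taken[i] = True

    for name in sorted(GAZETTEER, key=len, reverse=True):
        # Word boundaries stop matches inside longer words.
        for m in re.finditer(r"\b" + re.escape(name) + r"\b", text):
            claim(m.start(), m.end(), name, GAZETTEER[name])
    for m in MONEY.finditer(text):
        claim(m.start(), m.end(), m.group(), "MONEY")
    return sorted(spans)

sentence = ("Christian Holmes, a student at the NYC Data Science Academy, "
            "asked Kiva for a $1,200 loan.")
for start, end, surface, label in extract_entities(sentence):
    print(label, surface)
```

"Kiva" is not tagged because it is not in the table. Closing that gap is exactly what learned NER models are for.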

General Tips for Web Scraping with Python

The great majority of the machine learning and data analysis projects I write about here on Bigish-Data have an initial step of scraping data from websites. And I get a bunch of contact emails asking me either to give them the data I’ve scraped myself or to help them get the code working for themselves. […]

Understanding Gender Roles in Movies with Text Mining

I have a new visual essay up at The Pudding, using text mining to explore how women are portrayed in film. In April 2016, we broke down film dialogue by gender. The essay presented an imbalance in which men delivered more lines than women across 2,000 screenplays. But the number of lines is only part of the story. What characters […]

An example of web scraping with R: Online Food Blogs

In this blog post, I will discuss web scraping using R. As an example, I will scrape data from online food blogs to construct a data set of recipes. This data set contains ingredients, a short description, nutritional information and user ratings. Then I will provide a simple exploratory analysis that reveals some interesting […]

Intro to Data Mining, K-means and Hierarchical Clustering

Introduction. In this article, I will discuss what data mining is and why we need it. We will learn about a type of data mining called clustering and go over two clustering algorithms, K-means and hierarchical clustering, and how they solve data mining problems. Table of Contents: What is data mining? […]
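
To make the two algorithms concrete, here is a small pure-Python sketch on made-up 2-D points. `kmeans` follows Lloyd's algorithm: it assigns each point to its nearest centroid, then moves each centroid to the mean of its points, and repeats until nothing changes. `agglomerative` does bottom-up hierarchical clustering with single linkage, repeatedly merging the two closest clusters until the requested number remain. Both functions and the sample data are illustrative assumptions, not code from the article.

```python
import math
import random

def kmeans(points, k, iters=100, seed=0):
    """Lloyd's algorithm. Returns (labels, centroids)."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)  # start from k distinct input points
    labels = [0] * len(points)
    for _ in range(iters):
        # Assignment step: each point joins its nearest centroid.
        labels = [min(range(k), key=lambda c: math.dist(p, centroids[c]))
                  for p in points]
        # Update step: move each centroid to the mean of its members
        # (an empty cluster keeps its old centroid).
        new = []
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                new.append(tuple(sum(xs) / len(members) for xs in zip(*members)))
            else:
                new.append(centroids[c])
        if new == centroids:  # converged: centroids stopped moving
            break
        centroids = new
    return labels, centroids

def agglomerative(points, n_clusters):
    """Single-linkage hierarchical clustering: start from singletons and
    merge the two closest clusters until n_clusters remain.
    Returns clusters as lists of point indices."""
    clusters = [[i] for i in range(len(points))]
    while len(clusters) > n_clusters:
        best = None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                # Single linkage: distance between the closest pair of members.
                d = min(math.dist(points[i], points[j])
                        for i in clusters[a] for j in clusters[b])
                if best is None or d < best[0]:
                    best = (d, a, b)
        _, a, b = best
        clusters[a] += clusters[b]
        del clusters[b]  # b > a, so index a is unaffected
    return clusters

# Two well-separated hypothetical blobs.
points = [(1.0, 1.0), (1.5, 2.0), (1.2, 1.4), (8.0, 8.0), (8.5, 9.0), (9.0, 8.2)]
labels, centroids = kmeans(points, k=2)
print(labels, centroids)
print(agglomerative(points, 2))
```

K-means needs k up front and can settle in a poor local optimum depending on the initial centroids. Hierarchical clustering does not need k in advance if you record every merge, but this naive version is slow for large inputs.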

Scraping CRAN with rvest

I am one of the organizers of a session at useR! 2017 this coming July that will focus on discovering and learning about R packages. How do R users find packages that meet their needs? Can we make this process easier? As somebody who is relatively new to the R world compared to many, this […]

The Official Open Data Science March Madness Bracket

Today is the first day of the most exciting event in sports. That’s right, I’m talking about the NCAA Basketball Tourney, aka “March Madness.” And we here at Open Data Science have totally caught March Madness fever and have decided to try our hand at making a bracket of predictions. Since we’re in the business of data […]

Will My Kiva Loan Get Funded?

Web scraping project contributed by Christian Holmes, a data science student in the NYC Data Science Academy Bootcamp. Kiva Basics: Microlending has become increasingly popular in recent years. If you’ve never heard of the concept before, microlending is a method of poverty alleviation implemented in the developing world. Small amounts of capital are loaned to people who would […]