Requests for Research

Table of contents: Task-independent data augmentation for NLP; Few-shot learning for NLP; Transfer learning for NLP; Multi-task learning; Cross-lingual learning; Task-independent architecture improvements. It can be hard to find compelling topics to work on and to know which questions are interesting to ask when you are just starting as a researcher in a new field. Machine […]

How To Create Data Products That Are Magical Using Sequence-to-Sequence Models

A tutorial on how to summarize text and generate features from GitHub Issues using deep learning with Keras and TensorFlow. Teaser: training a model to summarize GitHub Issues. Predictions are in rectangular boxes. The above results are randomly selected elements of a holdout set. Keep reading below; there will be a link to many more examples! Motivation: I never imagined I […]

Named Entity Recognition: Milestone Models, Papers and Technologies

Named Entity Recognition: Extracting named entities from text. Named Entity Recognition (NER), or entity extraction, is an NLP technique that locates and classifies the named entities present in a text. It sorts them into pre-defined categories such as the names of persons, organizations, locations, quantities, monetary values, specialized terms, product terminology and […]

Word2Vec – the world of word vectors

Have you ever wondered how a chatbot can learn about the meaning of words in a text? Does this sound interesting? Well, in this blog we will describe a very powerful method, Word2Vec, that maps words to numbers (vectors) in order to easily capture and distinguish their meaning. We will briefly describe how Word2Vec works without going into many technical details. And although it was […]

Introduction to NLP with NLTK – Part 1

Introduction: The idea of using a structured programming language to interact with computers is being challenged by Natural Language Processing (NLP) and Natural Language Understanding methods. NLP holds great promise for making computer interfaces accessible to a wide range of audiences – as humans would be able to talk to computers in their own native […]

Processing the Language of Pitchfork Part 2: Word Count

In the second part of this three-part ODSC series on analyzing Pitchfork album reviews, we’ll introduce the Natural Language Toolkit library to discover patterns, trends, and other interesting things hidden in the words of album reviews. For this article I found the most commonly used words and adjectives/adverbs in my collection of 17,000 reviews. I also […]

Processing The Language of Pitchfork Part 1

Pitchfork.com is the web’s premier site for music criticism and news. Their album reviews are famous for their overt detail, astute prose, and cutting wit. They are often credited for the popularity of indie music in the 00s and 10s and for “breaking” bands such as Animal Collective, Bon Iver, and Grizzly Bear. A good […]

The Sentiment Behind The Declaration of Independence

The American political season often conjures numerous references to the country’s origins from either side of the aisle. What better way to join in than by looking at the country’s birth using Data Science, the field that will dictate much of its future? I’ll do this by leveraging a subset of Natural Language Processing (NLP) […]

Naive Bayes and Spam Detection

In natural language processing, text classification techniques are used to assign a class to a given text. For example, in spam detection, the classifier decides whether an email belongs to the spam or non-spam (ham) class. Deciding what the topic of a news article is, or whether a movie review is positive or negative, Authorship […]