dplyr

Converting nested JSON to a tidy data frame with R

In this “how-to” post, I want to detail an approach that others may find useful for converting nested (nasty!) json to a tidy (nice!) data.frame/tibble that is should be much easier to work with. 1 For this demonstration, I’ll start out by scraping National Football League (NFL) 2018 regular season week 1 score data from ESPN, which involves lots of nested data in its raw form. 2 Then, I’ll work towards getting the data in a workable format (a data.

Investigating Ranks, Monotonicity, and Spearman's Rho with R

The Problem I have a bunch of data that can be categorized into many small groups. Each small group has a set of values for an ordered set of intervals. Having observed that the values for most groups seem to increase with the order of the interval, I hypothesize that their is a statistically-significant, monotonically increasing trend. An Analogy To make this abstract problem more relatable, imagine the following scenario.

A Tidy Text Analysis of R Weekly Posts

I’m always intrigued by data science “meta” analyses or programming/data-science. For example, Matt Dancho’s analysis of renown data scientist David Robinson. David Robinson himself has done some good ones, such as his blog posts for Stack Overflow highlighting the growth of “incredible” growth of python, and the “impressive” growth of R in modern times. With that in mind, I thought it would try to identify if any interesting trends have risen/fallen within the R community in recent years.

Dealing with Interval Data and the nycflights13 package using R, Part 2

In this post, I’ll continue my discussion of working with regularly sampled interval data using R. (See my previous post for some insight regarding minute data.) The discussion here is focused more so on function design. Daily Data When I’ve worked with daily data, I’ve found that the .csv files tend to be much larger than those for data sampled on a minute basis (as a consequence of each file holding data for sub-daily intervals).

Dealing with Interval Data and the nycflights13 package using R

In my job, I often work with data sampled at regular intervals. Samples may range from 5-minute intervals to daily intervals, depending on the specific task. While working with this kind of data is straightforward when its in a database (and I can use SQL), I have been in a couple of situations where the data is spread across .csv files. In these cases, I lean on R to scrape and compile the data.

A Tidy Text Analysis of My Google Search History

While brainstorming about cool ways to practice text mining with R I came up with the idea of exploring my own Google search history. Then, after googling (ironically) if anyone had done something like this, I stumbled upon Lisa Charlotte’s blog post. Lisa’s post (actually, a series of posts) are from a while back, so her instructions for how to download your personal Google history and the format of the downloads (nowadays, it’s in a .