In this “how-to” post, I want to detail an approach that others may find useful for converting nested (nasty!) JSON to a tidy (nice!) data.frame/tibble that should be much easier to work with.
For this demonstration, I’ll start out by scraping National Football League (NFL) 2018 regular season week 1 score data from ESPN, which involves lots of nested data in its raw form.
Then, I’ll work towards getting the data in a workable format (a data.
The Problem I have a bunch of data that can be categorized into many small groups. Each small group has a set of values for an ordered set of intervals. Having observed that the values for most groups seem to increase with the order of the interval, I hypothesize that there is a statistically significant, monotonically increasing trend.
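As a rough illustration of the kind of check described above, here is a minimal sketch that tests each group separately for an increasing association between interval order and value. It uses Kendall's rank correlation via base R's `cor.test()`, which is an assumption on my part and not necessarily the method used in the post. The data and the helper name `test_trend()` are hypothetical.

```r
# Hypothetical data: three groups, each with values over five ordered intervals.
data <- data.frame(
  group = rep(c("a", "b", "c"), each = 5),
  interval = rep(1:5, times = 3),
  value = c(1, 2, 3, 4, 5, 2, 1, 4, 3, 5, 5, 4, 3, 2, 1)
)

# For one group, run a one-sided Kendall rank correlation test
# (alternative = "greater" looks for an increasing trend).
test_trend <- function(d) {
  res <- cor.test(d$interval, d$value, method = "kendall", alternative = "greater")
  data.frame(
    group = d$group[1],
    tau = unname(res$estimate),
    p_value = res$p.value
  )
}

# Apply the test to every group and stack the results into one data.frame.
trends <- do.call(rbind, lapply(split(data, data$group), test_trend))
print(trends)
```

A per-group test like this only says something about each group on its own. Judging whether the trend holds across all groups together would need an additional step, which this sketch leaves out.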
An Analogy To make this abstract problem more relatable, imagine the following scenario.
I’m always intrigued by data science “meta” analyses of programming and data science. For example, see Matt Dancho’s analysis of renowned data scientist David Robinson. David Robinson himself has done some good ones, such as his blog posts for Stack Overflow highlighting the “incredible” growth of Python and the “impressive” growth of R in modern times.
With that in mind, I thought I would try to identify whether any interesting trends have risen or fallen within the R community in recent years.
In this post, I’ll continue my discussion of working with regularly sampled interval data using R. (See my previous post for some insight regarding minute data.) The discussion here is focused more so on function design.
Daily Data When I’ve worked with daily data, I’ve found that the .csv files tend to be much larger than those for data sampled on a minute basis (as a consequence of each file holding data for sub-daily intervals).
In my job, I often work with data sampled at regular intervals. Samples may range from 5-minute intervals to daily intervals, depending on the specific task. While working with this kind of data is straightforward when it’s in a database (and I can use SQL), I have been in a couple of situations where the data is spread across .csv files. In these cases, I lean on R to scrape and compile the data.
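To make the “spread across .csv files” situation concrete, here is a minimal base R sketch of compiling several files into one data.frame. The directory, file names, column names, and the helper `read_one()` are all hypothetical, and the two sample files are written to a temporary directory only so the example runs on its own.

```r
# Hypothetical setup: write two small .csv files to a temporary directory.
dir_data <- file.path(tempdir(), "interval_data")
dir.create(dir_data, showWarnings = FALSE)
write.csv(
  data.frame(timestamp = c("2018-01-01 00:00", "2018-01-01 00:05"), value = c(1.2, 1.5)),
  file.path(dir_data, "sample_a.csv"),
  row.names = FALSE
)
write.csv(
  data.frame(timestamp = c("2018-01-01 00:10", "2018-01-01 00:15"), value = c(1.1, 1.7)),
  file.path(dir_data, "sample_b.csv"),
  row.names = FALSE
)

# Find every .csv file in the directory.
paths <- list.files(dir_data, pattern = "\\.csv$", full.names = TRUE)

# Read one file and tag its rows with the name of the file they came from.
read_one <- function(path) {
  data <- read.csv(path, stringsAsFactors = FALSE)
  data$source_file <- basename(path)
  data
}

# Read all files and stack them into a single data.frame.
compiled <- do.call(rbind, lapply(paths, read_one))
print(compiled)
```

Keeping the source file name as a column makes it easier to trace a suspicious row back to the file it came from once everything is stacked together.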
While brainstorming about cool ways to practice text mining with R, I came up with the idea of exploring my own Google search history. Then, after googling (ironically) whether anyone had done something like this, I stumbled upon Lisa Charlotte’s blog post. Lisa’s post (actually, a series of posts) is from a while back, so her instructions for how to download your personal Google history and the format of the downloads (nowadays, it’s in a .