Thoughts on Using Flexdashboard

I’ve experimented with the flexdashboard R package for a couple of things since first trying it out not long ago. In particular, I found the storyboard format to be my favorite. I used it to create the storyboard described in a previous post about tracking the activity of NBA team Twitter accounts. I also used flexdashboard for a presentation that I gave to my company’s data science group.

Investigating Ranks, Monotonicity, and Spearman's Rho with R

The Problem: I have a bunch of data that can be categorized into many small groups. Each small group has a set of values for an ordered set of intervals. Having observed that the values for most groups seem to increase with the order of the interval, I hypothesize that there is a statistically significant, monotonically increasing trend. An Analogy: To make this abstract problem more relatable, imagine the following scenario.

Analyzing Professional Sports Team Colors with R, Part 2

NOTE: This write-up picks up where the previous one left off, and all of the session data is carried over. Color Similarity: Now, I’d like to evaluate color similarity more closely. To help check any quantitative deductions against some intuition, I’ll consider only a single league for this: the NBA, the league that I know best. Because I’ll end up plotting team names at some point and some of the full names are relatively lengthy, I want to get the official abbreviation for each team.

Analyzing Professional Sports Team Colors with R

When working with the ggplot2 package, I often find myself playing around with colors for longer than I probably should. I think this is because I know that the right color scheme can greatly enhance the information that a plot conveys, and, conversely, a poorly matched palette can obscure the message of an otherwise good visualization. With that in mind, I wanted to take a look at the use of colors in the sports realm.

NBA Team Twitter Analysis Flexdashboard

I just wrapped up a mini-project that allowed me to do a handful of things I’ve been meaning to do: try out the flexdashboard R package; test out my (mostly completed) personal tetext package for quick and tidy text analysis, which implements a handful of the techniques shown by David Robinson and Julia Silge in their blogs and in their book Text Mining with R: A Tidy Approach; and explore the interaction of social media and the NBA, which is well regarded for leading the sports industry in engaging fans through modern technology.

A Tidy Text Analysis of R Weekly Posts

I’m always intrigued by “meta” analyses of programming and data science, such as Matt Dancho’s analysis of renowned data scientist David Robinson. David Robinson himself has done some good ones, such as his blog posts for Stack Overflow highlighting the “incredible” growth of Python and the “impressive” growth of R. With that in mind, I thought I would try to identify whether any interesting trends have risen or fallen within the R community in recent years.

Conversion of Old Posts to Bookdown

I’m happy to announce that I’ve finished converting the bulk of my old posts to an e-book, using Yihui Xie’s wonderful bookdown package. The e-book is live on the docs branch of a GitHub repo. The posts (now chapters) apply concepts from the field of decision analysis to evaluate “value” in the NBA Draft. Although analysis of the NBA draft itself is certainly not novel, I think my approach is fairly original.

Dealing with Interval Data and the nycflights13 package using R, Part 2

In this post, I’ll continue my discussion of working with regularly sampled interval data using R. (See my previous post for some insight regarding minute data.) The discussion here focuses more on function design. Daily Data: When I’ve worked with daily data, I’ve found that the .csv files tend to be much larger than those for data sampled on a minute basis (as a consequence of each file holding data for sub-daily intervals).

Dealing with Interval Data and the nycflights13 package using R

In my job, I often work with data sampled at regular intervals. Samples may range from 5-minute intervals to daily intervals, depending on the specific task. While working with this kind of data is straightforward when it’s in a database (and I can use SQL), I have been in a couple of situations where the data is spread across .csv files. In these cases, I lean on R to scrape and compile the data.

A Tidy Text Analysis of My Google Search History

While brainstorming about cool ways to practice text mining with R, I came up with the idea of exploring my own Google search history. Then, after googling (ironically) to see whether anyone had done something like this, I stumbled upon Lisa Charlotte’s blog post. Lisa’s post (actually, a series of posts) is from a while back, so her instructions for how to download your personal Google history and the format of the downloads (nowadays, it’s in a .
