Generating a Gallery of Visualizations for a Static Website (using R)

While I was browsing the website of fellow R blogger Ryo Nakagawara, I was intrigued by his “Visualizations” page. The concept of creating an online “portfolio” is not novel, but I hadn’t thought to make one as a compilation of my own work (from blog posts)… until now 😄. The code that follows shows how I generated the body of my visualization portfolio page. The task is achieved in a couple of steps.

Making a Cheat Sheet with Rmarkdown

Unfortunately, I haven’t had as much time to make blog posts in the past year or so. I started taking classes as part of Georgia Tech’s Online Master of Science in Analytics (OMSA) program last summer (2018) while continuing to work full-time, so extra time to code and write hasn’t been abundant for me. Anyways, I figured I would share one neat thing I learned as a consequence of taking classes—writing compact “cheat sheets” with {rmarkdown}.
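For reference, the heart of such a cheat sheet is little more than a compact YAML header in the .Rmd file. A sketch (these particular geometry and font values are just my own starting point, not a prescribed recipe):

```yaml
---
title: "Cheat Sheet"
output:
  pdf_document:
    latex_engine: xelatex
geometry: margin=0.3in
fontsize: 8pt
classoption: landscape
---
```

Shrinking the margins and font size (and going landscape) is what lets a whole course’s worth of notes fit on a page or two.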

Text Parsing and Text Analysis of a Periodic Report (with R)

Some Context Those of you non-academia folk who work in industry (like me) are probably conscious of any/all periodic reports that an independent entity publishes for your company’s industry. For example, in the insurance industry in the United States, the Federal Insurance Office of the U.S. Department of the Treasury publishes several reports on an annual basis discussing the industry at large, like this past year’s Annual Report on the Insurance Industry.

Summarizing rstudio::conf 2019 Summaries with Tidy Text Techniques

UPDATE (2019-07-07): Check out this {usethis} article for a more automated way of doing a pull request. To be honest, I planned on writing a review of this past weekend’s rstudio::conf 2019, but several other people have already done a great job of doing that—just check out Karl Broman’s aggregation of reviews at the bottom of the page here! (More on this in a second.) In short, my thoughts on the whole experience are captured perfectly by Nick Strayer’s tweet the day after the conference ended.

A Newbie's Guide to Making A Pull Request (for an R package)

I had the wonderful opportunity to participate in the {tidyverse} Developer Day the day after rstudio::conf 2019 officially wrapped up. One of the objectives of the event was to encourage open-source contributor newbies (like me 😄) to gain some experience, namely through submitting pull requests to address issues with {tidyverse} packages. Having only ever worked with my own packages/repos before, I found this to be the perfect opportunity to “get my feet wet”!

Re-creating a Voronoi-Style Map with R

Introduction I’ve written some “tutorial”-like content recently—see here, here, and here—but I’ve been lacking ideas for “original” content since then. With that said, I thought it would be fun to try to re-create something with R. (Not too long ago I saw that Andrew Heiss did something akin to this with Charles Minard’s well-known visualization of Napoleon’s 1812 campaign.) The focus of my re-creation here is the price contour map shown on the front page of the website for the Electric Reliability Council of Texas, the independent system operator of electric power flow for about 90 percent of Texas residents (as well as the employer of yours truly).

Converting nested JSON to a tidy data frame with R

UPDATE: The data retrieval demonstrated in this post no longer seems to work due to a change in ESPN’s “secret” API. In any case, the techniques for working with JSON data are still valid. In this “how-to” post, I want to detail an approach that others may find useful for converting nested (nasty!) JSON to a tidy (nice!) data.frame/tibble that should be much easier to work with.
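To give a flavor of the approach (with a made-up JSON snippet standing in for the API response — the field names here are hypothetical, not ESPN’s actual schema), the key idea is that {jsonlite} parses nested arrays into list-columns, which {tidyr} can then unnest:

```r
library(jsonlite)
library(dplyr)
library(tidyr)

# A small nested JSON string standing in for an API response.
json <- '{
  "teams": [
    {"name": "A", "players": [{"id": 1, "pts": 10}, {"id": 2, "pts": 7}]},
    {"name": "B", "players": [{"id": 3, "pts": 12}]}
  ]
}'

parsed <- fromJSON(json, simplifyDataFrame = TRUE)

# `parsed$teams` is a data frame whose `players` column is a list of
# data frames; unnest() flattens it into one tidy row per player.
tidy <- parsed$teams %>%
  as_tibble() %>%
  unnest(players)
```

The same pattern — parse, identify the list-columns, unnest — scales to arbitrarily deep nesting, one level at a time.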

The Split-Apply-Combine Technique for Machine Learning with R

Introduction Much discussion in the R community has revolved around the proper way to implement the “split-apply-combine” technique. In particular, I love the exploration of this topic in this blog post. It seems that the “preferred” approach is dplyr::group_by() + tidyr::nest() for splitting, dplyr::mutate() + purrr::map() for applying, and tidyr::unnest() for combining. Additionally, many in the community have shown implementations of the “many models” approach in {tidyverse}-style pipelines, often also using the {broom} package.
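A minimal sketch of that pipeline, using mtcars as stand-in data (the actual post’s data and models differ):

```r
library(dplyr)
library(tidyr)
library(purrr)
library(broom)

mtcars %>%
  group_by(cyl) %>%                               # split
  nest() %>%
  mutate(
    fit   = map(data, ~ lm(mpg ~ wt, data = .x)), # apply
    coefs = map(fit, tidy)
  ) %>%
  select(cyl, coefs) %>%
  unnest(coefs)                                   # combine
```

The end result is one tidy data frame of per-group model coefficients, which is exactly what makes the “many models” workflow so convenient.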

Fuzzy Matching with Texas High School Academic Competition Results and SAT/ACT Scores

Introduction As a follow-up to a previous post about correlations between Texas high school academic UIL competition scores and SAT/ACT scores, I wanted to explore some of the “alternatives” to joining the two data sets—which come from different sources. In that post, I simply performed an inner_join() using the school and city names as keys. While this decision ensures that the data integrity is “high”, there are potentially many unmatched schools that could have been included in the analysis with some sound “fuzzy matching”.
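To illustrate the idea (with made-up school names; the {fuzzyjoin} package is one option in this vein), an exact join misses near-matches that a string-distance join can recover:

```r
library(dplyr)
library(fuzzyjoin)

# Toy school lists from two sources with slightly different spellings.
uil <- tibble(school = c("CLEMENS", "ST JOHNS"), score = c(90, 85))
sat <- tibble(school = c("CLEMENS H S", "ST. JOHN'S"), math = c(510, 530))

# A strict join on the raw names matches nothing here...
inner_join(uil, sat, by = "school")

# ...while a string-distance join recovers the approximate matches.
stringdist_inner_join(uil, sat, by = "school", max_dist = 4)
```

The trade-off, of course, is that a loose `max_dist` risks false matches — hence the need for “sound” fuzzy matching rather than blind application.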

Visualizing Texas High School SAT Math Scores with Bubble Grids

Two awesome things inspired this post: {ggplot2}’s version 3.0 release on CRAN, including full support for the {sf} package and the new functions geom_sf() and coord_sf(), which make plotting data from shapefiles very straightforward; and Jonas Scholey’s blog post discussing the use of “bubble grid” maps as an alternative to choropleth maps, which seem to be more prevalent. As Jonas implies, using color as a visual encoding is not always the best option, a notion with which I strongly agree.
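A rough sketch of the points-over-polygons idea with geom_sf(), using the North Carolina shapefile bundled with {sf} as demo data (a true bubble grid would sample values on a regular grid, e.g. via st_make_grid(); this simplified version places one bubble per polygon):

```r
library(ggplot2)
library(sf)

nc <- st_read(system.file("shape/nc.shp", package = "sf"), quiet = TRUE)

# Encode the value with point size at each polygon's centroid,
# rather than with a choropleth fill.
ggplot(nc) +
  geom_sf(fill = "grey95", color = "grey80") +
  geom_sf(data = st_centroid(nc), aes(size = BIR74)) +
  theme_minimal()
```

Size, unlike color, gives the reader a common visual scale to compare against, which is the crux of the bubble-grid argument.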

Correlations Between Texas High School Academic Competition Results and SAT/ACT Scores

Introduction I wanted to do a follow-up on my series of posts about Texas high school University Interscholastic League (UIL) academic competitions to more closely evaluate the relationship between school performance in those competitions and school-wide SAT and ACT scores. For those who may not be familiar with these tests, they are the two most popular standardized tests used for college admission in the United States. In my introduction to that series, I stated the following: School-wide … scores on state- and national-standardized tests (e.

An Analysis of Texas High School Academic Competition Results, Part 1 - Introduction

NOTE: This is part of a series of write-ups discussing my findings of Texas high school academic University Interscholastic League (UIL) competitions. To keep this and the other write-ups concise and to focus reader attention on the content, I have decided not to show the underlying code (especially that which is used to create the visuals).

An Analysis of Texas High School Academic Competition Results, Part 2 - Competitions

Competition Participation Some of the first questions that might come to mind are those regarding the number of schools in each level of competition (District, Region, and State) and each conference classification level (1A, 2A, … 6A). It seems fair to say that the distribution of schools among Districts, Regions, and Conferences is relatively even. This is to be expected since the UIL presumably tries to divide schools evenly among each grouping (to the extent possible) in order to stimulate fair competition.

An Analysis of Texas High School Academic Competition Results, Part 3 - Individuals

Let’s take a look at individual competitors in the academic UIL competitions. Individual Participation The first question that comes to mind is that of participation–which individuals have competed the most? NOTE: To give some context to the values for individual participants, I’ll include the numbers for myself (“Elhabr, Anthony”) in applicable contexts.

 rnk   name             school       city         conf    n
    1  Jansa, Wade      GARDEN CITY  GARDEN CITY     1   57
    2  Chen, Kevin      CLEMENTS     SUGAR LAND      5   56
    3  Hanson, Dillon   LINDSAY      LINDSAY         1   53
    4  Gee, John        CALHOUN      PORT LAVACA     4   47
    5  Zhang, Mark      CLEMENTS     SUGAR LAND      5   47
    6  Robertson, Nick  BRIDGE CITY  BRIDGE CITY     3   46
    7  Ryan, Alex       KLEIN        KLEIN           5   46
    8  Strelke, Nick    ARGYLE       ARGYLE          3   45
    9  Niehues, Taylor  GARDEN CITY  GARDEN CITY     1   44
   10  Bass, Michael    SPRING HILL  LONGVIEW        3   43
 1722  Elhabr, Anthony  CLEMENS      SCHERTZ         4   13

Note: # of total rows: 123,409

An Analysis of Texas High School Academic Competition Results, Part 4 - Schools

Having investigated individuals elsewhere, let’s now take a look at the schools. NOTE: Although I began the examinations of competitions and individuals by looking at volume of participation (to provide context), I’ll skip an analogous discussion here because the participation of schools is shown indirectly through those analyses. School Scores Let’s begin by looking at some of the same metrics shown for individual students, but aggregated across all students for each school.

An Analysis of Texas High School Academic Competition Results, Part 5 - Miscellaneous

There’s a lot to analyze with the Texas high school academic UIL data set. Maybe I find it more interesting than others due to my personal experiences with these competitions. Now, after examining some of the biggest topics associated with this data–including competitions, individuals, and schools–in a broad manner, there are some other things that don’t necessarily fall into these categories that I think are worth investigating. Siblings Let’s look at the performance of siblings.

The DRY Principle and Knowing When to Make a Package

Don’t Repeat Yourself (DRY) Probably everyone who has done some kind of programming has heard of the “Don’t Repeat Yourself” (DRY) principle. In a nutshell, it’s about reducing code redundancy for the purpose of reducing error and enhancing readability. Undoubtedly the most common manifestation of the DRY principle is the creation of a function for re-used logic. The “rule of 3” is a good shorthand for identifying when you might want to rethink how your code is organized: “You should consider writing a function whenever you’ve copied and pasted a block of code more than twice (i.
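The canonical illustration (a toy example of my own, not from the post itself): once the same rescaling logic has been pasted a third time, it belongs in a function.

```r
# Before: (x - min(x)) / (max(x) - min(x)) copy-pasted once per column.
# After the "rule of 3" kicks in: one function, used everywhere.
rescale01 <- function(x) {
  rng <- range(x, na.rm = TRUE)
  (x - rng[1]) / (rng[2] - rng[1])
}

df <- data.frame(a = 1:5, b = c(10, 20, 15, 30, 25))
df$a <- rescale01(df$a)
df$b <- rescale01(df$b)
```

Beyond fewer keystrokes, the win is that a bug (say, forgetting `na.rm = TRUE`) now only needs to be fixed in one place.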

Thoughts on Using Flexdashboard

I’ve experimented with the {flexdashboard} package for a couple of things after first trying it out not so long ago. In particular, I found the storyboard format to be my favorite. I used it to create the storyboard that I wrote about in a previous post about tracking the activity of NBA team Twitter accounts. I also used {flexdashboard} for a presentation that I gave at my company’s data science group.

Investigating Ranks, Monotonicity, and Spearman's Rho with R

The Problem I have a bunch of data that can be categorized into many small groups. Each small group has a set of values for an ordered set of intervals. Having observed that the values for most groups seem to increase with the order of the interval, I hypothesize that there is a statistically significant, monotonically increasing trend. An Analogy To make this abstract problem more relatable, imagine the following scenario.
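In R terms, the hypothesis boils down to a one-sided Spearman rank-correlation test. A simulated version of the scenario (the real data is, of course, different):

```r
set.seed(42)
interval <- 1:20
value <- interval + rnorm(20, sd = 5)  # tends to increase, with noise

# Spearman's rho tests for a monotonic (not necessarily linear) trend;
# alternative = "greater" encodes the "increasing" direction.
cor.test(interval, value, method = "spearman", alternative = "greater")
```

Because Spearman’s rho works on ranks, it captures any monotone relationship, which is exactly what the problem statement calls for.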

Analyzing Professional Sports Team Colors with R, Part 2

NOTE: This write-up picks up where the previous one left off. All of the session data is carried over. Color Similarity Now, I’d like to evaluate color similarity more closely. To help verify any quantitative deductions with some intuition, I’ll consider only a single league for this–the NBA, the league that I know the best. Because I’ll end up plotting team names at some point and some of the full names are relatively lengthy, I want to get the official abbreviations for each team.