fuzzyjoin

Fuzzy Matching with Texas High School Academic Competition Results and SAT/ACT Scores

Introduction

As a follow-up to a previous post about correlations between Texas high school academic UIL competition scores and SAT/ACT scores, I wanted to explore some of the “alternatives” for joining the two data sets, which come from different sources. In that post, I simply performed an inner_join() using the school and city names as keys. While this decision ensures that the data integrity is “high”, there are potentially many unmatched schools that could have been included in the analysis with some sound “fuzzy matching”.
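
To make the trade-off concrete, here is a minimal sketch contrasting an exact join with a fuzzy one. The two tiny data frames, their column names (`uil_score`, `sat_total`), and the school names are made up purely for illustration and are not the real UIL or SAT/ACT data. The `max_dist = 1` threshold is likewise an arbitrary assumption.

```r
library(dplyr)
library(fuzzyjoin)

# Hypothetical toy data: the same two schools, spelled differently by each source.
uil <- tibble(
  school    = c("ALPHA", "WESTLAKE"),
  city      = c("AUSTIN", "AUSTIN"),
  uil_score = c(100, 120)
)
sat <- tibble(
  school    = c("ALPHA", "WEST LAKE"),
  city      = c("AUSTIN", "AUSTIN"),
  sat_total = c(1100, 1200)
)

# Exact join: only rows whose school AND city strings are identical survive,
# so "WESTLAKE" / "WEST LAKE" is dropped.
exact <- inner_join(uil, sat, by = c("school", "city"))

# Fuzzy join: rows match when every key column is within `max_dist` edits.
# The key columns from each side are kept, suffixed with .x and .y.
fuzzy <- stringdist_inner_join(uil, sat, by = c("school", "city"), max_dist = 1)

nrow(exact)
nrow(fuzzy)
```

The fuzzy version recovers the school that the exact join discards, but a looser threshold would also start pairing schools that are genuinely different, which is why the matching needs to be “sound”.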