In this post, I’ll continue my discussion of working with regularly sampled interval data using R. (See my previous post for some insight regarding minute data.) The discussion here focuses more on function design.
Daily Data
When I’ve worked with daily data, I’ve found that the .csv files tend to be much larger than those for data sampled on a minute basis (a consequence of each file holding data for sub-daily intervals).
In my job, I often work with data sampled at regular intervals. Samples may range from 5-minute intervals to daily intervals, depending on the specific task. While working with this kind of data is straightforward when it’s in a database (and I can use SQL), I have been in a couple of situations where the data is spread across .csv files. In these cases, I lean on R to scrape and compile the data.
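As a minimal sketch of what that compilation can look like, base R alone can gather a directory of .csv files into one data frame. The directory layout, function name, and the assumption that every file shares the same columns are all illustrative, not a description of any specific project:

```r
# Minimal sketch: read every .csv in a directory and stack them into
# one data frame. Assumes all files share identical column names.
read_all_csvs <- function(dir) {
  paths <- list.files(dir, pattern = "\\.csv$", full.names = TRUE)
  dfs <- lapply(paths, read.csv, stringsAsFactors = FALSE)
  do.call(rbind, dfs)
}
```

In practice you might swap `read.csv()` for a faster reader or add per-file metadata (e.g. the file name as a column), but the list-then-bind pattern stays the same.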
While brainstorming about cool ways to practice text mining with R, I came up with the idea of exploring my own Google search history. Then, after googling (ironically) whether anyone had done something like this, I stumbled upon Lisa Charlotte’s blog post. Lisa’s post (actually, a series of posts) is from a while back, so her instructions for how to download your personal Google history and the format of the downloads (nowadays, it’s in a .
As a person who’s worked with various programming languages over time, I have become interested in the nuances and overlaps among languages. In particular, I’m fascinated by concepts related to code syntax and organization: everything from technical matters such as lexical scoping to broader ones such as importing and naming data. Organization “enthusiasts” like me truly appreciate software and applications that follow consistent norms.
In the R community, the tidyverse ecosystem has become extremely popular because it implements a consistent, intuitive framework that is, consequently, easy to learn.
If you’re not completely new to the data science community (specifically, the #rstats community), then you’ve probably seen a version of the “famous” data science workflow diagram.
If one is fairly familiar with a certain topic, then one might not spend much time on the initial “visualize” step of the workflow. Such is the case with me and NBA data: as a relatively knowledgeable NBA follower, I don’t necessarily need to spend much time exploring raw NBA data prior to modeling.
As of today, I’ve officially made the jump to using the R package blogdown (which uses the Hugo static-site generator under the hood) for my personal website. Previously, I had been using WordPress for my blogging purposes. In sync with the change in platform, I’m changing the name of this site from “Number Sense” (www.numbersense.org) to something involving my name (Tony ElHabr). Nonetheless, my original intent to write about math, sports, and data-related things has not changed.