Unifying principles
The tidyverse is a language for solving data science challenges with R code. Its primary goal is to facilitate the conversation that a human has with a dataset, and we want to help dig a “pit of success” where the least-effort path trends towards a positive outcome. The primary tool to dig the pit is API design: by carefully considering the external interface to a function, we can help guide the user towards success.
-
The tidyverse has four guiding principles:
It is human centered, i.e. the tidyverse is designed specifically to support the activities of a human data analyst.
It is consistent, so that what you learn about one function or package can be applied to another, and the number of special cases that you need to remember is as small as possible.
It is composable, allowing you to solve complex problems by breaking them down into small pieces, supporting a rapid cycle of exploratory iteration to find the best solution (see the pipeline sketch after this list).
It is inclusive, because the tidyverse is not just a collection of packages but also the community of people who use them.
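As a minimal sketch of composability, the pipeline below combines small, single-purpose dplyr verbs to answer one question; the mtcars dataset and the particular summary are illustrative choices, not taken from the original text.

```r
library(dplyr)

mtcars %>%
  filter(cyl %in% c(4, 6)) %>%   # keep a subset of rows
  group_by(cyl) %>%              # split by one variable
  summarise(
    n = n(),                     # count cars per group
    mean_mpg = mean(mpg)         # average fuel economy per group
  ) %>%
  arrange(desc(mean_mpg))        # order the result
```

Each verb does one small job, so the same pieces can be rearranged or extended as the analysis evolves.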
The idea of “chunking” is important. There is some setup cost to learning a new chunk, but once you’ve internalised it, it takes up only one spot in your working memory. In some sense, the goal of the tidyverse is to discover the minimal set of chunks needed to do data science, and to have some sense of the priority of the remainder.
Most importantly, the performance of code depends not only on how long it takes to run, but also on how long it takes to write and read. Human brains are typically slower than computers, so we spend a lot of time thinking about how to create intuitive interfaces, focussing on writing and reading speed. Intuitive interfaces are sometimes at odds with running speed, because writing the fastest code for a problem often requires designing the interface for performance rather than usability. Generally, we optimise first for humans, then use profiling to discover the bottlenecks that cause friction in data analysis. Once we have identified an important bottleneck, performance becomes a priority and we rewrite the existing code. Generally, we attempt to preserve the existing interface, changing it only when the performance implications are significant.
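The sketch below illustrates that workflow under stated assumptions: slow_summary() and fast_summary() are hypothetical names, and the readable-but-naive first version is only rewritten once timing shows it matters, while its interface stays the same.

```r
# Hypothetical, deliberately naive analysis step: readable, but loops
# over columns instead of using vectorised code.
slow_summary <- function(df) {
  out <- numeric(ncol(df))
  for (i in seq_along(df)) {
    out[i] <- mean(df[[i]])
  }
  out
}

# Measure first; only rewrite if this shows up as a real bottleneck.
system.time(slow_summary(mtcars))

# A faster rewrite that preserves the interface
# (data frame in, numeric vector out):
fast_summary <- function(df) colMeans(df)
system.time(fast_summary(mtcars))
```

For larger pipelines, a line-by-line profiler such as profvis::profvis() plays the same role as system.time() here.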
-
There are two ways that we make functions consistent that are so important that they’re explicitly pulled out as high-level principles below:
Functions should be composable: each individual function should tackle one well-contained problem, and you solve complex real-world problems by composing many individual functions.
Overall, the API should feel “functional”, which is a technical term for the programming paradigm favoured by the tidyverse.
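A minimal sketch of that functional feel, assuming the purrr package: a small pure function is applied to every column with map() rather than written as a loop that mutates state. The rescaling step is an illustrative example, not from the original text.

```r
library(purrr)

# A small pure function that rescales a numeric vector to [0, 1]
rescale01 <- function(x) (x - min(x)) / (max(x) - min(x))

# Apply it to every column and collect the results, instead of writing a loop
scaled <- map(mtcars, rescale01)

# The same idea inside a data-frame pipeline:
# dplyr::mutate(mtcars, dplyr::across(dplyr::everything(), rescale01))
```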