Although I’ve been an R user for some time, and have taught a variety of courses in R for statistics, I’ve never been a great user of the data science elements of R; I had a little spare time over the summer and have been trying to catch up with the tidyverse, mostly by starting with Hadley Wickham’s excellent book, R for Data Science

Whilst I’m not sure I’ll ever be a data scientist, I find the power of this quite amazing, especially compared to how I used to teach graphing in R. It does take a little more time, but filtering large data sets in R, and graphing becomes a breeze.

I’ve been working for some time on a statistical model for test cricket, which seems quite promising. I’ve used the yorkr package , modified a little for test cricket, in order to download every ball of test cricket from the excellent cricsheet website. There’s some 415 published test matches, and after some data issues I’ve so far successfully converted 399 of them.

Anyway, to demonstrate how easy it is to get interesting results using the tidyverse, here’s some data on the number of runs scored and overs faced for each test wicket.