One of the new chapters I pushed yesterday to Wrangling F1 Data With R covers "streakiness", so I thought I'd try using the routines described there to review in-season winning streaks of length five or more from previous seasons, using data from the ergast database.
As a first (optimisation) pass, I thought I'd identify British drivers who have won 5 or more races in a season; this could then be followed by looking for streaks of 5 or more wins by those drivers within their multiple-win seasons.
First, we can get the drivers of a particular nationality with five or more wins within a season by querying the ergast database using a query along the lines of:
# ergastdb is an open database connection to a local copy of the ergast database
multiwinners.gb = dbGetQuery(ergastdb,
  'SELECT driverRef, d.driverId, nationality, MAX(wins), year
   FROM driverStandings ds JOIN races r JOIN drivers d
   WHERE ds.raceId=r.raceId AND ds.driverId=d.driverId
     AND ds.driverId IN (SELECT DISTINCT driverId FROM drivers WHERE nationality="British")
   GROUP BY year,d.driverId
   HAVING MAX(wins)>=5')
This gives us a set of results of the form:
    driverRef driverId nationality MAX(wins) year
1       clark      373     British         7 1963
2       clark      373     British         6 1965
3     stewart      328     British         6 1969
4     stewart      328     British         6 1971
5     stewart      328     British         5 1973
6        hunt      231     British         6 1976
7     mansell       95     British         5 1986
8     mansell       95     British         6 1987
9     mansell       95     British         5 1991
10    mansell       95     British         9 1992
11 damon_hill       71     British         6 1994
12 damon_hill       71     British         8 1996
13   hamilton        1     British         5 2008
14     button       18     British         6 2009
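To make the grouping step of the query more concrete, here is a minimal base R sketch of the same idea on a small made-up standings table. The `standings` data frame and its values are invented for illustration and are not taken from ergast. The sketch assumes, as the query does, that `wins` in a standings table is a running total after each race, so the maximum per driver and year is the season total.

```r
# Toy standings table (invented values): one row per driver per race,
# with wins holding the cumulative win count after that race.
standings <- data.frame(
  driverRef = rep(c("a", "b", "a"), each = 6),
  year      = rep(c(2001, 2001, 2002), each = 6),
  wins      = c(1, 2, 3, 4, 5, 5,   # driver a, 2001
                0, 0, 0, 0, 0, 1,   # driver b, 2001
                0, 1, 1, 2, 2, 3)   # driver a, 2002
)

# Counterpart of GROUP BY year, driverId with MAX(wins)
season.wins <- aggregate(wins ~ driverRef + year, data = standings, FUN = max)

# Counterpart of HAVING MAX(wins) >= 5
multiwinners <- season.wins[season.wins$wins >= 5, ]
print(multiwinners)
```

Only the driver and season combinations whose season total reaches five survive the filter, which is the role the HAVING clause plays in the SQL version.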
We can then generate streak reports for each of those drivers in each of those years using the streakReview() function, which identifies the following streaks of five or more wins within a season by a British driver:
library(plyr)  # provides ddply()
ddply(multiwinners.gb,.(driverRef,year),function(x) streakReview(x$driverRef,length=5,topN=1,years=x$year,typ=1))

  driverRef year start end l                startc                          endc starty
1     clark 1965     1   6 6 Prince George Circuit                   Nürburgring   1965
2   mansell 1992     1   5 5               Kyalami Autodromo Enzo e Dino Ferrari   1992
  endy brokenbyy                    brokenbyc
1 1965      1965 Autodromo Nazionale di Monza
2 1992      1992            Circuit de Monaco
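- In 1965, Jim Clark won six races in a row, starting with a win at Prince George Circuit with the last win in the streak at the Nürburgring. These were his first six starts of the season rather than the first six rounds, since he did not contest the Monaco Grand Prix.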
- In 1992, Nigel Mansell won the first five rounds of the season, starting at Kyalami with the final win of the streak at Autodromo Enzo e Dino Ferrari.
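The streakReview() function itself is described in the book and is not reproduced here. As a rough illustration of how win streaks like these can be detected, the following base R sketch applies run-length encoding (`rle()`) to a made-up sequence of finishing positions. The helper name `winStreaks` and the data are assumptions for illustration only, not the book's implementation.

```r
# Hypothetical helper: find runs of consecutive wins of at least minLength
# in a vector of finishing positions ordered by race.
# Assumes there are no missing values in positions.
winStreaks <- function(positions, minLength = 5) {
  runs  <- rle(positions == 1)         # alternating runs of win / not-win
  end   <- cumsum(runs$lengths)        # index of the last race in each run
  start <- end - runs$lengths + 1      # index of the first race in each run
  keep  <- runs$values & runs$lengths >= minLength
  data.frame(start = start[keep], end = end[keep], l = runs$lengths[keep])
}

# Invented season: wins in the first five races, then mixed results
positions <- c(1, 1, 1, 1, 1, 2, 1, 1, 3, 1)
streaks <- winStreaks(positions)
print(streaks)
```

The `start`, `end` and `l` columns here mirror the columns of the same name in the report above, although how streakReview() derives them may differ.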
For more detailed code examples on wrangling Formula One data with R, see the Wrangling F1 Data With R book.