F1 Data Junkie: F1 2011 Progress Throughout the Year

Prompted by a comment from @sidepodcast, I thought I'd try to come up with some ways of tracking how the teams made progress over the course of 2011. This is still a work in progress, but here are some early thoughts based on fastest lap time recorded by the teams in each car recorded across the season.

The charts I've been exploring rely on taking the fastest overall laptime for each race, and then using this to scale the fastest laptime of every car in the race according the formula:

scaled time for car N = fastest lap time for car N / fastest overall laptime

The fastest car thus has scaled laptime of 1.0, and slower cars have a scaled laptime of greater than 1.

Looking at the times for a particular team, we get scatterplots that look something like this:

Looking at these times, we see that Mercedes appeared to be off the pace compared to the fastest lapping car in Australia, had a reasonable run from Malaysia to Turkey, fell back in Spain but then steadily improved (relative to the fastest lapping car) in the run through to Japan (with a blip in Singapore), then fell back but held steady over the last four races.

We could instead connect the points to generate a line chart, but the impression we get as a result is rather ragged and it can be hard to detect any meaningful trend across the season:

If we group the drivers in each team to get two samples (usually!) for the fastlap time for each race, and then generate a simple linear model/best fit line through this data, we can generate a visualisation that may help us identify a general trend in performance of each team based on a comparison with the fastest lap times recorded overall. These trend lines then suggest how much progress each team makes in the wider scheme of things - that is, compared to the fastest lapping car in each race. (Note that we may get a different picture if we compared the laptimes of one team with the fastest laptime recorded by another team... which I guess I could leave as an exercise for the reader?!;-))

Other best fit models are, of course, possible - here's the LOESS algorithm

Here are the charts showing the best fit lines for each team (note that outlier times have been excluded from the best fit line calculation (err, I think?!), first a simple linear model:

Then using LOESS:

See also: F1 2011 Review - Another Look at Fastest Laptime Evolution for an improved chart that plots the fastest laptime recorded by each team for each race.

Note: whilst I was collecting timing data over the course of the year, I cheated for this post and grabbed the data from the formula1.com results pages. Here's the Python script I used to scrape the data (data as a spreadsheet).

I then used R via RStudio and the ggplot2 library to generate the plots:

require(ggplot2)
#load in the CSV filed saved from the spreadsheet
#Note: we could load it in direct from the spreadsheet
#fastestLaps2011x <- read.csv("~/code/f1/fastestLaps2011.csv")

#Find the fastest times in each race
mintimes=tapply(fastestLaps2011x$stime,fastestLaps2011$race,min)
#Add the fastest time in each race to each row
fastestLaps2011x$min=sapply(fastestLaps2011x$race,function(d) mintimes[d])
#Calculate the fastest lap time ratio 
fastestLaps2011x$minstimepc=fastestLaps2011x$stime/fastestLaps2011$min

#Order the levels in the race factor in terms of calendar order
fastestLaps2011x$race=factor(fastestLaps2011$race,levels=c("AUSTRALIA","MALAYSIA","CHINA","TURKEY","SPAIN","MONACO","CANADA","EUROPE","GREAT BRITAIN","GERMANY","HUNGARY","BELGIUM","ITALY","SINGAPORE","JAPAN","KOREA","INDIA","ABU DHABI","BRAZIL"),ordered=T)

#Order the teams
fastestLaps2011x$team=factor(fastestLaps2011$team,levels=c("RBR-Renault","McLaren-Mercedes","Ferrari","Mercedes","Renault","Force India-Mercedes","Sauber-Ferrari","STR-Ferrari","Williams-Cosworth","Lotus-Renault","HRT-Cosworth","Virgin-Cosworth"),ordered=T)

#Command used to generate the Mercedes scatterplot
ggplot(subset(fastestLaps2011x,driverNum==7 | driverNum==8)) + geom_point(aes(x=race,y=minstimepc,col=factor(driverNum))) + opts(title="F1 2011 - Fastest Lap Comparison (Mercedes)",axis.text.x=theme_text(angle=-90)) + scale_colour_discrete(name="Driver Number") + xlab(NULL) + ylab("Fastest laptime as % of fastest overall laptime") + ylim(1.00,1.1)

#Command used to generate the Mercedes line plot
ggplot(subset(fastestLaps2011x,driverNum==7 | driverNum==8)) + geom_line(aes(x=race,y=minstimepc,col=factor(driverNum),group=driverNum)) + opts(title="F1 2011 - Fastest Lap Comparison (Mercedes)",axis.text.x=theme_text(angle=-90)) + scale_colour_discrete(name="Driver Number") + xlab(NULL) + ylab("Fastest laptime as % of fastest overall laptime") + ylim(1.00,1.1)

#Command used to generate the Mercedes scatterplot plot and simple linear model
ggplot(subset(fastestLaps2011x,driverNum==7 | driverNum==8)) + geom_point(aes(x=race,y=minstimepc,col=factor(driverNum))) + opts(title="F1 2011 - Fastest Lap Comparison (Mercedes)",axis.text.x=theme_text(angle=-90)) + scale_colour_discrete(name="Driver Number") + xlab(NULL) + ylab("Fastest laptime as % of fastest overall laptime") + ylim(1.00,1.1) + stat_smooth(method="lm",aes(x=race,y=minstimepc,group=team,col=factor(team)), se=FALSE)

#Command used to generate the Mercedes scatterplot plot and loess best fit
ggplot(subset(fastestLaps2011x,driverNum==7 | driverNum==8)) + geom_point(aes(x=race,y=minstimepc,col=factor(driverNum))) + opts(title="F1 2011 - Fastest Lap Comparison (Mercedes)",axis.text.x=theme_text(angle=-90)) + scale_colour_discrete(name="Driver Number") + xlab(NULL) + ylab("Fastest laptime as % of fastest overall laptime") + ylim(1.00,1.1) + stat_smooth(method="loess",aes(x=race,y=minstimepc,group=team,col=factor(team)), se=FALSE)

#Command used to generate team based linear models 
ggplot(fastestLaps2011x)+stat_smooth(method="lm",aes(x=race,y=minstimepc,group=team,col=factor(team)), se=FALSE)+ylim(0.99,1.10)+opts(title="F1 2011 Fastest Laptime Evolution",axis.text.x=theme_text(angle=-90))+scale_colour_manual(name="Teams",values = c("blue","darkgray","red","lightsteelblue3","goldenrod3","darkorange","gray8","firebrick4","midnightblue","darkgreen","gray0","darkred")) + xlab(NULL) + ylab("Fastest laptime as % of fastest overall laptime")

#Command used to generate team based loess models 
ggplot(fastestLaps2011x)+stat_smooth(method="loess",aes(x=race,y=minstimepc,group=team,col=factor(team)), se=FALSE)+ylim(0.99,1.10)+opts(title="F1 2011 Fastest Laptime Evolution",axis.text.x=theme_text(angle=-90))+scale_colour_manual(name="Teams",values = c("blue","darkgray","red","lightsteelblue3","goldenrod3","darkorange","gray8","firebrick4","midnightblue","darkgreen","gray0","darkred")) + xlab(NULL) + ylab("Fastest laptime as % of fastest overall laptime")

4 comments:

Tony HirstDecember 27, 2011 at 1:09 AM
Hmm... I wonder if, on the x-axis, i should space the races out according to week number, as a way of trying to distinguish between back to back races and races with rather more development time between them?
Tony HirstDecember 27, 2011 at 10:31 PM
A couple more commands that may or may not be useful.

fl=fastestLaps2011x

#An alternative way of reporting the minimum laptime per race:
by(fl,fl[,"race"],function(d) min(d$stime))
#I'm not sure how to use this data though?

#Reporting the minimum laptime per team per race
by(fl,fl[,c("race","team")],function(d) min(d$stime))
Tony HirstDecember 28, 2011 at 12:52 AM
Moving on a little more, rather than use by(), we can use the plyr function, ddply():

require(plyr)
#find the min overall laptime in each race, assigning an appropriate column name as we do so
flm=ddply(.variables=c("race"),.data=fl,.fun= function(d) data.frame(minstime=min(d$stime)))

#find the min time in each team for each race, assigning an appropriate column name as we do so
fll=ddply(.variables=c("race","team"),.data=fl,.fun= function(d) data.frame(minteamtime=min(d$stime)))

#now merge this data with data from the fl dataframe
fl2=merge(fl,flm,by=c("race"))
fl2=merge(fl2,fll,by=c("race","team"))

#find as the percentage of the fastest overall laptime each team's fastest laptime, per race
flmt=ddply(.variables=c("race","team"),.data=fl2,.fun= function(d) data.frame(minteampc=min(d$stime)/d$minstime))

#Plot these fastest team laptimes

teamcolours=c("blue","darkgray","red","lightsteelblue3","goldenrod3","darkorange","gray8","firebrick4","midnightblue","darkgreen","gray0","darkred")
ggplot(flmt)+stat_smooth(method="loess",aes(x=race,y=minteampc,group=team,col=factor(team)), se=FALSE)+ylim(0.99,1.08)+opts(title="F1 2011 Fastest Laptime by Team Evolution",axis.text.x=theme_text(angle=-90))+scale_colour_manual(name="Teams",values = teamcolours)+xlab(NULL)+ylab("Min team laptime as % of fastest overall lap")

#or a plot of actual fastest laptimes per race per team
ggplot(flmt)+geom_line(aes(x=race,y=minteampc,group=team,col=factor(team)), se=FALSE)+ylim(0.99,1.08)+opts(title="F1 2011 Fastest Laptime by Team Evolution",axis.text.x=theme_text(angle=-90))+scale_colour_manual(name="Teams",values = teamcolours)+xlab(NULL)+ylab("Min team laptime as % of fastest overall lap")

#after a bit of thought, it struck me that I should be able to use ddply to just extend the original dataframe. For example:

fl=ddply(.variables=c("race","team"),.data=fl,.fun= function(d) data.frame(d,minteamtime=min(d$stime)/d$min))
Tony HirstDecember 28, 2011 at 11:31 AM
Looking at the traces for all the teams on a single ggplot chart can at times be confusing. To split the traces out into a separate panel for each team, we can just add "+ facet_wrap(~team)" to the ggplot command.

To look at the best laptime traces for each driver within a team, we can use something like this:

ggplot(fl)+geom_line(aes(x=race,y=minstimepc,group=team,order=driverName,col=factor(driverName)), se=FALSE)+ylim(0.99,1.15)+opts(title="F1 2011 Fastest Laptime by Driver Evolution",axis.text.x=theme_text(angle=-90))+xlab(NULL)+ylab("Min team laptime as % of fastest overall lap")+ facet_wrap(~team)

There seem to be a few issues with posting comments. I think you need to preview your comment before you can submit it... Any problems, send me a message on twitter: @psychemedia

F1 Data Junkie

Pages

Tuesday, December 27, 2011

F1 2011 Progress Throughout the Year

4 comments: