So how can we capture the track position of each car - that is, the order in which they cross the start/finish?
The timing sheets published via the FIA website include a Race History Chart that tabulates the order in which cars pass the start/finish line relative to the laps completed by the current leader of the race. As the example below shows, if the leader laps a car on any given lead lap, the passed car does not have a time recorded for the previous leader lap because it did not complete that lap.
Unfortunately, the FIA don't release the timing sheets as data, preferring instead to use immutable PDF documents. (That doesn't mean we can't scrape the data of course...)
So how might we generate the track position given data we do have ready access to? The ergast database, for example, publishes lap time information - so can we use that to recreate track positions? Indeed we can...
We can make two observations. The first is that a race track is a closed circuit; the second is that accumulated race time is measured against the same clock for each driver, given that they all start the race at the same time. (The race clock is not started as each driver passes the start/finish line - the race clock starts when the lights go green. To this extent, drivers placed lower on the grid serve a positional time penalty compared to cars further up the grid. This effective time penalty corresponds to the time it takes a lower placed car to physically get as far up the track as the cars in the higher placed grid positions.)
If we get hold of all of the lap time data for a particular race, with laptimes described in a milliseconds column, we can find the track position of a car in the following way.
First, identify which leader’s lap each driver is on and then use this as the basis for deciding whether a car is on the same lap - or a different one - compared with any car immediately ahead or behind on track. One way of doing this is on the basis of accumulated race time. If we order the drivers by accumulated race time, and flag whether or not a particular driver is the leader on a particular lap, we can count the accumulated number of “lap leader” flags to give us the current lead lap count irrespective of how many laps a given driver has completed.
library(plyr)
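#For each driver, calculate their accumulated race time at the end of each lap
#Sort by lap within driver first, because cumsum() adds rows in the order it finds them
lapTimes=arrange(lapTimes,driverId,lap)
lapTimes=ddply(lapTimes, .(driverId), transform,
               acctime=cumsum(milliseconds))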
#Order the rows by accumulated lap time
lapTimes=arrange(lapTimes,acctime)
#This ordering need not necessarily respect the ordering by lap.
#Flag the leader of a given lap - this will be the first row in a new leader lap block
lapTimes$leadlap= (lapTimes$position==1)
head(lapTimes[lapTimes$position<=3,c('driverRef','leadlap')],n=5)
This gives a result of the form:
## driverRef leadlap
## 1 button TRUE
## 2 hamilton FALSE
## 3 michael_schumacher FALSE
## 22 button TRUE
## 23 hamilton FALSE
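In R, a Boolean TRUE has numeric value 1 and a Boolean FALSE has numeric value 0, so a cumulative sum over the flags counts how many leader laps have been completed up to each row.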
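As a minimal sketch of that counting trick in isolation, here is a made-up vector of flags (the values are invented for illustration, not taken from the race data), assumed to be in accumulated race time order:

```r
# Invented flags, assumed to be ordered by accumulated race time:
# TRUE marks a row where the race leader completes a lap.
flags <- c(TRUE, FALSE, FALSE, TRUE, FALSE, FALSE)

# Coercing to numeric shows the underlying 1/0 values
as.numeric(flags)
## [1] 1 0 0 1 0 0

# The running total of the flags is the lead lap each row belongs to
leadlap <- cumsum(flags)
leadlap
## [1] 1 1 1 2 2 2
```

Each TRUE bumps the count by one, and every row that follows inherits that count until the leader next crosses the line.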
#Calculate a rolling count of leader lap flags.
#Recall that the cars are ordered by accumulated race time.
#The accumulated count of leader flags is the lead lap number each driver is on.
lapTimes$leadlap=cumsum(lapTimes$leadlap)
head(lapTimes[lapTimes$position<=3,c('driverRef','leadlap')],n=6)
So when we count the flags, we get something like this:
## driverRef leadlap
## 1 button 1
## 2 hamilton 1
## 3 michael_schumacher 1
## 22 button 2
## 23 hamilton 2
## 24 michael_schumacher 2
Let’s now calculate the track position for a given lead lap, where the leader in a given lap is in both race position and track position 1, the second car through the start/finish line is in track position 2 (irrespective of their race position), and so on. (In your mind’s eye, you might imagine the cars passing the finish line to complete each lap: first the race leader, then either the car in second or a lapped backmarker, and so on.) Specifically, we group by leadlap, order by accumulated race time within that lap, and assign track positions in incremental order.
lapTimes=arrange(lapTimes,leadlap,acctime)
lapTimes=ddply(lapTimes,.(leadlap),transform,
               trackpos=seq_along(position))
lapTimes[lapTimes$leadlap==33,c('code','lap','position','acctime','leadlap','trackpos')]
We now have track - as well as race - positions:
## code lap position acctime leadlap trackpos
## 616 BUT 33 1 3100735 33 1
## 617 HAM 33 2 3111538 33 2
## 618 VET 33 3 3113745 33 3
## 619 SEN 32 16 3115035 33 4
## 620 RIC 32 17 3115829 33 5
## 621 ALO 33 4 3125951 33 6
## 622 WEB 33 5 3131009 33 7
## 623 MAL 33 6 3133006 33 8
## 624 RAI 33 7 3141269 33 9
## 625 KOB 33 8 3147051 33 10
## 626 GLO 32 18 3150703 33 11
## 627 PER 33 9 3153818 33 12
## 628 ROS 33 10 3159053 33 13
## 629 VER 33 11 3162088 33 14
## 630 DIR 33 12 3172712 33 15
## 631 MAS 33 13 3177681 33 16
## 632 PET 33 14 3184974 33 17
## 633 PIC 32 19 3186685 33 18
## 634 KOV 33 15 3188375 33 19
In this example, we see Timo Glock (GLO) has only completed 32 laps compared to 33 for the race leader and the majority of the field. On track, he is placed between Kobayashi (KOB) and Perez (PER).
This code will form part of the Wrangling F1 Data With R book, initially in a forthcoming chapter that revisits an old idea: battle charts.
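If you want to check the logic without plyr or a full set of race data, here is a rough sketch of the same steps in base R, run against a tiny invented data set. The function name `track_positions`, the three driver codes and all of the lap times are made up for illustration; the column names are assumed to match the ones used above.

```r
# A sketch of the whole calculation in base R, using invented data.
track_positions <- function(laps) {
  # Accumulated race time per driver, summing the laps in lap order
  laps <- laps[order(laps$driverId, laps$lap), ]
  laps$acctime <- ave(laps$milliseconds, laps$driverId, FUN = cumsum)
  # Order every row by the shared race clock, then count leader crossings
  laps <- laps[order(laps$acctime), ]
  laps$leadlap <- cumsum(laps$position == 1)
  # Within each lead lap, number the cars in the order they cross the line
  laps$trackpos <- ave(laps$acctime, laps$leadlap, FUN = seq_along)
  rownames(laps) <- NULL
  laps
}

# Hypothetical race: AAA leads, BBB follows closely, CCC is slow enough
# to be lapped and so only completes two laps to the leader's three.
laps <- data.frame(
  driverId     = c(1, 1, 1, 2, 2, 2, 3, 3),
  code         = c("AAA", "AAA", "AAA", "BBB", "BBB", "BBB", "CCC", "CCC"),
  lap          = c(1, 2, 3, 1, 2, 3, 1, 2),
  position     = c(1, 1, 1, 2, 2, 2, 3, 3),
  milliseconds = c(60000, 60000, 60000, 61000, 61000, 61000, 95000, 95000),
  stringsAsFactors = FALSE
)

result <- track_positions(laps)
result[result$leadlap == 3,
       c("code", "lap", "position", "acctime", "leadlap", "trackpos")]
```

In this toy race, CCC finishes their second lap after the leader has already completed a third, so that row is filed under lead lap 3 in track position 3, and lead lap 2 has no entry for CCC at all. This is the same pattern as the lapped cars in the table above.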