The method comes from an answer to a Stack Overflow question about how to spot disjoint sets of grouped items in a list. The trick is to construct a graph in which edges are placed between the elements of each subset, and then to identify the clusters (the connected components of the graph) across the whole set of items.
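As a rough illustration of the general trick, here is a minimal Python sketch (not the code from the Stack Overflow answer or from the book, which uses R). The function name `disjoint_groups` and the sample items are made up for the example. It links the items within each subset, then walks the resulting graph to collect each cluster.

```python
from collections import defaultdict


def disjoint_groups(subsets):
    """Merge overlapping subsets into disjoint groups (connected components)."""
    # Build an undirected graph: link every item in a subset to that
    # subset's first item, which is enough to keep the subset connected.
    graph = defaultdict(set)
    for subset in subsets:
        items = list(subset)
        for item in items:
            graph[item]  # make sure single-item subsets appear as nodes too
        for other in items[1:]:
            graph[items[0]].add(other)
            graph[other].add(items[0])

    seen, groups = set(), []
    for start in list(graph):
        if start in seen:
            continue
        # Depth-first walk: collect everything reachable from `start`.
        stack, component = [start], set()
        while stack:
            node = stack.pop()
            if node in component:
                continue
            component.add(node)
            stack.extend(graph[node] - component)
        seen |= component
        groups.append(sorted(component))
    return sorted(groups)


# Hypothetical input: "a"-"b" and "b"-"c" overlap, so they merge into one group.
example = disjoint_groups([["a", "b"], ["b", "c"], ["d", "e"], ["f"]])
```

Subsets that share any item end up in the same group. Subsets with nothing in common stay apart.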
So for example, in this fragment of a lap chart showing race positions going from one lap to another, we see several position changes:
Many of the drivers do not change position at all, but there are position changes between four distinct groups of drivers: those in 1st and 2nd; those in 4th, 5th and 6th; those in 9th and 10th; and those in 17th, 18th and 19th.
If we connect nodes in a graph for each driver going from the position they held in the previous lap to the position they hold in the current lap (and ignore drivers that didn't change position), we get the following groupings:
Notice how the nodes - representing positions - are connected to each other by arrows, showing how a car placed in one position moved to another position. So for example, we see that the cars in positions 9 and 10 changed place with each other, as did those in positions 1 and 2. The car in 19th went to 18th, the one in 18th to 17th, and the one in 17th fell back to 19th. And so on.
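The same grouping can be sketched in a few lines of Python. This is an illustrative assumption, not the R code from the book: the function name `position_change_groups` and the driver labels A to H are hypothetical, and only the moves spelled out above are included (the individual moves among 4th, 5th and 6th are not described, so they are left out). Each changed position is joined to the position the car moved to, using a small union-find structure, and the resulting sets are the position change groups.

```python
def position_change_groups(prev_lap, this_lap):
    """Group positions that exchanged cars between two consecutive laps.

    Both arguments map a driver label to that driver's position.
    """
    # One edge per driver who moved: old position -> new position.
    moves = {
        prev_lap[d]: this_lap[d]
        for d in prev_lap
        if d in this_lap and prev_lap[d] != this_lap[d]
    }

    parent = {}

    def find(x):
        # Return the representative of x's group, halving paths as we go.
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    # Join the two ends of every move into the same group.
    for src, dst in moves.items():
        parent[find(src)] = find(dst)

    groups = {}
    for pos in list(parent):
        groups.setdefault(find(pos), []).append(pos)
    return sorted(sorted(g) for g in groups.values())


# Hypothetical driver labels; positions follow the moves described above.
prev_lap = {"A": 1, "B": 2, "C": 9, "D": 10, "E": 19, "F": 18, "G": 17, "H": 3}
this_lap = {"A": 2, "B": 1, "C": 10, "D": 9, "E": 18, "F": 17, "G": 19, "H": 3}
groups = position_change_groups(prev_lap, this_lap)
```

Driver H keeps 3rd place, so that position never enters the graph, matching the choice to ignore drivers that didn't change position.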
The chapter containing the code for constructing the graph and partitioning it into separate clusters can currently be found as part of the preview for the Wrangling F1 Data With R book... but I'm not sure how long it will remain so...
See also: OUseful.info - Identifying Position Change Groupings in Rank Ordered Lists