Well, to be honest, I see a far higher chance for him to win the tour, but first let’s look at the data. Now that I have collected 10 years of Tour de France data, it is time to look at the structural features of a whole tour. With a sample size of 10 (yes, still far away from big data …), we can look at the rank the eventual winner held after each stage of the tour.
The graph shows, for each stage, the empirical probability (smoothed by a natural spline smoother of degree 5) that
- the current leader wins the tour,
- the eventual winner is currently in the top 3,
- in the top 5, or
- in the top 10.
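Before any smoothing, the four curves are simple proportions. For each stage, count the tours in which the eventual winner was ranked 1, in the top 3, top 5 or top 10 after that stage, then divide by the number of tours. The following is a minimal sketch. It assumes the data is stored as one list of the winner's ranks per tour, one entry per stage. The layout, the function name `empirical_probabilities` and the toy ranks are all hypothetical and do not come from the actual 10-year data set.

```python
from typing import Dict, List, Sequence


def empirical_probabilities(
    winner_ranks: List[List[int]], cutoffs: Sequence[int] = (1, 3, 5, 10)
) -> Dict[int, List[float]]:
    """For each cutoff k, the share of tours (per stage) in which the
    eventual winner was ranked k or better after that stage.

    Hypothetical layout: winner_ranks[t][s] is the winner's rank after
    stage s + 1 in tour t (one row per year, one column per stage).
    """
    n_tours = len(winner_ranks)
    n_stages = len(winner_ranks[0])
    return {
        k: [
            sum(1 for tour in winner_ranks if tour[s] <= k) / n_tours
            for s in range(n_stages)
        ]
        for k in cutoffs
    }


# Made-up ranks for the first 5 stages of 4 hypothetical tours (not real data).
toy = [
    [12, 6, 2, 1, 1],
    [40, 9, 4, 3, 1],
    [3, 1, 1, 1, 1],
    [25, 15, 7, 2, 2],
]
probs = empirical_probabilities(toy)
print(probs[1])  # leader curve   -> [0.0, 0.25, 0.25, 0.5, 0.75]
print(probs[3])  # top-3 curve    -> [0.25, 0.25, 0.5, 1.0, 1.0]
```

With 10 tours, each point on a curve can only take the values 0, 0.1, …, 1. That coarseness is why a smoother is useful before reading values off the graph.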
From this model we can read off that the leader after stage 14 has a 50% chance of winning the tour.
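The smoothing and read-off step can be sketched in two parts. First, fit a least-squares natural cubic spline to the stage-wise proportions using the standard truncated-power basis. Second, search for the first stage where the fitted leader curve reaches 50%. This is only a sketch under assumptions. I read "degree 5" as five degrees of freedom, i.e. five evenly spaced knots, and the original analysis may well have used a different smoother (for example R's `ns()`). The function names and the made-up input curve are hypothetical.

```python
from typing import List, Optional, Sequence


def _cube_plus(v: float) -> float:
    """Truncated cube (v)_+^3."""
    return v ** 3 if v > 0 else 0.0


def natural_spline_basis(x: float, knots: Sequence[float]) -> List[float]:
    """Truncated-power basis N_1..N_K of a natural cubic spline with K knots."""
    last = knots[-1]

    def d(k: int) -> float:
        return (_cube_plus(x - knots[k]) - _cube_plus(x - last)) / (last - knots[k])

    k_max = len(knots) - 2
    return [1.0, x] + [d(k) - d(k_max) for k in range(k_max)]


def _solve(a: List[List[float]], b: List[float]) -> List[float]:
    """Solve a small dense linear system by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    sol = [0.0] * n
    for r in range(n - 1, -1, -1):
        rest = sum(m[r][c] * sol[c] for c in range(r + 1, n))
        sol[r] = (m[r][n] - rest) / m[r][r]
    return sol


def smooth_natural_spline(xs: Sequence[float], ys: Sequence[float], df: int = 5) -> List[float]:
    """Least-squares natural cubic spline; assumption: df evenly spaced knots."""
    lo, hi = min(xs), max(xs)
    # Rescale stages to [0, 1]: keeps the normal equations well conditioned
    # and does not change the space of fitted curves.
    us = [(x - lo) / (hi - lo) for x in xs]
    knots = [i / (df - 1) for i in range(df)]
    rows = [natural_spline_basis(u, knots) for u in us]
    p = len(knots)
    # Normal equations (B^T B) beta = B^T y
    btb = [[sum(r[i] * r[j] for r in rows) for j in range(p)] for i in range(p)]
    bty = [sum(r[i] * y for r, y in zip(rows, ys)) for i in range(p)]
    beta = _solve(btb, bty)
    return [sum(coef * val for coef, val in zip(beta, r)) for r in rows]


def first_stage_reaching(
    stages: Sequence[int], probs: Sequence[float], level: float = 0.5
) -> Optional[int]:
    """First stage whose (smoothed) probability is at least `level`, else None."""
    for stage, p in zip(stages, probs):
        if p >= level:
            return stage
    return None


# Made-up leader curve over 21 stages, for illustration only (not real Tour data).
stages = list(range(1, 22))
leader = [0.0] * 5 + [0.1] * 4 + [0.2, 0.3, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9, 1.0, 1.0, 1.0]
smoothed = smooth_natural_spline(stages, leader, df=5)
print(first_stage_reaching(stages, smoothed, 0.5))
```

A natural spline is linear beyond its boundary knots, which keeps the curve from swinging wildly at the first and last stages, where a plain polynomial fit tends to misbehave.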
What really surprised me is the big gap between the leader curve and the top-3 curve, compared with the much smaller gap between the top-3 and top-5 curves.
But everyone who knows the basic set-up of a tour knows that the race is decided in the mountains, i.e., in the Alps and the Pyrenees, which usually come up in stages 11–14 and 16–19, depending on whether the route visits the Alps first. As the last “counting” stage is often an individual time trial (you might know of the “non-aggression pact” on the final stage), this time trial may switch the leader one last time if the gaps are no bigger than, say, 3′–4′.
So this supports my personal assessment that NIBALI’s chance to win is far greater than 50%: his lead is now almost 5′, and if he can maintain his performance through the remaining stages in the Pyrenees (there is still some way to go), he will be this year’s winner.
I conclude with a parallel box plot of the winners’ ranks after each stage over the last 10 years:
(The highlighted winner is LANDIS in 2006, who was caught doping right after his phenomenal comeback in stage 17, harming the sport as well as my statistics …)
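Each box in a parallel box plot summarises one stage: the quartiles of the winners' ranks across the tours, whiskers reaching to the most extreme ranks within 1.5 × IQR of the box, and outliers beyond that. The 1.5 × IQR rule is a common convention and, as far as I know, matplotlib's default. The sketch below computes those summaries with Python's `statistics.quantiles`. The data layout, the function names and the toy ranks are hypothetical.

```python
import statistics
from typing import Dict, List


def box_summary(values: List[int]) -> Dict[str, object]:
    """Quartiles, whisker ends and outliers of one stage's winner ranks."""
    # "inclusive" = linear interpolation between order statistics
    q1, med, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    lo_fence, hi_fence = q1 - 1.5 * iqr, q3 + 1.5 * iqr
    inside = [v for v in values if lo_fence <= v <= hi_fence]
    return {
        "q1": q1,
        "median": med,
        "q3": q3,
        "whisker_low": min(inside),
        "whisker_high": max(inside),
        "outliers": sorted(v for v in values if v < lo_fence or v > hi_fence),
    }


def parallel_boxes(winner_ranks: List[List[int]]) -> List[Dict[str, object]]:
    """One box per stage; hypothetical layout winner_ranks[t][s] = rank after stage s + 1."""
    n_stages = len(winner_ranks[0])
    return [box_summary([tour[s] for tour in winner_ranks]) for s in range(n_stages)]


# Made-up ranks for the first 3 stages of 5 hypothetical tours (not real data).
toy = [[12, 3, 1], [90, 2, 1], [3, 1, 1], [25, 4, 2], [8, 5, 3]]
for stage, box in enumerate(parallel_boxes(toy), start=1):
    print(stage, box["median"], box["outliers"])
```

A winner who was far down the classification early on, as LANDIS was before his stage-17 comeback, shows up as an outlier point above the box rather than stretching the whiskers.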