Updated: Jun 10, 2019
Running training is full of uncertainty. Consider what a coach faces when designing training for an athlete. There is uncertainty in training methodology - nobody has a perfect formula. Even given some best practice, there is uncertainty about whether it applies to any given athlete. There is uncertainty about an athlete's physiological and mental state on any given day. There is uncertainty in how different athletes communicate that state to the coach. There is uncertainty about the steps an athlete will take to recover from a stimulus. Okay, you get it.
In the face of this uncertainty, there is a natural tendency to rely on things that seem very certain: HR data, pace, scientific papers. However, we tend to forget that there is actually large uncertainty about how applicable each of these is to effective training. This leads to overfitting.
In statistics, overfitting means relying too heavily on the data in front of you: you may be able to explain that data perfectly, yet predict new data poorly. The same thing happens in running training all the time.
For instance, say an athlete's heart rate is 5 beats per minute higher than normal on an easy run. Overfitting would be for the coach to adjust the athlete's goal marathon pace based on that one run. A healthier interpretation might be to realize that the athlete got only 5 hours of sleep the night before, so the data point shouldn't be weighed heavily.
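The statistical version of this can be sketched in a few lines. The numbers below are made up purely for illustration, assuming NumPy: a flexible model that explains five observations perfectly ends up predicting a new point worse than a simple trend line.

```python
import numpy as np

# Hand-picked points that roughly follow y = 2x + 1 with a little noise.
x_train = np.array([0.0, 1.0, 2.0, 3.0, 4.0])
y_train = np.array([1.0, 3.2, 4.9, 7.1, 9.0])

# Degree-4 polynomial: passes through all five points exactly ("perfect" fit).
overfit = np.polyfit(x_train, y_train, deg=4)
# Degree-1 line: captures the trend without chasing every wiggle.
simple = np.polyfit(x_train, y_train, deg=1)

# A new point from the same underlying trend: y = 2*6 + 1 = 13.
x_new, y_new = 6.0, 13.0
err_overfit = abs(np.polyval(overfit, x_new) - y_new)
err_simple = abs(np.polyval(simple, x_new) - y_new)
print(err_overfit > err_simple)  # the "perfect" model predicts new data worse
```

The quartic explains every wiggle in the training data, including the noise, and pays for it when asked about a point it hasn't seen - exactly the coach adjusting marathon pace off one odd heart-rate reading.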
The same thing happens with running studies. Say a new study comes out showing that, among a group of test subjects following different workout protocols for six weeks, 3-minute intervals seem to be the most effective at improving VO2max. You'll surely see an army of coaches now employing 3-minute intervals in their programs. This is again a case of potential overfitting. Are the test subjects similar to the coach's athletes? Is six weeks the right timeframe? What other training were the subjects doing? Are there better protocols that the study didn't consider? In other words, there are lots of confounding variables that make it easy to overfit.
With training, information is of course valuable. With no information about an athlete or the context of their training, it would be nearly impossible to deliver effective coaching. The key is to assign the right weight to each piece of incoming data. In finding those weights, what we are really doing is minimizing uncertainty in the athlete's expected response to the stimulus we prescribe.
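As a toy illustration of weighting (the signals, values, and weights here are entirely hypothetical, not a real protocol), a coach might blend several noisy readiness signals, each scaled by how much it can be trusted:

```python
# Each entry: (signal, suggested easy-pace adjustment in sec/mile, trust 0-1).
# All numbers are hypothetical, for illustration only.
readings = [
    ("elevated HR after 5h of sleep", 15.0, 0.2),  # confounded, low weight
    ("tempo run hit target splits",    0.0, 0.9),  # clean signal, high weight
    ("athlete reports heavy legs",    10.0, 0.6),
]

total_weight = sum(w for _, _, w in readings)
adjustment = sum(value * w for _, value, w in readings) / total_weight
print(round(adjustment, 1))  # prints 5.3 - a blended, weight-tempered call
```

The confounded heart-rate reading still contributes, but its low weight keeps it from dominating the decision.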
When a coach assigns a run to an athlete, that assignment is a hypothesis - a best guess at the right stimulus for that athlete on that day. Based on how the athlete performs, we can evaluate the hypothesis and update the training, but we still want each assignment to be as close to optimal as possible, in the face of lots of uncertainty.
Back to statistics: a key strategy for avoiding overfitting is called regularization. This is where you intentionally penalize changes to how you view the world based on new incoming information. This practice keeps you from swerving off the freeway when you see a pebble! New information must be truly significant before you abandon your current view (that you are driving straight on a freeway, with tires far bigger than the pebble that can roll over it no problem).
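In numeric terms, regularization here looks like shrinking each update toward the prior belief. A minimal sketch - the function name and the numbers are mine, not a standard API:

```python
def regularized_update(prior: float, observation: float, trust: float) -> float:
    """Move the estimate toward new data only in proportion to trust.

    trust near 0 = strong regularization: new data barely shifts the belief.
    trust near 1 = no regularization: the belief chases every data point.
    """
    return prior + trust * (observation - prior)

# Prior belief: the athlete's typical easy-run heart rate (hypothetical bpm).
belief = 150.0
# One run comes in 5 bpm high; with low trust, the belief barely moves.
belief = regularized_update(belief, 155.0, trust=0.1)
print(belief)  # 150.5, not 155 - the pebble doesn't make us swerve
```

Repeated high readings would keep nudging the estimate upward, so a real trend still gets through; only the one-off pebble gets damped.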
In running training, how can we perform something similar, without constantly having to make explicit calculations (which themselves will suffer from lots of uncertainty!)? The answer is to use heuristics.
Heuristics are rules of thumb that generally apply across a range of circumstances. Some people scoff at heuristics because they are not precise formulas, but that is exactly why we are interested in them! Being inexact generalizations means they are already somewhat regularized. In other words, heuristics give us a way to avoid overfitting while still designing very good training.
How do heuristics do this? It's simple - experience. Heuristics are refined over time by many coaches and athletes, through plenty of mistakes and plenty of successes. The fact that people are trying to get better means that heuristics evolve to produce better athletes. The fact that they evolve through exposure to many people in many different circumstances means that they are robust to overfitting!
What is an example of a heuristic? It could be something seemingly basic like, "Marathon runners must emphasize aerobic development." That might be obvious, so how would something like this be useful back in the early days of marathon training? Well, a coach could be faced with a tradeoff between volume and intensity (which is still a ubiquitous situation) and would apply the heuristic that aerobic development, and thus volume, wins.
That example is straightforward, but when used in a hierarchical way, like a decision tree, heuristics can be extremely powerful and safely reduce uncertainty. This is especially helpful when things are not straightforward. What if you have a marathoner who has an injury history, who transitioned to marathoning from a middle-distance background, and who has only two months before the race? Good luck finding a study that fits that situation. But if you have a series of heuristics pertaining to injuries, aerobic development, and specificity of training, you can apply them in order of importance and get pretty far.
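One way to picture such a hierarchy is as an ordered series of checks where the highest-priority rule fires first. The rules below are simplified stand-ins for illustration, not actual coaching prescriptions:

```python
def weekly_emphasis(injury_history: bool, weeks_to_race: int,
                    background: str) -> str:
    """Apply heuristics in priority order; the first one that fires wins."""
    if injury_history:                    # injury heuristics outrank everything
        return "conservative volume, no sharp intensity jumps"
    if background == "middle distance":   # then aerobic development
        return "emphasize aerobic volume over speed work"
    if weeks_to_race <= 8:                # then specificity as the race nears
        return "shift toward race-specific marathon-pace work"
    return "standard aerobic-emphasis build"

# The tricky marathoner described above: the injury rule trumps the rest.
print(weekly_emphasis(True, 8, "middle distance"))
```

No single branch needs a study tailored to this exact athlete; the ordering of robust general rules does the work.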
There are situations in which it makes sense to be less safe and risk overfitting. But this really only applies once an athlete is very close to their genetic potential and must squeeze out every last ounce of improvement - true for only a handful of athletes in the world, and not a focus of this framework.
Much more to come on this idea of heuristics, and on applying them in order as decision trees.