Ball Progression is All You Need

r
soccer
observablejs
Identifying where passes (even incomplete ones) have the most positive impact on the pitch.
Author

Tony ElHabr

Published

April 20, 2024

Modified

April 22, 2024

Introduction

I’ve written a lot about expected goals (xG) in soccer, but I haven’t yet talked much about possession value (PV) models,[1] another big topic in soccer analytics. What are they? Well, every PV model is different, but they all generally try to assign a value to every on-ball action on the pitch. Such a model can help inform decisions about how to improve player and team performance.

I heard someone recently say something like “PV models in soccer basically come down to ball progression”. That’s an interesting thought, and I have a hunch that it probably isn’t too far off.

One way of getting at that idea is to look at how your PV model treats incomplete passes. Does it say that all long passes are “good”? What is the importance of the starting and end points of the pass? How does PV for an unsuccessful pass compare to a successful one, holding all else equal?

I attempt to answer some of these questions with a VAEP model, an open-source PV model.[2][3]

Possession Value (PV) for Passes

Completed Passes

We’ll want to eventually look at the PV of incomplete passes, but it’s probably easier to start with completed passes, as we have a pretty strong intuition about them–pass the ball successfully closer to the goal, and you’re most likely helping your team (i.e. positive PV).

From One Spot, To Anywhere on the Pitch

In the interactive 8x12 pitch below, the blue cell illustrates where a pass is made, and the colored cells illustrate the average PV associated with all historically successful passes made to that area. Hovering over a cell shows the PV value above the pitch as well.[4]
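For readers who want to build a similar grid, here is a minimal Python sketch of the kind of aggregation behind these pitches. It bins each pass’s start and end coordinates into an 8x12 grid, then averages PV for every start-end cell pair. Several details are my assumptions for illustration and not necessarily how the figures were produced: the pitch dimensions (105 x 68 metres, as in SPADL-style data), the 12-columns-by-8-rows orientation, and the dictionary keys.

```python
from collections import defaultdict

# Assumed SPADL-style pitch size in metres; adjust to your data provider.
PITCH_LENGTH = 105.0
PITCH_WIDTH = 68.0
# Assumed orientation: 12 columns along the length, 8 rows across the width.
N_COLS, N_ROWS = 12, 8


def to_cell(x, y):
    """Map a pitch coordinate to a (column, row) index, clamped to the grid."""
    col = int(x / PITCH_LENGTH * N_COLS)
    row = int(y / PITCH_WIDTH * N_ROWS)
    # Clamp so points on (or just past) the lines still land in a cell.
    return min(max(col, 0), N_COLS - 1), min(max(row, 0), N_ROWS - 1)


def average_pv_by_cells(passes):
    """Average PV for every (start cell, end cell) pair.

    `passes` is an iterable of dicts with hypothetical keys
    "start_x", "start_y", "end_x", "end_y", and "pv".
    Pairs with no passes are simply absent (the black cells).
    """
    totals = defaultdict(float)
    counts = defaultdict(int)
    for p in passes:
        key = (to_cell(p["start_x"], p["start_y"]), to_cell(p["end_x"], p["end_y"]))
        totals[key] += p["pv"]
        counts[key] += 1
    return {key: totals[key] / counts[key] for key in totals}
```

Keeping pairs without any passes out of the result mirrors the black cells in the figures, rather than filling them with zeros.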

Overall, I’d say that this illustration matches intuition–forward completed passes into the final third should be assigned a non-trivial positive value.

Figure 1: A heatmap showing the average possession value (PV) of historically completed passes from the center spot (annotated in blue) to all areas on the pitch. The relative frequency of successful passes from the center spot to each other cell is shown as a percentage. The exact PV value associated with a completed pass ending at the hover point can be viewed above the pitch. Black cells represent areas to which successful passes from the center spot have never been made.

Note that the gradient in the pitch above is for PV, not for the relative frequency of completed passes from the center spot, which is instead shown as overlaid text. While passes into the box from the center spot have really strong positive PV, they’re uncommon because defenders are generally looking to stop those kinds of threatening passes.
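The gradient and the overlaid percentages come from two different aggregations, so it may help to see the frequency one on its own. The sketch below assumes passes have already been binned into hypothetical (start_cell, end_cell) pairs. For one start cell, it computes the share of that cell’s passes ending in each end cell. End cells that never appear correspond to the black cells.

```python
from collections import Counter


def end_cell_shares(cell_pairs, start_cell):
    """Percentage of passes from `start_cell` that end in each end cell.

    `cell_pairs` is a list of hypothetical (start_cell, end_cell) tuples,
    e.g. ((6, 4), (10, 4)), produced by an earlier binning step.
    End cells that never appear are left out (drawn as black cells).
    """
    ends = Counter(end for start, end in cell_pairs if start == start_cell)
    total = sum(ends.values())
    if total == 0:
        return {}  # no passes observed from this start cell
    return {end: 100.0 * n / total for end, n in ends.items()}
```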

The gradient in the plot below illustrates the relative frequency of successful passes from the center spot directly.

Figure 2: A heatmap where the gradient and text illustrate the relative frequency of historically successful passes from the center spot (annotated in blue) to all areas on the pitch. Black cells represent areas to which successful passes from the center spot have never been made.

From Anywhere on the Pitch, To Anywhere on the Pitch

Now, to give the full picture, the interactive pitch below dynamically updates to show the average PV values associated with a pass starting from any cell that you hover over. The minimum and maximum PV achieved with a successful pass from the hovered spot are shown in the text above the pitch.
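The minimum and maximum shown above the pitch can be read as a simple reduction over the averaged grid for the hovered start cell. Here is a minimal sketch, assuming a hypothetical dictionary that maps (start_cell, end_cell) pairs to average PV:

```python
def pv_range_from(avg_pv, start_cell):
    """Lowest and highest average PV over all end cells reachable from `start_cell`.

    `avg_pv` maps (start_cell, end_cell) -> average PV (a hypothetical structure).
    Returns None when no pass from `start_cell` was ever observed.
    """
    values = [pv for (start, _end), pv in avg_pv.items() if start == start_cell]
    if not values:
        return None
    return min(values), max(values)
```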

Figure 3: A heatmap showing the average possession value (PV) of historically completed passes from the hover spot to all areas on the pitch. The relative frequency of successful passes from the hover spot to each other cell is shown as a percentage. The highest and lowest PV values across all end points associated with a completed pass from the hover point are shown above the pitch. Black cells represent areas to which successful passes from the hover spot have never been made.

There are several takeaways one might have from this view, but the big one for me is this: as you move your mouse (i.e., the starting point of the pass) from your own box to the opponent’s box, the consolidated green region of +0.025 PV doesn’t change much. It stays at roughly the final quarter of the pitch. So you can’t just complete a 30-yard pass from the top of your own box towards the middle of the pitch and expect to get anywhere near the same PV as completing a 30-yard pass from the center of the pitch to near the opponent’s 18-yard box. The end point really matters.

This conclusion gets at our primary question–“Are all long passes good?”–to which the answer so far is “not quite” (in the sense that “good” is more than just “positive PV” for completed passes). A long completed pass in your own half doesn’t boast a huge positive PV, unless it ends up near the opponent’s box.

To get a more complete perspective, we’ll plot out the PV for incomplete passes to see what the answer is there.

Incomplete Passes

From One Spot, To Anywhere on the Pitch

Let’s start with an example again, looking at PV for unsuccessful passes from the center spot.

Figure 4: A heatmap showing the average possession value (PV) of historically incomplete passes from the center spot (annotated in blue) to all areas of the pitch. The relative frequency of unsuccessful passes from the center spot to each other cell is shown as a percentage. The exact PV value associated with an incomplete pass ending at the hover point can be viewed above the pitch. Black cells represent areas to which unsuccessful passes from the center spot have never been made.

I think this grid is fairly intuitive.[5] Incomplete passes backward have fairly negative PVs, as those are turnovers that likely set up the opponent for good scoring opportunities. Incomplete passes forward mostly have neutral PVs, with some spots on the pitch having slightly positive PVs. Notably, a positive PV for an incomplete pass is a non-trivial revelation.

Some of the positive PV cells include the area at the top of the 18-yard box, i.e., “zone 14”. You can make the argument that the risk of losing possession on passes into zone 14 is justified by the potential to take a shot. Further, a loss of possession in this area can be advantageous, as it likely leaves the opponent in a vulnerable position.

From Anywhere on the Pitch, To Anywhere on the Pitch

Now let’s scale up our pass PV grid to all incomplete passes. As with the dynamic successful pass heatmap, hovering over a cell will show PV associated with unsuccessful passes from that point on the pitch.

Figure 5: A heatmap showing the average possession value (PV) of historically incomplete passes from the hover spot to all areas on the pitch. The relative frequency of unsuccessful passes from the hover spot to each other cell is shown as a percentage. The highest and lowest PV values across all end points associated with an incomplete pass from the hover point are shown above the pitch. Black cells represent areas to which unsuccessful passes from the hover spot have never been made.

Hovering my mouse over various areas in the middle third of the pitch, I consistently see slightly positive values near the top of the 18-yard box. This is not all that dissimilar from the trend observed with the successful pass pitch, where the passes into the final quarter of the pitch had strong positive PV from basically anywhere. And, like the interactive pitch for completed passes, a 30-yard incomplete pass forward from one’s own 18-yard box doesn’t have the same PV as a 30-yard incomplete pass forward from the half line to the opponent’s 18-yard box. Not all long incomplete passes are judged equally.

Conclusion

Overall, my takeaways are as follows:

  1. Not all long passes add the same kind of value. The pass has to be one that ends up near the box to create non-trivial positive PV.
  2. And, while completed passes will almost always add more value, incomplete passes can also have positive PV when they’re played into dangerous areas.
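One rough way to check the second takeaway is to difference the completed and incomplete grids for each start-end cell pair. This is only a crude stand-in for “holding all else equal”, since it controls for location alone. The dictionaries and function names below are hypothetical.

```python
def completion_premium(avg_completed, avg_incomplete):
    """PV(completed) - PV(incomplete) for each (start cell, end cell) pair in both grids.

    Both inputs map (start_cell, end_cell) -> average PV (hypothetical structures).
    Pairs observed in only one grid are skipped rather than imputed.
    """
    shared = avg_completed.keys() & avg_incomplete.keys()
    return {pair: avg_completed[pair] - avg_incomplete[pair] for pair in shared}


def share_positive(premium):
    """Fraction of shared cell pairs where completing the pass is worth more."""
    if not premium:
        return float("nan")
    return sum(1 for d in premium.values() if d > 0) / len(premium)
```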

For those who have built PV models or are very familiar with them in some way, perhaps the latter observation is not a surprising result. Indeed, we should want our PV models to see past the outcome of a pass and properly quantify the threat that a through ball can have, whether it’s completed or not.

Caveats

  • The choice of model surely plays a role in the inferences we make. Even atomic VAEP, the cooler younger brother of the baseline VAEP model, may yield different answers due to the way that it treats passes.[6]

  • Along the same lines, the gradients in the pitches are only as “good” as the model. While the cells show average PV values over many passes,[7] the PV value may not match intuition if the model doesn’t account for all relevant factors. For example, if headed passes weren’t treated differently from footed passes, the gradients would likely show a lot more noise due to the randomness of headed pass outcomes.

  • The endpoint of incomplete passes is subject to a fundamental source of noise: interception locations. I’ve implicitly assumed that unsuccessful passes are intercepted very near the intended target, but this is not always the case. Interceptions such as a defender blocking a long through ball near where the pass is made can skew model training, exaggerating the value of short incomplete passes.

Appendix

VAEP

For those really interested in the details, the PV I’ve been showing is built from the goal probabilities underlying the VAEP framework; it is not VAEP itself. In other words, I’ve been showing

\[ P_{\text{goal}}(S_i, x) = P_{\text{scores}}(S_i, x) + (-P_{\text{concedes}}(S_i, x)) \tag{1}\]

where \(S_i\) is the \(i\)th game state and \(x\) is the team, either home or visiting. But VAEP is actually

\[ V(a_i, x) = \Delta P_{\text{scores}}(a_i, x) + (-\Delta P_{\text{concedes}}(a_i, x)) \tag{2}\]

where

\[ \Delta P_{\text{scores}}(a_i, x) = P_{\text{scores}}(S_i, x) - P_{\text{scores}}(S_{i-1}, x) \tag{3}\]

for action \(a_i\) moving the game from state \(S_{i-1}\) to \(S_i\), and where \(\Delta P_{\text{concedes}}(a_i, x)\) is defined similarly.
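To make Equations 1 to 3 concrete, here is a minimal Python sketch that turns per-state probabilities into the Equation 1 quantity and into VAEP values. It assumes every probability is already expressed from team x’s perspective. It also ignores possession changes between teams and resets after goals, which a full VAEP implementation has to handle. The function names are hypothetical.

```python
def goal_probability(p_scores, p_concedes):
    """Equation (1): the state-level quantity shown in the main-body pitches."""
    return p_scores + (-p_concedes)


def vaep_values(p_scores, p_concedes):
    """Equations (2)-(3): VAEP of each action a_i, for i = 1..n.

    p_scores[i] and p_concedes[i] are P_scores(S_i, x) and P_concedes(S_i, x)
    for consecutive game states S_0..S_n, all from team x's perspective.
    Simplification (assumption): no handling of possession changes between
    teams or of resets after goals, which a full implementation must address.
    """
    if len(p_scores) != len(p_concedes):
        raise ValueError("need one (scores, concedes) pair per game state")
    values = []
    for i in range(1, len(p_scores)):
        d_scores = p_scores[i] - p_scores[i - 1]        # Delta P_scores(a_i, x)
        d_concedes = p_concedes[i] - p_concedes[i - 1]  # Delta P_concedes(a_i, x)
        values.append(d_scores + (-d_concedes))
    return values
```

For example, with made-up numbers for a backward pass, `vaep_values([0.020, 0.012], [0.004, 0.006])` gives a single value of about -0.010, while `goal_probability(0.012, 0.006)` is about 0.006. The state after the pass still looks positive under Equation 1, but the pass itself is penalized.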

VAEP directly reflects the value added by an action relative to the prior action. For those who have worked with expected threat (xT) before, this is analogous to the “xT created” metric, as described by Singh.

… [T]he point of xT was to come up with a metric that can quantify threat at any location on the pitch… [W]e can value individual player actions in buildup play by computing the difference in xT between the start and end locations. In other words, we will say that an action that moves the ball from location \((x,y)\) to location \((z,w)\) has value \(\texttt{xT}_{z,w} - \texttt{xT}_{x,y}\).
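The “xT created” idea in the quote reduces to a difference of two grid lookups. That is the same “after minus before” structure as Equation 3. Here is a minimal sketch, assuming a hypothetical precomputed xT grid indexed as `xt_grid[x][y]`:

```python
def xt_created(xt_grid, start, end):
    """xT created by moving the ball from cell (x, y) to cell (z, w).

    `xt_grid` is a hypothetical precomputed xT grid indexed as xt_grid[x][y];
    `start` and `end` are (x, y) and (z, w) index pairs.
    """
    (x, y), (z, w) = start, end
    return xt_grid[z][w] - xt_grid[x][y]
```

Reversing a move simply flips the sign, so a move from a higher-threat cell to a lower-threat cell is valued negatively.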

Completed Passes

Assuming the reader is comfortable with the plotting style and notation from before, we now skip straight to re-creating the dynamic completed pass pitch.

Figure 6: A heatmap showing the average VAEP of historically completed passes from the hover spot to all areas on the pitch. The relative frequency of successful passes from the hover spot to each other cell is shown as a percentage. The highest and lowest VAEP values across all end points associated with a completed pass from the hover point are shown above the pitch. Black cells represent areas to which successful passes from the hover spot have never been made.

The big takeaway for me here is that there are a lot more cells on the pitch showing negative values (now VAEP instead of “PV”), especially for passes backward. This makes sense, as the model should see that, on average, such passes put the ball in a less advantageous position.

Recall that our pre-Appendix “PV” pitches account for the probability of conceding. Instances in which the pre-Appendix completed pass pitch shows a negative value indicate a pass start-end pair after which the probability of conceding exceeds the probability of scoring, per Equation 1. Naturally, this resulted in a few negative start-to-end pass location combinations, particularly for passes sent very far backward. But now that we’re also accounting for the value of the prior action with VAEP, the pitch shows a lot more negatively valued start-end pairs, particularly for short passes backward.

Incomplete Passes

And now we re-create the dynamic pitch for incomplete passes, but showing VAEP instead of goal probability.

Figure 7: A heatmap showing the average VAEP of historically incomplete passes from the hover spot to all areas on the pitch. The relative frequency of unsuccessful passes from the hover spot to each other cell is shown as a percentage. The highest and lowest VAEP values across all end points associated with an incomplete pass from the hover point are shown above the pitch. Black cells represent areas to which unsuccessful passes from the hover spot have never been made.

Compared to the pre-Appendix dynamic pitch for incomplete passes, this one shows a lot more negative values. In fact, there is only a very small subset of end points–those near the penalty spot–where an incomplete pass can have positive VAEP, no matter the starting point. So, when accounting for the value of the prior action with VAEP, it appears that incomplete passes only have positive impact in a handful of situations.


Footnotes

  1. Except in this post, where I only briefly mention that I use a PV model.↩︎

  2. My model is trained on 2013/14 - 2023/24 English Premier League data.↩︎

  3. While all PV models are similar conceptually, it’s important to identify how they differ in their target variables. VAEP specifically tries to quantify the difference in the probability of scoring and conceding in the next 10 actions. In contrast, expected threat (xT), perhaps the most well-known PV model, tries to quantify only the probability of scoring in the next 5 actions. It does not account for the conceding probability, which can understate the “risk” associated with incomplete passes.↩︎

  4. Do not be alarmed by the small values! Values between -0.02 and 0.02 are very common for PV models. After all, the values represent goal probabilities over a sequence of actions, and goals don’t happen all that frequently in soccer.↩︎

  5. We observe lots of missingness near the defender’s box. Such incomplete passes backward would be very illogical no matter the game situation, so it’s not surprising to see that such passes are not observed in our data set.↩︎

  6. Atomic VAEP splits passes into two actions: the pass itself and the reception (or lack thereof).↩︎

  7. There are over 3.4M total passes in the data set.↩︎