Estimating Shooting Performance Unlikeliness

Categories

r, soccer
Quantifying how unlikely a player’s season-long shooting performance is, factoring in their prior shot history
Author

Tony ElHabr

Published

May 5, 2024

Modified

May 18, 2024

Introduction

Towards the end of each soccer season, we naturally start to look back at player stats, often to see who has performed worse than in past seasons. We may have different motivations for doing so–we may be trying to attribute team under-performance to individuals, we may be hypothesizing about who is likely to be transferred, etc.

It’s not uncommon to ask “How unlikely was their shooting performance this season?” when looking at a player who has scored fewer goals than expected.1 For instance, if a striker only scores 8 goals on 12 expected goals (xG), their “underperformance” of 4 goals is stark, especially if they had scored more goals than their xG in prior seasons.

The “Outperformance” (\(O_p\)) ratio–the ratio of a player \(p\)’s goals \(G_p\) to expected goals \(xG_p\)–is a common way of evaluating a player’s shooting performance.2

\[ O_p = \frac{G_p}{xG_p} \]

An \(O_p\) ratio of 1 indicates that a player is scoring as many goals as expected; a ratio greater than 1 indicates overperformance; and a ratio less than 1 indicates underperformance. Our hypothetical player underperformed with \(O_p = \frac{8}{12} = 0.67\).

In most cases, we have prior seasons of data to use when evaluating a player’s \(O_p\) ratio for a given season. For example, let’s say our hypothetical player scored 14 goals on 10 xG (\(O_p = 1.4\)) in the season prior, and 12 goals on 8 xG (\(O_p = 1.5\)) before that. An \(O_p = 0.67\) after those seasons seems fairly unlikely, especially compared to an “average” player who theoretically achieves an \(O_p = 1\) ratio every year.
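To make the arithmetic concrete, here’s a quick sketch in R of those three hypothetical seasons (numbers taken straight from the text, most recent season last):

## goals and xG for the three hypothetical seasons, in chronological order
g <- c(12, 14, 8)
xg <- c(8, 10, 12)
## outperformance ratios, season by season
g / xg
#> [1] 1.5000000 1.4000000 0.6666667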

So how do we put a number on the unlikeliness of that \(O_p = 0.67\) for our hypothetical player, accounting for their prior season-long performances?

Data

I’ll be using public data from FBref for the 2017/18 - 2023/24 seasons of the Big Five European soccer leagues, updated through May 7, 2024. Fake data is nice for examples, but ultimately we want to test our methods on real data. Our intuition about the results can serve as a useful gauge of their sensibility.

Get shot data
## Assumed query parameters, not shown in the post's visible code: the Big Five
## men's first-tier leagues and the seasons shown in the output below
COUNTRIES <- c('ENG', 'ESP', 'FRA', 'GER', 'ITA')
TIERS <- '1st'
GENDERS <- 'M'
SEASON_END_YEARS <- 2018:2024

raw_shots <- worldfootballR::load_fb_match_shooting(
  country = COUNTRIES,
  tier = TIERS,
  gender = GENDERS,
  season_end_year = SEASON_END_YEARS
)
#> → Data last updated 2024-05-07 17:52:59 UTC

np_shots <- raw_shots |> 
  ## Drop penalties, which appear in the FBref data with a distance of 13 (yards)
  ## and an xG value of ~0.79
  dplyr::filter(
    !dplyr::coalesce((Distance == '13' & round(as.double(xG), 2) == 0.79), FALSE)
  ) |> 
  ) |> 
  dplyr::transmute(
    season_end_year = Season_End_Year,
    team = Squad,
    player_id = Player_Href |> dirname() |> basename(),
    player = Player,
    match_date = lubridate::ymd(Date),
    match_id = MatchURL |> dirname() |> basename(),
    minute = Minute,
    g = as.integer(Outcome == 'Goal'),
    xg = as.double(xG)
  ) |> 
  ## A handful of scored shots with empty xG
  dplyr::filter(!is.na(xg)) |> 
  dplyr::arrange(season_end_year, player_id, match_date, minute)

## Use the more commonly used name when a player ID is mapped to multiple names
##   (This "bug" happens because worldfootballR doesn't go back and re-scrape data
##   when fbref makes a name update.)
player_name_mapping <- np_shots |> 
  dplyr::count(player_id, player) |> 
  dplyr::group_by(player_id) |> 
  dplyr::slice_max(n, n = 1, with_ties = FALSE) |> 
  dplyr::ungroup() |> 
  dplyr::distinct(player_id, player)

player_season_np_shots <- np_shots |> 
  dplyr::summarize(
    .by = c(player_id, season_end_year), 
    shots = dplyr::n(),
    dplyr::across(c(g, xg), sum)
  ) |> 
  dplyr::mutate(
    o = g / xg
  ) |> 
  dplyr::left_join(
    player_name_mapping,
    by = dplyr::join_by(player_id)
  ) |> 
  dplyr::relocate(player, .after = player_id) |> 
  dplyr::arrange(player_id, season_end_year)
player_season_np_shots
#> # A tibble: 15,327 × 7
#>    player_id player          season_end_year shots     g    xg     o
#>    <chr>     <chr>                     <int> <int> <int> <dbl> <dbl>
#>  1 0000acda  Marco Benassi              2018    70     5  4.01 1.25 
#>  2 0000acda  Marco Benassi              2019    59     7  5.61 1.25 
#>  3 0000acda  Marco Benassi              2020    20     1  1.01 0.990
#>  4 0000acda  Marco Benassi              2022    10     0  0.99 0    
#>  5 0000acda  Marco Benassi              2023    19     0  1.35 0    
#>  6 000b3da6  Manuel Iturra              2018     2     0  0.41 0    
#>  7 00242715  Moussa Niakhate            2018    16     0  1.43 0    
#>  8 00242715  Moussa Niakhate            2019    10     1  1.5  0.667
#>  9 00242715  Moussa Niakhate            2020    11     1  1.02 0.980
#> 10 00242715  Moussa Niakhate            2021     9     2  1.56 1.28 
#> # ℹ 15,307 more rows

For illustrative purposes, we’ll focus on one player in particular–James Maddison. Maddison has had a sub-par 2023/24 season by his own standards, underperforming his xG for the first time since he started playing in the Premier League in 2018/19.

Maddison’s season-by-season data
player_season_np_shots |> dplyr::filter(player == 'James Maddison')
#> # A tibble: 6 × 7
#>   player_id player         season_end_year shots     g    xg     o
#>   <chr>     <chr>                    <int> <int> <int> <dbl> <dbl>
#> 1 ee38d9c5  James Maddison            2019    81     6  5.85 1.03 
#> 2 ee38d9c5  James Maddison            2020    74     6  5.36 1.12 
#> 3 ee38d9c5  James Maddison            2021    75     8  3.86 2.07 
#> 4 ee38d9c5  James Maddison            2022    72    12  7.56 1.59 
#> 5 ee38d9c5  James Maddison            2023    83     9  7.12 1.26 
#> 6 ee38d9c5  James Maddison            2024    55     4  5.02 0.797
More variables useful for the rest of the post
TARGET_SEASON_END_YEAR <- 2024

player_np_shots <- player_season_np_shots |> 
  dplyr::mutate(
    is_target = season_end_year == TARGET_SEASON_END_YEAR
  ) |> 
  dplyr::summarize(
    .by = c(is_target, player_id, player),
    dplyr::across(
      c(shots, g, xg),
      \(.x) sum(.x, na.rm = TRUE)
    )
  ) |> 
  dplyr::mutate(o = g / xg) |> 
  dplyr::arrange(player, player_id, is_target)

wide_player_np_shots <- player_np_shots |>
  dplyr::transmute(
    player_id, 
    player,
    which = ifelse(is_target, 'target', 'prior'), 
    shots, g, xg, o
  ) |> 
  tidyr::pivot_wider(
    names_from = which, 
    values_from = c(shots, g, xg, o), 
    names_glue = '{which}_{.value}'
  )

all_players_to_evaluate <- wide_player_np_shots |> 
  tidyr::drop_na(prior_o, target_o) |> 
  dplyr::filter(
    prior_shots >= 50,
    target_shots >= 10,
    prior_g > 0, 
    target_g > 0
  )

Methods and Analysis

I’ll present 3 approaches to quantifying the “unlikelihood” of a player “underperforming” relative to their prior \(O_p\) history.3 I use “prior” to refer to an aggregate of pre-2023/24 statistics, and “target” to refer to 2023/24.

Approach 1: Weighted Percentile Rank

The first approach I’ll present is a handcrafted “ranking” method.

  1. Calculate the proportional difference between the pre-target and target season outperformance ratios–\(O_{p,\text{target}'}\) and \(O_{p,\text{target}}\) respectively–for all players \(P\).

\[ \delta O_p = \frac{O_{p,\text{target}} - O_{p,\text{target}'}}{O_{p,\text{target}'}} \]

  2. Weight \(\delta O_p\) by the player’s \(xG_p\) accumulated in prior seasons to get \(\delta O^w_p\).4

\[ \delta O^w_p = \delta O_p * xG_p \]

  3. Calculate the underperforming unlikeliness \(U^-_p\) as a percentile rank of ascending \(\delta O^w_p\), i.e. more negative \(\delta O^w_p\) values correspond to a lower \(U^-_p\) percentile.5,6

This is pretty straightforward to calculate once you’ve got the data prepared in the right format.

Approach 1 implementation
## `u` for "underperforming unlikelihood"
all_u_approach1 <- all_players_to_evaluate |> 
  dplyr::transmute(
    player,
    prior_o,
    target_o,
    prior_xg,
    weighted_delta_o = prior_xg * (target_o - prior_o) / prior_o,
    u = dplyr::percent_rank(weighted_delta_o)
  ) |> 
  dplyr::arrange(u)

maddison_u_approach1 <- all_u_approach1 |> 
  dplyr::filter(player == 'James Maddison')
Approach 1 output for Maddison
maddison_u_approach1 |> dplyr::select(player, prior_o, target_o, u)
#> # A tibble: 1 × 4
#>   player         prior_o target_o      u
#>   <chr>            <dbl>    <dbl>  <dbl>
#> 1 James Maddison    1.38    0.797 0.0233

This approach finds Maddison’s 2023/24 \(O_p\) of 0.797 to be about a 2nd percentile outcome. Among the 602 players evaluated, Maddison’s 2023/24 \(O_p\) ranks as the 15th most unlikely.

For context, here’s a look at the top 10 most unlikely outcomes for the 2023/24 season.

Approach 1 output, top 10 underperforming players
all_u_approach1 |> head(10) |> dplyr::select(player, prior_o, target_o, u)
#> # A tibble: 10 × 4
#>    player              prior_o target_o       u
#>    <chr>                 <dbl>    <dbl>   <dbl>
#>  1 Ciro Immobile         1.23     0.503 0      
#>  2 Giovanni Simeone      1.03     0.306 0.00166
#>  3 Nabil Fekir           1.14     0.490 0.00333
#>  4 Wahbi Khazri          1.11     0.322 0.00499
#>  5 Kevin Volland         1.18     0.388 0.00666
#>  6 Adrien Thomasson      1.18     0.282 0.00832
#>  7 Timo Werner           0.951    0.543 0.00998
#>  8 Gaëtan Laborde        1.02     0.546 0.0116 
#>  9 Fabián Ruiz Peña      1.67     0.510 0.0133 
#> 10 Benjamin Bourigeaud   1.12     0.503 0.0150

Ciro Immobile tops the list, with several other notable attacking players who had less than stellar seasons.

Overall, I’d say that this methodology seems to generate fairly reasonable results, but it’s hard to pinpoint why exactly the approach is defensible other than that it leads to intuitive results. Despite the intuitively appealing outcomes, it’s important to scrutinize key factors affecting the methodology’s robustness and validity.

  • Subjectivity in Weighting: The choice to weight the difference in performance ratios by pre-2023/24 xG is inherently subjective. While it’s important to have some form of weighting so as to avoid disproportionately emphasizing players with minimal shooting opportunities, alternative weighting strategies could lead to significantly different rankings.
  • Sensitivity to Player Pool: The percentile ranking of a player’s \(O_{p,\text{target}}\) unlikeliness is highly sensitive to the comparison group. For instance, comparing forwards to defenders could skew results due to generally higher variability in defenders’ goal-to-expected goals ratios. Moreover, if a season unusually favors players outperforming their expected goals, even average performers might appear as outliers. This potential for selection bias underlines the importance of carefully choosing the comparison pool.
  • Assumption of Uniform Distribution: The methodology assumes a uniform distribution of performance unlikeliness across players. We presume that there must be, for example, 1% of players at the 1st percentile of underperformance. Although this assumption may hold over many seasons, it could misrepresent individual seasons where extremes are less pronounced or absent.

Approach 2: Resampling from Prior History of Shots

There’s only so much you can do with player-season-level data. We need to dive into shot-level data if we want to more robustly understand the uncertainty of outcomes.

Here’s a “resampling” approach to quantify the underperforming unlikeliness \(U^-_p\) of a player in the target season:

  1. Sample \(N_{p,\text{target}}\) shots from a player’s past shots \(S_{p,\text{target}'}\). Repeat this for \(R\) resamples.7
  2. Count the number of resamples \(r^-\) in which the outperformance ratio \(\hat{O}_{p,\text{target}'}\) of the sampled shots is less than or equal to the observed \(O_{p,\text{target}}\) in the target season for the player.8 The proportion \(U^-_p = \frac{r^-}{R}\) represents the unlikeliness of a given player’s observed \(O_{p,\text{target}}\) (or worse) in the target season.

Here’s how that looks in code.

Approach 2 implementation
R <- 1000
resample_player_shots <- function(
    shots, 
    n_shots_to_sample, 
    n_sims = R,
    replace = TRUE,
    seed = 42
) {
  
  withr::local_seed(seed)
  purrr::map_dfr(
    1:n_sims,
    \(.sim) {
      sampled_shots <- shots |> 
        dplyr::slice_sample(n = n_shots_to_sample, replace = replace)
      
      list(
        sim = .sim,
        xg = sum(sampled_shots$xg),
        g = sum(sampled_shots$g),
        o = sum(sampled_shots$g) / sum(sampled_shots$xg)
      )
    }
  )
}

resample_one_player_o <- function(shots, target_season_end_year) {
  target_shots <- shots |>
    dplyr::filter(season_end_year == target_season_end_year)
  
  prior_shots <- shots |>
    dplyr::filter(season_end_year < target_season_end_year)
  
  prior_shots |> 
    resample_player_shots(
      n_shots_to_sample = nrow(target_shots)
    )
}

resample_player_o <- function(shots, players, target_season_end_year = TARGET_SEASON_END_YEAR) {
  purrr::map_dfr(
    players,
    \(.player) {
      shots |> 
        dplyr::filter(player == .player) |> 
        resample_one_player_o(
          target_season_end_year = target_season_end_year
        ) |> 
        dplyr::mutate(
          player = .player
        )
    }
  )
}

maddison_resampled_o <- np_shots |> 
  resample_player_o(
    players = 'James Maddison'
  ) |> 
  dplyr::inner_join(
    wide_player_np_shots |> 
      dplyr::select(
        player,
        prior_o,
        target_o
      ),
    by = dplyr::join_by(player)
  ) |> 
  dplyr::arrange(player)

maddison_u_approach2 <- maddison_resampled_o |>
  dplyr::summarize(
    .by = c(player, prior_o, target_o),
    u = sum(o <= target_o) / dplyr::n()
  ) |> 
  dplyr::arrange(player)
Approach 2 output for Maddison
maddison_u_approach2 |> dplyr::select(player, prior_o, target_o, u)
#> # A tibble: 1 × 4
#>   player         prior_o target_o     u
#>   <chr>            <dbl>    <dbl> <dbl>
#> 1 James Maddison    1.38    0.797 0.109

The plot below should provide a bit of visual intuition as to what’s going on.

These results imply that Maddison’s 2023/24 \(G / xG\) ratio of 0.797 (or worse) occurs in 10.9% of simulations, i.e. an 11th percentile outcome. That’s a bit higher than what the first approach showed.

How can we feel confident about this approach? Well, in the first approach, we assumed that the underperforming unlikelihood percentages should be uniform across all players, hence the percentile ranking. I think that’s a good assumption, so we should check whether the same bears out with this second approach.

The plot below shows a histogram of the underperforming unlikelihood across all players, where each player’s estimated unlikelihood is grouped into a decile.

Approach 2 implementation for all players
all_resampled_o <- np_shots |> 
  resample_player_o(
    players = all_players_to_evaluate$player
  ) |> 
  dplyr::inner_join(
    wide_player_np_shots |> 
      ## to make sure we keep just one Rodri, Danilo, and Nicolás González 
      dplyr::filter(player_id %in% all_players_to_evaluate$player_id) |> 
      dplyr::select(
        player,
        prior_o,
        target_o,
        prior_shots,
        target_shots
      ),
    by = dplyr::join_by(player)
  ) |> 
  dplyr::arrange(player)

all_u_approach2 <- all_resampled_o |>
  dplyr::summarize(
    .by = c(player, prior_o, target_o, prior_shots, target_shots),
    u = sum(o <= target_o) / dplyr::n()
  ) |> 
  dplyr::arrange(u)
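
Here’s a minimal sketch of how such a decile histogram could be drawn (ggplot2 is an assumption here; the post’s actual plots are more polished):

library(ggplot2)

## Bin each player's estimated unlikeliness into deciles
all_u_approach2 |> 
  ggplot(aes(x = u)) + 
  geom_histogram(breaks = seq(0, 1, by = 0.1))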

Indeed, the histogram shows a fairly uniform distribution, with a bit of irregularity at the very edges.

Looking at who is in the lower end of the leftmost decile, we see some of the same names–Immobile and Savanier–among the top underperformers. (Withholding judgment on the superiority of any methodology, we can find some solace in seeing some of the same names among the most unlikely underperformers here as we did with approach 1.)

Approach 2 output, top 10 underperforming players
all_u_approach2 |> head(10) |> dplyr::select(player, prior_o, target_o, u)
#> # A tibble: 10 × 4
#>    player                    prior_o target_o     u
#>    <chr>                       <dbl>    <dbl> <dbl>
#>  1 Pierre-Emerick Aubameyang    1.07    0.636 0.009
#>  2 Alex Baena                   1.54    0.326 0.01 
#>  3 Amine Harit                  1.27    0.262 0.01 
#>  4 Erling Haaland               1.26    0.897 0.013
#>  5 Kevin Volland                1.18    0.388 0.015
#>  6 Antonio Sanabria             1.05    0.380 0.018
#>  7 Kevin Behrens                1.39    0.673 0.019
#>  8 Elye Wahi                    1.38    0.770 0.021
#>  9 Ansu Fati                    1.31    0.430 0.024
#> 10 M'Bala Nzola                 1.10    0.274 0.025

One familiar face in the printout above is Manchester City’s striker Erling Haaland, whose underperformance this season has been called out by fans and the media. His sub-par performance this year ranked as a 9th percentile outcome by approach 1, which is very low, but not quite as low as what this approach finds.

Here are some parting thoughts on this methodology before we look at another:

  • Assumption of Shot Profile Consistency: We assume that a player’s past shot behavior accurately predicts their future performance. This generally holds unless a player changes their role or team, or is recovering from an injury. But there are other exceptions as well. For example, Haaland has taken a lot more headed shots this season, despite playing effectively the same role on mostly the same team as last season. The change in Haaland’s shot profile this year conflicts with the assumption of a consistent shot profile, perhaps explaining why this resampling approach finds Haaland’s shooting performance to be more unlikely than the percentile ranking approach does.
  • Non-Parametric Nature: This method does not assume any specific distribution for a player’s performance ratios; instead, it relies on the stability of a player’s performance over time. The resampling process itself shapes the outcome distribution, which can vary significantly between players with different shooting behaviors, such as a forward versus a defender.
  • Computational Demands: The resampling approach requires relatively more computational resources than the prior approach, especially without parallel processing (see the parallelized sketch below). Even a relatively small number of resamples, such as \(R=1000\), can take a few seconds per player to compute.
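
On that last point, here’s a hypothetical sketch of how the resampling could be parallelized with the future and furrr packages (an assumption on my part; the post itself runs sequentially). It mirrors resample_player_o() from above:

## Run the per-player resampling across multiple R sessions
future::plan(future::multisession)

all_resampled_o_parallel <- furrr::future_map_dfr(
  all_players_to_evaluate$player,
  \(.player) {
    np_shots |> 
      dplyr::filter(player == .player) |> 
      resample_one_player_o(target_season_end_year = TARGET_SEASON_END_YEAR) |> 
      dplyr::mutate(player = .player)
  },
  ## ensure reproducible random draws across parallel workers
  .options = furrr::furrr_options(seed = TRUE)
)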

Approach 3: Evaluating a Player-Specific Cumulative Distribution Function (CDF)

If we assume that the set of goals-to-xG ratios comes from a Gamma data-generating process, then we can leverage the properties of a player-level Gamma distribution to assess the unlikelihood of a player’s \(O_p\) ratio.

To calculate the underperforming unlikeliness \(U^-_p\):

  1. Estimate a Gamma distribution \(\Gamma_{p,\text{target}'}\) to model a player’s true outperformance ratio \(O_{p}\) across all prior shots, excluding those in the target season–\(\hat{O}_{p,\text{target}'}\).
  2. Calculate the probability that \(\hat{O}_{p,\text{target}'}\) is less than or equal to the player’s observed \(O_{p,\text{target}}\) in the target season using the Gamma distribution’s cumulative distribution function (CDF).

While that may sound daunting, I promise that it’s not (well, aside from a bit of “magic” in estimating a reasonable Gamma distribution per player).

Approach 3 implementation
N_SIMS <- 10000

SHOT_TO_SHAPE_MAPPING <- list(
  'from' = c(50, 750),
  'to' = c(1, 25)
)
estimate_one_gamma_distributed_o <- function(
    shots,
    target_season_end_year
) {
  player_np_shots <- shots |> 
    dplyr::mutate(is_target = season_end_year == target_season_end_year)
  
  agg_player_np_shots <- player_np_shots |>
    dplyr::summarize(
      .by = c(is_target),
      shots = dplyr::n(),
      dplyr::across(c(g, xg), \(.x) sum(.x))
    ) |> 
    dplyr::mutate(o = g / xg)
  
  agg_prior_player_np_shots <- agg_player_np_shots |> 
    dplyr::filter(!is_target)
  
  ## More prior shots -> larger shape -> a tighter Gamma distribution about the
  ## player's prior O ratio. Clamp the shape at the endpoints outside of the
  ## 50-750 shot range.
  shape <- dplyr::case_when(
    agg_prior_player_np_shots$shots < SHOT_TO_SHAPE_MAPPING$from[1] ~ SHOT_TO_SHAPE_MAPPING$to[1],
    agg_prior_player_np_shots$shots > SHOT_TO_SHAPE_MAPPING$from[2] ~ SHOT_TO_SHAPE_MAPPING$to[2],
    TRUE ~ scales::rescale(
      agg_prior_player_np_shots$shots, 
      from = SHOT_TO_SHAPE_MAPPING$from, 
      to = SHOT_TO_SHAPE_MAPPING$to
    )
  )
  list(
    'shape' = shape,
    ## rate chosen so that the Gamma mean (shape / rate) equals the prior O ratio
    'rate' = shape / agg_prior_player_np_shots$o
  )
}

estimate_gamma_distributed_o <- function(
    shots,
    players,
    target_season_end_year
) {
  
  purrr::map_dfr(
    players,
    \(.player) {
      params <- shots |> 
        dplyr::filter(player == .player) |> 
        estimate_one_gamma_distributed_o(
          target_season_end_year = target_season_end_year
        )
      
      list(
        'player' = .player,
        'params' = list(params)
      )
    }
  )
}

select_gamma_o <- np_shots |> 
  estimate_gamma_distributed_o(
    players = 'James Maddison',
    target_season_end_year = TARGET_SEASON_END_YEAR
  ) |> 
  dplyr::inner_join(
    wide_player_np_shots |> 
      dplyr::select(
        player,
        prior_o,
        target_o
      ),
    by = dplyr::join_by(player)
  ) |> 
  dplyr::arrange(player)

maddison_u_approach3 <- select_gamma_o |> 
  dplyr::mutate(
    u = purrr::map2_dbl(
      target_o,
      params,
      \(.target_o, .params) {
        pgamma(
          .target_o, 
          shape = .params$shape, 
          rate = .params$rate,
          lower.tail = TRUE
        )
      }
    ),
    ou = 1 - u
  ) |> 
  tidyr::unnest_wider(params)
Approach 3 output for Maddison
maddison_u_approach3 |> dplyr::select(player, prior_o, target_o, u)
#> # A tibble: 1 × 4
#>   player         prior_o target_o      u
#>   <chr>            <dbl>    <dbl>  <dbl>
#> 1 James Maddison    1.38    0.797 0.0469

We see that Maddison’s 2023/24 \(O_{p,\text{target}}\) ratio of 0.797 (or worse) is about a 5th percentile outcome given his prior shot history.

To gain some intuition around this approach, we can plot out the Gamma distributed estimate of Maddison’s \(O_p\). The result is a histogram that looks not all that dissimilar to the one from before with resampled shots, just much smoother (since this is a “parametric” approach).
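
As a rough stand-in for that plot, here’s a minimal base-R sketch of the implied density, using the shape and rate columns unnested above:

## Maddison's estimated Gamma density for O_p; the dashed line marks his
## observed 2023/24 ratio
curve(
  dgamma(x, shape = maddison_u_approach3$shape, rate = maddison_u_approach3$rate),
  from = 0, to = 3, xlab = 'O', ylab = 'Density'
)
abline(v = maddison_u_approach3$target_o, lty = 2)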

As with approach 2, we should check to see what the distribution of underperforming unlikeliness looks like–we should expect to see a somewhat uniform distribution.

Approach 3 for all players
all_gamma_o <- np_shots |> 
  estimate_gamma_distributed_o(
    players = all_players_to_evaluate$player,
    target_season_end_year = TARGET_SEASON_END_YEAR
  ) |> 
  dplyr::inner_join(
    wide_player_np_shots |> 
      dplyr::filter(
        player_id %in% all_players_to_evaluate$player_id
      ) |> 
      dplyr::select(
        player,
        prior_o,
        target_o
      ),
    by = dplyr::join_by(player)
  ) |> 
  dplyr::arrange(player)

all_u_approach3 <- all_gamma_o |> 
  dplyr::mutate(
    u = purrr::map2_dbl(
      target_o,
      params,
      \(.target_o, .params) {
        pgamma(
          .target_o, 
          shape = .params$shape, 
          rate = .params$rate,
          lower.tail = TRUE
        )
      }
    )
  ) |> 
  tidyr::unnest_wider(params) |> 
  dplyr::arrange(u)

This histogram has a bit more distortion than our resampling approach, so perhaps it’s a little less calibrated.

Looking at the top 10 strongest underperformers, 2 of the names here–Volland and Sanabria–are shared with approach 2’s top 10, and 7 are shared with approach 1’s top 10.

Approach 3 output, top 10 underperforming players
all_u_approach3 |> head(10) |> dplyr::select(player, prior_o, target_o, u)
#> # A tibble: 10 × 4
#>    player           prior_o target_o        u
#>    <chr>              <dbl>    <dbl>    <dbl>
#>  1 Ciro Immobile       1.23    0.503 0.000238
#>  2 Giovanni Simeone    1.03    0.306 0.000248
#>  3 Adrien Thomasson    1.18    0.282 0.000346
#>  4 Wahbi Khazri        1.11    0.322 0.000604
#>  5 Kevin Volland       1.18    0.388 0.00132 
#>  6 Nabil Fekir         1.14    0.490 0.00256 
#>  7 Fabián Ruiz Peña    1.67    0.510 0.00271 
#>  8 Antonio Sanabria    1.05    0.380 0.00796 
#>  9 Téji Savanier       1.42    0.548 0.0103  
#> 10 Jordan Veretout     1.16    0.360 0.0105

We can visually check the consistency of the results from this method with the prior two with scatter plots of the estimated underperforming unlikeliness from each.

If two of the approaches were perfectly in agreement, then each point, representing one of the 602 evaluated players, would fall along the 45-degree (slope 1) line.

With that in mind, we can see that approach 3 agrees more closely with approach 1, although approach 3 tends to assign slightly higher percentiles to players on the whole. The results from approaches 2 and 3 also show a fair degree of agreement, and those results are more uniformly calibrated.
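
Here’s a minimal sketch of one such agreement plot (approach 1 vs. approach 3; ggplot2 is again an assumption):

library(ggplot2)

dplyr::inner_join(
  all_u_approach1 |> dplyr::select(player, u1 = u),
  all_u_approach3 |> dplyr::select(player, u3 = u),
  by = dplyr::join_by(player)
) |> 
  ggplot(aes(x = u1, y = u3)) + 
  geom_point() + 
  ## the 45-degree line of perfect agreement
  geom_abline(slope = 1, intercept = 0)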

Stepping back from the results, what can we say about the principles of the methodology?

  • Parametric Nature: The reliance on a Gamma distribution for modeling a player’s performance is both a strength and a limitation. The Gamma distribution is apt for positive, skewed continuous variables, making it suitable for modeling goals-to-xG ratios. However, the dependency on a single distribution type may restrict the scope of analysis.
  • Sensitivity to Distribution Parameters: The outcomes of this methodology are highly sensitive to the parameters defining each player’s Gamma distribution. Small adjustments in shape or rate parameters can significantly alter the distribution, causing substantial shifts in the percentile outcomes of player performances. This sensitivity underscores the need for careful parameter selection and calibration.
  • Flexibility of the Model: Despite its sensitivity, the Gamma distribution offers considerable flexibility. It allows for fine-tuning of the model to better fit the data, which can be advantageous for capturing the nuances of different players’ shot profiles.

Conclusion

Here’s a summary of the biggest pros and cons of each approach, along with the result for Maddison.

| Approach | Description | Biggest Pro | Biggest Con | Maddison 2023/24 Underperformance Unlikeliness |
|---|---|---|---|---|
| 1 | Percentile Ranking | customizable | sensitive to the choice of players to evaluate | 2nd percentile |
| 2 | Resampling | non-parametric | limited by player’s shot history | 11th percentile |
| 3 | Cumulative Distribution Function (CDF) | flexibility9 | sensitive to choice of distribution parameters | 5th percentile |

I personally prefer either the second or third approach. In practice, perhaps the best thing to do is to take an ensemble average of all three approaches, as they each have their pros and cons.
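
As a minimal sketch (assuming the all_u_approach* data frames from above), that ensemble could be as simple as averaging the three estimates per player:

## Average the underperformance unlikeliness estimates across the three approaches
ensemble_u <- dplyr::bind_rows(
  all_u_approach1 |> dplyr::select(player, u),
  all_u_approach2 |> dplyr::select(player, u),
  all_u_approach3 |> dplyr::select(player, u)
) |> 
  dplyr::summarize(.by = player, u = mean(u)) |> 
  dplyr::arrange(u)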

Potential Future Research

  1. Can these approaches be applied to teams or managers to understand the unlikeliness of their season-long outcomes?

I think the answer is “yes”, at least for the resampling approach. The non-parametric nature of resampling makes it easy to translate to other “levels of aggregation”, i.e. a set of players under a manager or playing as a team.
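
For instance, here’s a hypothetical sketch that reuses resample_player_shots() from approach 2 at the team level (the squad name is purely illustrative and may not match FBref’s naming):

## Resample a team's prior shots to simulate its target-season outcomes
team_shots <- np_shots |> dplyr::filter(team == 'Tottenham')

team_resampled_o <- team_shots |> 
  dplyr::filter(season_end_year < TARGET_SEASON_END_YEAR) |> 
  resample_player_shots(
    n_shots_to_sample = sum(team_shots$season_end_year == TARGET_SEASON_END_YEAR)
  )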

  2. Can we accurately attribute a percentage of underperformance to skill and luck?

Eh, I don’t know about “accurately”, especially at the player level. The R-squared of year-over-year player-level G / xG ratios is nearly zero. If we equate “skill” to “percent of variance explained in year-over-year correlations of a measure (i.e. G / xG)”, then I suppose the answer is that basically 0% of seasonal over- or under-performance is due to innate factors; rather, we’d attribute all variation to “luck” (assuming that “skill” and “luck” are the only factors that can explain residuals). That’s not all that compelling, although it may be the reality.
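
Here’s a rough sketch of how that near-zero R-squared could be checked; it pairs each player-season with the player’s next recorded season and ignores gap years:

## Pair each player-season O_p with the player's next recorded season
yoy_o <- player_season_np_shots |> 
  dplyr::arrange(player_id, season_end_year) |> 
  dplyr::mutate(.by = player_id, next_o = dplyr::lead(o)) |> 
  dplyr::filter(is.finite(o), is.finite(next_o))

## R-squared of year-over-year G / xG ratios
cor(yoy_o$o, yoy_o$next_o)^2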

My prior work on “meta-metrics” for soccer perhaps has a more compelling answer. The “stability” measure defined in that post for \(G / xG\) comes out to about 70% (out of 100%).

Appendix

Approach 0: \(t\)-test

If you have some background in statistics, applying a \(t\)-test (using shot-weighted averages and standard deviations) may be an approach that comes to mind.

Approach 0
u_approach0 <- player_season_np_shots |> 
  dplyr::semi_join(
    all_players_to_evaluate |> dplyr::select(player_id),
    by = dplyr::join_by(player_id)
  ) |> 
  dplyr::filter(season_end_year < TARGET_SEASON_END_YEAR) |> 
  dplyr::summarise(
    .by = c(player),
    mean = weighted.mean(o, w = shots),
    ## could also use a function like Hmisc::wtd.var for weighted variance
    sd = sqrt(sum(shots * (o - weighted.mean(o, w = shots))^2) / sum(shots))
  ) |> 
  dplyr::inner_join(
    wide_player_np_shots |> 
      dplyr::select(player, prior_o, target_o),
    by = dplyr::join_by(player)
  ) |> 
  dplyr::mutate(
    z_score = (target_o - mean) / sd,
    ## multiply by 2 for a two-sided t-test
    u = pnorm(-abs(z_score))
  ) |> 
  dplyr::select(-c(mean, sd)) |> 
  dplyr::arrange(player)
Approach 0 output
u_approach0 |> 
  dplyr::filter(player == 'James Maddison') |> 
  dplyr::select(player, prior_o, target_o, u) 
#> # A tibble: 1 × 4
#>   player         prior_o target_o      u
#>   <chr>            <dbl>    <dbl>  <dbl>
#> 1 James Maddison    1.38    0.797 0.0707

In reality, this isn’t giving us a percentage of unlikelihood of the outcome. Rather, the p-value measures the probability of underperformance as extreme as the underperformance observed in 2023/24 if the null hypothesis is true. The null hypothesis in this case would be that there is no significant difference between the player’s actual \(O_p\) ratio in the 2023/24 season and the distribution of outperformance ratios observed in previous seasons.


Footnotes

  1. I only consider non-penalty xG and goals for this post. The ability to score penalties at a high success rate is generally seen as a different skill set than the ability to score goals in open play.↩︎

  2. The raw difference between goals and xG is a reasonable measure of shooting performance, but it can “hide” shot volume. Is it fair to compare a player who takes 100 shots in a year and scores 12 goals on 10 xG with a player who takes 10 shots and scores 3 goals on 1 xG? The raw difference is +2 in both cases, indicating no difference in the shooting performance for the two players. However, their \(O_p\) would be 1.2 and 3 respectively, hinting at the former player’s small sample size.↩︎

  3. While I focus on underperformance in this post, “overperformance” could be quantified in a similar (i.e. symmetrical) manner with each technique.↩︎

  4. The weighting emphasizes scenarios where a veteran player, typically overperforming or at worst neutral, suddenly underperforms, as opposed to a second-year player experiencing similar downturns.↩︎

  5. An overperforming unlikelihood \(U^+_p\) could be calculated by sorting \(\delta O^w_p\) in descending order instead.↩︎

  6. Percentiles greater than 50% generally correspond with players who have overperformed, so really the bottom 50% are the players we’re looking at when we’re considering underperformance unlikeliness.↩︎

  7. \(N_p\) should be set equal to the number of shots a player has taken in the target season, i.e. 2023/24 here. \(R\) should be set to some fairly large number, so as to achieve stability in the results.↩︎

  8. Similarly, to estimate the unlikeliness of an overperforming season, count up in how many simulations \(r^+\) the outperformance ratio of the resampled shots is greater than \(O_{p,\text{target}}\) and calculate the proportion \(U^+_p = \frac{r^+}{R}\).↩︎

  9. Well, it’s “flexible” to the extent that a statistical distribution can be flexible.↩︎