Analysing Home Advantage in the Premier League

Does home advantage exist in the Premier League? I use match data from the 2023/24 season to measure how much teams benefit from playing in front of their home crowds.

Methodology

In this analysis, I used Python’s Pandas library to manipulate and analyse football data. I defined a function called home_away_results that takes a DataFrame containing match data as input. This function performs the following key tasks:

  • Melts the data to create a dedicated row for each team per game, ensuring a clear view of home and away performances.
  • Replaces team indicators (home_team and away_team) with 'Home' and 'Away', for improved readability.
  • Assigns points to each team for each match based on the result, following the standard 3 points for a win, 1 point for a draw, and 0 points for a loss.
  • Groups the data by team and home/away status to analyse performance trends effectively.
  • Calculates the total points and average points per game for each team, categorised by home and away performance.
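A minimal sketch of how these steps might fit together, not necessarily the exact function used in this analysis. It assumes the match data has home_team and away_team columns plus two goals columns; the names home_goals, away_goals, points and total_points are hypothetical, and the three toy matches are made up for illustration.

```python
import pandas as pd


def home_away_results(matches: pd.DataFrame) -> pd.DataFrame:
    """Summarise points per team, split by home and away games.

    Assumes columns home_team, away_team, home_goals, away_goals
    (the two goals columns are hypothetical names).
    """
    # One row per team per game: the old column name records the venue.
    long = matches.melt(
        id_vars=["home_goals", "away_goals"],
        value_vars=["home_team", "away_team"],
        var_name="is_home",
        value_name="team",
    )
    long["is_home"] = long["is_home"].replace(
        {"home_team": "Home", "away_team": "Away"}
    )

    # Goal difference from the point of view of the team in each row.
    at_home = long["is_home"] == "Home"
    diff = (long["home_goals"] - long["away_goals"]).where(
        at_home, long["away_goals"] - long["home_goals"]
    )
    # 3 points for a win, 1 for a draw, 0 for a loss.
    long["points"] = diff.gt(0) * 3 + diff.eq(0) * 1

    # Total and average points per team and venue.
    return long.groupby(["team", "is_home"])["points"].agg(
        total_points="sum", avg_points_per_game="mean"
    )


# Toy data (made up): A beats B at home, B draws with A, B wins away at A.
matches = pd.DataFrame({
    "home_team": ["A", "B", "A"],
    "away_team": ["B", "A", "B"],
    "home_goals": [2, 1, 0],
    "away_goals": [0, 1, 3],
})
results = home_away_results(matches)
```

The result is indexed by team and is_home, which is the shape the plotting and t-test code later in the article expects.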

Visualisation

Visualisation makes the home and away pattern easier to see. I used Plotly to build an interactive chart of the results, which is available here.

To gain a more granular perspective on home advantage, let’s create a facet grid using Seaborn. This will display a point plot for each team, visually representing their average points per game at home and away. Here’s the code to generate the plot:

import matplotlib.pyplot as plt
import seaborn as sns

# One small panel per team, five panels per row
g = sns.FacetGrid(results.reset_index(), col='team', col_wrap=5)
g.map(sns.pointplot, 'is_home', 'avg_points_per_game', order=["Away", "Home"])
g.set_axis_labels('', 'Avg Points per Game')
g.set_titles("{col_name}")
plt.savefig('home_away_points.png')

Most teams average more points per game at home, but Burnley appears to be an exception. Burnley also struggled to secure points in general.

On the other hand, Newcastle’s resurgence under new ownership has ignited a passionate fan base. The electric atmosphere at St. James’ Park, described by players like Anthony Gordon and managers like Mikel Arteta as one of the toughest places to play, can intimidate opponents and inspire the home team. Newcastle’s markedly higher average points per game at home than away is consistent with this.
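The per-team gaps read off the grid can also be tabulated directly. The following is a minimal sketch, assuming results is indexed by team and is_home as in the plotting code. The toy values are made up, not real season data, and home_edge is a hypothetical column name.

```python
import pandas as pd

# Toy stand-in for `results` with made-up values, not real season data.
results = pd.DataFrame(
    {"avg_points_per_game": [1.0, 2.0, 1.2, 0.8]},
    index=pd.MultiIndex.from_tuples(
        [("Team A", "Away"), ("Team A", "Home"),
         ("Team B", "Away"), ("Team B", "Home")],
        names=["team", "is_home"],
    ),
)

# One row per team, with Away and Home as columns.
wide = results["avg_points_per_game"].unstack("is_home")
# Positive values mean the team averaged more points at home.
wide["home_edge"] = wide["Home"] - wide["Away"]
# Largest home advantage first; negative values sink to the bottom.
ranked = wide.sort_values("home_edge", ascending=False)
print(ranked)
```

Teams with a negative home_edge are the ones that, like Burnley in the plot, averaged fewer points at home than away.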

Statistical Significance: T-test for Home Advantage

To determine whether the observed differences in average points per game between home and away matches are statistically significant, I conducted a paired t-test. A paired test fits here because every team contributes both a home average and an away average, so each team’s two figures can be compared directly.

import pandas as pd
from scipy.stats import ttest_rel

# Assuming `results` is the DataFrame you obtained earlier
def paired_t_test_on_points(results: pd.DataFrame):
    # Pivot the table to have home and away in separate columns
    pivot_table = results.unstack().reset_index()
    
    # Perform a paired t-test on avg_points_per_game for home and away
    home_points = pivot_table['avg_points_per_game']['Home']
    away_points = pivot_table['avg_points_per_game']['Away']
    
    # Perform the paired t-test
    t_stat, p_value = ttest_rel(home_points, away_points)
    
    return t_stat, p_value

# Calculate the t-statistic and p-value
t_stat, p_value = paired_t_test_on_points(results)

print(f"T-statistic: {t_stat:.4f}, P-value: {p_value:.4f}")

Output:

T-statistic: 6.9314, P-value: 0.0000

The p-value is not literally zero: it rounds to 0.0000 at four decimal places, so it is below 0.00005, far under the conventional 0.05 threshold. We can therefore reject the null hypothesis that there is no difference in average points per game between home and away matches.
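For intuition, the statistic reported by ttest_rel is the mean of the per-team home-minus-away differences divided by its standard error. The sketch below checks this by hand on made-up numbers for five hypothetical teams; it does not reproduce the figures above.

```python
import numpy as np
from scipy.stats import ttest_rel

# Made-up averages for five hypothetical teams, not real season data.
home = np.array([2.1, 1.8, 1.5, 1.2, 0.9])
away = np.array([1.7, 1.6, 1.0, 1.1, 0.8])

# The paired test works on the per-team differences only.
diff = home - away
n = len(diff)

# t = mean difference / (sample standard deviation / sqrt(n))
t_manual = diff.mean() / (diff.std(ddof=1) / np.sqrt(n))

# SciPy's paired t-test on the same data should give the same statistic.
t_scipy, p_value = ttest_rel(home, away)
print(np.isclose(t_manual, t_scipy))
```

Because only the differences enter the calculation, a consistently strong or weak team does not distort the test; what matters is how each team's home figure compares with its own away figure.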

Interpretation and Next Steps

The paired t-test indicates that home advantage in the Premier League during the 2023/24 season was statistically significant. However, further research could investigate factors such as crowd size, travel distance, team tactics, and player performance that may contribute to this advantage, as well as how it varies across teams and over time.

The link to the GitHub repository can be found here.