[BUG] PBP scores are incorrect for 17 games #99

Description

@Josh-TX

Is there an existing issue for this?

  • I have searched the existing issues

Have you installed the latest development version of the package(s) in question?

  • I have installed the latest development version of the package.

If this is a data issue, have you tried clearing your nflverse cache?

I have cleared my nflverse cache and the issue persists.

What version of the package do you have?

nflreadr: 1.5.0

Describe the bug

The final score in the pbp data doesn't match the final score in the schedule data for 17 games. The schedule scores match other NFL score websites, so the error is in the pbp data.

16 of these are Jacksonville home games in the 2001 and 2002 seasons. In these games, whenever JAX scores a touchdown, the 6 points are awarded to the other team. The same happens when JAX scores a safety. Scoring is correct for all non-JAX scores, and for JAX extra points, 2-point conversions, and field goals. I noticed these games because of anomalies in home_wp, but I think fixing the scoring would fix home_wp as well.
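To make the pattern concrete, here is a minimal sketch in plain Python of what this kind of misattribution would do to the two diff columns computed in the reprex. The plays are made up for illustration (they are not from a real game), and the `tally` helper is hypothetical, not part of nflreadr or nflreadpy. Under these assumptions, the combined score stays correct while the home score is short by 6 per home touchdown and 2 per home safety.

```python
# Toy model of the reported pattern: the home team's touchdowns and safeties
# are credited to the away team, while every other score is credited correctly.
# The plays below are invented for illustration, not taken from a real game.
MISATTRIBUTED_KINDS = {"touchdown", "safety"}


def tally(plays, buggy):
    """Return (home, away) totals for a list of (team, kind, points) plays."""
    home = away = 0
    for team, kind, points in plays:
        credit_home = team == "home"
        if buggy and team == "home" and kind in MISATTRIBUTED_KINDS:
            credit_home = False  # the reported bug: points go to the opponent
        if credit_home:
            home += points
        else:
            away += points
    return home, away


plays = [
    ("home", "touchdown", 6),
    ("home", "extra_point", 1),
    ("away", "field_goal", 3),
    ("home", "safety", 2),
    ("away", "touchdown", 6),
    ("away", "extra_point", 1),
]

true_home, true_away = tally(plays, buggy=False)  # what the schedule would show
pbp_home, pbp_away = tally(plays, buggy=True)     # what the pbp would show

# Same two columns as in the reprex
sum_score_diff = (pbp_home + pbp_away) - (true_home + true_away)
home_score_diff = pbp_home - true_home

print(sum_score_diff, home_score_diff)  # 0 -8
```

So a row with sum_score_diff of 0 and a negative home_score_diff that is a combination of 6s and 2s would be consistent with this explanation, though it does not prove it.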

The 17th game, 2011_13_DET_NO, is very different from the other 16. It appears to contain 2 duplicated plays: play_ids 1129 and 1152 are the same play (14 yd TD run by Ingram at 11:50 in the 2nd), and play_ids 1655 and 1946 are the same play (2 yd TD run by Smith at 2:32 in the 2nd).
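A rough way to look for duplicates like these is to group plays by game, quarter, clock, and description, then report any group with more than one play_id. The sketch below uses plain Python on hypothetical rows: the four play_ids come from this report, but the descriptions are abbreviated placeholders, play_id 9999 is invented filler, and the assumption that duplicated rows share identical qtr, time, and desc values has not been checked against the real data.

```python
from collections import defaultdict

# Hypothetical rows shaped like pbp records. The desc strings are placeholders,
# not the real play text, and play_id 9999 is invented filler.
rows = [
    {"game_id": "2011_13_DET_NO", "play_id": 1129, "qtr": 2, "time": "11:50", "desc": "Ingram 14 yd TD run"},
    {"game_id": "2011_13_DET_NO", "play_id": 1152, "qtr": 2, "time": "11:50", "desc": "Ingram 14 yd TD run"},
    {"game_id": "2011_13_DET_NO", "play_id": 1655, "qtr": 2, "time": "2:32", "desc": "Smith 2 yd TD run"},
    {"game_id": "2011_13_DET_NO", "play_id": 1946, "qtr": 2, "time": "2:32", "desc": "Smith 2 yd TD run"},
    {"game_id": "2011_13_DET_NO", "play_id": 9999, "qtr": 3, "time": "15:00", "desc": "Filler play"},
]


def find_duplicate_plays(rows):
    """Map (game_id, qtr, time, desc) to play_ids for keys seen more than once."""
    groups = defaultdict(list)
    for row in rows:
        key = (row["game_id"], row["qtr"], row["time"], row["desc"])
        groups[key].append(row["play_id"])
    return {key: ids for key, ids in groups.items() if len(ids) > 1}


duplicates = find_duplicate_plays(rows)
for (game_id, qtr, clock, desc), ids in duplicates.items():
    print(game_id, f"Q{qtr} {clock}", desc, ids)
```

On the real pbp data this key may produce false positives, since some rows can legitimately share a clock value and description, so any hits would still need a manual look.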

Reprex

library(nflreadr)

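# TRUE loads every available season
pbp <- load_pbp(TRUE)
schedules <- load_schedules(TRUE)

# Final cumulative score of each game, taken from its "END GAME" row;
# which() drops rows where desc is NA instead of returning all-NA rows
end_game <- pbp[which(pbp$desc == "END GAME"), c("game_id", "total_home_score", "total_away_score")]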

merged <- merge(
  schedules[, c("game_id", "season", "home_team", "away_team", "home_score", "away_score")],
  end_game,
  by = "game_id"
)

discrepancies <- merged[
  merged$home_score != merged$total_home_score |
  merged$away_score != merged$total_away_score,
]

discrepancies$sum_score_diff <- (
  discrepancies$total_home_score + discrepancies$total_away_score -
  discrepancies$home_score - discrepancies$away_score
)
discrepancies$home_score_diff <- discrepancies$total_home_score - discrepancies$home_score

print(discrepancies)

Expected Behavior

The pbp data and the schedule data should have the same final score for every game.

nflverse_sitrep

NA

Screenshots

Additional context

I'm using Python's nflreadpy. I used AI to produce the equivalent R code to confirm that the problem exists there too.

This is my Python code:

import nflreadpy as nfl
import polars as pl

# True loads every available season
pbp = nfl.load_pbp(True)
schedules = nfl.load_schedules(True)

# Compare each game's schedule score with the score on its "END GAME" pbp row
discrepancies = (
    schedules
    .select(["game_id", "season", "home_team", "away_team", "home_score", "away_score"])
    .join(
        pbp.filter(pl.col("desc") == "END GAME").select(["game_id", "total_home_score", "total_away_score"]),
        on="game_id",
        how="inner"
    )
    .filter((pl.col("home_score") != pl.col("total_home_score")) | (pl.col("away_score") != pl.col("total_away_score")))
    .with_columns(
        (pl.col("total_home_score") + pl.col("total_away_score") - pl.col("home_score") - pl.col("away_score")).alias("sum_score_diff"),
        (pl.col("total_home_score") - pl.col("home_score")).alias("home_score_diff"),
    )
)
print(discrepancies)
