Is there an existing issue for this?
Have you installed the latest development version of the package(s) in question?
If this is a data issue, have you tried clearing your nflverse cache?
I have cleared my nflverse cache and the issue persists.
What version of the package do you have?
nflreadr: 1.5.0
Describe the bug
The final score in the pbp data doesn't match the score in the schedule data for 17 games. The schedule data matches other NFL score websites, so the issue appears to be with the pbp data.
16 of these are Jacksonville home games in the 2001 and 2002 seasons. Whenever JAX scores a touchdown, the 6 points are awarded to the other team. The same thing happens when JAX scores a safety. Scoring is correct for all non-JAX scores, and for JAX extra points, 2-point conversions, and field goals. I noticed these games because of anomalies in home_wp, but I think fixing the scoring would fix home_wp as well.
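A minimal sketch of how the misattributed touchdowns could be flagged. It works on plain dicts (for example the output of polars' `DataFrame.to_dicts()` for one game, in play order). The columns `td_team`, `home_team`, `away_team` and `play_id` are assumed to exist in the pbp data, the rows below are made-up toy data, and the sketch assumes the running totals on a row already include the points from that row. It only covers touchdowns, not safeties.

```python
def misattributed_scores(plays):
    """Return play_ids of touchdowns whose points were credited to the other team.

    `plays` is a list of dicts for ONE game, in play order.
    """
    flagged = []
    prev_home, prev_away = 0, 0
    for p in plays:
        # Points added to each side's running total on this row
        home_gain = p["total_home_score"] - prev_home
        away_gain = p["total_away_score"] - prev_away
        prev_home, prev_away = p["total_home_score"], p["total_away_score"]

        scorer = p.get("td_team")  # assumed column: team that scored the TD
        if scorer is None:
            continue

        # Which team actually received points on this row?
        if home_gain > 0:
            credited = p["home_team"]
        elif away_gain > 0:
            credited = p["away_team"]
        else:
            credited = None

        if credited is not None and credited != scorer:
            flagged.append(p["play_id"])
    return flagged


# Toy rows (hypothetical, not real pbp data): JAX is the home team, OPP the visitor.
toy_plays = [
    {"play_id": 1, "home_team": "JAX", "away_team": "OPP", "td_team": None,
     "total_home_score": 0, "total_away_score": 0},
    # JAX touchdown, but the 6 points land on the away total
    {"play_id": 2, "home_team": "JAX", "away_team": "OPP", "td_team": "JAX",
     "total_home_score": 0, "total_away_score": 6},
    # JAX extra point is credited correctly
    {"play_id": 3, "home_team": "JAX", "away_team": "OPP", "td_team": None,
     "total_home_score": 1, "total_away_score": 6},
    # OPP touchdown is credited correctly
    {"play_id": 4, "home_team": "JAX", "away_team": "OPP", "td_team": "OPP",
     "total_home_score": 1, "total_away_score": 12},
]

flagged = misattributed_scores(toy_plays)
print(flagged)
```

If the running totals in the real data turn out to reflect the score before the play instead, the comparison would need to look at the following row.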
The 17th game is 2011_13_DET_NO, and it is very different from the other 16. Here there appear to be two pairs of duplicate plays: play_ids 1129 and 1152 are the same play (14 yd TD run by Ingram at 11:50 in the 2nd), and play_ids 1655 and 1946 are the same play (2 yd TD run by Smith at 2:32 in the 2nd).
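A rough sketch of how duplicated plays like these could be found: group rows by game, quarter, clock time and description, and report any group with more than one play_id. The `qtr` and `time` column names are assumptions about the pbp schema, and the rows below are toy data: the play_ids and clock times for the two pairs come from this report, while the descriptions and play_id 1200 are placeholders.

```python
from collections import defaultdict


def find_duplicate_plays(plays):
    """Return lists of play_ids that share game, quarter, clock time and description."""
    groups = defaultdict(list)
    for p in plays:
        key = (p["game_id"], p["qtr"], p["time"], p["desc"])
        groups[key].append(p["play_id"])
    # Keep only keys that occur more than once
    return [ids for ids in groups.values() if len(ids) > 1]


game = "2011_13_DET_NO"
toy_plays = [
    {"game_id": game, "play_id": 1129, "qtr": 2, "time": "11:50", "desc": "placeholder: Ingram TD run"},
    {"game_id": game, "play_id": 1152, "qtr": 2, "time": "11:50", "desc": "placeholder: Ingram TD run"},
    {"game_id": game, "play_id": 1200, "qtr": 2, "time": "11:45", "desc": "placeholder: kickoff"},
    {"game_id": game, "play_id": 1655, "qtr": 2, "time": "2:32", "desc": "placeholder: Smith TD run"},
    {"game_id": game, "play_id": 1946, "qtr": 2, "time": "2:32", "desc": "placeholder: Smith TD run"},
]

duplicates = find_duplicate_plays(toy_plays)
print(duplicates)
```

On real data this key may also match legitimate repeated rows (for example administrative rows with identical text), so restricting the input to scoring plays first would probably reduce noise.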
Reprex
library(nflreadr)

# Load every available season of play-by-play and schedule data
pbp <- load_pbp(TRUE)
schedules <- load_schedules(TRUE)

# The "END GAME" row carries each game's final running totals
end_game <- pbp[pbp$desc == "END GAME", c("game_id", "total_home_score", "total_away_score")]

merged <- merge(
  schedules[, c("game_id", "season", "home_team", "away_team", "home_score", "away_score")],
  end_game,
  by = "game_id"
)

# Keep only games where either final score disagrees
discrepancies <- merged[
  merged$home_score != merged$total_home_score |
    merged$away_score != merged$total_away_score,
]

# Difference in total points (pbp minus schedule) and in home points
discrepancies$sum_score_diff <- (
  discrepancies$total_home_score + discrepancies$total_away_score -
    discrepancies$home_score - discrepancies$away_score
)
discrepancies$home_score_diff <- discrepancies$total_home_score - discrepancies$home_score

print(discrepancies)
Expected Behavior
The pbp data and the schedule data should have the same final score.
nflverse_sitrep
NA
Screenshots
Additional context
I'm using Python's nflreadpy. I used AI to produce the equivalent R code above to confirm that the problem exists there too.
This is my Python code:
import nflreadpy as nfl
import polars as pl

# Load every available season of play-by-play and schedule data
pbp = nfl.load_pbp(True)
schedules = nfl.load_schedules(True)

# The "END GAME" row carries each game's final running totals
end_game = pbp.filter(pl.col("desc") == "END GAME").select(
    ["game_id", "total_home_score", "total_away_score"]
)

discrepancies = (
    schedules
    .select(["game_id", "season", "home_team", "away_team", "home_score", "away_score"])
    .join(end_game, on="game_id", how="inner")
    # Keep only games where either final score disagrees
    .filter(
        (pl.col("home_score") != pl.col("total_home_score"))
        | (pl.col("away_score") != pl.col("total_away_score"))
    )
    .with_columns(
        (
            pl.col("total_home_score") + pl.col("total_away_score")
            - pl.col("home_score") - pl.col("away_score")
        ).alias("sum_score_diff"),
        (pl.col("total_home_score") - pl.col("home_score")).alias("home_score_diff"),
    )
)
print(discrepancies)