/api/v2/importErrors inflates total_entries when one import-error file maps to multiple DAGs #67525

@hkc-8010

Under which category would you file this issue?

Airflow Core

Apache Airflow version

3.2.1

What happened and how to reproduce it?

We found a count mismatch in the Airflow 3 UI where the Dashboard "Dag Import Errors" badge can show a higher number than the import-errors modal and CLI.

In the affected deployment, the home-page badge showed 4 import errors, but:

  • the import-errors modal showed only 2 files
  • airflow dags list-import-errors showed only 2 files
  • the metadata DB contained only 2 import_error rows

This appears to happen when one ParseImportError file is associated with multiple DAGs in the dag table, so the join in the /api/v2/importErrors query expands one import error into multiple rows. The endpoint later groups those rows back into one import-error object per file, but total_entries appears to be computed before that grouping step.

The original customer report included a UI screenshot showing the count mismatch. The screenshot is private support data, so I am not pasting the private attachment URL here, but it can be manually attached when filing if needed.

Evidence gathered during verification:

  1. Internal verification on 2026-05-13T16:43:30Z confirmed there were only 2 rows in import_error.
  2. Live verification on 2026-05-26 again showed only 2 import-error rows:
      (596, 2026-05-26 04:42:26.495552+00:00, 'dags/test_smtp_local.py', 'main')
      (697, 2026-05-26 04:40:27.897028+00:00, 'dags/dwh_garantias_extraction.py', 'main')
  3. Live airflow dags list-import-errors output on 2026-05-26 returned only these 2 files:
      main | dags/dwh_garantias_extraction.py | TypeError: partial() got an unexpected keyword argument 'file_format'
      main | dags/test_smtp_local.py          | airflow.sdk.exceptions.AirflowRuntimeError: VARIABLE_NOT_FOUND: {'message': 'Variable AIRFLOW_CONN_SMTP_CONN not found'}
  4. Live metadata query on 2026-05-26 showed that one import-error file maps to multiple DAGs:
      ## dag_counts
      ('dags/dwh_garantias_extraction.py', 'main', 1, 'dwh_garantias_extraction')
      ('dags/test_smtp_local.py', 'main', 3, 'smtp_check_emailoperator, smtp_send_smtplib, test_smtp_local')

      ## import_error_rows
      (596, 'dags/test_smtp_local.py', 'main', 2026-05-26 04:42:26.495552+00:00)
      (697, 'dags/dwh_garantias_extraction.py', 'main', 2026-05-26 04:40:27.897028+00:00)

      ## joined_rows
      (596, 'dags/test_smtp_local.py', 'main', 'smtp_check_emailoperator')
      (596, 'dags/test_smtp_local.py', 'main', 'smtp_send_smtplib')
      (596, 'dags/test_smtp_local.py', 'main', 'test_smtp_local')
      (697, 'dags/dwh_garantias_extraction.py', 'main', 'dwh_garantias_extraction')
  5. A direct aggregate over that join produced:
      (4, 2)

Where:

  • 4 = raw joined row count
  • 2 = distinct import_error.id count

That matches the user-visible mismatch exactly.
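
For anyone who wants to reproduce the aggregate outside the affected deployment, here is a minimal sketch using an in-memory SQLite database. The two-table schema is a deliberately simplified assumption (only the columns needed for the join, with column names guessed from the data above), not the real Airflow metadata schema, and the join condition is my reading of how the file-to-DAG association works.

```python
import sqlite3

# Simplified, hypothetical stand-ins for the import_error and dag tables.
conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE import_error (id INTEGER PRIMARY KEY, filename TEXT, bundle_name TEXT);
    CREATE TABLE dag (dag_id TEXT PRIMARY KEY, relative_fileloc TEXT, bundle_name TEXT);
    """
)

# The two import-error rows from the evidence above.
conn.executemany(
    "INSERT INTO import_error VALUES (?, ?, ?)",
    [
        (596, "dags/test_smtp_local.py", "main"),
        (697, "dags/dwh_garantias_extraction.py", "main"),
    ],
)

# Four DAG rows: three of them share one file path.
conn.executemany(
    "INSERT INTO dag VALUES (?, ?, ?)",
    [
        ("smtp_check_emailoperator", "dags/test_smtp_local.py", "main"),
        ("smtp_send_smtplib", "dags/test_smtp_local.py", "main"),
        ("test_smtp_local", "dags/test_smtp_local.py", "main"),
        ("dwh_garantias_extraction", "dags/dwh_garantias_extraction.py", "main"),
    ],
)

# Raw joined row count versus distinct import_error.id count.
raw_rows, distinct_errors = conn.execute(
    """
    SELECT COUNT(*), COUNT(DISTINCT ie.id)
    FROM import_error AS ie
    JOIN dag AS d
      ON d.relative_fileloc = ie.filename
     AND d.bundle_name = ie.bundle_name
    """
).fetchone()

print((raw_rows, distinct_errors))  # (4, 2)
```

The join fans out file 596 into three rows, which is where the extra two entries come from.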

Relevant code paths:

  • airflow/api_fastapi/core_api/routes/public/import_error.py
    • builds the joined query around select(ParseImportError, file_dags_cte.c.dag_id)
    • groups the result later with groupby(...)
  • airflow/api_fastapi/common/db/common.py
    • paginated_select() computes total_entries = get_query_count(statement, session=session) before any route-local grouping
  • airflow/ui/src/pages/Dashboard/Stats/DAGImportErrors.tsx
    • renders the Dashboard badge from data?.total_entries
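
To make the suspected order of operations concrete, here is a plain-Python sketch of "count first, group later" applied to the joined rows above. It does not use the actual Airflow code; the variable names and the shape of the grouped objects are illustrative assumptions.

```python
from itertools import groupby
from operator import itemgetter

# (import_error.id, filename, dag_id) tuples, as in the joined_rows evidence.
joined_rows = [
    (596, "dags/test_smtp_local.py", "smtp_check_emailoperator"),
    (596, "dags/test_smtp_local.py", "smtp_send_smtplib"),
    (596, "dags/test_smtp_local.py", "test_smtp_local"),
    (697, "dags/dwh_garantias_extraction.py", "dwh_garantias_extraction"),
]

# Step 1 (suspected behavior): the count is taken over the raw joined rows.
total_entries = len(joined_rows)

# Step 2: the rows are grouped back into one object per import error.
ordered = sorted(joined_rows, key=itemgetter(0))
import_errors = [
    {"import_error_id": error_id, "dag_ids": [row[2] for row in rows]}
    for error_id, rows in groupby(ordered, key=itemgetter(0))
]

print(total_entries, len(import_errors))  # 4 2
```

If the count were taken after step 2, or over distinct IDs, both numbers would be 2.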

Likely reproduction shape:

  1. Create or retain a file that appears once in import_error.
  2. Ensure that same file path is associated with multiple DAG IDs in dag.
  3. Call /api/v2/importErrors and observe that total_entries reflects raw joined rows rather than distinct import-error objects.
  4. Observe that the UI badge uses total_entries, while the modal list groups back down to fewer entries.

What you think should happen instead?

The Dashboard badge, the modal, the CLI, and the DB-backed count should all agree on the number of import-error files.

In this case they should all show 2.

I suspect one of these fixes would resolve it:

  1. Make /api/v2/importErrors count distinct ParseImportError.id values after authorization logic instead of counting raw joined rows.
  2. Restructure the route so pagination and counting happen on a deduplicated import-error subquery rather than on the raw join.
  3. Add a regression test where one ParseImportError file maps to multiple DAGs, but total_entries still matches the number of distinct import-error objects returned.
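
As a rough illustration of option 2, the sketch below counts and paginates over import-error rows only, then looks up DAG IDs per returned row. It uses SQLite and a simplified, hypothetical schema rather than the real Airflow models; `list_import_errors` is a made-up helper name, authorization filtering is left out entirely, and restricting to import errors with at least one matching dag row is an assumption that mirrors the inner join in the evidence.

```python
import sqlite3

# Filter shared by the count and the page query: the import error's file
# must match at least one dag row (assumption, see lead-in).
MATCHING_DAG = """
    EXISTS (
        SELECT 1 FROM dag AS d
        WHERE d.relative_fileloc = ie.filename
          AND d.bundle_name = ie.bundle_name
    )
"""


def list_import_errors(conn, limit=50, offset=0):
    """Return (import_errors, total_entries), both based on import_error rows."""
    (total_entries,) = conn.execute(
        f"SELECT COUNT(*) FROM import_error AS ie WHERE {MATCHING_DAG}"
    ).fetchone()

    # Pagination applies to import errors, not to joined rows.
    page = conn.execute(
        f"SELECT ie.id, ie.filename, ie.bundle_name FROM import_error AS ie "
        f"WHERE {MATCHING_DAG} ORDER BY ie.id LIMIT ? OFFSET ?",
        (limit, offset),
    ).fetchall()

    import_errors = []
    for error_id, filename, bundle_name in page:
        dag_ids = [
            row[0]
            for row in conn.execute(
                "SELECT dag_id FROM dag "
                "WHERE relative_fileloc = ? AND bundle_name = ? ORDER BY dag_id",
                (filename, bundle_name),
            )
        ]
        import_errors.append(
            {"import_error_id": error_id, "filename": filename, "dag_ids": dag_ids}
        )
    return import_errors, total_entries


# Regression-style check with one file mapped to three DAGs.
conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE import_error (id INTEGER PRIMARY KEY, filename TEXT, bundle_name TEXT);
    CREATE TABLE dag (dag_id TEXT PRIMARY KEY, relative_fileloc TEXT, bundle_name TEXT);
    INSERT INTO import_error VALUES
        (596, 'dags/test_smtp_local.py', 'main'),
        (697, 'dags/dwh_garantias_extraction.py', 'main');
    INSERT INTO dag VALUES
        ('smtp_check_emailoperator', 'dags/test_smtp_local.py', 'main'),
        ('smtp_send_smtplib', 'dags/test_smtp_local.py', 'main'),
        ('test_smtp_local', 'dags/test_smtp_local.py', 'main'),
        ('dwh_garantias_extraction', 'dags/dwh_garantias_extraction.py', 'main');
    """
)

all_errors, total_entries = list_import_errors(conn)
first_page, first_page_total = list_import_errors(conn, limit=1)
print(len(all_errors), total_entries)  # 2 2
```

The same data could back the regression test from option 3: total_entries should stay at 2 regardless of the page size, while the three DAG IDs still appear under the single entry for dags/test_smtp_local.py.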

Operating System

Not Applicable - managed Astronomer deployment

Deployment

Astronomer

Apache Airflow Provider(s)

Not Applicable

Versions of Apache Airflow Providers

Not Applicable

Official Helm Chart version

Not Applicable

Kubernetes Version

Not Applicable

Helm Chart configuration

Not Applicable

Docker Image customizations

Unknown / not relevant for the API counting bug

Anything else?

I did not find an obvious existing Airflow issue or PR for this exact count-inflation behavior when searching for:

  • import errors count modal home page
  • import error relative_fileloc bundle_name
  • importErrors total_entries DagModel relative_fileloc bundle_name

Are you willing to submit PR?

  • Yes I am willing to submit a PR!

Code of Conduct

  • I agree to follow this project's Code of Conduct

Labels: area:API, kind:bug
