Report hasn't been filed before.
What version of drizzle-orm are you using?
0.45.1
What version of drizzle-kit are you using?
0.31.8 (the buggy query is still present on main, see below)
Describe the Bug
What is the undesired behavior?
drizzle-kit pull (PostgreSQL) assigns the wrong operator class to index columns when introspecting a multi-column index whose columns have different operator classes (e.g. (bigint, timestamptz)). The opclasses get shuffled between the columns of the index, and the result is arbitrary (it depends on physical row order in the catalog join), so it can also differ between databases that have identical DDL.
Example of generated output (column types vs opclasses are mismatched):
index("example_status_created_idx").using(
"btree",
table.status.asc().nullsLast().op("timestamptz_ops"), // status is text!
table.createdAt.asc().nullsLast().op("timestamptz_ops"),
),
Consequences:
- The pulled schema is not a faithful snapshot of the database.
drizzle-kit push / generate from the pulled schema produces DDL that PostgreSQL rejects (ERROR: operator class "timestamptz_ops" does not accept data type bigint).
- Repeated
pull runs keep re-introducing the wrong opclasses even after manually fixing the schema file.
Root cause
The index introspection query in fromDatabase (drizzle-kit src/serializer/pgSerializer.ts, currently line ~1568 on main) joins pg_opclass like this:
JOIN pg_opclass opc ON opc.oid = ANY(i.indclass)
pg_index.indclass is an oidvector holding one opclass per index column, positionally aligned with indkey. Joining with = ANY(...) matches every opclass present anywhere in the index against every column row produced by the unnest(i.indkey) lateral. The surrounding SELECT DISTINCT ON (t.relname, ic.relname, k.i) has no disambiguating ORDER BY, so PostgreSQL keeps an arbitrary row per (index, column position) — i.e. an arbitrary opclass per column.
Indexes where all columns share one opclass are unaffected (there is only one value to match), which is why this only shows up on multi-column indexes with mixed column types.
What are the steps to reproduce it?
CREATE TABLE opclass_repro (
id bigint PRIMARY KEY,
created_at timestamptz NOT NULL,
status text NOT NULL
);
CREATE INDEX opclass_repro_id_created_idx ON opclass_repro (id, created_at);
CREATE INDEX opclass_repro_status_created_idx ON opclass_repro (status, created_at);
Run drizzle-kit pull against this database, or run the introspection query from pgSerializer.ts directly. Actual output on PostgreSQL 15 (note created_at getting text_ops in the second index, while the first index happens to come out right — that's the DISTINCT ON arbitrariness):
table_name | indexname | index_order | column_name | opcname
---------------+----------------------------------+-------------+-------------+-----------------
opclass_repro | opclass_repro_id_created_idx | 1 | id | int8_ops
opclass_repro | opclass_repro_id_created_idx | 2 | created_at | timestamptz_ops
opclass_repro | opclass_repro_pkey | 1 | id | int8_ops
opclass_repro | opclass_repro_status_created_idx | 1 | status | text_ops
opclass_repro | opclass_repro_status_created_idx | 2 | created_at | text_ops <-- wrong
(5 rows)
On another database (same DDL, different catalog row order) we got the opclasses of (chamado_id bigint, created_at timestamptz) fully swapped — timestamptz_ops on the bigint column and int8_ops on the timestamptz column.
What is the desired result?
Each index column should get the opclass stored at its own position in indclass. Replacing the join with a positional subscript fixes it (verified against the repro above — all opclasses come back correct):
LEFT JOIN pg_opclass opc ON opc.oid = i.indclass[k.i-1]
(oidvector subscripts are 0-based, k.i is 1-based in the query, hence k.i-1. With the positional join being unique per row, the workaround-ish DISTINCT ON also stops papering over the row multiplication.)
Environment
Report hasn't been filed before.
What version of
drizzle-ormare you using?0.45.1
What version of
drizzle-kitare you using?0.31.8 (the buggy query is still present on
main, see below)Describe the Bug
What is the undesired behavior?
drizzle-kit pull(PostgreSQL) assigns the wrong operator class to index columns when introspecting a multi-column index whose columns have different operator classes (e.g.(bigint, timestamptz)). The opclasses get shuffled between the columns of the index, and the result is arbitrary (it depends on physical row order in the catalog join), so it can also differ between databases that have identical DDL.Example of generated output (column types vs opclasses are mismatched):
Consequences:
drizzle-kit push/generatefrom the pulled schema produces DDL that PostgreSQL rejects (ERROR: operator class "timestamptz_ops" does not accept data type bigint).pullruns keep re-introducing the wrong opclasses even after manually fixing the schema file.Root cause
The index introspection query in
fromDatabase(drizzle-kitsrc/serializer/pgSerializer.ts, currently line ~1568 onmain) joinspg_opclasslike this:pg_index.indclassis anoidvectorholding one opclass per index column, positionally aligned withindkey. Joining with= ANY(...)matches every opclass present anywhere in the index against every column row produced by theunnest(i.indkey)lateral. The surroundingSELECT DISTINCT ON (t.relname, ic.relname, k.i)has no disambiguatingORDER BY, so PostgreSQL keeps an arbitrary row per (index, column position) — i.e. an arbitrary opclass per column.Indexes where all columns share one opclass are unaffected (there is only one value to match), which is why this only shows up on multi-column indexes with mixed column types.
What are the steps to reproduce it?
Run
drizzle-kit pullagainst this database, or run the introspection query frompgSerializer.tsdirectly. Actual output on PostgreSQL 15 (notecreated_atgettingtext_opsin the second index, while the first index happens to come out right — that's theDISTINCT ONarbitrariness):On another database (same DDL, different catalog row order) we got the opclasses of
(chamado_id bigint, created_at timestamptz)fully swapped —timestamptz_opson the bigint column andint8_opson the timestamptz column.What is the desired result?
Each index column should get the opclass stored at its own position in
indclass. Replacing the join with a positional subscript fixes it (verified against the repro above — all opclasses come back correct):(
oidvectorsubscripts are 0-based,k.iis 1-based in the query, hencek.i-1. With the positional join being unique per row, the workaround-ishDISTINCT ONalso stops papering over the row multiplication.)Environment
node-postgresdriver.