Empty string ('') as Enum8 member is not loaded into ClickHouse DB #484

Description

@sillynuy

Describe the bug
I have a source table with a column of type Enum8, where one of the allowed enum values is an empty string (''). When I try to load this table into the destination using dlt, the pipeline fails with the following error:
ValueError: Invalid enum member name
To Reproduce

  1. Create a table with the following schema:
CREATE TABLE test_table (
    x Enum8('foo' = 0, 'bar' = 1, '' = 2)
) ENGINE = MergeTree
ORDER BY tuple();
  2. Insert some data into this table, including the empty string enum value.
  3. Use the following dlt pipeline code to load the data:
import dlt
from dlt.sources.sql_database import sql_database


def cl_test_table():
    # Source credentials are assumed to come from the dlt config/secrets.
    source = sql_database(
        backend="pyarrow",
        table_names=["test_table"],
    )

    pipeline = dlt.pipeline(
        pipeline_name="pl_cl_test_table",
        destination="clickhouse",
        dataset_name="bronze",
    )

    info = pipeline.run(
        source,
        write_disposition="append",
    )
    print(info)
    return info
  4. Run the pipeline and observe the exception:
ValueError: Invalid enum member name: 
  File "/home/my_username/repo/my_project/dlt-dagster-project/venv/lib/python3.10/site-packages/dagster/_core/execution/plan/utils.py", line 56, in op_execution_error_boundary
    yield
  File "/home/my_username/repo/my_project/dlt-dagster-project/venv/lib/python3.10/site-packages/dagster/_utils/__init__.py", line 391, in iterate_with_context
    next_output = next(iterator)
  File "/home/my_username/repo/my_project/dlt-dagster-project/venv/lib/python3.10/site-packages/dagster/_core/execution/plan/compute_generator.py", line 129, in _coerce_op_compute_fn_to_iterator
    result = invoke_compute_fn(
  File "/home/my_username/repo/my_project/dlt-dagster-project/venv/lib/python3.10/site-packages/dagster/_core/execution/plan/compute_generator.py", line 117, in invoke_compute_fn
    return fn(context, **args_to_pass) if context_arg_provided else fn(**args_to_pass)
  File "/home/my_username/repo/my_project/dlt-dagster-project/dlt_dagster_project/assets.py", line 131, in test_table_data
    cl_test_table()
  File "/home/my_username/repo/my_project/dlt-dagster-project/dlt_dagster_project/dlt_utils/pipelines/pl_cl__test_table.py", line 38, in cl_test_table
    source = sql_database(
  File "/home/my_username/repo/my_project/dlt-dagster-project/venv/lib/python3.10/site-packages/dlt/extract/decorators.py", line 207, in call
    source = self._deco_f(*args, **kwargs)
  File "/home/my_username/repo/my_project/dlt-dagster-project/venv/lib/python3.10/site-packages/dlt/extract/decorators.py", line 293, in _wrap
    return _eval_rv(rv, schema_copy)
  File "/home/my_username/repo/my_project/dlt-dagster-project/venv/lib/python3.10/site-packages/dlt/extract/decorators.py", line 253, in _eval_rv
    _rv = list(_rv)
  File "/home/my_username/repo/my_project/dlt-dagster-project/venv/lib/python3.10/site-packages/dlt/sources/sql_database/__init__.py", line 107, in sql_database
    metadata.reflect(
  File "/home/my_username/repo/my_project/dlt-dagster-project/venv/lib/python3.10/site-packages/sqlalchemy/sql/schema.py", line 5885, in reflect
    _reflect_info = insp._get_reflection_info(
  File "/home/my_username/repo/my_project/dlt-dagster-project/venv/lib/python3.10/site-packages/sqlalchemy/engine/reflection.py", line 2016, in _get_reflection_info
    columns=run(
  File "/home/my_username/repo/my_project/dlt-dagster-project/venv/lib/python3.10/site-packages/sqlalchemy/engine/reflection.py", line 2002, in run
    res = meth(filter_names=_fn, **kw)
  File "/home/my_username/repo/my_project/dlt-dagster-project/venv/lib/python3.10/site-packages/sqlalchemy/engine/reflection.py", line 931, in get_multi_columns
    table_col_defs = dict(
  File "/home/my_username/repo/my_project/dlt-dagster-project/venv/lib/python3.10/site-packages/sqlalchemy/engine/default.py", line 1129, in _default_multi_reflect
    single_tbl_method(
  File "<string>", line 2, in get_columns
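
The traceback above is cut off inside column reflection, so the frame that actually raises is not shown. One plausible explanation, offered here as an assumption rather than a confirmed diagnosis, is that the reflected Enum8 members are turned into a Python `enum.Enum`, and the standard library refuses an empty string as a member name. The sketch below uses only the standard library; the helper `try_build_enum` and the class name `x_enum` are hypothetical and exist only for illustration.

```python
import enum

# Members mirroring the Enum8 definition from the reproduction schema.
members = {"foo": 0, "bar": 1, "": 2}


def try_build_enum(options):
    """Return the error message if Enum creation fails, otherwise None."""
    try:
        # Functional API: member names are the dict keys.
        enum.Enum("x_enum", options)
    except ValueError as exc:
        return str(exc)
    return None


# The empty-string key is rejected as a member name.
print(try_build_enum(members))

# Without the empty-string member the same call succeeds.
print(try_build_enum({"foo": 0, "bar": 1}))
```

If this is indeed the cause, the failure happens while reflecting the source table, before any data reaches the destination.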

Expected behavior
The pipeline should correctly handle Enum8 columns with empty string members and successfully load the data into ClickHouse without raising ValueError.

Versions

Python 3.10.17
ClickHouse 25.1.1.4165
clickhouse-driver 0.2.9
dlt 1.10.0
