Skip to content

MySQL/MariaDB: INCREMENTAL_BY_UNIQUE_KEY DELETE uses CONCAT_WS which prevents index usage → full table scan / timeout #5711

@goerkasch

Description

@goerkasch

Summary

When using INCREMENTAL_BY_UNIQUE_KEY with MySQL/MariaDB, SQLMesh generates a DELETE statement using CONCAT_WS() to match rows between the target and temp table. MariaDB cannot use indexes on CONCAT_WS expressions, causing a full table scan on every incremental run which leads to multi-hour query times and connection timeouts on larger tables.

Environment

  • SQLMesh version: 0.228.0
  • Engine: MariaDB
  • Model kind: INCREMENTAL_BY_UNIQUE_KEY
  • Table size: ~730k rows

Generated SQL (current behavior)

DELETE FROM `dbm_stg`.`target_table`
WHERE CONCAT_WS('__SQLMESH_DELIM__', `property_id`, `publication_id`, `article_id`)
IN (
    SELECT CONCAT_WS('__SQLMESH_DELIM__', `property_id`, `publication_id`, `article_id`)
    FROM `dbm_stg`.`__temp_table`
)

EXPLAIN output (current)

id select_type table type key rows
1 PRIMARY target_table ALL NULL 697295
2 DEPENDENT SUBQUERY temp_table ALL NULL 200

Index exists on (property_id, publication_id, article_id) but is not used because MariaDB cannot apply indexes to CONCAT_WS() expressions.

Result: The DELETE ran for 2 hours before timing out with:
pymysql.err.OperationalError: (2013, 'Lost connection to MySQL server during query')

Expected SQL (proposed fix)

A JOIN-based DELETE allows MariaDB to use the index:

DELETE `_target`
FROM `dbm_stg`.`target_table` AS `_target`
INNER JOIN `dbm_stg`.`__temp_table` AS `_temp`
ON `_target`.`property_id` = `_temp`.`property_id`
AND `_target`.`publication_id` = `_temp`.`publication_id`
AND `_target`.`article_id` = `_temp`.`article_id`

EXPLAIN output (proposed)

id select_type table type key rows
1 SIMPLE temp_table ALL NULL 200
1 SIMPLE target_table ref idx_prop_pubid_artid 1

Index is used, query completes in seconds.

Workaround

Monkey-patching MySQLEngineAdapter._replace_by_key in config.py with a JOIN-based DELETE + deduplicated INSERT via ROW_NUMBER().

Metadata

Metadata

Assignees

No one assigned

    Labels

    No labels
    No labels

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions