-
Notifications
You must be signed in to change notification settings - Fork 356
Description
Summary
When using INCREMENTAL_BY_UNIQUE_KEY with MySQL/MariaDB, SQLMesh generates a DELETE statement using CONCAT_WS() to match rows between the target and temp table. MariaDB cannot use indexes on CONCAT_WS expressions, causing a full table scan on every incremental run which leads to multi-hour query times and connection timeouts on larger tables.
Environment
- SQLMesh version: 0.228.0
- Engine: MariaDB
- Model kind:
INCREMENTAL_BY_UNIQUE_KEY - Table size: ~730k rows
Generated SQL (current behavior)
DELETE FROM `dbm_stg`.`target_table`
WHERE CONCAT_WS('__SQLMESH_DELIM__', `property_id`, `publication_id`, `article_id`)
IN (
SELECT CONCAT_WS('__SQLMESH_DELIM__', `property_id`, `publication_id`, `article_id`)
FROM `dbm_stg`.`__temp_table`
)EXPLAIN output (current)
| id | select_type | table | type | key | rows |
|---|---|---|---|---|---|
| 1 | PRIMARY | target_table | ALL | NULL | 697295 |
| 2 | DEPENDENT SUBQUERY | temp_table | ALL | NULL | 200 |
Index exists on (property_id, publication_id, article_id) but is not used because MariaDB cannot apply indexes to CONCAT_WS() expressions.
Result: The DELETE ran for 2 hours before timing out with:
pymysql.err.OperationalError: (2013, 'Lost connection to MySQL server during query')
Expected SQL (proposed fix)
A JOIN-based DELETE allows MariaDB to use the index:
DELETE `_target`
FROM `dbm_stg`.`target_table` AS `_target`
INNER JOIN `dbm_stg`.`__temp_table` AS `_temp`
ON `_target`.`property_id` = `_temp`.`property_id`
AND `_target`.`publication_id` = `_temp`.`publication_id`
AND `_target`.`article_id` = `_temp`.`article_id`EXPLAIN output (proposed)
| id | select_type | table | type | key | rows |
|---|---|---|---|---|---|
| 1 | SIMPLE | temp_table | ALL | NULL | 200 |
| 1 | SIMPLE | target_table | ref | idx_prop_pubid_artid | 1 |
Index is used, query completes in seconds.
Workaround
Monkey-patching MySQLEngineAdapter._replace_by_key in config.py with a JOIN-based DELETE + deduplicated INSERT via ROW_NUMBER().