¿Cuál es la diferencia de rendimiento en las implementaciones de división relacional de MySQL (IN AND en lugar de IN OR)?

Hice algunas mejoras en JOIN versión; ver más abajo.

Voto por el enfoque JOIN por velocidad. Así es como lo determiné:

TENIENDO, versión 1

mysql> FLUSH STATUS;
mysql> SELECT city
    ->     FROM us_vch200
    ->     WHERE state IN ('IL', 'MO', 'PA')
    ->     GROUP BY city
    ->     HAVING count(DISTINCT state) >= 3;
+-------------+
| city        |
+-------------+
| Springfield |
| Washington  |
+-------------+
mysql> SHOW SESSION STATUS LIKE 'Handler%';
+----------------------------+-------+
| Variable_name              | Value |
+----------------------------+-------+
| Handler_external_lock      | 2     |
| Handler_read_first         | 1     |
| Handler_read_key           | 2     |
| Handler_read_last          | 1     |
| Handler_read_next          | 4175  | -- full index scan

(etc)

+----+-------------+-----------+-------+-----------------------+------------+---------+------+------+--------------------------------------------------+
| id | select_type | table     | type  | possible_keys         | key        | key_len | ref  | rows | Extra                                            |
+----+-------------+-----------+-------+-----------------------+------------+---------+------+------+--------------------------------------------------+
|  1 | SIMPLE      | us_vch200 | range | state_city,city_state | city_state | 769     | NULL | 4176 | Using where; Using index for group-by (scanning) |
+----+-------------+-----------+-------+-----------------------+------------+---------+------+------+--------------------------------------------------+

El 'Extra' señala que decidió abordar el GROUP BY y usa INDEX(city, state) aunque INDEX(state, city) podría tener sentido.

HAVING, versión 2

Haciéndolo cambiar a INDEX(state, city) rendimientos:

mysql> FLUSH STATUS;
mysql> SELECT city
    ->     FROM us_vch200  IGNORE INDEX(city_state)
    ->     WHERE state IN ('IL', 'MO', 'PA')
    ->     GROUP BY city
    ->     HAVING count(DISTINCT state) >= 3;
+-------------+
| city        |
+-------------+
| Springfield |
| Washington  |
+-------------+
mysql> SHOW SESSION STATUS LIKE 'Handler%';
+----------------------------+-------+
| Variable_name              | Value |
+----------------------------+-------+
| Handler_commit             | 1     |
| Handler_external_lock      | 2     |
| Handler_read_key           | 401   |
| Handler_read_next          | 398   |
| Handler_read_rnd           | 398   |
(etc)

+----+-------------+-----------+-------+-----------------------+------------+---------+------+------+------------------------------------------+
| id | select_type | table     | type  | possible_keys         | key        | key_len | ref  | rows | Extra                                    |
+----+-------------+-----------+-------+-----------------------+------------+---------+------+------+------------------------------------------+
|  1 | SIMPLE      | us_vch200 | range | state_city,city_state | state_city | 2       | NULL |  397 | Using where; Using index; Using filesort |
+----+-------------+-----------+-------+-----------------------+------------+---------+------+------+------------------------------------------+

ÚNETE

mysql> SELECT x.city
    -> FROM us_vch200 x
    -> JOIN us_vch200 y ON y.city= x.city AND y.state = 'MO'
    -> JOIN us_vch200 z ON z.city= x.city AND z.state = 'PA'
    -> WHERE                                  x.state = 'IL';
+-------------+
| city        |
+-------------+
| Springfield |
| Washington  |
+-------------+
2 rows in set (0.00 sec)

mysql> SHOW SESSION STATUS LIKE 'Handler%';
+----------------------------+-------+
| Variable_name              | Value |
+----------------------------+-------+
| Handler_commit             | 1     |
| Handler_external_lock      | 6     |
| Handler_read_key           | 86    |
| Handler_read_next          | 87    |
(etc)    
+----+-------------+-------+------+-----------------------+------------+---------+--------------------+------+--------------------------+
| id | select_type | table | type | possible_keys         | key        | key_len | ref                | rows | Extra                    |
+----+-------------+-------+------+-----------------------+------------+---------+--------------------+------+--------------------------+
|  1 | SIMPLE      | y     | ref  | state_city,city_state | state_city | 2       | const              |   81 | Using where; Using index |
|  1 | SIMPLE      | z     | ref  | state_city,city_state | state_city | 769     | const,world.y.city |    1 | Using where; Using index |
|  1 | SIMPLE      | x     | ref  | state_city,city_state | state_city | 769     | const,world.y.city |    1 | Using where; Using index |
+----+-------------+-------+------+-----------------------+------------+---------+--------------------+------+--------------------------+

Solo INDEX(state, city) se necesita Los números de Handler son los más pequeños para esta formulación, por lo que deduzco que es el más rápido.

Observe cómo el optimizador decidió por sí mismo con qué tabla comenzar, probablemente debido a

+-------+----------+
| state | COUNT(*) |
+-------+----------+
| IL    |      221 |
| MO    |       81 |  -- smallest
| PA    |       96 |
+-------+----------+

Conclusiones

JOIN (sin la innecesaria t table) es probablemente el más rápido. Además, se necesita este índice compuesto:INDEX(state, city) .

Para volver a traducir a su caso de uso:

city --> documentid
state --> termid

Advertencia:YMMV porque la distribución de valores para documentid y termid podría ser bastante diferente al caso de prueba que utilicé.