1

我有两张桌子:

  • table_1 有大约 100 万行,列 id_t1:整数,c1_t1:varchar 等。
  • table_2 有大约 5000 万行,列 id_t2:整数,ref_id_t1:整数,c1_t2:varchar 等。

ref_id_t1 填充了 id_t1 值,但是它们没有通过外键链接,因为 table_2 不知道 table_1。

我需要在两个表上做一个请求,如下所示:

SELECT * FROM table_1 t1 WHERE t1.c1_t1= 'A' AND t1.id_t1 IN
(SELECT t2.ref_id_t1 FROM table_2 t2 WHERE t2.c1_t2 LIKE '%abc%');

在没有任何更改或基本索引的情况下,请求大约需要一分钟才能完成,因为在 table_2 上执行顺序扫描。为了防止这种情况,我使用 gin_trgm_ops 选项创建了一个 GIN 索引:

CREATE EXTENSION pg_trgm;
CREATE INDEX c1_t2_gin_index ON table_2 USING gin (c1_t2, gin_trgm_ops);

然而这并不能解决问题,因为内部请求仍然需要很长时间。

EXPLAIN ANALYSE SELECT t2.ref_id_t1 FROM table_2 t2 WHERE t2.c1_t2 LIKE '%abc%'

给出以下

Bitmap Heap Scan on table_2 t2 (cost=664.20..189671.00 rows=65058 width=4) (actual time=5101.286..22854.838 rows=69631 loops=1)
  Recheck Cond: ((c1_t2 )::text ~~ '%1.1%'::text)
  Rows Removed by Index Recheck: 49069703
  Heap Blocks: exact=611548
  ->  Bitmap Index Scan on gin_trg  (cost=0.00..647.94 rows=65058 width=0) (actual time=4911.125..4911.125 rows=49139334 loops=1)
        Index Cond: ((c1_t2)::text ~~ '%1.1%'::text)
Planning time: 0.529 ms
Execution time: 22863.017 ms

位图索引扫描很快,但我们需要 t2.ref_id_t1 PostgreSQL 需要执行位图堆扫描,这对 65000 行数据并不快。

避免位图堆扫描的解决方案是执行仅索引扫描。这可以使用带有 btree 索引的多列,请参阅https://www.postgresql.org/docs/9.6/static/indexes-index-only-scans.html

如果我将请求更改为搜索 c1_t2 的开头,即使内部请求返回 90000 行,并且如果我在 c1_t2 和 ref_id_t1 上创建 btree 索引,则请求只需一秒钟多一点。

CREATE INDEX c1_t2_ref_id_t1_index
    ON table_2  USING btree
    (c1_t2 varchar_pattern_ops ASC NULLS LAST, ref_id_t1 ASC NULLS LAST)


EXPLAIN ANALYSE SELECT * FROM table_1 t1 WHERE t1.c1_t1= 'A' AND t1.id_t1 IN
    (SELECT t2.ref_id_t1 FROM table_2 t2 WHERE t2.c1_t2 LIKE 'aaa%');

Hash Join  (cost=56561.99..105233.96 rows=1 width=2522) (actual time=953.647..1068.488 rows=36 loops=1)
  Hash Cond: (t1.id_t1 = t2.ref_id_t1)
  ->  Seq Scan on table_1 t1  (cost=0.00..48669.65 rows=615 width=2522) (actual time=0.088..667.576 rows=790 loops=1)
        Filter: (c1_t1 = 'A')
        Rows Removed by Filter: 1083798
  ->  Hash  (cost=56553.74..56553.74 rows=660 width=4) (actual time=400.657..400.657 rows=69632 loops=1)
        Buckets: 131072 (originally 1024)  Batches: 1 (originally 1)  Memory Usage: 3472kB
        ->  HashAggregate  (cost=56547.14..56553.74 rows=660 width=4) (actual time=380.280..391.871 rows=69632 loops=1)
              Group Key: t2.ref_id_t1
              ->  Index Only Scan using c1_t2_ref_id_t1_index on table_2 t2   (cost=0.56..53907.28 rows=1055943 width=4) (actual time=0.014..202.034 rows=974737 loops=1)
                    Index Cond: ((c1_t2  ~>=~ 'aaa'::text) AND (c1_t2  ~<~ 'chb'::text))
                    Filter: ((c1_t2 )::text ~~ 'aaa%'::text)
                    Heap Fetches: 0
Planning time: 1.512 ms
Execution time: 1069.712 ms

然而,这对于 gin 索引是不可能的,因为这些索引不会将所有数据存储在键中。

有没有办法使用带有 btree 索引的 pg_trmg 扩展,这样我们就可以使用LIKE '%abc%'请求进行仅索引扫描?

4

0 回答 0