postgresql - PostgreSQL index not used for query on IP ranges

Question

I'm using PostgreSQL 9.2 and have a table of IP ranges. Here is the SQL:

CREATE TABLE ips (
  id serial NOT NULL,
  begin_ip_num bigint,
  end_ip_num bigint,
  country_name character varying(255),
  CONSTRAINT ips_pkey PRIMARY KEY (id)
);

I've added plain B-tree indexes on begin_ip_num and end_ip_num:

CREATE INDEX index_ips_on_begin_ip_num ON ips (begin_ip_num);
CREATE INDEX index_ips_on_end_ip_num ON ips (end_ip_num);

The query being used is:

SELECT ips.* FROM ips
WHERE 3065106743 BETWEEN begin_ip_num AND end_ip_num;

The problem is that my BETWEEN query only uses the index on begin_ip_num. After the index scan, it filters the result on end_ip_num. Here is the EXPLAIN ANALYZE result:

Index Scan using index_ips_on_begin_ip_num on ips  (cost=0.00..2173.83 rows=27136 width=76) (actual time=16.349..16.350 rows=1 loops=1)
Index Cond: (3065106743::bigint >= begin_ip_num)
Filter: (3065106743::bigint <= end_ip_num)
Rows Removed by Filter: 47596
Total runtime: 16.425 ms

I've already tried various combinations of indexes, including a composite index on begin_ip_num and end_ip_num.

score 28 · Accepted Answer

Try a multicolumn index, but with reversed order on the second column:

CREATE INDEX index_ips_begin_end_ip_num ON ips (begin_ip_num, end_ip_num DESC);

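Sort order is mostly irrelevant for a single-column index, since it can be scanned backwards almost as fast. But it matters for multicolumn indexes.

With the index I propose, Postgres can scan the first column and find the point from which the rest of the index fulfills the first condition. Then, for each value of the first column, it can return all rows that fulfill the second condition until the first one fails, then jump to the next value of the first column, and so on.
This is still not very efficient, and Postgres may be faster just scanning the first index column and filtering on the second. Much depends on your data distribution.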
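
Since so much depends on data distribution, it may be worth measuring rather than guessing. A minimal sketch, assuming the ips table from the question: run the same lookup under EXPLAIN before and after creating the multicolumn index and compare which index is chosen, how many rows the filter removes, and how many buffers are touched. The two-sided comparison below is just the expanded form of the BETWEEN condition.

```sql
-- Show the chosen plan, actual timings, and buffer usage for the lookup.
-- Run once with only the single-column indexes and once with the
-- multicolumn index in place, then compare the two outputs.
EXPLAIN (ANALYZE, BUFFERS)
SELECT ips.*
FROM   ips
WHERE  begin_ip_num <= 3065106743
AND    end_ip_num   >= 3065106743;
```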

Either way, CLUSTER using the multicolumn index from above can help performance:

CLUSTER ips USING index_ips_begin_end_ip_num;

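This way, candidates fulfilling your first condition are packed onto the same or adjacent data pages. That can help performance a lot if you have many rows per value of the first column. Otherwise it has hardly any effect.
(There are also non-blocking external tools for this purpose: pg_repack or pg_squeeze.)

Also, is autovacuum running and configured properly, or have you run ANALYZE on the table? Postgres needs current statistics to pick appropriate query plans.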
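
One way to check this, sketched here for the ips table from the question: refresh the statistics by hand, then look at the timestamps Postgres records in the pg_stat_user_tables view. NULL in the autovacuum or autoanalyze columns would suggest that autovacuum has not processed the table yet.

```sql
-- Refresh planner statistics for the table by hand.
ANALYZE ips;

-- See when manual and automatic vacuum / analyze last touched the table.
SELECT relname,
       last_vacuum,
       last_autovacuum,
       last_analyze,
       last_autoanalyze
FROM   pg_stat_user_tables
WHERE  relname = 'ips';
```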

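What would really help here is a GiST index on an int8range column, available since PostgreSQL 9.2.

Further reading:

Optimize queries on a range of timestamps (two columns)

If your IP ranges can be covered by one of the built-in network types inet or cidr, consider replacing your two bigint columns. Or, better yet, look at the additional module ip4r by Andrew Gierth (not in the standard distribution). The index strategy changes accordingly.

Barring that, you can check out this related answer on dba.SE, which uses a sophisticated regime with partial indexes. Advanced stuff, but it delivers great performance:

Can spatial index help a "range - order by - limit" query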

score 6 · Accepted Answer

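I had exactly this problem on a nearly identical data set from maxmind.com's free GeoIP table. I solved it using Erwin's tip about range types and GiST indexes. The GiST index was key. Without it I was querying at best 3 rows per second. With it I queried nearly 500000 rows within 10 seconds! Since Erwin didn't post detailed instructions on how to do this, I thought I'd add them here...

First, you must add a new column with a range type; note that int8range is required for bigint values. Next, set its values appropriately; note that the '[]' parameter makes the range inclusive at both the lower and upper bounds (rtfm). Finally, add the index; note that the GiST index is where all the performance advantage comes from.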

alter table ips add column iprange int8range;
update ips set iprange=int8range(begin_ip_num, end_ip_num, '[]');
create index index_ips_on_iprange on ips using gist (iprange);

With the groundwork laid, you can now use the '<@' ("is contained by") operator to search for a specific address in the table. See http://www.postgresql.org/docs/9.2/static/functions-range.html

SELECT "ips".* FROM "ips" WHERE (3065106743::bigint <@ iprange);
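
A possible variation, not part of the answer above: the extra iprange column has to be kept in sync whenever begin_ip_num or end_ip_num changes. A sketch that avoids the stored column is a GiST index on the range expression itself. The index name below is made up for illustration, and the query has to repeat the indexed expression exactly for the planner to consider the index.

```sql
-- Expression index: build the range on the fly instead of storing it.
-- (index_ips_on_iprange_expr is a hypothetical name.)
CREATE INDEX index_ips_on_iprange_expr
    ON ips USING gist (int8range(begin_ip_num, end_ip_num, '[]'));

-- The WHERE clause must use the same expression as the index definition.
-- '@>' is "range contains element", the mirror image of '<@'.
EXPLAIN ANALYZE
SELECT ips.*
FROM   ips
WHERE  int8range(begin_ip_num, end_ip_num, '[]') @> 3065106743::bigint;
```

Whether this beats the stored column depends on the workload; checking the EXPLAIN ANALYZE output for an index scan on the new index is the quickest way to confirm it is being used.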

score 4 · Accepted Answer

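I'm a bit late to this party, but this is what works really well for me.

Consider installing the ip4r extension. It basically lets you define a column that can hold IP ranges. The name of the extension implies it is only for IPv4, but it currently supports IPv6 as well.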
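
The setup step is not spelled out in this answer, so here is a rough sketch under stated assumptions: the ip4r package is already installed on the server, the table is the ip_zip table shown further down, and it still carries the bigint bounds in columns named begin_ip_num and end_ip_num (those two column names are borrowed from the question and are an assumption here). The bigint-to-ip4 cast and the two-argument ip4r() constructor are taken from the ip4r documentation as I remember it; verify them against the version you install.

```sql
-- Make the ip4r types and operators available in this database.
CREATE EXTENSION ip4r;

-- Add a column that holds an IPv4 range.
ALTER TABLE ip_zip ADD COLUMN ip4_range ip4r;

-- Build each range from its numeric bounds.
-- Assumes bigint columns begin_ip_num / end_ip_num exist on ip_zip.
UPDATE ip_zip
SET    ip4_range = ip4r(begin_ip_num::ip4, end_ip_num::ip4);
```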

After you populate the table with ranges in this column, you need to create a GiST index:

CREATE INDEX ip_zip_ip4_range ON ip_zip USING gist (ip4_range);

I have almost 10 million ranges in my database, but queries take a fraction of a millisecond:

region=> select count(*) from ip_zip ;

  count  
---------
 9566133

region=> explain analyze select * from ip_zip where '8.8.8.8'::ip4 <<= ip4_range;
                                                          QUERY PLAN                                                          
------------------------------------------------------------------------------------------------------------------------------
 Bitmap Heap Scan on ip_zip  (cost=234.55..25681.29 rows=9566 width=22) (actual time=0.085..0.086 rows=1 loops=1)
   Recheck Cond: ('8.8.8.8'::ip4r <<= ip4_range)
   Heap Blocks: exact=1
   ->  Bitmap Index Scan on ip_zip_ip4_range  (cost=0.00..232.16 rows=9566 width=0) (actual time=0.055..0.055 rows=1 loops=1)
         Index Cond: ('8.8.8.8'::ip4r <<= ip4_range)
 Planning time: 0.106 ms
 Execution time: 0.118 ms
(7 rows)

region=> explain analyze select * from ip_zip where '254.50.22.54'::ip4 <<= ip4_range;
                                                          QUERY PLAN                                                          
------------------------------------------------------------------------------------------------------------------------------
 Bitmap Heap Scan on ip_zip  (cost=234.55..25681.29 rows=9566 width=22) (actual time=0.059..0.059 rows=1 loops=1)
   Recheck Cond: ('254.50.22.54'::ip4r <<= ip4_range)
   Heap Blocks: exact=1
   ->  Bitmap Index Scan on ip_zip_ip4_range  (cost=0.00..232.16 rows=9566 width=0) (actual time=0.048..0.048 rows=1 loops=1)
         Index Cond: ('254.50.22.54'::ip4r <<= ip4_range)
 Planning time: 0.102 ms
 Execution time: 0.145 ms
(7 rows)

score 0 · Accepted Answer

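I believe your query looks like WHERE [constant] BETWEEN begin_ip_num AND end_ip_num or

As far as I know, Postgres has no "AND-EQUAL" access plan, so you need to add a composite index on the 2 columns, as suggested by Erwin Brandstetter.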
