postgresql - PostgreSQL 9.6 selects a wrong plan during aggregation against timestamp columns

Question

I have a simple but pretty big table “log” that has three columns: user_id, day, hours.

user_id character varying(36) COLLATE pg_catalog."default" NOT NULL,
day timestamp without time zone,
hours double precision

All columns have indexes.

The issue is that aggregation against the 'day' field works extremely slow. For example the simple query takes an eternity to complete.

select min(day) from log where user_id = 'ab056f5a-390b-41d7-ba56-897c14b679bf'

Analysis shows that Postgres makes a full scan filtering the entries that doesn’t relate to user_id = 'ab056f5a-390b-41d7-ba56-897c14b679bf' what is absolutely counter-intuitive

[
  {
    "Execution Time": 146502.05,
    "Planning Time": 0.893,
    "Plan": {
      "Startup Cost": 789.02,
      "Actual Rows": 1,
      "Plans": [
        {
          "Startup Cost": 0.44,
          "Actual Rows": 1,
          "Plans": [
            {
              "Index Cond": "(log.day IS NOT NULL)",
              "Startup Cost": 0.44,
              "Scan Direction": "Forward",
              "Plan Width": 8,
              "Rows Removed by Index Recheck": 0,
              "Actual Rows": 1,
              "Node Type": "Index Scan",
              "Total Cost": 1395792.54,
              "Plan Rows": 1770,
              "Relation Name": "log",
              "Alias": "log",
              "Parallel Aware": false,
              "Actual Total Time": 146502.015,
              "Output": [
                "log.day"
              ],
              "Parent Relationship": "Outer",
              "Actual Startup Time": 146502.015,
              "Schema": "public",
              "Filter": "((log.user_id)::text = 'ab056f5a-390b-41d7-ba56-897c14b679bf'::text)",
              "Actual Loops": 1,
              "Rows Removed by Filter": 12665610,
              "Index Name": "index_log_day"
            }
          ],
          "Node Type": "Limit",
          "Plan Rows": 1,
          "Parallel Aware": false,
          "Actual Total Time": 146502.016,
          "Output": [
            "log.day"
          ],
          "Parent Relationship": "InitPlan",
          "Actual Startup Time": 146502.016,
          "Plan Width": 8,
          "Subplan Name": "InitPlan 1 (returns $0)",
          "Actual Loops": 1,
          "Total Cost": 789.02
        }
      ],
      "Node Type": "Result",
      "Plan Rows": 1,
      "Parallel Aware": false,
      "Actual Total Time": 146502.019,
      "Output": [
        "$0"
      ],
      "Actual Startup Time": 146502.019,
      "Plan Width": 8,
      "Actual Loops": 1,
      "Total Cost": 789.03
    },
    "Triggers": []
  }
]

And what is more strange the almost similar query works perfectly.

select min(hours) from log where user_id = 'ab056f5a-390b-41d7-ba56-897c14b679bf'

Postgres selects entries for user_id = 'ab056f5a-390b-41d7-ba56-897c14b679bf' first and then aggregates among them what is obviously correct.

[
  {
    "Execution Time": 5.989,
    "Planning Time": 1.186,
    "Plan": {
      "Partial Mode": "Simple",
      "Startup Cost": 6842.66,
      "Actual Rows": 1,
      "Plans": [
        {
          "Startup Cost": 66.28,
          "Plan Width": 8,
          "Rows Removed by Index Recheck": 0,
          "Actual Rows": 745,
          "Plans": [
            {
              "Startup Cost": 0,
              "Plan Width": 0,
              "Actual Rows": 745,
              "Node Type": "Bitmap Index Scan",
              "Index Cond": "((log.user_id)::text = 'ab056f5a-390b-41d7-ba56-897c14b679bf'::text)",
              "Plan Rows": 1770,
              "Parallel Aware": false,
              "Actual Total Time": 0.25,
              "Parent Relationship": "Outer",
              "Actual Startup Time": 0.25,
              "Total Cost": 65.84,
              "Actual Loops": 1,
              "Index Name": "index_log_user_id"
            }
          ],
          "Recheck Cond": "((log.user_id)::text = 'ab056f5a-390b-41d7-ba56-897c14b679bf'::text)",
          "Exact Heap Blocks": 742,
          "Node Type": "Bitmap Heap Scan",
          "Plan Rows": 1770,
          "Relation Name": "log",
          "Alias": "log",
          "Parallel Aware": false,
          "Actual Total Time": 5.793,
          "Output": [
            "day",
            "hours",
            "user_id"
          ],
          "Lossy Heap Blocks": 0,
          "Parent Relationship": "Outer",
          "Actual Startup Time": 0.357,
          "Total Cost": 6838.23,
          "Actual Loops": 1,
          "Schema": "public"
        }
      ],
      "Node Type": "Aggregate",
      "Strategy": "Plain",
      "Plan Rows": 1,
      "Parallel Aware": false,
      "Actual Total Time": 5.946,
      "Output": [
        "min(hours)"
      ],
      "Actual Startup Time": 5.946,
      "Plan Width": 8,
      "Actual Loops": 1,
      "Total Cost": 6842.67
    },
    "Triggers": []
  }
]

There are two possible work arounds:

1) rewrite query into:

select user_id, min(day) from log where user_id = 'ac43a155-4fbb-49eb-a670-02c307eb3d4f' group by user_id

2) Introduce pair indexes like it was suggested in finding MAX(db_timestamp) query

They could look good but I consider both ways exactly as work arounds (first one is even a hack). Logically if Postgres can select a proper plan for ‘hours’ it have to do it for ‘day’ but it doesn’t. So it looks like a Postgres bug that is occurred during aggregate of timestamp fields, however I admit that I can miss something. Could somebody please advise if something can be done here without using WAs or it’s really a Postgres bug and I have to report it?

UPD: I've reported this as a bug to the PostgreSQL bugs mailing list. I'll let everybody know if it's accepted.

score 1 · Accepted Answer

Min 是聚合函数，而不是运算符。必须对所有匹配的记录执行函数。选择部分中的字段不影响计划。From ... join ... where ... group by ... order by - 所有这些都在计划中考虑。尝试：

select day from log where user_id = 'ab056f5a-390b-41d7-ba56-897c14b679bf'
order by user_id, day
limit 1

score 1 · Accepted Answer

我收到了 PostgreSQL 的回复。他们不认为这是一个错误。在这种情况下可能存在 WA，其中许多在原始帖子和稍后的评论中都有提及。我个人的选择是最初提到的第一个选项，因为它不需要索引操作（这并不总是可能的）。所以解决方案是将查询重写为：

select user_id, min(day) from log where user_id = 'ac43a155-4fbb-49eb-a670-02c307eb3d4f' group by user_id

score 0 · Accepted Answer

看这篇文章有一些关于索引顺序的游戏 - PostgreSQL index not used for query on range

https://dba.stackexchange.com/questions/39589/optimizing-queries-on-a-range-of-timestamps-two-columns

还有一个想法是

select min(day) from (
   select day from log 
      where user_id = 'ac43a155-4fbb-49eb-a670-02c307eb3d4f'
) q

ps 另外你能确认它autovacuum (verbose, analyze)是为表执行的吗？

postgresql - PostgreSQL 9.6 selects a wrong plan during aggregation against timestamp columns

3 回答 3

Related

Reference