May 19, 2022
Run ANALYZE. Run ANALYZE. Run ANALYZE.
We’ve said it before, we’ll say it again: one of the first things you should do, if not the first, when trying to solve a slow-running PostgreSQL query problem is run ANALYZE. A recent interaction with an OtterTune user is a good case in point.
About PostgreSQL ANALYZE
Getting statistics about how your queries and jobs are operating is essential. The PostgreSQL ANALYZE command collects statistics about specific table columns, entire tables, or entire databases, and stores the results in the
pg_statistic system catalog. The PostgreSQL query planner then uses these statistics to help determine the most efficient execution plans for queries.
By default, ANALYZE examines every table in the current database. If you supply a table name as a parameter, ANALYZE examines only that table. You can also give ANALYZE a list of column names, in which case it collects only the statistics for those columns.
OtterTune user with slow-running load-transform jobs
Recently, a PostgreSQL user contacted us via the OtterTune Community Slack channel. This OtterTune user had an ETL workload running scripts that bulk-loaded data, and then ran queries to pre-aggregate values and store them. They had been experiencing slow performance problems for months.
They were running Amazon RDS PostgreSQL. The Load & Transform jobs ran daily, and they processed roughly 4 million rows of data. The jobs took about one hour to run, on average. OtterTune tuning improved the data normalization time by 30%, but that was only part of the overall job.
In response to the user’s help request, I dug into the issue. The first thing I did was ask the user to send us EXPLAIN ANALYZE output on his queries.
We saw that his database was not using an index on the table and doing a sequential scan of ~3.5 million rows. Hence, queries were running very slowly. We investigated further and noticed that query plans were different from those the customer ran manually — very much so.
When run manually, the analysis used:
Our guess was that the user did not run ANALYZE after the bulk loading, so the stats were not updated. This resulted in the optimizer choosing not to use an index. We asked the user to run ANALYZE after loading the data in the
dbt script and uncovered an important issue in this line:
We immediately recognized the problem: the row was 1 in the estimation, but the actual number of rows was 3474601. That’s why the optimizer was choosing
seq scan in the query plan, because it thought there was only one row.
From 52 minutes to 34 seconds
The customer quickly learned the importance of running ANALYZE and is now creating a script so ANALYZE runs at regular intervals, automatically. We know from experience that running ANALYZE quickly pinpoints and provides a path to solving most issues. Said this customer: “Proof is in the pudding. Calling ANALYZE after building the models dropped the query times dramatically.”
Dramatically, indeed. The job run time went from 52 minutes to 34 seconds.
This made OtterTune smarter
In this case, we solved the problem manually, and in the process learned to enhance OtterTune to recognize this pattern going forward and alert users that running ANALYZE is a good idea.
In the meantime, always run ANALYZE Also, reach out to us via our Slack channel. We’re here to help.