Sampling remote data is subject to network latency, which is why an ANALYZE on foreign tables can take a considerable amount of time. To avoid compromising the collection of local statistics, the autovacuum daemon, by design, never triggers an ANALYZE on foreign tables. Over time, however, the locally stored statistics for foreign tables become outdated, which may cause the query planner to choose suboptimal plans for queries involving them.
If many of your queries involve foreign tables, you might benefit from scheduling a cron job to run ANALYZE on those tables. For example, assuming my_foreign_table is a problematic foreign table:
ANALYZE my_foreign_table;
This command doesn't block anything and shouldn't take too long, so you could schedule it to run multiple times a day. The same scheduled job can analyze multiple foreign tables.
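A minimal sketch of such a scheduled job, as a crontab entry. The database name (mydb) and the second table name are placeholders; adjust the connection options to match your environment:

```shell
# Hypothetical crontab entry: refresh foreign-table statistics every six hours.
# Assumes passwordless local access (e.g. via peer auth or ~/.pgpass).
0 */6 * * * psql -d mydb -c "ANALYZE my_foreign_table; ANALYZE other_foreign_table;"
```

Because ANALYZE on a foreign table only samples remote rows and takes no blocking locks that matter to normal traffic, running it this frequently is generally safe.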
For complex local queries that are not executed very often, or for foreign tables that are heavily written to on the remote database, you might benefit instead from the use_remote_estimate option of postgres_fdw. Depending on the query complexity, Postgres may issue several EXPLAIN commands to the remote database in order to build the best plan locally. The cost is that local planning takes noticeably longer and adds extra network round trips between the local and remote databases for each query, but query execution will likely be much faster. Of course, this depends on autovacuum being healthy on the remote database, since the remote estimates are only as good as the remote statistics.
So our recommendation is to use use_remote_estimate on really problematic foreign tables, and on tables referenced by complex, long-running queries that are not executed very often. For all other foreign tables, schedule an ANALYZE in a cron job on the local Postgres database.