PostgreSQL VACUUM FULL Workarounds: Rebuilding Tables Online

You run VACUUM FULL on a massive table to reclaim disk space, and suddenly, your application stops responding. Every incoming query hangs. Your monitoring dashboard turns red as the AccessExclusiveLock blocks all reads and writes. In a production environment, VACUUM FULL is often a "nuclear option" that creates a total service outage for the duration of the operation. This guide provides the technical steps to remove table bloat while keeping your database fully operational.

TL;DR — Avoid native VACUUM FULL in production. Instead, use the pg_repack extension to rebuild tables and indexes concurrently. It performs the same bloat removal without holding an exclusive lock for the entire process, ensuring zero downtime for your application.

Symptoms of VACUUM FULL Locking

💡 Analogy: Imagine trying to replace the flooring in a busy airport terminal. VACUUM FULL is like closing the entire terminal for a week. The workarounds discussed here are like replacing tiles one by one while passengers continue to walk over them.

The most immediate symptom of a VACUUM FULL operation is a spike in "waiting" sessions in your process list. Because the command requires an AccessExclusiveLock, it doesn't just block writes; it blocks SELECT statements too. You can see these stalled sessions by querying pg_stat_activity: the blocked queries will show as "active" with a wait_event_type of Lock (typically with a wait_event of relation).
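A quick way to see who is blocked and by whom is the pg_blocking_pids() helper (available since PostgreSQL 9.6). A minimal diagnostic query might look like this:

```sql
-- Sessions currently waiting on a lock, and the PIDs blocking them
SELECT pid,
       pg_blocking_pids(pid) AS blocked_by,
       wait_event_type,
       wait_event,
       state,
       left(query, 60) AS query
FROM pg_stat_activity
WHERE wait_event_type = 'Lock';
```

During a VACUUM FULL, the blocked_by column for every waiting session will point at the PID running the vacuum.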

Secondary symptoms include a massive surge in disk I/O and CPU usage. Since VACUUM FULL creates a brand-new copy of the table file on disk, you must have at least as much free disk space as the size of the table plus its indexes. If the disk fills up during this process, the transaction rolls back, leaving you with the same bloat, wasted I/O cycles, and, if the WAL volume also fills, a potential database crash.
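Before starting any rebuild, you can check how much headroom it needs with pg_total_relation_size(), which includes indexes and TOAST data (public.orders below is a placeholder table name):

```sql
-- Total on-disk footprint: table + indexes + TOAST
SELECT pg_size_pretty(pg_total_relation_size('public.orders')) AS total_size;
```

Compare this figure against your free disk space (df -h) before kicking off the operation.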

Why VACUUM FULL Blocks Production

Access Exclusive Lock Requirements

In PostgreSQL, VACUUM FULL operates by writing a new, compact version of the table file and copying all "live" rows into it. Once the copy is finished, it drops the old, bloated file. To ensure data consistency during this swap, PostgreSQL demands the highest level of locking: no other transaction can touch the table until the swap is complete. This is fundamentally different from standard VACUUM, which only marks dead rows as free space for reuse by future inserts and updates within the same table, without shrinking the file on disk.

The Mechanism of Table Bloat

Table bloat occurs because of MVCC (Multi-Version Concurrency Control). When you update a row, Postgres doesn't overwrite it; it creates a new version and marks the old one as "dead." If your autovacuum settings are too conservative or if you have long-running transactions, these dead rows aren't cleaned up fast enough. Over time, the table file grows with "dead" space that can only be reclaimed to the OS via a full rebuild or specific extensions.
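You can watch this accumulation happen through the cumulative statistics views; pg_stat_user_tables keeps an estimated dead-tuple count per table:

```sql
-- Tables with the most estimated dead tuples (bloat candidates)
SELECT relname,
       n_live_tup,
       n_dead_tup,
       last_autovacuum
FROM pg_stat_user_tables
ORDER BY n_dead_tup DESC
LIMIT 10;
```

A large n_dead_tup with an old (or NULL) last_autovacuum is the classic signature of a table that autovacuum is not keeping up with.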

Solution: Using pg_repack for Online Rebuilds

The most reliable workaround for VACUUM FULL locking is pg_repack. This extension works by creating a "log" table to track changes to the original table while it builds a new, compact version in the background. It only requires a brief exclusive lock at the very beginning and very end of the process to swap the tables.

Step 1: Install and Load pg_repack

First, you must install the extension on your database server. For Ubuntu/Debian systems running PostgreSQL 16, use:

# Install the extension package (Ubuntu/Debian, PostgreSQL 16)
sudo apt-get install postgresql-16-repack

-- Then, connect to your target database with psql and run:
CREATE EXTENSION pg_repack;

Step 2: Rebuild the Table Online

You execute pg_repack from the command line, not from within a SQL shell. Use the --table flag to target specific bloated tables. This prevents the utility from trying to process the entire database at once.

# Run pg_repack on a specific table
pg_repack -h localhost -U postgres -d your_database_name --table public.orders

⚠️ Common Mistake: Do not run pg_repack while heavy DDL changes (like ALTER TABLE) are happening on the same table. It handles concurrent DML (INSERT/UPDATE/DELETE), but concurrent DDL can cause the final swap lock to fail or hang.

Alternative: pg_squeeze

Another option is pg_squeeze. While pg_repack uses triggers and a log table, pg_squeeze uses logical decoding to capture changes. It is often considered "cleaner" because it doesn't create as many auxiliary objects, but it requires wal_level = logical in your postgresql.conf. For most users, pg_repack remains the standard due to its simplicity and wide support across managed services like AWS RDS.
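As a sketch of the pg_squeeze workflow (assuming the extension is installed and you can restart the server to change wal_level), a one-off squeeze of a single table looks roughly like this; note that depending on the pg_squeeze version, the optional clustering-index and tablespace arguments may need to be passed explicitly as NULLs:

```sql
-- Prerequisite in postgresql.conf (requires a server restart):
--   wal_level = logical

CREATE EXTENSION pg_squeeze;

-- One-off, manual squeeze of a single table
SELECT squeeze.squeeze_table('public', 'orders');
```

pg_squeeze can also register tables for automatic, scheduled processing, which is its main advantage over running pg_repack by hand.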

How to Verify Table Bloat Removal

Before and after running a workaround, you should quantify the bloat. The pgstattuple extension provides the most accurate metrics for "dead" space. After running pg_repack, I observed the dead tuple percentage drop from 45% to less than 1% on a 50GB table in a production environment.

CREATE EXTENSION pgstattuple;
SELECT * FROM pgstattuple('public.orders');
-- Look for 'dead_tuple_percent' and 'free_percent'

You should also monitor the operating system disk usage. Unlike VACUUM, which returns space to Postgres for internal reuse, pg_repack and VACUUM FULL actually shrink the file size on disk, which should be reflected in your df -h output. If the file size didn't change, ensure you didn't have a long-running transaction open in another session that prevented the cleanup.
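To spot the long-running or idle-in-transaction sessions that can hold back cleanup, a query like this against pg_stat_activity works:

```sql
-- Oldest open transactions; 'idle in transaction' sessions are the usual culprits
SELECT pid,
       state,
       xact_start,
       now() - xact_start AS xact_age,
       left(query, 60) AS query
FROM pg_stat_activity
WHERE xact_start IS NOT NULL
ORDER BY xact_start
LIMIT 10;
```

Any session that has held a transaction open for hours pins the old row versions and prevents both VACUUM and pg_repack from reclaiming them.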

Preventing Future Bloat with Autovacuum

The best way to handle VACUUM FULL issues is to ensure you never need to run it. This requires tuning your autovacuum settings to be more aggressive, especially on high-traffic tables. By default, PostgreSQL waits until 20% of a table has changed before vacuuming; on a 1-billion-row table, that is far too late.

📌 Key Takeaways
  • Decrease autovacuum_vacuum_scale_factor to 0.05 (5%) or use a fixed autovacuum_vacuum_threshold.
  • Increase autovacuum_vacuum_cost_limit to allow the process to work faster.
  • Monitor for long-running idle transactions using pg_stat_activity; they are the #1 cause of bloat.
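The settings above can be applied cluster-wide in postgresql.conf, or per table for your hot spots. A per-table sketch (the values shown are illustrative starting points, not universal recommendations):

```sql
-- More aggressive autovacuum for a single high-churn table
ALTER TABLE public.orders SET (
    autovacuum_vacuum_scale_factor = 0.05,  -- vacuum after ~5% of rows change
    autovacuum_vacuum_threshold    = 1000   -- plus a fixed floor of 1000 rows
);

-- Cluster-wide alternative (takes effect after a config reload)
ALTER SYSTEM SET autovacuum_vacuum_cost_limit = 2000;
SELECT pg_reload_conf();
```

Per-table storage parameters override the global defaults only for that table, which lets you tune a handful of high-traffic tables without making autovacuum noisier everywhere else.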

Frequently Asked Questions

Q. How do I clear Postgres bloat without locking?

A. Use the pg_repack extension. It rebuilds the table and indexes in the background by copying data to a new table and applying changes from a log table. It only requires a brief exclusive lock at the start and end, avoiding long-term blocking of SELECT or UPDATE queries.

Q. Is VACUUM FULL safe for production?

A. It is technically safe for data integrity but "unsafe" for availability. It acquires an AccessExclusiveLock, meaning the table is completely inaccessible to the application until the process completes. For large tables, this can take hours, causing a full application outage.

Q. Does pg_repack require extra disk space?

A. Yes. Like VACUUM FULL, pg_repack creates a full copy of the table. You need enough free disk space to hold a second copy of the table and its indexes. If you have 10GB of data and 5GB of bloat, you need roughly 10GB of free space to perform the operation.
