When I first started managing enterprise data warehouses, I treated storage as a ‘solved problem.’ After all, S3 is cheap, right? But as my datasets grew into the petabyte range, I noticed a creeping trend in our monthly bill. If you’re wondering how to reduce Snowflake storage costs, you’ve likely realized that while Snowflake separates compute and storage, the way you manage your data lifecycle directly impacts your bottom line.
Storage costs in Snowflake aren’t just about the raw bytes you load; they are heavily influenced by metadata, Time Travel, and Fail-safe. In my experience, most companies are paying for data they don’t actually need. To build a modern data stack that is sustainable, you have to be aggressive about storage hygiene.
1. Optimize Your Time Travel Retention Periods
Time Travel is a lifesaver when you accidentally run a DELETE without a WHERE clause (or drop the wrong table), but it’s a silent budget killer. The default retention is 1 day on every edition; Standard Edition caps out there, while Enterprise Edition lets you extend it up to 90 days on permanent tables. If you have a 10TB table and update 1TB of data daily, you are paying to retain that churn for the entire retention period.
I recommend setting retention to 0 or 1 for transient tables and strictly limiting permanent tables to 7-14 days unless regulatory requirements dictate otherwise.
-- Change retention period to 1 day for a specific table
ALTER TABLE raw_events SET DATA_RETENTION_TIME_IN_DAYS = 1;
2. Leverage Transient Tables for ETL/ELT
This is the single fastest way to cut costs. Permanent tables have a Fail-safe period (7 days) that you cannot disable. Transient tables have 0 or 1 day of Time Travel and zero Fail-safe. Since ETL staging tables are easily reproducible from source, there is no reason to pay for Fail-safe storage on them.
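Here’s a minimal sketch, with hypothetical schema and column names, of what a staging table like that looks like:

-- Sketch (hypothetical names): staging table with no Fail-safe and no Time Travel
CREATE TRANSIENT TABLE staging.daily_orders (
    order_id  NUMBER,
    order_ts  TIMESTAMP_NTZ,
    amount    NUMBER(12,2)
)
DATA_RETENTION_TIME_IN_DAYS = 0;

If the load ever fails, you rebuild the table from source rather than relying on recovery features.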
3. Implement a Rigid Data Retention Policy
I’ve seen ‘archive’ tables that haven’t been queried in three years still sitting in Snowflake. Storage is cheap, but not free. I suggest creating a script that flags tables with no recorded queries in the last 90 days, pairing the ACCOUNT_USAGE.ACCESS_HISTORY view (last access) with ACCOUNT_USAGE.TABLE_STORAGE_METRICS (size), as sketched below.
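Here’s a rough sketch of that check. It assumes Enterprise Edition (ACCESS_HISTORY requires it) and tolerates the up-to-several-hours latency of ACCOUNT_USAGE views:

-- Sketch: tables with no recorded access in the last 90 days, largest first
SELECT m.table_catalog, m.table_schema, m.table_name,
       ROUND(m.active_bytes / POWER(1024, 3), 1) AS active_gb
FROM snowflake.account_usage.table_storage_metrics m
WHERE m.deleted = FALSE
  AND NOT EXISTS (
      SELECT 1
      FROM snowflake.account_usage.access_history ah,
           LATERAL FLATTEN(input => ah.base_objects_accessed) obj
      WHERE obj.value:objectName::STRING =
            m.table_catalog || '.' || m.table_schema || '.' || m.table_name
        AND ah.query_start_time >= DATEADD('day', -90, CURRENT_TIMESTAMP())
  )
ORDER BY m.active_bytes DESC;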
If you need the data for long-term compliance but not for BI, move it to an external stage (S3/Azure Blob) and access it via external tables. For those looking for even lighter footprints, exploring DuckDB for business intelligence on local files can offload some of this burden.
4. Audit and Clean Up Unused Clones
Zero-copy cloning is a superpower, but it can lead to ‘clone sprawl.’ While a clone initially costs nothing, as the data in the clone diverges from the original, Snowflake starts charging for the unique blocks. I often find developers creating ‘test_table_v1’, ‘test_table_v2’ and forgetting them. Run a monthly cleanup of any table with ‘test’ or ‘tmp’ in the name.
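A query like this can surface the candidates before you start dropping anything; the name and size columns come straight from ACCOUNT_USAGE.TABLES:

-- Candidate tables for the monthly cleanup sweep
SELECT table_catalog, table_schema, table_name,
       ROUND(bytes / POWER(1024, 3), 1) AS size_gb,
       last_altered
FROM snowflake.account_usage.tables
WHERE deleted IS NULL
  AND (table_name ILIKE '%test%' OR table_name ILIKE '%tmp%')
ORDER BY bytes DESC;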
5. Be Mindful of Table Clustering
Automatic clustering is great for performance, but it’s a storage trap. When Snowflake re-clusters a table, it creates new versions of the micro-partitions. This increases the amount of data stored in Time Travel. If you’re clustering a massive table frequently, your storage costs will spike. Only cluster tables where the query performance gain outweighs the storage overhead.
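If you decide the maintenance on a particular table isn’t paying for itself, you can pause it without dropping the clustering key; a minimal example, assuming a table named big_events:

-- Pause automatic clustering where the storage and credit overhead outweighs the gain
ALTER TABLE big_events SUSPEND RECLUSTER;

-- Re-enable later if query patterns justify it
ALTER TABLE big_events RESUME RECLUSTER;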
7. Avoid Over-Denormalization (The ‘Wide Table’ Trap)
While Snowflake handles columnar storage efficiently, creating massive ‘one-big-table’ (OBT) views with redundant columns increases the storage footprint. I’ve found that normalizing certain dimensions and using joins in a view is often more storage-efficient than duplicating strings across a billion rows.
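A sketch of the idea with hypothetical table names: keep the repeated strings in a small dimension and expose the wide shape as a view, so the join happens at query time instead of being materialized per row.

-- Sketch (hypothetical names): wide shape as a view over normalized tables
CREATE OR REPLACE VIEW analytics.orders_wide AS
SELECT f.order_id,
       f.order_ts,
       f.amount,
       c.customer_name,
       c.customer_segment
FROM analytics.fact_orders f
JOIN analytics.dim_customer c
  ON c.customer_id = f.customer_id;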
7. Monitor Storage with Account Usage Views
You can’t fix what you can’t measure. I use a custom dashboard to monitor the TABLE_STORAGE_METRICS view. This tells me exactly how much storage is being used by the active table versus Time Travel and Fail-safe.
As shown in the image below, focusing on the ‘Fail-safe’ and ‘Time Travel’ columns allows you to identify which tables are the primary cost drivers.
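The query behind that panel is roughly this (ACCOUNT_USAGE views can lag by a few hours):

-- Storage breakdown per table: active vs. Time Travel vs. Fail-safe
SELECT table_catalog, table_schema, table_name,
       ROUND(active_bytes      / POWER(1024, 3), 1) AS active_gb,
       ROUND(time_travel_bytes / POWER(1024, 3), 1) AS time_travel_gb,
       ROUND(failsafe_bytes    / POWER(1024, 3), 1) AS failsafe_gb
FROM snowflake.account_usage.table_storage_metrics
WHERE deleted = FALSE
ORDER BY time_travel_bytes + failsafe_bytes DESC
LIMIT 25;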
8. Use Compression-Friendly Data Types
Snowflake compresses data automatically, but your choice of data type still matters. Avoid using VARIANT for data that is strictly structured. While VARIANT is flexible, casting it to a structured column often results in better compression and faster scans.
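A sketch with hypothetical column names, promoting known fields from a VARIANT payload into typed columns during transformation:

-- Sketch (hypothetical names): cast strictly structured fields out of VARIANT
CREATE OR REPLACE TABLE analytics.events AS
SELECT payload:event_id::NUMBER           AS event_id,
       payload:event_type::VARCHAR        AS event_type,
       payload:occurred_at::TIMESTAMP_NTZ AS occurred_at
FROM raw.events_variant;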
9. Prune Your Stage Files
Internal stages can accumulate a massive amount of unused .csv or .json files from previous PUT commands. Use REMOVE @stage_name/path to clear out files once they have been successfully copied into a table.
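A minimal example, assuming an internal stage named etl_stage; LIST first so you can sanity-check what’s about to be deleted:

-- Review the stage contents, then remove files that have already been loaded
LIST @etl_stage/landing/;
REMOVE @etl_stage/landing/;

If the files are only ever loaded once, setting PURGE = TRUE on the COPY INTO statement removes them automatically after a successful load.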
10. Implement Automated Data Lifecycle Management
Don’t rely on manual deletes. I recommend building a ‘Janitor’ process—a Snowflake Task that runs weekly to drop tables in your DEV and QA environments that are older than 30 days.
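A sketch of such a task, assuming a warehouse named maintenance_wh and a stored procedure admin.drop_stale_tables that you would write to hold the actual DROP logic:

-- Sketch (hypothetical names): weekly janitor for stale DEV tables
CREATE OR REPLACE TASK admin.dev_janitor
  WAREHOUSE = maintenance_wh
  SCHEDULE  = 'USING CRON 0 6 * * 1 UTC'  -- Mondays at 06:00 UTC
AS
  CALL admin.drop_stale_tables('DEV_DB', 30);

-- Tasks are created suspended; resume to activate
ALTER TASK admin.dev_janitor RESUME;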
Common Mistakes When Reducing Costs
- Over-aggressive dropping: Dropping production tables without a verified backup in an external stage.
- Ignoring Fail-safe: Thinking that DROP TABLE immediately deletes the data. For permanent tables, Fail-safe retains it for 7 days after the Time Travel window ends.
- Neglecting the ‘Small’ Tables: Focusing only on the TB-scale tables while ignoring thousands of small, redundant metadata tables.
Measuring Success
To know if your efforts are working, track your total storage spend, and its Time Travel and Fail-safe share, over a 3-month rolling window. You should see a downward trend in the ‘Time Travel’ portion of your storage bill after implementing the retention changes mentioned above.
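For the account-level trend, the ACCOUNT_USAGE.STORAGE_USAGE view gives a daily series (Time Travel bytes are rolled into STORAGE_BYTES there; the per-table split lives in TABLE_STORAGE_METRICS):

-- Daily account-level storage over the last 3 months
SELECT usage_date,
       ROUND(storage_bytes  / POWER(1024, 4), 2) AS table_and_time_travel_tb,
       ROUND(stage_bytes    / POWER(1024, 4), 2) AS stage_tb,
       ROUND(failsafe_bytes / POWER(1024, 4), 2) AS failsafe_tb
FROM snowflake.account_usage.storage_usage
WHERE usage_date >= DATEADD('month', -3, CURRENT_DATE())
ORDER BY usage_date;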
Ready to optimize your entire pipeline? Check out my guide on how to build a modern data stack to ensure your architecture is cost-effective from day one.