2026-02-08
9 MIN READ
RAMESH MOKARIYA

SNOWFLAKE ARCHITECTURE 101

Everyone's talking about Snowflake. But what actually happens when you run a query? Let's open the bonnet and look at the engine - no marketing slides, just the real picture.


First - Why Did Snowflake Even Exist?

Old-school data warehouses like Teradata and Netezza were powerful - but they were physical boxes. You had to buy hardware upfront, tune it yourself, and pray your query didn't kill the server during peak hours. Scaling meant buying more machines.

Then the cloud arrived. Amazon launched Redshift. Google launched BigQuery. But Snowflake, founded in 2012, came in with a different idea: what if storage and compute were completely separate? That one decision changed everything.

The Big Idea

In traditional databases, your data and your query engine are glued together. Snowflake unstuck them. Your data lives in S3. Your query engine (virtual warehouse) is rented on demand. Scale one without touching the other.

The Three-Layer Architecture

Snowflake has exactly three layers. Every query you run passes through all three.

// SNOWFLAKE'S 3-LAYER ARCHITECTURE
graph TB
    A["☁️ Cloud Services Layer\n(The Brain)\nAuthentication · Metadata · Optimizer"] --> B["⚙️ Query Processing Layer\n(The Muscle)\nVirtual Warehouses - XS to 6XL"]
    B --> C["💾 Centralized Storage Layer\n(The Memory)\nAWS S3 / Azure Blob / GCS"]
    style A fill:#ffd700,stroke:#ffd700,color:#000
    style B fill:#00d9ff,stroke:#00d9ff,color:#000
    style C fill:#333,stroke:#00d9ff,color:#fff

Layer 1 - Storage: The Cold Hard Truth

Your data doesn't live inside Snowflake's servers. It lives in cloud object storage - S3 if you're on AWS, Blob if you're on Azure, GCS on Google Cloud. Snowflake manages this storage on your behalf.

Here's what makes this storage special:

  • Columnar format - Unlike a spreadsheet that reads rows left to right, Snowflake stores each column separately. If you query only ORDER_AMOUNT, it reads only that column's file. Not the whole table.
  • Compressed - Each column file is compressed automatically. Your 100GB CSV might become 10GB in storage. Less storage cost, faster reads.
  • Micro-partitioned - Data is split into small chunks (~50–500MB of uncompressed data each), and each chunk carries metadata (min/max values, row counts). This lets Snowflake skip entire chunks when they can't match your WHERE clause.
-- When you run this query:
SELECT order_amount
FROM orders
WHERE order_date = '2026-01-15';

-- Snowflake doesn't scan the whole table.
-- It checks the metadata of each micro-partition:
-- "Does this chunk have any rows from 2026-01-15?"
-- If NO → skip it entirely.
-- This is called "Partition Pruning" and it's the reason
-- Snowflake feels fast even on billion-row tables.

Layer 2 - Virtual Warehouses: The Query Engine

A Virtual Warehouse (VW) is a cluster of compute resources that actually runs your SQL. Think of it as a temporarily rented team of CPUs and RAM that you switch on, use, and switch off.

They come in sizes from XS to 6XL. Each size jump roughly doubles the compute power - and the cost per hour:

  • X-Small - 1 server. Great for development, light queries.
  • Small - 2 servers. Small team analytics.
  • Medium - 4 servers. Most production workloads.
  • Large → 6X-Large - For big transformations, heavy data loads.

Common Beginner Mistake

Bigger warehouse ≠ always faster. If your query only reads 1GB of data, an XL warehouse won't be faster than an S - it just costs more. Bigger warehouses help when your query is CPU-heavy (complex joins, massive aggregations). Right-size your warehouse.
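Right-sizing isn't a console-only exercise - warehouses are created and resized with plain SQL. A sketch (the warehouse name and settings here are illustrative, not a recommendation):

```sql
-- Create a modest warehouse that suspends itself when idle,
-- so you only pay for the seconds it actually runs.
CREATE WAREHOUSE IF NOT EXISTS analytics_wh
  WAREHOUSE_SIZE = 'SMALL'
  AUTO_SUSPEND = 60      -- suspend after 60 idle seconds
  AUTO_RESUME = TRUE;    -- wake up automatically on the next query

-- If a heavy transformation genuinely needs more CPU,
-- scale up for that job, then scale straight back down.
ALTER WAREHOUSE analytics_wh SET WAREHOUSE_SIZE = 'LARGE';
ALTER WAREHOUSE analytics_wh SET WAREHOUSE_SIZE = 'SMALL';
```

Because resizing is a one-line statement, the cheap habit is to start small and scale up only when a specific query proves it needs the extra compute.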

The killer feature: multiple warehouses can query the same data simultaneously, without blocking each other. Your BI team's dashboard queries don't slow down your data engineering load jobs. They use separate virtual warehouses, both reading from the same S3 storage.

// MULTI-CLUSTER WORKLOAD ISOLATION
graph TD
    S["☁️ Shared Storage\n(S3 - your data)"]
    S --> A["VW: Engineering\nRuns dbt transforms"]
    S --> B["VW: Analytics\nPowers Tableau"]
    S --> C["VW: Data Science\nJupyter notebooks"]
    style S fill:#ffd700,stroke:#ffd700,color:#000
    style A fill:#00d9ff,stroke:#00d9ff,color:#000
    style B fill:#00d9ff,stroke:#00d9ff,color:#000
    style C fill:#00d9ff,stroke:#00d9ff,color:#000
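In SQL terms, that isolation is just separate warehouses pointed at the same tables. A sketch - the warehouse names and the `@sales_stage` stage are made up for illustration:

```sql
-- Each team gets its own compute; all of them read the same shared storage.
CREATE WAREHOUSE IF NOT EXISTS engineering_wh WAREHOUSE_SIZE = 'MEDIUM';
CREATE WAREHOUSE IF NOT EXISTS analytics_wh   WAREHOUSE_SIZE = 'SMALL';

-- The BI tool's session uses its own warehouse...
USE WAREHOUSE analytics_wh;
SELECT region, SUM(revenue) FROM sales GROUP BY region;

-- ...while a load job on another warehouse writes to the same table
-- at the same time, without contending for compute.
USE WAREHOUSE engineering_wh;
COPY INTO sales FROM @sales_stage;
```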

Layer 3 - Cloud Services: The Brain

This is the layer nobody talks about - but it does all the invisible heavy lifting:

  • Authentication - verifying who you are before letting you run queries
  • Query Optimizer - reads your SQL and figures out the most efficient way to run it (which partitions to scan, what join order, etc.)
  • Metadata Manager - stores information about every table, column, partition, and query ever run
  • Transaction Manager - ensures ACID guarantees (no partial writes)

This layer is fully managed by Snowflake. You never configure it, patch it, or worry about it. It just works. And it's what makes Snowflake feel effortless compared to managing your own Hadoop cluster.
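One visible side effect of this metadata layer: some queries never touch a warehouse at all, because the answer is already sitting in the partition metadata. For example:

```sql
-- These can be answered straight from metadata -
-- no warehouse spins up, no compute credits consumed.
SELECT COUNT(*) FROM orders;
SELECT MIN(order_date), MAX(order_date) FROM orders;

-- DDL and introspection also run entirely in the services layer.
SHOW TABLES;
DESCRIBE TABLE orders;
```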

The Result Cache - Free Speed

Here's a feature that blew my mind when I first encountered it.

Snowflake caches the result of every query for 24 hours. If you - or anyone else in your account - runs the exact same query again, and the underlying data hasn't changed, Snowflake skips the virtual warehouse entirely and returns the cached result instantly. Zero compute cost.

-- First run: takes 8 seconds, costs compute credits
SELECT region, SUM(revenue) FROM sales GROUP BY region;

-- Second run (same query, same data, within 24h): takes < 1 second
-- Warehouse doesn't even wake up.
-- Snowflake checks: "Is the underlying data changed? No? → serve cache."
SELECT region, SUM(revenue) FROM sales GROUP BY region;

This is why Snowflake dashboards can feel snappy even for complex queries - if your BI tool hits the same query multiple times a day, only the first run costs you.
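You can see the cache in action yourself: Snowflake exposes a session parameter, `USE_CACHED_RESULT`, that turns it off - handy when you're benchmarking and want honest timings rather than sub-second cache hits:

```sql
-- Disable the result cache for this session to get honest timings.
ALTER SESSION SET USE_CACHED_RESULT = FALSE;

-- This now always hits the warehouse, even on repeat runs.
SELECT region, SUM(revenue) FROM sales GROUP BY region;

-- Re-enable it when you're done benchmarking.
ALTER SESSION SET USE_CACHED_RESULT = TRUE;
```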

Time Travel - Yes, Really

Snowflake gives you the ability to query data as it existed in the past. This is called Time Travel.

-- What did this table look like 2 hours ago?
SELECT * FROM orders AT (OFFSET => -60*120);

-- What did it look like before someone accidentally deleted rows?
SELECT * FROM orders BEFORE (STATEMENT => '01a2b3c4-xxxx');

-- Accidentally dropped a table? Restore it.
UNDROP TABLE orders;

The default is 1 day of time travel (configurable up to 90 days on the Enterprise edition and above). This has saved more than a few careers when someone ran DELETE FROM orders without a WHERE clause.
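Time Travel pairs nicely with zero-copy cloning: you can snapshot a table as it existed before the accident without copying any data. A recovery sketch (the clone name and the `order_id` key column are assumptions for illustration):

```sql
-- Extend retention on an important table (up to 90 days on Enterprise).
ALTER TABLE orders SET DATA_RETENTION_TIME_IN_DAYS = 7;

-- Recover: clone the table as it existed before the bad DELETE.
-- A clone is metadata-only, so this is instant and costs no extra storage
-- until the two tables diverge.
CREATE TABLE orders_before_incident CLONE orders
  AT (OFFSET => -3600);  -- one hour ago

-- Put the lost rows back (assumes order_id uniquely identifies a row).
INSERT INTO orders
SELECT * FROM orders_before_incident
WHERE order_id NOT IN (SELECT order_id FROM orders);
```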

The Architecture Summary

Storage is cheap and shared → Compute is isolated and elastic → Cloud services is the brain that glues it all together. This separation is why Snowflake can scale to petabytes without you managing a single server.

One Last Thing - How Does Snowflake Make Money?

You pay for two things: storage (per TB/month) and compute (credits per hour, per warehouse size).
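Usefully, credit consumption is itself queryable through the `SNOWFLAKE.ACCOUNT_USAGE` views, so you can see exactly which warehouse is burning money:

```sql
-- Credits consumed per warehouse over the last 7 days.
SELECT warehouse_name,
       SUM(credits_used) AS total_credits
FROM snowflake.account_usage.warehouse_metering_history
WHERE start_time >= DATEADD('day', -7, CURRENT_TIMESTAMP())
GROUP BY warehouse_name
ORDER BY total_credits DESC;
```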

This means Snowflake is incentivised to make your queries run faster - the quicker your warehouse finishes, the sooner it suspends and stops billing you, while a fast, pleasant product keeps you coming back to run more queries. Those incentives mostly align, and aligned incentives make for a well-engineered product.

What's Next?

You now know where data is stored and queried. The next question is: what happens when your data is too big for even Snowflake to transform on its own? That's where Apache Spark comes in. Read Apache Spark Architecture 101 →