Lesson 20
Small Files in Delta Lake: Every Lever You Can Use to Fix Read Performance
In Delta Lake on Databricks, a table made up of many small files is slow to read: every file adds metadata to process, a file-open cost, and scheduling overhead for the tasks that read it. The fixes all aim at fewer, larger Parquet files, and this lesson covers each of them: OPTIMIZE compaction, auto-compaction, optimized writes, proper partitioning, Liquid Clustering, batching streaming writes, tuning file sizes, and reducing excessive parallelism.
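To make the idea of compaction concrete, here is a minimal sketch in plain Python of how many small files can be packed into a few files near a target size. It is only an illustration of the principle, not how OPTIMIZE is actually implemented: the function name `plan_compaction`, the 128 MB target, and the sample file sizes are all assumptions chosen for the example.

```python
from typing import List


def plan_compaction(file_sizes_mb: List[float], target_mb: float = 128.0) -> List[List[float]]:
    """Group files into bins of at most target_mb each (first-fit decreasing).

    Hypothetical helper for illustration only; each bin stands for one
    rewritten output file. A file larger than target_mb gets its own bin.
    """
    bins: List[List[float]] = []   # file sizes assigned to each output file
    totals: List[float] = []       # running size of each output file
    for size in sorted(file_sizes_mb, reverse=True):
        for i, total in enumerate(totals):
            if total + size <= target_mb:
                bins[i].append(size)
                totals[i] += size
                break
        else:
            # No existing bin has room, so start a new output file.
            bins.append([size])
            totals.append(size)
    return bins


# Assumed example input: 64 files of 4 MB plus 8 files of 16 MB (384 MB total).
small_files = [4.0] * 64 + [16.0] * 8
plan = plan_compaction(small_files, target_mb=128.0)
print(len(small_files), "files ->", len(plan), "files")
```

With these assumed inputs, 72 input files collapse into 3 output files holding the same 384 MB, which is the effect every technique listed above is trying to achieve: the same data, read through far fewer file opens and tasks.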