Row Storage (CSV) vs Column Storage (Parquet)

CSV: Row by Row

    name   age  city
    Alice  25   NYC
    Bob    30   LA
    Carol  28   CHI

Must read ALL rows and columns.
Query: SELECT AVG(age) FROM data
Reads 9 cells to get 3 values.

Parquet: Column by Column

    name   Alice  Bob  Carol
    age    25     30   28
    city   NYC    LA   CHI

Only reads the age column.
Query: SELECT AVG(age) FROM data
Reads 3 cells to get 3 values.

With 50 columns and 1M rows, Parquet reads 1 column. CSV reads all 50. That is roughly a 50x difference in I/O, assuming the columns are of similar size.
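
The cell counts in the comparison can be reproduced with a small toy model. The sketch below is only an illustration in plain Python, not how a real CSV or Parquet reader works: the helper names (`avg_age_row_layout`, `avg_age_column_layout`) are made up for this example, and a "cell read" is simply counted each time a value is touched.

```python
# Toy model of the two layouts; real CSV/Parquet readers are more involved.
column_names = ("name", "age", "city")

# Row layout: each record stored together, as in a CSV file.
rows = [
    ("Alice", 25, "NYC"),
    ("Bob", 30, "LA"),
    ("Carol", 28, "CHI"),
]

# Column layout: one list per column, a stand-in for Parquet column chunks.
columns = {
    name: [row[i] for row in rows] for i, name in enumerate(column_names)
}


def avg_age_row_layout(rows):
    """Scan every cell of every row, keeping only the age field."""
    cells_read = 0
    ages = []
    for row in rows:
        for name, value in zip(column_names, row):
            cells_read += 1  # every field in the row is touched
            if name == "age":
                ages.append(value)
    return sum(ages) / len(ages), cells_read


def avg_age_column_layout(columns):
    """Touch only the age column; name and city are never read."""
    ages = columns["age"]
    return sum(ages) / len(ages), len(ages)


row_avg, row_cells = avg_age_row_layout(rows)
col_avg, col_cells = avg_age_column_layout(columns)

print(f"row layout:    avg={row_avg:.2f}, cells read={row_cells}")
print(f"column layout: avg={col_avg:.2f}, cells read={col_cells}")
```

Both functions return the same average, but the row scan touches 9 cells and the column scan touches 3. With real files the same idea usually appears as a column selection at read time, for example `pandas.read_parquet(path, columns=["age"])`.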