Throwing 107 GB and 5 billion fake rows of order data at DuckDB and Athena
- DuckDB is an in-process database management system for OLAP.
- Athena is Amazon’s serverless analytics service.
- Cloudflare R2 is an alternative to AWS S3 where egress is free of charge.
If you don’t already care about DuckDB, go ahead and read my other post first. This post details some highlights of my experiments on a technical level. If you’re more interested in what it means for your hosting bill, I wrote about that too.
Test data
I used a dataset with one table containing 5 billion fake order lines from 1 billion orders over 10 years. The data is in snappy-compressed Parquet files, partitioned by year and month. The total file size is 107 GB (about 330 GB uncompressed). Here is a sample of records:
┌──────────────────────┬───────┬────────────┬──────────┬────────────────────┬────────────────────┬─────────────┬─────────────────────┬───────────┬─────────┬─────────┬──────────┬─────────┐
│ order_id │ line │ product_id │ quantity │ price │ line_price │ customer_id │ created │ num_lines │ day │ month │ store_id │ year │
│ int64 │ int64 │ int64 │ int64 │ double │ double │ int64 │ timestamp │ int64 │ varchar │ varchar │ varchar │ varchar │
├──────────────────────┼───────┼────────────┼──────────┼────────────────────┼────────────────────┼─────────────┼─────────────────────┼───────────┼─────────┼─────────┼──────────┼─────────┤
│ 1135077679086384973 │ 1 │ 854 │ 5 │ 16.083097153631755 │ 80.41548576815877 │ 1535 │ 2016-09-06 01:03:04 │ 4 │ 6 │ 9 │ 8 │ 2016 │
│ 1135077679086384973 │ 2 │ 561 │ 3 │ 43.970102508551975 │ 131.91030752565592 │ 1535 │ 2016-09-06 01:03:04 │ 4 │ 6 │ 9 │ 8 │ 2016 │
│ 1135077679086384973 │ 3 │ 99 │ 3 │ 54.659783092799465 │ 163.9793492783984 │ 1535 │ 2016-09-06 01:03:04 │ 4 │ 6 │ 9 │ 8 │ 2016 │
│ 1135077679086384973 │ 4 │ 261 │ 5 │ 33.972849604393296 │ 169.8642480219665 │ 1535 │ 2016-09-06 01:03:04 │ 4 │ 6 │ 9 │ 8 │ 2016 │
│ 4791786128601458882 │ 3 │ 854 │ 6 │ 16.083097153631755 │ 96.49858292179053 │ 9267 │ 2016-09-06 04:15:31 │ 5 │ 6 │ 9 │ 8 │ 2016 │
│ 4791786128601458882 │ 2 │ 83 │ 2 │ 88.68105154796304 │ 177.36210309592607 │ 9267 │ 2016-09-06 04:15:31 │ 5 │ 6 │ 9 │ 8 │ 2016 │
│ 4791786128601458882 │ 1 │ 368 │ 10 │ 33.87760446706489 │ 338.7760446706489 │ 9267 │ 2016-09-06 04:15:31 │ 5 │ 6 │ 9 │ 8 │ 2016 │
│ 4791786128601458882 │ 4 │ 781 │ 2 │ 97.59143722363255 │ 195.1828744472651 │ 9267 │ 2016-09-06 04:15:31 │ 5 │ 6 │ 9 │ 8 │ 2016 │
│ 4791786128601458882 │ 5 │ 396 │ 2 │ 3.9038578062505547 │ 7.807715612501109 │ 9267 │ 2016-09-06 04:15:31 │ 5 │ 6 │ 9 │ 8 │ 2016 │
│ -1029262612490749341 │ 8 │ 854 │ 10 │ 16.083097153631755 │ 160.83097153631755 │ 5408 │ 2016-09-06 05:27:59 │ 9 │ 6 │ 9 │ 8 │ 2016 │
├──────────────────────┴───────┴────────────┴──────────┴────────────────────┴────────────────────┴─────────────┴─────────────────────┴───────────┴─────────┴─────────┴──────────┴─────────┤
│ 10 rows 13 columns │
└─────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┘
Each Parquet file has a size of approximately 50 MB. A sample of file paths:
orders/year=2023/month=3/data_0.parquet
orders/year=2023/month=3/data_10.parquet
orders/year=2023/month=3/data_11.parquet
orders/year=2023/month=3/data_12.parquet
orders/year=2023/month=3/data_13.parquet
orders/year=2023/month=3/data_14.parquet
orders/year=2023/month=3/data_15.parquet
orders/year=2023/month=3/data_1.parquet
orders/year=2023/month=3/data_2.parquet
orders/year=2023/month=3/data_3.parquet
orders/year=2023/month=3/data_4.parquet
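For what it’s worth, this directory layout (year=.../month=... folders containing data_N.parquet files) matches what DuckDB’s partitioned Parquet export writes. A minimal sketch, assuming an orders table already exists in DuckDB (snappy is DuckDB’s default Parquet codec anyway):
copy orders to 'orders'
    (format parquet, partition_by (year, month), compression snappy);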
Point DuckDB at local Parquet files
create or replace view orders as
select * from parquet_scan('orders/*/*/*.parquet', hive_partitioning=1);
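With hive_partitioning=1 the year and month directory names become ordinary columns on the view, and filters on them prune files before any data is read. A quick way to check (a sketch, not part of the timings below):
describe orders;
explain select count(1) from orders where year = '2022' and month = '6';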
Get data to and from R2
I used rclone to upload data to R2 and got a throughput of ~470 Mb/s from a box in Helsinki. I downloaded the dataset to a box in Falkenstein at ~920 Mb/s.
Your rclone.conf needs a block like this:
[r2]
type = s3
provider = Cloudflare
env_auth = false
access_key_id = $access
secret_access_key = $secret
endpoint = https://$account.r2.cloudflarestorage.com
acl = private
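With that remote in place, the upload and download are plain rclone copies along these lines; the bucket and prefix are the ones the DuckDB view points at later:
rclone copy --progress orders/ r2:duckdata/year-month/orders/
rclone copy --progress r2:duckdata/year-month/orders/ orders/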
Point DuckDB at R2
install httpfs;
load httpfs;
set s3_endpoint='$account.r2.cloudflarestorage.com';
set s3_region='auto';
set s3_access_key_id='$access';
set s3_secret_access_key='$secret';
create or replace view orders as
select * from parquet_scan('s3://duckdata/year-month/orders/*/*/*.parquet', hive_partitioning=1);
Notably, my first attempt to query R2 failed while later attempts succeeded:
explain analyze select count(1) from orders;
14% ▕████████▍ ▏ Error: IO Error: Connection error for HTTP HEAD to 'https://duckdata.$account.r2.cloudflarestorage.com/year-month/orders/year%3D2019/month%3D9/data_4.parquet'
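I didn’t dig into the root cause, but since later attempts succeeded it looks transient. The httpfs extension has retry and timeout settings worth bumping in that case (a sketch; names and defaults may vary between DuckDB versions):
set http_retries=5;           -- more attempts on transient failures
set http_retry_wait_ms=1000;  -- longer wait between attempts
set http_timeout=120000;      -- more generous timeout; check the unit for your version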
Upload data to S3
Again I used rclone to upload the dataset to AWS S3 and achieved a throughput of 650 Mb/s.
[s3]
type = s3
provider = AWS
env_auth = false
region = $region
access_key_id = $access
secret_access_key = $secret
acl = private
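The upload itself is the same kind of rclone copy, just against the s3 remote ($bucket as in the Athena DDL below):
rclone copy --progress orders/ s3:$bucket/orders/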
Create a table in Athena
CREATE EXTERNAL TABLE orders (
order_id bigint,
store_id string,
customer_id string,
product_id bigint,
created timestamp,
num_lines bigint,
line bigint,
quantity bigint,
price double,
line_price double,
day varchar(2)
)
PARTITIONED BY (year varchar(4), month varchar(2))
STORED AS PARQUET
LOCATION 's3://$bucket/orders/'
tblproperties ("parquet.compression"="SNAPPY");
MSCK REPAIR TABLE orders;
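After the repair, SHOW PARTITIONS is a quick sanity check that Athena picked up every year/month directory:
SHOW PARTITIONS orders;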
I had to declare some columns as string in Athena even though DuckDB recognizes them as int64. I didn’t investigate further, but please reach out if you know what the reason might be. I got errors like these:
HIVE_BAD_DATA: Field store_id's type BINARY in parquet file s3://$bucket/orders/year=2023/month=4/data_11.parquet is incompatible with type bigint defined in table schema
HIVE_BAD_DATA: Field order_id's type INT64 in parquet file s3://$bucket/orders/year=2023/month=4/data_8.parquet is incompatible with type varchar defined in table schema
HIVE_CURSOR_ERROR: Failed to read Parquet file: s3://$bucket/orders/year=2022/month=11/data_15.parquet
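One way to inspect what the files actually contain is DuckDB’s parquet_schema function, which lists each column’s physical Parquet type, i.e. what Athena is comparing against the table schema (a sketch against one of the local files):
select * from parquet_schema('orders/year=2023/month=4/data_11.parquet');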
Query
This is not a benchmark; I just wanted to get a rough sense of what one might expect.
Simple count
select count(1) from orders;
Setup | Time |
---|---|
Athena | 2.547s |
DuckDB (Xeon 2017) | 1.81s |
DuckDB (Xeon 2017 + R2) | 90s |
DuckDB (Ampere) | 2.50s |
DuckDB (i9 2022) | 0.797s |
I decided not to run additional queries against the R2 storage due to how long this one took.
Grouped count
select year, store_id, count(1) order_line_count
from orders
group by 1, 2
order by 3 desc, 1, 2
limit 10;
Setup | Time |
---|---|
Athena | 9.095s |
DuckDB (Xeon 2017) | 20.20s |
DuckDB (Ampere) | 10.39s |
DuckDB (i9 2022) | 6.76s |
Count distinct
select year, store_id, count(distinct order_id) order_count
from orders
where year = '2022'
group by 1, 2
order by 3 desc, 1, 2
limit 10;
Setup | Time |
---|---|
Athena | 4.823s |
DuckDB (Xeon 2017) | 10.52s |
DuckDB (Ampere) | 6.88s |
DuckDB (i9 2022) | 4.57s |
I had to add a filter by year to prevent out-of-memory errors on DuckDB. This is because DuckDB cannot use partitioning information for hash aggregates.
I attempted to rewrite the query as a bunch of union all branches that query each partition individually, since DuckDB supports filter pushdown into its Parquet reader, but I still got OOM errors, presumably because it tries to run all the branches in parallel.
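For reference, the shape of that rewrite (a sketch; the remaining month branches are elided, and the per-partition counts would still need to be combined afterwards):
select year, month, store_id, count(distinct order_id) order_count
from orders
where year = '2022' and month = '1'
group by 1, 2, 3
union all
select year, month, store_id, count(distinct order_id) order_count
from orders
where year = '2022' and month = '2'
group by 1, 2, 3
-- ... one branch per partition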
Join
with orders_sub as (
select
order_id,
product_id
from
orders
where
year = '2022'
),
order_pairs as (
select
o1.product_id as product_id1,
o2.product_id as product_id2,
o1.order_id as order_id
from
orders_sub o1
join orders_sub o2 on o1.order_id = o2.order_id
where
o1.product_id < o2.product_id
)
select
product_id1,
product_id2,
count(order_id) as num_orders
from
order_pairs
group by
product_id1,
product_id2
order by
num_orders desc
limit
10;
Setup | Time |
---|---|
Athena | 14.668s |
DuckDB (Xeon 2017) | 56.59s |
DuckDB (Ampere) | 39.78s |
DuckDB (i9 2022) | 39.05s |
Hardware
Benchmarks by yabs.sh.
Xeon 2017
A KVM guest on a Hetzner dedicated server with a Xeon W-2145. Spinning disks, ZFS on the host.
Basic System Information:
---------------------------------
Uptime : 13 days, 11 hours, 39 minutes
Processor : Common KVM processor
CPU cores : 16 @ 3695.998 MHz
AES-NI : ❌ Disabled
VM-x/AMD-V : ❌ Disabled
RAM : 62.8 GiB
Swap : 975.0 MiB
Disk : 9.8 TiB
Distro : Debian GNU/Linux 11 (bullseye)
Kernel : 5.10.0-21-amd64
VM Type : KVM
Net Online : IPv4
fio Disk Speed Tests (Mixed R/W 50/50):
---------------------------------
Block Size | 4k (IOPS) | 64k (IOPS)
------ | --- ---- | ---- ----
Read | 280.18 MB/s (70.0k) | 528.06 MB/s (8.2k)
Write | 280.92 MB/s (70.2k) | 530.83 MB/s (8.2k)
Total | 561.11 MB/s (140.2k) | 1.05 GB/s (16.5k)
| |
Block Size | 512k (IOPS) | 1m (IOPS)
------ | --- ---- | ---- ----
Read | 1.97 GB/s (3.8k) | 207.88 MB/s (203)
Write | 2.08 GB/s (4.0k) | 221.72 MB/s (216)
Total | 4.06 GB/s (7.9k) | 429.61 MB/s (419)
Geekbench 6 Benchmark Test:
---------------------------------
Test | Value
|
Single Core | 894
Multi Core | 5202
Ampere
Hetzner CAX41, Ampere Altra. NVMe SSD, ext4.
Basic System Information:
---------------------------------
Uptime : 0 days, 0 hours, 32 minutes
Processor : Neoverse-N1
CPU cores : 16 @ ??? MHz
AES-NI : ✔ Enabled
VM-x/AMD-V : ❌ Disabled
RAM : 30.5 GiB
Swap : 0.0 KiB
Disk : 496.2 GiB
Distro : Ubuntu 22.04.2 LTS
Kernel : 5.15.0-69-generic
VM Type : NONE
Net Online : IPv4 & IPv6
fio Disk Speed Tests (Mixed R/W 50/50):
---------------------------------
Block Size | 4k (IOPS) | 64k (IOPS)
------ | --- ---- | ---- ----
Read | 155.07 MB/s (38.7k) | 1.19 GB/s (18.5k)
Write | 154.97 MB/s (38.7k) | 1.22 GB/s (19.1k)
Total | 310.04 MB/s (77.5k) | 2.41 GB/s (37.7k)
| |
Block Size | 512k (IOPS) | 1m (IOPS)
------ | --- ---- | ---- ----
Read | 1.56 GB/s (3.0k) | 1.52 GB/s (1.4k)
Write | 1.70 GB/s (3.3k) | 1.70 GB/s (1.6k)
Total | 3.26 GB/s (6.3k) | 3.23 GB/s (3.1k)
Geekbench 6 Benchmark Test:
---------------------------------
Test | Value
|
Single Core | 1088
Multi Core | 8700
i9 2022
A WSL2 guest on i9-13900KF. Guest has its own NVMe SSD, xfs.
Basic System Information:
---------------------------------
Uptime : 0 days, 5 hours, 18 minutes
Processor : 13th Gen Intel(R) Core(TM) i9-13900KF
CPU cores : 32 @ 2995.198 MHz
AES-NI : ✔ Enabled
VM-x/AMD-V : ✔ Enabled
RAM : 31.2 GiB
Swap : 8.0 GiB
Disk : 1.1 TiB
Distro : Debian GNU/Linux 11 (bullseye)
Kernel : 5.15.90.1-microsoft-standard-WSL2
VM Type : WSL
Net Online : IPv4
fio Disk Speed Tests (Mixed R/W 50/50):
---------------------------------
Block Size | 4k (IOPS) | 64k (IOPS)
------ | --- ---- | ---- ----
Read | 1.41 GB/s (353.7k) | 2.08 GB/s (32.6k)
Write | 1.41 GB/s (354.7k) | 2.10 GB/s (32.8k)
Total | 2.83 GB/s (708.4k) | 4.19 GB/s (65.4k)
| |
Block Size | 512k (IOPS) | 1m (IOPS)
------ | --- ---- | ---- ----
Read | 2.74 GB/s (5.3k) | 3.01 GB/s (2.9k)
Write | 2.89 GB/s (5.6k) | 3.21 GB/s (3.1k)
Total | 5.64 GB/s (11.0k) | 6.22 GB/s (6.0k)
Geekbench 6 Benchmark Test:
---------------------------------
Test | Value
|
Single Core | 2441
Multi Core | 17531
Dig deeper
- Video lecture by Mark Raasveldt: DuckDB Internals (CMU Advanced Databases / Spring 2023)
- groupby and join benchmarks published by DuckDB: Database-like ops benchmark