Are you folks looking to support iceberg? I know there's the duckdb iceberg extension, but it's in a very unfinished state (it doesn't support catalogs, predicate pushdown, writes ...).
We have a customer-facing web-app that mostly deals with pre-aggregated data that we keep in postgres, but some views need to drill down to very small slices of the raw data. The raw data lives in a partitioned iceberg table on S3 (with glue catalog).
I can query it with Athena, of course, and I did try that, but the latency was all over the place.
Rather than running trino ourselves, I ended up writing a tiny java api using the iceberg java libraries and the duckdb jdbc connector. Basically, for each query, I use the iceberg library to figure out which parquet files are relevant to scan, and then query those with a
read_parquet([ <the list of files> ])
, and present the results to the client.
If duckdb/motherduck supported iceberg more robustly, we'd totally just throw away the little java service and use md!
We currently use the iceberg FindFiles helper: we apply the relevant filter expressions through its withRecordsMatching() method, then build the
read_parquet()
call from the matching files and swap it into our query.
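For anyone curious, the "build the read_parquet() and swap it in" step can be sketched roughly like this. It's a minimal illustration, not the actual service code: the class name ParquetScanSql, the {scan} placeholder, and the bucket paths in the usage note are all made up for the example, and it assumes the list of file paths has already been collected from the iceberg planning step.

```java
import java.util.List;
import java.util.stream.Collectors;

// Hypothetical helper for illustration; not part of iceberg or duckdb.
final class ParquetScanSql {
    private ParquetScanSql() {}

    // Render a path as a SQL string literal, doubling any embedded single quotes.
    static String quote(String path) {
        return "'" + path.replace("'", "''") + "'";
    }

    // Render read_parquet(['file1', 'file2', ...]) for the given data files.
    static String readParquet(List<String> files) {
        if (files.isEmpty()) {
            // An empty list means the filter matched nothing; the caller can
            // return an empty result instead of running a query.
            throw new IllegalArgumentException("no data files matched the filter");
        }
        return files.stream()
                .map(ParquetScanSql::quote)
                .collect(Collectors.joining(", ", "read_parquet([", "])"));
    }

    // Swap the scan into a query template at the (assumed) {scan} placeholder.
    static String scanQuery(String template, List<String> files) {
        return template.replace("{scan}", readParquet(files));
    }
}
```

Calling ParquetScanSql.scanQuery("SELECT blah FROM {scan} WHERE a = 1", List.of("s3://example-bucket/f1.parquet")) gives a query string that can be passed to the duckdb jdbc connection. The row-level predicates stay in the WHERE clause, since the file pruning only narrows down which files get scanned.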
Ideally I'd be able to just write a more boring
SELECT blah FROM my_iceberg_table WHERE a = 1 AND customer_id = 2 AND timestamp > :timestamp
or
FROM iceberg_scan('my_table', catalog='glue', region='us-east-1')
.