Full support for querying Iceberg tables
Chris Atkins
Are you folks looking to support Iceberg? I know there's the DuckDB Iceberg extension, but it's in a very unfinished state (it doesn't support catalogs, predicate pushdown, writes ...).
We have a customer-facing web app that mostly deals with pre-aggregated data that we keep in Postgres, but some views need to drill down to very small slices of the raw data. The raw data lives in a partitioned Iceberg table on S3 (with a Glue catalog).
I can query it with Athena of course, and I indeed tried doing that, but the latency was all over the place.
Rather than running Trino ourselves, I ended up writing a tiny Java API using the Iceberg Java libraries and the DuckDB JDBC connector. Basically, for my queries, I use the Iceberg library to figure out which Parquet files are relevant, query those with read_parquet([ <the list of files> ]), and present the results to the client.
If DuckDB/MotherDuck supported Iceberg more robustly, we'd totally just throw away the little Java service and use MD!
We currently use the Iceberg FindFiles helpers:
https://iceberg.apache.org/javadoc/1.5.2/org/apache/iceberg/FindFiles.html
Specifically, we use the builder there:
https://iceberg.apache.org/javadoc/1.5.2/org/apache/iceberg/FindFiles.Builder.html
We call the withRecordsMatching() method with the relevant filter expressions, then build the read_parquet() call from the matching files and swap it into our query.
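For anyone curious, here is a minimal sketch of the "build the read_parquet() and swap it in" step, not our actual service code. It assumes the file paths have already been collected from the FindFiles result (in Iceberg 1.5.2, by iterating the DataFile entries and taking each one's path). The class name, the {{source}} placeholder, and the example bucket paths are all hypothetical, made up for illustration.

```java
import java.util.List;
import java.util.stream.Collectors;

public class ReadParquetQueryBuilder {

    // Hypothetical placeholder token marking where the table source goes in the query.
    public static final String SOURCE_PLACEHOLDER = "{{source}}";

    // Quote a path as a SQL string literal, doubling any embedded single quotes.
    public static String quote(String path) {
        return "'" + path.replace("'", "''") + "'";
    }

    // Render read_parquet(['a', 'b', ...]) for the given data file paths.
    public static String readParquet(List<String> paths) {
        if (paths.isEmpty()) {
            // Assumption: an empty file list is treated as an error here;
            // a real service might return an empty result set instead.
            throw new IllegalArgumentException("no data files matched the filter");
        }
        return paths.stream()
                .map(ReadParquetQueryBuilder::quote)
                .collect(Collectors.joining(", ", "read_parquet([", "])"));
    }

    // Swap the rendered read_parquet(...) call into the query template.
    public static String buildQuery(String template, List<String> paths) {
        return template.replace(SOURCE_PLACEHOLDER, readParquet(paths));
    }

    public static void main(String[] args) {
        // Hypothetical paths standing in for the files returned by the Iceberg scan planning.
        List<String> paths = List.of(
                "s3://example-bucket/raw/part-0001.parquet",
                "s3://example-bucket/raw/part-0002.parquet");
        String template = "SELECT * FROM " + SOURCE_PLACEHOLDER
                + " WHERE a = 1 AND customer_id = 2";
        System.out.println(buildQuery(template, paths));
    }
}
```

Note that the WHERE clause stays in the DuckDB query: the Iceberg filter only prunes at file granularity, so the selected files can still contain rows that don't match.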
Ideally I'd be able to just write a more boring SELECT blah FROM my_iceberg_table WHERE a = 1 AND customer_id = 2 AND timestamp > :timestamp or FROM iceberg_scan('my_table', catalog='glue', region='us-east-1').
Background: https://motherduckcommunity.slack.com/archives/C058S3CUEAG/p1719443889927559?thread_ts=1719414165.050349&cid=C058S3CUEAG
June 27, 2024