Feature Requests

Full support for querying Iceberg tables
Are you folks looking to support Iceberg? I know there's the DuckDB Iceberg extension, but it's in a very unfinished state (it doesn't support catalogs, predicate pushdown, writes ...).

We have a customer-facing web app that mostly deals with pre-aggregated data that we keep in Postgres, but some views need to drill down to very small slices of the raw data. The raw data lives in a partitioned Iceberg table on S3 (with a Glue catalog). I can query it with Athena, of course, and I did try that, but the latency was all over the place.

Rather than running Trino ourselves, I ended up writing a tiny Java API using the Iceberg Java libraries and the DuckDB JDBC connector. For each query, I use the Iceberg library to figure out which Parquet files are relevant, query those with read_parquet([ <the list of files> ]), and present the results to the client. If DuckDB/MotherDuck supported Iceberg more robustly, we'd totally just throw away the little Java service and use MotherDuck!

We currently use the Iceberg FindFiles helpers (https://iceberg.apache.org/javadoc/1.5.2/org/apache/iceberg/FindFiles.html), specifically the builder there (https://iceberg.apache.org/javadoc/1.5.2/org/apache/iceberg/FindFiles.Builder.html). We call the withRecordsMatching() method with the relevant filter expressions, then build the read_parquet() call and swap it into our query.

Ideally I'd be able to just write a more boring SELECT blah FROM my_iceberg_table WHERE a = 1 AND customer_id = 2 AND timestamp > :timestamp, or FROM iceberg_scan('my_table', catalog='glue', region='us-east-1').

Background: https://motherduckcommunity.slack.com/archives/C058S3CUEAG/p1719443889927559?thread_ts=1719414165.050349&cid=C058S3CUEAG
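For anyone curious what the "build the read_parquet() and swap it into our query" step looks like, here is a minimal sketch of that part only. It is not our actual service code: the class name ReadParquetQuery, the {{source}} placeholder, and the choice to reject an empty file list are all assumptions made for illustration. It also assumes the file paths have already been collected as strings, for example from the DataFile entries that FindFiles.in(table).withRecordsMatching(expr).collect() returns. The sketch uses only the Java standard library, so the Iceberg and DuckDB JDBC calls are deliberately left out.

```java
import java.util.List;
import java.util.stream.Collectors;

// Hypothetical helper for illustration; not part of Iceberg or DuckDB.
final class ReadParquetQuery {
    private ReadParquetQuery() {}

    // Quote a path as a SQL string literal, doubling any embedded single quotes.
    static String quote(String path) {
        return "'" + path.replace("'", "''") + "'";
    }

    // Build "read_parquet(['a', 'b'])" from the data file paths.
    // Assumption: an empty list is rejected so the caller can short-circuit
    // instead of sending a scan over no files.
    static String readParquet(List<String> paths) {
        if (paths.isEmpty()) {
            throw new IllegalArgumentException("no data files matched the filter");
        }
        return paths.stream()
                .map(ReadParquetQuery::quote)
                .collect(Collectors.joining(", ", "read_parquet([", "])"));
    }

    // Swap the table function into a query template at the (assumed)
    // {{source}} placeholder.
    static String render(String template, List<String> paths) {
        return template.replace("{{source}}", readParquet(paths));
    }
}
```

The rendered string would then be executed through the DuckDB JDBC connector like any other statement. The row-level predicates (a = 1, customer_id = 2, and so on) still stay in the WHERE clause, because the file list only narrows which Parquet files are scanned, not which rows come back.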