Feature Requests | MotherDuck

Feature Requests

All our data lives in GCP, and egress cost to AWS is a non-starter. We specifically are interested in us-central1 region support.

Better support for showing view definitions

Currently it is possible to get a view's underlying SQL, but only by copying it to your clipboard. It would be more convenient if you could just see it inline (formatted of course, since view definitions don't seem to retain the original whitespace).

Europe (EU) Region Hosting

in progress

Full support for querying Iceberg tables

Are you folks looking to support Iceberg? I know there's the DuckDB iceberg extension, but it's in a very unfinished state (doesn't support catalogs, predicate pushdown, writes ...).
We have a customer-facing web app that mostly deals with pre-aggregated data that we keep in Postgres, but some views need to drill down to very small slices of the raw data. The raw data lives in a partitioned Iceberg table on S3 (with a Glue catalog). I can query it with Athena of course, and I did try that, but the latency was all over the place. Rather than running Trino ourselves, I ended up writing a tiny Java API using the Iceberg Java libraries and the DuckDB JDBC connector. Basically, for my queries, I use the Iceberg library to figure out which Parquet files are relevant to scan, then query those with a read_parquet([ <the list of files> ]) and present the results to the client. If DuckDB/MotherDuck supported Iceberg more robustly, we'd totally just throw away the little Java service and use MD!
We currently use the Iceberg FindFiles helpers https://iceberg.apache.org/javadoc/1.5.2/org/apache/iceberg/FindFiles.html and specifically the builder there: https://iceberg.apache.org/javadoc/1.5.2/org/apache/iceberg/FindFiles.Builder.html We call the withRecordsMatching() method with the relevant filter expressions, then build the read_parquet() call and swap it into our query. Ideally I'd be able to just write a more boring SELECT blah FROM my_iceberg_table WHERE a = 1 AND customer_id = 2 AND timestamp > :timestamp or FROM iceberg_scan('my_table', catalog='glue', region='us-east-1') .
Background: https://motherduckcommunity.slack.com/archives/C058S3CUEAG/p1719443889927559?thread_ts=1719414165.050349&cid=C058S3CUEAG
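For readers who want to picture the workaround described in this request, here is a minimal sketch of the "build the read_parquet() and swap it into our query" step only. It is written in Python rather than the poster's Java, and it is not their actual service: the function names, the {scan} placeholder, and the example S3 paths are all hypothetical. The file list is assumed to come from Iceberg scan planning (for example FindFiles with withRecordsMatching()), which is not shown here.

```python
from typing import Iterable


def sql_string_literal(value: str) -> str:
    """Quote a value as a SQL string literal, doubling embedded single quotes."""
    return "'" + value.replace("'", "''") + "'"


def build_read_parquet_scan(paths: Iterable[str]) -> str:
    """Return a read_parquet([...]) table expression for the given files."""
    quoted = [sql_string_literal(p) for p in paths]
    if not quoted:
        # An empty list would produce a scan with nothing to read.
        raise ValueError("no data files matched; nothing to scan")
    return "read_parquet([" + ", ".join(quoted) + "])"


def swap_scan_into_query(template: str, paths: Iterable[str]) -> str:
    """Replace the hypothetical {scan} placeholder with the file-list scan."""
    return template.replace("{scan}", build_read_parquet_scan(paths))


# Hypothetical file list, standing in for what Iceberg planning would return.
files = [
    "s3://example-bucket/raw/part-0001.parquet",
    "s3://example-bucket/raw/part-0002.parquet",
]
query = swap_scan_into_query(
    "SELECT * FROM {scan} WHERE a = 1 AND customer_id = 2", files
)
print(query)
```

The resulting string would then be handed to DuckDB for execution. Row-level predicates still belong in the WHERE clause, since file pruning only narrows which Parquet files are read.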

Auto-format SQL in workbook cells

https://github.com/quarylabs/sqruff has a DuckDB dialect that works well enough, though I wish it defaulted to lowercase. It's in Rust so you should be able to compile to wasm easily enough.

Scheduled queries / notebooks

Received some user feedback that there should be a way to schedule queries to run automatically on a cadence.

Customer Hosted Query Compute

(aka, hybrid deployment for MotherDuck)
Financial services institutions and other customers with sensitive data strongly prefer to keep their data on infrastructure that they control. While many have moved to Snowflake or Databricks for scalability, etc., those customers are also moving to "lakehouse"-style architectures with open table formats (OTF, like Iceberg and Delta) hosted on cloud object storage (S3, Azure Blob, etc.). They then use the cloud data platforms for compute, with access to their OTF data as external tables that remain on their object storage. This approach gives them more control over the data and avoids lock-in with data platform vendors.
However, while the data is stored on customer-controlled cloud object storage, it still must be processed on infrastructure owned by the data platform vendors. To avoid relying on vendor infrastructure for computing on sensitive data, companies like Fivetran offer a "hybrid deployment model", in which data is stored AND processed on customer-owned infrastructure. This architecture grants their customers complete control over their data's flow, allowing them to meet specific business needs concerning data security.
I'd love to see a "hybrid deployment model" for MotherDuck (MD), too! It seems that MD's architecture would make this approach relatively straightforward, whereas other data platforms might struggle to provide the same. DuckDB, and hence MD, can already read (and soon write) OTF hosted on cloud object storage (the "storage layer") with near-native performance. MD's architecture also seems to have separated the "control plane" (MD's Service Layer, Catalog) from the "compute layer" (Ducklings). MD already has "dual execution" for running hybrid queries across local and remote compute. While running DuckDB on the customer's infrastructure is possible, it wouldn't be the same, since you would lose the concurrency, user management, etc. that MD provides.
So, I'd love to have MD ducklings that could run remotely in VMs or containers (or even MD Wasm: download and run the duckling in the browser! 😁) on customer-controlled infrastructure, but still managed via the MD front-end (service layer).

Mandate SSO for Organization Members

Be able to mandate that users who sign up with email addresses matching our organization's domain use SSO so that we may prevent email/password access after a user has been removed from our Google Workspace, for example.

Support Postgres Protocol for broad BI Compatibility

May be a bit of a stretch ask, but I'd love to see a MotherDuck-native feature akin to https://github.com/jwills/buenavista that allows BI tools to query against MotherDuck as if it were a Postgres server. This would be a pretty cool way to instantly support the majority of BI tools, imo!
