Customer Hosted Query Compute
Kyle Lundstedt
(aka, hybrid deployment for MotherDuck)
Financial services institutions and other customers with sensitive data strongly prefer to keep their data on infrastructure that they control. While many have moved to Snowflake or Databricks for scalability and similar reasons, those customers are also moving to "lakehouse"-style architectures with open table formats (OTF, such as Iceberg and Delta) hosted on cloud object storage (S3, Azure Blob, etc.).
They then use the cloud data platforms for compute, with access to their OTF data as external tables that remain on their object storage. This approach gives them more control over the data and avoids lock-in with data platform vendors. However, while the data is stored on customer-controlled cloud object storage, it must still be processed on infrastructure owned by the data platform vendors.
To avoid this need to rely on vendor infrastructure for computing on sensitive data, companies like Fivetran offer a "hybrid deployment model", in which data is stored AND processed on customer-owned infrastructure. This architecture grants their customers complete control over their data's flow, allowing them to meet specific business needs concerning data security.
I'd love to see a "hybrid deployment model" for MotherDuck (MD), too! It seems that MD's architecture would make this approach relatively straightforward, whereas other data platforms might struggle to provide the same.
- DuckDB and hence MD already can read (and soon write) OTF hosted on cloud object storage (the "storage layer") with near-native performance.
- MD's architecture also seems to have separated the "control plane" (MD's Service Layer, Catalog) from the "compute layer" (Ducklings).
- MD already has "dual execution" for running hybrid queries across local and remote compute.
While running DuckDB on the customer's infrastructure is possible, it wouldn't be the same, since you would lose the concurrency, user management, etc. that MD provides.
So, I'd love to have MD ducklings that could run remotely in VMs or containers (or even MD Wasm: download and run the duckling in the browser! 😁) on customer-controlled infrastructure, but still be managed via the MD front-end (service layer).
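To make the idea concrete, here is a toy sketch of that split. It is not MD's actual API or architecture: the `Duckling` and `ControlPlane` classes, the `route` method, and the "risk-team" / "customer-vpc" names are all hypothetical, and Python's built-in `sqlite3` stands in for the DuckDB engine so the snippet runs without any installs. The point is only that the service layer handles access checks and routing, while the query itself executes on the customer-hosted worker.

```python
import sqlite3
from dataclasses import dataclass, field


@dataclass
class Duckling:
    """Hypothetical compute worker on customer-controlled infrastructure."""
    name: str
    location: str

    def run(self, sql: str) -> list:
        # sqlite3 is only a stand-in for the DuckDB engine in this sketch.
        conn = sqlite3.connect(":memory:")
        try:
            return conn.execute(sql).fetchall()
        finally:
            conn.close()


@dataclass
class ControlPlane:
    """Hypothetical service layer: knows users and ducklings, never sees rows."""
    ducklings: dict = field(default_factory=dict)
    grants: dict = field(default_factory=dict)  # user -> set of duckling names

    def register(self, duckling: Duckling) -> None:
        self.ducklings[duckling.name] = duckling

    def grant(self, user: str, duckling_name: str) -> None:
        self.grants.setdefault(user, set()).add(duckling_name)

    def route(self, user: str, duckling_name: str) -> Duckling:
        # Only the access check and routing happen here; execution does not.
        if duckling_name not in self.grants.get(user, set()):
            raise PermissionError(f"{user} may not use {duckling_name}")
        return self.ducklings[duckling_name]


control_plane = ControlPlane()
control_plane.register(Duckling(name="risk-team", location="customer-vpc"))
control_plane.grant("alice", "risk-team")

# The client asks the control plane where to run, then talks to the duckling.
duckling = control_plane.route("alice", "risk-team")
rows = duckling.run("SELECT 40 + 2 AS answer")
```

In this sketch the query text and the result rows only ever pass between the client and the duckling, which is the property the hybrid model is after; a real design would also need to cover authentication between the two, catalog sync, and so on.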
Kyle Lundstedt
For that matter, running MD ducklings on the edge (e.g., Cloudflare Workers) would be super cool, too!