Open sourcing Dicer: Databricks's auto-sharder

(databricks.com)

59 points | by vivek-jain 4 hours ago

5 comments

  • charleshn 22 minutes ago
    > Application pods learn the current assignment through a library called the Slicelet (S for server side). The Slicelet maintains a local cache of the latest assignment by fetching it from the Dicer service and watching for updates. When it receives an updated assignment, the Slicelet notifies the application via a listener API.

    For a critical control-plane component like this, I tend to prefer a constant work pattern [0] to avoid metastable failures [1] — e.g. periodically pulling the data instead of relying on notifications.

    [0] https://aws.amazon.com/builders-library/reliability-and-cons...

    [1] https://brooker.co.za/blog/2021/05/24/metastable.html
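    The constant-work idea above can be sketched as a client that fetches the full assignment on a fixed cadence rather than reacting to change notifications, so control-plane load stays flat under churn. This is a hypothetical illustration (the class and callback names are invented, not Dicer's actual Slicelet API):

    ```python
    class PollingSlicelet:
        """Hypothetical sketch of a constant-work assignment watcher:
        do the same amount of work every tick, changed or not."""

        def __init__(self, fetch_assignment, on_update, interval_s=5.0):
            self._fetch = fetch_assignment   # callable returning the current full assignment
            self._on_update = on_update      # listener invoked only when the assignment changes
            self._interval = interval_s
            self._current = None

        def poll_once(self):
            # Fetch the complete assignment each tick; cost is independent
            # of how many updates happened, avoiding notification storms.
            assignment = self._fetch()
            if assignment != self._current:
                self._current = assignment
                self._on_update(assignment)

        def run(self, stop_event):
            # stop_event is a threading.Event; wait() doubles as the sleep.
            while not stop_event.is_set():
                self.poll_once()
                stop_event.wait(self._interval)
    ```

    The trade-off versus push notifications is bounded staleness (up to one polling interval) in exchange for a load profile that does not spike when many shards move at once.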

  • khaki54 2 hours ago
    Seems weird to call it sharding since it's not sharding indexed datasets or anything like that. Is this just a tool to mitigate Databricks’ internal service-scaling challenges?
    • atuladya 54 minutes ago
      Right — this is not about sharding data or datasets. This is for sharding the in-memory state that a service might hold. The problem of building services with low cost, high scale, low latency, and high throughput is common in many environments, including our services at Databricks, and Dicer helps with that.
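        To make "sharding in-memory state" concrete: each key's state lives on exactly one pod, and an assignment function maps keys to pods. A common way to build such an assignment is consistent hashing — a hypothetical sketch below, not Dicer's actual algorithm:

        ```python
        import bisect
        import hashlib

        class ConsistentHashRing:
            """Hypothetical sketch: map keys (and the in-memory state behind
            them) onto service pods so that adding or removing a pod only
            moves a small fraction of keys."""

            def __init__(self, pods, vnodes=64):
                # Place several virtual nodes per pod on the ring for balance.
                self._ring = sorted(
                    (self._hash(f"{pod}#{i}"), pod)
                    for pod in pods
                    for i in range(vnodes)
                )
                self._points = [h for h, _ in self._ring]

            @staticmethod
            def _hash(s):
                return int.from_bytes(hashlib.sha256(s.encode()).digest()[:8], "big")

            def owner(self, key):
                # First ring point clockwise from the key's hash owns it.
                i = bisect.bisect(self._points, self._hash(key)) % len(self._ring)
                return self._ring[i][1]
        ```

        A real auto-sharder goes further — rebalancing based on observed load rather than hash uniformity alone — but the key→owner abstraction is the same.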
  • ayf 4 hours ago
    Does anyone else have something similar?

    What are some use cases that you found are useful?

    • louis-paul 1 hour ago
      • atuladya 56 minutes ago
        It is similar to Slicer in terms of the abstraction (I built Slicer at Google), but the architecture, implementation, and algorithms differ in many ways.
        • bigwheels 42 minutes ago
          Did you also work on this Databricks Dicer?
    • WookieRushing 1 hour ago
      These show up once you reach a certain scale, where static partitioning is either cost-inefficient or the hot spots are too dynamic. They also avoid adding latency by acting as eventually consistent sidecars instead of proxies.

      I’ve seen them used for traffic routing, storage-system metadata, distributed caches, etc.
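      The sidecar-versus-proxy point can be sketched as a local route table that is refreshed out of band (e.g. by periodic pull), so requests are routed in-process with no extra network hop. Names here are invented for illustration:

      ```python
      class RoutingSidecar:
          """Hypothetical sketch: keep an eventually consistent local copy
          of the shard assignment and route requests without a proxy hop."""

          def __init__(self, assignment):
              # shard-key -> pod map, refreshed out of band
              self._assignment = dict(assignment)

          def refresh(self, assignment):
              # Called by a background updater; readers may briefly see
              # stale routes between refreshes.
              self._assignment = dict(assignment)

          def route(self, key):
              # Serving pods must tolerate misdirected requests (e.g. by
              # redirecting) because routes are only eventually consistent.
              return self._assignment.get(key)
      ```

      The cost of this design is that every serving pod needs a "not my shard" error path; the benefit is that the routing decision never sits on the request's critical path as a separate proxy tier would.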

    • vivek-jain 4 hours ago
      Sharded in-memory caching turns out to be rather useful at scale :)

      Some of the key examples highlighted on our blog are Unity Catalog, which is essentially the metadata layer for Databricks, our Query Orchestration Engine, and our distributed remote cache. See the blog post for more!
