Why Flagify Uses Deterministic Hashing for Rollouts

When you set a flag to roll out to 50% of users, Flagify evaluates it like this:

hash := murmur3.Sum32([]byte(flagKey + ":" + userID)) // stable 32-bit hash of flag + user
return int(hash%100) < percentage                     // bucket 0–99 vs. rollout percentage

That is the entire rollout decision. No database lookup, no stored state, no coordination between servers. This is intentional, and this post explains why.

How it works

For each flag evaluation, Flagify computes a 32-bit hash from the combination of the flag key and the user ID. The hash is mapped to a number between 0 and 99 using modulo. If that number falls below the rollout percentage, the user receives the flag’s value override. Otherwise, they fall through to the default.

Because the hash function is deterministic, the same user always maps to the same number for the same flag. A user who receives true for new-checkout-flow today will receive true tomorrow, next week, and after your servers restart — without any persistent record of that assignment.
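The property described above can be verified in a few lines. This sketch uses FNV-1a from the Go standard library as a stand-in for Murmur3 (which lives in a third-party package); the hash function differs from Flagify's, but the determinism argument is identical:

```go
package main

import (
	"fmt"
	"hash/fnv"
)

// bucket maps a (flagKey, userID) pair to a stable number in [0, 100).
// FNV-1a stands in for Murmur3 here so the example needs no dependencies.
func bucket(flagKey, userID string) int {
	h := fnv.New32a()
	h.Write([]byte(flagKey + ":" + userID)) // hash the combined key
	return int(h.Sum32() % 100)             // map to 0..99 via modulo
}

func main() {
	// Repeated evaluations always produce the same bucket: no stored state,
	// no coordination, just a pure function of the inputs.
	first := bucket("new-checkout-flow", "user-42")
	for i := 0; i < 5; i++ {
		fmt.Println(bucket("new-checkout-flow", "user-42") == first) // prints "true" every time
	}
}
```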

Why stateless evaluation matters at scale

Flagify is designed to evaluate flags in the hot path of your application. That means millions of evaluations per day, across multiple server instances, with latency expectations in the single-digit milliseconds.

A stateless evaluation model delivers this by design:

  • No per-user database queries. The evaluation engine loads flag configuration and targeting rules — which are cached per environment — and then runs pure computation. Adding a user state lookup on every evaluation would introduce a database round-trip that compounds at scale.
  • No cross-instance coordination. Any server can evaluate any flag for any user independently and arrive at the same result. There is no shared mutable state to synchronize.
  • Horizontal scalability without friction. Spinning up additional instances does not require cache warming, data replication, or handoff protocols for user bucket state.

The hash computation itself takes nanoseconds. Combined with environment-level caching of flag configuration, Flagify can evaluate all flags for a user in a single request without meaningful latency overhead.

Determinism is the core contract

The most important property of this system is not performance — it is consistency.

When a user is included in a rollout, they stay included. Their experience does not change between sessions, between API calls, or between server restarts. This is the contract that makes gradual rollouts safe to use in production.

If you change the rollout percentage from 30% to 50%, users already in the 0–29 range remain unaffected. The additional 20% is drawn from users who previously fell in the 30–49 range. Most users see no change. The rollout expands at the edges, not randomly across the whole population.

This stability does not require any stored state. It is a mathematical property of the hash function.

Why exact distribution is not the goal

A common question: if you set 50%, will exactly half your users receive the flag?

At scale, yes — statistically. With 100,000 users, you will see a distribution very close to 50/50. The deviation shrinks roughly in proportion to 1/√N as the population grows.

With four users, you might get 3/1 or even 4/0. This is not a bug. Percentage rollouts are a tool for gradual, probabilistic exposure at scale. They are not a partitioning mechanism for small user sets.

The alternative — storing explicit bucket assignments so that every new user is placed in the least-populated group — appears to solve this. In practice, it introduces a set of problems that outweigh the benefit:

Race conditions. Under concurrent load, multiple server instances may read the same bucket count before any write is committed. Without strict locking or atomic counters, users pile into the same bucket. Fixing this requires database-level serialization that creates contention proportional to your traffic.

Increased latency. Every new user evaluation requires a read (does this user have an assignment?) and potentially a write (assign them to a bucket). These are sequential operations on the critical path, adding 5–15ms or more depending on database proximity.

Database growth. Bucket assignments accumulate at a rate of flags × users × environments. At any meaningful scale, this table grows into tens or hundreds of millions of rows with no natural expiration.

Temporal bias. Users who encounter the flag early are systematically different from users who encounter it later. Filling buckets in arrival order conflates user behavior with evaluation order.

Percentage changes. If you change a rollout from 30% to 50%, stored assignments do not automatically rebalance. Users near the boundary have no clear resolution without manual intervention or a migration that touches millions of rows.

The deterministic hash approach sidesteps all of these problems. Changing the percentage is instantaneous, consistent across all instances, and requires no data changes.

When to use segments instead

If you need exact, explicit control over which users receive a flag — for internal testing, beta groups, or organization-level rollouts — use segments.

A segment defines a set of rules evaluated against user attributes:

plan == "enterprise"
role == "admin"
userId in ["user-alice", "user-bob", "user-carol"]

Segments give you deterministic, auditable targeting with no probabilistic behavior. If you are working with a small user population and exact membership matters, segments are the right tool. Percentage rollouts are designed for large populations where statistical approximation is sufficient and explicit enumeration is impractical.
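Rule evaluation of this kind reduces to plain predicates over user attributes. The sketch below assumes rules within a segment combine with OR, and the struct fields and function names are illustrative, not Flagify's actual API:

```go
package main

import "fmt"

// User carries the attributes the segment rules above reference.
type User struct {
	ID   string
	Plan string
	Role string
}

// inBetaSegment evaluates the three example rules: an attribute match on
// plan, an attribute match on role, or explicit membership in an ID list.
func inBetaSegment(u User, allowedIDs []string) bool {
	if u.Plan == "enterprise" || u.Role == "admin" {
		return true
	}
	for _, id := range allowedIDs {
		if id == u.ID {
			return true
		}
	}
	return false
}

func main() {
	beta := []string{"user-alice", "user-bob", "user-carol"}
	fmt.Println(inBetaSegment(User{ID: "user-alice"}, beta))                    // true: explicit member
	fmt.Println(inBetaSegment(User{ID: "user-dave", Plan: "enterprise"}, beta)) // true: attribute rule
	fmt.Println(inBetaSegment(User{ID: "user-dave", Plan: "free"}, beta))       // false: no rule matches
}
```

Unlike percentage rollouts, membership here is exact and auditable: you can enumerate precisely who is in and why.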

The broader pattern

This design mirrors how mature feature flag systems handle rollouts at scale. Probabilistic, stateless bucketing works because the goal of a gradual rollout is not to achieve an exact count — it is to gradually increase exposure to a change while keeping individual user experience stable. Deterministic hashing achieves both properties without any operational overhead.

Where systems diverge is in experimentation. A/B testing platforms that need rigorous statistical validity — controlled experiments with defined sample sizes and power calculations — often do store explicit assignments. That is a different problem with different requirements. For deployment safety and progressive delivery, stateless hashing is the correct model.