Feature flags in production: risk management without slowing development


abemon | 11 min read

The problem feature flags solve

There is a fundamental tension in software development: engineering wants to deploy code frequently (ideally several times a day), while the business wants to control when and to whom a new feature is exposed. Without feature flags, these needs are incompatible. Either you deploy and everyone sees the new functionality, or you maintain a long-lived branch that diverges from main and generates merge conflicts that nobody wants to resolve on a Friday afternoon.

Feature flags decouple code deployment from feature activation. It is a fundamental practice of modern custom development. You can merge code to main, deploy it to production, and keep it invisible until you decide to turn it on. Conceptually straightforward. In practice, most feature flag implementations become a source of complexity, technical debt, and subtle bugs.

We have been using feature flags across all our production projects for three years. Here is what we have learned.

A taxonomy of flags

Not all feature flags serve the same purpose, and treating them homogeneously is the first mistake. We distinguish four types:

Release flags. Control visibility of a new feature. They have a short lifespan: created when the feature starts deploying and removed when it is available to all users. Typical life: 2-6 weeks.

Experiment flags. Used for A/B testing. They split traffic between variants and collect data to inform a decision. Typical life: 2-8 weeks. Require integration with the analytics system.

Ops flags. Operational switches that allow disabling features or degrading services during incidents. The classic example: a kill switch for an integration with an external service that goes down. Typical life: indefinite, but should be reviewed periodically.

Permission flags. Control feature access by user segment (pricing plan, role, geography). They most closely resemble business configuration and have the longest lifespan.

Each type has different requirements for evaluation, storage, and lifecycle. Treating an experiment flag with the same infrastructure as an ops flag is like using the same tool to drive screws and hammer nails.
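One way to keep the taxonomy from staying purely theoretical is to encode it in the flag registry itself. The following is a minimal sketch (the `FlagType` enum, `FlagDefinition` class, and default TTLs are illustrative assumptions, not part of any particular flag system), tying each flag type to the typical lifespans above:

```python
from dataclasses import dataclass
from datetime import date, timedelta
from enum import Enum

class FlagType(Enum):
    RELEASE = "release"        # short-lived, removed after full rollout
    EXPERIMENT = "experiment"  # A/B test, removed after the decision
    OPS = "ops"                # kill switch, long-lived but reviewed
    PERMISSION = "permission"  # plan/role/geo gating, longest-lived

# Default TTLs per type, reflecting the typical lifespans described above.
DEFAULT_TTL = {
    FlagType.RELEASE: timedelta(weeks=6),
    FlagType.EXPERIMENT: timedelta(weeks=8),
    FlagType.OPS: timedelta(days=365),        # reviewed periodically, not auto-expired
    FlagType.PERMISSION: timedelta(days=365),
}

@dataclass
class FlagDefinition:
    name: str
    type: FlagType
    owner: str
    created: date

    @property
    def expires(self) -> date:
        return self.created + DEFAULT_TTL[self.type]

flag = FlagDefinition("new_checkout", FlagType.RELEASE, "team-payments", date(2025, 1, 10))
```

Making the type and expiration explicit at creation time is what later makes lifecycle enforcement possible.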

Implementation patterns

The naive if/else

The most basic implementation is a conditional:

if feature_flags.is_enabled("new_checkout"):
    return render_new_checkout(request)
else:
    return render_old_checkout(request)

Works for one flag. For fifty, the code becomes unreadable, untestable, and fragile. Each flag adds a branch that multiplies execution paths. Ten binary flags create 1,024 possible combinations. You are not going to test all 1,024.

Strategy pattern

A cleaner implementation uses the strategy pattern:

class CheckoutStrategy:
    def render(self, request): ...

class NewCheckout(CheckoutStrategy): ...
class OldCheckout(CheckoutStrategy): ...

strategy = flag_router.get_strategy("checkout", request.user)
return strategy.render(request)

The flag determines which strategy gets injected, and the business code does not know a flag exists. This makes features testable independently and makes flag removal trivial: delete the old strategy and make the new one the default.
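A runnable version of the sketch above might look like this; `FlagRouter` and its in-memory flag store are hypothetical stand-ins for whatever flag system you use:

```python
# Strategy pattern with flag-driven injection. The business code calls
# strategy.render() and never sees the flag itself.
class CheckoutStrategy:
    def render(self, request) -> str:
        raise NotImplementedError

class NewCheckout(CheckoutStrategy):
    def render(self, request) -> str:
        return "new checkout"

class OldCheckout(CheckoutStrategy):
    def render(self, request) -> str:
        return "old checkout"

class FlagRouter:
    def __init__(self, flags: dict[str, bool]):
        self._flags = flags

    def get_strategy(self, flag_name: str, user) -> CheckoutStrategy:
        # The flag decides which strategy gets injected.
        if self._flags.get(flag_name, False):
            return NewCheckout()
        return OldCheckout()

router = FlagRouter({"checkout": True})
strategy = router.get_strategy("checkout", user=None)
print(strategy.render(request=None))  # → new checkout
```

Removing the flag later means deleting `OldCheckout` and the router branch; no call site changes.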

Trunk-based development with flags

The workflow that works best with feature flags is trunk-based development: everyone works on main (or very short-lived branches that merge to main daily), and incomplete features hide behind flags.

Practical rules:

  1. Never deploy a flag enabled by default. If the feature is not ready, the flag is off.
  2. Create the flag before writing the first line of code. Do not retrofit flags into existing code.
  3. Every flag has an owner and an expiration date. Without an owner, nobody will remove it.
  4. Test the flag in both positions. Your CI should run tests with the flag on and with the flag off.
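Rule 4 is easy to automate with parametrized tests. A sketch using pytest (the `checkout_label` function is a stand-in for the code path under test; in a real suite you would override your flag system's state instead of passing a dict):

```python
import pytest

def checkout_label(flags: dict[str, bool]) -> str:
    # Stand-in for the flagged code path.
    return "new" if flags.get("new_checkout") else "old"

# Run the same test with the flag on and off, so CI covers both positions.
@pytest.mark.parametrize("flag_on", [True, False])
def test_checkout_renders_in_both_flag_positions(flag_on):
    label = checkout_label({"new_checkout": flag_on})
    assert label in ("new", "old")
```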

Progressive delivery

Feature flags enable progressive delivery strategies that drastically reduce risk:

Canary release. Activate the feature for 1% of traffic. Monitor error rate, latency, business metrics. If everything looks good, scale to 5%, then 25%, then 100%. If something fails at 1%, reverting means turning off the flag. Revert time: seconds, not the 15-30 minutes of a deployment rollback.

Targeted rollout. Activate the feature for a specific segment: internal users, beta testers, customers on a specific plan, users in a geography. This allows validation in production with real users before general launch.

Scheduled activation. Marketing wants to launch the feature on Tuesday at 10:00 AM. With a scheduled flag, the engineering team deploys the code on Friday (when they have time to verify), and the flag activates automatically on Tuesday. Zero-stress launch.

We used canary releases on a shipment processing system handling 3,000 daily operations. We rolled out the new route calculation logic by activating it first only for domestic shipments (60% of volume, but simpler). Two weeks later, we activated it for international shipments. The three bugs we caught in the first phase never reached international traffic.

The debt of stale flags

Here is the problem nobody wants to face. Every feature flag that is not removed after fulfilling its purpose is technical debt. Not theoretical debt. Real debt that causes bugs.

A case we lived through in 2024: a release flag that had been active for 100% of traffic for 9 months. Nobody had removed it. A new developer changed the module’s logic without realizing there was an alternative execution path (the flag-off branch) still in the code. Six months later, someone accidentally disabled the flag during a configuration change. The system reverted to logic from 15 months ago. It took us 3 hours to diagnose.

Lifecycle policies

The only defense against flag debt is a strict lifecycle policy:

Mandatory TTL. Every flag has an expiration date. For release flags, 30 days after 100% activation. For experiment flags, 14 days after the decision. The flag system should alert (or outright fail) when a flag exceeds its TTL.

Count and limit. We monitor the total number of active flags. Our internal limit is 25 concurrent flags. When we approach it, cleanup happens before new flags are created. It sounds arbitrary, but without an explicit limit, the count grows indefinitely.

Flag removal as an engineering task. Not a tech-debt backlog item. It is part of the definition of done for the feature: the flag is removed in the sprint following complete rollout. If the flag is not removed, the feature is not finished.

Dead code detection. Tools like Piranha (from Uber, open source) analyze code and detect flags that always evaluate to the same position. If a flag has been evaluating to true for 60 days, the conditional can be removed and the code simplified automatically.
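The TTL and count policies above are simple enough to enforce mechanically, for example as a CI check that fails the build. A sketch, assuming a hypothetical flag registry exported as a list of dicts (the 25-flag limit is the internal limit mentioned above):

```python
from datetime import date

MAX_ACTIVE_FLAGS = 25  # internal limit from the count-and-limit policy

def audit(flags: list[dict], today: date) -> list[str]:
    """Return a list of policy violations; an empty list means the audit passes."""
    errors = []
    if len(flags) > MAX_ACTIVE_FLAGS:
        errors.append(f"{len(flags)} active flags exceeds limit of {MAX_ACTIVE_FLAGS}")
    for flag in flags:
        if flag["expires"] < today:
            errors.append(f"{flag['name']} expired on {flag['expires']} (owner: {flag['owner']})")
    return errors

errors = audit(
    [{"name": "new_checkout", "owner": "team-payments", "expires": date(2025, 3, 1)}],
    today=date(2025, 6, 1),
)
```

Wiring this into CI turns "the flag system should alert" into "the build fails until someone cleans up", which is the only version of the policy that survives busy quarters.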

Tools: LaunchDarkly and alternatives

LaunchDarkly is the market leader for good reason: edge evaluation with microsecond latency, complex targeting, integration with everything, complete audit trail. For large teams with hundreds of flags and compliance requirements, it is the reference.

The problem is pricing. LaunchDarkly starts at USD 10 per seat per month on the Pro plan, but the real cost for a 20-person team with intensive usage can exceed USD 1,000 per month. For startups and SMEs, this does not always fit.

Alternatives we have evaluated:

Unleash. Open source, self-hosted. Server-side evaluation with SDKs for major languages. Functional UI for flag management. The pragmatic option if you already have your own infrastructure and do not need edge evaluation. Cost: hosting one container.

Flipt. Open source, written in Go. Extremely lightweight. Server-side evaluation. No external dependencies. Ideal for teams that want simple flags without complex operations. Lacks advanced targeting.

GrowthBook. Open source with a hosted plan. Oriented toward A/B testing with feature flags. Good integration with data warehouses for experiment analysis. The best option if your primary use case is experimentation.

OpenFeature. Not a tool but a standard (CNCF). It defines a common API for feature flags that allows switching providers without modifying application code. Think OpenTelemetry but for flags. If you want to avoid vendor lock-in, coding against the OpenFeature API and connecting your chosen provider is the most future-proof strategy.
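The value of the standard is the seam it creates between application code and provider. The following is an illustrative stdlib-only sketch of that pattern, not the real OpenFeature SDK (whose actual class and method names differ):

```python
from typing import Protocol

class Provider(Protocol):
    def resolve_boolean(self, key: str, default: bool) -> bool: ...

class InMemoryProvider:
    """Stand-in provider; an Unleash or LaunchDarkly adapter would go here."""
    def __init__(self, flags: dict[str, bool]):
        self._flags = flags

    def resolve_boolean(self, key: str, default: bool) -> bool:
        return self._flags.get(key, default)

class Client:
    def __init__(self, provider: Provider):
        self._provider = provider

    def get_boolean_value(self, key: str, default: bool) -> bool:
        # Application code only ever calls this. Switching vendors means
        # swapping the provider, not touching the call sites.
        return self._provider.resolve_boolean(key, default)

client = Client(InMemoryProvider({"new_checkout": True}))
print(client.get_boolean_value("new_checkout", False))  # → True
```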

For our projects, we use Unleash in production with the OpenFeature API in application code. This gives us the flexibility to switch to any other provider (including LaunchDarkly) if needs grow, without touching application logic.

Anti-patterns

Three anti-patterns we have seen repeatedly:

Flags as permanent business configuration. A flag created to “enable the premium module for client X” that stays forever because it is convenient. This is not a feature flag; it is application configuration. It should live in a configuration system, not in the flag system.

Nested flags. A flag that only takes effect if another flag is active. Dependencies between flags create combinatorial complexity that is impossible to test and difficult to reason about. Rule: if a flag depends on another, one of them is redundant.

Flag evaluation in hot paths. Evaluating a flag on every iteration of a loop processing 100,000 records. Flag evaluation should happen once per request or per operation, not once per record. Cache the decision at the start of the process.

Feature flags as engineering culture

Feature flags are not a tool. They are an engineering practice that reflects a culture of safe deployment and controlled experimentation. Using them well requires discipline: create with intention, review frequently, remove with urgency.

The temptation to leave flags “just in case” is real. Resist it. Every active flag is a conditional in your code that someone will need to understand, test, and maintain. The number of flags should not grow over time; it should remain stable, with new flags replacing retired ones in a continuous flow.

A well-managed flag system is invisible: nobody thinks about it because it simply works. A poorly managed flag system is a minefield of unpredictable bugs, untested combinations, and logic that nobody understands. The difference is lifecycle management, not the tool. For more on deployment and cloud/DevOps strategy, see our approach to infrastructure operations.

About the author


abemon engineering

Engineering team

Multidisciplinary engineering, data and AI team headquartered in the Canary Islands. We build, deploy and operate custom software solutions for companies at any scale.