Flagship Skill · Feature flagging

The feature flagging skill.

Flags as infrastructure, not as accumulating debt.

A senior engineer's playbook for using feature flags well, not just frequently. It codifies the operational discipline of treating flags as production infrastructure: the five flag types and the rule against mixing them, naming conventions, the lifecycle from birth through death, targeting rules, rollout strategies, stale flag cleanup, governance, and the technical-debt patterns that bite teams that weren't deliberate.

Audience: product managers and the engineers running production code behind their flags.

What this skill is for

The discipline that prevents the technical-debt outcome.

Feature flags are infrastructure. Treated as such, they enable kill switches, gradual rollouts, A/B experiments, permission gates, and operational toggles without redeploys. Treated casually, they become the largest accumulating technical debt in your codebase: thousands of dead flags, conflicting evaluation logic, brittle targeting, and a permission surface no one fully understands.

This skill is the discipline that prevents the second outcome. It assumes you have a feature flag platform; it does not advocate for one. It assumes your engineering team can implement targeting rules and SDK integration. The hard part is the operational discipline, and that is what this skill provides.

The output is not platform configuration. The output is a flag inventory that stays healthy quarter over quarter: every flag has a clear type, owner, lifecycle, and rollout story. Stale flags get removed. Permissions are tiered correctly. Rollouts actually use their pre-committed abort criteria.

What is in the skill

Fourteen considerations covered in the body.

The SKILL.md spans the full flag lifecycle plus the cross-cutting operational concerns (governance, performance, observability) that make a flag inventory healthy at quarter nine.

  1. What this skill is for

    The skill spans the operational lifecycle of a flag from creation through retirement. It does not cover experiment design, statistical analysis, or platform-specific tooling.

  2. The five flag types

    Release, experiment, operational, permission, configuration. Each has different lifetime and removal expectations. Mixing them in one flag is the root cause of most flag mess.

  3. Flag naming conventions

    Typed prefix, owner prefix, semantic name, version or date. Vague names die a slow death; well-named flags survive code review and the cleanup playbook.

  4. The flag lifecycle

    Birth, adolescence, launch, maturity, death. Birth is fast (one PR); death requires intentional cleanup. Most flag mess is unfinished death.

  5. Targeting rules and segmentation

    Four target dimensions: user, account, request, time-based. Compose with AND, OR, NOT. Avoid volatile attributes; if your rule needs three nested clauses, your taxonomy is wrong.

  6. Rollout strategies

    Percentage, cohort, geo-staged, time-based, and combination strategies. The ramp-and-watch rule: at each step, monitor for at least one peak hour before advancing.

  7. Stale flag management

    Quarterly cadence. 30 days for release and experiment flags, 90 days for operational. One PR per removal. Make removal part of the launch checklist, not a separate effort.

  8. Governance and permissions

    Permission tiers (viewer, editor, approver, admin). Environment promotion (dev to staging to production with review gates). Audit trail. Emergency override drilled in incidents.

  9. Flag dependencies and conflicts

    Dependency: flag B requires flag A on. Conflict: two flags target the same surface. Detect via shared-key audit; coordinate cross-team via the experiment registry.

  10. Performance considerations

    Cache aggressively, use bulk evaluation, prefer server-side SDKs for sensitive logic. 5 ms total budget for fifty flag checks per request.

  11. Testing flag-gated code

    Both branches covered. Document transition behavior. Use test-only flag overrides for integration tests. Catch the staging-versus-production rule drift via staged rollout.

  12. Rollback discipline

    Flags enable instant rollback of the gated change. They are not a substitute for code rollback. Practice via incident drills.

  13. Observability on flags

    Log flag value as a contextual field on every request. Alert on evaluation rate changes. Build dashboards for production rollouts. Connect to error tracking during incidents.

  14. Common failures

    Rapid-fire pattern catalog: rollout rule wrong, cached value, dead flag pile, two-flag conflict, evaluation latency, permission tiering missing.
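The AND/OR/NOT composition described under targeting rules and segmentation can be sketched with plain predicates. This is a minimal illustration, not any platform's rule engine; the attribute names (`email_domain`, `plan`, `country`) and helper functions are hypothetical.

```python
# Hypothetical sketch of composable targeting rules (AND / OR / NOT).
# Attribute names and helpers are illustrative, not a platform's API.

def attr(name, allowed):
    """Match when the context attribute is in the allowed set."""
    return lambda ctx: ctx.get(name) in allowed

def all_of(*rules):   # AND
    return lambda ctx: all(r(ctx) for r in rules)

def any_of(*rules):   # OR
    return lambda ctx: any(r(ctx) for r in rules)

def not_(rule):       # NOT
    return lambda ctx: not rule(ctx)

# "Internal users, or enterprise accounts outside an embargoed region."
rule = any_of(
    attr("email_domain", {"example.com"}),
    all_of(attr("plan", {"enterprise"}), not_(attr("country", {"XX"}))),
)
```

If a rule needs more nesting than this, the skill's advice applies: fix the taxonomy, not the rule.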

Reference files

Seven references that go alongside the SKILL.md.

Each reference is a self-contained doc the team can lift into a project: checklists, pattern catalogs, and the worked-out playbook for stale flag cleanup.

  • references/flag-naming-conventions.md

    Typed prefixes (release_, exp_, ops_, perm_, cfg_), owner prefixes, semantic naming patterns, and the migration plan for existing badly-named flags.

  • references/flag-lifecycle-checklist.md

    Phase-by-phase checklist (birth, adolescence, launch, maturity, death, audit) with explicit entry and exit criteria per phase.

  • references/flag-types-reference.md

    The five flag types in detail with worked examples, common pitfalls, and the anti-pattern of type drift.

  • references/stale-flag-cleanup-playbook.md

    Quarterly cleanup process: report, owner triage, triage meeting, removal PRs (one per flag), platform deletion, verification. Plus the orphan-ownership pattern.

  • references/targeting-rule-patterns.md

    Common patterns (percentage, internal-only, opt-in beta, cohort, geo, time-based, composition) and anti-patterns (volatile attributes, deeply-nested rules, drift between staging and production).

  • references/flag-rollout-strategies.md

    Five rollout strategies in detail with hold times and abort criteria. Worked example for a high-risk checkout redesign launch with the full 80-day timeline.

  • references/governance-and-permissions.md

    Permission tiers, environment-based scope, approval workflow, audit trail, emergency override, service accounts, the production-console-freeze anti-pattern.
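The typed-prefix convention from flag-naming-conventions.md lends itself to mechanical checking in code review or CI. The pattern below (type prefix, owner segment, semantic name, version-or-quarter suffix) is one plausible reading of the convention, not the reference's canonical regex.

```python
import re

# One plausible encoding of the naming convention, e.g.
#   release_checkout_new_payment_form_2025q1
#   exp_growth_signup_cta_v2
# Segment rules here are illustrative, not the reference's canonical regex.
FLAG_NAME = re.compile(
    r"^(release|exp|ops|perm|cfg)_"   # typed prefix
    r"([a-z0-9]+)_"                   # owner (team) prefix
    r"([a-z0-9_]+?)_"                 # semantic name
    r"(v\d+|\d{4}q[1-4])$"            # version or quarter suffix
)

def is_valid_flag_name(name: str) -> bool:
    return FLAG_NAME.match(name) is not None
```

A check like this is cheap to run as a lint rule, which is how well-named flags "survive code review" without relying on reviewer vigilance.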

Browse all reference files on GitHub

Where to use it

The full flag lifecycle.

At creation. Pick the type. Apply the naming convention. Set the target removal date in metadata. Document the rollout plan and abort criteria. The pre-experiment-readiness checklist for experiments applies to flag-gated rollouts too.

During rollout. The ramp-and-watch discipline. One peak hour at each percentage step. Abort criteria pre-committed. Production rules promoted from staging, not authored directly in production.
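Ramp-and-watch only works if bucketing is stable: a user included at 5% must still be included at 25%, so each step only adds traffic. A generic hash-based sketch, not any particular SDK's bucketing algorithm:

```python
import hashlib

# Stable percentage bucketing: a user's bucket never changes, so ramping
# 1% -> 5% -> 25% only ever adds users. The flag key is mixed into the
# hash so different flags bucket independently. Generic sketch only.
def in_rollout(flag_key: str, user_id: str, percent: float) -> bool:
    digest = hashlib.sha256(f"{flag_key}:{user_id}".encode()).hexdigest()
    bucket = int(digest[:8], 16) / 0xFFFFFFFF * 100  # uniform in [0, 100]
    return bucket < percent
```

Because the bucket depends only on the flag key and user ID, advancing the percentage between peak-hour observation windows never flips anyone out of the treatment mid-ramp.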

Post-launch. The 30-day review. Confirm production behavior matches the rollout expectation. Schedule the removal PR for release and experiment flags.

Quarterly cadence. The stale flag cleanup playbook. Generate the report, triage by owner, open one removal PR per flag, delete from the platform after the PR ships. Without this cadence, dead flags compound.
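The quarterly report itself is a simple filter over flag metadata, using the playbook's thresholds (30 days for release and experiment flags, 90 for operational) and grouping by owner for triage. The flag-record shape here is assumed, not prescribed:

```python
from datetime import date

# Sketch of the quarterly stale-flag report: flags older than their
# type's threshold, grouped by owner for triage. Record shape is assumed.
STALE_AFTER = {"release": 30, "exp": 30, "ops": 90}

def stale_flags(flags, today):
    """Return {owner: [flag names]} for flags past their type's threshold."""
    report = {}
    for f in flags:
        limit = STALE_AFTER.get(f["type"])
        if limit is None:          # perm/cfg flags are long-lived; skip
            continue
        if (today - f["created"]).days > limit:
            report.setdefault(f["owner"], []).append(f["name"])
    return report
```

The output maps directly onto the playbook's next steps: one triage conversation per owner, one removal PR per flag.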

Where this skill goes next

Skill 2 of 3 in the PM-experimentation suite.

Feature-flagging is the operational layer below experiment-design. Most experiments are delivered via feature flags, so the two skills compose. Experiment-design covers the discipline above (hypothesis writing, sample size, decision-making); this skill covers the operational layer below (flag types, lifecycle, rollout, governance).

experimentation-analytics is the third skill in the suite, covering the analytical layer: variance reduction techniques (CUPED, stratified sampling, control variates), Bayesian alternatives, sequential testing math, and deeper interpretation of marginal results. Its landing page lands when the skill ships.

An optional fourth skill, experimentation-platform-orchestrator, may follow after the three foundational skills land. That skill schedules; this skill operates; experiment-design designs.

Open source under MIT

Read the SKILL.md on GitHub.

The skill source lives in the rampstackco/claude-skills repository alongside dozens of other skills covering the full lifecycle of brand and product work. MIT licensed.

Frequently asked questions.

Why does the skill insist on five flag types?
Because mixing flag types is the root cause of most flag mess. A flag that started as a release ramp, got repurposed as an experiment, then ended up gating a permission, has no clear lifecycle and no clear owner. The five types (release, experiment, operational, permission, configuration) have different lifetimes and removal expectations. Picking one at creation and refusing to let it drift is the discipline.
Does it depend on a specific platform?
No. The principles work on LaunchDarkly, Flagsmith, Split.io, VWO FME, GrowthBook, Statsig, PostHog, and Optimizely equally. For platform-specific MCP commands and example prompts, pair this skill with the matching /integrations/{platform} microsite. The skill produces the operational shape; the microsite shows how the platform implements it.
How does this differ from experiment-design?
Experiment-design covers the discipline above: hypothesis writing, sample size, decision-making. Feature-flagging covers the operational layer below: flag types, naming, lifecycle, rollout, stale flag cleanup, governance. Most experiments are delivered via feature flags, so the two skills compose. Use experiment-design when designing the test; use feature-flagging when implementing and operating the flag that gates the test.
What about flag-as-config patterns and dynamic configuration?
Configuration flags are one of the five types. They live alongside the others, with their own lifecycle: governed by sales and product agreements, long-lived, evolved as contracts change. The skill covers them explicitly so teams treat configuration flags as first-class infrastructure rather than a special case.
Why does the skill spend so much time on stale flag cleanup?
Because cleanup is the discipline that separates teams with healthy flag inventories from teams with hundreds of dead flags. The cost of leaving flags in is real (two code paths to maintain, evaluation overhead, mental load) and compounds across hundreds of flags. The fix is a quarterly cadence plus making removal part of the launch checklist, not a separate effort.