Flagship Skill · Feature flagging
The feature flagging skill.
Flags as infrastructure, not as accumulating debt.
A senior engineer's playbook for using feature flags well, not just frequently. Codifies the operational discipline for flags as production infrastructure: the five flag types and the discipline of not mixing them, naming conventions, lifecycle from birth through death, targeting rules, rollout strategies, stale flag cleanup, governance, and the technical debt patterns that bite teams who weren't deliberate.
Audience: product managers and the engineers running production code behind their flags.
What this skill is for
The discipline that prevents the technical-debt outcome.
Feature flags are infrastructure. Treated as such, they enable kill switches, gradual rollouts, A/B experiments, permission gates, and operational toggles without redeploys. Treated casually, they become the largest accumulating technical debt in your codebase: thousands of dead flags, conflicting evaluation logic, brittle targeting, and a permission surface no one fully understands.
This skill is the discipline that prevents the second outcome. It assumes you have a feature flag platform; it does not advocate for one. It assumes your engineering team can implement targeting rules and SDK integration. The hard part is the operational discipline, and that is what is here.
The output is not platform configuration. The output is a flag inventory that stays healthy quarter over quarter: every flag has a clear type, owner, lifecycle, and rollout story. Stale flags get removed. Permissions tier correctly. Rollouts actually use the available abort criteria.
What is in the skill
Fourteen considerations covered in the body.
The SKILL.md spans the full flag lifecycle plus the cross-cutting operational concerns (governance, performance, observability) that make a flag inventory healthy at quarter nine.
01
What this skill is for
The skill spans the operational lifecycle of a flag from creation through retirement. It does not cover experiment design, statistical analysis, or platform-specific tooling.
02
The five flag types
Release, experiment, operational, permission, configuration. Each has different lifetime and removal expectations. Mixing them in one flag is the root cause of most flag mess.
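The five types and their differing lifetimes can be made explicit in code. A minimal sketch; the comments reflect the removal expectations described here, and the exact windows are a policy choice, not something the skill prescribes as code:

```python
from enum import Enum

class FlagType(Enum):
    RELEASE = "release"        # short-lived; remove ~30 days post-launch
    EXPERIMENT = "exp"         # lives for the test; remove after the decision
    OPERATIONAL = "ops"        # long-lived kill switch; review quarterly
    PERMISSION = "perm"        # lives as long as the entitlement it gates
    CONFIGURATION = "cfg"      # lives as long as the contract it encodes

# One type per flag, chosen at creation -- never let it drift.
print([t.value for t in FlagType])  # ['release', 'exp', 'ops', 'perm', 'cfg']
```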
03
Flag naming conventions
Typed prefix, owner prefix, semantic name, version or date. Vague names die a slow death; well-named flags survive code review and the cleanup playbook.
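A naming convention only survives if it is machine-checkable, e.g. in CI or code review tooling. A sketch assuming a hypothetical `<type>_<owner>_<semantic-name>_<version-or-date>` shape (the exact shape is for the naming reference file to define):

```python
import re

# Hypothetical convention: typed prefix, owning-team prefix, snake_case
# semantic name, then a version ("v2") or creation date ("20250301").
FLAG_NAME = re.compile(
    r"^(release|exp|ops|perm|cfg)_"   # typed prefix (one of the five types)
    r"[a-z]+_"                        # owner prefix (team name)
    r"[a-z0-9]+(?:_[a-z0-9]+)*_"      # semantic name
    r"(?:v\d+|\d{8})$"                # version or yyyymmdd date
)

def validate_flag_name(name: str) -> bool:
    """True if a flag name follows the typed-prefix convention."""
    return bool(FLAG_NAME.fullmatch(name))

print(validate_flag_name("exp_growth_checkout_copy_20250301"))  # True
print(validate_flag_name("newFeatureToggle"))                   # False
```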
04
The flag lifecycle
Birth, adolescence, launch, maturity, death. Birth is fast (one PR); death requires intentional cleanup. Most flag mess is unfinished death.
05
Targeting rules and segmentation
Four target dimensions: user, account, request, time-based. Compose with AND, OR, NOT. Avoid volatile attributes; if your rule needs three nested clauses, your taxonomy is wrong.
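The AND/OR/NOT composition can be sketched as plain predicate combinators over a flat evaluation context. Attribute names like `plan` and `country` are illustrative, not part of the skill:

```python
def attr_equals(key, value):
    """Match one stable attribute on the evaluation context."""
    return lambda ctx: ctx.get(key) == value

def all_of(*rules):   # AND
    return lambda ctx: all(rule(ctx) for rule in rules)

def any_of(*rules):   # OR
    return lambda ctx: any(rule(ctx) for rule in rules)

def negate(rule):     # NOT
    return lambda ctx: not rule(ctx)

# "Enterprise accounts, except in Germany" -- two clauses, no nesting.
beta_rule = all_of(
    attr_equals("plan", "enterprise"),
    negate(attr_equals("country", "DE")),
)

print(beta_rule({"plan": "enterprise", "country": "US"}))  # True
print(beta_rule({"plan": "enterprise", "country": "DE"}))  # False
```

If a rule needs three nested `all_of`/`any_of` layers to express, that is the taxonomy smell the skill warns about.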
06
Rollout strategies
Percentage, cohort, geo-staged, time-based, and combination strategies. The ramp-and-watch rule: at each step, monitor for at least one peak hour before advancing.
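Percentage rollouts rest on deterministic bucketing: a user's assignment must stay stable as the ramp advances, so the enabled set only grows. A sketch of the idea using a hash per flag-user pair (not any platform's actual algorithm):

```python
import hashlib

def in_rollout(flag_key: str, user_id: str, percentage: float) -> bool:
    """Deterministically bucket a user into [0, 100) per flag.

    Hashing flag_key together with user_id keeps buckets independent
    across flags, and a user's bucket is stable as the rollout ramps
    1 -> 5 -> 25 -> 100: once enabled, they stay enabled.
    """
    digest = hashlib.sha256(f"{flag_key}:{user_id}".encode()).hexdigest()
    bucket = int(digest[:8], 16) / 0xFFFFFFFF * 100
    return bucket < percentage

# The enabled set is monotone across ramp steps for any given user.
stages = [in_rollout("release_checkout_v2", "user-123", p)
          for p in (1, 5, 25, 50, 100)]
assert stages == sorted(stages)
```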
07
Stale flag management
Quarterly cadence. Staleness thresholds: 30 days post-launch for release and experiment flags, 90 days for operational. One PR per removal. Make removal part of the launch checklist, not a separate effort.
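The quarterly report step reduces to a pass over flag metadata against the 30/90-day thresholds above. A sketch with illustrative field names and flags:

```python
from datetime import date

# Days after launch at which each flag type counts as stale.
STALE_AFTER = {"release": 30, "exp": 30, "ops": 90}

def stale_flags(flags, today):
    """flags: iterable of (name, flag_type, launched_on) tuples."""
    report = []
    for name, flag_type, launched_on in flags:
        limit = STALE_AFTER.get(flag_type)
        if limit is not None and (today - launched_on).days > limit:
            report.append(name)
    return report

inventory = [
    ("release_pay_checkout_v2", "release", date(2025, 1, 10)),  # 50 days old
    ("ops_pay_readonly_mode", "ops", date(2025, 2, 1)),         # 28 days old
]
print(stale_flags(inventory, today=date(2025, 3, 1)))
# -> ['release_pay_checkout_v2']
```

Each name in the report becomes one removal PR, triaged by owner per the cleanup playbook.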
08
Governance and permissions
Permission tiers (viewer, editor, approver, admin). Environment promotion (dev to staging to production with review gates). Audit trail. Emergency override drilled in incidents.
09
Flag dependencies and conflicts
Dependency: flag B requires flag A on. Conflict: two flags target the same surface. Detect via shared-key audit; coordinate cross-team via the experiment registry.
10
Performance considerations
Cache aggressively, use bulk evaluation, prefer server-side SDKs for sensitive logic. 5 ms total budget for fifty flag checks per request.
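One way to stay inside that budget is one bulk evaluation per request, with every subsequent check hitting the cache. A sketch with a hypothetical `evaluate_all` callable standing in for the SDK's bulk API:

```python
class RequestFlagCache:
    """Per-request cache: fifty flag checks, one evaluation."""

    def __init__(self, evaluate_all):
        self._evaluate_all = evaluate_all  # callable: ctx -> {flag: value}
        self._values = None

    def get(self, flag_key, ctx, default=False):
        if self._values is None:           # first check pays the cost
            self._values = self._evaluate_all(ctx)
        return self._values.get(flag_key, default)

calls = []
def fake_evaluate_all(ctx):
    calls.append(ctx)  # count how often the expensive path runs
    return {"release_pay_checkout_v2": True, "ops_pay_readonly_mode": False}

cache = RequestFlagCache(fake_evaluate_all)
for _ in range(50):
    cache.get("release_pay_checkout_v2", {"user": "u1"})
print(len(calls))  # 1
```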
11
Testing flag-gated code
Both branches covered. Document transition behavior. Use test-only flag overrides for integration tests. Catch the staging-versus-production rule drift via staged rollout.
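Covering both branches can be as simple as parameterizing the flag client in tests. `checkout_total`, `make_flags`, and the flag key are invented for illustration:

```python
from unittest import mock

def checkout_total(cart, flags):
    # Flag-gated code: both branches stay tested until the flag dies.
    if flags.is_enabled("release_pay_new_tax_engine"):
        return round(sum(cart) * 1.10, 2)   # new path: tax computed inline
    return sum(cart)                         # old path: pre-tax total

def make_flags(enabled: bool):
    """Test-only flag override: a stub client pinned to one branch."""
    client = mock.Mock()
    client.is_enabled.return_value = enabled
    return client

assert checkout_total([10, 20], make_flags(True)) == 33.0   # flag on
assert checkout_total([10, 20], make_flags(False)) == 30    # flag off
```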
12
Rollback discipline
Flags enable instant rollback of the gated change. They are not a substitute for code rollback. Practice via incident drills.
13
Observability on flags
Log flag value as a contextual field on every request. Alert on evaluation rate changes. Build dashboards for production rollouts. Connect to error tracking during incidents.
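Attaching the flag value as a contextual field can be sketched with structured JSON logs, so incident debugging can slice errors by flag state. Field names here are assumed, not prescribed:

```python
import json
import logging

logger = logging.getLogger("app.request")

def request_log_record(path, status, flag_values):
    """Emit one structured log line with evaluated flags attached."""
    record = {"path": path, "status": status, "flags": flag_values}
    logger.info(json.dumps(record))
    return record

rec = request_log_record("/checkout", 500,
                         {"release_pay_checkout_v2": True})
print(rec["flags"])  # {'release_pay_checkout_v2': True}
```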
14
Common failures
Rapid-fire pattern catalog: rollout rule wrong, stale cached value, dead flag pile, two-flag conflict, evaluation latency, missing permission tiering.
Reference files
Seven references that go alongside the SKILL.md.
Each reference is a self-contained doc the team can lift into a project: checklists, pattern catalogs, and the worked-out playbook for stale flag cleanup.
references/flag-naming-conventions.md
Typed prefixes (release_, exp_, ops_, perm_, cfg_), owner prefixes, semantic naming patterns, and the migration plan for existing badly-named flags.
references/flag-lifecycle-checklist.md
Phase-by-phase checklist (birth, adolescence, launch, maturity, death, audit) with explicit entry and exit criteria per phase.
references/flag-types-reference.md
The five flag types in detail with worked examples, common pitfalls, and the anti-pattern of type drift.
references/stale-flag-cleanup-playbook.md
Quarterly cleanup process: report, owner triage, triage meeting, removal PRs (one per flag), platform deletion, verification. Plus the orphan-ownership pattern.
references/targeting-rule-patterns.md
Common patterns (percentage, internal-only, opt-in beta, cohort, geo, time-based, composition) and anti-patterns (volatile attributes, deeply-nested rules, drift between staging and production).
references/flag-rollout-strategies.md
Five rollout strategies in detail with hold times and abort criteria. Worked example for a high-risk checkout redesign launch with the full 80-day timeline.
references/governance-and-permissions.md
Permission tiers, environment-based scope, approval workflow, audit trail, emergency override, service accounts, the production-console-freeze anti-pattern.
Where to use it
The full flag lifecycle.
At creation. Pick the type. Apply the naming convention. Set the target removal date in metadata. Document the rollout plan and abort criteria. The pre-experiment-readiness checklist for experiments applies to flag-gated rollouts too.
During rollout. The ramp-and-watch discipline. One peak hour at each percentage step. Abort criteria pre-committed. Production rules promoted from staging, not authored directly in production.
Post-launch. The 30-day review. Confirm production behavior matches the rollout expectation. Schedule the removal PR for release and experiment flags.
Quarterly cadence. The stale flag cleanup playbook. Generate the report, triage by owner, open one removal PR per flag, delete from the platform after the PR ships. Without this cadence, dead flags compound.
Pairs with these platforms
Eight feature-flag platforms in /integrations.
The skill is platform-agnostic for the operational shape. For platform-specific MCP commands, auth setup, and example prompts, pair this skill with the matching integration microsite.
Enterprise feature flag and rollout teams
LaunchDarkly
Three MCP servers across feature management, AI Configs, observability
Open-source feature flag teams
Flagsmith
Open-source feature flags with change request workflows
Teams using Harness FME or Split.io
Split.io (Harness FME)
Feature flags via the Harness platform MCP
Teams using VWO Feature Experimentation
VWO FME
Feature flag CRUD from your IDE
Data-native experimentation teams
GrowthBook
Open-source warehouse-native experimentation
Modern product teams running experiments at speed
Statsig
Experiments and gates as MCP tools
Product-led growth teams
PostHog
Open-source product analytics with experiments
Enterprise personalization and experimentation
Optimizely
Web Experimentation and Feature Experimentation in one MCP
Where this skill goes next
Skill 2 of 3 in the PM-experimentation suite.
Feature-flagging is the operational layer below experiment-design. Most experiments are delivered via feature flags, so the two skills compose. Experiment-design covers the discipline above (hypothesis writing, sample size, decision-making); this skill covers the operational layer below (flag types, lifecycle, rollout, governance).
experimentation-analytics is the third skill in the suite, covering the analytical layer: variance reduction techniques (CUPED, stratified sampling, control variates), Bayesian alternatives, sequential testing math, and deeper interpretation of marginal results. Its landing page will go live when the skill ships.
An optional fourth skill, experimentation-platform-orchestrator, may follow after the three foundational skills land. That skill schedules; this skill operates; experiment-design designs.
Open source under MIT
Read the SKILL.md on GitHub.
The skill source lives in the rampstackco/claude-skills repository alongside dozens of other skills covering the full lifecycle of brand and product work. MIT licensed.
Frequently asked questions.
- Why does the skill insist on five flag types?
- Because mixing flag types is the root cause of most flag mess. A flag that started as a release ramp, got repurposed as an experiment, then ended up gating a permission, has no clear lifecycle and no clear owner. The five types (release, experiment, operational, permission, configuration) have different lifetimes and removal expectations. Picking one at creation and refusing to let it drift is the discipline.
- Does it depend on a specific platform?
- No. The principles work on LaunchDarkly, Flagsmith, Split.io, VWO FME, GrowthBook, Statsig, PostHog, and Optimizely equally. For platform-specific MCP commands and example prompts, pair this skill with the matching /integrations/{platform} microsite. The skill produces the operational shape; the microsite shows how the platform implements it.
- How does this differ from experiment-design?
- Experiment-design covers the discipline above: hypothesis writing, sample size, decision-making. Feature-flagging covers the operational layer below: flag types, naming, lifecycle, rollout, stale flag cleanup, governance. Most experiments are delivered via feature flags, so the two skills compose. Use experiment-design when designing the test; use feature-flagging when implementing and operating the flag that gates the test.
- What about flag-as-config patterns and dynamic configuration?
- Configuration flags are one of the five types. They live alongside the others, with their own lifecycle: governed by sales and product agreements, long-lived, evolved as contracts change. The skill covers them explicitly so teams treat configuration flags as first-class infrastructure rather than a special case.
- Why does the skill spend so much time on stale flag cleanup?
- Because cleanup is the discipline that separates teams with healthy flag inventories from teams with hundreds of dead flags. The cost of leaving flags in is real (two code paths to maintain, evaluation overhead, mental load) and compounds across hundreds of flags. The fix is a quarterly cadence plus making removal part of the launch checklist, not a separate effort.