Gating GitHub Actions deploys with feature flags
Production broke. You need to stop the deploys that are about to ship more of the same thing, and you need to do it in the next sixty seconds.
The options are all bad. Revert the PR and wait for CI. Disable the workflow from the GitHub UI and remember to re-enable it later. Add if: false on the deploy job and force-push. Slack the team to not merge anything. Every one of these is a scramble.
A feature flag is a switch. You flip it from the dashboard, and the next deploy notices. No code change, no force-push, no coordinating with whoever happens to be online.
The five-line deploy gate
The Flagify GitHub Action evaluates a flag during your workflow and exposes the result to later steps. Here is a deploy workflow with a kill-switch:
- name: Require production deploys enabled
  uses: flagifyhq/flagify-action@v1
  with:
    api-key: ${{ secrets.FLAGIFY_API_KEY }}
    flag: production-deploys-enabled
    on-disabled: fail
- name: Deploy
  run: ./deploy.sh
Five lines on the action step, one on the deploy. When the flag is off the step fails, dependent jobs skip automatically, and the workflow shows up red instead of quietly doing nothing.
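For the "dependent jobs skip" behavior, the gate can sit in its own job with the deploy chained behind it through needs:. A minimal sketch, assuming a workflow triggered by pushes to main; the job names gate and deploy, the trigger, and the checkout step are illustrative choices, not something the action requires:

```yaml
name: Deploy
on:
  push:
    branches: [main]   # assumed trigger; use whatever your deploy already uses

jobs:
  gate:
    runs-on: ubuntu-latest
    steps:
      # Fails this job when the flag is off
      - name: Require production deploys enabled
        uses: flagifyhq/flagify-action@v1
        with:
          api-key: ${{ secrets.FLAGIFY_API_KEY }}
          flag: production-deploys-enabled
          on-disabled: fail

  deploy:
    needs: gate          # skipped automatically when gate fails
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - name: Deploy
        run: ./deploy.sh
```

With this layout the run shows one failed job and one skipped job, which makes it obvious in the Actions UI that the gate stopped the deploy rather than the deploy itself breaking.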
Setting it up
Create the flag. Boolean, defaults to true:
flagify flags create production-deploys-enabled --type boolean
Store your public key as a repo secret:
gh secret set FLAGIFY_API_KEY
Add the two steps above to your deploy workflow. Done.
How the modes differ
The action has three modes for on-disabled, and they exist because “flag is off” means different things in different workflows.
- continue (default). Emits outputs and continues. You gate later steps with if: steps.flag.outputs.enabled == 'true'. Use this when the flag is conditional logic.
- fail. Marks the step as failed so any job using needs: skips. Use this when the flag is a gate and you want a red workflow if the gate is closed.
- skip. Logs a notice and continues. Downstream still gates on outputs.enabled, but the workflow stays green. Use this when “off” is the expected state sometimes and you do not want red runs cluttering the history.
If the flag is a gate, fail. If the flag is logic, continue. If “off” is routine, skip.
Patterns we actually use
Canary test suites
A risky dependency bump goes out. You want the extended test suite running on every PR for a week, then gone. A flag called run-canary-tests beats adding and removing the step by hand.
- uses: flagifyhq/flagify-action@v1
  id: canary
  with:
    api-key: ${{ secrets.FLAGIFY_API_KEY }}
    flag: run-canary-tests
- name: Extended suite
  if: steps.canary.outputs.enabled == 'true'
  run: npm run test:canary
Per-actor gating
The action accepts user-id and user-attributes, so targeting rules can match on the GitHub user running the workflow. Useful when only certain maintainers should be able to trigger the expensive production job.
- uses: flagifyhq/flagify-action@v1
  with:
    api-key: ${{ secrets.FLAGIFY_API_KEY }}
    flag: run-costly-job
    user-id: ${{ github.actor }}
    user-attributes: '{"role":"maintainer"}'
    on-disabled: skip
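On its own, that step only evaluates the flag. With on-disabled: skip the job carries on either way, so the expensive step still needs its own condition. A sketch of how that could look, assuming the step is given the id costly (an illustrative name) and with ./run-costly-job.sh standing in for the real command:

```yaml
- uses: flagifyhq/flagify-action@v1
  id: costly                       # illustrative id, referenced below
  with:
    api-key: ${{ secrets.FLAGIFY_API_KEY }}
    flag: run-costly-job
    user-id: ${{ github.actor }}
    user-attributes: '{"role":"maintainer"}'
    on-disabled: skip
- name: Costly job
  # Runs only when the targeting rules enable the flag for this actor
  if: steps.costly.outputs.enabled == 'true'
  run: ./run-costly-job.sh         # hypothetical script
```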
Staged rollout of workflow changes
Changing CI is scary because the new workflow runs on the PR that changes it. Guard the new path with a flag, flip it to 100% once you trust it, and roll back with one click if something goes wrong. We use this every time we touch our release workflow.
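A sketch of what guarding the new path could look like, using the default continue mode so both branches stay reachable. The flag name use-new-release-path, the step id release_flag, and both script paths are hypothetical placeholders:

```yaml
- uses: flagifyhq/flagify-action@v1
  id: release_flag
  with:
    api-key: ${{ secrets.FLAGIFY_API_KEY }}
    flag: use-new-release-path          # hypothetical flag name
- name: Release (new path)
  if: steps.release_flag.outputs.enabled == 'true'
  run: ./scripts/release-new.sh         # hypothetical script
- name: Release (old path)
  # Anything other than 'true' falls back to the known-good path
  if: steps.release_flag.outputs.enabled != 'true'
  run: ./scripts/release-old.sh         # hypothetical script
```

Keeping the old step in place until the flag has been on for a while is what makes the one-click rollback possible; delete it only once the new path has earned its trust.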
Limitations
Evaluation happens when the action step runs. If you flip the flag after a run has already passed that step, the run keeps the old value, and the flip does not start a new run by itself. Three ways around it today:
- Re-run the workflow from the UI or with gh run rerun.
- Push any commit to re-trigger.
- Schedule a cron. A check every ten minutes catches most kill-switch cases.
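The cron option is a small change to the workflow's triggers. A minimal sketch, assuming the workflow already runs on pushes to main; only the schedule entry is new:

```yaml
on:
  push:
    branches: [main]          # assumed existing trigger
  schedule:
    # Re-evaluate the flag every ten minutes (cron times are UTC)
    - cron: '*/10 * * * *'
```

Scheduled workflows run against the default branch, and GitHub may delay them when its runners are busy, so treat the ten minutes as approximate.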
The cleaner answer is webhooks: Flagify fires a flag.toggled event, your repo receives it as repository_dispatch, and the workflow re-runs on its own. That is on our roadmap. Until it ships, the manual re-run is fine for most teams.
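The receiving side of that design already exists in GitHub Actions, so the workflow half can be sketched now. This is an assumption about how the integration might look, not a shipped feature: the event type flag-toggled is a hypothetical name, and something would have to call GitHub's repository dispatch API with that type for the trigger to fire.

```yaml
on:
  repository_dispatch:
    # Hypothetical event type; it must match the event_type sent in the dispatch call
    types: [flag-toggled]
```

As with scheduled runs, repository_dispatch only triggers workflow files that exist on the default branch.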
Why this is worth the five lines
A flag gate means someone who does not know the deploy pipeline can still stop the deploy pipeline. A support engineer sees a spike in checkout errors at 2am, flips the flag, and the 2:15am cron deploy skips itself. No on-call page to a deploy engineer, no scramble to find the right repo.
It also gives you an audit trail by accident. Every toggle is logged with who and when. The next time the postmortem asks “why did we ship this at 3am”, the answer is on the flag, not in a Slack thread nobody can find.
The action is open source: flagifyhq/flagify-action. More patterns on the integration page.