What this template is for
Disaster recovery plans rot. People change, topology changes, the runbook in the wiki refers to a hostname that no longer exists. The only way to know your DR plan works is to test it, and the only way to test it consistently is to make the test a tracked piece of work with measurable outcomes.
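This template turns a DR drill into a normal Jira workflow: scoped, planned, executed, measured, reviewed. RTO (recovery time objective: how long recovery may take) and RPO (recovery point objective: how much recent data may be lost) are required fields because every drill needs a pass/fail criterion.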
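To make the pass/fail criterion concrete, here is a minimal Python sketch of how a drill's measured actuals could be compared with its targets. It assumes the scribe records three timestamps: when the primary was lost, when service was restored on the replica, and when the last write that had replicated before the loss was committed. Those names and the example times are hypothetical, and treating the gap between the last replicated write and the failure as the write-loss window is a simplification.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class DrillResult:
    """Measured outcome of one drill, compared against its targets."""
    recovery_time: timedelta  # measured time to restore service (actual RTO)
    data_loss: timedelta      # measured window of lost writes (actual RPO)
    rto_target: timedelta
    rpo_target: timedelta

    @property
    def rto_met(self) -> bool:
        return self.recovery_time <= self.rto_target

    @property
    def rpo_met(self) -> bool:
        return self.data_loss <= self.rpo_target

    @property
    def passed(self) -> bool:
        # The drill passes only if both objectives are met.
        return self.rto_met and self.rpo_met


def evaluate(failure_at: datetime, service_restored_at: datetime,
             last_replicated_write_at: datetime,
             rto_target: timedelta, rpo_target: timedelta) -> DrillResult:
    # Actual RTO: wall-clock time from primary loss to service restored.
    recovery_time = service_restored_at - failure_at
    # Actual RPO (simplified): writes after the last replicated one are lost.
    data_loss = max(failure_at - last_replicated_write_at, timedelta(0))
    return DrillResult(recovery_time, data_loss, rto_target, rpo_target)


# Hypothetical timings checked against the example targets (15 min / 60 s).
result = evaluate(
    failure_at=datetime(2026, 8, 12, 14, 0, 0),
    service_restored_at=datetime(2026, 8, 12, 14, 11, 30),
    last_replicated_write_at=datetime(2026, 8, 12, 13, 59, 18),
    rto_target=timedelta(minutes=15),
    rpo_target=timedelta(seconds=60),
)
print(result.rto_met, result.rpo_met, result.passed)  # → True True True
```

Keeping the two checks separate means the post-drill review can report which objective was missed, not just that the drill failed.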
When to use it
Use this template:
- For any planned DR drill (tabletop, game day, or live failover)
- Whenever you change a piece of recovery topology - new replica, new region, new orchestration
- When a real incident exposes a DR gap and you want to verify the fix
How to set it up in Jira
Create a Disaster Recovery Test issue type and add the fields listed below to its create screen. Use STM Issue Templates to lay down the pre-drill, execute, measure, fail-back, and post-drill sub-tasks on every new ticket. Point the Linked Runbook field at the relevant runbook ticket so that the drill's gap analysis feeds directly into the runbook that will be followed when the real event happens.
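If you also create drill tickets from a script, the same fields go into the body of Jira's REST create-issue request (`POST /rest/api/2/issue`). The sketch below only builds that body and does not send it. The project key `OPS` and the `customfield_10101`-style IDs are hypothetical placeholders for the IDs in your own instance, and representing Drill Type as a single-select field is an assumption. It targets the v2 endpoint, which accepts plain strings for text fields. Jira's create endpoint generally rejects fields that are not on the issue type's create screen, so a request like this also doubles as a quick check of the screen configuration.

```python
import json

# Hypothetical mapping from this template's field names to Jira custom field
# IDs. Look up the real IDs in your own instance; these are placeholders.
FIELD_IDS = {
    "Scope": "customfield_10101",
    "Drill Type": "customfield_10102",
    "RTO Target": "customfield_10103",
    "RPO Target": "customfield_10104",
}


def build_create_payload(project_key: str, summary: str, values: dict) -> dict:
    """Build a request body for Jira's REST v2 create-issue endpoint."""
    fields = {
        "project": {"key": project_key},
        "issuetype": {"name": "Disaster Recovery Test"},
        "summary": summary,
    }
    for name, value in values.items():
        field_id = FIELD_IDS[name]
        # Assumption: Drill Type is a single-select list, which Jira expects
        # as {"value": ...}; the other fields are assumed to be plain text.
        fields[field_id] = {"value": value} if name == "Drill Type" else value
    return {"fields": fields}


payload = build_create_payload(
    "OPS",  # hypothetical project key
    "DR drill Q3 2026 - payments-db cross-region failover",
    {
        "Scope": "payments-db primary loss; failover to us-west-2 replica",
        "Drill Type": "Live failover",
        "RTO Target": "15 minutes",
        "RPO Target": "60 seconds of write loss",
    },
)
print(json.dumps(payload, indent=2))
```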
Fields to add to your Jira create screen
These are the fields a project admin should make sure exist on the Create Issue screen for this issue type (Project settings → Screens). Without them on the screen, reporters can't record the scope, targets, and plan the drill will be measured against - and STM can't reference them either.
| Field | Example value | Required |
|---|---|---|
| Summary | DR drill Q3 2026 - payments-db cross-region failover | Yes |
| Scope (custom) | payments-db primary loss; failover to us-west-2 replica | Yes |
| Drill Type (custom) | Tabletop / Game day / Live failover | Yes |
| RTO Target (custom) | 15 minutes | Yes |
| RPO Target (custom) | 60 seconds of write loss | Yes |
| Test Plan (custom) | Step-by-step failover procedure with timing | Yes |
| Success Criteria (custom) | Failover within RTO; data loss within RPO; no manual cleanup | Yes |
| Rollback Plan (custom) | Fail back to original primary once region is healthy | Yes |
| Participants | @sre-team, @payments-tl, @oncall-platform | No |
| Drill Date | 2026-08-12 14:00 UTC | No |
| Linked Runbook | RB-204 - cross-region failover | No |
| Linked Incident (real) | Optional - if this drill simulates a past incident | No |
Note on custom fields. STM currently supports up to 5 custom fields per template. You can add as many custom fields as you like to your Jira Create Issue screen - the 5-field limit only applies if you want STM to set or update those custom fields itself.
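The Required column can also be expressed as a simple check, for example to audit drill tickets before a drill is scheduled. This minimal sketch assumes each ticket is available as a Python dict keyed by the field names in the table (how you export tickets is out of scope); `REQUIRED_FIELDS`, `missing_required`, and `validate` are hypothetical names.

```python
# Required fields from the template table (Summary plus the custom fields).
REQUIRED_FIELDS = [
    "Summary", "Scope", "Drill Type", "RTO Target", "RPO Target",
    "Test Plan", "Success Criteria", "Rollback Plan",
]

# Allowed values for Drill Type, as listed in the table.
DRILL_TYPES = {"Tabletop", "Game day", "Live failover"}


def missing_required(ticket: dict) -> list:
    """Return required fields that are absent or blank, in table order."""
    return [name for name in REQUIRED_FIELDS
            if not str(ticket.get(name) or "").strip()]


def validate(ticket: dict) -> list:
    """Return a list of human-readable problems; empty means the ticket is ready."""
    problems = [f"missing: {name}" for name in missing_required(ticket)]
    drill_type = ticket.get("Drill Type")
    if drill_type and drill_type not in DRILL_TYPES:
        problems.append(f"unknown drill type: {drill_type}")
    return problems


draft = {
    "Summary": "DR drill Q3 2026 - payments-db cross-region failover",
    "Scope": "payments-db primary loss; failover to us-west-2 replica",
    "Drill Type": "Live failover",
    "RTO Target": "15 minutes",
    "RPO Target": "60 seconds of write loss",
    "Test Plan": "",  # not written yet
}
print(validate(draft))
# → ['missing: Test Plan', 'missing: Success Criteria', 'missing: Rollback Plan']
```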
Sub-tasks STM creates automatically
Build an STM sub-task template containing the items below, then wire it to an On Create Issue Executor scoped to this issue type. Whenever a new issue of this type is created in the project, STM creates the full sub-task set in one step - with assignee, due date, and components inherited from the parent unless you override them.
- Define scope and confirm the drill does not affect production users (or that an agreed maintenance window covers it)
- Pre-drill: confirm replica health, monitoring, and rollback path
- Pre-drill: notify stakeholders and customer support of the window
- Execute the failover procedure with a scribe capturing exact timing
- Measure actuals against RTO/RPO targets
- Fail back to the original primary and confirm data integrity
- Post-drill review: gap analysis, runbook updates, action items
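The inheritance rule described in this section (sub-tasks take the parent's assignee, due date, and components unless a value is overridden) can be modelled in a few lines. This is an illustrative Python sketch of that rule only, not STM's implementation or API; the `Issue` class, `expand_subtasks` function, and the example assignees are hypothetical.

```python
from dataclasses import dataclass, field
from datetime import date
from typing import Dict, List, Optional

# The sub-task checklist from this template, in order (first item shortened).
SUBTASK_SUMMARIES = [
    "Define scope and confirm the drill does not affect production users",
    "Pre-drill: confirm replica health, monitoring, and rollback path",
    "Pre-drill: notify stakeholders and customer support of the window",
    "Execute the failover procedure with a scribe capturing exact timing",
    "Measure actuals against RTO/RPO targets",
    "Fail back to the original primary and confirm data integrity",
    "Post-drill review: gap analysis, runbook updates, action items",
]


@dataclass
class Issue:
    summary: str
    assignee: Optional[str] = None
    due_date: Optional[date] = None
    components: List[str] = field(default_factory=list)


def expand_subtasks(parent: Issue,
                    overrides: Optional[Dict[str, dict]] = None) -> List[Issue]:
    """Create one sub-task per checklist item, inheriting from the parent.

    `overrides` maps a sub-task summary to the fields it should not inherit,
    for example {summary: {"assignee": "payments-tl"}}.
    """
    overrides = overrides or {}
    subtasks = []
    for summary in SUBTASK_SUMMARIES:
        own = overrides.get(summary, {})
        subtasks.append(Issue(
            summary=summary,
            assignee=own.get("assignee", parent.assignee),
            due_date=own.get("due_date", parent.due_date),
            # Copy the list so editing one sub-task's components does not
            # change the parent's or a sibling's.
            components=list(own.get("components", parent.components)),
        ))
    return subtasks


parent = Issue(
    summary="DR drill Q3 2026 - payments-db cross-region failover",
    assignee="sre-lead",
    due_date=date(2026, 8, 12),
    components=["payments-db"],
)
subtasks = expand_subtasks(parent, overrides={
    SUBTASK_SUMMARIES[2]: {"assignee": "payments-tl"},
})
for task in subtasks:
    print(task.assignee, task.due_date, task.summary)
```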
Common questions
What is a disaster recovery test in Jira used for?
It is the ticket that scopes a DR drill, defines the RTO/RPO targets being tested, captures the test plan, and houses the post-drill review. A DR test ticket is the artefact that turns 'we have a DR plan' into 'we have tested DR procedures with measured outcomes' - which is the only version that matters when you actually need to fail over.
How often should you run DR drills?
At least annually for full live failovers, quarterly for tabletop exercises, and a focused drill on the changed component whenever the topology changes. The most common failure mode is to run a drill once at launch and never again: the runbooks drift, the people change, and the next drill discovers that everything the last one verified is now wrong.
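The cadence guidance is easy to check mechanically, for example when setting the due date of the next drill ticket. A minimal Python sketch, assuming the 12-month interval for live failovers and the 3-month interval for tabletops given above; game days and topology-triggered drills are not modelled, and `next_due` and `is_overdue` are hypothetical names.

```python
import calendar
from datetime import date

# Maximum months between drills of each type, per the guidance above.
CADENCE_MONTHS = {"Live failover": 12, "Tabletop": 3}


def add_months(d: date, months: int) -> date:
    """Add calendar months, clamping the day to the end of shorter months."""
    month_index = d.month - 1 + months
    year = d.year + month_index // 12
    month = month_index % 12 + 1
    day = min(d.day, calendar.monthrange(year, month)[1])
    return date(year, month, day)


def next_due(last_drill: date, drill_type: str) -> date:
    """Latest date by which the next drill of this type should run."""
    return add_months(last_drill, CADENCE_MONTHS[drill_type])


def is_overdue(last_drill: date, drill_type: str, today: date) -> bool:
    return today > next_due(last_drill, drill_type)


print(next_due(date(2026, 8, 12), "Live failover"))  # → 2027-08-12
print(next_due(date(2026, 11, 30), "Tabletop"))      # → 2027-02-28
print(is_overdue(date(2026, 8, 12), "Tabletop", date(2026, 12, 1)))  # → True
```

Clamping to the last day of the month keeps a drill run on 30 November from producing an invalid 30 February due date.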
Should DR tests run against production?
Live failovers should run against production in a planned window with customer support notified; tabletop drills do not need to. The point of a live drill is to find out which assumptions in your runbook are actually wrong, and you only discover that against real systems. If you cannot fail over production safely, the drill has already found the most important gap.
How do you keep DR test cadence on track?
Schedule the next drill ticket at the post-drill review, not in a calendar. [STM Issue Templates](/stm/) can auto-create the quarterly or annual drill ticket with the standard sub-task checklist already laid down, so the next drill never falls off the radar.
Automate the sub-tasks with STM
STM Issue Templates saves the sub-task list above as a reusable template and creates them on every new issue of this type - via an Executor on issue creation, on status transition, or triggered manually from the issue's "Create bulk sub-tasks" menu. STM does not change the parent issue's create screen (that's a Jira project-settings job) but it removes the manual work of creating the sub-tasks every time.