    DevOps
    Way of Working

    Backup and Recovery

    Foundation
    Phase: operate
    DF
    MTTR

    Quick Reference

    Phase: operate
    Epic: Infrastructure & Operations Baseline
    Milestone: Foundation
    Target: >= 90% of stateful services have backups
    Implementation Time: part of the Infrastructure & Operations Baseline epic, 4 weeks (32 hours per capability on average)

    What & Why

    Definition

    >= 90% of stateful services (databases, volumes) have automated backups with tested recovery procedures.
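    The definition can be read as a coverage ratio over an inventory of stateful services. The following is a minimal sketch, assuming a hypothetical inventory record with two flags (`has_automated_backup`, `recovery_tested`); the class and function names are illustrative and are not part of any kit referenced on this page.

```python
from dataclasses import dataclass


@dataclass
class StatefulService:
    """Hypothetical inventory record for a database or volume."""
    name: str
    has_automated_backup: bool
    recovery_tested: bool


def backup_coverage(services: list[StatefulService]) -> float:
    """Percentage of services with automated backups AND a tested recovery."""
    if not services:
        return 0.0
    covered = sum(
        1 for s in services if s.has_automated_backup and s.recovery_tested
    )
    return 100.0 * covered / len(services)


def meets_target(services: list[StatefulService], target_percent: float = 90.0) -> bool:
    """True when coverage reaches the target (90% by default, per the definition)."""
    return backup_coverage(services) >= target_percent


# Example inventory: 9 of 10 services are fully covered.
inventory = [StatefulService(f"db-{i}", True, True) for i in range(9)]
inventory.append(StatefulService("legacy-volume", True, False))  # backup untested
```

    A service with a backup but no tested recovery does not count toward coverage here, which follows the "with tested recovery procedures" wording of the definition.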

    Business Value

    Prevents infrastructure drift entirely and reduces environment provisioning time from 2 weeks to 2 hours through versioned infrastructure code. Reaching backup coverage for >= 90% of stateful services is a key milestone toward this goal.

    Context

    This capability is part of the Foundation milestone's focus on establishing baseline practices (testable, releasable, monitorable). It is essential for teams targeting improvements in DF and MTTR.

    Success Criteria

    Target

    >= 90% of stateful services have backups

    Measurement

    Backup job success rate + recovery drill pass rate
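    Both measurements are simple pass rates over a reporting period. The sketch below assumes that backup job outcomes and recovery drill outcomes are available as lists of booleans; how those results are collected is not specified here, and the function names are hypothetical.

```python
from typing import Optional


def pass_rate(results: list[bool]) -> Optional[float]:
    """Percentage of True entries, or None when there is nothing to report.

    Returning None avoids presenting an empty period as 0% or 100%.
    """
    if not results:
        return None
    return 100.0 * sum(results) / len(results)


def measurement_summary(backup_jobs: list[bool], recovery_drills: list[bool]) -> dict:
    """Combine the two measurements named in the success criteria."""
    return {
        "backup_job_success_rate": pass_rate(backup_jobs),
        "recovery_drill_pass_rate": pass_rate(recovery_drills),
    }


# Example: 30 daily backup jobs with one failure, 4 drills with one failure.
summary = measurement_summary([True] * 29 + [False], [True, True, False, True])
```

    Keeping the two rates separate, rather than averaging them, makes it visible when backups run reliably but restores fail.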

    Evidence

    • Backup config
    • Backup monitoring dashboard
    • Recovery drill reports

    In Practice

    Real-World Implementation

    Teams configure automated backups (RDS snapshots, volume snapshots), retain them for 30 days, and test recovery quarterly.
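    The retention and drill cadence can be checked with plain date arithmetic. This is a sketch under stated assumptions: "quarterly" is approximated as 92 days, timestamps are timezone-aware, and the function names are hypothetical.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional

RETENTION = timedelta(days=30)       # retention period from the text
DRILL_INTERVAL = timedelta(days=92)  # assumption: "quarterly" approximated as 92 days


def expired_snapshots(snapshot_times: list[datetime], now: datetime) -> list[datetime]:
    """Snapshots older than the retention period, i.e. candidates for deletion."""
    return [t for t in snapshot_times if now - t > RETENTION]


def drill_overdue(last_drill: Optional[datetime], now: datetime) -> bool:
    """True when no drill has been recorded or the last one is older than a quarter."""
    return last_drill is None or now - last_drill > DRILL_INTERVAL


# Example with a fixed reference time so the result is reproducible.
now = datetime(2025, 4, 1, tzinfo=timezone.utc)
snapshots = [now - timedelta(days=d) for d in (1, 29, 31, 45)]
to_delete = expired_snapshots(snapshots, now)  # the 31- and 45-day-old snapshots
```

    Managed services such as RDS enforce retention for automated snapshots themselves, so a check like this is mainly useful for manually created or self-managed backups.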

    Concrete Example

    PostgreSQL on RDS: automated daily snapshots at 03:00 UTC with 30-day retention. Q1 recovery drill: a 5-day-old snapshot was restored in 12 minutes.
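    The snapshot settings in this example map onto parameters of the RDS `ModifyDBInstance` API. The sketch below assumes a boto3-style RDS client is passed in; the instance identifier `orders-postgres` and the 30-minute window length are assumptions made for illustration, and this is not presented as the configuration used in the example above.

```python
def rds_backup_settings(instance_id: str) -> dict:
    """Build ModifyDBInstance arguments for daily backups with 30-day retention."""
    return {
        "DBInstanceIdentifier": instance_id,
        "BackupRetentionPeriod": 30,             # days; enables automated backups
        "PreferredBackupWindow": "03:00-03:30",  # UTC; window length is an assumption
        "ApplyImmediately": True,
    }


def apply_backup_settings(rds_client, instance_id: str):
    """Apply the settings through a boto3-style RDS client.

    The client is injected so the function can be exercised without AWS access.
    """
    return rds_client.modify_db_instance(**rds_backup_settings(instance_id))


# Usage against AWS (requires boto3 and credentials; not executed here):
#   import boto3
#   apply_backup_settings(boto3.client("rds"), "orders-postgres")  # hypothetical ID
```

    Injecting the client keeps the settings testable on their own and lets the same function run in a dry-run mode with a stub client.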

    Implementation Guide

    Implementation Steps

    Follow the measurement approach: track the backup job success rate and the recovery drill pass rate.

    For detailed step-by-step guidance, refer to the Infrastructure & Operations Baseline Implementation Kit.

    Resources

    Implementation Kit

    Infrastructure & Operations Baseline Kit

    Templates

    Browse all templates

    Related Resources

    View learning paths

    Related Capabilities

    Enables

    What this unlocks

    Automated Disaster Recovery

    Complementary

    Often adopted together, from the Infrastructure & Operations Baseline epic

    Infrastructure as Code
    Operational Runbooks
    On-Call Rotation
    Autoscaling Configuration

    Troubleshooting & FAQs

    Common Issues

    Issue: Target metric not improving

    Solution: Verify that the measurement is accurate, confirm that prerequisites are fully implemented, and review evidence artifacts for completeness.

    Issue: Team resistance to adoption

    Solution: Start with a pilot team, demonstrate value with metrics, and provide training and support during the transition.

    Issue: Inconsistent implementation across teams

    Solution: Create shared templates and guidelines, establish regular sync meetings, and use automation to enforce standards.

    Frequently Asked Questions

    Can we implement this before completing prerequisites?

    While possible, it's not recommended. Prerequisites ensure foundational practices are in place, making this capability more effective and easier to adopt.

    How long does implementation typically take?

    Most capabilities can be implemented within 90 days when tackled as part of the Foundation milestone. Individual timelines vary based on team size and existing practices.


    DevOps practices for the entire delivery lifecycle

    © 2019-2026 devopswow.com. Created by Burhan Öcüt
