Skip to main content
    DevOps
    Way of Working
    1. Home
    2. Capabilities
    3. Monitor Alerts

    Alerting Rules

    Foundation
    Phase: monitor
    MTTR
    CFR

    Quick Reference

    Phase
    monitor
    Epic
    Observability & Monitoring Foundations
    Milestone
    Foundation
    Target
    >= 80% services have critical alerts
    Implementation Time
    Part of Observability & Monitoring Foundations epic: 3.5 weeks (23 hours per capability avg)

    What & Why

    Definition

    >= 80% of services have alerting for high error rate (>= 5% 5xx), high latency (p95 >= 1s), and down status.

    Business Value

    Reduces mean time to detection (MTTD) from 4 hours to 15 minutes and enables proactive issue resolution for 60% of incidents before user impact Achieving >= 80% services have critical alerts is a key milestone toward this goal.

    Context

    This capability is part of the Foundation milestone's focus on establish baseline practices (testable, releasable, monitorable). Essential for teams targeting MTTR, CFR improvements.

    Success Criteria

    Target

    >= 80% services have critical alerts

    Measurement

    Alerting rule coverage analysis

    Evidence

    • AlertManager/PagerDuty rules
    • Alert firing history
    • Alert response metrics

    In Practice

    Real-World Implementation

    Teams define Prometheus alerting rules, send to PagerDuty, set severity (critical, warning), include runbook links.

    Concrete Example

    Alert: HighErrorRate fires when rate(http_requests_total{status=~"5.."}[5m]) > 0.05. Severity: critical. Runbook: wiki.company.com/runbooks/api-errors

    Implementation Guide

    Prerequisites

    Application Metrics
    >= 80% services expose RED metrics

    Implementation Steps

    Follow the measurement approach: Alerting rule coverage analysis

    For detailed step-by-step guidance, refer to the Observability & Monitoring Foundations Implementation Kit.

    Resources

    Implementation Kit

    Observability & Monitoring Foundations Kit

    Templates

    Browse all templates

    Related Resources

    View learning paths

    Related Capabilities

    Prerequisites

    Implement these first

    Application Metrics

    Enables

    What this unlocks

    AI Alert Prioritization

    Complementary

    Often adopted together, from the Observability & Monitoring Foundations epic

    Centralized Logging
    Health Check Endpoints
    Service Level Objectives
    Observability Dashboards

    Troubleshooting & FAQs

    Common Issues

    Issue: Target metric not improving

    Solution: Verify measurement is accurate, check if prerequisites are fully implemented, review evidence artifacts for completeness

    Issue: Team resistance to adoption

    Solution: Start with pilot team, demonstrate value with metrics, provide training and support during transition

    Issue: Inconsistent implementation across teams

    Solution: Create shared templates and guidelines, establish regular sync meetings, use automation to enforce standards

    Frequently Asked Questions

    Can we implement this before completing prerequisites?

    While possible, it's not recommended. Prerequisites ensure foundational practices are in place, making this capability more effective and easier to adopt.

    How long does implementation typically take?

    Most capabilities can be implemented within 90 days when tackled as part of the Foundation milestone. Individual timelines vary based on team size and existing practices.

    DevOps
    Way of Working

    DevOps practices for the entire delivery lifecycle

    © 2019-2026 devopswow.com. Created by Burhan Öcüt

    PartnersAboutPrivacyTermsCookies