Course Outline
Introduction to AIOps
- Defining AIOps and its significance.
- Comparing traditional monitoring with AIOps-driven observability.
- Exploring AIOps architecture and key components.
Collecting and Normalizing Operational Data
- Understanding types of observability data: metrics, logs, and traces.
- Ingesting data from diverse sources (servers, containers, cloud).
- Utilizing agents and exporters (Prometheus, Beats, Fluentd).
Data Correlation and Anomaly Detection
- Applying time series correlation and statistical methods.
- Employing ML models for anomaly detection.
- Identifying incidents across distributed systems.
Alerting and Noise Reduction
- Designing intelligent alert rules and thresholds.
- Implementing suppression, deduplication, and alert grouping.
- Integrating with Alertmanager, Slack, PagerDuty, or Opsgenie.
Root Cause Analysis and Visualization
- Using dashboards to visualize metrics and detect trends.
- Exploring events and timelines for RCA.
- Tracing issues across layers using distributed tracing tools.
Automation and Remediation
- Triggering automated scripts or workflows from incidents.
- Integrating with ITSM systems (ServiceNow, Jira).
- Exploring use cases: self-healing, scaling, traffic rerouting.
Open Source and Commercial AIOps Platforms
- Surveying available tools: Prometheus, Grafana, ELK, Moogsoft, Dynatrace.
- Understanding evaluation criteria for selecting an AIOps platform.
- Conducting a demo and hands-on session with a selected stack.
Summary and Next Steps
Requirements
- A foundational understanding of IT operations and system monitoring concepts.
- Practical experience with monitoring tools or dashboards.
- Familiarity with basic log and metric formats.
Audience
- Operations teams responsible for infrastructure and applications.
- Site Reliability Engineers (SREs).
- IT monitoring and observability teams.
Duration: 14 hours