Sr. Site Reliability Engineer Job at ASK IT Solutions, Phoenix, AZ

  • ASK IT Solutions
  • Phoenix, AZ

Job Description

Site Reliability Engineer

Location: Phoenix, AZ

We are seeking a Site Reliability Engineer (SRE) to join the Cloud Operations and Observability team. You'll be instrumental in driving resiliency, performance, automation, and AI-driven observability across hybrid cloud environments (Azure and GCP). You will design, implement, and manage infrastructure with a strong focus on Kubernetes, and integrate AI/LLM solutions into observability and operational workflows.

Key Responsibilities:

  • Build and operate scalable, secure, and highly available infrastructure in Azure and GCP.
  • Design and maintain observability platforms leveraging Splunk, OpenTelemetry, and cloud-native monitoring tools.
  • Develop and support AI/LLM-driven automation solutions to improve incident triage, alert correlation, and root cause analysis.
  • Partner with application and data teams to define SLOs, SLIs, and error budgets.
  • Drive operational excellence through automation, chaos testing, and proactive reliability improvements.
  • Optimize Kubernetes environments (GKE/AKS) for performance, security, and cost-efficiency.
  • Integrate observability data pipelines with LLMs for anomaly detection, summarization, and proactive remediation.
  • Participate in on-call rotations, incident response, and postmortem reviews.
  • Implement runbooks, auto-remediation scripts, and AI copilots for operations.

Required Qualifications:

  • 8+ years of experience as an SRE.
  • Strong expertise in Azure and GCP cloud platforms (certifications a plus).
  • Proficient in Splunk (Enterprise + Observability) for monitoring, alerting, and log analytics.
  • In-depth knowledge of Kubernetes (AKS, GKE), Helm, and container lifecycle.
  • Familiarity with AI/ML and LLM-based tools (e.g., OpenAI, Hugging Face, Azure OpenAI) for observability or automation use cases.
  • Experience with CI/CD pipelines, GitOps, and secure deployment practices.
  • Programming/scripting skills in Python, Go, or Bash.
  • Strong understanding of SRE principles: SLAs, SLIs, SLOs, error budgets, and incident management.

Preferred Qualifications:

  • Experience building AI-enabled runbooks or copilots.
  • Exposure to FinOps or cost-optimization strategies in cloud environments.
  • Knowledge of distributed tracing and event correlation using OpenTelemetry.
  • Familiarity with Kafka, Pub/Sub, or other messaging systems for observability data.
