What You'll Do
Design and implement solutions that enhance application reliability, performance, scalability, and resilience.
Build and maintain monitoring, alerting, observability, and telemetry to drive proactive detection and incident analysis.
Lead incident management efforts, perform root cause analysis, and implement actionable improvements.
Implement operational workflows using scripting, IaC, and configuration management tools.
Manage capacity, performance, and scaling solutions to forecast demand and optimize infrastructure.
Collaborate with engineering teams to embed operability, resilience, and security into application architectures.
Build and automate reliable deployments through CI/CD pipelines, release governance, and version control systems.
Maintain clear runbooks, architecture diagrams, and operational documentation that enable efficient production support.
Experience Required
Managing Kubernetes and containerized workloads (EKS, AKS, GKE), including scaling, networking, upgrades, and monitoring.
Experience with cloud platforms (AWS, Azure, or Google Cloud Platform) across compute, storage, networking, IAM, and cost governance.
Using observability and APM tools such as Dynatrace, Splunk, Prometheus, Grafana, Datadog, Elastic/ELK.
Strengthening security and compliance controls in regulated environments (e.g., PCI DSS, SOC 2), including secure management of workloads.
Infrastructure automation experience using Terraform, CloudFormation, Ansible, or similar tools.
Designing and maintaining CI/CD pipelines using Jenkins, GitLab CI, GitHub Actions, or Azure DevOps.
Scripting and automation using Bash, PowerShell, or Python.
Background in electricity, engineering, or military-related environments (preferred).
Good to Have
Certifications such as AWS SysOps Administrator, AWS DevOps Engineer, Google Cloud DevOps Engineer, or CKA.
Experience with legacy applications, IBM iSeries, and/or library systems.
Hands-on database operations and performance tuning (Oracle, SQL Server, PostgreSQL).
Prior experience as a major incident commander, stakeholder communicator, or ops lead/coordinator.
Experience with ITIL and ServiceNow (change, incident, and configuration management).