What jobs are available for Reliability Engineers in Bahrain?
Showing 663 Reliability Engineer jobs in Bahrain
Site Reliability Engineer
Posted today
Job Description:
An SRE is responsible for keeping all user-facing and internally used services running smoothly. An SRE is a blend of software engineer and systems administrator who applies infrastructure knowledge to improve both the team and the quality of the product.
A person in this position will know and specialize in the systems that keep the company afloat, making sure that their availability, reliability and scalability are in peak condition.
Job Expectations
- Triage and handle node health issues during working hours
- Participate in firefighting alongside development engineers
- Own the design, execution, and support of the product's deployment topology through infrastructure as code
- Own and maintain the distribution, scaling, metrics collection, and monitoring of multiple clusters
- As a stakeholder, support engineers in defining the resourcing for the services they are building
- Own the running of our CI/CD systems and work with the Testing Engineers to create a well-tested product
- Improve and own operational processes
- Know and focus on the security of the topologies running in production
- Plan the growth of the infrastructure based on business needs and inputs
Required Skills
- Kubernetes, Docker, and Helm
- Very comfortable operating in Linux, including knowledge of Bash
- Experience with a cloud hosting platform (ideally GCP, but AWS or Azure is OK)
- Able to write code in Python
- Experience deploying and maintaining modern CI/CD systems (Zuul, CircleCI, Concourse, etc.)
- Knowledge of and passion for infrastructure as code
Job Type: Full-time
Senior Reliability Engineer
Posted 15 days ago
Job Description
Responsibilities include maintaining reliability databases, developing and updating technical documentation, and conducting training for maintenance personnel. You will also be involved in the evaluation of new technologies and equipment to ensure their reliability and maintainability. A strong understanding of mechanical and electrical systems, coupled with expertise in reliability engineering principles and tools, is essential. Proficiency in data analysis software and experience with CMMS (Computerized Maintenance Management Systems) are highly desirable. The ideal candidate will possess a Bachelor's degree in Mechanical Engineering, Electrical Engineering, or a related discipline, with at least 5 years of experience in reliability engineering within an industrial setting. Excellent problem-solving skills, strong analytical abilities, and effective communication skills are crucial for this role. This is a challenging and rewarding opportunity to significantly contribute to operational excellence and asset management.
Senior Site Reliability Engineer
Posted today
Job Description
Responsibilities:
- Design, build, and maintain infrastructure and systems that are highly available, scalable, and fault-tolerant.
- Develop and implement automation tools and processes to streamline operations and reduce manual toil.
- Monitor system performance and identify potential issues, proactively addressing them before they impact users.
- Implement robust alerting and incident response mechanisms, leading post-mortems to drive continuous improvement.
- Collaborate with development teams to ensure that new features and services are designed for reliability and operability.
- Perform capacity planning and resource optimization to ensure cost-effectiveness and performance.
- Manage cloud infrastructure (e.g., AWS, Azure, GCP) and associated services.
- Implement and maintain CI/CD pipelines and infrastructure-as-code solutions.
- Contribute to the development of SRE best practices and methodologies within the organization.
- Participate in an on-call rotation to respond to production incidents.
- Mentor junior SRE team members.
Qualifications:
- Bachelor's degree in Computer Science, Engineering, or a related field, or equivalent practical experience.
- 5+ years of experience in SRE, DevOps, or a similar role focused on reliability and operations.
- Strong proficiency in at least one scripting or programming language (e.g., Python, Go, Bash).
- Extensive experience with cloud platforms (AWS, Azure, or GCP) and containerization technologies (Docker, Kubernetes).
- Solid understanding of networking concepts, operating systems (Linux), and distributed systems.
- Experience with monitoring and logging tools (e.g., Prometheus, Grafana, ELK stack).
- Proficiency in infrastructure-as-code tools (e.g., Terraform, Ansible).
- Excellent troubleshooting and problem-solving skills.
- Strong communication and collaboration abilities.
- Experience with database administration and performance tuning is a plus.
Site Reliability Engineer (SRE)
Posted 2 days ago
Job Description
Responsibilities:
- Design, build, and maintain highly reliable and scalable production systems.
- Implement and manage monitoring, alerting, and logging solutions.
- Define and track Service Level Objectives (SLOs) and Service Level Indicators (SLIs).
- Automate operational tasks through scripting and development.
- Lead incident response and post-mortem analysis to prevent recurrence.
- Conduct capacity planning and performance tuning of systems.
- Collaborate with development teams to ensure the operability of new features.
- Implement and maintain CI/CD pipelines for reliable deployments.
- Develop and execute disaster recovery plans.
- Contribute to infrastructure security and compliance efforts.
Qualifications:
- Bachelor's degree in Computer Science, Engineering, or a related field, or equivalent experience.
- 5+ years of experience in SRE, Systems Engineering, or Software Engineering with a focus on reliability.
- Proficiency in at least one programming language (e.g., Python, Go, Java).
- Experience with cloud platforms (AWS, Azure, GCP).
- Familiarity with containerization and orchestration technologies (Docker, Kubernetes).
- Strong understanding of Linux/Unix systems.
- Experience with monitoring tools (e.g., Prometheus, Grafana, Datadog).
- Knowledge of networking, databases, and distributed systems.
- Excellent problem-solving and debugging skills.
- Ability to work effectively in a team environment.
Senior Site Reliability Engineer
Posted 9 days ago
Job Description
The ideal candidate will possess a deep understanding of distributed systems, cloud computing platforms (e.g., AWS, Azure, GCP), and containerization technologies (e.g., Docker, Kubernetes). You should have a strong background in scripting and automation (e.g., Python, Go, Bash) and a proven ability to troubleshoot complex production issues. Experience with CI/CD pipelines, infrastructure as code (e.g., Terraform, Ansible), and performance tuning is highly valued. You will work closely with development and operations teams to embed reliability best practices into the software development lifecycle. Excellent communication and problem-solving skills are essential, as is the ability to work effectively in both remote and on-site settings. Your contributions will be vital in maintaining high standards of service uptime and performance for our client's users.
Key Responsibilities:
- Design, implement, and maintain scalable and reliable cloud infrastructure.
- Develop automation tools and scripts to streamline operations and deployments.
- Build and manage robust monitoring, alerting, and logging systems.
- Lead incident response efforts, conduct post-mortems, and implement preventative measures.
- Collaborate with development teams to improve system design and performance.
- Manage and optimize container orchestration platforms like Kubernetes.
- Implement and maintain Infrastructure as Code (IaC) solutions.
- Perform performance tuning and capacity planning.
- Ensure security best practices are integrated into all aspects of infrastructure management.
Qualifications:
- Bachelor's degree in Computer Science, Engineering, or a related field, or equivalent practical experience.
- 5+ years of experience in Site Reliability Engineering, DevOps, or a similar role.
- Strong proficiency in at least one scripting language (Python, Go, Bash).
- Extensive experience with cloud platforms (AWS, Azure, GCP).
- Deep understanding of containerization technologies (Docker, Kubernetes).
- Experience with CI/CD tools and practices.
- Familiarity with Infrastructure as Code tools (Terraform, Ansible).
- Excellent troubleshooting and problem-solving skills.
- Strong communication and collaboration abilities, suitable for a hybrid work environment.
Senior Site Reliability Engineer
Posted 15 days ago
Job Description
The ideal candidate will have extensive experience with cloud platforms (AWS, Azure, GCP), containerization technologies (Docker, Kubernetes), and infrastructure-as-code tools (Terraform, Ansible). You should be proficient in scripting languages such as Python, Go, or Bash, and have a strong background in system administration and networking. Responsibilities include designing and implementing robust monitoring and alerting systems, developing automation tools to reduce manual operational effort, participating in on-call rotations, and leading incident post-mortems to identify root causes and implement preventative measures. Collaboration with development teams to ensure production readiness of new features and services is a key aspect of this role. A Bachelor's degree in Computer Science, Engineering, or a related field is required, along with a minimum of 5 years of experience in SRE, DevOps, or a similar role. Strong problem-solving skills, excellent communication abilities, and a proactive approach to system resilience are essential. Experience with CI/CD pipelines and application performance monitoring (APM) tools is highly desirable.
Key Responsibilities:
- Design, build, and maintain scalable and reliable production systems.
- Implement and manage monitoring, alerting, and logging solutions.
- Automate infrastructure provisioning and configuration management using IaC tools.
- Develop and maintain CI/CD pipelines for efficient software deployment.
- Respond to and resolve production incidents, leading post-mortems.
- Collaborate with development teams to ensure system reliability and performance.
- Perform capacity planning and performance tuning of distributed systems.
- Manage and scale container orchestration platforms (e.g., Kubernetes).
- Develop and maintain system documentation and runbooks.
- Participate in an on-call rotation schedule.
Qualifications:
- Bachelor's degree in Computer Science, Engineering, or a related field.
- 5+ years of experience in Site Reliability Engineering, DevOps, or System Administration.
- Proficiency with cloud platforms (AWS, Azure, GCP).
- Strong experience with containerization (Docker) and orchestration (Kubernetes).
- Expertise in infrastructure-as-code tools (Terraform, Ansible).
- Proficient scripting skills (Python, Go, Bash).
- Solid understanding of networking concepts and protocols.
- Experience with monitoring tools (Prometheus, Grafana, Datadog).
- Excellent troubleshooting and problem-solving abilities.
- Strong communication and collaboration skills.
Remote Site Reliability Engineer
Posted 16 days ago
Job Description
Key Responsibilities:
- Design, build, and maintain scalable and reliable infrastructure on cloud platforms.
- Develop and implement automation for deployment, scaling, and operational tasks.
- Monitor system performance, availability, and capacity, and respond to incidents.
- Diagnose and resolve complex production issues across distributed systems.
- Implement and manage CI/CD pipelines and infrastructure-as-code solutions.
- Conduct root cause analysis for incidents and implement preventative measures.
- Contribute to disaster recovery planning and testing.
- Collaborate with development teams to ensure the reliability and operability of new features.
- Document system architecture, operational procedures, and best practices.
Qualifications:
- Proven experience in Site Reliability Engineering or a similar role.
- Strong proficiency with cloud platforms (AWS, Azure, or GCP).
- Expertise in containerization (Docker, Kubernetes).
- Proficiency in infrastructure-as-code tools (Terraform, Ansible).
- Strong scripting skills (Python, Bash, Go).
- Experience with monitoring and logging tools (e.g., Prometheus, Grafana, ELK Stack).
- Deep understanding of networking protocols and distributed systems.
- Excellent troubleshooting and problem-solving abilities.
- Ability to work effectively in a remote, collaborative environment.
Senior Site Reliability Engineer
Posted 17 days ago
Job Description
Responsibilities:
- Design, build, and maintain highly available and scalable systems using infrastructure as code principles.
- Develop and implement automation for deployment, monitoring, and operational tasks.
- Proactively identify and resolve performance bottlenecks and system issues.
- Lead incident response efforts, conduct root cause analyses, and implement preventative measures.
- Define and track Service Level Objectives (SLOs) and Service Level Indicators (SLIs).
- Collaborate with software engineering teams to improve system design for reliability and operability.
- Contribute to the development and maintenance of CI/CD pipelines.
- Mentor junior engineers and share expertise on SRE best practices.
- Participate in on-call rotation for production incident management.
- Stay current with emerging technologies and industry trends in site reliability and cloud infrastructure.
Qualifications:
- Bachelor's degree in Computer Science, Engineering, or a related technical field, or equivalent practical experience.
- Minimum of 5 years of experience in Site Reliability Engineering, DevOps, or Systems Engineering.
- Strong proficiency in at least one programming or scripting language (e.g., Python, Go, Bash).
- Extensive experience with cloud platforms (AWS, Azure, GCP) and containerization technologies (Docker, Kubernetes).
- Deep understanding of distributed systems, microservices architectures, and network protocols.
- Experience with monitoring and logging tools (e.g., Prometheus, Grafana, ELK stack).
- Proven experience with infrastructure as code tools (e.g., Terraform, Ansible).
- Excellent problem-solving and debugging skills.
- Strong communication and collaboration skills, essential for remote team dynamics.
- Experience working in a remote-first or distributed team environment is highly preferred.
Senior Site Reliability Engineer
Posted 24 days ago
Job Description
Key Responsibilities:
- Design, build, and maintain the infrastructure supporting our cloud-based construction management systems, focusing on high availability and fault tolerance.
- Develop and implement CI/CD pipelines to automate application deployments, testing, and configuration management across various environments.
- Implement comprehensive monitoring, logging, and alerting systems to proactively identify and resolve performance bottlenecks and system failures.
- Create and maintain detailed documentation for infrastructure, operational procedures, and incident response plans.
- Collaborate with software development teams to ensure new features and services are designed for reliability, scalability, and maintainability.
- Participate in on-call rotation to provide timely resolution of critical production issues.
- Conduct performance tuning and capacity planning to ensure optimal resource utilization and cost-effectiveness.
- Drive the adoption of best practices in site reliability engineering, including chaos engineering and security hardening.
- Mentor junior engineers and contribute to a culture of continuous improvement and knowledge sharing.
- Stay abreast of the latest trends and technologies in cloud infrastructure, DevOps, and SRE.
Qualifications:
- Bachelor's degree in Computer Science, Engineering, or a related field, or equivalent practical experience.
- 5+ years of experience in Site Reliability Engineering, DevOps, or a similar infrastructure-focused role.
- Proven expertise in managing and scaling cloud platforms such as AWS, Azure, or GCP.
- Strong proficiency in scripting languages (e.g., Python, Bash) and infrastructure-as-code tools (e.g., Terraform, Ansible).
- Experience with containerization technologies (Docker, Kubernetes).
- Solid understanding of networking principles, operating systems (Linux), and database technologies.
- Excellent problem-solving and analytical skills, with a strong commitment to system stability.
- Exceptional communication and collaboration skills, particularly in a remote work environment.
- Experience with monitoring tools like Prometheus, Grafana, ELK Stack.
- Knowledge of CI/CD tools (e.g., Jenkins, GitLab CI).
This is a fully remote position, offering the flexibility to work from anywhere. The ideal candidate will be a proactive self-starter with a passion for building resilient systems in the fast-paced world of construction technology. Join us in shaping the future of construction.
Senior Site Reliability Engineer
Posted 26 days ago
Job Description
Responsibilities:
- Design, build, and maintain infrastructure that is scalable, reliable, and secure.
- Develop and implement automation tools and scripts for infrastructure provisioning, deployment, and management.
- Monitor system performance, availability, and latency, proactively identifying and resolving issues.
- Participate in on-call rotations to respond to production incidents and emergencies.
- Conduct root cause analysis for production incidents and implement preventative measures.
- Collaborate with software development teams to improve the reliability and operability of applications.
- Define and track Service Level Objectives (SLOs) and Service Level Indicators (SLIs).
- Implement and manage CI/CD pipelines to streamline software delivery.
- Develop and maintain comprehensive documentation for systems, processes, and incident response.
- Contribute to capacity planning and performance tuning of production systems.
- Evaluate and recommend new technologies and tools to enhance site reliability.
- Mentor junior engineers and foster a culture of reliability and continuous improvement.
Qualifications:
- Bachelor's degree in Computer Science, Engineering, or a related field.
- Minimum of 5 years of experience in Site Reliability Engineering, DevOps, or systems engineering.
- Strong proficiency in at least one scripting language (e.g., Python, Bash, Go).
- Extensive experience with cloud platforms (AWS, Azure, GCP) and containerization technologies (Docker, Kubernetes).
- Deep understanding of operating systems (Linux/Unix), networking protocols, and distributed systems.
- Experience with infrastructure-as-code tools (e.g., Terraform, Ansible).
- Proven ability to troubleshoot complex production issues in a high-pressure environment.
- Experience with monitoring and logging tools (e.g., Prometheus, Grafana, ELK stack).
- Excellent problem-solving, analytical, and communication skills.
- Familiarity with CI/CD principles and tools.
- A strong understanding of software development practices.