What Jobs are available for Reliability Engineer in Bahrain?

Showing 663 Reliability Engineer jobs in Bahrain

Site Reliability Engineer

BHD90000 - BHD120000 Annually Penny Software

Posted today

Job Description

An SRE is responsible for keeping all user-facing and internally used services running smoothly. An SRE is a blend of software engineer and systems administrator who applies infrastructure knowledge to improve both the team and the quality of the product.

A person in this position will know and specialize in the systems that keep the company afloat, making sure that their availability, reliability and scalability are in peak condition.

Job Expectations

Triage and handle node health issues during working hours
Participate in firefighting alongside development engineers
Own the design, execution, and support of the product's deployment topology through infrastructure as code
Own and maintain the distribution, scaling, metrics collection, and monitoring of multiple clusters
As a stakeholder, support engineers in defining resourcing for the services they are building
Own the running of our CI/CD systems and work with the Testing Engineers to create a well-tested product
Improve and own operational processes
Know and focus on the security of the topologies that we have running in production
Plan the growth of the infrastructure based on business needs and inputs

Required Skills

Kubernetes, Docker, and Helm
Very comfortable operating in Linux, including knowledge of Bash
Cloud hosting platform (ideally GCP, but AWS or Azure are OK)
Able to write code in Python
Experience deploying and maintaining modern CI/CD systems (Zuul, CircleCI, Concourse, etc.)
Knowledge of and passion for infrastructure as code

Job Type: Full-time

Senior Reliability Engineer

56789 Manama, Capital BHD80000 Annually WhatJobs

Posted 15 days ago

Job Description

full-time

Our client is seeking a highly skilled Senior Reliability Engineer to join their dynamic team supporting operations in the Manama, Capital, BH region. This role is instrumental in ensuring the optimal performance, availability, and longevity of industrial equipment and systems. You will be responsible for developing and implementing comprehensive reliability programs, including Failure Mode and Effects Analysis (FMEA), predictive maintenance strategies, and root cause analysis of equipment failures. The successful candidate will analyze operational data to identify trends, predict potential failures, and recommend preventative actions to minimize downtime and optimize maintenance costs. Collaboration with maintenance, operations, and engineering departments is key to success.

Responsibilities include maintaining reliability databases, developing and updating technical documentation, and conducting training for maintenance personnel. You will also be involved in the evaluation of new technologies and equipment to ensure their reliability and maintainability. A strong understanding of mechanical and electrical systems, coupled with expertise in reliability engineering principles and tools, is essential. Proficiency in data analysis software and experience with CMMS (Computerized Maintenance Management Systems) are highly desirable. The ideal candidate will possess a Bachelor's degree in Mechanical Engineering, Electrical Engineering, or a related discipline, with at least 5 years of experience in reliability engineering within an industrial setting. Excellent problem-solving skills, strong analytical abilities, and effective communication skills are crucial for this role. This is a challenging and rewarding opportunity to significantly contribute to operational excellence and asset management.

Senior Site Reliability Engineer

10805 Tyre BHD110000 Annually WhatJobs Direct

Posted today

Job Description

full-time

Our client is seeking a seasoned Senior Site Reliability Engineer (SRE) to join their operations team in **Sanad, Capital, BH**. This critical role focuses on ensuring the reliability, scalability, and performance of our client's complex and high-traffic applications and infrastructure. The ideal candidate will possess a strong background in software engineering, systems administration, and a deep understanding of distributed systems and cloud technologies.

Responsibilities:

Design, build, and maintain infrastructure and systems that are highly available, scalable, and fault-tolerant.
Develop and implement automation tools and processes to streamline operations and reduce manual toil.
Monitor system performance and identify potential issues, proactively addressing them before they impact users.
Implement robust alerting and incident response mechanisms, leading post-mortems to drive continuous improvement.
Collaborate with development teams to ensure that new features and services are designed for reliability and operability.
Perform capacity planning and resource optimization to ensure cost-effectiveness and performance.
Manage cloud infrastructure (e.g., AWS, Azure, GCP) and associated services.
Implement and maintain CI/CD pipelines and infrastructure-as-code solutions.
Contribute to the development of SRE best practices and methodologies within the organization.
Participate in an on-call rotation to respond to production incidents.
Mentor junior SRE team members.

Qualifications:

Bachelor's degree in Computer Science, Engineering, or a related field, or equivalent practical experience.
5+ years of experience in SRE, DevOps, or a similar role focused on reliability and operations.
Strong proficiency in at least one scripting or programming language (e.g., Python, Go, Bash).
Extensive experience with cloud platforms (AWS, Azure, or GCP) and containerization technologies (Docker, Kubernetes).
Solid understanding of networking concepts, operating systems (Linux), and distributed systems.
Experience with monitoring and logging tools (e.g., Prometheus, Grafana, ELK stack).
Proficiency in infrastructure-as-code tools (e.g., Terraform, Ansible).
Excellent troubleshooting and problem-solving skills.
Strong communication and collaboration abilities.
Experience with database administration and performance tuning is a plus.

Site Reliability Engineer (SRE)

600 Southern, Southern BHD100000 Annually WhatJobs

Posted 2 days ago

Job Description

full-time

Our client is seeking a dedicated and experienced Site Reliability Engineer (SRE) to join their team in Nuwaidrat, Southern, BH. In this critical role, you will be responsible for ensuring the reliability, scalability, and performance of our production systems and services. You will work closely with development and operations teams to implement and automate infrastructure, define SLOs (Service Level Objectives), and manage incident response. The ideal candidate possesses a strong background in system administration, software development, and a deep understanding of distributed systems and cloud-native architectures. You will focus on proactively identifying and mitigating potential issues, automating operational tasks, and driving improvements in system resilience. This position involves hands-on work with monitoring tools, alerting systems, and deployment pipelines. You will be a key player in capacity planning, performance tuning, and disaster recovery strategies.

A strong command of scripting and programming languages is essential for developing automation tools and solutions. We are looking for an individual who is passionate about system stability, possesses excellent troubleshooting skills, and thrives in a collaborative environment. You will contribute to defining best practices for reliability engineering and champion a culture of operational excellence. Experience with cloud platforms such as AWS, Azure, or GCP is highly preferred. You will be involved in designing and implementing resilient infrastructure that can withstand failures and scale seamlessly with demand. Understanding of network protocols, security best practices, and database management within a production environment is also crucial.

Responsibilities:

Design, build, and maintain highly reliable and scalable production systems.
Implement and manage monitoring, alerting, and logging solutions.
Define and track Service Level Objectives (SLOs) and Service Level Indicators (SLIs).
Automate operational tasks through scripting and development.
Lead incident response and post-mortem analysis to prevent recurrence.
Conduct capacity planning and performance tuning of systems.
Collaborate with development teams to ensure the operability of new features.
Implement and maintain CI/CD pipelines for reliable deployments.
Develop and execute disaster recovery plans.
Contribute to infrastructure security and compliance efforts.

Qualifications:

Bachelor's degree in Computer Science, Engineering, or a related field, or equivalent experience.
5+ years of experience in SRE, Systems Engineering, or Software Engineering with a focus on reliability.
Proficiency in at least one programming language (e.g., Python, Go, Java).
Experience with cloud platforms (AWS, Azure, GCP).
Familiarity with containerization and orchestration technologies (Docker, Kubernetes).
Strong understanding of Linux/Unix systems.
Experience with monitoring tools (e.g., Prometheus, Grafana, Datadog).
Knowledge of networking, databases, and distributed systems.
Excellent problem-solving and debugging skills.
Ability to work effectively in a team environment.

Senior Site Reliability Engineer

4040 Seef, Capital BHD95000 Annually WhatJobs

Posted 9 days ago

Job Description

full-time

Our client is looking for a seasoned Senior Site Reliability Engineer (SRE) to enhance the availability, performance, and scalability of their critical infrastructure. This role offers a hybrid work arrangement, combining the benefits of remote flexibility with essential in-office collaboration. You will be responsible for designing, building, and automating robust systems that ensure the reliability of our client's services. This includes developing and maintaining monitoring systems, implementing effective incident response protocols, and proactively identifying and mitigating potential risks to system stability.

The ideal candidate will possess a deep understanding of distributed systems, cloud computing platforms (e.g., AWS, Azure, GCP), and containerization technologies (e.g., Docker, Kubernetes). You should have a strong background in scripting and automation (e.g., Python, Go, Bash) and a proven ability to troubleshoot complex production issues. Experience with CI/CD pipelines, infrastructure as code (e.g., Terraform, Ansible), and performance tuning is highly valued. You will work closely with development and operations teams to embed reliability best practices into the software development lifecycle. Excellent communication and problem-solving skills are essential, as is the ability to work effectively in both remote and on-site settings. Your contributions will be vital in maintaining high standards of service uptime and performance for our client's users.

Key Responsibilities:

Design, implement, and maintain scalable and reliable cloud infrastructure.
Develop automation tools and scripts to streamline operations and deployments.
Build and manage robust monitoring, alerting, and logging systems.
Lead incident response efforts, conduct post-mortems, and implement preventative measures.
Collaborate with development teams to improve system design and performance.
Manage and optimize container orchestration platforms like Kubernetes.
Implement and maintain Infrastructure as Code (IaC) solutions.
Perform performance tuning and capacity planning.
Ensure security best practices are integrated into all aspects of infrastructure management.

Qualifications:

Bachelor's degree in Computer Science, Engineering, or a related field, or equivalent practical experience.
5+ years of experience in Site Reliability Engineering, DevOps, or a similar role.
Strong proficiency in at least one scripting language (Python, Go, Bash).
Extensive experience with cloud platforms (AWS, Azure, GCP).
Deep understanding of containerization technologies (Docker, Kubernetes).
Experience with CI/CD tools and practices.
Familiarity with Infrastructure as Code tools (Terraform, Ansible).
Excellent troubleshooting and problem-solving skills.
Strong communication and collaboration abilities, suitable for a hybrid work environment.

This role is based in Seef, Capital, BH, with a hybrid work model.

Senior Site Reliability Engineer

BH26003 Amwaj Islands BHD140000 Annually WhatJobs

Posted 15 days ago

Job Description

full-time

Our client is seeking a highly skilled and experienced Senior Site Reliability Engineer (SRE) to join their robust infrastructure team. This hybrid role offers a blend of remote flexibility and essential in-office collaboration at our **Shakhura, Northern, BH** location. The Senior SRE will be responsible for ensuring the availability, performance, scalability, and reliability of our production systems and services. You will play a critical role in automating and streamlining our operations, from deployment to monitoring and incident response. This position requires a deep understanding of distributed systems, cloud infrastructure, and modern software development practices.

The ideal candidate will have extensive experience with cloud platforms (AWS, Azure, GCP), containerization technologies (Docker, Kubernetes), and infrastructure-as-code tools (Terraform, Ansible). You should be proficient in scripting languages such as Python, Go, or Bash, and have a strong background in system administration and networking. Responsibilities include designing and implementing robust monitoring and alerting systems, developing automation tools to reduce manual operational effort, participating in on-call rotations, and leading incident post-mortems to identify root causes and implement preventative measures. Collaboration with development teams to ensure production readiness of new features and services is a key aspect of this role. A Bachelor's degree in Computer Science, Engineering, or a related field is required, along with a minimum of 5 years of experience in SRE, DevOps, or a similar role. Strong problem-solving skills, excellent communication abilities, and a proactive approach to system resilience are essential. Experience with CI/CD pipelines and application performance monitoring (APM) tools is highly desirable.

Key Responsibilities:

Design, build, and maintain scalable and reliable production systems.
Implement and manage monitoring, alerting, and logging solutions.
Automate infrastructure provisioning and configuration management using IaC tools.
Develop and maintain CI/CD pipelines for efficient software deployment.
Respond to and resolve production incidents, leading post-mortems.
Collaborate with development teams to ensure system reliability and performance.
Perform capacity planning and performance tuning of distributed systems.
Manage and scale container orchestration platforms (e.g., Kubernetes).
Develop and maintain system documentation and runbooks.
Participate in an on-call rotation schedule.

Qualifications:

Bachelor's degree in Computer Science, Engineering, or a related field.
5+ years of experience in Site Reliability Engineering, DevOps, or System Administration.
Proficiency with cloud platforms (AWS, Azure, GCP).
Strong experience with containerization (Docker) and orchestration (Kubernetes).
Expertise in infrastructure-as-code tools (Terraform, Ansible).
Proficient scripting skills (Python, Go, Bash).
Solid understanding of networking concepts and protocols.
Experience with monitoring tools (Prometheus, Grafana, Datadog).
Excellent troubleshooting and problem-solving abilities.
Strong communication and collaboration skills.

This hybrid role offers a competitive salary, comprehensive benefits, and the opportunity to work on challenging problems in a collaborative environment.

Remote Site Reliability Engineer

789 Northern, Northern BHD90000 Annually WhatJobs

Posted 16 days ago

Job Description

full-time

Our client is seeking a highly skilled and motivated Remote Site Reliability Engineer to ensure the stability, performance, and scalability of our cutting-edge digital platforms. This role is critical in maintaining our robust infrastructure and delivering seamless user experiences. As a remote-first position, you will collaborate with engineering and operations teams globally, contributing to the design, implementation, and automation of our systems. Your core responsibilities will include monitoring system health, diagnosing and resolving production issues, implementing robust automation solutions for deployment and infrastructure management, and proactively identifying potential bottlenecks or failure points. You will be instrumental in capacity planning, performance tuning, and disaster recovery strategy development and execution.

The ideal candidate possesses deep expertise in cloud computing environments (AWS, Azure, GCP), containerization technologies (Docker, Kubernetes), and infrastructure-as-code tools (Terraform, Ansible). A strong background in scripting languages (Python, Bash) and familiarity with monitoring and alerting tools (Prometheus, Grafana, ELK stack) are essential. You will thrive in an environment that values collaboration, innovation, and a commitment to operational excellence. This role requires exceptional problem-solving skills, a proactive approach to identifying and mitigating risks, and the ability to work independently with minimal supervision. You will contribute to post-mortems, document best practices, and evangelize SRE principles across the organization.

We are looking for a candidate passionate about building and maintaining highly available and fault-tolerant systems. This is a unique opportunity to shape the future of our infrastructure and impact millions of users worldwide, all from the comfort of your chosen remote location. If you are dedicated to reliability, automation, and continuous improvement in complex distributed systems, we want to hear from you.

Key Responsibilities:

Design, build, and maintain scalable and reliable infrastructure on cloud platforms.
Develop and implement automation for deployment, scaling, and operational tasks.
Monitor system performance, availability, and capacity, and respond to incidents.
Diagnose and resolve complex production issues across distributed systems.
Implement and manage CI/CD pipelines and infrastructure-as-code solutions.
Conduct root cause analysis for incidents and implement preventative measures.
Contribute to disaster recovery planning and testing.
Collaborate with development teams to ensure the reliability and operability of new features.
Document system architecture, operational procedures, and best practices.

Qualifications:

Proven experience in Site Reliability Engineering or a similar role.
Strong proficiency with cloud platforms (AWS, Azure, or GCP).
Expertise in containerization (Docker, Kubernetes).
Proficiency in infrastructure-as-code tools (Terraform, Ansible).
Strong scripting skills (Python, Bash, Go).
Experience with monitoring and logging tools (e.g., Prometheus, Grafana, ELK Stack).
Deep understanding of networking protocols and distributed systems.
Excellent troubleshooting and problem-solving abilities.
Ability to work effectively in a remote, collaborative environment.

Senior Site Reliability Engineer

55555 Durrat Al Bahrain BHD110000 Annually WhatJobs

Posted 17 days ago

Job Description

full-time

Our client is seeking a highly skilled and experienced Senior Site Reliability Engineer (SRE) to join their innovative engineering team. This role is critical in ensuring the availability, performance, scalability, and efficiency of our client's critical systems and infrastructure. As a remote-first position, you will collaborate with globally distributed engineering teams, championing best practices in reliability, automation, and incident management. You will play a key role in designing, building, and maintaining robust and scalable systems that underpin our client's digital services.

Responsibilities:

Design, build, and maintain highly available and scalable systems using infrastructure as code principles.
Develop and implement automation for deployment, monitoring, and operational tasks.
Proactively identify and resolve performance bottlenecks and system issues.
Lead incident response efforts, conduct root cause analyses, and implement preventative measures.
Define and track Service Level Objectives (SLOs) and Service Level Indicators (SLIs).
Collaborate with software engineering teams to improve system design for reliability and operability.
Contribute to the development and maintenance of CI/CD pipelines.
Mentor junior engineers and share expertise on SRE best practices.
Participate in on-call rotation for production incident management.
Stay current with emerging technologies and industry trends in site reliability and cloud infrastructure.

Qualifications:

Bachelor's degree in Computer Science, Engineering, or a related technical field, or equivalent practical experience.
Minimum of 5 years of experience in Site Reliability Engineering, DevOps, or Systems Engineering.
Strong proficiency in at least one programming or scripting language (e.g., Python, Go, Bash).
Extensive experience with cloud platforms (AWS, Azure, GCP) and containerization technologies (Docker, Kubernetes).
Deep understanding of distributed systems, microservices architectures, and network protocols.
Experience with monitoring and logging tools (e.g., Prometheus, Grafana, ELK stack).
Proven experience with infrastructure as code tools (e.g., Terraform, Ansible).
Excellent problem-solving and debugging skills.
Strong communication and collaboration skills, essential for remote team dynamics.
Experience working in a remote-first or distributed team environment is highly preferred.

This is a pivotal role for an experienced SRE looking to make a significant impact on a global scale. If you are passionate about building resilient systems and thrive in a collaborative remote setting, apply now to join our client's cutting-edge team. Work remotely from anywhere, supporting critical operations for **Isa Town, Southern, BH**.

Senior Site Reliability Engineer

600 Northern, Northern BHD100000 Annually WhatJobs

Posted 24 days ago

Job Description

full-time

Our client, a leader in innovative construction technology, is seeking a highly motivated and experienced Senior Site Reliability Engineer to join their dynamic, fully remote team. This pivotal role will be instrumental in ensuring the scalability, reliability, and performance of our cutting-edge digital construction platforms. You will be responsible for designing, implementing, and maintaining the infrastructure that supports our complex project management and simulation software. This includes developing automated deployment pipelines, proactive monitoring solutions, and robust disaster recovery strategies.

Key Responsibilities:

Design, build, and maintain the infrastructure supporting our cloud-based construction management systems, focusing on high availability and fault tolerance.
Develop and implement CI/CD pipelines to automate application deployments, testing, and configuration management across various environments.
Implement comprehensive monitoring, logging, and alerting systems to proactively identify and resolve performance bottlenecks and system failures.
Create and maintain detailed documentation for infrastructure, operational procedures, and incident response plans.
Collaborate with software development teams to ensure new features and services are designed for reliability, scalability, and maintainability.
Participate in on-call rotation to provide timely resolution of critical production issues.
Conduct performance tuning and capacity planning to ensure optimal resource utilization and cost-effectiveness.
Drive the adoption of best practices in site reliability engineering, including chaos engineering and security hardening.
Mentor junior engineers and contribute to a culture of continuous improvement and knowledge sharing.
Stay abreast of the latest trends and technologies in cloud infrastructure, DevOps, and SRE.

Qualifications:

Bachelor's degree in Computer Science, Engineering, or a related field, or equivalent practical experience.
5+ years of experience in Site Reliability Engineering, DevOps, or a similar infrastructure-focused role.
Proven expertise in managing and scaling cloud platforms such as AWS, Azure, or GCP.
Strong proficiency in scripting languages (e.g., Python, Bash) and infrastructure-as-code tools (e.g., Terraform, Ansible).
Experience with containerization technologies (Docker, Kubernetes).
Solid understanding of networking principles, operating systems (Linux), and database technologies.
Excellent problem-solving and analytical skills, with a strong commitment to system stability.
Exceptional communication and collaboration skills, particularly in a remote work environment.
Experience with monitoring tools like Prometheus, Grafana, ELK Stack.
Knowledge of CI/CD tools (e.g., Jenkins, GitLab CI).

This is a fully remote position, offering the flexibility to work from anywhere. The ideal candidate will be a proactive self-starter with a passion for building resilient systems in the fast-paced world of construction technology. Join us in shaping the future of construction.

Senior Site Reliability Engineer

70205 Isa Town, Northern BHD4000 Monthly WhatJobs

Posted 26 days ago

Job Description

full-time

Our client is seeking a highly experienced and proactive Senior Site Reliability Engineer (SRE) to join their dynamic technology team in **Isa Town, Southern, BH**. This role is pivotal in ensuring the scalability, reliability, and performance of our critical production systems and services. The ideal candidate possesses a deep understanding of distributed systems, infrastructure automation, and best practices in site reliability engineering. You will be instrumental in building and maintaining robust, high-availability systems, proactively identifying and mitigating potential issues, and driving operational excellence.

Responsibilities:

Design, build, and maintain infrastructure that is scalable, reliable, and secure.
Develop and implement automation tools and scripts for infrastructure provisioning, deployment, and management.
Monitor system performance, availability, and latency, proactively identifying and resolving issues.
Participate in on-call rotations to respond to production incidents and emergencies.
Conduct root cause analysis for production incidents and implement preventative measures.
Collaborate with software development teams to improve the reliability and operability of applications.
Define and track Service Level Objectives (SLOs) and Service Level Indicators (SLIs).
Implement and manage CI/CD pipelines to streamline software delivery.
Develop and maintain comprehensive documentation for systems, processes, and incident response.
Contribute to capacity planning and performance tuning of production systems.
Evaluate and recommend new technologies and tools to enhance site reliability.
Mentor junior engineers and foster a culture of reliability and continuous improvement.

Qualifications:

Bachelor's degree in Computer Science, Engineering, or a related field.
Minimum of 5 years of experience in Site Reliability Engineering, DevOps, or systems engineering.
Strong proficiency in at least one scripting language (e.g., Python, Bash, Go).
Extensive experience with cloud platforms (AWS, Azure, GCP) and containerization technologies (Docker, Kubernetes).
Deep understanding of operating systems (Linux/Unix), networking protocols, and distributed systems.
Experience with infrastructure-as-code tools (e.g., Terraform, Ansible).
Proven ability to troubleshoot complex production issues in a high-pressure environment.
Experience with monitoring and logging tools (e.g., Prometheus, Grafana, ELK stack).
Excellent problem-solving, analytical, and communication skills.
Familiarity with CI/CD principles and tools.
A strong understanding of software development practices.

This is an excellent opportunity for a seasoned SRE to make a significant impact on a leading technology platform.
