15 Maintenance Reliability jobs in Bahrain

Industrial Equipment Maintenance Engineer

24112 Ghuraifa, Capital | BHD 35 hourly | WhatJobs

Posted today

Job Description

contractor
Our client is seeking an experienced Industrial Equipment Maintenance Engineer for a hybrid role, based out of Jidhafs, Capital, BH. This position involves a mix of on-site inspections and remote diagnostics for a range of industrial machinery and production lines. The successful candidate will be responsible for the upkeep, repair, and troubleshooting of critical equipment to ensure optimal operational efficiency and minimize downtime. You will conduct preventative maintenance, perform detailed inspections, and respond promptly to urgent repair requests.

This role requires a strong mechanical and electrical aptitude, with a thorough understanding of industrial control systems, hydraulics, pneumatics, and automation technology. You will analyze equipment performance data, identify potential failure points, and implement solutions to enhance reliability and longevity. A key aspect of this role will be developing and refining maintenance strategies, including the creation of maintenance schedules and the management of spare parts inventory. You will also collaborate with production teams to identify needs for equipment upgrades or modifications and assist in the implementation of new machinery. The ability to read and interpret complex blueprints, schematics, and technical manuals is essential.

For the remote component, you will leverage diagnostic software and remote monitoring tools to assess equipment health and provide guidance to on-site technicians. Strong analytical and problem-solving skills are paramount, as is the ability to work effectively both independently and as part of a team. Excellent communication skills are required to liaise with various departments, suppliers, and external service providers. This role offers a unique opportunity to work with cutting-edge industrial technology and contribute to the operational success of our client's facilities.

Responsibilities:
  • Perform routine inspections and preventative maintenance on industrial machinery.
  • Diagnose and repair mechanical and electrical faults in complex equipment.
  • Utilize remote monitoring tools for equipment health assessment.
  • Develop and implement effective maintenance plans and schedules.
  • Manage spare parts inventory and ensure availability.
  • Analyze equipment performance data to identify areas for improvement.
  • Collaborate with production and engineering teams on equipment upgrades and new installations.
  • Read and interpret technical drawings, schematics, and manuals.
  • Troubleshoot and resolve urgent equipment breakdowns.
  • Provide technical guidance and support to junior technicians.
Qualifications:
  • Bachelor's degree in Mechanical Engineering, Electrical Engineering, or a related field.
  • Minimum of 5 years of experience in industrial maintenance and engineering.
  • Proficiency in diagnosing and repairing heavy machinery, automation systems, and PLCs.
  • Experience with hydraulics, pneumatics, and electrical control systems.
  • Skilled in reading and understanding technical documentation.
  • Familiarity with remote diagnostic tools and software.
  • Strong analytical and problem-solving capabilities.
  • Excellent communication and teamwork skills.
  • Ability to manage time effectively in a hybrid work environment.
  • Experience in a manufacturing or production setting is preferred.
This role requires regular on-site presence in Jidhafs, Capital, BH, with flexibility for remote work as needed.

Site Reliability Engineer

Greenfix Property Care

Posted 4 days ago

Job Description

workfromhome

Overview

Canonical is a leading provider of open source software and operating systems to the global enterprise and technology markets. Our platform, Ubuntu, is widely used in breakthrough enterprise initiatives such as public cloud, data science, AI, engineering innovation, and IoT. Our customers include the world's leading public cloud and silicon providers, and industry leaders in many sectors. The company is a pioneer of globally distributed collaboration, with 1200+ colleagues in 75+ countries and very few office-based roles. Teams meet two to four times yearly in person, in interesting locations around the world, to align on strategy and execution.

The company is founder-led, profitable, and growing.

We are hiring a Site Reliability Engineer. Our goal is to perfect enterprise infrastructure DevOps practices, raising the bar on what’s possible with automation by embracing a model-driven approach, whether on-premise or on public clouds.

We run hundreds of private cloud, Kubernetes clusters, and applications for customers across both physical and public cloud estates. We identify and address incidents, monitor and observe applications, anticipate potential issues, and enable product refinement to ultimately achieve high-quality standards in our open source portfolio.

The role is a globally remote position.

To succeed in this role, you need to have a strong background in Linux, Python, networking, and knowledge of how clouds work. Your work will encompass the entire stack, from bare-metal networking and kernel up to Kubernetes and open source applications. You can expect to be trained in our core technologies like OpenStack, Kubernetes, security standards, open source products like Kubeflow, Kafka, OpenSearch, databases, and many others. Automation for us is a software engineering problem that we approach with a scientific mindset to bring operations at scale, driven by metrics and code.

The role

We deploy and run OpenStack, Kubernetes, storage solutions, and open source applications, applying DevOps practices. To become a member of our team, you need to be a software engineer fluent in Python, you need a genuine interest in the full open source infrastructure stack from bare metal to containers, and you need the ability to work in operations with mission-critical services for global brand-name customers. As a member of the team, you will gain experience in a broad range of cloud technologies. We evolve our offerings as the state of the art improves, so you get to stay current with the latest capabilities in open source infrastructure.

What We Are Looking For In You
  • Degree in software engineering or computer science
  • Python software development experience
  • Operational experience in Linux environments
  • Experience with Kubernetes deployment or operations
  • Excellent interpersonal skills, curiosity, flexibility, and accountability
  • Ability to travel internationally twice a year, for company events up to two weeks long
Bonus skills
  • Familiarity with OpenStack deployment or operations
  • Familiarity with public cloud deployment or operations
  • Familiarity with private cloud management
What we offer colleagues

We consider geographical location, experience, and performance in shaping compensation worldwide. We adjust compensation every 6 months to ensure we recognize outstanding performance, and in addition to base pay, we offer annual bonuses. We provide all team members with additional benefits, which reflect our values and ideals. We balance our programs to meet local needs and ensure fairness globally.

  • Distributed work environment with twice-yearly team sprints in person
  • Personal learning and development budget of USD 2,000 per year
  • Every 6 months compensation review
  • Recognition rewards
  • Annual holiday leave
  • Maternity and paternity leave
  • Employee Assistance Programs
  • Opportunity to travel to new locations to meet your colleagues
  • Priority Pass and travel upgrades for long-haul company events
About Canonical

Canonical is a pioneering tech firm at the forefront of the global move to open source. As the company that publishes Ubuntu, one of the most important open source projects and the platform for AI, IoT, and the cloud, we are changing the world of software. We recruit on a global basis and set a very high standard for people joining the company. We expect excellence - in order to succeed, we need to be the best at what we do. Most colleagues at Canonical have worked from home since its inception in 2004. Working here is a step into the future, and will challenge you to think differently, work smarter, learn new skills, and raise your game.

Equal opportunity

Canonical is an equal opportunity employer. We are proud to foster a workplace free from discrimination. Diversity of experience, perspectives, and background creates a better work environment and better products. Whatever your identity, we will give your application fair consideration.


Site Reliability Engineer

Canonical

Posted 6 days ago

Job Description

workfromhome

Overview

Canonical is a leading provider of open source software and operating systems to the global enterprise and technology markets. Our platform, Ubuntu, is widely used in breakthrough enterprise initiatives such as public cloud, data science, AI, engineering innovation, and IoT. Our customers include the world's leading public cloud and silicon providers, and industry leaders in many sectors. The company is a pioneer of globally distributed collaboration, with 1200+ colleagues in 75+ countries and very few office-based roles. Teams meet two to four times yearly in person, in interesting locations around the world, to align on strategy and execution.

The company is founder-led, profitable, and growing.

We are hiring a Site Reliability Engineer. Our goal is to perfect enterprise infrastructure DevOps practices, raising the bar on what's possible with automation by embracing a model-driven approach, whether on-premise or on public clouds.

We run hundreds of private cloud, Kubernetes clusters, and applications for customers across both physical and public cloud estates. We identify and address incidents, monitor and observe applications, anticipate potential issues, and enable product refinement to ultimately achieve high-quality standards in our open source portfolio.

To succeed in this role, you need to have a strong background in Linux, Python, networking, and knowledge of how clouds work. Your work will encompass the entire stack, from bare-metal networking and kernel up to Kubernetes and open source applications. You can expect to be trained in our core technologies like OpenStack, Kubernetes, security standards, open source products like Kubeflow, Kafka, OpenSearch, databases, and many others.

Automation for us is a software engineering problem that we approach with a scientific mindset to bring operations at scale, driven by metrics and code.

Location: Globally remote role

The role

We deploy and run OpenStack, Kubernetes, storage solutions, and open source applications, applying DevOps practices.

To become a member of our team, you need to be a software engineer fluent in Python, you need a genuine interest in the full open source infrastructure stack from bare metal to containers, and you need the ability to work in operations with mission-critical services for global brand-name customers.

As a member of the team, you will gain experience in a broad range of cloud technologies. We evolve our offerings as the state of the art improves, so you get to stay current with the latest capabilities in open source infrastructure.

What we are looking for in you

  • Degree in software engineering or computer science
  • Python software development experience
  • Operational experience in Linux environments
  • Experience with Kubernetes deployment or operations
  • Excellent interpersonal skills, curiosity, flexibility, and accountability
  • Ability to travel internationally twice a year, for company events up to two weeks long

Bonus skills

  • Familiarity with OpenStack deployment or operations
  • Familiarity with public cloud deployment or operations
  • Familiarity with private cloud management

What we offer colleagues

We consider geographical location, experience, and performance in shaping compensation worldwide. We adjust compensation every 6 months to ensure we recognize outstanding performance, and in addition to base pay, we offer annual bonuses. We provide all team members with additional benefits, which reflect our values and ideals. We balance our programs to meet local needs and ensure fairness globally.

  • Distributed work environment with twice-yearly team sprints in person
  • Personal learning and development budget of USD 2,000 per year
  • Every 6 months compensation review
  • Recognition rewards
  • Annual holiday leave
  • Maternity and paternity leave
  • Employee Assistance Programs
  • Opportunity to travel to new locations to meet your colleagues
  • Priority Pass and travel upgrades for long-haul company events
About Canonical

Canonical is a pioneering tech firm at the forefront of the global move to open source. As the company that publishes Ubuntu, one of the most important open source projects and the platform for AI, IoT, and the cloud, we are changing the world of software. We recruit on a global basis and set a very high standard for people joining the company. We expect excellence - in order to succeed, we need to be the best at what we do. Most colleagues at Canonical have worked from home since its inception in 2004. Working here is a step into the future, and will challenge you to think differently, work smarter, learn new skills, and raise your game.

Canonical is an equal opportunity employer

We are proud to foster a workplace free from discrimination. Diversity of experience, perspectives, and background creates a better work environment and better products. Whatever your identity, we will give your application fair consideration.


Senior Site Reliability Engineer

Greenfix Property Care

Posted 1 day ago

Job Description

workfromhome

Senior Site Reliability / GitOps Engineer

Canonical is a leading provider of open source software and operating systems to the global enterprise and technology markets. Our platform, Ubuntu, is widely used in enterprise initiatives such as public cloud, data science, AI, engineering innovation and IoT. Our customers include leading public cloud and silicon providers, and industry leaders across sectors. The company is founded on global distributed collaboration, with 1200+ colleagues in 75+ countries and very few office-based roles. Teams meet two to four times yearly in person in interesting locations around the world to align on strategy and execution. The company is founder led, profitable and growing.

We are hiring a Senior Site Reliability Engineer

Next-gen operations at scale, with pure Python infra-as-code, from bare metal to containers and applications. Our goal is to perfect enterprise infrastructure devops. We run hundreds of private cloud, Kubernetes, and application clusters for customers across physical and public cloud estate, and we are raising the bar on automation by embracing a universal operator pattern and model-driven operations.

To succeed in this role you need to believe in automation as a pure software engineering problem, not a hack-it-till-it-works-for-me problem. You need to be interested in the scientific approach to operations at scale, driven by metrics and code, and you need to be able to learn the entire stack, from bare metal networking and kernel up to serverless and open source applications.

Location: Globally remote role

The role entails

Our cloud operations engineers bring Python software-engineering skills and rigour to the operations domain. We practise devsecops from bare metal to application. We architect and run OpenStack, Kubernetes and software defined storage, and we enable devsecops for applications running on that infrastructure too.

To become a member of this team, you need to be a software engineer fluent in Python, you need a genuine interest in the full open source infrastructure stack from metal to containers, and you need the ability to work in a high pressure operations environment with mission-critical services for global brand name customers.

As a member of the team you will gain experience in a broad range of cloud technologies. We evolve our offerings as the state of the art improves, so you get to stay current with the latest capabilities in open source infrastructure. We drive upgrades to keep our customers on the latest, best solutions.

What We Are Looking For In You

  • Degree in Software Engineering or Computer Science
  • Experience with Linux and familiarity with Linux networking and storage
  • Python software development expertise
  • Operational experience
  • Excellent interpersonal skills, curiosity, flexibility, and accountability
  • Ability to travel internationally twice a year, for company events up to two weeks long

Nice-to-have skills

  • Experience with OpenStack or Kubernetes deployment or operations
  • Familiarity with public or private cloud management

What we offer colleagues

We consider geographical location, experience, and performance in shaping compensation worldwide. We revisit compensation annually (and more often for graduates and associates) to ensure we recognise outstanding performance. In addition to base pay, we offer a performance-driven annual bonus or commission. We provide all team members with additional benefits, which reflect our values and ideals. We balance our programs to meet local needs and ensure fairness globally.

  • Distributed work environment with twice-yearly team sprints in person
  • Personal learning and development budget of USD 2,000 per year
  • Annual compensation review
  • Recognition rewards
  • Annual holiday leave
  • Maternity and paternity leave
  • Employee Assistance Programme
  • Opportunity to travel to new locations to meet colleagues
  • Priority Pass, and travel upgrades for long haul company events

About Canonical

Canonical is a pioneering tech firm at the forefront of the global move to open source. As the company that publishes Ubuntu, we are changing the world of software. We recruit on a global basis and set a very high standard for people joining the company. We expect excellence — in order to succeed, we need to be the best at what we do. Most colleagues at Canonical have worked from home since its inception in 2004. Working here is a step into the future, and will challenge you to think differently, work smarter, learn new skills, and raise your game.

Canonical is an equal opportunity employer

We are proud to foster a workplace free from discrimination. Diversity of experience, perspectives, and background create a better work environment and better products. Whatever your identity, we will give your application fair consideration.


Senior Site Reliability Engineer

Canonical

Posted 3 days ago

Job Description

workfromhome

Senior Site Reliability Engineer

Canonical is hiring a Senior Site Reliability Engineer. Location: Globally remote role. We run hundreds of private cloud, Kubernetes, and application clusters for customers across physical and public cloud estate, and we are raising the bar on automation by embracing a universal operator pattern and model-driven operations. To succeed in this role you need to believe in automation as a pure software engineering problem, be interested in the scientific approach to operations at scale, driven by metrics and code, and be able to learn the entire stack, from bare metal networking and kernel up to serverless and open source applications.

Responsibilities
  • Architect and run OpenStack, Kubernetes and software defined storage, and enable devsecops for applications running on that infrastructure.
  • Bring Python software-engineering skills and rigour to the operations domain; practise devsecops from bare metal to application.
  • Confidently operate in a high pressure operations environment with mission-critical services for global brand name customers.
  • Evolve offerings with the state of the art in open source infrastructure and stay current with capabilities.
What we are looking for in you
  • Degree in Software Engineering or Computer Science
  • Experience with Linux and familiarity with Linux networking and storage
  • Python software development expertise
  • Operational experience
  • Excellent interpersonal skills, curiosity, flexibility, and accountability
  • Ability to travel internationally twice a year, for company events up to two weeks long
Nice-to-have skills
  • Experience with OpenStack or Kubernetes deployment or operations
  • Familiarity with public or private cloud management
What we offer colleagues
  • Distributed work environment with twice-yearly team sprints in person
  • Personal learning and development budget of USD 2,000 per year
  • Annual compensation review
  • Recognition rewards
  • Annual holiday leave
  • Maternity and paternity leave
  • Employee Assistance Programme
  • Opportunity to travel to new locations to meet colleagues
  • Priority Pass, and travel upgrades for long haul company events
About Canonical

Canonical is a pioneering tech firm at the forefront of the global move to open source. As the company that publishes Ubuntu, one of the most important open source projects and the platform for AI, IoT and the cloud, we are changing the world of software. We recruit on a global basis and set a very high standard for people joining the company. We expect excellence - in order to succeed, we need to be the best at what we do. Most colleagues at Canonical have worked from home since its inception in 2004. Working here is a step into the future, and will challenge you to think differently, work smarter, learn new skills, and raise your game.

Canonical is an equal opportunity employer. We are proud to foster a workplace free from discrimination. Diversity of experience, perspectives, and background create a better work environment and better products. Whatever your identity, we will give your application fair consideration.


Remote Site Reliability Engineer

50000 Zallaq, Southern | BHD 80,000 annually | WhatJobs

Posted today

Job Description

full-time
Zallaq, Southern, BH

Our client, a cutting-edge technology firm, is looking for a highly skilled Remote Site Reliability Engineer (SRE) to join their globally distributed team. This is a fully remote position, offering the opportunity to work from anywhere with a stable internet connection. The SRE will be instrumental in ensuring the availability, performance, scalability, and reliability of our client's mission-critical systems and infrastructure. This role involves a blend of software engineering and systems administration, focusing on automating operations, improving system resilience, and reducing manual toil. Key responsibilities include designing and implementing infrastructure as code, developing monitoring and alerting systems, managing cloud environments (AWS, Azure, or GCP), and participating in on-call rotations to respond to incidents. You will collaborate closely with development teams to ensure that services are designed for reliability and operability from the outset. The ideal candidate possesses deep expertise in cloud technologies, containerization (Docker, Kubernetes), CI/CD pipelines, and scripting/programming languages (e.g., Python, Go). Strong troubleshooting skills, a proactive approach to identifying and mitigating risks, and a passion for building robust, scalable systems are essential. We are seeking individuals who are committed to continuous improvement, possess excellent analytical and problem-solving abilities, and can communicate technical concepts effectively within a remote team environment. This is an exceptional opportunity to work on challenging problems with a talented team, shaping the future of reliable and scalable cloud infrastructure.

Key Responsibilities:
  • Design, build, and maintain scalable and reliable cloud infrastructure.
  • Implement and manage infrastructure as code using tools like Terraform or Ansible.
  • Develop and maintain robust monitoring, alerting, and logging systems.
  • Automate operational tasks and reduce manual toil.
  • Troubleshoot and resolve incidents affecting system availability and performance.
  • Collaborate with development teams to ensure operability and reliability of services.
  • Participate in on-call rotation to provide 24/7 system support.
  • Continuously improve system performance and efficiency.
Qualifications:
  • Bachelor's degree in Computer Science, Engineering, or a related field, or equivalent practical experience.
  • Proven experience in Site Reliability Engineering, DevOps, or Systems Engineering.
  • Strong experience with cloud platforms (AWS, Azure, GCP).
  • Expertise in containerization technologies (Docker, Kubernetes).
  • Proficiency in scripting and programming languages (e.g., Python, Go, Bash).
  • Experience with CI/CD tools and practices.
  • Solid understanding of networking concepts and protocols.
  • Excellent problem-solving and troubleshooting skills.
  • Strong communication and collaboration skills for remote work.

Senior Site Reliability Engineer

90005 Jbeil | BHD 140,000 annually | WhatJobs

Posted today

Job Description

full-time
Our client is seeking a highly skilled Senior Site Reliability Engineer (SRE) to join their dynamic technology team. This hybrid role requires a proactive and detail-oriented individual who can contribute to the stability, performance, and scalability of their critical production systems. You will be responsible for designing, building, and maintaining robust infrastructure, automating operational tasks, and ensuring the reliability of their services. Key duties include monitoring system health, troubleshooting complex issues, implementing infrastructure as code, and developing strategies to prevent outages. The ideal candidate will have extensive experience with cloud platforms (AWS, Azure, GCP), containerization technologies (Docker, Kubernetes), and scripting languages (Python, Go). You must possess strong analytical and problem-solving skills, with a deep understanding of distributed systems and network protocols. This role involves collaborating closely with development teams to foster a DevOps culture and ensure seamless integration between development and operations. A passion for automation and a commitment to continuous improvement are essential. You will play a crucial role in maintaining high availability and performance for our client's digital offerings.
Responsibilities:
  • Design, implement, and manage highly available and scalable production systems.
  • Develop and maintain infrastructure as code using tools like Terraform or Ansible.
  • Automate operational tasks, deployment pipelines, and incident response.
  • Monitor system performance, identify bottlenecks, and implement solutions.
  • Troubleshoot and resolve complex technical issues in production environments.
  • Collaborate with software development teams to ensure system reliability and performance.
  • Implement and manage CI/CD pipelines.
  • Participate in on-call rotations to provide 24/7 system support.
  • Develop and enforce reliability best practices and standards.
  • Contribute to capacity planning and performance tuning.
Qualifications:
  • Bachelor's degree in Computer Science, Engineering, or a related field.
  • Minimum of 5 years of experience in Site Reliability Engineering, DevOps, or Systems Administration.
  • Proficiency in at least one major cloud platform (AWS, Azure, GCP).
  • Strong experience with containerization technologies (Docker, Kubernetes).
  • Expertise in scripting languages such as Python, Go, or Bash.
  • Solid understanding of networking concepts (TCP/IP, DNS, HTTP).
  • Experience with monitoring and logging tools (e.g., Prometheus, Grafana, ELK Stack).
  • Familiarity with CI/CD tools (e.g., Jenkins, GitLab CI).
  • Excellent problem-solving and troubleshooting skills.
  • Ability to work effectively in a hybrid work environment.

Senior Site Reliability Engineer

2027 Al Daih, Northern | BHD 120,000 annually | WhatJobs

Posted 1 day ago

Job Description

full-time
Our client is seeking a highly skilled and experienced Senior Site Reliability Engineer (SRE) to join their established team in Budaiya, Northern, BH. This role is critical to ensuring the stability, performance, and scalability of our client's production systems and infrastructure. As a Senior SRE, you will be responsible for designing, building, and operating reliable and efficient systems, automating operational tasks, and driving improvements in system availability and performance. You will work closely with development and operations teams to embed reliability best practices throughout the software development lifecycle. The ideal candidate possesses a strong blend of software engineering and systems administration skills, with a deep understanding of distributed systems, cloud technologies, and incident management.

Key Responsibilities:
  • Design, implement, and maintain robust and scalable infrastructure and services.
  • Develop and deploy automation tools and scripts to streamline operational tasks, such as deployment, monitoring, and incident response.
  • Monitor system health and performance, identifying and resolving performance bottlenecks and issues proactively.
  • Lead incident response efforts, including on-call rotations, troubleshooting, and post-mortem analysis to prevent recurrence.
  • Collaborate with software development teams to ensure the reliability and operability of new features and services.
  • Implement and manage CI/CD pipelines for efficient and reliable software delivery.
  • Optimize system performance and resource utilization.
  • Develop and maintain comprehensive documentation for systems, processes, and runbooks.
  • Contribute to capacity planning and disaster recovery strategies.
  • Mentor junior engineers and share knowledge across teams.
  • Evaluate and recommend new technologies and tools to enhance reliability and efficiency.
  • Ensure adherence to security best practices and compliance requirements.
  • Participate in architectural reviews to ensure systems are designed for reliability and scalability from the outset.
Qualifications:
  • Bachelor's degree in Computer Science, Engineering, or a related technical field; Master's degree is a plus.
  • Minimum of 7 years of experience in Site Reliability Engineering, DevOps, or a related field.
  • Proven experience with cloud platforms such as AWS, Azure, or GCP.
  • Strong proficiency in at least one programming language (e.g., Python, Go, Java, Ruby).
  • Extensive experience with containerization technologies like Docker and Kubernetes.
  • Deep understanding of infrastructure-as-code tools (e.g., Terraform, Ansible, Chef, Puppet).
  • Experience with monitoring and logging tools (e.g., Prometheus, Grafana, ELK Stack, Datadog).
  • Solid understanding of networking concepts (TCP/IP, DNS, HTTP, load balancing).
  • Experience with relational and NoSQL databases.
  • Strong problem-solving and analytical skills.
  • Excellent communication and collaboration abilities.
  • Experience in an on-call rotation and handling production incidents effectively.
This is an excellent opportunity to join a forward-thinking company and make a significant impact on system reliability in Budaiya, Northern, BH.

Senior Site Reliability Engineer (SRE)

20002 Galali BHD115000 Annually WhatJobs

Posted today

Job Description

full-time
Our client is seeking a highly skilled and experienced Senior Site Reliability Engineer (SRE) to join their dynamic team. This role is fully remote, allowing you to contribute from anywhere. You will be responsible for ensuring the availability, performance, scalability, and reliability of our client's critical systems and infrastructure. This involves a blend of software engineering and systems administration, focusing on automating operations, reducing toil, and building robust, self-healing systems. Your contributions will be vital in maintaining high uptime and exceptional user experience for their global user base.

Key responsibilities include:

  • Designing, building, and maintaining scalable and reliable infrastructure using infrastructure-as-code principles (e.g., Terraform, Ansible).
  • Developing automation tools and scripts to streamline deployment, monitoring, and operational tasks.
  • Implementing and managing robust monitoring, alerting, and logging solutions (e.g., Prometheus, Grafana, ELK stack).
  • Proactively identifying and resolving performance bottlenecks and reliability issues across the stack.
  • Participating in on-call rotations to respond to and mitigate production incidents.
  • Conducting post-mortems for incidents, identifying root causes, and implementing preventive measures.
  • Collaborating with development teams to improve system design for reliability and operability.
  • Defining and tracking key Service Level Objectives (SLOs) and Service Level Indicators (SLIs).
  • Implementing chaos engineering practices to test system resilience.
  • Contributing to capacity planning and performance tuning efforts.
  • Ensuring the security and compliance of the infrastructure.
  • Mentoring junior SREs and promoting SRE best practices within the organization.

Qualifications:
  • Bachelor's degree in Computer Science, Engineering, or a related field; equivalent practical experience will be considered.
  • 5+ years of experience in SRE, DevOps, or Systems Engineering roles.
  • Strong proficiency in at least one scripting/programming language such as Python, Go, or Bash.
  • Hands-on experience with cloud platforms (AWS, Azure, or GCP) and containerization technologies (Docker, Kubernetes).
  • Deep understanding of Linux operating systems and networking fundamentals (TCP/IP, DNS, HTTP).
  • Experience with CI/CD pipelines and tools (e.g., Jenkins, GitLab CI, CircleCI).
  • Familiarity with monitoring and logging tools (e.g., Prometheus, Grafana, Datadog, Splunk).
  • Experience with configuration management tools (e.g., Ansible, Chef, Puppet).
  • Strong understanding of distributed systems and microservices architectures.
  • Excellent troubleshooting and problem-solving skills.
  • Effective communication and collaboration abilities, especially in a remote setting.
  • Experience with database administration or management is a plus.
This is an excellent opportunity for an experienced SRE to join a team that values innovation, automation, and a robust, reliable infrastructure, all within a fully remote work structure.

Senior Site Reliability Engineer (Remote)

55501 Northern, Northern BHD120000 Annually WhatJobs

Posted 1 day ago

Job Description

full-time
Our client is seeking a highly skilled Senior Site Reliability Engineer (SRE) to join their fully remote, world-class engineering team. This role is central to ensuring the scalability, availability, and performance of our client's critical infrastructure and services. You will be responsible for designing, building, and automating robust systems, implementing monitoring solutions, and responding to incidents to maintain system health. The ideal candidate possesses a deep understanding of distributed systems, cloud computing (AWS/Azure/GCP), and infrastructure-as-code. This is a remote-first position, offering the opportunity to work on challenging problems with a talented team without geographical constraints.

Key responsibilities include:
  • Designing, implementing, and managing scalable and highly available production environments.
  • Developing and maintaining automation tools and scripts for deployment, configuration, and monitoring.
  • Implementing and managing monitoring, logging, and alerting systems to proactively identify and resolve issues.
  • Participating in on-call rotations to respond to and resolve production incidents.
  • Conducting root cause analysis (RCA) for system outages and implementing preventative measures.
  • Collaborating with development teams to improve application reliability and performance.
  • Defining and tracking key service level objectives (SLOs) and service level indicators (SLIs).
  • Managing and optimizing cloud infrastructure resources for cost-effectiveness and performance.
  • Developing and advocating for SRE best practices across the engineering organization.
  • Mentoring junior engineers and contributing to a culture of continuous learning and improvement.
  • Automating operational tasks and reducing toil.
  • Participating in capacity planning and performance testing.
The ideal candidate will have a Bachelor's degree in Computer Science, Engineering, or a related field, or equivalent practical experience. A minimum of 6 years of experience in systems engineering, DevOps, or SRE roles, with a strong emphasis on reliability and automation, is required. Proven experience with cloud platforms (AWS, Azure, or GCP) and expertise in containerization technologies (Docker, Kubernetes) are essential. Proficiency in at least one scripting or programming language (e.g., Python, Go, Bash) is required. Strong understanding of networking concepts, operating systems (Linux), and database technologies is crucial. Excellent troubleshooting and problem-solving skills, coupled with strong communication and collaboration abilities, are necessary for success in this remote role. You should be comfortable working independently and driving initiatives within a distributed team.