Cloud Site Reliability Engineer
Hong Kong

Feedzai is the market leader in managing financial risk with AI. We’re coding the future of commerce with today’s most advanced risk management platform powered by big data and machine learning.

Founded and developed by data scientists and aerospace engineers, Feedzai has one mission: to make banking and commerce safe.

The world’s largest banks, processors, and retailers use Feedzai’s fraud prevention and anti-money laundering products to manage risk while improving customer experience.

We are looking for a Cloud SRE to join our Feedzai Cloud organization in Product Engineering and support the management of our customers' public cloud environments.

With Cloud at its core, Engineering is responsible for all Feedzai product development. Together with Product Management and Data Science, we are building the next generation of tools to catch fraud in real time with a machine-learning-first approach.

Formed and managed by engineers, Feedzai Engineering is one of the most talented teams out there. From junior to senior engineers, the team provides a safe, open, and collaborative environment that fosters continuous learning for everyone.

While building the best value for our customers, you will be exposed to a wide range of technical challenges, such as building distributed systems that must operate 24/7 with ultra-low latency.

The Cloud SRE will be responsible for delivering and managing overall availability, performance, efficiency, monitoring, emergency response and capacity planning of the cloud services provided to our customers.

As part of this team, you'll work with other reliability engineers who have a deep understanding of Feedzai products and environments to ensure and improve Feedzai Cloud operations, contributing directly to the capacity and availability of our service.

Who You Are

  • You are curious and customer-centric;
  • You enjoy working with members of your team, and other teams, to provide the best operational service possible to your customers;
  • When problems occur, you feel the urge to identify their root cause and how you can improve the solution to prevent them in the future;
  • You accept that failures will happen and align with key stakeholders to plan solution architectures that address them;
  • You have an Operational Excellence mindset (CloudNetSec) and enjoy working in a DevOps environment.

Responsibilities

  • Guarantee general availability and resilience of cloud environments;
  • Participate in incident response, root cause investigation and resolution;
  • Contribute to our infrastructure as code (IaC) and support development activities aimed at automating manual work and reducing operational effort;
  • Troubleshoot and optimize high-availability and performance issues;
  • Use the best tools for the job and write our own if that’s the best option.

Qualifications

  • 3+ years working in Cloud Operations and/or Software Development teams is desirable;
  • Good programming skills (e.g. Python, Bash); Java experience is a plus;
  • Good knowledge of Linux systems;
  • Real-world experience with cloud services (e.g. AWS, Google Cloud, Azure);
  • Experience in supporting production systems;
  • Understanding of service metrics (e.g. SLA, SLO);
  • Knowledge of service monitoring metrics;
  • Knowledge of CI/CD processes (preferably cloud-native CI/CD solutions) is desirable;
  • Public cloud certifications (e.g. AWS, Azure, Google Cloud) are desirable;
  • A continuous service improvement mindset.