Ahrefs is looking for an SRE to help take care of its distributed backend systems, powered by 3000 servers, and to ensure all systems are up and running 24/7. We require a deep understanding of operating system and network fundamentals, practical knowledge of Linux, and a healthy desire to automate everything while being able to quickly resolve urgent issues manually. We strive to keep humans away from repetitive work that computers can do and to focus instead on foreseeing problems and defining programmatic structures to handle them.

Who we are

Ahrefs runs an internet-scale bot that crawls the whole Web, storing huge volumes of information to be indexed and structured in a timely fashion. The backend system is powered by a custom petabyte-scale distributed key-value storage that accommodates all the data coming in at high speed. With that data, Ahrefs is building analytics services for end users and a web-scale search platform.

We are a small team and strongly believe that better technology leads to better solutions for real-world problems. We worship functional languages and static typing, extensively employ code generation and meta-programming, value code clarity and predictability, and constantly seek to automate repetitive tasks and eliminate boilerplate, guided by DRY and following KISS. If there is any new technology that will make our lives easier, we will no doubt give it a try. We rely heavily on open-source code (as the only viable way to build a maintainable system) and contribute back. Occasionally we track down CPU bugs.

Our motto is "first do it, then do it right, then do it better".


Responsibilities:

  • develop internal automation - monitoring, setup, statistics
  • set up automated systems to control infrastructure
  • monitor the health of live production systems
  • provide first-aid response to infrastructure failures
  • deal with hardware problems and interact with the datacenter
  • help developers with deployment and integration
  • participate in the on-call rotation

You will be dealing on a daily basis with:

  • a 20 PB storage cluster
  • 3000 Linux servers
  • experimental large-scale deployments
  • all kinds of software bugs and hardware deviations

Our system is in large part custom OCaml code and also employs the following third-party technologies:

  • LAMP
  • ELK
  • Puppet

The ideal candidate is expected to:

  • Independently deal with and investigate infrastructure issues on live production systems
  • Foresee problems and prevent them from happening
  • Make well-reasoned technical choices and take responsibility for them
  • Understand the whole technology stack at all levels: from network and userspace code to OS internals and hardware
  • Approach problems with a practical mindset and suppress perfectionism when time is a priority
  • Automate everything and then some
  • Have a healthy detestation for complex shell scripts

We provide:

  • Competitive salary
  • Cutting-edge technologies
  • Informal and thriving atmosphere
  • International team

Apply for this job

To apply for this job drop us a note at

Please include:

  • Salary expectations.
  • Your CV and a short description of how we can benefit each other.
  • Date of availability.
