We are online-led, mobile-led and sports-led, and our technology, unique products and innovative marketing all combine to offer a superb experience to our customers. We’re a dedicated and passionate team who work hard to make things happen.
Are you up for the challenge?
We are looking for an experienced Engineer to join our Site Reliability Engineering team in London. This team is tasked with “relentlessly driving down MTTR” and achieves that by providing:
- Production Support Consultancy (proactive and reactive) to keep the site operational and performant
- Monitoring expertise: strategy, implementation and support
- Building and maintaining our automation and tooling
You will be part of a global SRE team, working with talented engineers using agile methodology. SRE is also represented in Dublin and Porto, so travelling will be part of your journey. Your tasks will be hugely varied: from writing code to spinning up services and working with our peers on root cause analysis, all while making sure you build things. We work with extremely high-performing, high-throughput distributed systems which support millions of customers 24 hours a day, so you’ll need to be passionate about performance and scale. Working in this environment also means you’ll need to bring a lot of experience to our team, but you will learn a lot more, very quickly.
Don’t worry, you won’t be on your own. We have experts in most of the technologies we’ll be asking you to work with and we’re a friendly bunch so you’ll get help when you need it. We like creative thinkers as well. As a member of the team, we’d love to hear your ideas. Make the old hands scratch their heads and question why they’d not thought of that before.
Responsibilities
- Constantly improving our infrastructure, monitoring, standards and tooling as part of SRE-led projects
- Proactively and relentlessly seeking opportunities to prevent downtime and reduce MTTR
- Providing in-house consultancy to teams requiring advice in your fields of expertise
- Automating repetitive tasks and driving a self-service framework
- Raising the technical profile of SRE within Paddy Power Betfair and in the wider technical community
- Participating in and promoting the Paddy Power Betfair Strong DevOps Community
- Mentoring and coaching other SRE engineers and the broader development community
Essential Skills & Experience
- An “anything that moves, graph it” approach: mad about monitoring, integration and gluing things together
- Understanding of how networks and applications work, both on the surface and under the hood
- Ability to analyse network behaviour, system performance and application issues
- In-depth technical knowledge of one or more of Linux, Networking, Storage and Databases
- Passion for open source technologies and culture
- Detailed understanding of software development, core infrastructure and reliability engineering
- Team player, who strives to maximise team and departmental performance
- Excellent communication skills, both written and verbal
- Willingness to occasionally travel (mainly Dublin, Porto and Cluj)
Desirable Skills & Experience
- Experience with technologies such as Sensu, Nagios, TSDB, Grafana, PagerDuty, AppDynamics, Sumo Logic and Splunk, and with concepts such as RUM and event correlation
- Understanding of Infrastructure Automation, more specifically in an OpenStack environment
- Excellent presentation skills for both internal and external conferences
- Understanding of continuous delivery pipelines, runbook automation, cloud infrastructure and config-as-code, with tools such as Chef, Puppet, Rundeck, Jenkins, Git, Artifactory, Go and Ansible
- BSc in Computer Science or equivalent demonstrable knowledge
- Ability to participate in on-call duty