At Bloomberg, our software runs the financial markets. To do this, we ingest real-time market data and news from hundreds of exchanges and thousands of newswires around the world, to the tune of 100 billion ticks of data per day.
Our users and downstream applications rely on this data being there whenever they need it, and this is where our SREs come in: they ensure our product is fast, reliable, and always available.
Our team:
The Real Time Market Data and News Feeds SRE team sits right at the heart of Bloomberg's global market data operations. Market Data is information - trades, quotes, and other pricing data - gathered from around the world on every possible kind of financial instrument, from stocks and bonds to currencies and commodities.
We are responsible for meeting our clients’ need for speed and stability across the market data pipeline, from ingestion of real-time market data from hundreds of financial sources through to enrichment and delivery of that data to clients.
As a System Reliability Engineer (SRE) at Bloomberg, your mission is to drive the automation of our production operations, including failure response, deployment, testing, and quality checks.
You will ensure the optimal availability, latency, scalability, and efficiency of more than ten thousand client-facing applications.
We’ll expect you to own our production environment from the initial design phases to ensuring continuous high availability.
You should be comfortable working alongside other engineers to help fix and debug issues with the production environment.
We’ll trust you to:
Create and maintain monitoring solutions to be used in production monitoring, capacity management, and incident detection and response
Help us establish SLOs and SLIs that we can use to measure our quality as an organization, and contribute to engineering projects aimed at ensuring we meet those standards
Investigate, triage, and troubleshoot production problems as they occur
Develop and maintain tools used in investigating production problems
Build automation for manual processes to increase reliability while reducing time to market and cost
Promote and improve development and operational standards within the wider group, working with your business partners and software engineers
Ensure that support documentation is produced, maintained and improved
You’ll need to have:
Excellent Python skills
Proficiency in an OO language such as C++ or Java
Familiarity with the design and implementation of large-scale distributed systems
Familiarity with SRE concepts: production monitoring, capacity management, automated deployment, orchestration, configuration management, etc.
Strong experience with Unix/Linux systems, including the command line and tools used in troubleshooting
Knowledge and experience of applying agile practices to software development
We’d love to see:
Excellent collaboration and partnering skills: you can effectively listen to, communicate with, challenge, and influence team members, your immediate peer group, and senior managers
Comfort taking ownership of, and responsibility for, issues
A desire to learn new technologies and apply them where appropriate to improve the quality of our software and processes
Market Data industry exposure or knowledge
Practical knowledge of networking, such as TCP, UDP, and IP