Technology - High Performance Computing (HPC) Grid Reliability Engineer
Morgan Stanley
Hong Kong, Japan/Asia


Morgan Stanley is a leading global financial services firm providing a wide range of investment banking, securities, investment management and wealth management services.

The Firm's employees serve clients worldwide including corporations, governments and individuals from more than 747 offices in 42 countries.

At Morgan Stanley, Technology works as a strategic partner with the Firm's business units and the world's leading technology companies to redefine how we do business in ever more global, complex, and dynamic financial markets.

Morgan Stanley's sizeable investment in technology results in quantitative trading systems, cutting-edge modelling and simulation software, comprehensive risk and security systems, and robust client-relationship capabilities, plus the worldwide infrastructure that forms the backbone of these systems and tools.

Our insights, our applications and infrastructure give a competitive edge to clients' businesses and to our own.

Technology & Reliability and Production Engineering

The mission of Technology is to provide a highly reliable and commercial technology platform, which supports the Firm's strategy, delivered by an innovative, world-class team of professionals.

Reliability and Production Engineering (RPE), a super-department within Technology, provides global services for Institutional Securities and Support Services applications.

Consolidated support functions include Plant Management / Engineering, Capacity Management, and Grid Management.

Plant Management

RPE includes a horizontal Plant Management (PLM), Tools and Engineering practice area that complements its direct production activities.

This covers operational plant management, grid computing, platform engineering, capacity management and production tooling functions.

PLM is a global practice area operating out of New York, London, Montreal, Toronto, Shanghai, Tokyo and Bengaluru.

Key expectations from this role are:

  • Innovative and proactive technology professional who can wear multiple hats
  • Primary environment is Unix server (Linux)
  • Act as an operational support professional and assume technical ownership of automating and optimizing various Grid engineering functions
  • Demonstrate the technical and operational acumen to troubleshoot escalations from application Support teams
  • Resolve incidents as an HPC Grid subject matter expert in a global coverage role
  • Effectively communicate and coordinate with relevant Development / Production Management teams, work stream leads & stakeholders
  • Assume design, development & test accountability of assigned automation projects
  • Drive automated RFB (Ready for Business) checks for various multi-tenant Grids and silo Grids so that potential issues are escalated and addressed promptly
  • Support relevant work stream Leads within Grid Engineering in identification and resolution of performance and capacity bottlenecks, risk and control gaps and infrastructural upgrades, as and when required
  • Demonstrate a good understanding of Production Management methodologies, e.g. Incident Management, Problem Management and Event Management
  • Manage incidents on behalf of the global team
  • Develop strong working relationships with Global Plant Management staff, Application Technology teams & RPE Leads
Required Skills

The successful candidate must demonstrate:

  • Strong technical skills along with strong communication and the ability to organize and manage their body of work.
  • Communicate across all levels of the organization to the appropriate level of detail to convey their message.
  • Experience in Linux, command line tools, how processes interact with each other, and memory and storage management, with demonstrable performance and troubleshooting skills.
  • Experience in scripting languages (Unix shell scripting, Perl, Python etc.)
  • Excellent communication, interpersonal, and writing skills along with an organized approach to manage a high volume of work.
  • Highly motivated problem solver who can multi-task, work under time pressure and be self-sufficient where required
  • Takes ownership and holds themselves and others accountable while remaining receptive to constructive feedback
  • Works well under pressure.
  • Bachelor's degree in Computer Science or Computer Engineering from a 4-year program
  • Experience working in an Equities and / or Fixed Income eTrading environment
  • 4 years of professional workplace experience

Desired Skills

  • Experience of similar roles, developing or supporting infrastructure systems
  • IBM Spectrum Symphony grid software.
  • Cloud technologies - Amazon Web Services, IBM Cloud, Azure.
  • Ansible Infrastructure Automation and Configuration Management.
  • Source code control with git.
  • Advanced Unix knowledge of kernels, file systems, and memory management.
  • Experience of Agile methodologies using Atlassian Jira.
  • Demonstrate a proactive approach to scaling application infrastructure (i.e., servers, storage) up or down, using moving-average trends, peaks, and business forecasts to act before an application runs out of room to grow and an incident occurs
  • Ability to gain consensus in formal settings by preparing agendas, presentations, and meeting minutes.
  • Understanding of the SDLC (Software Development Lifecycle) process with the ability to work closely with development teams to ensure properly designed infrastructure for the application needs, following proper change management process.
  • Understanding of, and experience operating within, an ITIL framework.