Infrastructure Stability Architect job opportunity at OKX.



OKX Infrastructure Stability Architect
Experience: 10 years
Pattern: full-time

Engineering

Degree: High School (S.S.C.E)
Location: SAR, Hong Kong

OKX will be prioritising applicants who have a current right to work in Singapore and do not require OKX's sponsorship of a visa.

Who We Are
At OKX, we believe that the future will be reshaped by crypto, ultimately contributing to every individual's freedom. OKX began as a crypto exchange giving millions of people access to crypto trading and over time became one of the largest platforms in the world. In recent years, we have developed one of the most connected Web3 wallets, used by millions to access decentralized crypto applications (dApps). OKX is a brand trusted by hundreds of large institutions seeking access to crypto markets on a reliable platform that seamlessly connects with global banking and payments. In the last year, OKX has expanded into new markets including Australia, Brazil, the Netherlands, Singapore and Turkey, with plans to launch in the US, Belgium and the UAE. We are deeply committed to shaping a fairer, more transparent and accessible society through blockchain technology. This is why we publish proof of reserves monthly and continue to ship innovative new security features.

About the Opportunity
With the vision of making service stability one of the core competitive strengths of the company's products, the service stability engineering team has built end-to-end, link-level risk management capabilities to achieve sustainable, automatic identification and analysis of potential stability risks. The team has moved from "passive governance" to "active governance", shifting more stability work forward and left so that issues are prevented before they occur and user experience improves.

What You'll Be Doing
Design and lead the stability architecture for large-scale distributed systems, including big data platforms, data warehouses, and core middleware infrastructure.
Develop and optimize comprehensive stability strategies covering capacity planning, performance optimization, fault prevention, and disaster recovery.
Spearhead chaos engineering practices and design complex fault injection scenarios to validate system resilience and self-healing capabilities.
Build and refine comprehensive monitoring and alerting systems for rapid fault detection, localization, and recovery.
Lead root cause analysis for major incidents and formulate long-term improvement plans to continuously enhance system availability and reliability.
Drive infrastructure intelligence and automation, designing and implementing AIOps solutions.
Collaborate closely with product, development, and operations teams to integrate stability requirements throughout the product lifecycle.
Lead the development of stability-related technical standards and best practices, promoting their adoption across the organization.

What We Look For In You
Bachelor's degree or above in Computer Science or a related major, with more than 10 years of architecture design experience in large-scale internet or computing platforms.
Expert knowledge of distributed system architectures, with deep understanding and rich practical experience in big data, cloud-native, and microservice technologies.
In-depth understanding of various infrastructure components (e.g. Kubernetes, Kafka, databases) and the ability to perform advanced tuning.
Strong systems thinking capability, able to analyze and solve complex stability issues from a holistic perspective.
Extensive experience in handling large-scale system failures, with the ability to quickly locate and resolve challenging problems.
Mastery of Linux systems and network technologies, and familiarity with the architecture and services of mainstream cloud platforms (e.g. Alibaba Cloud, AWS).
Excellent technical leadership skills, able to guide teams and drive cross-department collaboration.
Proficiency in speaking, reading and writing in both English and Mandarin to collaborate effectively with global and cross-functional team members.
Passion for continuous learning, able to quickly grasp new technologies and apply them in practical work scenarios.

Perks & Benefits
Competitive total compensation package
L&D programs and education subsidy for employees' growth and development
Various team building programs and company events
Wellness and meal allowances
Comprehensive healthcare schemes for employees and dependants
More that we'd love to tell you along the process!

Notice: All official OKX vacancies are published on this website. While roles may appear on selected third-party platforms from time to time, information on other sites may be inaccurate or outdated. If in doubt, please apply directly through our official careers website.

Information collected and processed as part of the recruitment process of any job application you choose to submit is subject to OKX's Candidate Privacy Notice.

Other AI Matches

Senior Manager, Risk Operations Strategy: Applicants are expected to have solid experience handling Product Management-related tasks.
Principal/Senior Product Manager, Tokenisation: Applicants are expected to have solid experience handling Product Management-related tasks.
Design Director, Design Systems & Visual Experience: Applicants are expected to have solid experience handling Product Design-related tasks.
Product Design Manager: Applicants are expected to have solid experience handling Product Design-related tasks.
Product Manager / Director, User Growth (User Reach): Applicants are expected to have solid experience handling Product Management-related tasks.
Engineering Director, Liquidity Platform, Risk & Analytics: Applicants are expected to have solid experience handling Engineering-related tasks.
Operations Manager: Applicants are expected to have solid experience handling Growth Center-related tasks.
Director, Data Product Management: Applicants are expected to have solid experience handling CEO Office-related tasks.
Senior/Staff Engineer, Customer Genius: Applicants are expected to have solid experience handling Engineering-related tasks.
Engineering Director, Trading Service: Applicants are expected to have solid experience handling Engineering-related tasks.
Product Manager / Director, User Growth (User Conversion): Applicants are expected to have solid experience handling Product Management-related tasks.
Mobile Network Development Expert: Applicants are expected to have solid experience handling Engineering-related tasks.
Principal / Senior Product Manager, Payment: Applicants are expected to have solid experience handling Product Management-related tasks.
Senior Data Analyst, Customer Service Operations: Applicants are expected to have solid experience handling Ops Analytics & Intelligence-related tasks.
Senior/Staff Engineer - DeFi Wallet: Applicants are expected to have solid experience handling Engineering-related tasks.
Senior Manager, Total Rewards: Applicants are expected to have solid experience handling Central Functions-related tasks.
Principal/Senior Product Manager, AI and Agent: Applicants are expected to have solid experience handling Product Management-related tasks.
Compliance Analyst: Applicants are expected to have solid experience handling Compliance-related tasks.
Senior Product Manager, VIP Growth & Operations: Applicants are expected to have solid experience handling Product Management-related tasks.
Growth Risk Specialist: Applicants are expected to have solid experience handling Product Management-related tasks.
Remote
Senior Manager, Product Marketing (Europe, EEA): Applicants are expected to have solid experience handling Global Marketing and Growth-related tasks.
Remote
Growth & Strategy Operations Manager, Northeast Asia: Applicants are expected to have solid experience handling Growth Center-related tasks.
Senior Risk Manager, Trading (24/7 team): Applicants are expected to have solid experience handling Product Management-related tasks.