Infrastructure Senior SRE Engineer job opportunity at OKX.



OKX Infrastructure Senior SRE Engineer
Experience: 10+ years
Employment type: Full-time

Department: Engineering

Degree: Bachelor's (B.A.)
Location: Hong Kong SAR

Who We Are
At OKX, we believe that the future will be reshaped by crypto and will ultimately contribute to every individual's freedom. OKX is a leading crypto exchange and the developer of OKX Wallet, giving millions of people access to crypto trading and decentralized crypto applications (dApps). OKX is also a brand trusted by hundreds of large institutions seeking access to crypto markets. We are safe and reliable, backed by our Proof of Reserves. Across our offices worldwide, we are united by our core principles: We Before Me, Do the Right Thing, and Get Things Done. These shared values drive our culture, shape our processes, and foster a friendly, rewarding, and diverse environment for every OK-er. OKX is part of OKG, a group that brings the value of blockchain to users around the world through our leading products OKX, OKX Wallet, OKLink, and more.

About the Team
The Service Stability Engineering Team sees service stability as one of the core competitive strengths of the company's products. By building end-to-end, link-level risk management capabilities, the team aims to identify and analyze stability risks automatically and sustainably, shifting from "reactive governance" to "proactive governance." This approach moves stability work earlier in the lifecycle, addressing risks before they turn into incidents and improving the user experience.

Job Responsibilities:
- Design and lead the stability architecture for large-scale distributed systems, including big data platforms, data warehouses, and core middleware infrastructure.
- Develop and optimize comprehensive stability strategies covering capacity planning, performance optimization, fault prevention, and disaster recovery.
- Spearhead chaos engineering practices, designing complex fault injection scenarios to validate system resilience and self-healing capabilities.
- Build and refine comprehensive monitoring and alerting systems for rapid fault detection, localization, and recovery.
- Lead root cause analysis for major incidents and formulate long-term improvement plans to continuously improve system availability and reliability.
- Drive infrastructure intelligence and automation by designing and implementing AIOps solutions.
- Collaborate closely with product, development, and operations teams to integrate stability requirements throughout the product lifecycle.
- Lead the development of stability-related technical standards and best practices, and promote their adoption across the organization.

Qualifications:
- Bachelor's degree or above in Computer Science or a related field, with 10+ years of architectural design experience on large-scale internet or cloud computing platforms.
- Expert knowledge of distributed system architectures, with a deep understanding of and extensive hands-on experience with big data, cloud-native, and microservice technologies.
- In-depth understanding of infrastructure components (e.g., Kubernetes, Kafka, databases) and the ability to perform advanced tuning.
- Strong systems thinking, with the ability to analyze and solve complex stability issues from a holistic perspective.
- Extensive experience handling large-scale system failures, with the ability to quickly locate and resolve challenging problems.
- Mastery of Linux systems and networking, and familiarity with the architecture and services of mainstream cloud platforms (e.g., Alibaba Cloud, AWS).
- Excellent technical leadership skills, with the ability to guide teams and drive cross-departmental collaboration.
- Strong communication and documentation skills, with the ability to engage in technical discussions in both Chinese and English.
- A passion for continuous learning and the ability to quickly grasp new technologies and apply them in practice.

Perks & Benefits
- Competitive total compensation
- Comprehensive insurance coverage for employees and their dependants
- More that we'd love to tell you about along the way!
Disclaimer: Please note that Hong Kong is a group-level service hub, and OKX does not carry on a business of operating a virtual asset trading platform in Hong Kong.

Notice: All official OKX vacancies are published on this website. While roles may appear on selected third-party platforms from time to time, information on other sites may be inaccurate or outdated. If in doubt, please apply directly through our official careers website. Information collected and processed as part of the recruitment process of any job application you choose to submit is subject to OKX's Candidate Privacy Notice.

Other AI Matches

Senior VIP Relationship Manager: Applicants are expected to have solid experience handling tasks related to Strategic Markets.
Data Security Audit Senior Manager: Applicants are expected to have solid experience handling tasks related to Internal Audit.
Senior/Staff Engineer - DeFi Wallet: Applicants are expected to have solid experience handling tasks related to Engineering.
Senior/Staff Software Engineer, Compliance (Platform): Applicants are expected to have solid experience handling tasks related to Engineering.
iOS Developer: Applicants are expected to have solid experience handling tasks related to Engineering.
Senior Internal Audit Manager: Applicants are expected to have solid experience handling tasks related to the Audit-Risk-Compliance Center.
Finance Manager: Applicants are expected to have solid experience handling tasks related to the Finance Department.
Senior Software Engineer, Risk: Applicants are expected to have solid experience handling tasks related to Engineering.
Principal Product Manager, Advanced Trading: Applicants are expected to have solid experience handling tasks related to Product Management.
Software Engineer (Android) - Mobile Infrastructure (Performance Optimization): Applicants are expected to have solid experience handling tasks related to Engineering.
Senior / Staff Software Engineer, Liquidity Platform, Risk & Analytics: Applicants are expected to have solid experience handling tasks related to Engineering.
Senior Agent, Customer Service (Russian Speaker) - Based in Kuala Lumpur, Malaysia: Applicants are expected to have solid experience handling tasks related to Customer Service Operations.
Senior/Staff Software Engineer, Compliance (KYC): Applicants are expected to have solid experience handling tasks related to Engineering.
Senior Data Analyst, Customer Service Operations: Applicants are expected to have solid experience handling tasks related to Ops Analytics & Intelligence.
Product Development Audit Director: Applicants are expected to have solid experience handling tasks related to Internal Audit.
Process and Controls Operations Sr. Analyst: Applicants are expected to have solid experience handling tasks related to the Growth Center.
Head of Risk, Brazil: Applicants are expected to have solid experience handling tasks related to Risk.
Mobile Network Development Expert: Applicants are expected to have solid experience handling tasks related to Engineering.
Application Architect: Applicants are expected to have solid experience handling tasks related to Engineering.
Senior Java Engineer, Core Compliance, AML: Applicants are expected to have solid experience handling tasks related to Engineering.
Product Director, Trust & Experience: Applicants are expected to have solid experience handling tasks related to Product Management.
LLM Security Engineer: Applicants are expected to have solid experience handling tasks related to Engineering.
Remote
Affiliate Business Development Lead, NEA: Applicants are expected to have solid experience handling tasks related to Strategic Markets.