Site Reliability Engineer

1 week ago


Taipei, Taipei, Taiwan KKCompany Full time $90,000 - $120,000 per year
About the job: Site Reliability Engineer

Team Segment: Solutions Business

KKCompany Technologies, Asia's leading AI multimedia technology group, is dedicated to creating value for customers through its core businesses of multimedia technologies, digital cloud, and AI applications.

At KKCompany, we believe in Innovation Made Simple, and that technology is the answer to the struggles faced by every industry. Since its establishment two decades ago, KKCompany has expanded its portfolio to include KKBOX, BlendVision, and Going Cloud. KKBOX is the world's first platform to bring a legal music streaming service to the public. It uses state-of-the-art streaming technology to deliver an excellent user experience. Our flagship brands and base of international clients enable us to accumulate extensive data and advance our analytical capabilities. These strengths, along with our abundant experience in brand management, help businesses achieve digital transformation successfully. We serve tens of millions of consumers and enterprise clients in Asia across a broad spectrum of industries such as telecommunication, multimedia, online education, fitness, smart retail, and more.

KKCompany now has nearly 500 employees across offices in Tokyo, Singapore, Taipei, Kaohsiung, and Hong Kong.

Job Overview:
We are seeking a Site Reliability Engineer (SRE) to join our team supporting services with millions of active users. This role ensures service availability, performance, and scalability through automation, monitoring, incident response, and collaboration with DevOps and application development teams.

As an SRE, you will be embedded in the full lifecycle of our systems, from architecture design, deployment pipelines, and observability frameworks through to incident resolution. This is a highly impactful position that requires both technical depth and operational ownership.

Responsibilities:

  • Monitoring & Incident Management
  1. Participate in on-call rotations to respond to critical incidents and ensure high service availability.
  2. Build and maintain monitoring and alerting tools using AWS CloudWatch or third-party platforms.
  3. Set up effective alerting rules, triage anomalies, and lead service recovery efforts during incidents.
  • Architecture Understanding & Collaboration
  1. Work with Web, Backend, and DevOps teams to gain a deep understanding of the service architecture.
  2. Support integration of operational and reliability best practices into the software development process.
  • Deployment & Release Validation
  1. Monitor new deployments and evaluate their impact on service SLAs.
  2. Make quick rollback decisions when deployments threaten reliability or availability.
  • Infrastructure & Automation
  1. Automate infrastructure provisioning using Infrastructure-as-Code tools such as Terraform, AWS CDK, or CloudFormation for the core encoding service.
  2. Ensure highly available and scalable system design using AWS and Kubernetes.
  • Toil Reduction & Operational Efficiency
  1. Identify repetitive manual tasks (toil) in operations and incident management.
  2. Design and implement automation or process improvements to reduce manual effort and increase engineering velocity.
  • Documentation
  1. Create and maintain detailed documentation including architecture diagrams, runbooks, and postmortems.

Requirements:

  • Bachelor's degree in Computer Science or a related technical field involving software or systems engineering, or equivalent practical experience.
  • Willingness to take part in on-call rotations and respond quickly to incidents.
  • Strong collaboration and communication skills across cross-functional teams.
  • Ability to write scripts in languages such as Python or Shell.
  • Familiarity with AWS high availability architecture and services.
  • Experience with Git and CI/CD pipelines, preferably using GitLab CI/CD.
  • Experience with operating and debugging Kubernetes in production.

Nice to Have:

  • Experience in optimizing service performance and reliability in cloud-native environments.
  • Experience managing observability tools such as CloudWatch.
  • Familiarity with managing large-scale systems supporting millions of active users.
  • Knowledge of auditing and compliance processes related to ISO 27001.


  • Taipei, Taipei City, Taiwan PalUp Full time $90,000 - $120,000 per year

    The engineering team at PalUp is at the core of our mission, building and maintaining systems that make our large-scale social platform stable, reliable, and efficient. As a Site Reliability Engineer, you will play a vital role in ensuring the seamless operation of our infrastructure and services, supporting millions of global users while collaborating...


  • Taipei, Taipei City, Taiwan Circle Full time $125,000 - $175,000 per year

    Circle is a financial technology company at the epicenter of the emerging internet of money, where value can finally travel like other digital data — globally, nearly instantly and less expensively than legacy settlement systems. This ground-breaking new internet layer opens up previously unimaginable possibilities for payments, commerce and markets that...


  • Taipei–Keelung Metropolitan area, Taiwan TenMax ADTech Lab Co., LTD. Full time $90,000 - $120,000 per year

    We're hiring a Site Reliability Engineer at TenMax. Join us at TenMax, where you'll help build and maintain the backbone of our digital advertising platform, supporting our operations across Taiwan and Southeast Asia. We're looking for someone passionate about infrastructure, eager to tackle challenges in hybrid cloud environments, and driven to optimize...


  • (Taiwan) HsinChu, Taiwan Rivos Full time $90,000 - $120,000 per year

    The primary responsibility of the Quality and Reliability Engineer is to ensure that our products meet the reliability criteria set by ourselves, our vendors and suppliers, and our customers. This means that we test our designs and products to achieve high standards of reliability, identify weaknesses in design and manufacturing and support improving design...


  • Taipei, Taiwan hermeneutic Investments Full time $100,000 - $150,000 per year

    About the Role: We're looking for a Senior Site Reliability/DevOps Engineer to join our hedge fund's technology team. You'll be responsible for building and maintaining the cloud infrastructure that powers our trading operations. This role combines expertise in AWS architecture, database administration, and system monitoring to ensure our platform operates...


  • Taipei, Taiwan Apple Full time $104,000 - $130,878 per year

    Apple is a place where extraordinary people gather to do their best work. Just be ready to dream big. The people here at Apple don't just build products — they build the kind of wonder that's revolutionized entire industries. It's the diversity of those people and their ideas that encourages the innovation that runs through everything we do, from amazing...


  • Taipei, Taipei City, Taiwan Netskope Full time $104,000 - $130,878 per year

    About Netskope: Today, there's more data and users outside the enterprise than inside, causing the network perimeter as we know it to dissolve. We realized a new perimeter was needed, one that is built in the cloud and follows and protects data wherever it goes, so we started Netskope to redefine Cloud, Network and Data Security. Since 2012, we have built the...


  • Taipei, Taiwan Meta Full time $150,000 - $200,000 per year

    Reality Labs is focused on ensuring the highest quality and reliability of our advanced silicon products. We are seeking a Lead Silicon Reliability Engineer to drive reliability strategies and lead efforts to deliver robust silicon solutions. Responsibilities: Lead the development, execution, and continuous improvement of silicon reliability qualification...


  • Taipei, Taipei City, Taiwan Meta Full time $104,000 - $130,878 per year

    Reality Labs is focused on ensuring the highest quality and reliability of our advanced silicon products. We are seeking a Lead Silicon Reliability Engineer to drive reliability strategies and lead efforts to deliver robust silicon solutions. Silicon Reliability Engineer Responsibilities: Lead the development, execution, and continuous improvement of silicon...


  • Taipei City, Taiwan Rambus Full time $104,000 - $130,878 per year

    Responsibilities: Own Ongoing Reliability Monitoring (ORM) for production, covering the ORM plan, test coordination, data analysis, and the final ORM report. Drive failure analysis to root causes and implementation of corrective actions by interacting with cross-functional teams: design, validation, packaging, PE/TE, etc. Collaborate with foundries and package houses...