
Site Reliability Engineer
1 week ago
Team Segment : Solutions Business
KKCompany Technologies, Asia's leading AI multimedia technology group, is dedicated to creating value for customers through its core businesses of multimedia technologies, digital cloud, and AI applications.
At KKCompany, we believe in Innovation Made Simple and that technology is the answer to the struggles faced by every industry. Since its establishment two decades ago, KKCompany has expanded its portfolio to include KKBOX, BlendVision and Going Cloud. KKBOX is the world's first platform bringing legal music streaming service to the public. It utilizes state-of-the-art streaming technology to deliver an excellent user experience. Our flagship brands and base of international clients enable us to accumulate extensive data and advance our analytical capabilities. These strengths, along with our extensive experience in brand management, help businesses achieve digital transformation successfully. We serve tens of millions of consumers and enterprise clients in Asia across a broad spectrum of industries such as telecommunication, multimedia, online education, fitness, smart retail and more.
KKCompany now has nearly 500 employees across offices in Tokyo, Singapore, Taipei, Kaohsiung, and Hong Kong.
Job Overview:
We are seeking a Site Reliability Engineer (SRE) to join our team supporting services with millions of active users. This role ensures service availability, performance, and scalability through automation, monitoring, incident response, and collaboration with DevOps and application development teams.
As an SRE, you will be embedded in the full lifecycle of our systems, from architecture design, deployment pipelines, and observability frameworks to incident resolution. This is a highly impactful position that requires both technical depth and operational ownership.
Responsibilities:
- Monitoring & Incident Management
- Participate in on-call rotations to respond to critical incidents and ensure high service availability.
- Build and maintain monitoring and alerting tools using AWS CloudWatch or third-party platforms.
- Set up effective alerting rules, triage anomalies, and lead service recovery efforts during incidents.
- Architecture Understanding & Collaboration
- Work with Web, Backend and DevOps teams to gain a deep understanding of the service architecture.
- Support integration of operational and reliability best practices into the software development process.
- Deployment & Release Validation
- Monitor new deployments and evaluate their impact on service SLAs.
- Make quick rollback decisions when deployments threaten reliability or availability.
- Infrastructure & Automation
- Automate infrastructure provisioning using Infrastructure-as-Code tools such as Terraform, AWS CDK, or CloudFormation for the core encoding service.
- Ensure highly available and scalable system design using AWS and Kubernetes.
- Toil Reduction & Operational Efficiency
- Identify repetitive manual tasks (toil) in operations and incident management.
- Design and implement automation or process improvements to reduce manual effort and increase engineering velocity.
- Documentation
- Create and maintain detailed documentation including architecture diagrams, runbooks, and postmortems.
Requirements:
- Bachelor's degree in Computer Science or a related technical field involving software or systems engineering, or equivalent practical experience.
- Willingness to take part in on-call rotations and respond quickly to incidents.
- Strong collaboration and communication skills across cross-functional teams.
- Ability to write scripts in languages such as Python or Shell.
- Familiarity with AWS high availability architecture and services.
- Experience with Git and CI/CD pipelines, preferably using GitLab CI/CD.
- Experience with operating and debugging Kubernetes in production.
Nice to Have:
- Experience in optimizing service performance and reliability in cloud-native environments.
- Experience managing observability tools such as CloudWatch.
- Familiarity with managing large-scale systems supporting millions of active users.
- Knowledge of auditing and compliance processes related to ISO27001.
-
Site Reliability Engineer
1 week ago
Taipei, Taipei City, Taiwan PalUp Full time $90,000 - $120,000 per year
The engineering team at PalUp is at the core of our mission, building and maintaining systems that make our large-scale social platform stable, reliable, and efficient. As a Site Reliability Engineer, you will play a vital role in ensuring the seamless operation of our infrastructure and services, supporting millions of global users while collaborating...
-
Senior Site Reliability Engineer
1 week ago
Taipei, Taipei City, Taiwan Circle Full time $125,000 - $175,000 per year
Circle is a financial technology company at the epicenter of the emerging internet of money, where value can finally travel like other digital data: globally, nearly instantly and less expensively than legacy settlement systems. This ground-breaking new internet layer opens up previously unimaginable possibilities for payments, commerce and markets that...
-
Site Reliability Engineer
1 week ago
Taipei–Keelung Metropolitan area, Taiwan TenMax ADTech Lab Co., LTD. Full time $90,000 - $120,000 per year
We're hiring a Site Reliability Engineer at TenMax. Join us at TenMax, where you'll help build and maintain the backbone of our digital advertising platform, supporting our operations across Taiwan and Southeast Asia. We're looking for someone passionate about infrastructure, eager to tackle challenges in hybrid cloud environments, and driven to optimize...
-
Quality and Reliability Engineer
1 week ago
(Taiwan) HsinChu, Taiwan Rivos Full time $90,000 - $120,000 per year
The primary responsibility of the Quality and Reliability Engineer is to ensure that our products meet the reliability criteria set by ourselves, our vendors and suppliers, and our customers. This means that we test our design and products to achieve high standards of reliability, identify weaknesses in design and manufacturing and support improving design...
-
Taipei, Taiwan hermeneutic Investments Full time $100,000 - $150,000 per year
About the Role: We're looking for a Senior Site Reliability/DevOps Engineer to join our hedge fund's technology team. You'll be responsible for building and maintaining our cloud infrastructure that powers our trading operations. This role combines expertise in AWS architecture, database administration, and system monitoring to ensure our platform operates...
-
Reliability Engineer
2 days ago
Taipei, Taiwan Apple Full time $104,000 - $130,878 per year
Apple is a place where extraordinary people gather to do their best work. Just be ready to dream big. The people here at Apple don't just build products; they build the kind of wonder that's revolutionized entire industries. It's the diversity of those people and their ideas that encourages the innovation that runs through everything we do, from amazing...
-
Staff Site Reliability Engineer
1 week ago
Taipei, Taipei City, Taiwan Netskope Full time $104,000 - $130,878 per year
About Netskope
Today, there's more data and users outside the enterprise than inside, causing the network perimeter as we know it to dissolve. We realized a new perimeter was needed, one that is built in the cloud and follows and protects data wherever it goes, so we started Netskope to redefine Cloud, Network and Data Security. Since 2012, we have built the...
-
Silicon Reliability Engineer
1 week ago
Taipei, Taiwan Meta Full time $150,000 - $200,000 per year
Reality Labs is focused on ensuring the highest quality and reliability of our advanced silicon products. We are seeking a Lead Silicon Reliability Engineer to drive reliability strategies and lead efforts to deliver robust silicon solutions. Responsibilities: Lead the development, execution, and continuous improvement of silicon reliability qualification...
-
Silicon Reliability Engineer
1 week ago
Taipei, Taipei City, Taiwan Meta Full time $104,000 - $130,878 per year
Reality Labs is focused on ensuring the highest quality and reliability of our advanced silicon products. We are seeking a Lead Silicon Reliability Engineer to drive reliability strategies and lead efforts to deliver robust silicon solutions. Silicon Reliability Engineer Responsibilities: Lead the development, execution, and continuous improvement of silicon...
-
Reliability Engineer contractor
5 days ago
Taipei City, Taiwan Rambus Full time $104,000 - $130,878 per year
Responsibilities: Own On-going Reliability Monitoring (ORM) for production, covering the ORM plan, test coordination, data analysis, and the final ORM report. Drive failure analysis to root causes and implementation of corrective actions by interacting with cross-function teams, design, validation, packaging, PE/TE, etc. Collaborate with foundries and package houses...