Supercomputing Test Software Engineer
1 day ago
About Etched
Etched is building the world's first AI inference system purpose-built for transformers - delivering over 10x higher performance and dramatically lower cost and latency than a B200. With Etched ASICs, you can build products that would be impossible with GPUs, like real-time video generation models and extremely deep & parallel chain-of-thought reasoning agents. Backed by hundreds of millions from top-tier investors and staffed by leading engineers, Etched is redefining the infrastructure layer for the fastest growing industry in history.
Job Summary
We are seeking highly motivated and detail-oriented Software Engineers to join our Supercomputing Testing team. This team plays a critical role in ensuring the reliability and stability of our highest-performance inference server hardware and software. As a Software Engineer on this team, you will design, develop, and execute comprehensive supercomputing test suites, analyze test results, and collaborate with hardware and software engineering teams at Etched and our ODM partners to identify and resolve potential issues. You will be at the forefront of ensuring our server products meet the highest quality standards before they reach our customers.
Key Team Responsibilities
- Test Development: Design, develop, and implement automated supercomputing test suites using common scripting languages (Python, Go, Bash) and test frameworks, covering all aspects of system operation, including boot sequences, root of trust, system management, workload deployment, and performance.
- Test Execution: Execute tests on server hardware, monitor system performance and health, and analyze test results.
- Failure Analysis: Investigate and debug hardware and software failures identified during testing, providing detailed reports and mitigation plans.
- Collaboration: Collaborate with internal and external hardware and software engineering teams to identify root causes of failures and implement corrective actions.
- Test Infrastructure: Contribute to the development and maintenance of the supercomputing testing infrastructure, including portable test environments and automation tools runnable in any environment.
- Documentation: Create and maintain comprehensive documentation for test plans, test cases, and test results.
- Performance Analysis: Analyze system performance metrics to identify potential bottlenecks and areas for optimization.
- Continuous Improvement: Participate in continuous improvement efforts to enhance the efficiency and effectiveness of the testing process.
Representative projects
- Develop automated test suites to stress-test supercomputing AI subsystems under extreme at-scale workloads.
- Design and implement fault injection tests to simulate hardware and software failures.
- Create tools to monitor and analyze system performance metrics, such as CPU utilization, cross-socket memory performance and usage, and network latency.
- Build and maintain a scalable testing environment capable of handling multiple server configurations.
- Collaborate with hardware engineers to develop tests for new server features and components.
- Contribute to the creation of Pod-level dashboards tracking system testing across hundreds of AI accelerators.
You may be a good fit if you have
- Proficiency in at least one scripting language (e.g., Python, Bash, Go).
- Experience with software testing methodologies and tools.
- Strong understanding of operating systems (Linux preferred) and server hardware architectures.
- Ability to analyze complex technical problems and provide effective solutions.
- Excellent communication and collaboration skills.
- Ability to work independently and as part of a team.
- Experience with version control systems (e.g., Git).
- Experience with reading and interpreting hardware logs.
Strong Candidates May Also Have Experience With (Nice-to-have Qualifications)
- Experience with hardware burn-in testing or reliability testing.
- Experience with performance testing and benchmarking tools.
- Familiarity with hardware diagnostic tools and techniques.
- Experience with CI/CD pipelines.
- Knowledge of low-level hardware communication protocols (I2C, etc.).
- Experience with data analysis tools and techniques.
Benefits
- Competitive compensation packages including generous equity packages
- Comprehensive insurance coverage and other top-of-market benefits
How We're Different
Etched believes in the Bitter Lesson. We think most of the progress in the AI field has come from using more FLOPs to train and run models, and the best way to get more FLOPs is to build model-specific hardware. Larger and larger training runs encourage companies to consolidate around fewer model architectures, which creates a market for single-model ASICs.
We are a fully in-person team in San Jose and Taipei, and greatly value engineering skills. We do not have boundaries between engineering and research, and we expect all of our technical staff to contribute to both as needed.