2026/5/13

[AI Agents x RPA - Industry No. 1 SaaS vendor from the US] Principal Site Reliability Engineer

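Estimated annual salary: JPY 10 million to 20 million
Location: Tokyo
Job category: Other
Employment type: Full-time (permanent)
Application deadline: 9/8 (Tue) 23:59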

Company name undisclosed

Silkwise

堀敬史

About the job

[About the client]

Would you like to work with us at the forefront of "agentic"? Through end-to-end business automation, we have supported the efficiency and transformation of Japanese companies. Our current focus is "agentic automation": bringing AI agents, RPA robots, and people together to automate enterprise-wide operations safely and stably. The Japan branch has been elevated to a region reporting directly to headquarters, and under a strategy that positions Japan as a top-priority location, we aim to deliver solutions from Japan to the world. We are looking for curious, self-starting people who are quick to act. We need colleagues who enjoy the speed and change of business, care for one another, and keep growing together. Join us in making agentic automation a reality and transforming society together.

Role Overview:

This is a high-impact, principal-level role designed for an engineer who excels in the "heat of the moment". Operating with a high degree of autonomy, you will take operational leadership to restore the stability of our large-scale distributed services, blending deep technical SRE expertise with the authoritative presence of an Incident Commander. You will partner closely with platform, infrastructure, and application teams globally to improve service availability, reduce operational toil, and ensure our systems scale reliably under real-world load and failure conditions.

You will act as the Japan regional owner for SRE standards and maintain a close partnership and functional alignment with our Global SRE organization. You will also own service reliability, observability, automation, and continuous improvement initiatives for the region. You will report primarily to the Senior Director of Japan and functionally to the Vice President of SRE, based in the U.S. You will also act in a managerial capacity, with one team member reporting to you.

What You’ll Be Working On:

1. Incident Command & Tactical Response
• Lead Incident Command: Act as the primary Incident Commander for high-stakes technical events. Establish command and control, orchestrate cross-functional response efforts (Compute, Network, Storage, Database), and maintain a common operating picture for all stakeholders.
• Live Site Troubleshooting: Serve as a key escalation point for complex issues. Use your deep understanding of service topology and dependencies to diagnose "grey failure" and resolve disruptions promptly.
• Executive Communication: Own the communication life cycle. Deliver real-time, executive-level briefings during active incidents, translating technical jargon into clear business impact and recovery timelines for leadership.

2. Prevention & Reliability Engineering
• Post-Incident Evolution: Lead thorough retrospectives and RCAs. Beyond just documenting what happened, you will drive and influence the discovery and implementation of automated self-healing solutions to ensure the same issue never occurs twice.
• Observability: Define, track, and improve service health by promoting well-designed SLIs and SLOs. Influence and implement proactive monitoring, dashboards, and early-warning alerts to identify performance bottlenecks before they trigger an incident.
• Toil Automation: Design and implement automation to reduce manual intervention during incidents and routine operations. Apply engineering rigor to operational workflows to eliminate repetitive and error-prone tasks.
• Service Resilience: Know how to test service behavior under load, including degradation modes, scaling characteristics, and dependency failures. Ensure backup, restore, and disaster recovery capabilities are implemented, tested, and maintained.

3. Service Design & Cross-functional Leadership
• Architectural Partnership: Partner with development teams to champion high availability and readiness of the services and promote best practices on reliability, resilience, and operability.
• Team Mentorship: Advocate for SRE best practices. Mentor and support other engineers, helping raise the overall incident response and reliability maturity of the organization.

Required skills and experience

What You’ll Bring to the Team:
• Experience: 7+ years in SRE, Cloud Operations, or a related technical field, with at least 3 years in a lead responder or command-oriented role.
• Command Presence: Demonstrated ability to remain calm, focused, and decisive under extreme pressure. You can lead a room of diverse stakeholders and drive technical conversations to successful outcomes.
• Forensics & Investigation: Skill in analyzing system artifacts, network data, and performance dashboards to guide a multidisciplinary audience to the likely root-cause areas of service failures.
• Technical Breadth: Strong proficiency in Python or Go and a holistic understanding of distributed systems, Kubernetes, and cloud infrastructure (preferably Azure).
• Observability Expertise: Deep experience with Prometheus/Grafana, OpenTelemetry, or an equivalent third-party observability stack.
• Availability: Willingness to participate in the on-call rotation as an Incident Commander for high-severity issues.

Preferred qualifications

Nice to have:
• Command Frameworks: Familiarity with structured command systems (such as the Incident Command System, ICS) used in crisis management.
• LLM Ops: Experience using LLMs or AI-driven detection systems to solve reliability and capacity challenges in GPU-heavy, high-performance computing environments.
• AI Tooling: Champion the use of AI tools and LLM-powered agents to improve SRE pillars including, but not limited to, reducing operational toil.
• Event-Driven Remediation: Proven history of building "self-healing" infrastructure via Terraform, Azure Service Operator, or an equivalent solution.

Estimated annual salary	JPY 10 million to 20 million
Position	Principal Site Reliability Engineer
Location	Tokyo

Recruiting agent

Silkwise

堀敬史

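[We are a Tokyo-based executive search firm specializing in foreign-affiliated (global) IT companies]

Note: We are fortunate to be receiving many hiring-support inquiries at present, mainly from foreign-affiliated IT companies (primarily US and Europe), so it may take us a little time to review and reply to messages.

At Silkwise, several consultants work as a team to support candidates' careers. Please feel free to contact members other than me as well. You are also welcome to reach us through our website.

⭐︎About us (https://silkwise.co/)

Silkwise is an executive search firm that provides executive search and talent advisory services, focused on foreign-affiliated IT companies and fast-growing startups. Recently our particular strength has been searches in AI, cloud, data, SaaS, and software for the manufacturing industry, and we work every day to connect companies expanding their business in the Japanese market with individuals exploring their next career step. With "People First" at the center of our values, our mission is to create win-win-win opportunities for everyone involved: companies, candidates, and ourselves. Drawing on our network with global IT companies, we also handle many opportunities that are not publicly advertised, such as positions tied directly to overseas headquarters or APJ, and key positions in the Japan market launch phase.

■ Examples of positions we introduce:
• Country manager positions at foreign-affiliated technology companies (AI / Cloud / SaaS)
• Japan market launch roles, business head positions, and organizational expansion support at foreign-affiliated Big Tech and global companies (we cover a broad range, from Sales, Pre Sales, Post Sales, Marketing, and Finance to HR)
• Senior engineer, architect, and professional services roles in AI / Data / Cloud
• CxO / VP-level executive talent at startups and growth-stage companies
• Professional positions in consulting firms, DX, and AI

Note: Beyond the above, we hold many non-public positions, mainly in AI, cloud, SaaS, DX, and data. If you are interested in foreign-affiliated IT companies of any size, please feel free to get in touch.

■ About me (堀):
I am 堀, a Senior Manager at Silkwise. I am entering my eighth year in this work, and from the start I have supported hiring in the Japanese market for foreign-affiliated IT companies, mainly in the technology industry.
Areas of expertise: AI, cloud, data, SaaS (including CRM and ERP), and software for the manufacturing industry (SCM, PLM, CAD, CAE, MBD/MBSE, AD/ADAS, etc.)
Taking each candidate's experience and thinking into account, I make a point of offering proposals with long-term career development in mind, rather than simply scattering job postings. I also support positions that require collaboration with global organizations, not only within Japan, and career opportunities that make use of English.

■ Track record (listed from youngest to oldest):
• Age 26 (at the time), graduate of a well-known private university: engineer at a Japanese SIer, JPY 7 million ⇒ consultant at a foreign-affiliated cloud vendor, JPY 10 million
• Age 32 (at the time), graduate of a well-known private university: sales at a foreign-affiliated manufacturing software vendor, JPY 23 million ⇒ sales at a foreign-affiliated AI startup, JPY 48 million
• Age 34 (at the time), graduate of a well-known private university: sales at a foreign-affiliated manufacturing software vendor, JPY 12.6 million ⇒ sales at a foreign-affiliated manufacturing software vendor, JPY 15.6 million
• Age 39 (at the time), well-known national university: project manager at a foreign-affiliated ERP vendor, JPY 12.5 million ⇒ presales at a foreign-affiliated cloud vendor, JPY 17 million
• Age 45 (at the time), well-known private university: project manager at a Japanese SIer, JPY 10 million ⇒ project manager at a foreign-affiliated manufacturing software vendor, JPY 12.25 million
• Age 56 (at the time), well-known national university: director at a foreign-affiliated general consulting firm, JPY 22 million ⇒ management position at a foreign-affiliated cloud vendor, JPY 26 million

I value using whichever communication method suits each candidate best and can respond flexibly via LinkedIn, email, and other channels. You are welcome to contact me just for career advice.

Silkwise: https://silkwise.co/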
