06 Feb

AI DevOps / Infrastructure / Optimization

Geneva

Description

Infomaniak is 30 years of expertise and over 290 passionate individuals with a common ambition: to create an ethical cloud without compromising on ecology or privacy. We build data centers that are at the forefront of ecological advancements. We develop IaaS, PaaS, and SaaS services that are fully hosted and developed in Switzerland for B2B and B2C. Our solutions include an online suite and cloud hosting, streaming, marketing, and event solutions trusted by millions of users and by public and private entities across Europe, such as RTBF, the United Nations, central banks, over 200 radio and TV stations, and many metropolitan areas and security agencies.

Infomaniak is an independent company, committed to technological independence in Europe, the local economy, and a more sustainable digital future for the planet. Are you ready to join a growing company, to give your best, and to grow with us in order to contribute to the development of an ethical alternative to the giants of the Web? Then we look forward to meeting you!

We are looking for: AI DevOps / Infrastructure

Infomaniak develops an open-source AI hosted on its own Swiss infrastructure. We deploy large-scale language models and build intelligent agents for our products (kMeet, kDrive). We are looking for an AI Engineer to design, implement, and optimize our AI agents, focusing on quality, reliability, and user experience.

Responsibilities:
- Deployment & Performance: Deploy, maintain, and optimize LLMs while maximizing GPU resource efficiency. Improve and industrialize our GitLab CI pipelines for AI models (build, test, deployment, rollback). Drive deployments via Flux CD (GitOps).
- Monitoring & Observability: Strengthen our Prometheus / Grafana / VictoriaMetrics stack for fine-grained visibility into performance, GPU utilization, availability, and overall health of the services.
- Resource Management: Work on cost and performance efficiency (autoscaling, scheduling, quota management, etc.).
- Reliability: Ensure robustness, security, and reproducibility of deployments in a critical environment.

The profile that excites us:
- Mastery of modern serving frameworks (e.g., vLLM, TGI).
- Proficiency in GitLab CI (pipelines, runners, variables, etc.) with Kubernetes.
- Proven experience in Kubernetes, Helm, CRDs, networking, and autoscaling.
- Experience with Flux CD (GitOps, Helm Releases, Kustomize, deployments).
- Experience with Prometheus / Grafana (dashboards, alerting, exporters).
- Knowledge of GPU infrastructures (NVIDIA, CUDA, GPU scheduling, monitoring).
- A strong inclination for quality, reliability, and performance.
- Ability to work in a critical environment (high SLA, high availability).
- Good collaboration skills with ML teams.

Also appreciated:
- Technical curiosity, a taste for innovative challenges, and contributions to open source or side projects.
- You enjoy working in a team and demonstrate a positive attitude.

Your humor, flexibility, and team spirit are essential to work in a fun environment.

The technical stack we use:
- LangChain
- Pydantic
- aivLLM
- FastAPI
- GitLab
- Sentry
- Qdrant

Position: Permanent
Workload: 80-100%
Location: Geneva
Availability: As soon as possible

Recruitment process stages:
- A first technical interview to validate your skills.
- A second interview in our offices.
