DevOps / Platform Engineer
📍 Batumi, Georgia · open to remote / relocation
About me:
Sole operator of 10 production Kubernetes clusters (~45 nodes, ~10K RPS, 99.85% uptime) across bare-metal and IaaS. Designed and shipped an internal LLM system for cluster health analysis that runs on CPU-only hardware; adopted by the DevOps and Analytics teams.
What I do:
- Production K8s: bare-metal, k3s, Selectel IaaS - upgrades, incidents, on-call
- GitOps: migrated 20+ microservices to ArgoCD, 0 manual prod deploys
- CI/CD built from scratch (GitLab CI + ArgoCD): deploy frequency weekly -> daily
- Observability: Prometheus, Grafana, Loki, Alertmanager + automated incident alerting
- Migrations: 25 microservices from VMware -> k3s
- LLM pipeline: llama.cpp + Ray, correlating K8s events / metrics / alerts -> root-cause hypotheses
Stack: Python, Bash, C · k8s, k3s, Helm, ArgoCD · AWS, Terraform · Prometheus/Grafana/Loki · Docker, Ansible, VMware · MetalLB, WireGuard · PostgreSQL, ClickHouse
🌐 English C1 · Russian native
📩 Contact: @ptruha · LinkedIn