AI Infrastructure

Run Ragas evaluations as a production Kubernetes workload: offline eval suites, online LLM-as-judge sampling from …

Build production GraphRAG on Kubernetes: Neo4j cluster with causal clustering, graph construction pipelines, …

Run Feast feature store in production on Kubernetes: online (Redis/DynamoDB) + offline (BigQuery/Snowflake/Postgres) …

Serve classical ML models in production on Kubernetes with BentoML and Yatai: containerized bentos, auto-scaling …

Run pgvector on Kubernetes in production: CloudNativePG cluster setup, HNSW vs IVFFlat indexing, query tuning, …

Run Milvus 2.4+ in production on Kubernetes: distributed architecture with etcd, Pulsar, and MinIO/S3, Milvus Operator …

Honest comparison of vLLM, Hugging Face TGI, and NVIDIA Triton with TensorRT-LLM for self-hosted LLM serving on …

Self-host Dify on Kubernetes in production: API, worker, web, and sandbox components, Postgres and Weaviate …

End-to-end production RAG architecture on Kubernetes: ingestion pipeline, embedding and vector search with Qdrant, LLM …

Run LiteLLM as a production LLM gateway on Kubernetes: virtual keys, per-team budgets, provider fallbacks, Redis …

Run Qdrant vector database in production on Kubernetes: HA cluster topology, sharding and replication, memory sizing for …

Self-host Langfuse v3 on Kubernetes in production: reference architecture, Helm values, Postgres + ClickHouse + Redis HA …