Deploy Feast Feature Store on Kubernetes: Production MLOps Guide (2026)
Run Feast feature store in production on Kubernetes: online (Redis/DynamoDB) + offline (BigQuery/Snowflake/Postgres) stores, materialization jobs, feature-server autoscaling, and integration with BentoML, MLflow, and Ragas evaluation.
Feature stores are the unsexy backbone of serious ML programs. You don’t feel the pain until you have three models and each team computes the same ‘user average order value’ feature three different ways - and then the model explanations stop matching reality. Feast on Kubernetes is the operationally lean way to fix that, and this guide covers the production setup we deploy for clients.
When a feature store is worth it
| Signal | Value of Feast |
|---|---|
| 1 model, 1 team, 1 data source | Low - skip it |
| Multiple models sharing features | High - eliminates duplicate compute + drift |
| Training-serving skew observed in production | Critical - this is Feast’s primary job |
| Real-time features with sub-10ms online latency | High - online store pattern shines |
| Batch-only features, no real-time inference | Moderate - you could do without, but registry still helps |
| LLM RAG wanting user-grounded context | High - underrated use case |
We typically recommend introducing Feast at the 3-model or 2-team threshold. Earlier than that, the operational cost exceeds the value.
Architecture
                       ┌─────────────────────┐
                       │   Feast Registry    │
                       │     (Postgres)      │  Feature definitions, schemas
                       └──────────┬──────────┘
                                  │
          ┌───────────────────────┼────────────────────────┐
          │                       │                        │
          ▼                       ▼                        ▼
     ┌──────────┐          ┌──────────────┐         ┌──────────────┐
     │ Training │          │ Materializer │         │  Inference   │
     │ pipeline │          │     Job      │         │ (feat-server)│
     │  (SDK)   │          │  (CronJob /  │         │  Deployment  │
     │          │          │   Airflow)   │         │              │
     └────┬─────┘          └──────┬───────┘         └──────┬───────┘
          │                       │                        │
          │ historical            │ read                   │ online
          │ queries               │ historical             │ reads
          ▼                       │                        ▼
┌────────────────────┐            │                 ┌──────────────┐
│   Offline store    │◄───────────┤                 │ Online store │
│   (BigQuery /      │            │ write           │  (Redis /    │
│    Snowflake /     │            └────────────────►│   DynamoDB)  │
│    Postgres)       │                              │              │
└────────────────────┘                              └──────────────┘
- Registry is the schema / feature definition store.
- Offline store holds historical data; source of truth.
- Online store holds the latest values, keyed by entity, for low-latency reads.
- Feature server is a stateless HTTP/gRPC service that reads online features.
- Materialization keeps online in sync with offline on a schedule.
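The split between the two stores is easier to see in a toy model. The sketch below is plain Python, not Feast code: the rows, the `point_in_time_value` helper, and the `materialize` helper are all hypothetical stand-ins for what the offline store, a historical query, and a materialization run do conceptually.

```python
from datetime import datetime, timedelta

# Hypothetical offline-store rows: (entity key, event timestamp, feature value).
OFFLINE_ROWS = [
    ("u-123", datetime(2026, 1, 1), 3),
    ("u-123", datetime(2026, 1, 10), 5),
    ("u-456", datetime(2026, 1, 5), 1),
]


def point_in_time_value(rows, entity, as_of, ttl):
    """Return the newest value at or before `as_of`, ignoring rows older than `ttl`."""
    candidates = [
        (ts, value)
        for key, ts, value in rows
        if key == entity and as_of - ttl <= ts <= as_of
    ]
    return max(candidates)[1] if candidates else None


def materialize(rows, end, ttl):
    """Copy the latest in-TTL value per entity into an online key-value dict."""
    entities = {key for key, _, _ in rows}
    return {entity: point_in_time_value(rows, entity, end, ttl) for entity in entities}


ttl = timedelta(days=30)
online = materialize(OFFLINE_ROWS, end=datetime(2026, 1, 15), ttl=ttl)

# A training query for an earlier timestamp sees the older value.
print(point_in_time_value(OFFLINE_ROWS, "u-123", datetime(2026, 1, 5), ttl))
# The online store serves the value as of the materialization end.
print(online["u-123"])
```

The offline store keeps every row so training can ask "what did we know at time T", while the online store keeps one value per entity so serving is a single key lookup.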
Prerequisites
- Kubernetes 1.28+
- cert-manager + ingress-nginx + prometheus-operator
- Postgres (CNPG works well - see pgvector guide)
- Redis for online store (or DynamoDB / Bigtable)
- BigQuery / Snowflake / Postgres / a data lake for offline store
- S3-compatible bucket for materialization staging (if using BigQuery or Snowflake offline)
Feature repo structure
Feast is a Python-centric tool. Keep the feature repo in its own Git repository. Minimal layout:
feature_repo/
├── feature_store.yaml
├── entities.py
├── data_sources.py
├── features/
│ ├── user.py
│ ├── transaction.py
│ └── session.py
└── tests/
└── test_feature_views.py
The repo config:
# feature_store.yaml
project: prod
provider: local
registry:
  registry_type: sql
  path: postgresql://feast:${FEAST_PG_PASSWORD}@feast-pg-rw.data.svc.cluster.local/feast
online_store:
  type: redis
  # Feast expects host:port followed by comma-separated options, with no redis:// scheme
  connection_string: feast-redis-master.data.svc.cluster.local:6379,password=${FEAST_REDIS_PASSWORD}
offline_store:
  type: postgres # or bigquery, snowflake
  host: analytics-pg-ro.data.svc.cluster.local
  port: 5432
  database: analytics
  user: feast_ro
  password: ${FEAST_OFFLINE_PG_PASSWORD}
entity_key_serialization_version: 2
auth:
  type: kubernetes
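Because the config leans on `${VAR}` placeholders, a missing secret tends to surface late, as a connection error at runtime. A minimal pre-flight check for CI, assuming placeholders always use the `${NAME}` form shown above, could look like this. The `missing_env_vars` helper is hypothetical and does not reproduce Feast's own substitution logic.

```python
import os
import re

# Matches ${NAME} placeholders as used in feature_store.yaml above.
PLACEHOLDER = re.compile(r"\$\{([A-Za-z_][A-Za-z0-9_]*)\}")


def missing_env_vars(config_text, env=None):
    """Return the sorted names of ${VAR} placeholders that have no value in `env`."""
    env = os.environ if env is None else env
    names = set(PLACEHOLDER.findall(config_text))
    return sorted(name for name in names if not env.get(name))


sample = """
registry:
  path: postgresql://feast:${FEAST_PG_PASSWORD}@feast-pg-rw.data.svc.cluster.local/feast
offline_store:
  password: ${FEAST_OFFLINE_PG_PASSWORD}
"""

# Only one of the two secrets is provided, so the other is reported.
print(missing_env_vars(sample, env={"FEAST_PG_PASSWORD": "example"}))
```

In a pipeline you would read the real `feature_store.yaml`, call `missing_env_vars(text)` against the process environment, and fail the job if the list is non-empty.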
Feature definitions:
# features/user.py
from datetime import timedelta

from feast import Entity, Field, FeatureView
from feast.types import Float64, Int64, String

from data_sources import user_stats_source

user = Entity(name="user_id", join_keys=["user_id"])

user_stats_fv = FeatureView(
    name="user_stats",
    entities=[user],
    ttl=timedelta(days=30),
    schema=[
        Field(name="lifetime_order_count", dtype=Int64),
        Field(name="lifetime_revenue", dtype=Float64),
        Field(name="days_since_signup", dtype=Int64),
        Field(name="preferred_category", dtype=String),
    ],
    source=user_stats_source,
    online=True,
    tags={"owner": "growth-team", "criticality": "high"},
)
Apply the definitions:
feast apply
This writes them to the Postgres registry.
Deploy the feature server on Kubernetes
Feast doesn’t ship an official Helm chart, but the pattern is simple:
apiVersion: apps/v1
kind: Deployment
metadata:
  name: feast-feature-server
  namespace: feast
spec:
  replicas: 3
  selector:
    matchLabels: {app: feast-feature-server}
  template:
    metadata:
      labels: {app: feast-feature-server}
    spec:
      serviceAccountName: feast
      containers:
        - name: feature-server
          image: feastdev/feature-server:0.42.0
          command: ["feast"]
          args:
            - "-c"
            - "/feature_repo"
            - "serve"
            - "--host"
            - "0.0.0.0"
            - "--port"
            - "6566"
          envFrom:
            - secretRef:
                name: feast-secrets # registry + online creds
          ports:
            - containerPort: 6566
              name: http
          resources:
            requests: {cpu: "500m", memory: "1Gi"}
            limits: {memory: "2Gi"}
          readinessProbe:
            httpGet: {path: /health, port: 6566}
            periodSeconds: 10
          volumeMounts:
            - name: feature-repo
              mountPath: /feature_repo
      volumes:
        - name: feature-repo
          configMap:
            name: feature-repo
---
apiVersion: v1
kind: Service
metadata:
  name: feast-feature-server
  namespace: feast
spec:
  selector: {app: feast-feature-server}
  ports:
    - port: 6566
      targetPort: 6566
      protocol: TCP
---
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: feast-feature-server
  namespace: feast
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: feast-feature-server
  minReplicas: 3
  maxReplicas: 20
  metrics:
    - type: Resource
      resource:
        name: cpu
        target:
          type: Utilization
          averageUtilization: 70
The feature-repo ConfigMap is synced from your Git repo via a CI job that runs feast apply && kubectl create configmap feature-repo --from-file=....
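To reason about how the HorizontalPodAutoscaler in the manifest reacts to load, it helps to run the scaling rule by hand. The sketch below approximates the documented HPA calculation (desired = ceil(current × currentMetric / targetMetric), with a default 10% tolerance band) using the 70% target and the 3 to 20 replica bounds from the manifest. It ignores stabilization windows and pod readiness, so treat it as a back-of-envelope model. Note that CPU utilization is measured against the container's CPU request (500m here), not its limit.

```python
import math


def desired_replicas(current_replicas, current_utilization, target_utilization,
                     min_replicas=3, max_replicas=20, tolerance=0.1):
    """Approximate the HPA rule: ceil(current * currentMetric / targetMetric), clamped."""
    ratio = current_utilization / target_utilization
    # Inside the tolerance band the autoscaler leaves the replica count alone.
    if abs(ratio - 1.0) <= tolerance:
        return current_replicas
    desired = math.ceil(current_replicas * ratio)
    return max(min_replicas, min(max_replicas, desired))


# Average CPU utilization (% of request) across 3 running pods.
for utilization in (35, 70, 140, 700):
    print(utilization, desired_replicas(3, utilization, 70))
```

With three pods, a doubling of CPU to 140% asks for six replicas, and anything extreme is capped at `maxReplicas`. If the feature server is latency-bound rather than CPU-bound, a request-rate or latency metric may be a better scaling signal than CPU.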
Client usage
From a BentoML runner, orchestration service, or training pipeline:
from feast import FeatureStore

store = FeatureStore(repo_path="/feature_repo")

# Online lookup (in serving path)
features = store.get_online_features(
    features=[
        "user_stats:lifetime_order_count",
        "user_stats:lifetime_revenue",
        "user_stats:days_since_signup",
    ],
    entity_rows=[{"user_id": "u-123"}, {"user_id": "u-456"}],
).to_dict()

# Historical lookup (in training pipeline)
training_df = store.get_historical_features(
    entity_df=entity_df,  # has user_id + event_timestamp columns
    features=[
        "user_stats:lifetime_order_count",
        "user_stats:lifetime_revenue",
    ],
).to_df()
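The key property Feast is built around: both calls resolve features from the same definitions, so once materialization has caught up to time T, get_online_features returns the values that get_historical_features would return for timestamp T (for rows still inside the feature view's ttl). This is the anti-skew guarantee, and it only holds as long as the materialization job keeps running on schedule.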
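Services that cannot embed the Python SDK can fetch the same online features from the feature server over HTTP. The sketch below builds, without sending, a POST to the server's `/get-online-features` route using only the standard library. The in-cluster URL is an assumption derived from the Service manifest in this guide, and `build_online_features_request` is a hypothetical helper. With `auth: type: kubernetes` enabled, we would also expect to attach a service-account bearer token, which is omitted here.

```python
import json
import urllib.request

# Assumption: in-cluster DNS name derived from the Service manifest in this guide.
FEATURE_SERVER_URL = "http://feast-feature-server.feast.svc.cluster.local:6566"


def build_online_features_request(features, user_ids, base_url=FEATURE_SERVER_URL):
    """Build (without sending) a POST to the feature server's /get-online-features route."""
    payload = {
        "features": list(features),
        # Entities are sent column-wise: one list of values per join key.
        "entities": {"user_id": list(user_ids)},
    }
    return urllib.request.Request(
        url=f"{base_url}/get-online-features",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


request = build_online_features_request(
    ["user_stats:lifetime_order_count", "user_stats:lifetime_revenue"],
    ["u-123", "u-456"],
)
print(request.full_url)
print(request.data.decode("utf-8"))
# Inside the cluster, send it with: urllib.request.urlopen(request, timeout=1.0)
```

Keeping request construction separate from sending makes the payload easy to unit-test, and a short client timeout stops a slow feature lookup from stalling the whole inference path.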
Materialization job
The online store needs periodic refresh. Run as a CronJob:
apiVersion: batch/v1
kind: CronJob
metadata:
  name: feast-materialize
  namespace: feast
spec:
  schedule: "0 * * * *" # hourly incremental
  concurrencyPolicy: Forbid
  failedJobsHistoryLimit: 7
  jobTemplate:
    spec:
      template:
        spec:
          restartPolicy: OnFailure
          containers:
            - name: materialize
              image: feastdev/feast:0.42.0
              command: ["/bin/bash", "-c"]
              args:
                - |
                  set -euo pipefail
                  cd /feature_repo
                  # Incremental: Feast tracks last run per feature view
                  feast materialize-incremental $(date -u +%Y-%m-%dT%H:%M:%S)
              envFrom:
                - secretRef:
                    name: feast-secrets
              volumeMounts:
                - name: feature-repo
                  mountPath: /feature_repo
              resources:
                requests: {cpu: "1", memory: "2Gi"}
                limits: {memory: "4Gi"}
          volumes:
            - name: feature-repo
              configMap:
                name: feature-repo
For offline stores like BigQuery or Snowflake with expensive queries, schedule less frequently and use incremental materialization only. For Postgres-backed offline stores with fast scans, hourly full-freshness is reasonable.
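When a backfill or a long outage leaves a large gap to cover, one option is to split the range into explicit, bounded windows and run `feast materialize <start> <end>` once per window, which keeps each offline read small and makes reruns repeatable. A minimal sketch of the window arithmetic follows. The `materialization_windows` helper is hypothetical, and the one-hour step is an assumption to tune against your offline store.

```python
from datetime import datetime, timedelta, timezone


def materialization_windows(start, end, step):
    """Split [start, end) into consecutive windows no longer than `step`."""
    if step <= timedelta(0):
        raise ValueError("step must be positive")
    windows = []
    cursor = start
    while cursor < end:
        window_end = min(cursor + step, end)
        windows.append((cursor, window_end))
        cursor = window_end
    return windows


start = datetime(2026, 1, 1, 0, 0, tzinfo=timezone.utc)
end = datetime(2026, 1, 1, 2, 30, tzinfo=timezone.utc)

for window_start, window_end in materialization_windows(start, end, timedelta(hours=1)):
    # Each pair maps onto one CLI call with explicit timestamps.
    print(f"feast materialize {window_start:%Y-%m-%dT%H:%M:%S} {window_end:%Y-%m-%dT%H:%M:%S}")
```

Because every window has fixed start and end timestamps, a failed window can be retried on its own without touching the ones that already succeeded.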
Observability
Feast feature server exposes Prometheus metrics on /metrics:
- feast_feature_server_request_duration_seconds_bucket - latency per feature view
- feast_feature_server_request_total - request volume by status
- feast_materialization_last_run_seconds - freshness alert signal
- feast_online_read_timing_seconds - per-feature online lookup time
Key alerts:
- Materialization hasn’t run in 2× expected interval (online store getting stale)
- Online read p99 > 20ms (hot path degradation)
- Materialization error rate > 1%
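The first two alert conditions reduce to simple arithmetic, sketched below in plain Python so the thresholds are unambiguous. In production these would normally be Prometheus alert rules. The `is_stale` and `p99` helpers and the sample latencies are hypothetical.

```python
from datetime import datetime, timedelta, timezone


def is_stale(last_run, now, expected_interval, factor=2):
    """True when the last successful materialization is older than factor x interval."""
    return (now - last_run) > expected_interval * factor


def p99(latencies_ms):
    """Nearest-rank 99th percentile of a non-empty list of latencies."""
    ordered = sorted(latencies_ms)
    rank = (99 * len(ordered) + 99) // 100  # integer ceil(0.99 * n)
    return ordered[rank - 1]


now = datetime(2026, 1, 1, 12, 0, tzinfo=timezone.utc)
last_run = now - timedelta(hours=3)

# Hourly schedule, last run three hours ago: past the 2x threshold.
print(is_stale(last_run, now, timedelta(hours=1)))

# 98 fast reads and two slow ones: the tail breaches the 20 ms budget.
samples = [2.0] * 98 + [25.0, 40.0]
print(p99(samples) > 20.0)
```

The p99 example is a reminder that averages hide hot-path regressions: the mean of these samples is under 3 ms while the tail is well past the alert threshold.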
Using Feast for RAG grounding
The underrated pattern: RAG returns generic document chunks; Feast adds user-specific grounding.
# In the orchestration service
# Assumes plan_tier and support_tickets_last_30d have been added to the user_stats view
user_feats = store.get_online_features(
    features=[
        "user_stats:plan_tier",
        "user_stats:support_tickets_last_30d",
        "user_stats:preferred_category",
    ],
    entity_rows=[{"user_id": req.user_id}],
).to_dict()

# Retrieve from vector DB
hits = await qdrant.search(...)

# Compose grounded prompt
system_prompt = f"""You are a helpful assistant.
User context: plan={user_feats['plan_tier'][0]}, recent_tickets={user_feats['support_tickets_last_30d'][0]}, interests={user_feats['preferred_category'][0]}.
Use this to personalize your response without explicitly mentioning it."""

# Generate via LiteLLM (see /deploy-litellm-proxy-on-kubernetes/)
response = await llm.chat.completions.create(...)
This makes RAG answers feel personal without retraining the model or expanding the corpus with user-specific documents.
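One practical wrinkle: an online lookup returns `None` for a feature that was never materialized or has aged out of its TTL, and a literal "None" in the prompt reads badly. A small formatting helper, sketched below with hypothetical names and defaults, keeps the prompt clean. The input dict mirrors the shape of `get_online_features(...).to_dict()` for a single entity row.

```python
def build_user_context(user_feats, defaults=None):
    """Format single-row feature lists into a prompt fragment, filling gaps with defaults."""
    defaults = defaults or {}
    parts = []
    for name, values in user_feats.items():
        if name == "user_id":
            continue  # the entity key is returned alongside the features
        value = values[0] if values else None
        if value is None:
            value = defaults.get(name, "unknown")
        parts.append(f"{name}={value}")
    return ", ".join(parts)


# None stands for a feature that was never materialized or has expired.
user_feats = {
    "user_id": ["u-123"],
    "plan_tier": ["pro"],
    "support_tickets_last_30d": [None],
    "preferred_category": ["electronics"],
}

print(build_user_context(user_feats, defaults={"support_tickets_last_30d": 0}))
```

Whether a missing value should become a default, the word "unknown", or be dropped from the prompt entirely is a product decision. The point is to make that choice explicitly rather than let `None` leak through.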
Sizing tiers
| Tier | Entities | Feature views | Feature server replicas | Online store | Est. monthly cost (AED, EKS me-central-1) |
|---|---|---|---|---|---|
| Small | <1M | 5-10 | 3 × small | Redis single, 4 GB | ~4,000 |
| Medium | 1-10M | 10-50 | 6 × medium | Redis primary+replica, 16 GB | ~12,000 |
| Large | 10-100M | 50+ | 20+ | Redis cluster 64 GB or DynamoDB | ~45,000 |
Offline store costs (BigQuery, Snowflake) are separate and dominate above medium scale.
Common failure modes
- Stale online store across feature versions - new feature columns added but materialization job uses cached schema. Run feast apply && feast materialize-incremental in a single CI step.
- Feature drift between training and online - usually caused by feature logic implemented in two places. The fix is moving all feature engineering into Feast SQL/Python definitions, not duplicating it in training code.
- Materialization OOM on large offline reads - switch to a Dask-based offline store or batch materialization in smaller time windows.
- Feature server slow on cold start - Feast SDK reads the registry on init. Cache the registry proto in the pod image build, or use feast serve --registry_ttl_sec 600 to refresh in the background.
- Orphaned features no one uses - the Feast registry grows over time with legacy features. Tag features with criticality and run a quarterly cleanup job pruning untagged / low-usage features.
What this connects to
Feast is the feature-serving layer in a production ML stack:
- Training pipelines consume Feast historical features to avoid train-serve skew
- BentoML runners call Feast online features at inference time
- Production RAG Stack uses Feast to ground LLM prompts in user context
- Langfuse can log feature values alongside traces for debugging
Getting help
We deploy Feast for GCC enterprise ML teams looking to escape feature-computation duplication and train-serve skew. AI/ML Infrastructure on Kubernetes is the entry engagement. Typical rollout: 4-6 weeks including feature-repo design, materialization pipeline, and migration of 3-5 baseline models.
Frequently Asked Questions
Do I really need a feature store?
If you have one ML model served from a single notebook with one data source, no. If you have multiple models that share features, different features at training vs inference time (train-serve skew), or a team of ML engineers duplicating feature computations, yes. The break-even is usually 3-5 production models or one model reused across teams. Below that, feature stores add operational surface without returns; above that, they're the only thing keeping feature definitions consistent across training and inference.
How does Feast fit with Kubernetes?
Feast on Kubernetes runs as three components: (1) the Feast SDK that lives in client applications and training pipelines, (2) a feature-server Deployment exposing online features via HTTP/gRPC at inference time, and (3) materialization jobs (usually Kubernetes Jobs or Airflow DAGs) that copy offline features to the online store on a schedule. The registry metadata lives in Postgres; online features in Redis, DynamoDB, or Bigtable; offline features in BigQuery, Snowflake, or a data lake.
Redis or DynamoDB for Feast's online store?
Redis is faster (p99 latency under 2ms, vs DynamoDB's 5-10ms) and cheaper at small scale. DynamoDB wins at extreme scale (100M+ entities) because it's fully managed and scales horizontally without sharding work. For GCC deployments, Redis on Kubernetes or Azure Cache for Redis in UAE North is the common choice; DynamoDB is available only in AWS regions such as Middle East (Bahrain), so it mainly suits teams already on AWS. If you're under 10M entities, use Redis.
How do I materialize features reliably?
Run materialization as Kubernetes Jobs triggered by Airflow, Argo Workflows, or a simple CronJob. Key reliability patterns: (1) idempotent materialization with explicit start/end timestamps, (2) monitor freshness via feast_materialization_last_run_seconds Prometheus metric, (3) use incremental materialization instead of full-table rebuilds, (4) run materialization on a dedicated node pool so it doesn't contend with feature-server pods.
Can Feast serve features for RAG pipelines?
Yes, and it's underrated. Store pre-computed business context (user plan tier, account age, recent purchase categories, support-ticket history) as features keyed by user_id. At RAG query time, join the retrieved document chunks with these features to ground the LLM prompt in user-specific context. This makes answers more personalized without retraining the model or expanding the vector corpus.
Is Feast a good fit for GCC data-sovereign ML?
Yes. Feast is open-source and deployment-flexible: pick in-region stores for every layer. Typical UAE deployment: offline store on Azure Synapse or BigQuery Omni UAE, online store on Azure Cache for Redis UAE North or in-cluster Redis, feature server on the same AKS/EKS cluster, registry Postgres also in-region. No external SaaS needed.