April 23, 2026 · 7 min read

Deploy Feast Feature Store on Kubernetes: Production MLOps Guide (2026)

Run Feast feature store in production on Kubernetes: online (Redis/DynamoDB) + offline (BigQuery/Snowflake/Postgres) stores, materialization jobs, feature-server autoscaling, and integration with BentoML, MLflow, and Ragas evaluation.

Feature stores are the unsexy backbone of serious ML programs. You don’t feel the pain until you have three models and each team computes the same ‘user average order value’ feature three different ways - and then the model explanations stop matching reality. Feast on Kubernetes is the operationally lean way to fix that, and this guide covers the production setup we deploy for clients.

When a feature store is worth it

Signal                                           | Value of Feast
-------------------------------------------------|----------------------------------------------------------
1 model, 1 team, 1 data source                   | Low - skip it
Multiple models sharing features                 | High - eliminates duplicate compute + drift
Training-serving skew observed in production     | Critical - this is Feast’s primary job
Real-time features with sub-10ms online latency  | High - online store pattern shines
Batch-only features, no real-time inference      | Moderate - you could do without, but registry still helps
LLM RAG wanting user-grounded context            | High - underrated use case

We typically recommend introducing Feast at the 3-model or 2-team threshold. Earlier than that, the operational cost exceeds the value.

Architecture

                  ┌─────────────────────┐
                  │  Feast Registry     │
                  │  (Postgres)         │  Feature definitions, schemas
                  └──────────┬──────────┘
                             │
     ┌───────────────────────┼────────────────────────┐
     │                       │                        │
     ▼                       ▼                        ▼
┌──────────┐         ┌──────────────┐         ┌──────────────┐
│ Training │         │ Materializer │         │ Inference    │
│ pipeline │         │  Job         │         │ (feat-server)│
│  (SDK)   │         │ (CronJob /   │         │ Deployment   │
│          │         │  Airflow)    │         │              │
└────┬─────┘         └──────┬───────┘         └──────┬───────┘
     │                      │                        │
     │ historical           │ read                   │ online
     │ queries              │ historical             │ reads
     ▼                      │                        ▼
┌────────────────────┐      │                 ┌──────────────┐
│  Offline store     │◄─────┤                 │ Online store │
│  (BigQuery /       │      │ write           │  (Redis /    │
│   Snowflake /      │      └────────────────►│   DynamoDB)  │
│   Postgres)        │                        │              │
└────────────────────┘                        └──────────────┘
  • Registry is the schema / feature definition store.
  • Offline store holds historical data; source of truth.
  • Online store holds the latest values, keyed by entity, for low-latency reads.
  • Feature server is a stateless HTTP/gRPC service that reads online features.
  • Materialization keeps online in sync with offline on a schedule.
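
To make the data flow concrete, here is a toy sketch of the offline store, the materializer, and the online store using plain Python structures. It is not Feast code: the `materialize` and `get_online` helpers and the sample rows are illustrative assumptions, and real stores are external systems rather than in-process lists and dicts.

```python
from datetime import datetime, timezone

# Toy stand-ins: a list of timestamped rows (offline) and a dict keyed by entity (online).
offline_store = [
    {"user_id": "u-123", "event_timestamp": datetime(2026, 4, 1, tzinfo=timezone.utc), "lifetime_order_count": 4},
    {"user_id": "u-123", "event_timestamp": datetime(2026, 4, 20, tzinfo=timezone.utc), "lifetime_order_count": 7},
    {"user_id": "u-456", "event_timestamp": datetime(2026, 4, 10, tzinfo=timezone.utc), "lifetime_order_count": 1},
]
online_store = {}


def materialize(rows, store, end):
    """Copy the newest row per entity (up to `end`) into the online store."""
    for row in sorted(rows, key=lambda r: r["event_timestamp"]):
        if row["event_timestamp"] <= end:
            # Later rows overwrite earlier ones, so only the latest value survives.
            store[row["user_id"]] = {k: v for k, v in row.items() if k != "user_id"}
    return store


def get_online(store, user_id):
    """Key lookup by entity, the shape of read the feature server performs."""
    return store.get(user_id)


materialize(offline_store, online_store, end=datetime(2026, 4, 23, tzinfo=timezone.utc))
print(get_online(online_store, "u-123")["lifetime_order_count"])  # 7
```

The offline list keeps every historical row, while the online dict holds exactly one row per entity. That asymmetry is why training reads go to the offline store and serving reads go to the online store.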

Prerequisites

  • Kubernetes 1.28+
  • cert-manager + ingress-nginx + prometheus-operator
  • Postgres (CNPG works well - see pgvector guide)
  • Redis for online store (or DynamoDB / Bigtable)
  • BigQuery / Snowflake / Postgres / a data lake for offline store
  • S3-compatible bucket for materialization staging (if using BigQuery or Snowflake offline)

Feature repo structure

Feast is a Python-centric tool. Keep the feature repo in its own Git repository. Minimal layout:

feature_repo/
├── feature_store.yaml
├── entities.py
├── data_sources.py
├── features/
│   ├── user.py
│   ├── transaction.py
│   └── session.py
└── tests/
    └── test_feature_views.py

The repo config:

# feature_store.yaml
project: prod
provider: local
registry:
  registry_type: sql
  path: postgresql://feast:${FEAST_PG_PASSWORD}@feast-pg-rw.data.svc.cluster.local/feast
online_store:
  type: redis
  # Feast expects host:port[,option=value...] here, not a redis:// URL
  connection_string: feast-redis-master.data.svc.cluster.local:6379,password=${FEAST_REDIS_PASSWORD}
offline_store:
  type: postgres                # or bigquery, snowflake
  host: analytics-pg-ro.data.svc.cluster.local
  port: 5432
  database: analytics
  user: feast_ro
  password: ${FEAST_OFFLINE_PG_PASSWORD}
entity_key_serialization_version: 2
auth:
  type: kubernetes
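
Because the config relies on `${VAR}` placeholders, a missing secret tends to surface late as a confusing connection error. A minimal pre-flight sketch, assuming you run it in CI or as a container entrypoint step before `feast apply`; the `missing_env_vars` helper is hypothetical and uses only the standard library:

```python
import os
import re

# Matches ${NAME} placeholders as used in feature_store.yaml.
PLACEHOLDER = re.compile(r"\$\{([A-Za-z_][A-Za-z0-9_]*)\}")


def missing_env_vars(config_text, env=None):
    """Return the ${VAR} placeholders in config_text that have no value in env."""
    env = os.environ if env is None else env
    names = PLACEHOLDER.findall(config_text)
    # dict.fromkeys drops duplicates while keeping first-seen order.
    return [name for name in dict.fromkeys(names) if not env.get(name)]


sample = """
registry:
  path: postgresql://feast:${FEAST_PG_PASSWORD}@feast-pg-rw.data.svc.cluster.local/feast
offline_store:
  password: ${FEAST_OFFLINE_PG_PASSWORD}
"""

# Only one of the two secrets is present in this example environment.
print(missing_env_vars(sample, env={"FEAST_PG_PASSWORD": "example"}))
```

In a pipeline you would read the real file, call the helper with the default environment, and exit non-zero when the returned list is not empty.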

Feature definitions:

# features/user.py
from datetime import timedelta
from feast import Entity, Field, FeatureView
from feast.types import Float64, Int64, String
from data_sources import user_stats_source

user = Entity(name="user_id", join_keys=["user_id"])

user_stats_fv = FeatureView(
    name="user_stats",
    entities=[user],
    ttl=timedelta(days=30),
    schema=[
        Field(name="lifetime_order_count", dtype=Int64),
        Field(name="lifetime_revenue", dtype=Float64),
        Field(name="days_since_signup", dtype=Int64),
        Field(name="preferred_category", dtype=String),
    ],
    source=user_stats_source,
    online=True,
    tags={"owner": "growth-team", "criticality": "high"},
)
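
The `tags` field only pays off if every feature view carries it. A minimal sketch of a check that could live in `tests/test_feature_views.py`: the `views_missing_tags` helper and the `session_stats` view are hypothetical, and the helper relies only on objects exposing `.name` and `.tags`, so it is exercised here with stand-ins rather than real Feast objects.

```python
from types import SimpleNamespace

REQUIRED_TAGS = ("owner", "criticality")


def views_missing_tags(feature_views, required=REQUIRED_TAGS):
    """Map feature view name -> list of required tag keys it lacks."""
    problems = {}
    for fv in feature_views:
        tags = fv.tags or {}
        missing = [key for key in required if not tags.get(key)]
        if missing:
            problems[fv.name] = missing
    return problems


# Stand-ins with the same .name/.tags attributes the check reads.
views = [
    SimpleNamespace(name="user_stats", tags={"owner": "growth-team", "criticality": "high"}),
    SimpleNamespace(name="session_stats", tags={"owner": "growth-team"}),
]
print(views_missing_tags(views))  # {'session_stats': ['criticality']}
```

In a real test you would presumably pass the feature views loaded from the repo (for example the result of `FeatureStore.list_feature_views()`) and assert that the returned dict is empty.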

Apply the definitions:

feast apply

This writes them to the Postgres registry.

Deploy the feature server on Kubernetes

The Feast project publishes a feast-feature-server Helm chart, but a plain manifest keeps every moving part visible, and the pattern is simple:

apiVersion: apps/v1
kind: Deployment
metadata:
  name: feast-feature-server
  namespace: feast
spec:
  replicas: 3
  selector:
    matchLabels: {app: feast-feature-server}
  template:
    metadata:
      labels: {app: feast-feature-server}
    spec:
      serviceAccountName: feast
      containers:
        - name: feature-server
          image: feastdev/feature-server:0.42.0
          command: ["feast"]
          args:
            - "-c"
            - "/feature_repo"
            - "serve"
            - "--host"
            - "0.0.0.0"
            - "--port"
            - "6566"
          envFrom:
            - secretRef:
                name: feast-secrets      # registry + online creds
          ports:
            - containerPort: 6566
              name: http
          resources:
            requests: {cpu: "500m", memory: "1Gi"}
            limits: {memory: "2Gi"}
          readinessProbe:
            httpGet: {path: /health, port: 6566}
            periodSeconds: 10
          volumeMounts:
            - name: feature-repo
              mountPath: /feature_repo
      volumes:
        - name: feature-repo
          configMap:
            name: feature-repo
---
apiVersion: v1
kind: Service
metadata:
  name: feast-feature-server
  namespace: feast
spec:
  selector: {app: feast-feature-server}
  ports:
    - port: 6566
      targetPort: 6566
      protocol: TCP
---
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: feast-feature-server
  namespace: feast
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: feast-feature-server
  minReplicas: 3
  maxReplicas: 20
  metrics:
    - type: Resource
      resource:
        name: cpu
        target:
          type: Utilization
          averageUtilization: 70

The feature-repo ConfigMap is synced from your Git repo via a CI job that runs feast apply && kubectl create configmap feature-repo --from-file=....
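
Services that do not embed the Feast SDK can call the feature server over HTTP. The sketch below builds a request for the Python feature server's `/get-online-features` endpoint using only the standard library. The in-cluster URL is an assumption derived from the Service name and namespace above with the default `cluster.local` domain, and the helper names are hypothetical. With `auth: type: kubernetes` enabled, the server would also expect a service account bearer token, which is omitted here.

```python
import json
import urllib.request

# Assumed in-cluster DNS name for the Service defined above.
FEATURE_SERVER_URL = "http://feast-feature-server.feast.svc.cluster.local:6566"


def build_request(features, entities, base_url=FEATURE_SERVER_URL):
    """Build the POST request for the feature server's /get-online-features endpoint."""
    body = json.dumps({"features": features, "entities": entities}).encode("utf-8")
    return urllib.request.Request(
        url=f"{base_url}/get-online-features",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def fetch_online_features(features, entities, timeout=1.0):
    """Send the request; only works from somewhere that can reach the Service."""
    with urllib.request.urlopen(build_request(features, entities), timeout=timeout) as resp:
        return json.load(resp)


# Entities are sent column-wise: one list of values per join key.
req = build_request(
    features=["user_stats:lifetime_order_count", "user_stats:lifetime_revenue"],
    entities={"user_id": ["u-123", "u-456"]},
)
print(req.get_method(), req.full_url)
```

Keeping request construction separate from the network call makes the payload easy to unit test without a running cluster.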

Client usage

From a BentoML runner, orchestration service, or training pipeline:

from feast import FeatureStore

store = FeatureStore(repo_path="/feature_repo")

# Online lookup (in serving path)
features = store.get_online_features(
    features=[
        "user_stats:lifetime_order_count",
        "user_stats:lifetime_revenue",
        "user_stats:days_since_signup",
    ],
    entity_rows=[{"user_id": "u-123"}, {"user_id": "u-456"}],
).to_dict()

# Historical lookup (in training pipeline)
training_df = store.get_historical_features(
    entity_df=entity_df,     # has user_id + event_timestamp columns
    features=[
        "user_stats:lifetime_order_count",
        "user_stats:lifetime_revenue",
    ],
).to_df()

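The key property Feast provides: both calls are served from the same feature definitions, so the values get_online_features returns at time T match what get_historical_features would return for timestamp T, provided the online store has been materialized up to T and the values are still within the feature view's ttl. This point-in-time consistency is the anti-skew guarantee.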
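
The point-in-time rule behind that guarantee can be sketched in a few lines. This is an illustration of the idea, not Feast's implementation: `point_in_time_value` is a hypothetical helper that picks the newest feature row at or before the requested timestamp and ignores rows older than the ttl.

```python
from datetime import datetime, timedelta


def point_in_time_value(feature_rows, entity_id, as_of, ttl):
    """Newest row for entity_id with as_of - ttl <= event_timestamp <= as_of, else None."""
    candidates = [
        row for row in feature_rows
        if row["user_id"] == entity_id
        and as_of - ttl <= row["event_timestamp"] <= as_of
    ]
    if not candidates:
        return None
    return max(candidates, key=lambda r: r["event_timestamp"])


rows = [
    {"user_id": "u-123", "event_timestamp": datetime(2026, 3, 1), "lifetime_order_count": 4},
    {"user_id": "u-123", "event_timestamp": datetime(2026, 4, 20), "lifetime_order_count": 7},
]
ttl = timedelta(days=30)

# A label from mid-March must not see the April value (no leakage from the future).
print(point_in_time_value(rows, "u-123", datetime(2026, 3, 15), ttl)["lifetime_order_count"])  # 4
print(point_in_time_value(rows, "u-123", datetime(2026, 4, 21), ttl)["lifetime_order_count"])  # 7
# On April 10 the March row is older than the ttl and the April row is in the future.
print(point_in_time_value(rows, "u-123", datetime(2026, 4, 10), ttl))  # None
```

The same rule explains online behaviour: after materialization the online store holds the newest row, which is what this function returns when `as_of` is the current time.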

Materialization job

The online store needs a periodic refresh. Run materialization as a CronJob:

apiVersion: batch/v1
kind: CronJob
metadata:
  name: feast-materialize
  namespace: feast
spec:
  schedule: "0 * * * *"                 # hourly incremental
  concurrencyPolicy: Forbid
  failedJobsHistoryLimit: 7
  jobTemplate:
    spec:
      template:
        spec:
          restartPolicy: OnFailure
          containers:
            - name: materialize
              image: feastdev/feast:0.42.0
              command: ["/bin/bash", "-c"]
              args:
                - |
                  set -euo pipefail
                  cd /feature_repo
                  # Incremental: Feast tracks last run per feature view
                  feast materialize-incremental $(date -u +%Y-%m-%dT%H:%M:%S)
              envFrom:
                - secretRef:
                    name: feast-secrets
              volumeMounts:
                - name: feature-repo
                  mountPath: /feature_repo
              resources:
                requests: {cpu: "1", memory: "2Gi"}
                limits: {memory: "4Gi"}
          volumes:
            - name: feature-repo
              configMap:
                name: feature-repo

For offline stores with expensive queries, such as BigQuery or Snowflake, schedule materialization less frequently and stick to incremental runs. For Postgres-backed offline stores with fast scans, an hourly schedule is reasonable.
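
It helps to know which time range each incremental run scans, since that range drives offline query cost. A sketch of the bookkeeping as we understand it: each run starts where the previous one ended, and a first run looks back one ttl. `incremental_window` is a hypothetical helper for illustration, not a Feast API.

```python
from datetime import datetime, timedelta, timezone


def incremental_window(last_end, ttl, now):
    """Start and end of the next incremental run for one feature view."""
    # First run: no previous interval recorded, so look back one ttl from now.
    start = last_end if last_end is not None else now - ttl
    return start, now


ttl = timedelta(days=30)
run_1 = datetime(2026, 4, 23, 12, 0, tzinfo=timezone.utc)
run_2 = run_1 + timedelta(hours=1)

first = incremental_window(None, ttl, run_1)       # scans 30 days of history
second = incremental_window(first[1], ttl, run_2)  # scans only the last hour

print(first[1] - first[0])    # 30 days, 0:00:00
print(second[1] - second[0])  # 1:00:00
```

The first run after adding a feature view is therefore the expensive one; later runs scan only the interval since the previous run, however often you schedule them.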

Observability

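The Feast feature server can expose Prometheus metrics on /metrics. Exact metric names depend on the Feast version and on any instrumentation you add, so confirm them against your own /metrics output before wiring alerts: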

  • feast_feature_server_request_duration_seconds_bucket - latency per feature view
  • feast_feature_server_request_total - request volume by status
  • feast_materialization_last_run_seconds - freshness alert signal
  • feast_online_read_timing_seconds - per-feature online lookup time

Key alerts:

  • Materialization hasn’t run in 2× expected interval (online store getting stale)
  • Online read p99 > 20ms (hot path degradation)
  • Materialization error rate > 1%
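
The first alert reduces to a single comparison. A minimal sketch of the staleness rule, assuming you can obtain the timestamp of the last successful run (from a metric or from the CronJob status); `is_stale` is a hypothetical helper:

```python
from datetime import datetime, timedelta, timezone


def is_stale(last_success, now, expected_interval, factor=2):
    """True when the last successful run is older than factor x the schedule interval."""
    return now - last_success > expected_interval * factor


hourly = timedelta(hours=1)
now = datetime(2026, 4, 23, 12, 0, tzinfo=timezone.utc)

print(is_stale(now - timedelta(minutes=70), now, hourly))   # False: one late run is tolerated
print(is_stale(now - timedelta(minutes=125), now, hourly))  # True: more than two intervals missed
```

Using a 2× factor rather than 1× avoids paging on a single slow run while still catching a job that has stopped.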

Using Feast for RAG grounding

The underrated pattern: RAG returns generic document chunks; Feast adds user-specific grounding.

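# In the orchestration service.
# plan_tier and support_tickets_last_30d are not part of the user_stats schema
# shown earlier; add them to the feature view and materialize before requesting them.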
user_feats = store.get_online_features(
    features=[
        "user_stats:plan_tier",
        "user_stats:support_tickets_last_30d",
        "user_stats:preferred_category",
    ],
    entity_rows=[{"user_id": req.user_id}],
).to_dict()

# Retrieve from vector DB
hits = await qdrant.search(...)

# Compose grounded prompt
system_prompt = f"""You are a helpful assistant.
User context: plan={user_feats['plan_tier'][0]}, recent_tickets={user_feats['support_tickets_last_30d'][0]}, interests={user_feats['preferred_category'][0]}.
Use this to personalize your response without explicitly mentioning it."""

# Generate via LiteLLM (see /deploy-litellm-proxy-on-kubernetes/)
response = await llm.chat.completions.create(...)

This makes RAG answers feel personal without retraining the model or expanding the corpus with user-specific documents.
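
One practical detail: online lookups return `None` for entities or features the online store does not have, and interpolating that directly puts "None" into the prompt. A sketch of a prompt builder that substitutes defaults, assuming the dict-of-lists shape returned by `.to_dict()`; the `first_value` and `build_system_prompt` helpers and the default values are illustrative choices:

```python
def first_value(features, name, default="unknown"):
    """First value for a feature, or default when the key is absent or the value is None."""
    values = features.get(name) or [None]
    return default if values[0] is None else values[0]


def build_system_prompt(features):
    """Compose the grounded system prompt from a single-entity feature lookup."""
    plan = first_value(features, "plan_tier")
    tickets = first_value(features, "support_tickets_last_30d", default=0)
    interests = first_value(features, "preferred_category")
    return (
        "You are a helpful assistant.\n"
        f"User context: plan={plan}, recent_tickets={tickets}, interests={interests}.\n"
        "Use this to personalize your response without explicitly mentioning it."
    )


# Same shape as .to_dict(); None marks a value the online store did not have.
user_feats = {
    "user_id": ["u-123"],
    "plan_tier": ["pro"],
    "support_tickets_last_30d": [None],
    "preferred_category": ["electronics"],
}
print(build_system_prompt(user_feats))
```

New users with nothing materialized yet then get a neutral prompt instead of one that tells the model their plan is "None".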

Sizing tiers

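Tier   | Entities | Feature views | Feature server replicas | Online store                     | Est. monthly cost (AED, EKS me-central-1)
-------|----------|---------------|-------------------------|----------------------------------|------------------------------------------
Small  | <1M      | 5-10          | 3 × small               | Redis single, 4 GB               | ~4,000
Medium | 1-10M    | 10-50         | 6 × medium              | Redis primary+replica, 16 GB     | ~12,000
Large  | 10-100M  | 50+           | 20+                     | Redis cluster 64 GB or DynamoDB  | ~45,000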

Offline store costs (BigQuery, Snowflake) are separate and dominate above medium scale.

Common failure modes

  • Stale online store across feature versions - new feature columns are added but the materialization job still uses a cached schema. Run feast apply followed by feast materialize-incremental <end-timestamp> in a single CI step so the two never drift apart.
  • Feature drift between training and online - usually caused by feature logic implemented in two places. The fix is moving all feature engineering into Feast SQL/Python definitions, not duplicating in training code.
  • Materialization OOM on large offline reads - switch to Dask-based offline store or batch materialization in smaller time windows.
  • Feature server slow on cold start - the Feast SDK reads the registry on init. Cache the registry proto in the pod image build, or use feast serve --registry_ttl_sec 600 (note the underscores in the flag name) so the cached registry is refreshed every 10 minutes instead of on the hot path.
  • Orphaned features no one uses - Feast registry grows over time with legacy features. Tag features with criticality and run a quarterly cleanup job pruning untagged / low-usage features.
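
For the materialization OOM case, splitting a large backfill into smaller windows is straightforward to script. A sketch that turns one long range into consecutive `feast materialize <start> <end>` commands; the `split_windows` and `materialize_commands` helpers and the one-day step are assumptions to tune against your data volume:

```python
from datetime import datetime, timedelta

TS_FORMAT = "%Y-%m-%dT%H:%M:%S"


def split_windows(start, end, step):
    """Split [start, end) into consecutive windows no longer than step."""
    if step <= timedelta(0):
        raise ValueError("step must be positive")
    windows = []
    cursor = start
    while cursor < end:
        upper = min(cursor + step, end)
        windows.append((cursor, upper))
        cursor = upper
    return windows


def materialize_commands(windows):
    """One `feast materialize` invocation per window, oldest first."""
    return [
        f"feast materialize {lo.strftime(TS_FORMAT)} {hi.strftime(TS_FORMAT)}"
        for lo, hi in windows
    ]


windows = split_windows(
    start=datetime(2026, 4, 1),
    end=datetime(2026, 4, 3, 12, 0),
    step=timedelta(days=1),
)
for command in materialize_commands(windows):
    print(command)
```

Running the windows oldest first means newer values overwrite older ones in the online store, so an interrupted backfill can be resumed from the last completed window.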

What this connects to

Feast is the feature-serving layer in a production ML stack:

  • Training pipelines consume Feast historical features to avoid train-serve skew
  • BentoML runners call Feast online features at inference time
  • Production RAG Stack uses Feast to ground LLM prompts in user context
  • Langfuse can log feature values alongside traces for debugging

Getting help

We deploy Feast for GCC enterprise ML teams looking to escape feature-computation duplication and train-serve skew. AI/ML Infrastructure on Kubernetes is the entry engagement. Typical rollout: 4-6 weeks including feature-repo design, materialization pipeline, and migration of 3-5 baseline models.

Frequently Asked Questions

Do I really need a feature store?

If you have one ML model served from a single notebook with one data source, no. If you have multiple models that share features, different features at training vs inference time (train-serve skew), or a team of ML engineers duplicating feature computations, yes. The break-even is usually 3-5 production models or one model reused across teams. Below that, feature stores add operational surface without returns; above that, they're the only thing keeping feature definitions consistent across training and inference.

How does Feast fit with Kubernetes?

Feast on Kubernetes runs as three components: (1) the Feast SDK that lives in client applications and training pipelines, (2) a feature-server Deployment exposing online features via HTTP/gRPC at inference time, and (3) materialization jobs (usually Kubernetes Jobs or Airflow DAGs) that copy offline features to the online store on a schedule. The registry metadata lives in Postgres; online features in Redis, DynamoDB, or Bigtable; offline features in BigQuery, Snowflake, or a data lake.

Redis or DynamoDB for Feast's online store?

Redis is typically faster (p99 latency under 2ms, vs 5-10ms for DynamoDB) and cheaper at small scale. DynamoDB wins at extreme scale (100M+ entities) because it's fully managed and scales horizontally without sharding work. For GCC deployments, Redis on Kubernetes or Azure Cache for Redis in UAE North is the common choice; DynamoDB is an option only on AWS, where it is available in the Middle East (UAE) and Middle East (Bahrain) regions. If you're under 10M entities, use Redis.

How do I materialize features reliably?

Run materialization as Kubernetes Jobs triggered by Airflow, Argo Workflows, or a simple CronJob. Key reliability patterns: (1) idempotent materialization with explicit start/end timestamps, (2) monitor freshness via feast_materialization_last_run_seconds Prometheus metric, (3) use incremental materialization instead of full-table rebuilds, (4) run materialization on a dedicated node pool so it doesn't contend with feature-server pods.

Can Feast serve features for RAG pipelines?

Yes, and it's underrated. Store pre-computed business context (user plan tier, account age, recent purchase categories, support-ticket history) as features keyed by user_id. At RAG query time, join the retrieved document chunks with these features to ground the LLM prompt in user-specific context. This makes answers more personalized without retraining the model or expanding the vector corpus.

Is Feast a good fit for GCC data-sovereign ML?

Yes. Feast is open-source and deployment-flexible: pick in-region stores for every layer. Typical UAE deployment: offline store on Azure Synapse or BigQuery Omni UAE, online store on Azure Cache for Redis UAE North or in-cluster Redis, feature server on the same AKS/EKS cluster, registry Postgres also in-region. No external SaaS needed.
