Architecture

Veneer’s data flow, reconciliation loops, and overlay lifecycle.

Overview

Veneer is a Kubernetes controller that bridges the gap between AWS cost data and Karpenter provisioning decisions. It continuously monitors Savings Plans and Reserved Instance utilization via Lumina metrics in Prometheus, then manages NodeOverlay custom resources to steer Karpenter toward cost-optimal instance types.

Data Flow

flowchart LR
    Lumina["Lumina"]
    Prom["Prometheus"]
    Veneer["Veneer"]
    NO["NodeOverlays"]
    Karpenter["Karpenter"]
    Fleet["AWS CreateFleet"]

    Lumina -->|"expose SP/RI metrics"| Prom
    Prom -->|"query cost data"| Veneer
    Veneer -->|"create/update/delete"| NO
    NO -->|"adjust pricing"| Karpenter
    Karpenter -->|"Priority values"| Fleet

    style Lumina fill:#e3f2fd,stroke:#1565c0,color:#1565c0
    style Prom fill:#fbe9e7,stroke:#bf360c,color:#bf360c
    style Veneer fill:#e0f2f1,stroke:#00695c,color:#00695c
    style NO fill:#f1f8e9,stroke:#33691e,color:#33691e
    style Karpenter fill:#ede7f6,stroke:#4527a0,color:#4527a0
    style Fleet fill:#fff3e0,stroke:#e65100,color:#e65100

Lumina discovers AWS Savings Plans, Reserved Instances, and running EC2 instances. It computes utilization and remaining capacity, then exposes these as Prometheus metrics.
Veneer queries Prometheus on a 5-minute interval (matching Lumina’s refresh cycle). The decision engine analyzes capacity data and determines which NodeOverlays should exist.
Karpenter reads NodeOverlay resources and applies price adjustments to its instance type offerings. Adjusted prices become Priority values in the AWS CreateFleet API call.
AWS selects instances based on the allocation strategy and Priority values. See Instance Selection Deep Dive for details.

Two Reconcilers

Veneer runs two independent reconciliation loops:

Metrics Reconciler

The metrics reconciler runs on a timed interval (every 5 minutes) and is responsible for cost-aware overlay management:

flowchart TD
    Start["Timer fires (every 5 min)"]
    Fresh{"Data fresh?"}
    Query["Query Prometheus for\nSP utilization, SP capacity,\nRI counts"]
    Decide{"For each SP/RI:\nutilization < threshold\nand capacity available?"}
    Create["Create or update\nNodeOverlay"]
    Delete["Delete NodeOverlay\n(if exists)"]
    Skip["Skip reconciliation\n(preserve last state)"]
    Done["Reconciliation complete"]

    Start --> Fresh
    Fresh -->|"Yes"| Query
    Fresh -->|"No — stale data"| Skip
    Query --> Decide
    Decide -->|"Yes"| Create
    Decide -->|"No"| Delete
    Create --> Done
    Delete --> Done
    Skip --> Done

    style Start fill:#e3f2fd,stroke:#1565c0,color:#1565c0
    style Fresh fill:#fff3e0,stroke:#e65100,color:#e65100
    style Query fill:#e0f2f1,stroke:#00695c,color:#00695c
    style Decide fill:#fff3e0,stroke:#e65100,color:#e65100
    style Create fill:#f1f8e9,stroke:#33691e,color:#33691e
    style Delete fill:#fbe9e7,stroke:#bf360c,color:#bf360c
    style Skip fill:#f5f5f5,stroke:#616161,color:#616161
    style Done fill:#ede7f6,stroke:#4527a0,color:#4527a0

Query Prometheus for Lumina metrics:
- Savings Plan utilization percentages
- Savings Plan remaining capacity ($/hour)
- Reserved Instance counts by type and region
Check data freshness – Skip reconciliation if Lumina data is stale
Run the decision engine – For each SP and RI, determine whether a NodeOverlay should exist:
- Create overlay when utilization is below the threshold (default 95%) and remaining capacity exists
- Delete overlay when utilization exceeds the threshold or no capacity remains
Apply changes – Create, update, or delete NodeOverlays in the cluster

NodePool Reconciler

The NodePool reconciler watches for changes to Karpenter NodePool resources and manages preference-based overlays:

flowchart TD
    Watch["Watch NodePool changes"]
    Parse["Parse veneer.io/preference.N\nannotations"]
    Gen["Generate NodeOverlay\nfor each preference"]
    Clean["Clean up overlays for\nremoved preferences"]
    GC["Garbage collect on\nNodePool deletion"]

    Watch --> Parse
    Parse --> Gen
    Parse --> Clean
    Watch -->|"NodePool deleted"| GC

    style Watch fill:#e3f2fd,stroke:#1565c0,color:#1565c0
    style Parse fill:#e0f2f1,stroke:#00695c,color:#00695c
    style Gen fill:#f1f8e9,stroke:#33691e,color:#33691e
    style Clean fill:#fbe9e7,stroke:#bf360c,color:#bf360c
    style GC fill:#f5f5f5,stroke:#616161,color:#616161

Watch NodePools for veneer.io/preference.N annotations
Parse preference annotations into matcher expressions and price adjustments
Generate NodeOverlays for each preference
Clean up overlays when preferences are removed or NodePools are deleted

See Instance Preferences for annotation syntax and examples.

Overlay Lifecycle

Cost-Aware Overlays (from Lumina data)

stateDiagram-v2
    [*] --> Active : SP below threshold\nor RI count > 0
    Active --> Active : Capacity still available\n(no change)
    Active --> Removed : Utilization exceeds threshold\nor capacity exhausted
    Removed --> Active : Capacity becomes available
    Removed --> [*]
    Active --> Preserved : Lumina data stale
    Preserved --> Active : Fresh data arrives
    Preserved --> Removed : Fresh data shows\nno capacity

Cost-aware overlays follow this lifecycle:

Event	Action	Overlay State
SP utilization below threshold, capacity available	Create overlay	Active – influences Karpenter pricing
SP utilization rises above threshold	Delete overlay	Removed – Karpenter uses default pricing
RI count > 0 for instance type in region	Create overlay	Active
RI count drops to 0	Delete overlay	Removed
Lumina data becomes stale	Skip reconciliation	No change – last known state preserved

Preference Overlays (from NodePool annotations)

Preference overlays follow the NodePool lifecycle:

Event	Action
Preference annotation added to NodePool	Create overlay
Preference annotation value changed	Update overlay
Preference annotation removed	Delete overlay
NodePool deleted	Overlay garbage collected via owner reference

Weight Hierarchy

When multiple overlays target the same instance types, the overlay with the highest weight wins:

Overlay Type	Default Weight	Scope
Reserved Instance	30	Instance-type specific (e.g., `m5.xlarge` in `us-west-2`)
EC2 Instance Savings Plan	20	Family-specific (e.g., `m5` family in `us-west-2`)
Compute Savings Plan	10	Global (all families, all regions)
Preference	1-9 (from annotation)	User-defined scope

Keep preference overlay weights below 10 to ensure cost-aware overlays (backed by real AWS capacity data) take precedence.

Disabled Mode

Veneer supports a “disabled” mode (overlays.disabled: true) that creates NodeOverlays with an impossible requirement (veneer.io/disabled: true). This allows testing overlay creation logic without affecting Karpenter’s provisioning decisions. The veneer_config_overlays_disabled metric reports this state.