Karpenter Lens: Because Your Nodes Deserve a Report Card

If you're running Kubernetes on AWS with Karpenter, you already know it's great at provisioning nodes reactively. Pod pending? Karpenter spins up the right instance. But here's the thing — Karpenter never looks back. It doesn't tell you whether those nodes are efficiently packed, or if you're quietly bleeding money on half-empty instances.

That's the gap Karpenter Lens (klens) fills.

The Problem

I was staring at our cluster one day and noticed something odd: we had nodes running at 85% CPU but only 20% memory. Others were the opposite. Some small nodes had DaemonSets eating 20% of their capacity before a single workload pod even landed. The cost dashboards said "everything's fine" because they only look at aggregate numbers.

There was no tool that could answer simple questions like:

  • How well-packed is each node? Not just CPU or memory — both together.
  • Are we using the right instance types for our workload shapes?
  • What would happen if we switched to different instance families?

So I built one.

How klens Scores Nodes

The core insight is that the arithmetic mean lies about efficiency. A node at 90% CPU / 10% memory averages to 50%, which sounds decent, right? But that node can't accept any pod that needs more than the last 10% of its CPU. It's effectively full, with 90% of its memory stranded.

Klens uses the geometric mean instead:

efficiency = sqrt(cpu_efficiency * memory_efficiency)

That same 90/10 node scores 30% — which accurately reflects that it's poorly packed. Nodes only score high when both dimensions are well-utilized.
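To make the difference concrete, here is a minimal Python sketch comparing the two averages. The function names are illustrative and are not taken from the klens source; inputs are utilization fractions between 0 and 1.

```python
import math


def arithmetic_efficiency(cpu: float, mem: float) -> float:
    """Plain average of the two utilization fractions."""
    return (cpu + mem) / 2


def geometric_efficiency(cpu: float, mem: float) -> float:
    """Geometric mean: sqrt(cpu * mem), as in the formula above."""
    return math.sqrt(cpu * mem)


# Two nodes with the same arithmetic mean but very different packing.
examples = {
    "lopsided (90% CPU / 10% mem)": (0.90, 0.10),
    "balanced (50% CPU / 50% mem)": (0.50, 0.50),
}

for name, (cpu, mem) in examples.items():
    print(
        f"{name}: arithmetic={arithmetic_efficiency(cpu, mem):.0%}, "
        f"geometric={geometric_efficiency(cpu, mem):.0%}"
    )
```

Both nodes average 50% arithmetically, but only the balanced one keeps that score under the geometric mean; the lopsided one drops to 30%.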

Each node gets a letter grade:

Grade   Efficiency
A       >= 85%
B       >= 70%
C       >= 55%
D       >= 40%
F       < 40%
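The grade boundaries translate into a small lookup. This is a sketch of one way to write it, assuming efficiency is passed as a fraction; the function name `grade` is hypothetical and may not match what klens does internally.

```python
def grade(efficiency: float) -> str:
    """Map an efficiency fraction (0.0 to 1.0) to a letter grade."""
    # Cutoffs from the table above, checked from highest to lowest.
    thresholds = [(0.85, "A"), (0.70, "B"), (0.55, "C"), (0.40, "D")]
    for cutoff, letter in thresholds:
        if efficiency >= cutoff:
            return letter
    return "F"


for value in (0.90, 0.78, 0.54, 0.30):
    print(f"{value:.0%} -> {grade(value)}")
```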

Run klens score and you get a color-coded terminal table grouped by NodePool, showing every node's CPU/memory utilization, DaemonSet overhead, waste, and grade.

The DaemonSet Tax

This is one of the most underappreciated cost factors in Kubernetes. DaemonSets (Datadog agents, kube-proxy, aws-node, etc.) run on every node regardless of size. On a c6a.8xlarge, that overhead might be 3% of capacity. On a c6a.large, the smallest size in the family, it might be 18%.

Klens separately tracks DaemonSet overhead and flags nodes where the "tax" exceeds 15%. If you're running lots of small nodes, the DaemonSet tax can silently eat a significant chunk of your cluster budget.
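The arithmetic behind the tax is simple: the same fixed requests are divided by very different node sizes. The sketch below is illustrative only. The DaemonSet requests are made-up numbers, and defining the tax as the larger of the CPU and memory shares is an assumption, not necessarily how klens computes it. The node sizes are the published vCPU and memory figures for those two instance types.

```python
from dataclasses import dataclass

TAX_THRESHOLD = 0.15  # flag nodes whose DaemonSet share exceeds 15%


@dataclass
class Node:
    name: str
    cpu: float      # vCPUs
    mem_gib: float  # GiB of memory


def daemonset_tax(node: Node, ds_cpu: float, ds_mem_gib: float) -> float:
    """Assumed definition: the larger of the CPU and memory shares."""
    return max(ds_cpu / node.cpu, ds_mem_gib / node.mem_gib)


# Hypothetical combined DaemonSet requests on every node.
DS_CPU, DS_MEM_GIB = 0.36, 0.5

nodes = [Node("c6a.large", 2, 4), Node("c6a.8xlarge", 32, 64)]
for node in nodes:
    tax = daemonset_tax(node, DS_CPU, DS_MEM_GIB)
    status = "FLAG" if tax > TAX_THRESHOLD else "ok"
    print(f"{node.name}: tax={tax:.0%} [{status}]")
```

With these assumed requests the small node pays 18% and gets flagged, while the large node pays about 1%.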

Five-Pass Inefficiency Analysis

klens analyze runs five automated checks:

  1. Overprovisioned nodes — flags nodes with >40% unused capacity
  2. Shape mismatch — CPU-heavy workloads on memory-optimized instances (or vice versa)
  3. DaemonSet tax — small nodes where system overhead is disproportionate
  4. Instance staleness — running m5 when m7i exists with better price-performance
  5. CPU:Memory imbalance — NodePools where CPU and memory efficiency diverge by >25 points

Each finding comes with a severity level, explanation, and actionable suggestion.
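Two of these passes are plain threshold checks and can be sketched in a few lines. The code below is a hedged illustration, not the klens implementation: the `Usage` type and function names are hypothetical, "unused capacity" is assumed to mean one minus the geometric-mean efficiency, and the same record type stands in for both a node (pass 1) and a NodePool (pass 5).

```python
import math
from dataclasses import dataclass

UNUSED_THRESHOLD = 0.40     # pass 1: more than 40% unused capacity
IMBALANCE_THRESHOLD = 0.25  # pass 5: more than 25 points of divergence


@dataclass
class Usage:
    name: str
    cpu_eff: float  # fraction of CPU in use, 0.0 to 1.0
    mem_eff: float  # fraction of memory in use, 0.0 to 1.0


def is_overprovisioned(u: Usage) -> bool:
    # Assumption: unused capacity = 1 - geometric-mean efficiency.
    return 1 - math.sqrt(u.cpu_eff * u.mem_eff) > UNUSED_THRESHOLD


def is_imbalanced(u: Usage) -> bool:
    return abs(u.cpu_eff - u.mem_eff) > IMBALANCE_THRESHOLD


def findings(usages):
    """Return (name, check) pairs for every check that fires."""
    out = []
    for u in usages:
        if is_overprovisioned(u):
            out.append((u.name, "overprovisioned"))
        if is_imbalanced(u):
            out.append((u.name, "cpu-memory-imbalance"))
    return out


sample = [Usage("node-a", 0.85, 0.20), Usage("node-b", 0.80, 0.75)]
for name, check in findings(sample):
    print(f"{name}: {check}")
```

Here node-a trips both checks and node-b trips neither. The other three passes need instance metadata (family, generation, size), which this sketch leaves out.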

The Simulator: What-If Bin-Packing

This is my favorite feature. klens simulate takes all your current pods and repacks them onto a different set of instance types using the First Fit Decreasing (FFD) algorithm. You get a side-by-side comparison:

Current:   9 nodes, $1,834/mo, 54% avg efficiency
Simulated: 7 nodes, $1,198/mo, 78% avg efficiency
Savings:   $636/month (35%)

You can test instance type changes before touching your NodePool config. No more YOLO-ing instance family changes into production.
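For readers who haven't met First Fit Decreasing before, here is a minimal two-dimensional sketch. It simplifies heavily and is not the klens code: it packs onto a single node size rather than a set of instance types, ignores DaemonSet overhead and scheduling constraints, and sorts pods by their dominant normalized dimension, which is an assumption about how "decreasing" is defined for CPU plus memory. All names are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Pod:
    name: str
    cpu: float  # requested vCPUs
    mem: float  # requested GiB


@dataclass
class Bin:
    cpu_free: float
    mem_free: float
    pods: list = field(default_factory=list)

    def fits(self, pod: Pod) -> bool:
        return pod.cpu <= self.cpu_free and pod.mem <= self.mem_free

    def place(self, pod: Pod) -> None:
        self.cpu_free -= pod.cpu
        self.mem_free -= pod.mem
        self.pods.append(pod.name)


def first_fit_decreasing(pods, node_cpu, node_mem):
    """Pack pods onto identical nodes; returns the list of bins used."""
    # Largest pods first, sized by their dominant share of a node.
    ordered = sorted(
        pods,
        key=lambda p: max(p.cpu / node_cpu, p.mem / node_mem),
        reverse=True,
    )
    bins = []
    for pod in ordered:
        if pod.cpu > node_cpu or pod.mem > node_mem:
            raise ValueError(f"{pod.name} does not fit on an empty node")
        # First fit: take the earliest-opened bin with room for the pod.
        target = next((b for b in bins if b.fits(pod)), None)
        if target is None:
            target = Bin(node_cpu, node_mem)
            bins.append(target)
        target.place(pod)
    return bins


pods = [
    Pod("a", 2, 2), Pod("b", 2, 4), Pod("c", 1, 1),
    Pod("d", 1, 2), Pod("e", 3, 2), Pod("f", 1, 1),
]
# Pack onto nodes with 4 vCPU / 8 GiB (the shape of a c6a.xlarge).
for i, b in enumerate(first_fit_decreasing(pods, 4, 8), start=1):
    print(f"node {i}: {b.pods}")
```

The six pods request 10 vCPU in total, so three 4-vCPU nodes is the minimum, and that is what this run produces. A real simulator would then price each resulting node to get the cost comparison.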

Smart Recommendations

klens recommend generates suggestions ranked by estimated monthly savings:

  • Right-sizing — match instance family to your actual workload shape
  • Generation upgrades — newer generations almost always have better price-performance
  • Graviton/ARM — ~20% cheaper if your images support ARM64
  • Spot instances — for fault-tolerant workloads

It ships with a built-in catalog of 48 common instance types for offline use, and can optionally pull real-time pricing from the AWS Pricing API.

Optional AI Advisor

Pass the --ai flag to any analysis command and klens will stream insights from Claude, providing natural language explanations of your cluster's efficiency patterns and prioritized recommendations. It's completely optional — all core features work without it.

Getting Started

pip install karpenter-lens
klens score                               # grade your nodes
klens analyze                             # find inefficiencies
klens recommend                           # get optimization suggestions
klens simulate -i c6a.xlarge,c6a.2xlarge  # what-if analysis

All commands support --kubeconfig and --context for multi-cluster setups, and --output json for CI/CD integration.

What's Next

  • Prometheus integration for historical efficiency trends
  • --watch mode for a live terminal dashboard
  • Datadog/StatsD metrics emission
  • Multi-cluster comparison

Check it out on GitHub — contributions and feedback are welcome.