Pranesh Nikhar's personal site.


⚡ Why Is X Language Fast or Slow? — Compiled vs JIT vs Interpreted

The spectrum from AOT (C/Rust/Go) → JIT (Java/V8/LuaJIT/PyPy) → interpreted (CPython/bash), what each tier costs, V8’s 4-tier pipeline, PyPy’s tracing JIT, monomorphization, and why "compiled vs interpreted" is the wrong question.

🧩 The Wrong Question

The most common question from new developers: “Is Python compiled or interpreted?” The answer used to be “interpreted” — but that stopped being useful around 2010.

The real question is:

“What execution tiers does this language’s runtime have?”

Every language runtime sits on a spectrum from pure ahead-of-time (AOT) compilation to pure AST walking. The performance you see depends on where your runtime is on this spectrum, and — crucially — how long it’s been running.


📊 The Execution Spectrum

Ahead-of-Time                    Just-in-Time                     Interpretation
(C, Rust, Go)                    (Java, V8, LuaJIT, PyPy)         (CPython, Bash)

Slow compile ────────────────────────────────────────────────── Fast startup
  Fast exec                                                    Slow exec
        │                          │                                  │
        ▼                          ▼                                  ▼
┌─────────────────┐   ┌──────────────────────────┐   ┌──────────────────────┐
│ .c → compile →  │   │ .java → bytecode → JIT → │   │ .py → AST → eval     │
│ native binary   │   │ native                   │   │ loop (bytecode)      │
└─────────────────┘   └──────────────────────────┘   └──────────────────────┘
| Tier | Examples | Startup | Steady-State | Dev Experience |
|---|---|---|---|---|
| AOT | C, Rust, Go, Zig | Instant | Fastest | Slow compile, fast run |
| Baseline JIT | V8 (Sparkplug; Liftoff for Wasm) | Fast | Fast | Instant startup, warms up |
| Optimizing JIT | Java C2/Graal, V8 TurboFan | Slow | Nearly AOT speed | Warmup required |
| Tracing JIT | PyPy, LuaJIT trace | Medium | Fast for loops | JIT-unfriendly code hurts |
| Bytecode VM | CPython, Ruby MRI, PHP | Fast | Slow | Best iteration speed |
| Tree-walk | Bash, early JS engines | Fastest | Slowest | No compile step |

🏛️ AOT: C, Rust, Go

AOT compilers translate your source code directly to machine code before execution. The result is a native binary with no interpreter or JIT compiler in the execution path.

C: The Baseline

int sum(int n) {
    int total = 0;
    for (int i = 0; i < n; i++) total += i;
    return total;
}

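Compiled with clang -O2, this becomes (representative output; registers and labels vary by compiler version):

sum:
        test    edi, edi
        jle     .LBB0_1
        lea     eax, [rdi - 1]
        lea     ecx, [rdi - 2]
        imul    rcx, rax
        shr     rcx
        lea     eax, [rdi + rcx]
        dec     eax
        ret
.LBB0_1:
        xor     eax, eax
        ret

LLVM recognized that the loop computes 0 + 1 + ... + (n-1) = n(n-1)/2, which it evaluates as (n-1)(n-2)/2 + (n-1), and replaced the entire loop with arithmetic. Zero loop overhead. Zero runtime checks. Not every compiler performs this rewrite: GCC has typically kept the loop for this function.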
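The loop in the C function adds 0 through n-1, so its closed form is n(n-1)/2. The following is a small sanity check of that algebra, not compiler output; the function names are made up for illustration, and Python integers do not overflow the way a C int would.

```python
def sum_loop(n: int) -> int:
    """Direct translation of the C loop: adds 0, 1, ..., n-1."""
    total = 0
    for i in range(n):
        total += i
    return total


def sum_closed(n: int) -> int:
    """Closed form n(n-1)/2, with the same n <= 0 guard as the loop."""
    return n * (n - 1) // 2 if n > 0 else 0


def sum_compiler_style(n: int) -> int:
    """(n-1)(n-2)/2 + (n-1), an equivalent way to arrange the same formula."""
    if n <= 0:
        return 0
    return (n - 1) * (n - 2) // 2 + (n - 1)


# All three agree for every n in a small test range.
agree = all(
    sum_loop(n) == sum_closed(n) == sum_compiler_style(n)
    for n in range(-3, 200)
)
```

The second arrangement avoids computing n*(n-1) directly, which is one plausible reason an optimizer would prefer it when working with fixed-width integers.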

Rust: Same AOT, More Safety

Rust compiles through LLVM, same as Clang. The generated code is equivalent in performance. Where Rust differs from C is at compile time: the borrow checker runs static analysis to guarantee memory safety, but this produces zero runtime cost:

fn sum(n: i32) -> i32 {
    (0..n).fold(0, |acc, i| acc + i)
}

With --release, LLVM optimizes this to the same constant-time formula. Rust’s zero-cost abstractions mean high-level constructs like iterators compile down to the same assembly as hand-written loops.

Go: AOT + GC

Go is AOT-compiled but includes a runtime (garbage collector, goroutine scheduler, memory allocator). This is the key difference: Go compiles to native code, but that native code calls into the runtime for memory management.

func sum(n int) int {
    total := 0
    for i := 0; i < n; i++ {
        total += i
    }
    return total
}

Go’s compiler doesn’t recognize the closed form (it doesn’t perform the induction-variable rewrite that LLVM does), so the loop runs as written. Here total and i stay in registers; in general, a Go value moves to the heap only when escape analysis determines it outlives the function, and that adds GC pressure.

Monomorphization: Rust and C++ generate specialized code for each generic instantiation. Vec<i32> and Vec<String> produce completely different machine code — optimal for each type. Go interfaces use runtime dispatch (similar to virtual methods), which defeats inlining and adds indirect call overhead.


🔥 JIT: From Java 1.0 to V8’s 4 Tiers

JIT compilers start execution quickly and gradually replace hot code paths with increasingly optimized native code.

V8’s 4-Tier Pipeline (JavaScript)

V8 is a leading example of tiered JIT compilation. It doesn’t have one execution engine; it has four tiers, an interpreter plus three compilers:

Source Code
      │
      ▼
┌───────────┐
│ Ignition  │  ← Bytecode interpreter (fastest startup)
│           │     Generates bytecode from AST, starts executing immediately
└─────┬─────┘
      │ hot function detected
      ▼
┌───────────┐
│ Sparkplug │  ← "Baseline" compiler (~10× faster than Ignition)
│           │     Generates minimal native code, no optimizations
│           │     Sacrifices code quality for compilation speed
└─────┬─────┘
      │ hotter (+ optimizing compiler available)
      ▼
┌───────────┐
│ Maglev    │  ← Mid-tier optimizing compiler
│           │     Simple optimizations: inlining, constant folding
│           │     Fast compilation, good speed-up (shipped in Chrome 117, 2023)
└─────┬─────┘
      │ hottest function (executed 1000+ times)
      ▼
┌───────────┐
│ TurboFan  │  ← Full optimizing compiler ("the big gun")
│           │     Sophisticated: type feedback, escape analysis,
│           │     loop invariant code motion, allocation sinking
│           │     Slow compilation, highest quality code
└───────────┘

Why four tiers? A JIT has to balance a fundamental trade-off: compile time vs execution time. If a function runs once, spending 10ms to compile it is a net loss. But if it runs 10 million times, spending 100ms to optimize it is repaid many times over.

| Tier | Compile Cost | Speed vs Interpreter | When Triggered |
|---|---|---|---|
| Ignition | ~0ms (no compile) | 1× (baseline) | Always |
| Sparkplug | ~0.1ms | 10× | After ~1 call |
| Maglev | ~0.5ms | 50× | After ~50 calls |
| TurboFan | ~5-20ms | 100-200× | After ~1000 calls |
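The compile-time vs execution-time trade-off comes down to a break-even count: compiling pays off once the time saved per call, multiplied by the number of calls, exceeds the compile cost. Here is a minimal sketch of that arithmetic. The function names and the millisecond figures in the example are assumptions chosen for illustration, not measurements of V8.

```python
import math


def break_even_calls(compile_ms: float, slow_call_ms: float, fast_call_ms: float) -> float:
    """Number of calls after which compiling has paid for itself."""
    saved_per_call = slow_call_ms - fast_call_ms
    if saved_per_call <= 0:
        # The "optimized" code is no faster, so compiling never pays off.
        return math.inf
    return compile_ms / saved_per_call


def net_benefit_ms(calls: int, compile_ms: float, slow_call_ms: float, fast_call_ms: float) -> float:
    """Time saved overall (negative means compiling was a net loss)."""
    return calls * (slow_call_ms - fast_call_ms) - compile_ms


# Hypothetical numbers: 10 ms to compile, 0.001 ms per interpreted call,
# 0.0001 ms per compiled call.
threshold = break_even_calls(10.0, 0.001, 0.0001)
one_call = net_benefit_ms(1, 10.0, 0.001, 0.0001)
many_calls = net_benefit_ms(10_000_000, 10.0, 0.001, 0.0001)
```

With these assumed figures the threshold is a little over eleven thousand calls, which is why a runtime counts invocations before promoting a function to a more expensive tier.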

Type Feedback: The JIT’s Superpower

V8 collects type feedback as it runs:

function add(x, y) { return x + y; }

// Early calls: x = int, y = int
//   → V8 records: "add was called with (int, int)"
add(3, 4);

// Once add is hot, TurboFan can specialize for (int, int), roughly:
//   mov rax, rdi; add rax, rsi; ret   (plus an overflow check)

// Later: x = string, y = string
//   → the (int, int) assumption no longer holds
add("hello", " world");

// This triggers deoptimization: TurboFan's optimized code is discarded.
// The function falls back to Ignition, which handles both type combinations.

The cost of type polymorphism is dramatic:

// Monomorphic: always the same shape → V8 can optimize heavily
function mono(shape) { return shape.x + shape.y; }
mono({x: 1, y: 2});     // V8 creates hidden class C0
mono({x: 3, y: 4});     // Same hidden class → fast path
// → ~50M ops/sec

// Polymorphic: several shapes reach the same property access → slower lookups
function poly(shape) { return shape.x + shape.y; }
poly({x: 1, y: 2});        // Hidden class C0
poly({y: 2, x: 1});        // Hidden class C1 (different property order)
poly({x: 1, y: 2, z: 3});  // Hidden class C2 (extra property)
// → ~5M ops/sec (10× slower)

This is why keeping object shapes stable in hot JavaScript is critical for V8 performance, and why TypeScript, whose types are erased at compile time and never reach the engine, doesn’t help V8 optimize better.
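The shape-sensitivity described here can be modelled with a toy inline cache: each property-access site remembers the slot index for the shapes it has seen, and gives up once it has seen too many. This is a rough sketch of the idea, not V8’s implementation. The class name, the limit of four shapes, and the use of a tuple of keys as a stand-in for a hidden class are all assumptions made for the example.

```python
class PropertySite:
    """One property-access site with a small cache keyed by object shape."""

    MAX_SHAPES = 4  # assumed limit before the site stops caching

    def __init__(self, name):
        self.name = name
        self.cache = {}           # shape -> slot index of self.name
        self.megamorphic = False  # True once too many shapes were seen
        self.hits = 0
        self.misses = 0

    def load(self, obj):
        # The ordered tuple of keys stands in for a hidden class.
        shape = tuple(obj)
        if not self.megamorphic and shape in self.cache:
            # Fast path: known shape, read the slot directly.
            self.hits += 1
            return list(obj.values())[self.cache[shape]]
        # Slow path: generic lookup by name, then try to remember the shape.
        self.misses += 1
        if not self.megamorphic:
            if len(self.cache) < self.MAX_SHAPES:
                self.cache[shape] = shape.index(self.name)
            else:
                self.megamorphic = True
                self.cache.clear()
        return obj[self.name]


site = PropertySite("x")
first = site.load({"x": 1, "y": 2})    # miss: shape seen for the first time
second = site.load({"x": 3, "y": 4})   # hit: same key order, same shape
third = site.load({"y": 2, "x": 1})    # miss: same keys, different order
```

Objects built with the same keys in the same order share a cache entry, which is the toy equivalent of sharing a hidden class.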

PyPy’s Tracing JIT

PyPy takes a different approach from V8. Instead of compiling whole functions, it records traces — linear paths through loops:

# This loop runs → PyPy traces it
total = 0
for i in range(1_000_000):
    total += i  # ← trace starts here when loop is detected as hot

The trace records every operation as a flat sequence of low-level operations and guards (simplified here):

loop_begin:
    i3 = get(guard_value(p47))     # i = loop variable
    i4 = int_add(i3, 1)            # i + 1
    i5 = int_add(p46, i3)          # total += i
    set(p46, i5)                   # store total
    guard_value(i4 != 1000000)     # loop condition
    jump(loop_begin)

The trace is then optimized (loop invariant code motion, constant folding) and compiled to native code. If the program follows the trace, it runs at near-native speed. If a guard fails (e.g., a variable changes type), execution “exits” the trace and falls back to the interpreter.
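The guard-and-fallback behaviour can be sketched in a few lines. This is a toy model, not PyPy’s machinery: the "trace" is just a function specialized for ints, the guard is a type check, and all names are hypothetical. A real tracing JIT resumes in the interpreter at the point where the guard failed, whereas this sketch simply restarts on the slow path.

```python
class GuardFailed(Exception):
    """Raised when a value no longer matches what the trace was recorded with."""


def guard_int(value):
    # Guard: the trace was recorded with ints, so anything else must bail out.
    if type(value) is not int:
        raise GuardFailed(value)
    return value


def traced_sum(items):
    """Stand-in for the compiled trace: specialized for ints, protected by a guard."""
    total = 0
    for item in items:
        total += guard_int(item)
    return total


def generic_sum(items):
    """Stand-in for the interpreter: handles any values that support +."""
    total = 0
    for item in items:
        total = total + item
    return total


def run(items):
    """Try the fast trace first; fall back to the generic path if a guard fails."""
    try:
        return traced_sum(items), "trace"
    except GuardFailed:
        return generic_sum(items), "interpreter"


all_ints = run([1, 2, 3])
mixed = run([1, 2.5])
```

The first call stays on the fast path; the second hits a float, fails the guard, and finishes on the slow path.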

PyPy’s tracing JIT is excellent for:

  • Tight numeric loops (NumPy-level speed for pure Python loops)
  • Simple data structures traversed linearly

PyPy is poor for:

  • Highly polymorphic code (many different types entering the same trace)
  • Short-running scripts (the JIT never warms up)
  • C extension modules (PyPy’s C API emulation is slow)

CPython’s Copy-and-Patch JIT (3.13+)

CPython 3.13 added an experimental copy-and-patch JIT, and it remains experimental and off by default in 3.14. It’s not a full optimizing JIT like TurboFan; it’s closer to Sparkplug-level: it generates machine code by copying pre-compiled templates and patching in the operands for each instruction.

CPython 3.12: specializing bytecode interpreter, no JIT

CPython 3.13: experimental JIT, opt-in at build time (--enable-experimental-jit)
  → Some instructions become native code, no complex optimizations

CPython 3.14: JIT still experimental and disabled by default
  → Shipped in the official macOS and Windows binaries, enabled with PYTHON_JIT=1
  → Roughly 10% slower to 20% faster depending on workload

It doesn’t make Python “fast” yet: the gap to C on CPU-bound code stays at an order of magnitude or more. What it adds is the infrastructure for heavier optimizations in later releases.
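Whether a JIT is present in a given interpreter can be checked at runtime. The sketch below assumes the sys._jit introspection helpers documented for CPython 3.14; they are an implementation detail and may be absent in other versions, builds, or implementations, so the code treats a missing module as "no information". The function name jit_status is made up for this example.

```python
import sys


def jit_status():
    """Report JIT state if this interpreter exposes sys._jit, else None.

    Assumes the sys._jit helpers from CPython 3.14; older versions and
    other implementations simply return None here.
    """
    jit = getattr(sys, "_jit", None)
    if jit is None:
        return None
    return {
        "available": jit.is_available(),  # was the JIT built into this binary?
        "enabled": jit.is_enabled(),      # is it switched on for this process?
    }


status = jit_status()
```

On a build that includes the JIT, running the same script with and without PYTHON_JIT=1 in the environment should flip the "enabled" field.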


🐌 Interpreted: CPython, Bash, Ruby

Pure interpreters read source code (or bytecode) and execute it instruction by instruction. No native code is generated.

CPython’s execution model:

Python source
      │
      ▼
parse → AST → compile → bytecode (.pyc)
                            │
                            ▼
                    bytecode interpreter loop
                    (ceval.c: _PyEval_EvalFrameDefault)
                            │
                    ┌───────┴────────────────────┐
                    │ opcode loop                │
                    │                            │
                    │ for each op:               │
                    │   switch(op) {             │
                    │     case BINARY_OP: ...    │
                    │     case LOAD_FAST: ...    │
                    │     case CALL: ...         │
                    │   }                        │
                    └────────────────────────────┘

Each bytecode instruction requires:

  1. Fetch opcode and arguments from the bytecode array
  2. Dispatch via a computed goto or switch statement
  3. Perform the operation (which may involve dynamic type checks)
  4. Store results back in the stack or locals array
  5. Jump to next instruction
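The five steps above can be sketched as a tiny stack-based interpreter. This is a toy with three made-up opcodes, far simpler than CPython’s real loop, but the fetch, dispatch, operate, store, and advance pattern is the same.

```python
def interpret(program, local_vars):
    """Execute a list of (opcode, argument) pairs on an operand stack."""
    stack = []
    pc = 0
    while pc < len(program):
        op, arg = program[pc]              # 1. fetch opcode and argument
        pc += 1                            # 5. advance to the next instruction
        if op == "LOAD":                   # 2. dispatch on the opcode
            stack.append(local_vars[arg])  # 4. store the result on the stack
        elif op == "ADD":
            right = stack.pop()
            left = stack.pop()
            # 3. operate: + is resolved dynamically from the operand types
            stack.append(left + right)
        elif op == "RETURN":
            return stack.pop()
        else:
            raise ValueError(f"unknown opcode {op!r}")
    raise RuntimeError("program ended without RETURN")


# Toy bytecode for "return a + b".
program = [("LOAD", "a"), ("LOAD", "b"), ("ADD", None), ("RETURN", None)]
int_result = interpret(program, {"a": 2, "b": 3})
str_result = interpret(program, {"a": "x", "b": "y"})
```

The same four instructions work for ints and strings because the type decision is deferred to the ADD step at run time, which is exactly the per-operation cost a compiler would resolve once, ahead of time.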

Dispatch alone costs ~10-20 CPU cycles per bytecode instruction; once dynamic type checks and reference counting are included, each instruction lands closer to ~50 cycles. A single Python a + b is 3-4 bytecodes, so roughly 150-200 cycles. In C, that’s one add instruction (~1 cycle). This is the fundamental reason interpreters are slow: every operation carries a fixed tax that doesn’t exist in compiled code.
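The bytecode behind a + b can be inspected with the standard dis module. The exact instruction names and count depend on the CPython version (newer releases fuse some loads and add bookkeeping instructions), so this sketch only collects the names rather than asserting a fixed listing.

```python
import dis


def add(a, b):
    return a + b


# Collect the opcode names the interpreter loop will dispatch on for add().
opnames = [instruction.opname for instruction in dis.get_instructions(add)]

# The addition itself shows up as a BINARY_* instruction on every version.
binary_ops = [name for name in opnames if name.startswith("BINARY")]
```

Calling dis.dis(add) prints the same information as a human-readable listing.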


📊 Benchmarks

From the Computer Language Benchmarks Game (all measurements normalized to C):

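| Benchmark | C (gcc -O2) | Rust | Go | Java (GraalVM) | Node.js (V8) | CPython | PyPy | LuaJIT |
|---|---|---|---|---|---|---|---|---|
| n-body | 1.00× | 1.01× | 3.5× | 1.2× | 1.8× | 90× | 35× | 1.3× |
| fannkuch-redux | 1.00× | 1.02× | 2.8× | 1.5× | 2.0× | 120× | 45× | n/a |
| binary-trees | 1.00× | 1.01× | 4.0× | 2.0× | 3.0× | 80× | 30× | 2.0× |
| regex-redux | 1.00× | 0.95× | 1.5× | 0.8× | 0.7× | 25× | 12× | 1.5× |
| pidigits | 1.00× | 1.00× | 1.2× | 1.0× | 1.1× | 2.5× | 1.0× | n/a |
| Geometric Mean | 1.00× | 1.00× | 2.5× | 1.3× | 1.6× | 45× | 18× | 2.0× |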

Key observations:

  • Rust = C: Zero-cost abstractions are real. Rust matches C performance within 1-2%.
  • Go: ~2.5× slower than C, mostly due to GC overhead and a compiler that skips some of the heavier optimizations LLVM and GCC perform.
  • Java (GraalVM): Near C speed for compute-heavy workloads. GC adds variance but average throughput is excellent.
  • Node.js (V8): Impressive for a dynamic language. V8’s 4-tier pipeline is among the most heavily engineered JITs in production.
  • CPython: 45× slower than C. This is the “Python tax” — the cost of dynamic dispatch at every operation.
  • PyPy: ~2.5× faster than CPython for these numeric benchmarks. Still 18× slower than C.
  • LuaJIT: Nearly as fast as Java. LuaJIT’s trace-compiler is a masterpiece of JIT engineering (one of Mike Pall’s greatest contributions).

🔬 Warmup Effects

A critical and often overlooked dimension: how long does it take to reach peak performance?

Performance over time (░ = slow, █ = fast):
                     ┌──────────────────────┐
AOT (C/Rust/Go)      │██████████████████████│  peak from instruction 1
                     └──────────────────────┘
                     ┌──────────────────────┐
JIT (Java/V8/LuaJIT) │░░░░░░████████████████│  slow at startup, ramp-up as JIT compiles
                     └──────────────────────┘
                     ┌──────────────────────┐
Interpreted (Python) │░░░░░░░░░░░░░░░░░░░░░░│  steady state from the start
                     └──────────────────────┘  (same speed always)

How much warmup matters depends on the deployment:

  • Serverless (AWS Lambda, Cloudflare Workers): JIT-heavy languages suffer because functions are cold-started. Java on Lambda can take 1-2 seconds to warm up. Python starts instantly but runs slowly. This is why serverless pushes toward Node.js or Rust-compiled-to-Wasm.

  • Long-running servers (database, API, stream processing): The JIT’s warmup cost is negligible over hours of uptime. Java is extremely competitive here.

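  • CLI tools: AOT wins. The compile cost is paid once at build time, so every invocation starts at full speed, while an interpreted or JIT-compiled tool pays its startup and warmup cost on every run.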

| Scenario | Best Runtime | Why |
|---|---|---|
| ls, grep | C/Rust/Go | Startup dominates |
| Web API endpoint | Java/.NET/Node | Long-lived process, JIT pays off |
| Data pipeline script | Python/PyPy | Fast to write, PyPy if CPU-bound |
| Game engine | C++/Rust | Predictable latency, no GC pauses |
| Shell script | Bash | Everything else is negligible |
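The scenarios in the table reduce to one question: does startup cost or per-item cost dominate for this workload size? Here is a rough model, assuming total time is startup plus a constant cost per item. The two profiles and their millisecond figures are hypothetical, picked only to show a crossover, and are not measurements of any real runtime.

```python
def total_ms(startup_ms: float, per_item_ms: float, items: int) -> float:
    """Total run time under a simple startup + per-item cost model."""
    return startup_ms + per_item_ms * items


# Hypothetical (startup_ms, per_item_ms) profiles.
PROFILES = {
    "interpreter": (20.0, 0.05),   # starts quickly, slow per item
    "jit": (200.0, 0.002),         # slow to start and warm up, fast per item
}


def fastest(items: int) -> str:
    """Name of the profile with the lowest total time for this workload."""
    return min(PROFILES, key=lambda name: total_ms(*PROFILES[name], items))


short_job = fastest(100)
long_job = fastest(1_000_000)
```

With these assumed numbers the crossover sits at a few thousand items: below it the quick-starting runtime wins, above it the warmed-up JIT does.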

🎯 The Real Answer

The question “Is X compiled or interpreted?” is not useful. Instead, ask:

  1. What execution tiers does the runtime have? (interpreter → baseline JIT → optimizing JIT)
  2. How long does the program run? (warmup cost amortization)
  3. What’s the tolerance for latency variance? (GC pauses, deoptimization)
  4. How much control do I have over memory layout? (monomorphization, cache locality)

Most managed and dynamic language runtimes are converging on the same architecture: start with a fast interpreter, profile hot code, compile incrementally. CPython is getting a JIT. Java uses an interpreter + C1 + C2. V8 uses 4 tiers. Even Ruby’s YJIT (introduced in 3.1) is a baseline JIT.

The “compiled vs interpreted” war is over. The answer is: both, in tiers, depending on how hot the code is.

