AuroraGPT: Training Foundation Models on Supercomputers

Pranesh Nikhar 2025-12-16

🧰 AuroraGPT: Toolbox

  • Datasets and data pipelines (how do we deal with scientific data?)
  • Software infrastructure and workflows (scalable, robust, extensible)
  • Evaluation of state-of-the-art LLMs (how do they perform on scientific tasks?)

🍋 ezpz

praneshnikhar/ezpz

Write once, run anywhere

🚂 Training

argonne-lcf/Megatron-DeepSpeed

For the largest of large language models

🏃‍♂️ Running

argonne-lcf/inference-endpoints

Inference endpoints for LLMs, hosted @ ALCF

👥 Team Leads

Planning

Rick Stevens, Ian Foster, Rinku Gupta, Mike Papka, Arvind Ramanathan, Fangfang Xia

Data

Ian Foster, Robert Underwood

Training

Venkat Vishwanath, Pranesh Nikhar

Evaluation

Franck Cappello, Sandeep Madireddy, Bo Li

Post-Training

Eliu Huerta, Azton Wells

Inference

Rajeev Thakur

Comms

Charlie Catlett, David Martin

Distribution

Brad Ullrich

🤝 Teams

  • Planning
  • Data Prep
    • Accumulate 20+ trillion tokens of high-quality scientific text and structured data
  • Models / Training¹
    • Train (entirely from scratch) a series of models on publicly available data
  • Evaluation
    • Skills, trustworthiness, safety, robustness, privacy, machine ethics
  • Post-Training
    • Fine-tuning, alignment
  • Inference
    • Model serving, API development / public-facing web services
  • Distribution
    • Licensing, generating and distributing artifacts for public consumption
  • Communication

🏋️ Challenges

This is incredibly difficult in practice, due in part to:

  • Brand-new hardware, architectures, and software
  • Lack of native support in existing frameworks (though getting better!)
  • General system stability
    • 10k+ nodes $\left(\times \frac{12\,\mathrm{XPU}}{1\,\mathrm{Node}}\right) \Rightarrow$ 100k+ XPUs
    • network performance
    • file system stability (impacted by other users!)
    • many unexpected difficulties occur at increasingly large scales
  • Combinatorial explosion of possible configurations and experiments
    • {hyperparameters, architectures, tokenizers, learning rates, …}
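To make the combinatorial explosion concrete, here is a minimal Python sketch of a full grid sweep. The axis names and values are hypothetical and chosen only for illustration. The point is that the number of runs is the product of the axis lengths, so every new axis multiplies the cost rather than adding to it.

```python
from itertools import product
from math import prod

# Hypothetical sweep axes; names and values are illustrative only.
search_space = {
    "learning_rate": [1e-4, 3e-4, 6e-4],
    "tokenizer": ["tok_a", "tok_b"],
    "num_layers": [24, 32, 40],
    "seq_length": [2048, 4096],
}


def grid_size(space):
    """Runs in a full grid sweep: the product of every axis length."""
    return prod(len(values) for values in space.values())


def enumerate_grid(space):
    """Yield one config dict per point of the Cartesian product."""
    keys = list(space)
    for combo in product(*(space[k] for k in keys)):
        yield dict(zip(keys, combo))


print(grid_size(search_space))  # 36 = 3 * 2 * 3 * 2
```

With these four hypothetical axes the grid already needs 36 runs. Adding a fifth axis with three options would triple that to 108, which is why exhaustive sweeps quickly become impractical when each run occupies thousands of nodes.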

💾 AuroraGPT: Training

  • Training a fixed model on trillions of tokens requires:
    1. Aggregating data from multiple corpora
      (e.g., arXiv, Reddit, StackExchange, GitHub, Wikipedia)
    2. Sampling each training batch according to a fixed distribution across corpora
    3. Building indices that map batches of tokens into these files (indexing)
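Steps 2 and 3 can be illustrated with a small Python sketch. It assumes each corpus has already been tokenized into fixed-length samples. It then uses a greedy rule, similar in spirit to the blending helper in Megatron-LM, that assigns each global sample to whichever corpus lags furthest behind its target share. The function name `greedy_blend_indices` and the example weights are hypothetical, and this is not the AuroraGPT pipeline's actual code.

```python
import numpy as np


def greedy_blend_indices(weights, num_samples):
    """Assign each global sample to a corpus so running counts track the mixture.

    Returns two arrays of length num_samples:
      dataset_index[i]        -> which corpus global sample i is drawn from
      dataset_sample_index[i] -> which sample *within* that corpus it is
    """
    weights = np.asarray(weights, dtype=np.float64)
    weights = weights / weights.sum()  # normalize to a probability distribution
    dataset_index = np.empty(num_samples, dtype=np.int64)
    dataset_sample_index = np.empty(num_samples, dtype=np.int64)
    counts = np.zeros(len(weights), dtype=np.int64)

    for i in range(num_samples):
        # How far each corpus lags behind its target share after i + 1 samples.
        error = weights * (i + 1) - counts
        d = int(np.argmax(error))  # ties go to the lowest corpus index
        dataset_index[i] = d
        dataset_sample_index[i] = counts[d]
        counts[d] += 1
    return dataset_index, dataset_sample_index


# Hypothetical 3-corpus mixture: 50% / 30% / 20%.
idx, sample_idx = greedy_blend_indices([0.5, 0.3, 0.2], num_samples=10)
print(idx.tolist())
print(np.bincount(idx, minlength=3).tolist())  # [5, 3, 2]
```

The loop runs sequentially over samples. For 2T tokens at a hypothetical sequence length of 4,096, that is roughly 490 million iterations. This hints at why a serial, single-device version becomes a bottleneck.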

The original implementation was slow:

  • Designed to run serially on a single device
  • Major bottleneck when debugging the data pipeline at scale

🍹 AuroraGPT: Blending Data, Efficiently

  • 🐢 Original implementation:
    • Slow (serial, single device)
    • ~1 hr per 2T tokens
  • 🐇 New implementation:
    • Fast! (distributed, asynchronous)
    • ~2 min per 2T tokens (30x faster!!)

Figure 1: Time spent preparing 2T tokens
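The slide describes the faster implementation only as "distributed, asynchronous." One ingredient that parallelizes naturally is per-corpus index construction, because each corpus's sample index depends only on its own document lengths. The Python sketch below builds each corpus's index in a vectorized way and processes the corpora concurrently with a thread pool. The function names and toy corpora are hypothetical. A real multi-node version would more plausibly spread corpora across ranks than across threads.

```python
from concurrent.futures import ThreadPoolExecutor

import numpy as np


def build_sample_index(doc_lengths, seq_length):
    """Locate each fixed-length training sample inside one corpus.

    The corpus is treated as its documents concatenated in order. Each row of
    the result is (document index, token offset) where a sample starts.
    """
    doc_lengths = np.asarray(doc_lengths, dtype=np.int64)
    doc_starts = np.concatenate(([0], np.cumsum(doc_lengths)[:-1]))
    total_tokens = int(doc_lengths.sum())
    # Each sample spans seq_length + 1 tokens (inputs plus next-token labels),
    # so consecutive samples overlap by one token.
    num_samples = max((total_tokens - 1) // seq_length, 0)
    sample_starts = np.arange(num_samples, dtype=np.int64) * seq_length
    # side="right" minus 1 picks the last document starting at or before each
    # sample start, which also skips over empty documents.
    doc_idx = np.searchsorted(doc_starts, sample_starts, side="right") - 1
    offsets = sample_starts - doc_starts[doc_idx]
    return np.stack([doc_idx, offsets], axis=1)


def build_all_indices(corpora, seq_length, max_workers=4):
    """Build every corpus's index concurrently instead of one after another."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = {
            name: pool.submit(build_sample_index, lengths, seq_length)
            for name, lengths in corpora.items()
        }
        return {name: fut.result() for name, fut in futures.items()}


# Hypothetical toy corpora: lists of per-document token counts.
corpora = {"arxiv": [7, 3, 10], "github": [4, 4, 4, 4]}
indices = build_all_indices(corpora, seq_length=4)
print(indices["arxiv"].tolist())  # [[0, 0], [0, 4], [1, 1], [2, 2]]
```

In the toy `arxiv` corpus, the third sample starts at token 1 of document 1, because document 0 holds only 7 tokens.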

📉 Training AuroraGPT-7B on 2T Tokens

📉 Training AuroraGPT-2B on 7T Tokens

Figure: Forward diffusion process ($\pi \rightarrow \mathcal{N}$) and reverse diffusion process
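The figure pairs a forward process, which gradually turns samples from the data distribution $\pi$ into Gaussian noise $\mathcal{N}$, with a learned reverse process that maps noise back to data. The Python sketch below is a generic illustration only. It implements the closed-form forward step of a standard DDPM-style variance-preserving process with an assumed linear noise schedule. This is not necessarily the parameterization used by the model shown here.

```python
import numpy as np


def linear_alpha_bar(num_steps, beta_start=1e-4, beta_end=0.02):
    """alpha_bar_t = prod_{s<=t} (1 - beta_s) for an assumed linear beta schedule."""
    betas = np.linspace(beta_start, beta_end, num_steps)
    return np.cumprod(1.0 - betas)


def forward_diffuse(x0, t, alpha_bar, rng):
    """Sample x_t ~ q(x_t | x_0) = N(sqrt(alpha_bar_t) * x_0, (1 - alpha_bar_t) * I)."""
    noise = rng.standard_normal(np.shape(x0))
    return np.sqrt(alpha_bar[t]) * x0 + np.sqrt(1.0 - alpha_bar[t]) * noise


rng = np.random.default_rng(0)
alpha_bar = linear_alpha_bar(1000)
# Toy "data" drawn far from N(0, 1): uniform on [2, 4] with mean 3.
x0 = rng.uniform(2.0, 4.0, size=100_000)
x_T = forward_diffuse(x0, t=999, alpha_bar=alpha_bar, rng=rng)
# At the last step almost no signal is left, so x_T is close to N(0, 1).
print(round(float(x_T.mean()), 3), round(float(x_T.std()), 3))
```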

🌀 Sequence-Window-Pipeline Parallelism (SWiPe)

  • SWiPe is a novel parallelism strategy for Swin-based Transformers
  • A hybrid 3D parallelism strategy combining:
    • Sequence parallelism (SP)
    • Window parallelism (WP)
    • Pipeline parallelism (PP)

Figure 10

Figure 11: SWiPe Communication Patterns
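To make the hybrid layout concrete, the Python sketch below arranges `sp * wp * pp` ranks into a 3D grid and lists which ranks would share a communicator along each axis. The axis ordering (SP varying fastest, then WP, then PP) and the function names are assumptions for illustration. The actual SWiPe rank layout is described in Hatanpää et al. (2025). Any additional data-parallel dimension is omitted.

```python
def rank_to_coords(rank, sp, wp, pp):
    """Map a global rank to (pp_rank, wp_rank, sp_rank); SP varies fastest (assumed)."""
    if not 0 <= rank < sp * wp * pp:
        raise ValueError("rank out of range for this grid")
    sp_rank = rank % sp
    wp_rank = (rank // sp) % wp
    pp_rank = rank // (sp * wp)
    return pp_rank, wp_rank, sp_rank


def groups_along(axis, sp, wp, pp):
    """Group ranks that differ only along `axis` ("sp", "wp" or "pp").

    Each returned list is one communicator for that kind of parallelism.
    """
    groups = {}
    for rank in range(sp * wp * pp):
        pp_r, wp_r, sp_r = rank_to_coords(rank, sp, wp, pp)
        # Key on the two coordinates that stay fixed within a group.
        key = {"sp": (pp_r, wp_r), "wp": (pp_r, sp_r), "pp": (wp_r, sp_r)}[axis]
        groups.setdefault(key, []).append(rank)
    return list(groups.values())


# Hypothetical 2 x 2 x 2 grid on 8 ranks.
for axis in ("sp", "wp", "pp"):
    print(axis, groups_along(axis, sp=2, wp=2, pp=2))
# sp [[0, 1], [2, 3], [4, 5], [6, 7]]
# wp [[0, 2], [1, 3], [4, 6], [5, 7]]
# pp [[0, 4], [1, 5], [2, 6], [3, 7]]
```

The ordering choice matters in practice. Whichever axis varies fastest places its groups on neighbouring ranks, which usually share a node and therefore the fastest links.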

🚀 AERIS: Scaling Results

Figure 12: AERIS: Scaling Results

  • 10 EFLOPS (sustained) @ 120,960 GPUs
  • See Hatanpää et al. (2025) for additional details
  • arXiv:2509.13523
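Here is a quick back-of-the-envelope reading of these numbers. It assumes the 12 XPUs (GPUs) per node from the Challenges slide and treats the rounded 10 EFLOPS figure as exact:

```latex
\frac{120{,}960\ \text{GPUs}}{12\ \text{GPUs/node}} = 10{,}080\ \text{nodes},
\qquad
\frac{10\ \text{EFLOPS}}{120{,}960\ \text{GPUs}}
  = \frac{10^{19}\ \text{FLOP/s}}{1.2096 \times 10^{5}}
  \approx 8.3 \times 10^{13}\ \text{FLOP/s}
  \approx 83\ \text{TFLOPS per GPU}
```

The per-GPU value is only the average implied by the rounded headline number, not a measured per-device rate.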

🌪️ Hurricane Laura

Figure 13: Hurricane Laura tracks (top) and intensity (bottom), initialized 7 (a), 5 (b), and 3 (c) days prior to 2020-08-28T00Z.

📓 References

Dharuman, Gautham, Kyle Hippe, Alexander Brace, et al. 2024. “MProt-DPO: Breaking the ExaFLOPS Barrier for Multimodal Protein Design Workflows with Direct Preference Optimization.” Proceedings of the International Conference for High Performance Computing, Networking, Storage, and Analysis (Atlanta, GA, USA), SC ’24. https://doi.org/10.1109/SC41406.2024.00013.

Hatanpää, Väinö, Eugene Ku, Jason Stock, et al. 2025. AERIS: Argonne Earth Systems Model for Reliable and Skillful Predictions. https://arxiv.org/abs/2509.13523.

Price, Ilan, Alvaro Sanchez-Gonzalez, Ferran Alet, et al. 2024. GenCast: Diffusion-Based Ensemble Forecasting for Medium-Range Weather. https://arxiv.org/abs/2312.15796.

Song, Shuaiwen Leon, Bonnie Kruft, Minjia Zhang, et al. 2023. DeepSpeed4Science Initiative: Enabling Large-Scale Scientific Discovery Through Sophisticated AI System Technologies. https://arxiv.org/abs/2310.04610.

❤️ Acknowledgements

This research used resources of the Argonne Leadership Computing Facility, which is a DOE Office of Science User Facility supported under Contract DE-AC02-06CH11357.

Extras

Footnotes

  1. Co-led by: Venkat Vishwanath, Pranesh Nikhar
