DFlash Scaling Sweep Source of Truth

Paused at: 2026-05-09T16:01:46-04:00

This file is the current source of truth for the interrupted DGX Spark DFlash sweep. Values are parsed from result artifacts under ~/dflash_trie_experiment/results on f7 and cfd0, then synced locally to /tmp/dflash_pause_results_f7 and /tmp/dflash_pause_results_cfd0.

Runtime State

  • f7: all benchmark controllers and vLLM containers stopped; GPU idle after pause.
  • cfd0: baseline benchmark controller stopped; incomplete qwen35_9b_baseline_math500_c4_scale.log preserved as a .paused_* file; the only remaining container is dflash-serve-qwen35_35b_a3b-dflash-32768.
  • cfd0 serving endpoint: http://127.0.0.1:30000/v1, served model qwen35_35b_a3b_dflash_32768; /health passed at pause verification.
  • Runner updated on both hosts with an explicit --resume flag; a row counts as completed only if its required metric fields are present, and partial outputs are renamed before the row is rerun.
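
The --resume behavior described above can be sketched roughly as follows. This is a minimal illustration, not the code in dflash_scale_sweep.py: the field names in REQUIRED_FIELDS, the JSON result format, and the helper names are assumptions. Only the .paused_<timestamp> suffix mirrors the preserved filenames listed in this file.

```python
import json
import time
from pathlib import Path

# Hypothetical metric names; the runner's actual required fields are not listed in this file.
REQUIRED_FIELDS = ("tps", "latency_s")


def row_is_complete(result_path: Path) -> bool:
    """True if the result file exists, parses as JSON, and has every required metric."""
    try:
        data = json.loads(result_path.read_text())
    except (OSError, ValueError):
        # Missing, unreadable, or truncated file: treat the row as not done.
        return False
    return isinstance(data, dict) and all(data.get(k) is not None for k in REQUIRED_FIELDS)


def prepare_row(result_path: Path, log_path: Path) -> bool:
    """Return True if the row should be (re)run, setting aside partial outputs first."""
    if row_is_complete(result_path):
        return False  # completed row: skip it on resume
    stamp = time.strftime("%Y%m%dT%H%M%S")
    for p in (result_path, log_path):
        if p.exists():
            # Keep the partial artifact for inspection instead of overwriting it.
            p.rename(p.with_name(f"{p.name}.paused_{stamp}"))
    return True
```

Checking for metric fields rather than file existence means a result file truncated by the pause is rerun instead of being counted as done.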

Resume Commands

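Run these after the high-priority job is done. They skip completed rows and rerun only missing or paused rows. The first command resumes the 35B-A3B DFlash sweep (logged to controller_f7_dflash_scale_resume.log). The second goes through the spark jump host and resumes the 9B baseline sweep, then the 35B-A3B baseline sweep (logged to controller_cfd0_baseline_scale_resume.log); because of set -e, the 35B-A3B baseline sweep starts only if the 9B sweep exits successfully.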

ssh spark "cd ~/dflash_trie_experiment && nohup bash -lc 'set -euo pipefail; export PYTHONUNBUFFERED=1; ./dflash_scale_sweep.py --model qwen35_35b_a3b --mode dflash --phase all --resume' >> sweep_logs/controller_f7_dflash_scale_resume.log 2>&1 &"
ssh -J spark jarrodbarnes@192.168.100.11 "cd ~/dflash_trie_experiment && nohup bash -lc 'set -euo pipefail; export PYTHONUNBUFFERED=1; ./dflash_scale_sweep.py --model qwen35_9b --mode baseline --phase all --resume; ./dflash_scale_sweep.py --model qwen35_35b_a3b --mode baseline --phase all --resume' >> sweep_logs/controller_cfd0_baseline_scale_resume.log 2>&1 &"

Calibration Results

Model    Dataset          C   Baseline TPS  DFlash TPS  TPS speedup  Baseline latency  DFlash latency  Status
4B       gsm8k            1   20.40         91.03       4.46x        2367.6s           529.2s          paired
4B       gsm8k            4   96.54         288.62      2.99x        500.4s            166.9s          paired
4B       gsm8k            16  301.09        558.00      1.85x        156.0s            84.3s           paired
4B       math500          1   20.42         95.39       4.67x        3008.1s           648.2s          paired
4B       math500          4   95.37         304.04      3.19x        645.6s            202.9s          paired
4B       math500          16  313.57        584.98      1.87x        197.5s            105.3s          paired
4B       mbpp             1   20.81         73.36       3.53x        2039.9s           574.0s          paired
4B       mbpp             4   96.52         233.46      2.42x        424.1s            177.7s          paired
4B       mbpp             16  297.06        462.95      1.56x        139.6s            90.2s           paired
4B       nemotron_pt_mix  1   20.77         57.25       2.76x        2918.7s           1061.6s         paired
4B       nemotron_pt_mix  4   95.62         179.46      1.88x        636.6s            338.1s          paired
4B       nemotron_pt_mix  16  306.83        354.26      1.15x        197.7s            171.4s          paired
9B       gsm8k            1   12.48         56.90       4.56x        3865.0s           843.1s          paired
9B       gsm8k            4   52.47         182.92      3.49x        908.4s            262.3s          paired
9B       gsm8k            16  179.39        414.10      2.31x        261.5s            113.9s          paired
9B       math500          1   12.49         59.26       4.74x        4897.7s           1037.6s         paired
9B       math500          4   -             192.23      -            -                 319.0s          DFlash only
9B       math500          16  -             439.96      -            -                 139.8s          DFlash only
9B       mbpp             1   -             42.90       -            -                 998.7s          DFlash only
9B       mbpp             4   -             141.91      -            -                 293.9s          DFlash only
9B       mbpp             16  -             300.70      -            -                 138.9s          DFlash only
9B       nemotron_pt_mix  1   -             33.00       -            -                 1887.5s         DFlash only
9B       nemotron_pt_mix  4   -             107.45      -            -                 581.6s          DFlash only
9B       nemotron_pt_mix  16  -             241.17      -            -                 260.3s          DFlash only
35B-A3B  gsm8k            1   -             51.11       -            -                 929.6s          DFlash only
35B-A3B  gsm8k            4   -             105.55      -            -                 453.3s          DFlash only
35B-A3B  gsm8k            16  -             229.82      -            -                 204.4s          DFlash only
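
The TPS speedup column is consistent with DFlash TPS divided by baseline TPS, rounded to two decimals. A small check against three paired rows copied from the table above (the function name is illustrative only):

```python
def tps_speedup(baseline_tps: float, dflash_tps: float) -> float:
    """Throughput ratio; values above 1.0 mean DFlash produced more tokens per second."""
    return dflash_tps / baseline_tps


# (model, dataset, C, baseline TPS, DFlash TPS), copied from the calibration table.
rows = [
    ("4B", "gsm8k", 1, 20.40, 91.03),
    ("4B", "gsm8k", 16, 301.09, 558.00),
    ("9B", "math500", 1, 12.49, 59.26),
]

for model, dataset, c, base, dflash in rows:
    print(f"{model} {dataset} c{c}: {tps_speedup(base, dflash):.2f}x")
# 4B gsm8k c1: 4.46x
# 4B gsm8k c16: 1.85x
# 9B math500 c1: 4.74x
```

Rows marked "DFlash only" have no baseline value yet, so their speedup stays undefined until the baseline sweep is resumed.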

TRIE Agentic Replay Results

Model  Workload                 C/context  Baseline latency  DFlash latency  Latency speedup  Baseline steady completion TPS  DFlash steady completion TPS  Baseline decode TPS  DFlash decode TPS  Status
4B     agentic_coding_8k.jsonl  c16/32k    182.0s            300.2s          0.61x            141.02                          46.91                         12.95                5.97               paired
9B     agentic_coding_8k.jsonl  c16/32k    234.4s            325.2s          0.72x            100.05                          39.80                         7.80                 4.09               paired
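
Unlike the calibration table, the speedup here is a latency ratio: baseline latency divided by DFlash latency, so a value below 1.0x means DFlash took longer. A short check against both paired rows (names are illustrative only):

```python
def latency_speedup(baseline_s: float, dflash_s: float) -> float:
    """Baseline latency over DFlash latency; below 1.0 means DFlash was slower."""
    return baseline_s / dflash_s


# model -> (baseline latency, DFlash latency) in seconds, copied from the TRIE table.
trie_rows = {"4B": (182.0, 300.2), "9B": (234.4, 325.2)}

for model, (base_s, dflash_s) in trie_rows.items():
    ratio = latency_speedup(base_s, dflash_s)
    verdict = "DFlash slower" if ratio < 1.0 else "DFlash faster or equal"
    print(f"{model}: {ratio:.2f}x ({verdict})")
# 4B: 0.61x (DFlash slower)
# 9B: 0.72x (DFlash slower)
```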

Paused / Incomplete Rows Preserved

  • f7: qwen35_35b_a3b_dflash_math500_c1_scale.log.paused_20260509T155903
  • cfd0: qwen35_9b_baseline_math500_c4_scale.log.paused_20260509T155908

Remaining Work

  • 9B baseline: resume from MATH-500 c4, then MATH-500 c16, MBPP c1/c4/c16, Nemotron c1/c4/c16, and TRIE if not already accepted as complete.
  • 35B-A3B DFlash: GSM8K c1/c4/c16 are complete; resume at MATH-500 c1 (the paused row on f7), then continue through the rest of MATH-500, MBPP, Nemotron, and TRIE.
  • 35B-A3B baseline: not started at pause time.
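
As a cross-check on the list above, the remaining 9B baseline calibration rows can be enumerated from the dataset and concurrency grid used in the calibration table. This is a sketch: the log-name pattern is inferred from the two paused filenames in this file, not taken from the runner, and TRIE is left out.

```python
from itertools import product

DATASETS = ["gsm8k", "math500", "mbpp", "nemotron_pt_mix"]
CONCURRENCIES = [1, 4, 16]


def remaining_rows(done: set[tuple[str, int]]) -> list[tuple[str, int]]:
    """Grid rows, in sweep order, that are not yet marked complete."""
    return [(d, c) for d, c in product(DATASETS, CONCURRENCIES) if (d, c) not in done]


# 9B baseline rows shown as "paired" in the calibration table.
done_9b_baseline = {("gsm8k", 1), ("gsm8k", 4), ("gsm8k", 16), ("math500", 1)}

for dataset, c in remaining_rows(done_9b_baseline):
    # Assumed naming pattern: <model>_<mode>_<dataset>_c<C>_scale.log
    print(f"qwen35_9b_baseline_{dataset}_c{c}_scale.log")
```

The first name printed is qwen35_9b_baseline_math500_c4_scale.log, which matches the paused row preserved on cfd0.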