DFlash Scaling Sweep Source of Truth
Paused at: 2026-05-09T16:01:46-04:00
This file is the current source of truth for the interrupted DGX Spark DFlash sweep. Values are parsed from result artifacts under ~/dflash_trie_experiment/results on f7 and cfd0, then synced locally to /tmp/dflash_pause_results_f7 and /tmp/dflash_pause_results_cfd0.
Runtime State
- f7: all benchmark controllers and vLLM containers stopped; GPU idle after pause.
- cfd0: baseline benchmark controller stopped; incomplete qwen35_9b_baseline_math500_c4_scale.log preserved as a .paused_* file; the only remaining container is dflash-serve-qwen35_35b_a3b-dflash-32768.
- cfd0 serving endpoint: http://127.0.0.1:30000/v1, served model qwen35_35b_a3b_dflash_32768; /health passed at pause verification.
- Runner updated on both hosts with explicit --resume; completed rows are detected by required metric fields, and partial outputs are renamed before rerun.
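
The resume rule in the last bullet can be sketched roughly as follows. This is an illustration only, not the code in dflash_scale_sweep.py: the JSON result format, the field names in REQUIRED_FIELDS, and the helper names are all assumptions; only the .paused_<timestamp> suffix style is taken from the preserved files listed in this document.

```python
import json
from datetime import datetime
from pathlib import Path
from typing import Optional

# Hypothetical field names: the runner's actual required metric fields
# are not listed in this file.
REQUIRED_FIELDS = ("tps", "latency_s")


def is_complete(result_path: Path) -> bool:
    """A row is complete only if its result parses and has every required field."""
    try:
        data = json.loads(result_path.read_text())
    except (OSError, ValueError):
        # Missing file, unreadable file, or truncated JSON: treat as incomplete.
        return False
    return isinstance(data, dict) and all(
        data.get(field) is not None for field in REQUIRED_FIELDS
    )


def prepare_row(result_path: Path, now: Optional[datetime] = None) -> str:
    """Return 'skip' for a completed row; otherwise move partial output aside and return 'run'."""
    if is_complete(result_path):
        return "skip"
    if result_path.exists():
        stamp = (now or datetime.now()).strftime("%Y%m%dT%H%M%S")
        # Keep the partial artifact for inspection instead of overwriting it.
        result_path.rename(result_path.with_name(f"{result_path.name}.paused_{stamp}"))
    return "run"
```

Renaming rather than deleting keeps an interrupted row's partial output available for later inspection while still letting the rerun write to a clean path.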
Resume Commands
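Run these after the high-priority job is done. They skip completed rows and rerun only missing or paused rows. The first command resumes the DFlash sweep on f7; the second resumes the baseline sweeps on cfd0, reached through spark as a jump host.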
ssh spark "cd ~/dflash_trie_experiment && nohup bash -lc 'set -euo pipefail; export PYTHONUNBUFFERED=1; ./dflash_scale_sweep.py --model qwen35_35b_a3b --mode dflash --phase all --resume' >> sweep_logs/controller_f7_dflash_scale_resume.log 2>&1 &"
ssh -J spark jarrodbarnes@192.168.100.11 "cd ~/dflash_trie_experiment && nohup bash -lc 'set -euo pipefail; export PYTHONUNBUFFERED=1; ./dflash_scale_sweep.py --model qwen35_9b --mode baseline --phase all --resume; ./dflash_scale_sweep.py --model qwen35_35b_a3b --mode baseline --phase all --resume' >> sweep_logs/controller_cfd0_baseline_scale_resume.log 2>&1 &"
Calibration Results
| Model | Dataset | C | Baseline TPS | DFlash TPS | TPS speedup | Baseline latency | DFlash latency | Status |
|---|---|---|---|---|---|---|---|---|
| 4B | gsm8k | 1 | 20.40 | 91.03 | 4.46x | 2367.6s | 529.2s | paired |
| 4B | gsm8k | 4 | 96.54 | 288.62 | 2.99x | 500.4s | 166.9s | paired |
| 4B | gsm8k | 16 | 301.09 | 558.00 | 1.85x | 156.0s | 84.3s | paired |
| 4B | math500 | 1 | 20.42 | 95.39 | 4.67x | 3008.1s | 648.2s | paired |
| 4B | math500 | 4 | 95.37 | 304.04 | 3.19x | 645.6s | 202.9s | paired |
| 4B | math500 | 16 | 313.57 | 584.98 | 1.87x | 197.5s | 105.3s | paired |
| 4B | mbpp | 1 | 20.81 | 73.36 | 3.53x | 2039.9s | 574.0s | paired |
| 4B | mbpp | 4 | 96.52 | 233.46 | 2.42x | 424.1s | 177.7s | paired |
| 4B | mbpp | 16 | 297.06 | 462.95 | 1.56x | 139.6s | 90.2s | paired |
| 4B | nemotron_pt_mix | 1 | 20.77 | 57.25 | 2.76x | 2918.7s | 1061.6s | paired |
| 4B | nemotron_pt_mix | 4 | 95.62 | 179.46 | 1.88x | 636.6s | 338.1s | paired |
| 4B | nemotron_pt_mix | 16 | 306.83 | 354.26 | 1.15x | 197.7s | 171.4s | paired |
| 9B | gsm8k | 1 | 12.48 | 56.90 | 4.56x | 3865.0s | 843.1s | paired |
| 9B | gsm8k | 4 | 52.47 | 182.92 | 3.49x | 908.4s | 262.3s | paired |
| 9B | gsm8k | 16 | 179.39 | 414.10 | 2.31x | 261.5s | 113.9s | paired |
| 9B | math500 | 1 | 12.49 | 59.26 | 4.74x | 4897.7s | 1037.6s | paired |
| 9B | math500 | 4 | - | 192.23 | - | - | 319.0s | DFlash only |
| 9B | math500 | 16 | - | 439.96 | - | - | 139.8s | DFlash only |
| 9B | mbpp | 1 | - | 42.90 | - | - | 998.7s | DFlash only |
| 9B | mbpp | 4 | - | 141.91 | - | - | 293.9s | DFlash only |
| 9B | mbpp | 16 | - | 300.70 | - | - | 138.9s | DFlash only |
| 9B | nemotron_pt_mix | 1 | - | 33.00 | - | - | 1887.5s | DFlash only |
| 9B | nemotron_pt_mix | 4 | - | 107.45 | - | - | 581.6s | DFlash only |
| 9B | nemotron_pt_mix | 16 | - | 241.17 | - | - | 260.3s | DFlash only |
| 35B-A3B | gsm8k | 1 | - | 51.11 | - | - | 929.6s | DFlash only |
| 35B-A3B | gsm8k | 4 | - | 105.55 | - | - | 453.3s | DFlash only |
| 35B-A3B | gsm8k | 16 | - | 229.82 | - | - | 204.4s | DFlash only |
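
The TPS speedup column is consistent with DFlash TPS divided by baseline TPS, rounded to two decimals. A minimal sketch of that calculation, using three rows copied from the table above, is shown here; the function name and the use of None for a missing baseline are illustrative choices, not part of the sweep runner.

```python
from typing import Optional


def tps_speedup(baseline_tps: Optional[float], dflash_tps: float) -> Optional[float]:
    """DFlash TPS over baseline TPS, or None when no baseline row exists yet."""
    if baseline_tps is None or baseline_tps <= 0:
        return None
    return dflash_tps / baseline_tps


# (model, dataset, C, baseline TPS, DFlash TPS), values taken from the table.
rows = [
    ("4B", "gsm8k", 1, 20.40, 91.03),
    ("4B", "gsm8k", 16, 301.09, 558.00),
    ("9B", "math500", 4, None, 192.23),
]

for model, dataset, c, base, dflash in rows:
    speedup = tps_speedup(base, dflash)
    if speedup is None:
        print(f"{model} {dataset} c{c}: DFlash only")
    else:
        print(f"{model} {dataset} c{c}: {speedup:.2f}x (paired)")
# 4B gsm8k c1: 4.46x (paired)
# 4B gsm8k c16: 1.85x (paired)
# 9B math500 c4: DFlash only
```

Rows marked DFlash only cannot be given a speedup until the matching baseline run finishes, which is why the function returns None rather than a placeholder number.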
TRIE Agentic Replay Results
| Model | Workload | C/context | Baseline latency | DFlash latency | Latency speedup | Baseline steady completion TPS | DFlash steady completion TPS | Baseline decode TPS | DFlash decode TPS | Status |
|---|---|---|---|---|---|---|---|---|---|---|
| 4B | agentic_coding_8k.jsonl | c16/32k | 182.0s | 300.2s | 0.61x | 141.02 | 46.91 | 12.95 | 5.97 | paired |
| 9B | agentic_coding_8k.jsonl | c16/32k | 234.4s | 325.2s | 0.72x | 100.05 | 39.80 | 7.80 | 4.09 | paired |
Paused / Incomplete Rows Preserved
- f7: qwen35_35b_a3b_dflash_math500_c1_scale.log.paused_20260509T155903
- cfd0: qwen35_9b_baseline_math500_c4_scale.log.paused_20260509T155908
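
For tooling that needs to map a preserved file back to its sweep row, the two names above can be split with a regular expression. The pattern below is inferred only from these two file names and may not cover every artifact the runner writes; reading the c<N> component as the C column of the results tables is likewise an assumption.

```python
import re
from datetime import datetime
from typing import Optional

# Naming pattern inferred from the two preserved files; not a documented format.
PAUSED_RE = re.compile(
    r"^(?P<model>.+)_(?P<mode>dflash|baseline)_(?P<dataset>.+)_c(?P<c>\d+)"
    r"_scale\.log\.paused_(?P<stamp>\d{8}T\d{6})$"
)


def parse_paused(name: str) -> Optional[dict]:
    """Split a paused log name into its row identity and pause timestamp."""
    match = PAUSED_RE.match(name)
    if match is None:
        return None
    return {
        "model": match["model"],
        "mode": match["mode"],
        "dataset": match["dataset"],
        "c": int(match["c"]),  # assumed to correspond to the C column
        "paused_at": datetime.strptime(match["stamp"], "%Y%m%dT%H%M%S"),
    }


for name in (
    "qwen35_35b_a3b_dflash_math500_c1_scale.log.paused_20260509T155903",
    "qwen35_9b_baseline_math500_c4_scale.log.paused_20260509T155908",
):
    print(parse_paused(name))
```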
Remaining Work
- 9B baseline: resume from MATH-500 c4, then MATH-500 c16, MBPP c1/c4/c16, Nemotron c1/c4/c16, and TRIE if not already accepted as complete.
- 35B-A3B DFlash: GSM8K c1/c4/c16 are complete; resume at MATH-500 c1 (paused mid-run), then continue through the rest of MATH-500, MBPP, Nemotron, and TRIE.
- 35B-A3B baseline: not started at pause time.