BitDecay — Research Landscape

A task-family × reasoning-condition matrix: each cell is a systematic sweep across model architectures and scales, testing the core formula η = k · exp(−a·k·n) under that task/condition combination.
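A minimal numeric sketch of the formula's defining property. The symbol reading is an assumption inferred from the references below, not stated in this note: k as state width in bits, n as number of compositional steps, a as per-bit, per-step error rate.

```python
import math

def eta(k: int, n: int, a: float) -> float:
    """Core BitDecay model: eta = k * exp(-a * k * n).

    Assumed reading (inferred, not stated here): k is the state width
    in bits, n the number of compositional steps, and a the per-bit,
    per-step error rate.
    """
    return k * math.exp(-a * k * n)

# Exponential decay in n: doubling the chain length squares the
# surviving fraction eta / k.
k, a = 8, 0.01
frac_n = eta(k, 10, a) / k    # exp(-0.8)
frac_2n = eta(k, 20, a) / k   # exp(-1.6) == exp(-0.8) ** 2
```

This multiplicative self-similarity is what distinguishes the exponential model from, say, linear error accumulation, and is what the per-task fits below are probing.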

Cell status legend: Done · Partial · Next · N/A · Empty

| Task | Baseline | CoT | Guided | Chunked |
| --- | --- | --- | --- | --- |
| XOR Chain | `xor.base` · 11 models · η formula core | `xor.cot` · 17 models · CoT ↓ error rate | | `xor.chnk` |
| FSM Traversal | `fsm.base` · 7 models · direct answer | `fsm.cot` · 1 model · LLaMA 70B | `fsm.guid` · 13 models · a ↓ retrieval-ized | |
| Mod Arithmetic | `ma.base` · 2 models · cross-task | `ma.cot` · 13 models · R² < 0 ! | `ma.guid` · 13 models · R² > 0.6 ✓ | |
| MultiHop QA | `mh.base` | `mh.cot` · 4 models · cross-task v2 | | `mh.chnk` |
| Stack Ops | `so.base` | `so.cot` · 2 models · pilot | `so.guid` | `so.chnk` |

Expansion Waves

| Wave | Scope | Status |
| --- | --- | --- |
| 1 | Core exponential decay: XOR + FSM + ModArith full sweeps | Done |
| 2 | Cross-task validation + retrieval spectrum | Partial |
| 3 | Chunked reasoning + remaining gaps | Next |

Cell totals: 8 Done · 5 Partial · 4 Next. Sweep progress: local sweep 56 / 60, cloud sweep 15 / 20.

Key References

- `paper/main.tex` — Main paper: "One Bit at a Time: Exponential Error Accumulation in LLM Compositional Reasoning"
- a_guided < a_cot — Guided (retrieval-ized) prompts reduce the per-bit error rate: FSM 12/12 models, ModArith 10/11 models.
- R² contrast — ModArith CoT fits with R² < 0 (computation); ModArith Guided fits with R² > 0.6 (retrieval). The parameter a measures retrieval error.
- cross-task — Cross-task validation: fit on a (k, n) subset, then predict held-out cells; MAE 4% for retrieval tasks.
- `experiments/results/` — Raw JSON results: 64+ experiment files across tasks, models, and conditions.
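The cross-task validation protocol can be sketched as follows. This is an illustrative toy under stated assumptions, not the paper's pipeline: the (k, n) grid, the noise-free synthetic data, and the through-origin log-linear fit of a are all hypothetical choices made here for a self-contained example.

```python
import math

def eta(k: int, n: int, a: float) -> float:
    # Core model: eta = k * exp(-a * k * n).
    return k * math.exp(-a * k * n)

def fit_a(cells):
    # Fit a by least squares through the origin on the linearized model:
    # log(eta / k) = -a * (k * n)  =>  a = -sum(x*y) / sum(x*x)
    xs = [k * n for k, n, _ in cells]
    ys = [math.log(e / k) for k, n, e in cells]
    return -sum(x * y for x, y in zip(xs, ys)) / sum(x * x for x in xs)

# Hypothetical grid: k in {4, 8} bits, n in {1, 2, 4, 8} steps,
# scored with a known "true" per-bit error rate (noise-free toy data).
a_true = 0.005
grid = [(k, n) for k in (4, 8) for n in (1, 2, 4, 8)]
data = [(k, n, eta(k, n, a_true)) for k, n in grid]

# Train on a subset of (k, n) cells, predict the held-out cells.
train, held_out = data[:5], data[5:]
a_hat = fit_a(train)
mae = sum(abs(eta(k, n, a_hat) - e) for k, n, e in held_out) / len(held_out)
```

On noise-free synthetic data the fit recovers a exactly and the held-out MAE is numerically zero; the interesting question on real sweeps is how far the MAE grows when a is fit on one task and transferred to another.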