Geometric Manifold Walking: Stable High-Accuracy Multi-Encoder Fusion Without Backbone Training
Abstract
We present an early look at Walker Fusion, an approach to combining representations from multiple pretrained encoders through learned interpolation along representation manifolds. Built on the GeoFractal Router framework and 18 months of geometric deep learning research, Walker Fusion extends pentachora-based attention mechanisms to multi-encoder fusion. The system provides 11 blend modes (including slerp, shiva, and gilgamesh), 7 schedule types, and 19 aggregation strategies, enabling systematic exploration of the interpolation geometry between encoder outputs. On vision (CIFAR-100) and text (AG News) benchmarks, Walker Fusion reaches 88.6% and 94.4% accuracy, respectively, using only frozen encoder features and a lightweight (~2M parameter) fusion module. Auxiliary-informed Walker Fusion also achieves near-perfect cross-seed consistency (0.999) while outperforming a ResNet-18 trained from scratch by about 15.7 percentage points (88.62% vs. 72.96%). Our results suggest that how we traverse between representations matters more than how we weight them.
1. Introduction
1.1 Research Context
Code repository: https://github.com/AbstractEyes/geofractal
This work emerges from 18 months of geometric deep learning research exploring whether pentachoron (4-simplex, five vertices) mathematics can replace traditional neural network components. Prior work includes:
- David: A multi-scale crystal classifier achieving 73-85% ImageNet accuracy with only 120k-3MB parameters using Cayley-Menger determinant-based attention
- Cantor Routing: Fractal coordinate systems for sparse attention patterns
- Liminal Staircase: Hierarchical α/β/γ controllers for representation binding
- Beatrix: Flow-matching diffusion models with geometric oscillators
Walker Fusion represents the fusion-specific branch of this research: applying geometric interpolation principles to multi-encoder combination.
1.2 The Core Insight
Modern deep learning increasingly relies on combining multiple pretrained encoders to leverage complementary learned representations. Standard fusion approaches—concatenation, weighted sum, cross-attention—treat encoder outputs as static vectors to be combined. We propose an alternative perspective: the path between representations contains learnable structure.
Consider two encoders observing the same input. Their output representations lie on different manifolds shaped by their training objectives. Rather than asking "how should we weight these outputs?", we ask "how should we walk between them?"
1.3 Key Contributions
- FieldWalkerFusion architecture: Vectorized interpolation with 11 blend modes, 7 schedules, 19 aggregations, and 10 presets
- Auxiliary-informed fusion (Combo Walker): Geometric features that modulate walking behavior, achieving near-perfect training stability
- GeoFractal Router integration: Production-ready components for the geometric deep learning framework
- Comprehensive ablation: 100+ configurations across vision and text modalities with dual-run consistency validation
- Parameter-efficient accuracy: 88.6% CIFAR-100 accuracy with frozen encoders and ~2M trainable parameters, versus 73.0% for ResNet-18 trained from scratch
2. GeoFractal Router Framework
Walker Fusion is implemented within the GeoFractal Router framework. This section outlines the framework's layout and the geometric components that Walker Fusion builds on.
2.1 Core Architecture
geofractal/router/
├── base_component.py # ABC - pure Python, no torch
├── base_router.py # ABC - nn.Module, components/objects/_cache
├── base_tower.py # BaseRouter + stages (nn.ModuleList)
├── wide_router.py # BaseRouter + wide execution + torch.compile
├── components/
│ ├── torch_component.py # BaseComponent + nn.Module
│ ├── fusion_component.py # 12+ fusion strategies
│ ├── aggregation_component.py # FieldWalkerFusion system
│ └── ...
└── prefab/
    ├── geometric_tower_builder.py
    └── agatha/beatrix*.py # Diffusion models
2.2 Geometric Foundations
From the David model, we inherit:
Cayley-Menger Determinants: For N points in D dimensions, the Cayley-Menger determinant of their pairwise squared distances gives, up to a dimension-dependent constant, the squared volume of the simplex they form:
def compute_cayley_menger_volume(self, X: torch.Tensor) -> torch.Tensor:
    # X: [B, N, D] - N points in D dimensions
    # Returns: [B] - squared volumes
    ...  # body omitted in this excerpt
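The method above is shown only as a signature. As a rough, unbatched illustration of the computation it names, the sketch below builds the Cayley-Menger matrix in plain Python and applies the standard scaling V² = (-1)^(k+1) / (2^k (k!)²) · det(CM) for a k-simplex with k+1 vertices. The function names and the hand-rolled determinant are assumptions made for this example; this is not the framework's implementation, which operates on batched torch tensors.

```python
import math


def squared_distance(p, q):
    """Squared Euclidean distance between two points of equal dimension."""
    return sum((pi - qi) ** 2 for pi, qi in zip(p, q))


def determinant(matrix):
    """Determinant via Gaussian elimination with partial pivoting."""
    m = [[float(v) for v in row] for row in matrix]
    n = len(m)
    det = 1.0
    for col in range(n):
        # Pick the row with the largest entry in this column as the pivot.
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            return 0.0  # singular matrix: degenerate simplex
        if pivot != col:
            m[col], m[pivot] = m[pivot], m[col]
            det = -det  # a row swap flips the sign
        det *= m[col][col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n):
                m[r][c] -= factor * m[col][c]
    return det


def cayley_menger_squared_volume(points):
    """Squared volume of the k-simplex spanned by k+1 points."""
    k = len(points) - 1
    size = k + 2
    cm = [[0.0] * size for _ in range(size)]
    # Border of ones around the squared-distance matrix; corner stays 0.
    for i in range(1, size):
        cm[0][i] = cm[i][0] = 1.0
    for i in range(k + 1):
        for j in range(k + 1):
            cm[i + 1][j + 1] = squared_distance(points[i], points[j])
    scale = (-1) ** (k + 1) / (2 ** k * math.factorial(k) ** 2)
    return scale * determinant(cm)


# Right triangle with legs of length 1: area 1/2, so squared area 1/4.
triangle = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0)]
print(cayley_menger_squared_volume(triangle))

# Pentachoron (4-simplex) spanned by the five standard basis vectors of R^5.
pentachoron = [[1.0 if i == j else 0.0 for j in range(5)] for i in range(5)]
print(cayley_menger_squared_volume(pentachoron))
```

Because the matrix depends only on pairwise distances, the result is independent of the ambient dimension D, which is why five points embedded in a high-dimensional feature space still define a well-posed pentachoron volume. A volume of zero signals degenerate (for example, collinear) points.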
Rose Loss: Pentachora-based regularization that encourages representations to form well-conditioned geometric structures.
Cantor Routing: Fractal coordinate assignment for sparse attention patterns:
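def _cantor_coordinate(self, position: int, max_len: int, depth: int) -> float:
    # Normalize the position to [0, 1]
    x = position / max(1, max_len - 1)
    cantor_val = 0.0
    factor = 0.5  # weight of the current ternary digit (assumed initial value); halves each level
    for _ in range(depth):
        # Extract the next base-3 digit of x
        x *= 3.0
        digit = int(x)
        x -= digit
        if digit == 2:
            cantor_val += factor
        factor *= 0.5
    return cantor_val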
3. FieldWalkerFusion System
3.1 Overview
FieldWalkerFusion provides vectorized interpolation between two representations across T steps:
FieldWalkerFusion(
    name="walker",
    in_features=768,
    num_steps=8,
    blend_mode='shiva',             # 11 options
    schedule='learnable',           # 7 options
    aggregation='similarity_tree',  # 19 options
)
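To make the three knobs concrete, here is a minimal sketch of the walk-then-aggregate pipeline using the simplest choices from the tables that follow: lerp blending, a linear schedule, and mean aggregation. It uses plain Python lists rather than tensors, and the helper names (`walk`, `mean_aggregate`, `fuse`) are hypothetical, not part of the FieldWalkerFusion API.

```python
def linear_schedule(num_steps):
    """Evenly spaced interpolation coefficients from 0 to 1 inclusive."""
    if num_steps < 2:
        raise ValueError("num_steps must be at least 2")
    return [i / (num_steps - 1) for i in range(num_steps)]


def lerp(a, b, alpha):
    """Linear blend: (1 - alpha) * a + alpha * b, elementwise."""
    return [(1 - alpha) * x + alpha * y for x, y in zip(a, b)]


def walk(a, b, num_steps=8):
    """Intermediate points along the path from a to b, one per step."""
    return [lerp(a, b, alpha) for alpha in linear_schedule(num_steps)]


def mean_aggregate(steps):
    """Collapse the step axis by averaging each feature over all steps."""
    n = len(steps)
    return [sum(column) / n for column in zip(*steps)]


def fuse(a, b, num_steps=8):
    """Walk from a to b, then aggregate the visited points into one vector."""
    return mean_aggregate(walk(a, b, num_steps))


encoder_a = [0.0, 0.0]
encoder_b = [1.0, 2.0]
print(walk(encoder_a, encoder_b, num_steps=3))
print(fuse(encoder_a, encoder_b, num_steps=3))
```

With lerp, a linear schedule, and mean aggregation, the fused vector is simply the midpoint of the two inputs, so this particular combination adds nothing over an equal-weight sum. Any additional behavior has to come from a nonlinear blend, a non-uniform schedule, or a non-uniform aggregation.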
3.2 Blend Modes (11)
| Mode | Formula | Notes |
|---|---|---|
| lerp | (1-α)·a + α·b | Linear baseline |
| slerp | Spherical linear interpolation | Preserves norms |
| slip | Signed linear interpolation | Alucard experiments |
| zeus | a + α·(b - a)·sigmoid(scale) | Controlled momentum |
| helios | Cosine-weighted blend | Smooth transitions |
| surge | Exponential ramp | Fast transitions |
| ripple | Sinusoidal oscillation | Periodic sampling |
| gilgamesh | a·cos²(πα/2) + b·sin²(πα/2) | Energy-preserving |
| shiva | exp(-λα)·a + (1-exp(-λα))·b | Exponential decay |
| ifrit | Temperature-scaled blend | Sharpness control |
| min_p | Nucleus-style thresholding | Probability filtering |
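The blend modes with explicit formulas in the table can be written out directly. The sketch below covers lerp, slerp, gilgamesh, and shiva on plain Python lists. The slerp form is the standard textbook one, and the default `lam=3.0` for shiva is an arbitrary value chosen for illustration; neither is confirmed to match the framework's implementation.

```python
import math


def lerp(a, b, alpha):
    """(1 - alpha) * a + alpha * b."""
    return [(1 - alpha) * x + alpha * y for x, y in zip(a, b)]


def slerp(a, b, alpha):
    """Spherical linear interpolation between two nonzero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    cos_omega = max(-1.0, min(1.0, dot / (norm_a * norm_b)))
    omega = math.acos(cos_omega)
    sin_omega = math.sin(omega)
    if sin_omega < 1e-6:
        # Nearly parallel or anti-parallel: fall back to lerp.
        return lerp(a, b, alpha)
    weight_a = math.sin((1 - alpha) * omega) / sin_omega
    weight_b = math.sin(alpha * omega) / sin_omega
    return [weight_a * x + weight_b * y for x, y in zip(a, b)]


def gilgamesh(a, b, alpha):
    """a * cos^2(pi * alpha / 2) + b * sin^2(pi * alpha / 2)."""
    weight_a = math.cos(math.pi * alpha / 2) ** 2
    weight_b = math.sin(math.pi * alpha / 2) ** 2
    return [weight_a * x + weight_b * y for x, y in zip(a, b)]


def shiva(a, b, alpha, lam=3.0):
    """exp(-lam * alpha) * a + (1 - exp(-lam * alpha)) * b; lam is assumed."""
    weight_a = math.exp(-lam * alpha)
    return [weight_a * x + (1 - weight_a) * y for x, y in zip(a, b)]


a, b = [1.0, 0.0], [0.0, 1.0]
for name, blend in [("lerp", lerp), ("slerp", slerp),
                    ("gilgamesh", gilgamesh), ("shiva", shiva)]:
    print(name, blend(a, b, 0.5))
```

Two properties follow from the formulas themselves. The gilgamesh weights always sum to 1 because cos² + sin² = 1, so it is a reparameterized lerp with a smooth start and finish. The shiva blend starts exactly at `a` but only approaches `b` asymptotically: at α = 1 a residual weight of exp(-λ) remains on `a`.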
3.3 Schedules (7)
| Schedule | Pattern | Use Case |
|---|---|---|
| linear | [0, 0.14, 0.29, ..., 1] | Uniform sampling |
| cosine | (1 - cos(πt))/2 | Slow-fast-slow |
| sigmoid | 1/(1 + exp(-k(t-0.5))) | S-curve |
| tau | Golden ratio spacing | Fibonacci-like |
| wave | Sinusoidal modulation | Oscillatory |
| learnable | softmax(params) | Data-driven |
| adaptive | Input-dependent | Per-sample |
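The three closed-form schedules in the table can be sketched as follows, with t taken on an evenly spaced grid over [0, 1]. The steepness `k=10.0` for the sigmoid schedule is an assumed value for illustration, and the function names are hypothetical.

```python
import math


def unit_grid(num_steps):
    """Evenly spaced t values from 0 to 1 inclusive."""
    if num_steps < 2:
        raise ValueError("num_steps must be at least 2")
    return [i / (num_steps - 1) for i in range(num_steps)]


def linear_schedule(num_steps):
    """alpha = t."""
    return unit_grid(num_steps)


def cosine_schedule(num_steps):
    """alpha = (1 - cos(pi * t)) / 2: slow near both ends, fast in the middle."""
    return [(1 - math.cos(math.pi * t)) / 2 for t in unit_grid(num_steps)]


def sigmoid_schedule(num_steps, k=10.0):
    """alpha = 1 / (1 + exp(-k * (t - 0.5))); k is an assumed steepness."""
    return [1 / (1 + math.exp(-k * (t - 0.5))) for t in unit_grid(num_steps)]


# With 8 steps the linear schedule reproduces the values quoted in the table.
print([round(alpha, 2) for alpha in linear_schedule(8)])
print([round(alpha, 2) for alpha in cosine_schedule(8)])
print([round(alpha, 2) for alpha in sigmoid_schedule(8)])
```

Note that the sigmoid schedule, as written in the table, never reaches 0 or 1 exactly, so a walk using it samples neither encoder output in pure form unless the endpoints are added separately.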
3.4 Aggregations (19)
| Category | Methods |
|---|---|
| Statistical | mean, sum, max, min, weighted |
| Selection | top_k, bottom_k, first, last |
| Probabilistic | softmax, softmin, min_p, gumbel |
| Geometric | triangular, slerp |
| Similarity | similarity, cross_similarity, similarity_tree |
| Learned | attention, learnable |
3.5 Walker Presets (10)
WALKER_PRESETS = {
    'alucard': {'blend': 'lerp', 'schedule': 'tau', 'aggregation': 'mean'},
    'slerp': {'blend': 'slerp', 'schedule': 'linear', 'aggregation': 'weighted'},
    'slip': {'blend': 'slip', 'schedule': 'cosine', 'aggregation': 'similarity'},
    'zeus': {'blend': 'zeus', 'schedule': 'sigmoid', 'aggregation': 'last'},
    'gilgamesh': {'blend': 'gilgamesh', 'schedule': 'linear', 'aggregation': 'triangular'},
    'shiva': {'blend': 'shiva', 'schedule': 'cosine', 'aggregation': 'similarity_tree'},
    'ifrit': {'blend': 'ifrit', 'schedule': 'wave', 'aggregation': 'softmax'},
    'learnable': {'blend': 'lerp', 'schedule': 'learnable', 'aggregation': 'learnable'},
    'fingerprint': {'blend': 'lerp', 'schedule': 'cosine', 'aggregation': 'similarity'},
    'min_p': {'blend': 'min_p', 'schedule': 'linear', 'aggregation': 'min_p'},
}
4. Combo Walker: Auxiliary-Informed Fusion
4.1 Motivation
Pure Walker Fusion shows seed-dependent variance (a standard deviation of ±0.59% across seeds). We introduce auxiliary features that inform the walking process without being fused into the output:
ComboWalkerFusion(
    aux_type='cosine',              # Geometric features
    base_blend='shiva',             # Walk mode
    schedule_mode='aux_modulated',  # Per-sample schedule
    num_steps=8,
    aux_dim=64,
)
4.2 Auxiliary Feature Types
| Type | Computes | Consistency |
|---|---|---|
| cosine | Pairwise cosine similarities | 0.999 |
| learned | Fixed per-encoder embeddings | 0.999 |
| input_dependent | Attention over embeddings | 0.992 |
| geometric | Cayley-Menger distances | 0.993 |
| walker_path | Similarities along interpolation | 1.000 |
4.3 Schedule Modulation
Auxiliary features modulate the base schedule per-sample:
modulation = schedule_modulator(aux_feats) # [B, num_steps]
schedule = base_schedule + scale * modulation
schedule = schedule.clamp(0, 1)
This allows the walker to adapt its stepping based on the geometric relationship between encoders for each input.
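A small worked example of the modulation step, using plain Python lists for a single sample. In the snippet above, `schedule_modulator` is a learned module; here its output is passed in directly as `modulation`, and the default `scale=0.1` is an assumed value.

```python
def modulate_schedule(base_schedule, modulation, scale=0.1):
    """Shift each schedule entry by scale * modulation, then clamp to [0, 1]."""
    return [
        min(1.0, max(0.0, step + scale * shift))
        for step, shift in zip(base_schedule, modulation)
    ]


base_schedule = [0.0, 0.5, 1.0]
# Hypothetical modulator output for one sample (one value per step).
modulation = [-1.0, 1.0, 2.0]

print(modulate_schedule(base_schedule, modulation))
```

The first and last entries are pushed outside [0, 1] and clamped back, while the middle entry moves from 0.5 to about 0.6. Because each step is shifted independently, a modulated schedule is not guaranteed to stay monotonic, so the walk may revisit or reorder positions along the path.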
5. Experiments
5.1 Experimental Setup
Vision (CIFAR-100):
- Encoders: ConvNeXt-S (DINOv3), ViT-B (DINOv3), ViT-B (CLIP)
- All encoders frozen, features cached
- 50K train / 10K test
Text (AG News):
- Encoders: CLIP ViT-B (text), T5-base, BERT-large
- Mean pooling over sequence
- 120K train / 7.6K test
Consistency Protocol (per OverMeta suggestion):
- Each configuration run 2× with seeds {42, 1042}
- Consistency ratio = min accuracy / max accuracy across the two runs (>0.95 = reliable)
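The consistency protocol reduces to a few lines of arithmetic. The sketch below computes the mean, standard deviation, and min/max ratio for a set of runs. The accuracies in the example are made-up values for illustration, not results from this study, and the use of the population standard deviation is an assumption about how the reported ± values were computed.

```python
import statistics

RELIABLE_THRESHOLD = 0.95  # threshold stated in the protocol above


def consistency_ratio(accuracies):
    """Ratio of the worst run to the best run; 1.0 means identical runs."""
    return min(accuracies) / max(accuracies)


def summarize_runs(accuracies):
    """Mean, population standard deviation, and consistency ratio."""
    return {
        "mean": statistics.mean(accuracies),
        "std": statistics.pstdev(accuracies),
        "consistency": consistency_ratio(accuracies),
    }


def is_reliable(accuracies, threshold=RELIABLE_THRESHOLD):
    """True when the consistency ratio exceeds the threshold."""
    return consistency_ratio(accuracies) > threshold


# Hypothetical accuracies (in percent) for seeds 42 and 1042.
runs = [90.0, 89.1]
print(summarize_runs(runs))
print(is_reliable(runs))
```

With only two runs per configuration, the ratio and the standard deviation carry the same information (both depend only on the gap between the two accuracies), so they should be read as one measurement rather than two independent confirmations.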
5.2 Vision Results (CIFAR-100)
Triple Encoder Walker Ablation (47 configurations)
| Rank | Configuration | Accuracy |
|---|---|---|
| 1 | hier_learnable_full | 89.19% |
| 2 | hier_steps_8 | 89.16% |
| 3 | hier_blend_slerp | 89.13% |
| 4 | hier_blend_shiva | 89.13% |
| 5 | chain_default | 89.10% |
Strategy Comparison:
| Strategy | Best | Description |
|---|---|---|
| Hierarchical | 89.19% | ((A,B),C) nesting |
| Chain | 89.10% | A→B→C sequential |
| Sum | 89.06% | Simple baseline |
| Concat | 88.20% | Standard approach |
Combo Walker Stability (20 configurations, dual-run)
| Configuration | Mean | Std | Consistency |
|---|---|---|---|
| baseline_walker | 88.07% | ±0.59% | 0.987 |
| combo_shiva_cosine | 88.62% | ±0.05% | 0.999 |
| combo_shiva_learned | 88.64% | ±0.05% | 0.999 |
| combo_shiva_walker_path | 88.57% | ±0.01% | 1.000 |
Key Finding: Auxiliary features reduce the cross-seed standard deviation by roughly 12× (from ±0.59% to ±0.05%) while slightly improving mean accuracy (88.07% to 88.62%).
5.3 Text Results (AG News)
| Configuration | Accuracy |
|---|---|
| hier_learnable_full | 94.38% |
| hier_blend_gilgamesh | 94.22% |
| baseline_concat | 94.21% |
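Cross-Modal Pattern: Learnable hierarchical walking (hier_learnable_full) ranks first in both vision and text, although on AG News its margin over baseline_concat is only 0.17 points (94.38% vs. 94.21%).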
5.4 Comparison to Classic Baselines
| Model | Params | Accuracy | Std | Consistency |
|---|---|---|---|---|
| ResNet-18 (scratch) | 11M | 72.96% | ±0.09% | 0.998 |
| ResNet-34 (scratch) | 21M | 73.51% | ±0.13% | 0.996 |
| Combo Walker | ~2M | 88.62% | ±0.05% | 0.999 |
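Combo Walker scores 15.7 percentage points above ResNet-18 (88.62% vs. 72.96%) with roughly 5× fewer trainable parameters (~2M vs. 11M). The frozen pretrained encoders are not counted in the trainable-parameter figure.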
5.5 InceptiveFusion Comparison
We tested auxiliary features WITHOUT walking (InceptiveFusion from CantorMultiheadFusion's "consciousness" mode):
| Approach | Accuracy |
|---|---|
| Walker (hierarchical) | 89.19% |
| InceptiveFusion (aux_learned) | 88.05% |
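Conclusion: Walking outperforms static auxiliary weighting by 1.14 points (89.19% vs. 88.05%), corresponding to gains of +2.6 and +1.5 points, respectively, over their shared baseline.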
6. Analysis
6.1 Why Does Walking Work?
Traditional fusion treats encoder outputs as independent vectors. Walking reveals:
- Manifold structure: Intermediate points z(t) contain valid representations
- Non-uniform schedules: Learned schedules concentrate steps at specific t values
- Blend mode matters: Slerp/Shiva outperform lerp (preserving geometric properties)
6.2 Why Do Auxiliary Features Stabilize Training?
Without auxiliary features, the walker must discover manifold structure from gradients alone. Auxiliary features provide:
- Cosine similarities: Which encoders agree/disagree
- Geometric features: Cayley-Menger distances between representations
- Result: Consistent convergence across seeds
6.3 Connection to Pentachora Research
Walker Fusion extends the pentachora hypothesis: geometric structure in representation space is more informative than raw magnitudes.
- David used Cayley-Menger volumes for attention
- Walker uses geometric interpolation for fusion
- Both discover that the shape of the representation manifold matters
7. Architectural Implications
7.1 Walker Stacks
Our results suggest a new architectural primitive:
| Era | Primitive | Operation |
|---|---|---|
| 2015 | ResNet | y = F(x) + x |
| 2017 | Transformer | y = Attention(Q,K,V) |
| 2025? | Walker | y = Walk(x₁, x₂) |
Concept:
x₁, x₂ = parallel_paths(input)
w₁ = walker(x₁, x₂) # First interpolation
w₂ = walker(w₁, F(w₁)) # Residual walker
w₃ = walker(w₂, G(w₂)) # Stack deep
7.2 Implications for Model Efficiency
- Current paradigm: Train massive models, then distill
- Walker paradigm: Combine existing small models via learned geometry
8. Conclusion
Walker Fusion demonstrates that how we traverse between representations matters more than how we weight them. Built on 18 months of geometric deep learning research, the system achieves:
- 88.6% CIFAR-100 with frozen encoders (vs 73% ResNet-18 from scratch)
- 94.4% on AG News, comparable to results typically reported for fine-tuned BERT
- 0.999 consistency across random seeds
- ~2M trainable parameters regardless of encoder size
The path between opinions contains more information than the opinions themselves.
Appendix A: Full System Inventory
A.1 Blend Modes (11)
lerp, slerp, slip, zeus, helios, surge, ripple, gilgamesh, shiva, ifrit, min_p
A.2 Schedules (7)
linear, cosine, sigmoid, tau, wave, learnable, adaptive
A.3 Aggregations (19)
mean, sum, max, min, top_k, bottom_k, softmax, softmin, min_p, weighted, last, first, triangular, similarity, cross_similarity, similarity_tree, slerp, attention, learnable
A.4 Walker Presets (10)
alucard, slerp, slip, zeus, gilgamesh, shiva, ifrit, learnable, fingerprint, min_p
Appendix B: Related GeoFractal Components
| Component | Purpose |
|---|---|
| CantorScaleFusion | Fractal routing for sparse attention |
| GeometricAttentionGate | Cayley-Menger volume attention |
| AdaptiveBindingFusion | Lyra-style α/β/γ controllers |
| HierarchicalTreeGating | Tree-structured fusion |
| InceptiveFusion | Consciousness-aware auxiliary injection |
This is a preliminary, AI-generated document based on my overall research efforts into walked fusion.