Performance Tuning
Packing Size
The packing parameter trades proving time against proof size:
- Smaller values: faster proving, larger proofs
- Larger values: slower proving, smaller proofs
Recommendation: Experiment with different packing sizes (4096, 8192, 16384) to find the optimal trade-off for your application.
Example
# Test with different packing sizes
./webgpu_prover '{"program": "app.wasm", "packing": 4096, "args": []}'
./webgpu_prover '{"program": "app.wasm", "packing": 8192, "args": []}'
./webgpu_prover '{"program": "app.wasm", "packing": 16384, "args": []}'
Compare proof generation time and proof size for each configuration.
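The timing half of this comparison can be scripted. The C sketch below is a hypothetical harness, not part of the prover. It assumes `webgpu_prover` sits in the current directory and takes the JSON shown above as its only argument. It runs one prover invocation per packing size through `system()` and reports wall-clock time. It does not measure proof size, so that still has to be compared by hand.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <time.h>

/* Build the shell command for one packing size.
   Returns the length written, or -1 if buf is too small. */
int build_command(char *buf, size_t len, const char *program, int packing)
{
    /* The JSON is wrapped in single quotes, so the program name must not contain one. */
    assert(strchr(program, '\'') == NULL);
    int n = snprintf(buf, len,
                     "./webgpu_prover '{\"program\": \"%s\", \"packing\": %d, \"args\": []}'",
                     program, packing);
    return (n < 0 || (size_t)n >= len) ? -1 : n;
}

/* Run a command and return elapsed wall-clock seconds, or -1.0 if it failed. */
double run_timed(const char *cmd)
{
    struct timespec start, end;
    timespec_get(&start, TIME_UTC); /* C11 wall clock */
    int status = system(cmd);
    timespec_get(&end, TIME_UTC);
    if (status != 0)
        return -1.0;
    return (double)(end.tv_sec - start.tv_sec) + (end.tv_nsec - start.tv_nsec) / 1e9;
}

/* Time one prover run per candidate packing size (hypothetical sweep helper). */
void sweep_packing(const char *program)
{
    const int sizes[] = {4096, 8192, 16384};
    char cmd[512];
    for (size_t i = 0; i < sizeof sizes / sizeof sizes[0]; i++) {
        if (build_command(cmd, sizeof cmd, program, sizes[i]) < 0)
            continue; /* program name too long for the buffer */
        double secs = run_timed(cmd);
        if (secs < 0)
            printf("packing=%d: prover failed\n", sizes[i]);
        else
            printf("packing=%d: %.2f s\n", sizes[i], secs);
    }
}
```

Calling `sweep_packing("app.wasm")` from a small `main` prints one timing line per size. Record the size of each resulting proof next to its time.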
GPU Threads
The gpu-threads parameter controls how much parallelism the prover uses on the GPU:
- It can exceed the number of physical GPU cores
- Higher values may improve performance on high-end GPUs
- It defaults to the packing size; start there and adjust
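One way to act on the last point is to start at the default and double the thread count up to a ceiling, timing each run. The helper below is a hypothetical sketch of that schedule. The doubling strategy and the `max_threads` ceiling are assumptions and do not describe prover behavior. Only the starting value (gpu-threads = packing) comes from the text above.

```c
#include <assert.h>
#include <stddef.h>

/* Fill out[] with candidate gpu-threads values: the default (the packing
   size) followed by successive doublings, up to max_threads.
   Returns how many candidates were written (at most cap). */
size_t gpu_thread_candidates(int packing, int max_threads, int *out, size_t cap)
{
    assert(packing > 0 && max_threads >= packing);
    size_t count = 0;
    long long threads = packing; /* documented default: gpu-threads = packing */
    while (count < cap && threads <= max_threads) {
        out[count++] = (int)threads;
        threads *= 2; /* assumed schedule: double each step */
    }
    return count;
}
```

For a packing of 8192 and a ceiling of 65536, this yields 8192, 16384, 32768, and 65536. Pass each value as "gpu-threads" in the JSON configuration and time the run.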
Example
# Custom GPU thread count
./webgpu_prover '{
  "program": "app.wasm",
  "packing": 8192,
  "gpu-threads": 16384,
  "args": []
}'
Constraint Optimization
Use Checked Functions Sparingly
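Checked functions (e.g., bn254fr_addmod_checked) add a constraint on every call. Use them only where the result must be constrained. Elsewhere, compute with the unchecked variant and add the constraint explicitly where it is needed: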
// Good: Manual constraint when needed
bn254fr_addmod(sum, a, b);
// ... more operations ...
bn254fr_assert_add(sum, a, b); // Add constraint at the end
// Less optimal: Immediate constraint for every operation
bn254fr_addmod_checked(sum, a, b); // Adds constraint immediately
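A toy cost model helps here. Suppose every assertion adds one constraint, and a checked operation is the unchecked operation plus an immediate assertion. The C sketch below does not use the real SDK. `toy_addmod`, `toy_assert_add`, `toy_addmod_checked`, the small modulus 2^31 - 1, and the constraint counter are all hypothetical stand-ins. The sketch computes several candidate sums but uses only one. Values that never feed into the proven result need no constraint. Anything the proof depends on must still be asserted.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define TOY_P 2147483647u /* toy modulus 2^31 - 1, NOT the BN254 scalar field */

static unsigned toy_constraints = 0; /* hypothetical constraint counter */

/* Unchecked: compute only, no constraint recorded. */
uint32_t toy_addmod(uint32_t a, uint32_t b)
{
    return (uint32_t)(((uint64_t)a + b) % TOY_P);
}

/* Record one constraint "sum == a + b (mod p)"; a real prover would encode it. */
void toy_assert_add(uint32_t sum, uint32_t a, uint32_t b)
{
    assert(sum == toy_addmod(a, b));
    toy_constraints++;
}

/* Checked: compute and constrain immediately. */
uint32_t toy_addmod_checked(uint32_t a, uint32_t b)
{
    uint32_t sum = toy_addmod(a, b);
    toy_assert_add(sum, a, b);
    return sum;
}

/* Compute n candidate sums but use only index k; returns the chosen sum. */
uint32_t pick_sum(const uint32_t *a, const uint32_t *b, size_t n, size_t k, int checked)
{
    uint32_t sums[64];
    assert(n <= 64 && k < n);
    for (size_t i = 0; i < n; i++)
        sums[i] = checked ? toy_addmod_checked(a[i], b[i]) /* n constraints */
                          : toy_addmod(a[i], b[i]);        /* none yet */
    if (!checked)
        toy_assert_add(sums[k], a[k], b[k]); /* one constraint, on the value used */
    return sums[k];
}
```

With four candidates, the checked path records four constraints and the manual path records one. Both return the same sum.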
Batch Operations
Where possible, process many values with one vector (vbn254fr_t) operation instead of looping over scalar calls:
// Less optimal: Process one at a time
for (int i = 0; i < 1000; i++) {
bn254fr_mulmod(results[i], inputs[i], constant);
}
// Better: Use vectorized operations
vbn254fr_t vec_inputs, vec_results;
vbn254fr_alloc(vec_inputs, 1000);
vbn254fr_alloc(vec_results, 1000);
// ... populate vec_inputs ...
vbn254fr_mulmod_constant(vec_results, vec_inputs, &constant);
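A constant-multiply batch should produce the same element-wise results as the loop. The benefit comes from passing the whole vector to the library in one call. The C sketch below shows only that contract, over a toy modulus. `toy_vec_mulmod_constant` and `toy_mulmod` are hypothetical stand-ins. They are not the signatures of `vbn254fr_mulmod_constant` or `bn254fr_mulmod`, and the sketch says nothing about how the real vector type is stored or accelerated.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define TOY_P 2147483647u /* toy modulus 2^31 - 1, NOT the BN254 scalar field */

/* Scalar version: one call per element, as in the loop above. */
uint32_t toy_mulmod(uint32_t a, uint32_t b)
{
    return (uint32_t)(((uint64_t)a * b) % TOY_P);
}

/* Batched version: results[i] = inputs[i] * c (mod p) for every i, in one call. */
void toy_vec_mulmod_constant(uint32_t *results, const uint32_t *inputs, size_t n, uint32_t c)
{
    assert(c < TOY_P);
    for (size_t i = 0; i < n; i++)
        results[i] = (uint32_t)(((uint64_t)inputs[i] * c) % TOY_P);
}
```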
Minimize Constraint Depth
Keep your constraint graph shallow:
// Deep: each step waits on the previous one (4 sequential steps)
bn254fr_mulmod(t1, a, b);
bn254fr_mulmod(t2, t1, c);
bn254fr_mulmod(t3, t2, d);
bn254fr_mulmod(result, t3, e);
// Shallow: independent products first (3 sequential steps)
bn254fr_mulmod(t1, a, b);
bn254fr_mulmod(t2, c, d);      // does not depend on t1
bn254fr_mulmod(t3, t1, t2);
bn254fr_mulmod(result, t3, e);
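The same idea extends to any number of factors. Multiplying pairwise in rounds gives a depth of ⌈log2 n⌉, compared with n - 1 steps for a left-to-right chain. The five factors above therefore need three rounds instead of four. The sketch below uses plain integers and a toy modulus in place of bn254fr_t. `tree_product` and the modulus are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define TOY_P 2147483647u /* toy modulus 2^31 - 1, NOT the BN254 scalar field */

static uint32_t mulmod(uint32_t a, uint32_t b)
{
    return (uint32_t)(((uint64_t)a * b) % TOY_P);
}

/* Multiply v[0..n-1] with a balanced (pairwise) reduction.
   Overwrites v; stores the number of sequential rounds in *depth. */
uint32_t tree_product(uint32_t *v, size_t n, unsigned *depth)
{
    assert(n > 0);
    *depth = 0;
    while (n > 1) {
        size_t half = n / 2;
        for (size_t i = 0; i < half; i++)
            v[i] = mulmod(v[2 * i], v[2 * i + 1]); /* independent within a round */
        if (n % 2 == 1)
            v[half] = v[n - 1]; /* odd element carries over to the next round */
        n = half + n % 2;
        (*depth)++;
    }
    return v[0];
}
```

For the factors 2, 3, 5, 7, 11, this returns 2310 after 3 rounds.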
Memory Management
Free field elements as soon as possible:
// Good: Free when done
{
bn254fr_t temp;
bn254fr_alloc(temp);
// ... use temp ...
bn254fr_free(temp);
} // temp is freed
// Less optimal: Keep allocated longer than needed
bn254fr_t temp;
bn254fr_alloc(temp);
// ... use temp ...
// ... lots of other code ...
bn254fr_free(temp); // Freed much later
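Freeing early matters most inside loops, where late frees raise the peak number of live elements. The C sketch below uses a hypothetical counting allocator (`toy_alloc`/`toy_free` wrapping `malloc`/`free`) instead of the real `bn254fr_alloc`/`bn254fr_free`. It compares the peak when each temporary is freed inside the loop with the peak when all temporaries are freed after it.

```c
#include <assert.h>
#include <stdlib.h>

static size_t live = 0, peak = 0; /* hypothetical allocation counters */

/* Allocate one toy "field element" and update the counters. */
unsigned long long *toy_alloc(void)
{
    unsigned long long *p = malloc(sizeof *p);
    assert(p != NULL);
    if (++live > peak)
        peak = live;
    return p;
}

void toy_free(unsigned long long *p)
{
    free(p);
    live--;
}

/* Run n iterations that each need one temporary; return the peak live count. */
size_t peak_live_elements(size_t n, int free_early)
{
    unsigned long long *held[256];
    assert(n <= 256);
    live = peak = 0;
    for (size_t i = 0; i < n; i++) {
        unsigned long long *temp = toy_alloc();
        *temp = i;              /* ... use temp ... */
        if (free_early)
            toy_free(temp);     /* freed as soon as it is no longer needed */
        else
            held[i] = temp;     /* kept until after the loop */
    }
    if (!free_early)
        for (size_t i = 0; i < n; i++)
            toy_free(held[i]);
    return peak;
}
```

Over 100 iterations, the peak is 1 when temporaries are freed inside the loop and 100 when they are freed afterwards.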
Profiling
Monitor proof generation time and identify bottlenecks:
time ./webgpu_prover '{
  "program": "app.wasm",
  "args": []
}'
The prover reports performance metrics, including:
- Constraint generation time
- FFT computation time
- Total proof generation time