WebLLM Bench

Test, chat, and compare local browser LLMs on your hardware.

WebGPU Local Inference v2.0
Start Here: Device + Model Guidance
🔧 Add Custom Model
Drag and drop local model files or a folder (`mlc-chat-config.json`, `tokenizer.json`, `ndarray-cache.json`, `params_shard_*.bin`).
No local model files selected.
Local files are session-only (browser security); re-add them after a page reload.
Drag and drop a local .wasm model library here (no URL needed for the model lib).
No local .wasm selected.
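As a rough sketch of what happens with these files, here is how a custom MLC model and its WebGPU model library could be registered with @mlc-ai/web-llm. The URLs and `model_id` below are placeholders, ModelRecord field names vary slightly across web-llm versions, and drag-dropped local files would first need to be exposed via object URLs:

```ts
import { CreateMLCEngine, prebuiltAppConfig, AppConfig } from "@mlc-ai/web-llm";

// Placeholder locations; a drag-dropped folder could instead expose each
// artifact with URL.createObjectURL(file).
const weightsUrl = "https://example.com/MyModel-q4f16_1-MLC/";
const modelLibUrl = "https://example.com/MyModel-q4f16_1-webgpu.wasm";

const appConfig: AppConfig = {
  ...prebuiltAppConfig,
  model_list: [
    ...prebuiltAppConfig.model_list,
    {
      model: weightsUrl,      // folder with mlc-chat-config.json, ndarray-cache.json, params_shard_*.bin
      model_id: "MyModel-q4f16_1-MLC",
      model_lib: modelLibUrl, // the WebGPU model library .wasm
    },
  ],
};

// Load the custom model by its model_id.
const engine = await CreateMLCEngine("MyModel-q4f16_1-MLC", { appConfig });
```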
Benchmark Configuration
IndexedDB cache
Force full max tokens
Keep downloads across page refreshes
Sets benchmark/chat defaults for reliable runs.
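A minimal sketch of how these toggles might map onto @mlc-ai/web-llm options, assuming a recent version that exposes `useIndexedDBCache` on `AppConfig`. The model_id and prompt are placeholders, and "Force full max tokens" presumably also asks the engine to skip early stopping, which is version-dependent and not shown here:

```ts
import { CreateMLCEngine, prebuiltAppConfig } from "@mlc-ai/web-llm";

// "IndexedDB cache": store model weights in IndexedDB instead of the Cache API.
// Either store persists across page refreshes, which is what
// "Keep downloads across page refreshes" relies on to skip re-downloading shards.
const engine = await CreateMLCEngine("Llama-3.2-1B-Instruct-q4f16_1-MLC", {
  appConfig: { ...prebuiltAppConfig, useIndexedDBCache: true },
});

// A fixed max_tokens budget keeps benchmark runs comparable.
const result = await engine.chat.completions.create({
  messages: [{ role: "user", content: "Benchmark prompt goes here." }],
  max_tokens: 256,
});
console.log(result.usage); // token counts, if reported by the installed web-llm version
```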
Initializing…
Model Source & Cache
(not loaded)
(on run)
(on run)
(run to capture)
Chat with any model
Load Model & Start Chatting
💬
Select a model and start chatting.
See how it handles your real prompts.
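For reference, loading a model and chatting with it through @mlc-ai/web-llm's OpenAI-style API looks roughly like this (the model_id and prompts are placeholders):

```ts
import { CreateMLCEngine } from "@mlc-ai/web-llm";

// First use downloads and compiles the model; report progress to the UI.
const engine = await CreateMLCEngine("Llama-3.2-1B-Instruct-q4f16_1-MLC", {
  initProgressCallback: (report) => console.log(report.text),
});

const response = await engine.chat.completions.create({
  messages: [
    { role: "system", content: "You are a concise assistant." },
    { role: "user", content: "Why does local, in-browser inference matter?" },
  ],
});
console.log(response.choices[0].message.content);
```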
Side-by-Side Comparison
What are you building?
💬
Chat Assistant
Fast first response, short answers
✍️
Content Generation
Long-form text, articles, stories
💻
Code Completion
Low latency, quick suggestions
📄
Document Processing
Summarize, extract, analyze
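These use cases trade time-to-first-token against sustained throughput. A rough sketch of how such numbers can be probed with streaming follows; the `probe` function is illustrative, chunk counts only approximate token counts, and the benchmark's real metric collection may differ:

```ts
import { MLCEngine } from "@mlc-ai/web-llm";

// Rough latency/throughput probe for one prompt:
// time to first streamed chunk, and chunks per second overall.
async function probe(engine: MLCEngine, prompt: string) {
  const start = performance.now();
  let firstChunkMs: number | null = null;
  let chunks = 0;

  const stream = await engine.chat.completions.create({
    messages: [{ role: "user", content: prompt }],
    stream: true,
    max_tokens: 256,
  });

  for await (const chunk of stream) {
    if (chunk.choices[0]?.delta?.content) {
      if (firstChunkMs === null) firstChunkMs = performance.now() - start;
      chunks++;
    }
  }

  const totalSeconds = (performance.now() - start) / 1000;
  return { firstChunkMs, chunksPerSecond: chunks / totalSeconds };
}
```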
Community Baselines
0 baselines loaded
Log
WebLLM Bench v2.0 ready.