WebLLM Bench
Test, chat, and compare local browser LLMs on your hardware.
GitHub
·
Device/Model Guide
WebGPU
Local Inference
v2.0
…
Users (24h): —
Runs (24h): —
All Families
Start Here: Device + Model Guidance
The hosted Qwen2.5-1.5B ctx8192 preset is included directly in the model dropdowns as a built-in option.
Use Chrome/Edge for benchmark comparability; Safari is supported but slower in our published runs.
Default stable choice: Qwen2.5-1.5B-Instruct (q4f16_1) for balanced quality and speed on laptops.
If you need faster decode and can spend more memory, test Qwen3-1.7B.
For prompts beyond 4k context, use the custom ctx8192 build and follow the 8k validation protocol.
Mobile is experimental: start with prompt ≤512 tokens and max output ≤128.
🔧 Add Custom Model
Model Weights URL (optional if local files selected)
Local Model Files (optional)
Drag and drop local model files or a folder (`mlc-chat-config.json`, `tokenizer.json`, `tensor-cache.json`, `params_shard_*.bin`).
Choose Files
Choose Folder
Clear
No local model files selected.
Local files are session-only (a browser security restriction); re-add them after a reload.
Model Lib URL (optional if local wasm selected)
Drag and drop a local .wasm file here (no URL needed for the model lib).
Choose .wasm
Clear
No local .wasm selected.
Model ID (Unique Name)
Context Window
VRAM Required (MB)
➕ Add to Registry
🗑 Clear Custom Models
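The fields above map naturally onto a WebLLM model-registry entry. As a minimal sketch (field names follow the `@mlc-ai/web-llm` AppConfig shape — `model`, `model_id`, `model_lib`, `vram_required_MB`, `overrides.context_window_size` — and may differ from this app's internal registry), a custom entry could be assembled like this:

```javascript
// Sketch: build a WebLLM-style ModelRecord from the "Add Custom Model" form.
// Field names assume the @mlc-ai/web-llm AppConfig shape; this app's own
// registry format is not documented here and may differ.
function buildCustomModelRecord({ weightsUrl, libUrl, modelId, contextWindow, vramMB }) {
  if (!modelId) throw new Error("Model ID (unique name) is required");
  return {
    model: weightsUrl,        // root URL containing mlc-chat-config.json etc.
    model_id: modelId,        // unique name shown in the model dropdowns
    model_lib: libUrl,        // compiled .wasm model library
    vram_required_MB: vramMB,
    overrides: { context_window_size: contextWindow },
  };
}

// Example with placeholder URLs (hypothetical, for illustration only):
const record = buildCustomModelRecord({
  weightsUrl: "https://example.com/my-model/",
  libUrl: "https://example.com/my-model.wasm",
  modelId: "MyModel-1B-q4f16_1",
  contextWindow: 8192,
  vramMB: 1500,
});
```

In WebLLM proper, such a record would be appended to `appConfig.model_list` before the engine is created.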
⚡ Bench
💬 Chat
📊 Compare
🏆 Best Model
🌍 Community
Benchmark Configuration
Model
Prompt Tokens
512
1024
2048
4096
Max Output Tokens
32
64
128
256
512
1024
2048
Iterations
1 (quick)
3
5
10 (accurate)
Compare With
— none —
IndexedDB cache
Force full max tokens
Keep downloads across refresh
🧹 Clear Site Caches
Preset
Desktop Benchmark (Recommended)
Desktop Real-World Chat
Desktop Long-Output Stress
Safari Stable
Mobile Experimental
Apply Preset
Sets benchmark/chat defaults for reliable runs.
▶ Run Benchmark
⬇ Export JSON
Initializing…
Results
Model Source & Cache
Runtime URL
(not loaded)
Cache
—
Model Root
(on run)
Model Lib
(on run)
Fetch URLs
(run to capture)
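The Iterations setting runs the same benchmark several times so throughput can be averaged. As a rough sketch of how per-iteration timings might be reduced to decode tokens/sec (the field names are illustrative, not this app's actual export format):

```javascript
// Sketch: aggregate per-iteration benchmark timings into decode tokens/sec.
// The iteration shape ({ outputTokens, decodeMs }) is an assumption for
// illustration; the app's exported JSON may use different fields.
function summarize(iterations) {
  const decodeTps = iterations.map(
    (it) => it.outputTokens / (it.decodeMs / 1000)
  );
  const mean = decodeTps.reduce((a, b) => a + b, 0) / decodeTps.length;
  const best = Math.max(...decodeTps);
  return { meanDecodeTps: mean, bestDecodeTps: best };
}

const summary = summarize([
  { outputTokens: 128, decodeMs: 4000 }, // 32 tok/s
  { outputTokens: 128, decodeMs: 2000 }, // 64 tok/s
]);
// → { meanDecodeTps: 48, bestDecodeTps: 64 }
```

Reporting both the mean and the best iteration helps separate steady-state decode speed from first-run warm-up effects (shader compilation, cache misses).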
Chat with any model
Model
System Prompt
You are a helpful assistant.
Temp
Max Tok
64
128
256
512
1024
2048
4096
Top P
History
Window
Auto
Sliding
Sliding Tok
Reuse chat history (KV-friendly)
Ground model/date facts
🗑 Clear
Load Model & Start Chatting
💬
Select a model and start chatting.
See how it handles your real prompts.
Send
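The History controls above offer a Sliding mode with a token budget ("Sliding Tok"). One plausible sketch of that behavior — keep the newest messages that fit within the budget — is below; the token counter here is a crude word-count stand-in (a real implementation would use the model's tokenizer), and all names are illustrative:

```javascript
// Sketch: "Sliding" history mode — retain the most recent messages whose
// combined token cost fits within a budget (the "Sliding Tok" setting).
// countTokens is injected so a real tokenizer can replace the approximation.
function slideHistory(messages, tokenBudget, countTokens) {
  const kept = [];
  let used = 0;
  // Walk from newest to oldest, keeping whole messages while the budget allows.
  for (let i = messages.length - 1; i >= 0; i--) {
    const cost = countTokens(messages[i].content);
    if (used + cost > tokenBudget) break;
    kept.unshift(messages[i]);
    used += cost;
  }
  return kept;
}

// Crude stand-in for a tokenizer: one token per whitespace-separated word.
const approxTokens = (text) => text.split(/\s+/).length;

const trimmed = slideHistory(
  [
    { role: "user", content: "first question about recursion basics" },      // 5 "tokens"
    { role: "assistant", content: "a long answer with many words here" },    // 7 "tokens"
    { role: "user", content: "short follow up" },                            // 3 "tokens"
  ],
  10,
  approxTokens
);
// Only the two newest messages (7 + 3 = 10 tokens) fit the 10-token budget.
```

Truncating at whole-message boundaries keeps the prompt well-formed and plays well with KV-cache reuse, since the retained suffix of the conversation is unchanged between turns.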
Side-by-Side Comparison
Model A
Model B
Shared Prompt
Explain the concept of recursion in programming. Give one simple example.
Max Tok
128
256
512
1024
2048
4096
Temp
Sliding Tok
📊 Run Both Models
Model A
(waiting…)
Model B
(waiting…)
What are you building?
💬
Chat Assistant
Fast first response, short answers
✍️
Content Generation
Long-form text, articles, stories
💻
Code Completion
Low latency, quick suggestions
📄
Document Processing
Summarize, extract, analyze
Model Size Filter
Small (≤ 3B, fast sweep)
All Models (slow, downloads ~10GB)
Low Resource Only
🏆 Find Best Model for My Hardware
🏆 Recommended for your hardware
Full Ranking
Community Baselines
0 baselines loaded
📋 Copy Results
📥 Import
⬇ Export Sweep
Paste Baseline JSON
Import
Copied to clipboard
Leaderboard
Log
WebLLM Bench v2.0 ready.