WebLLM Bench
Test, chat, and compare local browser LLMs on your hardware.
GitHub
·
Device/Model Guide
WebGPU
Local Inference
v2.0
…
Users (24h): —
Runs (24h): —
All Families
Start Here: Device + Model Guidance
The hosted Qwen2.5-1.5B ctx8192 preset is included directly in the model dropdowns as a built-in option.
Use Chrome/Edge for benchmark comparability; Safari is supported but slower in our published runs.
Default stable choice: Qwen2.5-1.5B-Instruct (q4f16_1) for balanced quality and speed on laptops.
If you need faster decode and can spend more memory, test Qwen3-1.7B.
For prompts beyond 4k context, use the custom ctx8192 build and follow the 8k validation protocol.
Mobile is experimental: start with prompt ≤512 tokens and max output ≤128.
🔧 Add Custom Model
Model Weights URL (optional if local files selected)
Local Model Files (optional)
Drag and drop local model files or a folder (`mlc-chat-config.json`, `tokenizer.json`, `tensor-cache.json`, `params_shard_*.bin`).
Choose Files
Choose Folder
Clear
No local model files selected.
Local files are session-only (a browser security restriction); re-add them after a reload.
Model Lib URL (optional if local wasm selected)
Drag and drop a local .wasm file here (no URL needed for the model lib).
Choose .wasm
Clear
No local .wasm selected.
Model ID (Unique Name)
Context Window
VRAM Required (MB)
➕ Add to Registry
🗑 Clear Custom Models
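The fields above map naturally onto a WebLLM model-registry entry. As a minimal sketch (field names follow the `@mlc-ai/web-llm` AppConfig shape — `model`, `model_id`, `model_lib`, `vram_required_MB`, `overrides.context_window_size` — and may differ from this app's internal registry), a custom entry could be assembled like this:

```javascript
// Sketch: build a WebLLM-style ModelRecord from the "Add Custom Model" form.
// Field names assume the @mlc-ai/web-llm AppConfig shape; this app's own
// registry format is not documented here and may differ.
function buildCustomModelRecord({ weightsUrl, libUrl, modelId, contextWindow, vramMB }) {
  if (!modelId) throw new Error("Model ID (unique name) is required");
  return {
    model: weightsUrl,        // root URL containing mlc-chat-config.json etc.
    model_id: modelId,        // unique name shown in the model dropdowns
    model_lib: libUrl,        // compiled .wasm model library
    vram_required_MB: vramMB,
    overrides: { context_window_size: contextWindow },
  };
}

// Example with placeholder URLs (hypothetical, for illustration only):
const record = buildCustomModelRecord({
  weightsUrl: "https://example.com/my-model/",
  libUrl: "https://example.com/my-model.wasm",
  modelId: "MyModel-1B-q4f16_1",
  contextWindow: 8192,
  vramMB: 1500,
});
```

In WebLLM proper, such a record would be appended to `appConfig.model_list` before the engine is created.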
⚡ Bench
💬 Chat
📊 Compare
🏆 Best Model
🌍 Community
Benchmark Configuration
Model
Prompt Tokens
512
1024
2048
4096
Max Output Tokens
32
64
128
256
512
1024
2048
Iterations
1 (quick)
3
5
10 (accurate)
Compare With
— none —
IndexedDB cache
Force full max tokens
Keep downloads across refresh
🧹 Clear Site Caches
Preset
Desktop Benchmark (Recommended)
Desktop Real-World Chat
Desktop Long-Output Stress
Safari Stable
Mobile Experimental
Apply Preset
Sets benchmark/chat defaults for reliable runs.
▶ Run Benchmark
⬇ Export JSON
Initializing…
Results
Model Source & Cache
Runtime URL
(not loaded)
Cache
—
Model Root
(on run)
Model Lib
(on run)
Fetch URLs
(run to capture)
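The Iterations setting runs the same benchmark several times so throughput can be averaged. As a rough sketch of how per-iteration timings might be reduced to decode tokens/sec (the field names are illustrative, not this app's actual export format):

```javascript
// Sketch: aggregate per-iteration benchmark timings into decode tokens/sec.
// The iteration shape ({ outputTokens, decodeMs }) is an assumption for
// illustration; the app's exported JSON may use different fields.
function summarize(iterations) {
  const decodeTps = iterations.map(
    (it) => it.outputTokens / (it.decodeMs / 1000)
  );
  const mean = decodeTps.reduce((a, b) => a + b, 0) / decodeTps.length;
  const best = Math.max(...decodeTps);
  return { meanDecodeTps: mean, bestDecodeTps: best };
}

const summary = summarize([
  { outputTokens: 128, decodeMs: 4000 }, // 32 tok/s
  { outputTokens: 128, decodeMs: 2000 }, // 64 tok/s
]);
// → { meanDecodeTps: 48, bestDecodeTps: 64 }
```

Reporting both the mean and the best iteration helps separate steady-state decode speed from first-run warm-up effects (shader compilation, cache misses).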
Chat with any model
Model
System Prompt
You are a helpful assistant.
Temp
Max Tok
64
128
256
512
1024
2048
4096
Top P
History
Window
Auto
Sliding
Sliding Tok
Reuse chat history (KV-friendly)
Ground model/date facts
🗑 Clear
Load Model & Start Chatting
💬
Select a model and start chatting.
See how it handles your real prompts.
Send
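The History controls above offer a Sliding mode with a token budget ("Sliding Tok"). One plausible sketch of that behavior — keep the newest messages that fit within the budget — is below; the token counter here is a crude word-count stand-in (a real implementation would use the model's tokenizer), and all names are illustrative:

```javascript
// Sketch: "Sliding" history mode — retain the most recent messages whose
// combined token cost fits within a budget (the "Sliding Tok" setting).
// countTokens is injected so a real tokenizer can replace the approximation.
function slideHistory(messages, tokenBudget, countTokens) {
  const kept = [];
  let used = 0;
  // Walk from newest to oldest, keeping whole messages while the budget allows.
  for (let i = messages.length - 1; i >= 0; i--) {
    const cost = countTokens(messages[i].content);
    if (used + cost > tokenBudget) break;
    kept.unshift(messages[i]);
    used += cost;
  }
  return kept;
}

// Crude stand-in for a tokenizer: one token per whitespace-separated word.
const approxTokens = (text) => text.split(/\s+/).length;

const trimmed = slideHistory(
  [
    { role: "user", content: "first question about recursion basics" },      // 5 "tokens"
    { role: "assistant", content: "a long answer with many words here" },    // 7 "tokens"
    { role: "user", content: "short follow up" },                            // 3 "tokens"
  ],
  10,
  approxTokens
);
// Only the two newest messages (7 + 3 = 10 tokens) fit the 10-token budget.
```

Truncating at whole-message boundaries keeps the prompt well-formed and plays well with KV-cache reuse, since the retained suffix of the conversation is unchanged between turns.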
Side-by-Side Comparison
Model A
Model B
Shared Prompt
Explain the concept of recursion in programming. Give one simple example.
Max Tok
128
256
512
1024
2048
4096
Temp
Sliding Tok
📊 Run Both Models
Model A
(waiting…)
Model B
(waiting…)
What are you building?
💬
Chat Assistant
Fast first response, short answers
✍️
Content Generation
Long-form text, articles, stories
💻
Code Completion
Low latency, quick suggestions
📄
Document Processing
Summarize, extract, analyze
Model Size Filter
Small (≤ 3B, fast sweep)
All Models (slow, downloads ~10GB)
Low Resource Only
🏆 Find Best Model for My Hardware
🏆 Recommended for your hardware
Full Ranking
Community Baselines
0 baselines loaded
📋 Copy Results
📥 Import
⬇ Export Sweep
Paste Baseline JSON
Import
Copied to clipboard
Leaderboard
Log
WebLLM Bench v2.0 ready.