LLM in the Browser

Open-source LLMs running in your browser. Nothing leaves your device. Small models are rough and nothing like ChatGPT; larger ones are more capable but slow to download.

Third-party weights. Unwrite does not host, train, or moderate them. Some are unfiltered; check each model card and licence before use.

100% Private | Browser-Based | Truly Free


On-device AI

Pick a model to run in your browser

Downloads once, then caches in your browser. Nothing leaves this device.
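As a rough illustration of "downloads once, then caches", here is a minimal cache-first sketch. The `ModelCache` class, the `fakeDownload` helper, and the example URL are hypothetical and exist only for this example; the page does not say which storage mechanism it uses. The sketch is synchronous for brevity, whereas real browser storage APIs are promise-based.

```typescript
// Hypothetical stand-in for a browser cache: maps a model URL to its downloaded bytes.
class ModelCache {
  private store = new Map<string, Uint8Array>();
  downloads = 0; // counts how often the downloader was actually called

  // Return cached bytes when present; otherwise download once and remember the result.
  load(url: string, download: (url: string) => Uint8Array): Uint8Array {
    const cached = this.store.get(url);
    if (cached !== undefined) return cached;
    const bytes = download(url);
    this.downloads += 1;
    this.store.set(url, bytes);
    return bytes;
  }
}

// Fake downloader used only for this example; a real page would fetch over the network.
const fakeDownload = (url: string): Uint8Array => new Uint8Array([url.length]);

const cache = new ModelCache();
const modelUrl = "https://example.com/model.onnx"; // placeholder URL, not a real model
const first = cache.load(modelUrl, fakeDownload);  // triggers the download
const second = cache.load(modelUrl, fakeDownload); // served from the cache

console.log(`downloads performed: ${cache.downloads}`);
```

The second `load` call returns the same bytes without calling the downloader again, which is why a model only needs to be fetched on the first visit.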


Voice Out

Kokoro 82M

Hexgrad | 2025 | 88 MB | TTS

English text-to-speech with 28 voices (American + British accents)

Unfiltered

SmolLM2 135M

Hugging Face | 2024 | 182 MB | Decoder | 2K ctx

Tuned with DPO for on-device chat, text rewriting, and function calls

Chat

LaMini Flan-T5 248M

MBZUAI | 2023 | 260 MB | Seq2Seq | 1K ctx

Fine-tuned on 2.58M LaMini instructions for general instruction following

Chat

LFM2.5 350M

Liquid AI | 2026 | 280 MB | Hybrid | 33K ctx

Designed for tool calling and structured extraction; covers 9 languages

Vision

Florence 2 Base

230M

Microsoft | 2024 | 320 MB | Vision

Single model for captioning, object detection, OCR, and phrase grounding

Image input

Unfiltered

SmolLM2 360M

Hugging Face | 2024 | 388 MB | Decoder | 2K ctx

Stronger at reasoning and instructions while staying fully on-device

Summary

BART Large CNN

406M

Meta | 2022 | 410 MB | Seq2Seq | 1K ctx

Dedicated news summariser fine-tuned on CNN/DailyMail text-summary pairs

Chat

Qwen 3.5 0.8B

Alibaba | 2025 | 571 MB | Decoder | 33K ctx

Toggles thinking mode on or off; covers 100+ languages and agent tasks

Chat

Gemma 3 1B

Google | 2025 | 763 MB | Decoder | 33K ctx

Lightweight multilingual chat supporting 140+ languages

Chat

Llama 3.2 1B

1.24B

Meta | 2024 | 1.1 GB | Decoder | 4K ctx

Distilled from larger models for fast on-device chat and tool use

Code

Qwen 2.5 Coder 1.5B

Alibaba | 2024 | 1.3 GB | Decoder | 33K ctx

Handles code generation, fixing, and reasoning across 40+ programming languages

Voice In

Granite 4.0 1B Speech

IBM | 2026 | 1.8 GB | ASR

Speech recognition and translation across 6 languages, ranked #1 on OpenASR

Audio input

Chat

Qwen 2.5 1.5B

Alibaba | 2024 | 1.8 GB | Decoder | 33K ctx

Tuned for coding, math, and reliable structured output like JSON

Chat

Phi 3.5 Mini

3.8B

Microsoft | 2024 | 2.3 GB | Decoder | 131K ctx

Trained on textbook-quality data for strong reasoning and coding

Reasoning

SmolLM3 3B

Hugging Face | 2025 | 2.7 GB | Decoder | 128K ctx

Supports togglable thinking mode and tool calling; fully open weights

Image Gen

Janus Pro 1B

DeepSeek | 2025 | 2.9 GB | Multimodal

Generates images from text and answers questions about image content

Image input

Multimodal

Gemma 4 E2B

2.3B

Google | 2026 | 3.4 GB | Multimodal | 128K ctx

Accepts text, images, and audio; reasons and generates text in 140+ languages

Image input
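To give a feel for what "slow to download" means for the sizes listed above, here is a small back-of-the-envelope sketch. It assumes decimal units (1 MB is 8 megabits) and a steady link; the 50 Mbps figure and the `estimateDownloadSeconds` helper are illustrative assumptions, not measurements from this page.

```typescript
// Assumes decimal units: 1 MB = 8 megabits, and a steady link speed in megabits per second.
function estimateDownloadSeconds(sizeMB: number, linkMbps: number): number {
  if (sizeMB < 0 || linkMbps <= 0) {
    throw new RangeError("size must be non-negative and link speed positive");
  }
  return (sizeMB * 8) / linkMbps;
}

// Sizes taken from the catalogue above (3.4 GB written as 3400 MB).
const catalogue = [
  { name: "SmolLM2 135M", sizeMB: 182 },
  { name: "Gemma 3 1B", sizeMB: 763 },
  { name: "Gemma 4 E2B", sizeMB: 3400 },
];

// 50 Mbps is an assumed example link speed.
const estimates = catalogue.map((m) => ({
  name: m.name,
  seconds: estimateDownloadSeconds(m.sizeMB, 50),
}));

for (const e of estimates) {
  console.log(`${e.name}: about ${Math.round(e.seconds)} s at 50 Mbps`);
}
```

Under these assumptions the smallest chat model arrives in about half a minute, while the largest takes roughly nine minutes. Real times vary with the connection and server.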

Model types

Decoder: The most common architecture for chat. Generates text left-to-right, one token at a time. Used by GPT, Llama, Qwen, Gemma, Phi, and SmolLM. Good for conversation, creative writing, and general instruction-following.
Seq2Seq: Encoder-decoder models that read the full input before generating output. Better for structured tasks like translation, summarisation, and Q&A. LaMini Flan-T5 and BART Large CNN use this architecture.
Hybrid: Combines convolution and attention layers for efficient on-device inference. LFM2.5 from Liquid AI uses this architecture, which aims for strong performance at very small sizes.
Vision: Processes images as input and produces text descriptions. Florence 2 can caption images, read text via OCR, and detect objects.
Multimodal: Handles multiple input or output types. Janus Pro generates and understands images. Gemma 4 E2B accepts text, images, audio, and video.
TTS (Text-to-Speech): Converts written text into natural-sounding audio. The full Kokoro release offers 54 voices across 8 languages; the build listed above covers English with 28 voices.
ASR (Speech Recognition): Converts spoken audio into text. Granite 4.0 is the top-ranked open model on the OpenASR leaderboard.
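
The "one token at a time" behaviour described for decoder models can be sketched with a toy example. Everything below is an illustration under stated assumptions: the four-word vocabulary, the score table, and the `greedyDecode` helper are made up, and real models use a neural network where this sketch uses a lookup table. The loop picks the highest-scoring next token each step (greedy decoding), which is only one of several sampling strategies.

```typescript
// A toy next-token scorer: given the tokens so far, return one score per vocabulary entry.
type NextTokenScorer = (tokens: number[]) => number[];

// Index of the largest score (the first one wins on ties).
function argmax(scores: number[]): number {
  let best = 0;
  for (let i = 1; i < scores.length; i++) {
    if (scores[i] > scores[best]) best = i;
  }
  return best;
}

// Generate left to right: score, pick the best token, append it, repeat.
function greedyDecode(
  scorer: NextTokenScorer,
  prompt: number[],
  eosToken: number,
  maxNewTokens: number,
): number[] {
  const tokens = [...prompt];
  for (let step = 0; step < maxNewTokens; step++) {
    const next = argmax(scorer(tokens));
    tokens.push(next);
    if (next === eosToken) break; // stop once the end-of-sequence token appears
  }
  return tokens;
}

// Hypothetical vocabulary and score table, for illustration only.
const vocab = ["<eos>", "hello", "browser", "world"];
const nextScores: number[][] = [
  [1, 0, 0, 0], // after "<eos>"
  [0, 0, 2, 1], // after "hello": prefer "browser"
  [0, 0, 0, 3], // after "browser": prefer "world"
  [5, 0, 0, 0], // after "world": prefer "<eos>"
];
// This toy scorer looks only at the last token; a real decoder attends to all of them.
const toyScorer: NextTokenScorer = (tokens) => nextScores[tokens[tokens.length - 1]];

const output = greedyDecode(toyScorer, [1], 0, 10);
const text = output.map((id) => vocab[id]).join(" ");
console.log(text); // prints "hello browser world <eos>"
```

A Seq2Seq model runs the same kind of loop on its decoder side, but first encodes the whole input in a single pass.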

How does in-browser AI work?

Everything on this page runs locally in your browser. These panels explain the technology behind it.

Thank you to the open-source ecosystem powering this page.

We gratefully acknowledge Hugging Face Hub, Transformers.js, ONNX Runtime Web, the ONNX Community, and model authors and publishers including Hexgrad, Hugging Face, MBZUAI, Liquid AI, Microsoft, Meta, Alibaba, Google, IBM, and DeepSeek. Please review each model card and licence before use.

Hugging Face | Transformers.js docs | ONNX Runtime Web