Connect To Your Own LLM Server

Connect your own OpenAI-compatible inference server (Ollama, LM Studio, llama.cpp, Jan, vLLM) and talk to it through this interface. Traffic stays between your browser and your server.

You are responsible for the server you connect to and the tools you wire up. Unwrite provides the client interface only. Review each model card and licence before use.

Connect to your LLM server

Pick the server you already run. We send requests from your browser straight to it. Nothing is proxied, inspected, or stored by Unwrite.
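As a rough illustration of what a browser-direct request can look like, the sketch below builds and sends a chat request in the widely used OpenAI-compatible format (`POST /v1/chat/completions`). It is not Unwrite's actual client code. The function names, the base URL, and the model name are illustrative assumptions; substitute whatever your own server exposes.

```typescript
// Shape of one chat message in the OpenAI-compatible chat format.
interface ChatMessage {
  role: "system" | "user" | "assistant";
  content: string;
}

interface ChatRequest {
  url: string;
  init: { method: string; headers: Record<string, string>; body: string };
}

// Build the URL and fetch options for a chat completion request.
// Assumption: baseUrl is the server root WITHOUT a trailing "/v1".
function buildChatRequest(
  baseUrl: string,
  model: string,
  messages: ChatMessage[],
  apiKey?: string,
): ChatRequest {
  const headers: Record<string, string> = { "Content-Type": "application/json" };
  // Many local servers need no key; send one only if it was provided.
  if (apiKey) headers["Authorization"] = `Bearer ${apiKey}`;
  return {
    url: `${baseUrl.replace(/\/+$/, "")}/v1/chat/completions`,
    init: { method: "POST", headers, body: JSON.stringify({ model, messages }) },
  };
}

// Send the request directly to the server and return the reply text.
async function chat(baseUrl: string, model: string, prompt: string): Promise<string> {
  const { url, init } = buildChatRequest(baseUrl, model, [{ role: "user", content: prompt }]);
  const res = await fetch(url, init);
  if (!res.ok) throw new Error(`Server responded with HTTP ${res.status}`);
  const data = (await res.json()) as { choices: { message: { content: string } }[] };
  const first = data.choices[0];
  if (!first) throw new Error("Server returned no choices");
  return first.message.content;
}
```

A call might look like `await chat("http://localhost:11434", "your-model-name", "Hello")`, where both arguments are placeholders. Because the request originates from a web page, the server generally also has to allow cross-origin requests from that page's origin.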

Mixed content note: if Unwrite is loaded over HTTPS and your server is plain HTTP on a LAN address, most browsers will block the request. Either run the server over HTTPS (tools like Tailscale or mkcert make this painless) or load Unwrite in a compatible mode.
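The check below is a minimal sketch of the mixed content rule described in the note, useful for warning a user before a request fails. It is a heuristic, not the browser's exact algorithm. The function names are hypothetical, and the loopback exemption (`localhost`, `127.x.x.x`, `[::1]`) is an assumption based on common browser behaviour rather than something this page states.

```typescript
// Hosts that browsers commonly treat as trustworthy even over plain HTTP.
// Assumption: loopback only; LAN addresses such as 192.168.x.x are not exempt.
function isLoopbackHost(hostname: string): boolean {
  return (
    hostname === "localhost" ||
    hostname.endsWith(".localhost") ||
    hostname === "[::1]" ||
    /^127(\.\d{1,3}){3}$/.test(hostname)
  );
}

// Returns true when a page at pageUrl calling serverUrl is likely to be
// blocked as mixed content (HTTPS page, plain-HTTP non-loopback server).
function isLikelyMixedContent(pageUrl: string, serverUrl: string): boolean {
  const page = new URL(pageUrl);
  const server = new URL(serverUrl);
  if (page.protocol !== "https:") return false; // HTTP pages have no mixed content restriction
  if (server.protocol !== "http:") return false; // HTTPS servers are fine
  return !isLoopbackHost(server.hostname);
}
```

For example, an HTTPS page calling `http://192.168.1.20:11434` would be flagged, while the same page calling `http://localhost:11434` or an HTTPS address would not.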

Model types

Decoder: The most common architecture for chat. Generates text left-to-right, one token at a time. Used by GPT, Llama, Qwen, Gemma, Phi, and SmolLM. Good for conversation, creative writing, and general instruction-following.
Seq2Seq: Encoder-decoder models that read the full input before generating output. Better for structured tasks like translation, summarisation, and Q&A. FLAN-T5 uses this architecture.
Hybrid: Combines convolution and attention layers for efficient on-device inference. LFM2.5 from Liquid AI uses this novel architecture, achieving strong performance at very small sizes.
Vision: Processes images as input and produces text descriptions. Florence 2 can caption images, read text via OCR, and detect objects.
Multimodal: Handles multiple input or output types. Janus Pro generates and understands images. Gemma 4 E2B accepts text, images, audio, and video.
TTS (Text-to-Speech): Converts written text into natural-sounding audio. Kokoro produces speech across 54 voices and 8 languages.
ASR (Speech Recognition): Converts spoken audio into text. Granite 4.0 is the top-ranked open model on the OpenASR leaderboard.
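To make the Decoder entry's "left-to-right, one token at a time" concrete, here is a toy sketch of a generation loop. The `NextToken` function is a hypothetical stand-in for a real model, and the lookup table is invented purely for illustration; real decoders score every token in a vocabulary and usually sample rather than look up.

```typescript
// Hypothetical stand-in for a model: maps the tokens so far to the next token.
type NextToken = (tokens: string[]) => string;

// Generate left-to-right: append one token per step until the end marker
// is produced or the limit on new tokens is reached.
function generate(
  next: NextToken,
  prompt: string[],
  maxNewTokens: number,
  eos = "<eos>",
): string[] {
  const tokens = [...prompt];
  for (let i = 0; i < maxNewTokens; i++) {
    const token = next(tokens); // each step sees everything generated so far
    if (token === eos) break;
    tokens.push(token);
  }
  return tokens;
}

// Toy "model": a lookup table keyed on the last token only.
const table: Record<string, string> = { the: "cat", cat: "sat", sat: "<eos>" };
const toyModel: NextToken = (tokens) => {
  const last = tokens[tokens.length - 1] ?? "";
  return table[last] ?? "<eos>";
};

console.log(generate(toyModel, ["the"], 10).join(" ")); // prints "the cat sat"
```

The key property is that each new token depends only on the tokens before it, which is why chat output can appear progressively as it is generated.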

How does in-browser AI work?

Everything on this page runs locally in your browser. These panels explain the technology behind it.

Thank you to the open-source ecosystem powering this page.

We gratefully acknowledge Hugging Face Hub, Transformers.js, ONNX Runtime Web, the ONNX Community, and the model authors and publishers, including Hexgrad, Hugging Face, MBZUAI, Liquid AI, Microsoft, Meta, Alibaba and Google. Please review each model card and licence before use.

Hugging Face | Transformers.js docs | ONNX Runtime Web

Server options: Ollama, LM Studio, llama.cpp, Jan, vLLM, LocalAI, TGI, SGLang, GPT4All, Custom OpenAI-compatible.