Connect To Your Own LLM Server
Connect directly from your browser to Ollama, LM Studio, llama.cpp, Jan, vLLM, TGI, and other OpenAI-compatible model servers on localhost, your LAN, or private networks such as Tailscale.
Remote mode is browser-direct: if your setup is blocked by mixed content, CORS, or local-network access restrictions, this page surfaces the browser or server error instead of proxying around it.
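Because this mode talks to the server straight from the page, a connection is just a `fetch` to an OpenAI-compatible endpoint. A minimal sketch, assuming your server exposes `/v1/chat/completions` (Ollama does at `http://localhost:11434/v1`); the base URL, model name, and API key here are placeholders:

```javascript
// Build a request for an OpenAI-compatible chat completions endpoint.
// baseUrl, model, and apiKey are placeholders -- adjust for your server.
function buildChatRequest(baseUrl, model, messages, apiKey) {
  const headers = { "Content-Type": "application/json" };
  // Many local servers ignore the key but some (e.g. vLLM with --api-key) require it.
  if (apiKey) headers["Authorization"] = `Bearer ${apiKey}`;
  return {
    url: `${baseUrl.replace(/\/$/, "")}/v1/chat/completions`,
    init: {
      method: "POST",
      headers,
      body: JSON.stringify({ model, messages, stream: false }),
    },
  };
}

// Browser-direct call: the fetch below is subject to the server's CORS
// headers and, on an https page, to mixed-content rules for http targets.
// const { url, init } = buildChatRequest("http://localhost:11434", "llama3.2", [
//   { role: "user", content: "Hello" },
// ]);
// const reply = await fetch(url, init).then((r) => r.json());
```

If the request fails here, it fails the same way it would in your own code, which is why CORS and mixed-content errors show up directly rather than being hidden behind a proxy.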
Saved Connections
Activate an existing profile or build a new one.
No saved profiles yet.
Create or select a profile to connect.
Model types
- Decoder
- The most common architecture for chat. Generates text left-to-right, one token at a time. Used by GPT, Llama, Qwen, Gemma, Phi, and SmolLM. Good for conversation, creative writing, and general instruction-following.
- Seq2Seq
- Encoder-decoder models that read the full input before generating output. Better for structured tasks like translation, summarisation, and Q&A. FLAN-T5 uses this architecture.
- Hybrid
- Combines convolution and attention layers for efficient on-device inference. LFM2.5 from Liquid AI uses this novel architecture, achieving strong performance at very small sizes.
- Vision
- Processes images as input and produces text descriptions. Florence 2 can caption images, read text via OCR, and detect objects.
- Multimodal
- Handles multiple input or output types. Janus Pro generates and understands images. Gemma 4 E2B accepts text, images, audio, and video.
- TTS (Text-to-Speech)
- Converts written text into natural-sounding audio. Kokoro produces speech across 54 voices and 8 languages.
- ASR (Automatic Speech Recognition)
- Converts spoken audio into text. Granite 4.0 is the top-ranked open model on the OpenASR leaderboard.
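The "left-to-right, one token at a time" behaviour of decoder models can be sketched as a loop: the model repeatedly predicts the next token from everything generated so far, until it emits an end-of-sequence token. This is a toy illustration, not any library's API; `nextToken` stands in for a real model forward pass:

```javascript
// Toy sketch of decoder-style (autoregressive) generation.
// nextToken is a stand-in for a real model's forward pass + sampling.
function generate(prompt, nextToken, maxTokens = 8, eos = "<eos>") {
  const tokens = [...prompt];
  for (let i = 0; i < maxTokens; i++) {
    const t = nextToken(tokens); // the model sees the full left context
    if (t === eos) break;        // stop at the end-of-sequence token
    tokens.push(t);              // generated token becomes part of the context
  }
  return tokens;
}
```

Seq2seq models differ only in that a separate encoder reads the whole input first; the decoding loop itself looks the same.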
How does in-browser AI work?
Everything on this page runs locally in your browser. These panels explain the technology behind it.