LiteRT-LM
LiteRT-LM is Google's production-ready, high-performance, open-source inference framework for deploying Large Language Models on edge devices.
🔥 What's New: Gemma 4 support with LiteRT-LM
Deploy Gemma 4 across a broad range of hardware with stellar performance (blog).
👉 Try on Linux, macOS, Windows (WSL) or Raspberry Pi with the LiteRT-LM CLI:
litert-lm run \
--from-huggingface-repo=litert-community/gemma-4-E2B-it-litert-lm \
gemma-4-E2B-it.litertlm \
--prompt="What is the capital of France?"
🌟 Key Features
- 📱 Cross-Platform Support: Android, iOS, Web, Desktop, and IoT (e.g. Raspberry Pi).
- 🚀 Hardware Acceleration: Peak performance via GPU and NPU accelerators.
- 👁️ Multi-Modality: Support for vision and audio inputs.
- 🔧 Tool Use: Function calling support for agentic workflows.
- 📚 Broad Model Support: Gemma, Llama, Phi-4, Qwen, and more.
🚀 Production-Ready for Google's Products
LiteRT-LM powers on-device GenAI experiences in Chrome, Chromebook Plus, Pixel Watch, and more.
You can also try the Google AI Edge Gallery app to run models immediately on your device.
- Install the app today from Google Play
- Install the app today from App Store
📰 Blogs & Announcements
| Link | Description |
|---|---|
| Bring state-of-the-art agentic skills to the edge with Gemma 4 | Deploy Gemma 4 in-app and across a broader range of devices with stellar performance and broad reach using LiteRT-LM. |
| On-device GenAI in Chrome, Chromebook Plus and Pixel Watch | Deploy language models on wearables and browser-based platforms using LiteRT-LM at scale. |
| On-device Function Calling in Google AI Edge Gallery | Explore how to fine-tune FunctionGemma and enable function calling capabilities powered by LiteRT-LM Tool Use APIs. |
| Google AI Edge small language models, multimodality, and function calling | Latest insights on RAG, multimodality, and function calling for edge language models. |
🏃 Quick Start
🔗 Key Links
- 👉 Technical Overview including performance benchmarks, model support, and more.
- 👉 LiteRT-LM CLI Guide including installation, getting started, and advanced usage.
⚡ Quick Try (No Code)
Using uv, you can try LiteRT-LM from your terminal without writing a single line of code:
uv tool install litert-lm
litert-lm run \
--from-huggingface-repo=google/gemma-3n-E2B-it-litert-lm \
gemma-3n-E2B-it-int4 \
--prompt="What is the capital of France?"
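To send several prompts without retyping the flags, the command above can be wrapped in a small shell function. This is only a sketch: `ask_litert`, `LITERT_REPO`, `LITERT_MODEL`, and `LITERT_DRY_RUN` are hypothetical names introduced here, not part of LiteRT-LM, and the function uses only the `run`, `--from-huggingface-repo`, and `--prompt` arguments shown above.

```shell
# Hypothetical helper (not part of LiteRT-LM): wraps the `litert-lm run`
# invocation shown above so that only the prompt changes between calls.
LITERT_REPO="google/gemma-3n-E2B-it-litert-lm"  # Hugging Face repo from the example above
LITERT_MODEL="gemma-3n-E2B-it-int4"             # model name from the example above

ask_litert() {
  # Rebuild the positional parameters as the full command line;
  # "$1" (the prompt) is expanded before `set` replaces the arguments.
  set -- litert-lm run \
    "--from-huggingface-repo=${LITERT_REPO}" \
    "${LITERT_MODEL}" \
    "--prompt=$1"
  if [ -n "${LITERT_DRY_RUN:-}" ]; then
    # Dry run: print one argument per line instead of executing,
    # which makes it easy to check that the prompt stays a single argument.
    printf '%s\n' "$@"
  else
    "$@"
  fi
}
```

With `LITERT_DRY_RUN` set to a non-empty value, `ask_litert "What is the capital of France?"` prints the arguments it would pass; with it unset, the function runs `litert-lm` directly, so the tool must already be installed.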
📚 Supported Language APIs
Ready to get started? Explore our language-specific guides and setup instructions.
| Language | Status | Best For... | Documentation |
|---|---|---|---|
| Kotlin | ✅ Stable | Android apps & JVM | Android (Kotlin) Guide |
| Python | ✅ Stable | Prototyping & Scripting | Python Guide |
| C++ | ✅ Stable | High-performance native | C++ Guide |
| Swift | 🚀 In Dev | Native iOS & macOS | (Coming Soon) |
🏗️ Build From Source
This guide shows how to compile LiteRT-LM from source. If you build from source,
check out the stable tag first.
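As a rough sketch of the checkout step, the functions below clone the repository and switch to the highest release-style tag. Several things here are assumptions rather than documented behavior: the repository URL, the idea that the stable tag follows the `vX.Y.Z` naming used in the release list, and the helper names `latest_release_tag` and `checkout_release`. If the build guide names a specific tag, use that instead.

```shell
# Assumed repository URL; adjust if your source lives elsewhere.
REPO_URL="https://github.com/google-ai-edge/LiteRT-LM.git"

# Read tag names on stdin (one per line) and print the highest vX.Y.Z tag.
# Relies on `sort -V` (version sort), available in GNU coreutils and
# recent BSD/macOS sort.
latest_release_tag() {
  grep -E '^v[0-9]+\.[0-9]+\.[0-9]+$' | sort -V | tail -n 1
}

# Clone the repository and check out the tag chosen above.
# Assumption: the highest vX.Y.Z tag is the stable one.
checkout_release() {
  git clone "$REPO_URL" LiteRT-LM &&
  cd LiteRT-LM &&
  tag="$(git tag --list | latest_release_tag)" &&
  [ -n "$tag" ] &&
  git checkout "$tag"
}
```

Calling `checkout_release` performs the clone and checkout. `latest_release_tag` can also be used on its own, for example `git tag --list | latest_release_tag` inside an existing clone.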
📦 Releases
- v0.10.1: Deploy Gemma 4 with stellar performance (blog) and introduce LiteRT-LM CLI.
- v0.9.0: Improvements to function calling capabilities and more stable app performance.
- v0.8.0: Desktop GPU support and Multi-Modality.
- v0.7.0: NPU acceleration for Gemma models.
For a full list of releases, see GitHub Releases.

