Google has unveiled a pint-sized new addition to its “open” large language model lineup: Gemma 3 270M.
Weighing in at 270 million parameters and requiring around 550MB of memory, it’s designed to make waves in on-device deployment and rapid model iteration — despite the usual caveats around hallucinations, shaky output, and probable copyright entanglements baked into its training data.
Google launched the original Gemma family in February 2024, and at the time offered two flavours: a two-billion-parameter version designed for on-CPU execution and a more capable seven-billion-parameter version targeting systems with GPU- or TPU-based accelerators.
While positioned as “open” models, in contrast to the company’s proprietary Gemini family, they, like most competing “open” models, included neither source code nor training data – only pre-trained models and weights – something which remains true for the latest entry in the family (or, as Google would have it, the “Gemmaverse”).
The new, smaller model – optimized for on-device use and capable of running in as little as 550MB of RAM – is ideal for “high-volume, well-defined” tasks, says Google, or when “you need to make every millisecond and micro-cent count.”
It is also pitched as ideal for rapid development thanks to the speed with which it can be fine-tuned, which Google says can lead to the easy creation of “a fleet of specialized task models.”
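To make the “fleet of specialized task models” idea concrete, here is a minimal sketch of a parameter-efficient fine-tune using Hugging Face’s transformers and peft libraries. The model identifier, dataset file, and hyperparameters are illustrative assumptions rather than values taken from Google’s own fine-tuning guide.

```python
# Minimal LoRA fine-tuning sketch for a small Gemma-class checkpoint.
# Assumptions: the Hugging Face model id, the local "task_examples.txt"
# dataset, and all hyperparameters are illustrative, not from Google's guide.
from datasets import load_dataset
from peft import LoraConfig, get_peft_model
from transformers import (
    AutoModelForCausalLM, AutoTokenizer, DataCollatorForLanguageModeling,
    Trainer, TrainingArguments,
)

model_id = "google/gemma-3-270m-it"  # assumed id; check the model card

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id)

# Attach low-rank adapters so only a small fraction of weights is updated,
# which is what makes quick, repeated task-specific fine-tunes practical.
lora = LoraConfig(r=8, lora_alpha=16, target_modules=["q_proj", "v_proj"],
                  task_type="CAUSAL_LM")
model = get_peft_model(model, lora)

# Any small text dataset of task examples will do; one example per line here.
dataset = load_dataset("text", data_files={"train": "task_examples.txt"})["train"]

def tokenize(batch):
    return tokenizer(batch["text"], truncation=True, max_length=512)

tokenized = dataset.map(tokenize, batched=True, remove_columns=["text"])

trainer = Trainer(
    model=model,
    args=TrainingArguments(output_dir="gemma-270m-task",
                           per_device_train_batch_size=8,
                           num_train_epochs=3, learning_rate=2e-4,
                           logging_steps=10),
    train_dataset=tokenized,
    data_collator=DataCollatorForLanguageModeling(tokenizer, mlm=False),
)
trainer.train()
trainer.save_model("gemma-270m-task")  # each task gets its own small adapter
```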
Based on unverified internal benchmarking, Google claims that Gemma 3 270M outperforms similarly sized models, including SmolLM2-360M-Instruct and Qwen 2.5 0.5B Instruct, on the IFEval instruction-following benchmark, though it naturally delivers much poorer performance than the four-times-larger Gemma 3 1B, scoring 51.2 to the bigger model’s 80.2.
The model isn’t, Google is keen to point out, designed for raw performance. Instead, the company is making much of its energy efficiency: when quantized down to INT4 precision – with quantization-aware trained (QAT) checkpoints already provided, and the promise of minimal performance impact compared to INT8 precision – Google’s again-unverified internal testing showed a battery drain of just 0.75 percentage points across 25 conversations of unspecified length when running on a Pixel 9 Pro smartphone.
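For a sense of what INT4 inference looks like in code, the sketch below loads a small Gemma-class checkpoint through the generic 4-bit path in Hugging Face transformers with bitsandbytes. Note that this is ordinary post-hoc 4-bit loading rather than Google’s QAT checkpoints, and the model id is an assumption to be checked against the model card.

```python
# Minimal sketch: running a small Gemma-class model with 4-bit weights via
# bitsandbytes. This is generic 4-bit loading, not Google's QAT checkpoints,
# and the model id is assumed for illustration.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

model_id = "google/gemma-3-270m-it"  # assumed id; confirm on the model card

quant_config = BitsAndBytesConfig(
    load_in_4bit=True,                      # store weights in 4-bit precision
    bnb_4bit_compute_dtype=torch.bfloat16,  # compute in bf16 for stability
)

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id, quantization_config=quant_config, device_map="auto",
)

prompt = "Summarise this support ticket in one sentence: the app crashes on login."
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
output = model.generate(**inputs, max_new_tokens=64)
print(tokenizer.decode(output[0], skip_special_tokens=True))
```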
While the model itself is smaller than its siblings, its training dataset is not. It draws on a similar spread of material to the larger models – including web documents, source code, mathematical text, and images – yet the 270M-parameter model was trained on a claimed six trillion tokens, three times as many as the 1B-parameter version and half again as many as the 4B-parameter model.
Only the biggest 12- and 27-billion-parameter models beat it, at 12 trillion and 14 trillion tokens respectively. As with all the other Gemma 3 models, the dataset has a “knowledge cut-off date” of August 2024, meaning anything newer than that will have to be fed to the model during fine-tuning or as part of a prompt.
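In practice, working around the cut-off via the prompt simply means pasting the newer material into the context window. Below is a minimal sketch using the transformers text-generation pipeline; the model id and the “fresh” note are both assumptions for illustration.

```python
# Minimal sketch: supplying post-cut-off information directly in the prompt.
# The model id and the example note are assumptions for illustration only.
from transformers import pipeline

generator = pipeline("text-generation", model="google/gemma-3-270m-it")  # assumed id

note = "Product update (January 2025): the Foo 9000 widget now ships in blue as well as black."
question = "What colours does the Foo 9000 ship in?"

prompt = (
    "Answer using only the note below.\n\n"
    f"Note: {note}\n\n"
    f"Question: {question}\nAnswer:"
)

print(generator(prompt, max_new_tokens=32)[0]["generated_text"])
```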
As with the earlier, larger Gemma models, the new compact model is made available for free – but with a set of usage restrictions, the breach of which gives Google “the right to restrict (remotely or otherwise) usage of any of the Gemma Services that Google reasonably believes are in violation.”
These restrictions are outlined in the prohibited use policy. They include a ban on generating content “that infringes, misappropriates, or otherwise violates any individual’s or entity’s rights,” the performance of “dangerous, illegal, or malicious activities,” the unlicensed practice of medicine and accounting, and the generation or distribution of spam. More controversially, they also prohibit “attempts to override or circumvent safety filters” and the generation of “sexually explicit content,” though the latter clause carves out “content created for scientific, educational, documentary, or artistic purposes.”
Those interested in getting hands-on with the latest model in the “Gemmaverse” can find it on Hugging Face, Ollama, [Kaggle](https://www.kaggle.com/models/google/gemma-3), LM Studio, and Docker.
Google has also released a guide to fine-tuning the model.