Independently FunctionGemma Fine-Tune

This repository provides a fine-tuned version of google/functiongemma-270m-it tailored for the Independently desktop platform. The fine-tune targets high-precision, high-reliability tool calling for local virtual-assistant workflows (chores, expenses, recipes, alerts, and grocery lists): the model must consistently select the right tool and produce correctly structured arguments, even under varied phrasing and bilingual (English/Italian) requests.


Purpose

The core objective of this project was to fine-tune a very small model that can be embedded directly inside the Independently desktop app, including on low-power devices, without sacrificing tool-calling accuracy or requiring external services.


What This Model Optimizes For

  • Accurate tool selection across chores, expenses, recipes, alerts, and grocery-list flows.

  • Correct parameterization: filters, tags, date ranges, recurrence rules, limits, and update/delete semantics.

  • Robustness to phrasing variation: short commands, conversational instructions, ambiguous user wording, and follow-ups.

  • Multilingual support (EN/IT): consistent behavior across English and Italian prompts for the same intent.


Dataset

The dataset is 100% synthetic, generated to match the FunctionGemma tool-calling format and the Independently tool schema.

  • Train: 10,000 examples

  • Eval/Test: 2,000 examples

  • Coverage: All primary Independently tools and common usage patterns, including edge cases (e.g., missing optional fields, default behaviors, fuzzy time expressions, multi-constraint filters, and unusual or poorly worded queries)
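For illustration, a single synthetic pair maps a natural-language request (English or Italian) to its expected tool call. The field names below are hypothetical, not the dataset's exact schema; the sketch just shows the request-to-call shape and the JSONL round-trip used when writing such files:

```python
import json

# One hypothetical synthetic training pair: a natural-language request
# mapped to its expected tool call (field names are illustrative only).
example = {
    # Italian request: "Add a 23.50 euro pharmacy expense for yesterday"
    "user": "Aggiungi 23,50 euro di spesa per la farmacia di ieri",
    "expected_call": {
        "name": "data.createExpense",
        "args": {
            "amount": 23.50,
            "category": "pharmacy",
            "date": "yesterday",
        },
    },
}

# Serialize/deserialize round-trip, as when emitting one JSONL line per example.
line = json.dumps(example, ensure_ascii=False)
parsed = json.loads(line)
print(parsed["expected_call"]["name"])
```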

Tool Coverage

The dataset includes examples for the following tools:

Categories & Tags

  • data.listCategories

  • data.upsertCategory

  • data.listTags

  • data.upsertTag

Chores

  • data.listChores

  • data.listArchivedChores

  • data.createChore

  • data.completeChoresByFilter

  • data.archiveChoresByFilter

  • data.deleteChoresByFilter

  • data.updateChoresByFilter

  • data.listRecurringChores

  • data.createRecurringChore

Expenses & Limits

  • data.listExpenses

  • data.createExpense

  • data.deleteExpensesByFilter

  • data.updateExpensesByFilter

  • data.listRecurringExpenses

  • data.createRecurringExpense

  • data.listExpenseLimits

  • data.upsertExpenseLimit

Recipes & Alerts

  • data.listRecipes

  • data.suggestRecipes

  • data.importRecipeDatasetFromUrl

  • data.createRecipe

  • data.updateRecipesByFilter

  • data.scheduleRecipeAlertByName

  • data.listAlerts

  • data.dismissAlertsByFilter

Integrations

  • discord.sendRecipeGroceryListByName
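Each of the tools above has a name plus an argument schema, so a predicted call can be sanity-checked before execution. A minimal sketch of such a check, assuming a hypothetical registry of required arguments (the real runtime schema is richer):

```python
# Minimal sketch: verify that a predicted call targets a known tool and
# supplies that tool's required arguments. The registry below is
# illustrative, not the actual Independently schema.
REQUIRED_ARGS = {
    "data.createExpense": {"amount"},
    "data.createRecurringChore": {"title", "recurrence"},
    "discord.sendRecipeGroceryListByName": {"recipeName"},
}

def is_well_formed(call: dict) -> bool:
    required = REQUIRED_ARGS.get(call.get("name"))
    if required is None:  # unknown tool name
        return False
    # All required argument keys must be present.
    return required <= set(call.get("args", {}))

print(is_well_formed({"name": "data.createExpense", "args": {"amount": 12.0}}))
```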

Training Configuration

  • Base model: google/functiongemma-270m-it

  • Epochs: 4

  • Hardware: NVIDIA RTX 4090

  • Runtime: ~14 minutes

  • Train size: 10,000 examples

  • Eval size: 2,000 examples

Results

On the evaluation set (tool selection + argument correctness):

  • Base model: ~0% accuracy

  • Fine-tuned model: 98.2% accuracy

This is a substantial improvement in both:

  • selecting the intended tool for a given user request, and

  • producing valid, faithful arguments aligned to Independently’s runtime schema.

Evaluation Methodology

Evaluation is performed on a held-out 2,000-example eval/test set containing natural-language user requests paired with a single expected tool call (tool name + JSON arguments). A prediction is counted as correct only when it produces the right tool and the right parameters for the given query.

What Counts as a Successful Prediction

A model output is considered a successful call when all of the following are true:

  1. Correct tool selection
    The predicted tool name exactly matches the expected tool for the request (e.g., data.createExpense vs. data.updateExpensesByFilter).

  2. Correct argument structure
    The output is a valid FunctionGemma-style tool call with well-formed JSON, using the expected argument schema for that tool.

  3. Correct parameterization (semantic match)
    The arguments faithfully represent the user’s intent, including:

    • Filters (tags, categories, names, status, archived/active)

    • Date and time constraints (explicit dates, ranges, relative expressions)

    • Recurrence rules (frequency, interval, days, next run)

    • Limits (amount thresholds, periods)

    • Update/delete/complete semantics and scopes (single item vs. matching set)

Matching Rules

  • Tool name match is strict: the tool must be exactly the expected one.

  • Required fields must match: missing or incorrect required parameters are failures.

  • Optional fields are allowed when consistent: extra optional parameters are permitted only if they do not change the meaning of the request or narrow/expand the scope incorrectly.

  • Equivalent representations are accepted when they resolve to the same meaning (e.g., alternative but schema-valid ways of expressing a date range), as long as they conform to the runtime tool schema.
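The matching rules above can be sketched as a comparator. This is a simplified sketch: the real harness's handling of equivalent representations (e.g., alternative date-range encodings) is richer than plain equality:

```python
def calls_match(predicted: dict, expected: dict, required: set) -> bool:
    """Strict comparison per the matching rules (simplified sketch)."""
    # Rule 1: tool name match is strict.
    if predicted.get("name") != expected.get("name"):
        return False
    pred_args = predicted.get("args", {})
    exp_args = expected.get("args", {})
    # Rule 2: every required field must be present and equal.
    for field in required:
        if field not in pred_args or pred_args[field] != exp_args.get(field):
            return False
    # Rule 3: extra optional fields are allowed only when they agree with
    # the expected call, i.e., they do not change the request's meaning.
    for field, value in pred_args.items():
        if field in exp_args and value != exp_args[field]:
            return False
    return True

expected = {"name": "data.createExpense", "args": {"amount": 23.5}}
print(calls_match({"name": "data.createExpense", "args": {"amount": 23.5}},
                  expected, required={"amount"}))
```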

Accuracy Metric

Reported accuracy is end-to-end exactness on the eval set:

  • Correct = right tool + correct arguments

  • Incorrect = wrong tool OR wrong/missing/misleading arguments

This metric is intentionally strict, reflecting Independently’s production requirement that tool calls be executable and faithful to the user’s request without manual repair.
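Under this definition, reported accuracy reduces to a simple ratio of exact matches over the eval set; for instance, 1,964 exact matches out of 2,000 examples would yield the reported ~98.2% (the counts here are back-derived for illustration):

```python
def accuracy(results: list) -> float:
    """Fraction of eval examples whose predicted call matched exactly."""
    return sum(results) / len(results) if results else 0.0

# 1,964 exact matches out of 2,000 examples -> 0.982
print(round(accuracy([True] * 1964 + [False] * 36), 3))
```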

Note: The reported score is measured on the same eval set definition used before and after fine-tuning, focusing on end-to-end correctness (tool + parameters).


Notes & Assumptions

  • The dataset follows the FunctionGemma tool-calling JSON format used in Google’s official tooling guide.

  • The model is optimized for local-first execution: tool decisions do not require external APIs.

  • The tool schema is consistent between training and runtime to minimize schema drift and maximize reliability.


Intended Use

This model is intended to be embedded within the Independently desktop assistant as the tool-calling policy model—the component responsible for producing structured tool invocations from natural language.
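Once the model emits a structured call, the app must route it to a local handler. A minimal dispatcher sketch, assuming calls arrive as parsed JSON objects; the handler names and registry here are hypothetical, not the Independently runtime API:

```python
from typing import Any, Callable, Dict

# Hypothetical registry mapping tool names to local handler functions.
HANDLERS: Dict[str, Callable[..., Any]] = {}

def tool(name: str):
    """Decorator registering a local handler under a tool name."""
    def register(fn):
        HANDLERS[name] = fn
        return fn
    return register

@tool("data.createExpense")
def create_expense(amount: float, category: str = "uncategorized") -> str:
    # A real handler would write to the app's local database.
    return f"recorded {amount:.2f} in {category}"

def dispatch(call: dict) -> Any:
    """Execute a parsed tool call produced by the model."""
    handler = HANDLERS.get(call["name"])
    if handler is None:
        raise ValueError(f"unknown tool: {call['name']}")
    return handler(**call.get("args", {}))

print(dispatch({"name": "data.createExpense",
                "args": {"amount": 23.5, "category": "pharmacy"}}))
```

Because all handlers run locally, this routing step needs no network access, matching the model's local-first design.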
