Tool calling is a request shape in which the model receives a catalogue of callable functions, each described by a name and a JSON Schema parameter spec, and may respond with structured invocations instead of free text. The application executes the call and feeds the result back, extending a stateless completion endpoint into a grounded loop.

Tool calls (function calling)

Tool calling is a request shape in which the model is given a catalogue of callable functions, each with a name and a JSON Schema parameter spec, and is allowed to respond with one or more structured invocations instead of free text.

The model produces the invocation; the application layer executes it and feeds the result back, which is how a stateless completion endpoint extends into a grounded multi-turn loop without hosting arbitrary application code on the server.

For the longer treatment, see API: tool calls.

Cost and reliability implications

Tool calls put a parser in the serving path: each model family emits its own invocation syntax, and a misconfigured parser turns valid model output into failed requests at runtime. Worse, an unregistered parser name fails only at engine startup, after minutes of billed model loading, so the parser flag deserves the same boundary validation as any schema.

Part of OpenAI-compatible integration on the learn hub.

See also

structured-output

References

OpenAI, Function Calling guide. Reference for the canonical tools / tool_calls / role: tool message shape that the open-source serving ecosystem now mirrors.
vLLM, Tool Calling documentation. Catalogues the per-family tool-call parsers (hermes, mistral, llama3_json, qwen3_coder, qwen3_xml, and others) and the --tool-call-parser launch flag.

The techniques in these pages run in production behind spotinference's OpenAI-compatible endpoint. Get a key and try it: swap the base URL and the key in an existing SDK, and the first request streams back tokens.