Host Functions -- Advanced
This guide covers the SandboxExtension for spawning child Python
interpreters, depth and concurrency limits, resource limits per child,
handle lifecycle management, and production patterns.
Prerequisites: Read the Intermediate guide first.
SandboxExtension
SandboxExtension is a MontyExtension that lets parent Python code spawn
child Python scripts in separate Monty interpreter instances. Each child
gets its own MontyPlatform and DefaultMontyBridge -- fully isolated
interpreter state.
Why Isolated Children
Use cases for child interpreters:
- Parallel computation: Fan out work across multiple interpreters and collect results.
- Untrusted code execution: Run user-supplied code in a constrained child with strict resource limits.
- Recursive scripting: A script spawns sub-scripts that may themselves spawn further children (bounded by depth limits).
- Fault isolation: A failing child does not crash the parent.
Creating a SandboxExtension
import 'package:dart_monty/dart_monty.dart';
import 'package:dart_monty_bridge/dart_monty_bridge.dart';
final extension = SandboxExtension(
  platformFactory: () async => createPlatformMonty(),
  maxChildren: 16, // Max concurrent children (default: 16)
  maxDepth: 3, // Max recursion depth (default: 3)
  currentDepth: 0, // This extension's depth level (default: 0)
  childLimits: MontyLimits(
    timeoutMs: 5000,
    memoryBytes: 10 * 1024 * 1024,
  ),
);
| Parameter | Type | Default | Description |
|---|---|---|---|
| platformFactory | Future<MontyPlatform> Function() | required | Creates a fresh platform for each child |
| childExtensionCoordinatorFactory | Future<ExtensionCoordinator?> Function(ChildSpawnContext)? | null | Optional: provides extensions to children |
| parentExtensions | List<MontyExtension> | const [] | Parent extensions for automatic child inheritance |
| maxChildren | int | 16 | Maximum concurrent living children |
| maxDepth | int | 3 | Maximum recursion depth for nested SandboxExtensions |
| currentDepth | int | 0 | This extension's current depth in the recursion tree |
| childLimits | MontyLimits? | null | Default resource limits for all children |
| sandboxBaseDir | String? | null | Base directory for per-child working directories |
| systemPromptBuilder | ChildSystemPromptBuilder? | null | Builds static system prompt from child context |
Host Functions Provided
The extension registers these functions under the sandbox namespace:
| Function | Description | Parameters |
|---|---|---|
| sandbox_spawn(code, timeout_ms?, memory_bytes?, system_prompt?) | Spawn a child. Returns an integer handle. | code: string (required), timeout_ms: integer, memory_bytes: integer, system_prompt: string |
| sandbox_await(handle) | Wait for a child to complete. Returns its result. | handle: integer |
| sandbox_await_all(handles) | Wait for multiple children. Returns list of results. | handles: list |
| sandbox_is_alive(handle) | Check if a child is still running. Returns boolean. | handle: integer |
| sandbox_free(handle) | Release a completed child's handle. | handle: integer |
| sandbox_get_output(handle) | Get a completed child's print output. | handle: integer |
| sandbox_gather(handles) | Wait for multiple children. Returns list of dicts with handle, value, and output. | handles: list |
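As a rough illustration of the sandbox_gather return shape listed in the table, the sketch below walks a list of dicts with handle, value, and output keys. The summarize helper and the sample records are hypothetical; in a real script the list would come from sandbox_gather(handles).

```python
def summarize(gathered):
    """Turn sandbox_gather-style records into printable summary lines."""
    lines = []
    for record in gathered:
        # Each record is assumed to carry the three keys named in the table.
        output = record["output"] if record["output"] is not None else ""
        lines.append(
            "handle %d -> %r (output: %r)"
            % (record["handle"], record["value"], output)
        )
    return lines


# Hypothetical stand-in for: gathered = sandbox_gather([h1, h2])
sample = [
    {"handle": 0, "value": 1024, "output": None},
    {"handle": 1, "value": 2187, "output": "done\n"},
]
for line in summarize(sample):
    print(line)
```

Compared with sandbox_await_all, this shape saves a separate sandbox_get_output call per handle when both the value and the printed output are needed.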
Basic Usage from Python
# Spawn two children
h1 = sandbox_spawn("2 ** 10")
h2 = sandbox_spawn("3 ** 7")
# Wait for both
results = sandbox_await_all([h1, h2])
# results == [1024, 2187]
# Clean up
sandbox_free(h1)
sandbox_free(h2)
Handle Lifecycle
Each sandbox_spawn() returns an integer handle. Handles follow a
strict lifecycle:
spawn -> alive -> completed -> freed
Rules:
- sandbox_await(handle) blocks until the child completes or fails. If the child failed, it raises an error with the child's error message.
- sandbox_free(handle) releases the handle's resources. It throws StateError if the child is still alive -- you must await first.
- sandbox_get_output(handle) returns the child's captured print() output as a string (or null if no output). Throws StateError if the child is still running.
- sandbox_is_alive(handle) returns true if the child is still executing.
- Unknown handles throw ArgumentError.
Warning: You must call sandbox_free() on every completed
handle. Handles are never garbage collected automatically.
Failing to free handles causes a silent memory leak (the _ChildHandle
and its captured output remain in memory) and will eventually exhaust
maxChildren, preventing new children from being spawned.
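One way to make the free-every-handle rule hard to forget is to wrap await and free in a single helper. The sketch below is a minimal illustration, not part of the library: await_and_free is a hypothetical name, the await and free functions are passed in so the example runs on its own, and it assumes the interpreter supports try/finally.

```python
def await_and_free(handle, await_fn, free_fn):
    """Await a child and always release its handle afterwards."""
    try:
        return await_fn(handle)
    finally:
        # Runs whether await_fn returned or raised, so the handle is not leaked.
        free_fn(handle)


# Fakes standing in for sandbox_await / sandbox_free, for illustration only.
freed = []


def fake_await(handle):
    if handle == 2:
        raise RuntimeError("child failed")
    return handle * 10


def fake_free(handle):
    freed.append(handle)


print(await_and_free(1, fake_await, fake_free))
try:
    await_and_free(2, fake_await, fake_free)
except RuntimeError as err:
    print("error:", err)
print(freed)
```

In a real script the call would look like await_and_free(h, sandbox_await, sandbox_free). Because sandbox_await returns or raises only after the child has finished, the free call in the finally block should not hit the still-alive StateError.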
Per-Child Resource Limits
The parent can override the default childLimits for an individual child at spawn time:
# Use default limits from childLimits
h1 = sandbox_spawn("expensive_computation()")
# Override timeout for this specific child
h2 = sandbox_spawn("slow_task()", timeout_ms=30000)
# Override memory for this specific child
h3 = sandbox_spawn("memory_heavy()", memory_bytes=50000000)
# Override both
h4 = sandbox_spawn("big_slow()", timeout_ms=60000, memory_bytes=100000000)
When timeout_ms or memory_bytes are specified at spawn time, they
override the corresponding fields in childLimits. Unspecified fields
fall back to the childLimits defaults. If childLimits is null
and no per-child overrides are given, the child runs without resource
constraints.
The MontyLimits fields that can be constrained:
| Field | Description |
|---|---|
| timeoutMs | Maximum execution time in milliseconds |
| memoryBytes | Maximum memory usage in bytes |
| stackDepth | Maximum Python call stack depth |
Note that stackDepth is not overridable per-child from Python -- it
uses the value from childLimits.
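The merge rule described in this section can be modeled in a few lines. This is a sketch of the documented behavior only, not the extension's actual code: effective_limits is a hypothetical helper, limits are represented as plain dicts, and None stands for "unconstrained".

```python
def effective_limits(child_limits, timeout_ms=None, memory_bytes=None):
    """Model of the merge rule: spawn-time values win, defaults fill the rest."""
    defaults = child_limits or {}
    return {
        "timeoutMs": timeout_ms if timeout_ms is not None
        else defaults.get("timeoutMs"),
        "memoryBytes": memory_bytes if memory_bytes is not None
        else defaults.get("memoryBytes"),
        # stackDepth has no spawn-time override; it always comes from childLimits.
        "stackDepth": defaults.get("stackDepth"),
    }


defaults = {"timeoutMs": 5000, "memoryBytes": 10 * 1024 * 1024}
print(effective_limits(defaults))                    # defaults only
print(effective_limits(defaults, timeout_ms=30000))  # timeout overridden
print(effective_limits(None))                        # no constraints at all
```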
Depth and Concurrency Limits
Depth Limits
If children also have SandboxExtension registered (via
childExtensionCoordinatorFactory), they can spawn their own children. The
maxDepth and currentDepth parameters control how deep this
recursion can go:
SandboxExtension(
  platformFactory: () async => createPlatformMonty(),
  maxDepth: 3,
  currentDepth: 0,
  childExtensionCoordinatorFactory: (context) async {
    final registry = ExtensionCoordinator();
    registry.register(SandboxExtension(
      platformFactory: () async => createPlatformMonty(),
      maxDepth: 3,
      currentDepth: 1, // One level deeper
    ));
    return registry;
  },
)
When currentDepth >= maxDepth, sandbox_spawn() throws StateError
with the message "Maximum sandbox recursion depth (N) exceeded.".
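To make the guard concrete, here is a small model of the depth check. It is an illustration under assumptions: check_depth is a hypothetical name, RuntimeError stands in for Dart's StateError, and N in the message is assumed to be maxDepth.

```python
def check_depth(current_depth, max_depth):
    """Model of the documented guard: spawning fails once currentDepth >= maxDepth."""
    if current_depth >= max_depth:
        # Assumption: the N in the documented message is maxDepth.
        raise RuntimeError(
            "Maximum sandbox recursion depth (%d) exceeded." % max_depth
        )


for depth in range(5):
    try:
        check_depth(depth, 3)
        print(depth, "may spawn")
    except RuntimeError as err:
        print(depth, "blocked:", err)
```

With maxDepth set to 3, extensions at depths 0, 1, and 2 may spawn, and an extension at depth 3 or deeper may not.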
Concurrency Limits
maxChildren limits the number of alive children at any time.
When the limit is reached, sandbox_spawn() throws StateError with
the message "Maximum concurrent children (N) reached.".
Freed children do not count against the limit. After
sandbox_free(handle), the slot is available for new children.
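When there are more work items than slots, one option is to spawn in batches and free each batch before starting the next. The sketch below is hypothetical: run_in_batches is not a library function, the spawn, await-all, and free functions are injected so the example runs standalone, and it assumes the await-all function returns only after every child in the batch has completed.

```python
def run_in_batches(snippets, batch_size, spawn_fn, await_all_fn, free_fn):
    """Run snippets so that no more than batch_size handles are held at once."""
    results = []
    for start in range(0, len(snippets), batch_size):
        batch = snippets[start:start + batch_size]
        handles = [spawn_fn(code) for code in batch]
        try:
            results.extend(await_all_fn(handles))
        finally:
            # Free every handle in this batch before spawning the next one.
            for handle in handles:
                free_fn(handle)
    return results


# Fakes that track how many handles are held, for illustration only.
held = set()
peak = [0]
codes = {}
next_id = [0]


def fake_spawn(code):
    handle = next_id[0]
    next_id[0] += 1
    held.add(handle)
    codes[handle] = code
    peak[0] = max(peak[0], len(held))
    return handle


def fake_await_all(handles):
    return [codes[h].upper() for h in handles]


def fake_free(handle):
    held.discard(handle)


out = run_in_batches(["a", "b", "c", "d", "e"], 2,
                     fake_spawn, fake_await_all, fake_free)
print(out, "peak handles held:", peak[0])
```

Keeping batch_size at or below maxChildren means sandbox_spawn should not hit the concurrency StateError, at the cost of waiting for the slowest child in each batch.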
Providing Extensions to Children
By default, children only get the introspection builtins (if an
ExtensionCoordinator is attached). Use childExtensionCoordinatorFactory to
give children access to host functions:
SandboxExtension(
  platformFactory: () async => createPlatformMonty(),
  childExtensionCoordinatorFactory: (context) async {
    final registry = ExtensionCoordinator();
    registry.register(MathExtension());
    registry.register(StorageExtension());
    // Note: do NOT register SandboxExtension here unless you want
    // recursive spawning (and remember to increment currentDepth)
    return registry;
  },
)
The factory receives a ChildSpawnContext with the child's childId
and optional workingDirectory -- use these for per-child resource
configuration.
Return null from the factory to give children only introspection
builtins (no extensions). If the factory itself is null and no
parentExtensions opt in to inheritance, children get no extensions at
all and no introspection.
Automatic Extension Inheritance
When childExtensionCoordinatorFactory is null, children automatically
inherit extensions from parentExtensions that opt in via
createChildInstance():
SandboxExtension(
  platformFactory: () async => createPlatformMonty(),
  parentExtensions: registry.extensions, // Pass parent's extension list
)
Each parent extension's createChildInstance(context:) is called with a
ChildSpawnContext. Extensions that return a new instance are registered
on the child's bridge. Extensions that return null are excluded.
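The opt-in rule can be pictured with a small model. The Extension class and inherit_extensions function below are hypothetical stand-ins written in Python for illustration; they are not the Dart API, and the context is a plain dict rather than a ChildSpawnContext.

```python
class Extension:
    """Minimal stand-in for a parent extension (hypothetical, not the Dart API)."""

    def __init__(self, name, inheritable):
        self.name = name
        self.inheritable = inheritable

    def create_child_instance(self, context):
        # Opt in by returning a fresh instance; opt out by returning None.
        if not self.inheritable:
            return None
        return Extension(self.name, self.inheritable)


def inherit_extensions(parent_extensions, context):
    """Collect the child instances of every parent extension that opts in."""
    children = []
    for ext in parent_extensions:
        child = ext.create_child_instance(context)
        if child is not None:
            children.append(child)
    return children


parents = [
    Extension("math", True),
    Extension("secrets", False),
    Extension("storage", True),
]
inherited = inherit_extensions(parents, {"childId": 0, "workingDirectory": None})
print([ext.name for ext in inherited])
```

Returning a new instance rather than the parent's own object keeps per-child state separate, which matches the isolation goal of the sandbox.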
Per-Child Filesystem Isolation
The sandboxBaseDir parameter enables per-child working directories:
SandboxExtension(
  platformFactory: () async => createPlatformMonty(),
  sandboxBaseDir: '/data',
  parentExtensions: registry.extensions,
)
When set, each child's ChildSpawnContext.workingDirectory is computed
as $sandboxBaseDir/.sandboxes/child_$id (e.g.,
/data/.sandboxes/child_0). The directory is not created by
SandboxExtension -- consumers (e.g., an FsExtension.createChildInstance)
are responsible for creating and managing it.
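The path rule is simple enough to model directly. The helper below is a hypothetical sketch of the documented formula, returning None when no base directory is configured (mirroring the optional workingDirectory). It only builds the string and, like SandboxExtension, does not create the directory.

```python
def child_working_directory(sandbox_base_dir, child_id):
    """Model of the documented path rule; None when no base dir is set."""
    if sandbox_base_dir is None:
        return None
    return "%s/.sandboxes/child_%d" % (sandbox_base_dir, child_id)


print(child_working_directory("/data", 0))
print(child_working_directory("/data", 7))
print(child_working_directory(None, 0))
```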
Child System Prompts
SandboxExtension supports injecting custom system prompts into child
sandboxes via two layers:
Layer 1: Infrastructure Builder (static, from Dart)
The systemPromptBuilder callback produces static, infrastructure-level
prompt content from ChildSpawnContext:
SandboxExtension(
  platformFactory: () async => createPlatformMonty(),
  sandboxBaseDir: '/data',
  systemPromptBuilder: (context) =>
      'You are child ${context.childId}. '
      'Your workspace is ${context.workingDirectory}. '
      'Do not access other children\'s data.',
)
- Computed from ChildSpawnContext (childId, workingDirectory)
- Infrastructure truths that should never be wrong
- Cannot be prompt-injected by the parent LLM
- Return null to skip the builder layer for a specific child
Layer 2: Parent LLM Fragment (dynamic, from Python)
The system_prompt parameter on sandbox_spawn lets the parent LLM
inject role-specific instructions at runtime:
h = sandbox_spawn(
    "analyze(data)",
    system_prompt="You are the validator. Check results for correctness."
)
- Role assignment, task-specific instructions
- The parent LLM's planning decision at runtime
- Optional -- omit if the parent doesn't need to customize
Concatenation Order
When both layers are present, the builder output comes first (infrastructure truth), then the runtime fragment (role assignment), separated by a blank line:
You are child 0. Your workspace is /data/.sandboxes/child_0.

You are the validator. Check results for correctness.
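The ordering rule can be sketched as a small function. compose_child_prompt is a hypothetical name, and two details are assumptions rather than documented behavior: an empty string is treated the same as a missing layer, and the result is None when both layers are absent.

```python
def compose_child_prompt(builder_output, runtime_fragment):
    """Builder output first, runtime fragment second, blank line between."""
    # Assumption: None and "" both mean "this layer is absent".
    parts = [part for part in (builder_output, runtime_fragment) if part]
    if not parts:
        return None  # Assumption: no prompt at all when both layers are absent.
    return "\n\n".join(parts)


print(compose_child_prompt(
    "You are child 0. Your workspace is /data/.sandboxes/child_0.",
    "You are the validator. Check results for correctness.",
))
```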
How It Works
The concatenated prompt is injected into the child's
ExtensionCoordinator.systemPromptPrefix after registry construction.
This setter-based approach guarantees prompt injection regardless of
whether the registry was built by inheritance or a custom factory --
factories cannot accidentally forget to wire the prompt.
If no extensions exist but a prompt is provided, an empty ExtensionCoordinator
is created automatically so the prompt (and introspection builtins) are
available to the child.
Disposal and Cleanup
When SandboxExtension.onDispose() is called:
- All living children are torn down (disposed).
- Each child's completer is completed with a StateError.
- Unhandled async errors are suppressed (via future.ignore()).
- The children map is cleared.
Disposal is idempotent -- calling onDispose() multiple times is safe.
When a child completes (normally or with error), the extension performs best-effort cleanup: the child's bridge is disposed, its platform is disposed, and its extension registry (if any) is disposed. Cleanup errors are swallowed to avoid masking the child's actual result.
Production Patterns
Fan-Out / Fan-In
# Fan out work
handles = []
items = ["task_a", "task_b", "task_c", "task_d"]
i = 0
while i < len(items):
    h = sandbox_spawn('process("' + items[i] + '")')
    handles.append(h)
    i = i + 1
# Fan in results
results = sandbox_await_all(handles)
# Clean up all handles
i = 0
while i < len(handles):
    sandbox_free(handles[i])
    i = i + 1
Timeout with Fallback
h = sandbox_spawn("slow_computation()", timeout_ms=5000)
try:
    result = sandbox_await(h)
except:
    # Child timed out or failed
    result = "fallback_value"
sandbox_free(h)
Checking Progress
h = sandbox_spawn("long_running_task()")
# Poll periodically (in practice, do useful work between checks)
while sandbox_is_alive(h):
    # ... do other work ...
    pass
result = sandbox_await(h)
output = sandbox_get_output(h)
sandbox_free(h)
Writing Custom Extensions for Production
Extension Design Checklist
- Namespace: Choose a short, descriptive namespace (e.g., db, http, auth). It must match [a-z][a-z0-9_]* and be at most 32 characters.
- Function naming: All function names must start with {namespace}_. Keep names descriptive but concise: db_query, db_execute, db_tables.
- System prompt context: Provide systemPromptContext if your extension needs explanation beyond what the function schemas convey. This text goes into LLM system prompts via generateSystemPrompt().
- Lifecycle hooks: Use onRegister() to initialize resources (open connections, load configs). Use onDispose() to clean them up. Both must be idempotent.
- Error handling: Let exceptions propagate naturally -- the bridge converts them to Python errors. Only catch if you need custom recovery or cleanup.
- Parameter types: Use the most specific HostParamType possible. Reserve any for genuinely polymorphic parameters. Use jsonSchemaOverride for complex types that HostParamType cannot express.
- Thread safety: If your extension holds mutable state, consider that DefaultMontyBridge processes one execution at a time (it throws StateError on concurrent execute() calls), but futures batching means multiple handlers can run concurrently within a single execution.
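The namespace and function-naming rules from the checklist are easy to check mechanically, for example in a unit test for a new extension. The helpers below are a hypothetical sketch of those two rules only; they are not the validation code the library runs.

```python
import re

NAMESPACE_PATTERN = re.compile(r"[a-z][a-z0-9_]*")
MAX_NAMESPACE_LENGTH = 32


def is_valid_namespace(namespace):
    """True if the namespace matches [a-z][a-z0-9_]* and is at most 32 chars."""
    return (
        len(namespace) <= MAX_NAMESPACE_LENGTH
        and NAMESPACE_PATTERN.fullmatch(namespace) is not None
    )


def misnamed_functions(namespace, function_names):
    """Return the function names that do not start with '<namespace>_'."""
    prefix = namespace + "_"
    return [name for name in function_names if not name.startswith(prefix)]


print(is_valid_namespace("db"))
print(is_valid_namespace("Db"))
print(misnamed_functions("db", ["db_query", "db_execute", "list_tables"]))
```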
Multi-Session Patterns
When running multiple bridge sessions (e.g., one per user), each session needs its own instances:
Future<(MontyBridge, ExtensionCoordinator)> createSession() async {
  final registry = ExtensionCoordinator()
    ..register(StorageExtension()) // Fresh instance per session
    ..register(MathExtension());
  final bridge = MontyBridge(platform: createPlatformMonty());
  await registry.attachTo(bridge);
  return (bridge, registry);
}
// Each session is fully isolated
final (bridge1, registry1) = await createSession();
final (bridge2, registry2) = await createSession();
// Dispose independently
bridge1.dispose();
await registry1.disposeAll();
Each extension instance maintains its own state. Two StorageExtension
instances do not share data.
Futures Batching (MontyRuntime(useFutures: true))
By default MontyRuntime dispatches host functions serially: each
call awaits its handler before resuming Python. Set useFutures: true
on the constructor to flip the bridge into the futures-batching path —
host calls dispatch concurrently within a single Python execution, and
Python's await ext() against a Dart-registered handler returns the
resolved value instead of raising TypeError.
final runtime = MontyRuntime(useFutures: true)
..register(slowHostFn);
await runtime.execute('''
import asyncio
results = await asyncio.gather(slow(1), slow(2), slow(3))
''').result;
// All three host calls dispatch concurrently; total wall time ≈ one delay,
// not three.
What changes when useFutures: true:
- Python calls a host function. The bridge dispatches it via dispatchToolCallAsFuture -- the handler is launched as an unawaited Future and registered in _pendingFutures. The platform is resumed via resumeAsFuture() so Python keeps running.
- If Python calls more host functions before suspending, those are also dispatched as futures, in parallel with earlier ones.
- When Python's interpreter actually needs the concrete result of a host call (any await fn() or implicit await inside asyncio.gather), the platform emits MontyResolveFutures with the list of pending call IDs.
- The bridge awaits each registered future, collects values into results and per-call exceptions into errors, then feeds the batch back via (_platform as MontyFutureCapable).resolveFutures(results, errors: errors).
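The dispatch-then-resolve flow above can be mimicked with plain asyncio. This is an analogy written in standard Python, not the bridge's Dart implementation: slow and run_batch are hypothetical names, and the pending dict plays the role that _pendingFutures plays in the description.

```python
import asyncio


async def slow(value, delay=0.05):
    """Stand-in host handler that does some awaited work."""
    await asyncio.sleep(delay)
    if value < 0:
        raise ValueError("negative input: %d" % value)
    return value * 2


async def run_batch(arguments):
    # Dispatch phase: launch every handler without awaiting it, keyed by call ID.
    pending = {}
    for call_id, arg in enumerate(arguments):
        pending[call_id] = asyncio.create_task(slow(arg))

    # Resolve phase: await each task, splitting values from per-call errors.
    results, errors = {}, {}
    for call_id, task in pending.items():
        try:
            results[call_id] = await task
        except Exception as exc:
            errors[call_id] = str(exc)
    return results, errors


results, errors = asyncio.run(run_batch([1, -2, 3]))
print(results)
print(errors)
```

All three tasks start before any is awaited, so they sleep concurrently, and a failure in one call lands in errors without preventing the other calls from resolving.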
This is transparent to handler implementations — they're written the same way regardless of the flag. The only difference is execution ordering.
When to leave the flag off (the default)
- Your handlers don't benefit from concurrency (sync values, fast pure computations).
- Your handlers mutate shared state and you need the legacy serial-dispatch guarantee. Concurrent dispatch lets handlers race against each other; serialising at the dispatch boundary is the simplest way to avoid that.
- You want bit-for-bit back-compat with pre-flag behaviour.
When to flip it on
- Your script uses Python await ext() against a Dart-registered host function (the only way to express "this call is async" inside Python). Without useFutures: true, that line raises TypeError: 'str' object can't be awaited because the eager dispatch path returns a plain value.
- Your handlers do real I/O (HTTP, file, sub-process) and you want asyncio.gather to actually parallelise -- sequential dispatch costs you the sum of all handler latencies.
Error handling
useFutures: true collects per-call errors into the errors map and
hands them to resolveFutures. The pydantic-monty engine surfaces a
failed future as a script termination (MontyScriptError) with the
error message verbatim — Python's own try / except RuntimeError
around await fn() does not catch it today. If you want recovery
semantics, do the catching Dart-side inside the handler and return a
sentinel value.
For the cell-by-cell contract across every (Dart × Python × API layer
× backend) combination, see
dart_monty_core's async-matrix deep dive and the
matrix tests in test/integration/_runtime_async_matrix_body.dart.