Options Reference

class kubeflow.trainer.options.Name(name: str) -> None

Bases: object

Set a custom name for the TrainJob resource.

This option works with all backends.

Parameters:

name (str) – Custom name for the job. Must be a valid identifier.

name: str
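__call__(job_spec: dict[str, Any], trainer: BuiltinTrainer | CustomTrainer | CustomTrainerContainer | None, backend: RuntimeBackend) -> None

Apply custom name to the job specification.

Parameters:
  • job_spec (dict[str, Any]) – Job specification dictionary to modify.

  • trainer (BuiltinTrainer | CustomTrainer | CustomTrainerContainer | None) – Optional trainer instance for context.

  • backend (RuntimeBackend) – Backend instance for validation.

class kubeflow.trainer.options.Labels(labels: dict[str, str]) -> None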

Bases: object

Add labels to the TrainJob resource metadata (.metadata.labels).

Supported backends:
  • Kubernetes

Parameters:

labels (dict[str, str]) – Dictionary of label key-value pairs to add to TrainJob metadata.

labels: dict[str, str]
__call__(job_spec: dict[str, Any], trainer: CustomTrainer | BuiltinTrainer | None, backend: RuntimeBackend) -> None

Apply labels to the job specification.

Parameters:
  • job_spec (dict[str, Any]) – Job specification dictionary to modify.

  • trainer (CustomTrainer | BuiltinTrainer | None) – Optional trainer instance for context.

  • backend (RuntimeBackend) – Backend instance for validation.

Raises:

ValueError – If backend does not support labels.
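
The behaviour described above can be sketched with a stand-in class. This is an illustration under assumptions, not the SDK's implementation: LabelsSketch and FakeBackend are hypothetical names, the supports_labels attribute is invented for the example, and the job specification is assumed to be a plain dict with a top-level "metadata" key.

```python
from dataclasses import dataclass
from typing import Any


@dataclass
class FakeBackend:
    """Hypothetical backend stand-in; the real RuntimeBackend API may differ."""

    supports_labels: bool = True


@dataclass
class LabelsSketch:
    """Illustrative stand-in mirroring the documented Labels(labels=...) option."""

    labels: dict[str, str]

    def __call__(self, job_spec: dict[str, Any], trainer: Any, backend: FakeBackend) -> None:
        # Reject backends that cannot carry labels, as the Raises section describes.
        if not backend.supports_labels:
            raise ValueError("backend does not support labels")
        # Merge into .metadata.labels, creating the nested dicts if they are absent.
        metadata = job_spec.setdefault("metadata", {})
        metadata.setdefault("labels", {}).update(self.labels)


job_spec: dict[str, Any] = {"metadata": {"labels": {"team": "ml"}}}
LabelsSketch({"app": "training"})(job_spec, None, FakeBackend())
print(job_spec["metadata"]["labels"])
```

An Annotations option could plausibly follow the same pattern, writing to .metadata.annotations instead.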

class kubeflow.trainer.options.Annotations(annotations: dict[str, str]) -> None

Bases: object

Add annotations to the TrainJob resource metadata (.metadata.annotations).

Supported backends:
  • Kubernetes

Parameters:

annotations (dict[str, str]) – Dictionary of annotation key-value pairs to add to TrainJob metadata.

annotations: dict[str, str]
__call__(job_spec: dict[str, Any], trainer: CustomTrainer | BuiltinTrainer | None, backend: RuntimeBackend) -> None

Apply annotations to the job specification.

Parameters:
  • job_spec (dict[str, Any]) – Job specification dictionary to modify.

  • trainer (CustomTrainer | BuiltinTrainer | None) – Optional trainer instance for context.

  • backend (RuntimeBackend) – Backend instance for validation.

Raises:

ValueError – If backend does not support annotations.

class kubeflow.trainer.options.TrainerCommand(command: list[str]) -> None

Bases: object

Override the trainer container command (.spec.trainer.command).

Can only be used with CustomTrainerContainer. CustomTrainer generates its own command from the function, and BuiltinTrainer uses pre-configured commands.

Supported backends:
  • Kubernetes

Parameters:

command (list[str]) – List of command strings to override the default trainer command.

command: list[str]
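__call__(job_spec: dict[str, Any], trainer: CustomTrainer | BuiltinTrainer | CustomTrainerContainer | None, backend: RuntimeBackend) -> None

Apply trainer command override to the job specification.

Parameters:
  • job_spec (dict[str, Any]) – Job specification dictionary to modify.

  • trainer (CustomTrainer | BuiltinTrainer | CustomTrainerContainer | None) – Optional trainer instance for context.

  • backend (RuntimeBackend) – Backend instance for validation.

Raises:

ValueError – If the backend does not support this option or the trainer type conflicts with it.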
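
The trainer-type restriction can be illustrated with a minimal sketch. Everything here is a stand-in: TrainerCommandSketch, CustomTrainerContainerStub, and CustomTrainerStub are hypothetical names, the backend check is omitted, and rejecting every trainer that is not container-based (including None) is an assumption about how the conflict is detected.

```python
from dataclasses import dataclass
from typing import Any


class CustomTrainerContainerStub:
    """Hypothetical stand-in for CustomTrainerContainer."""


class CustomTrainerStub:
    """Hypothetical stand-in for CustomTrainer."""


@dataclass
class TrainerCommandSketch:
    """Illustrative stand-in mirroring the documented TrainerCommand(command=...) option."""

    command: list[str]

    def __call__(self, job_spec: dict[str, Any], trainer: Any, backend: Any = None) -> None:
        # Assumed rule: only a container-based trainer may have its command overridden.
        if not isinstance(trainer, CustomTrainerContainerStub):
            raise ValueError("TrainerCommand requires a CustomTrainerContainer trainer")
        # Write the override to .spec.trainer.command, creating parents as needed.
        trainer_spec = job_spec.setdefault("spec", {}).setdefault("trainer", {})
        trainer_spec["command"] = list(self.command)


job_spec: dict[str, Any] = {}
TrainerCommandSketch(["python", "train.py"])(job_spec, CustomTrainerContainerStub())
print(job_spec)

try:
    TrainerCommandSketch(["python", "train.py"])({}, CustomTrainerStub())
except ValueError as err:
    print(f"rejected: {err}")
```

A TrainerArgs option would presumably apply the same check and write to .spec.trainer.args.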

class kubeflow.trainer.options.TrainerArgs(args: list[str]) -> None

Bases: object

Override the trainer container arguments (.spec.trainer.args).

Can only be used with CustomTrainerContainer. CustomTrainer generates its own arguments from the function, and BuiltinTrainer uses pre-configured arguments.

Supported backends:
  • Kubernetes

Parameters:

args (list[str]) – List of argument strings to override the default trainer arguments.

args: list[str]
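__call__(job_spec: dict[str, Any], trainer: CustomTrainer | BuiltinTrainer | CustomTrainerContainer | None, backend: RuntimeBackend) -> None

Apply trainer args override to the job specification.

Parameters:
  • job_spec (dict[str, Any]) – Job specification dictionary to modify.

  • trainer (CustomTrainer | BuiltinTrainer | CustomTrainerContainer | None) – Optional trainer instance for context.

  • backend (RuntimeBackend) – Backend instance for validation.

Raises:

ValueError – If the backend does not support this option or the trainer type conflicts with it.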

class kubeflow.trainer.options.RuntimePatch(training_runtime_spec: TrainingRuntimeSpecPatch | None = None) -> None

Bases: object

Add runtime patches to the TrainJob (.spec.runtimePatches).

Runtime patches allow controllers, admission webhooks, and custom clients to attach structured patches to a TrainJob without conflicting with each other. Each patch is keyed by a unique manager field, which is automatically set to “trainer.kubeflow.org/kubeflow-sdk” by the SDK.

Supported backends:
  • Kubernetes

Parameters:

training_runtime_spec (TrainingRuntimeSpecPatch | None) – Allowed patches for ClusterTrainingRuntime or TrainingRuntime-based jobs.

training_runtime_spec: TrainingRuntimeSpecPatch | None = None
manager: str = 'trainer.kubeflow.org/kubeflow-sdk'
__call__(job_spec: dict[str, Any], trainer: CustomTrainer | BuiltinTrainer | None, backend: RuntimeBackend) -> None

Apply runtime patch to the job specification.

Parameters:
  • job_spec (dict[str, Any]) – Job specification dictionary to modify.

  • trainer (CustomTrainer | BuiltinTrainer | None) – Optional trainer instance for context.

  • backend (RuntimeBackend) – Backend instance for validation.

Raises:

ValueError – If backend does not support runtime patches.
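
To show why keying patches by manager avoids conflicts, here is a rough sketch of an insert-or-replace helper. It rests on assumptions: upsert_runtime_patch is a hypothetical function, and the layout of .spec.runtimePatches as a list of dicts with "manager" and "trainingRuntimeSpec" keys is assumed for illustration rather than taken from this reference. Only the manager value comes from the text above.

```python
from typing import Any

# Manager value that the SDK sets automatically, per the description above.
SDK_MANAGER = "trainer.kubeflow.org/kubeflow-sdk"


def upsert_runtime_patch(job_spec: dict[str, Any], manager: str, patch: dict[str, Any]) -> None:
    """Insert or replace the patch owned by `manager`, leaving other managers' entries intact."""
    patches = job_spec.setdefault("spec", {}).setdefault("runtimePatches", [])
    entry = {"manager": manager, **patch}
    for index, existing in enumerate(patches):
        if existing.get("manager") == manager:
            # Same owner: overwrite its previous patch in place.
            patches[index] = entry
            return
    # New owner: append without touching anyone else's patch.
    patches.append(entry)


# A patch from another (hypothetical) manager is already present.
job_spec: dict[str, Any] = {
    "spec": {"runtimePatches": [{"manager": "example.com/webhook", "trainingRuntimeSpec": {}}]}
}
upsert_runtime_patch(job_spec, SDK_MANAGER, {"trainingRuntimeSpec": {"template": {}}})
upsert_runtime_patch(job_spec, SDK_MANAGER, {"trainingRuntimeSpec": {"template": {"metadata": {}}}})
print([p["manager"] for p in job_spec["spec"]["runtimePatches"]])
```

Applying the SDK patch twice updates a single entry, and the other manager's entry is left as it was.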

class kubeflow.trainer.options.TrainingRuntimeSpecPatch(template: JobSetTemplatePatch | None = None) -> None

Bases: object

Configuration for patching the TrainingRuntime spec.

Parameters:

template (JobSetTemplatePatch | None) – JobSet template patches.

template: JobSetTemplatePatch | None = None
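
The patch classes in this reference nest from TrainingRuntimeSpecPatch down to ContainerPatch. The sketch below walks that chain. So that it runs without the SDK, it defines local stand-in dataclasses that copy only a subset of the documented fields; they are not the SDK classes themselves. The replicated job name "node" comes from the ReplicatedJobPatch description, while the container name, node selector, and environment variable are made-up example values.

```python
from dataclasses import asdict, dataclass
from typing import Optional


# Local stand-ins with a subset of the documented fields (not the SDK classes).
@dataclass
class ContainerPatch:
    name: str
    env: Optional[list[dict]] = None


@dataclass
class PodSpecPatch:
    node_selector: Optional[dict[str, str]] = None
    containers: Optional[list[ContainerPatch]] = None


@dataclass
class PodTemplatePatch:
    metadata: Optional[dict] = None
    spec: Optional[PodSpecPatch] = None


@dataclass
class JobSpecPatch:
    template: Optional[PodTemplatePatch] = None


@dataclass
class JobTemplatePatch:
    metadata: Optional[dict] = None
    spec: Optional[JobSpecPatch] = None


@dataclass
class ReplicatedJobPatch:
    name: str
    template: Optional[JobTemplatePatch] = None


@dataclass
class JobSetSpecPatch:
    replicated_jobs: Optional[list[ReplicatedJobPatch]] = None


@dataclass
class JobSetTemplatePatch:
    metadata: Optional[dict] = None
    spec: Optional[JobSetSpecPatch] = None


@dataclass
class TrainingRuntimeSpecPatch:
    template: Optional[JobSetTemplatePatch] = None


# Build the chain: runtime spec -> JobSet -> replicated job -> Job -> Pod -> container.
container = ContainerPatch(name="node", env=[{"name": "NCCL_DEBUG", "value": "INFO"}])
pod_spec = PodSpecPatch(node_selector={"accelerator": "gpu"}, containers=[container])
job_template = JobTemplatePatch(spec=JobSpecPatch(template=PodTemplatePatch(spec=pod_spec)))
patch = TrainingRuntimeSpecPatch(
    template=JobSetTemplatePatch(
        spec=JobSetSpecPatch(
            replicated_jobs=[ReplicatedJobPatch(name="node", template=job_template)]
        )
    )
)

# asdict() recurses through nested dataclasses and lists, which helps when inspecting a patch.
patch_dict = asdict(patch)
print(patch_dict["template"]["spec"]["replicated_jobs"][0]["name"])
```

With the SDK installed, the same structure would presumably be built from the classes in kubeflow.trainer.options and passed as RuntimePatch(training_runtime_spec=patch).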
class kubeflow.trainer.options.JobSetTemplatePatch(metadata: dict | None = None, spec: JobSetSpecPatch | None = None) -> None

Bases: object

Configuration for patching the JobSet template.

Parameters:
  • metadata (dict | None) – Metadata patches (labels, annotations) for the JobSet.

  • spec (JobSetSpecPatch | None) – JobSet spec patches.

metadata: dict | None = None
spec: JobSetSpecPatch | None = None

class kubeflow.trainer.options.JobSetSpecPatch(replicated_jobs: list[ReplicatedJobPatch] | None = None) -> None

Bases: object

Configuration for patching the JobSet spec.

Parameters:

replicated_jobs (list[ReplicatedJobPatch] | None) – Per-job patches. Each entry identifies its target replicated job by name.

replicated_jobs: list[ReplicatedJobPatch] | None = None

class kubeflow.trainer.options.ReplicatedJobPatch(name: str, template: JobTemplatePatch | None = None) -> None

Bases: object

Configuration for patching a specific replicated job within the JobSet.

Parameters:
  • name (str) – Name of the replicated job to patch (e.g. “node”, “launcher”).

  • template (JobTemplatePatch | None) – Job template patches.

name: str
template: JobTemplatePatch | None = None

class kubeflow.trainer.options.JobTemplatePatch(metadata: dict | None = None, spec: JobSpecPatch | None = None) -> None

Bases: object

Configuration for patching a Job template within a replicated job.

Parameters:
  • metadata (dict | None) – Metadata patches (labels, annotations) for the Job template.

  • spec (JobSpecPatch | None) – Job spec patches.

metadata: dict | None = None
spec: JobSpecPatch | None = None

class kubeflow.trainer.options.JobSpecPatch(template: PodTemplatePatch | None = None) -> None

Bases: object

Configuration for patching the Job spec.

Parameters:

template (PodTemplatePatch | None) – Pod template patches for this Job.

template: PodTemplatePatch | None = None

class kubeflow.trainer.options.PodTemplatePatch(metadata: dict | None = None, spec: PodSpecPatch | None = None) -> None

Bases: object

Configuration for patching a Pod template within a Job.

Parameters:
  • metadata (dict | None) – Metadata patches (labels, annotations) for the Pod template.

  • spec (PodSpecPatch | None) – Pod spec patches.

metadata: dict | None = None
spec: PodSpecPatch | None = None

class kubeflow.trainer.options.PodSpecPatch(service_account_name: str | None = None, volumes: list[dict] | None = None, init_containers: list[ContainerPatch] | None = None, containers: list[ContainerPatch] | None = None, image_pull_secrets: list[dict] | None = None, security_context: dict | None = None, node_selector: dict[str, str] | None = None, affinity: dict | None = None, tolerations: list[dict] | None = None, scheduling_gates: list[dict] | None = None) -> None

Bases: object

Configuration for patching pod spec fields that managers are permitted to set.

Parameters:
  • service_account_name (str | None) – Service account to use for the pods.

  • volumes (list[dict] | None) – Volumes to add/merge with the pod.

  • init_containers (list[ContainerPatch] | None) – Init containers to add/merge with the pod.

  • containers (list[ContainerPatch] | None) – Containers to add/merge with the pod.

  • image_pull_secrets (list[dict] | None) – Image pull secrets for the pods.

  • security_context (dict | None) – Pod-level security context.

  • node_selector (dict[str, str] | None) – Node selector to place pods on specific nodes.

  • affinity (dict | None) – Affinity rules for pod scheduling.

  • tolerations (list[dict] | None) – Tolerations for pod scheduling.

  • scheduling_gates (list[dict] | None) – Scheduling gates for the pods.

service_account_name: str | None = None
volumes: list[dict] | None = None
init_containers: list[ContainerPatch] | None = None
containers: list[ContainerPatch] | None = None
image_pull_secrets: list[dict] | None = None
security_context: dict | None = None
node_selector: dict[str, str] | None = None
affinity: dict | None = None
tolerations: list[dict] | None = None
scheduling_gates: list[dict] | None = None

class kubeflow.trainer.options.ContainerPatch(name: str, env: list[dict] | None = None, volume_mounts: list[dict] | None = None, security_context: dict | None = None) -> None

Bases: object

Configuration for patching a specific container in a pod.

Parameters:
  • name (str) – Name of the container to patch (must exist in the Runtime).

  • env (list[dict] | None) – Environment variables to add/merge with the container. Each dict should have ‘name’ and ‘value’ or ‘valueFrom’ keys.

  • volume_mounts (list[dict] | None) – Volume mounts to add/merge with the container. Each dict should have ‘name’ and ‘mountPath’ keys at minimum.

  • security_context (dict | None) – Security context for the container.

name: str
env: list[dict] | None = None
volume_mounts: list[dict] | None = None
security_context: dict | None = None
__post_init__()

Validate the container patch configuration.
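
This reference does not say which checks __post_init__ performs. As one plausible reading, the sketch below enforces the key requirements stated in the parameter descriptions: each env entry needs 'name' plus 'value' or 'valueFrom', and each volume mount needs 'name' and 'mountPath'. ContainerPatchSketch is a hypothetical stand-in, and the non-empty name check is an added assumption.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ContainerPatchSketch:
    """Stand-in with the documented ContainerPatch fields; the validation rules are assumed."""

    name: str
    env: Optional[list[dict]] = None
    volume_mounts: Optional[list[dict]] = None
    security_context: Optional[dict] = None

    def __post_init__(self) -> None:
        # Assumed check: a patch must name the container it targets.
        if not self.name:
            raise ValueError("container name must be non-empty")
        # Each env entry needs 'name' and one of 'value' or 'valueFrom'.
        for var in self.env or []:
            if "name" not in var or not (var.keys() & {"value", "valueFrom"}):
                raise ValueError(f"invalid env entry: {var}")
        # Each volume mount needs at least 'name' and 'mountPath'.
        for mount in self.volume_mounts or []:
            if "name" not in mount or "mountPath" not in mount:
                raise ValueError(f"invalid volume mount: {mount}")


valid = ContainerPatchSketch(
    name="node",
    env=[{"name": "NCCL_DEBUG", "value": "INFO"}],
    volume_mounts=[{"name": "data", "mountPath": "/data"}],
)
print(valid.name)

try:
    ContainerPatchSketch(name="node", env=[{"name": "MISSING_VALUE"}])
except ValueError as err:
    print(f"rejected: {err}")
```

Validating at construction time means a malformed patch fails where it is written rather than when the job is submitted.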