Options Reference
- class kubeflow.trainer.options.Name(name: str) -> None
Bases: object
Set a custom name for the TrainJob resource.
This option works with all backends.
- Parameters:
name (str) – Custom name for the job. Must be a valid identifier.
- __call__(job_spec: dict[str, Any], trainer: BuiltinTrainer | CustomTrainer | CustomTrainerContainer | None, backend: RuntimeBackend) -> None
Apply the custom name to the job specification.
- Parameters:
job_spec (dict[str, Any]) – Job specification dictionary to modify.
trainer (BuiltinTrainer | CustomTrainer | CustomTrainerContainer | None) – Optional trainer instance for context.
backend (RuntimeBackend) – Backend instance for validation and context.
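Every option in this module is a callable that receives the job specification, the trainer, and the backend, and modifies the specification in place. The following is a minimal sketch of that calling convention only. `NameSketch` is a hypothetical stand-in, not the SDK class, and storing the name under `metadata.name` is an assumption made for illustration.

```python
from dataclasses import dataclass
from typing import Any


@dataclass
class NameSketch:
    """Hypothetical stand-in that mimics the documented call signature."""

    name: str

    def __call__(self, job_spec: dict[str, Any], trainer: Any, backend: Any) -> None:
        # Assumption: the job name lives under metadata.name in the spec dict.
        job_spec.setdefault("metadata", {})["name"] = self.name


job_spec: dict[str, Any] = {}
option = NameSketch(name="my-train-job")

# The option mutates job_spec in place and returns None.
result = option(job_spec, trainer=None, backend=None)
print(job_spec)
```

The point of the sketch is the shape of the protocol: construct the option with its value, then call it with `(job_spec, trainer, backend)`.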
- class kubeflow.trainer.options.Labels(labels: dict[str, str]) -> None
Bases: object
Add labels to the TrainJob resource metadata (.metadata.labels).
- Supported backends:
Kubernetes
- Parameters:
labels (dict[str, str]) – Dictionary of label key-value pairs to add to TrainJob metadata.
- __call__(job_spec: dict[str, Any], trainer: CustomTrainer | BuiltinTrainer | None, backend: RuntimeBackend) -> None
Apply labels to the job specification.
- Parameters:
job_spec (dict[str, Any]) – Job specification dictionary to modify.
trainer (CustomTrainer | BuiltinTrainer | None) – Optional trainer instance for context.
backend (RuntimeBackend) – Backend instance for validation.
- Raises:
ValueError – If the backend does not support labels.
- class kubeflow.trainer.options.Annotations(annotations: dict[str, str]) -> None
Bases: object
Add annotations to the TrainJob resource metadata (.metadata.annotations).
- Supported backends:
Kubernetes
- Parameters:
annotations (dict[str, str]) – Dictionary of annotation key-value pairs to add to TrainJob metadata.
- __call__(job_spec: dict[str, Any], trainer: CustomTrainer | BuiltinTrainer | None, backend: RuntimeBackend) -> None
Apply annotations to the job specification.
- Parameters:
job_spec (dict[str, Any]) – Job specification dictionary to modify.
trainer (CustomTrainer | BuiltinTrainer | None) – Optional trainer instance for context.
backend (RuntimeBackend) – Backend instance for validation.
- Raises:
ValueError – If the backend does not support annotations.
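Labels and Annotations both write into the TrainJob metadata and both raise ValueError on an unsupported backend. The sketch below illustrates that behaviour with hypothetical stand-ins. The `*Sketch` classes are not part of the SDK, and deciding support with an `isinstance` check is an assumption, since this reference does not say how the real options validate the backend.

```python
from dataclasses import dataclass
from typing import Any


class KubernetesBackendSketch:
    """Hypothetical stand-in for a Kubernetes RuntimeBackend."""


class OtherBackendSketch:
    """Hypothetical stand-in for a backend without metadata support."""


@dataclass
class MetadataOptionSketch:
    """Merges key-value pairs into metadata[field] of the job spec."""

    field: str  # "labels" or "annotations"
    values: dict[str, str]

    def __call__(self, job_spec: dict[str, Any], trainer: Any, backend: Any) -> None:
        # Assumption: support is decided purely by the backend type.
        if not isinstance(backend, KubernetesBackendSketch):
            raise ValueError(
                f"{self.field} are not supported by {type(backend).__name__}"
            )
        metadata = job_spec.setdefault("metadata", {})
        metadata.setdefault(self.field, {}).update(self.values)


job_spec: dict[str, Any] = {}
backend = KubernetesBackendSketch()
MetadataOptionSketch("labels", {"team": "ml"})(job_spec, None, backend)
MetadataOptionSketch("annotations", {"owner": "alice"})(job_spec, None, backend)
print(job_spec)

try:
    MetadataOptionSketch("labels", {"team": "ml"})({}, None, OtherBackendSketch())
except ValueError as err:
    print(f"rejected: {err}")
```

The label and annotation keys used here (`team`, `owner`) are illustrative values only.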
- class kubeflow.trainer.options.TrainerCommand(command: list[str]) -> None
Bases: object
Override the trainer container command (.spec.trainer.command).
Can only be used with CustomTrainerContainer. CustomTrainer generates its own command from the function, and BuiltinTrainer uses pre-configured commands.
- Supported backends:
Kubernetes
- __call__(job_spec: dict[str, Any], trainer: CustomTrainer | BuiltinTrainer | CustomTrainerContainer | None, backend: RuntimeBackend) -> None
Apply the trainer command override to the job specification.
- Parameters:
job_spec (dict[str, Any]) – The job specification to modify.
trainer (CustomTrainer | BuiltinTrainer | CustomTrainerContainer | None) – Optional trainer context for validation.
backend (RuntimeBackend) – Backend instance for validation.
- Raises:
ValueError – If the backend does not support command overrides or the trainer type conflicts with this option.
- class kubeflow.trainer.options.TrainerArgs(args: list[str]) -> None
Bases: object
Override the trainer container arguments (.spec.trainer.args).
Can only be used with CustomTrainerContainer. CustomTrainer generates its own arguments from the function, and BuiltinTrainer uses pre-configured arguments.
- Supported backends:
Kubernetes
- __call__(job_spec: dict[str, Any], trainer: CustomTrainer | BuiltinTrainer | CustomTrainerContainer | None, backend: RuntimeBackend) -> None
Apply the trainer args override to the job specification.
- Parameters:
job_spec (dict[str, Any]) – The job specification to modify.
trainer (CustomTrainer | BuiltinTrainer | CustomTrainerContainer | None) – Optional trainer context for validation.
backend (RuntimeBackend) – Backend instance for validation.
- Raises:
ValueError – If the backend does not support argument overrides or the trainer type conflicts with this option.
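TrainerCommand and TrainerArgs are restricted to CustomTrainerContainer, because the other trainer types produce their own command and arguments. A rough sketch of that trainer-type check follows. All class names ending in `Sketch` are hypothetical stand-ins, and rejecting `trainer=None` is an assumption: this reference does not state how the real options treat a missing trainer.

```python
from dataclasses import dataclass
from typing import Any


@dataclass
class CustomTrainerContainerSketch:
    """Hypothetical stand-in for CustomTrainerContainer."""

    image: str


@dataclass
class CustomTrainerSketch:
    """Hypothetical stand-in for CustomTrainer."""

    func_name: str


@dataclass
class TrainerOverrideSketch:
    """Writes a list of strings to spec.trainer[field] ("command" or "args")."""

    field: str
    values: list[str]

    def __call__(self, job_spec: dict[str, Any], trainer: Any, backend: Any) -> None:
        # Assumption: anything other than the container trainer is rejected,
        # including None.
        if not isinstance(trainer, CustomTrainerContainerSketch):
            raise ValueError(
                f"{self.field} override requires CustomTrainerContainer, "
                f"got {type(trainer).__name__}"
            )
        trainer_spec = job_spec.setdefault("spec", {}).setdefault("trainer", {})
        trainer_spec[self.field] = list(self.values)


job_spec: dict[str, Any] = {}
container_trainer = CustomTrainerContainerSketch(image="example.com/train:latest")
TrainerOverrideSketch("command", ["python", "train.py"])(job_spec, container_trainer, None)
TrainerOverrideSketch("args", ["--epochs", "3"])(job_spec, container_trainer, None)
print(job_spec)

try:
    TrainerOverrideSketch("command", ["python"])({}, CustomTrainerSketch("fn"), None)
except ValueError as err:
    print(f"rejected: {err}")
```

The image name, command, and arguments are placeholder values chosen for the example.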
- class kubeflow.trainer.options.RuntimePatch(training_runtime_spec: TrainingRuntimeSpecPatch | None = None) -> None
Bases: object
Add runtime patches to the TrainJob (.spec.runtimePatches).
Runtime patches allow controllers, admission webhooks, and custom clients to attach structured patches to a TrainJob without conflicting with each other. Each patch is keyed by a unique manager field, which the SDK automatically sets to “trainer.kubeflow.org/kubeflow-sdk”.
- Supported backends:
Kubernetes
- Parameters:
training_runtime_spec (TrainingRuntimeSpecPatch | None) – Allowed patches for ClusterTrainingRuntime or TrainingRuntime-based jobs.
- training_runtime_spec: TrainingRuntimeSpecPatch | None = None
- __call__(job_spec: dict[str, Any], trainer: CustomTrainer | BuiltinTrainer | None, backend: RuntimeBackend) -> None
Apply the runtime patch to the job specification.
- Parameters:
job_spec (dict[str, Any]) – Job specification dictionary to modify.
trainer (CustomTrainer | BuiltinTrainer | None) – Optional trainer instance for context.
backend (RuntimeBackend) – Backend instance for validation.
- Raises:
ValueError – If the backend does not support runtime patches.
- class kubeflow.trainer.options.TrainingRuntimeSpecPatch(template: JobSetTemplatePatch | None = None) -> None
Bases: object
Configuration for patching the TrainingRuntime spec.
- Parameters:
template (JobSetTemplatePatch | None) – JobSet template patches.
- template: JobSetTemplatePatch | None = None
- class kubeflow.trainer.options.JobSetTemplatePatch(metadata: dict | None = None, spec: JobSetSpecPatch | None = None) -> None
Bases: object
Configuration for patching the JobSet template.
- Parameters:
metadata (dict | None) – Metadata patches (labels, annotations) for the JobSet.
spec (JobSetSpecPatch | None) – JobSet spec patches.
- spec: JobSetSpecPatch | None = None
- class kubeflow.trainer.options.JobSetSpecPatch(replicated_jobs: list[ReplicatedJobPatch] | None = None) -> None
Bases: object
Configuration for patching the JobSet spec.
- Parameters:
replicated_jobs (list[ReplicatedJobPatch] | None) – Per-job patches, each identified by the name of the replicated job it targets.
- replicated_jobs: list[ReplicatedJobPatch] | None = None
- class kubeflow.trainer.options.ReplicatedJobPatch(name: str, template: JobTemplatePatch | None = None) -> None
Bases: object
Configuration for patching a specific replicated job within the JobSet.
- Parameters:
name (str) – Name of the replicated job to patch (e.g. “node”, “launcher”).
template (JobTemplatePatch | None) – Job template patches.
- template: JobTemplatePatch | None = None
- class kubeflow.trainer.options.JobTemplatePatch(metadata: dict | None = None, spec: JobSpecPatch | None = None) -> None
Bases: object
Configuration for patching a Job template within a replicated job.
- Parameters:
metadata (dict | None) – Metadata patches (labels, annotations) for the Job template.
spec (JobSpecPatch | None) – Job spec patches.
- spec: JobSpecPatch | None = None
- class kubeflow.trainer.options.JobSpecPatch(template: PodTemplatePatch | None = None) -> None
Bases: object
Configuration for patching the Job spec.
- Parameters:
template (PodTemplatePatch | None) – Pod template patches for this Job.
- template: PodTemplatePatch | None = None
- class kubeflow.trainer.options.PodTemplatePatch(metadata: dict | None = None, spec: PodSpecPatch | None = None) -> None
Bases: object
Configuration for patching a Pod template within a Job.
- Parameters:
metadata (dict | None) – Metadata patches (labels, annotations) for the Pod template.
spec (PodSpecPatch | None) – Pod spec patches.
- spec: PodSpecPatch | None = None
- class kubeflow.trainer.options.PodSpecPatch(service_account_name: str | None = None, volumes: list[dict] | None = None, init_containers: list[ContainerPatch] | None = None, containers: list[ContainerPatch] | None = None, image_pull_secrets: list[dict] | None = None, security_context: dict | None = None, node_selector: dict[str, str] | None = None, affinity: dict | None = None, tolerations: list[dict] | None = None, scheduling_gates: list[dict] | None = None) -> None
Bases: object
Configuration for patching pod spec fields that managers are permitted to set.
- Parameters:
service_account_name (str | None) – Service account to use for the pods.
volumes (list[dict] | None) – Volumes to add/merge with the pod.
init_containers (list[ContainerPatch] | None) – Init containers to add/merge with the pod.
containers (list[ContainerPatch] | None) – Containers to add/merge with the pod.
image_pull_secrets (list[dict] | None) – Image pull secrets for the pods.
security_context (dict | None) – Pod-level security context.
node_selector (dict[str, str] | None) – Node selector to place pods on specific nodes.
affinity (dict | None) – Affinity rules for pod scheduling.
tolerations (list[dict] | None) – Tolerations for pod scheduling.
scheduling_gates (list[dict] | None) – Scheduling gates for the pods.
- init_containers: list[ContainerPatch] | None = None
- containers: list[ContainerPatch] | None = None
- class kubeflow.trainer.options.ContainerPatch(name: str, env: list[dict] | None = None, volume_mounts: list[dict] | None = None, security_context: dict | None = None) -> None
Bases: object
Configuration for patching a specific container in a pod.
- Parameters:
name (str) – Name of the container to patch (must exist in the Runtime).
env (list[dict] | None) – Environment variables to add/merge with the container. Each dict should have ‘name’ and ‘value’ or ‘valueFrom’ keys.
volume_mounts (list[dict] | None) – Volume mounts to add/merge with the container. Each dict should have ‘name’ and ‘mountPath’ keys at minimum.
security_context (dict | None) – Security context for the container.
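The patch classes nest in a fixed order: TrainingRuntimeSpecPatch, JobSetTemplatePatch, JobSetSpecPatch, ReplicatedJobPatch, JobTemplatePatch, JobSpecPatch, PodTemplatePatch, PodSpecPatch, ContainerPatch. The sketch below shows that nesting by setting a node selector and one environment variable on a replicated job named “node”. To stay runnable without the SDK it defines local dataclass stand-ins that copy the constructor signatures documented above (PodSpecPatch is trimmed to two fields). The container name, label key, and environment variable are hypothetical, and the `prune` helper is an illustration, not an SDK function.

```python
from __future__ import annotations

from dataclasses import asdict, dataclass
from typing import Any

# Local stand-ins mirroring the documented signatures; not the SDK classes.


@dataclass
class ContainerPatch:
    name: str
    env: list[dict] | None = None
    volume_mounts: list[dict] | None = None
    security_context: dict | None = None


@dataclass
class PodSpecPatch:  # trimmed to two of the documented fields
    containers: list[ContainerPatch] | None = None
    node_selector: dict[str, str] | None = None


@dataclass
class PodTemplatePatch:
    metadata: dict | None = None
    spec: PodSpecPatch | None = None


@dataclass
class JobSpecPatch:
    template: PodTemplatePatch | None = None


@dataclass
class JobTemplatePatch:
    metadata: dict | None = None
    spec: JobSpecPatch | None = None


@dataclass
class ReplicatedJobPatch:
    name: str
    template: JobTemplatePatch | None = None


@dataclass
class JobSetSpecPatch:
    replicated_jobs: list[ReplicatedJobPatch] | None = None


@dataclass
class JobSetTemplatePatch:
    metadata: dict | None = None
    spec: JobSetSpecPatch | None = None


@dataclass
class TrainingRuntimeSpecPatch:
    template: JobSetTemplatePatch | None = None


def prune(value: Any) -> Any:
    """Recursively drop None entries so only the patched fields remain."""
    if isinstance(value, dict):
        return {k: prune(v) for k, v in value.items() if v is not None}
    if isinstance(value, list):
        return [prune(v) for v in value]
    return value


# Hypothetical container name and env var, chosen for illustration.
container = ContainerPatch(name="node", env=[{"name": "LOG_LEVEL", "value": "debug"}])
pod_spec_patch = PodSpecPatch(containers=[container], node_selector={"gpu": "true"})
pod_template = PodTemplatePatch(spec=pod_spec_patch)
job_template = JobTemplatePatch(spec=JobSpecPatch(template=pod_template))
node_job = ReplicatedJobPatch(name="node", template=job_template)
patch = TrainingRuntimeSpecPatch(
    template=JobSetTemplatePatch(spec=JobSetSpecPatch(replicated_jobs=[node_job]))
)

pruned = prune(asdict(patch))
print(pruned)
```

With the SDK installed, the same construction would presumably use the classes imported from `kubeflow.trainer.options` and pass the result to `RuntimePatch(training_runtime_spec=...)`. The snake_case dictionary printed here reflects only the stand-ins, not the serialized TrainJob format.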