API Reference

Packages

inference.networking.x-k8s.io/v1alpha2

Package v1alpha2 contains API Schema definitions for the inference.networking.x-k8s.io API group.

Resource Types

Group

Underlying type: string

Group refers to a Kubernetes Group. It must be either an empty string or an RFC 1123 subdomain.

This validation is based on the corresponding Kubernetes validation: https://github.com/kubernetes/apimachinery/blob/02cfb53916346d085a6c6c7c66f882e3c6b0eca6/pkg/util/validation/validation.go#L208

Valid values include:

  • "" - empty string implies core Kubernetes API group
  • "gateway.networking.k8s.io"
  • "foo.example.com"

Invalid values include:

  • "example.com/bar" - "/" is an invalid character

Validation:

  • MaxLength: 253
  • Pattern: ^$|^[a-z0-9]([-a-z0-9]*[a-z0-9])?(\.[a-z0-9]([-a-z0-9]*[a-z0-9])?)*$

Appears in: PoolObjectReference
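
The Group constraints can be checked client-side before a manifest is submitted. The following is a minimal sketch, assuming only the MaxLength and Pattern values documented for this type; the helper name `is_valid_group` is hypothetical and not part of the API.

```python
import re

# Pattern and length limit copied from the Group validation rules.
GROUP_PATTERN = re.compile(
    r"^$|^[a-z0-9]([-a-z0-9]*[a-z0-9])?(\.[a-z0-9]([-a-z0-9]*[a-z0-9])?)*$"
)
GROUP_MAX_LENGTH = 253


def is_valid_group(value: str) -> bool:
    """Return True if value satisfies the documented Group constraints."""
    # fullmatch avoids accepting a trailing newline, which a bare "$" would allow.
    return len(value) <= GROUP_MAX_LENGTH and GROUP_PATTERN.fullmatch(value) is not None
```

With this sketch, the valid examples listed for Group ("", "gateway.networking.k8s.io", "foo.example.com") pass and "example.com/bar" is rejected.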

InferenceObjective

InferenceObjective is the Schema for the InferenceObjectives API.

Field Description Default Validation
apiVersion string inference.networking.x-k8s.io/v1alpha2
kind string InferenceObjective
metadata ObjectMeta Refer to Kubernetes API documentation for fields of metadata.
spec InferenceObjectiveSpec
status InferenceObjectiveStatus
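
As an illustration of how the fields above fit together, the sketch below builds an InferenceObjective manifest as a plain Python dictionary and prints it as JSON. The function `build_inference_objective` and the names "my-objective" and "my-pool" are hypothetical placeholders; only `poolRef.name` is set because the other `poolRef` fields have documented defaults.

```python
import json


def build_inference_objective(name, pool_name, priority=None):
    """Build a manifest dict for an InferenceObjective (illustrative helper)."""
    spec = {"poolRef": {"name": pool_name}}
    # priority has no default in the schema, so it is omitted when not given.
    if priority is not None:
        spec["priority"] = priority
    return {
        "apiVersion": "inference.networking.x-k8s.io/v1alpha2",
        "kind": "InferenceObjective",
        "metadata": {"name": name},
        "spec": spec,
    }


# Hypothetical object and pool names, used only for illustration.
manifest = build_inference_objective("my-objective", "my-pool", priority=10)
print(json.dumps(manifest, indent=2))
```

The status field is left out because it is populated by the controller rather than by the author of the manifest.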

InferenceObjectiveSpec

InferenceObjectiveSpec represents the desired state of a specific model use case. This resource is managed by the "Inference Workload Owner" persona.

The Inference Workload Owner persona is someone who trains, verifies, and leverages a large language model from a model frontend, drives the lifecycle and rollout of new versions of those models, and defines the specific performance and latency goals for the model. These workloads are expected to operate within an InferencePool, defined by the Inference Platform Admin, and to share compute capacity with other InferenceObjectives.

Appears in: InferenceObjective

Field Description Default Validation
priority integer Priority defines how important it is to serve the request compared to other requests in the same pool.
The higher the value, the more critical the request; negative values are allowed.
No default value is set for this field, which allows future fields to form a 'one of' with it.
However, implementations that consume this field (such as the Endpoint Picker) treat an unset value as 0.
Priority is used in flow control, primarily when resources are scarce and requests need to be queued.
All requests will be queued, and flow control always serves higher-priority requests first.
Fairness is only enforced and tracked between requests of the same priority.
Example: requests with a Priority of 10 are always served before requests with a Priority of 0 (the value used if Priority is unset or no InferenceObjective is specified).
Similarly, requests with a Priority of -10 are always served after requests with a Priority of 0.
poolRef PoolObjectReference PoolRef is a reference to the inference pool; the pool must exist in the same namespace. Required: {}
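
The priority rules described for this field can be sketched as a simple ordering function. This is not the Endpoint Picker's implementation; it only models two documented rules: an unset Priority is treated as 0, and higher-priority requests are served first. The `Request` class is hypothetical, and keeping arrival order within one priority level is an assumption standing in for whatever fairness policy an implementation applies.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Request:
    name: str
    priority: Optional[int] = None  # None models an unset Priority


def effective_priority(priority: Optional[int]) -> int:
    """Unset Priority is treated as 0 by consumers of the field."""
    return 0 if priority is None else priority


def serving_order(requests: List[Request]) -> List[Request]:
    """Order requests so that higher effective priority is served first."""
    # sorted() is stable, so requests of equal priority keep arrival order
    # (an assumption; real fairness policies may differ).
    return sorted(requests, key=lambda r: -effective_priority(r.priority))


queue = [
    Request("batch", -10),
    Request("default"),        # unset, treated as 0
    Request("critical", 10),
    Request("explicit-zero", 0),
]
print([r.name for r in serving_order(queue)])
```

Here "critical" is ordered first, the two priority-0 requests follow in arrival order, and "batch" comes last.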

InferenceObjectiveStatus

InferenceObjectiveStatus defines the observed state of an InferenceObjective.

Appears in: InferenceObjective

Field Description Default Validation
conditions Condition array Conditions track the state of the InferenceObjective.
Known condition types are:
  • "Accepted"
[{"type": "Ready", "status": "Unknown", "reason": "Pending", "message": "Waiting for controller", "lastTransitionTime": "1970-01-01T00:00:00Z"}] MaxItems: 8
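
A client reading this status might look up a condition by type. The sketch below assumes the status has already been loaded as a Python dictionary and that condition entries follow the usual Kubernetes shape, with `type` and `status` keys and "True", "False", or "Unknown" as status values; the helper names are hypothetical.

```python
from typing import Optional


def find_condition(status: dict, condition_type: str) -> Optional[dict]:
    """Return the first condition with the given type, or None if absent."""
    for condition in status.get("conditions", []):
        if condition.get("type") == condition_type:
            return condition
    return None


def is_accepted(status: dict) -> bool:
    """True only when an "Accepted" condition is present with status "True"."""
    condition = find_condition(status, "Accepted")
    return condition is not None and condition.get("status") == "True"


# The documented default value: a single "Ready" condition in state Unknown.
default_status = {
    "conditions": [
        {
            "type": "Ready",
            "status": "Unknown",
            "reason": "Pending",
            "message": "Waiting for controller",
            "lastTransitionTime": "1970-01-01T00:00:00Z",
        }
    ]
}
print(is_accepted(default_status))
```

With only the default condition present, no "Accepted" entry exists yet, so the sketch reports the object as not accepted.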

Kind

Underlying type: string

Kind refers to a Kubernetes Kind.

Valid values include:

  • "Service"
  • "HTTPRoute"

Invalid values include:

  • "invalid/kind" - "/" is an invalid character

Validation:

  • MaxLength: 63
  • MinLength: 1
  • Pattern: ^[a-zA-Z]([-a-zA-Z0-9]*[a-zA-Z0-9])?$

Appears in: PoolObjectReference
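
The Kind constraints can be checked the same way as any pattern-plus-length rule. A minimal sketch, assuming only the MinLength, MaxLength, and Pattern values documented for this type; `is_valid_kind` is a hypothetical helper name.

```python
import re

# Pattern and length limits copied from the Kind validation rules.
KIND_PATTERN = re.compile(r"^[a-zA-Z]([-a-zA-Z0-9]*[a-zA-Z0-9])?$")
KIND_MIN_LENGTH = 1
KIND_MAX_LENGTH = 63


def is_valid_kind(value: str) -> bool:
    """Return True if value satisfies the documented Kind constraints."""
    if not KIND_MIN_LENGTH <= len(value) <= KIND_MAX_LENGTH:
        return False
    return KIND_PATTERN.fullmatch(value) is not None
```

Unlike Group, an empty string is not a valid Kind, because MinLength is 1 and the pattern requires a leading letter.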

ObjectName

Underlying type: string

ObjectName refers to the name of a Kubernetes object. Object names can have a variety of forms, including RFC 1123 subdomains, RFC 1123 labels, or RFC 1035 labels.

Validation:

  • MaxLength: 253
  • MinLength: 1

Appears in: PoolObjectReference

PoolObjectReference

PoolObjectReference identifies an API object within the namespace of the referrer.

Appears in: InferenceObjectiveSpec

Field Description Default Validation
group Group Group is the group of the referent. inference.networking.k8s.io MaxLength: 253
Pattern: ^$|^[a-z0-9]([-a-z0-9]*[a-z0-9])?(\.[a-z0-9]([-a-z0-9]*[a-z0-9])?)*$
kind Kind Kind is the kind of the referent, for example "InferencePool". InferencePool MaxLength: 63
MinLength: 1
Pattern: ^[a-zA-Z]([-a-zA-Z0-9]*[a-zA-Z0-9])?$
name ObjectName Name is the name of the referent. MaxLength: 253
MinLength: 1
Required: {}
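
The defaults and constraints in the table above can be combined into a single client-side check. The sketch below is illustrative only: the API server applies defaulting and validation itself, and `normalize_pool_ref` is a hypothetical helper that mirrors the documented values so that mistakes surface before a manifest is submitted.

```python
import re

# Patterns, limits, and defaults copied from the PoolObjectReference table.
GROUP_PATTERN = re.compile(
    r"^$|^[a-z0-9]([-a-z0-9]*[a-z0-9])?(\.[a-z0-9]([-a-z0-9]*[a-z0-9])?)*$"
)
KIND_PATTERN = re.compile(r"^[a-zA-Z]([-a-zA-Z0-9]*[a-zA-Z0-9])?$")
DEFAULT_GROUP = "inference.networking.k8s.io"
DEFAULT_KIND = "InferencePool"


def normalize_pool_ref(ref: dict) -> dict:
    """Apply the documented defaults, then check the documented constraints."""
    # An explicit empty group is valid, so defaults apply only to absent keys.
    group = ref.get("group", DEFAULT_GROUP)
    kind = ref.get("kind", DEFAULT_KIND)
    name = ref.get("name")

    if name is None:
        raise ValueError("name is required")
    if not 1 <= len(name) <= 253:
        raise ValueError("name must be 1 to 253 characters long")
    if len(group) > 253 or GROUP_PATTERN.fullmatch(group) is None:
        raise ValueError(f"invalid group: {group!r}")
    if not 1 <= len(kind) <= 63 or KIND_PATTERN.fullmatch(kind) is None:
        raise ValueError(f"invalid kind: {kind!r}")
    return {"group": group, "kind": kind, "name": name}


# "my-pool" is a hypothetical pool name used only for illustration.
print(normalize_pool_ref({"name": "my-pool"}))
```

Because group and kind both have defaults, a reference that sets only name resolves to an InferencePool in the inference.networking.k8s.io group.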