API Reference
Packages
inference.networking.x-k8s.io/v1alpha2
Package v1alpha2 contains API Schema definitions for the inference.networking.x-k8s.io API group.
Resource Types
Group
Underlying type: string
Group refers to a Kubernetes Group. It must be either an empty string or an RFC 1123 subdomain.
This validation is based on the corresponding Kubernetes validation: https://github.com/kubernetes/apimachinery/blob/02cfb53916346d085a6c6c7c66f882e3c6b0eca6/pkg/util/validation/validation.go#L208
Valid values include:
- "" - empty string implies core Kubernetes API group
- "gateway.networking.k8s.io"
- "foo.example.com"
Invalid values include:
- "example.com/bar" - "/" is an invalid character
Validation:
- MaxLength: 253
- Pattern: ^$|^[a-z0-9]([-a-z0-9]*[a-z0-9])?(\.[a-z0-9]([-a-z0-9]*[a-z0-9])?)*$
Appears in:
- PoolObjectReference
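The constraints above can be checked client-side before a manifest is submitted. The following is a minimal sketch, not the API server's own implementation; the function name `is_valid_group` is hypothetical, while the pattern and length limit are copied from the Validation list.

```python
import re

# Pattern and length limit copied from the Validation list above.
GROUP_PATTERN = re.compile(
    r"^$|^[a-z0-9]([-a-z0-9]*[a-z0-9])?(\.[a-z0-9]([-a-z0-9]*[a-z0-9])?)*$"
)
GROUP_MAX_LENGTH = 253


def is_valid_group(value: str) -> bool:
    """Return True if value satisfies the documented Group constraints.

    Hypothetical helper for illustration only.
    """
    if len(value) > GROUP_MAX_LENGTH:
        return False
    # fullmatch requires the whole string to match, so a trailing newline
    # (which "$" alone would tolerate in Python) is rejected.
    return GROUP_PATTERN.fullmatch(value) is not None
```

With this sketch, `is_valid_group("")` and `is_valid_group("foo.example.com")` are accepted, while `is_valid_group("example.com/bar")` is rejected because of the "/" character.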
InferenceObjective
InferenceObjective is the Schema for the InferenceObjectives API.
| Field | Description | Default | Validation |
|---|---|---|---|
| `apiVersion` _string_ | inference.networking.x-k8s.io/v1alpha2 | | |
| `kind` _string_ | InferenceObjective | | |
| `metadata` _ObjectMeta_ | Refer to Kubernetes API documentation for fields of metadata. | | |
| `spec` _InferenceObjectiveSpec_ | | | |
| `status` _InferenceObjectiveStatus_ | | | |
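To show how the top-level fields fit together, here is a minimal sketch that assembles an InferenceObjective manifest as a Python dict and prints it as JSON. The helper `build_inference_objective` and the names `chat-objective`, `default`, and `my-pool` are hypothetical placeholders. The sketch leaves out `status` on the assumption that it is populated by a controller rather than written by the user.

```python
import json
from typing import Optional


def build_inference_objective(
    name: str, namespace: str, pool_name: str, priority: Optional[int] = None
) -> dict:
    """Hypothetical helper: assemble an InferenceObjective manifest as a dict."""
    spec = {
        # poolRef defaults are taken from the PoolObjectReference table.
        "poolRef": {
            "group": "inference.networking.k8s.io",
            "kind": "InferencePool",
            "name": pool_name,
        }
    }
    # priority has no default in the schema, so it is omitted when unset.
    if priority is not None:
        spec["priority"] = priority
    return {
        "apiVersion": "inference.networking.x-k8s.io/v1alpha2",
        "kind": "InferenceObjective",
        "metadata": {"name": name, "namespace": namespace},
        "spec": spec,
    }


if __name__ == "__main__":
    # Placeholder names, for illustration only.
    manifest = build_inference_objective(
        "chat-objective", "default", "my-pool", priority=10
    )
    print(json.dumps(manifest, indent=2))
```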
InferenceObjectiveSpec
InferenceObjectiveSpec represents the desired state of a specific model use case. This resource is managed by the "Inference Workload Owner" persona.
The Inference Workload Owner persona is someone who trains, verifies, and leverages a large language model from a model frontend, drives the lifecycle and rollout of new versions of those models, and defines the specific performance and latency goals for the model. These workloads are expected to operate within an InferencePool, which is defined by the Inference Platform Admin, and to share its compute capacity with other InferenceObjectives.
Appears in:
- InferenceObjective
| Field | Description | Default | Validation | 
|---|---|---|---|
| `priority` _integer_ | Priority defines how important it is to serve the request compared to other requests in the same pool. It is an integer; the higher the value, the more critical the request, and negative values are allowed. No default value is set for this field, allowing for future additions of new fields that may 'one of' with this field. However, implementations that consume this field (such as the Endpoint Picker) will treat an unset value as '0'. Priority is used in flow control, primarily in the event of resource scarcity (when requests need to be queued). All requests will be queued, and flow control will always allow requests of higher priority to be served first. Fairness is only enforced and tracked between requests of the same priority. Example: requests with a Priority of 10 will always be served before requests with a Priority of 0 (the value used if Priority is unset or no InferenceObjective is specified). Similarly, requests with a Priority of -10 will always be served after requests with a Priority of 0. | | |
| `poolRef` _PoolObjectReference_ | PoolRef is a reference to the inference pool. The pool must exist in the same namespace. | | Required: {} |
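The ordering rule in the `priority` description can be illustrated with a small queue. This is a sketch under stated assumptions, not the Endpoint Picker's flow control: the class `PriorityQueueSketch` and the function `effective_priority` are hypothetical, and first-in-first-out order within a priority level stands in for the fairness policy, which this reference does not specify.

```python
import heapq
import itertools
from typing import Optional


def effective_priority(priority: Optional[int]) -> int:
    """An unset priority is treated as 0, as the field description states."""
    return 0 if priority is None else priority


class PriorityQueueSketch:
    """Serves higher-priority requests first.

    Assumption: FIFO order among equal priorities, as a simple stand-in
    for the real fairness policy.
    """

    def __init__(self) -> None:
        self._heap = []
        self._counter = itertools.count()

    def enqueue(self, request_id: str, priority: Optional[int] = None) -> None:
        # heapq is a min-heap, so the priority is negated to pop the
        # highest value first; the counter breaks ties in arrival order.
        entry = (-effective_priority(priority), next(self._counter), request_id)
        heapq.heappush(self._heap, entry)

    def dequeue(self) -> str:
        return heapq.heappop(self._heap)[2]
```

Enqueuing requests with priorities -10, unset, 10, and 0 in that order and then draining the queue yields the priority-10 request first, then the unset and priority-0 requests in arrival order, and the priority -10 request last.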
InferenceObjectiveStatus
InferenceObjectiveStatus defines the observed state of InferenceObjective.
Appears in:
- InferenceObjective
| Field | Description | Default | Validation | 
|---|---|---|---|
| `conditions` _Condition array_ | Conditions track the state of the InferenceObjective. Known condition types are: "Accepted". | [map[lastTransitionTime:1970-01-01T00:00:00Z message:Waiting for controller reason:Pending status:Unknown type:Ready]] | MaxItems: 8 |
Kind
Underlying type: string
Kind refers to a Kubernetes Kind.
Valid values include:
- "Service"
- "HTTPRoute"
Invalid values include:
- "invalid/kind" - "/" is an invalid character
Validation:
- MaxLength: 63
- MinLength: 1
- Pattern: ^[a-zA-Z]([-a-zA-Z0-9]*[a-zA-Z0-9])?$
Appears in:
- PoolObjectReference
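As with Group, the Kind constraints can be checked before submitting a manifest. This is a minimal sketch with a hypothetical function name, `is_valid_kind`; the pattern and length limits are copied from the Validation list above.

```python
import re

# Pattern and length limits copied from the Validation list above.
KIND_PATTERN = re.compile(r"^[a-zA-Z]([-a-zA-Z0-9]*[a-zA-Z0-9])?$")
KIND_MIN_LENGTH = 1
KIND_MAX_LENGTH = 63


def is_valid_kind(value: str) -> bool:
    """Return True if value satisfies the documented Kind constraints.

    Hypothetical helper for illustration only.
    """
    if not (KIND_MIN_LENGTH <= len(value) <= KIND_MAX_LENGTH):
        return False
    # fullmatch requires the whole string to match the pattern.
    return KIND_PATTERN.fullmatch(value) is not None
```

Here `is_valid_kind("HTTPRoute")` is accepted, while `is_valid_kind("invalid/kind")` and the empty string are rejected.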
ObjectName
Underlying type: string
ObjectName refers to the name of a Kubernetes object. Object names can have a variety of forms, including RFC 1123 subdomains, RFC 1123 labels, or RFC 1035 labels.
Validation:
- MaxLength: 253
- MinLength: 1
Appears in:
- PoolObjectReference
PoolObjectReference
PoolObjectReference identifies an API object within the namespace of the referrer.
Appears in:
- InferenceObjectiveSpec
| Field | Description | Default | Validation | 
|---|---|---|---|
| `group` _Group_ | Group is the group of the referent. | inference.networking.k8s.io | MaxLength: 253 Pattern: ^$\|^[a-z0-9]([-a-z0-9]*[a-z0-9])?(\.[a-z0-9]([-a-z0-9]*[a-z0-9])?)*$ |
| `kind` _Kind_ | Kind is the kind of the referent. For example "InferencePool". | InferencePool | MaxLength: 63 MinLength: 1 Pattern: ^[a-zA-Z]([-a-zA-Z0-9]*[a-zA-Z0-9])?$ |
| `name` _ObjectName_ | Name is the name of the referent. | | MaxLength: 253 MinLength: 1 Required: {} |
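The table's defaults and validation rules can be combined into one small model. The following sketch is illustrative only: the dataclass mirrors the field names and defaults from the table, while the `validate` method and its error messages are hypothetical and do not reproduce the API server's output.

```python
import re
from dataclasses import dataclass
from typing import List

# Patterns copied from the Validation column of the table above.
GROUP_RE = re.compile(
    r"^$|^[a-z0-9]([-a-z0-9]*[a-z0-9])?(\.[a-z0-9]([-a-z0-9]*[a-z0-9])?)*$"
)
KIND_RE = re.compile(r"^[a-zA-Z]([-a-zA-Z0-9]*[a-zA-Z0-9])?$")


@dataclass
class PoolObjectReference:
    name: str  # Required: no default.
    group: str = "inference.networking.k8s.io"  # Default from the table.
    kind: str = "InferencePool"  # Default from the table.

    def validate(self) -> List[str]:
        """Return a list of hypothetical error messages; empty means valid."""
        errors = []
        if len(self.group) > 253 or not GROUP_RE.fullmatch(self.group):
            errors.append("group: must be empty or an RFC 1123 subdomain")
        if not 1 <= len(self.kind) <= 63 or not KIND_RE.fullmatch(self.kind):
            errors.append("kind: must be 1-63 characters matching the Kind pattern")
        if not 1 <= len(self.name) <= 253:
            errors.append("name: must be 1-253 characters")
        return errors
```

For example, `PoolObjectReference(name="my-pool").validate()` returns an empty list because the defaults satisfy the group and kind rules, whereas an empty `name` produces one error.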