Shaders

A shader specifies programmable operations that execute for each vertex, control point, tessellated vertex, primitive, fragment, or workgroup in the corresponding stage(s) of the graphics and compute pipelines.

Graphics pipelines include vertex shader execution as a result of primitive assembly, followed, if enabled, by tessellation control and evaluation shaders operating on patches, geometry shaders, if enabled, operating on primitives, and fragment shaders, if present, operating on fragments generated by Rasterization. In this specification, vertex, tessellation control, tessellation evaluation and geometry shaders are collectively referred to as pre-rasterization shader stages and occur in the logical pipeline before rasterization. The fragment shader occurs logically after rasterization.

Only the compute shader stage is included in a compute pipeline. Compute shaders operate on compute invocations in a workgroup.

Shaders can read from input variables, and read from and write to output variables. Input and output variables can be used to transfer data between shader stages, or to allow the shader to interact with values that exist in the execution environment. Similarly, the execution environment provides constants describing capabilities.

Shader variables are associated with execution environment-provided inputs and outputs using built-in decorations in the shader. The available decorations for each stage are documented in the following subsections.

Shader Objects

Shaders may be compiled and linked into pipeline objects as described in the Pipelines chapter, or, if the shaderObject feature is enabled, they may be compiled into individual per-stage shader objects which can be bound on a command buffer independently from one another. Unlike pipelines, shader objects are not intrinsically tied to any specific set of state. Instead, state is specified dynamically in the command buffer.

Each shader object represents a single compiled shader stage, which may optionally be linked with one or more other stages.

Shader objects are represented by VkShaderEXT handles:

// Provided by VK_EXT_shader_object
VK_DEFINE_NON_DISPATCHABLE_HANDLE(VkShaderEXT)

Shader Object Creation

Shader objects may be created from shader code provided as SPIR-V, or in an opaque, implementation-defined binary format specific to the physical device.

To create one or more shader objects, call:

// Provided by VK_EXT_shader_object
VkResult vkCreateShadersEXT(
    VkDevice                                    device,
    uint32_t                                    createInfoCount,
    const VkShaderCreateInfoEXT*                pCreateInfos,
    const VkAllocationCallbacks*                pAllocator,
    VkShaderEXT*                                pShaders);

device is the logical device that creates the shader objects.
createInfoCount is the length of the pCreateInfos and pShaders arrays.
pCreateInfos is a pointer to an array of VkShaderCreateInfoEXT structures.
pAllocator controls host memory allocation as described in the Memory Allocation chapter.
pShaders is a pointer to an array of VkShaderEXT handles in which the resulting shader objects are returned.

When this function returns, whether or not it succeeds, it is guaranteed that every element of pShaders will have been overwritten by either VK_NULL_HANDLE or a valid VkShaderEXT handle.

This means that whenever shader creation fails, the application can determine which shader the returned error pertains to by locating the first VK_NULL_HANDLE element in pShaders. It also means that an application can reliably clean up from a failed call by iterating over the pShaders array and destroying every element that is not VK_NULL_HANDLE.
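
The cleanup pattern described above might be sketched as follows. To keep the sketch compilable without the Vulkan headers, SketchHandle (a uint64_t in which 0 plays the role of VK_NULL_HANDLE) and the destroy callback are hypothetical stand-ins, not part of the API; real code would operate on VkShaderEXT values and would typically call vkDestroyShaderEXT for each non-null element.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for VkShaderEXT; 0 plays the role of VK_NULL_HANDLE. */
typedef uint64_t SketchHandle;
#define SKETCH_NULL_HANDLE ((SketchHandle)0)

/* Returns the index of the first null handle, or count if there is none.
 * After a failed creation call, this index identifies the element of
 * pCreateInfos that the returned error pertains to. */
size_t first_null_handle(const SketchHandle *shaders, size_t count)
{
    for (size_t i = 0; i < count; ++i) {
        if (shaders[i] == SKETCH_NULL_HANDLE)
            return i;
    }
    return count;
}

/* Destroys every non-null element, resets it to null, and returns how many
 * handles were released. destroy stands in for a call to vkDestroyShaderEXT;
 * it may be NULL, in which case the handles are only cleared. */
size_t destroy_created_shaders(SketchHandle *shaders, size_t count,
                               void (*destroy)(SketchHandle))
{
    size_t destroyed = 0;
    for (size_t i = 0; i < count; ++i) {
        if (shaders[i] != SKETCH_NULL_HANDLE) {
            if (destroy != NULL)
                destroy(shaders[i]);
            shaders[i] = SKETCH_NULL_HANDLE;
            ++destroyed;
        }
    }
    return destroyed;
}
```

Because every element is guaranteed to be either null or valid on return, the loop does not need to know which element failed or how many succeeded.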

Valid Usage

VUID-vkCreateShadersEXT-stage-09670
If the stage member of any element of pCreateInfos is VK_SHADER_STAGE_COMPUTE_BIT, device must support at least one queue family with the VK_QUEUE_COMPUTE_BIT capability
VUID-vkCreateShadersEXT-stage-09671
If the stage member of any element of pCreateInfos is VK_SHADER_STAGE_TASK_BIT_EXT, VK_SHADER_STAGE_MESH_BIT_EXT, VK_SHADER_STAGE_VERTEX_BIT, VK_SHADER_STAGE_TESSELLATION_CONTROL_BIT, VK_SHADER_STAGE_TESSELLATION_EVALUATION_BIT, VK_SHADER_STAGE_GEOMETRY_BIT, or VK_SHADER_STAGE_FRAGMENT_BIT, device must support at least one queue family with the VK_QUEUE_GRAPHICS_BIT capability
VUID-vkCreateShadersEXT-None-08400
The shaderObject feature must be enabled
VUID-vkCreateShadersEXT-pCreateInfos-08402
If the flags member of any element of pCreateInfos includes VK_SHADER_CREATE_LINK_STAGE_BIT_EXT, the flags member of all other elements of pCreateInfos whose stage is VK_SHADER_STAGE_VERTEX_BIT, VK_SHADER_STAGE_TESSELLATION_CONTROL_BIT, VK_SHADER_STAGE_TESSELLATION_EVALUATION_BIT, VK_SHADER_STAGE_GEOMETRY_BIT, or VK_SHADER_STAGE_FRAGMENT_BIT must also include VK_SHADER_CREATE_LINK_STAGE_BIT_EXT
VUID-vkCreateShadersEXT-pCreateInfos-08403
If the flags member of any element of pCreateInfos includes VK_SHADER_CREATE_LINK_STAGE_BIT_EXT, the flags member of all other elements of pCreateInfos whose stage is VK_SHADER_STAGE_TASK_BIT_EXT or VK_SHADER_STAGE_MESH_BIT_EXT must also include VK_SHADER_CREATE_LINK_STAGE_BIT_EXT
VUID-vkCreateShadersEXT-pCreateInfos-08404
If the flags member of any element of pCreateInfos whose stage is VK_SHADER_STAGE_TASK_BIT_EXT or VK_SHADER_STAGE_MESH_BIT_EXT includes VK_SHADER_CREATE_LINK_STAGE_BIT_EXT, there must be no member of pCreateInfos whose stage is VK_SHADER_STAGE_VERTEX_BIT and whose flags member includes VK_SHADER_CREATE_LINK_STAGE_BIT_EXT
VUID-vkCreateShadersEXT-pCreateInfos-08405
If there is any element of pCreateInfos whose stage is VK_SHADER_STAGE_MESH_BIT_EXT and whose flags member includes both VK_SHADER_CREATE_LINK_STAGE_BIT_EXT and VK_SHADER_CREATE_NO_TASK_SHADER_BIT_EXT, there must be no element of pCreateInfos whose stage is VK_SHADER_STAGE_TASK_BIT_EXT and whose flags member includes VK_SHADER_CREATE_LINK_STAGE_BIT_EXT
VUID-vkCreateShadersEXT-pCreateInfos-08409
For each element of pCreateInfos whose flags member includes VK_SHADER_CREATE_LINK_STAGE_BIT_EXT, if there is any other element of pCreateInfos whose stage is logically later than the stage of the former and whose flags member also includes VK_SHADER_CREATE_LINK_STAGE_BIT_EXT, the nextStage of the former must be equal to the stage of the element with the logically earliest stage following the stage of the former whose flags member also includes VK_SHADER_CREATE_LINK_STAGE_BIT_EXT
VUID-vkCreateShadersEXT-pCreateInfos-08410
The stage member of each element of pCreateInfos whose flags member includes VK_SHADER_CREATE_LINK_STAGE_BIT_EXT must be unique
VUID-vkCreateShadersEXT-pCreateInfos-08411
The codeType member of all elements of pCreateInfos whose flags member includes VK_SHADER_CREATE_LINK_STAGE_BIT_EXT must be the same
VUID-vkCreateShadersEXT-pCreateInfos-08867
If pCreateInfos contains elements with both VK_SHADER_STAGE_TESSELLATION_CONTROL_BIT and VK_SHADER_STAGE_TESSELLATION_EVALUATION_BIT, both elements' flags include VK_SHADER_CREATE_LINK_STAGE_BIT_EXT, both elements' codeType is VK_SHADER_CODE_TYPE_SPIRV_EXT, and the VK_SHADER_STAGE_TESSELLATION_CONTROL_BIT stage’s pCode contains an OpExecutionMode instruction specifying the type of subdivision, it must match the subdivision type specified in the VK_SHADER_STAGE_TESSELLATION_EVALUATION_BIT stage
VUID-vkCreateShadersEXT-pCreateInfos-08868
If pCreateInfos contains elements with both VK_SHADER_STAGE_TESSELLATION_CONTROL_BIT and VK_SHADER_STAGE_TESSELLATION_EVALUATION_BIT, both elements' flags include VK_SHADER_CREATE_LINK_STAGE_BIT_EXT, both elements' codeType is VK_SHADER_CODE_TYPE_SPIRV_EXT, and the VK_SHADER_STAGE_TESSELLATION_CONTROL_BIT stage’s pCode contains an OpExecutionMode instruction specifying the orientation of triangles, it must match the triangle orientation specified in the VK_SHADER_STAGE_TESSELLATION_EVALUATION_BIT stage
VUID-vkCreateShadersEXT-pCreateInfos-08869
If pCreateInfos contains elements with both VK_SHADER_STAGE_TESSELLATION_CONTROL_BIT and VK_SHADER_STAGE_TESSELLATION_EVALUATION_BIT, both elements' flags include VK_SHADER_CREATE_LINK_STAGE_BIT_EXT, both elements' codeType is VK_SHADER_CODE_TYPE_SPIRV_EXT, and the VK_SHADER_STAGE_TESSELLATION_CONTROL_BIT stage’s pCode contains an OpExecutionMode instruction specifying PointMode, the VK_SHADER_STAGE_TESSELLATION_EVALUATION_BIT stage must also contain an OpExecutionMode instruction specifying PointMode
VUID-vkCreateShadersEXT-pCreateInfos-08870
If pCreateInfos contains elements with both VK_SHADER_STAGE_TESSELLATION_CONTROL_BIT and VK_SHADER_STAGE_TESSELLATION_EVALUATION_BIT, both elements' flags include VK_SHADER_CREATE_LINK_STAGE_BIT_EXT, both elements' codeType is VK_SHADER_CODE_TYPE_SPIRV_EXT, and the VK_SHADER_STAGE_TESSELLATION_CONTROL_BIT stage’s pCode contains an OpExecutionMode instruction specifying the spacing of segments on the edges of tessellated primitives, it must match the segment spacing specified in the VK_SHADER_STAGE_TESSELLATION_EVALUATION_BIT stage
VUID-vkCreateShadersEXT-pCreateInfos-08871
If pCreateInfos contains elements with both VK_SHADER_STAGE_TESSELLATION_CONTROL_BIT and VK_SHADER_STAGE_TESSELLATION_EVALUATION_BIT, both elements' flags include VK_SHADER_CREATE_LINK_STAGE_BIT_EXT, both elements' codeType is VK_SHADER_CODE_TYPE_SPIRV_EXT, and the VK_SHADER_STAGE_TESSELLATION_CONTROL_BIT stage’s pCode contains an OpExecutionMode instruction specifying the output patch size, it must match the output patch size specified in the VK_SHADER_STAGE_TESSELLATION_EVALUATION_BIT stage
VUID-vkCreateShadersEXT-pCreateInfos-09632
If pCreateInfos contains a VK_SHADER_STAGE_MESH_BIT_EXT with codeType of VK_SHADER_CODE_TYPE_SPIRV_EXT and VK_SHADER_CREATE_NO_TASK_SHADER_BIT_EXT is not set, then the mesh shader’s entry point must not declare a variable with a DrawIndex BuiltIn decoration
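
Two of the linking rules above (08410, unique stages among linked elements, and 08411, a single codeType among linked elements) lend themselves to a pre-flight check on the application side. The following is a minimal sketch under stated assumptions: SketchCreateInfo is a hypothetical reduced stand-in for VkShaderCreateInfoEXT that carries only the three members involved, and SKETCH_LINK_STAGE_BIT mirrors the value of VK_SHADER_CREATE_LINK_STAGE_BIT_EXT. It does not cover the remaining valid usage statements.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Mirrors the value of VK_SHADER_CREATE_LINK_STAGE_BIT_EXT. */
#define SKETCH_LINK_STAGE_BIT 0x00000001u

/* Hypothetical reduced stand-in for VkShaderCreateInfoEXT. */
typedef struct SketchCreateInfo {
    uint32_t flags;    /* VkShaderCreateFlagsEXT */
    uint32_t stage;    /* a single VkShaderStageFlagBits value */
    int      codeType; /* VkShaderCodeTypeEXT */
} SketchCreateInfo;

/* Returns true if the elements that request linking have pairwise distinct
 * stages (08410) and all share one codeType (08411). Elements without the
 * link bit are ignored. */
bool linked_set_is_consistent(const SketchCreateInfo *infos, size_t count)
{
    uint32_t seen_stages = 0;
    bool have_code_type = false;
    int code_type = 0;

    for (size_t i = 0; i < count; ++i) {
        if ((infos[i].flags & SKETCH_LINK_STAGE_BIT) == 0)
            continue;
        if ((seen_stages & infos[i].stage) != 0)
            return false; /* stage repeated among linked elements */
        seen_stages |= infos[i].stage;
        if (have_code_type && infos[i].codeType != code_type)
            return false; /* mixed code types among linked elements */
        code_type = infos[i].codeType;
        have_code_type = true;
    }
    return true;
}
```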

Valid Usage (Implicit)

VUID-vkCreateShadersEXT-device-parameter
device must be a valid VkDevice handle
VUID-vkCreateShadersEXT-pCreateInfos-parameter
pCreateInfos must be a valid pointer to an array of createInfoCount valid VkShaderCreateInfoEXT structures
VUID-vkCreateShadersEXT-pAllocator-parameter
If pAllocator is not NULL, pAllocator must be a valid pointer to a valid VkAllocationCallbacks structure
VUID-vkCreateShadersEXT-pShaders-parameter
pShaders must be a valid pointer to an array of createInfoCount VkShaderEXT handles
VUID-vkCreateShadersEXT-createInfoCount-arraylength
createInfoCount must be greater than 0

Return Codes

Success

VK_SUCCESS
VK_INCOMPATIBLE_SHADER_BINARY_EXT

Failure

VK_ERROR_OUT_OF_HOST_MEMORY
VK_ERROR_OUT_OF_DEVICE_MEMORY
VK_ERROR_INITIALIZATION_FAILED

The VkShaderCreateInfoEXT structure is defined as:

// Provided by VK_EXT_shader_object
typedef struct VkShaderCreateInfoEXT {
    VkStructureType                 sType;
    const void*                     pNext;
    VkShaderCreateFlagsEXT          flags;
    VkShaderStageFlagBits           stage;
    VkShaderStageFlags              nextStage;
    VkShaderCodeTypeEXT             codeType;
    size_t                          codeSize;
    const void*                     pCode;
    const char*                     pName;
    uint32_t                        setLayoutCount;
    const VkDescriptorSetLayout*    pSetLayouts;
    uint32_t                        pushConstantRangeCount;
    const VkPushConstantRange*      pPushConstantRanges;
    const VkSpecializationInfo*     pSpecializationInfo;
} VkShaderCreateInfoEXT;

sType is a VkStructureType value identifying this structure.
pNext is NULL or a pointer to a structure extending this structure.
flags is a bitmask of VkShaderCreateFlagBitsEXT describing additional parameters of the shader.
stage is a VkShaderStageFlagBits value specifying a single shader stage.
nextStage is a bitmask of VkShaderStageFlagBits specifying zero or more stages which may be used as a logically next bound stage when drawing with the shader bound.
codeType is a VkShaderCodeTypeEXT value specifying the type of the shader code pointed to by pCode.
codeSize is the size in bytes of the shader code pointed to by pCode.
pCode is a pointer to the shader code to use to create the shader.
pName is a pointer to a null-terminated UTF-8 string specifying the entry point name of the shader for this stage.
setLayoutCount is the number of descriptor set layouts pointed to by pSetLayouts.
pSetLayouts is a pointer to an array of VkDescriptorSetLayout objects used by the shader stage.
pushConstantRangeCount is the number of push constant ranges pointed to by pPushConstantRanges.
pPushConstantRanges is a pointer to an array of VkPushConstantRange structures used by the shader stage.
pSpecializationInfo is a pointer to a VkSpecializationInfo structure, as described in Specialization Constants, or NULL.

Valid Usage

VUID-VkShaderCreateInfoEXT-codeSize-08735
If codeType is VK_SHADER_CODE_TYPE_SPIRV_EXT, codeSize must be a multiple of 4
VUID-VkShaderCreateInfoEXT-pCode-08736
If codeType is VK_SHADER_CODE_TYPE_SPIRV_EXT, pCode must point to valid SPIR-V code, formatted and packed as described by the Khronos SPIR-V Specification
VUID-VkShaderCreateInfoEXT-pCode-08737
If codeType is VK_SHADER_CODE_TYPE_SPIRV_EXT, pCode must adhere to the validation rules described by the Validation Rules within a Module section of the SPIR-V Environment appendix
VUID-VkShaderCreateInfoEXT-pCode-08738
If codeType is VK_SHADER_CODE_TYPE_SPIRV_EXT, pCode must declare the Shader capability for SPIR-V code
VUID-VkShaderCreateInfoEXT-pCode-08739
If codeType is VK_SHADER_CODE_TYPE_SPIRV_EXT, pCode must not declare any capability that is not supported by the API, as described by the Capabilities section of the SPIR-V Environment appendix
VUID-VkShaderCreateInfoEXT-pCode-08740
If codeType is VK_SHADER_CODE_TYPE_SPIRV_EXT, and pCode declares any of the capabilities listed in the SPIR-V Environment appendix, one of the corresponding requirements must be satisfied
VUID-VkShaderCreateInfoEXT-pCode-08741
If codeType is VK_SHADER_CODE_TYPE_SPIRV_EXT, pCode must not declare any SPIR-V extension that is not supported by the API, as described by the Extension section of the SPIR-V Environment appendix
VUID-VkShaderCreateInfoEXT-pCode-08742
If codeType is VK_SHADER_CODE_TYPE_SPIRV_EXT, and pCode declares any of the SPIR-V extensions listed in the SPIR-V Environment appendix, one of the corresponding requirements must be satisfied
VUID-VkShaderCreateInfoEXT-flags-08412
If stage is not VK_SHADER_STAGE_TASK_BIT_EXT, VK_SHADER_STAGE_MESH_BIT_EXT, VK_SHADER_STAGE_VERTEX_BIT, VK_SHADER_STAGE_TESSELLATION_CONTROL_BIT, VK_SHADER_STAGE_TESSELLATION_EVALUATION_BIT, VK_SHADER_STAGE_GEOMETRY_BIT, or VK_SHADER_STAGE_FRAGMENT_BIT, flags must not include VK_SHADER_CREATE_LINK_STAGE_BIT_EXT
VUID-VkShaderCreateInfoEXT-flags-08486
If stage is not VK_SHADER_STAGE_FRAGMENT_BIT, flags must not include VK_SHADER_CREATE_FRAGMENT_SHADING_RATE_ATTACHMENT_BIT_EXT
VUID-VkShaderCreateInfoEXT-flags-08487
If the attachmentFragmentShadingRate feature is not enabled, flags must not include VK_SHADER_CREATE_FRAGMENT_SHADING_RATE_ATTACHMENT_BIT_EXT
VUID-VkShaderCreateInfoEXT-flags-08488
If stage is not VK_SHADER_STAGE_FRAGMENT_BIT, flags must not include VK_SHADER_CREATE_FRAGMENT_DENSITY_MAP_ATTACHMENT_BIT_EXT
VUID-VkShaderCreateInfoEXT-flags-08489
If the fragmentDensityMap feature is not enabled, flags must not include VK_SHADER_CREATE_FRAGMENT_DENSITY_MAP_ATTACHMENT_BIT_EXT
VUID-VkShaderCreateInfoEXT-flags-09404
If flags includes VK_SHADER_CREATE_ALLOW_VARYING_SUBGROUP_SIZE_BIT_EXT, the subgroupSizeControl feature must be enabled
VUID-VkShaderCreateInfoEXT-flags-09405
If flags includes VK_SHADER_CREATE_REQUIRE_FULL_SUBGROUPS_BIT_EXT, the computeFullSubgroups feature must be enabled
VUID-VkShaderCreateInfoEXT-flags-11005
If flags includes VK_SHADER_CREATE_INDIRECT_BINDABLE_BIT_EXT, then the VkPhysicalDeviceDeviceGeneratedCommandsFeaturesEXT::deviceGeneratedCommands feature must be enabled
VUID-VkShaderCreateInfoEXT-flags-11006
If flags includes VK_SHADER_CREATE_INDIRECT_BINDABLE_BIT_EXT, then the identified entry point must not specify Xfb execution mode
VUID-VkShaderCreateInfoEXT-flags-08992
If flags includes VK_SHADER_CREATE_REQUIRE_FULL_SUBGROUPS_BIT_EXT, stage must be one of VK_SHADER_STAGE_MESH_BIT_EXT, VK_SHADER_STAGE_TASK_BIT_EXT, or VK_SHADER_STAGE_COMPUTE_BIT
VUID-VkShaderCreateInfoEXT-flags-08485
If stage is not VK_SHADER_STAGE_COMPUTE_BIT, flags must not include VK_SHADER_CREATE_DISPATCH_BASE_BIT_EXT
VUID-VkShaderCreateInfoEXT-flags-08414
If stage is not VK_SHADER_STAGE_MESH_BIT_EXT, flags must not include VK_SHADER_CREATE_NO_TASK_SHADER_BIT_EXT
VUID-VkShaderCreateInfoEXT-flags-08416
If flags includes both VK_SHADER_CREATE_ALLOW_VARYING_SUBGROUP_SIZE_BIT_EXT and VK_SHADER_CREATE_REQUIRE_FULL_SUBGROUPS_BIT_EXT, the local workgroup size in the X dimension of the shader must be a multiple of maxSubgroupSize
VUID-VkShaderCreateInfoEXT-flags-08417
If flags includes VK_SHADER_CREATE_REQUIRE_FULL_SUBGROUPS_BIT_EXT but not VK_SHADER_CREATE_ALLOW_VARYING_SUBGROUP_SIZE_BIT_EXT and no VkShaderRequiredSubgroupSizeCreateInfoEXT structure is included in the pNext chain, the local workgroup size in the X dimension of the shader must be a multiple of subgroupSize
VUID-VkShaderCreateInfoEXT-stage-08418
stage must not be VK_SHADER_STAGE_ALL_GRAPHICS or VK_SHADER_STAGE_ALL
VUID-VkShaderCreateInfoEXT-stage-08419
If the tessellationShader feature is not enabled, stage must not be VK_SHADER_STAGE_TESSELLATION_CONTROL_BIT or VK_SHADER_STAGE_TESSELLATION_EVALUATION_BIT
VUID-VkShaderCreateInfoEXT-stage-08420
If the geometryShader feature is not enabled, stage must not be VK_SHADER_STAGE_GEOMETRY_BIT
VUID-VkShaderCreateInfoEXT-stage-08421
If the taskShader feature is not enabled, stage must not be VK_SHADER_STAGE_TASK_BIT_EXT
VUID-VkShaderCreateInfoEXT-stage-08422
If the meshShader feature is not enabled, stage must not be VK_SHADER_STAGE_MESH_BIT_EXT
VUID-VkShaderCreateInfoEXT-stage-08425
stage must not be VK_SHADER_STAGE_SUBPASS_SHADING_BIT_HUAWEI
VUID-VkShaderCreateInfoEXT-stage-08426
stage must not be VK_SHADER_STAGE_CLUSTER_CULLING_BIT_HUAWEI
VUID-VkShaderCreateInfoEXT-nextStage-08427
If stage is VK_SHADER_STAGE_VERTEX_BIT, nextStage must not include any bits other than VK_SHADER_STAGE_TESSELLATION_CONTROL_BIT, VK_SHADER_STAGE_GEOMETRY_BIT, and VK_SHADER_STAGE_FRAGMENT_BIT
VUID-VkShaderCreateInfoEXT-nextStage-08428
If the tessellationShader feature is not enabled, nextStage must not include VK_SHADER_STAGE_TESSELLATION_CONTROL_BIT or VK_SHADER_STAGE_TESSELLATION_EVALUATION_BIT
VUID-VkShaderCreateInfoEXT-nextStage-08429
If the geometryShader feature is not enabled, nextStage must not include VK_SHADER_STAGE_GEOMETRY_BIT
VUID-VkShaderCreateInfoEXT-nextStage-08430
If stage is VK_SHADER_STAGE_TESSELLATION_CONTROL_BIT, nextStage must not include any bits other than VK_SHADER_STAGE_TESSELLATION_EVALUATION_BIT
VUID-VkShaderCreateInfoEXT-nextStage-08431
If stage is VK_SHADER_STAGE_TESSELLATION_EVALUATION_BIT, nextStage must not include any bits other than VK_SHADER_STAGE_GEOMETRY_BIT and VK_SHADER_STAGE_FRAGMENT_BIT
VUID-VkShaderCreateInfoEXT-nextStage-08433
If stage is VK_SHADER_STAGE_GEOMETRY_BIT, nextStage must not include any bits other than VK_SHADER_STAGE_FRAGMENT_BIT
VUID-VkShaderCreateInfoEXT-nextStage-08434
If stage is VK_SHADER_STAGE_FRAGMENT_BIT or VK_SHADER_STAGE_COMPUTE_BIT, nextStage must be 0
VUID-VkShaderCreateInfoEXT-nextStage-08435
If stage is VK_SHADER_STAGE_TASK_BIT_EXT, nextStage must not include any bits other than VK_SHADER_STAGE_MESH_BIT_EXT
VUID-VkShaderCreateInfoEXT-nextStage-08436
If stage is VK_SHADER_STAGE_MESH_BIT_EXT, nextStage must not include any bits other than VK_SHADER_STAGE_FRAGMENT_BIT
VUID-VkShaderCreateInfoEXT-pName-08440
If codeType is VK_SHADER_CODE_TYPE_SPIRV_EXT, pName must be the name of an OpEntryPoint in pCode with an execution model that matches stage
VUID-VkShaderCreateInfoEXT-pCode-08492
If codeType is VK_SHADER_CODE_TYPE_BINARY_EXT, pCode must be aligned to 16 bytes
VUID-VkShaderCreateInfoEXT-pCode-08493
If codeType is VK_SHADER_CODE_TYPE_SPIRV_EXT, pCode must be aligned to 4 bytes
VUID-VkShaderCreateInfoEXT-pCode-08448
If codeType is VK_SHADER_CODE_TYPE_SPIRV_EXT, and the identified entry point includes any variable in its interface that is declared with the ClipDistance BuiltIn decoration, that variable must not have an array size greater than VkPhysicalDeviceLimits::maxClipDistances
VUID-VkShaderCreateInfoEXT-pCode-08449
If codeType is VK_SHADER_CODE_TYPE_SPIRV_EXT, and the identified entry point includes any variable in its interface that is declared with the CullDistance BuiltIn decoration, that variable must not have an array size greater than VkPhysicalDeviceLimits::maxCullDistances
VUID-VkShaderCreateInfoEXT-pCode-08450
If codeType is VK_SHADER_CODE_TYPE_SPIRV_EXT, and the identified entry point includes variables in its interface that are declared with the ClipDistance BuiltIn decoration and variables in its interface that are declared with the CullDistance BuiltIn decoration, those variables must not have array sizes which sum to more than VkPhysicalDeviceLimits::maxCombinedClipAndCullDistances
VUID-VkShaderCreateInfoEXT-pCode-08451
If codeType is VK_SHADER_CODE_TYPE_SPIRV_EXT, and the identified entry point includes any variable in its interface that is declared with the SampleMask BuiltIn decoration, that variable must not have an array size greater than VkPhysicalDeviceLimits::maxSampleMaskWords
VUID-VkShaderCreateInfoEXT-pCode-08453
If codeType is VK_SHADER_CODE_TYPE_SPIRV_EXT, and stage is VK_SHADER_STAGE_TESSELLATION_CONTROL_BIT or VK_SHADER_STAGE_TESSELLATION_EVALUATION_BIT, and the identified entry point has an OpExecutionMode instruction specifying a patch size with OutputVertices, the patch size must be greater than 0 and less than or equal to VkPhysicalDeviceLimits::maxTessellationPatchSize
VUID-VkShaderCreateInfoEXT-pCode-08454
If codeType is VK_SHADER_CODE_TYPE_SPIRV_EXT, and stage is VK_SHADER_STAGE_GEOMETRY_BIT, the identified entry point must have an OpExecutionMode instruction specifying a maximum output vertex count that is greater than 0 and less than or equal to VkPhysicalDeviceLimits::maxGeometryOutputVertices
VUID-VkShaderCreateInfoEXT-pCode-08455
If codeType is VK_SHADER_CODE_TYPE_SPIRV_EXT, and stage is VK_SHADER_STAGE_GEOMETRY_BIT, the identified entry point must have an OpExecutionMode instruction specifying an invocation count that is greater than 0 and less than or equal to VkPhysicalDeviceLimits::maxGeometryShaderInvocations
VUID-VkShaderCreateInfoEXT-pCode-08456
If codeType is VK_SHADER_CODE_TYPE_SPIRV_EXT, and stage is a pre-rasterization shader stage, and the identified entry point writes to Layer for any primitive, it must write the same value to Layer for all vertices of a given primitive
VUID-VkShaderCreateInfoEXT-pCode-08457
If codeType is VK_SHADER_CODE_TYPE_SPIRV_EXT, and stage is a pre-rasterization shader stage, and the identified entry point writes to ViewportIndex for any primitive, it must write the same value to ViewportIndex for all vertices of a given primitive
VUID-VkShaderCreateInfoEXT-pCode-08459
If codeType is VK_SHADER_CODE_TYPE_SPIRV_EXT, and stage is VK_SHADER_STAGE_FRAGMENT_BIT, and the identified entry point writes to FragDepth in any execution path, all execution paths that are not exclusive to helper invocations must either discard the fragment, or write or initialize the value of FragDepth
VUID-VkShaderCreateInfoEXT-pCode-08460
If codeType is VK_SHADER_CODE_TYPE_SPIRV_EXT, the shader code in pCode must be valid as described by the Khronos SPIR-V Specification after applying the specializations provided in pSpecializationInfo, if any, and then converting all specialization constants into fixed constants
VUID-VkShaderCreateInfoEXT-codeType-08872
If codeType is VK_SHADER_CODE_TYPE_SPIRV_EXT, and stage is VK_SHADER_STAGE_TESSELLATION_EVALUATION_BIT, pCode must contain an OpExecutionMode instruction specifying the type of subdivision
VUID-VkShaderCreateInfoEXT-codeType-08873
If codeType is VK_SHADER_CODE_TYPE_SPIRV_EXT, and stage is VK_SHADER_STAGE_TESSELLATION_EVALUATION_BIT, pCode must contain an OpExecutionMode instruction specifying the orientation of triangles generated by the tessellator
VUID-VkShaderCreateInfoEXT-codeType-08874
If codeType is VK_SHADER_CODE_TYPE_SPIRV_EXT, and stage is VK_SHADER_STAGE_TESSELLATION_EVALUATION_BIT, pCode must contain an OpExecutionMode instruction specifying the spacing of segments on the edges of tessellated primitives
VUID-VkShaderCreateInfoEXT-codeType-08875
If codeType is VK_SHADER_CODE_TYPE_SPIRV_EXT, and stage is VK_SHADER_STAGE_TESSELLATION_EVALUATION_BIT, pCode must contain an OpExecutionMode instruction specifying the output patch size
VUID-VkShaderCreateInfoEXT-pPushConstantRanges-10063
Any two elements of pPushConstantRanges must not include the same stage in stageFlags
VUID-VkShaderCreateInfoEXT-codeType-10064
If codeType is VK_SHADER_CODE_TYPE_SPIRV_EXT, and if a push constant block is declared in a shader, then an element of pPushConstantRanges::stageFlags must match stage
VUID-VkShaderCreateInfoEXT-codeType-10065
If codeType is VK_SHADER_CODE_TYPE_SPIRV_EXT, and if a push constant block is declared in a shader, the block must be contained inside the element of pPushConstantRanges that matches the stage
VUID-VkShaderCreateInfoEXT-codeType-10383
If codeType is VK_SHADER_CODE_TYPE_SPIRV_EXT, and a resource variable is declared in a shader, the corresponding descriptor set in pSetLayouts must match the shader stage
VUID-VkShaderCreateInfoEXT-codeType-10384
If codeType is VK_SHADER_CODE_TYPE_SPIRV_EXT, and a resource variable is declared in a shader, and the descriptor type is not VK_DESCRIPTOR_TYPE_MUTABLE_EXT, the corresponding descriptor set in pSetLayouts must match the descriptor type
VUID-VkShaderCreateInfoEXT-codeType-10385
If codeType is VK_SHADER_CODE_TYPE_SPIRV_EXT, and a resource variable is declared in a shader as an array, the corresponding descriptor set in pSetLayouts must match the descriptor count
VUID-VkShaderCreateInfoEXT-codeType-10386
If codeType is VK_SHADER_CODE_TYPE_SPIRV_EXT, and a resource variable is declared in a shader as an array of descriptors, then the descriptor type of that variable must not be VK_DESCRIPTOR_TYPE_INLINE_UNIFORM_BLOCK
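
The per-stage nextStage restrictions above (08427, 08430, 08431, 08433, 08434, 08435, 08436) amount to a small lookup table. The sketch below is one way an application might encode them; the SKETCH_STAGE_* macros are local stand-ins that mirror the numeric values of the corresponding VkShaderStageFlagBits so that the example compiles without the Vulkan headers. The feature-dependent rules (08428, 08429) are deliberately not modelled.

```c
#include <stdbool.h>
#include <stdint.h>

/* Local stand-ins mirroring the values of VkShaderStageFlagBits. */
#define SKETCH_STAGE_VERTEX    0x00000001u
#define SKETCH_STAGE_TESS_CTRL 0x00000002u
#define SKETCH_STAGE_TESS_EVAL 0x00000004u
#define SKETCH_STAGE_GEOMETRY  0x00000008u
#define SKETCH_STAGE_FRAGMENT  0x00000010u
#define SKETCH_STAGE_COMPUTE   0x00000020u
#define SKETCH_STAGE_TASK      0x00000040u
#define SKETCH_STAGE_MESH      0x00000080u

/* Returns the mask of stages that nextStage may contain for a given stage.
 * Stages not listed in the rules fall through to 0 in this sketch. */
uint32_t allowed_next_stages(uint32_t stage)
{
    switch (stage) {
    case SKETCH_STAGE_VERTEX:
        return SKETCH_STAGE_TESS_CTRL | SKETCH_STAGE_GEOMETRY | SKETCH_STAGE_FRAGMENT;
    case SKETCH_STAGE_TESS_CTRL:
        return SKETCH_STAGE_TESS_EVAL;
    case SKETCH_STAGE_TESS_EVAL:
        return SKETCH_STAGE_GEOMETRY | SKETCH_STAGE_FRAGMENT;
    case SKETCH_STAGE_GEOMETRY:
        return SKETCH_STAGE_FRAGMENT;
    case SKETCH_STAGE_TASK:
        return SKETCH_STAGE_MESH;
    case SKETCH_STAGE_MESH:
        return SKETCH_STAGE_FRAGMENT;
    default: /* fragment and compute: nextStage must be 0 */
        return 0;
    }
}

/* True if nextStage contains no bits outside the allowed mask. */
bool next_stage_is_valid(uint32_t stage, uint32_t nextStage)
{
    return (nextStage & ~allowed_next_stages(stage)) == 0;
}
```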

Valid Usage (Implicit)

VUID-VkShaderCreateInfoEXT-sType-sType
sType must be VK_STRUCTURE_TYPE_SHADER_CREATE_INFO_EXT
VUID-VkShaderCreateInfoEXT-pNext-pNext
Each pNext member of any structure (including this one) in the pNext chain must be either NULL or a pointer to a valid instance of VkPipelineShaderStageRequiredSubgroupSizeCreateInfo or VkValidationFeaturesEXT
VUID-VkShaderCreateInfoEXT-sType-unique
The sType value of each struct in the pNext chain must be unique
VUID-VkShaderCreateInfoEXT-flags-parameter
flags must be a valid combination of VkShaderCreateFlagBitsEXT values
VUID-VkShaderCreateInfoEXT-stage-parameter
stage must be a valid VkShaderStageFlagBits value
VUID-VkShaderCreateInfoEXT-nextStage-parameter
nextStage must be a valid combination of VkShaderStageFlagBits values
VUID-VkShaderCreateInfoEXT-codeType-parameter
codeType must be a valid VkShaderCodeTypeEXT value
VUID-VkShaderCreateInfoEXT-pCode-parameter
pCode must be a valid pointer to an array of codeSize bytes
VUID-VkShaderCreateInfoEXT-pName-parameter
If pName is not NULL, pName must be a null-terminated UTF-8 string
VUID-VkShaderCreateInfoEXT-pSetLayouts-parameter
If setLayoutCount is not 0, and pSetLayouts is not NULL, pSetLayouts must be a valid pointer to an array of setLayoutCount valid VkDescriptorSetLayout handles
VUID-VkShaderCreateInfoEXT-pPushConstantRanges-parameter
If pushConstantRangeCount is not 0, and pPushConstantRanges is not NULL, pPushConstantRanges must be a valid pointer to an array of pushConstantRangeCount valid VkPushConstantRange structures
VUID-VkShaderCreateInfoEXT-pSpecializationInfo-parameter
If pSpecializationInfo is not NULL, pSpecializationInfo must be a valid pointer to a valid VkSpecializationInfo structure
VUID-VkShaderCreateInfoEXT-codeSize-arraylength
codeSize must be greater than 0

// Provided by VK_EXT_shader_object
typedef VkFlags VkShaderCreateFlagsEXT;

VkShaderCreateFlagsEXT is a bitmask type for setting a mask of zero or more VkShaderCreateFlagBitsEXT.

Possible values of the flags member of VkShaderCreateInfoEXT, specifying how a shader object is created, are:

// Provided by VK_EXT_shader_object
typedef enum VkShaderCreateFlagBitsEXT {
    VK_SHADER_CREATE_LINK_STAGE_BIT_EXT = 0x00000001,
  // Provided by VK_EXT_shader_object with VK_EXT_subgroup_size_control or VK_VERSION_1_3
    VK_SHADER_CREATE_ALLOW_VARYING_SUBGROUP_SIZE_BIT_EXT = 0x00000002,
  // Provided by VK_EXT_shader_object with VK_EXT_subgroup_size_control or VK_VERSION_1_3
    VK_SHADER_CREATE_REQUIRE_FULL_SUBGROUPS_BIT_EXT = 0x00000004,
  // Provided by VK_EXT_shader_object with VK_EXT_mesh_shader or VK_NV_mesh_shader
    VK_SHADER_CREATE_NO_TASK_SHADER_BIT_EXT = 0x00000008,
  // Provided by VK_EXT_shader_object with VK_KHR_device_group or VK_VERSION_1_1
    VK_SHADER_CREATE_DISPATCH_BASE_BIT_EXT = 0x00000010,
  // Provided by VK_KHR_fragment_shading_rate with VK_EXT_shader_object
    VK_SHADER_CREATE_FRAGMENT_SHADING_RATE_ATTACHMENT_BIT_EXT = 0x00000020,
  // Provided by VK_EXT_fragment_density_map with VK_EXT_shader_object
    VK_SHADER_CREATE_FRAGMENT_DENSITY_MAP_ATTACHMENT_BIT_EXT = 0x00000040,
  // Provided by VK_EXT_device_generated_commands
    VK_SHADER_CREATE_INDIRECT_BINDABLE_BIT_EXT = 0x00000080,
} VkShaderCreateFlagBitsEXT;

VK_SHADER_CREATE_LINK_STAGE_BIT_EXT specifies that a shader is linked to all other shaders created in the same vkCreateShadersEXT call whose VkShaderCreateInfoEXT structures' flags include VK_SHADER_CREATE_LINK_STAGE_BIT_EXT.
VK_SHADER_CREATE_ALLOW_VARYING_SUBGROUP_SIZE_BIT_EXT specifies that the SubgroupSize may vary in a task, mesh, or compute shader.
VK_SHADER_CREATE_REQUIRE_FULL_SUBGROUPS_BIT_EXT specifies that subgroups must be launched with all invocations active in a task, mesh, or compute shader.
VK_SHADER_CREATE_NO_TASK_SHADER_BIT_EXT specifies that a mesh shader must only be used without a task shader. Otherwise, the mesh shader must only be used with a task shader.
VK_SHADER_CREATE_DISPATCH_BASE_BIT_EXT specifies that a compute shader can be used with vkCmdDispatchBase with a non-zero base workgroup.
VK_SHADER_CREATE_FRAGMENT_SHADING_RATE_ATTACHMENT_BIT_EXT specifies that a fragment shader can be used with a fragment shading rate attachment.
VK_SHADER_CREATE_FRAGMENT_DENSITY_MAP_ATTACHMENT_BIT_EXT specifies that a fragment shader can be used with a fragment density map attachment.
VK_SHADER_CREATE_INDIRECT_BINDABLE_BIT_EXT specifies that the shader can be used in combination with Device-Generated Commands.
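
Several of these flags are restricted to particular stages by the valid usage statements of VkShaderCreateInfoEXT (08412, 08414, 08485, 08486, 08488, 08992). As a rough sketch, those stage restrictions can be collected into a single mask check. The SKETCH_* macros are local stand-ins mirroring the numeric values of the real enumerants so that the example compiles without the Vulkan headers; flags for which this section states no stage restriction are left unchecked, and feature requirements are not modelled.

```c
#include <stdbool.h>
#include <stdint.h>

/* Local stand-ins mirroring the values of VkShaderStageFlagBits. */
#define SKETCH_STAGE_VERTEX    0x00000001u
#define SKETCH_STAGE_TESS_CTRL 0x00000002u
#define SKETCH_STAGE_TESS_EVAL 0x00000004u
#define SKETCH_STAGE_GEOMETRY  0x00000008u
#define SKETCH_STAGE_FRAGMENT  0x00000010u
#define SKETCH_STAGE_COMPUTE   0x00000020u
#define SKETCH_STAGE_TASK      0x00000040u
#define SKETCH_STAGE_MESH      0x00000080u

/* Local stand-ins mirroring the values of VkShaderCreateFlagBitsEXT. */
#define SKETCH_CREATE_LINK_STAGE             0x00000001u
#define SKETCH_CREATE_REQUIRE_FULL_SUBGROUPS 0x00000004u
#define SKETCH_CREATE_NO_TASK_SHADER         0x00000008u
#define SKETCH_CREATE_DISPATCH_BASE          0x00000010u
#define SKETCH_CREATE_FSR_ATTACHMENT         0x00000020u
#define SKETCH_CREATE_FDM_ATTACHMENT         0x00000040u

/* Returns true if flags contains no bit that is forbidden for stage.
 * stage is expected to be a single stage bit. */
bool flags_match_stage(uint32_t flags, uint32_t stage)
{
    const uint32_t linkable_stages =
        SKETCH_STAGE_VERTEX | SKETCH_STAGE_TESS_CTRL | SKETCH_STAGE_TESS_EVAL |
        SKETCH_STAGE_GEOMETRY | SKETCH_STAGE_FRAGMENT |
        SKETCH_STAGE_TASK | SKETCH_STAGE_MESH;
    uint32_t forbidden = 0;

    if ((stage & linkable_stages) == 0)
        forbidden |= SKETCH_CREATE_LINK_STAGE;                /* 08412 */
    if (stage != SKETCH_STAGE_FRAGMENT)
        forbidden |= SKETCH_CREATE_FSR_ATTACHMENT |           /* 08486 */
                     SKETCH_CREATE_FDM_ATTACHMENT;            /* 08488 */
    if (stage != SKETCH_STAGE_MESH && stage != SKETCH_STAGE_TASK &&
        stage != SKETCH_STAGE_COMPUTE)
        forbidden |= SKETCH_CREATE_REQUIRE_FULL_SUBGROUPS;    /* 08992 */
    if (stage != SKETCH_STAGE_COMPUTE)
        forbidden |= SKETCH_CREATE_DISPATCH_BASE;             /* 08485 */
    if (stage != SKETCH_STAGE_MESH)
        forbidden |= SKETCH_CREATE_NO_TASK_SHADER;            /* 08414 */

    return (flags & forbidden) == 0;
}
```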

The behavior of VK_SHADER_CREATE_FRAGMENT_SHADING_RATE_ATTACHMENT_BIT_EXT and VK_SHADER_CREATE_FRAGMENT_DENSITY_MAP_ATTACHMENT_BIT_EXT differs subtly from the behavior of VK_PIPELINE_CREATE_RENDERING_FRAGMENT_SHADING_RATE_ATTACHMENT_BIT_KHR and VK_PIPELINE_CREATE_RENDERING_FRAGMENT_DENSITY_MAP_ATTACHMENT_BIT_EXT in that the shader bit allows, but does not require the shader to be used with that type of attachment. This means that the application need not create multiple shaders when it does not know in advance whether the shader will be used with or without the attachment type, or when it needs the same shader to be compatible with usage both with and without. This may come at some performance cost on some implementations, so applications should still only set bits that are actually necessary.

Shader objects can be created using different types of shader code. Possible values of VkShaderCreateInfoEXT::codeType are:

// Provided by VK_EXT_shader_object
typedef enum VkShaderCodeTypeEXT {
    VK_SHADER_CODE_TYPE_BINARY_EXT = 0,
    VK_SHADER_CODE_TYPE_SPIRV_EXT = 1,
} VkShaderCodeTypeEXT;

VK_SHADER_CODE_TYPE_BINARY_EXT specifies shader code in an opaque, implementation-defined binary format specific to the physical device.
VK_SHADER_CODE_TYPE_SPIRV_EXT specifies shader code in SPIR-V format.

Binary Shader Code

Binary shader code can be retrieved from a shader object using the command:

// Provided by VK_EXT_shader_object
VkResult vkGetShaderBinaryDataEXT(
    VkDevice                                    device,
    VkShaderEXT                                 shader,
    size_t*                                     pDataSize,
    void*                                       pData);

device is the logical device that the shader object was created from.
shader is the shader object to retrieve binary shader code from.
pDataSize is a pointer to a size_t value related to the size of the binary shader code, as described below.
pData is either NULL or a pointer to a buffer.

If pData is NULL, then the size of the binary shader code of the shader object, in bytes, is returned in pDataSize. Otherwise, pDataSize must point to a variable set by the application to the size of the buffer, in bytes, pointed to by pData, and on return the variable is overwritten with the amount of data actually written to pData. If the value pointed to by pDataSize is less than the size of the binary shader code, nothing is written to pData, and VK_INCOMPLETE will be returned instead of VK_SUCCESS.

The behavior of this command when the buffer size given through pDataSize is too small differs from how some other getter-type commands work in Vulkan. Because shader binary data is only usable in its entirety, it would never be useful for the implementation to return partial data. Because of this, nothing is written to pData unless the buffer is large enough to fit the data in its entirety.

Binary shader code retrieved using vkGetShaderBinaryDataEXT can be passed to a subsequent call to vkCreateShadersEXT on a compatible physical device by specifying VK_SHADER_CODE_TYPE_BINARY_EXT in the codeType member of VkShaderCreateInfoEXT.

The shader code returned by repeated calls to this function with the same VkShaderEXT is guaranteed to be invariant for the lifetime of the VkShaderEXT object.

Valid Usage

VUID-vkGetShaderBinaryDataEXT-None-08461
The shaderObject feature must be enabled
VUID-vkGetShaderBinaryDataEXT-None-08499
If pData is not NULL, it must be aligned to 16 bytes

Valid Usage (Implicit)

VUID-vkGetShaderBinaryDataEXT-device-parameter
device must be a valid VkDevice handle
VUID-vkGetShaderBinaryDataEXT-shader-parameter
shader must be a valid VkShaderEXT handle
VUID-vkGetShaderBinaryDataEXT-pDataSize-parameter
pDataSize must be a valid pointer to a size_t value
VUID-vkGetShaderBinaryDataEXT-pData-parameter
If the value referenced by pDataSize is not 0, and pData is not NULL, pData must be a valid pointer to an array of pDataSize bytes
VUID-vkGetShaderBinaryDataEXT-shader-parent
shader must have been created, allocated, or retrieved from device

Return Codes

Success

VK_SUCCESS
VK_INCOMPLETE

Failure

VK_ERROR_OUT_OF_HOST_MEMORY
VK_ERROR_OUT_OF_DEVICE_MEMORY
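
The size-query contract of vkGetShaderBinaryDataEXT can be tried out without a device. The following is a minimal sketch under stated assumptions, not code from any implementation: query_fn, fake_query, and fetch_binary are hypothetical names, the result codes are local stand-ins for VK_SUCCESS and VK_INCOMPLETE, and fake_query only imitates the documented all-or-nothing behavior so that the caller-side pattern (query the size, allocate a 16-byte aligned buffer, fetch) can run in isolation.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Local stand-ins for VK_SUCCESS and VK_INCOMPLETE (assumption: only
 * their distinctness matters for this sketch). */
enum { SKETCH_SUCCESS = 0, SKETCH_INCOMPLETE = 1 };

/* Hypothetical callback with the same pDataSize/pData contract as
 * vkGetShaderBinaryDataEXT; the device and shader handles are omitted. */
typedef int (*query_fn)(size_t *pDataSize, void *pData);

/* Fake binary, standing in for implementation-defined shader code. */
static const unsigned char fake_blob[40] = { 0xAB, 0xCD, 0xEF };

/* Imitates the documented behavior: report the size when pData is NULL,
 * and write nothing when the supplied buffer is too small. What the real
 * command stores in *pDataSize in that case is not modeled here. */
static int fake_query(size_t *pDataSize, void *pData)
{
    if (pData == NULL) {
        *pDataSize = sizeof fake_blob;
        return SKETCH_SUCCESS;
    }
    if (*pDataSize < sizeof fake_blob)
        return SKETCH_INCOMPLETE;          /* nothing written to pData */
    memcpy(pData, fake_blob, sizeof fake_blob);
    *pDataSize = sizeof fake_blob;         /* bytes actually written */
    return SKETCH_SUCCESS;
}

/* Two-call pattern: query the size, allocate a 16-byte aligned buffer,
 * then fetch. Returns NULL on failure; the caller frees the result. */
static void *fetch_binary(query_fn query, size_t *outSize)
{
    size_t size = 0;
    if (query(&size, NULL) != SKETCH_SUCCESS || size == 0)
        return NULL;

    /* aligned_alloc (C11) expects a size that is a multiple of the
     * alignment, so round the allocation up to the next multiple of 16. */
    size_t allocSize = (size + 15u) & ~(size_t)15u;
    void *data = aligned_alloc(16, allocSize);
    if (data == NULL)
        return NULL;

    if (query(&size, data) != SKETCH_SUCCESS) {
        free(data);
        return NULL;
    }
    *outSize = size;
    return data;
}
```

In a real application the callback would be replaced by a call through a vkGetShaderBinaryDataEXT function pointer with the device and shader handles supplied, and the result codes by the VkResult values.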

Binary Shader Compatibility

Binary shader compatibility means that binary shader code returned from a call to vkGetShaderBinaryDataEXT can be passed to a later call to vkCreateShadersEXT, potentially on a different logical and/or physical device, and that this will result in the successful creation of a shader object functionally equivalent to the shader object that the code was originally queried from.

Binary shader code queried from vkGetShaderBinaryDataEXT is not guaranteed to be compatible across all devices, but implementations are required to provide some compatibility guarantees. Applications may determine binary shader compatibility using either (or both) of two mechanisms.

Guaranteed compatibility of shader binaries is expressed through a combination of the shaderBinaryUUID and shaderBinaryVersion members of the VkPhysicalDeviceShaderObjectPropertiesEXT structure queried from a physical device. Binary shaders retrieved from a physical device with a certain shaderBinaryUUID are guaranteed to be compatible with all other physical devices reporting the same shaderBinaryUUID and the same or higher shaderBinaryVersion.

Whenever a new version of an implementation incorporates any changes that affect the output of vkGetShaderBinaryDataEXT, the implementation should either increment shaderBinaryVersion if binary shader code retrieved from older versions remains compatible with the new implementation, or else replace shaderBinaryUUID with a new value if backward compatibility has been broken. Binary shader code queried from a device with a matching shaderBinaryUUID and lower shaderBinaryVersion relative to the device on which vkCreateShadersEXT is being called may be suboptimal for the new device in ways that do not change shader functionality, but it is still guaranteed to be usable to successfully create the shader object(s).

Implementations are encouraged to share shaderBinaryUUID between devices and driver versions to the maximum extent their hardware naturally allows, and are strongly discouraged from ever changing the shaderBinaryUUID for the same hardware unless absolutely necessary.

In addition to the shader compatibility guarantees described above, it is valid for an application to call vkCreateShadersEXT with binary shader code created on a device with a different or unknown shaderBinaryUUID and/or higher shaderBinaryVersion. In this case, the implementation may use any unspecified means of its choosing to determine whether the provided binary shader code is usable. If it is, vkCreateShadersEXT must return VK_SUCCESS, and the created shader object is guaranteed to be valid. Otherwise, in the absence of some error, vkCreateShadersEXT must return VK_INCOMPATIBLE_SHADER_BINARY_EXT to indicate that the provided binary shader code is not compatible with the device.
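
The guaranteed-compatibility rule (same shaderBinaryUUID, same or higher shaderBinaryVersion on the consuming device) can be written as a small predicate. This is an illustrative sketch: BinaryOrigin and binary_guaranteed_compatible are hypothetical names for a record that an application might store next to a cached binary, and the UUID length of 16 bytes matches VK_UUID_SIZE.

```c
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

#define SKETCH_UUID_SIZE 16  /* same length as VK_UUID_SIZE */

/* Hypothetical record stored alongside a cached binary: the
 * shaderBinaryUUID and shaderBinaryVersion of the device it came from. */
typedef struct BinaryOrigin {
    uint8_t  uuid[SKETCH_UUID_SIZE];
    uint32_t version;
} BinaryOrigin;

/* True when the compatibility guarantee applies: the UUIDs match and the
 * consuming device reports the same or a higher version. */
static bool binary_guaranteed_compatible(const BinaryOrigin *origin,
                                         const uint8_t deviceUuid[SKETCH_UUID_SIZE],
                                         uint32_t deviceVersion)
{
    return memcmp(origin->uuid, deviceUuid, SKETCH_UUID_SIZE) == 0
        && deviceVersion >= origin->version;
}
```

A false result does not forbid passing the binary to vkCreateShadersEXT; it only means the application should be prepared for VK_INCOMPATIBLE_SHADER_BINARY_EXT, for example by keeping the SPIR-V available as a fallback.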

Binding Shader Objects

Once shader objects have been created, they can be bound to the command buffer using the command:

// Provided by VK_EXT_shader_object
void vkCmdBindShadersEXT(
    VkCommandBuffer                             commandBuffer,
    uint32_t                                    stageCount,
    const VkShaderStageFlagBits*                pStages,
    const VkShaderEXT*                          pShaders);

commandBuffer is the command buffer that the shader object will be bound to.
stageCount is the length of the pStages and pShaders arrays.
pStages is a pointer to an array of VkShaderStageFlagBits values specifying one stage per array index that is affected by the corresponding value in the pShaders array.
pShaders is a pointer to an array of VkShaderEXT handles and/or VK_NULL_HANDLE values describing the shader binding operations to be performed on each stage in pStages.

When binding linked shaders, an application may bind them in any combination of one or more calls to vkCmdBindShadersEXT (i.e., shaders that were created linked together do not need to be bound in the same vkCmdBindShadersEXT call).

Any shader object bound to a particular stage may be unbound by setting its value in pShaders to VK_NULL_HANDLE. If pShaders is NULL, vkCmdBindShadersEXT behaves as if pShaders was an array of stageCount VK_NULL_HANDLE values (i.e., any shaders bound to the stages specified in pStages are unbound).

Valid Usage

VUID-vkCmdBindShadersEXT-None-08462
The shaderObject feature must be enabled
VUID-vkCmdBindShadersEXT-pStages-08463
Every element of pStages must be unique
VUID-vkCmdBindShadersEXT-pStages-08464
pStages must not contain VK_SHADER_STAGE_ALL_GRAPHICS or VK_SHADER_STAGE_ALL
VUID-vkCmdBindShadersEXT-pStages-08465
pStages must not contain VK_SHADER_STAGE_RAYGEN_BIT_KHR, VK_SHADER_STAGE_ANY_HIT_BIT_KHR, VK_SHADER_STAGE_CLOSEST_HIT_BIT_KHR, VK_SHADER_STAGE_MISS_BIT_KHR, VK_SHADER_STAGE_INTERSECTION_BIT_KHR, or VK_SHADER_STAGE_CALLABLE_BIT_KHR
VUID-vkCmdBindShadersEXT-pStages-08467
pStages must not contain VK_SHADER_STAGE_SUBPASS_SHADING_BIT_HUAWEI
VUID-vkCmdBindShadersEXT-pStages-08468
pStages must not contain VK_SHADER_STAGE_CLUSTER_CULLING_BIT_HUAWEI
VUID-vkCmdBindShadersEXT-pShaders-08469
For each element of pStages, if pShaders is not NULL, and the element of the pShaders array with the same index is not VK_NULL_HANDLE, it must have been created with a stage equal to the corresponding element of pStages
VUID-vkCmdBindShadersEXT-pShaders-08470
If pStages contains both VK_SHADER_STAGE_TASK_BIT_EXT and VK_SHADER_STAGE_VERTEX_BIT, and pShaders is not NULL, and the same index in pShaders as VK_SHADER_STAGE_TASK_BIT_EXT in pStages is not VK_NULL_HANDLE, the same index in pShaders as VK_SHADER_STAGE_VERTEX_BIT in pStages must be VK_NULL_HANDLE
VUID-vkCmdBindShadersEXT-pShaders-08471
If pStages contains both VK_SHADER_STAGE_MESH_BIT_EXT and VK_SHADER_STAGE_VERTEX_BIT, and pShaders is not NULL, and the same index in pShaders as VK_SHADER_STAGE_MESH_BIT_EXT in pStages is not VK_NULL_HANDLE, the same index in pShaders as VK_SHADER_STAGE_VERTEX_BIT in pStages must be VK_NULL_HANDLE
VUID-vkCmdBindShadersEXT-pShaders-08476
If pStages contains VK_SHADER_STAGE_COMPUTE_BIT, the VkCommandPool that commandBuffer was allocated from must support compute operations
VUID-vkCmdBindShadersEXT-pShaders-08477
If pStages contains VK_SHADER_STAGE_VERTEX_BIT, VK_SHADER_STAGE_TESSELLATION_CONTROL_BIT, VK_SHADER_STAGE_TESSELLATION_EVALUATION_BIT, VK_SHADER_STAGE_GEOMETRY_BIT, or VK_SHADER_STAGE_FRAGMENT_BIT, the VkCommandPool that commandBuffer was allocated from must support graphics operations
VUID-vkCmdBindShadersEXT-pShaders-08478
If pStages contains VK_SHADER_STAGE_MESH_BIT_EXT or VK_SHADER_STAGE_TASK_BIT_EXT, the VkCommandPool that commandBuffer was allocated from must support graphics operations
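
Two of the rules above, uniqueness of the pStages elements and the ban on the aggregate values VK_SHADER_STAGE_ALL_GRAPHICS and VK_SHADER_STAGE_ALL, lend themselves to a cheap application-side pre-check. The sketch below is hypothetical (stages_unique_and_single is not a Vulkan function) and assumes that every individual shader stage is a single-bit flag, so that aggregate masks appear as values with more than one bit set. It does not cover the remaining rules, such as the excluded ray tracing stages or the queue capability requirements.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical pre-check for a pStages array given as raw flag values.
 * Returns true when every element has exactly one bit set and no stage
 * appears more than once. */
static bool stages_unique_and_single(const uint32_t *stages, uint32_t stageCount)
{
    uint32_t seen = 0;
    for (uint32_t i = 0; i < stageCount; ++i) {
        uint32_t s = stages[i];
        if (s == 0 || (s & (s - 1)) != 0)
            return false;   /* not exactly one bit: empty or aggregate */
        if (seen & s)
            return false;   /* the same stage appears twice */
        seen |= s;
    }
    return stageCount > 0;  /* stageCount must be greater than 0 */
}
```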

Valid Usage (Implicit)

VUID-vkCmdBindShadersEXT-commandBuffer-parameter
commandBuffer must be a valid VkCommandBuffer handle
VUID-vkCmdBindShadersEXT-pStages-parameter
pStages must be a valid pointer to an array of stageCount valid VkShaderStageFlagBits values
VUID-vkCmdBindShadersEXT-pShaders-parameter
If pShaders is not NULL, pShaders must be a valid pointer to an array of stageCount valid or VK_NULL_HANDLE VkShaderEXT handles
VUID-vkCmdBindShadersEXT-commandBuffer-recording
commandBuffer must be in the recording state
VUID-vkCmdBindShadersEXT-commandBuffer-cmdpool
The VkCommandPool that commandBuffer was allocated from must support graphics or compute operations
VUID-vkCmdBindShadersEXT-videocoding
This command must only be called outside of a video coding scope
VUID-vkCmdBindShadersEXT-stageCount-arraylength
stageCount must be greater than 0
VUID-vkCmdBindShadersEXT-commonparent
Both commandBuffer and the elements of pShaders that are valid handles of non-ignored parameters must have been created, allocated, or retrieved from the same VkDevice

Host Synchronization

Host access to commandBuffer must be externally synchronized
Host access to the VkCommandPool that commandBuffer was allocated from must be externally synchronized

Command Properties

Command Buffer Levels	Render Pass Scope	Video Coding Scope	Supported Queue Types	Command Type
Primary Secondary	Both	Outside	Graphics Compute	State

Setting State

Whenever shader objects are used to issue drawing commands, the appropriate dynamic state setting commands must have been called to set the relevant state in the command buffer prior to drawing:

If a shader is bound to the VK_SHADER_STAGE_VERTEX_BIT stage, the following commands must have been called in the command buffer prior to drawing:

If a shader is bound to the VK_SHADER_STAGE_TESSELLATION_CONTROL_BIT stage, the following command must have been called in the command buffer prior to drawing:

vkCmdSetPatchControlPointsEXT, if primitiveTopology is VK_PRIMITIVE_TOPOLOGY_PATCH_LIST

If a shader is bound to the VK_SHADER_STAGE_TESSELLATION_EVALUATION_BIT stage, the following command must have been called in the command buffer prior to drawing:

vkCmdSetTessellationDomainOriginEXT

If rasterizerDiscardEnable is VK_FALSE, the following commands must have been called in the command buffer prior to drawing:

vkCmdSetRasterizationSamplesEXT
vkCmdSetSampleMaskEXT
vkCmdSetAlphaToCoverageEnableEXT
vkCmdSetAlphaToOneEnableEXT, if the alphaToOne feature is enabled
vkCmdSetPolygonModeEXT
vkCmdSetLineWidth, if polygonMode is VK_POLYGON_MODE_LINE, or if a shader is bound to the VK_SHADER_STAGE_VERTEX_BIT stage and primitiveTopology is a line topology, or if a shader which outputs line primitives is bound to the VK_SHADER_STAGE_TESSELLATION_EVALUATION_BIT or VK_SHADER_STAGE_GEOMETRY_BIT stage
vkCmdSetCullMode
vkCmdSetFrontFace
vkCmdSetDepthTestEnable
vkCmdSetDepthWriteEnable
vkCmdSetDepthCompareOp, if depthTestEnable is VK_TRUE
vkCmdSetDepthBoundsTestEnable, if the depthBounds feature is enabled
vkCmdSetDepthBounds, if depthBoundsTestEnable is VK_TRUE
vkCmdSetDepthBiasEnable
vkCmdSetDepthBias or vkCmdSetDepthBias2EXT, if depthBiasEnable is VK_TRUE
vkCmdSetDepthClampEnableEXT, if the depthClamp feature is enabled
vkCmdSetStencilTestEnable
vkCmdSetStencilOp, if stencilTestEnable is VK_TRUE
vkCmdSetStencilCompareMask, if stencilTestEnable is VK_TRUE
vkCmdSetStencilWriteMask, if stencilTestEnable is VK_TRUE
vkCmdSetStencilReference, if stencilTestEnable is VK_TRUE

If a shader is bound to the VK_SHADER_STAGE_FRAGMENT_BIT stage, and rasterizerDiscardEnable is VK_FALSE, the following commands must have been called in the command buffer prior to drawing:

vkCmdSetLogicOpEnableEXT, if the logicOp feature is enabled
vkCmdSetLogicOpEXT, if logicOpEnable is VK_TRUE
vkCmdSetColorBlendEnableEXT and vkCmdSetColorWriteMaskEXT, if color attachments are bound, with values set for every color attachment in the render pass instance active at draw time
vkCmdSetColorBlendEquationEXT or vkCmdSetColorBlendAdvancedEXT, if color attachments are bound, for every attachment whose index in pColorBlendEnables is a pointer to a value of VK_TRUE
vkCmdSetBlendConstants, if any index in pColorBlendEnables is VK_TRUE, and the same index in pColorBlendEquations is a VkColorBlendEquationEXT structure with any VkBlendFactor member with a value of VK_BLEND_FACTOR_CONSTANT_COLOR, VK_BLEND_FACTOR_ONE_MINUS_CONSTANT_COLOR, VK_BLEND_FACTOR_CONSTANT_ALPHA, or VK_BLEND_FACTOR_ONE_MINUS_CONSTANT_ALPHA

If the pipelineFragmentShadingRate feature is enabled, and a shader is bound to the VK_SHADER_STAGE_FRAGMENT_BIT stage, and rasterizerDiscardEnable is VK_FALSE, the following command must have been called in the command buffer prior to drawing:

vkCmdSetFragmentShadingRateKHR

If the geometryStreams feature is enabled, and a shader is bound to the VK_SHADER_STAGE_GEOMETRY_BIT stage, the following command must have been called in the command buffer prior to drawing:

vkCmdSetRasterizationStreamEXT

If the VK_EXT_discard_rectangles extension is enabled, and rasterizerDiscardEnable is VK_FALSE, the following commands must have been called in the command buffer prior to drawing:

vkCmdSetDiscardRectangleEnableEXT
vkCmdSetDiscardRectangleModeEXT, if discardRectangleEnable is VK_TRUE
vkCmdSetDiscardRectangleEXT, if discardRectangleEnable is VK_TRUE

If the VK_EXT_conservative_rasterization extension is enabled, and rasterizerDiscardEnable is VK_FALSE, the following commands must have been called in the command buffer prior to drawing:

vkCmdSetConservativeRasterizationModeEXT
vkCmdSetExtraPrimitiveOverestimationSizeEXT, if conservativeRasterizationMode is VK_CONSERVATIVE_RASTERIZATION_MODE_OVERESTIMATE_EXT

If the depthClipEnable feature is enabled, the following command must have been called in the command buffer prior to drawing:

vkCmdSetDepthClipEnableEXT

If the VK_EXT_sample_locations extension is enabled, and rasterizerDiscardEnable is VK_FALSE, the following commands must have been called in the command buffer prior to drawing:

vkCmdSetSampleLocationsEnableEXT
vkCmdSetSampleLocationsEXT, if sampleLocationsEnable is VK_TRUE

If the VK_EXT_provoking_vertex extension is enabled, and rasterizerDiscardEnable is VK_FALSE, and a shader is bound to the VK_SHADER_STAGE_VERTEX_BIT stage, the following command must have been called in the command buffer prior to drawing:

vkCmdSetProvokingVertexModeEXT

If any of the stippledRectangularLines, stippledBresenhamLines, or stippledSmoothLines features are enabled, and rasterizerDiscardEnable is VK_FALSE, and if polygonMode is VK_POLYGON_MODE_LINE or a shader is bound to the VK_SHADER_STAGE_VERTEX_BIT stage and primitiveTopology is a line topology or a shader which outputs line primitives is bound to the VK_SHADER_STAGE_TESSELLATION_EVALUATION_BIT or VK_SHADER_STAGE_GEOMETRY_BIT stage, the following commands must have been called in the command buffer prior to drawing:

vkCmdSetLineRasterizationModeEXT
vkCmdSetLineStippleEnableEXT
vkCmdSetLineStipple, if stippledLineEnable is VK_TRUE

If the depthClipControl feature is enabled, the following command must have been called in the command buffer prior to drawing:

vkCmdSetDepthClipNegativeOneToOneEXT

If the colorWriteEnable feature is enabled, and a shader is bound to the VK_SHADER_STAGE_FRAGMENT_BIT stage, and rasterizerDiscardEnable is VK_FALSE, the following command must have been called in the command buffer prior to drawing:

vkCmdSetColorWriteEnableEXT, with values set for every color attachment in the render pass instance active at draw time

If the attachmentFeedbackLoopDynamicState feature is enabled, and a shader is bound to the VK_SHADER_STAGE_FRAGMENT_BIT stage, and rasterizerDiscardEnable is VK_FALSE, the following command must have been called in the command buffer prior to drawing:

vkCmdSetAttachmentFeedbackLoopEnableEXT

If the VK_NV_clip_space_w_scaling extension is enabled, the following commands must have been called in the command buffer prior to drawing:

vkCmdSetViewportWScalingEnableNV
vkCmdSetViewportWScalingNV, if viewportWScalingEnable is VK_TRUE

If the depthClamp and depthClampControl features are enabled, and depthClampEnable is VK_TRUE, the following command must have been called in the command buffer prior to drawing:

vkCmdSetDepthClampRangeEXT

If the VK_NV_viewport_swizzle extension is enabled, the following command must have been called in the command buffer prior to drawing:

vkCmdSetViewportSwizzleNV

If the VK_NV_fragment_coverage_to_color extension is enabled, and a shader is bound to the VK_SHADER_STAGE_FRAGMENT_BIT stage, and rasterizerDiscardEnable is VK_FALSE, the following commands must have been called in the command buffer prior to drawing:

vkCmdSetCoverageToColorEnableNV
vkCmdSetCoverageToColorLocationNV, if coverageToColorEnable is VK_TRUE

If the VK_NV_framebuffer_mixed_samples extension is enabled, and rasterizerDiscardEnable is VK_FALSE, the following commands must have been called in the command buffer prior to drawing:

vkCmdSetCoverageModulationModeNV
vkCmdSetCoverageModulationTableEnableNV, if coverageModulationMode is not VK_COVERAGE_MODULATION_MODE_NONE_NV
vkCmdSetCoverageModulationTableNV, if coverageModulationTableEnable is VK_TRUE

If the coverageReductionMode feature is enabled, and rasterizerDiscardEnable is VK_FALSE, the following command must have been called in the command buffer prior to drawing:

vkCmdSetCoverageReductionModeNV

If the representativeFragmentTest feature is enabled, and rasterizerDiscardEnable is VK_FALSE, the following command must have been called in the command buffer prior to drawing:

vkCmdSetRepresentativeFragmentTestEnableNV

If the shadingRateImage feature is enabled, and rasterizerDiscardEnable is VK_FALSE, the following commands must have been called in the command buffer prior to drawing:

vkCmdSetCoarseSampleOrderNV
vkCmdSetShadingRateImageEnableNV
vkCmdSetViewportShadingRatePaletteNV, if shadingRateImageEnable is VK_TRUE

If the exclusiveScissor feature is enabled, the following commands must have been called in the command buffer prior to drawing:

vkCmdSetExclusiveScissorEnableNV
vkCmdSetExclusiveScissorNV, if any value in pExclusiveScissorEnables is VK_TRUE

State can be set at any time, either before or after shader objects are bound, but all required state must be set prior to issuing drawing commands.

If the commandBufferInheritance feature is enabled, graphics and compute state is inherited from the previously executed command buffer in the queue. Any valid state inherited in this way does not need to be set again in the current command buffer.
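
Because much of the required state is conditional, an application may find it useful to track which state-setting commands it has recorded and compare that against what the current configuration requires. The sketch below models only the stencil entries of the rasterizerDiscardEnable list above. The bit names and both functions are hypothetical bookkeeping, not part of the API.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical bookkeeping bits, one per state-setting command. */
enum {
    SET_STENCIL_TEST_ENABLE  = 1u << 0,  /* vkCmdSetStencilTestEnable  */
    SET_STENCIL_OP           = 1u << 1,  /* vkCmdSetStencilOp          */
    SET_STENCIL_COMPARE_MASK = 1u << 2,  /* vkCmdSetStencilCompareMask */
    SET_STENCIL_WRITE_MASK   = 1u << 3,  /* vkCmdSetStencilWriteMask   */
    SET_STENCIL_REFERENCE    = 1u << 4   /* vkCmdSetStencilReference   */
};

/* Stencil-related commands that must have been called before drawing. */
static uint32_t required_stencil_state(bool rasterizerDiscardEnable,
                                       bool stencilTestEnable)
{
    if (rasterizerDiscardEnable)
        return 0;  /* this list applies only when discard is VK_FALSE */

    uint32_t required = SET_STENCIL_TEST_ENABLE;
    if (stencilTestEnable)
        required |= SET_STENCIL_OP | SET_STENCIL_COMPARE_MASK |
                    SET_STENCIL_WRITE_MASK | SET_STENCIL_REFERENCE;
    return required;
}

/* Required commands that have not been recorded yet. */
static uint32_t missing_stencil_state(uint32_t alreadySet,
                                      bool rasterizerDiscardEnable,
                                      bool stencilTestEnable)
{
    return required_stencil_state(rasterizerDiscardEnable, stencilTestEnable)
         & ~alreadySet;
}
```

The same pattern extends to the other conditional lists by adding one bit per command and one branch per condition.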

Interaction With Pipelines

Calling vkCmdBindShadersEXT causes the pipeline bind points corresponding to each stage in pStages to be disturbed, meaning that any pipelines that had previously been bound to those pipeline bind points are no longer bound.

If VK_PIPELINE_BIND_POINT_GRAPHICS is disturbed (i.e., if pStages contains any graphics stage), any graphics pipeline state that the previously bound pipeline did not specify as dynamic becomes undefined, and must be set in the command buffer before issuing drawing commands using shader objects.

Calls to vkCmdBindPipeline likewise disturb the shader stage(s) corresponding to pipelineBindPoint, meaning that any shaders that had previously been bound to any of those stages are no longer bound, even if the pipeline was created without shaders for some of those stages.
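
The mutual disturbance between shader objects and pipelines can be modeled as a small state tracker. This is an illustrative sketch only: BindTracker, the stage bits, and both functions are hypothetical application-side bookkeeping, reduced to two graphics stages and the compute stage.

```c
#include <stdbool.h>
#include <stdint.h>

/* Sketch-local stage bits (not the Vulkan flag values). */
enum {
    STAGE_VERTEX    = 1u << 0,
    STAGE_FRAGMENT  = 1u << 1,
    STAGE_COMPUTE   = 1u << 2,
    GRAPHICS_STAGES = STAGE_VERTEX | STAGE_FRAGMENT
};

typedef struct BindTracker {
    bool     graphicsPipelineBound;
    bool     computePipelineBound;
    uint32_t boundShaderStages;   /* stages holding a non-null shader */
} BindTracker;

/* Models vkCmdBindShadersEXT for one stage. The matching pipeline bind
 * point is disturbed for every stage named in pStages, including stages
 * that are being unbound (hasShader == false). */
static void track_bind_shader(BindTracker *t, uint32_t stage, bool hasShader)
{
    if (stage == STAGE_COMPUTE)
        t->computePipelineBound = false;
    else
        t->graphicsPipelineBound = false;

    if (hasShader)
        t->boundShaderStages |= stage;
    else
        t->boundShaderStages &= ~stage;
}

/* Models vkCmdBindPipeline: every shader stage belonging to that bind
 * point is unbound, even if the pipeline has no shader for it. */
static void track_bind_pipeline(BindTracker *t, bool isCompute)
{
    if (isCompute) {
        t->computePipelineBound = true;
        t->boundShaderStages &= ~(uint32_t)STAGE_COMPUTE;
    } else {
        t->graphicsPipelineBound = true;
        t->boundShaderStages &= ~(uint32_t)GRAPHICS_STAGES;
    }
}
```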

Shader Object Destruction

To destroy a shader object, call:

// Provided by VK_EXT_shader_object
void vkDestroyShaderEXT(
    VkDevice                                    device,
    VkShaderEXT                                 shader,
    const VkAllocationCallbacks*                pAllocator);

device is the logical device that destroys the shader object.
shader is the handle of the shader object to destroy.
pAllocator controls host memory allocation as described in the Memory Allocation chapter.

Destroying a shader object used by one or more command buffers in the recording or executable state causes those command buffers to move into the invalid state.

Valid Usage

VUID-vkDestroyShaderEXT-None-08481
The shaderObject feature must be enabled
VUID-vkDestroyShaderEXT-shader-08482
All submitted commands that refer to shader must have completed execution
VUID-vkDestroyShaderEXT-pAllocator-08483
If VkAllocationCallbacks were provided when shader was created, a compatible set of callbacks must be provided here
VUID-vkDestroyShaderEXT-pAllocator-08484
If no VkAllocationCallbacks were provided when shader was created, pAllocator must be NULL

Valid Usage (Implicit)

VUID-vkDestroyShaderEXT-device-parameter
device must be a valid VkDevice handle
VUID-vkDestroyShaderEXT-shader-parameter
If shader is not VK_NULL_HANDLE, shader must be a valid VkShaderEXT handle
VUID-vkDestroyShaderEXT-pAllocator-parameter
If pAllocator is not NULL, pAllocator must be a valid pointer to a valid VkAllocationCallbacks structure
VUID-vkDestroyShaderEXT-shader-parent
If shader is a valid handle, it must have been created, allocated, or retrieved from device

Host Synchronization

Host access to shader must be externally synchronized

Shader Modules

Shader modules contain shader code and one or more entry points. Shaders are selected from a shader module by specifying an entry point as part of pipeline creation. The stages of a pipeline can use shaders that come from different modules. The shader code defining a shader module must be in the SPIR-V format, as described by the Vulkan Environment for SPIR-V appendix.

Shader modules are represented by VkShaderModule handles:

// Provided by VK_VERSION_1_0
VK_DEFINE_NON_DISPATCHABLE_HANDLE(VkShaderModule)

To create a shader module, call:

// Provided by VK_VERSION_1_0
VkResult vkCreateShaderModule(
    VkDevice                                    device,
    const VkShaderModuleCreateInfo*             pCreateInfo,
    const VkAllocationCallbacks*                pAllocator,
    VkShaderModule*                             pShaderModule);

device is the logical device that creates the shader module.
pCreateInfo is a pointer to a VkShaderModuleCreateInfo structure.
pAllocator controls host memory allocation as described in the Memory Allocation chapter.
pShaderModule is a pointer to a VkShaderModule handle in which the resulting shader module object is returned.

Once a shader module has been created, any entry points it contains can be used in pipeline shader stages as described in Compute Pipelines and Graphics Pipelines.

If the maintenance5 feature is enabled, shader module creation can be omitted entirely. Instead, applications should provide the VkShaderModuleCreateInfo structure directly to pipeline creation by chaining it to VkPipelineShaderStageCreateInfo. This avoids the overhead of creating and managing an additional object.

Valid Usage

VUID-vkCreateShaderModule-pCreateInfo-06904
If pCreateInfo is not NULL, pCreateInfo->pNext must be NULL or a pointer to a valid instance of
- VkShaderModuleValidationCacheCreateInfoEXT
- VkValidationFeaturesEXT

Valid Usage (Implicit)

VUID-vkCreateShaderModule-device-parameter
device must be a valid VkDevice handle
VUID-vkCreateShaderModule-pCreateInfo-parameter
pCreateInfo must be a valid pointer to a valid VkShaderModuleCreateInfo structure
VUID-vkCreateShaderModule-pAllocator-parameter
If pAllocator is not NULL, pAllocator must be a valid pointer to a valid VkAllocationCallbacks structure
VUID-vkCreateShaderModule-pShaderModule-parameter
pShaderModule must be a valid pointer to a VkShaderModule handle

Return Codes

Success

VK_SUCCESS

Failure

VK_ERROR_OUT_OF_HOST_MEMORY
VK_ERROR_OUT_OF_DEVICE_MEMORY
VK_ERROR_INVALID_SHADER_NV

The VkShaderModuleCreateInfo structure is defined as:

// Provided by VK_VERSION_1_0
typedef struct VkShaderModuleCreateInfo {
    VkStructureType              sType;
    const void*                  pNext;
    VkShaderModuleCreateFlags    flags;
    size_t                       codeSize;
    const uint32_t*              pCode;
} VkShaderModuleCreateInfo;

sType is a VkStructureType value identifying this structure.
pNext is NULL or a pointer to a structure extending this structure.
flags is reserved for future use.
codeSize is the size, in bytes, of the code pointed to by pCode.
pCode is a pointer to code that is used to create the shader module. The type and format of the code is determined from the content of the memory addressed by pCode.

Valid Usage

VUID-VkShaderModuleCreateInfo-codeSize-08735
If pCode is a pointer to SPIR-V code, codeSize must be a multiple of 4
VUID-VkShaderModuleCreateInfo-pCode-08736
If pCode is a pointer to SPIR-V code, pCode must point to valid SPIR-V code, formatted and packed as described by the Khronos SPIR-V Specification
VUID-VkShaderModuleCreateInfo-pCode-08737
If pCode is a pointer to SPIR-V code, pCode must adhere to the validation rules described by the Validation Rules within a Module section of the SPIR-V Environment appendix
VUID-VkShaderModuleCreateInfo-pCode-08738
If pCode is a pointer to SPIR-V code, pCode must declare the Shader capability for SPIR-V code
VUID-VkShaderModuleCreateInfo-pCode-08739
If pCode is a pointer to SPIR-V code, pCode must not declare any capability that is not supported by the API, as described by the Capabilities section of the SPIR-V Environment appendix
VUID-VkShaderModuleCreateInfo-pCode-08740
If pCode is a pointer to SPIR-V code, and pCode declares any of the capabilities listed in the SPIR-V Environment appendix, one of the corresponding requirements must be satisfied
VUID-VkShaderModuleCreateInfo-pCode-08741
If pCode is a pointer to SPIR-V code, pCode must not declare any SPIR-V extension that is not supported by the API, as described by the Extension section of the SPIR-V Environment appendix
VUID-VkShaderModuleCreateInfo-pCode-08742
If pCode is a pointer to SPIR-V code, and pCode declares any of the SPIR-V extensions listed in the SPIR-V Environment appendix, one of the corresponding requirements must be satisfied
VUID-VkShaderModuleCreateInfo-pCode-07912
If the VK_NV_glsl_shader extension is not enabled, pCode must be a pointer to SPIR-V code
VUID-VkShaderModuleCreateInfo-pCode-01379
If pCode is a pointer to GLSL code, it must be valid GLSL code written to the GL_KHR_vulkan_glsl GLSL extension specification
VUID-VkShaderModuleCreateInfo-codeSize-01085
codeSize must be greater than 0

Valid Usage (Implicit)

VUID-VkShaderModuleCreateInfo-sType-sType
sType must be VK_STRUCTURE_TYPE_SHADER_MODULE_CREATE_INFO
VUID-VkShaderModuleCreateInfo-flags-zerobitmask
flags must be 0
VUID-VkShaderModuleCreateInfo-pCode-parameter
pCode must be a valid pointer to an array of uint32_t values
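
Before filling in codeSize and pCode, an application can cheaply reject blobs that cannot satisfy the size rules above. The following sketch is a hypothetical helper, far weaker than real SPIR-V validation: it checks that codeSize is non-zero and a multiple of 4, and additionally compares the first word against the SPIR-V magic number, which comes from the Khronos SPIR-V Specification rather than from the rules listed here. It assumes the module is stored in host byte order.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* SPIR-V magic number, as defined by the Khronos SPIR-V Specification. */
#define SPIRV_MAGIC 0x07230203u

/* Hypothetical pre-check for a blob intended as SPIR-V pCode. */
static bool spirv_blob_plausible(const uint32_t *pCode, size_t codeSize)
{
    if (pCode == NULL || codeSize == 0)
        return false;               /* codeSize must be greater than 0 */
    if (codeSize % 4 != 0)
        return false;               /* codeSize must be a multiple of 4 */
    return pCode[0] == SPIRV_MAGIC; /* assumption: host byte order */
}
```

A blob that passes this check still has to meet every SPIR-V validation rule; the check only catches truncated or obviously wrong files early.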

// Provided by VK_VERSION_1_0
typedef VkFlags VkShaderModuleCreateFlags;

VkShaderModuleCreateFlags is a bitmask type for setting a mask, but is currently reserved for future use.

To use a VkValidationCacheEXT to cache shader validation results, add a VkShaderModuleValidationCacheCreateInfoEXT structure to the pNext chain of the VkShaderModuleCreateInfo structure, specifying the cache object to use.

The VkShaderModuleValidationCacheCreateInfoEXT structure is defined as:

// Provided by VK_EXT_validation_cache
typedef struct VkShaderModuleValidationCacheCreateInfoEXT {
    VkStructureType         sType;
    const void*             pNext;
    VkValidationCacheEXT    validationCache;
} VkShaderModuleValidationCacheCreateInfoEXT;

sType is a VkStructureType value identifying this structure.
pNext is NULL or a pointer to a structure extending this structure.
validationCache is the validation cache object from which the results of prior validation attempts will be read, and to which new validation results for this VkShaderModule will be written (if not already present).

Valid Usage (Implicit)

VUID-VkShaderModuleValidationCacheCreateInfoEXT-sType-sType
sType must be VK_STRUCTURE_TYPE_SHADER_MODULE_VALIDATION_CACHE_CREATE_INFO_EXT
VUID-VkShaderModuleValidationCacheCreateInfoEXT-validationCache-parameter
validationCache must be a valid VkValidationCacheEXT handle

To destroy a shader module, call:

// Provided by VK_VERSION_1_0
void vkDestroyShaderModule(
    VkDevice                                    device,
    VkShaderModule                              shaderModule,
    const VkAllocationCallbacks*                pAllocator);

device is the logical device that destroys the shader module.
shaderModule is the handle of the shader module to destroy.
pAllocator controls host memory allocation as described in the Memory Allocation chapter.

A shader module can be destroyed while pipelines created using its shaders are still in use.

Valid Usage

VUID-vkDestroyShaderModule-shaderModule-01092
If VkAllocationCallbacks were provided when shaderModule was created, a compatible set of callbacks must be provided here
VUID-vkDestroyShaderModule-shaderModule-01093
If no VkAllocationCallbacks were provided when shaderModule was created, pAllocator must be NULL

Valid Usage (Implicit)

VUID-vkDestroyShaderModule-device-parameter
device must be a valid VkDevice handle
VUID-vkDestroyShaderModule-shaderModule-parameter
If shaderModule is not VK_NULL_HANDLE, shaderModule must be a valid VkShaderModule handle
VUID-vkDestroyShaderModule-pAllocator-parameter
If pAllocator is not NULL, pAllocator must be a valid pointer to a valid VkAllocationCallbacks structure
VUID-vkDestroyShaderModule-shaderModule-parent
If shaderModule is a valid handle, it must have been created, allocated, or retrieved from device

Host Synchronization

Host access to shaderModule must be externally synchronized

Shader Module Identifiers

Shader modules have unique identifiers associated with them. To query an implementation-provided identifier, call:

// Provided by VK_EXT_shader_module_identifier
void vkGetShaderModuleIdentifierEXT(
    VkDevice                                    device,
    VkShaderModule                              shaderModule,
    VkShaderModuleIdentifierEXT*                pIdentifier);

device is the logical device that created the shader module.
shaderModule is the handle of the shader module.
pIdentifier is a pointer to the returned VkShaderModuleIdentifierEXT.

The identifier returned by the implementation must only depend on shaderModuleIdentifierAlgorithmUUID and information provided in the VkShaderModuleCreateInfo which created shaderModule. The implementation may return equal identifiers for two different VkShaderModuleCreateInfo structures if the difference does not affect pipeline compilation. Identifiers are only meaningful on different VkDevice objects if the device the identifier was queried from had the same shaderModuleIdentifierAlgorithmUUID as the device consuming the identifier.

Valid Usage

VUID-vkGetShaderModuleIdentifierEXT-shaderModuleIdentifier-06884
shaderModuleIdentifier feature must be enabled

Valid Usage (Implicit)

VUID-vkGetShaderModuleIdentifierEXT-device-parameter
device must be a valid VkDevice handle
VUID-vkGetShaderModuleIdentifierEXT-shaderModule-parameter
shaderModule must be a valid VkShaderModule handle
VUID-vkGetShaderModuleIdentifierEXT-pIdentifier-parameter
pIdentifier must be a valid pointer to a VkShaderModuleIdentifierEXT structure
VUID-vkGetShaderModuleIdentifierEXT-shaderModule-parent
shaderModule must have been created, allocated, or retrieved from device

VkShaderModuleCreateInfo structures have unique identifiers associated with them. To query an implementation-provided identifier, call:

// Provided by VK_EXT_shader_module_identifier
void vkGetShaderModuleCreateInfoIdentifierEXT(
    VkDevice                                    device,
    const VkShaderModuleCreateInfo*             pCreateInfo,
    VkShaderModuleIdentifierEXT*                pIdentifier);

device is the logical device that can create a VkShaderModule from pCreateInfo.
pCreateInfo is a pointer to a VkShaderModuleCreateInfo structure.
pIdentifier is a pointer to the returned VkShaderModuleIdentifierEXT.

The identifier returned by the implementation must only depend on shaderModuleIdentifierAlgorithmUUID and information provided in the VkShaderModuleCreateInfo. The implementation may return equal identifiers for two different VkShaderModuleCreateInfo structures if the difference does not affect pipeline compilation. Identifiers are only meaningful on different VkDevice objects if the device the identifier was queried from had the same shaderModuleIdentifierAlgorithmUUID as the device consuming the identifier.

The identifier returned by the implementation in vkGetShaderModuleCreateInfoIdentifierEXT must be equal to the identifier returned by vkGetShaderModuleIdentifierEXT given equivalent definitions of VkShaderModuleCreateInfo and any chained pNext structures.

Valid Usage

VUID-vkGetShaderModuleCreateInfoIdentifierEXT-shaderModuleIdentifier-06885
shaderModuleIdentifier feature must be enabled

Valid Usage (Implicit)

VUID-vkGetShaderModuleCreateInfoIdentifierEXT-device-parameter
device must be a valid VkDevice handle
VUID-vkGetShaderModuleCreateInfoIdentifierEXT-pCreateInfo-parameter
pCreateInfo must be a valid pointer to a valid VkShaderModuleCreateInfo structure
VUID-vkGetShaderModuleCreateInfoIdentifierEXT-pIdentifier-parameter
pIdentifier must be a valid pointer to a VkShaderModuleIdentifierEXT structure

VkShaderModuleIdentifierEXT represents a shader module identifier returned by the implementation.

// Provided by VK_EXT_shader_module_identifier
typedef struct VkShaderModuleIdentifierEXT {
    VkStructureType    sType;
    void*              pNext;
    uint32_t           identifierSize;
    uint8_t            identifier[VK_MAX_SHADER_MODULE_IDENTIFIER_SIZE_EXT];
} VkShaderModuleIdentifierEXT;

sType is a VkStructureType value identifying this structure.
pNext is NULL or a pointer to a structure extending this structure.
identifierSize is the size, in bytes, of valid data returned in identifier.
identifier is a buffer of opaque data specifying an identifier.

Any returned values beyond the first identifierSize bytes are undefined. Implementations must return an identifierSize greater than 0, and less than or equal to VK_MAX_SHADER_MODULE_IDENTIFIER_SIZE_EXT.

Two identifiers are considered equal if identifierSize is equal and the first identifierSize bytes of identifier compare equal.

Implementations may return a different identifierSize for different modules. Implementations should ensure that identifierSize is large enough to uniquely define a shader module.
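
The size and equality rules above can be sketched on the host as follows. This is only an illustration under stated assumptions: ShaderModuleIdentifier is a hypothetical stand-in that mirrors just the identifierSize and identifier members of VkShaderModuleIdentifierEXT, MAX_IDENTIFIER_SIZE stands in for VK_MAX_SHADER_MODULE_IDENTIFIER_SIZE_EXT, and both helper functions are invented for this example rather than provided by the API.

```c
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

/* Stand-in for VK_MAX_SHADER_MODULE_IDENTIFIER_SIZE_EXT (32U). */
#define MAX_IDENTIFIER_SIZE 32U

/* Hypothetical stand-in mirroring two members of VkShaderModuleIdentifierEXT. */
typedef struct ShaderModuleIdentifier {
    uint32_t identifierSize;
    uint8_t  identifier[MAX_IDENTIFIER_SIZE];
} ShaderModuleIdentifier;

/* A returned size is well formed if it lies in (0, MAX_IDENTIFIER_SIZE]. */
bool identifier_size_is_valid(const ShaderModuleIdentifier *id)
{
    return id->identifierSize > 0 && id->identifierSize <= MAX_IDENTIFIER_SIZE;
}

/* Equal means: same identifierSize and the same leading identifierSize bytes.
   Bytes past identifierSize are undefined, so they are never compared. */
bool identifiers_equal(const ShaderModuleIdentifier *a,
                       const ShaderModuleIdentifier *b)
{
    if (!identifier_size_is_valid(a) || !identifier_size_is_valid(b))
        return false;
    if (a->identifierSize != b->identifierSize)
        return false;
    return memcmp(a->identifier, b->identifier, a->identifierSize) == 0;
}
```

Comparing the whole 32-byte array with a single memcmp would be incorrect here, because the trailing bytes may differ between two otherwise equal identifiers.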

Valid Usage (Implicit)

VUID-VkShaderModuleIdentifierEXT-sType-sType
sType must be VK_STRUCTURE_TYPE_SHADER_MODULE_IDENTIFIER_EXT
VUID-VkShaderModuleIdentifierEXT-pNext-pNext
pNext must be NULL

VK_MAX_SHADER_MODULE_IDENTIFIER_SIZE_EXT is the maximum length, in bytes, of a shader module identifier, as returned in VkShaderModuleIdentifierEXT::identifierSize.

#define VK_MAX_SHADER_MODULE_IDENTIFIER_SIZE_EXT 32U

Binding Shaders

Before a shader can be used, it must first be bound to the command buffer.

Calling vkCmdBindPipeline binds all stages corresponding to the VkPipelineBindPoint. Calling vkCmdBindShadersEXT binds all stages in pStages.

The following table describes the relationship between shader stages and pipeline bind points:

Shader stage	Pipeline bind point	Behavior controlled
`VK_SHADER_STAGE_VERTEX_BIT` `VK_SHADER_STAGE_TESSELLATION_CONTROL_BIT` `VK_SHADER_STAGE_TESSELLATION_EVALUATION_BIT` `VK_SHADER_STAGE_GEOMETRY_BIT` `VK_SHADER_STAGE_FRAGMENT_BIT` `VK_SHADER_STAGE_TASK_BIT_EXT` `VK_SHADER_STAGE_MESH_BIT_EXT`	`VK_PIPELINE_BIND_POINT_GRAPHICS`	all drawing commands
`VK_SHADER_STAGE_COMPUTE_BIT`	`VK_PIPELINE_BIND_POINT_COMPUTE`	all dispatch commands
`VK_SHADER_STAGE_ANY_HIT_BIT_KHR` `VK_SHADER_STAGE_CALLABLE_BIT_KHR` `VK_SHADER_STAGE_CLOSEST_HIT_BIT_KHR` `VK_SHADER_STAGE_INTERSECTION_BIT_KHR` `VK_SHADER_STAGE_MISS_BIT_KHR` `VK_SHADER_STAGE_RAYGEN_BIT_KHR`	`VK_PIPELINE_BIND_POINT_RAY_TRACING_KHR`	vkCmdTraceRaysNV, vkCmdTraceRaysKHR, and vkCmdTraceRaysIndirectKHR
`VK_SHADER_STAGE_SUBPASS_SHADING_BIT_HUAWEI` `VK_SHADER_STAGE_CLUSTER_CULLING_BIT_HUAWEI`	`VK_PIPELINE_BIND_POINT_SUBPASS_SHADING_HUAWEI`	vkCmdSubpassShadingHUAWEI
`VK_SHADER_STAGE_COMPUTE_BIT`	`VK_PIPELINE_BIND_POINT_EXECUTION_GRAPH_AMDX`	all execution graph commands
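
The table above can be read as a lookup from stage to bind point. The sketch below encodes it as a plain string table so that it needs no Vulkan headers. The StageBinding type and the bind_points_for_stage function are hypothetical names introduced for illustration, and the strings are only the enumerant names from the table, not their numeric values.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical lookup row: a shader stage name and one bind point name. */
typedef struct StageBinding {
    const char *stage;
    const char *bindPoint;
} StageBinding;

/* One row per (stage, bind point) pair listed in the table. */
static const StageBinding kStageBindings[] = {
    {"VK_SHADER_STAGE_VERTEX_BIT",                  "VK_PIPELINE_BIND_POINT_GRAPHICS"},
    {"VK_SHADER_STAGE_TESSELLATION_CONTROL_BIT",    "VK_PIPELINE_BIND_POINT_GRAPHICS"},
    {"VK_SHADER_STAGE_TESSELLATION_EVALUATION_BIT", "VK_PIPELINE_BIND_POINT_GRAPHICS"},
    {"VK_SHADER_STAGE_GEOMETRY_BIT",                "VK_PIPELINE_BIND_POINT_GRAPHICS"},
    {"VK_SHADER_STAGE_FRAGMENT_BIT",                "VK_PIPELINE_BIND_POINT_GRAPHICS"},
    {"VK_SHADER_STAGE_TASK_BIT_EXT",                "VK_PIPELINE_BIND_POINT_GRAPHICS"},
    {"VK_SHADER_STAGE_MESH_BIT_EXT",                "VK_PIPELINE_BIND_POINT_GRAPHICS"},
    {"VK_SHADER_STAGE_COMPUTE_BIT",                 "VK_PIPELINE_BIND_POINT_COMPUTE"},
    {"VK_SHADER_STAGE_ANY_HIT_BIT_KHR",             "VK_PIPELINE_BIND_POINT_RAY_TRACING_KHR"},
    {"VK_SHADER_STAGE_CALLABLE_BIT_KHR",            "VK_PIPELINE_BIND_POINT_RAY_TRACING_KHR"},
    {"VK_SHADER_STAGE_CLOSEST_HIT_BIT_KHR",         "VK_PIPELINE_BIND_POINT_RAY_TRACING_KHR"},
    {"VK_SHADER_STAGE_INTERSECTION_BIT_KHR",        "VK_PIPELINE_BIND_POINT_RAY_TRACING_KHR"},
    {"VK_SHADER_STAGE_MISS_BIT_KHR",                "VK_PIPELINE_BIND_POINT_RAY_TRACING_KHR"},
    {"VK_SHADER_STAGE_RAYGEN_BIT_KHR",              "VK_PIPELINE_BIND_POINT_RAY_TRACING_KHR"},
    {"VK_SHADER_STAGE_SUBPASS_SHADING_BIT_HUAWEI",  "VK_PIPELINE_BIND_POINT_SUBPASS_SHADING_HUAWEI"},
    {"VK_SHADER_STAGE_CLUSTER_CULLING_BIT_HUAWEI",  "VK_PIPELINE_BIND_POINT_SUBPASS_SHADING_HUAWEI"},
    {"VK_SHADER_STAGE_COMPUTE_BIT",                 "VK_PIPELINE_BIND_POINT_EXECUTION_GRAPH_AMDX"},
};

/* Writes up to maxOut matching bind point names into out, in table order,
   and returns the total number of matches found for the given stage. */
size_t bind_points_for_stage(const char *stage, const char **out, size_t maxOut)
{
    size_t count = 0;
    const size_t rows = sizeof(kStageBindings) / sizeof(kStageBindings[0]);
    for (size_t i = 0; i < rows; ++i) {
        if (strcmp(kStageBindings[i].stage, stage) == 0) {
            if (count < maxOut)
                out[count] = kStageBindings[i].bindPoint;
            ++count;
        }
    }
    return count;
}
```

The function returns a count rather than a single value because the compute stage appears in two rows of the table.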

Shader Execution

At each stage of the pipeline, multiple invocations of a shader may execute simultaneously. Further, invocations of a single shader produced as the result of different commands may execute simultaneously. The relative execution order of invocations of the same shader type is undefined. Shader invocations may complete in a different order than that in which the primitives they originated from were drawn or dispatched by the application. However, fragment shader outputs are written to attachments in rasterization order.

The relative execution order of invocations of different shader types is largely undefined. However, when invoking a shader whose inputs are generated from a previous pipeline stage, the shader invocations from the previous stage are guaranteed to have executed far enough to generate input values for all required inputs.

Shader Termination

A shader invocation that is terminated has finished executing instructions.

Executing OpReturn in the entry point, or executing OpTerminateInvocation in any function will terminate an invocation. Implementations may also terminate a shader invocation when OpKill is executed in any function; otherwise it becomes a helper invocation.

In addition to the above conditions, helper invocations may be terminated when all non-helper invocations in the same derivative group either terminate or become helper invocations.

A shader stage for a given command completes execution when all invocations for that stage have terminated.

Depending on the implementation, OpKill will be functionally equivalent to either OpTerminateInvocation or OpDemoteToHelperInvocation. To obtain the most predictable behavior, shader authors should use OpTerminateInvocation or OpDemoteToHelperInvocation rather than OpKill wherever possible.

Shader Memory Access Ordering

The order in which image or buffer memory is read or written by shaders is largely undefined. For some shader types (vertex, tessellation evaluation, and in some cases, fragment), even the number of shader invocations that may perform loads and stores is undefined.

In particular, the following rules apply:

Vertex and tessellation evaluation shaders will be invoked at least once for each unique vertex, as defined in those sections.
Fragment shaders will be invoked zero or more times, as defined in that section.
The relative execution order of invocations of the same shader type is undefined. A store issued by a shader when working on primitive B might complete prior to a store for primitive A, even if primitive A is specified prior to primitive B. This applies even to fragment shaders; while fragment shader outputs are always written to the framebuffer in rasterization order, stores executed by fragment shader invocations are not.
The relative execution order of invocations of different shader types is largely undefined.

The above limitations on shader invocation order make some forms of synchronization between shader invocations within a single set of primitives unimplementable. For example, having one invocation poll memory written by another invocation assumes that the other invocation has been launched and will complete its writes in finite time.

The Memory Model appendix defines the terminology and rules for how to correctly communicate between shader invocations, such as when a write is Visible-To a read, and what constitutes a Data Race.

Applications must not cause a data race.

The SPIR-V SubgroupMemory, CrossWorkgroupMemory, and AtomicCounterMemory memory semantics are ignored. Sequentially consistent atomics and barriers are not supported and SequentiallyConsistent is treated as AcquireRelease. SequentiallyConsistent should not be used.

Shader Inputs and Outputs

Data is passed into and out of shaders using variables with input or output storage class, respectively. User-defined inputs and outputs are connected between stages by matching their Location decorations. Additionally, data can be provided by or communicated to special functions provided by the execution environment using BuiltIn decorations.

In many cases, the same BuiltIn decoration can be used in multiple shader stages with similar meaning. The specific behavior of variables decorated as BuiltIn is documented in the following sections.

Task Shaders

Task shaders operate in conjunction with the mesh shaders to produce a collection of primitives that will be processed by subsequent stages of the graphics pipeline. Their primary purpose is to create a variable number of subsequent mesh shader invocations.

Task shaders are invoked via the execution of the programmable mesh shading pipeline.

The task shader has no fixed-function inputs other than variables identifying the specific workgroup and invocation. In the TaskNV Execution Model the number of mesh shader workgroups to create is specified via a TaskCountNV decorated output variable. In the TaskEXT Execution Model the number of mesh shader workgroups to create is specified via the OpEmitMeshTasksEXT instruction.

The task shader can write additional outputs to task memory, which can be read by all of the mesh shader workgroups it created.

Task Shader Execution

Task workloads are formed from groups of work items called workgroups and processed by the task shader in the current graphics pipeline. A workgroup is a collection of shader invocations that execute the same shader, potentially in parallel. Task shaders execute in global workgroups which are divided into a number of local workgroups with a size that can be set by assigning a value to the LocalSize or LocalSizeId execution mode or via an object decorated by the WorkgroupSize decoration. An invocation within a local workgroup can share data with other members of the local workgroup through shared variables and issue memory and control flow barriers to synchronize with other members of the local workgroup. If the subpass includes multiple views in its view mask, a Task shader using TaskEXT Execution Model may be invoked separately for each view.

Mesh Shaders

Mesh shaders operate in workgroups to produce a collection of primitives that will be processed by subsequent stages of the graphics pipeline. Each workgroup emits zero or more output primitives and the group of vertices and their associated data required for each output primitive.

Mesh shaders are invoked via the execution of the programmable mesh shading pipeline.

The only inputs available to the mesh shader are variables identifying the specific workgroup and invocation and, if applicable, any outputs written to task memory by the task shader that spawned the mesh shader’s workgroup. The mesh shader can operate without a task shader as well.

The invocations of the mesh shader workgroup write an output mesh, comprising a set of primitives with per-primitive attributes, a set of vertices with per-vertex attributes, and an array of indices identifying the mesh vertices that belong to each primitive. The primitives of this mesh are then processed by subsequent graphics pipeline stages, where the outputs of the mesh shader form an interface with the fragment shader.

Mesh Shader Execution

Mesh workloads are formed from groups of work items called workgroups and processed by the mesh shader in the current graphics pipeline. A workgroup is a collection of shader invocations that execute the same shader, potentially in parallel. Mesh shaders execute in global workgroups which are divided into a number of local workgroups with a size that can be set by assigning a value to the LocalSize or LocalSizeId execution mode or via an object decorated by the WorkgroupSize decoration. An invocation within a local workgroup can share data with other members of the local workgroup through shared variables and issue memory and control flow barriers to synchronize with other members of the local workgroup.

The global workgroups may be generated explicitly via the API, or implicitly through the task shader’s work creation mechanism. If the subpass includes multiple views in its view mask, a Mesh shader using MeshEXT Execution Model may be invoked separately for each view.

Cluster Culling Shaders

Cluster Culling shaders are invoked via the execution of the Programmable Cluster Culling Shading pipeline.

The only inputs available to the cluster culling shader are variables identifying the specific workgroup and invocation.

Cluster Culling shaders operate in workgroups to perform cluster-based culling and produce zero or more cluster drawing commands that will be processed by subsequent stages of the graphics pipeline.

The Cluster Drawing Command (CDC) is very similar to the multi-draw indirect (MDI) command: invocations in a workgroup can emit zero or more CDCs to draw zero or more visible clusters.

Cluster Culling Shader Execution

Cluster Culling workloads are formed from groups of work items called workgroups and processed by the cluster culling shader in the current graphics pipeline. A workgroup is a collection of shader invocations that execute the same shader, potentially in parallel. Cluster Culling shaders execute in global workgroups which are divided into a number of local workgroups with a size that can be set by assigning a value to the LocalSize or LocalSizeId execution mode or via an object decorated by the WorkgroupSize decoration. An invocation within a local workgroup can share data with other members of the local workgroup through shared variables and issue memory and control flow barriers to synchronize with other members of the local workgroup.

Vertex Shaders

Each vertex shader invocation operates on one vertex and its associated vertex attribute data, and outputs one vertex and associated data. Graphics pipelines using primitive shading must include a vertex shader, and the vertex shader stage is always the first shader stage in the graphics pipeline.

Vertex Shader Execution

A vertex shader must be executed at least once for each vertex specified by a drawing command. If the subpass includes multiple views in its view mask, the shader may be invoked separately for each view. During execution, the shader is presented with the index of the vertex and instance for which it has been invoked. Input variables declared in the vertex shader are filled by the implementation with the values of vertex attributes associated with the invocation being executed.

If the same vertex is specified multiple times in a drawing command (e.g. by including the same index value multiple times in an index buffer) the implementation may reuse the results of vertex shading if it can statically determine that the vertex shader invocations will produce identical results.

It is implementation-dependent when and if results of vertex shading are reused, and thus how many times the vertex shader will be executed. This is true also if the vertex shader contains stores or atomic operations (see vertexPipelineStoresAndAtomics).

Tessellation Control Shaders

The tessellation control shader is used to read an input patch provided by the application and to produce an output patch. Each tessellation control shader invocation operates on an input patch (after all control points in the patch are processed by a vertex shader) and its associated data, and outputs a single control point of the output patch and its associated data, and can also output additional per-patch data. The input patch is sized according to the patchControlPoints member of VkPipelineTessellationStateCreateInfo, as part of input assembly.

The input patch can also be dynamically sized with the patchControlPoints parameter of vkCmdSetPatchControlPointsEXT.

To dynamically set the number of control points per patch, call:

// Provided by VK_EXT_extended_dynamic_state2, VK_EXT_shader_object
void vkCmdSetPatchControlPointsEXT(
    VkCommandBuffer                             commandBuffer,
    uint32_t                                    patchControlPoints);

commandBuffer is the command buffer into which the command will be recorded.
patchControlPoints specifies the number of control points per patch.

This command sets the number of control points per patch for subsequent drawing commands when drawing using shader objects, or when the graphics pipeline is created with VK_DYNAMIC_STATE_PATCH_CONTROL_POINTS_EXT set in VkPipelineDynamicStateCreateInfo::pDynamicStates. Otherwise, this state is specified by the VkPipelineTessellationStateCreateInfo::patchControlPoints value used to create the currently active pipeline.

Valid Usage

VUID-vkCmdSetPatchControlPointsEXT-None-09422
At least one of the following must be true:
- The extendedDynamicState2PatchControlPoints feature is enabled
- The shaderObject feature is enabled
VUID-vkCmdSetPatchControlPointsEXT-patchControlPoints-04874
patchControlPoints must be greater than zero and less than or equal to VkPhysicalDeviceLimits::maxTessellationPatchSize

Valid Usage (Implicit)

VUID-vkCmdSetPatchControlPointsEXT-commandBuffer-parameter
commandBuffer must be a valid VkCommandBuffer handle
VUID-vkCmdSetPatchControlPointsEXT-commandBuffer-recording
commandBuffer must be in the recording state
VUID-vkCmdSetPatchControlPointsEXT-commandBuffer-cmdpool
The VkCommandPool that commandBuffer was allocated from must support graphics operations
VUID-vkCmdSetPatchControlPointsEXT-videocoding
This command must only be called outside of a video coding scope

Host Synchronization

Host access to commandBuffer must be externally synchronized
Host access to the VkCommandPool that commandBuffer was allocated from must be externally synchronized

Command Properties

Command Buffer Levels	Render Pass Scope	Video Coding Scope	Supported Queue Types	Command Type
Primary Secondary	Both	Outside	Graphics	State

The size of the output patch is controlled by the OpExecutionMode OutputVertices specified in the tessellation control or tessellation evaluation shaders, which must be specified in at least one of the shaders. The size of the input and output patches must each be greater than zero and less than or equal to VkPhysicalDeviceLimits::maxTessellationPatchSize.
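
The patch size limit can be checked on the host before recording state or creating shaders. The following is a minimal sketch: patch_size_is_valid is a hypothetical helper, and the limit is passed in as a plain integer (in a real application it would be read from VkPhysicalDeviceLimits::maxTessellationPatchSize).

```c
#include <stdbool.h>
#include <stdint.h>

/* Returns true if a patch with controlPoints control points satisfies
   0 < controlPoints <= maxTessellationPatchSize.
   The same check applies to both input and output patch sizes. */
bool patch_size_is_valid(uint32_t controlPoints, uint32_t maxTessellationPatchSize)
{
    return controlPoints > 0 && controlPoints <= maxTessellationPatchSize;
}
```

The value 32 used when exercising this helper is only an example limit; the actual limit is device-dependent.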

Tessellation Control Shader Execution

A tessellation control shader is invoked at least once for each output vertex in a patch. If the subpass includes multiple views in its view mask, the shader may be invoked separately for each view.

Inputs to the tessellation control shader are generated by the vertex shader. Each invocation of the tessellation control shader can read the attributes of any incoming vertices and their associated data. The invocations corresponding to a given patch execute logically in parallel, with undefined relative execution order. However, the OpControlBarrier instruction can be used to provide limited control of the execution order by synchronizing invocations within a patch, effectively dividing tessellation control shader execution into a set of phases. Tessellation control shaders will read undefined values if one invocation reads a per-vertex or per-patch output written by another invocation at any point during the same phase, or if two invocations attempt to write different values to the same per-patch output in a single phase.

Tessellation Evaluation Shaders

The Tessellation Evaluation Shader operates on an input patch of control points and their associated data, and a single input barycentric coordinate indicating the invocation’s relative position within the subdivided patch, and outputs a single vertex and its associated data.

Tessellation Evaluation Shader Execution

A tessellation evaluation shader is invoked at least once for each unique vertex generated by the tessellator. If the subpass includes multiple views in its view mask, the shader may be invoked separately for each view.

Geometry Shaders

The geometry shader operates on a group of vertices and their associated data assembled from a single input primitive, and emits zero or more output primitives and the group of vertices and their associated data required for each output primitive.

Geometry Shader Execution

A geometry shader is invoked at least once for each primitive produced by the tessellation stages, or at least once for each primitive generated by primitive assembly when tessellation is not in use. A shader can request that the geometry shader runs multiple instances. A geometry shader is invoked at least once for each instance. If the subpass includes multiple views in its view mask, the shader may be invoked separately for each view.

Fragment Shaders

Fragment shaders are invoked as a fragment operation in a graphics pipeline. Each fragment shader invocation operates on a single fragment and its associated data. With few exceptions, fragment shaders do not have access to any data associated with other fragments and are considered to execute in isolation of fragment shader invocations associated with other fragments.

Compute Shaders

Compute shaders are invoked via dispatching commands. In general, they have access to similar resources as shader stages executing as part of a graphics pipeline.

Compute workloads are formed from groups of work items called workgroups and processed by the compute shader in the current compute pipeline. A workgroup is a collection of shader invocations that execute the same shader, potentially in parallel. Compute shaders execute in global workgroups which are divided into a number of local workgroups with a size that can be set by assigning a value to the LocalSize or LocalSizeId execution mode or via an object decorated by the WorkgroupSize decoration. An invocation within a local workgroup can share data with other members of the local workgroup through shared variables and issue memory and control flow barriers to synchronize with other members of the local workgroup.
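
The relationship between the global workgroup, local workgroups, and individual invocations can be sketched with a little host-side arithmetic. UVec3, workgroup_count, and global_invocation_id are hypothetical names used only for this illustration. The index formula mirrors how the GlobalInvocationId built-in is derived from WorkgroupId, the local workgroup size, and LocalInvocationId.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical three-component unsigned vector. */
typedef struct UVec3 {
    uint32_t x, y, z;
} UVec3;

/* Number of local workgroups needed along one dimension so that
   itemCount work items are all covered (rounds up). */
uint32_t workgroup_count(uint32_t itemCount, uint32_t localSize)
{
    assert(localSize > 0);
    return itemCount / localSize + (itemCount % localSize != 0 ? 1u : 0u);
}

/* Global index of an invocation:
   workgroupId * localSize + localInvocationId, per component. */
UVec3 global_invocation_id(UVec3 workgroupId, UVec3 localSize, UVec3 localInvocationId)
{
    UVec3 g = {
        workgroupId.x * localSize.x + localInvocationId.x,
        workgroupId.y * localSize.y + localInvocationId.y,
        workgroupId.z * localSize.z + localInvocationId.z,
    };
    return g;
}
```

Because the workgroup count rounds up, the last workgroup may contain invocations whose global index is past the end of the data, so a shader written this way typically needs a bounds check.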

Ray Generation Shaders

A ray generation shader is similar to a compute shader. Its main purpose is to execute ray tracing queries using pipeline trace ray instructions (such as OpTraceRayKHR) and process the results.

Ray Generation Shader Execution

One ray generation shader is executed per ray tracing dispatch. Its location in the shader binding table (see Shader Binding Table for details) is passed directly into vkCmdTraceRaysKHR using the pRaygenShaderBindingTable parameter or vkCmdTraceRaysNV using the raygenShaderBindingTableBuffer and raygenShaderBindingOffset parameters.

Intersection Shaders

Intersection shaders enable the implementation of arbitrary, application defined geometric primitives. An intersection shader for a primitive is executed whenever its axis-aligned bounding box is hit by a ray.

Like other ray tracing shader domains, an intersection shader operates on a single ray at a time. It also operates on a single primitive at a time. It is therefore the purpose of an intersection shader to compute the ray-primitive intersections and report them. To report an intersection, the shader calls the OpReportIntersectionKHR instruction.

An intersection shader communicates with any-hit and closest hit shaders by generating attribute values that they can read. Intersection shaders cannot read or modify the ray payload.

Intersection Shader Execution

The order in which intersections are found along a ray, and therefore the order in which intersection shaders are executed, is unspecified.

The intersection shader of the closest AABB which intersects the ray is guaranteed to be executed at some point during traversal, unless the ray is forcibly terminated.

Any-Hit Shaders

The any-hit shader is executed after the intersection shader reports an intersection that lies within the current [t_min,t_max] of the ray. The main use of any-hit shaders is to programmatically decide whether or not an intersection will be accepted. The intersection will be accepted unless the shader calls the OpIgnoreIntersectionKHR instruction. Any-hit shaders have read-only access to the attributes generated by the corresponding intersection shader, and can read or modify the ray payload.

Any-Hit Shader Execution

The order in which intersections are found along a ray, and therefore the order in which any-hit shaders are executed, is unspecified.

The any-hit shader of the closest hit is guaranteed to be executed at some point during traversal, unless the ray is forcibly terminated.

Closest Hit Shaders

Closest hit shaders have read-only access to the attributes generated by the corresponding intersection shader, and can read or modify the ray payload. They also have access to a number of system-generated values. Closest hit shaders can call pipeline trace ray instructions to recursively trace rays.

Closest Hit Shader Execution

Exactly one closest hit shader is executed when traversal is finished and an intersection has been found and accepted.

Miss Shaders

Miss shaders can access the ray payload and can trace new rays through the pipeline trace ray instructions, but cannot access attributes since they are not associated with an intersection.

Miss Shader Execution

A miss shader is executed instead of a closest hit shader if no intersection was found during traversal.
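
How the any-hit, closest hit, and miss stages relate can be illustrated with a toy host-side model. This is a sketch under assumptions, not the traversal algorithm of any implementation. Candidate, TraceResult, and trace are hypothetical names. Candidates are visited in array order, although real traversal order is unspecified. The model also assumes that an accepted hit narrows the upper bound of the interval, which this section does not state. Ray flags and forced termination are not modeled.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical candidate intersection: its distance along the ray and
   whether the any-hit shader would ignore it. */
typedef struct Candidate {
    float t;
    bool  ignoredByAnyHit;
} Candidate;

typedef enum TraceOutcome { TRACE_MISS, TRACE_CLOSEST_HIT } TraceOutcome;

typedef struct TraceResult {
    TraceOutcome outcome;
    float        t; /* distance of the accepted hit; tMax on a miss */
} TraceResult;

TraceResult trace(const Candidate *candidates, size_t count, float tMin, float tMax)
{
    TraceResult result = { TRACE_MISS, tMax };
    for (size_t i = 0; i < count; ++i) {
        const Candidate *c = &candidates[i];
        /* Only intersections inside the current [tMin, tMax] reach any-hit. */
        if (c->t < tMin || c->t > result.t)
            continue;
        /* Any-hit stage: the intersection is accepted unless it is ignored. */
        if (c->ignoredByAnyHit)
            continue;
        /* Accepted: remember it and (assumption) narrow the interval. */
        result.outcome = TRACE_CLOSEST_HIT;
        result.t = c->t;
    }
    /* TRACE_CLOSEST_HIT: exactly one closest hit shader would run.
       TRACE_MISS: the miss shader would run instead. */
    return result;
}
```

In this model, a ray whose every candidate is ignored by the any-hit stage ends in the miss shader, exactly like a ray with no candidates at all.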

Callable Shaders

Callable shaders can access a callable payload that works similarly to ray payloads to do subroutine work.

Callable Shader Execution

A callable shader is executed by calling OpExecuteCallableKHR from an allowed shader stage.

Interpolation Decorations

Variables in the Input storage class in a fragment shader’s interface are interpolated from the values specified by the primitive being rasterized.

Interpolation decorations can be present on input and output variables in pre-rasterization shaders but have no effect on the interpolation performed.

An undecorated input variable will be interpolated with perspective-correct interpolation according to the primitive type being rasterized. Lines and polygons are interpolated in the same way as the primitive’s clip coordinates. If the NoPerspective decoration is present, linear interpolation is instead used for lines and polygons. For points, as there is only a single vertex, input values are never interpolated and instead take the value written for the single vertex.
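
The difference between the default and NoPerspective behavior can be sketched for a single scalar input on a triangle. The function names below are hypothetical. The barycentric weights, per-vertex values, and per-vertex clip-space w coordinates are assumed to be supplied by the caller. The perspective-correct form divides each term by the vertex's clip w, following the interpolation formula given for polygons in the Rasterization chapter.

```c
/* Perspective-correct interpolation of one scalar over a triangle.
   bary:  barycentric weights of the interpolation position (sum to 1)
   value: the input's value at each of the three vertices
   wClip: clip-space w coordinate of each vertex (assumed non-zero) */
double interpolate_perspective(const double bary[3],
                               const double value[3],
                               const double wClip[3])
{
    double numerator = 0.0;
    double denominator = 0.0;
    for (int i = 0; i < 3; ++i) {
        numerator   += bary[i] * value[i] / wClip[i];
        denominator += bary[i] / wClip[i];
    }
    return numerator / denominator;
}

/* Linear interpolation, as used when NoPerspective is present. */
double interpolate_linear(const double bary[3], const double value[3])
{
    double result = 0.0;
    for (int i = 0; i < 3; ++i)
        result += bary[i] * value[i];
    return result;
}
```

When all three vertices share the same w, the two functions agree. They diverge as the w values spread apart, which is why NoPerspective inputs can look distorted on surfaces viewed at a steep angle.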

If the Flat decoration is present on an input variable, the value is not interpolated, and instead takes its value directly from the provoking vertex. Fragment shader inputs that are signed or unsigned integers, integer vectors, or any double-precision floating-point type must be decorated with Flat.

Interpolation of input variables is performed at an implementation-defined position within the fragment area being shaded. The position is further constrained as follows:

If the Centroid decoration is used, the interpolation position used for the variable must also fall within the bounds of the primitive being rasterized.
If the Sample decoration is used, the interpolation position used for the variable must be at the position of the sample being shaded by the current fragment shader invocation.
If a sample count of 1 is used, the interpolation position must be at the center of the fragment area.

As Centroid constrains the interpolation position to lie within the covered area of the primitive, using it may cause the position to differ between neighboring fragments when it otherwise would not. Derivatives calculated based on these differing locations can produce inconsistent results compared to undecorated inputs. Thus using Centroid with input variables used in derivative calculations is not recommended.

If the PerVertexKHR decoration is present on an input variable, the value is not interpolated, and instead values from all input vertices are available in an array. Each index of the array corresponds to one of the vertices of the primitive that produced the fragment.

If the CustomInterpAMD decoration is present on an input variable, the value cannot be accessed directly; instead the extended instruction InterpolateAtVertexAMD must be used to obtain values from the input vertices.

Static Use

A SPIR-V module declares a global object in memory using the OpVariable instruction, which results in a pointer x to that object. A specific entry point in a SPIR-V module is said to statically use that object if that entry point’s call tree contains a function containing an instruction with x as an id operand. A shader entry point also statically uses any variables explicitly declared in its interface.

Scope

A scope describes a set of shader invocations, where each such set is a scope instance. Each invocation belongs to one or more scope instances, but belongs to no more than one scope instance for each scope.

The operations available between invocations in a given scope instance vary, with smaller scopes generally able to perform more operations, and with greater efficiency.

Cross Device

All invocations executed in a Vulkan instance fall into a single cross device scope instance.

Whilst the CrossDevice scope is defined in SPIR-V, it is disallowed in Vulkan. API synchronization commands can be used to communicate between devices.

Device

All invocations executed on a single device form a device scope instance.

If the vulkanMemoryModel and vulkanMemoryModelDeviceScope features are enabled, this scope is represented in SPIR-V by the Device Scope, which can be used as a Memory Scope for barrier and atomic operations.

If both the shaderDeviceClock and vulkanMemoryModelDeviceScope features are enabled, using the Device Scope with the OpReadClockKHR instruction will read from a clock that is consistent across invocations in the same device scope instance.

There is no method to synchronize the execution of these invocations within SPIR-V, and this can only be done with API synchronization primitives.

Invocations executing on different devices in a device group operate in separate device scope instances.

Queue Family

Invocations executed by queues in a given queue family form a queue family scope instance.

This scope is identified in SPIR-V as the QueueFamily Scope if the vulkanMemoryModel feature is enabled, or if not, the Device Scope, which can be used as a Memory Scope for barrier and atomic operations.

If the shaderDeviceClock feature is enabled, but the vulkanMemoryModelDeviceScope feature is not enabled, using the Device Scope with the OpReadClockKHR instruction will read from a clock that is consistent across invocations in the same queue family scope instance.

There is no method to synchronize the execution of these invocations within SPIR-V, and this can only be done with API synchronization primitives.

Each invocation in a queue family scope instance must be in the same device scope instance.

Command

Any shader invocations executed as the result of a single command such as vkCmdDispatch or vkCmdDraw form a command scope instance. For indirect drawing commands with drawCount greater than one, invocations from separate draws are in separate command scope instances. For ray tracing shaders, an invocation group is an implementation-dependent subset of the set of shader invocations of a given shader stage which are produced by a single trace rays command.

There is no specific Scope for communication across invocations in a command scope instance. As this has a clear boundary at the API level, coordination here can be performed in the API, rather than in SPIR-V.

Each invocation in a command scope instance must be in the same queue family scope instance.

For shaders without defined workgroups, this set of invocations forms an invocation group as defined in the SPIR-V specification.

Primitive

Any fragment shader invocations executed as the result of rasterization of a single primitive form a primitive scope instance.

There is no specific Scope for communication across invocations in a primitive scope instance.

Any generated helper invocations are included in this scope instance.

Each invocation in a primitive scope instance must be in the same command scope instance.

Any input variables decorated with Flat are uniform within a primitive scope instance.

Shader Call

Any shader-call-related invocations that are executed in one or more ray tracing execution models form a shader call scope instance.

The ShaderCallKHR Scope can be used as Memory Scope for barrier and atomic operations.

Each invocation in a shader call scope instance must be in the same queue family scope instance.

Workgroup

A local workgroup is a set of invocations that can synchronize and share data with each other using memory in the Workgroup storage class.

The Workgroup Scope can be used as both an Execution Scope and Memory Scope for barrier and atomic operations.

Each invocation in a local workgroup must be in the same command scope instance.

Only task, mesh, and compute shaders have defined workgroups; other shader types cannot use workgroup functionality. For shaders that have defined workgroups, this set of invocations forms an invocation group as defined in the SPIR-V specification.

When variables declared with the Workgroup storage class are explicitly laid out (hence they are also decorated with Block), the amount of storage consumed is the size of the largest Block variable, not counting any padding at the end. The amount of storage consumed by the non-Block variables declared with the Workgroup storage class is implementation-dependent. However, the amount of storage consumed may not exceed the largest block size that would be obtained if all active non-Block variables declared with Workgroup storage class were assigned offsets in an arbitrary order by successively taking the smallest valid offset according to the Standard Storage Buffer Layout rules, and with Boolean values considered as 32-bit integer values for the purpose of this calculation. (This is equivalent to using the GLSL std430 layout rules.)
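The offset assignment used for this bound can be illustrated with a minimal C sketch. It assumes each non-Block variable has already been reduced to a size and an alignment under the Standard Storage Buffer Layout rules (with Booleans counted as 32-bit integers). The WgVar type and both function names are hypothetical and are not part of any API. The sketch evaluates one ordering only, whereas the bound above is taken over an arbitrary order.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical description of one non-Block Workgroup variable: its size
 * and required alignment in bytes, assumed to be derived beforehand from
 * the Standard Storage Buffer Layout rules. */
typedef struct WgVar {
    uint32_t size;
    uint32_t alignment; /* must be a nonzero power of two */
} WgVar;

/* Round value up to the next multiple of a power-of-two alignment. */
static uint32_t align_up(uint32_t value, uint32_t alignment)
{
    return (value + alignment - 1u) & ~(alignment - 1u);
}

/* Assign offsets in the given order, each time taking the smallest valid
 * offset at or after the end of the previous variable, and return the
 * total number of bytes spanned. */
static uint32_t workgroup_storage_bound(const WgVar *vars, size_t count)
{
    uint32_t end = 0;
    for (size_t i = 0; i < count; ++i) {
        uint32_t offset = align_up(end, vars[i].alignment);
        end = offset + vars[i].size;
    }
    return end;
}
```

Because alignment padding depends on the order, different orderings of the same variables can produce different totals.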

Subgroup

A subgroup (see the subsection “Control Flow” of section 2 of the SPIR-V 1.3 Revision 1 specification) is a set of invocations that can synchronize and share data with each other efficiently.

The Subgroup Scope can be used as both an Execution Scope and Memory Scope for barrier and atomic operations. Other subgroup features allow the use of group operations with subgroup scope.

If the shaderSubgroupClock feature is enabled, using the Subgroup Scope with the OpReadClockKHR instruction will read from a clock that is consistent across invocations in the same subgroup.

For shaders that have defined workgroups, each invocation in a subgroup must be in the same local workgroup.

In other shader stages, each invocation in a subgroup must be in the same device scope instance.

Only shader stages that support subgroup operations have defined subgroups.

Subgroups are not guaranteed to be a subset of a single command in shaders that do not have defined workgroups. Values that are guaranteed to be uniform for a given command or sub command may then not be uniform for the subgroup, and vice versa. As such, applications must take care when dealing with mixed uniformity.

A somewhat common example of this would be trying to optimize access to per-draw data using subgroup operations:

buffer { uint draw_data[]; };

flat in int vDrawID; // Passed through from vertex shader

void main()
{
    // Load the per-draw value once and share it with the whole subgroup.
    uint local_draw_data = subgroupBroadcastFirst(draw_data[vDrawID]);
}

This can be done in an attempt to optimize the shader to only perform the loads once per subgroup. However, if the implementation packs multiple draws into a single subgroup, invocations from draws with a different drawID are now receiving data from the wrong invocation. Applications should rely on implementations to do this kind of optimization automatically where the implementation can, rather than trying to force it.
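The hazard can be reproduced on the host with a small C simulation. This is only a sketch: the subgroup size of 8, the assumption that every invocation is active, and the function names are illustrative choices and do not describe any particular implementation.

```c
#include <stdint.h>

#define SUBGROUP_SIZE 8

/* Emulates subgroupBroadcastFirst over a fully active subgroup:
 * every invocation receives the value held by invocation 0. */
static void broadcast_first(const uint32_t *in, uint32_t *out)
{
    for (int i = 0; i < SUBGROUP_SIZE; ++i)
        out[i] = in[0];
}

/* Counts invocations whose broadcast value differs from the value they
 * would have loaded themselves using their own draw ID. */
static int count_wrong(const int *draw_id, const uint32_t *draw_data)
{
    uint32_t loaded[SUBGROUP_SIZE];
    uint32_t received[SUBGROUP_SIZE];
    int wrong = 0;

    for (int i = 0; i < SUBGROUP_SIZE; ++i)
        loaded[i] = draw_data[draw_id[i]];
    broadcast_first(loaded, received);
    for (int i = 0; i < SUBGROUP_SIZE; ++i)
        if (received[i] != loaded[i])
            ++wrong;
    return wrong;
}
```

When all eight invocations share one draw ID the count is zero. When the subgroup mixes two draws, every invocation from the second draw receives the first draw's data.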

Quad

A quad scope instance is formed of four shader invocations.

In a fragment shader, each invocation in a quad scope instance is formed of invocations in neighboring framebuffer locations (x_i, y_i), where:

i is the index of the invocation within the scope instance.
w and h are the number of pixels the fragment covers in the x and y axes.
w and h are identical for all participating invocations.
(x₀) = (x₁ - w) = (x₂) = (x₃ - w)
(y₀) = (y₁) = (y₂ - h) = (y₃ - h)
Each invocation has the same layer and sample indices.

In a mesh, task, or compute shader, if the DerivativeGroupQuadsKHR execution mode is specified, each invocation in a quad scope instance is formed of invocations with adjacent local invocation IDs (x_i, y_i), where:

i is the index of the invocation within the quad scope instance.
(x₀) = (x₁ - 1) = (x₂) = (x₃ - 1)
(y₀) = (y₁) = (y₂ - 1) = (y₃ - 1)
x₀ and y₀ are integer multiples of 2.
Each invocation has the same z coordinate.

In a mesh, task, or compute shader, if the DerivativeGroupLinearKHR execution mode is specified, each invocation in a quad scope instance is formed of invocations with adjacent local invocation indices (l_i), where:

i is the index of the invocation within the quad scope instance.
(l₀) = (l₁ - 1) = (l₂ - 2) = (l₃ - 3)
l₀ is an integer multiple of 4.

In all shaders, each invocation in a quad scope instance is formed of invocations in adjacent subgroup invocation indices (s_i), where:

i is the index of the invocation within the quad scope instance.
(s₀) = (s₁ - 1) = (s₂ - 2) = (s₃ - 3)
s₀ is an integer multiple of 4.
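The index relationships above imply a simple mapping from an invocation's coordinates to its quad index i. The following C sketch shows that mapping for the compute-style layouts. The function names are hypothetical.

```c
#include <stdint.h>

/* Quad index for the DerivativeGroupQuadsKHR layout: x0 and y0 are even,
 * so bit 0 of x selects the column and bit 0 of y selects the row. */
static uint32_t quad_index_from_local_id(uint32_t x, uint32_t y)
{
    return (x & 1u) | ((y & 1u) << 1);
}

/* Quad index for the linear layouts (local invocation index l, or
 * subgroup invocation index s): the first index of each quad is a
 * multiple of 4, so the low two bits give the position in the quad. */
static uint32_t quad_index_from_linear(uint32_t index)
{
    return index & 3u;
}
```

For example, local invocation IDs (4,6), (5,6), (4,7), and (5,7) map to quad indices 0, 1, 2, and 3 respectively.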

Each invocation in a quad scope instance must be in the same subgroup.

In a fragment shader, each invocation in a quad scope instance must be in the same primitive scope instance.

Fragment, mesh, task, and compute shaders have defined quad scope instances. If the quadOperationsInAllStages limit is supported, any shader stages that support subgroup operations also have defined quad scope instances.

Fragment Interlock

A fragment interlock scope instance is formed of fragment shader invocations based on their framebuffer locations (x,y,layer,sample), executed by commands inside a single subpass.

The specific set of invocations included varies based on the execution mode as follows:

If the SampleInterlockOrderedEXT or SampleInterlockUnorderedEXT execution modes are used, only invocations with identical framebuffer locations (x,y,layer,sample) are included.
If the PixelInterlockOrderedEXT or PixelInterlockUnorderedEXT execution modes are used, fragments with different sample ids are also included.
If the ShadingRateInterlockOrderedEXT or ShadingRateInterlockUnorderedEXT execution modes are used, fragments from neighboring framebuffer locations are also included. The shading rate image or fragment shading rate determines these fragments.

Only fragment shaders with one of the above execution modes have defined fragment interlock scope instances.

There is no specific Scope value for communication across invocations in a fragment interlock scope instance. However, this is implicitly used as a memory scope by OpBeginInvocationInterlockEXT and OpEndInvocationInterlockEXT.

Each invocation in a fragment interlock scope instance must be in the same queue family scope instance.

Invocation

The smallest scope is a single invocation; this is represented by the Invocation Scope in SPIR-V.

Fragment shader invocations must be in a primitive scope instance.

Invocations in fragment shaders that have a defined fragment interlock scope must be in a fragment interlock scope instance.

Invocations in shaders that have defined workgroups must be in a local workgroup.

Invocations in shaders that have a defined subgroup scope must be in a subgroup.

Invocations in shaders that have a defined quad scope must be in a quad scope instance.

All invocations in all stages must be in a command scope instance.

Group Operations

Group operations are executed by multiple invocations within a scope instance, with each invocation involved in calculating the result. This provides a mechanism for efficient communication between invocations in a particular scope instance.

Group operations all take a Scope defining the desired scope instance to operate within. Only the Subgroup scope can be used for these operations; the subgroupSupportedOperations limit defines which types of operation can be used.

Basic Group Operations

Basic group operations include the use of OpGroupNonUniformElect, OpControlBarrier, OpMemoryBarrier, and atomic operations.

OpGroupNonUniformElect can be used to choose a single invocation to perform a task for the whole group. Only the invocation with the lowest id in the group will return true.
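The election rule can be modeled on the host as follows. This is a sketch that assumes a group of at most 32 invocations, represented by a bitmask in which bit i is set when invocation i is active. The function name and the mask representation are illustrative only.

```c
#include <stdbool.h>
#include <stdint.h>

/* Returns true only for the lowest-numbered active invocation.
 * active_mask has bit i set when invocation i is active. */
static bool elect(uint32_t active_mask, uint32_t invocation_id)
{
    if (active_mask == 0u || invocation_id >= 32u)
        return false;
    /* Isolate the lowest set bit of the mask. */
    uint32_t lowest = active_mask & (~active_mask + 1u);
    return lowest == (1u << invocation_id);
}
```

With invocations 2 and 3 active (mask 0x0C), only invocation 2 is elected.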

The Memory Model appendix defines the operation of barriers and atomics.

Vote Group Operations

The vote group operations allow invocations within a group to compare values across a group. The types of votes enabled are:

Do all active group invocations agree that an expression is true?
Do any active group invocations evaluate an expression to true?
Do all active group invocations have the same value of an expression?

These operations are useful in combination with control flow, as they allow developers to check whether conditions match across the group and choose potentially faster code paths in these cases.
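The three vote types can be modeled with a short C sketch. It assumes a group of at most 32 invocations and a bitmask of active invocations. The function names are hypothetical, and the comparison uses plain integer values for simplicity.

```c
#include <stdbool.h>
#include <stdint.h>

/* True if invocation i is marked active in the mask. */
static bool is_active(uint32_t active_mask, uint32_t i)
{
    return ((active_mask >> i) & 1u) != 0u;
}

/* Do all active invocations evaluate the predicate to true? */
static bool vote_all(const bool *pred, uint32_t active_mask, uint32_t n)
{
    for (uint32_t i = 0; i < n && i < 32u; ++i)
        if (is_active(active_mask, i) && !pred[i])
            return false;
    return true;
}

/* Does any active invocation evaluate the predicate to true? */
static bool vote_any(const bool *pred, uint32_t active_mask, uint32_t n)
{
    for (uint32_t i = 0; i < n && i < 32u; ++i)
        if (is_active(active_mask, i) && pred[i])
            return true;
    return false;
}

/* Do all active invocations hold the same value? */
static bool vote_all_equal(const int *value, uint32_t active_mask, uint32_t n)
{
    bool have_first = false;
    int first = 0;
    for (uint32_t i = 0; i < n && i < 32u; ++i) {
        if (!is_active(active_mask, i))
            continue;
        if (!have_first) {
            first = value[i];
            have_first = true;
        } else if (value[i] != first) {
            return false;
        }
    }
    return true;
}
```

Inactive invocations do not take part in any of the three votes, so the same input can agree under one mask and disagree under another.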

Arithmetic Group Operations

The arithmetic group operations allow invocations to perform scans and reductions across a group. The operators supported are add, mul, min, max, and, or, xor.

For reductions, every invocation in a group will obtain the cumulative result of these operators applied to all values in the group. For exclusive scans, each invocation in a group will obtain the cumulative result of these operators applied to all values in invocations with a lower index in the group. Inclusive scans are identical to exclusive scans, except the cumulative result includes the operator applied to the value in the current invocation.

The order in which these operators are applied is implementation-dependent.
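The difference between a reduction, an exclusive scan, and an inclusive scan can be seen in a small host-side model using the add operator. This sketch assumes all invocations are active, and that the first invocation of an exclusive scan receives the identity of the operator (0 for add). The function names are illustrative.

```c
#include <stdint.h>

/* Reduction: every invocation obtains the sum over the whole group. */
static void reduce_add(const int *in, int *out, uint32_t n)
{
    int total = 0;
    for (uint32_t i = 0; i < n; ++i)
        total += in[i];
    for (uint32_t i = 0; i < n; ++i)
        out[i] = total;
}

/* Exclusive scan: invocation i obtains the sum of values held by
 * invocations with a lower index; invocation 0 obtains the identity. */
static void exclusive_scan_add(const int *in, int *out, uint32_t n)
{
    int running = 0;
    for (uint32_t i = 0; i < n; ++i) {
        out[i] = running;
        running += in[i];
    }
}

/* Inclusive scan: as above, but the invocation's own value is included. */
static void inclusive_scan_add(const int *in, int *out, uint32_t n)
{
    int running = 0;
    for (uint32_t i = 0; i < n; ++i) {
        running += in[i];
        out[i] = running;
    }
}
```

The model applies the operator in index order. Since the order is implementation-dependent, results for non-associative cases such as floating-point addition may differ from a sequential model like this one.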

Ballot Group Operations

The ballot group operations allow invocations to perform more complex votes across the group. The ballot functionality allows all invocations within a group to provide a boolean value and get as a result what each invocation provided as their boolean value. The broadcast functionality allows values to be broadcast from an invocation to all other invocations within the group.
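A ballot can be pictured as a bitmask with one bit per invocation. The C sketch below models that view, assuming at most 32 invocations and a bitmask of active invocations. The function names and the single 32-bit result are simplifications chosen for illustration.

```c
#include <stdbool.h>
#include <stdint.h>

/* Builds the ballot: bit i is set when invocation i is active and its
 * predicate is true. */
static uint32_t ballot(const bool *predicate, uint32_t active_mask, uint32_t n)
{
    uint32_t result = 0u;
    for (uint32_t i = 0; i < n && i < 32u; ++i)
        if (((active_mask >> i) & 1u) && predicate[i])
            result |= 1u << i;
    return result;
}

/* Reads back what invocation i contributed to a ballot. */
static bool ballot_bit(uint32_t ballot_value, uint32_t i)
{
    return i < 32u && ((ballot_value >> i) & 1u) != 0u;
}

/* Broadcast: every invocation receives the value held by `source`. */
static void broadcast(const int *in, int *out, uint32_t n, uint32_t source)
{
    for (uint32_t i = 0; i < n; ++i)
        out[i] = in[source];
}
```

Every invocation receives the same ballot value and can then inspect any other invocation's bit.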

Shuffle Group Operations

The shuffle group operations allow invocations to read values from other invocations within a group.

Shuffle Relative Group Operations

The shuffle relative group operations allow invocations to read values from other invocations within the group relative to the current invocation in the group. The relative operations supported allow data to be shifted up and down through the invocations within a group.

Clustered Group Operations

The clustered group operations allow invocations to perform an operation among partitions of a group, such that the operation is only performed within the group invocations within a partition. The partitions for clustered group operations are consecutive power-of-two size groups of invocations and the cluster size must be known at pipeline creation time. The operations supported are add, mul, min, max, and, or, xor.
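The partitioning can be sketched on the host as follows, using the add operator. The sketch assumes all invocations are active and that the cluster size is a power of two. The function name is hypothetical.

```c
#include <stdint.h>

/* Clustered add reduction: each invocation obtains the sum over the
 * consecutive, power-of-two sized cluster that contains it. */
static void clustered_reduce_add(const int *in, int *out,
                                 uint32_t n, uint32_t cluster_size)
{
    for (uint32_t i = 0; i < n; ++i) {
        /* First invocation of the cluster containing i. */
        uint32_t base = i & ~(cluster_size - 1u);
        int sum = 0;
        for (uint32_t j = base; j < base + cluster_size && j < n; ++j)
            sum += in[j];
        out[i] = sum;
    }
}
```

With eight invocations holding 1 through 8 and a cluster size of 4, the first four invocations obtain 10 and the last four obtain 26.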

Rotate Group Operations

The rotate group operations allow invocations to read values from other invocations within the group relative to the current invocation and modulo the size of the group. Clustered rotate group operations perform the same operation within individual partitions of a group.

The partitions for clustered rotate group operations are consecutive power-of-two size groups of invocations and the cluster size must be known at pipeline creation time.
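The index arithmetic for rotation can be sketched in C. This model assumes all invocations are active and that invocation i reads from invocation (i + delta), wrapped either over the whole group or within its cluster. That direction is an assumption made for illustration, as are the function names.

```c
#include <stdint.h>

/* Rotate: invocation i reads the value of invocation (i + delta) mod n. */
static void rotate(const int *in, int *out, uint32_t n, uint32_t delta)
{
    for (uint32_t i = 0; i < n; ++i)
        out[i] = in[(i + delta) % n];
}

/* Clustered rotate: the same wrap-around, confined to the power-of-two
 * sized cluster that contains invocation i. Assumes n is a multiple of
 * cluster_size. */
static void clustered_rotate(const int *in, int *out, uint32_t n,
                             uint32_t delta, uint32_t cluster_size)
{
    for (uint32_t i = 0; i < n; ++i) {
        uint32_t base = i & ~(cluster_size - 1u);
        uint32_t lane = (i + delta) & (cluster_size - 1u);
        out[i] = in[base + lane];
    }
}
```

With a delta of 1 and a cluster size of 4, the last invocation of each cluster reads from the first invocation of the same cluster rather than from the next cluster.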

Quad Group Operations

Quad group operations (OpGroupNonUniformQuad*) are a specialized type of group operations that only operate on quad scope instances. Whilst these instructions do include a Scope parameter, this scope is always overridden; only the quad scope instance is included in its execution scope.

Fragment shaders that statically execute either OpGroupNonUniformQuadBroadcast or OpGroupNonUniformQuadSwap must launch sufficient invocations to ensure their correct operation; additional helper invocations are launched for framebuffer locations not covered by rasterized fragments if necessary.

The index used to select participating invocations is i, as described for a quad scope instance, defined as the quad index in the SPIR-V specification.

For OpGroupNonUniformQuadBroadcast this value is equal to Index. For OpGroupNonUniformQuadSwap, it is equal to the implicit Index used by each participating invocation.

Derivative Operations

Derivative operations calculate the partial derivative for an expression P as a function of an invocation’s x and y coordinates.

Derivative operations operate on a set of invocations known as a derivative group as defined in the SPIR-V specification.

A derivative group in a fragment shader is equivalent to the quad scope instance if the QuadDerivativesKHR execution mode is specified, otherwise it is equivalent to the primitive scope instance. A derivative group in a mesh, task, or compute shader is equivalent to the quad scope instance.

Derivatives are calculated assuming that P is piecewise linear and continuous within the derivative group.

The following control-flow restrictions apply to derivative operations:

If the QuadDerivativesKHR execution mode is specified, dynamic instances of any derivative operations must be executed in control flow that is uniform within the current quad scope instance.
If the QuadDerivativesKHR execution mode is not specified:
- dynamic instances of explicit derivative instructions (OpDPdx*, OpDPdy*, and OpFwidth*) must be executed in control flow that is uniform within a derivative group.
- dynamic instances of implicit derivative operations can be executed in control flow that is not uniform within the derivative group, but results are undefined.

Fragment shaders that statically execute derivative operations must launch sufficient invocations to ensure their correct operation; additional helper invocations are launched for framebuffer locations not covered by rasterized fragments if necessary.

In a mesh, task, or compute shader, it is the application’s responsibility to ensure that sufficient invocations are launched.

Derivative operations calculate their results as the difference between the result of P across invocations in the quad. For fine derivative operations (OpDPdxFine and OpDPdyFine), the values of DPdx(P_i) are calculated as

: DPdx(P₀) = DPdx(P₁) = P₁ - P₀
: DPdx(P₂) = DPdx(P₃) = P₃ - P₂

and the values of DPdy(P_i) are calculated as

: DPdy(P₀) = DPdy(P₂) = P₂ - P₀
: DPdy(P₁) = DPdy(P₃) = P₃ - P₁

where i is the index of each invocation as described in Quad.
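The fine derivative equations translate directly into code. The following C sketch computes them for one quad, given the value of P in each of the four invocations. The QuadDerivs type and the function name are hypothetical.

```c
/* Per-invocation fine derivatives for one quad. */
typedef struct QuadDerivs {
    float dpdx[4];
    float dpdy[4];
} QuadDerivs;

/* p[i] is the value of P in the invocation with quad index i. */
static QuadDerivs fine_derivatives(const float p[4])
{
    QuadDerivs d;
    /* Horizontal differences: one per row of the quad. */
    d.dpdx[0] = d.dpdx[1] = p[1] - p[0];
    d.dpdx[2] = d.dpdx[3] = p[3] - p[2];
    /* Vertical differences: one per column of the quad. */
    d.dpdy[0] = d.dpdy[2] = p[2] - p[0];
    d.dpdy[1] = d.dpdy[3] = p[3] - p[1];
    return d;
}
```

For P values of 1, 3, 6, and 10, the top row obtains DPdx = 2 and the bottom row obtains DPdx = 4, while the left column obtains DPdy = 5 and the right column obtains DPdy = 7.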

Coarse derivative operations (OpDPdxCoarse and OpDPdyCoarse) calculate their results in roughly the same manner, but may only calculate two values instead of four (one for each of DPdx and DPdy), reusing the same result no matter the originating invocation. If an implementation does this, it should use the fine derivative calculations described for P₀.

Derivative values are calculated between fragments rather than pixels. If the fragment shader invocations involved in the calculation cover multiple pixels, these operations cover a wider area, resulting in larger derivative values. This in turn will result in a coarser LOD being selected for image sampling operations using derivatives.

Applications may want to account for this when using multi-pixel fragments; if pixel derivatives are desired, applications should use explicit derivative operations and divide the results by the size of the fragment in each dimension as follows:

: DPdx(P_n)' = DPdx(P_n) / w
: DPdy(P_n)' = DPdy(P_n) / h

where w and h are the size of the fragments in the quad, and DPdx(P_n)' and DPdy(P_n)' are the pixel derivatives.
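As a worked illustration of this scaling, the C sketch below converts a pair of fragment derivatives into pixel derivatives. The Deriv2 type and the function name are hypothetical, and w and h are assumed to be nonzero.

```c
#include <stdint.h>

/* A pair of derivatives for one invocation. */
typedef struct Deriv2 {
    float dpdx;
    float dpdy;
} Deriv2;

/* Divide fragment derivatives by the fragment size in pixels (w by h)
 * to obtain per-pixel derivatives. */
static Deriv2 to_pixel_derivatives(Deriv2 fragment_derivs,
                                   uint32_t w, uint32_t h)
{
    Deriv2 pixel;
    pixel.dpdx = fragment_derivs.dpdx / (float)w;
    pixel.dpdy = fragment_derivs.dpdy / (float)h;
    return pixel;
}
```

For a 2x4 fragment with fragment derivatives of 4 and 6, the pixel derivatives are 2 and 1.5.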

The results for OpDPdx and OpDPdy may be calculated as either fine or coarse derivatives, with implementations favoring the most efficient approach. Implementations must choose coarse or fine consistently between the two.

Executing OpFwidthFine, OpFwidthCoarse, or OpFwidth is equivalent to executing the corresponding OpDPdx* and OpDPdy* instructions, taking the absolute value of the results, and summing them.

Executing an OpImage*Sample*ImplicitLod instruction is equivalent to executing OpDPdx(Coordinate) and OpDPdy(Coordinate), and passing the results as the Grad operands dx and dy.

It is expected that using the ImplicitLod variants of sampling functions will be substantially more efficient than using the ExplicitLod variants with explicitly generated derivatives.

Helper Invocations

When performing derivative or quad group operations in a fragment shader, additional invocations may be spawned in order to ensure correct results. These additional invocations are known as helper invocations and can be identified by a non-zero value in the HelperInvocation built-in. Stores and atomics performed by helper invocations must not have any effect on memory except for the Function, Private and Output storage classes, and values returned by atomic instructions in helper invocations are undefined.

While storage to Output storage class has an effect even in helper invocations, it does not mean that helper invocations have an effect on the framebuffer. Output variables in fragment shaders can be read from as well, and they behave more like Private variables for the duration of the shader invocation.

If the MaximallyReconvergesKHR execution mode is applied to the entry point, helper invocations must remain active for all instructions for the lifetime of the quad scope instance they are a part of. If the MaximallyReconvergesKHR execution mode is not applied to the entry point, helper invocations may be considered inactive for group operations other than derivative and quad group operations. All invocations in a quad scope instance may become permanently inactive at any point once the only remaining invocations in that quad scope instance are helper invocations.

Cooperative Matrices

A cooperative matrix type is a SPIR-V type where the storage for and computations performed on the matrix are spread across the invocations in a scope instance. These types give the implementation freedom in how to optimize matrix multiplies.

SPIR-V defines the types and instructions, but does not specify rules about what sizes/combinations are valid, and it is expected that different implementations may support different sizes.

To enumerate the supported cooperative matrix types and operations, call:

// Provided by VK_KHR_cooperative_matrix
VkResult vkGetPhysicalDeviceCooperativeMatrixPropertiesKHR(
    VkPhysicalDevice                            physicalDevice,
    uint32_t*                                   pPropertyCount,
    VkCooperativeMatrixPropertiesKHR*           pProperties);

physicalDevice is the physical device.
pPropertyCount is a pointer to an integer related to the number of cooperative matrix properties available or queried.
pProperties is either NULL or a pointer to an array of VkCooperativeMatrixPropertiesKHR structures.

If pProperties is NULL, then the number of cooperative matrix properties available is returned in pPropertyCount. Otherwise, pPropertyCount must point to a variable set by the application to the number of elements in the pProperties array, and on return the variable is overwritten with the number of structures actually written to pProperties. If pPropertyCount is less than the number of cooperative matrix properties available, at most pPropertyCount structures will be written, and VK_INCOMPLETE will be returned instead of VK_SUCCESS, to indicate that not all the available cooperative matrix properties were returned.
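The count-then-fill contract described above is usually driven with two calls. The C sketch below models that pattern without depending on the Vulkan headers. SketchResult, SketchProps, sketch_enumerate, and query_all are hypothetical stand-ins, and the three table entries are made-up values rather than properties of any real device. With the real API, the sType member of each array element would also need to be initialized before the second call.

```c
#include <stdint.h>
#include <stdlib.h>

/* Stand-ins for VK_SUCCESS and VK_INCOMPLETE. */
typedef enum SketchResult {
    SKETCH_SUCCESS = 0,
    SKETCH_INCOMPLETE = 5
} SketchResult;

/* Reduced stand-in for a properties structure. */
typedef struct SketchProps {
    uint32_t MSize, NSize, KSize;
} SketchProps;

#define AVAILABLE_COUNT 3u
static const SketchProps available[AVAILABLE_COUNT] = {
    { 16, 16, 16 }, { 16, 8, 16 }, { 8, 8, 32 }
};

/* Hypothetical enumerator following the count/fill contract. */
static SketchResult sketch_enumerate(uint32_t *pPropertyCount,
                                     SketchProps *pProperties)
{
    if (pProperties == NULL) {
        *pPropertyCount = AVAILABLE_COUNT; /* report how many exist */
        return SKETCH_SUCCESS;
    }
    uint32_t capacity = *pPropertyCount;
    uint32_t written = capacity < AVAILABLE_COUNT ? capacity : AVAILABLE_COUNT;
    for (uint32_t i = 0; i < written; ++i)
        pProperties[i] = available[i];
    *pPropertyCount = written; /* report how many were written */
    return written < AVAILABLE_COUNT ? SKETCH_INCOMPLETE : SKETCH_SUCCESS;
}

/* Caller side: query the count, allocate, then fill.
 * Returns a malloc'd array that the caller frees, or NULL on failure. */
static SketchProps *query_all(uint32_t *out_count)
{
    uint32_t count = 0;
    *out_count = 0;
    if (sketch_enumerate(&count, NULL) != SKETCH_SUCCESS || count == 0)
        return NULL;
    SketchProps *props = malloc(count * sizeof *props);
    if (props == NULL)
        return NULL;
    if (sketch_enumerate(&count, props) != SKETCH_SUCCESS) {
        free(props);
        return NULL;
    }
    *out_count = count;
    return props;
}
```

Passing an array that is too small returns the incomplete code and overwrites the count with the number of structures actually written.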

Valid Usage (Implicit)

VUID-vkGetPhysicalDeviceCooperativeMatrixPropertiesKHR-physicalDevice-parameter
physicalDevice must be a valid VkPhysicalDevice handle
VUID-vkGetPhysicalDeviceCooperativeMatrixPropertiesKHR-pPropertyCount-parameter
pPropertyCount must be a valid pointer to a uint32_t value
VUID-vkGetPhysicalDeviceCooperativeMatrixPropertiesKHR-pProperties-parameter
If the value referenced by pPropertyCount is not 0, and pProperties is not NULL, pProperties must be a valid pointer to an array of pPropertyCount VkCooperativeMatrixPropertiesKHR structures

Return Codes

Success

VK_SUCCESS
VK_INCOMPLETE

Failure

VK_ERROR_OUT_OF_HOST_MEMORY
VK_ERROR_OUT_OF_DEVICE_MEMORY

To enumerate additional supported cooperative matrix types and operations, call:

// Provided by VK_NV_cooperative_matrix2
VkResult vkGetPhysicalDeviceCooperativeMatrixFlexibleDimensionsPropertiesNV(
    VkPhysicalDevice                            physicalDevice,
    uint32_t*                                   pPropertyCount,
    VkCooperativeMatrixFlexibleDimensionsPropertiesNV* pProperties);

physicalDevice is the physical device.
pPropertyCount is a pointer to an integer related to the number of cooperative matrix properties available or queried.
pProperties is either NULL or a pointer to an array of VkCooperativeMatrixFlexibleDimensionsPropertiesNV structures.

If pProperties is NULL, then the number of flexible dimensions properties available is returned in pPropertyCount. Otherwise, pPropertyCount must point to a variable set by the application to the number of elements in the pProperties array, and on return the variable is overwritten with the number of structures actually written to pProperties. If pPropertyCount is less than the number of flexible dimensions properties available, at most pPropertyCount structures will be written, and VK_INCOMPLETE will be returned instead of VK_SUCCESS, to indicate that not all the available flexible dimensions properties were returned.

If the cooperativeMatrixFlexibleDimensions feature is not supported, the implementation must advertise zero properties.

Valid Usage (Implicit)

VUID-vkGetPhysicalDeviceCooperativeMatrixFlexibleDimensionsPropertiesNV-physicalDevice-parameter
physicalDevice must be a valid VkPhysicalDevice handle
VUID-vkGetPhysicalDeviceCooperativeMatrixFlexibleDimensionsPropertiesNV-pPropertyCount-parameter
pPropertyCount must be a valid pointer to a uint32_t value
VUID-vkGetPhysicalDeviceCooperativeMatrixFlexibleDimensionsPropertiesNV-pProperties-parameter
If the value referenced by pPropertyCount is not 0, and pProperties is not NULL, pProperties must be a valid pointer to an array of pPropertyCount VkCooperativeMatrixFlexibleDimensionsPropertiesNV structures

Return Codes

Success

VK_SUCCESS
VK_INCOMPLETE

Failure

VK_ERROR_OUT_OF_HOST_MEMORY
VK_ERROR_OUT_OF_DEVICE_MEMORY

To enumerate the supported cooperative matrix types and operations, call:

// Provided by VK_NV_cooperative_matrix
VkResult vkGetPhysicalDeviceCooperativeMatrixPropertiesNV(
    VkPhysicalDevice                            physicalDevice,
    uint32_t*                                   pPropertyCount,
    VkCooperativeMatrixPropertiesNV*            pProperties);

physicalDevice is the physical device.
pPropertyCount is a pointer to an integer related to the number of cooperative matrix properties available or queried.
pProperties is either NULL or a pointer to an array of VkCooperativeMatrixPropertiesNV structures.

Valid Usage (Implicit)

VUID-vkGetPhysicalDeviceCooperativeMatrixPropertiesNV-physicalDevice-parameter
physicalDevice must be a valid VkPhysicalDevice handle
VUID-vkGetPhysicalDeviceCooperativeMatrixPropertiesNV-pPropertyCount-parameter
pPropertyCount must be a valid pointer to a uint32_t value
VUID-vkGetPhysicalDeviceCooperativeMatrixPropertiesNV-pProperties-parameter
If the value referenced by pPropertyCount is not 0, and pProperties is not NULL, pProperties must be a valid pointer to an array of pPropertyCount VkCooperativeMatrixPropertiesNV structures

Return Codes

Success

VK_SUCCESS
VK_INCOMPLETE

Failure

VK_ERROR_OUT_OF_HOST_MEMORY
VK_ERROR_OUT_OF_DEVICE_MEMORY

Each VkCooperativeMatrixPropertiesKHR or VkCooperativeMatrixPropertiesNV structure describes a single supported combination of types for a matrix multiply/add operation (OpCooperativeMatrixMulAddKHR or OpCooperativeMatrixMulAddNV). The multiply can be described in terms of the following variables and types (in SPIR-V pseudocode):

    %A is of type OpTypeCooperativeMatrixKHR %AType %scope %MSize %KSize %MatrixAKHR
    %B is of type OpTypeCooperativeMatrixKHR %BType %scope %KSize %NSize %MatrixBKHR
    %C is of type OpTypeCooperativeMatrixKHR %CType %scope %MSize %NSize %MatrixAccumulatorKHR
    %Result is of type OpTypeCooperativeMatrixKHR %ResultType %scope %MSize %NSize %MatrixAccumulatorKHR

    %Result = %A * %B + %C // using OpCooperativeMatrixMulAddKHR

    %A is of type OpTypeCooperativeMatrixNV %AType %scope %MSize %KSize
    %B is of type OpTypeCooperativeMatrixNV %BType %scope %KSize %NSize
    %C is of type OpTypeCooperativeMatrixNV %CType %scope %MSize %NSize
    %D is of type OpTypeCooperativeMatrixNV %DType %scope %MSize %NSize

    %D = %A * %B + %C // using OpCooperativeMatrixMulAddNV

A matrix multiply with these dimensions is known as an MxNxK matrix multiply.
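The role of the three dimensions can be seen in a scalar reference version of the multiply/add. This C sketch is only a host-side model of the arithmetic: it uses float for every matrix and row-major storage, which are simplifying assumptions, and the function name is hypothetical. It does not reflect how an implementation distributes the work across invocations.

```c
#include <stdint.h>

/* Result (M x N) = A (M x K) * B (K x N) + C (M x N), all row-major. */
static void matrix_mul_add(uint32_t M, uint32_t N, uint32_t K,
                           const float *A, const float *B,
                           const float *C, float *Result)
{
    for (uint32_t m = 0; m < M; ++m) {
        for (uint32_t n = 0; n < N; ++n) {
            float acc = C[m * N + n];
            /* K is the shared dimension: columns of A, rows of B. */
            for (uint32_t k = 0; k < K; ++k)
                acc += A[m * K + k] * B[k * N + n];
            Result[m * N + n] = acc;
        }
    }
}
```

M and N fix the shape of C and Result, while K only appears as the length of the inner accumulation.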

The VkCooperativeMatrixPropertiesKHR structure is defined as:

// Provided by VK_KHR_cooperative_matrix
typedef struct VkCooperativeMatrixPropertiesKHR {
    VkStructureType       sType;
    void*                 pNext;
    uint32_t              MSize;
    uint32_t              NSize;
    uint32_t              KSize;
    VkComponentTypeKHR    AType;
    VkComponentTypeKHR    BType;
    VkComponentTypeKHR    CType;
    VkComponentTypeKHR    ResultType;
    VkBool32              saturatingAccumulation;
    VkScopeKHR            scope;
} VkCooperativeMatrixPropertiesKHR;

sType is a VkStructureType value identifying this structure.
pNext is NULL or a pointer to a structure extending this structure.
MSize is the number of rows in matrices A, C, and Result.
KSize is the number of columns in matrix A and rows in matrix B.
NSize is the number of columns in matrices B, C, and Result.
AType is the component type of matrix A, of type VkComponentTypeKHR.
BType is the component type of matrix B, of type VkComponentTypeKHR.
CType is the component type of matrix C, of type VkComponentTypeKHR.
ResultType is the component type of matrix Result, of type VkComponentTypeKHR.
saturatingAccumulation indicates whether the SaturatingAccumulation operand to OpCooperativeMatrixMulAddKHR must be present or not. If it is VK_TRUE, the SaturatingAccumulation operand must be present. If it is VK_FALSE, the SaturatingAccumulation operand must not be present.
scope is the scope of all the matrix types, of type VkScopeKHR.

If some types are preferred over other types (e.g. for performance), they should appear earlier in the list enumerated by vkGetPhysicalDeviceCooperativeMatrixPropertiesKHR.

At least one entry in the list must have power of two values for all of MSize, KSize, and NSize.

If the cooperativeMatrixWorkgroupScope feature is not supported, scope must be VK_SCOPE_SUBGROUP_KHR.

Valid Usage (Implicit)

VUID-VkCooperativeMatrixPropertiesKHR-sType-sType
sType must be VK_STRUCTURE_TYPE_COOPERATIVE_MATRIX_PROPERTIES_KHR
VUID-VkCooperativeMatrixPropertiesKHR-pNext-pNext
pNext must be NULL

The VkCooperativeMatrixFlexibleDimensionsPropertiesNV structure is defined as:

// Provided by VK_NV_cooperative_matrix2
typedef struct VkCooperativeMatrixFlexibleDimensionsPropertiesNV {
    VkStructureType       sType;
    void*                 pNext;
    uint32_t              MGranularity;
    uint32_t              NGranularity;
    uint32_t              KGranularity;
    VkComponentTypeKHR    AType;
    VkComponentTypeKHR    BType;
    VkComponentTypeKHR    CType;
    VkComponentTypeKHR    ResultType;
    VkBool32              saturatingAccumulation;
    VkScopeKHR            scope;
    uint32_t              workgroupInvocations;
} VkCooperativeMatrixFlexibleDimensionsPropertiesNV;

sType is a VkStructureType value identifying this structure.
pNext is NULL or a pointer to a structure extending this structure.
MGranularity is the granularity of the number of rows in matrices A, C, and Result. The rows must be an integer multiple of this value.
KGranularity is the granularity of columns in matrix A and rows in matrix B. The columns/rows must be an integer multiple of this value.
NGranularity is the granularity of columns in matrices B, C, and Result. The columns must be an integer multiple of this value.
AType is the component type of matrix A, of type VkComponentTypeKHR.
BType is the component type of matrix B, of type VkComponentTypeKHR.
CType is the component type of matrix C, of type VkComponentTypeKHR.
ResultType is the component type of matrix Result, of type VkComponentTypeKHR.
saturatingAccumulation indicates whether the SaturatingAccumulation operand to OpCooperativeMatrixMulAddKHR must be present or not. If it is VK_TRUE, the SaturatingAccumulation operand must be present. If it is VK_FALSE, the SaturatingAccumulation operand must not be present.
scope is the scope of all the matrix types, of type VkScopeKHR.
workgroupInvocations is the number of invocations in the local workgroup when this combination of values is supported.

Rather than explicitly enumerating a list of supported sizes, VkCooperativeMatrixFlexibleDimensionsPropertiesNV advertises size granularities, and each matrix dimension must be a multiple of the corresponding advertised granularity. The M and K granularities apply to the rows and columns of matrices with a Use of MatrixA; the K and N granularities apply to the rows and columns of matrices with a Use of MatrixB; and the M and N granularities apply to the rows and columns of matrices with a Use of MatrixAccumulator.

For a given type combination, if multiple workgroup sizes are supported there may be multiple VkCooperativeMatrixFlexibleDimensionsPropertiesNV structures with different granularities.

All granularity values must be powers of two.

Different A/B types may require different granularities but share the same accumulator type. In such a case, the supported granularity for a matrix with the accumulator type would be the smallest advertised granularity.
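
As a rough illustration of the granularity rule, the following C sketch checks whether a requested M x N x K shape is a multiple of one advertised entry. The demo_granularity structure and both function names are hypothetical stand-ins for the three granularity members of VkCooperativeMatrixFlexibleDimensionsPropertiesNV; an application would read those members from the structures reported by the implementation.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for the MGranularity, NGranularity and KGranularity
 * members of VkCooperativeMatrixFlexibleDimensionsPropertiesNV. */
typedef struct {
    uint32_t MGranularity;
    uint32_t NGranularity;
    uint32_t KGranularity;
} demo_granularity;

/* True if v is a non-zero power of two. */
static bool demo_is_pow2(uint32_t v)
{
    return v != 0 && (v & (v - 1)) == 0;
}

/* True if an M x N x K multiply fits this entry:
 * A is M x K, B is K x N, and the accumulator is M x N. */
bool demo_shape_supported(const demo_granularity *g,
                          uint32_t M, uint32_t N, uint32_t K)
{
    assert(g != NULL);
    /* All granularity values are required to be powers of two. */
    assert(demo_is_pow2(g->MGranularity));
    assert(demo_is_pow2(g->NGranularity));
    assert(demo_is_pow2(g->KGranularity));

    /* With a power-of-two granularity, "is a multiple of" is a mask test. */
    return M != 0 && N != 0 && K != 0 &&
           (M & (g->MGranularity - 1)) == 0 &&
           (N & (g->NGranularity - 1)) == 0 &&
           (K & (g->KGranularity - 1)) == 0;
}
```

With granularities of 16, 8, and 32 for M, N, and K, a 32 x 8 x 64 shape passes the check, while a 24 x 8 x 64 shape does not because 24 is not a multiple of 16.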

Valid Usage (Implicit)

VUID-VkCooperativeMatrixFlexibleDimensionsPropertiesNV-sType-sType
sType must be VK_STRUCTURE_TYPE_COOPERATIVE_MATRIX_FLEXIBLE_DIMENSIONS_PROPERTIES_NV
VUID-VkCooperativeMatrixFlexibleDimensionsPropertiesNV-pNext-pNext
pNext must be NULL

The VkCooperativeMatrixPropertiesNV structure is defined as:

// Provided by VK_NV_cooperative_matrix
typedef struct VkCooperativeMatrixPropertiesNV {
    VkStructureType      sType;
    void*                pNext;
    uint32_t             MSize;
    uint32_t             NSize;
    uint32_t             KSize;
    VkComponentTypeNV    AType;
    VkComponentTypeNV    BType;
    VkComponentTypeNV    CType;
    VkComponentTypeNV    DType;
    VkScopeNV            scope;
} VkCooperativeMatrixPropertiesNV;

sType is a VkStructureType value identifying this structure.
pNext is NULL or a pointer to a structure extending this structure.
MSize is the number of rows in matrices A, C, and D.
KSize is the number of columns in matrix A and rows in matrix B.
NSize is the number of columns in matrices B, C, and D.
AType is the component type of matrix A, of type VkComponentTypeNV.
BType is the component type of matrix B, of type VkComponentTypeNV.
CType is the component type of matrix C, of type VkComponentTypeNV.
DType is the component type of matrix D, of type VkComponentTypeNV.
scope is the scope of all the matrix types, of type VkScopeNV.

If some types are preferred over other types (e.g. for performance), they should appear earlier in the list enumerated by vkGetPhysicalDeviceCooperativeMatrixPropertiesNV.

At least one entry in the list must have power of two values for all of MSize, KSize, and NSize.
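
Because preferred entries should appear earlier in the enumerated list, an application can simply take the first entry that matches the types it needs. The C sketch below shows that selection over a hypothetical demo_matrix_props structure that mirrors only some members of VkCooperativeMatrixPropertiesNV; the structure, the function name, and the use of plain int for component types are assumptions made for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in mirroring part of VkCooperativeMatrixPropertiesNV. */
typedef struct {
    uint32_t MSize, NSize, KSize;
    int AType, BType, CType, DType;   /* component type values */
} demo_matrix_props;

/* Returns the index of the earliest (most preferred) entry whose A and B
 * types equal abType and whose C and D types equal cdType, or -1 if the
 * list contains no such entry. */
int demo_pick_preferred(const demo_matrix_props *list, size_t count,
                        int abType, int cdType)
{
    assert(list != NULL || count == 0);
    for (size_t i = 0; i < count; ++i) {
        if (list[i].AType == abType && list[i].BType == abType &&
            list[i].CType == cdType && list[i].DType == cdType) {
            return (int)i;   /* list order encodes preference */
        }
    }
    return -1;
}
```

A real application would probably also compare scope and the M, N, and K sizes before accepting an entry.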

Valid Usage (Implicit)

VUID-VkCooperativeMatrixPropertiesNV-sType-sType
sType must be VK_STRUCTURE_TYPE_COOPERATIVE_MATRIX_PROPERTIES_NV
VUID-VkCooperativeMatrixPropertiesNV-pNext-pNext
pNext must be NULL

Possible values for VkScopeKHR include:

// Provided by VK_KHR_cooperative_matrix
typedef enum VkScopeKHR {
    VK_SCOPE_DEVICE_KHR = 1,
    VK_SCOPE_WORKGROUP_KHR = 2,
    VK_SCOPE_SUBGROUP_KHR = 3,
    VK_SCOPE_QUEUE_FAMILY_KHR = 5,
  // Provided by VK_NV_cooperative_matrix
    VK_SCOPE_DEVICE_NV = VK_SCOPE_DEVICE_KHR,
  // Provided by VK_NV_cooperative_matrix
    VK_SCOPE_WORKGROUP_NV = VK_SCOPE_WORKGROUP_KHR,
  // Provided by VK_NV_cooperative_matrix
    VK_SCOPE_SUBGROUP_NV = VK_SCOPE_SUBGROUP_KHR,
  // Provided by VK_NV_cooperative_matrix
    VK_SCOPE_QUEUE_FAMILY_NV = VK_SCOPE_QUEUE_FAMILY_KHR,
} VkScopeKHR;

or the equivalent

// Provided by VK_NV_cooperative_matrix
typedef VkScopeKHR VkScopeNV;

VK_SCOPE_DEVICE_KHR corresponds to SPIR-V Device scope.
VK_SCOPE_WORKGROUP_KHR corresponds to SPIR-V Workgroup scope.
VK_SCOPE_SUBGROUP_KHR corresponds to SPIR-V Subgroup scope.
VK_SCOPE_QUEUE_FAMILY_KHR corresponds to SPIR-V QueueFamily scope.

All enum values match the corresponding SPIR-V value.

Possible values for VkComponentTypeKHR include:

// Provided by VK_KHR_cooperative_matrix, VK_NV_cooperative_vector
typedef enum VkComponentTypeKHR {
    VK_COMPONENT_TYPE_FLOAT16_KHR = 0,
    VK_COMPONENT_TYPE_FLOAT32_KHR = 1,
    VK_COMPONENT_TYPE_FLOAT64_KHR = 2,
    VK_COMPONENT_TYPE_SINT8_KHR = 3,
    VK_COMPONENT_TYPE_SINT16_KHR = 4,
    VK_COMPONENT_TYPE_SINT32_KHR = 5,
    VK_COMPONENT_TYPE_SINT64_KHR = 6,
    VK_COMPONENT_TYPE_UINT8_KHR = 7,
    VK_COMPONENT_TYPE_UINT16_KHR = 8,
    VK_COMPONENT_TYPE_UINT32_KHR = 9,
    VK_COMPONENT_TYPE_UINT64_KHR = 10,
  // Provided by VK_NV_cooperative_vector
    VK_COMPONENT_TYPE_SINT8_PACKED_NV = 1000491000,
  // Provided by VK_NV_cooperative_vector
    VK_COMPONENT_TYPE_UINT8_PACKED_NV = 1000491001,
  // Provided by VK_NV_cooperative_vector
    VK_COMPONENT_TYPE_FLOAT_E4M3_NV = 1000491002,
  // Provided by VK_NV_cooperative_vector
    VK_COMPONENT_TYPE_FLOAT_E5M2_NV = 1000491003,
  // Provided by VK_NV_cooperative_matrix
    VK_COMPONENT_TYPE_FLOAT16_NV = VK_COMPONENT_TYPE_FLOAT16_KHR,
  // Provided by VK_NV_cooperative_matrix
    VK_COMPONENT_TYPE_FLOAT32_NV = VK_COMPONENT_TYPE_FLOAT32_KHR,
  // Provided by VK_NV_cooperative_matrix
    VK_COMPONENT_TYPE_FLOAT64_NV = VK_COMPONENT_TYPE_FLOAT64_KHR,
  // Provided by VK_NV_cooperative_matrix
    VK_COMPONENT_TYPE_SINT8_NV = VK_COMPONENT_TYPE_SINT8_KHR,
  // Provided by VK_NV_cooperative_matrix
    VK_COMPONENT_TYPE_SINT16_NV = VK_COMPONENT_TYPE_SINT16_KHR,
  // Provided by VK_NV_cooperative_matrix
    VK_COMPONENT_TYPE_SINT32_NV = VK_COMPONENT_TYPE_SINT32_KHR,
  // Provided by VK_NV_cooperative_matrix
    VK_COMPONENT_TYPE_SINT64_NV = VK_COMPONENT_TYPE_SINT64_KHR,
  // Provided by VK_NV_cooperative_matrix
    VK_COMPONENT_TYPE_UINT8_NV = VK_COMPONENT_TYPE_UINT8_KHR,
  // Provided by VK_NV_cooperative_matrix
    VK_COMPONENT_TYPE_UINT16_NV = VK_COMPONENT_TYPE_UINT16_KHR,
  // Provided by VK_NV_cooperative_matrix
    VK_COMPONENT_TYPE_UINT32_NV = VK_COMPONENT_TYPE_UINT32_KHR,
  // Provided by VK_NV_cooperative_matrix
    VK_COMPONENT_TYPE_UINT64_NV = VK_COMPONENT_TYPE_UINT64_KHR,
} VkComponentTypeKHR;

or the equivalent

// Provided by VK_NV_cooperative_matrix
typedef VkComponentTypeKHR VkComponentTypeNV;

VK_COMPONENT_TYPE_FLOAT16_KHR corresponds to SPIR-V OpTypeFloat 16.
VK_COMPONENT_TYPE_FLOAT32_KHR corresponds to SPIR-V OpTypeFloat 32.
VK_COMPONENT_TYPE_FLOAT64_KHR corresponds to SPIR-V OpTypeFloat 64.
VK_COMPONENT_TYPE_SINT8_KHR corresponds to SPIR-V OpTypeInt 8 0/1.
VK_COMPONENT_TYPE_SINT16_KHR corresponds to SPIR-V OpTypeInt 16 0/1.
VK_COMPONENT_TYPE_SINT32_KHR corresponds to SPIR-V OpTypeInt 32 0/1.
VK_COMPONENT_TYPE_SINT64_KHR corresponds to SPIR-V OpTypeInt 64 0/1.
VK_COMPONENT_TYPE_UINT8_KHR corresponds to SPIR-V OpTypeInt 8 0/1.
VK_COMPONENT_TYPE_UINT16_KHR corresponds to SPIR-V OpTypeInt 16 0/1.
VK_COMPONENT_TYPE_UINT32_KHR corresponds to SPIR-V OpTypeInt 32 0/1.
VK_COMPONENT_TYPE_UINT64_KHR corresponds to SPIR-V OpTypeInt 64 0/1.
VK_COMPONENT_TYPE_SINT8_PACKED_NV corresponds to four 8-bit signed integers packed in a 32-bit unsigned integer.
VK_COMPONENT_TYPE_UINT8_PACKED_NV corresponds to four 8-bit unsigned integers packed in a 32-bit unsigned integer.
VK_COMPONENT_TYPE_FLOAT_E4M3_NV corresponds to a floating-point type with a sign bit in the most significant bit, followed by four exponent bits, followed by three mantissa bits.
VK_COMPONENT_TYPE_FLOAT_E5M2_NV corresponds to a floating-point type with a sign bit in the most significant bit, followed by five exponent bits, followed by two mantissa bits.
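
The bit layouts described for the two 8-bit floating-point types can be sketched in C as a field split. The demo_fp8_fields structure and demo_split_fp8 function are hypothetical names. The sketch only separates the sign, exponent, and mantissa fields; the exponent bias and the encoding of special values are not stated in this section and are deliberately left out.

```c
#include <assert.h>
#include <stdint.h>

/* Raw fields of an 8-bit floating-point value (hypothetical helper type). */
typedef struct {
    uint8_t sign;      /* most significant bit */
    uint8_t exponent;  /* 4 bits for E4M3, 5 bits for E5M2 */
    uint8_t mantissa;  /* 3 bits for E4M3, 2 bits for E5M2 */
} demo_fp8_fields;

/* Splits an 8-bit value into its fields.
 * mantissaBits is 3 for the E4M3 layout and 2 for the E5M2 layout. */
demo_fp8_fields demo_split_fp8(uint8_t bits, unsigned mantissaBits)
{
    assert(mantissaBits == 2 || mantissaBits == 3);
    demo_fp8_fields f;
    f.sign = (uint8_t)(bits >> 7);
    f.exponent = (uint8_t)((bits & 0x7Fu) >> mantissaBits);
    f.mantissa = (uint8_t)(bits & ((1u << mantissaBits) - 1u));
    return f;
}
```

For the byte 0xB5 (binary 1011 0101), the E4M3 split yields sign 1, exponent 6, mantissa 5, and the E5M2 split yields sign 1, exponent 13, mantissa 1.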

Cooperative Vectors

A cooperative vector type is a SPIR-V vector type optimized for the evaluation of small neural networks.

SPIR-V defines the types and instructions, but does not specify rules about what combinations of types are valid, and it is expected that different implementations may support different combinations.

To enumerate the supported cooperative vector type combinations, call:

// Provided by VK_NV_cooperative_vector
VkResult vkGetPhysicalDeviceCooperativeVectorPropertiesNV(
    VkPhysicalDevice                            physicalDevice,
    uint32_t*                                   pPropertyCount,
    VkCooperativeVectorPropertiesNV*            pProperties);

physicalDevice is the physical device.
pPropertyCount is a pointer to an integer related to the number of cooperative vector properties available or queried.
pProperties is either NULL or a pointer to an array of VkCooperativeVectorPropertiesNV structures.

If pProperties is NULL, then the number of cooperative vector properties available is returned in pPropertyCount. Otherwise, pPropertyCount must point to a variable set by the user to the number of elements in the pProperties array, and on return the variable is overwritten with the number of structures actually written to pProperties. If pPropertyCount is less than the number of cooperative vector properties available, at most pPropertyCount structures will be written, and VK_INCOMPLETE will be returned instead of VK_SUCCESS, to indicate that not all the available cooperative vector properties were returned.
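
The count-then-fill contract described above can be sketched in plain C without the Vulkan headers. Everything prefixed with demo_ is a hypothetical stand-in: demo_enumerate mimics the behavior described for vkGetPhysicalDeviceCooperativeVectorPropertiesNV over a fixed three-entry list, and demo_query_all shows the usual two-call pattern on the caller side. The result codes are local placeholders, not the real VkResult values.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Placeholder result codes; real code would use VK_SUCCESS and VK_INCOMPLETE. */
typedef enum { DEMO_SUCCESS, DEMO_INCOMPLETE } demo_result;

/* Hypothetical, reduced stand-in for VkCooperativeVectorPropertiesNV. */
typedef struct { int inputType; int resultType; } demo_props;

static const demo_props demo_available[3] = { {0, 0}, {9, 5}, {3, 5} };

/* Mock of the implementation side of the enumeration contract. */
demo_result demo_enumerate(uint32_t *pPropertyCount, demo_props *pProperties)
{
    const uint32_t available = 3;
    assert(pPropertyCount != NULL);
    if (pProperties == NULL) {          /* first call: report the count only */
        *pPropertyCount = available;
        return DEMO_SUCCESS;
    }
    uint32_t n = *pPropertyCount < available ? *pPropertyCount : available;
    for (uint32_t i = 0; i < n; ++i)
        pProperties[i] = demo_available[i];
    *pPropertyCount = n;                /* number of structures actually written */
    return n < available ? DEMO_INCOMPLETE : DEMO_SUCCESS;
}

/* Caller side: query the count, allocate, then fill.
 * Returns a malloc'd array (caller frees) or NULL; *outCount receives its length. */
demo_props *demo_query_all(uint32_t *outCount)
{
    uint32_t count = 0;
    assert(outCount != NULL);
    *outCount = 0;
    if (demo_enumerate(&count, NULL) != DEMO_SUCCESS || count == 0)
        return NULL;
    demo_props *props = malloc(count * sizeof *props);
    if (props == NULL)
        return NULL;
    if (demo_enumerate(&count, props) != DEMO_SUCCESS) {
        free(props);                    /* array was too small for the full list */
        return NULL;
    }
    *outCount = count;
    return props;
}
```

When calling the real command, each element of the array would also need its sType member set before the second call, as required by the implicit valid usage of VkCooperativeVectorPropertiesNV.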

Valid Usage (Implicit)

VUID-vkGetPhysicalDeviceCooperativeVectorPropertiesNV-physicalDevice-parameter
physicalDevice must be a valid VkPhysicalDevice handle
VUID-vkGetPhysicalDeviceCooperativeVectorPropertiesNV-pPropertyCount-parameter
pPropertyCount must be a valid pointer to a uint32_t value
VUID-vkGetPhysicalDeviceCooperativeVectorPropertiesNV-pProperties-parameter
If the value referenced by pPropertyCount is not 0, and pProperties is not NULL, pProperties must be a valid pointer to an array of pPropertyCount VkCooperativeVectorPropertiesNV structures

Return Codes

Success

VK_SUCCESS
VK_INCOMPLETE

Failure

VK_ERROR_OUT_OF_HOST_MEMORY
VK_ERROR_OUT_OF_DEVICE_MEMORY

Each VkCooperativeVectorPropertiesNV structure describes a single supported combination of types for a matrix-vector multiply (or multiply-add) operation (OpCooperativeVectorMatrixMulNV or OpCooperativeVectorMatrixMulAddNV).

The VkCooperativeVectorPropertiesNV structure is defined as:

// Provided by VK_NV_cooperative_vector
typedef struct VkCooperativeVectorPropertiesNV {
    VkStructureType       sType;
    void*                 pNext;
    VkComponentTypeKHR    inputType;
    VkComponentTypeKHR    inputInterpretation;
    VkComponentTypeKHR    matrixInterpretation;
    VkComponentTypeKHR    biasInterpretation;
    VkComponentTypeKHR    resultType;
    VkBool32              transpose;
} VkCooperativeVectorPropertiesNV;

sType is a VkStructureType value identifying this structure.
pNext is NULL or a pointer to a structure extending this structure.
inputType is the component type of vector Input, of type VkComponentTypeKHR.
inputInterpretation is the value of InputInterpretation, of type VkComponentTypeKHR.
matrixInterpretation is the value of MatrixInterpretation, of type VkComponentTypeKHR.
biasInterpretation is the value of BiasInterpretation, of type VkComponentTypeKHR.
resultType is the component type of Result Type, of type VkComponentTypeKHR.
transpose is a boolean indicating whether opaque layout matrices with this combination of input and output types support transposition.

VK_COMPONENT_TYPE_SINT8_PACKED_NV and VK_COMPONENT_TYPE_UINT8_PACKED_NV must not be used for members other than inputInterpretation.

The following combinations must be supported (each row is a required combination):

inputType	inputInterpretation	matrixInterpretation	biasInterpretation	resultType
FLOAT16	FLOAT16	FLOAT16	FLOAT16	FLOAT16
UINT32	SINT8_PACKED	SINT8	SINT32	SINT32
SINT8	SINT8	SINT8	SINT32	SINT32
FLOAT32	SINT8	SINT8	SINT32	SINT32
FLOAT16	FLOAT_E4M3	FLOAT_E4M3	FLOAT16	FLOAT16
FLOAT16	FLOAT_E5M2	FLOAT_E5M2	FLOAT16	FLOAT16
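
An application that depends on one of these combinations might confirm it against the enumerated list. The C sketch below does a straightforward linear search. The demo_vec_props structure is a hypothetical stand-in for the five type members of VkCooperativeVectorPropertiesNV, and the DEMO_ constants copy the numeric values of the corresponding VkComponentTypeKHR enumerants listed earlier in this chapter.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Numeric values copied from the VkComponentTypeKHR enum. */
enum {
    DEMO_FLOAT16 = 0,
    DEMO_SINT8 = 3,
    DEMO_SINT32 = 5,
    DEMO_UINT32 = 9,
    DEMO_SINT8_PACKED = 1000491000
};

/* Hypothetical stand-in for the type members of VkCooperativeVectorPropertiesNV. */
typedef struct {
    int32_t inputType;
    int32_t inputInterpretation;
    int32_t matrixInterpretation;
    int32_t biasInterpretation;
    int32_t resultType;
} demo_vec_props;

/* True if the list contains an entry equal to *want in all five members. */
bool demo_has_combination(const demo_vec_props *list, size_t count,
                          const demo_vec_props *want)
{
    assert(want != NULL && (list != NULL || count == 0));
    for (size_t i = 0; i < count; ++i) {
        if (list[i].inputType == want->inputType &&
            list[i].inputInterpretation == want->inputInterpretation &&
            list[i].matrixInterpretation == want->matrixInterpretation &&
            list[i].biasInterpretation == want->biasInterpretation &&
            list[i].resultType == want->resultType) {
            return true;
        }
    }
    return false;
}
```

The sketch ignores the transpose member, which an application would check separately if it relies on transposition.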

Valid Usage (Implicit)

VUID-VkCooperativeVectorPropertiesNV-sType-sType
sType must be VK_STRUCTURE_TYPE_COOPERATIVE_VECTOR_PROPERTIES_NV
VUID-VkCooperativeVectorPropertiesNV-pNext-pNext
pNext must be NULL
VUID-VkCooperativeVectorPropertiesNV-inputType-parameter
inputType must be a valid VkComponentTypeKHR value
VUID-VkCooperativeVectorPropertiesNV-inputInterpretation-parameter
inputInterpretation must be a valid VkComponentTypeKHR value
VUID-VkCooperativeVectorPropertiesNV-matrixInterpretation-parameter
matrixInterpretation must be a valid VkComponentTypeKHR value
VUID-VkCooperativeVectorPropertiesNV-biasInterpretation-parameter
biasInterpretation must be a valid VkComponentTypeKHR value
VUID-VkCooperativeVectorPropertiesNV-resultType-parameter
resultType must be a valid VkComponentTypeKHR value

To query the size of a cooperative vector matrix, or to convert a matrix to another layout and type, call:

// Provided by VK_NV_cooperative_vector
VkResult vkConvertCooperativeVectorMatrixNV(
    VkDevice                                    device,
    const VkConvertCooperativeVectorMatrixInfoNV* pInfo);

device is the device.
pInfo is a pointer to a VkConvertCooperativeVectorMatrixInfoNV structure containing information about the layout conversion.

If pInfo->dstData is NULL, then the number of bytes required to store the converted matrix is returned in the variable pointed to by pInfo->pDstSize. Otherwise, pInfo->pDstSize must point to a variable set by the user to the number of bytes in pInfo->dstData, and on return the variable is overwritten with the number of bytes actually written to pInfo->dstData. pInfo->srcData can be NULL when pInfo->dstData is NULL. If the value pointed to by pInfo->pDstSize is less than the number of bytes required to store the converted matrix, no bytes will be written, and VK_INCOMPLETE will be returned instead of VK_SUCCESS, to indicate that not enough space was provided.

Valid Usage

VUID-vkConvertCooperativeVectorMatrixNV-pInfo-10073
If pInfo->srcData.hostAddress is NULL, then pInfo->dstData.hostAddress must be NULL
VUID-vkConvertCooperativeVectorMatrixNV-pInfo-10074
If pInfo->srcData.hostAddress is not NULL, then pInfo->srcSize must be large enough to contain the source matrix, based either on the standard matrix layout or based on the size filled out by this command
VUID-vkConvertCooperativeVectorMatrixNV-pInfo-10075
If pInfo->dstData.hostAddress is not NULL, then the value pointed to by pInfo->pDstSize must be large enough to contain the destination matrix, based either on the standard matrix layout or based on the size filled out by this command
VUID-vkConvertCooperativeVectorMatrixNV-pInfo-10076
If pInfo->dstData.hostAddress is not NULL, the source and destination memory ranges must not overlap

Valid Usage (Implicit)

VUID-vkConvertCooperativeVectorMatrixNV-device-parameter
device must be a valid VkDevice handle
VUID-vkConvertCooperativeVectorMatrixNV-pInfo-parameter
pInfo must be a valid pointer to a valid VkConvertCooperativeVectorMatrixInfoNV structure

Return Codes

Success

VK_SUCCESS
VK_INCOMPLETE

Failure

VK_ERROR_OUT_OF_HOST_MEMORY

Each VkConvertCooperativeVectorMatrixInfoNV structure describes a request to convert the layout and type of a cooperative vector matrix.

The VkConvertCooperativeVectorMatrixInfoNV structure is defined as:

// Provided by VK_NV_cooperative_vector
typedef struct VkConvertCooperativeVectorMatrixInfoNV {
    VkStructureType                      sType;
    const void*                          pNext;
    size_t                               srcSize;
    VkDeviceOrHostAddressConstKHR        srcData;
    size_t*                              pDstSize;
    VkDeviceOrHostAddressKHR             dstData;
    VkComponentTypeKHR                   srcComponentType;
    VkComponentTypeKHR                   dstComponentType;
    uint32_t                             numRows;
    uint32_t                             numColumns;
    VkCooperativeVectorMatrixLayoutNV    srcLayout;
    size_t                               srcStride;
    VkCooperativeVectorMatrixLayoutNV    dstLayout;
    size_t                               dstStride;
} VkConvertCooperativeVectorMatrixInfoNV;

sType is a VkStructureType value identifying this structure.
pNext is NULL or a pointer to a structure extending this structure.
srcSize is the length in bytes of srcData.
srcData is either NULL or a pointer to the source data in the source layout.
pDstSize is a pointer to an integer related to the number of bytes required or requested to convert.
dstData is either NULL or a pointer to the destination data in the destination layout.
srcComponentType is the type of a source matrix element.
dstComponentType is the type of a destination matrix element.
numRows is the number of rows in the matrix.
numColumns is the number of columns in the matrix.
srcLayout is the layout of the source matrix.
srcStride is the number of bytes between consecutive rows or columns (depending on srcLayout) of the source matrix, if it is row-major or column-major.
dstLayout is the layout the matrix is converted to.
dstStride is the number of bytes between consecutive rows or columns (depending on dstLayout) of the destination matrix, if it is row-major or column-major.

When called from vkCmdConvertCooperativeVectorMatrixNV, the deviceAddress members of srcData and dstData are used. When called from vkConvertCooperativeVectorMatrixNV, the hostAddress members of srcData and dstData are used.

For each of the source and destination matrices, if the layout is neither VK_COOPERATIVE_VECTOR_MATRIX_LAYOUT_ROW_MAJOR_NV nor VK_COOPERATIVE_VECTOR_MATRIX_LAYOUT_COLUMN_MAJOR_NV, then the corresponding stride parameter is ignored.

The size of the destination is only a function of the destination layout information, and does not depend on the source layout information.

Conversion can be used to convert between VK_COMPONENT_TYPE_FLOAT32_KHR or VK_COMPONENT_TYPE_FLOAT16_KHR and any supported lower-precision floating-point type. In this case, the conversion uses round-to-nearest-even rounding.
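
The tie-breaking behavior of round-to-nearest-even can be illustrated on an integer significand. The C sketch below is not a floating-point converter: it does not handle exponents, subnormals, overflow, or NaN, and the function name demo_rne_shift_right is hypothetical. It only shows how the discarded low bits decide the rounding direction, with exact ties going to the even result.

```c
#include <assert.h>
#include <stdint.h>

/* Drops the low `shift` bits of v, rounding to nearest with ties to even. */
uint32_t demo_rne_shift_right(uint32_t v, unsigned shift)
{
    assert(shift > 0 && shift < 32);
    uint32_t kept = v >> shift;                    /* truncated result */
    uint32_t dropped = v & ((1u << shift) - 1u);   /* discarded bits */
    uint32_t half = 1u << (shift - 1);             /* exactly one half unit */

    if (dropped > half)
        kept += 1;                                 /* closer to the upper value */
    else if (dropped == half && (kept & 1u))
        kept += 1;                                 /* tie: move to the even value */
    return kept;
}
```

Dropping one bit from 11 (5.5 in units of the kept bits) gives 6, while dropping one bit from 9 (4.5) gives 4; both are ties and both land on the even neighbor.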

Valid Usage

VUID-VkConvertCooperativeVectorMatrixInfoNV-srcLayout-10077
If srcLayout is row-major or column-major, then srcStride must be greater than the length of a row/column, and a multiple of the element size
VUID-VkConvertCooperativeVectorMatrixInfoNV-dstLayout-10078
If dstLayout is row-major or column-major, then dstStride must be greater than the length of a row/column, and a multiple of the element size
VUID-VkConvertCooperativeVectorMatrixInfoNV-srcComponentType-10079
If srcComponentType is not a supported VkCooperativeVectorPropertiesNV::matrixInterpretation value as reported by vkGetPhysicalDeviceCooperativeVectorPropertiesNV, then srcComponentType must be VK_COMPONENT_TYPE_FLOAT32_KHR
VUID-VkConvertCooperativeVectorMatrixInfoNV-dstComponentType-10080
If dstComponentType is not a supported VkCooperativeVectorPropertiesNV::matrixInterpretation value as reported by vkGetPhysicalDeviceCooperativeVectorPropertiesNV, then dstComponentType must be VK_COMPONENT_TYPE_FLOAT32_KHR
VUID-VkConvertCooperativeVectorMatrixInfoNV-srcComponentType-10081
If srcComponentType and dstComponentType are not equal, then one must be VK_COMPONENT_TYPE_FLOAT32_KHR or VK_COMPONENT_TYPE_FLOAT16_KHR and the other must be a lower-precision floating-point type
VUID-VkConvertCooperativeVectorMatrixInfoNV-dstComponentType-10082
If dstComponentType is VK_COMPONENT_TYPE_FLOAT_E4M3_NV or VK_COMPONENT_TYPE_FLOAT_E5M2_NV, then dstLayout must be VK_COOPERATIVE_VECTOR_MATRIX_LAYOUT_INFERENCING_OPTIMAL_NV or VK_COOPERATIVE_VECTOR_MATRIX_LAYOUT_TRAINING_OPTIMAL_NV

Valid Usage (Implicit)

VUID-VkConvertCooperativeVectorMatrixInfoNV-sType-sType
sType must be VK_STRUCTURE_TYPE_CONVERT_COOPERATIVE_VECTOR_MATRIX_INFO_NV
VUID-VkConvertCooperativeVectorMatrixInfoNV-pNext-pNext
pNext must be NULL
VUID-VkConvertCooperativeVectorMatrixInfoNV-srcData-parameter
srcData must be a valid VkDeviceOrHostAddressConstKHR union
VUID-VkConvertCooperativeVectorMatrixInfoNV-pDstSize-parameter
pDstSize must be a valid pointer to a size_t value
VUID-VkConvertCooperativeVectorMatrixInfoNV-dstData-parameter
dstData must be a valid VkDeviceOrHostAddressKHR union
VUID-VkConvertCooperativeVectorMatrixInfoNV-srcComponentType-parameter
srcComponentType must be a valid VkComponentTypeKHR value
VUID-VkConvertCooperativeVectorMatrixInfoNV-dstComponentType-parameter
dstComponentType must be a valid VkComponentTypeKHR value
VUID-VkConvertCooperativeVectorMatrixInfoNV-srcLayout-parameter
srcLayout must be a valid VkCooperativeVectorMatrixLayoutNV value
VUID-VkConvertCooperativeVectorMatrixInfoNV-dstLayout-parameter
dstLayout must be a valid VkCooperativeVectorMatrixLayoutNV value

Possible values for VkCooperativeVectorMatrixLayoutNV include:

// Provided by VK_NV_cooperative_vector
typedef enum VkCooperativeVectorMatrixLayoutNV {
    VK_COOPERATIVE_VECTOR_MATRIX_LAYOUT_ROW_MAJOR_NV = 0,
    VK_COOPERATIVE_VECTOR_MATRIX_LAYOUT_COLUMN_MAJOR_NV = 1,
    VK_COOPERATIVE_VECTOR_MATRIX_LAYOUT_INFERENCING_OPTIMAL_NV = 2,
    VK_COOPERATIVE_VECTOR_MATRIX_LAYOUT_TRAINING_OPTIMAL_NV = 3,
} VkCooperativeVectorMatrixLayoutNV;

VK_COOPERATIVE_VECTOR_MATRIX_LAYOUT_ROW_MAJOR_NV corresponds to SPIR-V RowMajorNV layout.
VK_COOPERATIVE_VECTOR_MATRIX_LAYOUT_COLUMN_MAJOR_NV corresponds to SPIR-V ColumnMajorNV layout.
VK_COOPERATIVE_VECTOR_MATRIX_LAYOUT_INFERENCING_OPTIMAL_NV corresponds to SPIR-V InferencingOptimalNV layout.
VK_COOPERATIVE_VECTOR_MATRIX_LAYOUT_TRAINING_OPTIMAL_NV corresponds to SPIR-V TrainingOptimalNV layout.

All enum values match the corresponding SPIR-V value.

Row-major layout has elements of each row stored consecutively in memory, with a controllable stride from the start of one row to the start of the next row. Column-major layout has elements of each column stored consecutively in memory, with a controllable stride from the start of one column to the start of the next column. Inferencing-optimal and Training-optimal layouts are implementation-dependent, and the application can convert a matrix to those layouts using vkConvertCooperativeVectorMatrixNV or vkCmdConvertCooperativeVectorMatrixNV. Training-optimal layout with VK_COMPONENT_TYPE_FLOAT16_KHR or VK_COMPONENT_TYPE_FLOAT32_KHR type has the additional guarantee that the application can reinterpret the data as an array of elements and perform element-wise operations on the data, and finite values in any padding elements do not affect the result of a matrix-vector multiply (inf/NaN values may still cause NaN values in the result).
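
For the two explicit layouts, the byte offset of an element follows directly from the stride description above. The C sketch below assumes a hypothetical demo_layout enum and demo_element_offset function; it does not apply to the inferencing-optimal or training-optimal layouts, whose element placement is implementation-dependent.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for the two explicit matrix layouts. */
typedef enum { DEMO_ROW_MAJOR = 0, DEMO_COLUMN_MAJOR = 1 } demo_layout;

/* Byte offset of element (row, col) from the start of the matrix.
 * stride is the distance in bytes between the starts of consecutive rows
 * (row-major) or consecutive columns (column-major). */
size_t demo_element_offset(demo_layout layout, uint32_t row, uint32_t col,
                           size_t stride, size_t elemSize)
{
    assert(layout == DEMO_ROW_MAJOR || layout == DEMO_COLUMN_MAJOR);
    assert(elemSize != 0 && stride % elemSize == 0);

    if (layout == DEMO_ROW_MAJOR)
        return (size_t)row * stride + (size_t)col * elemSize;
    return (size_t)col * stride + (size_t)row * elemSize;
}
```

With 2-byte elements and a 64-byte stride, element (2, 3) sits at byte 134 in row-major layout and at byte 196 in column-major layout.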

To convert a matrix to another layout and type, call:

// Provided by VK_NV_cooperative_vector
void vkCmdConvertCooperativeVectorMatrixNV(
    VkCommandBuffer                             commandBuffer,
    uint32_t                                    infoCount,
    const VkConvertCooperativeVectorMatrixInfoNV* pInfos);

commandBuffer is the command buffer into which the command will be recorded.
infoCount is the number of layout conversions to perform.
pInfos is a pointer to an array of VkConvertCooperativeVectorMatrixInfoNV structures containing information about the layout conversion.

This command does the same conversions as vkConvertCooperativeVectorMatrixNV, but executes on the device. One conversion is performed for each of the infoCount elements of pInfos.

This command’s execution is synchronized using VK_PIPELINE_STAGE_2_CONVERT_COOPERATIVE_VECTOR_MATRIX_BIT_NV.

Valid Usage

VUID-vkCmdConvertCooperativeVectorMatrixNV-pInfo-10083
For each element of pInfos, srcData::deviceAddress and dstData::deviceAddress must be valid device addresses
VUID-vkCmdConvertCooperativeVectorMatrixNV-pInfo-10084
For each element of pInfos, srcData::deviceAddress must be 64 byte aligned
VUID-vkCmdConvertCooperativeVectorMatrixNV-pInfo-10085
For each element of pInfos, dstData::deviceAddress must be 64 byte aligned
VUID-vkCmdConvertCooperativeVectorMatrixNV-pInfo-10086
For each element of pInfos, srcSize must be large enough to contain the source matrix, based either on the standard matrix layout or based on the size filled out by vkConvertCooperativeVectorMatrixNV
VUID-vkCmdConvertCooperativeVectorMatrixNV-pInfo-10087
For each element of pInfos, the value pointed to by pDstSize must be large enough to contain the destination matrix, based either on the standard matrix layout or based on the size filled out by vkConvertCooperativeVectorMatrixNV
VUID-vkCmdConvertCooperativeVectorMatrixNV-None-10088
Memory accessed by the sources and destinations of all of the conversions must not overlap

Valid Usage (Implicit)

VUID-vkCmdConvertCooperativeVectorMatrixNV-commandBuffer-parameter
commandBuffer must be a valid VkCommandBuffer handle
VUID-vkCmdConvertCooperativeVectorMatrixNV-pInfos-parameter
pInfos must be a valid pointer to an array of infoCount valid VkConvertCooperativeVectorMatrixInfoNV structures
VUID-vkCmdConvertCooperativeVectorMatrixNV-commandBuffer-recording
commandBuffer must be in the recording state
VUID-vkCmdConvertCooperativeVectorMatrixNV-commandBuffer-cmdpool
The VkCommandPool that commandBuffer was allocated from must support graphics or compute operations
VUID-vkCmdConvertCooperativeVectorMatrixNV-renderpass
This command must only be called outside of a render pass instance
VUID-vkCmdConvertCooperativeVectorMatrixNV-videocoding
This command must only be called outside of a video coding scope
VUID-vkCmdConvertCooperativeVectorMatrixNV-infoCount-arraylength
infoCount must be greater than 0

Host Synchronization

Host access to commandBuffer must be externally synchronized
Host access to the VkCommandPool that commandBuffer was allocated from must be externally synchronized

Command Properties

Command Buffer Levels	Render Pass Scope	Video Coding Scope	Supported Queue Types	Command Type
Primary Secondary	Outside	Outside	Graphics Compute	Action

Validation Cache

Validation cache objects allow the result of internal validation to be reused, both within a single application run and between multiple runs. Reuse within a single run is achieved by passing the same validation cache object when creating supported Vulkan objects. Reuse across runs of an application is achieved by retrieving validation cache contents in one run of an application, saving the contents, and using them to preinitialize a validation cache on a subsequent run. The contents of the validation cache objects are managed by the validation layers. Applications can manage the host memory consumed by a validation cache object and control the amount of data retrieved from a validation cache object.

Validation cache objects are represented by VkValidationCacheEXT handles:

// Provided by VK_EXT_validation_cache
VK_DEFINE_NON_DISPATCHABLE_HANDLE(VkValidationCacheEXT)

To create validation cache objects, call:

// Provided by VK_EXT_validation_cache
VkResult vkCreateValidationCacheEXT(
    VkDevice                                    device,
    const VkValidationCacheCreateInfoEXT*       pCreateInfo,
    const VkAllocationCallbacks*                pAllocator,
    VkValidationCacheEXT*                       pValidationCache);

device is the logical device that creates the validation cache object.
pCreateInfo is a pointer to a VkValidationCacheCreateInfoEXT structure containing the initial parameters for the validation cache object.
pAllocator controls host memory allocation as described in the Memory Allocation chapter.
pValidationCache is a pointer to a VkValidationCacheEXT handle in which the resulting validation cache object is returned.

Applications can track and manage the total host memory size of a validation cache object using the pAllocator. Applications can limit the amount of data retrieved from a validation cache object in vkGetValidationCacheDataEXT. Implementations should not internally limit the total number of entries added to a validation cache object or the total host memory consumed.

Once created, a validation cache can be passed to the vkCreateShaderModule command by adding this object to the VkShaderModuleCreateInfo structure’s pNext chain. If a VkShaderModuleValidationCacheCreateInfoEXT object is included in the VkShaderModuleCreateInfo::pNext chain, and its validationCache field is not VK_NULL_HANDLE, the implementation will query it for possible reuse opportunities and update it with new content. The use of the validation cache object in these commands is internally synchronized, and the same validation cache object can be used in multiple threads simultaneously.

Implementations should make every effort to limit any critical sections to the actual accesses to the cache, which is expected to be significantly shorter than the duration of the vkCreateShaderModule command.

Valid Usage (Implicit)

VUID-vkCreateValidationCacheEXT-device-parameter
device must be a valid VkDevice handle
VUID-vkCreateValidationCacheEXT-pCreateInfo-parameter
pCreateInfo must be a valid pointer to a valid VkValidationCacheCreateInfoEXT structure
VUID-vkCreateValidationCacheEXT-pAllocator-parameter
If pAllocator is not NULL, pAllocator must be a valid pointer to a valid VkAllocationCallbacks structure
VUID-vkCreateValidationCacheEXT-pValidationCache-parameter
pValidationCache must be a valid pointer to a VkValidationCacheEXT handle

Return Codes

Success

VK_SUCCESS

Failure

VK_ERROR_OUT_OF_HOST_MEMORY

The VkValidationCacheCreateInfoEXT structure is defined as:

// Provided by VK_EXT_validation_cache
typedef struct VkValidationCacheCreateInfoEXT {
    VkStructureType                    sType;
    const void*                        pNext;
    VkValidationCacheCreateFlagsEXT    flags;
    size_t                             initialDataSize;
    const void*                        pInitialData;
} VkValidationCacheCreateInfoEXT;

sType is a VkStructureType value identifying this structure.
pNext is NULL or a pointer to a structure extending this structure.
flags is reserved for future use.
initialDataSize is the number of bytes in pInitialData. If initialDataSize is zero, the validation cache will initially be empty.
pInitialData is a pointer to previously retrieved validation cache data. If the validation cache data is incompatible (as defined below) with the device, the validation cache will be initially empty. If initialDataSize is zero, pInitialData is ignored.
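
Reuse across application runs needs the retrieved cache data to be stored somewhere between runs. The C sketch below shows one possible approach using a file; the demo_save_blob and demo_load_blob helpers and the idea of keeping the data in a single file are assumptions for illustration, not something this specification prescribes.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>

/* Writes `size` bytes to `path`. Returns 0 on success, -1 on failure. */
int demo_save_blob(const char *path, const void *data, size_t size)
{
    assert(path != NULL && (data != NULL || size == 0));
    FILE *f = fopen(path, "wb");
    if (f == NULL)
        return -1;
    size_t written = size ? fwrite(data, 1, size, f) : 0;
    int closeErr = fclose(f);
    return (written == size && closeErr == 0) ? 0 : -1;
}

/* Reads the whole file at `path` into a malloc'd buffer (caller frees).
 * Returns NULL and sets *outSize to 0 if the file is missing, empty,
 * or cannot be read completely. */
void *demo_load_blob(const char *path, size_t *outSize)
{
    assert(path != NULL && outSize != NULL);
    *outSize = 0;
    FILE *f = fopen(path, "rb");
    if (f == NULL)
        return NULL;
    if (fseek(f, 0, SEEK_END) != 0) { fclose(f); return NULL; }
    long len = ftell(f);
    if (len <= 0 || fseek(f, 0, SEEK_SET) != 0) { fclose(f); return NULL; }
    void *buf = malloc((size_t)len);
    if (buf != NULL && fread(buf, 1, (size_t)len, f) != (size_t)len) {
        free(buf);
        buf = NULL;
    }
    fclose(f);
    if (buf != NULL)
        *outSize = (size_t)len;
    return buf;
}
```

In this scheme the bytes passed to demo_save_blob would come from vkGetValidationCacheDataEXT, and the buffer and size returned by demo_load_blob would be supplied as pInitialData and initialDataSize on the next run. A missing file maps naturally to an initialDataSize of zero, which creates an initially empty cache.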

Valid Usage

VUID-VkValidationCacheCreateInfoEXT-initialDataSize-01534
If initialDataSize is not 0, it must be equal to the size of pInitialData, as returned by vkGetValidationCacheDataEXT when pInitialData was originally retrieved
VUID-VkValidationCacheCreateInfoEXT-initialDataSize-01535
If initialDataSize is not 0, pInitialData must have been retrieved from a previous call to vkGetValidationCacheDataEXT

Valid Usage (Implicit)

VUID-VkValidationCacheCreateInfoEXT-sType-sType
sType must be VK_STRUCTURE_TYPE_VALIDATION_CACHE_CREATE_INFO_EXT
VUID-VkValidationCacheCreateInfoEXT-pNext-pNext
pNext must be NULL
VUID-VkValidationCacheCreateInfoEXT-flags-zerobitmask
flags must be 0
VUID-VkValidationCacheCreateInfoEXT-pInitialData-parameter
If initialDataSize is not 0, pInitialData must be a valid pointer to an array of initialDataSize bytes

// Provided by VK_EXT_validation_cache
typedef VkFlags VkValidationCacheCreateFlagsEXT;

VkValidationCacheCreateFlagsEXT is a bitmask type for setting a mask, but is currently reserved for future use.

Validation cache objects can be merged using the command:

// Provided by VK_EXT_validation_cache
VkResult vkMergeValidationCachesEXT(
    VkDevice                                    device,
    VkValidationCacheEXT                        dstCache,
    uint32_t                                    srcCacheCount,
    const VkValidationCacheEXT*                 pSrcCaches);

device is the logical device that owns the validation cache objects.
dstCache is the handle of the validation cache to merge results into.
srcCacheCount is the length of the pSrcCaches array.
pSrcCaches is a pointer to an array of validation cache handles, which will be merged into dstCache. The previous contents of dstCache are included after the merge.

The details of the merge operation are implementation-dependent, but implementations should merge the contents of the specified validation caches and prune duplicate entries.

Valid Usage

VUID-vkMergeValidationCachesEXT-dstCache-01536
dstCache must not appear in the list of source caches
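
A minimal sketch of how an application might check this rule on the host before calling vkMergeValidationCachesEXT. CacheHandle is a hypothetical stand-in for VkValidationCacheEXT so that the snippet compiles without the Vulkan headers, and dst_in_sources is an illustrative name that is not part of the API.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for VkValidationCacheEXT (assumption: an
 * opaque handle that can be compared for equality). */
typedef uint64_t CacheHandle;

/* Returns true if dst appears in srcs[0..count-1], i.e. if passing
 * these arguments to a merge would violate the rule above. */
bool dst_in_sources(CacheHandle dst, const CacheHandle *srcs, uint32_t count)
{
    for (uint32_t i = 0; i < count; ++i) {
        if (srcs[i] == dst) {
            return true;
        }
    }
    return false;
}
```

An application could run such a check in debug builds and skip the merge, or drop the offending entry, when it returns true.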

Valid Usage (Implicit)

VUID-vkMergeValidationCachesEXT-device-parameter
device must be a valid VkDevice handle
VUID-vkMergeValidationCachesEXT-dstCache-parameter
dstCache must be a valid VkValidationCacheEXT handle
VUID-vkMergeValidationCachesEXT-pSrcCaches-parameter
pSrcCaches must be a valid pointer to an array of srcCacheCount valid VkValidationCacheEXT handles
VUID-vkMergeValidationCachesEXT-srcCacheCount-arraylength
srcCacheCount must be greater than 0
VUID-vkMergeValidationCachesEXT-dstCache-parent
dstCache must have been created, allocated, or retrieved from device
VUID-vkMergeValidationCachesEXT-pSrcCaches-parent
Each element of pSrcCaches must have been created, allocated, or retrieved from device

Host Synchronization

Host access to dstCache must be externally synchronized

Return Codes

Success

VK_SUCCESS

Failure

VK_ERROR_OUT_OF_HOST_MEMORY
VK_ERROR_OUT_OF_DEVICE_MEMORY

Data can be retrieved from a validation cache object using the command:

// Provided by VK_EXT_validation_cache
VkResult vkGetValidationCacheDataEXT(
    VkDevice                                    device,
    VkValidationCacheEXT                        validationCache,
    size_t*                                     pDataSize,
    void*                                       pData);

device is the logical device that owns the validation cache.
validationCache is the validation cache to retrieve data from.
pDataSize is a pointer to a value related to the amount of data in the validation cache, as described below.
pData is either NULL or a pointer to a buffer.

If pData is NULL, then the maximum size of the data that can be retrieved from the validation cache, in bytes, is returned in pDataSize. Otherwise, pDataSize must point to a variable set by the application to the size of the buffer, in bytes, pointed to by pData, and on return the variable is overwritten with the amount of data actually written to pData. If pDataSize is less than the maximum size that can be retrieved by the validation cache, at most pDataSize bytes will be written to pData, and vkGetValidationCacheDataEXT will return VK_INCOMPLETE instead of VK_SUCCESS, to indicate that not all of the validation cache was returned.
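
The size-then-data pattern described above can be sketched as follows. To keep the example compilable without the Vulkan headers, the query is abstracted behind a hypothetical callback type, CacheQueryFn, that follows the same in/out size contract; in a real application the callback would wrap vkGetValidationCacheDataEXT. The names read_cache_blob, FakeCache, fake_query and the QUERY_* codes are illustrative assumptions, not part of the API.

```c
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical result codes standing in for VK_SUCCESS / VK_INCOMPLETE. */
enum { QUERY_SUCCESS = 0, QUERY_INCOMPLETE = 1 };

/* Hypothetical callback with the same in/out size contract as
 * vkGetValidationCacheDataEXT. */
typedef int (*CacheQueryFn)(void *user, size_t *pDataSize, void *pData);

/* Queries the size, allocates a buffer, then retrieves the data.
 * Returns a malloc'ed buffer (caller frees) or NULL; *outSize receives
 * the number of bytes actually written. */
void *read_cache_blob(CacheQueryFn query, void *user, size_t *outSize)
{
    size_t size = 0;
    *outSize = 0;

    /* First call: pData == NULL returns the maximum size. */
    if (query(user, &size, NULL) != QUERY_SUCCESS || size == 0) {
        return NULL;
    }
    void *data = malloc(size);
    if (data == NULL) {
        return NULL;
    }
    /* Second call: size is the buffer capacity on input and the
     * number of bytes written on output. */
    if (query(user, &size, data) != QUERY_SUCCESS) {
        free(data); /* truncated or failed: this sketch discards it */
        return NULL;
    }
    *outSize = size;
    return data;
}

/* Toy in-memory source used only to exercise the helper. */
typedef struct {
    const unsigned char *bytes;
    size_t size;
} FakeCache;

int fake_query(void *user, size_t *pDataSize, void *pData)
{
    const FakeCache *c = (const FakeCache *)user;
    if (pData == NULL) {
        *pDataSize = c->size;
        return QUERY_SUCCESS;
    }
    size_t n = *pDataSize < c->size ? *pDataSize : c->size;
    memcpy(pData, c->bytes, n);
    *pDataSize = n;
    return n < c->size ? QUERY_INCOMPLETE : QUERY_SUCCESS;
}
```

Discarding a truncated result is a design choice of this sketch. An application could instead retry with a larger buffer if the cache grew between the two calls.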

Any data written to pData is valid and can be provided as the pInitialData member of the VkValidationCacheCreateInfoEXT structure passed to vkCreateValidationCacheEXT.

Two calls to vkGetValidationCacheDataEXT with the same parameters must retrieve the same data unless a command that modifies the contents of the cache is called between them.

Applications can store the data retrieved from the validation cache, and use these data, possibly in a future run of the application, to populate new validation cache objects. The results of validation, however, may depend on the vendor ID, device ID, driver version, and other details of the device. To enable applications to detect when previously retrieved data is incompatible with the device, the initial bytes written to pData must be a header consisting of the following members:

Table 1. Layout for Validation Cache Header Version `VK_VALIDATION_CACHE_HEADER_VERSION_ONE_EXT`
Offset	Size	Meaning
0	4	length in bytes of the entire validation cache header written as a stream of bytes, with the least significant byte first
4	4	a VkValidationCacheHeaderVersionEXT value written as a stream of bytes, with the least significant byte first
8	`VK_UUID_SIZE`	a layer commit ID expressed as a UUID, which uniquely identifies the version of the validation layers used to generate these validation results

The first four bytes encode the length of the entire validation cache header, in bytes. This value includes all fields in the header including the validation cache version field and the size of the length field.

The next four bytes encode the validation cache version, as described for VkValidationCacheHeaderVersionEXT. A consumer of the validation cache should use the cache version to interpret the remainder of the cache header.

If pDataSize is less than what is necessary to store this header, nothing will be written to pData and zero will be written to pDataSize.
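
As an illustration of the header layout in Table 1, the following sketch decodes the header from a previously saved blob and compares the layer commit ID against an expected value. It is written against plain C types so that it compiles without the Vulkan headers: CACHE_UUID_SIZE mirrors VK_UUID_SIZE (16), CACHE_HEADER_VERSION_ONE mirrors VK_VALIDATION_CACHE_HEADER_VERSION_ONE_EXT, and the struct and function names are hypothetical.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define CACHE_UUID_SIZE 16u         /* mirrors VK_UUID_SIZE */
#define CACHE_HEADER_VERSION_ONE 1u /* mirrors ..._HEADER_VERSION_ONE_EXT */

/* Decoded form of the version-one header (illustrative type). */
typedef struct {
    uint32_t headerLength;
    uint32_t headerVersion;
    uint8_t  layerCommitId[CACHE_UUID_SIZE];
} CacheHeader;

/* Reads a 32-bit value stored least significant byte first. */
static uint32_t read_le32(const uint8_t *p)
{
    return (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
           ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}

/* Decodes the header at the start of data. Returns false if the blob
 * is too small, the version is unknown, or the length is implausible. */
bool parse_cache_header(const uint8_t *data, size_t size, CacheHeader *out)
{
    const size_t minSize = 8u + CACHE_UUID_SIZE;
    if (data == NULL || out == NULL || size < minSize) {
        return false;
    }
    out->headerLength = read_le32(data);      /* offset 0, 4 bytes */
    out->headerVersion = read_le32(data + 4); /* offset 4, 4 bytes */
    if (out->headerVersion != CACHE_HEADER_VERSION_ONE) {
        return false;
    }
    if (out->headerLength < minSize || out->headerLength > size) {
        return false;
    }
    memcpy(out->layerCommitId, data + 8, CACHE_UUID_SIZE); /* offset 8 */
    return true;
}

/* Returns true if the blob was produced by the expected layer build. */
bool cache_matches_layer(const CacheHeader *h,
                         const uint8_t expected[CACHE_UUID_SIZE])
{
    return memcmp(h->layerCommitId, expected, CACHE_UUID_SIZE) == 0;
}
```

An application might run such a check on stored data before passing it as pInitialData, and start with an empty cache when the check fails.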

Valid Usage (Implicit)

VUID-vkGetValidationCacheDataEXT-device-parameter
device must be a valid VkDevice handle
VUID-vkGetValidationCacheDataEXT-validationCache-parameter
validationCache must be a valid VkValidationCacheEXT handle
VUID-vkGetValidationCacheDataEXT-pDataSize-parameter
pDataSize must be a valid pointer to a size_t value
VUID-vkGetValidationCacheDataEXT-pData-parameter
If the value referenced by pDataSize is not 0, and pData is not NULL, pData must be a valid pointer to an array of pDataSize bytes
VUID-vkGetValidationCacheDataEXT-validationCache-parent
validationCache must have been created, allocated, or retrieved from device

Return Codes

Success

VK_SUCCESS
VK_INCOMPLETE

Failure

VK_ERROR_OUT_OF_HOST_MEMORY
VK_ERROR_OUT_OF_DEVICE_MEMORY

Possible values of the second group of four bytes in the header returned by vkGetValidationCacheDataEXT, encoding the validation cache version, are:

// Provided by VK_EXT_validation_cache
typedef enum VkValidationCacheHeaderVersionEXT {
    VK_VALIDATION_CACHE_HEADER_VERSION_ONE_EXT = 1,
} VkValidationCacheHeaderVersionEXT;

VK_VALIDATION_CACHE_HEADER_VERSION_ONE_EXT specifies version one of the validation cache.

To destroy a validation cache, call:

// Provided by VK_EXT_validation_cache
void vkDestroyValidationCacheEXT(
    VkDevice                                    device,
    VkValidationCacheEXT                        validationCache,
    const VkAllocationCallbacks*                pAllocator);

device is the logical device that destroys the validation cache object.
validationCache is the handle of the validation cache to destroy.
pAllocator controls host memory allocation as described in the Memory Allocation chapter.

Valid Usage

VUID-vkDestroyValidationCacheEXT-validationCache-01537
If VkAllocationCallbacks were provided when validationCache was created, a compatible set of callbacks must be provided here
VUID-vkDestroyValidationCacheEXT-validationCache-01538
If no VkAllocationCallbacks were provided when validationCache was created, pAllocator must be NULL

Valid Usage (Implicit)

VUID-vkDestroyValidationCacheEXT-device-parameter
device must be a valid VkDevice handle
VUID-vkDestroyValidationCacheEXT-validationCache-parameter
If validationCache is not VK_NULL_HANDLE, validationCache must be a valid VkValidationCacheEXT handle
VUID-vkDestroyValidationCacheEXT-pAllocator-parameter
If pAllocator is not NULL, pAllocator must be a valid pointer to a valid VkAllocationCallbacks structure
VUID-vkDestroyValidationCacheEXT-validationCache-parent
If validationCache is a valid handle, it must have been created, allocated, or retrieved from device

Host Synchronization

Host access to validationCache must be externally synchronized

CUDA Modules

Creating a CUDA Module

CUDA modules must contain some kernel code and must expose at least one function entry point.

CUDA modules are represented by VkCudaModuleNV handles:

// Provided by VK_NV_cuda_kernel_launch
VK_DEFINE_NON_DISPATCHABLE_HANDLE(VkCudaModuleNV)

To create a CUDA module, call:

// Provided by VK_NV_cuda_kernel_launch
VkResult vkCreateCudaModuleNV(
    VkDevice                                    device,
    const VkCudaModuleCreateInfoNV*             pCreateInfo,
    const VkAllocationCallbacks*                pAllocator,
    VkCudaModuleNV*                             pModule);

device is the logical device that creates the CUDA module.
pCreateInfo is a pointer to a VkCudaModuleCreateInfoNV structure.
pAllocator controls host memory allocation as described in the Memory Allocation chapter.
pModule is a pointer to a VkCudaModuleNV handle in which the resulting CUDA module object is returned.

Once a CUDA module has been created, the application may create the function entry point, which must refer to one function in the module.

Valid Usage (Implicit)

VUID-vkCreateCudaModuleNV-device-parameter
device must be a valid VkDevice handle
VUID-vkCreateCudaModuleNV-pCreateInfo-parameter
pCreateInfo must be a valid pointer to a valid VkCudaModuleCreateInfoNV structure
VUID-vkCreateCudaModuleNV-pAllocator-parameter
If pAllocator is not NULL, pAllocator must be a valid pointer to a valid VkAllocationCallbacks structure
VUID-vkCreateCudaModuleNV-pModule-parameter
pModule must be a valid pointer to a VkCudaModuleNV handle

Return Codes

Success

VK_SUCCESS

Failure

VK_ERROR_INITIALIZATION_FAILED
VK_ERROR_OUT_OF_HOST_MEMORY

The VkCudaModuleCreateInfoNV structure is defined as:

// Provided by VK_NV_cuda_kernel_launch
typedef struct VkCudaModuleCreateInfoNV {
    VkStructureType    sType;
    const void*        pNext;
    size_t             dataSize;
    const void*        pData;
} VkCudaModuleCreateInfoNV;

sType is a VkStructureType value identifying this structure.
pNext is NULL or a pointer to a structure extending this structure.
dataSize is the length of the pData array, in bytes.
pData is a pointer to the CUDA code, either PTX or a binary cache.

Valid Usage

VUID-VkCudaModuleCreateInfoNV-dataSize-09413
dataSize must be the total size in bytes of the PTX files or binary cache passed to pData

Valid Usage (Implicit)

VUID-VkCudaModuleCreateInfoNV-sType-sType
sType must be VK_STRUCTURE_TYPE_CUDA_MODULE_CREATE_INFO_NV
VUID-VkCudaModuleCreateInfoNV-pNext-pNext
pNext must be NULL
VUID-VkCudaModuleCreateInfoNV-pData-parameter
pData must be a valid pointer to an array of dataSize bytes
VUID-VkCudaModuleCreateInfoNV-dataSize-arraylength
dataSize must be greater than 0

Creating a CUDA Function Handle

CUDA functions are represented by VkCudaFunctionNV handles. Handles to global functions may then be used to issue a kernel launch (i.e. dispatch) from a command buffer. See Dispatching Command for CUDA PTX kernel.

// Provided by VK_NV_cuda_kernel_launch
VK_DEFINE_NON_DISPATCHABLE_HANDLE(VkCudaFunctionNV)

To create a CUDA function, call:

// Provided by VK_NV_cuda_kernel_launch
VkResult vkCreateCudaFunctionNV(
    VkDevice                                    device,
    const VkCudaFunctionCreateInfoNV*           pCreateInfo,
    const VkAllocationCallbacks*                pAllocator,
    VkCudaFunctionNV*                           pFunction);

device is the logical device that creates the CUDA function.
pCreateInfo is a pointer to a VkCudaFunctionCreateInfoNV structure.
pAllocator controls host memory allocation as described in the Memory Allocation chapter.
pFunction is a pointer to a VkCudaFunctionNV handle in which the resulting CUDA function object is returned.

Valid Usage (Implicit)

VUID-vkCreateCudaFunctionNV-device-parameter
device must be a valid VkDevice handle
VUID-vkCreateCudaFunctionNV-pCreateInfo-parameter
pCreateInfo must be a valid pointer to a valid VkCudaFunctionCreateInfoNV structure
VUID-vkCreateCudaFunctionNV-pAllocator-parameter
If pAllocator is not NULL, pAllocator must be a valid pointer to a valid VkAllocationCallbacks structure
VUID-vkCreateCudaFunctionNV-pFunction-parameter
pFunction must be a valid pointer to a VkCudaFunctionNV handle

Return Codes

Success

VK_SUCCESS

Failure

VK_ERROR_INITIALIZATION_FAILED
VK_ERROR_OUT_OF_HOST_MEMORY

The VkCudaFunctionCreateInfoNV structure is defined as:

// Provided by VK_NV_cuda_kernel_launch
typedef struct VkCudaFunctionCreateInfoNV {
    VkStructureType    sType;
    const void*        pNext;
    VkCudaModuleNV     module;
    const char*        pName;
} VkCudaFunctionCreateInfoNV;

sType is a VkStructureType value identifying this structure.
pNext is NULL or a pointer to a structure extending this structure.
module is the CUDA VkCudaModuleNV module in which the function resides.
pName is a null-terminated UTF-8 string containing the name of the function entry point in module.

Valid Usage (Implicit)

VUID-VkCudaFunctionCreateInfoNV-sType-sType
sType must be VK_STRUCTURE_TYPE_CUDA_FUNCTION_CREATE_INFO_NV
VUID-VkCudaFunctionCreateInfoNV-pNext-pNext
pNext must be NULL
VUID-VkCudaFunctionCreateInfoNV-module-parameter
module must be a valid VkCudaModuleNV handle
VUID-VkCudaFunctionCreateInfoNV-pName-parameter
pName must be a null-terminated UTF-8 string

Destroying a CUDA Function

To destroy a CUDA function handle, call:

// Provided by VK_NV_cuda_kernel_launch
void vkDestroyCudaFunctionNV(
    VkDevice                                    device,
    VkCudaFunctionNV                            function,
    const VkAllocationCallbacks*                pAllocator);

device is the logical device that destroys the CUDA function.
function is the handle of the CUDA function to destroy.
pAllocator controls host memory allocation as described in the Memory Allocation chapter.

Valid Usage (Implicit)

VUID-vkDestroyCudaFunctionNV-device-parameter
device must be a valid VkDevice handle
VUID-vkDestroyCudaFunctionNV-function-parameter
function must be a valid VkCudaFunctionNV handle
VUID-vkDestroyCudaFunctionNV-pAllocator-parameter
If pAllocator is not NULL, pAllocator must be a valid pointer to a valid VkAllocationCallbacks structure
VUID-vkDestroyCudaFunctionNV-function-parent
function must have been created, allocated, or retrieved from device

Destroying a CUDA Module

To destroy a CUDA module, call:

// Provided by VK_NV_cuda_kernel_launch
void vkDestroyCudaModuleNV(
    VkDevice                                    device,
    VkCudaModuleNV                              module,
    const VkAllocationCallbacks*                pAllocator);

device is the logical device that destroys the CUDA module.
module is the handle of the CUDA module to destroy.
pAllocator controls host memory allocation as described in the Memory Allocation chapter.

Valid Usage (Implicit)

VUID-vkDestroyCudaModuleNV-device-parameter
device must be a valid VkDevice handle
VUID-vkDestroyCudaModuleNV-module-parameter
module must be a valid VkCudaModuleNV handle
VUID-vkDestroyCudaModuleNV-pAllocator-parameter
If pAllocator is not NULL, pAllocator must be a valid pointer to a valid VkAllocationCallbacks structure
VUID-vkDestroyCudaModuleNV-module-parent
module must have been created, allocated, or retrieved from device

Reading back CUDA Module Cache

After uploading the PTX kernel code, the module compiles the code to generate a binary cache with all the necessary information for the device to execute it. It is possible to read back this cache for later use, such as accelerating the initialization of further executions.

To get the CUDA module cache, call:

// Provided by VK_NV_cuda_kernel_launch
VkResult vkGetCudaModuleCacheNV(
    VkDevice                                    device,
    VkCudaModuleNV                              module,
    size_t*                                     pCacheSize,
    void*                                       pCacheData);

device is the logical device that owns the CUDA module.
module is the CUDA module.
pCacheSize is a pointer to a value related to the size of the binary cache, in bytes, as described below.
pCacheData is either NULL or a pointer to a buffer into which the binary cache is copied.

If pCacheData is NULL, then the size of the binary cache, in bytes, is returned in pCacheSize. Otherwise, pCacheSize must point to a variable set by the application to the size of the buffer, in bytes, pointed to by pCacheData, and on return the variable is overwritten with the amount of data actually written to pCacheData. If pCacheSize is less than the size of the binary cache, nothing is written to pCacheData, and VK_INCOMPLETE will be returned instead of VK_SUCCESS.

The returned cache may then be used later for further initialization of the CUDA module, by sending this cache instead of the PTX code when using vkCreateCudaModuleNV.

Using the binary cache instead of the original PTX code should significantly speed up initialization of the CUDA module, given that the whole compilation and validation will not be necessary.

As with VkPipelineCache, the binary cache depends on the specific implementation. The application must assume that uploading the cache can fail, and should be prepared to fall back to the original PTX code. The cache is most likely to be accepted when the same device driver and architecture are used both to generate the cache from PTX and to consume it. With a new driver version or a different device architecture, the cache may become invalid.
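
The fallback strategy described above can be sketched as follows. Module creation is abstracted behind a hypothetical callback, CreateModuleFn, so that the example compiles without the Vulkan headers; in a real application the callback would fill a VkCudaModuleCreateInfoNV with the given data and size, call vkCreateCudaModuleNV, and report whether it returned VK_SUCCESS. All names below are illustrative assumptions, and fake_create is a toy used only to exercise the logic.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical callback: tries to create a module from a blob and
 * returns true on success. */
typedef bool (*CreateModuleFn)(void *user, const void *data, size_t size);

/* Which input, if any, the module was created from. */
typedef enum {
    MODULE_FROM_CACHE,
    MODULE_FROM_PTX,
    MODULE_FAILED
} ModuleSource;

/* Tries the binary cache first, then falls back to the PTX code. */
ModuleSource create_module_with_fallback(CreateModuleFn create, void *user,
                                         const void *cache, size_t cacheSize,
                                         const void *ptx, size_t ptxSize)
{
    if (cache != NULL && cacheSize > 0 && create(user, cache, cacheSize)) {
        return MODULE_FROM_CACHE;
    }
    if (ptx != NULL && ptxSize > 0 && create(user, ptx, ptxSize)) {
        return MODULE_FROM_PTX;
    }
    return MODULE_FAILED;
}

/* Toy creator: accepts only blobs whose first byte equals the tag
 * stored in user. It stands in for a driver accepting or rejecting
 * the data it is given. */
bool fake_create(void *user, const void *data, size_t size)
{
    unsigned char accepted = *(const unsigned char *)user;
    return size > 0 && ((const unsigned char *)data)[0] == accepted;
}
```

When the result is MODULE_FROM_PTX, an application would typically read the cache back with vkGetCudaModuleCacheNV and store it to replace the stale one.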

Valid Usage (Implicit)

VUID-vkGetCudaModuleCacheNV-device-parameter
device must be a valid VkDevice handle
VUID-vkGetCudaModuleCacheNV-module-parameter
module must be a valid VkCudaModuleNV handle
VUID-vkGetCudaModuleCacheNV-pCacheSize-parameter
pCacheSize must be a valid pointer to a size_t value
VUID-vkGetCudaModuleCacheNV-pCacheData-parameter
If the value referenced by pCacheSize is not 0, and pCacheData is not NULL, pCacheData must be a valid pointer to an array of pCacheSize bytes
VUID-vkGetCudaModuleCacheNV-module-parent
module must have been created, allocated, or retrieved from device

Return Codes

Success

VK_SUCCESS
VK_INCOMPLETE

Failure

VK_ERROR_INITIALIZATION_FAILED

Limitations

CUDA and Vulkan do not use the device in the same configuration. The following limitations must be taken into account:

It is not possible to read or write global parameters from Vulkan. The only way to share resources or send values to the PTX kernel is to pass them as arguments of the function. See Resources sharing between CUDA Kernel and Vulkan for more details.
No calls to functions external to the module PTX are supported.
Vulkan disables some shader/kernel exceptions, which could break CUDA kernels relying on exceptions.
CUDA kernels submitted to Vulkan are limited to the amount of shared memory that can be queried from the physical device capabilities. This amount may be less than what CUDA can offer.
CUDA instruction-level preemption (CILP) does not work.
CUDA Unified Memory will not work in this extension.
CUDA Dynamic parallelism is not supported.
vk*DispatchIndirect is not available.