Crash Diagnostic Layer
The Crash Diagnostic Layer (CDL) is a Vulkan layer to help track down and identify the cause of GPU hangs and crashes. It works by instrumenting command buffers with completion checkpoints. When an error is detected a dump file containing incomplete command buffers is written. Often the last complete or incomplete commands are responsible for the crash.
Building
See the associated BUILD.md file for details on how to build the Crash Diagnostic Layer for various platforms.
Runtime requirements
CDL uses the following extensions. If an extension is not present, some functionality might be disabled.
VK_KHR_timeline_semaphore
(or Vulkan 1.2) - used to track forward progress of queue submissionsVK_NV_device_diagnostic_commands
- used to track forward progress within command buffersVK_AMD_buffer_marker
- used to track forward progress within command buffers (if the NV extension is not present) and the state of binary semaphores.VK_AMD_device_coherent_memory
- improves the accuracy of VK_AMD_buffer_marker trackingVK_EXT_device_fault
- allows drivers to report data related to aVK_ERROR_DEVICE_LOST
, such as invalid device addresses.VK_EXT_device_address_binding_report
- allows drivers to report lifecycle information about device address assignments, which can help identify the source of device faults.
Running
CDL can be used as an explicit or implicit layer. The loader's documentation describes the difference between implicit and explicit layers, but the relevant bit here is that implicit layers are meant to be available to all applications on the system, even if the application doesn't explicitly enable the layer. On the other hand, explicit layers are easier to use when doing application development.
Explicit Layer
To use CDL as an explicit layer, enable it with vkconfig
or do the following:
- The directory containing the file
VkLayer_crash_diagnostic.json
is included in the layer search path, by including it in either theVK_LAYER_PATH
orVK_ADD_LAYER_PATH
environment variable or by usingvkconfig
. - Include the layer name in the VK_INSTANCE_LAYERS environment variable or the ppEnabledLayerNames field of VkInstanceCreateInfo.
Implicit Layer
When using CDL as an implicit layer, it is disabled by default. To enable the layer's functionality, set the CDL_ENABLE
environment variable to 1.
Registering the Layer
In order to be discovered by the Vulkan loader at runtime, implicit layers must be registered. The registration process is platform-specific and is discussed in detail in the Vulkan-Loader documentation. In all cases, it is the layer manifest (the .json file) that is registered; the manifest contains a relative path to the layer library, which can be in a separate directory.
Basic Usage
Once the layer is enabled, if vkQueueSubmit()
or other Vulkan functions returns a fatal error code (usually VK_ERROR_DEVICE_LOST
), a dump file of the command buffers (and other state) that failed to execute are written to disk.
Logging
CDL outputs messages at runtime to indicate it is enabled, to trace certain commands, and to report when a gpu crash has occurred. The message_severity
, log_file
, trace_on
and trace_all_semaphores
configuration settings control which messages are output and where they are sent.
VK_EXT_debug_utils
and VK_EXT_debug_report
extensions can also be used to receive log messages directly in the application.
Dump files
A unique log file directory is created every time an application is run with CDL enabled. The log file directories are named with the timestamp of the format YYYY-MM-DD-HHMMSS. This prevents subsequent runs from overwriting old data, in case the application is non-deterministic and producing different crashes on each run. The location of these files is platform-specific:
- Linux:
${HOME}/cdl/
- Windows:
%USERPROFILE%\cdl\
The output directory for dump files can be changed using the output_path
configuration setting, described below.
The name of the dump file is cdl_dump.yaml
, since the file is in the YAML format. Other files, such as dumped shaders, may be present in the dump directory.
Configuration
This layer implements the VK_EXT_layer_settings
extension, so it can be configured with vkconfig
, programmatically, or with environment variables. See the manifest for this layer and the layer manifest schema for full details. The discussion below uses the key
field to identify each setting, the env
field defines the corresponding environment variable, and the label
field defines the text you will see when using vkconfig
.
watchdog_timeout_ms
can be set to enable a watchdog thread that monitors the time between queue submissions. If this timeout value (specified in milliseconds) is hit without a queue submission, the gpu is assumed to be crashed and a dump file is created.- Dump files
output_path
can be set to override the directory where log files and shader binaries are written. This can be a full path (starting with/
or a drive letter) or a path relative to the application current working directory.dump_queue_submissions
controls which queue submissions are dumped.running
causes only the submission currently executing to be dumped.pending
will also dump any submissions that have not started execution.dump_command_buffers
controls which command buffers are dumped.running
causes only the command buffer currently executing to be dumped.pending
will also dump any command buffers that have not started execution.all
will dump all known command buffers.dump_commands
controls which commands are dumped.running
causes only the commands currently executing to be dumped.pending
will also dump any commands that have not started execution.all
will dump all commands in the command buffer.dump_shaders
controls if shaders are included in the dump directory. Possible values for this setting are:off
- no output,on_crash
- only dump the shaders that are bound at the time of a gpu crash,on_bind
- dump shaders only when they are bound, andall
- dump all shaders as soon as they are created.
- Logging
message_severity
can be set to a comma-separated list of the types of messages CDL should output to the default logger. Application defined loggers should control which messages they want to recieve with the options available in theVK_EXT_debug_utils
orVK_EXT_debug_report
extensions.log_file
can be set to control where log messages are sent by the default logger. There are several special values.stderr
andstdout
send messages to the application console.none
disables the default logger. Any other value is assumed to be an absolute or relative path to the log file.trace_on
can be enabled to cause important commands, such asvkQueueSubmit
to be logged as they occur.trace_all_semaphores
enables logging messages about every vulkan command that uses semaphores.
- State tracking
sync_after_commands
adds a pipeline barrier after every instrumented vulkan command. This will reduce performance and may cause some hangs to go away. This option currently only works when usingVK_KHR_dynamic_rendering
instrument_all_commands
can be enabled to include completion markers around every vulkan command. This may allow more accuratute fault locations at the expense of larger command buffers and reduced performance.track_semaphores
enables detailed semaphore state reporting in runtime logging and dump files.VK_AMD_buffer_marker
is required for this feature.
Layer Details
Layer Properties
- API Version: 1.3.296
- Implementation Version: 1
- Layer Manifest: VkLayer_crash_diagnostic.json
- File Format: 1.2.0
- Layer Binary: libVkLayer_crash_diagnostic.so
- Platforms: WINDOWS, LINUX
- Status: ALPHA
- Number of Layer Settings: 4
Layer Settings Overview
Setting | Type | Default Value | vk_layer_settings.txt Variable | Environment Variable | Platforms |
---|---|---|---|---|---|
Watchdog timeout (ms) | INT | 30000 | lunarg_crash_diagnostic.watchdog_timeout_ms | CDL_WATCHDOG_TIMEOUT_MS | WINDOWS, LINUX, MACOS, ANDROID |
Output Path | STRING | lunarg_crash_diagnostic.output_path | CDL_OUTPUT_PATH | WINDOWS, LINUX, MACOS, ANDROID | |
Dump queue submissions | ENUM | running | lunarg_crash_diagnostic.dump_queue_submits | CDL_DUMP_QUEUE_SUBMITS | WINDOWS, LINUX, MACOS, ANDROID |
Dump command buffers | ENUM | running | lunarg_crash_diagnostic.dump_command_buffers | CDL_DUMP_COMMAND_BUFFERS | WINDOWS, LINUX, MACOS, ANDROID |
Dump commands | ENUM | running | lunarg_crash_diagnostic.dump_commands | CDL_DUMP_COMMANDS | WINDOWS, LINUX, MACOS, ANDROID |
Dump shaders | ENUM | off | lunarg_crash_diagnostic.dump_shaders | CDL_DUMP_SHADERS | WINDOWS, LINUX, MACOS, ANDROID |
Message Severity | FLAGS | error | lunarg_crash_diagnostic.message_severity | VK_LUNARG_CRASH_DIAGNOSTIC_MESSAGE_SEVERITY | WINDOWS, LINUX, MACOS, ANDROID |
Log file name | STRING | stderr | lunarg_crash_diagnostic.log_file | CDL_LOG_FILE | WINDOWS, LINUX, MACOS |
Enable Tracing | BOOL | false | lunarg_crash_diagnostic.trace_on | CDL_TRACE_ON | WINDOWS, LINUX, MACOS, ANDROID |
Enable semaphore log tracing. | BOOL | false | lunarg_crash_diagnostic.trace_all_semaphores | CDL_TRACE_ALL_SEMAPHORES | WINDOWS, LINUX, MACOS, ANDROID |
Synchronize commands | BOOL | false | lunarg_crash_diagnostic.sync_after_commands | CDL_SYNC_AFTER_COMMANDS | WINDOWS, LINUX, MACOS, ANDROID |
Instrument all commands | BOOL | false | lunarg_crash_diagnostic.instrument_all_commands | CDL_INSTRUMENT_ALL_COMMANDS | WINDOWS, LINUX, MACOS, ANDROID |
Track semaphores | BOOL | true | lunarg_crash_diagnostic.track_semaphores | CDL_TRACK_SEMAPHORES | WINDOWS, LINUX, MACOS, ANDROID |
Layer Settings Details
Watchdog timeout (ms)(ALPHA)
If set to a non-zero number, a watchdog thread will be created. This will trigger if the application fails to submit new commands within a set time (in milliseconds) and a log will be created as if the a lost device error was encountered.
Setting Properties:
- vk_layer_settings.txt Variable: lunarg_crash_diagnostic.watchdog_timeout_ms
- Environment Variable: CDL_WATCHDOG_TIMEOUT_MS
- Platforms: WINDOWS, LINUX, MACOS, ANDROID
- Setting Type: INT
- Setting Default Value: 30000
Output Path(ALPHA)
The directory where dump files and shader binaries are written.
Setting Properties:
- vk_layer_settings.txt Variable: lunarg_crash_diagnostic.output_path
- Environment Variable: CDL_OUTPUT_PATH
- Platforms: WINDOWS, LINUX, MACOS, ANDROID
- Setting Type: STRING
- Setting Default Value:
Dump queue submissions(ALPHA)
Control which queue submissions are included in the dump.
Setting Properties:
- vk_layer_settings.txt Variable: lunarg_crash_diagnostic.dump_queue_submits
- Environment Variable: CDL_DUMP_QUEUE_SUBMITS
- Platforms: WINDOWS, LINUX, MACOS, ANDROID
- Setting Type: ENUM
- Setting Default Value: running
Enum Value | Label | Description | Platforms |
---|---|---|---|
running | Running | Submissions that were executing at the time of the dump. | WINDOWS, LINUX, MACOS, ANDROID |
pending | Pending | Submissions executing or pending at the time of the dump. | WINDOWS, LINUX, MACOS, ANDROID |
Dump command buffers(ALPHA)
Control which command buffers are included in the dump.
Setting Properties:
- vk_layer_settings.txt Variable: lunarg_crash_diagnostic.dump_command_buffers
- Environment Variable: CDL_DUMP_COMMAND_BUFFERS
- Platforms: WINDOWS, LINUX, MACOS, ANDROID
- Setting Type: ENUM
- Setting Default Value: running
Enum Value | Label | Description | Platforms |
---|---|---|---|
running | Running | Command Buffers that were executing at the time of the dump. | WINDOWS, LINUX, MACOS, ANDROID |
pending | Pending | Command Buffers executing or pending at the time of the dump. | WINDOWS, LINUX, MACOS, ANDROID |
all | All | All known Command Buffers. | WINDOWS, LINUX, MACOS, ANDROID |
Dump commands(ALPHA)
Control which commands are included in the dump.
Setting Properties:
- vk_layer_settings.txt Variable: lunarg_crash_diagnostic.dump_commands
- Environment Variable: CDL_DUMP_COMMANDS
- Platforms: WINDOWS, LINUX, MACOS, ANDROID
- Setting Type: ENUM
- Setting Default Value: running
Enum Value | Label | Description | Platforms |
---|---|---|---|
running | Running | Command Buffers that were executing at the time of the dump. | WINDOWS, LINUX, MACOS, ANDROID |
pending | Pending | Commands executing or pending at the time of the dump. | WINDOWS, LINUX, MACOS, ANDROID |
all | All | All known Commands. | WINDOWS, LINUX, MACOS, ANDROID |
Dump shaders(ALPHA)
Control of shader dumping.
Setting Properties:
- vk_layer_settings.txt Variable: lunarg_crash_diagnostic.dump_shaders
- Environment Variable: CDL_DUMP_SHADERS
- Platforms: WINDOWS, LINUX, MACOS, ANDROID
- Setting Type: ENUM
- Setting Default Value: off
Enum Value | Label | Description | Platforms |
---|---|---|---|
off | Off | Never dump shaders. | WINDOWS, LINUX, MACOS, ANDROID |
on_crash | On Crash | Dump currently bound shaders after a crash is detected. | WINDOWS, LINUX, MACOS, ANDROID |
on_bind | On Bind | Dump only bound shaders. | WINDOWS, LINUX, MACOS, ANDROID |
all | All | Dump all shaders. | WINDOWS, LINUX, MACOS, ANDROID |
Message Severity(ALPHA)
Comma-delineated list of options specifying the types of log messages to be reported
Setting Properties:
- vk_layer_settings.txt Variable: lunarg_crash_diagnostic.message_severity
- Environment Variable: VK_LUNARG_CRASH_DIAGNOSTIC_MESSAGE_SEVERITY
- Platforms: WINDOWS, LINUX, MACOS, ANDROID
- Setting Type: FLAGS
- Setting Default Value: error
Flags | Label | Description | Platforms |
---|---|---|---|
error | Error | Report errors such as device lost or setup problems in the layer. | WINDOWS, LINUX, MACOS, ANDROID |
warn | Warning | Report non-fatal problems that may interfere with operation of the layer | WINDOWS, LINUX, MACOS, ANDROID |
info | Info | Report informational messages. | WINDOWS, LINUX, MACOS, ANDROID |
verbose | Verbose | For layer development. Report messages for debugging layer behavior. | WINDOWS, LINUX, MACOS, ANDROID |
Log file name(ALPHA)
none = no logging, stderr or stdout = to the console, otherwise an absolute or relative path
Setting Properties:
- vk_layer_settings.txt Variable: lunarg_crash_diagnostic.log_file
- Environment Variable: CDL_LOG_FILE
- Platforms: WINDOWS, LINUX, MACOS
- Setting Type: STRING
- Setting Default Value: stderr
Enable Tracing(ALPHA)
All Vulkan API calls intercepted by the layer will be logged to the console.
Setting Properties:
- vk_layer_settings.txt Variable: lunarg_crash_diagnostic.trace_on
- Environment Variable: CDL_TRACE_ON
- Platforms: WINDOWS, LINUX, MACOS, ANDROID
- Setting Type: BOOL
- Setting Default Value: false
Enable semaphore log tracing.(ALPHA)
Semaphore events will be logged to console.
Setting Properties:
- vk_layer_settings.txt Variable: lunarg_crash_diagnostic.trace_all_semaphores
- Environment Variable: CDL_TRACE_ALL_SEMAPHORES
- Platforms: WINDOWS, LINUX, MACOS, ANDROID
- Setting Type: BOOL
- Setting Default Value: false
Synchronize commands(ALPHA)
Add pipeline barriers after instrumented commands.
Setting Properties:
- vk_layer_settings.txt Variable: lunarg_crash_diagnostic.sync_after_commands
- Environment Variable: CDL_SYNC_AFTER_COMMANDS
- Platforms: WINDOWS, LINUX, MACOS, ANDROID
- Setting Type: BOOL
- Setting Default Value: false
Instrument all commands(ALPHA)
All commands will be instrumented.
Setting Properties:
- vk_layer_settings.txt Variable: lunarg_crash_diagnostic.instrument_all_commands
- Environment Variable: CDL_INSTRUMENT_ALL_COMMANDS
- Platforms: WINDOWS, LINUX, MACOS, ANDROID
- Setting Type: BOOL
- Setting Default Value: false
Track semaphores(ALPHA)
Enable semaphore tracking.
Setting Properties:
- vk_layer_settings.txt Variable: lunarg_crash_diagnostic.track_semaphores
- Environment Variable: CDL_TRACK_SEMAPHORES
- Platforms: WINDOWS, LINUX, MACOS, ANDROID
- Setting Type: BOOL
- Setting Default Value: true