From f3d87fb8c7e81af9dbf7514bdd2dbc5b5b575a8a Mon Sep 17 00:00:00 2001
From: Joshua Goins
Date: Sat, 13 May 2023 12:54:40 -0400
Subject: [PATCH] add new blog post

---
 content/blog/optimizing-shader-structures.md | 272 +++++++++++++++++++
 1 file changed, 272 insertions(+)
 create mode 100644 content/blog/optimizing-shader-structures.md

diff --git a/content/blog/optimizing-shader-structures.md b/content/blog/optimizing-shader-structures.md
new file mode 100644
index 0000000..93dcb99
--- /dev/null
+++ b/content/blog/optimizing-shader-structures.md
@@ -0,0 +1,272 @@
+---
+title: "Optimizing and sharing shader structures"
+date: "2023-05-13"
+summary: "I use a lot of data structures in my shaders, including usage in push
+constants and SSBOs. However, the complexity is getting out of hand!"
+tags:
+- Vulkan
+---
+
+In my engine I have a bunch of big data structures used in push constants,
+shader buffers, and more. Typically, they are written for a machine first
+and a human second (due to alignment, padding, and packing) which is not
+ideal in my opinion. This comes with numerous issues, because the optimization
+is hand-written and it's easy to create bugs due to mistyping or forgetting
+alignment rules.
+
+Here is one such example (real code, unfortunately) for exposing different
+knobs and options to one of my post-processing steps:
+
+```glsl
+layout(push_constant) uniform PushConstant {
+    vec4 viewport;
+    vec4 options;
+    vec4 transform_ops;
+    vec4 ao_options;
+    vec4 ao_options2;
+    vec4 proj_info;
+    mat4 cameraProj;
+    mat4 invProj;
+};
+```
+
+Can you tell me, with full confidence, what each of these options does? _I_
+probably couldn't, and this layout is a safe haven for bugs because it's
+extremely easy to mix up accessors (e.g. `ao_options.x` and `ao_options.y`).
+First, I want to explain some of the reasons why this is necessary in the
+first place.
+
+## Alignment rules in Vulkan
+
+I want to give a real example that I see plenty of newer graphics programmers
+run into. Say you're beginning to explore [Phong shading](https://en.wikipedia.org/wiki/Phong_shading), and you want
+to expose a position and a color property so you can change them while the
+program is running.
+
+In a 3D environment, there are three axes (x, y and z) so our first choice is
+a **vec3**. Light color would also make sense as a **vec3**, because color
+(when emitted) from a light can't really be "transparent". The GLSL code
+would end up looking like this:
+
+```glsl
+#version 430
+
+out vec4 finalColor;
+
+layout(binding = 0) buffer block {
+    vec3 position;
+    vec3 color;
+} light;
+
+void main() {
+    const vec3 dummy = vec3(1) - light.position;
+    finalColor = vec4(vec3(1.0, 1.0, 1.0) * light.color, 1.0);
+}
+
+```
+
+_(There's no actual formula or anything in here, we just want to make sure
+the GLSL compiler doesn't optimize anything out.)_
+
+When writing the structure on the C++ side, you would naturally write this:
+
+```cpp
+struct Light {
+    glm::vec3 position;
+    glm::vec3 color;
+} light;
+
+light.position = {1, 5, 0};
+light.color = {3, 2, -1};
+```
+
+For this example I used the [debug printf](https://github.com/KhronosGroup/Vulkan-ValidationLayers/blob/main/docs/debug_printf.md) system that is part of the Vulkan SDK[^1]. This allows us to get an exact reading of the
+buffer as it's seen from the shader. The output is as follows:
+
+```bash
+Position = (1.000000, 5.000000, 0.000000)
+Color = (2.000000, -1.000000, 0.000000)
+```
+
+Surprised? You might ask why part of the vector is getting chopped
+off - and someone might suggest writing the C++ structure like this instead:
+
+```cpp
+struct Light {
+    glm::vec4 position;
+    glm::vec4 color;
+};
+```
+
+This seems to fix the issue:
+
+```bash
+Position = (1.000000, 5.000000, 0.000000)
+Color = (3.000000, 2.000000, -1.000000)
+```
+
+But why does it suddenly work when we change it to a **vec4**? Fortunately, [the Vulkan specification](https://registry.khronos.org/vulkan/specs/1.3-extensions/html/vkspec.html#interfaces-resources-layout) is available and tells us why:
+
+> The base alignment of the type of an OpTypeStruct member is defined recursively as follows:
+> * A scalar has a base alignment equal to its scalar alignment.
+> * A two-component vector has a base alignment equal to twice its scalar alignment.
+> * **A three- or four-component vector has a base alignment equal to four times its scalar alignment.**
+> * ...
+
+
+That _third bullet point_ hits it right on the head: **vec4 and vec3 have the
+_same_ alignment**, which you can also achieve by writing this:
+
+```cpp
+struct Light {
+    glm::vec3 position;
+    alignas(16) glm::vec3 color;
+};
+```
+
+There are a bunch more nitty-gritty alignment issues that stem from
+differences between C++ and GLSL; this is just an example of one of them.
+These are esoteric in my opinion, and it gets even harder to write decent
+structures meant for humans - who are usually the ones writing shaders!
+
+---
+
+Another great example of shader code not working as expected
+is this shader block. Take a look at this four-bool structure, which seems okay at
+first glance:
+
+```cpp
+struct TestBuffer {
+    bool a = false;
+    bool b = true;
+    bool c = false;
+    bool d = true;
+};
+```
+
+```glsl
+layout(binding = 0) buffer readonly TestBuffer {
+    bool a, b, c, d;
+};
+```
+
+Oh wait... no, it's not actually okay:
+
+```bash
+a = 1, b = 0, c = 0, d = 0
+```
+
+I'm not exactly sure why it doesn't work, and if anyone knows, please let me know.
+It seems to be because [SPIR-V doesn't define a physical
+size for bool](https://registry.khronos.org/SPIR-V/specs/unified1/SPIRV.html#OpTypeBool),
+so I'm not sure what it's represented as. Changing them to integers works
+though.
+
+Of course some might say this is a non-problem, because
+_"Just use integers! they're just booleans!"_. I disagree: booleans and integers
+are very different semantically for humans, though of course less so for computers. You can also
+pack a lot of booleans into the space of one 32-bit integer, which could be
+a possible space-saving optimization.
+
+## Sharing structures
+
+One of the other problems I get annoyed with is keeping the structures
+in sync: there are usually one (or many!) instances of the structure written in
+C++ and many in GLSL. I even went through some of my shaders, and discovered
+instances where I updated the structure in only some places and not others.
+This is problematic because member order could change, meaning the structure's
+contents could be undefined (and this can also easily escape notice, depending on
+how the shader is used).
+
+Having just _one_ definition for all of my shaders and C++ would be a huge
+improvement, even if I still had to pack and optimize manually.
+
+## StructCompiler
+
+What I ended up with is a new pre-processing step, called the **StructCompiler**.
+I tried looking around on Google, and couldn't find anything similar - so I
+don't know if this tool is actually unnecessary (maybe developers are instead
+just pulling struct information from shader reflection?) but I did have a lot of
+fun making it anyway.
+
+Its goals are:
+* Be able to define the shader structures in one, centralized file.
+* Structures should be able to be written at a higher level,
+allowing us to decouple the actual member order, alignment and packing from
+the logic.
+* The structure can be reused in GLSL and C++.
+
+First you write a `.struct` file. Here's the same, ugly post-processing
+structure shown in the beginning, but now written in struct syntax[^2]:
+
+```glsl
+primary PostPushConstant {
+    viewport: vec4
+    camera_proj: mat4
+    inv_proj: mat4
+    inv_view: mat4
+
+    enable_aa: bool
+    enable_dof: bool
+
+    exposure: float
+    display_color_space: int
+    tonemapping: int
+
+    ao_radius: float
+    ao_r2: float
+    ao_rneginvr2: float
+    ao_rdotvbias: float
+    ao_intensity: float
+    ao_bias: float
+}
+```
+
+This one looks **much better**, doesn't it? Even without knowing anything
+else about the actual shader, you can guess which options do what
+with some accuracy. Here's what it looks like when compiled to C++:
+
+```cpp
+struct PostPushConstant {
+    glm::mat4 camera_proj;
+    glm::mat4 inv_proj;
+    glm::mat4 inv_view;
+    glm::vec4 viewport;
+    glm::ivec4 enable_aa_enable_dof_display_color_space_tonemapping_;
+    glm::vec4 exposure_ao_radius_ao_r2_ao_rneginvr2_;
+    glm::vec4 ao_rdotvbias_ao_intensity_ao_bias_;
+    ...
+};
+```
+
+_(Setters like `set_exposure()` are used instead of accessing the glm::vec4 manually.)_
+
+As I said before, the goal is to write it in a higher-level language which
+can then be ruthlessly optimized without worry. The optimization is basic right
+now, but it performs the same packing I did before by hand. Usage in GLSL is also easy:
+
+```glsl
+#use_struct(push_constant, post, post_push_constant)
+```
+
+_(The syntax could use some work, but the first argument is usage.
+The second argument is the name of the struct, and the third argument is a unique name.)_
+
+Since the generated member order and names are not fixed, you must access the members through
+a getter in GLSL. I think this is a worthwhile trade-off for more readable code, and
+the compiler should optimize these away anyway.
+
+```glsl
+vec3 ao_result = pow(ao, ao_intensity());
+```
+
+This tool runs as a pre-processing step
+in my offline shader system, but the struct files are copied into the runtime directory because
+the runtime shaders also use them.
+
+The source code is [available here](https://git.sr.ht/~redstrate/structcompiler), which is just ripped from my engine tree. It's
+quickly written, but it's already working and I have replaced all of my large structures already! I'm pretty happy with how this tool turned out, and I can't wait to explore how I can expand on this more.
+
+[^1]: The debug printf system, along with detailed examples of alignment mishaps, is definitely future Graphics Dump material!
+
+[^2]: The syntax looks eerily similar to Rust, which was intentional :-)
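As a rough CPU-side companion to the "Alignment rules in Vulkan" section of the post, here is a minimal sketch that emulates what the shader reads for `color`. It is not taken from the engine or from StructCompiler. `Vec3` is a stand-in for `glm::vec3` so that nothing outside the standard library is needed, and `LightPacked`, `LightAligned`, `kShaderColorOffset` and `color_as_seen_by_shader` are hypothetical names introduced only for illustration. It assumes the shader block places `position` at byte 0 and `color` at byte 16, as the quoted base-alignment rule implies.

```cpp
#include <cassert>  // assert() for the usage snippet described below
#include <cstddef>  // offsetof, std::size_t
#include <cstring>  // std::memcpy

// Stand-in for glm::vec3 (assumption: three tightly packed floats).
struct Vec3 {
    float x, y, z;
};

// What you would naturally write: color follows position directly.
struct LightPacked {
    Vec3 position;
    Vec3 color;
};

// Both members forced onto 16-byte boundaries, matching the base
// alignment of a three-component float vector.
struct LightAligned {
    alignas(16) Vec3 position;
    alignas(16) Vec3 color;
};

static_assert(offsetof(LightPacked, color) == 12, "packed: color starts at byte 12");
static_assert(offsetof(LightAligned, color) == 16, "aligned: color starts at byte 16");
static_assert(sizeof(LightAligned) == 32, "aligned: size is padded to a multiple of 16");

// Byte offset at which the shader is assumed to read `color`.
constexpr std::size_t kShaderColorOffset = 16;

// Copies `light` into a zero-filled 32-byte buffer (a pretend GPU buffer)
// and reads three floats from the offset where the shader expects `color`.
template <typename LightT>
Vec3 color_as_seen_by_shader(const LightT& light) {
    static_assert(sizeof(LightT) <= 32, "buffer is sized for this example only");
    unsigned char buffer[32] = {};
    std::memcpy(buffer, &light, sizeof(LightT));

    Vec3 seen;
    std::memcpy(&seen, buffer + kShaderColorOffset, sizeof(Vec3));
    return seen;
}
```

With position `(1, 5, 0)` and color `(3, 2, -1)`, the packed struct comes back as `(2, -1, 0)`, the same shifted reading shown in the debug printf output, while the aligned struct comes back as `(3, 2, -1)`. The `static_assert` lines are the part worth keeping in real code, since they turn a silent layout mismatch into a compile error.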