--- title: "Optimizing and sharing shader structures" date: "2023-05-13" summary: "I use a lot of data structures in my shaders, including usage in push constants and SSBOs. However, the complexity is getting out of hand!" tags: - Vulkan --- In my engine I have a bunch of big data structures used in push constants, shader buffers, and more. Typically, they are written for a machine first and a human second (due to alignment, padding, and packing) which is not ideal in my opinion. This comes with numerous issues, because the optimization is hand-written and it's easy to create bugs due to mistyping or forgetting alignment rules. Here is one such example (real code, unfortunately) for exposing different knobs and options to one of my post-processing steps: ```glsl layout(push_constant) uniform PushConstant { vec4 viewport; vec4 options; vec4 transform_ops; vec4 ao_options; vec4 ao_options2; vec4 proj_info; mat4 cameraProj; mat4 invProj; }; ``` Can you tell me, with full confidence, what each of these options do? _I_ probably couldn't, and is a safe haven for bugs because it's extremely easy to mix up accessors (e.g. `ao_options.x` and `ao_options.y`). First, I want to explain some of the reasons why this is necessary in the first place. ## Alignment rules in Vulkan I want to give a real example that I see plenty of newer graphics programmers run into. Say you're beginning to explore [Phong shading](https://en.wikipedia.org/wiki/Phong_shading), and you want to expose a position and a color property so you can change them while the program is running. In a 3D environment, there's three axes (x, y and z) so our first choice is a **vec3**. Light color would also make sense as a **vec3**, because color (when emitted) from a light can't really be "transparent". The GLSL code would end up looking like this: ```glsl #version 430 out vec4 finalColor; layout(binding = 0) buffer block { vec3 position; vec3 color; } light; void main() { const vec3 dummy = vec3(1) - light.position; finalColor = vec4(vec3(1.0, 1.0, 1.0) * light.color, 1.0); } ``` _(There's no actual formula or anything in here, we just want to make sure the GLSL compiler doesn't optimize anything out.)_ When writing the structure on the C++ side, you would naturally write this: ```cpp struct Light { glm::vec3 position; glm::vec3 color; } light; light.position = {1, 5, 0}; light.color = {3, 2, -1}; ``` For this example I used the [debug printf](https://github.com/KhronosGroup/Vulkan-ValidationLayers/blob/main/docs/debug_printf.md) system part of the Vulkan SDK[^1]. This allows us to get an exact reading of the buffer as it's seen from the shader. The output is as follows: ```bash Position = (1.000000, 5.000000, 0.000000) Color = (2.000000, -1.000000, 0.000000) ``` Surprised? You might ask why is the last bit of the vector getting chopped off - and someone might suggest writing the C++ structure like this instead: ```cpp struct Light { glm::vec4 position; glm::vec4 color; }; ``` This seems to fix the issue: ```bash Position = (1.000000, 5.000000, 0.000000) Color = (3.000000, 2.000000, -1.000000) ``` But why does it suddenly work when we change to it a **vec4**? Fortunately the [the Vulkan specification](https://registry.khronos.org/vulkan/specs/1.3-extensions/html/vkspec.html#interfaces-resources-layout) is available and tells us why: > The base alignment of the type of an OpTypeStruct member is defined recursively as follows: > * A scalar has a base alignment equal to its scalar alignment. > * A two-component vector has a base alignment equal to twice its scalar alignment. > * **A three- or four-component vector has a base alignment equal to four times its scalar alignment.** > * ... That _third bullet point_ hits it right on the head, **vec4 and vec3 have the _same_ alignment**, which you can also achieve by writing this: ```cpp struct Light { glm::vec3 color; alignas(16) glm::vec3 position; }; ``` There's a bunch of more nitty and dirty alignment issues that stem from differences between C++ and GLSL, this is just an example of one of them. These are esoteric in my opinion, and it gets even harder to write decent structures meant for humans - who are usually the ones writing shaders! --- Another great example of odd cases of shader code not working when expected is this shader block. Take a look at this four bool structure, which seems okay at first glance: ```cpp struct TestBuffer { bool a = false; bool b = true; bool c = false; bool d = true; }; ``` ```glsl layout(binding = 0) buffer readonly TestBuffer { bool a, b, c, d; }; ``` Oh wait... no, it's not actually okay: ```bash a = 1, b = 0, c = 0, d = 0 ``` I'm not exactly sure why it doesn't work and if anyone knows, please let me know. It seems to be because [SPIR-V doesn't seem to define a physical size for bool](https://registry.khronos.org/SPIR-V/specs/unified1/SPIRV.html#OpTypeBool), so I'm not sure what it's represented as. Changing them to integers works though. Of course some might say this is a non-problem, because _"Just use integers! they're just booleans!"_. I disagree, booleans and integers are very different semantically for humans but of course less so for computers. You can also pack a lot of booleans into the space of one 32-bit integer, which could be a possible space-saving optimization. ## Sharing structures One of the other problems I get annoyed with is keeping the structures in sync, there's usually one (or many!) instances of the structure written in C++ and many in GLSL. I even went through some of my shaders, and discovered instances where I updated the structure in only some places and not others. This is problematic because member order could change, meaning the structure itself could be undefined (and can also easily escape notice, depending on the shader is used). Having just _one_ definition for all of my shaders and C++ would be a huge improvement, even if I still had to pack and optimize manually. ## StructCompiler What I ended up with is a new pre-processing step, called the **StructCompiler**. I tried looking around on Google, and couldn't find anything similar - so I don't know if this tool is actually unnecessary (maybe developers are instead just pulling struct information from shader reflection?) but I did have a lot of fun making it anyway. It's goals are: * Be able to define the shader structures in one, centralized file. * Structures should be able to be written on a higher-level, allowing us to decouple the actual member order, alignment and packing from the logic. * The structure can be reused in GLSL and C++. First you write a `.struct` file. Here's the same, ugly post-processing structure shown in the beginning, but now written in struct syntax[^2]: ```glsl primary PostPushConstant { viewport: vec4 camera_proj: mat4 inv_proj: mat4 inv_view: mat4 enable_aa: bool enable_dof: bool exposure: float display_color_space: int tonemapping: int ao_radius: float ao_r2: float ao_rneginvr2: float ao_rdotvbias: float ao_intensity: float ao_bias: float } ``` This one looks **much better**, doesn't it? Even without knowing anything else about the actual shader, you can guess which options do what with some accuracy. Here's what it looks like when compiled to C++: ```cpp struct PostPushConstant { glm::mat4 camera_proj; glm::mat4 inv_proj; glm::mat4 inv_view; glm::vec4 viewport; glm::ivec4 enable_aa_enable_dof_display_color_space_tonemapping_; glm::vec4 exposure_ao_radius_ao_r2_ao_rneginvr2_; glm::vec4 ao_rdotvbias_ao_intensity_ao_bias_; ... }; ``` _(Setters like `set_exposure()` are used instead of accessing the glm::vec4 manually.)_ As I said before, the goal is to write it in a higher-level language which can then be ruthlessly optimized without worry. The optimization is basic right now, but it performs the same packing I did before by hand. Usage in GLSL is also easy: ```glsl #use_struct(push_constant, post, post_push_constant) ``` _(The syntax could use some work, but the first argument is usage. The second argument is the name of the struct, and the third argument is a unique name.)_ Since the member order and names are undefined, you must access the members by a getter in GLSL. I think this is a worthwhile trade-off for more readable code, and the compiler should optimize these away anyway. ```glsl vec3 ao_result = pow(ao, ao_intensity()) ``` This tool runs as a pre-processing step in my offline shader system, but the struct files are copied into the runtime directory because the runtime shaders also use them The source code is [available here](https://git.sr.ht/~redstrate/structcompiler), which is just ripped from my engine tree. It's quickly written, but it's already working and I have replaced all of my large structures already! I'm pretty happy with how this tool turned out, and I can't wait to explore how I can expand on this more. [^1]: The debug printf, along with detailed examples of alignment mishaps is definitely future Graphics Dump material! [^2]: The syntax looks eerily similar to Rust, which was intentional :-)