From f3d87fb8c7e81af9dbf7514bdd2dbc5b5b575a8a Mon Sep 17 00:00:00 2001
From: Joshua Goins <josh@redstrate.com>
Date: Sat, 13 May 2023 12:54:40 -0400
Subject: [PATCH] add new blog post

---
 content/blog/optimizing-shader-structures.md | 272 +++++++++++++++++++
 1 file changed, 272 insertions(+)
 create mode 100644 content/blog/optimizing-shader-structures.md

diff --git a/content/blog/optimizing-shader-structures.md b/content/blog/optimizing-shader-structures.md
new file mode 100644
index 0000000..93dcb99
--- /dev/null
+++ b/content/blog/optimizing-shader-structures.md
@@ -0,0 +1,272 @@
+---
+title: "Optimizing and sharing shader structures"
+date: "2023-05-13"
+summary: "I use a lot of data structures in my shaders, including usage in push
+constants and SSBOs. However, the complexity is getting out of hand!"
+tags:
+- Vulkan
+---
+
+In my engine I have a bunch of big data structures used in push constants,
+shader buffers, and more. Typically, they are written for a machine first
+and a human second (due to alignment, padding, and packing) which is not
+ideal in my opinion. This comes with numerous issues, because the optimization
+is hand-written and it's easy to create bugs due to mistyping or forgetting
+alignment rules.
+
+Here is one such example (real code, unfortunately) for exposing different
+knobs and options to one of my post-processing steps:
+
+```glsl
+layout(push_constant) uniform PushConstant {
+    vec4 viewport;
+    vec4 options;
+    vec4 transform_ops;
+    vec4 ao_options;
+    vec4 ao_options2;
+    vec4 proj_info;
+    mat4 cameraProj;
+    mat4 invProj;
+};
+```
+
+Can you tell me, with full confidence, what each of these options does? _I_
+probably couldn't, and this layout is a safe haven for bugs because it's
+extremely easy to mix up accessors (e.g. `ao_options.x` and `ao_options.y`).
+First, I want to explain some of the reasons why this is necessary in the
+first place.
+
+## Alignment rules in Vulkan
+
+I want to give a real example that I see plenty of newer graphics programmers
+run into. Say you're beginning to explore [Phong shading](https://en.wikipedia.org/wiki/Phong_shading), and you want
+to expose a position and a color property so you can change them while the
+program is running.
+
+In a 3D environment, there are three axes (x, y and z) so our first choice is
+a **vec3**. Light color would also make sense as a **vec3**, because color
+(when emitted) from a light can't really be "transparent". The GLSL code
+would end up looking like this:
+
+```glsl
+#version 430
+
+out vec4 finalColor;
+
+layout(binding = 0) buffer block {
+    vec3 position;
+    vec3 color;
+} light;
+
+void main() {
+    const vec3 dummy = vec3(1) - light.position;
+    finalColor = vec4(vec3(1.0, 1.0, 1.0) * light.color, 1.0);
+}
+
+```
+
+_(There's no actual formula or anything in here; we just want to make sure
+the GLSL compiler doesn't optimize anything out.)_
+
+When writing the structure on the C++ side, you would naturally write this:
+
+```cpp
+struct Light {
+    glm::vec3 position;
+    glm::vec3 color;
+} light;
+
+light.position = {1, 5, 0};
+light.color = {3, 2, -1};
+```
+
+For this example I used the [debug printf](https://github.com/KhronosGroup/Vulkan-ValidationLayers/blob/main/docs/debug_printf.md) system, which is part of the Vulkan SDK[^1]. This allows us to get an exact reading of the
+buffer as it's seen from the shader. The output is as follows:
+
+```bash
+Position = (1.000000, 5.000000, 0.000000)
+Color = (2.000000, -1.000000, 0.000000)
+```
+
+Surprised? You might ask why the first component of the color goes missing and the
+rest shifts over - and someone might suggest writing the C++ structure like this instead:
+
+```cpp
+struct Light {
+    glm::vec4 position;
+    glm::vec4 color;
+};
+```
+
+This seems to fix the issue:
+
+```bash
+Position = (1.000000, 5.000000, 0.000000)
+Color = (3.000000, 2.000000, -1.000000)
+```
+
+But why does it suddenly work when we change it to a **vec4**? Fortunately, [the Vulkan specification](https://registry.khronos.org/vulkan/specs/1.3-extensions/html/vkspec.html#interfaces-resources-layout) is available and tells us why:
+
+> The base alignment of the type of an OpTypeStruct member is defined recursively as follows:
+> * A scalar has a base alignment equal to its scalar alignment.
+> * A two-component vector has a base alignment equal to twice its scalar alignment.
+> * **A three- or four-component vector has a base alignment equal to four times its scalar alignment.**
+> * ...
+
+
+That _third bullet point_ hits it right on the head: **vec4 and vec3 have the
+_same_ alignment**, which you can also achieve by writing this:
+
+```cpp
+struct Light {
+    glm::vec3 position;
+    alignas(16) glm::vec3 color; // starts at offset 16, matching the GLSL block
+};
+```
+
+There are plenty more nitty-gritty alignment issues that stem from
+differences between C++ and GLSL; this is just one example of them.
+These are esoteric in my opinion, and it gets even harder to write decent
+structures meant for humans - who are usually the ones writing shaders!
+
+---
+
+Another great example of shader code not working as expected
+is this shader block. Take a look at this four-bool structure, which seems okay at
+first glance:
+
+```cpp
+struct TestBuffer {
+    bool a = false;
+    bool b = true;
+    bool c = false;
+    bool d = true;
+};
+```
+
+```glsl
+layout(binding = 0) buffer readonly TestBuffer {
+    bool a, b, c, d;
+};
+```
+
+Oh wait... no, it's not actually okay:
+
+```bash
+a = 1, b = 0, c = 0, d = 0
+```
+
+I'm not exactly sure why it doesn't work, so if anyone knows, please let me know.
+It seems to be because [SPIR-V doesn't define a physical
+size for bool](https://registry.khronos.org/SPIR-V/specs/unified1/SPIRV.html#OpTypeBool),
+so I'm not sure what it's represented as. Changing them to integers works
+though.
+
+Of course some might say this is a non-problem, because
+_"Just use integers! They're just booleans!"_. I disagree: booleans and integers
+are very different semantically for humans but of course less so for computers. You can also
+pack a lot of booleans into the space of one 32-bit integer, which could be
+a possible space-saving optimization.
+
+## Sharing structures
+
+One of the other problems I get annoyed with is keeping the structures
+in sync. There's usually one (or many!) instances of the structure written in
+C++ and many in GLSL. I even went through some of my shaders, and discovered
+instances where I updated the structure in only some places and not others.
+This is problematic because member order could change, meaning the shader
+could read the wrong data (and this can also easily escape notice, depending on
+how the shader is used).
+
+Having just _one_ definition for all of my shaders and C++ would be a huge
+improvement, even if I still had to pack and optimize manually.
+
+## StructCompiler
+
+What I ended up with is a new pre-processing step, called the **StructCompiler**.
+I tried looking around on Google, and couldn't find anything similar - so I
+don't know if this tool is actually unnecessary (maybe developers are instead
+just pulling struct information from shader reflection?) but I did have a lot of
+fun making it anyway.
+
+Its goals are:
+* Be able to define the shader structures in one, centralized file.
+* Structures should be able to be written at a higher level,
+allowing us to decouple the actual member order, alignment and packing from
+the logic.
+* The structure can be reused in GLSL and C++.
+
+First you write a `.struct` file. Here's the same, ugly post-processing
+structure shown in the beginning, but now written in struct syntax[^2]:
+
+```glsl
+primary PostPushConstant {
+    viewport: vec4
+    camera_proj: mat4
+    inv_proj: mat4
+    inv_view: mat4
+
+    enable_aa: bool
+    enable_dof: bool
+
+    exposure: float
+    display_color_space: int
+    tonemapping: int
+
+    ao_radius: float
+    ao_r2: float
+    ao_rneginvr2: float
+    ao_rdotvbias: float
+    ao_intensity: float
+    ao_bias: float
+}
+```
+
+This one looks **much better**, doesn't it? Even without knowing anything
+else about the actual shader, you can guess which options do what
+with some accuracy. Here's what it looks like when compiled to C++:
+
+```cpp
+struct PostPushConstant {
+    glm::mat4 camera_proj;
+    glm::mat4 inv_proj;
+    glm::mat4 inv_view;
+    glm::vec4 viewport;
+    glm::ivec4 enable_aa_enable_dof_display_color_space_tonemapping_;
+    glm::vec4 exposure_ao_radius_ao_r2_ao_rneginvr2_;
+    glm::vec4 ao_rdotvbias_ao_intensity_ao_bias_;
+    ...
+};
+```
+
+_(Setters like `set_exposure()` are used instead of accessing the glm::vec4 manually.)_
+
+As I said before, the goal is to write it in a higher-level language which
+can then be ruthlessly optimized without worry. The optimization is basic right
+now, but it performs the same packing I did before by hand. Usage in GLSL is also easy:
+
+```glsl
+#use_struct(push_constant, post, post_push_constant)
+```
+
+_(The syntax could use some work, but the first argument is usage.
+The second argument is the name of the struct, and the third argument is a unique name.)_
+
+Since the member order and names are undefined, you must access the members by
+a getter in GLSL. I think this is a worthwhile trade-off for more readable code, and
+the compiler should optimize these away anyway.
+
+```glsl
+vec3 ao_result = vec3(pow(ao, ao_intensity())); // assuming `ao` is a float
+```
+
+This tool runs as a pre-processing step
+in my offline shader system, but the struct files are copied into the runtime directory because
+the runtime shaders also use them.
+
+The source code is [available here](https://git.sr.ht/~redstrate/structcompiler), which is just ripped from my engine tree. It's
+quickly written, but it's working and I have already replaced all of my large structures! I'm pretty happy with how this tool turned out, and I can't wait to explore how I can expand on this more.
+
+[^1]: The debug printf, along with detailed examples of alignment mishaps, is definitely future Graphics Dump material!
+
+[^2]: The syntax looks eerily similar to Rust, which was intentional :-)