redstrate.com/content/blog/optimizing-shader-structures.md
2023-05-13 12:55:13 -04:00

title: Optimizing and sharing shader structures
date: 2023-05-13
summary: I use a lot of data structures in my shaders, including usage in push constants and SSBOs. However, the complexity is getting out of hand!
tags: Vulkan

In my engine I have a bunch of big data structures used in push constants, shader buffers, and more. Typically, they are written for a machine first and a human second (due to alignment, padding, and packing), which is not ideal in my opinion. Because the packing is done by hand, it's easy to create bugs by mistyping a member or forgetting an alignment rule.

Here is one such example (real code, unfortunately) for exposing different knobs and options to one of my post-processing steps:

layout(push_constant) uniform PushConstant {
    vec4 viewport;
    vec4 options;
    vec4 transform_ops;
    vec4 ao_options;
    vec4 ao_options2;
    vec4 proj_info;
    mat4 cameraProj;
    mat4 invProj;
};

Can you tell me, with full confidence, what each of these options does? I probably couldn't, and a structure like this is a safe haven for bugs because it's extremely easy to mix up accessors (e.g. ao_options.x and ao_options.y). First, I want to explain some of the reasons why this is necessary in the first place.

Alignment rules in Vulkan

I want to give a real example that I see plenty of newer graphics programmers run into. Say you're beginning to explore Phong shading, and you want to expose a position and a color property so you can change them while the program is running.

In a 3D environment, there are three axes (x, y and z), so our first choice for the position is a vec3. Light color would also make sense as a vec3, because color emitted from a light can't really be "transparent". The GLSL code would end up looking like this:

#version 430

out vec4 finalColor;

layout(binding = 0) buffer block {
    vec3 position;
    vec3 color;
} light;

void main() {
    const vec3 dummy = vec3(1) - light.position;
    finalColor = vec4(vec3(1.0, 1.0, 1.0) * light.color, 1.0);
}

(There's no actual formula or anything in here, we just want to make sure the GLSL compiler doesn't optimize anything out.)

When writing the structure on the C++ side, you would naturally write this:

struct Light {
    glm::vec3 position;
    glm::vec3 color;
} light;

light.position = {1, 5, 0};
light.color = {3, 2, -1};

For this example I used the debug printf system that is part of the Vulkan SDK [1]. This allows us to get an exact reading of the buffer as it's seen from the shader. The output is as follows:

Position = (1.000000, 5.000000, 0.000000)
Color = (2.000000, -1.000000, 0.000000)

Surprised? You might ask why the color is shifted by one component, with its first value lost and a zero appearing at the end - and someone might suggest writing the C++ structure like this instead:

struct Light {
    glm::vec4 position;
    glm::vec4 color;
};

This seems to fix the issue:

Position = (1.000000, 5.000000, 0.000000)
Color = (3.000000, 2.000000, -1.000000)

But why does it suddenly work when we change it to a vec4? Fortunately, the Vulkan specification is available and tells us why:

The base alignment of the type of an OpTypeStruct member is defined recursively as follows:

  • A scalar has a base alignment equal to its scalar alignment.
  • A two-component vector has a base alignment equal to twice its scalar alignment.
  • A three- or four-component vector has a base alignment equal to four times its scalar alignment.
  • ...

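That third bullet point hits the nail on the head: vec3 and vec4 have the same base alignment (16 bytes for floats), so the shader expects color to start at byte 16, while the tightly packed C++ struct puts it at byte 12. You can also get the matching layout while keeping glm::vec3 by writing this:

struct Light {
    alignas(16) glm::vec3 position; // offset 0, same member order as the shader block
    alignas(16) glm::vec3 color;    // offset 16, where the shader expects it
};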
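
To make the offsets concrete, here is a minimal C++ sketch of the mismatch. It is not code from my engine: Vec3 is a hypothetical stand-in for glm::vec3 (three tightly packed floats), color_as_seen_by_shader is an invented helper, and the zero-filled 32-byte buffer is an assumption about what the shader reads past the end of the uploaded data.

```cpp
#include <array>
#include <cstddef>
#include <cstring>

// Hypothetical stand-in for glm::vec3: three tightly packed floats.
struct Vec3 { float x, y, z; };

// The "natural" C++ layout: color starts right after position.
struct LightPacked {
    Vec3 position;
    Vec3 color;
};

// Each member forced to the 16-byte base alignment the shader uses for vec3.
struct LightAligned {
    alignas(16) Vec3 position;
    alignas(16) Vec3 color;
};

// Byte offset at which the shader block expects `color` to start.
constexpr std::size_t kShaderColorOffset = 16;

static_assert(offsetof(LightPacked, color) == 12, "packed color sits at byte 12");
static_assert(offsetof(LightAligned, color) == kShaderColorOffset, "aligned color sits at byte 16");
static_assert(sizeof(LightAligned) == 32, "two 16-byte slots");

// Copies the struct into a zero-filled buffer (assumed to mimic a fresh GPU
// buffer) and reads three floats from the offset the shader would use.
template <typename T>
Vec3 color_as_seen_by_shader(const T& light) {
    static_assert(sizeof(T) <= 32, "sketch only handles structs up to 32 bytes");
    std::array<unsigned char, 32> buffer{};
    std::memcpy(buffer.data(), &light, sizeof(T));
    Vec3 color;
    std::memcpy(&color, buffer.data() + kShaderColorOffset, sizeof(Vec3));
    return color;
}
```

With position = (1, 5, 0) and color = (3, 2, -1), the packed struct yields the shifted color from the broken output, while the aligned struct yields the intended one.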

There are plenty more nitty-gritty alignment issues that stem from differences between C++ and GLSL; this is just one example. These rules are esoteric in my opinion, and they make it even harder to write decent structures meant for humans - who are usually the ones writing shaders!


Another great example of shader code not working as expected is this shader block. Take a look at this four-bool structure, which seems okay at first glance:

// C++
struct TestBuffer {
    bool a = false;
    bool b = true;
    bool c = false;
    bool d = true;
};

// GLSL
layout(binding = 0) buffer readonly TestBuffer {
    bool a, b, c, d;
};

Oh wait... no, it's not actually okay:

a = 1, b = 0, c = 0, d = 0

I'm not exactly sure why it doesn't work, and if anyone knows, please let me know. It seems to be because SPIR-V doesn't define a physical size for bool, so I'm not sure what it's represented as in the buffer. Changing the members to integers works though.
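
One plausible explanation, which is an assumption on my part rather than something verified against this exact setup: a bool inside a GLSL buffer block occupies four bytes (as far as I know, glslang lowers it to a 32-bit unsigned integer), while a C++ bool is one byte on mainstream compilers. The sketch below simulates that reading. The struct names, the helper bools_as_seen_by_shader, and the zero-filled 16-byte buffer are all hypothetical.

```cpp
#include <array>
#include <cstddef>
#include <cstdint>
#include <cstring>

// The C++ side from the example: four one-byte bools (size assumed, see below).
struct TestBufferBool {
    bool a = false;
    bool b = true;
    bool c = false;
    bool d = true;
};

// The workaround: one 32-bit integer per flag.
struct TestBufferUint {
    std::uint32_t a = 0;
    std::uint32_t b = 1;
    std::uint32_t c = 0;
    std::uint32_t d = 1;
};

// Holds on common platforms, but sizeof(bool) is implementation-defined.
static_assert(sizeof(TestBufferBool) == 4, "four bools fit in a single 32-bit word");

// Copies the struct into a zero-filled 16-byte buffer, then reads it the way
// the shader is assumed to: four 32-bit words, any nonzero word being true.
template <typename T>
std::array<bool, 4> bools_as_seen_by_shader(const T& data) {
    static_assert(sizeof(T) <= 16, "sketch only handles structs up to 16 bytes");
    std::array<unsigned char, 16> buffer{};
    std::memcpy(buffer.data(), &data, sizeof(T));
    std::array<bool, 4> result{};
    for (std::size_t i = 0; i < 4; ++i) {
        std::uint32_t word = 0;
        std::memcpy(&word, buffer.data() + i * 4, sizeof(word));
        result[i] = (word != 0);
    }
    return result;
}
```

Under these assumptions all four C++ bools land in the first word, which is nonzero, so a reads as true and b, c and d read as false. That lines up with the output above, and the integer version reads back as intended.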

Of course, some might say this is a non-problem: "Just use integers! They're just booleans!" I disagree. Booleans and integers are very different semantically for humans, even if less so for computers. You can also pack up to 32 booleans into the space of one 32-bit integer, which could be a possible space-saving optimization.
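
As a rough illustration of that packing idea, here is a small C++ sketch that stores one boolean per bit of a 32-bit integer. The helper names set_flag and get_flag are made up for this example, and the bit index is assumed to be below 32.

```cpp
#include <cstdint>

// Returns `flags` with bit `index` set or cleared (index assumed < 32).
constexpr std::uint32_t set_flag(std::uint32_t flags, unsigned index, bool value) {
    const std::uint32_t mask = std::uint32_t{1} << index;
    return value ? (flags | mask) : (flags & ~mask);
}

// Returns whether bit `index` is set (index assumed < 32).
constexpr bool get_flag(std::uint32_t flags, unsigned index) {
    return ((flags >> index) & 1u) != 0;
}
```

The shader side would need the matching shift-and-mask to read a flag back, which is exactly the kind of detail that is nicer to hide behind a named accessor.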

Sharing structures

One of the other problems I get annoyed with is keeping the structures in sync. There's usually one (or many!) instances of the structure written in C++ and many in GLSL. I even went through some of my shaders and discovered instances where I had updated the structure in only some places and not others. This is problematic because the member order could differ between copies, meaning a shader could read garbage from the structure (and this can easily escape notice, depending on how the shader is used).

Having just one definition for all of my shaders and C++ would be a huge improvement, even if I still had to pack and optimize manually.

StructCompiler

What I ended up with is a new pre-processing step, called the StructCompiler. I tried looking around on Google, and couldn't find anything similar - so I don't know if this tool is actually unnecessary (maybe developers are instead just pulling struct information from shader reflection?) but I did have a lot of fun making it anyway.

Its goals are:

  • Be able to define the shader structures in one, centralized file.
  • Structures should be able to be written at a higher level, allowing us to decouple the actual member order, alignment and packing from the logic.
  • The structure can be reused in GLSL and C++.

First you write a .struct file. Here's the same ugly post-processing structure shown in the beginning, but now written in struct syntax [2]:

primary PostPushConstant {
    viewport: vec4
    camera_proj: mat4
    inv_proj: mat4
    inv_view: mat4

    enable_aa: bool
    enable_dof: bool

    exposure: float
    display_color_space: int
    tonemapping: int

    ao_radius: float
    ao_r2: float
    ao_rneginvr2: float
    ao_rdotvbias: float
    ao_intensity: float
    ao_bias: float
}

This one looks much better, doesn't it? Even without knowing anything else about the actual shader, you can guess which options do what with some accuracy. Here's what it looks like when compiled to C++:

struct PostPushConstant {
    glm::mat4 camera_proj;
    glm::mat4 inv_proj;
    glm::mat4 inv_view;
    glm::vec4 viewport;
    glm::ivec4 enable_aa_enable_dof_display_color_space_tonemapping_;
    glm::vec4 exposure_ao_radius_ao_r2_ao_rneginvr2_;
    glm::vec4 ao_rdotvbias_ao_intensity_ao_bias_;
    ...
};

(Setters like set_exposure() are used instead of accessing the glm::vec4 manually.)
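
The generated accessors aren't shown here, so the following is only a guess at their shape, not the tool's actual output. Vec4 and IVec4 are hypothetical stand-ins for the glm types, and the mapping of each option to a component (x, y, z, w) is assumed from the order of names in the packed member.

```cpp
// Hypothetical stand-ins for glm::vec4 and glm::ivec4.
struct Vec4 { float x, y, z, w; };
struct IVec4 { int x, y, z, w; };

// Sketch of two packed members with named accessors on top of them.
struct PostPushConstantSketch {
    IVec4 enable_aa_enable_dof_display_color_space_tonemapping_{};
    Vec4 exposure_ao_radius_ao_r2_ao_rneginvr2_{};

    // Booleans are stored as 0 or 1 in the integer vector (assumed encoding).
    void set_enable_aa(bool v) { enable_aa_enable_dof_display_color_space_tonemapping_.x = v ? 1 : 0; }
    bool enable_aa() const { return enable_aa_enable_dof_display_color_space_tonemapping_.x != 0; }

    void set_enable_dof(bool v) { enable_aa_enable_dof_display_color_space_tonemapping_.y = v ? 1 : 0; }
    bool enable_dof() const { return enable_aa_enable_dof_display_color_space_tonemapping_.y != 0; }

    // Floats map straight onto a component of the float vector.
    void set_exposure(float v) { exposure_ao_radius_ao_r2_ao_rneginvr2_.x = v; }
    float exposure() const { return exposure_ao_radius_ao_r2_ao_rneginvr2_.x; }

    void set_ao_radius(float v) { exposure_ao_radius_ao_r2_ao_rneginvr2_.y = v; }
    float ao_radius() const { return exposure_ao_radius_ao_r2_ao_rneginvr2_.y; }
};
```

The calling code only ever touches names like set_exposure(), so the packed layout underneath can change without breaking it.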

As I said before, the goal is to write it in a higher-level language which can then be ruthlessly optimized without worry. The optimization is basic right now, but it performs the same packing I did before by hand. Usage in GLSL is also easy:

#use_struct(push_constant, post, post_push_constant)

(The syntax could use some work, but the first argument is usage. The second argument is the name of the struct, and the third argument is a unique name.)

Since the member order and names are undefined, you must access the members by a getter in GLSL. I think this is a worthwhile trade-off for more readable code, and the compiler should optimize these away anyway.

vec3 ao_result = pow(ao, vec3(ao_intensity()));

This tool runs as a pre-processing step in my offline shader system, but the struct files are copied into the runtime directory because the runtime shaders also use them.
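
For the curious, the basic packing described earlier can be approximated with a greedy pass like the C++ sketch below. This is not the StructCompiler source: the types Member and PackedSlot, the function pack_scalars, and the rule that bools share slots with ints are all assumptions, chosen to reproduce the packed member names in the generated struct above.

```cpp
#include <string>
#include <vector>

// Scalar categories; bools are assumed to be stored alongside ints.
enum class ScalarKind { Float, Int };

struct Member {
    std::string name;
    ScalarKind kind;
};

// One four-component vector holding up to four scalars of the same kind.
struct PackedSlot {
    ScalarKind kind;
    std::string name;                    // component names joined with '_'
    std::vector<std::string> components; // original names, in x/y/z/w order
};

// Greedy packing: put each scalar into the first slot of its kind that still
// has a free component, opening a new slot when none has room.
std::vector<PackedSlot> pack_scalars(const std::vector<Member>& members) {
    std::vector<PackedSlot> slots;
    for (const Member& member : members) {
        PackedSlot* target = nullptr;
        for (PackedSlot& slot : slots) {
            if (slot.kind == member.kind && slot.components.size() < 4) {
                target = &slot;
                break;
            }
        }
        if (target == nullptr) {
            slots.push_back(PackedSlot{member.kind, "", {}});
            target = &slots.back();
        }
        target->components.push_back(member.name);
        target->name += member.name + "_";
    }
    return slots;
}
```

Feeding it the scalar members of PostPushConstant in declaration order gives one integer slot and two float slots, with the same names as the packed members in the generated C++ struct.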

The source code is available here, which is just ripped from my engine tree. It was written quickly, but it's working and I have already replaced all of my large structures with it! I'm pretty happy with how this tool turned out, and I can't wait to explore how I can expand on it.


  1. The debug printf, along with detailed examples of alignment mishaps, is definitely future Graphics Dump material! ↩︎

  2. The syntax looks eerily similar to Rust, which was intentional :-) ↩︎