OMBRE Dev-Blog Year 2 | Froyok

November 05, 2024

Summary of the Year

Feelings
Goals and Results

Dev-Blog Progress

Conclusion

I have been working on my own game engine for a while now and since today is November 5 (the anniversary date of my engine), it means it is time to post a new yearly recap.

If you are interested, you can check out part 1 over here.

(The Ombre engine in 2024.)

If you want to follow the day to day progress of my work, you can check out:

My threads on Mastodon: thread 1 & thread 2

My thread on Bluesky

My dev-blog on the Graphic Programming discord server

The wip channel on my own discord server

Summary of the Year

(Me writing this blog post for a year.)

Feelings

A lot happened in a year, and at the same time... not so much ?
Looking back to what I did, I worked on a lot of big chunks and I think that's what makes me feel that not a lot happened.

Clearly the engine complexity is growing and numerous important refactors happened to handle the new features (and various requirements). I also feel like I procrastinated quite a few times before jumping into some big topics, fearing they will take too long.
In practice, the first steps were always quick (ex: the physics integration) but the cleanup and proper building of the architecture was the longest.

It's also the first time I have a project of mine growing this big. I haven't looked at how many lines of code I currently have (I waste a lot of space anyway), but the amount of files clearly grew by a bunch.

(What you see here is only the tip of the iceberg.)

Overall I still have a good idea of where things are and how they work. I started to forget some details however, so I'm wondering if I should write some high level documentation (at least for myself) on some of my design choices. I also started using diagrams to help shape up in my mind how things are related.

I think my organic approach is starting to show its limit and maybe I will need to think about putting in place a better working methodology. But at the same time, I doubt I will integrate many more big modules in future, so at some point the architecture will stabilize and I should be able to focus on smaller parts.

Goals and Results

I had several goals this year and achieved quite a few of them.

What I consider done:

Transparency: rendering transparent objects is now possible, using basic back to front sorting at the moment. Being a forward renderer by nature made it actually not too hard to support, even if I had to refactor my light rendering system. I also added support for rough glass like in Doom (2016).
Reflections: rendering and using cubemaps for specular reflections was another big chunk of work which took a few months. I have further plans to extend the reflections system however, notably with reflection proxies.
Physics: being able to load and simulate basic physics was another important subject for me. I wanted at minimum to walk around in levels and simulate basic boxes under gravity.
Levels: building levels, maps and switching between them was another point I wanted to address. The goal being to build a test/demo level of all the engine's features. That demo doesn't exist yet but plenty of tools have been made to get there (notably by processing levels made in TrenchBroom).

What is still in progress:

Entities & events: clearly far from done, but I'm happy with the direction it's going in term of implementation. Adding basic bricks for gameplay elements shouldn't be too hard now. Plus there is always the possibility to use lua scripting as well.

What hasn't been started:

A lot of things !
If you are curious about the (long) list of features I want to support, you can take a look at the roadmap frequently updated here.

Dev-Blog Progress

The listing below roughly covers development events from November 2023 to November 2024.

(Ready for a magic trick ?)

November 2023

(November 10)

I started working on adding support for transparency. My main goal in mind is being able to render glass surface and hopefully with a bit of refraction like seen in DOOM (2016 version).

So I introduced the notion of blend modes in my material definition format to be able to sort out materials in different groups:

That was also a good opportunity to isolate objects that require to be rendered into the emissive pass.

(November 14)

Still thinking about how to proceed with transparency. One thing I realized is that the way I manage my lighting system doesn't work.

Right now to render my lights and the affected objects, I use additive blending. This means I re-render the same object for each light that touches it. With shadow volumes it is kinda the default way to go since you usually setup a stencil mask before rendering the object.

For transparency however, at least for compositing, you need to render your object once. This means you need to evaluate all the lights while drawing the object... and right now nothing is made do that in the engine. Also I was applying the fog as a separate screen-space pass, but for transparent objects it needs to be applied while rendering the object as well.

One "easy" way would be to declare a pre-defined array of structs as an uniform input of my shader and feed that for each time I draw my objects. However the framework I use (Löve) doesn't support assigning structs one the go. Uniform Buffer Objects (UBO) are not supported either.

Shader Storage Buffer Object (SSBO) on the other hand are supported (and already in use for the shadow volumes), so that's what I'm going to use. My idea is that for each frame I render, I will gather the relevant lights into a single SSBO and then feed that to the shaders for rendering.

Note: I could maybe find a way to work around the additive blending for the shadow volumes by storing the light mask into a separate texture, and then sample it as a mask during the shading of the objects. I could therefor store them in an array/atlas like you would do with shadow maps.

(November 19)

I'm taking the opportunity of setting up an SSBO for the lights to refactor as well some other stuff. Notably merging a lot of common properties into a single SSBO to reduce the number of bindings I do per shader. Like the camera information (position, direction, matrices) or the fog properties.

(November 25)

I'm getting there with the main SSBO rework. Until now I never had to really handle different kind of types into a buffer. Also because of a limitation in the framework, I had to split my matrices into a bunch of vec4:

struct _CameraFormat
{
    vec4 ViewProjectionMatrix_row1;
    vec4 ViewProjectionMatrix_row2;
    vec4 ViewProjectionMatrix_row3;
    vec4 ViewProjectionMatrix_row4;

    vec4 ProjMatrix_row1;
    vec4 ProjMatrix_row2;
    vec4 ProjMatrix_row3;
    vec4 ProjMatrix_row4;

    vec4 ViewMatrix_row1;
    vec4 ViewMatrix_row2;
    vec4 ViewMatrix_row3;
    vec4 ViewMatrix_row4;

    vec4 InverseProjectionMatrix_row1;
    vec4 InverseProjectionMatrix_row2;
    vec4 InverseProjectionMatrix_row3;
    vec4 InverseProjectionMatrix_row4;

    vec4 InverseViewMatrix_row1;
    vec4 InverseViewMatrix_row2;
    vec4 InverseViewMatrix_row3;
    vec4 InverseViewMatrix_row4;

    vec3 Direction;
    vec3 Position;
    float NearPlane;
    float FarPlane;
}

readonly buffer BufferGlobals
{
    _GlobalsFormat GlobalsData[];
};

Which I merge afterward as follow:

GlobalsFormat InitGlobals( _GlobalsFormat In )
{
    GlobalsFormat Out;

    Out.Camera.ViewProjectionMatrix = mat4(
        In.Camera_ViewProjectionMatrix_1,
        In.Camera_ViewProjectionMatrix_2,
        In.Camera_ViewProjectionMatrix_3,
        In.Camera_ViewProjectionMatrix_4
    );
    Out.Camera.ProjMatrix = mat4(
        In.Camera_ProjMatrix_1,
        In.Camera_ProjMatrix_2,
        In.Camera_ProjMatrix_3,
        In.Camera_ProjMatrix_4
    );
    Out.Camera.ViewMatrix = mat4(
        In.Camera_ViewMatrix_1,
        In.Camera_ViewMatrix_2,
        In.Camera_ViewMatrix_3,
        In.Camera_ViewMatrix_4
    );
    Out.Camera.InverseProjectionMatrix = mat4(
        In.Camera_InverseProjectionMatrix_1,
        In.Camera_InverseProjectionMatrix_2,
        In.Camera_InverseProjectionMatrix_3,
        In.Camera_InverseProjectionMatrix_4
    );
    Out.Camera.InverseViewMatrix = mat4(
        In.Camera_InverseViewMatrix_1,
        In.Camera_InverseViewMatrix_2,
        In.Camera_InverseViewMatrix_3,
        In.Camera_InverseViewMatrix_4
    );

    return Out;
}

GlobalsFormat Globals = InitGlobals( GlobalsData[0] );

It's ugly, but at least I can then just do:

vec4 Something = Globals.Camera.ViewProjectionMatrix * Vector;

With all of that in place, I fired up the engine (because it was in a borked state for a while) and...

(Well...)

(November 28)

I can now render lights again !
I had many little things not working following the huge refactor which I had to fix.

(Okay it's a single light for now, but still !)

Unrelated, I also reworked my outline shader. With a panning checker you can emulate the classic "selection" pattern from image editing software:

December 2023

(December 1)

Now I can render multiple lights !
In the example below I have a Fill light (red) and a Point light (yellow) lighting both meshes. Shadows are disabled temporarily to debug things out.

("Let there be two lights !")

While visually it's not much, behind the scenes this required to change the way I was grouping information. Until now, I was storing a list of meshes per-light, which was useful to both build the shadow volume (if the light is casting shadows) and then draw objects being lit by that light.

Now instead, I build a list of lights into a single big buffer, with specific indexes. Next I iterate as before on the meshes in the scene to know if they are touched by a light, but instead of storing that in the light, I store the light ID in the mesh data.

This means I now render meshes directly (and once !) and feed them both the light buffer and the list of indexes they are affected by.

Later on I might rework that system to evolved into some kind of clustered/tiled rendering system. For now it's good enough.

(December 2)

Baby steps but I have my first mesh rendering with transparency, however something is off:

SSAO contribution is based on the depth pre-pass, in which transparent objects are not rendered. So the character here, which is hidden by the mostly opaque but transparent object in front of it shows shadowing from the occlusion...

I should disable that by just not applying the SSAO when rendering the transparent material. Since the goal of using this rendering method is for holograms or glass like materials, that should be fine.

There is also another little issue:

Right now my rendering process culls back faces, but some front faces which are behind the object are visible still and create quite ugly edges. That's when I remembered this article on gamedeveloper.com from the creators of Ethan Carter:

The idea sounds nice on paper but I need to try it out to properly grasp all the details.

(December 4)

After sorting a few things out, I added the Fog effect on the transparent objects:

I also retrieved some old code and plugged in my HDR color grading (via LUT generation) and exposed some settings to tweak the color in-engine on the fly:

Also a minor bug I was having is that transparent objects were sorted in the wrong order (background objects where rendered last, so ending up in front of the rest) mostly because that's the order I was using for the depth pre-pass (to render early on close objects have an early discard of the other).

(December 5)

The "just-in-time" depth pre-pass for transparent objects now works.

The idea is to build a new depth buffer just before rendering the transparent object. You render it as opaque into that new depth and use it to draw the transparent version in the color buffer afterward. This basically applies self-occlusion on transparent objects.

(Off vs On, for the "just in time" depth prepass)

Notice how the bright triangles near the mouth are gone ? Much cleaner !

Right now however the feature is always enabled, and while the depth buffer copy plus additional render isn't that expensive (especially since I doubt I will have many objects using it), I want to put it behind a setting.

Another reason why I like building my own engine. I can add simple features like that on a whim.

I also got a quick and dirty version of glass rendering working:

Here basically a plane cuts the objects (character + sphere) in half but let stuff behind it visible while also catching the light in its specular reflections.
Everything is dark because I temporarily set the diffuse black so try out a premultiplied blending mode.

(December 6)

I added support for the fog on the glass material too. This was a bit more tricky than I thought at first because of the premultiplied blending:

However I have now a strange NaN creating issues and making my bloom explode:

(It doesn't look like much, but that's a close up on the glass reflection)

Investigating things a bit further, it seems to be coming from the specular shading and notably the GGX function:

I remember having the same issue when working on my area lights and I had to switch to another GGX function to fix it. This time I want to figure out the real problem, it would be a good opportunity to fix the area lights code too.

The current function I'm using is SmithGGXCorrelated() from here. The regular and fast version both seem to produce the issue. I thought this was a roughness value being too low at first, but I'm already clamping it with a minimum threshold so I'm not sure.

If I compare the other function I'm using, results are noticeably different:

(Left is Filament GGX function visibility term, right is Disney/UE4 version)

Also it seems to be producing a NaN/Infinite value on Nvidia, but on my AMD it's just the maximum value of the buffer (RGBA16F).

(December 7)

So, I figured out a solution for those fireflies generating INF/NANs. I was clamping my roughness early in my code, but that wasn't enough. Mostly because I was clamping before the perceptual roughness (roughness * roughness), so my clamping threshold was too low. The fireflies were coming from very shiny areas (aka roughness == 0).

I now clamp the perceptual roughness as well, but had to do it before the call to the specular lobe computation because it affected visibility and distribution terms. That's why I couldn't figure out why some stuff was still exploding yesterday, the problem affected multiple locations in the code.

The good news is that this made me notice I wasn't applying properly the attenuation factor on my tube lights, so I fixed that:

(December 9)

Now the real fun begins !
I started working on frosted/blurry glass like I wanted to.

While playing around, I figured that the way it is setup right now might fails.

I assumed I wanted to blur only the opaque geometry and then render transparent surfaces and then feed them the previous blur result. However if another transparent surface is behind the glass, it won't be visible (since the blur was generated before that object is drawn).

This means I need to blur precisely when I need to render the glass. Now I see why in DOOM 2016 presentation slides they said "Refraction transfers limited to 2 per frame for performance":

(From "DOOM - The devil is in the details", also more info here)

But still, to be sure I fired up RenderDoc to capture the game (on Windows, because on Linux RenderDoc would corrupt the rendering and I would only see the fog and everything else was black):

(Here is the first room displaying the blurry glass effect.)

So yes, in the first level if you align two blurry window they both get their own blurry pass. That blurry pass is done with a separable box blur into individual render targets.

Curiously, the glass is made of multiple meshes drawn in separate drawcalls. I wonder why. I thought initially it was because of the decal parameterization they mentioned in the slides, but that mesh actually cut the decal texture in half.

Also the shader is fed the blurry mips as individual textures, not merged into a single one. I guess because it wasn't worth the hassle to regroup them.

(December 10-11)

I now reworked the ordering to properly render my frosted glass. Now I check just before drawing the object if the shader needs a blurry version of the color and then render it on the fly. I re-use the same shader I was using for the bloom downsampling pass (so not a dual pass box blur).
I also merged the mips level into a single texture so this way I only have to bind a single resource when rendering my object and can choose the mip level on the fly via the LOD level.

(Here I plugged in as well the depth to control the blur intensity)

Initially I tried using the same idea as in DOOM for controlling the blur radius which is blending two mips together, but I couldn't get a good looking result. So instead I went with using a blue noise texture in screenspace to randomize the sampling and hide the bilinear filtering blockiness:

(Still looking a bit boxy, but good enough)

Also I couldn't resist trying out a texture to control the blur (and make a dirty glass):

(December 15)

Following a little feature request to the framework I use, I can now see my textures with a proper name when debugging in RenderDoc (if you are reading this, thx again for the quick support ❤️). Until today they all had automatic IDs which was making debugging a bit difficult in some cases.

(December 17-18)

I spent a few days cleaning up the transparency code (removing hacks and exposing properly the new features), which made me notice that my blurry glass blending was incorrect.

I fixed all of that to get the proper refraction behavior:

But it looks like I refactored too hard...

(Bug or feature ?)

Another feature I initially added was using a scissor rect to optimize the blur pass. Some objects may be considered as drawn by the frustum culling because of their bounding sphere, but in practice they aren't visible on screen.

The issue is that it still triggers the blur pass. So I added a scissor rect based on the mesh BBox projection on screen as an additional way to know if the object is visible. So it's another level of optimization to discard processing the blur pass.

However if all the projected points are outside the frustum then the projection fails and the scissor collapse. So you kind of randomly see the object disappear.
After thinking a bit about it, it reminded me of an actual issue I had when working on my portal rendering in Unreal Engine which required clipping polygons against the frustum.

That's not some work I want to do right now, so I postponed that for later and simply disabled the scissor stuff for now.

(December 20)

While still cleaning up some stuff, I also took case of the care where no lights were enabled in the scene. Until now this wasn't an issue, but I assumed I didn't have to render any meshes since I relied on the depth pre-pass once again.

However for transparent objects, this meant that they would disappear when not overlapping a light... which is quite noticeable if you still have fog in the scene !

Even if there is no fog, you want to render your transparent object still, because they could be occluding another object in front of your eyes.

(A cold foggy night)

(December 26-28)

I'm away from home (guess why) and on a laptop. So I choose to work on small stuff for convenience.

First I fixed my directional lights that I had in a borked state for a while following the light system refactor. Now all my lights behave properly with the new buffer system.

I quickly build a new format to define the structure of a scene, so that I could more easily load a specific configuration at startup. It's a simple json file listing lights, objects and a few other properties. Before I was editing my startup script file to load objects and setup lights.

I also added new properties on my fill lights to better control their intensity and specify how much SSAO is visible. I should be able to better control their GI/ambient look now:

(A simple scene to test out lights, here with a directional light and a fill light.)

I also tried to see if fading out the SSAO based on the fill light radius would be a good idea (using the invert of the falloff) as a way to reduce the sharp transition of the occlusion between a shadowed and non-shadowed area (like on the image above). However I felt like just reducing the general SSAO intensity in this case was enough, it didn't benefit from something more complex.

(December 30)

I got debug names working for shader too.

January 2024

(January 2)

On the graphic programming discord server an alternate version of AgX was posted. It's actually quite nice. Here are some comparisons between various tone curves:

(In order: linear, custom curve, AgX (new), Tony McMapFace)

In principle, it is close to the custom curve I initially built and used before switching to Tony. If you compare AgX vs the Linear curve, you can see that only bright colors are toned down. Colors below a specific threshold remain unchanged.

However this version of AgX still features a noticeable color skew with the blue light. So while Tony is slightly washing down some colors, it remains the best value here to me.

(January 3)

I didn't want to start anything big yet, so I fiddled around in the engine and ended-up adding Fast Approximate Anti-Aliasing (FXAA). It was very straightforward to integrate. I probably spent more time setting up the right buffers to get it working than anything else.

(Aliasing probably don't show up well on the thumbnail, so don't hesitate to expand it.)

I had to disable the vignette and dithering of the final tonemap/display mapping pass, but I will likely split that into another pass (and maybe add some sharpen).

(January 4)

I remembered that Subpixel Morphological Antialiasing (SMAA) existed and was used in Crysis 2 and 3. I went looking for the original presentation but also fired up Crysis 3 to check out which settings were exposed:

(All the options available in Crysis 3 menu for anti-aliasing)

However while looking around (I wanted to capture a frame to analyze some render passes) I stumbled upon the lens-dirt and lens-flares. I couldn't resist investigating that.

(Here is the frame I captured, basically from the first few seconds of gameplay.)

First I noticed the rendering of tiny sprites on the image below. I presume those are water droplets (not moving, likely just fading) since it was raining heavily in the scene:

Then in a following pass a classic lens dirt texture was applied. Strangely via tesselated plane it seems:

But the most important part was the nearby light lens-flare drawing:

It looks like it's a simple sprite based lens-flare, but several meshes are drawn, with for example an arc that gets properly distorted based on the light origin direction on screen. It's easier to understand in movement:

(Kinda neat.)

I would have hoped for a more post-process based effect, but it is still nice. :)

One thing interesting to note, is that the light lens-flare is not rendered in the scene directly as regular transparent sprite but in the lens dirt buffer later instead in the pipeline. So I imagine that all lens-flares are accumulated there.

(January 5)

I spent the day looking at frame captures of the game Syndicate (2012 version). I figured out a way to finally capture it in renderDoc, after years of trying.

It's full of surprises and interesting bits and I think there is more to say than with Crysis 3. So I think I'm going to write an article about it. Meanwhile I did a thread on mastodon of the early discoveries.

(January 6-7)

I did two things today: I tried out new tricks for my SSAO and did other tests with my FXAA pass.

For the SSAO I attempted once again to use a quadrant/cell setup.
The idea in order to improve cache coherency is to use the same ray direction when walking along the depth buffer to avoid divergence.

The issue with that is you obviously get the same direction across a bunch of pixels locally, so it becomes obvious and you loose the randomness. That's when deinterleaved/interleaved operations enter the room: with them you instead process one pixel every four for example (dividing the image in 4x4 groups). Usually this is done via several buffers, rendered individually or via MRT.

The idea here with the cells is to do it in one buffer instead, but by adjusting the UV coordinates to group the pixels together per cell. Visually it looks a bit like you duplicated the image by the number of cells:

(Don't mind the visual artifacts, this was some WIP tests.)

Then once the SSAO has been computed, you recombine all the pixels (inverting the UV coordinate transformation) into a regular image.

Unfortunately it didn't improve performance.
On my desktop machine with a RX 7900 XT, it basically increased the rendering cost by 0.1ms on each pass (one full resolution, one another being half). On my laptop machine with a GTX 1060, the full resolution pass went from 1.5ms to 2.3ms.

So right now nothing beats a pure and simple pixel shader pass it seems. Unless of course I use my dirty checker pattern trick (as mentioned in part 1).

Big thanks to Jasper for his help on figuring out the maths for the UV coordinates.

Regarding the FXAA, it is still working fine, but I was annoyed by the fact that it was running it after all the other post effects. Since FXAA take into account the contrast between pixels to determine if it is an edge that needs to be fixed or not, it means any effects that creates edges or hide them will affect that process.

One particular effect that comes to mind is bloom. Since my bloom is combined before the anti-aliasing pass, and given I use a linear blending and not an additive one, it can make FXAA ineffective in some places. My chromatic aberration effect is also at fault, since it replicates the images multiple times it can exacerbate aliasing along edges.

Thinking about how modern anti-aliasing is done before any post-processing, notably Temporal Anti-Aliasing, I was wondering if I could make FXAA run first instead of last. However FXAA is designed to work on color range values between 0 and 1 and in a perceptual color space, aka the opposite of HDR rendering. That's when I remembered that sometimes in some render pipeline, some temporary tone curve are applied, then reverted. For example to blend UI on an HDR buffer.

So the idea was to find a tone curve that was reversible, then I would be able to process as follow:

Transform from HDR to LDR range with a temporary tone curve
Convert from Linear sRGB to sRGB color space
Apply FXAA
Convert from sRGB to Linear sRGB color space
Revert the tone curve, going back to HDR
Apply the rest of the post-processing chain

Initially, I tried out a simple curve like Reinhard (and its Squared version from here). While FXAA seemed to be working, there was a significant loss of brightness. My maximum HDR value was going down from 40000 to 900, so the bloom afterward was quite affected. I noticed some loss of precision in gradients too, like with the fog effect. So definitely not the right curve, but it allowed me to test the idea.

Searching around, I stumbled again on this page from AMD which mentioned a tone curve specially designed for this kind of operations.

It still wasn't enough alone, so I ended up re-using a "trick" I use for my LUT based color grading:

In a setup pass for FXAA:

    // Convert from LDR to HDR
    if( UseColorEncoding )
    {
        Color.rgb *= 0.1;
        Color.rgb = TonemapResolve( Color.rgb );
    }

    // Convert to non-linear sRGB
    Color.rgb = LinearSRGB_to_SRGB( Color.rgb );

    // Setup Luma for FXAA
    Color.a = SRGB_Luminance( Output.rgb );

Then in the FXAA pass:

    // Apply FXAA
    vec4 Color = FxaaPixelShader( [...] );

    // Convert back from sRGB to Linear sRGB
    Color.rgb = SRGB_to_LinearSRGB( Output.rgb );

    // Convert back from LDR to HDR
    if( UseColorEncoding )
    {
        Output.rgb = InvertTonemapResolve( Output.rgb );
        Output.rgb *= 10.0;
    }

This is done in RGBA16F buffer to have enough precision.
Because of the multiplication, you lose some contrast in the image making some edges ignored by the FXAA filter. This can be adjusted by changing some of the input value to the FXAA function. In my case just lowering the Edge Threshold Min variable was enough.

Here are the results:

(FXAA off, aliasing is explicitly visible)

(FXAA on, applied last in the pipeline)

(FXAA on, applied first with the custom tone curve)

Let's zoom a bit on the FXAA results:

The gif above alternates between two states of the FXAA pass (first vs last). It's easy to see now that when the FXAA is last in the pipeline, it doesn't resolves all the aliasing.
Only the aliasing between the black area and the uniform yellow near the blue line is fixed. The green to yellow transition remains untouched. When FXAA is first, the aliasing is correctly handled because those color don't exist yet (since chromatic aberration is applied after).

(January 9)

I quickly discussed about my FXAA idea with some folks and a recent frame breakdown from the game Knockout was mentioned which says:

However, if anti-aliasing is enabled, we output to YCoCg (swizzled to place Y in the green channel) in an RGB10A2 target. Since FXAA needs luma to compute edges, and TAA wants to compute color neighborhood bounds in YCoCg, applying this transform on output makes more sense than doing it on the fly later. (This idea was borrowed from "Decima Engine: Advances in Lighting and AA" at SIGGRAPH 2017.)

However after looking into it a bit, in both cases they perform their anti-aliasing at the end after applying a tone curve. The YCoCg color space isn't meant to handle HDR color. I initially thought you could extract the Luminance and the Color property separately, but that doesn't really work anyway after to apply FXAA.

I also went over the SSAO pass to see if there were ways to improve it, in relation to the Quadrant/Cell division stuff. A compute shader might be the answer, but I need to try it out.

(January 11)

I played again with my lens-flares, mostly because I stumbled on Bart Wronski article again about his own experiment on the subject and I wanted to try it out myself as well:

This was a very quick and dirty idea, I mostly tried:

Scaling the bloom downsample buffer to make them thinner, widening the bloom. However this didn't make thin lines and instead created something more like a butterfly pattern. So not a great success.
I also adjust the UV of the ghosts to scale them non-uniformly, this however created a nice effect and re-enforced the shining effect. So I think I'm going to keep that.

(A video showing the new ghosts non-uniformly scaled.)

(January 20)

I have spent the past few days working on the Syndicate article, continuing my deep dive into the game. I noticed that the game render its fog as a linear blend over the already rendered scene, by re-using the mesh geometry but without any textures.

This made me think a bit about how my fog post-process pass was working. Until now I was sampling the scene color buffer and then applying the fog based on the depth and storing the result into a new buffer. This means I was basically handling the blending myself. It works, but now I feel a bit silly doing an extra copy when I could just do a simple linear blending with the GPU directly.

So I spent the evening testing that idea and it seems to be working. I will have to cleanup and check that everything works as before, but that's promising !

(The new fog look identical to before... but that's the point.)

I also noted a possible optimization to apply, which I will likely try out when I will cleanup the code: the idea is to use depth testing to discard early on the processing of pixels when the fog start far away from the camera. I saw the trick mentioned on the old UDK documentation.

"The FogStartDistance can be used to artificially keep some defined area in front of the viewer without fog. This also helps performance as pixels can be culled by the z buffer."
"Depending on the scene content and when using a far fog start distance the rendering cost can be 50% or less. This optimization is implemented by rendering a full screen quad that has a z value and depth test enabled."

(January 31)

I released my article on the game Syndicate, it's available here: Breakdown: Syndicate (2012).

It's a deep dive on a frame from the game, looking at how things are rendered (in particular the bloom).

(A screenshot from the game Syndicate.)

February 2024

(February 1-5)

Writing that article about Syndicate was intensive, so I decided to change my mind by playing again with lens-flares. I wanted to try again some ideas, notably John Chapman "streak" effect:

(Notice the lines/streaks near the border of the screen.)

I played around with the idea until something started to work nicely. I refined the initial effect by combining the same texture in different manner and scales, but with a more contrasted white noise distribution. I also adjusted the UV position based on the camera angle so that the effect only change when looking around.

I also added some new colorations to tint the flares and make them a bit more vibrant:

This is a simple HSV function using a gradient coming from the center of the screen to the edges as the position of the color. I offset the gradient based on the camera vector angle (like the streaks) which allows the color to change over time with the camera movement.

It's not realistic at all, but it stylizes the result in a nice way which I like a lot.

Something else I did as well is replace the blur function I was using on the lens-flare. Until now I was using a circular blur but the result wasn't perfect (and very noisy).

So I went to look for other method, more bokeh inspired and tried out this one with is another kind of round/circular blur. However I wanted to try out a more polygonal shape, to give an even more "photo" based look. That's when I stumbled upon this implementation which is actually very simple.

Until now I was using a simple "vignette" like gradient to scale the bokeh shape when reading the borders of the screen, but it was distorting a bit too much the hexagonal shape. So I instead switched to a more linear curve.

I also adjusted my dirt mask texture once again. I'm still generating it via Substance Designer, but some patterns on it didn't make sense anymore (notably switching from round to hexagon based shapes).

(February 6-8)

For a few days I decided to experiment with generating corrupted/glitchy images.

Initially I tried to read random bytes from memory, but if I wasn't careful enough this would lead to a segmentation fault when I was reading beyond the memory allocated by the engine.

(Example of a 512x512 texture generated from random bytes.)

Because I was reading in memory things like meshes and texture, the results were quite noisy and not very appealing to me. I was trying to get something more like visual/gpu based corruptions/artifacts.

On Mastodon I got the suggestion of trying out the Hilbert curve to shape up a bit the artifacts and that helped a bit:

Still, I was 100% happy so I decided to try reading bytes from other sources. So I went on to find other kind of big files I could randomly read. I ended-up trying out video files, audio files and even software executable. One in particular helped me get cool results, so I refined a bit my code and my display shader to generate more images:

(Examples of some outputs I generated.)

In the end I don't think this will end-up in my engine/game as a runtime feature, but that might serve as a inspiration source for some visual effects.

(February 9-13)

I did once again a few retakes on my lens-flares. With the new hexagonal blur the ghost weren't working as before and that was bothering me. Plus I felt some ghosts weren't working as intended so I decided re-tweaking all the settings once again and cleanup up a bit the shaders.

(One benefit of reworking the ghosts: using of the "unused" pass to add some nice looking halo around lights.)

I then moved once to reworking the bloom. Well more exactly to add an additional bloom pass. My engine is meant to support a sci-fi game project and for a long time I wanted to look into doing anamorphic bloom.

(Various tests of tweaking my current bloom implementation.)

As you can see, my first tries weren't successful, adjusting the bloom render target size or sampling coordinates led to undersampling issues and wonky shapes.

Another issue I have been dealing with: unstable bloom.

The issue is related to aliasing and even with FXAA enabled it was quite noticeable. So I went to find solutions, which I meant reading once again the great presentation by Jimenez on post-processing:

(One of the slide from the presentation.)

Turns I didn't understood properly the slide just above. In this image the sampling pattern is composed of 20 samples (5 blocks or 4 samples), but can be optimized into 13 samples (4 in the middle, 9 around).

This mean that if you compute the "Karis average" on the 13 samples pattern, you do the full version. Instead the idea behind the partial one is to perform the luminance average on each 2x2 samples block individually and then combining the results (with adjusted weights). This does produce a different result and helped stabilize quite a bit my aliasing/firefly issue with the bloom.

Notice on the video above how the full average gives a lot more wobbly effect and lose in intensity. The partial average retains a lot better the glow shape and the flickering is less pronounced.

As I was still struggling with the anamorphic bloom, I went to look for alternative implementations to get some ideas. I ended up finding this repository on github by Keijiro Takahashi.

The main idea is that instead of doing a regular bloom downsample by dividing the screen resolution by two each time, you instead divide only the height. This is how you can retain a very sharp line that extend very far. Once I got this figured out, it was pretty easy to implement.

(In progress directional blur.)

The effect works surprisingly great as a screen space based method.

As a twist, I decided to make this effect vertical and not horizontal. While usually anamorphic lens-flare are horizontal, sensor bloom on some camera can appear as vertical lines instead which was a bit the idea I wanted to go with.

(February 14-16)

Given I was working on bloom, I decided it was finally time to implement the bloom/fog effect I saw in a repository a long time ago. The effect was called Screen Space Multiple Scattering (SSMS) but I think calling it a screen space fog blur/haze is also a good way to describe it.

(The screen space fog in action. Notice how sweaty the atmosphere is.)

The effect is simple: it's basically bloom, but during the first downsample you mask the pixels by a fog function (which can be a simple distance or a more complex analytical fog evaluation) and then to the regular downsample/upsample process. When you blend the effect into the scene, you mask again by the fog function.

(Other examples of the fog blur in action. Silent Hill here I come !)

I initially went with a simple distance based range to implement the effect to get it working. Then plugged in my fog functions to get better results, which allowed me to try out the effect in a different situation like with height based fog:

(Must be very cold near the floor.)

It events works okay with transparency. I was a bit afraid that transparent object wouldn't blend properly because they don't write into the depth buffer. This is still true, but in practice you need a very big distance difference to notice the problem.

(Four transparent panes and other objects blending into the fog blur.)

There is one artifact to be aware of however, is that if the fog blur distance starts after the regular analytical fog, this can lead to visible black outlines on objects. This is because right now the blur is not depth aware like a bilateral blur would be.

(The black outline when wrongly mixing the regular fog and the blur.)

I might try to improve that in the future, but in practice when playing with the effect I just had to make sure they were in the right order to not have to care about this issue. So I'm calling it good enough !

(February 17)

I'm still not done with bloom !
After getting that sweet fog working, I wanted to try again to implement halation.

(Example of strong halation from an episode of the TV show Charmed.)

To summarize in a line, halation is when light bleeds on a film, creating noticeable yellow/orange glow on area of contrast (usually light sources). This can give a quite nice warm tone to an image.

(The halation effect in action.)

The way it works is quite simple and rely on doing a high-pass filter between a sharp and blurry version of the final scene color. So I ended-up re-using the bloom downsample to achieve the effect and blend it into the image. The blending is done in a way so that it doesn't add energy, avoiding skewing the scene luminosity.

I also took this opportunity to add some animated grain on my final render. I had the code disabled for quite some time and decided to add it back to finalize the look. It's a simple white noise texture (with a few extra tweaks) that repeats in screen space for now.

(February 18-21)

While working on the new various effects, the firefly/flickering issue got a lot more noticeable (especially with the fog). So I decided to tackle that problem once and for all.

The issue here is that the neon bar slightly change shape as I move forward, because it emits very bright pixels the total intensity varies from one frame to another producing a very noticeable flickering.

I tried for some time to figure out a way to reduce that spatial aliasing but it wasn't easy come up with a solution.

So I gave up and decided to look into temporal anti-aliasing, which was known to be handling this kind of problems. Until now I preferred to avoid it given it is known to blur and smudge the image overall.

(Test of blending the current frame with the previous one.)

Before deciding if implementing TAA would be worth it, I decided to try out simply blending together the previous and current frame. Results weren't that much convincing (and even funny, see just above), but I decided to continue still.

Once I figured out the previous frame reprojection, results started to improved quite a bit. So I went to play with the blending value to compare things:

(Before and after enabling temporal accumulation/blending.)

Flickering is a lot less pronounced, so this is clearly working ! Disocclusion artifact started to appear however:

(White lines below the neon bar appears while it moves up.)

This was quickly fixed by doing color clipping/clamping as suggested in this great article.

Once that was fixed, everything got working almost perfectly. There was a few issues I noticed however:

A slightly visible fade on bright source moving very quickly. That's the temporal nature of things catching up. However I gave a try at very quick flickering lights and didn't notice the problem, I presume because the clipping did a very good job.
When moving the camera very quickly I could notice that the image would sometimes stop moving after the camera. I figure this is an issue related to projection and maybe the color clipping.

(See how the glow/fog fades after the bar rotates. Almost feels like an eye adaptation effect.)

It both cases this is subtle enough that I decided it was good enough as-is.

There is another good reason why those issues don't bother me: the temporal accumulation is done only in the downsamples passes of the fog and bloom.
I don't apply temporal accumulation on the main scene color buffer, I kept my FXAA pass instead. I think this is great tradeoff, the main image stays sharp and flickers/fireflies are gone from the bloom/fog.

(February 22)

Temporal accumulation can be great, so why not trying to use is for the SSAO pass and reduce even further the noise or reduce its cost ?

(Before and after temporal accumulation.)

You can see how the blockiness of the gradient got smoothed out. I simply sent a random float value to the SSAO pass to use an offset to my bayer matrix pattern. Then the regular blur would apply and finally its result would get blended with the previous frame (with the same UV reprojection and color clipping principle than for the bloom). I didn't notice any obvious artifacts with that method.

However it turned out to be bit more tricky to get working than what I expected regarding the noise offset. I'm not satisfied with the results yet, but instead of spending even more time on my SSAO I decided to leave it as-is for now.

Small bonus, a render featuring the new bloom, fog, halation and film noise in Sponza:

(February 27)

Spent a bit of time reworking my chromatic aberration post-process effect. I wasn't happy with the blue noise pattern that was very noticeable when doing a screenshot of the engine.

(An example of the new chromatic aberration.)

I switched from a simple UV scale (from the center of the image) into instead using a direction vector from the center. The change is subtle, but it results into smoother results. I also increased the numbers of samples and reducing the intensity of the noise to soften even more the look.

Now the effect is more noticeable while also being less strong and invasive.

(February 28)

I have been wanting to add additional fog effects for a while. With the recent introduction of the fog blur I decided to look into adding local "fog volumes".

I started with a simple "fog sphere" based on Inigo Quilez shader:

(I didn't succeed on the first try unfortunately...)

I had to fiddle a few hours with the input for the function to get it right. But once it worked it looked really nice:

(The fog sphere blending properly against a wall and floor.)

Once the sphere was working, following with the cube version wasn't too difficult:

(The cube version of the fog volume.)

Stretching the cube shape can be really great to create a beam of light:

The fog blur on top can help blend the volume into the rest of the scene. For now the volumes are simple mesh shape with a shader on top, but I plan on converting that into some kind of entity/actor that can be placed easily into the level (like meshes or lights).

March 2024

(March 6-10)

One final stretch goal I gave myself to finish version 0.2.4 of the engine was to switch my bloom pass to compute. This was mostly because I wanted to reduce their cost as much as possible since I was starting to do several of them within a frame (notably refraction blur, fog blur and the bloom itself).

so one idea what to look into doing the downsample pass for the bloom blur in a single compute shader dispatch (like AMD and Nvidia have showcased)

I struggled a bit with the shared memory (even after "stealing" some code):

(Weird sampling pattern that I figured out how to fix at some point.)

With an update of the framework I was also able to render the downsamples directly into the mipmaps of a single render target, instead of having to jungle with severals.

However I haven't figured out yet how to do all the downsample into a single pass. The shared memory across workgroup is still puzzling me a bit.

In the end I decided to keep the original fragment shader version for now and to postpone the change to later. Performance wise, on my beefy GPU, the fragment and compute based downsamples were basically identical.

(March 12)

The big subject that I have been having in my mind for a few month now is rendering cubemap for handling reflections on materials. Introducing this notion in the engine actually triggers quite a few changes in its architecture.

First of all, currently the engine has a main render and post-process function. These two functions are in their own "module" which own their own render targets. To render the scene, the engine feed the current scene and camera to the function.

Cubemap are a different point of view, so I need to be able to specify that I want to render another camera basically, potentially 6 times in a frame (for each side of the cube). However this raise the question of who should own the render targets and how.

I think I will introduce the notion of a "view" which basically tells the engine how to render a specific viewport from the scene. Cubemap will have their own, and the main camera as well.

Since I plan on supporting portals, I will have at some point the notion of rendering a view within another view. That's why I think I will separate the management of the render target outside of the view module/object. This way the main camera can send render targets shared across views to render portals.

Another question that came into mind is "how do you handle metallic surface while capturing cubemaps ?". Metallic surfaces are specular reflections only, but if no cubemap already exists, should the reflection be only about local lights ? Or about a global skybox ? That wouldn't work for interiors, it would create light leaks.

To answer this question better, I fired up an old build of the Unreal Engine that I had still lying around and did a simple test scene with a few spheres:

(A simple scene with 3 spheres, the cubemap was captured from the center.)

As you can see, the a metallic sphere reflect the other but in a strange manner. Its surface is fully gray and not shiny at all.

Looking at Unreal Engine documentation, there is an explanation:

"Only the level's diffuse is captured to reduce error. Purely specular surfaces (metals) will have their specular applied as if it were diffuse during the capture."

From: Reflections Captures

Setting the material roughness to 1 (non-glossy) gives us a similar result:

(One of the metallic sphere now has a very high roughness value.)

Finally, when you have multiple cubemaps in a scene there will be the question of blending between them notably for moving objects. For now I think I will just find the closest cubemap and use it during the shading pass of the objects. At least as a first step.

See you in a few days... there is quite a bit of code to refactor.

(March 14)

Passing my days thinking about how to change my engine architecture. I'm struggling a bit exactly on how to organize and manage the ownerships of things.

I took advantage of a drawing board to sketch out some stuff:

(March 20)

The engine is starting to render again (because yeah, the refactoring broke a lot of stuff).

I'm pretty happy with the refactor, it's cleaning up a lot of things that had started to accumulate in wrong places. It also clarified quite a bit the main loop of the engine.

One notable example is my post-processing code that was a single giant file. I have cut it down into several of them instead, one per effect. It makes things easier to track. I see them as submodules which are initialized via the main renderer module.

Not everything is operational yet, but I can at least push the HDR scene color into the backbuffer and see stuff !

Another cleanup is my editor code which had started to have references to objects and modules in a lot of directions, so it lacked clarity on who own what and when.

(March 23)

The engine rendering is back to normal, but this time underneath I have the concept of a main view that gets pushed to the screen.

This mean I have the basics to create additional views (with of without post-processing), so this will be useful for cubemaps but also potentially editor views.

(The engine rendering again, with a slight bit depth bug I fixed since.)

I took this rework opportunity to improve the gaussian blur I was doing for the background of the editor window.

I also adjusted a bit the code of my bloom downsamples/upsamples. I now use TextureViews to create indirection to the buffer mipmaps and be able to do the operation into a single buffer instead of having several ones at hand. It's still not done in a single compute pass, but it's still better than nothing.

(March 24-26)

Before jumping into cubemaps and everything involved, I wanted to see if I could instead add a Depth of Field post-process pass. I already use a similar effect for my lens-flare, so I wanted to see if I could use it on the main buffer.

Things didn't work on the first try however:

Took me some time to figure out what what happening, but in retrospect it was obvious: I tried re-using buffers to avoid allocating new ones. Given the DOF and Lens-flares shared the same process, I shared the same buffers. But I forgot the lens-flare buffer would be mixed with bloom later on, so I couldn't safely re-use it, even if the effect happened after the DOF itself.

Still, while it was broken, I continued to play with it because it generated some nice images, such as:

Once I got that bug sorted out, I got the blur working on properly and it gave some pretty good results:

However, once I started to integrate the notion of blur size based on the depth, I started to face some issues. My goal was to only blur the background of the scene, leaving the foreground untouched. But even in this context I couldn't figure out a good way to discard/ignore samples that shouldn't contribute.

I tried to add some tests based on the depth but it led to strange artifacts:

Another example:

Notice how the bright light on the helmet creates an halo on the background. From the research I could find on the subject, it seems to be a side-effect of the separable nature of the blur from the method I used.

(March 28)

From the previous set back, I decided to look into an alternative method. It seems popular ways of doing Depth of Field nowadays is based on a "Scatter & Gather" principle, which relies on doing multiple samples around a certain point.

It seems pretty easy to do in a circle, like DOOM (2016) demonstrated, but I was more interested into having an hexagon like shape like in my previous blur implementation.

I stumbled upon Crytek slides on the subject:

This slide in particular caught my attention because it seemed to address the issue I was facing:

There was also a slide dedicated to the computation of the bokeh "shape":

I found an implementation on shadertoy, so I gave it a try:

The ability to morph the shape on the fly is a nice touch. In my case I decided to pre-compute the position of the samples on the CPU side and store them in an SSBO buffer to send them to the shader. This way I can still recompute them on the fly, but don't need to do it every frame.

April 2024

(April 1)

Instead of making progress I'm procrastinating by doing other things. I notably fixed my debug draw mode, which will be useful to look at the DOF alone and not combined with the rest of the post-process chain.

I also decided to do a revamp of my engine page: https://www.froyok.fr/ombre/. Looks a bit more professional now, even if it still lacks content. Of course it serve no purpose other that bragging about it. :D
(Okay maybe it's useful for the todo list.)

As I'm still scratching my head over the Depth of Field implementation, I exposed a few controls in the UI:

(You can see the sampling kernel being recomputed on the fly here.)

(April 2-3)

I'm toying around with my code to add some behaviors in the Depth of Field effect. I tried out chromatic aberration:

(I think my favorite is the first one, the blue fringe plays nicely with the halation effect.)

Adjusting the weights of the sample of the kernel also allows to simulate holes in the shape which is pleasant to look at:

However my Circle of Confusion computation and sample discard process are still not working, meaning I'm still experiencing bleeding:

So yeah, still hitting a wall on this. I'm a bit sad that all this rework feels for nothing. Hopefully future me will figure something out.

(Me, at 4am, writing those lines.)

(April 6)

I made good progress, past me was right to trust me, as always !

So in short, I finally figured out how to properly scale and discard samples, which means my Depth of Field is starting to look nice:

It doesn't looks like it right now, but there are a few issues remaining:

The sampling pattern being a bit scattered create visible holes, but that's because I haven't added a pass to fill those holes:
The upscale pass is also showing obvious aliasing and blending issues:

But outside of those things, it starting to work really well already:

(April 9-10)

I added the filling pass, so I don't get holes anymore in my bokeh and it actually look nice:

(No more holes.)

(A beauty shot just for the pleasure.)

However I got sidetracked because I noticed a few bugs in my screenspace fog blur:

It is flickering more than usual
There is a weird blending issues that is darkening things too much where the fog isn't fully opaque
The foreground is bleeding into the background

This took me two days to iron out all the details. Initially I thought this was regression introduced with the update of my framework and my depth reprojection wasn't working properly.

That wasn't it. For some reason, I wasn't sampling the depth buffer anymore when downsampling the scene color. Taking it into once again fixed the foreground bleeding, but the flickering remained and felt even more pronounced in some ways.

(Before vs After the color bleeding, here with the white neon.)

A few hours of debugging later I noticed that the way I was doing the reprojection and notably the color clipping was wrong (in short: I was clipping color without taking into account the depth mask). Once I fixed this, the issue was gone. This also opened the door to the fix for the fog blending (weird stories about pre-multiplying colors, etc).

(Before vs After the fix for the fog blending, the black band is gone.)

("Incredible, no more artifacts...")

Once those issues were fixed I got back on the DOF.

By tweaking the sample weights I was able to make the center of the bokeh fade out which allowed to simulate other kind of DOF effects:

(Works even on non-circular bokeh by pre-computing the sample "ring" position.)

I also added back the chromatic aberration on the bokeh to create some nice colors:

(Here is the DOF enabled with other post-effects on top, like lens-flares/bloom.)

(April 13)

Following a suggestion on Discord, I wanted to see if I could try out other bokeh shapes and notably one looking like an heart just for the fun of it.

I looked around for functions to distributes the samples along specific paths and stumbled upon the ones on Wolfram but I found them difficult to work with. Also I struggled to figure out how to put patterns inside the area defined by the outline.

That didn't stop me to try it out, it gave some "interesting" results:

So I continued to look around and found this page that showed how to build heart like shapes in other ways. There is one in particular that caught my eye:

You can see that the shape can be split in two other: a circle (itself split in two) and a square. Both of these shapes are easy to generate, so I started to rework a bit the code I was using to the generate the sampling kernel and it got me there:

You can see the distributions of the samples isn't perfect, but that was good enough to try it out and the results were quite pleasing:

(❤️)

(April 27)

The past few days passed quickly and I haven't been super productive, mostly because of work stuff (but also Helldivers 2).

I took some time to finish the refactoring of my rendering code. I was missing for example the refraction blurring with my glass/transparency rendering.

This week however I wanted to try out some optimizations I had in mind for improving the performance of my shadow volumes.

The first one is based on the suggestion of a colleague. The idea is to use a custom projection matrix when rendering the shadow volume that has different near and far clipping planes compared to the main camera. The goal is to restrict the rendering area to the volume of the light.

In practice, it kinda means simulating the Depth Bounds extension.

It took me a bit of time to figure things out, notably that I needed to do a copy of the depth buffer and clamp then rescale its range to make it match the projection matrix. Once that was in place, I could do performance comparison with and without the "optimization".

(As the camera moves away, the custom far clip plane makes the shadow slides over it.)

Unfortunately I didn't see any kind of performance improvement. On my old laptop it even performed worse. I presume mostly because of the cost of the extra depth copy.

So I had another idea: use the light mask to modify the depth buffer and help the rasterizer discard fragments that will never contribute. Since I already do a depth copy, I hijacked this code pass and instead masked the depth with a value that would make the depth test fail.

(The test scene, 3 heavy meshes and a single point light.)

(Here is the depth buffer masked by the light volume.)

(Here is the shadow volume buffer, with only the depth test.)

Guess what ? No improvements either (still worse performance, but not as bad as the custom near/far planes).
I wasn't super happy about it. However in retrospect I'm not that surprised. My previous tests last year with stencil masking and using the depth bound extension didn't yield any good results either.

Ho well, at least I tried.
I'm not closing all the doors on this however. I still have some ideas I wan't to try:

I could do the "tesselation" in a second compute dispatch, processing only specific triangles (the silhouette triangle, which a re long and thin). It's kinda what already do in the main one, however I'm thinking that maybe the GPU could balance the loads better if all threads were doing the same work (rather than sporadically when processing also front and backface triangles).
I could do the rasterizing of the shadow volume in two pass: one compute, one regular raster (given I do additive rendering, it's easy to split). Moving the thin/long triangle in the compute pass to avoid the 2x2 pixel groups overhead. I may not need tesselation at all this way (but compute raster is hard so it's not for the near future).

The second idea was related to how my compute shader dispatching was done. When working on another bit of code, someone on the Graphic Programming discord server pointed out to me that dispatching lots of threads to then discard them in the shader code was a bad practice. I fixed that promptly. However I never applied that fix to the shadow volume compute shader.

So that's what I did, and just doing that actually gave some noticeable performance gains.

In the same test scene as above, on my test machines I got:

From 0.1ms to 0.065ms on my desktop.
From 0.61ms to 0.22ms on my laptop.

At least I got a small win. :)

May 2024

(May 1)

Today was off work, so I took the day do fix my Depth of Field. My DOF is mostly working okay, but there are two points that bother me:

Tweaking the focus point is difficult.
My DOF has a nasty white halo artifact around objects when a sharp in-focus / out-of-focus transition happens (see just below).

(The halo artifact)

Fixing the focus was easy enough. I went with the method you specify the start and stop point of the focus transition. I feel this is much more intuitive, especially since I don't use real camera properties in general. I adjusted the way I retrieve the depth linear version then extracted the valid range to convert it into a mask for my Circle of Confusion radius.

I wasn't able to fix my DOF transition however. I think I have to rework how I do the bokeh blur and filling to make it easier to upscale afterward, but I haven't figured out how yet.

(May 5)

Today I looked into rendering light shafts. Stuff like that:

(Example from AMD presentation "Rendering shadows in participating media" - 2004.)

But I wanted to explore using geometry for this instead of doing volume raymarching. The basic idea is rendering local meshes for specific lights instead of having a unified solution. The idea is roughly similar to shadow volumes.

I'm inspired from the GDC presentation "From Terrain To Godrays:
Better Use of DX11" by Nvidia:

(Extracts from the presentation.)

I started by putting a directional light in one of my test scene and used its properties to also render a shadow map (my first one ever !):

(A directional light with its own shadow map.)

Then I exported a tesselated plane from Blender and used the shadow map to push its vertices along the light direction:

(Weird shower.)

Hehhh, what the hell ? Some weird precision issue maybe ?

To understand what is happening here you need to know how the mesh is rendered. I use an additive blend mode, front faces add light, while back faces remove light (same principle as shadow volumes). As-is I would have expected nothing to appear because front and back wouldn't cancel each other out, except around the character.

After digging around, I switched back my color buffer from RG11B10 format to RGBA16F and that fixed it. I left it as-is to continue the experiment, but I imagine in the future I will render these meshes in a separate pass (maybe at half resolution to blur it, etc.).

(The plane projected against the mesh via the shadow map.)

Something I noticed here, is that the mesh vertex density seems to matter a lot. The way I sample the texture as well. Moving around the geometry shows a lot of instabilities.

In the video above I only output positive values from the fragment shader (to make debug easier). When I enabled back negative values it gave me this:

(Spooky !)

Yeah, negative values don't play well when blending directly into the main color buffer...

I stopped here, not being sure how to continue. I decided to shelve the idea for now.

What I find promising about this workflow is that I could expand it to regular shadow volumes. It has the potential of being faster to render because it could give a better control over the geometry polycount. For example by implementing some LOD system (fixed or dynamic tesselation in compute) to improve performance and reduce overdraw in some ways.

So definitely an interresting idea !

(May 12)

I spent the past few days finally working on my cubemap rendering once again.

Getting the actual cubemap rendering wasn't too hard, it took less than a day to setup. The long refactoring finally payed off, so I'm quite happy about it. :)

I even made a quick debug shader to visualize it in the scene:

(The cubemap being rendered every frame gets automatic updated.)

My screen space fog scattering can create seams since it's a post-process, but if I keep the start distance away enough it's not noticable:

(A bit hard to notice, but seams appear on the floor.)

With an actual cubemap texture in my hand, I decided to move toward the next step. The goal of rendering cubemap is to use them afterward for shading my object, notably for specular reflection (aka radiance).

The first part of the split sum approximation to do Image Based Lighting (IBL) is to generate the DFG terms as a LUT texture. I went with the code conveniently listed here. We will see if I need to revisit it later.

Next I need to pre-filter my cubemap texture to store the different roughness levels into its mipmaps. Rather than reinventing the wheel, I decided to use some tools provided with Filament to do so. Filament has cmgen which can read a cubemap texture and generate the proper mipmaps for it.

So first I had to export my cubemap as an exr from the engine, I decided to go with a cubemap cross display. At first I tried to do it on my own. I figured how to draw each side of the cubemap via custom UV coordinates, but stitching them together in one shader proved a bit more difficult. In this end I went with this shadertoy.

Once I exported my texture, I went to check it out. Colors were wrong in most editing software I could try. Turns out that Love framework exported the color channels in the RGBA order, but basically every software out there expected them in the ABGR order, without checking the channel name first. That took a bit of time to figure out, but that was easy to work around in the shader at least.

Next I ran cmgen to generate the cubemap texture. Using exr, hdr or even dds as the output format, it would only generate each cube face and mip level as individual files. Choosing ktx format instead would give me a single file... but I cannot load it in Love. And Khronos tool only deals with KTX version 2 format, while the cubemap I have is KTX version 1.

So I went to look for alternative solutions, aka being able to load custom mipmaps into a dds file. I'm fine dealing with each cubemap face as a separate file, since that's what Love can load as input.

Well, after spending two days looking for the perfect tool, I can say: none exist usable as-is. It's either:

Stuff you need to compile yourself (or don't have binaries for Linux)
Import custom mips, but only encode in BC1/BC5, no BC6h
Import custom mips but as png, no HDR input
Write my own DDS library to do it

Result: 💀

I don't want to spend days trying to figure out how to write a DDS file. Given my experience in the past with DDS tooling, that would be the best solution for the future, but right now I don't want to focus on that and prefer to do actual rendering in my engine.

(So yeah, this is a rant.)

My current idea/workaround would be to store my HDR cubemap in RGBA8 with RGBM encoding. This means I could do in steps like so:

Load all the cubemap mips and store them in a single RGBA16F/RG11B10 cubemap texture.
Same as before, but in an RGBA8 format with RGBM encoding (as described here and here).
Write my own DDS tool (maybed inspired from this)

Obviously the last step for much later.

With how much DDS is key for game engine and GPU rendering, I'm baffled to see so little tools are out there in a good shape.

(May 17)

In the end I went with RG11B10 format that I dump as a binary blob on disk and then reload on next launch. This way I don't have to bother storing the data into an intermediate format.

Because of constraint with my framework, I dumped every face and mipmap as an individual file. So I use a zip archive to gather all the relevant files from a cubemap into a single file. It takes less than 0.05s to load a 512x512 cubemap via this archive, so good enough for now until I get some DDS going on.

With a prefiltered cubemap now loading, I decided to update my debug shader to be able to play with the mipmap:

(Polishing my orb. Also yeah it's upside down...)

Initially I used texture instead of textureLod in my shaders, which gave me really weird facetting artifacts that took a while to figure out.

So Cubemaps are working ? Sampling is working too ? Great, time to plug it into every shader:

(Woops...)

I forgot to move the roughness/glossiness code before I sample the IBL, which made glossy shiny surfaces everywhere. Once I fixed that I got the following result:

(It's... bright.)

I stopped a bit here, because I didn't understand why everything was so much brighter. Note that I only render the IBL radiance, there is no irradiance (on purpose, because I use diffuse only fill lights instead).

I wonder if this is because I set my light to be overly bright initially to compensate the lack if local IBL. So now I have to rebalance my lighting. But I checked in Unreal Engine and there is no such strong brightness difference with and without a local cubemap. Also if my lights were too bright then my tone-mapping and/or bloom would have been way more off as well.

I spent a day on this specific problem, really worried I had screwed up something fundamendal. To investigate more easily what was happening I built the classic "debug pbr balls", and well...

Yeah, something is definitely wrong here. All the balls should be red, and the green one here are fully "metallic". Digging into the code, I isolated the part that was causing issues. Let's take a look at the DFG LUT sampling:

    float DFG_Lod = Params.RoughnessPerceptual;
    vec3 DFG = textureLod( buffer_ibl_dfg, vec2(NoV, DFG_Lod), 0.0 ).rgb;

    vec3 E = mix( DFG.xxx, DFG.yyy, Params.F0 );

Adjusting the code to this:

    vec3 E = mix( 1.0 - DFG.xxx, 1.0 - DFG.yyy, Params.F0 );

Somehow fixed my issue ? Wait, this is weird ! Why do I need to flip the value here ? It's not about the texture being upside / down.

Well it turns out that Filament (the PBR reference that I use) stores its DFG terms into a LUT texture but with value swapped between Red and Green. I'm not sure why, especially given that in their documentation they don't do that.

Okay, but that doesn't explain why inverting the values fixed the problem ?
I actually just got lucky, inverting the values gave me gradients that kinda matched the other channel of the LUT respectively. Once I clarified all of that I removed my weird code flip and properly swaped the channels instead. My pbr balls started to look way better:

(Don't mind the green ball, it's just here to ensure I don't mistake which side is which.)

Once that was fixed, I played a with the lighting to see how things were going on:

(Metallic surfaces are looking really good now !)

(Fancy Sponza !)

What are the next steps ?

I need to figure out how to properly apply the IBL. It should be considered as an ambient light but my engine doesn't have this notion at all. Given I can render a mesh several times because of my light system (with the shadow volumes) I need to make sure I don't render the IBL twice or more.
Parralax corrected cubemap is probably the next step I will look into as well. It seems relatively simple and should improve quite a bit the reflections.

(May 20)

Parallax correction didn't take long and was easy to hack into the current rendering code.
I went with hardcoded values for now in the shader until I properly handle cubemaps on the editor side.

(Result is already pretty cool !)

(Other examples where I modified the floor glossiness in the shader.)

I also been thinking about how I should rework the way I render my objects in order to properly incorporate the IBL lighting.

Right now I have several "main" passes:

Emissive
Lights casting shadows
Lights not casting shadows
Transparency

The reason why it is split like so is because shadow volumes write to the stencil mask, therefore I cannot render several lights casting shadows all at once. Because emissive bits are also lighting, they need to ignore shadows as well. I handled this by using specific "material/blend" types to know in which pass an object needs to rendered into.

It works, but it's not pretty. For example to handle objects that get shaded and receives shadows but also emit lights I need a big "if" in the shader with a state I change before the beginning of the pass (to avoid writing twice the wrong information).

Now that I have IBL lighting, the straightforwad solution would be to render every object of the scene into the emissive pass but this time with ibl + emissive bits. That's a possibility, but I don't find it very elegant nor easy to maintain in the future. If only I could render all my objects only once instead...

Wait. I can !

The thing about my shadow volume, is that they are not rendered into the stencil buffer directly. So instead of using the stencil, I could instead store the shadow volume result as a regular texture like... shadow maps.

Since shadow volumes are binary in nature, I only need a way to store them as a binary mask (black or white). So what if I went to store a bit mask into a regular texture and sample it when shading my objects ? Stay tuned. :)

(May 21)

I discussed again about my light shaft method with a colleague and made some nice progress:

Right now I use a simple distance based opacity fade and not a fog function evaluation. So it doesn't have any kind of realistic basis. The geometry aliasing is still noticeable.

Will have to improve it in the future !

(May 22)

Back on cubemap reflections, I decided to get rid of the FDG LUT sampling following a comment from Jonathan Stone.

(Mastodon post)

Here is the function from the MaterialX codebase:

// Rational quadratic fit to Monte Carlo data for GGX directional albedo.
vec3 mx_ggx_dir_albedo_analytic(float NdotV, float alpha, vec3 F0, vec3 F90)
{
    float x = NdotV;
    float y = alpha;
    float x2 = mx_square(x);
    float y2 = mx_square(y);
    vec4 r = vec4(0.1003, 0.9345, 1.0, 1.0) +
             vec4(-0.6303, -2.323, -1.765, 0.2281) * x +
             vec4(9.748, 2.229, 8.263, 15.94) * y +
             vec4(-2.038, -3.748, 11.53, -55.83) * x * y +
             vec4(29.34, 1.424, 28.96, 13.08) * x2 +
             vec4(-8.245, -0.7684, -7.507, 41.26) * y2 +
             vec4(-26.44, 1.436, -36.11, 54.9) * x2 * y +
             vec4(19.99, 0.2913, 15.86, 300.2) * x * y2 +
             vec4(-5.448, 0.6286, 33.37, -285.1) * x2 * y2;
    vec2 AB = clamp(r.xy / r.zw, 0.0, 1.0);
    return F0 * AB.x + F90 * AB.y;
}

So that's one more texture sample I can save up, which is important to me since I cannot use bindless textures. No comparison screenshot here because in practice I didn't see any differences.

(May 29)

I finally did the shadow system refactor.
In the end I went with merging all my mesh/light rendering into a single pass instead of three (emissive, light shadow casting and non casting) and not just putting the IBL lighting into the emissive pass.

So now I render the shadow volume first and store them into a single RGBA8 texture (which gives me 32 masks). Then I sample during the shading of an object (if it's casting). I had to reorganize a lot of code to get to work.

(Several lights casting shadows in Sponza.)

(My debug view mode showing each light shadow mask.)

This big refactoring produced quite a few bugs, one in particular being:

(I don't have any particle system. This is just memory garbage being sampled.)

I also had to rework how I sent my light data on the GPU and when, to avoid some one frame delay.

(Sharp shadows look so good <3)

In the middle of this I also took the opportunity to rework the way I eveluate the bounds of my point lights on screen.

(Perfect and tight sphere screen bounds)

Before I was projecting points from a sphere mesh, that worked but wasn't always accurate leading so some incorrect clipping of the shadows in some cases.

With the help of Jasper (big thanks once again) I got this little code snippet working to do it analytically instead.

June 2024

(June 07)

I'm once again working on cubemaps, this time I wanted to move the cubemap filtering on the GPU to make it faster. The dump to disk followed by the CPU processing turned to be too slow for me. It also opens the door later for runtime generation as well.

At first it didn't go well...

(Is it finally disco time ?)

...but I got it working.

(Pre-filtered cubemap on the GPU)

I also wanted to rework how I handle cubemap and store them into an octahedral texture:

The benefit of storing cubemaps this way are several:

You can store a whole cubemap into a single square texture, instead of several for each faces like I was forced to.
It makes it easier to manage in my pipeline. Notably I can now store all my cubemaps in a single array (since cubemap arrays aren't supported by my framework).

The main issue with octahedral encoding/decoding is the filtering, which can lead to obvious seams:

For now I compensate by clamping early on UVs to make these seams less obvious:

This can be further improved by adding some pixel padding at the texture borders, but I haven't done it yet. Another solution could be custom filtering, but that doesn't seem worth it for the use case here performance wise (especially if I need to sample several cubemaps to blend them together).

Octahedral cubemaps also produce heavy distortions:

I'm not too worried about those artifacts however, they are obvious on very reflective surfaces, but anything more noisy/less glossy should hide it pretty well (I hope).

(June 08)

I added a new post-process to create a slight lens/barrel distortion. Nothing special, it's just to make the final image a bit more fancy:

(June 16)

My recent progress was focused on fixing some aftermath from the shadow volume / bitmask refactoring. For example light masking wasn't working as expected.

(Directional light weren't constrained by their volume anymore.)

(The spot light mesh wasn't scaling properly.)

All back to normal now !

(June 18)

My cubemap workflow isn't finished yet, but I'm a bit tired of it so I decided to move on.

My current thinking is about occlusion culling and scene management. So I started exploring my level editing workflow and looked into TrenchBroom.

(Screenshot of the level editor TrenchBroom.)

I did a bit of mapping (creating levels) for Half-Life a looong time ago and always liked the workflow around brush editing. I always felt this was the best way to create unique areas and be creative, compared to building meshes in say Blender and having to import that into an editor.

Playing around in TrenchBroom is reviving all these memories (...from 15 years ago).

July 2024

(July 10)

Quite a few days have passed. I was mainly looking into how to parse map files from TrenchBroom, to be able to convert them into actual meshes/geometry.

The thing is that brushes in map file are simple volumes defined by the intersection of planes. So you have to compute the intersection to get the corners/points of the convex volume it builds. It didn't go well at first:

(This is supposed to be a cube, not a Picasso.)

However after looking at existing parsers (like this or this) I was able to get it working and output some simple obj files. This gave me some basics to think about the next steps.

One thought I had with that brush workflow is that by default meshes you generate get a faceted look (because each "face" has its own surface normal perpenticular to the plane it was defined by).
So I was wondering if you could do an average of the normals across faces based on a minimum angles. Which made me wonder if existing games did that.

For that I launched The Chronicles of Riddick: Assault on Dark Athena (since it was finally working on my Linux) and wanted to look at some levels of the game. I knew it was made in a similar way to old games like Quake or Half-Life (aka using Binary Space Partitioning or BSP) and see how they handled the geometry on smooth surfaces.

(A small area from the tutorial showing smooth curved walls.)

(Other examples with smooth pipes.)

I also saw that existing modern Quake mapping tool also handle smoothing vertex normals. For example here with the "phong shading" section of the Ericw-tools documentation.

So it's definitely doable, but I have to figure out how (maybe as a post-process via my Blender exporter ?)

(July 14)

With the ability to convert map file into objs, I started to look into how to match what I could do in the editor and what I needed in my scenes.

So I built my own game definition file, which list "entities" and their properties to be displayed in the editor. The goal is to make the map files the source of truth to build the final levels and meshes.

I made a few debug textures and looked into how to display meshes. Principaly to figure out scales. TB works with integer units on power of two (because it heavily focus on Quake based games).

(First test of textures to evaluate the grid size.)

(First test of importing a 3D mesh, here a character to get a scale reference.)

(July 17)

I decided to add a fallback mesh, in case the engine is unable to load one.
It's inspired by the famous "error" mesh in Source Engine (I guess because of nostalgia from using it years ago).

(Source engine error mesh)

I started from a simple font in Blender which turned into a very little amount triangles:

(My own "error" mesh)

Another issue I tackled was regarding my handling my TBNs. I wasn't aware that when using doing the cross product of the Tangent with the Normal vector, you should multiply the result by the handness to get the right Bitangent.

Fortunately that was very easy to fix, I just needed to retrieve the information in Blender and to export it in my mesh files. From there I could retrieve the handness information in the vertex shader and apply it.

I noticed the result immediately: normal maps on my meshes weren't broken anymore !

Here is a quick before/after the fix comparison:

(Notice the before/after of the bottom of the heel.)

(Plenty of areas looking better here.)

(July 21)

While in the background I'm struggling with the TrenchBroom map parsing, I'm playing around and building new icons to be able to quickly toggle features in-engine:

(The extended toolbar in the viewport with various features on/off.)

As usual, I made my icons in Substance Designer because it's very easy to make quick shapes with it:

(July 31)

I continue to struggle with TrenchBroom processing, so I'm goofing around in the editor and got the following images out of it:

Quite moody, I like it.

I finally had a breakthrough however and got the mesh conversion process working:

(Basic cubes processed and imported, but not material assignment yet.)

August

(August 5-8)

A few days later I started to get into material processing, which led to handling missing materials in case a map reference stuff that doesn't exist yet on the engine side. So of course I had to make another Source engine reference wich a nice purple chercker shader:

On the image above I did a quick hack to force the missing shader everywhere. I also plugged some camera based lighting and the SSAO pass to read better objects.
In Sponza this helped quite a bit the reading of the geometry against the checker. The checker opacity also fade based on distance to reduce the Moiré effect.

I then moved on handling the removal of faces from the mesh processing:

Brush faces marked with the "nodraw" texture don't generate triangles during the map processing, making it possible to discard hidden faces. This gives some nice control over the brush to mesh convertion directly from TrenchBroom.

I also faced a few fun bugs, like this one where under some angle on small brushes the generation of the vertex position wasn't properly clipped:

Took me a few hours of reviewing the code processing the meshes but it ended up being an epsilon value being too high, so brushes smaller than that epsilon would end up in this state.

After that I looked at handling the brush coordinates in a better manner, so instead of generating absolute/world space vertex positions, I would offset them based on the brush bounding box center.

This made possible to apply local transformation on meshes in-engine afterward, like so:

(August 9)

I started to focus on map reloading. My goal is to be able to edit some brushes in TrenchBroom and have the result update in-engine, which would make possible to preview things in a better way (notably lights and shadows).

So the first step was to implement a way to switch scenes on the fly after the engine started (before I could only do it at startup):

I got a bit bored with that code, so I fired up the engine and loaded Sponza to make some beauty shots:

(August 10)

In order to support the reloading of maps, I needed to rework the way I manage resources. In particular meshes, once they are loaded in memory they would be re-used. So I added a way to force the reloading from disk, which helped update a map that gets regenerated after the engine started:

In this video I update a map in TrenchBroom then run my Python toolset to generate the meshes. Then in the engine I reselect the map/level to force a reload. This gets meshes updated as well.

With all of that in place, it wasn't really hard to add an automatic reload mode:

Here the engine monitor the current level loaded and trigger the Python toolset by itself when it detects a change (based on the timestamp). Once it is done, the level is reloaded.

Working on this feature was also a good opportunity to look into how a frame is generated and rendered in Love framework. I wanted to limit the update rate and framerate of the engine when it's not in focus, to avoid consuming too much resources.

I had to tweak a few things from the default behavior, but fortunately that was doable in Lua directly. So now the engine lock the framerate to 60 FPS when Vsync is off, or to 5 FPS when it is not in focus.

(August 14)

Another bunch of days passed and I got Open Dynamics Engine (ODE) working in Ombre to simulate physics. I built a little library/interface to be able to init the physics system and then simulate a basic sphere:

I went with ODE because it has a native C interface I could quickly hook in from Lua (via FFI). Other physics engine don't really have one by default or would require building one.

Getting started wasn't too hard, but it led to a lot of open questions regarding how I want to handle the physics data in the engine. I will have to figure out all that stuff.

I also didn't expect to integrate a physics engine that quickly. I always feared that compiling the dynamic library and then building the interface would take a while. Once again I was wrong.

(August 16)

It's not a true physics engine integration without a ball canon:

Current the level geometry itself doesn't exist in the physics world, so I'm using some hardcoded plane/walls as the default space.

Going "brrr" with the amount of simulated objects doesn't lead to great performance unfortunately:

(Simulating 1000 balls leads to 14 FPS while in general I should be at 144... not great.)

(August 17)

I continued to dig into ODE to figure out the performance issues.
From my understand (and the manual), it seems ODE expects/requires the user to build its own layer of optimization/grouping to discard evaluation of objects that will never touch. Automatic sleeping of inactive objects seems also slow to triggers with some shapes.

So, I'm clearly not happy with ODE. It requires too much work that I don't want to deal with.

(August 18)

So, this time I decided to ditch ODE in favor of Jolt Physics. I didn't go this way at first because it requires using second hand C wrapper to use this engine (which is C++ based).

Took me even less time to get started once I got the library compiled and so I tested again that 1000 balls test... which Jolt also struggled with. I guess that's not a good/realistic test then.

Well, actually...

Jolt is much better than ODE. It's just that I was building in debug. Silly me.
Its defaults are actually really great. I barely had to tweak anything to get it working and get good performance, and that's everything I was asking for.

Then I tried out cubes:

The fun part regarding the screen above ? The slowest part of the engine here is the processing of all the meshes (3000 cubes) for the shadow volume generation, before they actually get rendered. Turns out that tons of cubes isn't that slow to render.

There is a good opportunity to improve performance in the future. By batching better that processing for example, or caching the mesh of the shadow volume since right now it gets regnerated each frame.

I could also reduce the light radius of course.

While looking around for other stuff I also stumbled on a trick I had missed from Moving Frostbite to Physically Based Rendering (page 78):

float ComputeSpecOcclusion( float NdotV , float AO, float roughness )
{
    return Saturate(
        pow( NdotV + AO, exp2( -16.0 * roughness - 1.0 ) ) - 1.0 + AO
    );
}

That little function can be used to adjust the IBL reflections from the AO (in my case via the SSAO). It helps ground objects a lot better and reduce a bit light leaking from the cubemaps:

(Before vs After applying the function.)

Because my SSAO upscale is bad currently, it makes halos on edges a lot more noticeable. I hope I will be able to fix it some day.

(August 19)

While working with cubes yesterday I noticed that sometimes they would flip in very abrupt/unstable ways:

(You can see the weird flips when the cubes slow down.)

Turns out that converting Quaternions to Euler angles to feed that to the mesh being rendered afterward isn't such a great idea. I got myself a nice gimbal lock right here.

I found a way to quickly hack my transform update code to insert the Quaternion rotation instead. I will have to figure out a proper ways to manage multiple sources to define rotations.

(August 24)

I started working on a new scene, with the intent of finishing the work around cubemaps. I want to properly support blending several cubemaps in a same location.

I'm using simple brushes in TrenchBroom to define the parralax correction volume for the cubemap. Making it a separate brush like so makes it easy to share it across several cubemaps, with helps align/sync the reflections.

Working on the cubemap workflow again is also the occasion to think about local proxy reflections like those used in Remember Me (basically bilboards rendered in a mirror way).

The part that annoys me with that approach is the use of a separate buffer. In the GPU Pro book, which has more details, they mention they used a simple gaussian blur to approximate the roughness convolution at each mip level.

I wonder if I could render the proxies into a separate buffer, blur them, then render them over the cubemap. This way I could keep the proper convolution computation from the cubemaps and just add the proxies when there are some.

However that opens a can of worm about how to blend the proxies properly when they become very blurry however (since their shape change, colors should mix with the background, etc.).

(August 28)

I exposed a few more information from TrenchBroom and decided to separate the influence radius of a cubemap from the parralax correction bounds:

The reason for this is that I noticed in Unreal Engine 4 they are mixed together, the bounds serve both as influence weight and parallax correction, which can create blending issues or misalignment if you want to avoid fading too early:

In the image above the outer bounds overlap the walls, while the inner bounds match the floor/wall limit. This make the fading of the reflection happen within the walls, but above the floor. Because of this, the floor reflection is dimmed and slightly misaligned.

For Unreal this is not a big deal, since the goal is to use them as fallback from Screen Space Reflections (SSR). In my case however I need them to be a lot more accurate since they are my primary source.

I then tried to blend several cubemaps together based on their influence radius. I wanted to use the influence weighting form Lagarde at first, but I didn't figure it out and got shiny looking bugs instead, such as:

So I went with a simple additive blending for now (using the influence distance as a mask multiplied over the cubemap). It's not perfect but it's a good enough starting point.

Once I had alignment and blending working, I added the proper capture and loading of each cubemap in the scene (but requires an engine restart right now).

All of this started giving me nice looking results:

(Another example with more material roughness variations)

And to get in the mood:

(August 29)

I requested a little update of the Love framework to make it possible to easily reload cubemaps. Since they are stored in a Texture Array on the GPU (each slice being an octahedral 2D texture), I needed a way to replace a specific slice on the fly without having to re-upload everything.

Once that reload was working, I tried out some ways to improve the quality. My octahedral textures are only 512x512 pixels, which can produce low quality details and noticeable distortion. So one idea I had was to apply some blue noise in the cubemap sampling to hide artifacts:

I'm not 100% sold on this idea for now. Making the noise move actually make the artfiacts visible again (because of visual persistency) so a strong noise is needed to compensate, which modifies/blurs too much the original reflection details.

So another idea I got was changing the anisotropy filtering, which so far I never really played with. It made a slight difference without any real impact on the rendering time. So I decied to have it on all the time for cubemaps (at x4 level).

To finish, here are some additional beauty shots:

(August 30)

Today I decided to rework how I do debug display of cubemaps in the editor, as I wanted a better way to preview the roughness/glossiness range. So I went with a simple checker:

I also changed a bit how I render some of the editor visuals because until now I didn't had any kind of sorting. With proper depth testing it behaves a lot better:

Since I was working on the editor rendering, I made the icons scale based on the camera distance to avoid them from filling the screen when too close:

(August 31)

Another detail I wanted to fix for good was the octahedral seam issue when switching mip levels. I didn't have yet a one pixel padding margin, this is now the case:

The 1 pixel margin works fine overall, until the last mip level for some reason. I ended up adding a simple hack to clamp/scale the UV coordinates when on the last mip to compensate. It's not perfect but heh, good enough.

The looping video just above shows however that my way of doing importance sampling isn't perfect as spots of bright lights are still quite noticeable. I'm already throwing quite a lot of samples at it, so I will have to figure something out about this another time.

Anyway once the border worked, results got pretty good with no seams noticeable:

Setember 2024

(September 1)

Switching gears and looking again at my debug view for my shadow volumes.

I wanted to improve a bit the debug information shown and also make it a bit smaller to fit better with the editor UI.

At first I wanted to draw the light index in the shader, but I couldn't find something that worked well. So I went with regular text printing. Not optimal but enough for simple debug purpose.

(September 2-4)

And back on cubemaps !
I wanted to cleanup and rework the blending that I left over. It didn't go well on that new try, but it sure made great visual bugs:

Since I was struggling with that bit of code, I went back to the Lagarde article and even fired up RenderMonkey (in wine) to try out the sample file:

Now I need to sit down and digest it properly.

(September 6-7)

On the Graphic Programming discord server William VII kindly shared their own shadertoy implementation, which helped me clarify a few things.

I decided to try it out:

It didn't work on the first try, but it got me close. I got weird noise because I was reading memory where I shouldn't... Oops.

But a few tweaks later, it got me this:

From there I made a quick video to showcase it in movement:

In bonus here are some close-up shots of metallic surfaces:

(September 13)

With cubemap blending now behind me I decided to look into adding a character in my scene.
My goal is to be able to walk and look around. This means I need to expose more elements from the physics system (aka Jolt) and handle game inputs properly for this.

Getting a character to walk wasn't too hard since Jolt provide everything needed for that already. It's not fully handled automatically by the engine like basic objects however. Fortunately there are very well made samples to look into.

Working on characters is also a good opportunity to define scales and the size of my player/character. For this I added some debug drawing to match better the scene:

Next I made an editor/game toggle to switch between different camera modes:

Finally walking around ! :)

(September 15)

I hit a snag.
My next idea was to implement trigger volumes, with the objective of using one to start the opening of a door. Simple on paper, not so much in term of implementation.

See, Jolt event callback functions that fire up when objects collide/touch each other can be made from other threads than the main one. LuaJit doesn't like that at all and will crash with a nice "panic" message:

One solution is to wrap those events and find a way to store them into an array, which I could access later.

But I have no idea how to do that.

(Me poking the panic error to make it go away)

(September 21)

That took a while.
I had to learn and figure out how to wrap these functions. I was able to write a little wrapper static library to solve my problem.

It works only on Linux for now, but shouldn't be too hard to make it available for Windows as well.

(September 26)

I cleaned up my wrapper a bit and used it to track objects that were entering trigger volumes:

614     Trigger add:    cdata<const struct JPH_Body *>: 0x561db24ff1c0
639     Trigger remove: cdata<struct JPH_Body *>: 0x561db24ff1c0

(September 30)

I cross compiled my wrapper to Windows from Linux with the help of MinGW and tried it out, I didn't get any issues. Very happy to not have to deal with any kind of Windows toolchain/compiler for this.

I also took the opportunity to post my wrapper on Github for those who are interrested about it.

October 2024

(October 1)

New month means new philosophical questions.
Now that I'm starting to handle more and more physics states, I'm starting to rethink my engine architecture and the related data.

I tried making up a little diagram in Miro to visualize the dependencies:

Especially regarding the simulated objects update loop. It's not clear to me yet who should own the state of things and how they are linked together. So far all physics based objects are in a separate list owned by the Physics system, so I have to sync them manually afterward.

I never had to handle object IDs, but if I wanna keep these as separate list I would need to keep them in sync. It makes the update loop a bit tricky. I could move the data into the mesh object itself, outside of the physics system/module, but it's more stuff I have to track/cleanup afterward.

Lots of questions as well regarding transform update coming from the editor side.

I need to think more about all of this.

(October 5)

More thinking about how to handle the physics data. I updated my diagram:

I'm also thinking of not bothering with building convex volumes for the world collisions and instead feed the indexed triangle data directly. I'm just not sure how reliable this will be for Jolt compared to a convex/closed volume.

(October 6)

Another day, another diagram. They are suprisingly very effective to help me think and build a hierarchy of steps to see how elements depend on each other.

In this case I have been wondering at which point in the mesh creation process should I handle collision creation.

The way it works (or that I'm thinking of) is to only have to call MESH.NewMesh( Path, IsSimulated ) on the game code side. The engine handle the rest for you.

Collisions are then either retrieved from the mesh file (GetChunkFromConfig() loads a json file with a few info, collision included) or generated on the fly from the mesh data itself (triangles).

This will likely evolve in the future, notably for the physics based objects that will want to use constraints/hinges, etc. I'm not there yet however.

(October 8)

Yesterday I noticed a little issue with static objects. When they are moved around they don't wake up dynamic object. That's kinda expected, but when editing and testing stuff from the editor this is a bit annoying.

So I ended up hooking one of the raycast function from Jolt to gather the object around the one I'm moving and waking them up.

That make editor interactions so much better. :)

(October 14)

Not much to say today, I'm mainly refactoring the physics code. It was starting to be a mess after all my experimentations, and now I have a good idea of how I want to manage all of this.

(October 19)

I'm back to the design board, this time to think about how to handle entities, events and how to update them. The context here is still how to trigger a door and update it.

The solution seems to be Lua coroutines, mostly to handle some kind of states.

Quick explanation of the colors:

Green is for modules/singletons
Light green for functions in those modules
Dark Red for objects in the level
Light red for functions in those object classes

(October 21)

It's alive !

I finally have a simple door entity, which drives a specific mesh automatically. Currently I use a dedicated keyboard shortcut to open/close the door (And while the door is moving the shortcut doesn't do anything). Next step will be to handle that with a trigger volume.

As expected, this was possible with the help of coroutines.

I also freaking love how simple it is to setup:

    DoorMesh = MESH.NewMesh( "engine/mesh/base/cube.chunk", "kinematic" )
    DoorMesh:SetScale( 1.0, 1.0, 0.125 )
    SceneWorld:AddMeshStatic( DoorMesh )

    SceneDoor = DOOR.New()
    SceneDoor:SetPosition( 0.0, 1.0, 3.25 )
    SceneDoor:SetRotation( 0.0, 0.0, 0.0 )
    SceneDoor:LinkMesh( DoorMesh )

[...]

    if INPUT.IsInputPressed( "a" ) then
        SceneDoor:Activate()
    end

The entity manager automatically handles the update of the door while its coroutine is still alive.

I noticed however the physics on the door wasn't properly working, so I did a quick fix:

(October 22)

The door physics broke. Again. Somehow.

(October 24)

Factorio's DLC came out.

(October 25)

Finally !
A door that opens when you walk to it thanks to a trigger volume, and closes when you leave the volume.

Ho yeah, and physics work as expected of course:

You may notice a slight shadow bug in the videos above, which can be easily explained as follow:

On this little drawing of mine, you can see a point light (red dot and circle) and a wall (black line in the center). The light cast shadows on the wall, which generates long triangles that go toward the light bounds.

Because the wall is a simple quad, it lacks vertices in its middle that would make it fit properly the light bounds when projected. This create a hole where there is no shadows.

The reason why it works like this is because I avoid projecting the shadow volume silhouette triangles too far to reduce as much as possible the amount of pixels to raster. But on simple meshes like here it create issues.

There are various solutions I could try, such as:

Subdividing the mesh in my modeling program/level editor (a bit tedious to manage)
Subdividing the backface of the mesh to better fit the light volume (at the expense of more processing time in the compute pass)
Projecting the mesh silhouette way outside the light bounds (at the cost of pixel raster)

Each solution has its pros and cons.

So far I decided to ignore it, I prefer to not try to solve an issue that may never appear in practice in more real scenarios and/or better crafted test levels. Wait and see.

(October 26)

I'm exposing a few more entities in TrenchBroom to drive more effects. I'm currently adding the fog settings. Which means I need fluffy icons. :D

(October 28)

This morning I wanted to verify if the fog settings and a few other light types were properly loading, so I edited my test level:

While testing that level I stumbled on another shadow bug, this time with the directional light. For some reason when I'm under an object the shadow flips (the bright area becomes dark).

I started digging into the code, but not being able to see the shadow volume mesh made things difficult. Since the shadow refactor I had lost the ability to view the wireframe of the mesh. I decided to bring it back.

I ended-up having to copy the mesh data into a secondary buffer to be able to draw it via the editor debug rendering. Once that was done, I saw this:

Something is clearly wrong.

If this is not clear to you, notice how the top of the meshes (the cube and the floating platform) don't have polygons at their top, only at their bottom. The floor doesn't even have geometry/shadow as well !

I quickly figured out the issue. For some reason when I developped the directional light, I used the direction vector improperly and I ended-up flipping the projection distance to compensate.

So what happens is that the faces looking away from the light get considered as the facing triangles and the silhouette is built from them... and then moved away from the light.

Then if you end-up in the shadow this creates the wrong values in the shadow volume counter and flips the shadow. The fix was pretty easy and everything got back to normal.

I could even explain this bit from my code:

    // No sure why we need the negative flip here
    // but it fixes the object shadows in "test_lights"
    Distance = -1.0 * Distance

Past me has no clue. Present me now knows.

(October 29)

A few days ago I noticed some performance issues with my player/character in the scene. With the character code not even running Jolt would take 0.8ms to update it. That worried me quite a bit.

I started to dig around, even tried adding a few more characters to see how it would scale up:

It was 2am so I decided to go to sleep without a solution.

The same day after waking up I realized I got too tired to notice that I was using the Debug version of the Jolt library. Switching to the Release version made performance a lot better (went down to 0.15ms).

As a result, I made it clear in the UI which version I'm loading to avoid this kind of suprise in the future.

(October 30)

What a better way to finish the month than to notice that my cubemap blending solution isn't properly working ?

(Me trying to look away from my problems.)

I noticed that when several cubemaps overlap it creates a very visible line/seams at the edge of the influence radius. Below is a screenshot where I boosted the contrast to make the issue obvious:

(See the two sharp lines here.)

I switched to simple colors to debug and it became even more obvious:

At first I thought my code was the issue, but I went back to the shadertoy I used as a basis and it displayed a very similar problem:

(Notice the sharp lines in the transition of where the three circles overlap)

So is the shadertoy code here wrong ? To be sure I went back to Lagarde demo files in RenderMonkey (yeah, for serious this time) and the issue is present there as well:

So for now my guess is that the method as flaws. I will have to think about another way to handle the blending. I can also live with it, because this is the kind of artifacts that nobody will notice in practice with roughness and normal varaitions in the materials.

Conclusion

That's it !
I'm stopping here for this time. Thank you for taking the time to read all of this (if you did), I hope it was entertaining.

See you next year !

(Slurp!)

If you want to follow the day to day progress of my work, you can check out:

My threads on Mastodon: thread 1 & thread 2

My thread on Bluesky

My dev-blog on the Graphic Programming discord server

The wip channel on my own discord server