Tuesday, July 18, 2017

Adaptive Mixed-Resolution Particles

In the previous post I introduced the first version of Adaptive Offscreen Particles (AOP) and mentioned a number of its drawbacks. In the new version of AOP, described below, I tried to get rid of those drawbacks.


The off-screen particles chapter of GPU Gems 3 discusses mixed-resolution rendering. This approach suits effects with low-resolution textures, such as smoke. But what if we want to keep details such as sparks, fire, and stones, and at the same time profit from mixed-resolution rendering? The answer is a color contrast filter combined with overdraw prediction.


Overdraw prediction works the same way as described in the post below. Its result can be used to disable the optimization when the overhead is greater than the gain.

I colored the buildings green to hide content details.

Rendering particles at lower resolution

Offscreen particles can be rendered at 1/4, 1/16, or an even smaller fraction of the screen size.
The scene color buffer and the offscreen buffers should have the same bit depth to ensure similar results after alpha blending. I would recommend the A16B16G16R16F format.

Figure 1. Offscreen particles accumulated in a separate render target.

Depth preparation

For correct offscreen particle rendering we have to downscale the original depth buffer, keeping the maximum depth of each block of pixels.

Figure 2. Depth downsampling stages.
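One max-depth downsampling stage can be modeled on the CPU as below. This is a sketch, not the shader used here: it assumes a row-major float depth array with even dimensions and reduces each 2x2 block to its maximum, so two stages give the 1/16 size. The name `downsample_depth_max` is hypothetical.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// One depth downsampling stage on the CPU: every output texel keeps the
// maximum of the 2x2 block of input depths it covers.
// Assumes a row-major buffer with even width and height.
std::vector<float> downsample_depth_max(const std::vector<float>& depth,
                                        std::size_t width, std::size_t height) {
    assert(width % 2 == 0 && height % 2 == 0);
    assert(depth.size() == width * height);
    const std::size_t out_w = width / 2;
    const std::size_t out_h = height / 2;
    std::vector<float> out(out_w * out_h);
    for (std::size_t y = 0; y < out_h; ++y) {
        for (std::size_t x = 0; x < out_w; ++x) {
            const std::size_t top = (2 * y) * width + 2 * x;  // top-left texel of the block
            const std::size_t bottom = top + width;           // texel directly below it
            out[y * out_w + x] = std::max({depth[top], depth[top + 1],
                                           depth[bottom], depth[bottom + 1]});
        }
    }
    return out;
}
```

Calling it twice on a 4x4 buffer leaves a single texel holding the largest depth, which is the farthest one under the conventional range where larger values are farther.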

Rendering full-resolution detail particles and fixing edges

Of course, in some places the level of detail of the offscreen particles does not suit us. These places are found with the color contrast filter and the edge detection filter, and the offscreen particles there are finally replaced by full-resolution particles.

Building the stencil mask

For performance reasons, a stencil mask is used to separate the final application of the offscreen particles from the full-resolution particle pass.

Color contrast filter

As its name suggests, this filter searches for color contrast. If the contrast is high enough, full-resolution detail is required. A nice property of this filter is that it depends on what is actually on the screen: when the camera is close to the particles, fewer details are found, which leads to better performance.

Figure 3. Result of the color contrast filter.

A - max component value of the center color
B - max component value of the neighbor colors
C - contrast, 15 by default
should_be_detailed = abs( B - A ) > A / C;
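The filter rule above can be written as a small CPU function. It is a sketch under two assumptions the text does not state: B is taken as the largest max-component among the four direct neighbors, and color components are non-negative. With C = 15, a pixel needs detail once B differs from A by more than about 6.7% of A.

```cpp
#include <algorithm>
#include <array>
#include <cassert>
#include <cmath>

struct Color { float r, g, b; };

// A and B in the text: the largest RGB component of a color.
float max_component(const Color& c) { return std::max({c.r, c.g, c.b}); }

// True when the pixel needs full-resolution detail:
// abs(B - A) > A / C, with C = 15 by default.
// Assumption: B is the largest max_component among the four direct
// neighbors, and color components are non-negative.
bool should_be_detailed(const Color& center, const std::array<Color, 4>& neighbors,
                        float contrast = 15.0f) {
    assert(contrast > 0.0f);
    const float a = max_component(center);
    float b = 0.0f;
    for (const Color& n : neighbors) b = std::max(b, max_component(n));
    return std::fabs(b - a) > a / contrast;
}
```

A smooth smoke region with neighbors within a couple of percent of the center is left to the offscreen buffer, while a bright spark next to it is flagged for detail.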

Edge detection filter

In addition, an edge detection filter is used to cover edges. My filter is based on depth discontinuities.

Figure 4. Result of the edge detection filter.
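The exact rule of the edge filter is not given, so the following is only a generic depth-discontinuity test, not the filter used here: a pixel counts as an edge when any of its four neighbors differs in depth by more than a threshold. The function name and the 0.01 threshold (for depths in [0, 1]) are hypothetical.

```cpp
#include <array>
#include <cassert>
#include <cmath>

// Generic depth-discontinuity test (not the article's exact filter): the pixel
// is an edge when any of its four neighbors differs in depth by more than
// `threshold`. The 0.01 default assumes depths in [0, 1] and is hypothetical.
bool is_depth_edge(float center_depth, const std::array<float, 4>& neighbor_depths,
                   float threshold = 0.01f) {
    assert(threshold > 0.0f);
    for (float d : neighbor_depths) {
        if (std::fabs(d - center_depth) > threshold) return true;
    }
    return false;
}
```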

Figure 5. Stencil mask for combining.
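The two filter results can be merged into a per-pixel mask like the one in Figure 5. This sketch assumes one stencil value per pixel and uses hypothetical values, 0 for "apply upsampled offscreen particles" and 1 for "render full-resolution detail particles"; the actual stencil values are not stated in the text.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical stencil values; the article does not state the real ones.
constexpr std::uint8_t kApplyOffscreen = 0;  // blend upsampled offscreen particles here
constexpr std::uint8_t kRenderDetail = 1;    // render full-resolution particles here

// A pixel needs detail when either the color contrast filter or the
// edge detection filter flagged it.
std::vector<std::uint8_t> build_stencil_mask(const std::vector<bool>& contrast_hits,
                                             const std::vector<bool>& edge_hits) {
    assert(contrast_hits.size() == edge_hits.size());
    std::vector<std::uint8_t> mask(contrast_hits.size(), kApplyOffscreen);
    for (std::size_t i = 0; i < mask.size(); ++i) {
        if (contrast_hits[i] || edge_hits[i]) mask[i] = kRenderDetail;
    }
    return mask;
}
```

On the GPU the same decision would be written into the stencil buffer, and the two following passes would each test for their own value.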


Depending on the values in the stencil mask, the offscreen particles are blended and the detail particles are rendered into the scene buffer. I described how to apply offscreen particles to the scene color buffer in the previous article.

Figure 6. Applying low resolution offscreen particles into scene color buffer according to stencil mask.

Figure 7. Rendering detail particles into scene color buffer according to stencil mask.

 Figure 8. Final result.

As a result, we kept the details and got a great performance boost: in this particular example, particles rendered 2-3 times faster with the optimization than without it.

(article is not finished and will be updated)

Tuesday, April 11, 2017

Adaptive Offscreen Particles

Particle systems are known for their sometimes huge overdraw and fill rate, which lead to performance issues. This can become critical when the camera is inside an explosion: lots of particles cause huge overdraw and a dramatic drop in fps, and in the pleasure of gaming. The problem can be partly solved by using a lower, fixed-resolution offscreen render target, which is perfectly described in GPU Gems 3.

Conventional offscreen particle solutions still cannot cope with a continuously growing number of overdrawn pixels, and their quality is constant because the render target has a fixed size. Therefore that solution didn't suit me.

I wanted a particle system that could predict the expected amount of overdraw within the current frame and scale its quality down when necessary, according to the system's budget. I found an elegant solution to the problems mentioned above, which I called Adaptive Offscreen Particles. The technique works on the GPU only; no readback to the CPU is required. The key to this technique is overdraw prediction.

Overdraw prediction

How do we know the particles' overdraw on the current frame? Just render them. This requires a special render stage in which all particles are rendered into a small N*N render target (N = 64, for example) with additive blending, writing a weight that represents the relative execution complexity of the particle's shader. By default the written weight can be 1.0 for all shaders. The depth buffer can also be used at this stage to avoid counting invisible pixels. This gives a rough but acceptable result. At the end we calculate the sum of all written weights, which can also be treated as the number of overdrawn pixels. This sum can be computed in several ways. One is a compute shader that simply sums the weights into a global shared variable. Another is to downsample a few times, summing each group of four neighbors, until a 1x1 result is obtained.

Figure 1. Weights accumulated by particles in a 64x64 render target. Intensity was reduced for clarity.

Figure 2. Leftmost - source 64x64. Next - downsampled with summation. Rightmost - the final sum in a 1x1 render target.

For convenience we will operate on the number of redrawn screens, which closely approximates the actual number of full screens redrawn in the color pass. For instance, a usual explosion in a game can redraw up to hundreds of screens.
number of redrawn screens = overdrawn pixels / ( N * N )
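The downsample-with-summation variant and the conversion to redrawn screens can be sketched on the CPU as follows, assuming N is a power of two and a row-major weight buffer. The function names are hypothetical; on the GPU each loop iteration corresponds to one downsampling pass.

```cpp
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

// Sums an n x n weight buffer by repeated 2x2 summation until a single value
// is left, like the downsample chain in Figure 2. Assumes n is a power of two
// and a row-major layout.
float sum_weights_by_downsampling(std::vector<float> weights, std::size_t n) {
    assert(n > 0 && (n & (n - 1)) == 0);
    assert(weights.size() == n * n);
    while (n > 1) {
        const std::size_t half = n / 2;
        std::vector<float> next(half * half);
        for (std::size_t y = 0; y < half; ++y) {
            for (std::size_t x = 0; x < half; ++x) {
                const std::size_t top = (2 * y) * n + 2 * x;
                const std::size_t bottom = top + n;
                next[y * half + x] = weights[top] + weights[top + 1] +
                                     weights[bottom] + weights[bottom + 1];
            }
        }
        weights = std::move(next);
        n = half;
    }
    return weights[0];
}

// number of redrawn screens = overdrawn pixels / (N * N)
float redrawn_screens(float overdrawn_pixels, std::size_t n) {
    return overdrawn_pixels / static_cast<float>(n * n);
}
```

For example, a 64x64 buffer in which every texel received three weights of 1.0 sums to 12288, i.e. three redrawn screens.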

Particles rendering

The prediction is finished. Now the particles are rendered into the color render target with alpha blending. The color render target and the scene depth buffer keep their fixed full-screen size: instead of changing their dimensions, as other approaches do, only a virtual rect is changed. This is done for the depth buffer (Figure 4), so that particles get correct depth testing, and for the particles, which are moved into the top-left corner of the screen (Figure 5). To do this shifting, we need a scale factor that keeps particle performance within the set budget.

Calculating rect scaling factor

// budget - Budget of the particle system. Screens allowed to be overdrawn without scaling.
// redrawn - Runtime calculated number of redrawn screens.
// min_scale - 0.25 usually. 16 times smaller rect
// floor helps to avoid frequent resolution changing
scale = clamp( 1.0 / sqrt( floor( redrawn ) / budget ), min_scale, 1.0 );

Figure 3. Graph of scale factor function for the four redrawn screens budget.
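The scale formula can be checked with a small C++ function. The reason for the square root: the rect area, and with it the fill cost, shrinks with scale squared, so redrawn * scale^2 stays close to the budget until min_scale is reached. The early return for a redrawn count within budget is an addition that also avoids 1 / sqrt(0) when floor(redrawn) is 0; `rect_scale` is a hypothetical name.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>

// scale = clamp(1 / sqrt(floor(redrawn) / budget), min_scale, 1.0)
float rect_scale(float redrawn, float budget, float min_scale = 0.25f) {
    assert(budget > 0.0f && min_scale > 0.0f && min_scale <= 1.0f);
    const float screens = std::floor(redrawn);  // floor avoids frequent resolution changes
    if (screens <= budget) return 1.0f;         // within budget; also skips 1 / sqrt(0)
    const float scale = 1.0f / std::sqrt(screens / budget);
    return std::min(std::max(scale, min_scale), 1.0f);
}
```

With a budget of 4, 16 redrawn screens give 0.5 and 64 give 0.25; anything above 64 is clamped to min_scale.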

Depth buffer runtime scaling pixel shader code

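// discard texels outside the actual rect
if ( any( screen_uv > scale ) ) discard;

// map the rect uv back to the full scene depth buffer,
// read N depths around that point (offsets[] holds the N texel offsets) and keep the maximum
float2 source_uv = screen_uv / scale;
float depth = 0.0f;
for ( int i = 0; i < N; ++i )
 depth = max( depth, read_depth( source_uv + offsets[ i ] ) );
return depth;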

Figure 4. Scaling the scene depth buffer into the particles depth buffer according to the scale factor. Left - original scene depth buffer. Right - particles depth buffer with the depth information copied into the actual rect.
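A CPU model of the depth scaling pass may make the uv mapping easier to follow. It is a sketch assuming a row-major buffer, non-negative depths where larger means farther, and any scale in (0, 1]; instead of a fixed set of N taps it takes the maximum over the whole block of scene texels that each rect texel covers. `scale_depth_into_rect` and `clear_depth` are hypothetical.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Fills the top-left rect of a full-size particles depth buffer with
// max-downscaled scene depth. Texels outside the rect (the shader's discard)
// keep `clear_depth`.
std::vector<float> scale_depth_into_rect(const std::vector<float>& scene,
                                         std::size_t w, std::size_t h,
                                         float scale, float clear_depth = 1.0f) {
    assert(scene.size() == w * h && scale > 0.0f && scale <= 1.0f);
    std::vector<float> out(w * h, clear_depth);
    for (std::size_t y = 0; y < h; ++y) {
        for (std::size_t x = 0; x < w; ++x) {
            // same test as the shader: the texel-center uv must lie inside the rect
            if ((x + 0.5f) / w > scale || (y + 0.5f) / h > scale) continue;
            // block of scene texels covered by this rect texel
            const auto x0 = static_cast<std::size_t>(std::floor(x / scale));
            const auto y0 = static_cast<std::size_t>(std::floor(y / scale));
            const auto x1 = std::min(w, static_cast<std::size_t>(std::ceil((x + 1) / scale)));
            const auto y1 = std::min(h, static_cast<std::size_t>(std::ceil((y + 1) / scale)));
            float depth = 0.0f;  // valid start value for non-negative depths
            for (std::size_t sy = y0; sy < y1; ++sy)
                for (std::size_t sx = x0; sx < x1; ++sx)
                    depth = std::max(depth, scene[sy * w + sx]);
            out[y * w + x] = depth;
        }
    }
    return out;
}
```

With scale 1.0 the buffer is copied unchanged; with scale 0.5 only the top-left quarter is written.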

The positioning of the particles

// p - final output projection coordinates of the particle's vertex
// scale = 1.0 is full screen resolution
// scale = 0.5 is half screen resolution (per axis)
// scale = 0.25 is quarter screen resolution (per axis)
// multiplying the offset by p.w keeps it constant in NDC after the perspective divide

p.x = p.x * scale - ( 1.0 - scale ) * p.w;
p.y = p.y * scale + ( 1.0 - scale ) * p.w;

Figure 5. Scaling a particle’s quad into left top corner by the vertex shader.
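The vertex transform can be verified numerically. The function below applies the same two lines in C++ and then divides by w to get normalized device coordinates; the struct and function names are hypothetical.

```cpp
#include <cassert>

struct ClipPos { float x, y, z, w; };

// Same math as the vertex shader lines: shrink the clip-space position toward
// the top-left corner. Multiplying the shift by p.w turns it into a constant
// offset in NDC after the perspective divide.
ClipPos scale_to_top_left(ClipPos p, float scale) {
    assert(scale > 0.0f && scale <= 1.0f);
    p.x = p.x * scale - (1.0f - scale) * p.w;
    p.y = p.y * scale + (1.0f - scale) * p.w;
    return p;
}

// Normalized device coordinates after the perspective divide.
float ndc_x(const ClipPos& p) { return p.x / p.w; }
float ndc_y(const ClipPos& p) { return p.y / p.w; }
```

With scale 0.5, the NDC corners (-1, 1) and (1, -1) land at (-1, 1) and (0, 0), so the quad fills the top-left quarter of the screen, the same rect the depth buffer was scaled into.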

Blending state for particles shader

Blend Operation = Additive
Source Blend Color = One
Destination Blend Color = Inverted Source Alpha
Blend Operation Alpha = Additive
Source Blend Alpha = Inverted Destination Alpha
Destination Blend Alpha = One
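The particle blend state above can be written out as arithmetic. This sketch assumes the particle shader outputs premultiplied color, which is what Source Blend Color = One implies for standard over-blending; the names are hypothetical. The alpha channel accumulates coverage as 1 - (1 - a1)(1 - a2)..., which the upsampling pass can then use as the source alpha.

```cpp
#include <cassert>

struct Rgba { float r, g, b, a; };

// Particle blend state as arithmetic, assuming premultiplied-alpha output:
//   color = src * One + dst * (1 - src.a)
//   alpha = src.a * (1 - dst.a) + dst.a * One
Rgba blend_particle(const Rgba& src, const Rgba& dst) {
    assert(src.a >= 0.0f && src.a <= 1.0f);
    const float inv_src_a = 1.0f - src.a;
    return {src.r + dst.r * inv_src_a,
            src.g + dst.g * inv_src_a,
            src.b + dst.b * inv_src_a,
            src.a * (1.0f - dst.a) + dst.a};
}
```

Blending two fragments with alpha 0.5 into a cleared target leaves alpha 0.75, i.e. 1 - 0.5 * 0.5.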

Bilateral Upsampling

The final stage is bilateral upsampling of the particles color buffer with special blending.

Figure 6. Left - buffer of accumulated particles within the actual rect. Right - upsampled result blended with the scene color.

Figure 7. Comparison of upsampling without and with the bilateral filter. Left - particles rendered with scale factor 0.25. Right - bilateral upsampling result.

Bilateral filter shader

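// full resolution: nothing to upsample
if ( scale == 1.0f )
 return read_particles_color( screen_uv );

float fine_scene_depth = read_scene_depth( screen_uv );
float4 result = 0.0f;
float weights = 0.0f;

// wider kernel for stronger downscaling
int radius = scale >= 0.5f ? 1 : 2;

for ( int x = -radius; x <= radius; ++x )
{
 for ( int y = -radius; y <= radius; ++y )
 {
  // map the full-screen uv into the scaled rect, then step by whole particle-buffer texels
  float2 uv = screen_uv * scale + float2( x, y ) * pixel_size;

  float coarse_depth = read_particles_depth( uv );
  float4 particles_color = read_particles_color( uv );

  // samples whose coarse depth is close to the full-resolution scene depth get larger weights
  float weight = 1.0f / ( depth_epsilon + abs( fine_scene_depth - coarse_depth ) );

  result += particles_color * weight;
  weights += weight;
 }
}

return result / weights;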
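The depth weighting in the bilateral filter can be tried in isolation. This single-channel sketch takes the coarse samples the shader would read and returns their weighted average; the sample struct, the function name, and the 1e-3 default for depth_epsilon are hypothetical.

```cpp
#include <cassert>
#include <cmath>
#include <vector>

struct CoarseSample { float depth; float color; };  // one channel is enough to show the weighting

// Depth-weighted average used by the bilateral filter shader:
// weight = 1 / (depth_epsilon + |fine_scene_depth - coarse_depth|).
float bilateral_upsample(const std::vector<CoarseSample>& samples,
                         float fine_scene_depth, float depth_epsilon = 1e-3f) {
    assert(!samples.empty() && depth_epsilon > 0.0f);
    float result = 0.0f;
    float weights = 0.0f;
    for (const CoarseSample& s : samples) {
        const float weight = 1.0f / (depth_epsilon + std::fabs(fine_scene_depth - s.depth));
        result += s.color * weight;
        weights += weight;
    }
    return result / weights;
}
```

A coarse sample whose depth matches the full-resolution scene depth dominates the result, so particle color from samples at a different depth does not bleed across depth edges.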

Blending state for the upsampling shader

Blend Operation = Additive
Source Blend Color = One
Destination Blend Color = Inverted Source Alpha
Blend Operation Alpha = Reverse Subtract
Source Blend Alpha = One
Destination Blend Alpha = Additive


Here is a performance comparison for a typical explosion in a game. The shaders support lighting and shadowing. 300 particles drawn, 50-60 screens redrawn, 1920x1080, NVIDIA GTX 680.

  • Budget for the particle system
  • Scalable quality
  • Overhead for the scaling, but most of the time it is covered by the gain