Tuesday, April 11, 2017

Adaptive Offscreen Particles

Particle systems are known for heavy overdraw and fill-rate cost, which lead to performance issues. The problem becomes critical when the camera is inside an explosion: many particles cover the screen, overdraw grows sharply, and the frame rate drops along with the enjoyment of the game. The problem can be partially solved by rendering particles into a lower, fixed-resolution offscreen render target, which is described well in GPU Gems 3 here.

Conventional offscreen particle solutions still can't cope with a continuously growing number of overdrawn pixels, and their quality is constant because the render target has fixed dimensions. Therefore that solution didn't suit me.

I wanted a particle system capable of predicting the expected amount of overdraw within the current frame and scaling its own quality down when necessary, according to a budget set for the system. I found an elegant solution to the problems mentioned above, which I called Adaptive Offscreen Particles. The technique works on the GPU only; no readback to the CPU is required. The key to this technique is overdraw prediction.

Overdraw prediction

How do we know the particle overdraw for the current frame? Just render the particles. This requires a special render stage in which all particles are rendered into a small N*N render target (N is 64, for example) with additive blending, each writing a weight that represents its relative execution complexity. By default the written weight can be 1.0 for all shaders. The depth buffer can also be used at this stage to avoid counting invisible pixels. This gives a rough but acceptable result. At the end we calculate the sum of all written weights, which can also be treated as the number of overdrawn pixels. The sum can be calculated in several ways. One is a compute shader that simply accumulates the weights into a global shared variable. Another is to downsample several times, summing four neighbors each time, until a 1x1 result is obtained.

Figure 1. Overdraw accumulated by particles in a 64x64 render target. Intensity was reduced for clarity.


Figure 2. Leftmost - source 64x64. Next - downsampled with summation. Rightmost - the final sum in a 1x1 render target.
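The downsampling variant can be illustrated on the CPU. The following is only a minimal Python sketch, not the GPU implementation: the function names and the rectangle input format are hypothetical, particles are approximated by axis-aligned rectangles, and N is assumed to be a power of two.

```python
def render_overdraw(n, quads):
    """Accumulate weights additively into an n x n grid.

    quads is a list of (x0, y0, x1, y1, weight) rectangles in texel units,
    a hypothetical stand-in for rasterized particles.
    """
    target = [[0.0] * n for _ in range(n)]
    for x0, y0, x1, y1, weight in quads:
        for y in range(max(y0, 0), min(y1, n)):
            for x in range(max(x0, 0), min(x1, n)):
                target[y][x] += weight  # additive blending of the weight
    return target


def downsample_sum(grid):
    """Halve the grid; each output texel is the sum of its four source texels."""
    half = len(grid) // 2
    return [
        [
            grid[2 * y][2 * x] + grid[2 * y][2 * x + 1]
            + grid[2 * y + 1][2 * x] + grid[2 * y + 1][2 * x + 1]
            for x in range(half)
        ]
        for y in range(half)
    ]


def total_overdraw(grid):
    """Repeat the 2x2 summation until a single 1x1 value remains."""
    while len(grid) > 1:
        grid = downsample_sum(grid)
    return grid[0][0]


# One full-target particle plus one covering the central quarter.
target = render_overdraw(64, [(0, 0, 64, 64, 1.0), (16, 16, 48, 48, 1.0)])
print(total_overdraw(target))
```

Because every pass sums rather than averages, the total is preserved all the way down to the 1x1 result.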

For convenience we operate with the number of redrawn screens, which is a very close approximation of the actual number of full screens redrawn in the color pass. For instance, a usual explosion in a game can have up to hundreds of redrawn screens.
number of redrawn screens = overdrawn pixels / ( N * N )
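As a quick worked example of this conversion, here is a small Python sketch. The function name is hypothetical, and the inputs are made-up sums for a 64x64 prediction target.

```python
def redrawn_screens(overdrawn_pixels, n):
    """Convert the summed weights of an n x n prediction target into full screens."""
    return overdrawn_pixels / (n * n)


# A sum of 5120 in a 64x64 target is one and a quarter screens.
print(redrawn_screens(5120.0, 64))
# Fifty particles that each cover the whole target give fifty screens.
print(redrawn_screens(50 * 64 * 64, 64))
```

The division must use the whole texel count N * N, since each texel of the small target stands for 1 / (N * N) of the screen.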

Particles rendering

The prediction is now calculated. Next, the particles should be rendered into the color render target with alpha blending. The color render target and the scene depth buffer keep a fixed full-screen size. Instead of changing their dimensions, as is done in other approaches, only a virtual rect is changed. This is done for the depth buffer (Figure 4), so that particles have correct depth testing, and for the particles themselves, which are moved into the left top corner of the screen (Figure 5). To perform this shift we need a scale factor that keeps particle performance within the set budget.

Calculating rect scaling factor

// budget - Budget of the particle system. Screens allowed to be overdrawn without scaling.
// redrawn - Runtime calculated number of redrawn screens.
// min_scale - 0.25 usually. 16 times smaller rect
// floor helps to avoid frequent resolution changing
scale = clamp( 1.0 / sqrt( floor( redrawn ) / budget ), min_scale, 1.0 );


Figure 3. Graph of the scale factor function for a budget of four redrawn screens.
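To get a feel for the numbers, the scale factor formula can be evaluated on the CPU. This is a Python sketch of the same expression. The function name is hypothetical, and the early return for fewer than one redrawn screen is an assumption made here to avoid a division by zero (in a shader the clamp would absorb the infinity).

```python
import math


def rect_scale(redrawn, budget, min_scale=0.25):
    """Scale factor of the virtual rect for a given number of redrawn screens."""
    whole = math.floor(redrawn)  # floor avoids frequent resolution changes
    if whole <= 0:
        return 1.0  # assumption: nothing to scale below one redrawn screen
    return min(max(1.0 / math.sqrt(whole / budget), min_scale), 1.0)


for redrawn in (3.9, 4, 9, 16, 64, 100):
    print(redrawn, rect_scale(redrawn, budget=4))
```

The square root follows from the rect area: a rect scaled by `scale` on each axis holds `scale * scale` of the pixels, so while the result is not clamped, `redrawn * scale * scale` equals the budget.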

Depth buffer runtime scaling pixel shader code

// read N depths around, make the max of them
if ( any( screen_uv > scale ) ) discard;
float depth0 = read_depth( screen_uv / scale + offset0 );
...
float depthN = read_depth( screen_uv / scale + offsetN );
return max( depth0...depthN );


Figure 4. Scaling the scene depth buffer into the particles depth buffer according to the scale factor. Left - original scene depth buffer. Right - particles depth buffer with the depth information copied into the actual rect.
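The depth scaling step can be mimicked on the CPU with a small grid. The Python sketch below rests on several assumptions: the function name is hypothetical, texel (0, 0) is the left top corner, sampling is nearest-texel, and the offsets are a 2x2 footprint, which only covers every source texel for a scale of 0.5.

```python
def scale_depth(depth, scale, offsets=((0, 0), (1, 0), (0, 1), (1, 1))):
    """Copy a depth grid into its left top virtual rect, keeping the max depth.

    Texels outside the rect are left as None, mirroring the discard.
    """
    h, w = len(depth), len(depth[0])
    out = [[None] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            if x >= w * scale or y >= h * scale:
                continue  # outside the virtual rect: discard
            sx, sy = int(x / scale), int(y / scale)  # source texel to read from
            out[y][x] = max(
                depth[min(sy + oy, h - 1)][min(sx + ox, w - 1)]
                for ox, oy in offsets
            )
    return out


depth = [
    [0.1, 0.2, 0.3, 0.4],
    [0.5, 0.6, 0.7, 0.8],
    [0.9, 0.1, 0.2, 0.3],
    [0.4, 0.5, 0.6, 0.7],
]
for row in scale_depth(depth, 0.5):
    print(row)
```

For smaller scale factors more offsets would be needed to cover the whole source footprint of each texel.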


The positioning of the particles


// p - final output projection coordinates of particle’s vertex
// scale = 1.0 is full screen resolution
// scale = 0.5 is half screen resolution
// scale = 0.25 is quarter screen resolution

p.x = p.x * scale - ( 1.0 - scale ) * p.w;
p.y = p.y * scale + ( 1.0 - scale ) * p.w;


Figure 5. Scaling a particle’s quad into the left top corner in the vertex shader.
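The effect of the two vertex shader lines is easiest to check after the perspective divide. Here is a Python sketch with hypothetical helper names, assuming clip space where y points up and x and y span [-1, 1] after dividing by w.

```python
def shift_to_corner(p, scale):
    """Apply the vertex shader remapping to a clip-space position (x, y, z, w)."""
    x, y, z, w = p
    return (
        x * scale - (1.0 - scale) * w,
        y * scale + (1.0 - scale) * w,
        z,
        w,
    )


def to_ndc(p):
    """Perspective divide for x and y."""
    x, y, _, w = p
    return (x / w, y / w)


# Screen corners with w = 2: left top and right bottom.
left_top = (-2.0, 2.0, 1.0, 2.0)
right_bottom = (2.0, -2.0, 1.0, 2.0)
print(to_ndc(shift_to_corner(left_top, 0.5)))
print(to_ndc(shift_to_corner(right_bottom, 0.5)))
```

Multiplying the offset by `p.w` is what makes the shift constant after the divide: the full screen maps to x in [-1, -1 + 2 * scale] and y in [1 - 2 * scale, 1], so the left top corner stays fixed.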


Blending state for the particles shader

Blend Operation = Additive
Source Blend Color = One
Destination Blend Color = Inverted Source Alpha
Blend Operation Alpha = Additive
Source Blend Alpha = Inverted Destination Alpha
Destination Blend Alpha = One
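To see what this state accumulates, the blend equations can be written out by hand. The Python sketch below makes two assumptions not stated above: the offscreen target is cleared to (0, 0, 0, 0), and the shader outputs color premultiplied by alpha, which is what a source color factor of One suggests. The function name is hypothetical.

```python
def blend_particle(dst, src):
    """Blend one particle (r, g, b, a) into the offscreen target texel."""
    sr, sg, sb, sa = src
    dr, dg, db, da = dst
    inv_sa = 1.0 - sa
    return (
        sr + dr * inv_sa,          # color: One, Inverted Source Alpha
        sg + dg * inv_sa,
        sb + db * inv_sa,
        sa * (1.0 - da) + da,      # alpha: Inverted Destination Alpha, One
    )


texel = (0.0, 0.0, 0.0, 0.0)  # assumed clear value
texel = blend_particle(texel, (0.5, 0.0, 0.0, 0.5))  # half-opaque red
texel = blend_particle(texel, (0.0, 0.5, 0.0, 0.5))  # half-opaque green
print(texel)
```

With these factors the alpha channel ends up as one minus the product of (1 - alpha) over all particles, that is, the total opacity of the layer, which is what is needed to composite it over the scene later.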


Bilateral Upsampling


The final stage is bilateral upsampling of the particles color buffer with special blending.

Figure 6. Left - buffer of accumulated particles in scope of the actual rect. Right - upsampled result blended with the scene color.


Figure 7. Comparison of upsampling without and with bilateral filter. Left - particles rendered with scale factor 0.25. Right - bilateral upsampling result.


Bilateral filter shader

float fine_scene_depth = read_scene_depth( screen_uv );
float4 result = 0.0f;
float weights = 0.0f; // sum of tap weights, used to normalize the result

if ( scale == 1.0f )
 return read_particles_color( screen_uv );

int radius = scale >= 0.5f ? 1 : 2;

for ( int x = -radius; x <= radius; ++x )
{
 for ( int y = -radius; y <= radius; ++y )
 {
  float2 uv = screen_uv * scale + float2( x, y ) * pixel_size;

  float coarse_depth = read_particles_depth( uv );
  float4 particles_color = read_particles_color( uv );

  float weight = 1.0f / ( depth_epsilon + abs( fine_scene_depth - coarse_depth ) );

  result += particles_color * weight;

  weights += weight;
 }
}

return result / weights;
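The depth weighting in the loop can be tried in isolation. In this Python sketch the taps are passed in directly instead of being read from textures; the function name and the `depth_epsilon` value are assumptions chosen for illustration.

```python
def bilateral_resolve(fine_depth, taps, depth_epsilon=1e-4):
    """Weighted average of (coarse_depth, rgba) taps, favoring matching depths."""
    result = [0.0, 0.0, 0.0, 0.0]
    weights = 0.0
    for coarse_depth, color in taps:
        # Taps whose depth is close to the full-resolution depth dominate.
        weight = 1.0 / (depth_epsilon + abs(fine_depth - coarse_depth))
        for i in range(4):
            result[i] += color[i] * weight
        weights += weight
    return [c / weights for c in result]


red = (1.0, 0.0, 0.0, 1.0)
blue = (0.0, 0.0, 1.0, 1.0)
# The pixel sits on the near surface (0.2); the blue tap belongs to a far one.
print(bilateral_resolve(0.2, [(0.2, red), (0.9, blue)]))
# Equal depths degrade to a plain average.
print(bilateral_resolve(0.5, [(0.5, red), (0.5, blue)]))
```

A smaller `depth_epsilon` makes edges sharper, since a tap with a matching depth then outweighs the others by a larger factor.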

Blending state for the upscaling shader

Blend Operation = Additive
Source Blend Color = One
Destination Blend Color = Inverted Source Alpha
Blend Operation Alpha = Reverse Subtract
Source Blend Alpha = One
Destination Blend Alpha = Additive


Performance

Here is a performance comparison for a typical explosion in a game. The shaders support lighting and shadowing. 300 particles drawn, 50-60 screens redrawn. 1920x1080, NVIDIA GTX 680.




Pros
  • Budget for the particle system
  • Scalable quality
Cons
  • Overhead of the scaling, though most of the time it is outweighed by the gain

Links