Hello! I'm curious if anyone here has a good idea about how work is interleaved between a compute shader and a fragment shader.
Some relevant details:
- My app is built with Rust and wgpu, and I'm running on an M1 Macbook Pro.
- I have a single encoder with a compute pipeline and a render pipeline.
- The compute shader writes to a storage buffer defined like this:
@group(0) @binding(2) var<storage, read_write> output: array<vec4<f32>>;
- The fragment shader reads from the same buffer. Basically, each fragment is just one element of the `vec4<f32>` array. The fragment shader is very simple, and doesn't touch anything else in the storage buffer.
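To make the setup concrete, the fragment stage looks roughly like this. This is a sketch, not my actual shader: the entry point name and the way the index is derived from the fragment position are placeholders.

```wgsl
@group(0) @binding(2) var<storage, read_write> output: array<vec4<f32>>;

@fragment
fn fs_main(@builtin(position) pos: vec4<f32>) -> @location(0) vec4<f32> {
    // Placeholder: in the real shader the framebuffer width comes from
    // a uniform; hard-coded here just to show the indexing scheme.
    let width = 1920u;
    // One element of `output` per fragment, indexed by pixel position.
    let index = u32(pos.y) * width + u32(pos.x);
    return output[index];
}
```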
I've added timestamp queries to the pipeline, and what I'm seeing is this:
Duration #1: 47.800208ms
Duration #2: 47.809876ms
Frame time: 51.2545ms
`Duration #1` is computed from the compute shader timestamps (the duration between the beginning and end of the compute pass), and `Duration #2` is the time for the render pass, computed the same way. `Frame time` is measured on the CPU.
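For what it's worth, the durations above come from raw timestamp ticks converted to milliseconds. The conversion is just arithmetic; the helper name here is mine, but the nanoseconds-per-tick factor is the value wgpu reports via `Queue::get_timestamp_period()`:

```rust
// Convert a pair of raw GPU timestamp values (in ticks) to milliseconds.
// `period_ns` is nanoseconds per tick, as returned by wgpu's
// Queue::get_timestamp_period() (an f32).
fn ticks_to_ms(start: u64, end: u64, period_ns: f32) -> f64 {
    // saturating_sub guards against timestamps that come back out of order,
    // which some backends can produce for unreliable query results.
    let ticks = end.saturating_sub(start);
    (ticks as f64 * period_ns as f64) / 1.0e6
}

fn main() {
    // With a 1 ns/tick period, 47_800_208 ticks is 47.800208 ms.
    println!("{}", ticks_to_ms(0, 47_800_208, 1.0));
}
```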
I expected the duration of the compute shader and fragment shader to add up to the frame time (approximately). But it doesn't and I'm confused about why! Could it be due to interleaving of the compute pass and render pass? If so, I'm curious how the synchronization works. How does the GPU figure out the dependencies between the write (a compute shader invocation) and the reader (fragment shader invocation)?
I don't have any explicit synchronization, but I'm also not seeing any tearing or anything that would indicate that there is a data race between the shaders.
Those durations are suspiciously close for being times of completely different processes! Plus, is it possible that the compute for frame N was run in frame N-1?
I agree. My colleague’s theory was that they are interleaved, and that they are both running at the same time but that they are somehow synchronized by the runtime. I’m skeptical that we’d get that for free.
I also wondered whether they are somehow running in parallel but the compute shader is one frame ahead, but I don't see how that's possible in the code. Both pipelines are added to the same encoder and so are definitely submitted together.
I asked on another forum, and the conclusion was that the timestamps may not be reliable. There are a bunch of open wgpu issues related to timestamp accuracy on Metal and tile-based deferred rendering (TBDR) architectures.