
v3dv: Switch to the common submit framework

 2 years ago
source link: https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15704

I decided to go ahead and attempt this myself for my own education. I'm not going to claim I know what I'm doing but it's a good excuse to crawl around in v3dv. Still pretty WIP. This MR is mostly so I can run CI.

Edited 4 months ago by Jason Ekstrand
Detached merge request pipeline #558634 passed for 93fbaae7 3 months ago
Approval is optional


Merged by Marge Bot 3 months ago (Apr 13, 2022 5:57pm UTC)

Merge details
Pipeline #558659 waiting for manual action for 93fbaae7 on main
  • I'm not going to claim I know what I'm doing

    Pretty sure that at least you know what you are doing with the common framework.

    Thanks for it. I will start to take a look. If I find it better than my own WIP code, I will switch and use it as a reference.

  • Resolved by Jason Ekstrand 4 months ago
    • Resolved by Jason Ekstrand 4 months ago

      "vulkan/drm_syncobj: Implement WAIT_PENDING with a sync_file lookup"

      LGTM, but I don't have too much experience there yet: Acked-by: Alejandro Piñeiro <[email protected]>

    • Last reply by Jason Ekstrand 4 months ago
  • "v3dv: Stop directly setting vk_device::alloc"

    Reviewed-by: Alejandro Piñeiro <[email protected]>

  • Resolved by Jason Ekstrand 4 months ago
  • Resolved by Jason Ekstrand 4 months ago
    • Resolved by Jason Ekstrand 4 months ago

      Looks promising. I just had time for a skim and added some questions.

      The bad thing is that I already found some regressions just running the smoke-test subset through the simulator; tomorrow I will try to confirm them on the device.

      Edited by Alejandro Piñeiro 4 months ago
    • Last reply by Jason Ekstrand 4 months ago
    • Resolved by Jason Ekstrand 4 months ago

      @jekstrand, @apinheiro: FYI, the RPi4s we use in CI do not have a kernel with multisync support (they are running a kernel currently available for Raspberry Pi OS, which hasn't picked up this feature yet).

      To ensure this works with multisync too, you can add change dac47c40 to the MR. It will make the RPi4s in CI download a multisync-enabled kernel; the driver will then detect that the feature is available and automatically switch to it.

    • Last reply by Iago Toral 4 months ago
  • For the runtime change: Reviewed-by: Lionel Landwerlin <[email protected]>

    • a8573f14 - vulkan/drm_syncobj: Implement WAIT_PENDING with a sync_file lookup
    • 6d9e4a00 - v3dv: Stop directly setting vk_device::alloc
    • c892bfc8 - v3dv: Switch to the common submit framework

    Compare with previous version

    • 397cdf4a - vulkan/drm_syncobj: Implement WAIT_PENDING with a sync_file lookup
    • 63786dd7 - v3dv: Stop directly setting vk_device::alloc
    • 736fcb9c - v3dv: Switch to the common submit framework

    Compare with previous version

    • Resolved by Iago Toral 3 months ago

      I've got most stuff passing pretty well. Still need to debug some of the external cross-device fails. That said, there are a few tests that we cannot pass with the current design:

      • dEQP-VK.api.external.fence.sync_fd.signal_import_temporary
      • dEQP-VK.api.external.fence.sync_fd.dup_temporary
      • dEQP-VK.api.external.semaphore.sync_fd.signal_import_temporary
      • dEQP-VK.api.external.semaphore.sync_fd.dup2_temporary

      Why not? Because they do the following:

      1. Create a command buffer with a vkCmdWaitEvents()
      2. Submit the command buffer
      3. vkGetSemaphoreFdKHR() to try to get a sync_file
      4. vkSetEvent()

      The problem is that vkGetSemaphoreFdKHR() must wait for the fence to materialize in the syncobj before it can return a sync_file. This means waiting for the entire submit to flush through, which won't happen until vkSetEvent() has been called. It's a deadlock, and no amount of additional threading will fix it. It would have been a problem with the old thread pool design as well; it just shows up now because sync_file support is suddenly advertised.
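
For illustration, the problematic pattern looks roughly like this in plain Vulkan C (a minimal sketch, not code from the MR; the function and handle names are assumptions, and error handling is omitted):

```c
#include <vulkan/vulkan.h>

/* Sketch of the deadlocking sequence described above. `cmd_buf` is assumed
 * to have been recorded with a vkCmdWaitEvents() on `event`. */
static void
reproduce_deadlock(VkDevice device, VkQueue queue, VkCommandBuffer cmd_buf,
                   VkEvent event, VkSemaphore semaphore)
{
   /* 2. Submit the command buffer, asking it to signal `semaphore`. */
   const VkSubmitInfo submit = {
      .sType = VK_STRUCTURE_TYPE_SUBMIT_INFO,
      .commandBufferCount = 1,
      .pCommandBuffers = &cmd_buf,
      .signalSemaphoreCount = 1,
      .pSignalSemaphores = &semaphore,
   };
   vkQueueSubmit(queue, 1, &submit, VK_NULL_HANDLE);

   /* 3. Export the semaphore as a sync_file. With CPU jobs on a submit
    *    thread, this has to wait for the whole submit to flush so the fence
    *    can materialize in the syncobj... */
   const VkSemaphoreGetFdInfoKHR get_fd = {
      .sType = VK_STRUCTURE_TYPE_SEMAPHORE_GET_FD_INFO_KHR,
      .semaphore = semaphore,
      .handleType = VK_EXTERNAL_SEMAPHORE_HANDLE_TYPE_SYNC_FD_BIT,
   };
   int fd = -1;
   vkGetSemaphoreFdKHR(device, &get_fd, &fd);   /* blocks forever */

   /* 4. ...but the submit cannot flush until this runs, so we never get here. */
   vkSetEvent(device, event);
}
```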

      Sadly, it seems that this is pretty explicitly allowed by the spec:

      VUID-vkCmdWaitEvents-pEvents-01163

      If pEvents includes one or more events that will be signaled by vkSetEvent after commandBuffer has been submitted to a queue, then vkCmdWaitEvents must not be called inside a render pass instance

      We could potentially get this disallowed for sync_file because it's a real cheap-shot there. But given that the "execute some stuff on the CPU" model isn't really how Vulkan is intended to work, we may get some push-back on that. The only other option is to get rid of all the CPU jobs. Or we can disable sync_file support. 🙃

      Edited by Jason Ekstrand 4 months ago
    • Last reply by Iago Toral 4 months ago
    • 8711f92b - v3dv: Put indirect compute CSD jobs in the job list
    • cbf2dd7f - v3dv: Switch to the common submit framework

    Compare with previous version

    • ac664ccb - v3dv: Put indirect compute CSD jobs in the job list
    • 54f8f368 - v3dv: Switch to the common submit framework

    Compare with previous version

  • I'm now down to just vkGetQueryPoolResults() with VK_QUERY_RESULT_WAIT_BIT. I think it's going to need a bit of reworking because it depends on v3dv_bo_wait(), which doesn't really work when you have a submit thread that may delay things. We could vkDeviceWaitIdle(), but that seems like a bit of a heavy hammer. Honestly, I'm not really sure how it's working now. 😕
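
For context, the application-side call in question looks like this (a minimal sketch with assumed handles; VK_QUERY_RESULT_WAIT_BIT is what forces the driver to block until the results land):

```c
#include <vulkan/vulkan.h>

/* Sketch: read back two 64-bit query results, asking the driver to wait for
 * them to become available. With a submit thread the queries may not even
 * have been kicked off yet, which is why a plain v3dv_bo_wait() is not
 * enough on its own. */
static VkResult
read_query_results(VkDevice device, VkQueryPool query_pool, uint64_t results[2])
{
   return vkGetQueryPoolResults(device, query_pool,
                                0 /* firstQuery */, 2 /* queryCount */,
                                2 * sizeof(uint64_t), results,
                                sizeof(uint64_t) /* stride */,
                                VK_QUERY_RESULT_64_BIT |
                                VK_QUERY_RESULT_WAIT_BIT);
}
```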

    • Resolved by Iago Toral 3 months ago

      Honestly, I'm not really sure how it's working now. 😕

      For the same doubts you mentioned on IRC? New ones?

    • Last reply by Iago Toral 4 months ago
  • Reviewed-by: Iago Toral Quiroga <[email protected]>

  • Iago Toral @itoral started a thread on an outdated change in commit cd2687db 4 months ago
    Resolved by Jason Ekstrand 4 months ago
  • Iago Toral @itoral started a thread on an outdated change in commit cd2687db 4 months ago
    Resolved by Jason Ekstrand 4 months ago
    • cc470188...28ae397b - 48 commits from branch mesa:main
    • 568fddd2 - vulkan/drm_syncobj: Implement WAIT_PENDING with a sync_file lookup
    • 6f1ea603 - v3dv: Stop directly setting vk_device::alloc
    • 5f9de40c - v3dv: Put indirect compute CSD jobs in the job list
    • 72e96249 - v3dv: Switch to the common submit framework

    Compare with previous version

    • 2f942136 - v3dv: Don't use pthread functions on c11 mutexes
    • bd09b35d - v3dv: Destroy the mutex on the teardown path
    • e7f6a7fb - v3dv: Switch to the common submit framework

    Compare with previous version

  • "v3dv: Don't use pthread functions on c11 mutexes"

    Yes, this mess of having one kind of mutex but using the other library's functions was something I noticed too last week. I didn't realize that we also had that issue in the pipeline cache. Thanks for the cleanup.

    Reviewed-by: Alejandro Piñeiro <[email protected]>

  • "v3dv: Destroy the mutex on the teardown path"

    That's ok, but as far as I can see this change goes away with "v3dv: Switch to the common submit framework".

    Reviewed-by: Alejandro Piñeiro <[email protected]>

    • b1e8aec8 - v3dv: Destroy the device mutex on the teardown path
    • 574eee40 - v3dv: Switch to the common device lost tracking
    • d93abe70 - v3dv: Use util/os_time helpers
    • 4ea45541 - v3dv: Add a condition variable for queries

    Compare with previous version

    • 0832c652 - v3dv: Don't use pthread functions on c11 mutexes
    • 9e12596c - v3dv: Destroy the device mutex on the teardown path
    • 7bfd5a02 - v3dv: Switch to the common device lost tracking
    • dfd3fa80 - v3dv: Use util/os_time helpers
    • 38610c49 - v3dv: Add a condition variable for queries
    • 37c88d52 - v3dv: Switch to the common submit framework

    Compare with previous version

    • bd69f323 - v3dv: Add a condition variable for queries
    • 252e6b8e - v3dv: Switch to the common submit framework

    Compare with previous version

  • "v3dv: Switch to the common device lost tracking"

    Minor typo on the commit message: s/requres/requires

    Reviewed-by: Alejandro Piñeiro <[email protected]>

  • "v3dv: Use util/os_time helpers"

    Reviewed-by: Alejandro Piñeiro <[email protected]>

  • CI is green. I think we're ready for review for real.

  • "v3dv: Add a condition variable for queries"

    Reviewed-by: Alejandro Piñeiro <[email protected]>

    • Resolved by Alejandro Piñeiro 4 months ago

      CI is green.

      Ditto for the custom smoke-testing CTS subset that we use internally, which was manually tailored (the CI subset is somewhat random) and has slightly more tests than the CI subset.

      Just in case, I will start a full CTS run.

      I think we're ready for review for real.

      I started with the smaller patches. Tomorrow I will continue with the bigger one (the real switch).

      Again thanks for all this work. Really appreciated.

    • Last reply by Alejandro Piñeiro 4 months ago
    • CI is green.

      Just realized something. We want CI to be green in both multisync and non-multisync environments. I guess that you tested including Iago's patch. Did you also test without it?

    • Most of my testing has been without multisync. I just did one non-multisync run.

    • As mentioned, we found some failures with our smoke-testing subset. I will try to confirm and add the list here.

  • b540a8c0...5b4e41e4 - 24 commits from branch mesa:main
  • cb1f3dfb - vulkan/drm_syncobj: Implement WAIT_PENDING with a sync_file lookup
  • 165756e9 - v3dv: Stop directly setting vk_device::alloc
  • e140dbac - v3dv: Put indirect compute CSD jobs in the job list
  • 96ddfe99 - v3dv: Don't use pthread functions on c11 mutexes
  • 6e34fa67 - v3dv: Destroy the device mutex on the teardown path
  • 1c96cc72 - v3dv: Switch to the common device lost tracking
  • 88cd8e86 - v3dv: Use util/os_time helpers
  • 9461736a - v3dv: Add a condition variable for queries
  • 8b5d7f82 - v3dv: Switch to the common submit framework
  • 0dcb6f94 - v3dv: Use the core version feature helpers
  • 326c6c1c - v3dv: Use the core version property helpers
  • 6de74b41 - v3dv: Add emulated timeline semaphore support

Compare with previous version

  • fa5a067f - v3dv: Use the core version feature helpers
  • 00265eff - v3dv: Use the core version property helpers
  • 6d4eddd8 - v3dv: Add emulated timeline semaphore support

Compare with previous version

  • ea540ddf - v3dv: Use the core version property helpers
  • 104dcf5d - v3dv: Add emulated timeline semaphore support

Compare with previous version

  • 8f4ef0d3 - v3dv: Use the core version feature helpers
  • 9850d182 - v3dv: Use the core version property helpers
  • 56667df4 - v3dv: Add emulated timeline semaphore support

Compare with previous version

  • Now, with timeline semaphores! I should stop typing patches and let you review. 😂

  • If you want me to pull the top few that do timeline semaphores into a separate MR, I can do that.

  • Well, I think it is ok to keep them in this MR. But now that we are at the stage of reviewing, testing, and finding some regressions, it is probably better to stop adding features and focus on stabilizing the code.

    I really appreciate that you added timeline semaphores, though! Thanks again.

  • That's awesome! I didn't know there was support for emulated timeline semaphores.

    I hope I can finish reviewing the patches tomorrow, but modulo a few things I found and commented here I think it is looking pretty great.

  • I think I'll leave "real" timeline semaphores to someone else. As much fun as I've had hacking on v3d for a few days, the pain of kernel development decreases the motivation to go quite that far. It should be pretty easy to wire up in the v3d kernel driver, though. I'd be happy to give some pointers or maybe even throw you an untested patch or two.

  • As much fun as I've had hacking on v3d for a few days, the pain of kernel development decreases the motivation to go quite that far.

    Hehehe, I feel you 😄

    It should be pretty easy to wire up in the v3d kernel driver, though. I'd be happy to give some pointers or maybe even throw you an untested patch or two.

    Thanks again Jason, we should really buy you a beer or something next time we have a chance to meet face to face ;)


Reviewed-by: Iago Toral Quiroga <[email protected]>

  • Resolved by Iago Toral 3 months ago

    @jekstrand: a question about the submit thread model: is my understanding correct that we are spawning a single thread for this? So, for example, in this case:

    We would block on the CmdWaitEvents until the client signals the event and would not start processing the 2nd vkQueueSubmit until that happens?

  • Last reply by Iago Toral 3 months ago
Iago Toral @itoral started a thread on an outdated change in commit 8b5d7f82 4 months ago
Resolved by Iago Toral 3 months ago
Iago Toral @itoral started a thread on an outdated change in commit 8b5d7f82 4 months ago
Resolved by Iago Toral 3 months ago
  • I think this may not correctly handle the scenario where the first job in a command buffer batch is a CPU job. For example, if the first command is vkCmdResetQueryPool, the first CPU job will wait for the relevant queries to finish before resetting, but will not wait for semaphores, if any.

    We have been handling this scenario by emitting a noop job to consume the wait semaphores first and then waiting for idle before processing the CPU job. See https://gitlab.freedesktop.org/mesa/mesa/-/blob/main/src/broadcom/vulkan/v3dv_queue.c#L1137

    AFAICS, it seems that we are not handling that case here, no?
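
To make the scenario concrete, here is a rough client-side sketch (plain Vulkan, assumed handles, no error handling): the command buffer's first command is a query-pool reset, which v3dv turns into a CPU job, while the submit also carries a wait semaphore that such a leading CPU job would not wait on by itself:

```c
#include <vulkan/vulkan.h>

/* Sketch: a command buffer whose very first command becomes a CPU job in
 * v3dv (vkCmdResetQueryPool), submitted with a wait semaphore. */
static void
submit_with_leading_cpu_job(VkQueue queue, VkCommandBuffer cmd_buf,
                            VkQueryPool query_pool, VkSemaphore wait_sem)
{
   const VkCommandBufferBeginInfo begin_info = {
      .sType = VK_STRUCTURE_TYPE_COMMAND_BUFFER_BEGIN_INFO,
   };
   vkBeginCommandBuffer(cmd_buf, &begin_info);
   vkCmdResetQueryPool(cmd_buf, query_pool, 0, 8);  /* first job: CPU job */
   /* ... more commands that use the queries ... */
   vkEndCommandBuffer(cmd_buf);

   const VkPipelineStageFlags wait_stage = VK_PIPELINE_STAGE_TOP_OF_PIPE_BIT;
   const VkSubmitInfo submit = {
      .sType = VK_STRUCTURE_TYPE_SUBMIT_INFO,
      .waitSemaphoreCount = 1,
      .pWaitSemaphores = &wait_sem,
      .pWaitDstStageMask = &wait_stage,
      .commandBufferCount = 1,
      .pCommandBuffers = &cmd_buf,
   };
   vkQueueSubmit(queue, 1, &submit, VK_NULL_HANDLE);
}
```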

  • @jekstrand: I believe this has not been addressed yet?

  • @jekstrand: you didn't include this patch in the branch yet, do you have any concerns related to it?

  • In the multi-wait case, this is handled by the vk_sync_wait_many() in queue_wait_idle(). In the non-multi-wait case, we first use SYNC_FILE_MERGE to fold the wait semaphores into queue->last_job_syncs.syncs[V3DV_QUEUE_ANY] and then we wait on that sync in queue_wait_idle().

    In all the cases where queue_wait_idle() is called, the resource being touched is not a shared resource so there's no way that a different device will be touching it and v3dv_bo_wait() is sufficient. If you ever decide to allow multiple queues within a single v3dv_device (that should be possible now), we'll need to be a bit more clever. I typed the clever version at one point but decided it wasn't needed.
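
As a rough illustration of the non-multi-wait fold described above (this is not the actual v3dv code; the helper name is made up, error handling is simplified, and it assumes both syncobjs already contain a fence), the sync_file merge trick looks something like this with the usual libdrm and sync_file interfaces:

```c
#include <stdint.h>
#include <string.h>
#include <unistd.h>
#include <sys/ioctl.h>
#include <linux/sync_file.h>
#include <xf86drm.h>

/* Export the fence in each syncobj as a sync_file, merge them, and import
 * the merged result back into the accumulator.  Waiting on the accumulator
 * then waits on everything folded in so far. */
static int
fold_wait_into_syncobj(int drm_fd, uint32_t accum_syncobj, uint32_t wait_syncobj)
{
   int accum_fd = -1, wait_fd = -1;

   if (drmSyncobjExportSyncFile(drm_fd, accum_syncobj, &accum_fd) ||
       drmSyncobjExportSyncFile(drm_fd, wait_syncobj, &wait_fd))
      return -1;

   /* The merged sync_file signals once both inputs have signaled. */
   struct sync_merge_data merge = { .fd2 = wait_fd };
   strcpy(merge.name, "fold");
   if (ioctl(accum_fd, SYNC_IOC_MERGE, &merge))
      return -1;

   /* Replace the accumulator's fence with the merged one. */
   int ret = drmSyncobjImportSyncFile(drm_fd, accum_syncobj, merge.fence);

   close(accum_fd);
   close(wait_fd);
   close(merge.fence);
   return ret;
}
```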

  • Yes, that makes sense. Maybe it is a good idea to add a comment in v3dv_queue_driver_submit explaining this for future reference.

Iago Toral @itoral started a thread on an outdated change in commit 9850d182 4 months ago
Resolved by Jason Ekstrand 3 months ago
  • Resolved by Jason Ekstrand 3 months ago

    I did a full run of dEQP-VK.synchronization.* with multisync (CI only runs a small fraction) and got a few failures:

  • Last reply by Iago Toral 4 months ago
  • 1f9fc701...b591409b - 15 commits from branch mesa:main
  • 8e48b68c - vulkan/drm_syncobj: Implement WAIT_PENDING with a sync_file lookup
  • 42f514ff - v3dv: Stop directly setting vk_device::alloc
  • d042bf20 - v3dv: Put indirect compute CSD jobs in the job list
  • a452438c - v3dv: Don't use pthread functions on c11 mutexes
  • 88143b73 - v3dv: Destroy the device mutex on the teardown path
  • 998a21fb - v3dv: Switch to the common device lost tracking
  • a8c7189b - v3dv: Use util/os_time helpers
  • 5b4fde3e - v3dv: Add a condition variable for queries
  • ab5dc325 - v3dv: Switch to the common submit framework
  • e1b05d8c - v3dv: Use the core version feature helpers
  • 33e89262 - v3dv: Use the core version property helpers
  • 59fcbf21 - v3dv: Add emulated timeline semaphore support
  • c09bd06c - FIXUP: Use a dummy GPU job for signaling after a CPU job
  • 0241f383 - FIXUP: Drop a no longer relevant FIXME comment
  • dc41f7e5 - FIXUP: Use WAIT_ALL in queue_wait_idle

Compare with previous version


As mentioned by @jasuarez, on the last full runs there were still some flaky failures. But those runs didn't include Jason's latest fixups addressing Iago's comments. Tomorrow I will ask him to do a new one, or I could try to do it manually myself on my rpi4.

Meanwhile, tonight I will try to test the branch with real apps (just in case) and keep going with the review.

Iago Toral @itoral started a thread on an outdated change in commit c09bd06c 3 months ago
Resolved by Jason Ekstrand 3 months ago

Reviewed-by: Iago Toral Quiroga <[email protected]>

Iago Toral @itoral started a thread on an outdated change in commit 59fcbf21 3 months ago
Resolved by Jason Ekstrand 3 months ago
device->drm_syncobj_type.import_sync_file = NULL;
device->drm_syncobj_type.export_sync_file = NULL;

/* Multiwait is required for emulated timeline semaphores and is supported
 * by the v3d kernel interface.
  • I presume this is an intrinsic feature of a syncobject then, right?

  • Roughly. It just means that you don't modify the syncobj when you wait on it so you can wait over and over again if you want. It's true of basically all syncobj interfaces.
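
A small sketch of what that means at the ioctl level (assuming a DRM fd and a syncobj handle with a fence that eventually signals; uses the libdrm wrapper around DRM_IOCTL_SYNCOBJ_WAIT): the wait does not consume or reset the syncobj, so the same handle can be waited on again:

```c
#include <stdint.h>
#include <xf86drm.h>

static int
wait_twice(int drm_fd, uint32_t syncobj, int64_t abs_timeout_ns)
{
   uint32_t first_signaled;

   /* First wait... */
   int ret = drmSyncobjWait(drm_fd, &syncobj, 1, abs_timeout_ns,
                            DRM_SYNCOBJ_WAIT_FLAGS_WAIT_ALL, &first_signaled);
   if (ret)
      return ret;

   /* ...and a second wait on the very same syncobj still works, because the
    * first wait did not modify it. */
   return drmSyncobjWait(drm_fd, &syncobj, 1, abs_timeout_ns,
                         DRM_SYNCOBJ_WAIT_FLAGS_WAIT_ALL, &first_signaled);
}
```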

  • It seems there is a problem with WSI: anything that tries to render to a window freezes and never actually seems to render anything. I tested this with different apps and it seems to happen in all of them. I think this happens because we end up waiting on the semaphore from a previous acquire image, which never gets signaled.

  • 0001-FIXUP-Fix-WSI.patch

    This change fixes the problem: it basically removes the whole sync_for_memory stuff and puts back our previous version of AcquireNextImage2KHR, which signals the fence/semaphore directly after calling the common code. I noticed that Turnip does the same thing, by the way.

    @jekstrand: is there any implicit requirement for the sync-for-memory stuff to work?

  • It seems that the common framework signals with sync_for_memory only on QueuePresent, by inserting a VK_STRUCTURE_TYPE_WSI_MEMORY_SIGNAL_SUBMIT_INFO_MESA struct into a QueueSubmit produced from there. I am not sure how this is supposed to work, though, since I think the cases I see are doing:

    So we end up waiting forever in step 2.

  • I have just confirmed that Iago's patch fixes some tests that were failing yesterday, like this one:

    dEQP-VK.wsi.xlib.incremental_present.scale_none.immediate.identity.inherit.incremental_present

    and got the gfxreconstruct traces working again.

    I will keep doing tests, but meanwhile, @jasuarez, could you do a new full CTS run using this branch (which is just Jason's branch plus Iago's fix)?

    https://gitlab.freedesktop.org/apinheiro/mesa/-/commits/wip/apinheiro/vk-submit

  • @jekstrand: this is the stacktrace for the case where we hang in step 2, in case it is helpful:

    Edited by Iago Toral 3 months ago
  • @jasuarez pointed out some build warnings when using Iago's fix (considered errors when CI builds). I have a fix here:

    apinheiro/mesa@c39b785f

  • Ugh... Yeah, I just blindly applied the thing from RADV because that's what I did for lavapipe too. I think what you've done in your patch is mostly ok. The only problem is that the temporary/permanent semantics are a bit off. We should just copy+paste from panvk.

  • Ok, I've pushed with updates from all the comments, including this one. I pulled in Iago's patch but modified it with the code from panvk.

  • Hrm... Actually, the panvk code should be equivalent to what's there today. I need to do some digging. Fortunately, lavapipe also explodes and that one's easy to debug.

  • Ok, I'm not clear at all on what we should be doing here. Question: how does implicit sync work on v3d? I don't see anything in the kernel interface that lets you wait/signal on BOs. Do you always do a full CPU stall before handing off to X11?

  • Reading the v3d kernel code, it seems to always sync everything all the time. If that's the case, how have we had any sync problems ever?!? How is vkcube tearing? Maybe I'm reading it wrong? Anyway, I'm going to put it down for now until one of you can help me figure out how this stuff works. Once I know how implicit sync works, fixing WSI should be a snap.

  • If everything syncs all the time like I suspect from my reading of the v3d kernel code then I think the WSI fixup patch as it exists in this MR right now should be correct. I've tested it a bit with vkcube and some sascha demos.

  • If everything syncs all the time like I suspect from my reading of the v3d kernel code

    AFAIR, that's the case, but I will let Iago confirm it.

    I've tested it a bit with vkcube and some sascha demos.

    I have been testing with vkQuake2, Quake3e (which has a Vulkan renderer), and RBDoom3-BFG, and they seem to work fine. I still need to try some UE4 demos that we have around, which we use as a reference for an intensive Vulkan app.

    I also tested with gfxreconstruct, and the traces work in general, but I got some new asserts now. That said, those are old traces, and it is not the first time I have seen weird behaviour when using a recent gfxreconstruct with old traces, so I think it would be better to capture new ones and try again.

    That said, before going too deep into those apps: when doing full CTS runs we found some new flaky tests. @jasuarez's runs (using the CI environment) got crashes now and then on the following tests:

    and I got the following failures using one of my rpi4s:

    But as mentioned, they are flaky tests. In fact, we were not able to reproduce them manually. Individually they work fine, and they also pass if we run a bigger subset with a similar pattern (like dEQP-VK.binding_model.descriptor_copy.*). Tomorrow we will try to reproduce those failures without needing to do a full CTS run.

  • Ok, I'm not clear at all on what we should be doing here. Question: how does implicit sync work on v3d? I don't see anything in the kernel interface that lets you wait/signal on BOs. Do you always do a full CPU stall before handing off to X11?

    Reading the v3d kernel code, it seems to always sync everything all the time. If that's the case, how have we had any sync problems ever?!? How is vkcube tearing? Maybe I'm reading it wrong? Anyway, I'm going to put it down for now until one of you can help me figure out how this stuff works. Once I know how implicit sync works, fixing WSI should be a snap.

    I think @mwen might be able to help here, since she is the one familiar with the kernel bits. Melissa, some context about what we are looking at here: Jason has been rewriting how we do synchronization in the v3dv driver to use the common sync code in Mesa, and now we're specifically looking at how we send images to the display engine, particularly the synchronization aspect, so we can be certain that we don't start rendering or presenting too early.

  • dc41f7e5...b91b90c2 - 10 commits from branch mesa:main
  • 5d2a6001 - vulkan/drm_syncobj: Implement WAIT_PENDING with a sync_file lookup
  • 4ed9c0f2 - v3dv: Stop directly setting vk_device::alloc
  • 31b1dcc3 - v3dv: Put indirect compute CSD jobs in the job list
  • f11e9ea3 - v3dv: Don't use pthread functions on c11 mutexes
  • 31b5b1a6 - v3dv: Destroy the device mutex on the teardown path
  • 8366c38e - v3dv: Switch to the common device lost tracking
  • e926c994 - v3dv: Use util/os_time helpers
  • 78a620e3 - v3dv: Add a condition variable for queries
  • 75f3a1e4 - v3dv: Switch to the common submit framework
  • b1d82183 - v3dv: Use the core version feature helpers
  • cfab02da - v3dv: Use the core version property helpers
  • ced6413a - v3dv: Add emulated timeline semaphore support
  • a5577acf - FIXUP: Use a dummy GPU job for signaling after a CPU job
  • be89bf8c - FIXUP: Drop a no longer relevant FIXME comment
  • 789c7541 - FIXUP: Use WAIT_ALL in queue_wait_idle
  • 96d1741b - FIXUP: Fix WSI

Compare with previous version


I think we have another issue with signaling, where the last job may complete before some other job we executed previously in the command buffer on a different hardware queue. For example, if we have a command buffer like this:

TFU job -> CSD job -> CL job

We would signal with the CL job, but it may be possible that the TFU job is still running since that uses a totally different hw module. Currently, we have been handling this by only signaling on the last job if we have not submitted any jobs in the same command buffer batch to any of the other hw queues, and if that is not the case, we instead emit a serialized noop job at the end that handles signaling. See https://gitlab.freedesktop.org/mesa/mesa/-/blob/main/src/broadcom/vulkan/v3dv_queue.c#L1306.

Yeah, this is a problem, and it seems we actually have 2 bugs. I did a quick change where we never signal on the last job and instead always signal with a noop job at the end, the idea being that we should be serializing that job against all queues and preventing the issue I describe above. And yes, this gets rid of the flaky crashes, but then I found that some tests started to fail more permanently, because the noop jobs are only really serialized if we don't have wait semaphores.

This is because set_in_syncs assumes that if we are the first job in any queue we just need to wait on the semaphores and there is no need to serialize; however, this is not valid when we are trying to use a noop job for signaling at the end. In the particular case I was looking at, the test was doing a bunch of CSD jobs, so emitting a noop job (which is a CL) at the end to signal would see that it was the first CL job and decide that it only had to wait on the semaphores and not on the last job syncs, so we would be signaling immediately without waiting for the CSD jobs to finish. I fixed this by manually setting sync_info.wait_count = 0 when dispatching the noop job if we had a last job (meaning that we have already waited on the input semaphores), and with that we were able to do a full CI run without issues.

So I think we have 2 ways to fix this properly:

  1. Restore the logic I linked to above where we only signal on the last job of the command buffer if we had only submitted jobs to that one queue (since we know they are serialized in that case) and otherwise, signal with a noop job at the end (but making sure that it waits for the last syncs so it is properly serialized).

  2. Always signal with the last job if it is a GPU job, but force proper serialization against the last job syncs too, independently of whether we need to wait for semaphores or not.

Personally, I think 2 makes more sense.
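
A purely hypothetical sketch of option 2, just to pin down the idea (the type and helper names below are made up for illustration, not v3dv code): the signaling job is always serialized against the last job sync of every hardware queue, regardless of wait semaphores:

```c
#include <stdint.h>

enum hw_queue { HW_QUEUE_CL, HW_QUEUE_TFU, HW_QUEUE_CSD, HW_QUEUE_COUNT };

struct signal_job {
   uint32_t waits[16];   /* syncobjs this job must wait on before signaling */
   uint32_t wait_count;
};

static void
serialize_signal_job(struct signal_job *job,
                     const uint32_t last_job_sync[HW_QUEUE_COUNT])
{
   /* Even if the input semaphores were already consumed by an earlier job
    * in the batch, the signaling job still waits for the last job on each
    * hardware queue (CL, TFU, CSD); otherwise a TFU or CSD job could still
    * be running when we signal. */
   for (int q = 0; q < HW_QUEUE_COUNT; q++)
      job->waits[job->wait_count++] = last_job_sync[q];
}
```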

Tomorrow I have to run some errands mid-morning but I might be able to write that patch before I leave, unless @jekstrand beats me to it :)

  • I like option 2 as well. Throwing noop jobs at the kernel seems a bit wasteful but, given how many jobs per command buffer you end up with, it really doesn't seem that bad.

  • I've implemented 2. There is a fixup commit at the end and also one I snuck into the middle called v3dv: Always wait on last_job_syncs if job->serialize.

  • Thanks Jason, the fixes look good to me.

    @jasuarez: could you test the branch in CI?

  • So I've tested this MR using the stock kernel (both 32-bit and 64-bit), as well as a multisync-enabled kernel (only 64-bit; I don't have one for 32-bit), and everything works fine.

  • Thanks a bunch! @itoral Is that a review? I've not seen one for the big mega-patch yet.

    Edited by Jason Ekstrand 3 months ago
  • The other thing we were waiting on was the question about implicit sync for WSI. It seems the display driver is supposed to wait on a fence for v3d to be done with rendering, but currently this is messed up and not working properly; it is a known issue and they are working on fixing it.

    I've just made a comment that we should be signaling the semaphore/fence immediately in vkAcquireNextImage2 by initializing their value to 1, which is what we have been doing before (instead of 0, as you currently do in this branch). However, since the kernel has this bug, I wonder if that is the right thing to do right now.

  • But in any case, it is exactly what we have been doing all this time anyway, so I don't think we need to figure this out right now.

  • 7a050a93...837f781c - 57 commits from branch mesa:main
  • f4768f4a - vulkan/drm_syncobj: Implement WAIT_PENDING with a sync_file lookup
  • 0d7685bb - v3dv: Stop directly setting vk_device::alloc
  • 9f018315 - v3dv: Put indirect compute CSD jobs in the job list
  • 59f859eb - v3dv: Don't use pthread functions on c11 mutexes
  • 746af781 - v3dv: Destroy the device mutex on the teardown path
  • 9b605c21 - v3dv: Switch to the common device lost tracking
  • cee64456 - v3dv: Use util/os_time helpers
  • 0126a2bf - v3dv: Add a condition variable for queries
  • 378effb6 - v3dv: Always wait on last_job_syncs if job->serialize
  • 37c3a2e9 - v3dv: Switch to the common submit framework
  • 197cc92b - v3dv: Use the core version feature helpers
  • d27452d6 - v3dv: Use the core version property helpers
  • efa8db2e - v3dv: Add emulated timeline semaphore support
  • a475a2ce - FIXUP: Use a dummy GPU job for signaling after a CPU job
  • aa978ef6 - FIXUP: Drop a no longer relevant FIXME comment
  • a78b1853 - FIXUP: Use WAIT_ALL in queue_wait_idle
  • 1007be22 - FIXUP: Fix WSI

Compare with previous version


"v3dv: Add emulated timeline semaphore support" is:

Reviewed-by: Iago Toral Quiroga <[email protected]>

"v3dv: Always wait on last_job_syncs if job->serialize" is

Reviewed-by: Iago Toral Quiroga <[email protected]>

As @jasuarez has been focused on doing full CTS runs (and @itoral on fixing the issues he found), I have been testing a little with real apps, specifically checking whether there were any performance changes, even if just by looking at FPS.

So I tested with the gfxreconstruct traces that I had at hand: the UE4 shooter demo, the UE4 vehicle demo, Quake3e, and vkQuake.

For the UE4 demos there are no significant changes, although there is a slight improvement with the vehicle one (slight as in 11.5 fps vs 12.0 fps on average).

But in the case of Quake3e and vkQuake there is a significant average FPS drop: around 10 fps, and in some traces even more. I also briefly checked Quake3e directly, just in case it was an issue with gfxreconstruct, and that seems to confirm the performance drop.

But taking into account that it is not affecting all games, and that the CTS is passing, I don't think we should block this MR because of it; we can investigate and fix it later.

Iago Toral @itoral started a thread on an outdated change in commit 1007be22 3 months ago
Resolved by Jason Ekstrand 3 months ago
  • I've squashed in all the fixups. It should be basically equivalent to the last version, only the big patch is now all one patch again.

  • It seems you didn't actually push the update, but in any case the big patch is:

    Reviewed-by: Iago Toral Quiroga <[email protected]>

    I think we are ready to land this. Most of us will be off until Monday due to local holidays, but since we have done full CI runs I don't expect any unexpected issues; in any case, @apinheiro will be around for a few hours until Friday should anything unexpected happen.

    I'll try to figure out why Quake and Quake3 take a performance hit with this next week.

    Edited by Iago Toral 3 months ago
  • e3ac6295...cf1390e1 - 349 commits from branch mesa:main
  • a563d3c2 - vulkan/drm_syncobj: Implement WAIT_PENDING with a sync_file lookup
  • 6af49bfa - v3dv: Stop directly setting vk_device::alloc
  • 4bafdc48 - v3dv: Put indirect compute CSD jobs in the job list
  • e3a26369 - v3dv: Don't use pthread functions on c11 mutexes
  • 93c0d557 - v3dv: Destroy the device mutex on the teardown path
  • caf6b729 - v3dv: Switch to the common device lost tracking
  • 12b88928 - v3dv: Use util/os_time helpers
  • d3aa9f3c - v3dv: Add a condition variable for queries
  • 49a17589 - v3dv: Always wait on last_job_syncs if job->serialize
  • 960f6b3c - v3dv: Switch to the common submit framework
  • d5ab9ef9 - v3dv: Use the core version feature helpers
  • aa5119d3 - v3dv: Use the core version property helpers
  • 7c51d8be - v3dv: Add emulated timeline semaphore support

Compare with previous version

  • 7c51d8be...7478b00c - 53 commits from branch mesa:main
  • b284c512 - vulkan/drm_syncobj: Implement WAIT_PENDING with a sync_file lookup
  • 0208bb2d - v3dv: Stop directly setting vk_device::alloc
  • 25441b5e - v3dv: Put indirect compute CSD jobs in the job list
  • 30191fd9 - v3dv: Don't use pthread functions on c11 mutexes
  • 32527f3c - v3dv: Destroy the device mutex on the teardown path
  • 8bd7bd95 - v3dv: Switch to the common device lost tracking
  • e5a0e212 - v3dv: Use util/os_time helpers
  • 00b84fae - v3dv: Add a condition variable for queries
  • 321f0b85 - v3dv: Always wait on last_job_syncs if job->serialize
  • 316728a5 - v3dv: Switch to the common submit framework
  • 1973b2da - v3dv: Use the core version feature helpers
  • 1cc917bc - v3dv: Use the core version property helpers
  • 93fbaae7 - v3dv: Add emulated timeline semaphore support

Compare with previous version

  • Marge Bot @marge-bot merged 2 hours ago

    Just want to say again thanks @jekstrand for working on this MR.

  • You're more than welcome! One of the things I'm excited about with my new role at Collabora is being able to dip my toes into all the different drivers. This seemed like as good an excuse as any to get to know v3dv a bit.

