
v3dv: Switch to the common submit framework

 2 years ago
source link: https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15704

I decided to go ahead and attempt this myself for my own education. I'm not going to claim I know what I'm doing but it's a good excuse to crawl around in v3dv. Still pretty WIP. This MR is mostly so I can run CI.

Edited 4 months ago by Jason Ekstrand
Detached merge request pipeline #558634 passed for 93fbaae7 3 months ago
Approval is optional


Merged by Marge Bot 3 months ago (Apr 13, 2022 5:57pm UTC)

Merge details
Pipeline #558659 waiting for manual action for 93fbaae7 on main
  • I'm not going to claim I know what I'm doing

    Pretty sure that at least you know what you are doing with the common framework.

    Thanks for it. I will start to take a look. If I find it better than my own WIP code, I will switch and use it as a reference.

  • Resolved by Jason Ekstrand 4 months ago
    • Resolved by Jason Ekstrand 4 months ago

      "vulkan/drm_syncobj: Implement WAIT_PENDING with a sync_file lookup"

      LGTM, but I don't have too much experience there yet: Acked-by: Alejandro Piñeiro <[email protected]>

    • Last reply by Jason Ekstrand 4 months ago
  • "v3dv: Stop directly setting vk_device::alloc"

    Reviewed-by: Alejandro Piñeiro <[email protected]>

  • Resolved by Jason Ekstrand 4 months ago
  • Resolved by Jason Ekstrand 4 months ago
    • Resolved by Jason Ekstrand 4 months ago

      Looks promising. I just had time for a skim and added some questions.

      The bad thing is that I already found some regressions just running the smoke-test subset through the simulator; tomorrow I will try to confirm them on the device.

      Edited by Alejandro Piñeiro 4 months ago
    • Last reply by Jason Ekstrand 4 months ago
    • Resolved by Jason Ekstrand 4 months ago

      @jekstrand, @apinheiro: FYI, the RPi4s we use in CI do not have a kernel with multisync support (they are running a kernel currently available for Raspberry Pi OS, which hasn't picked up this feature yet).

      To ensure this works with multisync too, you can add change dac47c40 to the MR. It will make the RPi4s in CI download a multisync-enabled kernel; the driver will then detect that the feature is available and automatically switch to it.

    • Last reply by Iago Toral 4 months ago
  • For the runtime change: Reviewed-by: Lionel Landwerlin <[email protected]>

    • a8573f14 - vulkan/drm_syncobj: Implement WAIT_PENDING with a sync_file lookup
    • 6d9e4a00 - v3dv: Stop directly setting vk_device::alloc
    • c892bfc8 - v3dv: Switch to the common submit framework

    Compare with previous version

    • 397cdf4a - vulkan/drm_syncobj: Implement WAIT_PENDING with a sync_file lookup
    • 63786dd7 - v3dv: Stop directly setting vk_device::alloc
    • 736fcb9c - v3dv: Switch to the common submit framework

    Compare with previous version

    • Resolved by Iago Toral 3 months ago

      I've got most stuff passing pretty well. Still need to debug some of the external cross-device fails. That said, there are a few tests that we cannot pass with the current design:

      • dEQP-VK.api.external.fence.sync_fd.signal_import_temporary
      • dEQP-VK.api.external.fence.sync_fd.dup_temporary
      • dEQP-VK.api.external.semaphore.sync_fd.signal_import_temporary
      • dEQP-VK.api.external.semaphore.sync_fd.dup2_temporary

      Why not? Because they do the following:

      1. Create a command buffer with a vkCmdWaitEvents()
      2. Submit the command buffer
      3. vkGetSemaphoreFdKHR() to try to get a sync_file
      4. vkSetEvent()

      The problem is that vkGetSemaphoreFdKHR() must wait for the fence to materialize in the syncobj before it can return a sync_file. This means waiting for the entire submit to flush through, which won't happen until vkSetEvent() has been called. It's a deadlock, and no amount of additional threading will fix it. It would have been a problem with the old thread pool design as well; it just shows up now because sync_file support is suddenly advertised.
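
For illustration, the problematic pattern looks roughly like this in plain Vulkan C (a minimal sketch, not code from the MR; the function and handle names are assumptions, and error handling is omitted):

```c
#include <vulkan/vulkan.h>

/* Sketch of the deadlocking sequence described above. `cmd_buf` is assumed
 * to have been recorded with a vkCmdWaitEvents() on `event`. */
static void
reproduce_deadlock(VkDevice device, VkQueue queue, VkCommandBuffer cmd_buf,
                   VkEvent event, VkSemaphore semaphore)
{
   /* 2. Submit the command buffer, asking it to signal `semaphore`. */
   const VkSubmitInfo submit = {
      .sType = VK_STRUCTURE_TYPE_SUBMIT_INFO,
      .commandBufferCount = 1,
      .pCommandBuffers = &cmd_buf,
      .signalSemaphoreCount = 1,
      .pSignalSemaphores = &semaphore,
   };
   vkQueueSubmit(queue, 1, &submit, VK_NULL_HANDLE);

   /* 3. Export the semaphore as a sync_file. With CPU jobs on a submit
    *    thread, this has to wait for the whole submit to flush so the fence
    *    can materialize in the syncobj... */
   const VkSemaphoreGetFdInfoKHR get_fd = {
      .sType = VK_STRUCTURE_TYPE_SEMAPHORE_GET_FD_INFO_KHR,
      .semaphore = semaphore,
      .handleType = VK_EXTERNAL_SEMAPHORE_HANDLE_TYPE_SYNC_FD_BIT,
   };
   int fd = -1;
   vkGetSemaphoreFdKHR(device, &get_fd, &fd);   /* blocks forever */

   /* 4. ...but the submit cannot flush until this runs, so we never get here. */
   vkSetEvent(device, event);
}
```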

      Sadly, it seems that this is pretty explicitly allowed by the spec:

      VUID-vkCmdWaitEvents-pEvents-01163

      If pEvents includes one or more events that will be signaled by vkSetEvent after commandBuffer has been submitted to a queue, then vkCmdWaitEvents must not be called inside a render pass instance

      We could potentially get this disallowed for sync_file because it's a real cheap-shot there. But given that the "execute some stuff on the CPU" model isn't really how Vulkan is intended to work, we may get some push-back on that. The only other option is to get rid of all the CPU jobs. Or we can disable sync_file support. 🙃

      Edited by Jason Ekstrand 4 months ago
    • Last reply by Iago Toral 4 months ago
    • 8711f92b - v3dv: Put indirect compute CSD jobs in the job list
    • cbf2dd7f - v3dv: Switch to the common submit framework

    Compare with previous version

    • ac664ccb - v3dv: Put indirect compute CSD jobs in the job list
    • 54f8f368 - v3dv: Switch to the common submit framework

    Compare with previous version

  • I'm now down to just vkGetQueryPoolResults() with VK_QUERY_RESULT_WAIT_BIT. I think it's going to need a bit of reworking because it depends on v3dv_bo_wait(), which doesn't really work when you have a submit thread that may delay things. We could vkDeviceWaitIdle(), but that seems like a bit of a heavy hammer. Honestly, I'm not really sure how it's working now. 😕
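
For context, the application-side call in question looks like this (a minimal sketch with assumed handles; VK_QUERY_RESULT_WAIT_BIT is what forces the driver to block until the results land):

```c
#include <vulkan/vulkan.h>

/* Sketch: read back two 64-bit query results, asking the driver to wait for
 * them to become available. With a submit thread the queries may not even
 * have been kicked off yet, which is why a plain v3dv_bo_wait() is not
 * enough on its own. */
static VkResult
read_query_results(VkDevice device, VkQueryPool query_pool, uint64_t results[2])
{
   return vkGetQueryPoolResults(device, query_pool,
                                0 /* firstQuery */, 2 /* queryCount */,
                                2 * sizeof(uint64_t), results,
                                sizeof(uint64_t) /* stride */,
                                VK_QUERY_RESULT_64_BIT |
                                VK_QUERY_RESULT_WAIT_BIT);
}
```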

    • Resolved by Iago Toral 3 months ago

      Honestly, I'm not really sure how it's working now. 😕

      For the same doubts you mentioned on IRC? New ones?

    • Last reply by Iago Toral 4 months ago
  • Reviewed-by: Iago Toral Quiroga <[email protected]>

  • Iago Toral @itoral started a thread on an outdated change in commit cd2687db 4 months ago
    Resolved by Jason Ekstrand 4 months ago
  • Iago Toral @itoral started a thread on an outdated change in commit cd2687db 4 months ago
    Resolved by Jason Ekstrand 4 months ago
    • cc470188...28ae397b - 48 commits from branch mesa:main
    • 568fddd2 - vulkan/drm_syncobj: Implement WAIT_PENDING with a sync_file lookup
    • 6f1ea603 - v3dv: Stop directly setting vk_device::alloc
    • 5f9de40c - v3dv: Put indirect compute CSD jobs in the job list
    • 72e96249 - v3dv: Switch to the common submit framework

    Compare with previous version

    • 2f942136 - v3dv: Don't use pthread functions on c11 mutexes
    • bd09b35d - v3dv: Destroy the mutex on the teardown path
    • e7f6a7fb - v3dv: Switch to the common submit framework

    Compare with previous version

  • "v3dv: Don't use pthread functions on c11 mutexes"

    Yes, this mess of having one kind of mutex but using the other library's functions was something I noticed too last week. I didn't realize that we also had that issue in the pipeline cache. Thanks for the cleanup.

    Reviewed-by: Alejandro Piñeiro <[email protected]>

  • "v3dv: Destroy the mutex on the teardown path"

    That's ok, but as far as I can see this change goes away with "v3dv: Switch to the common submit framework".

    Reviewed-by: Alejandro Piñeiro <[email protected]>

    • b1e8aec8 - v3dv: Destroy the device mutex on the teardown path
    • 574eee40 - v3dv: Switch to the common device lost tracking
    • d93abe70 - v3dv: Use util/os_time helpers
    • 4ea45541 - v3dv: Add a condition variable for queries

    Compare with previous version

    • 0832c652 - v3dv: Don't use pthread functions on c11 mutexes
    • 9e12596c - v3dv: Destroy the device mutex on the teardown path
    • 7bfd5a02 - v3dv: Switch to the common device lost tracking
    • dfd3fa80 - v3dv: Use util/os_time helpers
    • 38610c49 - v3dv: Add a condition variable for queries
    • 37c88d52 - v3dv: Switch to the common submit framework

    Compare with previous version

    • bd69f323 - v3dv: Add a condition variable for queries
    • 252e6b8e - v3dv: Switch to the common submit framework

    Compare with previous version

  • "v3dv: Switch to the common device lost tracking"

    Minor typo on the commit message: s/requres/requires

    Reviewed-by: Alejandro Piñeiro <[email protected]>

  • "v3dv: Use util/os_time helpers"

    Reviewed-by: Alejandro Piñeiro <[email protected]>

  • CI is green. I think we're ready for review for real.

  • "v3dv: Add a condition variable for queries"

    Reviewed-by: Alejandro Piñeiro <[email protected]>

    • Resolved by Alejandro Piñeiro 4 months ago

      CI is green.

      Ditto for the custom smoke-testing CTS subset that we use internally, which was manually tailored (the CI subset is somewhat random) and has slightly more tests than the CI subset.

      Just in case, I will start a full CTS run.

      I think we're ready for review for real.

      I started with the smaller patches. Tomorrow I will continue with the bigger one (the real switch).

      Again thanks for all this work. Really appreciated.

    • Last reply by Alejandro Piñeiro 4 months ago
    • CI is green.

      Just realized something. We want CI to be green in both multisync and non-multisync environments. I guess that you tested including Iago's patch. Did you also test without it?

    • Most of my testing has been without multisync. I just did one non-multisync run.

    • As mentioned, we found some failures with our smoke-testing subset. I will try to confirm and add the list here.

  • b540a8c0...5b4e41e4 - 24 commits from branch mesa:main
  • cb1f3dfb - vulkan/drm_syncobj: Implement WAIT_PENDING with a sync_file lookup
  • 165756e9 - v3dv: Stop directly setting vk_device::alloc
  • e140dbac - v3dv: Put indirect compute CSD jobs in the job list
  • 96ddfe99 - v3dv: Don't use pthread functions on c11 mutexes
  • 6e34fa67 - v3dv: Destroy the device mutex on the teardown path
  • 1c96cc72 - v3dv: Switch to the common device lost tracking
  • 88cd8e86 - v3dv: Use util/os_time helpers
  • 9461736a - v3dv: Add a condition variable for queries
  • 8b5d7f82 - v3dv: Switch to the common submit framework
  • 0dcb6f94 - v3dv: Use the core version feature helpers
  • 326c6c1c - v3dv: Use the core version property helpers
  • 6de74b41 - v3dv: Add emulated timeline semaphore support

Compare with previous version

  • fa5a067f - v3dv: Use the core version feature helpers
  • 00265eff - v3dv: Use the core version property helpers
  • 6d4eddd8 - v3dv: Add emulated timeline semaphore support

Compare with previous version

  • ea540ddf - v3dv: Use the core version property helpers
  • 104dcf5d - v3dv: Add emulated timeline semaphore support

Compare with previous version

  • 8f4ef0d3 - v3dv: Use the core version feature helpers
  • 9850d182 - v3dv: Use the core version property helpers
  • 56667df4 - v3dv: Add emulated timeline semaphore support

Compare with previous version

  • Now, with timeline semaphores! I should stop typing patches and let you review. 😂

  • If you want me to pull the top few that do timeline semaphores into a separate MR, I can do that.

  • Well, I think it is ok to keep them in this MR. But now that we are at the stage of reviewing, testing, and finding some regressions, it is probably better to stop adding features and focus on stabilizing the code.

    I really appreciate that you added timeline semaphores, though! Thanks again.

  • That's awesome! I didn't know there was support for emulated timeline semaphores.

    I hope I can finish reviewing the patches tomorrow, but modulo a few things I found and commented here I think it is looking pretty great.

  • I think I'll leave "real" timeline semaphores to someone else. As much fun as I've had hacking on v3d for a few days, the pain of kernel development decreases the motivation to go quite that far. It should be pretty easy to wire up in the v3d kernel driver, though. I'd be happy to give some pointers or maybe even throw you an untested patch or two.

  • As much fun as I've had hacking on v3d for a few days, the pain of kernel development decreases the motivation to go quite that far.

    Hehehe, I feel you 😄

    It should be pretty easy to wire up in the v3d kernel driver, though. I'd be happy to give some pointers or maybe even throw you an untested patch or two.

    Thanks again Jason, we should really buy you a beer or something next time we have a chance to meet face to face ;)


Reviewed-by: Iago Toral Quiroga <[email protected]>

  • Resolved by Iago Toral 3 months ago

    @jekstrand: a question about the submit thread model: is my understanding correct that we are spawning a single thread for this? So, for example, in this case:

    We would block on the CmdWaitEvents until the client signals the event and would not start processing the 2nd vkQueueSubmit until that happens?

  • Last reply by Iago Toral 3 months ago
Iago Toral @itoral started a thread on an outdated change in commit 8b5d7f82 4 months ago
Resolved by Iago Toral 3 months ago
Iago Toral @itoral started a thread on an outdated change in commit 8b5d7f82 4 months ago
Resolved by Iago Toral 3 months ago
  • I think this may not correctly handle the scenario where the first job in a command buffer batch is a CPU job. For example, if the first command is vkCmdResetQueryPool, the first CPU job will wait for the relevant queries to finish before resetting, but will not wait for semaphores, if any.

    We have been handling this scenario by emitting a noop job to consume the wait semaphores first and then waiting for idle before processing the CPU job. See https://gitlab.freedesktop.org/mesa/mesa/-/blob/main/src/broadcom/vulkan/v3dv_queue.c#L1137

    AFAICS, it seems that we are not handling that case here, no?
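
To make the scenario concrete, here is a rough client-side sketch (plain Vulkan, assumed handles, no error handling): the command buffer's first command is a query-pool reset, which v3dv turns into a CPU job, while the submit also carries a wait semaphore that such a leading CPU job would not wait on by itself:

```c
#include <vulkan/vulkan.h>

/* Sketch: a command buffer whose very first command becomes a CPU job in
 * v3dv (vkCmdResetQueryPool), submitted with a wait semaphore. */
static void
submit_with_leading_cpu_job(VkQueue queue, VkCommandBuffer cmd_buf,
                            VkQueryPool query_pool, VkSemaphore wait_sem)
{
   const VkCommandBufferBeginInfo begin_info = {
      .sType = VK_STRUCTURE_TYPE_COMMAND_BUFFER_BEGIN_INFO,
   };
   vkBeginCommandBuffer(cmd_buf, &begin_info);
   vkCmdResetQueryPool(cmd_buf, query_pool, 0, 8);  /* first job: CPU job */
   /* ... more commands that use the queries ... */
   vkEndCommandBuffer(cmd_buf);

   const VkPipelineStageFlags wait_stage = VK_PIPELINE_STAGE_TOP_OF_PIPE_BIT;
   const VkSubmitInfo submit = {
      .sType = VK_STRUCTURE_TYPE_SUBMIT_INFO,
      .waitSemaphoreCount = 1,
      .pWaitSemaphores = &wait_sem,
      .pWaitDstStageMask = &wait_stage,
      .commandBufferCount = 1,
      .pCommandBuffers = &cmd_buf,
   };
   vkQueueSubmit(queue, 1, &submit, VK_NULL_HANDLE);
}
```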

  • @jekstrand: I believe this has not been addressed yet?

  • @jekstrand: you didn't include this patch in the branch yet, do you have any concerns related to it?

  • In the multi-wait case, this is handled by the vk_sync_wait_many() in queue_wait_idle(). In the non-multi-wait case, we first use SYNC_FILE_MERGE to fold the wait semaphores into queue->last_job_syncs.syncs[V3DV_QUEUE_ANY] and then we wait on that sync in queue_wait_idle().

    In all the cases where queue_wait_idle() is called, the resource being touched is not a shared resource so there's no way that a different device will be touching it and v3dv_bo_wait() is sufficient. If you ever decide to allow multiple queues within a single v3dv_device (that should be possible now), we'll need to be a bit more clever. I typed the clever version at one point but decided it wasn't needed.
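
As a rough illustration of the non-multi-wait fold described above (this is not the actual v3dv code; the helper name is made up, error handling is simplified, and it assumes both syncobjs already contain a fence), the sync_file merge trick looks something like this with the usual libdrm and sync_file interfaces:

```c
#include <stdint.h>
#include <string.h>
#include <unistd.h>
#include <sys/ioctl.h>
#include <linux/sync_file.h>
#include <xf86drm.h>

/* Export the fence in each syncobj as a sync_file, merge them, and import
 * the merged result back into the accumulator.  Waiting on the accumulator
 * then waits on everything folded in so far. */
static int
fold_wait_into_syncobj(int drm_fd, uint32_t accum_syncobj, uint32_t wait_syncobj)
{
   int accum_fd = -1, wait_fd = -1;

   if (drmSyncobjExportSyncFile(drm_fd, accum_syncobj, &accum_fd) ||
       drmSyncobjExportSyncFile(drm_fd, wait_syncobj, &wait_fd))
      return -1;

   /* The merged sync_file signals once both inputs have signaled. */
   struct sync_merge_data merge = { .fd2 = wait_fd };
   strcpy(merge.name, "fold");
   if (ioctl(accum_fd, SYNC_IOC_MERGE, &merge))
      return -1;

   /* Replace the accumulator's fence with the merged one. */
   int ret = drmSyncobjImportSyncFile(drm_fd, accum_syncobj, merge.fence);

   close(accum_fd);
   close(wait_fd);
   close(merge.fence);
   return ret;
}
```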

  • Yes, that makes sense. Maybe it is a good idea to add a comment in v3dv_queue_driver_submit explaining this for future reference.

Iago Toral @itoral started a thread on an outdated change in commit 9850d182 4 months ago
Resolved by Jason Ekstrand 3 months ago
  • Resolved by Jason Ekstrand 3 months ago

    I did a full run of dEQP-VK.synchronization.* with multisync (CI only runs a small fraction) and got a few failures:

  • Last reply by Iago Toral 4 months ago
  • 1f9fc701...b591409b - 15 commits from branch mesa:main
  • 8e48b68c - vulkan/drm_syncobj: Implement WAIT_PENDING with a sync_file lookup
  • 42f514ff - v3dv: Stop directly setting vk_device::alloc
  • d042bf20 - v3dv: Put indirect compute CSD jobs in the job list
  • a452438c - v3dv: Don't use pthread functions on c11 mutexes
  • 88143b73 - v3dv: Destroy the device mutex on the teardown path
  • 998a21fb - v3dv: Switch to the common device lost tracking
  • a8c7189b - v3dv: Use util/os_time helpers
  • 5b4fde3e - v3dv: Add a condition variable for queries
  • ab5dc325 - v3dv: Switch to the common submit framework
  • e1b05d8c - v3dv: Use the core version feature helpers
  • 33e89262 - v3dv: Use the core version property helpers
  • 59fcbf21 - v3dv: Add emulated timeline semaphore support
  • c09bd06c - FIXUP: Use a dummy GPU job for signaling after a CPU job
  • 0241f383 - FIXUP: Drop a no longer relevant FIXME comment
  • dc41f7e5 - FIXUP: Use WAIT_ALL in queue_wait_idle

Compare with previous version


As mentioned by @jasuarez, on the last full runs there were still some flaky failures. But those runs didn't include Jason's latest fixups addressing Iago's comments. Tomorrow I will ask him to do a new one, or I could try to do it manually myself on my rpi4.

Meanwhile, tonight I will try to test the branch with real apps (just in case) and keep going with the review.

Iago Toral @itoral started a thread on an outdated change in commit c09bd06c 3 months ago
Resolved by Jason Ekstrand 3 months ago

Reviewed-by: Iago Toral Quiroga <[email protected]>

Iago Toral @itoral started a thread on an outdated change in commit 59fcbf21 3 months ago
Resolved by Jason Ekstrand 3 months ago
device->drm_syncobj_type.import_sync_file = NULL;
device->drm_syncobj_type.export_sync_file = NULL;

/* Multiwait is required for emulated timeline semaphores and is supported
 * by the v3d kernel interface.
  • I presume this is an intrinsic feature of a syncobject then, right?

  • Roughly. It just means that you don't modify the syncobj when you wait on it so you can wait over and over again if you want. It's true of basically all syncobj interfaces.
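
A small sketch of what that means at the ioctl level (assuming a DRM fd and a syncobj handle with a fence that eventually signals; uses the libdrm wrapper around DRM_IOCTL_SYNCOBJ_WAIT): the wait does not consume or reset the syncobj, so the same handle can be waited on again:

```c
#include <stdint.h>
#include <xf86drm.h>

static int
wait_twice(int drm_fd, uint32_t syncobj, int64_t abs_timeout_ns)
{
   uint32_t first_signaled;

   /* First wait... */
   int ret = drmSyncobjWait(drm_fd, &syncobj, 1, abs_timeout_ns,
                            DRM_SYNCOBJ_WAIT_FLAGS_WAIT_ALL, &first_signaled);
   if (ret)
      return ret;

   /* ...and a second wait on the very same syncobj still works, because the
    * first wait did not modify it. */
   return drmSyncobjWait(drm_fd, &syncobj, 1, abs_timeout_ns,
                         DRM_SYNCOBJ_WAIT_FLAGS_WAIT_ALL, &first_signaled);
}
```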

  • It seems there is a problem with WSI: anything that tries to render to a window freezes and never actually seems to render anything. I tested this with different apps and it seems to happen in all of them. I think this happens because we end up waiting on the semaphore from a previous acquire image, which never gets signaled.

  • 0001-FIXUP-Fix-WSI.patch

    This change fixes the problem: it basically removes the whole sync_for_memory stuff and puts back our previous version of AcquireNextImage2KHR, which signals the fence/semaphore directly after calling the common code. I noticed that Turnip does the same thing, by the way.

    @jekstrand: is there any implicit requirement for the sync-for-memory stuff to work?

  • It seems that the common framework signals with sync_for_memory only on QueuePresent, by inserting a VK_STRUCTURE_TYPE_WSI_MEMORY_SIGNAL_SUBMIT_INFO_MESA struct into a QueueSubmit produced from there. I am not sure how this is supposed to work, though, since I think the cases I see are doing:

    So we end up waiting forever in step 2.

  • I have just confirmed that Iago's patch fixes some tests that were failing yesterday, like this one:

    dEQP-VK.wsi.xlib.incremental_present.scale_none.immediate.identity.inherit.incremental_present

    and got the gfxreconstruct traces working again.

    I will keep doing tests, but meanwhile, @jasuarez, could you do a new full CTS run using this branch (which is just Jason's branch plus Iago's fix)?

    https://gitlab.freedesktop.org/apinheiro/mesa/-/commits/wip/apinheiro/vk-submit

  • @jekstrand: this is the stacktrace for the case where we hang in step 2, in case it is helpful:

    Edited by Iago Toral 3 months ago
  • @jasuarez pointed out some build warnings when using Iago's fix (considered errors when CI builds). I have a fix here:

    apinheiro/mesa@c39b785f

  • Ugh... Yeah, I just blindly applied the thing from RADV because that's what I did for lavapipe too. I think what you've done in your patch is mostly ok. The only problem is that the temporary/permanent semantics are a bit off. We should just copy+paste from panvk.

  • Ok, I've pushed with updates from all the comments, including this one. I pulled in Iago's patch but modified it with the code from panvk.

  • Hrm... Actually, the panvk code should be equivalent to what's there today. I need to do some digging. Fortunately, lavapipe also explodes and that one's easy to debug.

  • Ok, I'm not clear at all on what we should be doing here. Question: how does implicit sync work on v3d? I don't see anything in the kernel interface that lets you wait/signal on BOs. Do you always do a full CPU stall before handing off to X11?

  • Reading the v3d kernel code, it seems to always sync everything all the time. If that's the case, how have we had any sync problems ever?!? How is vkcube tearing? Maybe I'm reading it wrong? Anyway, I'm going to put it down for now until one of you can help me figure out how this stuff works. Once I know how implicit sync works, fixing WSI should be a snap.

  • If everything syncs all the time like I suspect from my reading of the v3d kernel code then I think the WSI fixup patch as it exists in this MR right now should be correct. I've tested it a bit with vkcube and some sascha demos.

  • If everything syncs all the time like I suspect from my reading of the v3d kernel code

    AFAIR, that's the case, but I will let Iago confirm it.

    I've tested it a bit with vkcube and some sascha demos.

    I have been testing with vkQuake2, Quake3e (which has a Vulkan renderer), and RBDoom3-BFG, and they seem to work fine. I still need to try some UE4 demos that we have around, which we use as a reference for an intensive Vulkan app.

    I also tested with gfxreconstruct, and the traces work in general, but I got some new asserts now. That said, those are old traces, and it is not the first time I have seen weird behaviour when using a recent gfxreconstruct with old traces, so I think it would be better to capture new ones and try again.

    That said, before going too deep into those apps: when doing full CTS runs we found some new flaky tests. @jasuarez's runs (using the CI environment) got crashes now and then on the following tests:

    and I got the following failures using one of my rpi4s:

    But as mentioned, they are flaky tests. In fact, we were not able to reproduce them manually. Individually they work fine, and they also pass if we run a bigger subset with a similar pattern (like dEQP-VK.binding_model.descriptor_copy.*). Tomorrow we will try to reproduce those failures without needing to do a full CTS run.

  • Ok, I'm not clear at all on what we should be doing here. Question: how does implicit sync work on v3d? I don't see anything in the kernel interface that lets you wait/signal on BOs. Do you always do a full CPU stall before handing off to X11?

    Reading the v3d kernel code, it seems to always sync everything all the time. If that's the case, how have we had any sync problems ever?!? How is vkcube tearing? Maybe I'm reading it wrong? Anyway, I'm going to put it down for now until one of you can help me figure out how this stuff works. Once I know how implicit sync works, fixing WSI should be a snap.

    I think @mwen might be able to help here, since she is the one familiar with the kernel bits. Melissa, some context about what we are looking at here: Jason has been rewriting how we do synchronization in the v3dv driver to use the common sync code in Mesa, and now we're specifically looking at how we send images to the display engine, particularly the synchronization aspect, so we can be certain that we don't start rendering or presenting too early.

  • dc41f7e5...b91b90c2 - 10 commits from branch mesa:main
  • 5d2a6001 - vulkan/drm_syncobj: Implement WAIT_PENDING with a sync_file lookup
  • 4ed9c0f2 - v3dv: Stop directly setting vk_device::alloc
  • 31b1dcc3 - v3dv: Put indirect compute CSD jobs in the job list
  • f11e9ea3 - v3dv: Don't use pthread functions on c11 mutexes
  • 31b5b1a6 - v3dv: Destroy the device mutex on the teardown path
  • 8366c38e - v3dv: Switch to the common device lost tracking
  • e926c994 - v3dv: Use util/os_time helpers
  • 78a620e3 - v3dv: Add a condition variable for queries
  • 75f3a1e4 - v3dv: Switch to the common submit framework
  • b1d82183 - v3dv: Use the core version feature helpers
  • cfab02da - v3dv: Use the core version property helpers
  • ced6413a - v3dv: Add emulated timeline semaphore support
  • a5577acf - FIXUP: Use a dummy GPU job for signaling after a CPU job
  • be89bf8c - FIXUP: Drop a no longer relevant FIXME comment
  • 789c7541 - FIXUP: Use WAIT_ALL in queue_wait_idle
  • 96d1741b - FIXUP: Fix WSI

Compare with previous version


I think we have another issue with signaling, where the last job may complete before some other job we executed previously in the command buffer on a different hardware queue. For example, if we have a command buffer like this:

TFU job -> CSD job -> CL job

We would signal with the CL job, but it may be possible that the TFU job is still running since that uses a totally different hw module. Currently, we have been handling this by only signaling on the last job if we have not submitted any jobs in the same command buffer batch to any of the other hw queues, and if that is not the case, we instead emit a serialized noop job at the end that handles signaling. See https://gitlab.freedesktop.org/mesa/mesa/-/blob/main/src/broadcom/vulkan/v3dv_queue.c#L1306.

Yeah, this is a problem, and it seems we actually have 2 bugs. I did a quick change where we never signal on the last job and instead always signal with a noop job at the end, the idea being that we should be serializing that job against all queues and preventing the issue I describe above. And yes, this gets rid of the flaky crashes, but then I found that some tests started to fail more permanently, because the noop jobs are only really serialized if we don't have wait semaphores.

This is because set_in_syncs assumes that if we are the first job in any queue we just need to wait on the semaphores and there is no need to serialize; however, this is not valid when we are trying to use a noop job for signaling at the end. In the particular case I was looking at, the test was doing a bunch of CSD jobs, so emitting a noop job (which is a CL) at the end to signal would see that it was the first CL job and decide that it only had to wait on the semaphores and not on the last job syncs, so we would be signaling immediately without waiting for the CSD jobs to finish. I fixed this by manually setting sync_info.wait_count = 0 when dispatching the noop job if we had a last job (meaning that we have already waited on the input semaphores), and with that we were able to do a full CI run without issues.

So I think we have 2 ways to fix this properly:

  1. Restore the logic I linked to above where we only signal on the last job of the command buffer if we had only submitted jobs to that one queue (since we know they are serialized in that case) and otherwise, signal with a noop job at the end (but making sure that it waits for the last syncs so it is properly serialized).

  2. Always signal with the last job if it is a GPU job, but force proper serialization against the last job syncs too, independently of whether we need to wait for semaphores or not.

Personally, I think 2 makes more sense.
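
A purely hypothetical sketch of option 2, just to pin down the idea (the type and helper names below are made up for illustration, not v3dv code): the signaling job is always serialized against the last job sync of every hardware queue, regardless of wait semaphores:

```c
#include <stdint.h>

enum hw_queue { HW_QUEUE_CL, HW_QUEUE_TFU, HW_QUEUE_CSD, HW_QUEUE_COUNT };

struct signal_job {
   uint32_t waits[16];   /* syncobjs this job must wait on before signaling */
   uint32_t wait_count;
};

static void
serialize_signal_job(struct signal_job *job,
                     const uint32_t last_job_sync[HW_QUEUE_COUNT])
{
   /* Even if the input semaphores were already consumed by an earlier job
    * in the batch, the signaling job still waits for the last job on each
    * hardware queue (CL, TFU, CSD); otherwise a TFU or CSD job could still
    * be running when we signal. */
   for (int q = 0; q < HW_QUEUE_COUNT; q++)
      job->waits[job->wait_count++] = last_job_sync[q];
}
```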

Tomorrow I have to run some errands mid-morning but I might be able to write that patch before I leave, unless @jekstrand beats me to it :)

  • I like option 2 as well. Throwing noop jobs at the kernel seems a bit wasteful but, given how many jobs per command buffer you end up with, it really doesn't seem that bad.

  • I've implemented 2. There is a fixup commit at the end and also one I snuck into the middle called v3dv: Always wait on last_job_syncs if job->serialize.

  • Thanks Jason, the fixes look good to me.

    @jasuarez: could you test the branch in CI?

  • So I've tested this MR using the stock kernel (both 32-bit and 64-bit), as well as a multisync-enabled kernel (only 64-bit; I don't have one for 32-bit), and everything works fine.

  • Thanks a bunch! @itoral Is that a review? I've not seen one for the big mega-patch yet.

    Edited by Jason Ekstrand 3 months ago
  • The other thing we were waiting on was the question about implicit sync for WSI. It seems the display driver is supposed to wait on a fence for v3d to be done with rendering, but currently this is messed up and not working properly; it is a known issue and they are working on fixing it.

    I've just made a comment that we should be signaling the semaphore/fence immediately in vkAcquireNextImage2 by initializing their value to 1, which is what we have been doing before (instead of 0, as you currently do in this branch). However, since the kernel has this bug, I wonder if that is the right thing to do right now.

  • But in any case, it is exactly what we have been doing all this time anyway, so I don't think we need to figure this out right now.

  • 7a050a93...837f781c - 57 commits from branch mesa:main
  • f4768f4a - vulkan/drm_syncobj: Implement WAIT_PENDING with a sync_file lookup
  • 0d7685bb - v3dv: Stop directly setting vk_device::alloc
  • 9f018315 - v3dv: Put indirect compute CSD jobs in the job list
  • 59f859eb - v3dv: Don't use pthread functions on c11 mutexes
  • 746af781 - v3dv: Destroy the device mutex on the teardown path
  • 9b605c21 - v3dv: Switch to the common device lost tracking
  • cee64456 - v3dv: Use util/os_time helpers
  • 0126a2bf - v3dv: Add a condition variable for queries
  • 378effb6 - v3dv: Always wait on last_job_syncs if job->serialize
  • 37c3a2e9 - v3dv: Switch to the common submit framework
  • 197cc92b - v3dv: Use the core version feature helpers
  • d27452d6 - v3dv: Use the core version property helpers
  • efa8db2e - v3dv: Add emulated timeline semaphore support
  • a475a2ce - FIXUP: Use a dummy GPU job for signaling after a CPU job
  • aa978ef6 - FIXUP: Drop a no longer relevant FIXME comment
  • a78b1853 - FIXUP: Use WAIT_ALL in queue_wait_idle
  • 1007be22 - FIXUP: Fix WSI

Compare with previous version


"v3dv: Add emulated timeline semaphore support" is:

Reviewed-by: Iago Toral Quiroga <[email protected]>

"v3dv: Always wait on last_job_syncs if job->serialize" is

Reviewed-by: Iago Toral Quiroga <[email protected]>

As @jasuarez has been focused on doing full CTS runs (and @itoral on fixing the issues he found), I have been testing a little with real apps, specifically checking whether there were any performance changes, even if just by looking at FPS.

So I tested with the gfxreconstruct traces that I had at hand: the UE4 shooter demo, the UE4 vehicle demo, Quake3e, and vkQuake.

For the UE4 demos there are no significant changes, although there is a slight improvement with the vehicle one (slight as in 11.5 fps vs 12.0 fps on average).

But in the case of Quake3e and vkQuake there is a significant average FPS drop: around 10 fps, and in some traces even more. I also briefly checked Quake3e directly, just in case it was an issue with gfxreconstruct, and that seems to confirm the performance drop.

But taking into account that it is not affecting all games, and that the CTS is passing, I don't think we should block this MR because of it; we can investigate and fix it later.

Iago Toral @itoral started a thread on an outdated change in commit 1007be22 3 months ago
Resolved by Jason Ekstrand 3 months ago
  • I've squashed in all the fixups. It should be basically equivalent to the last version, only the big patch is now all one patch again.

  • It seems you didn't actually push the update, but in any case the big patch is:

    Reviewed-by: Iago Toral Quiroga <[email protected]>

    I think we are ready to land this. Most of us will be off until Monday due to local holidays, but since we have done full CI runs I don't expect any unexpected issues; in any case, @apinheiro will be around for a few hours until Friday should anything unexpected happen.

    I'll try to figure out why Quake and Quake3 take a performance hit with this next week.

    Edited by Iago Toral 3 months ago
  • e3ac6295...cf1390e1 - 349 commits from branch mesa:main
  • a563d3c2 - vulkan/drm_syncobj: Implement WAIT_PENDING with a sync_file lookup
  • 6af49bfa - v3dv: Stop directly setting vk_device::alloc
  • 4bafdc48 - v3dv: Put indirect compute CSD jobs in the job list
  • e3a26369 - v3dv: Don't use pthread functions on c11 mutexes
  • 93c0d557 - v3dv: Destroy the device mutex on the teardown path
  • caf6b729 - v3dv: Switch to the common device lost tracking
  • 12b88928 - v3dv: Use util/os_time helpers
  • d3aa9f3c - v3dv: Add a condition variable for queries
  • 49a17589 - v3dv: Always wait on last_job_syncs if job->serialize
  • 960f6b3c - v3dv: Switch to the common submit framework
  • d5ab9ef9 - v3dv: Use the core version feature helpers
  • aa5119d3 - v3dv: Use the core version property helpers
  • 7c51d8be - v3dv: Add emulated timeline semaphore support

Compare with previous version

  • 7c51d8be...7478b00c - 53 commits from branch mesa:main
  • b284c512 - vulkan/drm_syncobj: Implement WAIT_PENDING with a sync_file lookup
  • 0208bb2d - v3dv: Stop directly setting vk_device::alloc
  • 25441b5e - v3dv: Put indirect compute CSD jobs in the job list
  • 30191fd9 - v3dv: Don't use pthread functions on c11 mutexes
  • 32527f3c - v3dv: Destroy the device mutex on the teardown path
  • 8bd7bd95 - v3dv: Switch to the common device lost tracking
  • e5a0e212 - v3dv: Use util/os_time helpers
  • 00b84fae - v3dv: Add a condition variable for queries
  • 321f0b85 - v3dv: Always wait on last_job_syncs if job->serialize
  • 316728a5 - v3dv: Switch to the common submit framework
  • 1973b2da - v3dv: Use the core version feature helpers
  • 1cc917bc - v3dv: Use the core version property helpers
  • 93fbaae7 - v3dv: Add emulated timeline semaphore support

Compare with previous version

  • Marge Bot @marge-bot merged 2 hours ago

    Just want to say again thanks @jekstrand for working on this MR.

  • You're more than welcome! One of the things I'm excited about with my new role at Collabora is being able to dip my toes into all the different drivers. This seemed like as good an excuse as any to get to know v3dv a bit.

