Blob Blame History Raw
From: Mario Kleiner <mario.kleiner.de@gmail.com>
Date: Mon, 24 Apr 2017 11:46:44 +0200
Subject: drm/amd/display: Fix race between vblank irq and pageflip irq. (v2)
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit
Git-commit: 753c66c91b5430816eedb67138017fd4e2143d52
Patch-mainline: v4.15-rc1
References: FATE#326289 FATE#326079 FATE#326049 FATE#322398 FATE#326166

Since DC now uses CRTC_VERTICAL_INTERRUPT0 as VBLANK irq trigger
and vblank interrupts actually happen earliest at start of vblank,
instead of a bit before vblank, we no longer need some of the
fudging logic to deal with too early vblank irq handling (grep for
lb_vblank_lead_lines). This itself fixes a pageflip scheduling
bug in DC, caused by uninitialized  use of lb_vblank_lead_lines,
with a wrong startup value of 0. Thanks to the new vblank irq
trigger this value of zero is now actually correct for DC :).

A new problem is that vblank irq's race against pflip irq's,
and as both can fire at first line of vblank, it is no longer
guaranteed that vblank irq handling (therefore -> drm_handle_vblank()
-> drm_update_vblank_count()) executes before pflip irq handling
for a given vblank interval when a pageflip completes. Therefore
the vblank count and timestamps emitted to user-space as part of
the pageflip completion event will be often stale and cause new
timestamping and swap scheduling errors in user-space.

This was observed with large frequency on R9 380 Tonga Pro.

Fix this by enforcing a vblank count+timestamp update right
before emitting the pageflip completion event from the pflip
irq handler. The logic in core drm_update_vblank_count() makes
sure that no redundant or conflicting updates happen, iow. the
call turns into a no-op if it wasn't needed for that vblank,
burning a few microseconds of cpu time though.

Successfully tested on AMD R9 380 "Tonga Pro" (VI/DCE 10)
with DC enabled on the current DC staging branch. Independent
measurement of pageflip completion timing with special hardware
measurement equipment now confirms correct pageflip timestamps
and counts in the pageflip completion events.

v2: Review comments by Michel, drop outdated paragraph
    about problem already fixed in 2nd patch of the series.
    Add acked/r-b by Harry and Michel.

Signed-off-by: Mario Kleiner <mario.kleiner.de@gmail.com>
Acked-by: Harry Wentland <harry.wentland@amd.com> (v1)
Reviewed-by: Michel Dänzer <michel.daenzer@amd.com>

Cc: Andrey Grodzovsky <Andrey.Grodzovsky@amd.com>
Cc: Alex Deucher <alexander.deucher@amd.com>
Cc: Michel Dänzer <michel.daenzer@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Acked-by: Petr Tesarik <ptesarik@suse.com>
---
 drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c |    3 +++
 1 file changed, 3 insertions(+)

--- a/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c
+++ b/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c
@@ -202,6 +202,9 @@ static void dm_pflip_high_irq(void *inte
 	if (amdgpu_crtc->event
 			&& amdgpu_crtc->event->event.base.type
 			== DRM_EVENT_FLIP_COMPLETE) {
+		/* Update to correct count/ts if racing with vblank irq */
+		drm_crtc_accurate_vblank_count(&amdgpu_crtc->base);
+
 		drm_crtc_send_vblank_event(&amdgpu_crtc->base, amdgpu_crtc->event);
 		/* page flip completed. clean up */
 		amdgpu_crtc->event = NULL;