Blob Blame History Raw
From a4b2b01523a8f147112f44cd0867d246ed07b43c Mon Sep 17 00:00:00 2001
From: Chris Wilson <chris@chris-wilson.co.uk>
Date: Wed, 17 May 2017 13:10:01 +0100
Subject: [PATCH] drm/i915: Don't mark an execlists context-switch when idle
Git-commit: a4b2b01523a8f147112f44cd0867d246ed07b43c
Patch-mainline: v4.13-rc1
References: FATE#322643 bsc#1055900

If we *know* that the engine is idle, i.e. we have not more contexts in
flight, we can skip any spurious CSB idle interrupts. These spurious
interrupts seem to arrive long after we assert that the engines are
completely idle, triggering later assertions:

[  178.896646] intel_engine_is_idle(bcs): interrupt not handled, irq_posted=2
[  178.896655] ------------[ cut here ]------------
[  178.896658] kernel BUG at drivers/gpu/drm/i915/intel_engine_cs.c:226!
[  178.896661] invalid opcode: 0000 [#1] SMP
[  178.896663] Modules linked in: i915(E) x86_pkg_temp_thermal(E) crct10dif_pclmul(E) crc32_pclmul(E) crc32c_intel(E) ghash_clmulni_intel(E) nls_ascii(E) nls_cp437(E) vfat(E) fat(E) intel_gtt(E) i2c_algo_bit(E) drm_kms_helper(E) syscopyarea(E) sysfillrect(E) sysimgblt(E) fb_sys_fops(E) aesni_intel(E) prime_numbers(E) evdev(E) aes_x86_64(E) drm(E) crypto_simd(E) cryptd(E) glue_helper(E) mei_me(E) mei(E) lpc_ich(E) efivars(E) mfd_core(E) battery(E) video(E) acpi_pad(E) button(E) tpm_tis(E) tpm_tis_core(E) tpm(E) autofs4(E) i2c_i801(E) fan(E) thermal(E) i2c_designware_platform(E) i2c_designware_core(E)
[  178.896694] CPU: 1 PID: 522 Comm: gem_exec_whispe Tainted: G            E   4.11.0-rc5+ #14
[  178.896702] task: ffff88040aba8d40 task.stack: ffffc900003f0000
[  178.896722] RIP: 0010:intel_engine_init_global_seqno+0x1db/0x1f0 [i915]
[  178.896725] RSP: 0018:ffffc900003f3ab0 EFLAGS: 00010246
[  178.896728] RAX: 0000000000000000 RBX: ffff88040af54000 RCX: 0000000000000000
[  178.896731] RDX: ffff88041ec933e0 RSI: ffff88041ec8cc48 RDI: ffff88041ec8cc48
[  178.896734] RBP: ffffc900003f3ac8 R08: 0000000000000000 R09: 000000000000047d
[  178.896736] R10: 0000000000000040 R11: ffff88040b344f80 R12: 0000000000000000
[  178.896739] R13: ffff88040bce0000 R14: ffff88040bce52d8 R15: ffff88040bce0000
[  178.896742] FS:  00007f2cccc2d8c0(0000) GS:ffff88041ec80000(0000) knlGS:0000000000000000
[  178.896746] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[  178.896749] CR2: 00007f41ddd8f000 CR3: 000000040bb03000 CR4: 00000000001406e0
[  178.896752] Call Trace:
[  178.896768]  reset_all_global_seqno.part.33+0x4e/0xd0 [i915]
[  178.896782]  i915_gem_request_alloc+0x304/0x330 [i915]
[  178.896795]  i915_gem_do_execbuffer+0x8a1/0x17d0 [i915]
[  178.896799]  ? remove_wait_queue+0x48/0x50
[  178.896812]  ? i915_wait_request+0x300/0x590 [i915]
[  178.896816]  ? wake_up_q+0x70/0x70
[  178.896819]  ? refcount_dec_and_test+0x11/0x20
[  178.896823]  ? reservation_object_add_excl_fence+0xa5/0x100
[  178.896835]  i915_gem_execbuffer2+0xab/0x1f0 [i915]
[  178.896844]  drm_ioctl+0x1e6/0x460 [drm]
[  178.896858]  ? i915_gem_execbuffer+0x260/0x260 [i915]
[  178.896862]  ? dput+0xcf/0x250
[  178.896866]  ? full_proxy_release+0x66/0x80
[  178.896869]  ? mntput+0x1f/0x30
[  178.896872]  do_vfs_ioctl+0x8f/0x5b0
[  178.896875]  ? ____fput+0x9/0x10
[  178.896878]  ? task_work_run+0x80/0xa0
[  178.896881]  SyS_ioctl+0x3c/0x70
[  178.896885]  entry_SYSCALL_64_fastpath+0x17/0x98
[  178.896888] RIP: 0033:0x7f2ccb455ca7
[  178.896890] RSP: 002b:00007ffcabec72d8 EFLAGS: 00000246 ORIG_RAX: 0000000000000010
[  178.896894] RAX: ffffffffffffffda RBX: 000055f897a44b90 RCX: 00007f2ccb455ca7
[  178.896897] RDX: 00007ffcabec74a0 RSI: 0000000040406469 RDI: 0000000000000003
[  178.896900] RBP: 00007f2ccb70a440 R08: 00007f2ccb70d0a4 R09: 0000000000000000
[  178.896903] R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000000
[  178.896905] R13: 000055f89782d71a R14: 00007ffcabecf838 R15: 0000000000000003
[  178.896908] Code: 00 31 d2 4c 89 ef 8d 70 48 41 ff 95 f8 06 00 00 e9 68 fe ff ff be 0f 00 00 00 48 c7 c7 48 dc 37 a0 e8 fa 33 d6 e0 e9 0b ff ff ff <0f> 0b 0f 0b 0f 0b 0f 0b 0f 1f 00 66 2e 0f 1f 84 00 00 00 00 00

On the other hand, by ignoring the interrupt do we risk running out of
space in CSB ring? Testing for a few hours suggests not, i.e. that we
only seem to get the odd delayed CSB idle notification.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/20170517121007.27224-6-chris@chris-wilson.co.uk
Acked-by: Takashi Iwai <tiwai@suse.de>

---
 drivers/gpu/drm/i915/i915_irq.c |    6 ++++--
 1 file changed, 4 insertions(+), 2 deletions(-)

--- a/drivers/gpu/drm/i915/i915_irq.c
+++ b/drivers/gpu/drm/i915/i915_irq.c
@@ -1323,8 +1323,10 @@ gen8_cs_irq_handler(struct intel_engine_
 	bool tasklet = false;
 
 	if (iir & (GT_CONTEXT_SWITCH_INTERRUPT << test_shift)) {
-		set_bit(ENGINE_IRQ_EXECLIST, &engine->irq_posted);
-		tasklet = true;
+		if (port_count(&engine->execlist_port[0])) {
+			set_bit(ENGINE_IRQ_EXECLIST, &engine->irq_posted);
+			tasklet = true;
+		}
 	}
 
 	if (iir & (GT_RENDER_USER_INTERRUPT << test_shift)) {