Blob Blame History Raw
From: "Luis R. Rodriguez" <mcgrof@suse.com>
Date: Wed, 24 Sep 2014 23:41:29 -0700
Subject: [PATCH] module: warn if module init + probe takes long
References: bnc#889297 bnc#877622 bnc#889295 bnc#893454
Patch-mainline: Never, well maybe as RFC

Systemd has a general timeout for all workers currently set to 180
seconds. Systemd now has a warning which is issued once it reaches 1/3
of the timeout. The original motivation for the systemd timeout was
to help track device drivers which do not use asynch firmware loading
on init(), and the timeout was originally set to 30 seconds.

Since systemd + kernel are heavily tied in for the purposes of this
patch it is assumed you have merged on systemd the following commits:

671174136525ddf208cdbe75d6d6bd159afa961f        udev: timeout - warn after a third of the timeout before killing
b5338a19864ac3f5632aee48069a669479621dca        udev: timeout - increase timeout
2e92633dbae52f5ac9b7b2e068935990d475d2cd        udev: bump event timeout to 60 seconds
be2ea723b1d023b3d385d3b791ee4607cbfb20ca        udev: remove userspace firmware loading support
9f20a8a376f924c8eb5423cfc1f98644fc1e2d1a        udev: fixup commit
dd5eddd28a74a49607a8fffcaf960040dba98479        udev: unify event timeout handling
9719859c07aa13539ed2cd4b31972cd30f678543        udevd: add --event-timeout commandline option

One type of worker is the kernel module loader which uses the kmod
library for loading modules using the finit_module() system call.
Once the timeout is triggered systemd sends the worker process a
sigkill signal, this gets propagated to the kernel. Not all modules
get killed and that is because not all code paths on module
initialization check for signals. If it so happens that any path
in the module initializaiton happens to have some code path that checks
fatal_signal_pending() module loading can fail and at different points
on the initialization path. Drivers will only be correctly
de-initialized *iff* the error code handling of a device driver
is perfect. For drivers not well programmed this can leave device
driver in half initialized state. This is an issue.

Another issue is that the original implementation of the timeout assumed
that the timeout would only affect device drivers who are taking long
on init(), the driver core however batches both init() and probe()
synchronously. This means the timeout is also penalizing device drivers
that take long on probe, and although some subsystems, such as storage, do
have the goal to do probing asynchronously, not all subsystems have this
policy or set objective and it is therefore not practical to be
penalizing with a sigkill all subsystem device drivers.

Yet another issue is that the driver core calls probe for *all* devices
that a driver can claim serially, so if a device driver will need to
probe 3 devices and if probe on the device driver is synchronous
the amount of time that module loading will take will be:

driver load time = init() + probe for 3 devices serially

This ultimatley ends up limiting the number of devices that
*any* device driver can support based on the following formula:

number_devices =          systemd_timeout
                  -------------------------------------
                      max known probe time for driver

We don't want to work around the timeout, instead systemd must
be changed to take all the above into consideration when issuing
any kills on device drivers. We will eventually implement support
for letting userspace specify async probe is desired and the driver
core will handle that instead of requiring each device driver to
implement that. The systemd timeout should be increased or at least
the sigkill removed for kernel module loading for now.

In the meantime while systemd collects logs on systemd-udevd of workers
that take long (warning) or when it kills a worker, not many folks are
collecting logs for (journalctl -u systemd-udevd) and its not highly
expected that folks will submit bugs for these. Since the timeout really
should be subsystem / driver specific and since we're told systemd cannot
remove the sigkill / timeout due to its design right now this assumes
the timeout of systemd will be increased for at least kernel module loading
to 1 hour, and in order to help collect logs with this current upper limit
issue a warn to help collect logs of offending drivers. The limit restricts
the number of devices then to what we currently will expect for commulative
number of device probes + init on one device driver but this is a warning,
and does not kill device driers, we currently set this limit to 5 minutes.
This limit is still quite arbitrary but it can be increased / decreased
as systems grow and the systemd timeout architecture gets heavily
revisted in terms of design.

Signed-off-by: Luis R. Rodriguez <mcgrof@suse.com>
Acked-by: Michal Hocko <mhocko@suse.cz>
---
 kernel/module.c |   19 ++++++++++++++++++-
 1 file changed, 18 insertions(+), 1 deletion(-)

diff --git a/kernel/module.c b/kernel/module.c
index 65fdb4e8293a..a442c1c3c9cf 100644
--- a/kernel/module.c
+++ b/kernel/module.c
@@ -3697,6 +3697,8 @@ static int load_module(struct load_info *info, const char __user *uargs,
 	struct module *mod;
 	long err;
 	char *after_dashes;
+	unsigned long time_start, time_end;
+	unsigned int delta;
 
 	err = module_sig_check(info, flags);
 	if (err)
@@ -3820,7 +3822,22 @@ static int load_module(struct load_info *info, const char __user *uargs,
 	/* Done! */
 	trace_module_load(mod);
 
-	return do_init_module(mod);
+	time_start = jiffies;
+	err = do_init_module(mod);
+	time_end = jiffies;
+
+	delta = jiffies_to_msecs(time_end - time_start);
+
+	/*
+	 * Some subsystems do async probe, some do not, but lets be clear
+	 * about the possibility of both, some subsystems do want to see
+	 * reports about this and address these delays. 5 minutes for now.
+	 */
+	WARN(delta >= 300000,
+	     "Device driver %s initialization / probe took %d ms to complete, report this\n",
+	     mod->name, delta);
+
+	return err;
 
  sysfs_cleanup:
 	mod_sysfs_teardown(mod);