udelay: Lower the sleep vs delay threshold
By default, we busy-loop (a.k.a., "delay") for most delay values, and
only allow sleeping for large delays. But busy-looping is expensive, as
it wastes CPU cycles.
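
To illustrate the trade-off, here is a minimal sketch of a delay routine
that busy-loops below a threshold and sleeps above it. It is not the
actual udelay.c code: now_us(), busy_delay_us(), sleep_delay_us(),
delay_us() and SLEEP_THRESHOLD_US are hypothetical names, and retrying
nanosleep() on EINTR is an assumption of this sketch.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <stdint.h>
#include <time.h>

/* Hypothetical cutoff mirroring the 0.1 s threshold; not a real udelay.c symbol. */
#define SLEEP_THRESHOLD_US 100000u

/* Current CLOCK_MONOTONIC time, truncated to microseconds. */
static uint64_t now_us(void)
{
	struct timespec ts;
	int ret = clock_gettime(CLOCK_MONOTONIC, &ts);
	assert(ret == 0);
	(void)ret;
	return (uint64_t)ts.tv_sec * 1000000u + (uint64_t)ts.tv_nsec / 1000u;
}

/* Busy-loop: precise, but keeps the CPU spinning for the whole delay. */
static void busy_delay_us(unsigned int usecs)
{
	const uint64_t start = now_us();
	while (now_us() - start < usecs)
		;
}

/* Sleep: yields the CPU, but the wakeup may be late. */
static void sleep_delay_us(unsigned int usecs)
{
	struct timespec req = {
		.tv_sec = usecs / 1000000u,
		.tv_nsec = (long)(usecs % 1000000u) * 1000L,
	};
	/* Assumption: restart with the remaining time if a signal interrupts us. */
	while (nanosleep(&req, &req) == -1 && errno == EINTR)
		;
}

/* Dispatch on the threshold: long delays sleep, short delays spin. */
static void delay_us(unsigned int usecs)
{
	if (usecs > SLEEP_THRESHOLD_US)
		sleep_delay_us(usecs);
	else
		busy_delay_us(usecs);
}
```

With this shape, lowering SLEEP_THRESHOLD_US moves more delays onto the
sleeping path, trading some timing precision for idle CPU time.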
In a simple program that runs [1] for 1000 samples, I find that for a
0.1 s (100000 us) sleep:
64x2 AMD CPU (CONFIG_HZ=250 / CONFIG_NO_HZ_FULL=y):
min diff: 60 us
max diff: 831 us
mean diff: 135 us
4+2 Mediatek MT8183 CPU (CONFIG_HZ=1000 / CONFIG_NO_HZ_IDLE=y /
sysctl kernel.timer_highres=1):
min diff: 70 us
max diff: 1556 us
mean diff: 146 us
4+2 Mediatek MT8183 CPU (CONFIG_HZ=1000 / CONFIG_NO_HZ_IDLE=y /
sysctl kernel.timer_highres=0):
min diff: 94 us
max diff: 7222 us
mean diff: 1201 us
I.e., with high resolution timers, the maximum error is ~1.5% and the
typical (mean) error is ~0.1%. With low resolution timers, the maximum
error is ~7% and the typical error is ~1%. This seems reasonable.
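
The percentages follow from dividing each measured diff by the 100000 us
target. A small sketch of that arithmetic, where error_percent() is a
hypothetical helper and not part of the tree:

```c
#include <assert.h>

/* Relative error of a measured deviation, in percent of the requested delay. */
static double error_percent(double diff_us, double target_us)
{
	assert(target_us > 0.0);
	return diff_us / target_us * 100.0;
}
```

For example, error_percent(1556.0, 100000.0) gives 1.556 (the high
resolution maximum on the MT8183) and error_percent(7222.0, 100000.0)
gives 7.222 (the low resolution maximum).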
[1] Stripped / pseudocode:
clock_gettime(CLOCK_MONOTONIC, before);
nanosleep({ .tv_nsec = usecs * 1000 }, NULL);
clock_gettime(CLOCK_MONOTONIC, after);
diff = abs((after - before) / 1000 - usecs);
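
A compilable sketch of the same measurement, for anyone who wants to
reproduce it. This is a reconstruction, not the exact program used for
the numbers above: struct sleep_stats, timespec_diff_us() and
measure_sleep_error() are hypothetical names.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdint.h>
#include <time.h>

/* Absolute deviation from the requested sleep, in microseconds. */
struct sleep_stats {
	int64_t min_us;
	int64_t max_us;
	int64_t mean_us;
};

/* Difference (after - before) in microseconds. */
static int64_t timespec_diff_us(const struct timespec *after,
				const struct timespec *before)
{
	int64_t ns = (int64_t)(after->tv_sec - before->tv_sec) * 1000000000LL
		   + (int64_t)(after->tv_nsec - before->tv_nsec);
	return ns / 1000;
}

/* Sleep for `usecs` microseconds `samples` times and record the error. */
static struct sleep_stats measure_sleep_error(unsigned int usecs,
					      unsigned int samples)
{
	struct sleep_stats st = { INT64_MAX, 0, 0 };
	int64_t total = 0;

	assert(samples > 0);
	for (unsigned int i = 0; i < samples; i++) {
		struct timespec before, after;
		const struct timespec req = {
			.tv_sec = usecs / 1000000u,
			.tv_nsec = (long)(usecs % 1000000u) * 1000L,
		};

		clock_gettime(CLOCK_MONOTONIC, &before);
		nanosleep(&req, NULL);
		clock_gettime(CLOCK_MONOTONIC, &after);

		int64_t diff = timespec_diff_us(&after, &before) - (int64_t)usecs;
		if (diff < 0)
			diff = -diff;
		if (diff < st.min_us)
			st.min_us = diff;
		if (diff > st.max_us)
			st.max_us = diff;
		total += diff;
	}
	st.mean_us = total / (int64_t)samples;
	return st;
}
```

Calling measure_sleep_error(100000, 1000) and printing the three fields
should produce figures comparable to the min/max/mean diffs listed
above, though the exact values depend on the machine and timer
configuration.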
Change-Id: Ifd4821c66c5564f7c975c08769a6742f645e9be0
Signed-off-by: Brian Norris <briannorris@chromium.org>
Reviewed-on: https://review.sourcearcade.org/c/flashprog/+/97
Reviewed-by: Arthur Heymans <arthur@aheymans.xyz>
Reviewed-by: Nico Huber <nico.h@gmx.de>
Tested-by: Nico Huber <nico.h@gmx.de>
diff --git a/udelay.c b/udelay.c
index 6c0efc4..c25d243 100644
--- a/udelay.c
+++ b/udelay.c
@@ -237,8 +237,8 @@
/* Precise delay. */
void internal_delay(unsigned int usecs)
{
- /* If the delay is >1 s, use internal_sleep because timing does not need to be so precise. */
- if (usecs > 1000000) {
+ /* If the delay is >0.1 s, use internal_sleep because timing does not need to be so precise. */
+ if (usecs > 100000) {
internal_sleep(usecs);
} else if (use_clock_gettime) {
clock_usec_delay(usecs);