From: Jan Kara <jack@suse.cz>
Date: Sun, 17 Jan 2016 17:30:38 +0000
Subject: mm: readahead: Increase default readahead window
Patch-mainline: Never, mainline concerned by side-effects, we are confident this is a better default for SUSE install base
References: VM Performance, bsc#548529 bsc#1189955

Increase read_ahead_kb to the values used in SLES10 SP3 to get back
sequential IO performance. This could be a sysctl, but a sysctl would
only apply to files opened after it was updated. While it is unlikely
that files opened earlier by a systemd service belong to read-intensive
processes, it is still possible, and this is more visible to kernel
developers.
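
Since the compiled-in default only seeds a file's readahead window when
the file is opened, the runtime equivalent is the per-device sysfs knob.
A minimal sketch (the device name sda and a 4KB page size are
assumptions of this sketch, not part of the patch):

```shell
# Per-device readahead window in KB, seeded from the kernel default
# ("sda" is an assumption; substitute a real device).
knob=/sys/block/sda/queue/read_ahead_kb
if [ -r "$knob" ]; then
    cat "$knob"
fi
# The change proposed here, expressed in 4KB pages: 128KB -> 512KB
echo "$((128 * 1024 / 4096)) -> $((512 * 1024 / 4096)) pages"
```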

This is not in mainline due to concerns about side-effects. The most
important is that a large readahead at the wrong time causes either
stalls or IO starvation. For example, readahead on a slow device for
data that is not required would starve other reads for potentially a
long time. Larger readahead windows also increase the memory footprint
if the data was not required, which potentially causes stalls due to
reclaim later.

This was evaluated on two machines:

o a UMA machine, 8 cores and rotary storage
o a NUMA machine, 4 sockets, 48 cores and SSD storage

Five basic tests were conducted:

1. paralleldd-single
   paralleldd uses different instances of dd to access a single file and
   write the contents to /dev/null. Its performance depends on how well
   readahead works for a single file. It is mostly sequential IO.

2. paralleldd-multi
   Similar to test 1 except that each instance of dd accesses a
   different file, so each instance reads sequentially but the timing
   makes the aggregate look like random read IO.

3. pgbench-small
   A standard init of pgbench and execution with a small data set.

4. pgbench-large
   A standard init of pgbench and execution with a large data set.

5. bonnie++ with data set sizes of 2X RAM and in asynchronous mode
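
A minimal sketch of what a paralleldd-style reader looks like (the file
size, block size and instance count here are illustrative, not the
actual mmtests configuration):

```shell
# Create a small test file, then read it with several concurrent dd
# instances writing to /dev/null, as in the paralleldd-single test.
f=$(mktemp)
dd if=/dev/zero of="$f" bs=1M count=8 2>/dev/null
for i in 1 2 3 4; do
    dd if="$f" of=/dev/null bs=1M 2>/dev/null &
done
wait
rm -f "$f"
echo "readers finished"
```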

UMA paralleldd-single on ext3
                               4.4.0                 4.4.0
                             vanilla        readahead-v1r1
Amean    Elapsd-1      5.42 (  0.00%)        5.40 (  0.50%)
Amean    Elapsd-3      7.51 (  0.00%)        5.54 ( 26.25%)
Amean    Elapsd-5      7.15 (  0.00%)        5.90 ( 17.46%)
Amean    Elapsd-7      5.81 (  0.00%)        5.61 (  3.42%)
Amean    Elapsd-8      6.05 (  0.00%)        5.73 (  5.36%)

The results speak for themselves: readahead is a major boost when there
are multiple readers of data. It is not displayed here, but system CPU
usage is lower overall. The IO stats support the results:

                       4.4.0            4.4.0
                     vanilla   readahead-v1r1
Mean  sda-avgqusz       7.44             8.59
Mean  sda-avgrqsz     279.77           722.52
Mean  sda-await        31.95            48.82
Mean  sda-r_await       3.32            11.58
Mean  sda-w_await     127.51           119.60
Mean  sda-svctm         1.47             3.46
Mean  sda-rrqm         27.82            23.52
Mean  sda-wrqm          4.52             5.00

It shows that the average request size is 2.5 times larger even though
the merging stats are similar. It is also interesting to note that the
average wait times are higher, but more IO is being initiated per dd
instance.

It is notable that this is specific to ext3: xfs showed a small
regression with larger readahead.

UMA paralleldd-single on xfs
                               4.4.0                 4.4.0
                             vanilla        readahead-v1r1
Min      Elapsd-1      6.91 (  0.00%)        7.10 ( -2.75%)
Min      Elapsd-3      6.77 (  0.00%)        6.93 ( -2.36%)
Min      Elapsd-5      6.82 (  0.00%)        7.00 ( -2.64%)
Min      Elapsd-7      6.84 (  0.00%)        7.05 ( -3.07%)
Min      Elapsd-8      7.02 (  0.00%)        7.04 ( -0.28%)
Amean    Elapsd-1      7.08 (  0.00%)        7.20 ( -1.68%)
Amean    Elapsd-3      7.03 (  0.00%)        7.12 ( -1.40%)
Amean    Elapsd-5      7.22 (  0.00%)        7.38 ( -2.34%)
Amean    Elapsd-7      7.07 (  0.00%)        7.19 ( -1.75%)
Amean    Elapsd-8      7.23 (  0.00%)        7.23 ( -0.10%)

The IO stats are not displayed but show a similar ratio to ext3, and
system CPU usage is also lower. Hence, this slowdown is unexplained,
but it may be due to differences in how XFS handles the read path and
its locking, even though direct IO is not a factor. Tracing was not
enabled to see what flags are passed into xfs_ilock and whether all the
IO is serialised behind one lock, but that is one potential explanation.

UMA paralleldd-multi on ext3

This showed nothing interesting as the test was too short-lived to draw
any conclusions. There was some difference between the kernels but it
was within the noise. The same applies for XFS.

UMA pgbench-small on ext3

This showed very little that was interesting. The database load time
was slower, but by a very small margin. The actual transaction times
were highly variable and inconclusive.

NUMA pgbench-small on ext3

Load times are not reported but they completed 1.5% faster.

                            4.4.0                 4.4.0
                          vanilla        readahead-v1r1
Hmean    1      3000.54 (  0.00%)     2895.28 ( -3.51%)
Hmean    8     20596.33 (  0.00%)    19291.92 ( -6.33%)
Hmean    12    30760.68 (  0.00%)    30019.58 ( -2.41%)
Hmean    24    74383.22 (  0.00%)    73580.80 ( -1.08%)
Hmean    32    88377.30 (  0.00%)    88928.70 (  0.62%)
Hmean    48    88133.53 (  0.00%)    96099.16 (  9.04%)
Hmean    80    55981.37 (  0.00%)    76886.10 ( 37.34%)
Hmean    112   74060.29 (  0.00%)    87632.95 ( 18.33%)
Hmean    144   51331.50 (  0.00%)    66135.77 ( 28.84%)
Hmean    172   44256.92 (  0.00%)    63521.73 ( 43.53%)
Hmean    192   35942.74 (  0.00%)    71121.35 ( 97.87%)

The impact here is substantial, particularly for higher thread counts,
although there is an apparent regression for low thread counts. There
was a high degree of variability, but the gains were all outside of the
noise. The IO stats did not show any particular pattern in request size
as the workload is mostly resident in memory. The real curiosity is
that readahead should have had little or no impact here for the same
reason. Observing the transactions over time, there was a lot of
variability, and performance is likely dominated by whether the data
happened to be local or not. In itself, this test does not argue for
inclusion of the patch due to the lack of IO, but it is included for
completeness.

UMA pgbench-small on xfs

Similar observations to ext3 on the load times. The transaction times
were stable but showed no significant performance difference.

UMA pgbench-large on ext3

Database load times were slightly faster (3.36%). The transaction times
were slower on average and more variable, but still very close to the
noise.

UMA pgbench-large on xfs

No significant difference in either database load times or transactions.

UMA bonnie on ext3

                                        4.4.0                 4.4.0
                                      vanilla        readahead-v1r1
Hmean    SeqOut Char       81079.98 (  0.00%)    81172.05 (  0.11%)
Hmean    SeqOut Block     104416.12 (  0.00%)   104116.24 ( -0.29%)
Hmean    SeqOut Rewrite    44153.34 (  0.00%)    44596.23 (  1.00%)
Hmean    SeqIn Char        88144.56 (  0.00%)    91702.67 (  4.04%)
Hmean    SeqIn Block      134581.06 (  0.00%)   137245.71 (  1.98%)
Hmean    Random seeks        258.46 (  0.00%)      280.82 (  8.65%)
Hmean    SeqCreate ops         2.25 (  0.00%)        2.25 (  0.00%)
Hmean    SeqCreate read        2.25 (  0.00%)        2.25 (  0.00%)
Hmean    SeqCreate del       911.29 (  0.00%)      880.24 ( -3.41%)
Hmean    RandCreate ops        2.25 (  0.00%)        2.25 (  0.00%)
Hmean    RandCreate read       2.00 (  0.00%)        2.25 ( 12.50%)
Hmean    RandCreate del      911.89 (  0.00%)      878.80 ( -3.63%)

The difference in headline performance figures is marginal and well
within the noise. The system CPU usage tells a slightly different
story:

                  4.4.0            4.4.0
                vanilla   readahead-v1r1
User            1817.53          1798.89
System           499.40           420.65
Elapsed        10692.67         10588.08

As do the IO stats:

                         4.4.0            4.4.0
                       vanilla   readahead-v1r1
Mean  sda-avgqusz      1079.16          1083.35
Mean  sda-avgrqsz       807.95          1225.08
Mean  sda-await        7308.06          9647.13
Mean  sda-r_await       119.04           133.27
Mean  sda-w_await     19106.20         20255.41
Mean  sda-svctm           4.67             7.02
Mean  sda-rrqm            1.80             0.99
Mean  sda-wrqm         5597.12          5723.32

NUMA bonnie on ext3

                                        4.4.0                 4.4.0
                                      vanilla        readahead-v1r1
Hmean    SeqOut Char       58660.72 (  0.00%)    58930.39 (  0.46%)
Hmean    SeqOut Block     253950.92 (  0.00%)   261466.37 (  2.96%)
Hmean    SeqOut Rewrite   151960.60 (  0.00%)   161300.48 (  6.15%)
Hmean    SeqIn Char        57015.41 (  0.00%)    55699.16 ( -2.31%)
Hmean    SeqIn Block      600448.14 (  0.00%)   627565.09 (  4.52%)
Hmean    Random seeks          0.00 (  0.00%)        0.00 (  0.00%)
Hmean    SeqCreate ops         1.00 (  0.00%)        1.00 (  0.00%)
Hmean    SeqCreate read        3.00 (  0.00%)        3.00 (  0.00%)
Hmean    SeqCreate del        90.91 (  0.00%)       79.88 (-12.14%)
Hmean    RandCreate ops        1.00 (  0.00%)        1.50 ( 50.00%)
Hmean    RandCreate read       3.00 (  0.00%)        3.00 (  0.00%)
Hmean    RandCreate del       92.95 (  0.00%)       93.97 (  1.10%)

The impact is small but in line with the UMA machine in a number of
details. As before, the CPU usage is lower even though the iostats show
very little difference overall.

Overall, the headline performance figures are mostly improved or show
little difference. There is a small anomaly with XFS that indicates it
may not always win there due to other factors. There is also the
possibility that a mostly-random read workload larger than memory, with
each read spanning multiple pages but less than the maximum readahead
window, would suffer, but the probability is low as the readahead
window should scale properly. On balance, this is a win, particularly
on the large read workloads that a customer is likely to examine.

Signed-off-by: Jan Kara <jack@suse.cz>
Signed-off-by: Mel Gorman <mgorman@suse.de>
---
 include/linux/pagemap.h | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

--- a/include/linux/pagemap.h
+++ b/include/linux/pagemap.h
@@ -808,7 +808,7 @@ struct readahead_control {
 		._index = i,					\
 	}
 
-#define VM_READAHEAD_PAGES	(SZ_128K / PAGE_SIZE)
+#define VM_READAHEAD_PAGES	(SZ_512K / PAGE_SIZE)
 
 void page_cache_ra_unbounded(struct readahead_control *,
 		unsigned long nr_to_read, unsigned long lookahead_count);