From: Jan Kara <jack@suse.cz>
Date: Sun, 17 Jan 2016 17:30:38 +0000
Subject: mm: readahead: Increase default readahead window
Patch-mainline: Never, mainline concerned by side-effects, we are confident this is a better default for SUSE install base
References: VM Performance, bsc#548529 bsc#1189955

Increase read_ahead_kb to the values used in SLES10 SP3 to get back
sequential IO performance. This could be a sysctl, but a sysctl would
only apply to files opened after it was updated. While it is unlikely
that files opened earlier by a systemd service belong to read-intensive
processes, it is still possible, and this is more visible to kernel
developers.
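
As a point of comparison, the window can also be set at runtime through
the per-device read_ahead_kb attribute. A minimal userspace sketch, not
part of this patch; the device name "sda" and the value written are
examples only:

  #include <stdio.h>

  int main(void)
  {
          /* Per-device readahead window, in KiB; needs root to write */
          FILE *f = fopen("/sys/block/sda/queue/read_ahead_kb", "r+");
          unsigned int ra_kb;

          if (!f) {
                  perror("read_ahead_kb");
                  return 1;
          }
          if (fscanf(f, "%u", &ra_kb) == 1)
                  printf("current readahead: %u KiB\n", ra_kb);

          /* Match the new kernel default from this patch */
          rewind(f);
          fprintf(f, "512\n");
          fclose(f);
          return 0;
  }

Files that are already open keep the readahead state they were opened
with, which is the limitation described above.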

This is not in mainline due to concerns about side-effects. The most
important is that a large readahead at the wrong time causes either
stalls or IO starvation. For example, readahead on a slow device for
data that is not required would starve other reads for potentially a
long time. Larger readahead windows also increase the memory footprint
when the data was not required, which can cause stalls due to reclaim
later.
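
Applications that know their access pattern is random can already opt
out per file descriptor, which limits the exposure to these
side-effects. A hedged sketch; "datafile" is a placeholder:

  #include <fcntl.h>
  #include <stdio.h>
  #include <unistd.h>

  int main(void)
  {
          int fd = open("datafile", O_RDONLY);

          if (fd < 0) {
                  perror("open");
                  return 1;
          }

          /*
           * Declare a random access pattern; the kernel then stops
           * speculative readahead on this descriptor regardless of
           * the device default.
           */
          posix_fadvise(fd, 0, 0, POSIX_FADV_RANDOM);

          /* ... scattered pread() calls ... */

          close(fd);
          return 0;
  }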

This was evaluated on two machines:
o A UMA machine, 8 cores and rotary storage
o A NUMA machine, 4 sockets, 48 cores and SSD storage

Five basic tests were conducted:

1. paralleldd-single
   paralleldd uses different instances of dd to access a single file and
   write the contents to /dev/null. Its performance depends on how well
   readahead works for a single file. It is mostly sequential IO (see
   the sketch after this list).

2. paralleldd-multi
   Similar to test 1 except that each instance of dd accesses a
   different file, so each dd reads its data sequentially but the
   combined timing makes it look like random read IO.

3. pgbench-small
   A standard init of pgbench and execution with a small data set

4. pgbench-large
   A standard init of pgbench and execution with a large data set

5. bonnie++ with data set sizes of 2X RAM and in asynchronous mode
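
For clarity, a rough sketch of the paralleldd-single access pattern;
the file name and reader count are placeholders, and the real harness
drives dd directly:

  #include <fcntl.h>
  #include <stdlib.h>
  #include <sys/wait.h>
  #include <unistd.h>

  #define NR_READERS	8

  int main(void)
  {
          int i;

          /* N processes read the same file sequentially, dd-style */
          for (i = 0; i < NR_READERS; i++) {
                  if (fork() == 0) {
                          static char buf[1 << 16];
                          int fd = open("datafile", O_RDONLY);

                          if (fd < 0)
                                  exit(1);
                          /* consume and discard, like dd of=/dev/null */
                          while (read(fd, buf, sizeof(buf)) > 0)
                                  ;
                          close(fd);
                          exit(0);
                  }
          }
          while (wait(NULL) > 0)
                  ;
          return 0;
  }

For paralleldd-multi each child would open its own file, which is what
turns per-process sequential access into apparently random IO at the
device.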

UMA paralleldd-single on ext3
                                  4.4.0                 4.4.0
                                vanilla        readahead-v1r1
Amean    Elapsd-1        5.42 (  0.00%)        5.40 (  0.50%)
Amean    Elapsd-3        7.51 (  0.00%)        5.54 ( 26.25%)
Amean    Elapsd-5        7.15 (  0.00%)        5.90 ( 17.46%)
Amean    Elapsd-7        5.81 (  0.00%)        5.61 (  3.42%)
Amean    Elapsd-8        6.05 (  0.00%)        5.73 (  5.36%)

The results speak for themselves: readahead is a major boost when there
are multiple readers of the data. It is not displayed here, but system
CPU usage is lower overall. The IO stats support the results:

                       4.4.0          4.4.0
                     vanilla readahead-v1r1
Mean sda-avgqusz        7.44           8.59
Mean sda-avgrqsz      279.77         722.52
Mean sda-await         31.95          48.82
Mean sda-r_await        3.32          11.58
Mean sda-w_await      127.51         119.60
Mean sda-svctm          1.47           3.46
Mean sda-rrqm          27.82          23.52
Mean sda-wrqm           4.52           5.00

The average request size is roughly 2.5 times larger even though the
merging stats are similar. The average wait times are higher, but more
IO is being initiated per dd instance.

Note that this gain is specific to ext3; XFS showed a small regression
with the larger readahead.

UMA paralleldd-single on xfs
                                  4.4.0                 4.4.0
                                vanilla        readahead-v1r1
Min      Elapsd-1        6.91 (  0.00%)        7.10 ( -2.75%)
Min      Elapsd-3        6.77 (  0.00%)        6.93 ( -2.36%)
Min      Elapsd-5        6.82 (  0.00%)        7.00 ( -2.64%)
Min      Elapsd-7        6.84 (  0.00%)        7.05 ( -3.07%)
Min      Elapsd-8        7.02 (  0.00%)        7.04 ( -0.28%)
Amean    Elapsd-1        7.08 (  0.00%)        7.20 ( -1.68%)
Amean    Elapsd-3        7.03 (  0.00%)        7.12 ( -1.40%)
Amean    Elapsd-5        7.22 (  0.00%)        7.38 ( -2.34%)
Amean    Elapsd-7        7.07 (  0.00%)        7.19 ( -1.75%)
Amean    Elapsd-8        7.23 (  0.00%)        7.23 ( -0.10%)

The IO stats are not displayed but show a similar ratio to ext3, and
system CPU usage is also lower. The slowdown is therefore unexplained,
but it may be due to differences in the XFS read path and its locking,
even though direct IO is not a factor. Tracing was not enabled to see
which flags are passed to xfs_ilock and whether all the IO is serialised
behind a single lock, but that is one potential explanation.

UMA paralleldd-multi on ext3

This showed nothing interesting as the test was too short-lived to draw
any conclusions. There was some difference between the kernels but it
was within the noise. The same applies to XFS.

UMA pgbench-small on ext3

This showed very little of interest. The database load time was slower,
but by a very small margin. The actual transaction times were highly
variable and inconclusive.

NUMA pgbench-small on ext3

Load times are not reported in detail, but the load completed 1.5% faster.

                             4.4.0                 4.4.0
                           vanilla        readahead-v1r1
Hmean    1       3000.54 (  0.00%)     2895.28 ( -3.51%)
Hmean    8      20596.33 (  0.00%)    19291.92 ( -6.33%)
Hmean    12     30760.68 (  0.00%)    30019.58 ( -2.41%)
Hmean    24     74383.22 (  0.00%)    73580.80 ( -1.08%)
Hmean    32     88377.30 (  0.00%)    88928.70 (  0.62%)
Hmean    48     88133.53 (  0.00%)    96099.16 (  9.04%)
Hmean    80     55981.37 (  0.00%)    76886.10 ( 37.34%)
Hmean    112    74060.29 (  0.00%)    87632.95 ( 18.33%)
Hmean    144    51331.50 (  0.00%)    66135.77 ( 28.84%)
Hmean    172    44256.92 (  0.00%)    63521.73 ( 43.53%)
Hmean    192    35942.74 (  0.00%)    71121.35 ( 97.87%)

The impact here is substantial, particularly at higher thread counts,
although there is an apparent regression at low thread counts. There
was a high degree of variability, but the gains were all outside the
noise. The IO stats did not show any particular pattern in request
size as the workload is mostly resident in memory. The real curiosity
is that readahead should have had little or no impact here for the
same reason. Observing the transactions over time, there was a lot of
variability, and performance is likely dominated by whether the data
happened to be local or not. In itself, this test does not argue for
inclusion of the patch due to the lack of IO, but it is included for
completeness.

UMA pgbench-small on xfs

Similar observations to ext3 on the load times. The transaction times
were stable but showed no significant performance difference.

UMA pgbench-large on ext3

Database load times were slightly faster (3.36%). The transaction times
were slower on average and more variable, but still very close to the
noise.

UMA pgbench-large on xfs

No significant difference on either database load times or transactions.

UMA bonnie on ext3

                                               4.4.0                       4.4.0
                                             vanilla              readahead-v1r1
Hmean    SeqOut Char            81079.98 (  0.00%)        81172.05 (  0.11%)
Hmean    SeqOut Block          104416.12 (  0.00%)       104116.24 ( -0.29%)
Hmean    SeqOut Rewrite         44153.34 (  0.00%)        44596.23 (  1.00%)
Hmean    SeqIn  Char            88144.56 (  0.00%)        91702.67 (  4.04%)
Hmean    SeqIn  Block          134581.06 (  0.00%)       137245.71 (  1.98%)
Hmean    Random seeks             258.46 (  0.00%)          280.82 (  8.65%)
Hmean    SeqCreate ops              2.25 (  0.00%)            2.25 (  0.00%)
Hmean    SeqCreate read             2.25 (  0.00%)            2.25 (  0.00%)
Hmean    SeqCreate del            911.29 (  0.00%)          880.24 ( -3.41%)
Hmean    RandCreate ops             2.25 (  0.00%)            2.25 (  0.00%)
Hmean    RandCreate read            2.00 (  0.00%)            2.25 ( 12.50%)
Hmean    RandCreate del           911.89 (  0.00%)          878.80 ( -3.63%)

The difference in headline performance figures is marginal and well
within the noise. The system CPU usage tells a slightly different story:

               4.4.0          4.4.0
             vanilla readahead-v1r1
User         1817.53        1798.89
System        499.40         420.65
Elapsed     10692.67       10588.08

As do the IO stats:

                       4.4.0          4.4.0
                     vanilla readahead-v1r1
Mean sda-avgqusz     1079.16        1083.35
Mean sda-avgrqsz      807.95        1225.08
Mean sda-await       7308.06        9647.13
Mean sda-r_await      119.04         133.27
Mean sda-w_await    19106.20       20255.41
Mean sda-svctm          4.67           7.02
Mean sda-rrqm           1.80           0.99
Mean sda-wrqm        5597.12        5723.32

NUMA bonnie on ext3

                                               4.4.0                       4.4.0
                                             vanilla              readahead-v1r1
Hmean    SeqOut Char            58660.72 (  0.00%)        58930.39 (  0.46%)
Hmean    SeqOut Block          253950.92 (  0.00%)       261466.37 (  2.96%)
Hmean    SeqOut Rewrite        151960.60 (  0.00%)       161300.48 (  6.15%)
Hmean    SeqIn  Char            57015.41 (  0.00%)        55699.16 ( -2.31%)
Hmean    SeqIn  Block          600448.14 (  0.00%)       627565.09 (  4.52%)
Hmean    Random seeks               0.00 (  0.00%)            0.00 (  0.00%)
Hmean    SeqCreate ops              1.00 (  0.00%)            1.00 (  0.00%)
Hmean    SeqCreate read             3.00 (  0.00%)            3.00 (  0.00%)
Hmean    SeqCreate del             90.91 (  0.00%)           79.88 (-12.14%)
Hmean    RandCreate ops             1.00 (  0.00%)            1.50 ( 50.00%)
Hmean    RandCreate read            3.00 (  0.00%)            3.00 (  0.00%)
Hmean    RandCreate del            92.95 (  0.00%)           93.97 (  1.10%)

The impact is small but in line with the UMA machine in a number of
details. As before, the system CPU usage is lower even though the IO
stats show very little difference overall.

Overall, the headline performance figures are mostly improved or show
little difference. There is a small anomaly with XFS indicating that it
may not always win there due to other factors. There is also the
possibility that a mostly random read workload, larger than memory, with
each read spanning multiple pages but less than the maximum readahead
window, would suffer; the probability is low as the readahead window
should scale properly. On balance, this is a win -- particularly for the
large read workloads that a customer is likely to examine.
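
For reference, with 4KiB pages the change below moves VM_READAHEAD_PAGES
from SZ_128K / PAGE_SIZE = 32 pages to SZ_512K / PAGE_SIZE = 128 pages,
i.e. the default read_ahead_kb goes from 128 to 512. The scaling
mentioned above is the ondemand readahead ramp-up; a simplified sketch
in the spirit of get_next_ra_size() in mm/readahead.c (the exact logic
varies by kernel version):

  /* Grow the current readahead window towards the per-device maximum */
  static unsigned long next_ra_size(unsigned long cur, unsigned long max)
  {
          if (cur < max / 16)
                  return 4 * cur;         /* ramp up aggressively at first */
          if (cur <= max / 2)
                  return 2 * cur;         /* then double */
          return max;                     /* cap at read_ahead_kb */
  }

A purely random reader never builds up a large window, so the bigger
maximum mainly benefits streams that have already proven themselves
sequential.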

Signed-off-by: Jan Kara <jack@suse.cz>
Signed-off-by: Mel Gorman <mgorman@suse.de>
---
 include/linux/pagemap.h | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

--- a/include/linux/pagemap.h
+++ b/include/linux/pagemap.h
@@ -808,7 +808,7 @@ struct readahead_control {
 		._index = i,						\
 	}
 
-#define VM_READAHEAD_PAGES	(SZ_128K / PAGE_SIZE)
+#define VM_READAHEAD_PAGES	(SZ_512K / PAGE_SIZE)
 
 void page_cache_ra_unbounded(struct readahead_control *,
 		unsigned long nr_to_read, unsigned long lookahead_count);