From: Linus Torvalds <torvalds@linux-foundation.org>
Date: Fri, 27 Apr 2018 09:06:34 -0700
Subject: [PATCH 2/8] x86/speculation/l1tf: Change order of offset/type in swap
 entry
Patch-mainline: v4.19-rc1
Git-commit: bcd11afa7adad8d720e7ba5ef58bdcd9775cf45f
References: bnc#1087081, CVE-2018-3620

Here's a patch that switches the order of "type" and
"offset" in the x86-64 swap entry encoding, in preparation for the next
patch, which inverts the swap entry to protect against L1TF.

That means that now the offset is bits 9-58 in the page table, and that
the type is in the bits that hardware generally doesn't care about.

That, in turn, means that if you have a desktop chip with only 40 bits of
physical addressing, now that the offset starts at bit 9, you still have
to have 30 bits of offset actually *in use* until bit 39 ends up being
clear.

So that's 4 terabytes of swap space (because the offset is counted in
pages, so 30 bits of offset means 42 bits of actual coverage). With bigger
physical addressing, that obviously grows further, until you hit the limit
of the offset (at 50 bits of offset, i.e. 62 bits of actual swap file
coverage).
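
To make the new arithmetic concrete, here is a minimal userspace sketch of
what the reworked macros compute, assuming a 64-bit unsigned long; the
constant and function names below are illustrative, only the SWP_* macros
in the hunk itself are authoritative:

  /* mirrors SWP_TYPE_BITS, _PAGE_BIT_PROTNONE + 1 and SWP_OFFSET_SHIFT */
  #include <assert.h>
  #include <stdio.h>

  #define TYPE_BITS        5     /* type lives in bits 59-63 */
  #define OFFSET_FIRST_BIT 9     /* offset starts above _PAGE_BIT_PROTNONE */
  #define OFFSET_SHIFT     (OFFSET_FIRST_BIT + TYPE_BITS)

  static unsigned long mk_entry(unsigned long type, unsigned long offset)
  {
          /* shift the offset up "too far", then back down again, so the
           * offset bits that would collide with the type simply fall off */
          return (offset << OFFSET_SHIFT >> TYPE_BITS) |
                 (type << (64 - TYPE_BITS));
  }

  static unsigned long entry_type(unsigned long val)
  {
          return val >> (64 - TYPE_BITS);
  }

  static unsigned long entry_offset(unsigned long val)
  {
          /* shift the type out of the top, then shift down to bit 0 */
          return val << TYPE_BITS >> OFFSET_SHIFT;
  }

  int main(void)
  {
          unsigned long e = mk_entry(3, 0x12345678UL);

          assert(entry_type(e) == 3);
          assert(entry_offset(e) == 0x12345678UL);
          printf("entry %#lx -> type %lu, offset %#lx\n",
                 e, entry_type(e), entry_offset(e));
          return 0;
  }

Encoding and decoding with plain shifts this way needs no explicit masking:
the left shift discards the bits that do not belong to the field, and the
right shift puts the remaining bits back in place.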

[updated description and minor tweaks by AK]

Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Andi Kleen <ak@linux.intel.com>
Tested-by: Andi Kleen <ak@linux.intel.com>
Acked-by: Michal Hocko <mhocko@suse.com>
Acked-by: Vlastimil Babka <vbabka@suse.cz>
Acked-by: Dave Hansen <dave.hansen@intel.com>
Reviewed-by: Josh Poimboeuf <jpoimboe@redhat.com>

---
 arch/x86/include/asm/pgtable_64.h |   31 ++++++++++++++++++++-----------
 1 file changed, 20 insertions(+), 11 deletions(-)

--- a/arch/x86/include/asm/pgtable_64.h
+++ b/arch/x86/include/asm/pgtable_64.h
@@ -274,7 +274,7 @@ static inline int pgd_large(pgd_t pgd) {
  *
  * |     ...            | 11| 10|  9|8|7|6|5| 4| 3|2|1|0| <- bit number
  * |     ...            |SW3|SW2|SW1|G|L|D|A|CD|WT|U|W|P| <- bit names
- * | OFFSET (14->63) | TYPE (9-13)  |0|X|X|X| X| X|X|X|0| <- swp entry
+ * | TYPE (59-63) |  OFFSET (9-58)  |0|X|X|X| X| X|X|SD|0| <- swp entry
  *
  * G (8) is aliased and used as a PROT_NONE indicator for
  * !present ptes.  We need to start storing swap entries above
@@ -282,19 +282,28 @@ static inline int pgd_large(pgd_t pgd) {
  * erratum where they can be incorrectly set by hardware on
  * non-present PTEs.
  */
-#define SWP_TYPE_FIRST_BIT (_PAGE_BIT_PROTNONE + 1)
-#define SWP_TYPE_BITS 5
-/* Place the offset above the type: */
-#define SWP_OFFSET_FIRST_BIT (SWP_TYPE_FIRST_BIT + SWP_TYPE_BITS)
+#define SWP_TYPE_BITS		5
+
+#define SWP_OFFSET_FIRST_BIT	(_PAGE_BIT_PROTNONE + 1)
+
+/* We always extract/encode the offset by shifting it all the way up, and then down again */
+#define SWP_OFFSET_SHIFT	(SWP_OFFSET_FIRST_BIT+SWP_TYPE_BITS)
 
 #define MAX_SWAPFILES_CHECK() BUILD_BUG_ON(MAX_SWAPFILES_SHIFT > SWP_TYPE_BITS)
 
-#define __swp_type(x)			(((x).val >> (SWP_TYPE_FIRST_BIT)) \
-					 & ((1U << SWP_TYPE_BITS) - 1))
-#define __swp_offset(x)			((x).val >> SWP_OFFSET_FIRST_BIT)
-#define __swp_entry(type, offset)	((swp_entry_t) { \
-					 ((type) << (SWP_TYPE_FIRST_BIT)) \
-					 | ((offset) << SWP_OFFSET_FIRST_BIT) })
+/* Extract the high bits for type */
+#define __swp_type(x) ((x).val >> (64 - SWP_TYPE_BITS))
+
+/* Shift up (to get rid of type), then down to get value */
+#define __swp_offset(x) ((x).val << SWP_TYPE_BITS >> SWP_OFFSET_SHIFT)
+
+/*
+ * Shift the offset up "too far" by TYPE bits, then down again
+ */
+#define __swp_entry(type, offset) ((swp_entry_t) { \
+	((unsigned long)(offset) << SWP_OFFSET_SHIFT >> SWP_TYPE_BITS) \
+	| ((unsigned long)(type) << (64-SWP_TYPE_BITS)) })
+
 #define __pte_to_swp_entry(pte)		((swp_entry_t) { pte_val((pte)) })
 #define __swp_entry_to_pte(x)		((pte_t) { .pte = (x).val })