Registry virtualization is an internal Windows feature that has been present in the OS since Windows Vista. It transparently redirects accesses to global, admin-writable keys (such as most of HKEY_LOCAL_MACHINE) to user-accessible locations. This lets legacy applications that expect to always run with administrative privileges keep working under more restricted user accounts. For example, with registry virtualization enabled, the kernel silently redirects writes to HKLM\Software to HKCU\Software\Classes\VirtualStore\Machine\Software.
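The C sketch below illustrates only the naming scheme. It maps a path under HKLM\Software to its per-user VirtualStore counterpart. The function name map_to_virtual_store is hypothetical. The comparison is case-sensitive for simplicity, whereas real registry lookups are case-insensitive. The kernel also performs this redirection on key objects, not on strings like these.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper modeling the path translation described above. */
static const char kMachinePrefix[] = "HKLM\\Software";
static const char kVirtualPrefix[] =
    "HKCU\\Software\\Classes\\VirtualStore\\Machine\\Software";

/* Returns 1 and writes the translated path into out on success; returns 0 if
 * the path is not under HKLM\Software or the output buffer is too small.
 * Case-sensitive for simplicity (the real registry is case-insensitive). */
int map_to_virtual_store(const char *path, char *out, size_t out_size) {
    size_t plen = strlen(kMachinePrefix);
    assert(path != NULL && out != NULL);
    if (strncmp(path, kMachinePrefix, plen) != 0)
        return 0;
    /* Accept only the exact prefix or the prefix followed by a separator. */
    if (path[plen] != '\0' && path[plen] != '\\')
        return 0;
    int n = snprintf(out, out_size, "%s%s", kVirtualPrefix, path + plen);
    return n >= 0 && (size_t)n < out_size;
}
```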

Key replication is an important part of registry virtualization. It is the process of reproducing the subkey structure of a virtualizable hive (e.g. HKLM\Software) in the virtual store (HKCU\Software\Classes\VirtualStore) whenever an operation requires the virtual key to exist. The two top-level kernel routines that implement this functionality are CmKeyBodyReplicateToVirtual and CmpReplicateKeyToVirtual (the latter is called by the former). They are used in the following cases:

- On an attempt to create a key in a virtualizable admin-only hive,
- On an attempt to set a value in such a key via NtSetValueKey,
- On an attempt to set information on such a key (flags etc.) via NtSetInformationKey,
- On an attempt to rename such a key via NtRenameKey.

Overall, key replication is a complex procedure consisting of several logical steps:

1. Creating one or more nested subkeys in the virtual store to replicate the full desired path,
2. Setting the SACL (system access control list) part of the leaf key's security descriptor to the SACL of the real key being virtualized, which is achieved in two steps:
   a) Freeing the SD of the leaf key,
   b) Allocating a new SD with the old Owner/Group/DACL settings and the new SACL part.
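Step 1 can be pictured with a toy in-memory key tree. The sketch below is a simplified model, not kernel code. The Key type, MAX_CHILDREN and replicate_path are hypothetical. Real replication works on hive cells through functions such as CmpCreateEmptyKey and CmpAddSubKeyEx. The model only shows that replication creates the missing path components and reuses whatever already exists in the virtual store.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Toy in-memory key tree (hypothetical and heavily simplified). */
#define MAX_CHILDREN 8

typedef struct Key {
    char name[64];
    struct Key *children[MAX_CHILDREN];
    int child_count;
} Key;

static Key *find_child(Key *parent, const char *name) {
    for (int i = 0; i < parent->child_count; i++)
        if (strcmp(parent->children[i]->name, name) == 0)
            return parent->children[i];
    return NULL;
}

static Key *add_child(Key *parent, const char *name) {
    if (parent->child_count == MAX_CHILDREN) return NULL;
    Key *k = calloc(1, sizeof *k);
    if (k == NULL) return NULL;
    strncpy(k->name, name, sizeof k->name - 1); /* calloc keeps it terminated */
    parent->children[parent->child_count++] = k;
    return k;
}

/* Ensures every path component exists under root, creating only the missing
 * ones. Returns the leaf (NULL on failure) and reports how many keys were
 * created. Existing keys are reused without any inspection. */
Key *replicate_path(Key *root, const char *const *components, int n, int *created) {
    assert(root != NULL && created != NULL && n >= 0);
    Key *cur = root;
    *created = 0;
    for (int i = 0; i < n; i++) {
        Key *next = find_child(cur, components[i]);
        if (next == NULL) {
            next = add_child(cur, components[i]);
            if (next == NULL) return NULL; /* earlier creations are NOT undone */
            (*created)++;
        }
        cur = next;
    }
    return cur;
}
```

replicate_path reuses existing keys as-is. This mirrors the assumption of a pristine virtual store that the rest of this report challenges.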

It is important to note that the virtual store is fully under the user's control and can be set to an adversarial state. However, the key replication code doesn't seem to take that into account, and mostly assumes that the user classes hive is in a pristine state. This may lead to several general types of issues:

When replicating the key tree, the kernel uses internal functions to allocate new keys and to attach them to existing ones - such as CmpCreateEmptyKey and CmpAddSubKeyEx. This is a very direct way to create new keys, as it bypasses the entire logic in CmpParseKey/CmpDoParseKey/CmpCreateChild etc. that is normally involved when creating keys through standard means, including bypassing various sanity checks that are implemented there.
A single path replication operation is made up of several smaller steps, and each of them can fail (due to memory/hive space exhaustion or other conditions). The logic must therefore be error-proof. Either a failure at any point reverts the previously completed stages, or no changes are applied until the entire process succeeds.
Programs running in the system may have open handles to keys in the virtual store being operated on by registry virtualization. This means that any changes made to such keys in the hive must also be reflected in the Key Control Block (KCB) structures corresponding to them (even if replication ends with just partial success).

Based on the above ideas, we have identified four specific bugs/vulnerabilities, which make it possible to:

- Abuse registry virtualization to create stable keys under volatile ones.
- Obtain registry keys with an invalid security cell of _CM_KEY_NODE.Security == -1.
- Obtain KCBs with stale information due to a missing refresh in case of a partial replication success.
- Obtain KCBs with stale information due to the relevant refresh functions rejecting symbolic links and predefined-handle keys.

All of these problems are mostly distinct, but they share a significant amount of context, so they are reported collectively here. Each of them is discussed in a separate section below.

========== Creation of stable subkeys under volatile keys ==========

In the Windows Registry, 'stable' keys are the default, persistent keys that are written to disk and preserved across reboots. 'Volatile' keys are, well, volatile: they exist only in memory, and only for as long as the hive is loaded in the OS. Stable keys may have volatile subkeys, but the opposite wouldn't make sense. The internal CmpCreateChild function therefore disallows it and returns STATUS_CHILD_MUST_BE_VOLATILE (0xC0000181) if such an attempt is made.
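The rule enforced on the normal creation path boils down to a single check, modeled in C below. The function name check_child_storage is hypothetical and NTSTATUS is simplified to uint32_t. The status code value is the one quoted above.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define STATUS_SUCCESS                ((uint32_t)0x00000000)
#define STATUS_CHILD_MUST_BE_VOLATILE ((uint32_t)0xC0000181)

/* Simplified model of the storage-type rule: a stable key may not be
 * created under a volatile parent. */
uint32_t check_child_storage(bool parent_volatile, bool child_volatile) {
    if (parent_volatile && !child_volatile)
        return STATUS_CHILD_MUST_BE_VOLATILE;
    return STATUS_SUCCESS;
}
```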

This limitation can be bypassed because registry virtualization tries to preserve the volatility of keys being virtualized, but it doesn't check the volatility of existing keys in the virtual store. Let's assume we first create the following registry path:

HKCU\Software\Classes\VirtualStore\Machine\Software
\________________________________________/ \______/
                    |                         |
                  stable                   volatile

And then perform an operation that requires the following path to be replicated:

HKLM\Software\Microsoft
\_____________________/
           |
         stable

This will prompt the kernel to create a stable "Microsoft" subkey under "Software" in the virtual store, resulting in the following structure:

HKCU\Software\Classes\VirtualStore\Machine\Software\Microsoft
\________________________________________/ \______/ \_______/
                    |                         |         |
                  stable                   volatile   stable
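The bypass can be reduced to a toy model. In it, a replicated key takes its storage type from the real key being virtualized and never looks at its parent in the virtual store. All names are hypothetical. The "checked" variant is just one conceivable hardening (forcing the child to be volatile), not a confirmed fix. Rejecting the operation would be an equally valid choice.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical, simplified key descriptor. */
typedef struct {
    bool volatile_key;  /* true for volatile, false for stable */
} KeyInfo;

/* The storage rule that normal key creation enforces. */
bool storage_invariant_holds(bool parent_volatile, bool child_volatile) {
    return !(parent_volatile && !child_volatile);
}

/* Reported behavior: the type is copied from the real key being
 * virtualized; the existing parent in the virtual store is ignored. */
bool replicated_key_is_volatile(const KeyInfo *source, const KeyInfo *virtual_parent) {
    (void)virtual_parent;  /* not consulted, which enables the bypass */
    return source->volatile_key;
}

/* One conceivable hardening: inherit volatility from a volatile parent. */
bool replicated_key_is_volatile_checked(const KeyInfo *source, const KeyInfo *virtual_parent) {
    return source->volatile_key || virtual_parent->volatile_key;
}
```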

This behavior breaks the canonical rule of registry key storage types, but what are the security implications here? This is less clear, as at the time of writing, we haven't found a direct way to convert this mismatch to a memory corruption primitive. The stable keys with volatile parents don't survive reboots and unloading of the hive in general, since as soon as the volatile keys are lost, there is no direct connection from the root of the hive to the dangling stable keys (even though they remain present as allocated cells in the hive file).

Furthermore, this behavior has the potential to corrupt the linked list of security descriptors. Normally, all stable SDs (associated with stable keys) are connected in a single linked list via the _CM_KEY_SECURITY.Flink/Blink fields. Volatile SDs, on the other hand, each sit in their own single-entry list (Flink/Blink pointing at itself), since they are not persistent and don't have to be tracked. When a stable subkey with a unique security descriptor is created under a volatile key, CmpInsertSecurityCellList erroneously uses the volatile SD of the parent to add the subkey's SD to the linked list. This creates a linked list of mixed stable/volatile descriptors that is disconnected from the main descriptor list (pointed to by the root cell). That list will be discarded on the next reload of the hive, which resets the security of any other stable keys in the hive that may have started to share the descriptor in the meantime.
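The list corruption can be illustrated with circular doubly linked lists that use array indices in place of hive cell indices. This is a loose, hypothetical model of the _CM_KEY_SECURITY Flink/Blink links. In particular, insert_after only approximates CmpInsertSecurityCellList, whose exact insertion position is not modeled here.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model of SD list links (indices instead of cell indices). */
#define MAX_SD 16

typedef struct {
    int flink;
    int blink;
} SecCell;

static SecCell cells[MAX_SD];

/* A descriptor starts as a single-entry list (links point at itself). */
void init_single(int i) {
    assert(i >= 0 && i < MAX_SD);
    cells[i].flink = cells[i].blink = i;
}

/* Links new_sd into whatever list anchor belongs to, right after anchor. */
void insert_after(int anchor, int new_sd) {
    assert(anchor >= 0 && anchor < MAX_SD && new_sd >= 0 && new_sd < MAX_SD);
    int next = cells[anchor].flink;
    cells[new_sd].flink = next;
    cells[new_sd].blink = anchor;
    cells[next].blink = new_sd;
    cells[anchor].flink = new_sd;
}

/* Walks the circular list starting at head and reports whether target is on it. */
bool reachable_from(int head, int target) {
    int i = head;
    do {
        if (i == target) return true;
        i = cells[i].flink;
    } while (i != head);
    return false;
}
```

In the usage below, index 0 plays the root SD, 1 another stable SD, 2 the volatile parent's SD, and 3 the new stable SD linked through it.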

Overall, we believe there is some potential for this issue to lead to memory corruption either in code related to handling subkey lists or security descriptors, but we haven't investigated further. In addition to addressing the stable/volatile problem, we also recommend analyzing what other checks executed during normal key creation are skipped during key replication, and either adding them to the virtualization code, or redesigning the feature entirely to achieve key replication through the more standardized interface (NT Object Manager).

Attached is a proof-of-concept program that triggers the behavior described above. We have tested it on Windows 11 (October 2022 update). It is easiest to examine the resulting state of the registry with WinDbg attached as a kernel debugger, by using the !reg extension to confirm the stable/volatile types of the keys and the corrupted security descriptor list.

========== Creation of keys with invalid security descriptors ==========

As mentioned earlier in the report, one part of the key replication process copies the SACL portion of the original key's security descriptor to the descriptor of the virtual key. The CmpCopySaclToVirtualKey function performs this entire task. It is called as the last step in CmpReplicateKeyToVirtual, after the desired key structure has been created, and does so by:

1. Building the new security descriptor in memory through a series of Rtl[Get|Set]...SecurityDescriptor calls.
2. Freeing the current security descriptor of the leaf key via CmpFreeSecurityDescriptor.
3. Allocating a cell for the new SD in the hive and assigning it to the leaf key via CmpGetSecurityDescriptorNode.

The problem is that if step 3 fails, step 2 is not reverted, and the key is left with an invalid security cell index. This step can indeed fail. The new descriptor may be larger than the previous one because of the added SACL, so if the hive is full, the allocation may fail.
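The ordering problem can be shown with a toy hive that only tracks free bytes. The sketch below uses hypothetical names throughout. It ignores SD sharing and reference counts, and represents the invalid index as HCELL_NIL (the -1 value). It contrasts the free-then-allocate order described above with an allocate-then-free order that leaves the key intact when the allocation fails.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define HCELL_NIL 0xFFFFFFFFu  /* the -1 security cell index */

/* Toy hive: tracks free space and hands out fake cell indices. */
typedef struct {
    uint32_t free_bytes;
    uint32_t next_cell;
} Hive;

typedef struct {
    uint32_t security;       /* cell index of the key's SD */
    uint32_t security_size;  /* size of that SD in bytes */
} KeyNode;

static bool hive_alloc(Hive *h, uint32_t size, uint32_t *cell) {
    if (size > h->free_bytes) return false;
    h->free_bytes -= size;
    *cell = h->next_cell++;
    return true;
}

static void hive_free(Hive *h, uint32_t size) { h->free_bytes += size; }

/* Reported order: free the old SD, then allocate the new one.
 * If the allocation fails, nothing restores the old SD. */
bool replace_sd_buggy(Hive *h, KeyNode *k, uint32_t new_size) {
    hive_free(h, k->security_size);
    k->security = HCELL_NIL;
    k->security_size = 0;
    uint32_t cell;
    if (!hive_alloc(h, new_size, &cell)) return false;
    k->security = cell;
    k->security_size = new_size;
    return true;
}

/* Safer order: allocate first, release the old SD only after success. */
bool replace_sd_safe(Hive *h, KeyNode *k, uint32_t new_size) {
    uint32_t cell;
    if (!hive_alloc(h, new_size, &cell)) return false;
    hive_free(h, k->security_size);
    k->security = cell;
    k->security_size = new_size;
    return true;
}
```

The safe order needs slightly more free space at its peak, but a failed allocation can no longer leave the key without a descriptor.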

To trigger the issue, we need a key in a virtualizable hive that meets three conditions. It grants read access to regular users, has a SACL in its security descriptor, and doesn't have the REG_KEY_DONT_VIRTUALIZE bit set. There don't seem to be many such keys in a default Windows installation, but one that meets all of these requirements is:

HKLM\Software\Microsoft\Windows Advanced Threat Protection

The second part of the attack is to fill up the hive as much as possible, so that even after one SD is freed, allocating a slightly bigger one fails for lack of space. Each of the stable and volatile storage types is limited to 2 GiB, so this can be done mostly reliably and with limited CPU time and memory overhead. In our proof-of-concept, we create a series of values with descending lengths starting from 1 MiB. This packs the hive structure as tightly as possible and allocates every last free chunk. We can choose to perform the attack in either the stable or the volatile space. Our demonstration uses the volatile space because it is more consistent: we start with empty storage and are independent of any previous state of the hive. It also exists only in memory, so it doesn't generate excessive disk writes or persistently bloat the UsrClass.dat hive file.
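Simple arithmetic shows the effect of descending allocation sizes. The model below rests on our own assumption that sizes are halved from 1 MiB down to a chosen minimum. It ignores cell headers, free-space bins and fragmentation in a real hive. It only shows why, when all sizes are powers of two, a greedy descending schedule leaves less than the minimum size unallocated. The names fill_space and FillResult are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

typedef struct {
    uint64_t allocations;  /* number of simulated values created */
    uint64_t leftover;     /* bytes that remain unallocated */
} FillResult;

/* Greedily allocates the largest size that still fits, halving the size
 * each time it stops fitting, until it drops below min_size. */
FillResult fill_space(uint64_t free_bytes, uint64_t start_size, uint64_t min_size) {
    FillResult r = {0, free_bytes};
    assert(min_size > 0 && start_size >= min_size);
    for (uint64_t size = start_size; size >= min_size; size /= 2) {
        while (r.leftover >= size) {
            r.leftover -= size;
            r.allocations++;
        }
    }
    return r;
}
```

For example, 2 GiB of free space is consumed by 2048 allocations of 1 MiB each. An odd amount such as 1 MiB + 4096 + 13 bytes takes three allocations (1 MiB, 4096, 8) and leaves 5 bytes.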

Once we trigger the bug and obtain a key with _CM_KEY_NODE.Security set to -1, the last piece of the puzzle is how to exploit it for some kind of memory corruption. One way we have found is through the very same CmpCopySaclToVirtualKey function, which references the virtual key's security cell to get its Owner/Group/DACL. Once the VirtualStore key structure and the OOM condition are set up, our exploit boils down to two consecutive RegRenameKey API calls. The first one triggers the bug, and the second one crashes the system.

An example crash log, generated on Windows 11 (October 2022 update), is shown below:

--- cut ---
*******************************************************************************
*                                                                             *
*                        Bugcheck Analysis                                    *
*                                                                             *
*******************************************************************************

SYSTEM_SERVICE_EXCEPTION (3b)
An exception happened while executing a system service routine.
Arguments:
Arg1: 00000000c0000005, Exception code that caused the bugcheck
Arg2: fffff8076207e89c, Address of the instruction which caused the bugcheck
Arg3: ffff800ce12109a0, Address of the context record for the exception that caused the bugcheck
Arg4: 0000000000000000, zero.

[...]

CONTEXT: ffff800ce12109a0 -- (.cxr 0xffff800ce12109a0)
rax=0000000000000000 rbx=00000000ffffffff rcx=0000000000000fff
rdx=ffffe580108cbfe8 rsi=ffff800ce1211420 rdi=ffffe58fe8018000
rip=fffff8076207e89c rsp=ffff800ce12113c0 rbp=ffff800ce1211461
r8=ffff800ce1211424 r9=00000000000001ff r10=ffffe58fe8018000
r11=ffff800ce1211398 r12=00000291d2145914 r13=00000000ffffffff
r14=ffffe58fe5092000 r15=0000000000000000
iopl=0 nv up ei pl nz na po nc
cs=0010 ss=0018 ds=002b es=002b fs=0053 gs=002b efl=00050206
nt!HvpGetCellPaged+0x7c:
fffff8076207e89c 8b01 mov eax,dword ptr [rcx] ds:002b:0000000000000fff=????????
Resetting default scope

PROCESS_NAME: Registry

STACK_TEXT:
ffff800ce12113c0 fffff80762315320 : ffffe58fe8018000 0000000000000000 ffffe58fe8018000 fffff8076207e8ca : nt!HvpGetCellPaged+0x7c
ffff800ce12113f0 fffff807623145bd : 0000000000000000 0000000000000001 0000000000000000 ffffe58fea4fe464 : nt!CmpCopySaclToVirtualKey+0xd4
ffff800ce12114c0 fffff80762313282 : ffff800ce12116c0 ffff800ce1211a10 0000000000000000 ffffe58ff0dc0390 : nt!CmpReplicateKeyToVirtual+0x281
ffff800ce12115c0 fffff8076230be52 : 0000000000000000 0000000000000000 00000000000000bc fffff807621c5533 : nt!CmKeyBodyReplicateToVirtual+0x1ba
ffff800ce1211970 fffff80761e2d275 : ffffc484bc89e080 0000000000004001 ffffc484bc89e080 0000000000004001 : nt!NtRenameKey+0x362
ffff800ce1211ae0 00007ffcd1e46ad4 : 0000000000000000 0000000000000000 0000000000000000 0000000000000000 : nt!KiSystemServiceCopyEnd+0x25
000000e21eeffaf8 0000000000000000 : 0000000000000000 0000000000000000 0000000000000000 0000000000000000 : 0x00007ffc`d1e46ad4
--- cut ---

========== Stale KCBs after partial key replication success ==========

In the Windows Registry, any change to hive data must also be reflected in the in-memory cache, i.e. the KCB structures allocated for all currently opened keys. This also applies to the key replication process, especially since a single replication operation may modify a number of keys in the virtual store. Currently, this synchronization is achieved through the following function call in either CmKeyBodyReplicateToVirtual or CmpDoParseKey:

CmpSearchKeyControlBlockTreeEx(CmpSyncKcbCacheForHive, VirtualStoreHive, ...);

Replication touches multiple keys, all identified by their names/nodes rather than by KCBs. The kernel therefore doesn't try to find and update the relevant KCBs individually. Instead, it iterates over all KCBs associated with the hive and synchronizes each one with its corresponding key node. In the above line of code, CmpSearchKeyControlBlockTreeEx iterates over the KCBs of the whole hive, and CmpSyncKcbCacheForHive is a callback routine that synchronizes a single KCB.
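The iterate-and-refresh approach can be modeled as a hive-wide walk with a per-KCB callback. The types and functions below are hypothetical simplifications in the spirit of CmpSearchKeyControlBlockTreeEx and CmpSyncKcbCacheForHive. Real KCBs are organized differently and cache far more state, such as a CachedSecurity pointer rather than a cell index.

```c
#include <assert.h>
#include <stddef.h>

/* Fields of a key node that KCBs cache (hypothetical subset). */
typedef struct {
    int subkey_count;
    int security_cell;
} KeyNode;

typedef struct {
    int hive_id;
    const KeyNode *node;      /* backing key node in the hive */
    int cached_subkey_count;  /* cached copies that can go stale */
    int cached_security_cell;
} Kcb;

typedef void (*KcbCallback)(Kcb *kcb);

/* Refreshes one KCB from its key node. */
static void sync_kcb(Kcb *kcb) {
    kcb->cached_subkey_count = kcb->node->subkey_count;
    kcb->cached_security_cell = kcb->node->security_cell;
}

/* Applies cb to every KCB that belongs to hive_id. */
void for_each_kcb_in_hive(Kcb *kcbs, size_t n, int hive_id, KcbCallback cb) {
    for (size_t i = 0; i < n; i++)
        if (kcbs[i].hive_id == hive_id)
            cb(&kcbs[i]);
}
```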

The vulnerability here is that the above call is made only if all of the following functions fully succeed (i.e. return STATUS_SUCCESS):

- CmpReplicateKeyToVirtual
- CmpExamineSaclForAuditEvent
- CmpReportAuditVirtualizationEvent
- CmpReparseToVirtualPath (only in CmKeyBodyReplicateToVirtual)

That means that if the CmpReplicateKeyToVirtual call succeeds only partially (having already made some changes to the hive), or if it succeeds fully but one of the subsequent three calls fails, CmpSearchKeyControlBlockTreeEx is not invoked and the KCBs associated with any keys modified in CmpReplicateKeyToVirtual will become inconsistent with their corresponding key nodes. This includes information about subkeys and key security, both of which are modified in CmpReplicateKeyToVirtual.
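The flawed condition and one possible alternative can be contrasted in a few lines. This is a hypothetical model in which the four status slots stand for the four functions listed above (0 means success). should_resync_fixed reflects our assumption that any modification of the hive should be followed by a resync, whatever the final status.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

typedef struct {
    int status[4];       /* results of the four calls, 0 = success */
    bool hive_modified;  /* true once any key node was changed */
} ReplicationResult;

static bool all_succeeded(const ReplicationResult *r) {
    for (size_t i = 0; i < 4; i++)
        if (r->status[i] != 0) return false;
    return true;
}

/* Reported behavior: KCBs are resynced only if every call succeeded. */
bool should_resync_reported(const ReplicationResult *r) { return all_succeeded(r); }

/* Possible alternative: resync whenever the hive was modified. */
bool should_resync_fixed(const ReplicationResult *r) { return r->hive_modified; }
```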

The bug discussed in the previous section ("Creation of keys with invalid security descriptors") is one example of how such a situation can arise, though not the only one. Suppose we trigger a hive OOM condition that frees the leaf key's security descriptor without assigning a new one. Not only does _CM_KEY_NODE.Security become -1, but CmpReplicateKeyToVirtual also returns STATUS_INSUFFICIENT_RESOURCES, so CmpSearchKeyControlBlockTreeEx is never called to update the KCBs. Now assume we had an open handle to the leaf key during the replication, and its previous security descriptor was unique and thus freed. The _CM_KEY_CONTROL_BLOCK.CachedSecurity pointer is then left pointing at a freed pool allocation. The dangling pointer can be dereferenced in a variety of ways, e.g. by querying the key's descriptor via RegGetKeySecurity.

The proof-of-concept for this issue is very similar to the previous one, with only two differences:

- The initial security descriptor of the "Windows Advanced Threat Protection" key in the virtual store is made unique, to ensure that it is freed in memory once the leaf key no longer uses it.
- Instead of the second RegRenameKey call, we use RegGetKeySecurity to trigger access to the stale _CM_KEY_CONTROL_BLOCK.CachedSecurity pointer.

The bug is easiest to reproduce with Special Pool enabled for ntoskrnl.exe. An example crash log, generated on Windows 11 (October 2022 update), is shown below:

--- cut ---
PAGE_FAULT_IN_NONPAGED_AREA (50)
Invalid system memory was referenced. This cannot be protected by try-except.
Typically the address is just plain bad or it is pointing at freed memory.
Arguments:
Arg1: ffffd509a1326f90, memory referenced.
Arg2: 0000000000000000, value 0 = read operation, 1 = write operation.
Arg3: fffff8042b8d1efd, If non-zero, the instruction address which referenced the bad memory
address.
Arg4: 0000000000000000, (reserved)

[...]

TRAP_FRAME: ffffa603a31bf5f0 -- (.trap 0xffffa603a31bf5f0)
NOTE: The trap frame does not contain all registers.
Some register values may be zeroed or incorrect.
rax=ffffd509a1326f90 rbx=0000000000000000 rcx=0000000000000000
rdx=000002b473348ea0 rsi=0000000000000000 rdi=0000000000000000
rip=fffff8042b8d1efd rsp=ffffa603a31bf780 rbp=ffffa603a31bf880
r8=000002b473348ea0 r9=ffffa603a31bf8d0 r10=ffffa603a31bfa88
r11=0000000000000000 r12=0000000000000000 r13=0000000000000000
r14=0000000000000000 r15=0000000000000000
iopl=0 nv up ei ng nz na po nc
nt!SeQuerySecurityDescriptorInfo+0x4d:
fffff8042b8d1efd 0f1000 movups xmm0,xmmword ptr [rax] ds:ffffd509a1326f90=????????????????????????????????
Resetting default scope

STACK_TEXT:
ffffa603a31beb38 fffff8042b768ee2 : ffffa603a31beca0 fffff8042b5665c0 ffffc2814c4c6180 0000000000000000 : nt!DbgBreakPointWithStatus
ffffa603a31beb40 fffff8042b768721 : ffffc28100000003 ffffa603a31beca0 fffff8042b636900 0000000000000050 : nt!KiBugCheckDebugBreak+0x12
ffffa603a31beba0 fffff8042b620d47 : 0000000000000000 0000000000000000 0000bb5800000010 ffffd509a1326f90 : nt!KeBugCheck2+0xa71
ffffa603a31bf310 fffff8042b6a351d : 0000000000000050 ffffd509a1326f90 0000000000000000 ffffa603a31bf5f0 : nt!KeBugCheckEx+0x107
ffffa603a31bf350 fffff8042b46af96 : ffffa60300000000 0000000000000000 ffffa603a31bf550 0000000000000000 : nt!MiSystemFault+0x1c117d
ffffa603a31bf450 fffff8042b62f8f5 : 000000021f7cc025 0000000000000000 ffffa603a31bfa78 fffff8042b43fe5a : nt!MmAccessFault+0x2a6
ffffa603a31bf5f0 fffff8042b8d1efd : 0000000000000000 0000000000000000 0000000000000000 fffff8042b8a6bbb : nt!KiPageFault+0x335
ffffa603a31bf780 fffff8042b8d1e26 : ffffa603a31bfa78 000002b473348ea0 ffffa603a31bfa88 fffff8042b47047f : nt!SeQuerySecurityDescriptorInfo+0x4d
ffffa603a31bf840 fffff8042b8d1c66 : ffffd509992ccf58 ffffa603a31bfa78 000002b473348e01 0000000000000001 : nt!CmpQueryKeySecurity+0xc2
ffffa603a31bf8b0 fffff8042b987894 : 00000000ffffffff 00000000000000d8 00000000000a0008 00007ff7f978f810 : nt!CmpSecurityMethod+0x146
ffffa603a31bf9e0 fffff8042b633275 : ffffe50ceb2b4600 0000000000000004 000000ea33aff9e8 0000000000000000 : nt!NtQuerySecurityObject+0x144
ffffa603a31bfa70 00007ffbfbe267b4 : 00007ffbf957c55e 0000000000000000 00000000000000b4 0000000000020019 : nt!KiSystemServiceCopyEnd+0x25
000000ea33aff9c8 00007ffbf957c55e : 0000000000000000 00000000000000b4 0000000000020019 000000ea33af0000 : ntdll!NtQuerySecurityObject+0x14
--- cut ---

========== Stale KCBs of symbolic links and predefined keys ==========

The final vulnerability described in this report is also related to KCB synchronization. If CmpSearchKeyControlBlockTreeEx(CmpSyncKcbCacheForHive) is called as it should be, the control flow goes through the following functions:

CmpSearchKeyControlBlockTreeEx (for the whole hive)
- CmpSyncKcbCacheForHive (for each KCB)
  - CmpRebuildKcbCache (for each KCB)
    - CmpRebuildKcbCacheFromNode (for each KCB)
      - CmpCleanUpSubKeyInfo, CmpCleanUpKcbValueCache, CmpAssignSecurityToKcb, etc...

The actual KCB synchronization takes place in CmpRebuildKcbCacheFromNode and the lower-level functions, while CmpSyncKcbCacheForHive and CmpRebuildKcbCache are thin wrappers that only check for a few special conditions and bail out early if needed. Two of the conditions being checked are whether the key is:

- a symbolic link (flag 0x10 in _CM_KEY_CONTROL_BLOCK.Flags), in CmpSyncKcbCacheForHive
- a predefined-handle key (flag 0x40 in _CM_KEY_CONTROL_BLOCK.Flags), in CmpRebuildKcbCache

If either condition is true, the KCB is not refreshed for that key. This is a problem, because both symbolic links and predefined keys can have subkeys and do have security descriptors that can be operated on. That information must be kept in sync with the hive for these special types of keys as well.
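The effect of the early returns can be modeled as follows. The flag values 0x10 and 0x40 are the ones quoted above. Everything else (the Kcb layout, sync_reported, sync_all) is hypothetical, and the security descriptor is represented by a cell index instead of the CachedSecurity pointer. Simply dropping the checks is only one possible direction; a real fix may need symlink-specific handling.

```c
#include <assert.h>
#include <stdbool.h>

#define KCB_FLAG_SYMLINK    0x10  /* value quoted in the report */
#define KCB_FLAG_PREDEFINED 0x40  /* value quoted in the report */

typedef struct {
    unsigned flags;
    int node_security_cell;    /* current value in the hive */
    int cached_security_cell;  /* value cached in the KCB */
} Kcb;

static void rebuild_from_node(Kcb *k) {
    k->cached_security_cell = k->node_security_cell;
}

/* Reported behavior: the wrappers bail out early for these key types. */
void sync_reported(Kcb *k) {
    if (k->flags & KCB_FLAG_SYMLINK) return;     /* in CmpSyncKcbCacheForHive */
    if (k->flags & KCB_FLAG_PREDEFINED) return;  /* in CmpRebuildKcbCache */
    rebuild_from_node(k);
}

/* Variant without the early returns: every KCB is refreshed. */
void sync_all(Kcb *k) { rebuild_from_node(k); }
```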

This bug is even easier to trigger than the previous one. Instead of spraying the hive to achieve an OOM condition, we create the virtual store leaf key as a symbolic link (REG_OPTION_CREATE_LINK flag passed to RegCreateKeyExW). We then try to rename the HKLM key under registry virtualization, which triggers the replication process. The security descriptor of the leaf key is replaced with a new one, CmpReplicateKeyToVirtual succeeds, and CmpSearchKeyControlBlockTreeEx is called. However, due to the logic in CmpSyncKcbCacheForHive, the KCB refresh is skipped for the leaf key. This again leaves a stale pointer in _CM_KEY_CONTROL_BLOCK.CachedSecurity, and the system crashes when it is subsequently accessed in SeQuerySecurityDescriptorInfo via RegGetKeySecurity.

Attached is a proof-of-concept exploit for this issue that has been successfully tested on Windows 11 (October 2022 update). The observable kernel bugcheck is identical to the one in the previous section, so we won't re-paste it here.

This bug is subject to a 90-day disclosure deadline. If a fix for this issue is made available to users before the end of the 90-day deadline, this bug report will become public 30 days after the fix was made available. Otherwise, this bug report will become public at the deadline. The scheduled deadline is 2023-01-23.