| CVE |
Vendors |
Products |
Updated |
CVSS v3.1 |
| In the Linux kernel, the following vulnerability has been resolved:
dma-mapping: benchmark: fix node id validation
While validating node ids in map_benchmark_ioctl(), node_possible() may
be provided with invalid argument outside of [0,MAX_NUMNODES-1] range
leading to:
BUG: KASAN: wild-memory-access in map_benchmark_ioctl (kernel/dma/map_benchmark.c:214)
Read of size 8 at addr 1fffffff8ccb6398 by task dma_map_benchma/971
CPU: 7 PID: 971 Comm: dma_map_benchma Not tainted 6.9.0-rc6 #37
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996)
Call Trace:
<TASK>
dump_stack_lvl (lib/dump_stack.c:117)
kasan_report (mm/kasan/report.c:603)
kasan_check_range (mm/kasan/generic.c:189)
variable_test_bit (arch/x86/include/asm/bitops.h:227) [inline]
arch_test_bit (arch/x86/include/asm/bitops.h:239) [inline]
_test_bit at (include/asm-generic/bitops/instrumented-non-atomic.h:142) [inline]
node_state (include/linux/nodemask.h:423) [inline]
map_benchmark_ioctl (kernel/dma/map_benchmark.c:214)
full_proxy_unlocked_ioctl (fs/debugfs/file.c:333)
__x64_sys_ioctl (fs/ioctl.c:890)
do_syscall_64 (arch/x86/entry/common.c:83)
entry_SYSCALL_64_after_hwframe (arch/x86/entry/entry_64.S:130)
Compare node ids with sane bounds first. NUMA_NO_NODE is considered a
special valid case meaning that benchmarking kthreads won't be bound to a
cpuset of a given node.
Found by Linux Verification Center (linuxtesting.org). |
| In the Linux kernel, the following vulnerability has been resolved:
greybus: lights: check return of get_channel_from_mode
If channel for the given node is not found we return null from
get_channel_from_mode. Make sure we validate the return pointer
before using it in two of the missing places.
This was originally reported in [0]:
Found by Linux Verification Center (linuxtesting.org) with SVACE.
[0] https://lore.kernel.org/all/20240301190425.120605-1-m.lobanov@rosalinux.ru |
| In the Linux kernel, the following vulnerability has been resolved:
soundwire: cadence: fix invalid PDI offset
For some reason, we add an offset to the PDI, presumably to skip the
PDI0 and PDI1 which are reserved for BPT.
This code is however completely wrong and leads to an out-of-bounds
access. We were just lucky so far since we used only a couple of PDIs
and remained within the PDI array bounds.
A Fixes: tag is not provided since there are no known platforms where
the out-of-bounds would be accessed, and the initial code had problems
as well.
A follow-up patch completely removes this useless offset. |
| In the Linux kernel, the following vulnerability has been resolved:
drm/msm/dpu: Add callback function pointer check before its call
In dpu_core_irq_callback_handler() callback function pointer is compared to NULL,
but then callback function is unconditionally called by this pointer.
Fix this bug by adding conditional return.
Found by Linux Verification Center (linuxtesting.org) with SVACE.
Patchwork: https://patchwork.freedesktop.org/patch/588237/ |
| In the Linux kernel, the following vulnerability has been resolved:
media: stk1160: fix bounds checking in stk1160_copy_video()
The subtract in this condition is reversed. The ->length is the length
of the buffer. The ->bytesused is how many bytes we have copied thus
far. When the condition is reversed that means the result of the
subtraction is always negative but since it's unsigned then the result
is a very high positive value. That means the overflow check is never
true.
Additionally, the ->bytesused doesn't actually work for this purpose
because we're not writing to "buf->mem + buf->bytesused". Instead, the
math to calculate the destination where we are writing is a bit
involved. You calculate the number of full lines already written,
multiply by two, skip a line if necessary so that we start on an odd
numbered line, and add the offset into the line.
To fix this buffer overflow, just take the actual destination where we
are writing, if the offset is already out of bounds print an error and
return. Otherwise, write up to buf->length bytes. |
| In the Linux kernel, the following vulnerability has been resolved:
KVM: x86: Forcibly leave nested virt when SMM state is toggled
Forcibly leave nested virtualization operation if userspace toggles SMM
state via KVM_SET_VCPU_EVENTS or KVM_SYNC_X86_EVENTS. If userspace
forces the vCPU out of SMM while it's post-VMXON and then injects an SMI,
vmx_enter_smm() will overwrite vmx->nested.smm.vmxon and end up with both
vmxon=false and smm.vmxon=false, but all other nVMX state allocated.
Don't attempt to gracefully handle the transition as (a) most transitions
are nonsencial, e.g. forcing SMM while L2 is running, (b) there isn't
sufficient information to handle all transitions, e.g. SVM wants access
to the SMRAM save state, and (c) KVM_SET_VCPU_EVENTS must precede
KVM_SET_NESTED_STATE during state restore as the latter disallows putting
the vCPU into L2 if SMM is active, and disallows tagging the vCPU as
being post-VMXON in SMM if SMM is not active.
Abuse of KVM_SET_VCPU_EVENTS manifests as a WARN and memory leak in nVMX
due to failure to free vmcs01's shadow VMCS, but the bug goes far beyond
just a memory leak, e.g. toggling SMM on while L2 is active puts the vCPU
in an architecturally impossible state.
WARNING: CPU: 0 PID: 3606 at free_loaded_vmcs arch/x86/kvm/vmx/vmx.c:2665 [inline]
WARNING: CPU: 0 PID: 3606 at free_loaded_vmcs+0x158/0x1a0 arch/x86/kvm/vmx/vmx.c:2656
Modules linked in:
CPU: 1 PID: 3606 Comm: syz-executor725 Not tainted 5.17.0-rc1-syzkaller #0
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
RIP: 0010:free_loaded_vmcs arch/x86/kvm/vmx/vmx.c:2665 [inline]
RIP: 0010:free_loaded_vmcs+0x158/0x1a0 arch/x86/kvm/vmx/vmx.c:2656
Code: <0f> 0b eb b3 e8 8f 4d 9f 00 e9 f7 fe ff ff 48 89 df e8 92 4d 9f 00
Call Trace:
<TASK>
kvm_arch_vcpu_destroy+0x72/0x2f0 arch/x86/kvm/x86.c:11123
kvm_vcpu_destroy arch/x86/kvm/../../../virt/kvm/kvm_main.c:441 [inline]
kvm_destroy_vcpus+0x11f/0x290 arch/x86/kvm/../../../virt/kvm/kvm_main.c:460
kvm_free_vcpus arch/x86/kvm/x86.c:11564 [inline]
kvm_arch_destroy_vm+0x2e8/0x470 arch/x86/kvm/x86.c:11676
kvm_destroy_vm arch/x86/kvm/../../../virt/kvm/kvm_main.c:1217 [inline]
kvm_put_kvm+0x4fa/0xb00 arch/x86/kvm/../../../virt/kvm/kvm_main.c:1250
kvm_vm_release+0x3f/0x50 arch/x86/kvm/../../../virt/kvm/kvm_main.c:1273
__fput+0x286/0x9f0 fs/file_table.c:311
task_work_run+0xdd/0x1a0 kernel/task_work.c:164
exit_task_work include/linux/task_work.h:32 [inline]
do_exit+0xb29/0x2a30 kernel/exit.c:806
do_group_exit+0xd2/0x2f0 kernel/exit.c:935
get_signal+0x4b0/0x28c0 kernel/signal.c:2862
arch_do_signal_or_restart+0x2a9/0x1c40 arch/x86/kernel/signal.c:868
handle_signal_work kernel/entry/common.c:148 [inline]
exit_to_user_mode_loop kernel/entry/common.c:172 [inline]
exit_to_user_mode_prepare+0x17d/0x290 kernel/entry/common.c:207
__syscall_exit_to_user_mode_work kernel/entry/common.c:289 [inline]
syscall_exit_to_user_mode+0x19/0x60 kernel/entry/common.c:300
do_syscall_64+0x42/0xb0 arch/x86/entry/common.c:86
entry_SYSCALL_64_after_hwframe+0x44/0xae
</TASK> |
| In the Linux kernel, the following vulnerability has been resolved:
sock_map: avoid race between sock_map_close and sk_psock_put
sk_psock_get will return NULL if the refcount of psock has gone to 0, which
will happen when the last call of sk_psock_put is done. However,
sk_psock_drop may not have finished yet, so the close callback will still
point to sock_map_close despite psock being NULL.
This can be reproduced with a thread deleting an element from the sock map,
while the second one creates a socket, adds it to the map and closes it.
That will trigger the WARN_ON_ONCE:
------------[ cut here ]------------
WARNING: CPU: 1 PID: 7220 at net/core/sock_map.c:1701 sock_map_close+0x2a2/0x2d0 net/core/sock_map.c:1701
Modules linked in:
CPU: 1 PID: 7220 Comm: syz-executor380 Not tainted 6.9.0-syzkaller-07726-g3c999d1ae3c7 #0
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 04/02/2024
RIP: 0010:sock_map_close+0x2a2/0x2d0 net/core/sock_map.c:1701
Code: df e8 92 29 88 f8 48 8b 1b 48 89 d8 48 c1 e8 03 42 80 3c 20 00 74 08 48 89 df e8 79 29 88 f8 4c 8b 23 eb 89 e8 4f 15 23 f8 90 <0f> 0b 90 48 83 c4 08 5b 41 5c 41 5d 41 5e 41 5f 5d e9 13 26 3d 02
RSP: 0018:ffffc9000441fda8 EFLAGS: 00010293
RAX: ffffffff89731ae1 RBX: ffffffff94b87540 RCX: ffff888029470000
RDX: 0000000000000000 RSI: ffffffff8bcab5c0 RDI: ffffffff8c1faba0
RBP: 0000000000000000 R08: ffffffff92f9b61f R09: 1ffffffff25f36c3
R10: dffffc0000000000 R11: fffffbfff25f36c4 R12: ffffffff89731840
R13: ffff88804b587000 R14: ffff88804b587000 R15: ffffffff89731870
FS: 000055555e080380(0000) GS:ffff8880b9500000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 0000000000000000 CR3: 00000000207d4000 CR4: 0000000000350ef0
Call Trace:
<TASK>
unix_release+0x87/0xc0 net/unix/af_unix.c:1048
__sock_release net/socket.c:659 [inline]
sock_close+0xbe/0x240 net/socket.c:1421
__fput+0x42b/0x8a0 fs/file_table.c:422
__do_sys_close fs/open.c:1556 [inline]
__se_sys_close fs/open.c:1541 [inline]
__x64_sys_close+0x7f/0x110 fs/open.c:1541
do_syscall_x64 arch/x86/entry/common.c:52 [inline]
do_syscall_64+0xf5/0x240 arch/x86/entry/common.c:83
entry_SYSCALL_64_after_hwframe+0x77/0x7f
RIP: 0033:0x7fb37d618070
Code: 00 00 48 c7 c2 b8 ff ff ff f7 d8 64 89 02 b8 ff ff ff ff eb d4 e8 10 2c 00 00 80 3d 31 f0 07 00 00 74 17 b8 03 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 48 c3 0f 1f 80 00 00 00 00 48 83 ec 18 89 7c
RSP: 002b:00007ffcd4a525d8 EFLAGS: 00000202 ORIG_RAX: 0000000000000003
RAX: ffffffffffffffda RBX: 0000000000000005 RCX: 00007fb37d618070
RDX: 0000000000000010 RSI: 00000000200001c0 RDI: 0000000000000004
RBP: 0000000000000000 R08: 0000000100000000 R09: 0000000100000000
R10: 0000000000000000 R11: 0000000000000202 R12: 0000000000000000
R13: 0000000000000000 R14: 0000000000000000 R15: 0000000000000000
</TASK>
Use sk_psock, which will only check that the pointer is not been set to
NULL yet, which should only happen after the callbacks are restored. If,
then, a reference can still be gotten, we may call sk_psock_stop and cancel
psock->work.
As suggested by Paolo Abeni, reorder the condition so the control flow is
less convoluted.
After that change, the reproducer does not trigger the WARN_ON_ONCE
anymore. |
| In the Linux kernel, the following vulnerability has been resolved:
drm/shmem-helper: Fix BUG_ON() on mmap(PROT_WRITE, MAP_PRIVATE)
Lack of check for copy-on-write (COW) mapping in drm_gem_shmem_mmap
allows users to call mmap with PROT_WRITE and MAP_PRIVATE flag
causing a kernel panic due to BUG_ON in vmf_insert_pfn_prot:
BUG_ON((vma->vm_flags & VM_PFNMAP) && is_cow_mapping(vma->vm_flags));
Return -EINVAL early if COW mapping is detected.
This bug affects all drm drivers using default shmem helpers.
It can be reproduced by this simple example:
void *ptr = mmap(0, size, PROT_WRITE, MAP_PRIVATE, fd, mmap_offset);
ptr[0] = 0; |
| In the Linux kernel, the following vulnerability has been resolved:
ALSA: hda: cs35l56: Fix lifetime of cs_dsp instance
The cs_dsp instance is initialized in the driver probe() so it
should be freed in the driver remove(). Also fix a missing call
to cs_dsp_remove() in the error path of cs35l56_hda_common_probe().
The call to cs_dsp_remove() was being done in the component unbind
callback cs35l56_hda_unbind(). This meant that if the driver was
unbound and then re-bound it would be using an uninitialized cs_dsp
instance.
It is best to initialize the cs_dsp instance in probe() so that it
can return an error if it fails. The component binding API doesn't
have any error handling so there's no way to handle a failure if
cs_dsp was initialized in the bind. |
| In the Linux kernel, the following vulnerability has been resolved:
arm64: asm-bug: Add .align 2 to the end of __BUG_ENTRY
When CONFIG_DEBUG_BUGVERBOSE=n, we fail to add necessary padding bytes
to bug_table entries, and as a result the last entry in a bug table will
be ignored, potentially leading to an unexpected panic(). All prior
entries in the table will be handled correctly.
The arm64 ABI requires that struct fields of up to 8 bytes are
naturally-aligned, with padding added within a struct such that struct
are suitably aligned within arrays.
When CONFIG_DEBUG_BUGVERPOSE=y, the layout of a bug_entry is:
struct bug_entry {
signed int bug_addr_disp; // 4 bytes
signed int file_disp; // 4 bytes
unsigned short line; // 2 bytes
unsigned short flags; // 2 bytes
}
... with 12 bytes total, requiring 4-byte alignment.
When CONFIG_DEBUG_BUGVERBOSE=n, the layout of a bug_entry is:
struct bug_entry {
signed int bug_addr_disp; // 4 bytes
unsigned short flags; // 2 bytes
< implicit padding > // 2 bytes
}
... with 8 bytes total, with 6 bytes of data and 2 bytes of trailing
padding, requiring 4-byte alginment.
When we create a bug_entry in assembly, we align the start of the entry
to 4 bytes, which implicitly handles padding for any prior entries.
However, we do not align the end of the entry, and so when
CONFIG_DEBUG_BUGVERBOSE=n, the final entry lacks the trailing padding
bytes.
For the main kernel image this is not a problem as find_bug() doesn't
depend on the trailing padding bytes when searching for entries:
for (bug = __start___bug_table; bug < __stop___bug_table; ++bug)
if (bugaddr == bug_addr(bug))
return bug;
However for modules, module_bug_finalize() depends on the trailing
bytes when calculating the number of entries:
mod->num_bugs = sechdrs[i].sh_size / sizeof(struct bug_entry);
... and as the last bug_entry lacks the necessary padding bytes, this entry
will not be counted, e.g. in the case of a single entry:
sechdrs[i].sh_size == 6
sizeof(struct bug_entry) == 8;
sechdrs[i].sh_size / sizeof(struct bug_entry) == 0;
Consequently module_find_bug() will miss the last bug_entry when it does:
for (i = 0; i < mod->num_bugs; ++i, ++bug)
if (bugaddr == bug_addr(bug))
goto out;
... which can lead to a kenrel panic due to an unhandled bug.
This can be demonstrated with the following module:
static int __init buginit(void)
{
WARN(1, "hello\n");
return 0;
}
static void __exit bugexit(void)
{
}
module_init(buginit);
module_exit(bugexit);
MODULE_LICENSE("GPL");
... which will trigger a kernel panic when loaded:
------------[ cut here ]------------
hello
Unexpected kernel BRK exception at EL1
Internal error: BRK handler: 00000000f2000800 [#1] PREEMPT SMP
Modules linked in: hello(O+)
CPU: 0 PID: 50 Comm: insmod Tainted: G O 6.9.1 #8
Hardware name: linux,dummy-virt (DT)
pstate: 60400005 (nZCv daif +PAN -UAO -TCO -DIT -SSBS BTYPE=--)
pc : buginit+0x18/0x1000 [hello]
lr : buginit+0x18/0x1000 [hello]
sp : ffff800080533ae0
x29: ffff800080533ae0 x28: 0000000000000000 x27: 0000000000000000
x26: ffffaba8c4e70510 x25: ffff800080533c30 x24: ffffaba8c4a28a58
x23: 0000000000000000 x22: 0000000000000000 x21: ffff3947c0eab3c0
x20: ffffaba8c4e3f000 x19: ffffaba846464000 x18: 0000000000000006
x17: 0000000000000000 x16: ffffaba8c2492834 x15: 0720072007200720
x14: 0720072007200720 x13: ffffaba8c49b27c8 x12: 0000000000000312
x11: 0000000000000106 x10: ffffaba8c4a0a7c8 x9 : ffffaba8c49b27c8
x8 : 00000000ffffefff x7 : ffffaba8c4a0a7c8 x6 : 80000000fffff000
x5 : 0000000000000107 x4 : 0000000000000000 x3 : 0000000000000000
x2 : 0000000000000000 x1 : 0000000000000000 x0 : ffff3947c0eab3c0
Call trace:
buginit+0x18/0x1000 [hello]
do_one_initcall+0x80/0x1c8
do_init_module+0x60/0x218
load_module+0x1ba4/0x1d70
__do_sys_init_module+0x198/0x1d0
__arm64_sys_init_module+0x1c/0x28
invoke_syscall+0x48/0x114
el0_svc
---truncated--- |
| In the Linux kernel, the following vulnerability has been resolved:
f2fs: fix to do sanity check on i_xattr_nid in sanity_check_inode()
syzbot reports a kernel bug as below:
F2FS-fs (loop0): Mounted with checkpoint version = 48b305e4
==================================================================
BUG: KASAN: slab-out-of-bounds in f2fs_test_bit fs/f2fs/f2fs.h:2933 [inline]
BUG: KASAN: slab-out-of-bounds in current_nat_addr fs/f2fs/node.h:213 [inline]
BUG: KASAN: slab-out-of-bounds in f2fs_get_node_info+0xece/0x1200 fs/f2fs/node.c:600
Read of size 1 at addr ffff88807a58c76c by task syz-executor280/5076
CPU: 1 PID: 5076 Comm: syz-executor280 Not tainted 6.9.0-rc5-syzkaller #0
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 03/27/2024
Call Trace:
<TASK>
__dump_stack lib/dump_stack.c:88 [inline]
dump_stack_lvl+0x241/0x360 lib/dump_stack.c:114
print_address_description mm/kasan/report.c:377 [inline]
print_report+0x169/0x550 mm/kasan/report.c:488
kasan_report+0x143/0x180 mm/kasan/report.c:601
f2fs_test_bit fs/f2fs/f2fs.h:2933 [inline]
current_nat_addr fs/f2fs/node.h:213 [inline]
f2fs_get_node_info+0xece/0x1200 fs/f2fs/node.c:600
f2fs_xattr_fiemap fs/f2fs/data.c:1848 [inline]
f2fs_fiemap+0x55d/0x1ee0 fs/f2fs/data.c:1925
ioctl_fiemap fs/ioctl.c:220 [inline]
do_vfs_ioctl+0x1c07/0x2e50 fs/ioctl.c:838
__do_sys_ioctl fs/ioctl.c:902 [inline]
__se_sys_ioctl+0x81/0x170 fs/ioctl.c:890
do_syscall_x64 arch/x86/entry/common.c:52 [inline]
do_syscall_64+0xf5/0x240 arch/x86/entry/common.c:83
entry_SYSCALL_64_after_hwframe+0x77/0x7f
The root cause is we missed to do sanity check on i_xattr_nid during
f2fs_iget(), so that in fiemap() path, current_nat_addr() will access
nat_bitmap w/ offset from invalid i_xattr_nid, result in triggering
kasan bug report, fix it. |
| In the Linux kernel, the following vulnerability has been resolved:
bonding: fix oops during rmmod
"rmmod bonding" causes an oops ever since commit cc317ea3d927 ("bonding:
remove redundant NULL check in debugfs function"). Here are the relevant
functions being called:
bonding_exit()
bond_destroy_debugfs()
debugfs_remove_recursive(bonding_debug_root);
bonding_debug_root = NULL; <--------- SET TO NULL HERE
bond_netlink_fini()
rtnl_link_unregister()
__rtnl_link_unregister()
unregister_netdevice_many_notify()
bond_uninit()
bond_debug_unregister()
(commit removed check for bonding_debug_root == NULL)
debugfs_remove()
simple_recursive_removal()
down_write() -> OOPS
However, reverting the bad commit does not solve the problem completely
because the original code contains a race that could cause the same
oops, although it was much less likely to be triggered unintentionally:
CPU1
rmmod bonding
bonding_exit()
bond_destroy_debugfs()
debugfs_remove_recursive(bonding_debug_root);
CPU2
echo -bond0 > /sys/class/net/bonding_masters
bond_uninit()
bond_debug_unregister()
if (!bonding_debug_root)
CPU1
bonding_debug_root = NULL;
So do NOT revert the bad commit (since the removed checks were racy
anyway), and instead change the order of actions taken during module
removal. The same oops can also happen if there is an error during
module init, so apply the same fix there. |
| In the Linux kernel, the following vulnerability has been resolved:
Revert "xsk: Support redirect to any socket bound to the same umem"
This reverts commit 2863d665ea41282379f108e4da6c8a2366ba66db.
This patch introduced a potential kernel crash when multiple napi instances
redirect to the same AF_XDP socket. By removing the queue_index check, it is
possible for multiple napi instances to access the Rx ring at the same time,
which will result in a corrupted ring state which can lead to a crash when
flushing the rings in __xsk_flush(). This can happen when the linked list of
sockets to flush gets corrupted by concurrent accesses. A quick and small fix
is not possible, so let us revert this for now. |
| In the Linux kernel, the following vulnerability has been resolved:
btrfs: protect folio::private when attaching extent buffer folios
[BUG]
Since v6.8 there are rare kernel crashes reported by various people,
the common factor is bad page status error messages like this:
BUG: Bad page state in process kswapd0 pfn:d6e840
page: refcount:0 mapcount:0 mapping:000000007512f4f2 index:0x2796c2c7c
pfn:0xd6e840
aops:btree_aops ino:1
flags: 0x17ffffe0000008(uptodate|node=0|zone=2|lastcpupid=0x3fffff)
page_type: 0xffffffff()
raw: 0017ffffe0000008 dead000000000100 dead000000000122 ffff88826d0be4c0
raw: 00000002796c2c7c 0000000000000000 00000000ffffffff 0000000000000000
page dumped because: non-NULL mapping
[CAUSE]
Commit 09e6cef19c9f ("btrfs: refactor alloc_extent_buffer() to
allocate-then-attach method") changes the sequence when allocating a new
extent buffer.
Previously we always called grab_extent_buffer() under
mapping->i_private_lock, to ensure the safety on modification on
folio::private (which is a pointer to extent buffer for regular
sectorsize).
This can lead to the following race:
Thread A is trying to allocate an extent buffer at bytenr X, with 4
4K pages, meanwhile thread B is trying to release the page at X + 4K
(the second page of the extent buffer at X).
Thread A | Thread B
-----------------------------------+-------------------------------------
| btree_release_folio()
| | This is for the page at X + 4K,
| | Not page X.
| |
alloc_extent_buffer() | |- release_extent_buffer()
|- filemap_add_folio() for the | | |- atomic_dec_and_test(eb->refs)
| page at bytenr X (the first | | |
| page). | | |
| Which returned -EEXIST. | | |
| | | |
|- filemap_lock_folio() | | |
| Returned the first page locked. | | |
| | | |
|- grab_extent_buffer() | | |
| |- atomic_inc_not_zero() | | |
| | Returned false | | |
| |- folio_detach_private() | | |- folio_detach_private() for X
| |- folio_test_private() | | |- folio_test_private()
| Returned true | | | Returned true
|- folio_put() | |- folio_put()
Now there are two puts on the same folio at folio X, leading to refcount
underflow of the folio X, and eventually causing the BUG_ON() on the
page->mapping.
The condition is not that easy to hit:
- The release must be triggered for the middle page of an eb
If the release is on the same first page of an eb, page lock would kick
in and prevent the race.
- folio_detach_private() has a very small race window
It's only between folio_test_private() and folio_clear_private().
That's exactly when mapping->i_private_lock is used to prevent such race,
and commit 09e6cef19c9f ("btrfs: refactor alloc_extent_buffer() to
allocate-then-attach method") screwed that up.
At that time, I thought the page lock would kick in as
filemap_release_folio() also requires the page to be locked, but forgot
the filemap_release_folio() only locks one page, not all pages of an
extent buffer.
[FIX]
Move all the code requiring i_private_lock into
attach_eb_folio_to_filemap(), so that everything is done with proper
lock protection.
Furthermore to prevent future problems, add an extra
lockdep_assert_locked() to ensure we're holding the proper lock.
To reproducer that is able to hit the race (takes a few minutes with
instrumented code inserting delays to alloc_extent_buffer()):
#!/bin/sh
drop_caches () {
while(true); do
echo 3 > /proc/sys/vm/drop_caches
echo 1 > /proc/sys/vm/compact_memory
done
}
run_tar () {
while(true); do
for x in `seq 1 80` ; do
tar cf /dev/zero /mnt > /dev/null &
done
wait
done
}
mkfs.btrfs -f -d single -m single
---truncated--- |
| In the Linux kernel, the following vulnerability has been resolved:
net: bridge: mst: pass vlan group directly to br_mst_vlan_set_state
Pass the already obtained vlan group pointer to br_mst_vlan_set_state()
instead of dereferencing it again. Each caller has already correctly
dereferenced it for their context. This change is required for the
following suspicious RCU dereference fix. No functional changes
intended. |
| In the Linux kernel, the following vulnerability has been resolved:
net: bridge: mst: fix suspicious rcu usage in br_mst_set_state
I converted br_mst_set_state to RCU to avoid a vlan use-after-free
but forgot to change the vlan group dereference helper. Switch to vlan
group RCU deref helper to fix the suspicious rcu usage warning. |
| In the Linux kernel, the following vulnerability has been resolved:
parisc: Try to fix random segmentation faults in package builds
PA-RISC systems with PA8800 and PA8900 processors have had problems
with random segmentation faults for many years. Systems with earlier
processors are much more stable.
Systems with PA8800 and PA8900 processors have a large L2 cache which
needs per page flushing for decent performance when a large range is
flushed. The combined cache in these systems is also more sensitive to
non-equivalent aliases than the caches in earlier systems.
The majority of random segmentation faults that I have looked at
appear to be memory corruption in memory allocated using mmap and
malloc.
My first attempt at fixing the random faults didn't work. On
reviewing the cache code, I realized that there were two issues
which the existing code didn't handle correctly. Both relate
to cache move-in. Another issue is that the present bit in PTEs
is racy.
1) PA-RISC caches have a mind of their own and they can speculatively
load data and instructions for a page as long as there is a entry in
the TLB for the page which allows move-in. TLBs are local to each
CPU. Thus, the TLB entry for a page must be purged before flushing
the page. This is particularly important on SMP systems.
In some of the flush routines, the flush routine would be called
and then the TLB entry would be purged. This was because the flush
routine needed the TLB entry to do the flush.
2) My initial approach to trying the fix the random faults was to
try and use flush_cache_page_if_present for all flush operations.
This actually made things worse and led to a couple of hardware
lockups. It finally dawned on me that some lines weren't being
flushed because the pte check code was racy. This resulted in
random inequivalent mappings to physical pages.
The __flush_cache_page tmpalias flush sets up its own TLB entry
and it doesn't need the existing TLB entry. As long as we can find
the pte pointer for the vm page, we can get the pfn and physical
address of the page. We can also purge the TLB entry for the page
before doing the flush. Further, __flush_cache_page uses a special
TLB entry that inhibits cache move-in.
When switching page mappings, we need to ensure that lines are
removed from the cache. It is not sufficient to just flush the
lines to memory as they may come back.
This made it clear that we needed to implement all the required
flush operations using tmpalias routines. This includes flushes
for user and kernel pages.
After modifying the code to use tmpalias flushes, it became clear
that the random segmentation faults were not fully resolved. The
frequency of faults was worse on systems with a 64 MB L2 (PA8900)
and systems with more CPUs (rp4440).
The warning that I added to flush_cache_page_if_present to detect
pages that couldn't be flushed triggered frequently on some systems.
Helge and I looked at the pages that couldn't be flushed and found
that the PTE was either cleared or for a swap page. Ignoring pages
that were swapped out seemed okay but pages with cleared PTEs seemed
problematic.
I looked at routines related to pte_clear and noticed ptep_clear_flush.
The default implementation just flushes the TLB entry. However, it was
obvious that on parisc we need to flush the cache page as well. If
we don't flush the cache page, stale lines will be left in the cache
and cause random corruption. Once a PTE is cleared, there is no way
to find the physical address associated with the PTE and flush the
associated page at a later time.
I implemented an updated change with a parisc specific version of
ptep_clear_flush. It fixed the random data corruption on Helge's rp4440
and rp3440, as well as on my c8000.
At this point, I realized that I could restore the code where we only
flush in flush_cache_page_if_present if the page has been accessed.
However, for this, we also need to flush the cache when the accessed
bit is cleared in
---truncated--- |
| In the Linux kernel, the following vulnerability has been resolved:
cachefiles: defer exposing anon_fd until after copy_to_user() succeeds
After installing the anonymous fd, we can now see it in userland and close
it. However, at this point we may not have gotten the reference count of
the cache, but we will put it during colse fd, so this may cause a cache
UAF.
So grab the cache reference count before fd_install(). In addition, by
kernel convention, fd is taken over by the user land after fd_install(),
and the kernel should not call close_fd() after that, i.e., it should call
fd_install() after everything is ready, thus fd_install() is called after
copy_to_user() succeeds. |
| In the Linux kernel, the following vulnerability has been resolved:
cachefiles: remove requests from xarray during flushing requests
Even with CACHEFILES_DEAD set, we can still read the requests, so in the
following concurrency the request may be used after it has been freed:
mount | daemon_thread1 | daemon_thread2
------------------------------------------------------------
cachefiles_ondemand_init_object
cachefiles_ondemand_send_req
REQ_A = kzalloc(sizeof(*req) + data_len)
wait_for_completion(&REQ_A->done)
cachefiles_daemon_read
cachefiles_ondemand_daemon_read
// close dev fd
cachefiles_flush_reqs
complete(&REQ_A->done)
kfree(REQ_A)
xa_lock(&cache->reqs);
cachefiles_ondemand_select_req
req->msg.opcode != CACHEFILES_OP_READ
// req use-after-free !!!
xa_unlock(&cache->reqs);
xa_destroy(&cache->reqs)
Hence remove requests from cache->reqs when flushing them to avoid
accessing freed requests. |
| In the Linux kernel, the following vulnerability has been resolved:
drm/komeda: check for error-valued pointer
komeda_pipeline_get_state() may return an error-valued pointer, thus
check the pointer for negative or null value before dereferencing. |