| CVE |
Vendors |
Products |
Updated |
CVSS v3.1 |
| In the Linux kernel, the following vulnerability has been resolved:
vmxnet3: disable rx data ring on dma allocation failure
When vmxnet3_rq_create() fails to allocate memory for rq->data_ring.base,
the subsequent call to vmxnet3_rq_destroy_all_rxdataring does not reset
rq->data_ring.desc_size for the data ring that failed, which presumably
causes the hypervisor to reference it on packet reception.
To fix this bug, rq->data_ring.desc_size needs to be set to 0 to tell
the hypervisor to disable this feature.
[ 95.436876] kernel BUG at net/core/skbuff.c:207!
[ 95.439074] invalid opcode: 0000 [#1] PREEMPT SMP NOPTI
[ 95.440411] CPU: 7 PID: 0 Comm: swapper/7 Not tainted 6.9.3-dirty #1
[ 95.441558] Hardware name: VMware, Inc. VMware Virtual
Platform/440BX Desktop Reference Platform, BIOS 6.00 12/12/2018
[ 95.443481] RIP: 0010:skb_panic+0x4d/0x4f
[ 95.444404] Code: 4f 70 50 8b 87 c0 00 00 00 50 8b 87 bc 00 00 00 50
ff b7 d0 00 00 00 4c 8b 8f c8 00 00 00 48 c7 c7 68 e8 be 9f e8 63 58 f9
ff <0f> 0b 48 8b 14 24 48 c7 c1 d0 73 65 9f e8 a1 ff ff ff 48 8b 14 24
[ 95.447684] RSP: 0018:ffffa13340274dd0 EFLAGS: 00010246
[ 95.448762] RAX: 0000000000000089 RBX: ffff8fbbc72b02d0 RCX: 000000000000083f
[ 95.450148] RDX: 0000000000000000 RSI: 00000000000000f6 RDI: 000000000000083f
[ 95.451520] RBP: 000000000000002d R08: 0000000000000000 R09: ffffa13340274c60
[ 95.452886] R10: ffffffffa04ed468 R11: 0000000000000002 R12: 0000000000000000
[ 95.454293] R13: ffff8fbbdab3c2d0 R14: ffff8fbbdbd829e0 R15: ffff8fbbdbd809e0
[ 95.455682] FS: 0000000000000000(0000) GS:ffff8fbeefd80000(0000) knlGS:0000000000000000
[ 95.457178] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 95.458340] CR2: 00007fd0d1f650c8 CR3: 0000000115f28000 CR4: 00000000000406f0
[ 95.459791] Call Trace:
[ 95.460515] <IRQ>
[ 95.461180] ? __die_body.cold+0x19/0x27
[ 95.462150] ? die+0x2e/0x50
[ 95.462976] ? do_trap+0xca/0x110
[ 95.463973] ? do_error_trap+0x6a/0x90
[ 95.464966] ? skb_panic+0x4d/0x4f
[ 95.465901] ? exc_invalid_op+0x50/0x70
[ 95.466849] ? skb_panic+0x4d/0x4f
[ 95.467718] ? asm_exc_invalid_op+0x1a/0x20
[ 95.468758] ? skb_panic+0x4d/0x4f
[ 95.469655] skb_put.cold+0x10/0x10
[ 95.470573] vmxnet3_rq_rx_complete+0x862/0x11e0 [vmxnet3]
[ 95.471853] vmxnet3_poll_rx_only+0x36/0xb0 [vmxnet3]
[ 95.473185] __napi_poll+0x2b/0x160
[ 95.474145] net_rx_action+0x2c6/0x3b0
[ 95.475115] handle_softirqs+0xe7/0x2a0
[ 95.476122] __irq_exit_rcu+0x97/0xb0
[ 95.477109] common_interrupt+0x85/0xa0
[ 95.478102] </IRQ>
[ 95.478846] <TASK>
[ 95.479603] asm_common_interrupt+0x26/0x40
[ 95.480657] RIP: 0010:pv_native_safe_halt+0xf/0x20
[ 95.481801] Code: 22 d7 e9 54 87 01 00 0f 1f 40 00 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 f3 0f 1e fa eb 07 0f 00 2d 93 ba 3b 00 fb f4 <e9> 2c 87 01 00 66 66 2e 0f 1f 84 00 00 00 00 00 90 90 90 90 90 90
[ 95.485563] RSP: 0018:ffffa133400ffe58 EFLAGS: 00000246
[ 95.486882] RAX: 0000000000004000 RBX: ffff8fbbc1d14064 RCX: 0000000000000000
[ 95.488477] RDX: ffff8fbeefd80000 RSI: ffff8fbbc1d14000 RDI: 0000000000000001
[ 95.490067] RBP: ffff8fbbc1d14064 R08: ffffffffa0652260 R09: 00000000000010d3
[ 95.491683] R10: 0000000000000018 R11: ffff8fbeefdb4764 R12: ffffffffa0652260
[ 95.493389] R13: ffffffffa06522e0 R14: 0000000000000001 R15: 0000000000000000
[ 95.495035] acpi_safe_halt+0x14/0x20
[ 95.496127] acpi_idle_do_entry+0x2f/0x50
[ 95.497221] acpi_idle_enter+0x7f/0xd0
[ 95.498272] cpuidle_enter_state+0x81/0x420
[ 95.499375] cpuidle_enter+0x2d/0x40
[ 95.500400] do_idle+0x1e5/0x240
[ 95.501385] cpu_startup_entry+0x29/0x30
[ 95.502422] start_secondary+0x11c/0x140
[ 95.503454] common_startup_64+0x13e/0x141
[ 95.504466] </TASK>
[ 95.505197] Modules linked in: nft_fib_inet nft_fib_ipv4
nft_fib_ipv6 nft_fib nft_reject_inet nf_reject_ipv4 nf_reject_ipv6
nft_reject nft_ct nft_chain_nat nf_nat nf_conntrack nf_defrag_ip
---truncated--- |
| In the Linux kernel, the following vulnerability has been resolved:
io_uring/rsrc: don't lock while !TASK_RUNNING
There is a report of io_rsrc_ref_quiesce() locking a mutex while not
TASK_RUNNING, which is due to forgetting restoring the state back after
io_run_task_work_sig() and attempts to break out of the waiting loop.
do not call blocking ops when !TASK_RUNNING; state=1 set at
[<ffffffff815d2494>] prepare_to_wait+0xa4/0x380
kernel/sched/wait.c:237
WARNING: CPU: 2 PID: 397056 at kernel/sched/core.c:10099
__might_sleep+0x114/0x160 kernel/sched/core.c:10099
RIP: 0010:__might_sleep+0x114/0x160 kernel/sched/core.c:10099
Call Trace:
<TASK>
__mutex_lock_common kernel/locking/mutex.c:585 [inline]
__mutex_lock+0xb4/0x940 kernel/locking/mutex.c:752
io_rsrc_ref_quiesce+0x590/0x940 io_uring/rsrc.c:253
io_sqe_buffers_unregister+0xa2/0x340 io_uring/rsrc.c:799
__io_uring_register io_uring/register.c:424 [inline]
__do_sys_io_uring_register+0x5b9/0x2400 io_uring/register.c:613
do_syscall_x64 arch/x86/entry/common.c:52 [inline]
do_syscall_64+0xd8/0x270 arch/x86/entry/common.c:83
entry_SYSCALL_64_after_hwframe+0x6f/0x77 |
| In the Linux kernel, the following vulnerability has been resolved:
memblock: make memblock_set_node() also warn about use of MAX_NUMNODES
On an (old) x86 system with SRAT just covering space above 4Gb:
ACPI: SRAT: Node 0 PXM 0 [mem 0x100000000-0xfffffffff] hotplug
the commit referenced below leads to this NUMA configuration no longer
being refused by a CONFIG_NUMA=y kernel (previously
NUMA: nodes only cover 6144MB of your 8185MB e820 RAM. Not used.
No NUMA configuration found
Faking a node at [mem 0x0000000000000000-0x000000027fffffff]
was seen in the log directly after the message quoted above), because of
memblock_validate_numa_coverage() checking for NUMA_NO_NODE (only). This
in turn led to memblock_alloc_range_nid()'s warning about MAX_NUMNODES
triggering, followed by a NULL deref in memmap_init() when trying to
access node 64's (NODE_SHIFT=6) node data.
To compensate said change, make memblock_set_node() warn on and adjust
a passed in value of MAX_NUMNODES, just like various other functions
already do. |
| In the Linux kernel, the following vulnerability has been resolved:
mm/huge_memory: don't unpoison huge_zero_folio
When I did memory failure tests recently, below panic occurs:
kernel BUG at include/linux/mm.h:1135!
invalid opcode: 0000 [#1] PREEMPT SMP NOPTI
CPU: 9 PID: 137 Comm: kswapd1 Not tainted 6.9.0-rc4-00491-gd5ce28f156fe-dirty #14
RIP: 0010:shrink_huge_zero_page_scan+0x168/0x1a0
RSP: 0018:ffff9933c6c57bd0 EFLAGS: 00000246
RAX: 000000000000003e RBX: 0000000000000000 RCX: ffff88f61fc5c9c8
RDX: 0000000000000000 RSI: 0000000000000027 RDI: ffff88f61fc5c9c0
RBP: ffffcd7c446b0000 R08: ffffffff9a9405f0 R09: 0000000000005492
R10: 00000000000030ea R11: ffffffff9a9405f0 R12: 0000000000000000
R13: 0000000000000000 R14: 0000000000000000 R15: ffff88e703c4ac00
FS: 0000000000000000(0000) GS:ffff88f61fc40000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 000055f4da6e9878 CR3: 0000000c71048000 CR4: 00000000000006f0
Call Trace:
<TASK>
do_shrink_slab+0x14f/0x6a0
shrink_slab+0xca/0x8c0
shrink_node+0x2d0/0x7d0
balance_pgdat+0x33a/0x720
kswapd+0x1f3/0x410
kthread+0xd5/0x100
ret_from_fork+0x2f/0x50
ret_from_fork_asm+0x1a/0x30
</TASK>
Modules linked in: mce_inject hwpoison_inject
---[ end trace 0000000000000000 ]---
RIP: 0010:shrink_huge_zero_page_scan+0x168/0x1a0
RSP: 0018:ffff9933c6c57bd0 EFLAGS: 00000246
RAX: 000000000000003e RBX: 0000000000000000 RCX: ffff88f61fc5c9c8
RDX: 0000000000000000 RSI: 0000000000000027 RDI: ffff88f61fc5c9c0
RBP: ffffcd7c446b0000 R08: ffffffff9a9405f0 R09: 0000000000005492
R10: 00000000000030ea R11: ffffffff9a9405f0 R12: 0000000000000000
R13: 0000000000000000 R14: 0000000000000000 R15: ffff88e703c4ac00
FS: 0000000000000000(0000) GS:ffff88f61fc40000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 000055f4da6e9878 CR3: 0000000c71048000 CR4: 00000000000006f0
The root cause is that HWPoison flag will be set for huge_zero_folio
without increasing the folio refcnt. But then unpoison_memory() will
decrease the folio refcnt unexpectedly as it appears like a successfully
hwpoisoned folio leading to VM_BUG_ON_PAGE(page_ref_count(page) == 0) when
releasing huge_zero_folio.
Skip unpoisoning huge_zero_folio in unpoison_memory() to fix this issue.
We're not prepared to unpoison huge_zero_folio yet. |
| In the Linux kernel, the following vulnerability has been resolved:
bpf: Set run context for rawtp test_run callback
syzbot reported crash when rawtp program executed through the
test_run interface calls bpf_get_attach_cookie helper or any
other helper that touches task->bpf_ctx pointer.
Setting the run context (task->bpf_ctx pointer) for test_run
callback. |
| In the Linux kernel, the following vulnerability has been resolved:
net: dsa: seville: register the mdiobus under devres
As explained in commits:
74b6d7d13307 ("net: dsa: realtek: register the MDIO bus under devres")
5135e96a3dd2 ("net: dsa: don't allocate the slave_mii_bus using devres")
mdiobus_free() will panic when called from devm_mdiobus_free() <-
devres_release_all() <- __device_release_driver(), and that mdiobus was
not previously unregistered.
The Seville VSC9959 switch is a platform device, so the initial set of
constraints that I thought would cause this (I2C or SPI buses which call
->remove on ->shutdown) do not apply. But there is one more which
applies here.
If the DSA master itself is on a bus that calls ->remove from ->shutdown
(like dpaa2-eth, which is on the fsl-mc bus), there is a device link
between the switch and the DSA master, and device_links_unbind_consumers()
will unbind the seville switch driver on shutdown.
So the same treatment must be applied to all DSA switch drivers, which
is: either use devres for both the mdiobus allocation and registration,
or don't use devres at all.
The seville driver has a code structure that could accommodate both the
mdiobus_unregister and mdiobus_free calls, but it has an external
dependency upon mscc_miim_setup() from mdio-mscc-miim.c, which calls
devm_mdiobus_alloc_size() on its behalf. So rather than restructuring
that, and exporting yet one more symbol mscc_miim_teardown(), let's work
with devres and replace of_mdiobus_register with the devres variant.
When we use all-devres, we can ensure that devres doesn't free a
still-registered bus (it either runs both callbacks, or none). |
| In the Linux kernel, the following vulnerability has been resolved:
net: dsa: felix: don't use devres for mdiobus
As explained in commits:
74b6d7d13307 ("net: dsa: realtek: register the MDIO bus under devres")
5135e96a3dd2 ("net: dsa: don't allocate the slave_mii_bus using devres")
mdiobus_free() will panic when called from devm_mdiobus_free() <-
devres_release_all() <- __device_release_driver(), and that mdiobus was
not previously unregistered.
The Felix VSC9959 switch is a PCI device, so the initial set of
constraints that I thought would cause this (I2C or SPI buses which call
->remove on ->shutdown) do not apply. But there is one more which
applies here.
If the DSA master itself is on a bus that calls ->remove from ->shutdown
(like dpaa2-eth, which is on the fsl-mc bus), there is a device link
between the switch and the DSA master, and device_links_unbind_consumers()
will unbind the felix switch driver on shutdown.
So the same treatment must be applied to all DSA switch drivers, which
is: either use devres for both the mdiobus allocation and registration,
or don't use devres at all.
The felix driver has the code structure in place for orderly mdiobus
removal, so just replace devm_mdiobus_alloc_size() with the non-devres
variant, and add manual free where necessary, to ensure that we don't
let devres free a still-registered bus. |
| In the Linux kernel, the following vulnerability has been resolved:
net: dsa: lantiq_gswip: don't use devres for mdiobus
As explained in commits:
74b6d7d13307 ("net: dsa: realtek: register the MDIO bus under devres")
5135e96a3dd2 ("net: dsa: don't allocate the slave_mii_bus using devres")
mdiobus_free() will panic when called from devm_mdiobus_free() <-
devres_release_all() <- __device_release_driver(), and that mdiobus was
not previously unregistered.
The GSWIP switch is a platform device, so the initial set of constraints
that I thought would cause this (I2C or SPI buses which call ->remove on
->shutdown) do not apply. But there is one more which applies here.
If the DSA master itself is on a bus that calls ->remove from ->shutdown
(like dpaa2-eth, which is on the fsl-mc bus), there is a device link
between the switch and the DSA master, and device_links_unbind_consumers()
will unbind the GSWIP switch driver on shutdown.
So the same treatment must be applied to all DSA switch drivers, which
is: either use devres for both the mdiobus allocation and registration,
or don't use devres at all.
The gswip driver has the code structure in place for orderly mdiobus
removal, so just replace devm_mdiobus_alloc() with the non-devres
variant, and add manual free where necessary, to ensure that we don't
let devres free a still-registered bus. |
| In the Linux kernel, the following vulnerability has been resolved:
ipmr,ip6mr: acquire RTNL before calling ip[6]mr_free_table() on failure path
ip[6]mr_free_table() can only be called under RTNL lock.
RTNL: assertion failed at net/core/dev.c (10367)
WARNING: CPU: 1 PID: 5890 at net/core/dev.c:10367 unregister_netdevice_many+0x1246/0x1850 net/core/dev.c:10367
Modules linked in:
CPU: 1 PID: 5890 Comm: syz-executor.2 Not tainted 5.16.0-syzkaller-11627-g422ee58dc0ef #0
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
RIP: 0010:unregister_netdevice_many+0x1246/0x1850 net/core/dev.c:10367
Code: 0f 85 9b ee ff ff e8 69 07 4b fa ba 7f 28 00 00 48 c7 c6 00 90 ae 8a 48 c7 c7 40 90 ae 8a c6 05 6d b1 51 06 01 e8 8c 90 d8 01 <0f> 0b e9 70 ee ff ff e8 3e 07 4b fa 4c 89 e7 e8 86 2a 59 fa e9 ee
RSP: 0018:ffffc900046ff6e0 EFLAGS: 00010286
RAX: 0000000000000000 RBX: 0000000000000000 RCX: 0000000000000000
RDX: ffff888050f51d00 RSI: ffffffff815fa008 RDI: fffff520008dfece
RBP: 0000000000000000 R08: 0000000000000000 R09: 0000000000000000
R10: ffffffff815f3d6e R11: 0000000000000000 R12: 00000000fffffff4
R13: dffffc0000000000 R14: ffffc900046ff750 R15: ffff88807b7dc000
FS: 00007f4ab736e700(0000) GS:ffff8880b9d00000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 00007fee0b4f8990 CR3: 000000001e7d2000 CR4: 00000000003506e0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
Call Trace:
<TASK>
mroute_clean_tables+0x244/0xb40 net/ipv6/ip6mr.c:1509
ip6mr_free_table net/ipv6/ip6mr.c:389 [inline]
ip6mr_rules_init net/ipv6/ip6mr.c:246 [inline]
ip6mr_net_init net/ipv6/ip6mr.c:1306 [inline]
ip6mr_net_init+0x3f0/0x4e0 net/ipv6/ip6mr.c:1298
ops_init+0xaf/0x470 net/core/net_namespace.c:140
setup_net+0x54f/0xbb0 net/core/net_namespace.c:331
copy_net_ns+0x318/0x760 net/core/net_namespace.c:475
create_new_namespaces+0x3f6/0xb20 kernel/nsproxy.c:110
copy_namespaces+0x391/0x450 kernel/nsproxy.c:178
copy_process+0x2e0c/0x7300 kernel/fork.c:2167
kernel_clone+0xe7/0xab0 kernel/fork.c:2555
__do_sys_clone+0xc8/0x110 kernel/fork.c:2672
do_syscall_x64 arch/x86/entry/common.c:50 [inline]
do_syscall_64+0x35/0xb0 arch/x86/entry/common.c:80
entry_SYSCALL_64_after_hwframe+0x44/0xae
RIP: 0033:0x7f4ab89f9059
Code: Unable to access opcode bytes at RIP 0x7f4ab89f902f.
RSP: 002b:00007f4ab736e118 EFLAGS: 00000206 ORIG_RAX: 0000000000000038
RAX: ffffffffffffffda RBX: 00007f4ab8b0bf60 RCX: 00007f4ab89f9059
RDX: 0000000020000280 RSI: 0000000020000270 RDI: 0000000040200000
RBP: 00007f4ab8a5308d R08: 0000000020000300 R09: 0000000020000300
R10: 00000000200002c0 R11: 0000000000000206 R12: 0000000000000000
R13: 00007ffc3977cc1f R14: 00007f4ab736e300 R15: 0000000000022000
</TASK> |
| In the Linux kernel, the following vulnerability has been resolved:
eeprom: ee1004: limit i2c reads to I2C_SMBUS_BLOCK_MAX
Commit effa453168a7 ("i2c: i801: Don't silently correct invalid transfer
size") revealed that ee1004_eeprom_read() did not properly limit how
many bytes to read at once.
In particular, i2c_smbus_read_i2c_block_data_or_emulated() takes the
length to read as an u8. If count == 256 after taking into account the
offset and page boundary, the cast to u8 overflows. And this is common
when user space tries to read the entire EEPROM at once.
To fix it, limit each read to I2C_SMBUS_BLOCK_MAX (32) bytes, already
the maximum length i2c_smbus_read_i2c_block_data_or_emulated() allows. |
| In the Linux kernel, the following vulnerability has been resolved:
fs/proc: task_mmu.c: don't read mapcount for migration entry
The syzbot reported the below BUG:
kernel BUG at include/linux/page-flags.h:785!
invalid opcode: 0000 [#1] PREEMPT SMP KASAN
CPU: 1 PID: 4392 Comm: syz-executor560 Not tainted 5.16.0-rc6-syzkaller #0
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
RIP: 0010:PageDoubleMap include/linux/page-flags.h:785 [inline]
RIP: 0010:__page_mapcount+0x2d2/0x350 mm/util.c:744
Call Trace:
page_mapcount include/linux/mm.h:837 [inline]
smaps_account+0x470/0xb10 fs/proc/task_mmu.c:466
smaps_pte_entry fs/proc/task_mmu.c:538 [inline]
smaps_pte_range+0x611/0x1250 fs/proc/task_mmu.c:601
walk_pmd_range mm/pagewalk.c:128 [inline]
walk_pud_range mm/pagewalk.c:205 [inline]
walk_p4d_range mm/pagewalk.c:240 [inline]
walk_pgd_range mm/pagewalk.c:277 [inline]
__walk_page_range+0xe23/0x1ea0 mm/pagewalk.c:379
walk_page_vma+0x277/0x350 mm/pagewalk.c:530
smap_gather_stats.part.0+0x148/0x260 fs/proc/task_mmu.c:768
smap_gather_stats fs/proc/task_mmu.c:741 [inline]
show_smap+0xc6/0x440 fs/proc/task_mmu.c:822
seq_read_iter+0xbb0/0x1240 fs/seq_file.c:272
seq_read+0x3e0/0x5b0 fs/seq_file.c:162
vfs_read+0x1b5/0x600 fs/read_write.c:479
ksys_read+0x12d/0x250 fs/read_write.c:619
do_syscall_x64 arch/x86/entry/common.c:50 [inline]
do_syscall_64+0x35/0xb0 arch/x86/entry/common.c:80
entry_SYSCALL_64_after_hwframe+0x44/0xae
The reproducer was trying to read /proc/$PID/smaps when calling
MADV_FREE at the mean time. MADV_FREE may split THPs if it is called
for partial THP. It may trigger the below race:
CPU A CPU B
----- -----
smaps walk: MADV_FREE:
page_mapcount()
PageCompound()
split_huge_page()
page = compound_head(page)
PageDoubleMap(page)
When calling PageDoubleMap() this page is not a tail page of THP anymore
so the BUG is triggered.
This could be fixed by elevated refcount of the page before calling
mapcount, but that would prevent it from counting migration entries, and
it seems overkilling because the race just could happen when PMD is
split so all PTE entries of tail pages are actually migration entries,
and smaps_account() does treat migration entries as mapcount == 1 as
Kirill pointed out.
Add a new parameter for smaps_account() to tell this entry is migration
entry then skip calling page_mapcount(). Don't skip getting mapcount
for device private entries since they do track references with mapcount.
Pagemap also has the similar issue although it was not reported. Fixed
it as well.
[shy828301@gmail.com: v4]
[nathan@kernel.org: avoid unused variable warning in pagemap_pmd_range()] |
| In the Linux kernel, the following vulnerability has been resolved:
perf: Fix list corruption in perf_cgroup_switch()
There's list corruption on cgrp_cpuctx_list. This happens on the
following path:
perf_cgroup_switch: list_for_each_entry(cgrp_cpuctx_list)
cpu_ctx_sched_in
ctx_sched_in
ctx_pinned_sched_in
merge_sched_in
perf_cgroup_event_disable: remove the event from the list
Use list_for_each_entry_safe() to allow removing an entry during
iteration. |
| In the Linux kernel, the following vulnerability has been resolved:
s390/cio: verify the driver availability for path_event call
If no driver is attached to a device or the driver does not provide the
path_event function, an FCES path-event on this device could end up in a
kernel-panic. Verify the driver availability before the path_event
function call. |
| In the Linux kernel, the following vulnerability has been resolved:
mm: don't try to NUMA-migrate COW pages that have other uses
Oded Gabbay reports that enabling NUMA balancing causes corruption with
his Gaudi accelerator test load:
"All the details are in the bug, but the bottom line is that somehow,
this patch causes corruption when the numa balancing feature is
enabled AND we don't use process affinity AND we use GUP to pin pages
so our accelerator can DMA to/from system memory.
Either disabling numa balancing, using process affinity to bind to
specific numa-node or reverting this patch causes the bug to
disappear"
and Oded bisected the issue to commit 09854ba94c6a ("mm: do_wp_page()
simplification").
Now, the NUMA balancing shouldn't actually be changing the writability
of a page, and as such shouldn't matter for COW. But it appears it
does. Suspicious.
However, regardless of that, the condition for enabling NUMA faults in
change_pte_range() is nonsensical. It uses "page_mapcount(page)" to
decide if a COW page should be NUMA-protected or not, and that makes
absolutely no sense.
The number of mappings a page has is irrelevant: not only does GUP get a
reference to a page as in Oded's case, but the other mappings migth be
paged out and the only reference to them would be in the page count.
Since we should never try to NUMA-balance a page that we can't move
anyway due to other references, just fix the code to use 'page_count()'.
Oded confirms that that fixes his issue.
Now, this does imply that something in NUMA balancing ends up changing
page protections (other than the obvious one of making the page
inaccessible to get the NUMA faulting information). Otherwise the COW
simplification wouldn't matter - since doing the GUP on the page would
make sure it's writable.
The cause of that permission change would be good to figure out too,
since it clearly results in spurious COW events - but fixing the
nonsensical test that just happened to work before is obviously the
CorrectThing(tm) to do regardless. |
| In the Linux kernel, the following vulnerability has been resolved:
parisc: Fix data TLB miss in sba_unmap_sg
Rolf Eike Beer reported the following bug:
[1274934.746891] Bad Address (null pointer deref?): Code=15 (Data TLB miss fault) at addr 0000004140000018
[1274934.746891] CPU: 3 PID: 5549 Comm: cmake Not tainted 5.15.4-gentoo-parisc64 #4
[1274934.746891] Hardware name: 9000/785/C8000
[1274934.746891]
[1274934.746891] YZrvWESTHLNXBCVMcbcbcbcbOGFRQPDI
[1274934.746891] PSW: 00001000000001001111111000001110 Not tainted
[1274934.746891] r00-03 000000ff0804fe0e 0000000040bc9bc0 00000000406760e4 0000004140000000
[1274934.746891] r04-07 0000000040b693c0 0000004140000000 000000004a2b08b0 0000000000000001
[1274934.746891] r08-11 0000000041f98810 0000000000000000 000000004a0a7000 0000000000000001
[1274934.746891] r12-15 0000000040bddbc0 0000000040c0cbc0 0000000040bddbc0 0000000040bddbc0
[1274934.746891] r16-19 0000000040bde3c0 0000000040bddbc0 0000000040bde3c0 0000000000000007
[1274934.746891] r20-23 0000000000000006 000000004a368950 0000000000000000 0000000000000001
[1274934.746891] r24-27 0000000000001fff 000000000800000e 000000004a1710f0 0000000040b693c0
[1274934.746891] r28-31 0000000000000001 0000000041f988b0 0000000041f98840 000000004a171118
[1274934.746891] sr00-03 00000000066e5800 0000000000000000 0000000000000000 00000000066e5800
[1274934.746891] sr04-07 0000000000000000 0000000000000000 0000000000000000 0000000000000000
[1274934.746891]
[1274934.746891] IASQ: 0000000000000000 0000000000000000 IAOQ: 00000000406760e8 00000000406760ec
[1274934.746891] IIR: 48780030 ISR: 0000000000000000 IOR: 0000004140000018
[1274934.746891] CPU: 3 CR30: 00000040e3a9c000 CR31: ffffffffffffffff
[1274934.746891] ORIG_R28: 0000000040acdd58
[1274934.746891] IAOQ[0]: sba_unmap_sg+0xb0/0x118
[1274934.746891] IAOQ[1]: sba_unmap_sg+0xb4/0x118
[1274934.746891] RP(r2): sba_unmap_sg+0xac/0x118
[1274934.746891] Backtrace:
[1274934.746891] [<00000000402740cc>] dma_unmap_sg_attrs+0x6c/0x70
[1274934.746891] [<000000004074d6bc>] scsi_dma_unmap+0x54/0x60
[1274934.746891] [<00000000407a3488>] mptscsih_io_done+0x150/0xd70
[1274934.746891] [<0000000040798600>] mpt_interrupt+0x168/0xa68
[1274934.746891] [<0000000040255a48>] __handle_irq_event_percpu+0xc8/0x278
[1274934.746891] [<0000000040255c34>] handle_irq_event_percpu+0x3c/0xd8
[1274934.746891] [<000000004025ecb4>] handle_percpu_irq+0xb4/0xf0
[1274934.746891] [<00000000402548e0>] generic_handle_irq+0x50/0x70
[1274934.746891] [<000000004019a254>] call_on_stack+0x18/0x24
[1274934.746891]
[1274934.746891] Kernel panic - not syncing: Bad Address (null pointer deref?)
The bug is caused by overrunning the sglist and incorrectly testing
sg_dma_len(sglist) before nents. Normally this doesn't cause a crash,
but in this case sglist crossed a page boundary. This occurs in the
following code:
while (sg_dma_len(sglist) && nents--) {
The fix is simply to test nents first and move the decrement of nents
into the loop. |
| In the Linux kernel, the following vulnerability has been resolved:
af_unix: Fix null-ptr-deref in unix_stream_sendpage().
Bing-Jhong Billy Jheng reported null-ptr-deref in unix_stream_sendpage()
with detailed analysis and a nice repro.
unix_stream_sendpage() tries to add data to the last skb in the peer's
recv queue without locking the queue.
If the peer's FD is passed to another socket and the socket's FD is
passed to the peer, there is a loop between them. If we close both
sockets without receiving FD, the sockets will be cleaned up by garbage
collection.
The garbage collection iterates such sockets and unlinks skb with
FD from the socket's receive queue under the queue's lock.
So, there is a race where unix_stream_sendpage() could access an skb
locklessly that is being released by garbage collection, resulting in
use-after-free.
To avoid the issue, unix_stream_sendpage() must lock the peer's recv
queue.
Note the issue does not exist in 6.5+ thanks to the recent sendpage()
refactoring.
This patch is originally written by Linus Torvalds.
BUG: unable to handle page fault for address: ffff988004dd6870
PF: supervisor read access in kernel mode
PF: error_code(0x0000) - not-present page
PGD 0 P4D 0
PREEMPT SMP PTI
CPU: 4 PID: 297 Comm: garbage_uaf Not tainted 6.1.46 #1
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.16.0-0-gd239552ce722-prebuilt.qemu.org 04/01/2014
RIP: 0010:kmem_cache_alloc_node+0xa2/0x1e0
Code: c0 0f 84 32 01 00 00 41 83 fd ff 74 10 48 8b 00 48 c1 e8 3a 41 39 c5 0f 85 1c 01 00 00 41 8b 44 24 28 49 8b 3c 24 48 8d 4a 40 <49> 8b 1c 06 4c 89 f0 65 48 0f c7 0f 0f 94 c0 84 c0 74 a1 41 8b 44
RSP: 0018:ffffc9000079fac0 EFLAGS: 00000246
RAX: 0000000000000070 RBX: 0000000000000005 RCX: 000000000001a284
RDX: 000000000001a244 RSI: 0000000000400cc0 RDI: 000000000002eee0
RBP: 0000000000400cc0 R08: 0000000000400cc0 R09: 0000000000000003
R10: 0000000000000001 R11: 0000000000000000 R12: ffff888003970f00
R13: 00000000ffffffff R14: ffff988004dd6800 R15: 00000000000000e8
FS: 00007f174d6f3600(0000) GS:ffff88807db00000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: ffff988004dd6870 CR3: 00000000092be000 CR4: 00000000007506e0
PKRU: 55555554
Call Trace:
<TASK>
? __die_body.cold+0x1a/0x1f
? page_fault_oops+0xa9/0x1e0
? fixup_exception+0x1d/0x310
? exc_page_fault+0xa8/0x150
? asm_exc_page_fault+0x22/0x30
? kmem_cache_alloc_node+0xa2/0x1e0
? __alloc_skb+0x16c/0x1e0
__alloc_skb+0x16c/0x1e0
alloc_skb_with_frags+0x48/0x1e0
sock_alloc_send_pskb+0x234/0x270
unix_stream_sendmsg+0x1f5/0x690
sock_sendmsg+0x5d/0x60
____sys_sendmsg+0x210/0x260
___sys_sendmsg+0x83/0xd0
? kmem_cache_alloc+0xc6/0x1c0
? avc_disable+0x20/0x20
? percpu_counter_add_batch+0x53/0xc0
? alloc_empty_file+0x5d/0xb0
? alloc_file+0x91/0x170
? alloc_file_pseudo+0x94/0x100
? __fget_light+0x9f/0x120
__sys_sendmsg+0x54/0xa0
do_syscall_64+0x3b/0x90
entry_SYSCALL_64_after_hwframe+0x69/0xd3
RIP: 0033:0x7f174d639a7d
Code: 28 89 54 24 1c 48 89 74 24 10 89 7c 24 08 e8 8a c1 f4 ff 8b 54 24 1c 48 8b 74 24 10 41 89 c0 8b 7c 24 08 b8 2e 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 33 44 89 c7 48 89 44 24 08 e8 de c1 f4 ff 48
RSP: 002b:00007ffcb563ea50 EFLAGS: 00000293 ORIG_RAX: 000000000000002e
RAX: ffffffffffffffda RBX: 0000000000000000 RCX: 00007f174d639a7d
RDX: 0000000000000000 RSI: 00007ffcb563eab0 RDI: 0000000000000007
RBP: 00007ffcb563eb10 R08: 0000000000000000 R09: 00000000ffffffff
R10: 00000000004040a0 R11: 0000000000000293 R12: 00007ffcb563ec28
R13: 0000000000401398 R14: 0000000000403e00 R15: 00007f174d72c000
</TASK> |
| In the Linux kernel, the following vulnerability has been resolved:
nfs: fix possible null-ptr-deref when parsing param
According to commit "vfs: parse: deal with zero length string value",
kernel will set the param->string to null pointer in vfs_parse_fs_string()
if fs string has zero length.
Yet the problem is that, nfs_fs_context_parse_param() will dereferences the
param->string, without checking whether it is a null pointer, which may
trigger a null-ptr-deref bug.
This patch solves it by adding sanity check on param->string
in nfs_fs_context_parse_param(). |
| In the Linux kernel, the following vulnerability has been resolved:
e1000e: fix heap overflow in e1000_set_eeprom
Fix a possible heap overflow in e1000_set_eeprom function by adding
input validation for the requested length of the change in the EEPROM.
In addition, change the variable type from int to size_t for better
code practices and rearrange declarations to RCT. |
| In the Linux kernel, the following vulnerability has been resolved:
libbpf: Use elf_getshdrnum() instead of e_shnum
This commit replace e_shnum with the elf_getshdrnum() helper to fix two
oss-fuzz-reported heap-buffer overflow in __bpf_object__open. Both
reports are incorrectly marked as fixed and while still being
reproducible in the latest libbpf.
# clusterfuzz-testcase-minimized-bpf-object-fuzzer-5747922482888704
libbpf: loading object 'fuzz-object' from buffer
libbpf: sec_cnt is 0
libbpf: elf: section(1) .data, size 0, link 538976288, flags 2020202020202020, type=2
libbpf: elf: section(2) .data, size 32, link 538976288, flags 202020202020ff20, type=1
=================================================================
==13==ERROR: AddressSanitizer: heap-buffer-overflow on address 0x6020000000c0 at pc 0x0000005a7b46 bp 0x7ffd12214af0 sp 0x7ffd12214ae8
WRITE of size 4 at 0x6020000000c0 thread T0
SCARINESS: 46 (4-byte-write-heap-buffer-overflow-far-from-bounds)
#0 0x5a7b45 in bpf_object__elf_collect /src/libbpf/src/libbpf.c:3414:24
#1 0x5733c0 in bpf_object_open /src/libbpf/src/libbpf.c:7223:16
#2 0x5739fd in bpf_object__open_mem /src/libbpf/src/libbpf.c:7263:20
...
The issue lie in libbpf's direct use of e_shnum field in ELF header as
the section header count. Where as libelf implemented an extra logic
that, when e_shnum == 0 && e_shoff != 0, will use sh_size member of the
initial section header as the real section header count (part of ELF
spec to accommodate situation where section header counter is larger
than SHN_LORESERVE).
The above inconsistency lead to libbpf writing into a zero-entry calloc
area. So intead of using e_shnum directly, use the elf_getshdrnum()
helper provided by libelf to retrieve the section header counter into
sec_cnt. |
| In the Linux kernel, the following vulnerability has been resolved:
irqchip/imx-irqsteer: Handle runtime power management correctly
The power domain is automatically activated from clk_prepare(). However, on
certain platforms like i.MX8QM and i.MX8QXP, the power-on handling invokes
sleeping functions, which triggers the 'scheduling while atomic' bug in the
context switch path during device probing:
BUG: scheduling while atomic: kworker/u13:1/48/0x00000002
Call trace:
__schedule_bug+0x54/0x6c
__schedule+0x7f0/0xa94
schedule+0x5c/0xc4
schedule_preempt_disabled+0x24/0x40
__mutex_lock.constprop.0+0x2c0/0x540
__mutex_lock_slowpath+0x14/0x20
mutex_lock+0x48/0x54
clk_prepare_lock+0x44/0xa0
clk_prepare+0x20/0x44
imx_irqsteer_resume+0x28/0xe0
pm_generic_runtime_resume+0x2c/0x44
__genpd_runtime_resume+0x30/0x80
genpd_runtime_resume+0xc8/0x2c0
__rpm_callback+0x48/0x1d8
rpm_callback+0x6c/0x78
rpm_resume+0x490/0x6b4
__pm_runtime_resume+0x50/0x94
irq_chip_pm_get+0x2c/0xa0
__irq_do_set_handler+0x178/0x24c
irq_set_chained_handler_and_data+0x60/0xa4
mxc_gpio_probe+0x160/0x4b0
Cure this by implementing the irq_bus_lock/sync_unlock() interrupt chip
callbacks and handle power management in them as they are invoked from
non-atomic context.
[ tglx: Rewrote change log, added Fixes tag ] |