kernel-sophgo-sg2044-revyos-6.17-src

jchzhou/kernel-sophgo-sg2044-revyos-6.17-src

Author	SHA1	Message	Date
Matthieu Baerts (NGI0)	9de74c227c	mptcp: pm: in-kernel: C-flag: handle late ADD_ADDR commit e84cb860ac3ce67ec6ecc364433fd5b412c448bc upstream. The special C-flag case expects the ADD_ADDR to be received when switching to 'fully-established'. But for various reasons, the ADD_ADDR could be sent after the "4th ACK", and the special case doesn't work. On NIPA, the new test validating this special case for the C-flag failed a few times, e.g. 102 default limits, server deny join id 0 syn rx [FAIL] got 0 JOIN[s] syn rx expected 2 Server ns stats (...) MPTcpExtAddAddrTx 1 MPTcpExtEchoAdd 1 Client ns stats (...) MPTcpExtAddAddr 1 MPTcpExtEchoAddTx 1 synack rx [FAIL] got 0 JOIN[s] synack rx expected 2 ack rx [FAIL] got 0 JOIN[s] ack rx expected 2 join Rx [FAIL] see above syn tx [FAIL] got 0 JOIN[s] syn tx expected 2 join Tx [FAIL] see above I had a suspicion about what the issue could be: the ADD_ADDR might have been received after the switch to the 'fully-established' state. The issue was not easy to reproduce. The packet capture shown that the ADD_ADDR can indeed be sent with a delay, and the client would not try to establish subflows to it as expected. A simple fix is not to mark the endpoints as 'used' in the C-flag case, when looking at creating subflows to the remote initial IP address and port. In this case, there is no need to try. Note: newly added fullmesh endpoints will still continue to be used as expected, thanks to the conditions behind mptcp_pm_add_addr_c_flag_case. Fixes: 4b1ff850e0c1 ("mptcp: pm: in-kernel: usable client side with C-flag") Cc: stable@vger.kernel.org Reviewed-by: Geliang Tang <geliang@kernel.org> Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org> Link: https://patch.msgid.link/20251020-net-mptcp-c-flag-late-add-addr-v1-1-8207030cb0e8@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2025-10-29 14:10:24 +01:00
Stefano Garzarella	a2a4346eea	vsock: fix lock inversion in vsock_assign_transport() commit f7c877e7535260cc7a21484c994e8ce7e8cb6780 upstream. Syzbot reported a potential lock inversion deadlock between vsock_register_mutex and sk_lock-AF_VSOCK when vsock_linger() is called. The issue was introduced by commit `687aa0c558` ("vsock: Fix transport_* TOCTOU") which added vsock_register_mutex locking in vsock_assign_transport() around the transport->release() call, that can call vsock_linger(). vsock_assign_transport() can be called with sk_lock held. vsock_linger() calls sk_wait_event() that temporarily releases and re-acquires sk_lock. During this window, if another thread hold vsock_register_mutex while trying to acquire sk_lock, a circular dependency is created. Fix this by releasing vsock_register_mutex before calling transport->release() and vsock_deassign_transport(). This is safe because we don't need to hold vsock_register_mutex while releasing the old transport, and we ensure the new transport won't disappear by obtaining a module reference first via try_module_get(). Reported-by: syzbot+10e35716f8e4929681fa@syzkaller.appspotmail.com Tested-by: syzbot+10e35716f8e4929681fa@syzkaller.appspotmail.com Fixes: `687aa0c558` ("vsock: Fix transport_* TOCTOU") Cc: mhal@rbox.co Cc: stable@vger.kernel.org Signed-off-by: Stefano Garzarella <sgarzare@redhat.com> Link: https://patch.msgid.link/20251021121718.137668-1-sgarzare@redhat.com Signed-off-by: Paolo Abeni <pabeni@redhat.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2025-10-29 14:10:23 +01:00
Ralf Lici	5353d39d34	net: datagram: introduce datagram_poll_queue for custom receive queues [ Upstream commit f6ceec6434b5efff62cecbaa2ff74fc29b96c0c6 ] Some protocols using TCP encapsulation (e.g., espintcp, openvpn) deliver userspace-bound packets through a custom skb queue rather than the standard sk_receive_queue. Introduce datagram_poll_queue that accepts an explicit receive queue, and convert datagram_poll into a wrapper around datagram_poll_queue. This allows protocols with custom skb queues to reuse the core polling logic without relying on sk_receive_queue. Cc: Sabrina Dubroca <sd@queasysnail.net> Cc: Antonio Quartulli <antonio@openvpn.net> Signed-off-by: Ralf Lici <ralf@mandelbit.com> Reviewed-by: Sabrina Dubroca <sd@queasysnail.net> Reviewed-by: Antonio Quartulli <antonio@openvpn.net> Link: https://patch.msgid.link/20251021100942.195010-2-ralf@mandelbit.com Signed-off-by: Paolo Abeni <pabeni@redhat.com> Stable-dep-of: efd729408bc7 ("ovpn: use datagram_poll_queue for socket readiness in TCP") Signed-off-by: Sasha Levin <sashal@kernel.org>	2025-10-29 14:10:20 +01:00
Ralf Lici	d4fa3e039f	espintcp: use datagram_poll_queue for socket readiness [ Upstream commit 0fc3e32c2c069f541f2724d91f5e98480b640326 ] espintcp uses a custom queue (ike_queue) to deliver packets to userspace. The polling logic relies on datagram_poll, which checks sk_receive_queue, which can lead to false readiness signals when that queue contains non-userspace packets. Switch espintcp_poll to use datagram_poll_queue with ike_queue, ensuring poll only signals readiness when userspace data is actually available. Fixes: `e27cca96cd` ("xfrm: add espintcp (RFC 8229)") Signed-off-by: Ralf Lici <ralf@mandelbit.com> Reviewed-by: Sabrina Dubroca <sd@queasysnail.net> Link: https://patch.msgid.link/20251021100942.195010-3-ralf@mandelbit.com Signed-off-by: Paolo Abeni <pabeni@redhat.com> Signed-off-by: Sasha Levin <sashal@kernel.org>	2025-10-29 14:10:20 +01:00
Fernando Fernandez Mancera	89941a0d0f	net: hsr: prevent creation of HSR device with slaves from another netns [ Upstream commit c0178eec8884231a5ae0592b9fce827bccb77e86 ] HSR/PRP driver does not handle correctly having slaves/interlink devices in a different net namespace. Currently, it is possible to create a HSR link in a different net namespace than the slaves/interlink with the following command: ip link add hsr0 netns hsr-ns type hsr slave1 eth1 slave2 eth2 As there is no use-case on supporting this scenario, enforce that HSR device link matches netns defined by IFLA_LINK_NETNSID. The iproute2 command mentioned above will throw the following error: Error: hsr: HSR slaves/interlink must be on the same net namespace than HSR link. Fixes: `f421436a59` ("net/hsr: Add support for the High-availability Seamless Redundancy protocol (HSRv0)") Signed-off-by: Fernando Fernandez Mancera <fmancera@suse.de> Link: https://patch.msgid.link/20251020135533.9373-1-fmancera@suse.de Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>	2025-10-29 14:10:20 +01:00
Alexey Simakov	89b465b542	sctp: avoid NULL dereference when chunk data buffer is missing [ Upstream commit 441f0647f7673e0e64d4910ef61a5fb8f16bfb82 ] chunk->skb pointer is dereferenced in the if-block where it's supposed to be NULL only. chunk->skb can only be NULL if chunk->head_skb is not. Check for frag_list instead and do it just before replacing chunk->skb. We're sure that otherwise chunk->skb is non-NULL because of outer if() condition. Fixes: `90017accff` ("sctp: Add GSO support") Signed-off-by: Alexey Simakov <bigalex934@gmail.com> Acked-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com> Link: https://patch.msgid.link/20251021130034.6333-1-bigalex934@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>	2025-10-29 14:10:20 +01:00
Wang Liang	99b5b3faf3	net/smc: fix general protection fault in __smc_diag_dump [ Upstream commit f584239a9ed25057496bf397c370cc5163dde419 ] The syzbot report a crash: Oops: general protection fault, probably for non-canonical address 0xfbd5a5d5a0000003: 0000 [#1] SMP KASAN NOPTI KASAN: maybe wild-memory-access in range [0xdead4ead00000018-0xdead4ead0000001f] CPU: 1 UID: 0 PID: 6949 Comm: syz.0.335 Not tainted syzkaller #0 PREEMPT(full) Hardware name: Google Compute Engine/Google Compute Engine, BIOS Google 08/18/2025 RIP: 0010:smc_diag_msg_common_fill net/smc/smc_diag.c:44 [inline] RIP: 0010:__smc_diag_dump.constprop.0+0x3ca/0x2550 net/smc/smc_diag.c:89 Call Trace: <TASK> smc_diag_dump_proto+0x26d/0x420 net/smc/smc_diag.c:217 smc_diag_dump+0x27/0x90 net/smc/smc_diag.c:234 netlink_dump+0x539/0xd30 net/netlink/af_netlink.c:2327 __netlink_dump_start+0x6d6/0x990 net/netlink/af_netlink.c:2442 netlink_dump_start include/linux/netlink.h:341 [inline] smc_diag_handler_dump+0x1f9/0x240 net/smc/smc_diag.c:251 __sock_diag_cmd net/core/sock_diag.c:249 [inline] sock_diag_rcv_msg+0x438/0x790 net/core/sock_diag.c:285 netlink_rcv_skb+0x158/0x420 net/netlink/af_netlink.c:2552 netlink_unicast_kernel net/netlink/af_netlink.c:1320 [inline] netlink_unicast+0x5a7/0x870 net/netlink/af_netlink.c:1346 netlink_sendmsg+0x8d1/0xdd0 net/netlink/af_netlink.c:1896 sock_sendmsg_nosec net/socket.c:714 [inline] __sock_sendmsg net/socket.c:729 [inline] ____sys_sendmsg+0xa95/0xc70 net/socket.c:2614 ___sys_sendmsg+0x134/0x1d0 net/socket.c:2668 __sys_sendmsg+0x16d/0x220 net/socket.c:2700 do_syscall_x64 arch/x86/entry/syscall_64.c:63 [inline] do_syscall_64+0xcd/0x4e0 arch/x86/entry/syscall_64.c:94 entry_SYSCALL_64_after_hwframe+0x77/0x7f </TASK> The process like this: (CPU1) \| (CPU2) ---------------------------------\|------------------------------- inet_create() \| // init clcsock to NULL \| sk = sk_alloc() \| \| // unexpectedly change clcsock \| inet_init_csk_locks() \| \| // add sk to hash table \| smc_inet_init_sock() \| smc_sk_init() \| smc_hash_sk() \| \| // traverse the hash table \| smc_diag_dump_proto \| __smc_diag_dump() \| // visit wrong clcsock \| smc_diag_msg_common_fill() // alloc clcsock \| smc_create_clcsk \| sock_create_kern \| With CONFIG_DEBUG_LOCK_ALLOC=y, the smc->clcsock is unexpectedly changed in inet_init_csk_locks(). The INET_PROTOSW_ICSK flag is no need by smc, just remove it. After removing the INET_PROTOSW_ICSK flag, this patch alse revert commit `6fd27ea183` ("net/smc: fix lacks of icsk_syn_mss with IPPROTO_SMC") to avoid casting smc_sock to inet_connection_sock. Reported-by: syzbot+f775be4458668f7d220e@syzkaller.appspotmail.com Closes: https://syzkaller.appspot.com/bug?extid=f775be4458668f7d220e Tested-by: syzbot+f775be4458668f7d220e@syzkaller.appspotmail.com Fixes: `d25a92ccae` ("net/smc: Introduce IPPROTO_SMC") Signed-off-by: Wang Liang <wangliang74@huawei.com> Reviewed-by: Kuniyuki Iwashima <kuniyu@google.com> Reviewed-by: Eric Dumazet <edumazet@google.com> Reviewed-by: D. Wythe <alibuda@linux.alibaba.com> Link: https://patch.msgid.link/20251017024827.3137512-1-wangliang74@huawei.com Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>	2025-10-29 14:10:19 +01:00
Johannes Wiesböck	5d1234c1cc	rtnetlink: Allow deleting FDB entries in user namespace [ Upstream commit bf29555f5bdc017bac22ca66fcb6c9f46ec8788f ] Creating FDB entries is possible from a non-initial user namespace when having CAP_NET_ADMIN, yet, when deleting FDB entries, processes receive an EPERM because the capability is always checked against the initial user namespace. This restricts the FDB management from unprivileged containers. Drop the netlink_capable check in rtnl_fdb_del as it was originally dropped in `c5c351088a` and reintroduced in `1690be63a2` without intention. This patch was tested using a container on GyroidOS, where it was possible to delete FDB entries from an unprivileged user namespace and private network namespace. Fixes: `1690be63a2` ("bridge: Add vlan support to static neighbors") Reviewed-by: Michael Weiß <michael.weiss@aisec.fraunhofer.de> Tested-by: Harshal Gohel <hg@simonwunderlich.de> Signed-off-by: Johannes Wiesböck <johannes.wiesboeck@aisec.fraunhofer.de> Reviewed-by: Ido Schimmel <idosch@nvidia.com> Reviewed-by: Nikolay Aleksandrov <razor@blackwall.org> Link: https://patch.msgid.link/20251015201548.319871-1-johannes.wiesboeck@aisec.fraunhofer.de Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>	2025-10-29 14:10:18 +01:00
Sabrina Dubroca	8e49da5e8f	tls: don't rely on tx_work during send() [ Upstream commit 7f846c65ca11e63d2409868ff039081f80e42ae4 ] With async crypto, we rely on tx_work to actually transmit records once encryption completes. But while send() is running, both the tx_lock and socket lock are held, so tx_work_handler cannot process the queue of encrypted records, and simply reschedules itself. During a large send(), this could last a long time, and use a lot of memory. Transmit any pending encrypted records before restarting the main loop of tls_sw_sendmsg_locked. Fixes: `a42055e8d2` ("net/tls: Add support for async encryption of records for performance") Reported-by: Jann Horn <jannh@google.com> Signed-off-by: Sabrina Dubroca <sd@queasysnail.net> Link: https://patch.msgid.link/8396631478f70454b44afb98352237d33f48d34d.1760432043.git.sd@queasysnail.net Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>	2025-10-23 16:24:31 +02:00
Sabrina Dubroca	4fc109d0ab	tls: wait for pending async decryptions if tls_strp_msg_hold fails [ Upstream commit b8a6ff84abbcbbc445463de58704686011edc8e1 ] Async decryption calls tls_strp_msg_hold to create a clone of the input skb to hold references to the memory it uses. If we fail to allocate that clone, proceeding with async decryption can lead to various issues (UAF on the skb, writing into userspace memory after the recv() call has returned). In this case, wait for all pending decryption requests. Fixes: `84c61fe1a7` ("tls: rx: do not use the standard strparser") Reported-by: Jann Horn <jannh@google.com> Signed-off-by: Sabrina Dubroca <sd@queasysnail.net> Link: https://patch.msgid.link/b9fe61dcc07dab15da9b35cf4c7d86382a98caf2.1760432043.git.sd@queasysnail.net Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>	2025-10-23 16:24:31 +02:00
Sabrina Dubroca	d9234cae02	tls: always set record_type in tls_process_cmsg [ Upstream commit b6fe4c29bb51cf239ecf48eacf72b924565cb619 ] When userspace wants to send a non-DATA record (via the TLS_SET_RECORD_TYPE cmsg), we need to send any pending data from a previous MSG_MORE send() as a separate DATA record. If that DATA record is encrypted asynchronously, tls_handle_open_record will return -EINPROGRESS. This is currently treated as an error by tls_process_cmsg, and it will skip setting record_type to the correct value, but the caller (tls_sw_sendmsg_locked) handles that return value correctly and proceeds with sending the new message with an incorrect record_type (DATA instead of whatever was requested in the cmsg). Always set record_type before handling the open record. If tls_handle_open_record returns an error, record_type will be ignored. If it succeeds, whether with synchronous crypto (returning 0) or asynchronous (returning -EINPROGRESS), the caller will proceed correctly. Fixes: `a42055e8d2` ("net/tls: Add support for async encryption of records for performance") Reported-by: Jann Horn <jannh@google.com> Signed-off-by: Sabrina Dubroca <sd@queasysnail.net> Link: https://patch.msgid.link/0457252e578a10a94e40c72ba6288b3a64f31662.1760432043.git.sd@queasysnail.net Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>	2025-10-23 16:24:31 +02:00
Sabrina Dubroca	9997b7ece5	tls: wait for async encrypt in case of error during latter iterations of sendmsg [ Upstream commit b014a4e066c555185b7c367efacdc33f16695495 ] If we hit an error during the main loop of tls_sw_sendmsg_locked (eg failed allocation), we jump to send_end and immediately return. Previous iterations may have queued async encryption requests that are still pending. We should wait for those before returning, as we could otherwise be reading from memory that userspace believes we're not using anymore, which would be a sort of use-after-free. This is similar to what tls_sw_recvmsg already does: failures during the main loop jump to the "wait for async" code, not straight to the unlock/return. Fixes: `a42055e8d2` ("net/tls: Add support for async encryption of records for performance") Reported-by: Jann Horn <jannh@google.com> Signed-off-by: Sabrina Dubroca <sd@queasysnail.net> Link: https://patch.msgid.link/c793efe9673b87f808d84fdefc0f732217030c52.1760432043.git.sd@queasysnail.net Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>	2025-10-23 16:24:31 +02:00
Sabrina Dubroca	8789451d2d	tls: trim encrypted message to match the plaintext on short splice [ Upstream commit ce5af41e3234425a40974696682163edfd21128c ] During tls_sw_sendmsg_locked, we pre-allocate the encrypted message for the size we're expecting to send during the current iteration, but we may end up sending less, for example when splicing: if we're getting the data from small fragments of memory, we may fill up all the slots in the skmsg with less data than expected. In this case, we need to trim the encrypted message to only the length we actually need, to avoid pushing uninitialized bytes down the underlying TCP socket. Fixes: `fe1e81d4f7` ("tls/sw: Support MSG_SPLICE_PAGES") Reported-by: Jann Horn <jannh@google.com> Signed-off-by: Sabrina Dubroca <sd@queasysnail.net> Link: https://patch.msgid.link/66a0ae99c9efc15f88e9e56c1f58f902f442ce86.1760432043.git.sd@queasysnail.net Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>	2025-10-23 16:24:31 +02:00
Florian Westphal	00d8c45ac0	net: core: fix lockdep splat on device unregister [ Upstream commit 7f0fddd817ba6daebea1445ae9fab4b6d2294fa8 ] Since blamed commit, unregister_netdevice_many_notify() takes the netdev mutex if the device needs it. If the device list is too long, this will lock more device mutexes than lockdep can handle: unshare -n \ bash -c 'for i in $(seq 1 100);do ip link add foo$i type dummy;done' BUG: MAX_LOCK_DEPTH too low! turning off the locking correctness validator. depth: 48 max: 48! 48 locks held by kworker/u16:1/69: #0: ..148 ((wq_completion)netns){+.+.}-{0:0}, at: process_one_work #1: ..d40 (net_cleanup_work){+.+.}-{0:0}, at: process_one_work #2: ..bd0 (pernet_ops_rwsem){++++}-{4:4}, at: cleanup_net #3: ..aa8 (rtnl_mutex){+.+.}-{4:4}, at: default_device_exit_batch #4: ..cb0 (&dev_instance_lock_key#3){+.+.}-{4:4}, at: unregister_netdevice_many_notify [..] Add a helper to close and then unlock a list of net_devices. Devices that are not up have to be skipped - netif_close_many always removes them from the list without any other actions taken, so they'd remain in locked state. Close devices whenever we've used up half of the tracking slots or we processed entire list without hitting the limit. Fixes: `7e4d784f58` ("net: hold netdev instance lock during rtnetlink operations") Signed-off-by: Florian Westphal <fw@strlen.de> Link: https://patch.msgid.link/20251013185052.14021-1-fw@strlen.de Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>	2025-10-23 16:24:30 +02:00
Eric Dumazet	6e3a266098	tcp: fix tcp_tso_should_defer() vs large RTT [ Upstream commit 295ce1eb36ae47dc862d6c8a1012618a25516208 ] Neal reported that using neper tcp_stream with TCP_TX_DELAY set to 50ms would often lead to flows stuck in a small cwnd mode, regardless of the congestion control. While tcp_stream sets TCP_TX_DELAY too late after the connect(), it highlighted two kernel bugs. The following heuristic in tcp_tso_should_defer() seems wrong for large RTT: delta = tp->tcp_clock_cache - head->tstamp; /* If next ACK is likely to come too late (half srtt), do not defer / if ((s64)(delta - (u64)NSEC_PER_USEC (tp->srtt_us >> 4)) < 0) goto send_now; If next ACK is expected to come in more than 1 ms, we should not defer because we prefer a smooth ACK clocking. While blamed commit was a step in the good direction, it was not generic enough. Another patch fixing TCP_TX_DELAY for established flows will be proposed when net-next reopens. Fixes: `50c8339e92` ("tcp: tso: restore IW10 after TSO autosizing") Reported-by: Neal Cardwell <ncardwell@google.com> Signed-off-by: Eric Dumazet <edumazet@google.com> Reviewed-by: Neal Cardwell <ncardwell@google.com> Tested-by: Neal Cardwell <ncardwell@google.com> Link: https://patch.msgid.link/20251011115742.1245771-1-edumazet@google.com [pabeni@redhat.com: fixed whitespace issue] Signed-off-by: Paolo Abeni <pabeni@redhat.com> Signed-off-by: Sasha Levin <sashal@kernel.org>	2025-10-23 16:24:30 +02:00
Dmitry Safonov	b6eb25d870	net/ip6_tunnel: Prevent perpetual tunnel growth [ Upstream commit 21f4d45eba0b2dcae5dbc9e5e0ad08735c993f16 ] Similarly to ipv4 tunnel, ipv6 version updates dev->needed_headroom, too. While ipv4 tunnel headroom adjustment growth was limited in commit `5ae1e9922b` ("net: ip_tunnel: prevent perpetual headroom growth"), ipv6 tunnel yet increases the headroom without any ceiling. Reflect ipv4 tunnel headroom adjustment limit on ipv6 version. Credits to Francesco Ruggeri, who was originally debugging this issue and wrote local Arista-specific patch and a reproducer. Fixes: `8eb30be035` ("ipv6: Create ip6_tnl_xmit") Cc: Florian Westphal <fw@strlen.de> Cc: Francesco Ruggeri <fruggeri05@gmail.com> Signed-off-by: Dmitry Safonov <dima@arista.com> Link: https://patch.msgid.link/20251009-ip6_tunnel-headroom-v2-1-8e4dbd8f7e35@arista.com Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>	2025-10-23 16:24:29 +02:00
Tetsuo Handa	a20a6efd64	can: j1939: add missing calls in NETDEV_UNREGISTER notification handler [ Upstream commit 93a27b5891b8194a8c083c9a80d2141d4bf47ba8 ] Currently NETDEV_UNREGISTER event handler is not calling j1939_cancel_active_session() and j1939_sk_queue_drop_all(). This will result in these calls being skipped when j1939_sk_release() is called. And I guess that the reason syzbot is still reporting unregister_netdevice: waiting for vcan0 to become free. Usage count = 2 is caused by lack of these calls. Calling j1939_cancel_active_session(priv, sk) from j1939_sk_release() can be covered by calling j1939_cancel_active_session(priv, NULL) from j1939_netdev_notify(). Calling j1939_sk_queue_drop_all() from j1939_sk_release() can be covered by calling j1939_sk_netdev_event_netdown() from j1939_netdev_notify(). Therefore, we can reuse j1939_cancel_active_session(priv, NULL) and j1939_sk_netdev_event_netdown(priv) for NETDEV_UNREGISTER event handler. Fixes: `7fcbe5b2c6` ("can: j1939: implement NETDEV_UNREGISTER notification handler") Signed-off-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp> Tested-by: Oleksij Rempel <o.rempel@pengutronix.de> Acked-by: Oleksij Rempel <o.rempel@pengutronix.de> Link: https://patch.msgid.link/3ad3c7f8-5a74-4b07-a193-cb0725823558@I-love.SAKURA.ne.jp Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de> Signed-off-by: Sasha Levin <sashal@kernel.org>	2025-10-23 16:24:28 +02:00
Matthieu Baerts (NGI0)	461d135c70	mptcp: reset blackhole on success with non-loopback ifaces commit 833d4313bc1e9e194814917d23e8874d6b651649 upstream. When a first MPTCP connection gets successfully established after a blackhole period, 'active_disable_times' was supposed to be reset when this connection was done via any non-loopback interfaces. Unfortunately, the opposite condition was checked: only reset when the connection was established via a loopback interface. Fixing this by simply looking at the opposite. This is similar to what is done with TCP FastOpen, see tcp_fastopen_active_disable_ofo_check(). This patch is a follow-up of a previous discussion linked to commit 893c49a78d9f ("mptcp: Use __sk_dst_get() and dst_dev_rcu() in mptcp_active_enable()."), see [1]. Fixes: `27069e7cb3` ("mptcp: disable active MPTCP in case of blackhole") Cc: stable@vger.kernel.org Link: https://lore.kernel.org/4209a283-8822-47bd-95b7-87e96d9b7ea3@kernel.org [1] Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org> Reviewed-by: Simon Horman <horms@kernel.org> Reviewed-by: Kuniyuki Iwashima <kuniyu@google.com> Link: https://patch.msgid.link/20250918-net-next-mptcp-blackhole-reset-loopback-v1-1-bf5818326639@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2025-10-19 16:37:36 +02:00
Matthieu Baerts (NGI0)	8b603e8068	mptcp: pm: in-kernel: usable client side with C-flag commit 4b1ff850e0c1aacc23e923ed22989b827b9808f9 upstream. When servers set the C-flag in their MP_CAPABLE to tell clients not to create subflows to the initial address and port, clients will likely not use their other endpoints. That's because the in-kernel path-manager uses the 'subflow' endpoints to create subflows only to the initial address and port. If the limits have not been modified to accept ADD_ADDR, the client doesn't try to establish new subflows. If the limits accept ADD_ADDR, the routing routes will be used to select the source IP. The C-flag is typically set when the server is operating behind a legacy Layer 4 load balancer, or using anycast IP address. Clients having their different 'subflow' endpoints setup, don't end up creating multiple subflows as expected, and causing some deployment issues. A special case is then added here: when servers set the C-flag in the MPC and directly sends an ADD_ADDR, this single ADD_ADDR is accepted. The 'subflows' endpoints will then be used with this new remote IP and port. This exception is only allowed when the ADD_ADDR is sent immediately after the 3WHS, and makes the client switching to the 'fully established' mode. After that, 'select_local_address()' will not be able to find any subflows, because 'id_avail_bitmap' will be filled in mptcp_pm_create_subflow_or_signal_addr(), when switching to 'fully established' mode. Fixes: `df377be387` ("mptcp: add deny_join_id0 in mptcp_options_received") Cc: stable@vger.kernel.org Closes: https://github.com/multipath-tcp/mptcp_net-next/issues/536 Reviewed-by: Geliang Tang <geliang@kernel.org> Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org> Link: https://patch.msgid.link/20250925-net-next-mptcp-c-flag-laminar-v1-1-ad126cc47c6b@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2025-10-19 16:37:36 +02:00
Alexander Lobakin	5b5fffa7c8	xsk: Harden userspace-supplied xdp_desc validation commit 07ca98f906a403637fc5e513a872a50ef1247f3b upstream. Turned out certain clearly invalid values passed in xdp_desc from userspace can pass xp_{,un}aligned_validate_desc() and then lead to UBs or just invalid frames to be queued for xmit. desc->len close to ``U32_MAX`` with a non-zero pool->tx_metadata_len can cause positive integer overflow and wraparound, the same way low enough desc->addr with a non-zero pool->tx_metadata_len can cause negative integer overflow. Both scenarios can then pass the validation successfully. This doesn't happen with valid XSk applications, but can be used to perform attacks. Always promote desc->len to ``u64`` first to exclude positive overflows of it. Use explicit check_{add,sub}_overflow() when validating desc->addr (which is ``u64`` already). bloat-o-meter reports a little growth of the code size: add/remove: 0/0 grow/shrink: 2/1 up/down: 60/-16 (44) Function old new delta xskq_cons_peek_desc 299 330 +31 xsk_tx_peek_release_desc_batch 973 1002 +29 xsk_generic_xmit 3148 3132 -16 but hopefully this doesn't hurt the performance much. Fixes: `341ac980ea` ("xsk: Support tx_metadata_len") Cc: stable@vger.kernel.org # 6.8+ Signed-off-by: Alexander Lobakin <aleksander.lobakin@intel.com> Reviewed-by: Jason Xing <kerneljasonxing@gmail.com> Reviewed-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com> Link: https://lore.kernel.org/r/20251008165659.4141318-1-aleksander.lobakin@intel.com Signed-off-by: Alexei Starovoitov <ast@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2025-10-19 16:37:31 +02:00
Eric Biggers	0b32ff285f	sctp: Fix MAC comparison to be constant-time commit dd91c79e4f58fbe2898dac84858033700e0e99fb upstream. To prevent timing attacks, MACs need to be compared in constant time. Use the appropriate helper function for this. Fixes: `bbd0d59809` ("[SCTP]: Implement the receive and verification of AUTH chunk") Fixes: `1da177e4c3` ("Linux-2.6.12-rc2") Cc: stable@vger.kernel.org Signed-off-by: Eric Biggers <ebiggers@kernel.org> Link: https://patch.msgid.link/20250818205426.30222-3-ebiggers@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2025-10-19 16:37:31 +02:00
Eric Woudstra	d6089b0b75	bridge: br_vlan_fill_forward_path_pvid: use br_vlan_group_rcu() [ Upstream commit bbf0c98b3ad9edaea1f982de6c199cc11d3b7705 ] net/bridge/br_private.h:1627 suspicious rcu_dereference_protected() usage! other info that might help us debug this: rcu_scheduler_active = 2, debug_locks = 1 7 locks held by socat/410: #0: ffff88800d7a9c90 (sk_lock-AF_INET){+.+.}-{0:0}, at: inet_stream_connect+0x43/0xa0 #1: ffffffff9a779900 (rcu_read_lock){....}-{1:3}, at: __ip_queue_xmit+0x62/0x1830 [..] #6: ffffffff9a779900 (rcu_read_lock){....}-{1:3}, at: nf_hook.constprop.0+0x8a/0x440 Call Trace: lockdep_rcu_suspicious.cold+0x4f/0xb1 br_vlan_fill_forward_path_pvid+0x32c/0x410 [bridge] br_fill_forward_path+0x7a/0x4d0 [bridge] Use to correct helper, non _rcu variant requires RTNL mutex. Fixes: `bcf2766b13` ("net: bridge: resolve forwarding path for VLAN tag actions in bridge devices") Signed-off-by: Eric Woudstra <ericwouds@gmail.com> Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Sasha Levin <sashal@kernel.org>	2025-10-19 16:37:13 +02:00
Fernando Fernandez Mancera	4c1cf72ec1	netfilter: nft_objref: validate objref and objrefmap expressions [ Upstream commit f359b809d54c6e3dd1d039b97e0b68390b0e53e4 ] Referencing a synproxy stateful object from OUTPUT hook causes kernel crash due to infinite recursive calls: BUG: TASK stack guard page was hit at 000000008bda5b8c (stack is 000000003ab1c4a5..00000000494d8b12) [...] Call Trace: __find_rr_leaf+0x99/0x230 fib6_table_lookup+0x13b/0x2d0 ip6_pol_route+0xa4/0x400 fib6_rule_lookup+0x156/0x240 ip6_route_output_flags+0xc6/0x150 __nf_ip6_route+0x23/0x50 synproxy_send_tcp_ipv6+0x106/0x200 synproxy_send_client_synack_ipv6+0x1aa/0x1f0 nft_synproxy_do_eval+0x263/0x310 nft_do_chain+0x5a8/0x5f0 [nf_tables nft_do_chain_inet+0x98/0x110 nf_hook_slow+0x43/0xc0 __ip6_local_out+0xf0/0x170 ip6_local_out+0x17/0x70 synproxy_send_tcp_ipv6+0x1a2/0x200 synproxy_send_client_synack_ipv6+0x1aa/0x1f0 [...] Implement objref and objrefmap expression validate functions. Currently, only NFT_OBJECT_SYNPROXY object type requires validation. This will also handle a jump to a chain using a synproxy object from the OUTPUT hook. Now when trying to reference a synproxy object in the OUTPUT hook, nft will produce the following error: synproxy_crash.nft: Error: Could not process rule: Operation not supported synproxy name mysynproxy ^^^^^^^^^^^^^^^^^^^^^^^^ Fixes: `ee394f96ad` ("netfilter: nft_synproxy: add synproxy stateful object support") Reported-by: Georg Pfuetzenreuter <georg.pfuetzenreuter@suse.com> Closes: https://bugzilla.suse.com/1250237 Signed-off-by: Fernando Fernandez Mancera <fmancera@suse.de> Reviewed-by: Pablo Neira Ayuso <pablo@netfilter.org> Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Sasha Levin <sashal@kernel.org>	2025-10-19 16:37:13 +02:00
Daniel Borkmann	7404ce888a	bpf: Fix metadata_dst leak __bpf_redirect_neigh_v{4,6} [ Upstream commit 23f3770e1a53e6c7a553135011f547209e141e72 ] Cilium has a BPF egress gateway feature which forces outgoing K8s Pod traffic to pass through dedicated egress gateways which then SNAT the traffic in order to interact with stable IPs outside the cluster. The traffic is directed to the gateway via vxlan tunnel in collect md mode. A recent BPF change utilized the bpf_redirect_neigh() helper to forward packets after the arrival and decap on vxlan, which turned out over time that the kmalloc-256 slab usage in kernel was ever-increasing. The issue was that vxlan allocates the metadata_dst object and attaches it through a fake dst entry to the skb. The latter was never released though given bpf_redirect_neigh() was merely setting the new dst entry via skb_dst_set() without dropping an existing one first. Fixes: `b4ab314149` ("bpf: Add redirect_neigh helper as redirect drop-in") Reported-by: Yusuke Suzuki <yusuke.suzuki@isovalent.com> Reported-by: Julian Wiedmann <jwi@isovalent.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Cc: Martin KaFai Lau <martin.lau@kernel.org> Cc: Jakub Kicinski <kuba@kernel.org> Cc: Jordan Rife <jrife@google.com> Reviewed-by: Simon Horman <horms@kernel.org> Reviewed-by: Jordan Rife <jrife@google.com> Reviewed-by: Jakub Kicinski <kuba@kernel.org> Reviewed-by: Martin KaFai Lau <martin.lau@kernel.org> Link: https://lore.kernel.org/r/20251003073418.291171-1-daniel@iogearbox.net Signed-off-by: Alexei Starovoitov <ast@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>	2025-10-19 16:37:11 +02:00
Eric Dumazet	fbe6af6d82	tcp: take care of zero tp->window_clamp in tcp_set_rcvlowat() [ Upstream commit 21b29e74ffe5a6c851c235bb80bf5ee26292c67b ] Some applications (like selftests/net/tcp_mmap.c) call SO_RCVLOWAT on their listener, before accept(). This has an unfortunate effect on wscale selection in tcp_select_initial_window() during 3WHS. For instance, tcp_mmap was negotiating wscale 4, regardless of tcp_rmem[2] and sysctl_rmem_max. Do not change tp->window_clamp if it is zero or bigger than our computed value. Zero value is special, it allows tcp_select_initial_window() to enable autotuning. Note that SO_RCVLOWAT use on listener is probably not wise, because tp->scaling_ratio has a default value, possibly wrong. Fixes: `d1361840f8` ("tcp: fix SO_RCVLOWAT and RCVBUF autotuning") Signed-off-by: Eric Dumazet <edumazet@google.com> Reviewed-by: Kuniyuki Iwashima <kuniyu@google.com> Reviewed-by: Neal Cardwell <ncardwell@google.com> Link: https://patch.msgid.link/20251003184119.2526655-1-edumazet@google.com Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>	2025-10-19 16:37:11 +02:00
Kuniyuki Iwashima	64dc47a13a	tcp: Don't call reqsk_fastopen_remove() in tcp_conn_request(). [ Upstream commit 2e7cbbbe3d61c63606994b7ff73c72537afe2e1c ] syzbot reported the splat below in tcp_conn_request(). [0] If a listener is close()d while a TFO socket is being processed in tcp_conn_request(), inet_csk_reqsk_queue_add() does not set reqsk->sk and calls inet_child_forget(), which calls tcp_disconnect() for the TFO socket. After the cited commit, tcp_disconnect() calls reqsk_fastopen_remove(), where reqsk_put() is called due to !reqsk->sk. Then, reqsk_fastopen_remove() in tcp_conn_request() decrements the last req->rsk_refcnt and frees reqsk, and __reqsk_free() at the drop_and_free label causes the refcount underflow for the listener and double-free of the reqsk. Let's remove reqsk_fastopen_remove() in tcp_conn_request(). Note that other callers make sure tp->fastopen_rsk is not NULL. [0]: refcount_t: underflow; use-after-free. WARNING: CPU: 12 PID: 5563 at lib/refcount.c:28 refcount_warn_saturate (lib/refcount.c:28) Modules linked in: CPU: 12 UID: 0 PID: 5563 Comm: syz-executor Not tainted syzkaller #0 PREEMPT(full) Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 07/12/2025 RIP: 0010:refcount_warn_saturate (lib/refcount.c:28) Code: ab e8 8e b4 98 ff 0f 0b c3 cc cc cc cc cc 80 3d a4 e4 d6 01 00 75 9c c6 05 9b e4 d6 01 01 48 c7 c7 e8 df fb ab e8 6a b4 98 ff <0f> 0b e9 03 5b 76 00 cc 80 3d 7d e4 d6 01 00 0f 85 74 ff ff ff c6 RSP: 0018:ffffa79fc0304a98 EFLAGS: 00010246 RAX: d83af4db1c6b3900 RBX: ffff9f65c7a69020 RCX: d83af4db1c6b3900 RDX: 0000000000000000 RSI: 00000000ffff7fff RDI: ffffffffac78a280 RBP: 000000009d781b60 R08: 0000000000007fff R09: ffffffffac6ca280 R10: 0000000000017ffd R11: 0000000000000004 R12: ffff9f65c7b4f100 R13: ffff9f65c7d23c00 R14: ffff9f65c7d26000 R15: ffff9f65c7a64ef8 FS: 00007f9f962176c0(0000) GS:ffff9f65fcf00000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 0000200000000180 CR3: 000000000dbbe006 CR4: 0000000000372ef0 Call Trace: <IRQ> tcp_conn_request (./include/linux/refcount.h:400 ./include/linux/refcount.h:432 ./include/linux/refcount.h:450 ./include/net/sock.h:1965 ./include/net/request_sock.h:131 net/ipv4/tcp_input.c:7301) tcp_rcv_state_process (net/ipv4/tcp_input.c:6708) tcp_v6_do_rcv (net/ipv6/tcp_ipv6.c:1670) tcp_v6_rcv (net/ipv6/tcp_ipv6.c:1906) ip6_protocol_deliver_rcu (net/ipv6/ip6_input.c:438) ip6_input (net/ipv6/ip6_input.c:500) ipv6_rcv (net/ipv6/ip6_input.c:311) __netif_receive_skb (net/core/dev.c:6104) process_backlog (net/core/dev.c:6456) __napi_poll (net/core/dev.c:7506) net_rx_action (net/core/dev.c:7569 net/core/dev.c:7696) handle_softirqs (kernel/softirq.c:579) do_softirq (kernel/softirq.c:480) </IRQ> Fixes: `45c8a6cc2b` ("tcp: Clear tcp_sk(sk)->fastopen_rsk in tcp_disconnect().") Reported-by: syzkaller <syzkaller@googlegroups.com> Signed-off-by: Kuniyuki Iwashima <kuniyu@google.com> Link: https://patch.msgid.link/20251001233755.1340927-1-kuniyu@google.com Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>	2025-10-19 16:37:10 +02:00
Alexandr Sapozhnikov	badbd79313	net/sctp: fix a null dereference in sctp_disposition sctp_sf_do_5_1D_ce() [ Upstream commit 2f3119686ef50319490ccaec81a575973da98815 ] If new_asoc->peer.adaptation_ind=0 and sctp_ulpevent_make_authkey=0 and sctp_ulpevent_make_authkey() returns 0, then the variable ai_ev remains zero and the zero will be dereferenced in the sctp_ulpevent_free() function. Signed-off-by: Alexandr Sapozhnikov <alsp705@gmail.com> Acked-by: Xin Long <lucien.xin@gmail.com> Fixes: `30f6ebf65b` ("sctp: add SCTP_AUTH_NO_AUTH type for AUTHENTICATION_EVENT") Link: https://patch.msgid.link/20251002091448.11-1-alsp705@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>	2025-10-19 16:37:10 +02:00
Olga Kornievskaia	d4c322913a	nfsd: unregister with rpcbind when deleting a transport commit 898374fdd7f06fa4c4a66e8be3135efeae6128d5 upstream. When a listener is added, a part of creation of transport also registers program/port with rpcbind. However, when the listener is removed, while transport goes away, rpcbind still has the entry for that port/type. When deleting the transport, unregister with rpcbind when appropriate. ---v2 created a new xpt_flag XPT_RPCB_UNREG to mark TCP and UDP transport and at xprt destroy send rpcbind unregister if flag set. Suggested-by: Chuck Lever <chuck.lever@oracle.com> Fixes: `d093c90892` ("nfsd: fix management of listener transports") Cc: stable@vger.kernel.org Signed-off-by: Olga Kornievskaia <okorniev@redhat.com> Reviewed-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2025-10-19 16:37:00 +02:00
Toke Høiland-Jørgensen	f62934cea3	page_pool: Fix PP_MAGIC_MASK to avoid crashing on some 32-bit arches commit 95920c2ed02bde551ab654e9749c2ca7bc3100e0 upstream. Helge reported that the introduction of PP_MAGIC_MASK let to crashes on boot on his 32-bit parisc machine. The cause of this is the mask is set too wide, so the page_pool_page_is_pp() incurs false positives which crashes the machine. Just disabling the check in page_pool_is_pp() will lead to the page_pool code itself malfunctioning; so instead of doing this, this patch changes the define for PP_DMA_INDEX_BITS to avoid mistaking arbitrary kernel pointers for page_pool-tagged pages. The fix relies on the kernel pointers that alias with the pp_magic field always being above PAGE_OFFSET. With this assumption, we can use the lowest bit of the value of PAGE_OFFSET as the upper bound of the PP_DMA_INDEX_MASK, which should avoid the false positives. Because we cannot rely on PAGE_OFFSET always being a compile-time constant, nor on it always being >0, we fall back to disabling the dma_index storage when there are not enough bits available. This leaves us in the situation we were in before the patch in the Fixes tag, but only on a subset of architecture configurations. This seems to be the best we can do until the transition to page types in complete for page_pool pages. v2: - Make sure there's at least 8 bits available and that the PAGE_OFFSET bit calculation doesn't wrap Link: https://lore.kernel.org/all/aMNJMFa5fDalFmtn@p100/ Fixes: `ee62ce7a1d` ("page_pool: Track DMA-mapped pages and unmap them when destroying the pool") Cc: stable@vger.kernel.org # 6.15+ Tested-by: Helge Deller <deller@gmx.de> Signed-off-by: Toke Høiland-Jørgensen <toke@redhat.com> Reviewed-by: Mina Almasry <almasrymina@google.com> Tested-by: Helge Deller <deller@gmx.de> Link: https://patch.msgid.link/20250930114331.675412-1-toke@redhat.com Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2025-10-19 16:37:00 +02:00
Dominique Martinet	df8462f0fc	net/9p: Fix buffer overflow in USB transport layer commit c04db81cd0288dfc68b7a0f7d09bd49b40bba451 upstream. A buffer overflow vulnerability exists in the USB 9pfs transport layer where inconsistent size validation between packet header parsing and actual data copying allows a malicious USB host to overflow heap buffers. The issue occurs because: - usb9pfs_rx_header() validates only the declared size in packet header - usb9pfs_rx_complete() uses req->actual (actual received bytes) for memcpy This allows an attacker to craft packets with small declared size (bypassing validation) but large actual payload (triggering overflow in memcpy). Add validation in usb9pfs_rx_complete() to ensure req->actual does not exceed the buffer capacity before copying data. Reported-by: Yuhao Jiang <danisjiang@gmail.com> Closes: https://lkml.kernel.org/r/20250616132539.63434-1-danisjiang@gmail.com Fixes: `a3be076dc1` ("net/9p/usbg: Add new usb gadget function transport") Cc: stable@vger.kernel.org Message-ID: <20250622-9p-usb_overflow-v3-1-ab172691b946@codewreck.org> Signed-off-by: Dominique Martinet <asmadeus@codewreck.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2025-10-15 12:04:22 +02:00
Lei Lu	ab9a70cd23	sunrpc: fix null pointer dereference on zero-length checksum commit 6df164e29bd4e6505c5a2e0e5f1e1f6957a16a42 upstream. In xdr_stream_decode_opaque_auth(), zero-length checksum.len causes checksum.data to be set to NULL. This triggers a NPD when accessing checksum.data in gss_krb5_verify_mic_v2(). This patch ensures that the value of checksum.len is not less than XDR_UNIT. Fixes: `0653028e8f` ("SUNRPC: Convert gss_verify_header() to use xdr_stream") Cc: stable@kernel.org Signed-off-by: Lei Lu <llfamsec@gmail.com> Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2025-10-15 12:04:22 +02:00
Deepak Sharma	c395d1e548	net: nfc: nci: Add parameter validation for packet data commit 9c328f54741bd5465ca1dc717c84c04242fac2e1 upstream. Syzbot reported an uninitialized value bug in nci_init_req, which was introduced by commit `5aca7966d2` ("Merge tag 'perf-tools-fixes-for-v6.17-2025-09-16' of git://git.kernel.org/pub/scm/linux/kernel/git/perf/perf-tools"). This bug arises due to very limited and poor input validation that was done at nic_valid_size(). This validation only validates the skb->len (directly reflects size provided at the userspace interface) with the length provided in the buffer itself (interpreted as NCI_HEADER). This leads to the processing of memory content at the address assuming the correct layout per what opcode requires there. This leads to the accesses to buffer of `skb_buff->data` which is not assigned anything yet. Following the same silent drop of packets of invalid sizes at `nic_valid_size()`, add validation of the data in the respective handlers and return error values in case of failure. Release the skb if error values are returned from handlers in `nci_nft_packet` and effectively do a silent drop Possible TODO: because we silently drop the packets, the call to `nci_request` will be waiting for completion of request and will face timeouts. These timeouts can get excessively logged in the dmesg. A proper handling of them may require to export `nci_request_cancel` (or propagate error handling from the nft packets handlers). Reported-by: syzbot+740e04c2a93467a0f8c8@syzkaller.appspotmail.com Closes: https://syzkaller.appspot.com/bug?extid=740e04c2a93467a0f8c8 Fixes: `6a2968aaf5` ("NFC: basic NCI protocol implementation") Tested-by: syzbot+740e04c2a93467a0f8c8@syzkaller.appspotmail.com Cc: stable@vger.kernel.org Signed-off-by: Deepak Sharma <deepak.sharma.472935@gmail.com> Reviewed-by: Vadim Fedorenko <vadim.fedorenko@linux.dev> Link: https://patch.msgid.link/20250925132846.213425-1-deepak.sharma.472935@gmail.com Signed-off-by: Paolo Abeni <pabeni@redhat.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2025-10-15 12:04:16 +02:00
Eric Dumazet	f575d6d793	tcp: use skb->len instead of skb->truesize in tcp_can_ingest() [ Upstream commit f017c1f768b670bced4464476655b27dfb937e67 ] Some applications are stuck to the 20th century and still use small SO_RCVBUF values. After the blamed commit, we can drop packets especially when using LRO/hw-gro enabled NIC and small MSS (1500) values. LRO/hw-gro NIC pack multiple segments into pages, allowing tp->scaling_ratio to be set to a high value. Whenever the receive queue gets full, we can receive a small packet filling RWIN, but with a high skb->truesize, because most NIC use 4K page plus sk_buff metadata even when receiving less than 1500 bytes of payload. Even if we refine how tp->scaling_ratio is estimated, we could have an issue at the start of the flow, because the first round of packets (IW10) will be sent based on the initial tp->scaling_ratio (1/2) Relax tcp_can_ingest() to use skb->len instead of skb->truesize, allowing the peer to use final RWIN, assuming a 'perfect' scaling_ratio of 1. Fixes: `1d2fbaad7c` ("tcp: stronger sk_rcvbuf checks") Signed-off-by: Eric Dumazet <edumazet@google.com> Link: https://patch.msgid.link/20250927092827.2707901-1-edumazet@google.com Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>	2025-10-15 12:04:12 +02:00
Luiz Augusto von Dentz	7a73febfa1	Bluetooth: hci_sync: Fix using random address for BIG/PA advertisements [ Upstream commit 03ddb4ac251463ec5b7b069395d9ab89163dd56c ] When creating an advertisement for BIG the address shall not be non-resolvable since in case of acting as BASS/Broadcast Assistant the address must be the same as the connection in order to use the PAST method and even when PAST/BASS are not in the picture a Periodic Advertisement can still be synchronized thus the same argument as to connectable advertisements still stand. Fixes: `eca0ae4aea` ("Bluetooth: Add initial implementation of BIS connections") Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com> Reviewed-by: Paul Menzel <pmenzel@molgen.mpg.de> Signed-off-by: Sasha Levin <sashal@kernel.org>	2025-10-15 12:04:10 +02:00
Pauli Virtanen	38e095509f	Bluetooth: ISO: don't leak skb in ISO_CONT RX [ Upstream commit 5bf863f4c5da055c1eb08887ae4f26d99dbc4aac ] For ISO_CONT RX, the data from skb is copied to conn->rx_skb, but the skb is leaked. Free skb after copying its data. Fixes: `ccf74f2390` ("Bluetooth: Add BTPROTO_ISO socket type") Signed-off-by: Pauli Virtanen <pav@iki.fi> Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com> Signed-off-by: Sasha Levin <sashal@kernel.org>	2025-10-15 12:04:10 +02:00
Pauli Virtanen	ec2387951f	Bluetooth: ISO: free rx_skb if not consumed [ Upstream commit 6ba85da5804efffe15c89b03742ea868f20b4172 ] If iso_conn is freed when RX is incomplete, free any leftover skb piece. Fixes: `dc26097bdb` ("Bluetooth: ISO: Use kref to track lifetime of iso_conn") Signed-off-by: Pauli Virtanen <pav@iki.fi> Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com> Signed-off-by: Sasha Levin <sashal@kernel.org>	2025-10-15 12:04:10 +02:00
Luiz Augusto von Dentz	c92ad1a155	Bluetooth: ISO: Fix possible UAF on iso_conn_free [ Upstream commit 9950f095d6c875dbe0c9ebfcf972ec88fdf26fc8 ] This attempt to fix similar issue to sco_conn_free where if the conn->sk is not set to NULL may lead to UAF on iso_conn_free. Fixes: `ccf74f2390` ("Bluetooth: Add BTPROTO_ISO socket type") Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com> Signed-off-by: Sasha Levin <sashal@kernel.org>	2025-10-15 12:04:10 +02:00
Luiz Augusto von Dentz	33f94b750d	Bluetooth: MGMT: Fix not exposing debug UUID on MGMT_OP_READ_EXP_FEATURES_INFO [ Upstream commit 79e562a52adea4afa0601a15964498fae66c823c ] The debug UUID was only getting set if MGMT_OP_READ_EXP_FEATURES_INFO was not called with a specific index which breaks the likes of bluetoothd since it only invokes MGMT_OP_READ_EXP_FEATURES_INFO when an adapter is plugged, so instead of depending hdev not to be set just enable the UUID on any index like it was done with iso_sock_uuid. Fixes: `e625e50cee` ("Bluetooth: Introduce debug feature when dynamic debug is disabled") Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com> Signed-off-by: Sasha Levin <sashal@kernel.org>	2025-10-15 12:04:10 +02:00
Eric Dumazet	18986319ed	netfilter: nf_conntrack: do not skip entries in /proc/net/nf_conntrack [ Upstream commit c5ba345b2d358b07cc4f07253ba1ada73e77d586 ] ct_seq_show() has an opportunistic garbage collector : if (nf_ct_should_gc(ct)) { nf_ct_kill(ct); goto release; } So if one nf_conn is killed there, next time ct_get_next() runs, we skip the following item in the bucket, even if it should have been displayed if gc did not take place. We can decrement st->skip_elems to tell ct_get_next() one of the items was removed from the chain. Fixes: `58e207e498` ("netfilter: evict stale entries when user reads /proc/net/nf_conntrack") Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Sasha Levin <sashal@kernel.org>	2025-10-15 12:04:08 +02:00
Fernando Fernandez Mancera	8736a4b5af	netfilter: nfnetlink: reset nlh pointer during batch replay [ Upstream commit 09efbac953f6f076a07735f9ba885148d4796235 ] During a batch replay, the nlh pointer is not reset until the parsing of the commands. Since commit `bf2ac490d2` ("netfilter: nfnetlink: Handle ACK flags for batch messages") that is problematic as the condition to add an ACK for batch begin will evaluate to true even if NLM_F_ACK wasn't used for batch begin message. If there is an error during the command processing, netlink is sending an ACK despite that. This misleads userspace tools which think that the return code was 0. Reset the nlh pointer to the original one when a replay is triggered. Fixes: `bf2ac490d2` ("netfilter: nfnetlink: Handle ACK flags for batch messages") Signed-off-by: Fernando Fernandez Mancera <fmancera@suse.de> Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Sasha Levin <sashal@kernel.org>	2025-10-15 12:04:07 +02:00
Slavin Liu	a343811ef1	ipvs: Defer ip_vs_ftp unregister during netns cleanup [ Upstream commit 134121bfd99a06d44ef5ba15a9beb075297c0821 ] On the netns cleanup path, __ip_vs_ftp_exit() may unregister ip_vs_ftp before connections with valid cp->app pointers are flushed, leading to a use-after-free. Fix this by introducing a global `exiting_module` flag, set to true in ip_vs_ftp_exit() before unregistering the pernet subsystem. In __ip_vs_ftp_exit(), skip ip_vs_ftp unregister if called during netns cleanup (when exiting_module is false) and defer it to __ip_vs_cleanup_batch(), which unregisters all apps after all connections are flushed. If called during module exit, unregister ip_vs_ftp immediately. Fixes: `61b1ab4583` ("IPVS: netns, add basic init per netns.") Suggested-by: Julian Anastasov <ja@ssi.bg> Signed-off-by: Slavin Liu <slavin452@gmail.com> Signed-off-by: Julian Anastasov <ja@ssi.bg> Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Sasha Levin <sashal@kernel.org>	2025-10-15 12:04:07 +02:00
Vadim Fedorenko	aec4d4fa05	net: ethtool: tsconfig: set command must provide a reply [ Upstream commit e8ab231782e92bc26e5eb605263525636a2f7ae7 ] Timestamping configuration through ethtool has inconsistent behavior of skipping the reply for set command if configuration was not changed. Fix it be providing reply in any case. Fixes: `6e9e2eed4f` ("net: ethtool: Add support for tsconfig command to get/set hwtstamp config") Signed-off-by: Vadim Fedorenko <vadfed@meta.com> Reviewed-by: Kory Maincent <kory.maincent@bootlin.com> Link: https://patch.msgid.link/20250922231924.2769571-1-vadfed@meta.com Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>	2025-10-15 12:04:07 +02:00
Ryder Lee	2feef19cd9	wifi: cfg80211: fix width unit in cfg80211_radio_chandef_valid() [ Upstream commit 17f34ab55a8518ecbd5dcacec48e6ee903f7c1d0 ] The original code used nl80211_chan_width_to_mhz(), which returns the width in MHz. However, the expected unit is KHz. Fixes: `510dba80ed` ("wifi: cfg80211: add helper for checking if a chandef is valid on a radio") Signed-off-by: Ryder Lee <ryder.lee@mediatek.com> Link: https://patch.msgid.link/df54294e6c4ed0f3ceff6e818b710478ddfc62c0.1758579480.git.Ryder%20Lee%20ryder.lee@mediatek.com/ Signed-off-by: Johannes Berg <johannes.berg@intel.com> Signed-off-by: Sasha Levin <sashal@kernel.org>	2025-10-15 12:04:05 +02:00
Aditya Kumar Singh	7009aca194	wifi: mac80211: fix Rx packet handling when pubsta information is not available [ Upstream commit 32d340ae675800672e1219444a17940a8efe5cca ] In ieee80211_rx_handle_packet(), if the caller does not provide pubsta information, an attempt is made to find the station using the address 2 (source address) field in the header. Since pubsta is missing, link information such as link_valid and link_id is also unavailable. Now if such a situation comes, and if a matching ML station entry is found based on the source address, currently the packet is dropped due to missing link ID in the status field which is not correct. Hence, to fix this issue, if link_valid is not set and the station is an ML station, make an attempt to find a link station entry using the source address. If a valid link station is found, derive the link ID and proceed with packet processing. Otherwise, drop the packet as per the existing flow. Fixes: `ea9d807b56` ("wifi: mac80211: add link information in ieee80211_rx_status") Suggested-by: Vasanthakumar Thiagarajan <vasanthakumar.thiagarajan@oss.qualcomm.com> Signed-off-by: Aditya Kumar Singh <aditya.kumar.singh@oss.qualcomm.com> Link: https://patch.msgid.link/20250917-fix_data_packet_rx_with_mlo_and_no_pubsta-v1-1-8cf971a958ac@oss.qualcomm.com Signed-off-by: Johannes Berg <johannes.berg@intel.com> Signed-off-by: Sasha Levin <sashal@kernel.org>	2025-10-15 12:04:03 +02:00
Kuniyuki Iwashima	cc976ec9e3	mptcp: Use __sk_dst_get() and dst_dev_rcu() in mptcp_active_enable(). [ Upstream commit 893c49a78d9f85e4b8081b908fb7c407d018106a ] mptcp_active_enable() is called from subflow_finish_connect(), which is icsk->icsk_af_ops->sk_rx_dst_set() and it's not always under RCU. Using sk_dst_get(sk)->dev could trigger UAF. Let's use __sk_dst_get() and dst_dev_rcu(). Fixes: `27069e7cb3` ("mptcp: disable active MPTCP in case of blackhole") Signed-off-by: Kuniyuki Iwashima <kuniyu@google.com> Reviewed-by: Matthieu Baerts (NGI0) <matttbe@kernel.org> Reviewed-by: Eric Dumazet <edumazet@google.com> Link: https://patch.msgid.link/20250916214758.650211-8-kuniyu@google.com Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>	2025-10-15 12:04:01 +02:00
Kuniyuki Iwashima	0a14dbd8a2	mptcp: Call dst_release() in mptcp_active_enable(). [ Upstream commit 108a86c71c93ff28087994e6107bc99ebe336629 ] mptcp_active_enable() calls sk_dst_get(), which returns dst with its refcount bumped, but forgot dst_release(). Let's add missing dst_release(). Cc: stable@vger.kernel.org Fixes: `27069e7cb3` ("mptcp: disable active MPTCP in case of blackhole") Signed-off-by: Kuniyuki Iwashima <kuniyu@google.com> Reviewed-by: Matthieu Baerts (NGI0) <matttbe@kernel.org> Reviewed-by: Eric Dumazet <edumazet@google.com> Link: https://patch.msgid.link/20250916214758.650211-7-kuniyu@google.com Signed-off-by: Jakub Kicinski <kuba@kernel.org> Stable-dep-of: 893c49a78d9f ("mptcp: Use __sk_dst_get() and dst_dev_rcu() in mptcp_active_enable().") Signed-off-by: Sasha Levin <sashal@kernel.org>	2025-10-15 12:04:01 +02:00
Kuniyuki Iwashima	feb474ddbf	tls: Use __sk_dst_get() and dst_dev_rcu() in get_netdev_for_sock(). [ Upstream commit c65f27b9c3be2269918e1cbad6d8884741f835c5 ] get_netdev_for_sock() is called during setsockopt(), so not under RCU. Using sk_dst_get(sk)->dev could trigger UAF. Let's use __sk_dst_get() and dst_dev_rcu(). Note that the only ->ndo_sk_get_lower_dev() user is bond_sk_get_lower_dev(), which uses RCU. Fixes: `e8f6979981` ("net/tls: Add generic NIC offload infrastructure") Signed-off-by: Kuniyuki Iwashima <kuniyu@google.com> Reviewed-by: Eric Dumazet <edumazet@google.com> Reviewed-by: Sabrina Dubroca <sd@queasysnail.net> Link: https://patch.msgid.link/20250916214758.650211-6-kuniyu@google.com Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>	2025-10-15 12:04:01 +02:00
Kuniyuki Iwashima	f6adf7a180	smc: Use __sk_dst_get() and dst_dev_rcu() in smc_vlan_by_tcpsk(). [ Upstream commit 0b0e4d51c6554e5ecc3f8cc73c2eaf12da21249a ] smc_vlan_by_tcpsk() fetches sk_dst_get(sk)->dev before RTNL and passes it to netdev_walk_all_lower_dev(), which is illegal. Also, smc_vlan_by_tcpsk_walk() does not require RTNL at all. Let's use __sk_dst_get(), dst_dev_rcu(), and netdev_walk_all_lower_dev_rcu(). Note that the returned value of smc_vlan_by_tcpsk() is not used in the caller. Fixes: `0cfdd8f92c` ("smc: connection and link group creation") Signed-off-by: Kuniyuki Iwashima <kuniyu@google.com> Reviewed-by: Eric Dumazet <edumazet@google.com> Link: https://patch.msgid.link/20250916214758.650211-5-kuniyu@google.com Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>	2025-10-15 12:04:00 +02:00
Kuniyuki Iwashima	d26e80f7fb	smc: Use __sk_dst_get() and dst_dev_rcu() in smc_clc_prfx_match(). [ Upstream commit 235f81045c008169cc4e1955b4a64e118eebe61b ] smc_clc_prfx_match() is called from smc_listen_work() and not under RCU nor RTNL. Using sk_dst_get(sk)->dev could trigger UAF. Let's use __sk_dst_get() and dst_dev_rcu(). Note that the returned value of smc_clc_prfx_match() is not used in the caller. Fixes: `a046d57da1` ("smc: CLC handshake (incl. preparation steps)") Signed-off-by: Kuniyuki Iwashima <kuniyu@google.com> Reviewed-by: Eric Dumazet <edumazet@google.com> Link: https://patch.msgid.link/20250916214758.650211-4-kuniyu@google.com Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>	2025-10-15 12:04:00 +02:00
Kuniyuki Iwashima	0736993bfe	smc: Use __sk_dst_get() and dst_dev_rcu() in in smc_clc_prfx_set(). [ Upstream commit 935d783e5de9b64587f3adb25641dd8385e64ddb ] smc_clc_prfx_set() is called during connect() and not under RCU nor RTNL. Using sk_dst_get(sk)->dev could trigger UAF. Let's use __sk_dst_get() and dev_dst_rcu() under rcu_read_lock() after kernel_getsockname(). Note that the returned value of smc_clc_prfx_set() is not used in the caller. While at it, we change the 1st arg of smc_clc_prfx_set[46]_rcu() not to touch dst there. Fixes: `a046d57da1` ("smc: CLC handshake (incl. preparation steps)") Signed-off-by: Kuniyuki Iwashima <kuniyu@google.com> Reviewed-by: Eric Dumazet <edumazet@google.com> Link: https://patch.msgid.link/20250916214758.650211-3-kuniyu@google.com Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>	2025-10-15 12:04:00 +02:00

1 2 3 4 5 ...

81727 Commits