77d2e05abd45886dcad2b632c738cf46b9f7c19e
26925 Commits
| Author | SHA1 | Message | Date | |
|---|---|---|---|---|
|
|
77d2e05abd |
bpf: Add bpf_verifier_vlog() and bpf_verifier_log_needed()
The BTF (BPF Type Format) verifier needs to reuse the current
BPF verifier log. Hence, it requires the following changes:
(1) Expose log_write() in verifier.c for other users.
Its name is renamed to bpf_verifier_vlog().
(2) The BTF verifier also needs to check
'log->level && log->ubuf && !bpf_verifier_log_full(log);'
independently outside of the current log_write(). It is
because the BTF verifier will do one-check before
making multiple calls to btf_verifier_vlog to log
the details of a type.
Hence, this check is also re-factored to a new function
bpf_verifier_log_needed(). Since it is re-factored,
we can check it before va_start() in the current
bpf_verifier_log_write() and verbose().
Signed-off-by: Martin KaFai Lau <kafai@fb.com>
Acked-by: Alexei Starovoitov <ast@fb.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
|
||
|
|
b9193c1b61 |
bpf: Rename bpf_verifer_log
bpf_verifer_log => bpf_verifier_log Signed-off-by: Martin KaFai Lau <kafai@fb.com> Acked-by: Alexei Starovoitov <ast@fb.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> |
||
|
|
abe0884011 |
bpf: Remove struct bpf_verifier_env argument from print_bpf_insn
We use print_bpf_insn in user space (bpftool and soon perf), so it'd be nice to keep it generic and strip it off the kernel struct bpf_verifier_env argument. This argument can be safely removed, because its users can use the struct bpf_insn_cbs::private_data to pass it. By changing the argument type we can no longer have clean 'verbose' alias to 'bpf_verifier_log_write' in verifier.c. Instead we're adding the 'verbose' cb_print callback and removing the alias. This way we have new cb_print callback in place, and all the 'verbose(env, ...) calls in verifier.c will cleanly cast to 'verbose(void *, ...)' so no other change is needed. Signed-off-by: Jiri Olsa <jolsa@kernel.org> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> |
||
|
|
03fe2debbb |
Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net
Fun set of conflict resolutions here...
For the mac80211 stuff, these were fortunately just parallel
adds. Trivially resolved.
In drivers/net/phy/phy.c we had a bug fix in 'net' that moved the
function phy_disable_interrupts() earlier in the file, whilst in
'net-next' the phy_error() call from this function was removed.
In net/ipv4/xfrm4_policy.c, David Ahern's changes to remove the
'rt_table_id' member of rtable collided with a bug fix in 'net' that
added a new struct member "rt_mtu_locked" which needs to be copied
over here.
The mlxsw driver conflict consisted of net-next separating
the span code and definitions into separate files, whilst
a 'net' bug fix made some changes to that moved code.
The mlx5 infiniband conflict resolution was quite non-trivial,
the RDMA tree's merge commit was used as a guide here, and
here are their notes:
====================
Due to bug fixes found by the syzkaller bot and taken into the for-rc
branch after development for the 4.17 merge window had already started
being taken into the for-next branch, there were fairly non-trivial
merge issues that would need to be resolved between the for-rc branch
and the for-next branch. This merge resolves those conflicts and
provides a unified base upon which ongoing development for 4.17 can
be based.
Conflicts:
drivers/infiniband/hw/mlx5/main.c - Commit
|
||
|
|
8401c72c59 |
Merge branch 'libnvdimm-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/nvdimm/nvdimm
Pull libnvdimm fixes from Dan Williams:
"Two regression fixes, two bug fixes for older issues, two fixes for
new functionality added this cycle that have userspace ABI concerns,
and a small cleanup. These have appeared in a linux-next release and
have a build success report from the 0day robot.
* The 4.16 rework of altmap handling led to some configurations
leaking page table allocations due to freeing from the altmap
reservation rather than the page allocator.
The impact without the fix is leaked memory and a WARN() message
when tearing down libnvdimm namespaces. The rework also missed a
place where error handling code needed to be removed that can lead
to a crash if devm_memremap_pages() fails.
* acpi_map_pxm_to_node() had a latent bug whereby it could
misidentify the closest online node to a given proximity domain.
* Block integrity handling was reworked several kernels back to allow
calling add_disk() after setting up the integrity profile.
The nd_btt and nd_blk drivers are just now catching up to fix
automatic partition detection at driver load time.
* The new peristence_domain attribute, a platform indicator of
whether cpu caches are powerfail protected for example, is meant to
be a single value enum and not a set of flags.
This oversight was caught while reviewing new userspace code in
libndctl to communicate the attribute.
Fix this new enabling up so that we are not stuck with an unwanted
userspace ABI"
* 'libnvdimm-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/nvdimm/nvdimm:
libnvdimm, nfit: fix persistence domain reporting
libnvdimm, region: hide persistence_domain when unknown
acpi, numa: fix pxm to online numa node associations
x86, memremap: fix altmap accounting at free
libnvdimm: remove redundant assignment to pointer 'dev'
libnvdimm, {btt, blk}: do integrity setup before add_disk()
kernel/memremap: Remove stale devres_free() call
|
||
|
|
394c73d396 |
Merge tag 'modules-for-v4.16-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/jeyu/linux
Pull modules fix from Jessica Yu: "Propagate error in modules_open() to avoid possible later NULL dereference if seq_open() had failed" * tag 'modules-for-v4.16-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/jeyu/linux: module: propagate error in modules_open() |
||
|
|
c4f4d2f917 |
Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net
Pull networking fixes from David Miller:
1) Always validate XFRM esn replay attribute, from Florian Westphal.
2) Fix RCU read lock imbalance in xfrm_get_tos(), from Xin Long.
3) Don't try to get firmware dump if not loaded in iwlwifi, from Shaul
Triebitz.
4) Fix BPF helpers to deal with SCTP GSO SKBs properly, from Daniel
Axtens.
5) Fix some interrupt handling issues in e1000e driver, from Benjamin
Poitier.
6) Use strlcpy() in several ethtool get_strings methods, from Florian
Fainelli.
7) Fix rhlist dup insertion, from Paul Blakey.
8) Fix SKB leak in netem packet scheduler, from Alexey Kodanev.
9) Fix driver unload crash when link is up in smsc911x, from Jeremy
Linton.
10) Purge out invalid socket types in l2tp_tunnel_create(), from Eric
Dumazet.
11) Need to purge the write queue when TCP connections are aborted,
otherwise userspace using MSG_ZEROCOPY can't close the fd. From
Soheil Hassas Yeganeh.
12) Fix double free in error path of team driver, from Arkadi
Sharshevsky.
13) Filter fixes for hv_netvsc driver, from Stephen Hemminger.
14) Fix non-linear packet access in ipv6 ndisc code, from Lorenzo
Bianconi.
15) Properly filter out unsupported feature flags in macvlan driver,
from Shannon Nelson.
16) Don't request loading the diag module for a protocol if the protocol
itself is not even registered. From Xin Long.
17) If datagram connect fails in ipv6, make sure the socket state is
consistent afterwards. From Paolo Abeni.
18) Use after free in qed driver, from Dan Carpenter.
19) If received ipv4 PMTU is less than the min pmtu, lock the mtu in the
entry. From Sabrina Dubroca.
20) Fix sleep in atomic in tg3 driver, from Jonathan Toppins.
21) Fix vlan in vlan untagging in some situations, from Toshiaki Makita.
22) Fix double SKB free in genlmsg_mcast(). From Nicolas Dichtel.
23) Fix NULL derefs in error paths of tcf_*_init(), from Davide Caratti.
24) Unbalanced PM runtime calls in FEC driver, from Florian Fainelli.
25) Memory leak in gemini driver, from Igor Pylypiv.
26) IDR leaks in error paths of tcf_*_init() functions, from Davide
Caratti.
27) Need to use GFP_ATOMIC in seg6_build_state(), from David Lebrun.
28) Missing dev_put() in error path of macsec_newlink(), from Dan
Carpenter.
* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (201 commits)
macsec: missing dev_put() on error in macsec_newlink()
net: dsa: Fix functional dsa-loop dependency on FIXED_PHY
hv_netvsc: common detach logic
hv_netvsc: change GPAD teardown order on older versions
hv_netvsc: use RCU to fix concurrent rx and queue changes
hv_netvsc: disable NAPI before channel close
net/ipv6: Handle onlink flag with multipath routes
ppp: avoid loop in xmit recursion detection code
ipv6: sr: fix NULL pointer dereference when setting encap source address
ipv6: sr: fix scheduling in RCU when creating seg6 lwtunnel state
net: aquantia: driver version bump
net: aquantia: Implement pci shutdown callback
net: aquantia: Allow live mac address changes
net: aquantia: Add tx clean budget and valid budget handling logic
net: aquantia: Change inefficient wait loop on fw data reads
net: aquantia: Fix a regression with reset on old firmware
net: aquantia: Fix hardware reset when SPI may rarely hangup
s390/qeth: on channel error, reject further cmd requests
s390/qeth: lock read device while queueing next buffer
s390/qeth: when thread completes, wake up all waiters
...
|
||
|
|
0fa4fe85f4 |
bpf: skip unnecessary capability check
The current check statement in BPF syscall will do a capability check for CAP_SYS_ADMIN before checking sysctl_unprivileged_bpf_disabled. This code path will trigger unnecessary security hooks on capability checking and cause false alarms on unprivileged process trying to get CAP_SYS_ADMIN access. This can be resolved by simply switch the order of the statement and CAP_SYS_ADMIN is not required anyway if unprivileged bpf syscall is allowed. Signed-off-by: Chenbo Feng <fengc@google.com> Acked-by: Lorenzo Colitti <lorenzo@google.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> |
||
|
|
f005afede9 |
trace/bpf: remove helper bpf_perf_prog_read_value from tracepoint type programs
Commit |
||
|
|
1b5f3ba415 |
Merge branch 'for-4.16-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/cgroup
Pull cgroup fixes from Tejun Heo: "Two commits to fix the following subtle cgroup2 behavior bugs: - cpu.max was rejecting config when it shouldn't - thread mode enable was allowed when it shouldn't" * 'for-4.16-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/cgroup: cgroup: fix rule checking for threaded mode switching sched, cgroup: Don't reject lower cpu.max on ancestors |
||
|
|
c6256ca9c0 |
Merge branch 'for-4.16-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/wq
Pull workqueue fixes from Tejun Heo: "Two low-impact workqueue commits. One fixes workqueue creation error path and the other removes the unused cancel_work()" * 'for-4.16-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/wq: workqueue: remove unused cancel_work() workqueue: use put_device() instead of kfree() |
||
|
|
4f738adba3 |
bpf: create tcp_bpf_ulp allowing BPF to monitor socket TX/RX data
This implements a BPF ULP layer to allow policy enforcement and monitoring at the socket layer. In order to support this a new program type BPF_PROG_TYPE_SK_MSG is used to run the policy at the sendmsg/sendpage hook. To attach the policy to sockets a sockmap is used with a new program attach type BPF_SK_MSG_VERDICT. Similar to previous sockmap usages when a sock is added to a sockmap, via a map update, if the map contains a BPF_SK_MSG_VERDICT program type attached then the BPF ULP layer is created on the socket and the attached BPF_PROG_TYPE_SK_MSG program is run for every msg in sendmsg case and page/offset in sendpage case. BPF_PROG_TYPE_SK_MSG Semantics/API: BPF_PROG_TYPE_SK_MSG supports only two return codes SK_PASS and SK_DROP. Returning SK_DROP free's the copied data in the sendmsg case and in the sendpage case leaves the data untouched. Both cases return -EACESS to the user. Returning SK_PASS will allow the msg to be sent. In the sendmsg case data is copied into kernel space buffers before running the BPF program. The kernel space buffers are stored in a scatterlist object where each element is a kernel memory buffer. Some effort is made to coalesce data from the sendmsg call here. For example a sendmsg call with many one byte iov entries will likely be pushed into a single entry. The BPF program is run with data pointers (start/end) pointing to the first sg element. In the sendpage case data is not copied. We opt not to copy the data by default here, because the BPF infrastructure does not know what bytes will be needed nor when they will be needed. So copying all bytes may be wasteful. Because of this the initial start/end data pointers are (0,0). Meaning no data can be read or written. This avoids reading data that may be modified by the user. A new helper is added later in this series if reading and writing the data is needed. The helper call will do a copy by default so that the page is exclusively owned by the BPF call. The verdict from the BPF_PROG_TYPE_SK_MSG applies to the entire msg in the sendmsg() case and the entire page/offset in the sendpage case. This avoids ambiguity on how to handle mixed return codes in the sendmsg case. Again a helper is added later in the series if a verdict needs to apply to multiple system calls and/or only a subpart of the currently being processed message. The helper msg_redirect_map() can be used to select the socket to send the data on. This is used similar to existing redirect use cases. This allows policy to redirect msgs. Pseudo code simple example: The basic logic to attach a program to a socket is as follows, // load the programs bpf_prog_load(SOCKMAP_TCP_MSG_PROG, BPF_PROG_TYPE_SK_MSG, &obj, &msg_prog); // lookup the sockmap bpf_map_msg = bpf_object__find_map_by_name(obj, "my_sock_map"); // get fd for sockmap map_fd_msg = bpf_map__fd(bpf_map_msg); // attach program to sockmap bpf_prog_attach(msg_prog, map_fd_msg, BPF_SK_MSG_VERDICT, 0); Adding sockets to the map is done in the normal way, // Add a socket 'fd' to sockmap at location 'i' bpf_map_update_elem(map_fd_msg, &i, fd, BPF_ANY); After the above any socket attached to "my_sock_map", in this case 'fd', will run the BPF msg verdict program (msg_prog) on every sendmsg and sendpage system call. For a complete example see BPF selftests or sockmap samples. Implementation notes: It seemed the simplest, to me at least, to use a refcnt to ensure psock is not lost across the sendmsg copy into the sg, the bpf program running on the data in sg_data, and the final pass to the TCP stack. Some performance testing may show a better method to do this and avoid the refcnt cost, but for now use the simpler method. Another item that will come after basic support is in place is supporting MSG_MORE flag. At the moment we call sendpages even if the MSG_MORE flag is set. An enhancement would be to collect the pages into a larger scatterlist and pass down the stack. Notice that bpf_tcp_sendmsg() could support this with some additional state saved across sendmsg calls. I built the code to support this without having to do refactoring work. Other features TBD include ZEROCOPY and the TCP_RECV_QUEUE/TCP_NO_QUEUE support. This will follow initial series shortly. Future work could improve size limits on the scatterlist rings used here. Currently, we use MAX_SKB_FRAGS simply because this was being used already in the TLS case. Future work could extend the kernel sk APIs to tune this depending on workload. This is a trade-off between memory usage and throughput performance. Signed-off-by: John Fastabend <john.fastabend@gmail.com> Acked-by: David S. Miller <davem@davemloft.net> Acked-by: Alexei Starovoitov <ast@kernel.org> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> |
||
|
|
ffa3566001 |
sockmap: convert refcnt to an atomic refcnt
The sockmap refcnt up until now has been wrapped in the sk_callback_lock(). So its not actually needed any locking of its own. The counter itself tracks the lifetime of the psock object. Sockets in a sockmap have a lifetime that is independent of the map they are part of. This is possible because a single socket may be in multiple maps. When this happens we can only release the psock data associated with the socket when the refcnt reaches zero. There are three possible delete sock reference decrement paths first through the normal sockmap process, the user deletes the socket from the map. Second the map is removed and all sockets in the map are removed, delete path is similar to case 1. The third case is an asyncronous socket event such as a closing the socket. The last case handles removing sockets that are no longer available. For completeness, although inc does not pose any problems in this patch series, the inc case only happens when a psock is added to a map. Next we plan to add another socket prog type to handle policy and monitoring on the TX path. When we do this however we will need to keep a reference count open across the sendmsg/sendpage call and holding the sk_callback_lock() here (on every send) seems less than ideal, also it may sleep in cases where we hit memory pressure. Instead of dealing with these issues in some clever way simply make the reference counting a refcnt_t type and do proper atomic ops. Signed-off-by: John Fastabend <john.fastabend@gmail.com> Acked-by: David S. Miller <davem@davemloft.net> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> |
||
|
|
9e1909b9da |
Merge branch 'x86-pti-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86/pti updates from Thomas Gleixner:
"Another set of melted spectrum updates:
- Iron out the last late microcode loading issues by actually
checking whether new microcode is present and preventing the CPU
synchronization to run into a timeout induced hang.
- Remove Skylake C2 from the microcode blacklist according to the
latest Intel documentation
- Fix the VM86 POPF emulation which traps if VIP is set, but VIF is
not. Enhance the selftests to catch that kind of issue
- Annotate indirect calls/jumps for objtool on 32bit. This is not a
functional issue, but for consistency sake its the right thing to
do.
- Fix a jump label build warning observed on SPARC64 which uses 32bit
storage for the code location which is casted to 64 bit pointer w/o
extending it to 64bit first.
- Add two new cpufeature bits. Not really an urgent issue, but
provides them for both x86 and x86/kvm work. No impact on the
current kernel"
* 'x86-pti-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
x86/microcode: Fix CPU synchronization routine
x86/microcode: Attempt late loading only when new microcode is present
x86/speculation: Remove Skylake C2 from Speculation Control microcode blacklist
jump_label: Fix sparc64 warning
x86/speculation, objtool: Annotate indirect calls/jumps for objtool on 32-bit kernels
x86/vm86/32: Fix POPF emulation
selftests/x86/entry_from_vm86: Add test cases for POPF
selftests/x86/entry_from_vm86: Exit with 1 if we fail
x86/cpufeatures: Add Intel PCONFIG cpufeature
x86/cpufeatures: Add Intel Total Memory Encryption cpufeature
|
||
|
|
615755a77b |
bpf: extend stackmap to save binary_build_id+offset instead of address
Currently, bpf stackmap store address for each entry in the call trace.
To map these addresses to user space files, it is necessary to maintain
the mapping from these virtual address to symbols in the binary. Usually,
the user space profiler (such as perf) has to scan /proc/pid/maps at the
beginning of profiling, and monitor mmap2() calls afterwards. Given the
cost of maintaining the address map, this solution is not practical for
system wide profiling that is always on.
This patch tries to solve this problem with a variation of stackmap. This
variation is enabled by flag BPF_F_STACK_BUILD_ID. Instead of storing
addresses, the variation stores ELF file build_id + offset.
Build ID is a 20-byte unique identifier for ELF files. The following
command shows the Build ID of /bin/bash:
[user@]$ readelf -n /bin/bash
...
Build ID: XXXXXXXXXX
...
With BPF_F_STACK_BUILD_ID, bpf_get_stackid() tries to parse Build ID
for each entry in the call trace, and translate it into the following
struct:
struct bpf_stack_build_id_offset {
__s32 status;
unsigned char build_id[BPF_BUILD_ID_SIZE];
union {
__u64 offset;
__u64 ip;
};
};
The search of build_id is limited to the first page of the file, and this
page should be in page cache. Otherwise, we fallback to store ip for this
entry (ip field in struct bpf_stack_build_id_offset). This requires the
build_id to be stored in the first page. A quick survey of binary and
dynamic library files in a few different systems shows that almost all
binary and dynamic library files have build_id in the first page.
Build_id is only meaningful for user stack. If a kernel stack is added to
a stackmap with BPF_F_STACK_BUILD_ID, it will automatically fallback to
only store ip (status == BPF_STACK_BUILD_ID_IP). Similarly, if build_id
lookup failed for some reason, it will also fallback to store ip.
User space can access struct bpf_stack_build_id_offset with bpf
syscall BPF_MAP_LOOKUP_ELEM. It is necessary for user space to
maintain mapping from build id to binary files. This mostly static
mapping is much easier to maintain than per process address maps.
Note: Stackmap with build_id only works in non-nmi context at this time.
This is because we need to take mm->mmap_sem for find_vma(). If this
changes, we would like to allow build_id lookup in nmi context.
Signed-off-by: Song Liu <songliubraving@fb.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
|
||
|
|
af1d830bf3 |
jump_label: Fix sparc64 warning
The kbuild test robot reported the following warning on sparc64:
kernel/jump_label.c: In function '__jump_label_update':
kernel/jump_label.c:376:51: warning: cast to pointer from integer of different size [-Wint-to-pointer-cast]
WARN_ONCE(1, "can't patch jump_label at %pS", (void *)entry->code);
On sparc64, the jump_label entry->code field is of type u32, but
pointers are 64-bit. Silence the warning by casting entry->code to an
unsigned long before casting it to a pointer. This is also what the
sparc jump label code does.
Fixes:
|
||
|
|
6417250d3f |
workqueue: remove unused cancel_work()
Found this by accident. There are no usages of bare cancel_work() in current kernel source. Signed-off-by: Stephen Hemminger <stephen@networkplumber.org> Signed-off-by: Tejun Heo <tj@kernel.org> |
||
|
|
537f4146c5 |
workqueue: use put_device() instead of kfree()
Never directly free @dev after calling device_register(), even if it returned an error! Always use put_device() to give up the reference initialized in this function instead. Signed-off-by: Arvind Yadav <arvind.yadav.cs@gmail.com> Signed-off-by: Tejun Heo <tj@kernel.org> |
||
|
|
b6b76dd62c |
error-injection: Fix to prohibit jump optimization
Since the kprobe which was optimized by jump can not change
the execution path, the kprobe for error-injection must not
be optimized. To prohibit it, set a dummy post-handler as
officially stated in Documentation/kprobes.txt.
Fixes:
|
||
|
|
8ad4424350 |
Merge branch 'perf-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull perf updates from Thomas Gleixner:
"Another set of perf updates:
- Fix a Skylake Uncore event format declaration
- Prevent perf pipe mode from crahsing which was caused by a missing
buffer allocation
- Make the perf top popup message which tells the user that it uses
fallback mode on older kernels a debug message.
- Make perf context rescheduling work correcctly
- Robustify the jump error drawing in perf browser mode so it does
not try to create references to NULL initialized offset entries
- Make trigger_on() robust so it does not enable the trigger before
everything is set up correctly to handle it
- Make perf auxtrace respect the --no-itrace option so it does not
try to queue AUX data for decoding.
- Prevent having different number of field separators in CVS output
lines when a counter is not supported.
- Make the perf kallsyms man page usage behave like it does for all
other perf commands.
- Synchronize the kernel headers"
* 'perf-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
perf/core: Fix ctx_event_type in ctx_resched()
perf tools: Fix trigger class trigger_on()
perf auxtrace: Prevent decoding when --no-itrace
perf stat: Fix CVS output format for non-supported counters
tools headers: Sync x86's cpufeatures.h
tools headers: Sync copy of kvm UAPI headers
perf record: Fix crash in pipe mode
perf annotate browser: Be more robust when drawing jump arrows
perf top: Fix annoying fallback message on older kernels
perf kallsyms: Fix the usage on the man page
perf/x86/intel/uncore: Fix Skylake UPI event format
|
||
|
|
02bf0ef028 |
Merge branch 'locking-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull locking fix from Thomas Gleixner: "rt_mutex_futex_unlock() grew a new irq-off call site, but the function assumes that its always called from irq enabled context. Use (un)lock_irqsafe() to handle the new call site correctly" * 'locking-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: rtmutex: Make rt_mutex_futex_unlock() safe for irq-off callsites |
||
|
|
0862ca422b |
bug: use %pB in BUG and stack protector failure
The BUG and stack protector reports were still using a raw %p. This
changes it to %pB for more meaningful output.
Link: http://lkml.kernel.org/r/20180301225704.GA34198@beast
Fixes:
|
||
|
|
6b0ef92fee |
rtmutex: Make rt_mutex_futex_unlock() safe for irq-off callsites
When running rcutorture with TREE03 config, CONFIG_PROVE_LOCKING=y, and kernel cmdline argument "rcutorture.gp_exp=1", lockdep reports a HARDIRQ-safe->HARDIRQ-unsafe deadlock: ================================ WARNING: inconsistent lock state 4.16.0-rc4+ #1 Not tainted -------------------------------- inconsistent {IN-HARDIRQ-W} -> {HARDIRQ-ON-W} usage. takes: __schedule+0xbe/0xaf0 {IN-HARDIRQ-W} state was registered at: _raw_spin_lock+0x2a/0x40 scheduler_tick+0x47/0xf0 ... other info that might help us debug this: Possible unsafe locking scenario: CPU0 ---- lock(&rq->lock); <Interrupt> lock(&rq->lock); *** DEADLOCK *** 1 lock held by rcu_torture_rea/724: rcu_torture_read_lock+0x0/0x70 stack backtrace: CPU: 2 PID: 724 Comm: rcu_torture_rea Not tainted 4.16.0-rc4+ #1 Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.11.0-20171110_100015-anatol 04/01/2014 Call Trace: lock_acquire+0x90/0x200 ? __schedule+0xbe/0xaf0 _raw_spin_lock+0x2a/0x40 ? __schedule+0xbe/0xaf0 __schedule+0xbe/0xaf0 preempt_schedule_irq+0x2f/0x60 retint_kernel+0x1b/0x2d RIP: 0010:rcu_read_unlock_special+0x0/0x680 ? rcu_torture_read_unlock+0x60/0x60 __rcu_read_unlock+0x64/0x70 rcu_torture_read_unlock+0x17/0x60 rcu_torture_reader+0x275/0x450 ? rcutorture_booster_init+0x110/0x110 ? rcu_torture_stall+0x230/0x230 ? kthread+0x10e/0x130 kthread+0x10e/0x130 ? kthread_create_worker_on_cpu+0x70/0x70 ? call_usermodehelper_exec_async+0x11a/0x150 ret_from_fork+0x3a/0x50 This happens with the following even sequence: preempt_schedule_irq(); local_irq_enable(); __schedule(): local_irq_disable(); // irq off ... rcu_note_context_switch(): rcu_note_preempt_context_switch(): rcu_read_unlock_special(): local_irq_save(flags); ... raw_spin_unlock_irqrestore(...,flags); // irq remains off rt_mutex_futex_unlock(): raw_spin_lock_irq(); ... raw_spin_unlock_irq(); // accidentally set irq on <return to __schedule()> rq_lock(): raw_spin_lock(); // acquiring rq->lock with irq on which means rq->lock becomes a HARDIRQ-unsafe lock, which can cause deadlocks in scheduler code. This problem was introduced by commit |
||
|
|
6d8cb045cd |
bpf: comment why dots in filenames under BPF virtual FS are not allowed
When pinning a file under the BPF virtual file system (traditionally /sys/fs/bpf), using a dot in the name of the location to pin at is not allowed. For example, trying to pin at "/sys/fs/bpf/foo.bar" will be rejected with -EPERM. This check was introduced at the same time as the BPF file system itself, with commit |
||
|
|
bd903afeb5 |
perf/core: Fix ctx_event_type in ctx_resched()
In ctx_resched(), EVENT_FLEXIBLE should be sched_out when EVENT_PINNED is
added. However, ctx_resched() calculates ctx_event_type before checking
this condition. As a result, pinned events will NOT get higher priority
than flexible events.
The following shows this issue on an Intel CPU (where ref-cycles can
only use one hardware counter).
1. First start:
perf stat -C 0 -e ref-cycles -I 1000
2. Then, in the second console, run:
perf stat -C 0 -e ref-cycles:D -I 1000
The second perf uses pinned events, which is expected to have higher
priority. However, because it failed in ctx_resched(). It is never
run.
This patch fixes this by calculating ctx_event_type after re-evaluating
event_type.
Reported-by: Ephraim Park <ephiepark@fb.com>
Signed-off-by: Song Liu <songliubraving@fb.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: <jolsa@redhat.com>
Cc: <kernel-team@fb.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Stephane Eranian <eranian@google.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Vince Weaver <vincent.weaver@maine.edu>
Fixes:
|
||
|
|
3f553b308b |
module: propagate error in modules_open()
otherwise kernel can oops later in seq_release() due to dereferencing null
file->private_data which is only set if seq_open() succeeds.
BUG: unable to handle kernel NULL pointer dereference at 0000000000000000
IP: seq_release+0xc/0x30
Call Trace:
close_pdeo+0x37/0xd0
proc_reg_release+0x5d/0x60
__fput+0x9d/0x1d0
____fput+0x9/0x10
task_work_run+0x75/0x90
do_exit+0x252/0xa00
do_group_exit+0x36/0xb0
SyS_exit_group+0xf/0x10
Fixes:
|
||
|
|
e67548254b |
Merge tag 'mips_fixes_4.16_4' of git://git.kernel.org/pub/scm/linux/kernel/git/jhogan/mips
Pull MIPS fixes from James Hogan:
"A miscellaneous pile of MIPS fixes for 4.16:
- move put_compat_sigset() to evade hardened usercopy warnings (4.16)
- select ARCH_HAVE_PC_{SERIO,PARPORT} for Loongson64 platforms (4.16)
- fix kzalloc() failure handling in ath25 (3.19) and Octeon (4.0)
- fix disabling of IPIs during BMIPS suspend (3.19)"
* tag 'mips_fixes_4.16_4' of git://git.kernel.org/pub/scm/linux/kernel/git/jhogan/mips:
MIPS: BMIPS: Do not mask IPIs during suspend
MIPS: Loongson64: Select ARCH_MIGHT_HAVE_PC_SERIO
MIPS: Loongson64: Select ARCH_MIGHT_HAVE_PC_PARPORT
signals: Move put_compat_sigset to compat.h to silence hardened usercopy
MIPS: OCTEON: irq: Check for null return on kzalloc allocation
MIPS: ath25: Check for kzalloc allocation failure
|
||
|
|
95da0cdb72 |
bpf: add support to read sample address in bpf program
This commit adds new field "addr" to bpf_perf_event_data which could be read and used by bpf programs attached to perf events. The value of the field is copied from bpf_perf_event_data_kern.addr and contains the address value recorded by specifying sample_type with PERF_SAMPLE_ADDR when calling perf_event_open. Signed-off-by: Teng Qin <qinteng@fb.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> |
||
|
|
167f5594b5 |
kernel/memremap: Remove stale devres_free() call
devm_memremap_pages() was re-worked in |
||
|
|
0f3e9c97eb |
Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net
All of the conflicts were cases of overlapping changes. In net/core/devlink.c, we have to make care that the resouce size_params have become a struct member rather than a pointer to such an object. Signed-off-by: David S. Miller <davem@davemloft.net> |
||
|
|
547046141f |
Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net
Pull networking fixes from David Miller:
1) Use an appropriate TSQ pacing shift in mac80211, from Toke
Høiland-Jørgensen.
2) Just like ipv4's ip_route_me_harder(), we have to use skb_to_full_sk
in ip6_route_me_harder, from Eric Dumazet.
3) Fix several shutdown races and similar other problems in l2tp, from
James Chapman.
4) Handle missing XDP flush properly in tuntap, for real this time.
From Jason Wang.
5) Out-of-bounds access in powerpc ebpf tailcalls, from Daniel
Borkmann.
6) Fix phy_resume() locking, from Andrew Lunn.
7) IFLA_MTU values are ignored on newlink for some tunnel types, fix
from Xin Long.
8) Revert F-RTO middle box workarounds, they only handle one dimension
of the problem. From Yuchung Cheng.
9) Fix socket refcounting in RDS, from Ka-Cheong Poon.
10) Don't allow ppp unit registration to an unregistered channel, from
Guillaume Nault.
11) Various hv_netvsc fixes from Stephen Hemminger.
* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (98 commits)
hv_netvsc: propagate rx filters to VF
hv_netvsc: filter multicast/broadcast
hv_netvsc: defer queue selection to VF
hv_netvsc: use napi_schedule_irqoff
hv_netvsc: fix race in napi poll when rescheduling
hv_netvsc: cancel subchannel setup before halting device
hv_netvsc: fix error unwind handling if vmbus_open fails
hv_netvsc: only wake transmit queue if link is up
hv_netvsc: avoid retry on send during shutdown
virtio-net: re enable XDP_REDIRECT for mergeable buffer
ppp: prevent unregistered channels from connecting to PPP units
tc-testing: skbmod: fix match value of ethertype
mlxsw: spectrum_switchdev: Check success of FDB add operation
net: make skb_gso_*_seglen functions private
net: xfrm: use skb_gso_validate_network_len() to check gso sizes
net: sched: tbf: handle GSO_BY_FRAGS case in enqueue
net: rename skb_gso_validate_mtu -> skb_gso_validate_network_len
rds: Incorrect reference counting in TCP socket creation
net: ethtool: don't ignore return from driver get_fecparam method
vrf: check forwarding on the original netdevice when generating ICMP dest unreachable
...
|
||
|
|
4c4ce3022d |
Merge branch 'timers-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull timer fixes from Thomas Gleixner:
"A small set of fixes from the timer departement:
- Add a missing timer wheel clock forward when migrating timers off a
unplugged CPU to prevent operating on a stale clock base and
missing timer deadlines.
- Use the proper shift count to extract data from a register value to
prevent evaluating unrelated bits
- Make the error return check in the FSL timer driver work correctly.
Checking an unsigned variable for less than zero does not really
work well.
- Clarify the confusing comments in the ARC timer code"
* 'timers-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
timers: Forward timer base before migrating timers
clocksource/drivers/arc_timer: Update some comments
clocksource/drivers/mips-gic-timer: Use correct shift count to extract data
clocksource/drivers/fsl_ftm_timer: Fix error return checking
|
||
|
|
20f14172cb |
Merge branch 'libnvdimm-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/nvdimm/nvdimm
Pull libnvdimm fixes from Dan Williams:
"A 4.16 regression fix, three fixes for -stable, and a cleanup fix:
- During the merge window support for the new ACPI NVDIMM Platform
Capabilities structure disabled support for "deep flush", a
force-unit- access like mechanism for persistent memory. Restore
that mechanism.
- VFIO like RDMA is yet one more memory registration / pinning
interface that is incompatible with Filesystem-DAX. Disable long
term pins of Filesystem-DAX mappings via VFIO.
- The Filesystem-DAX detection to prevent long terms pins mistakenly
also disabled Device-DAX pins which are not subject to the same
block- map collision concerns.
- Similar to the setup path, softlockup warnings can trigger in the
shutdown path for large persistent memory namespaces. Teach
for_each_device_pfn() to perform cond_resched() in all cases.
- Boaz noticed that the might_sleep() in dax_direct_access() is stale
as of the v4.15 kernel.
These have received a build success notification from the 0day robot,
and the longterm pin fixes have appeared in -next. However, I recently
rebased the tree to remove some other fixes that need to be reworked
after review feedback.
* 'libnvdimm-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/nvdimm/nvdimm:
memremap: fix softlockup reports at teardown
libnvdimm: re-enable deep flush for pmem devices via fsync()
vfio: disable filesystem-dax page pinning
dax: fix vma_is_fsdax() helper
dax: ->direct_access does not sleep anymore
|
||
|
|
949b93250a |
memremap: fix softlockup reports at teardown
The cond_resched() currently in the setup path needs to be duplicated in
the teardown path. Rather than require each instance of
for_each_device_pfn() to open code the same sequence, embed it in the
helper.
Link: https://github.com/intel/ixpdimm_sw/issues/11
Cc: "Jérôme Glisse" <jglisse@redhat.com>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Christoph Hellwig <hch@lst.de>
Cc: <stable@vger.kernel.org>
Fixes:
|
||
|
|
fde9fc766e |
signals: Move put_compat_sigset to compat.h to silence hardened usercopy
Since commit |
||
|
|
a5f7b0eeb2 |
Merge git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf
Daniel Borkmann says: ==================== pull-request: bpf 2018-02-28 The following pull-request contains BPF updates for your *net* tree. The main changes are: 1) Add schedule points and reduce the number of loop iterations the test_bpf kernel module is performing in order to not hog the CPU for too long, from Eric. 2) Fix an out of bounds access in tail calls in the ppc64 BPF JIT compiler, from Daniel. 3) Fix a crash on arm64 on unaligned BPF xadd operations that could be triggered via interpreter and JIT, from Daniel. Please not that once you merge net into net-next at some point, there is a minor merge conflict in test_verifier.c since test cases had been added at the end in both trees. Resolution is trivial: keep all the test cases from both trees. ==================== Signed-off-by: David S. Miller <davem@davemloft.net> |
||
|
|
7bec4a9646 |
Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/pmladek/printk
Pull printk fix from Petr Mladek: "Make sure that we wake up userspace loggers. This fixes a race introduced by the console waiter logic during this merge window" * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/pmladek/printk: printk: Wake klogd when passing console_lock owner |
||
|
|
c52232a49e |
timers: Forward timer base before migrating timers
On CPU hotunplug the enqueued timers of the unplugged CPU are migrated to a
live CPU. This happens from the control thread which initiated the unplug.
If the CPU on which the control thread runs came out from a longer idle
period then the base clock of that CPU might be stale because the control
thread runs prior to any event which forwards the clock.
In such a case the timers from the unplugged CPU are queued on the live CPU
based on the stale clock which can cause large delays due to increased
granularity of the outer timer wheels which are far away from base:;clock.
But there is a worse problem than that. The following sequence of events
illustrates it:
- CPU0 timer1 is queued expires = 59969 and base->clk = 59131.
The timer is queued at wheel level 2, with resulting expiry time = 60032
(due to level granularity).
- CPU1 enters idle @60007, with next timer expiry @60020.
- CPU0 is hotplugged at @60009
- CPU1 exits idle and runs the control thread which migrates the
timers from CPU0
timer1 is now queued in level 0 for immediate handling in the next
softirq because the requested expiry time 59969 is before CPU1 base->clk
60007
- CPU1 runs code which forwards the base clock which succeeds because the
next expiring timer. which was collected at idle entry time is still set
to 60020.
So it forwards beyond 60007 and therefore misses to expire the migrated
timer1. That timer gets expired when the wheel wraps around again, which
takes between 63 and 630ms depending on the HZ setting.
Address both problems by invoking forward_timer_base() for the control CPUs
timer base. All other places, which might run into a similar problem
(mod_timer()/add_timer_on()) already invoke forward_timer_base() to avoid
that.
[ tglx: Massaged comment and changelog ]
Fixes:
|
||
|
|
c14376de3a |
printk: Wake klogd when passing console_lock owner
wake_klogd is a local variable in console_unlock(). The information is lost when the console_lock owner using the busy wait added by the commit |
||
|
|
85a2d939c0 |
Merge branch 'x86-pti-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 fixes from Thomas Gleixner:
"Yet another pile of melted spectrum related changes:
- sanitize the array_index_nospec protection mechanism: Remove the
overengineered array_index_nospec_mask_check() magic and allow
const-qualified types as index to avoid temporary storage in a
non-const local variable.
- make the microcode loader more robust by properly propagating error
codes. Provide information about new feature bits after micro code
was updated so administrators can act upon.
- optimizations of the entry ASM code which reduce code footprint and
make the code simpler and faster.
- fix the {pmd,pud}_{set,clear}_flags() implementations to work
properly on paravirt kernels by removing the address translation
operations.
- revert the harmful vmexit_fill_RSB() optimization
- use IBRS around firmware calls
- teach objtool about retpolines and add annotations for indirect
jumps and calls.
- explicitly disable jumplabel patching in __init code and handle
patching failures properly instead of silently ignoring them.
- remove indirect paravirt calls for writing the speculation control
MSR as these calls are obviously proving the same attack vector
which is tried to be mitigated.
- a few small fixes which address build issues with recent compiler
and assembler versions"
* 'x86-pti-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (38 commits)
KVM/VMX: Optimize vmx_vcpu_run() and svm_vcpu_run() by marking the RDMSR path as unlikely()
KVM/x86: Remove indirect MSR op calls from SPEC_CTRL
objtool, retpolines: Integrate objtool with retpoline support more closely
x86/entry/64: Simplify ENCODE_FRAME_POINTER
extable: Make init_kernel_text() global
jump_label: Warn on failed jump_label patching attempt
jump_label: Explicitly disable jump labels in __init code
x86/entry/64: Open-code switch_to_thread_stack()
x86/entry/64: Move ASM_CLAC to interrupt_entry()
x86/entry/64: Remove 'interrupt' macro
x86/entry/64: Move the switch_to_thread_stack() call to interrupt_entry()
x86/entry/64: Move ENTER_IRQ_STACK from interrupt macro to interrupt_entry
x86/entry/64: Move PUSH_AND_CLEAR_REGS from interrupt macro to helper function
x86/speculation: Move firmware_restrict_branch_speculation_*() from C to CPP
objtool: Add module specific retpoline rules
objtool: Add retpoline validation
objtool: Use existing global variables for options
x86/mm/sme, objtool: Annotate indirect call in sme_encrypt_execute()
x86/boot, objtool: Annotate indirect jump in secondary_startup_64()
x86/paravirt, objtool: Annotate indirect calls
...
|
||
|
|
ba6056a41c |
Merge git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next
Daniel Borkmann says: ==================== pull-request: bpf-next 2018-02-26 The following pull-request contains BPF updates for your *net-next* tree. The main changes are: 1) Various improvements for BPF kselftests: i) skip unprivileged tests when kernel.unprivileged_bpf_disabled sysctl knob is set, ii) count the number of skipped tests from unprivileged, iii) when a test case had an unexpected error then print the actual but also the unexpected one for better comparison, from Joe. 2) Add a sample program for collecting CPU state statistics with regards to how long the CPU resides in cstate and pstate levels. Based on cpu_idle and cpu_frequency trace points, from Leo. 3) Various x64 BPF JIT optimizations to further shrink the generated image size in order to make it more icache friendly. When tested on the Cilium generated programs, image size reduced by approx 4-5% in best case mainly due to how LLVM emits unsigned 32 bit constants, from Daniel. 4) Improvements and fixes on the BPF sockmap sample programs: i) fix the sockmap's Makefile to include nlattr.o for libbpf, ii) detach the sock ops programs from the cgroup before exit, from Prashant. 5) Avoid including xdp.h in filter.h by just forward declaring the struct xdp_rxq_info in filter.h, from Jesper. 6) Fix the BPF kselftests Makefile for cgroup_helpers.c by only declaring it a dependency for test_dev_cgroup.c but not every other test case where it is not needed, from Jesper. 7) Adjust rlimit RLIMIT_MEMLOCK for test_tcpbpf_user selftest since the default is insufficient for creating the 'global_map' used in the corresponding BPF program, from Yonghong. 8) Likewise, for the xdp_redirect sample, Tushar ran into the same when invoking xdp_redirect and xdp_monitor at the same time, therefore in order to have the sample generically work bump the limit here, too. Fix from Tushar. 9) Avoid an unnecessary NULL check in BPF_CGROUP_RUN_PROG_INET_SOCK() since sk is always guaranteed to be non-NULL, from Yafang. ==================== Signed-off-by: David S. Miller <davem@davemloft.net> |
||
|
|
c23a757591 |
Merge branch 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 fixes from Thomas Gleixner:
"A small set of fixes:
- UAPI data type correction for hyperv
- correct the cpu cores field in /proc/cpuinfo on CPU hotplug
- return proper error code in the resctrl file system failure path to
avoid silent subsequent failures
- correct a subtle accounting issue in the new vector allocation code
which went unnoticed for a while and caused suspend/resume
failures"
* 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
x86/topology: Update the 'cpu cores' field in /proc/cpuinfo correctly across CPU hotplug operations
x86/topology: Fix function name in documentation
x86/intel_rdt: Fix incorrect returned value when creating rdgroup sub-directory in resctrl file system
x86/apic/vector: Handle vector release on CPU unplug correctly
genirq/matrix: Handle CPU offlining proper
x86/headers/UAPI: Use __u64 instead of u64 in <uapi/asm/hyperv.h>
|
||
|
|
f74290fdb3 | Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net | ||
|
|
9cb9c07d6b |
Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net
Pull networking fixes from David Miller:
1) Fix TTL offset calculation in mac80211 mesh code, from Peter Oh.
2) Fix races with procfs in ipt_CLUSTERIP, from Cong Wang.
3) Memory leak fix in lpm_trie BPF map code, from Yonghong Song.
4) Need to use GFP_ATOMIC in BPF cpumap allocations, from Jason Wang.
5) Fix potential deadlocks in netfilter getsockopt() code paths, from
Paolo Abeni.
6) Netfilter stackpointer size checks really are needed to validate
user input, from Florian Westphal.
7) Missing timer init in x_tables, from Paolo Abeni.
8) Don't use WQ_MEM_RECLAIM in mac80211 hwsim, from Johannes Berg.
9) When an ibmvnic device is brought down then back up again, it can be
sent queue entries from a previous session, handle this properly
instead of crashing. From Thomas Falcon.
10) Fix TCP checksum on LRO buffers in mlx5e, from Gal Pressman.
11) When we are dumping filters in cls_api, the output SKB is empty, and
the filter we are dumping is too large for the space in the SKB, we
should return -EMSGSIZE like other netlink dump operations do.
Otherwise userland has no signal that is needs to increase the size
of its read buffer. From Roman Kapl.
12) Several XDP fixes for virtio_net, from Jesper Dangaard Brouer.
13) Module refcount leak in netlink when a dump start fails, from Jason
Donenfeld.
14) Handle sub-optimal GSO sizes better in TCP BBR congestion control,
from Eric Dumazet.
15) Releasing bpf per-cpu arraymaps can take a long time, add a
condtional scheduling point. From Eric Dumazet.
16) Implement retpolines for tail calls in x64 and arm64 bpf JITs. From
Daniel Borkmann.
17) Fix page leak in gianfar driver, from Andy Spencer.
18) Missed clearing of estimator scratch buffer, from Eric Dumazet.
* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (76 commits)
net_sched: gen_estimator: fix broken estimators based on percpu stats
gianfar: simplify FCS handling and fix memory leak
ipv6 sit: work around bogus gcc-8 -Wrestrict warning
macvlan: fix use-after-free in macvlan_common_newlink()
bpf, arm64: fix out of bounds access in tail call
bpf, x64: implement retpoline for tail call
rxrpc: Fix send in rxrpc_send_data_packet()
net: aquantia: Fix error handling in aq_pci_probe()
bpf: fix rcu lockdep warning for lpm_trie map_free callback
bpf: add schedule points in percpu arrays management
regulatory: add NUL to request alpha2
ibmvnic: Fix early release of login buffer
net/smc9194: Remove bogus CONFIG_MAC reference
net: ipv4: Set addr_type in hash_keys for forwarded case
tcp_bbr: better deal with suboptimal GSO
smsc75xx: fix smsc75xx_set_features()
netlink: put module reference if dump start fails
selftests/bpf/test_maps: exit child process without error in ENOMEM case
selftests/bpf: update gitignore with test_libbpf_open
selftests/bpf: tcpbpf_kern: use in6_* macros from glibc
..
|
||
|
|
2eb02aa94f |
Merge branch 'fixes-v4.16-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/jmorris/linux-security
Pull security subsystem fixes from James Morris:
- keys fixes via David Howells:
"A collection of fixes for Linux keyrings, mostly thanks to Eric
Biggers:
- Fix some PKCS#7 verification issues.
- Fix handling of unsupported crypto in X.509.
- Fix too-large allocation in big_key"
- Seccomp updates via Kees Cook:
"These are fixes for the get_metadata interface that landed during
-rc1. While the new selftest is strictly not a bug fix, I think
it's in the same spirit of avoiding bugs"
- an IMA build fix from Randy Dunlap
* 'fixes-v4.16-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/jmorris/linux-security:
integrity/security: fix digsig.c build error with header file
KEYS: Use individual pages in big_key for crypto buffers
X.509: fix NULL dereference when restricting key with unsupported_sig
X.509: fix BUG_ON() when hash algorithm is unsupported
PKCS#7: fix direct verification of SignerInfo signature
PKCS#7: fix certificate blacklisting
PKCS#7: fix certificate chain verification
seccomp: add a selftest for get_metadata
ptrace, seccomp: tweak get_metadata behavior slightly
seccomp, ptrace: switch get_metadata types to arch independent
|
||
|
|
ca36960211 |
bpf: allow xadd only on aligned memory
The requirements around atomic_add() / atomic64_add() resp. their
JIT implementations differ across architectures. E.g. while x86_64
seems just fine with BPF's xadd on unaligned memory, on arm64 it
triggers via interpreter but also JIT the following crash:
[ 830.864985] Unable to handle kernel paging request at virtual address ffff8097d7ed6703
[...]
[ 830.916161] Internal error: Oops: 96000021 [#1] SMP
[ 830.984755] CPU: 37 PID: 2788 Comm: test_verifier Not tainted 4.16.0-rc2+ #8
[ 830.991790] Hardware name: Huawei TaiShan 2280 /BC11SPCD, BIOS 1.29 07/17/2017
[ 830.998998] pstate: 80400005 (Nzcv daif +PAN -UAO)
[ 831.003793] pc : __ll_sc_atomic_add+0x4/0x18
[ 831.008055] lr : ___bpf_prog_run+0x1198/0x1588
[ 831.012485] sp : ffff00001ccabc20
[ 831.015786] x29: ffff00001ccabc20 x28: ffff8017d56a0f00
[ 831.021087] x27: 0000000000000001 x26: 0000000000000000
[ 831.026387] x25: 000000c168d9db98 x24: 0000000000000000
[ 831.031686] x23: ffff000008203878 x22: ffff000009488000
[ 831.036986] x21: ffff000008b14e28 x20: ffff00001ccabcb0
[ 831.042286] x19: ffff0000097b5080 x18: 0000000000000a03
[ 831.047585] x17: 0000000000000000 x16: 0000000000000000
[ 831.052885] x15: 0000ffffaeca8000 x14: 0000000000000000
[ 831.058184] x13: 0000000000000000 x12: 0000000000000000
[ 831.063484] x11: 0000000000000001 x10: 0000000000000000
[ 831.068783] x9 : 0000000000000000 x8 : 0000000000000000
[ 831.074083] x7 : 0000000000000000 x6 : 000580d428000000
[ 831.079383] x5 : 0000000000000018 x4 : 0000000000000000
[ 831.084682] x3 : ffff00001ccabcb0 x2 : 0000000000000001
[ 831.089982] x1 : ffff8097d7ed6703 x0 : 0000000000000001
[ 831.095282] Process test_verifier (pid: 2788, stack limit = 0x0000000018370044)
[ 831.102577] Call trace:
[ 831.105012] __ll_sc_atomic_add+0x4/0x18
[ 831.108923] __bpf_prog_run32+0x4c/0x70
[ 831.112748] bpf_test_run+0x78/0xf8
[ 831.116224] bpf_prog_test_run_xdp+0xb4/0x120
[ 831.120567] SyS_bpf+0x77c/0x1110
[ 831.123873] el0_svc_naked+0x30/0x34
[ 831.127437] Code: 97fffe97 17ffffec 00000000 f9800031 (885f7c31)
Reason for this is because memory is required to be aligned. In
case of BPF, we always enforce alignment in terms of stack access,
but not when accessing map values or packet data when the underlying
arch (e.g. arm64) has CONFIG_HAVE_EFFICIENT_UNALIGNED_ACCESS set.
xadd on packet data that is local to us anyway is just wrong, so
forbid this case entirely. The only place where xadd makes sense in
fact are map values; xadd on stack is wrong as well, but it's been
around for much longer. Specifically enforce strict alignment in case
of xadd, so that we handle this case generically and avoid such crashes
in the first place.
Fixes:
|
||
|
|
8961ca441b |
Merge tag 'drm-fixes-for-v4.16-rc3' of git://people.freedesktop.org/~airlied/linux
Pull drm fixes from Dave Airlie: "A bunch of fixes for rc3: Exynos: - fixes for using monotonic timestamps - register definitions - removal of unused file ipu-v3L - minor changes - make some register arrays const+static - fix some leaks meson: - fix for vsync atomic: - fix for memory leak EDID parser: - add quirks for some more non-desktop devices - 6-bit panel fix. drm_mm: - fix a bug in the core drm mm hole handling cirrus: - fix lut loading regression Lastly there is a deadlock fix around runtime suspend for secondary GPUs. There was a deadlock between one thread trying to wait for a workqueue job to finish in the runtime suspend path, and the workqueue job it was waiting for in turn waiting for a runtime_get_sync to return. The fixes avoids it by not doing the runtime sync in the workqueue as then we always wait for all those tasks to complete before we runtime suspend" * tag 'drm-fixes-for-v4.16-rc3' of git://people.freedesktop.org/~airlied/linux: (25 commits) drm/tve200: fix kernel-doc documentation comment include drm/edid: quirk Sony PlayStation VR headset as non-desktop drm/edid: quirk Windows Mixed Reality headsets as non-desktop drm/edid: quirk Oculus Rift headsets as non-desktop drm/meson: fix vsync buffer update drm: Handle unexpected holes in color-eviction drm: exynos: Use proper macro definition for HDMI_I2S_PIN_SEL_1 drm/exynos: remove exynos_drm_rotator.h drm/exynos: g2d: Delete an error message for a failed memory allocation in two functions drm/exynos: fix comparison to bitshift when dealing with a mask drm/exynos: g2d: use monotonic timestamps drm/edid: Add 6 bpc quirk for CPT panel in Asus UX303LA gpu: ipu-csi: add 10/12-bit grayscale support to mbus_code_to_bus_cfg gpu: ipu-cpmem: add 16-bit grayscale support to ipu_cpmem_set_image gpu: ipu-v3: prg: fix device node leak in ipu_prg_lookup_by_phandle gpu: ipu-v3: pre: fix device node leak in ipu_pre_lookup_by_phandle drm/amdgpu: Fix deadlock on runtime suspend drm/radeon: Fix deadlock on runtime suspend drm/nouveau: Fix deadlock on runtime suspend drm: Allow determining if current task is output poll worker ... |
||
|
|
651ca2c004 |
genirq/matrix: Handle CPU offlining proper
At CPU hotunplug the corresponding per cpu matrix allocator is shut down and
the allocated interrupt bits are discarded under the assumption that all
allocated bits have been either migrated away or shut down through the
managed interrupts mechanism.
This is not true because interrupts which are not started up might have a
vector allocated on the outgoing CPU. When the interrupt is started up
later or completely shutdown and freed then the allocated vector is handed
back, triggering warnings or causing accounting issues which result in
suspend failures and other issues.
Change the CPU hotplug mechanism of the matrix allocator so that the
remaining allocations at unplug time are preserved and global accounting at
hotplug is correctly readjusted to take the dormant vectors into account.
Fixes:
|
||
|
|
6c5f61023c |
bpf: fix rcu lockdep warning for lpm_trie map_free callback
Commit |
||
|
|
32fff239de |
bpf: add schedule points in percpu arrays management
syszbot managed to trigger RCU detected stalls in
bpf_array_free_percpu()
It takes time to allocate a huge percpu map, but even more time to free
it.
Since we run in process context, use cond_resched() to yield cpu if
needed.
Fixes:
|