The vcpu module is a core component which should be removed last, but the
destructor was mistakenly marked as something that should be done first.
This would cause the vcpu data to be freed up before anything else had the
chance to exit, and assuming that that data was still valid - causing use
after frees.
Reported-by: Dmitry Vyukov <dvyukov@google.com>
Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>
When VCPU #0 exits (e.g. due to KVM_EXIT_SYSTEM_EVENT), it sends
SIGKVMEXIT to all other VCPUs, waits for them to exit, then tears down
any remaining context. The signalling of SIGKVMEXIT is critical to
forcing VCPUs to shut down in response to a system event (e.g. PSCI
SYSTEM_OFF).
VCPUs other that VCPU #0 simply exit in kvm_cpu_thread without forcing
other CPUs to shut down. Thus if a system event is taken on a VCPU other
than VCPU #0, the remaining CPUs are left online. This results in KVM
tool not exiting as expected when a system event is taken on a VCPU
other than VCPU #0 (as may happen if the guest panics).
Fix this by tearing down all CPUs upon a system event, regardless of the
CPU on which the event occurred. While this means the VCPU thread will
signal itself, and VCPU #0 will signal all other VCPU threads a second
time, these are harmless.
Signed-off-by: Mark Rutland <mark.rutland@arm.com>
Cc: Marc Zyngier <marc.zyngier@arm.com>
Cc: Suzuki Poulose <suzuki.poulose@arm.com>
Cc: Will Deacon <will.deacon@arm.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>
The KVM_EXIT_SYSTEM_EVENT exit reason was added to define
architecture independent system-wide events for a Guest.
Currently, it is used by in-kernel PSCI-0.2 emulation of
KVM ARM/ARM64 to inform user space about PSCI SYSTEM_OFF
or PSCI SYSTEM_RESET request.
For now, we simply treat all system-wide guest events as
shutdown request in KVMTOOL.
Signed-off-by: Pranavkumar Sawargaonkar <pranavkumar@linaro.org>
Signed-off-by: Anup Patel <anup.patel@linaro.org>
Reviewed-by: Andre Przywara <andre.przywara@arm.com>
[will: removed useless prints]
Signed-off-by: Will Deacon <will.deacon@arm.com>
The recent introduction of bi-endianness on arm/arm64 had the
odd effect of breaking virtio-pci support on these platforms, as the
device endian field defaults to being VIRTIO_ENDIAN_HOST, which
is the wrong thing to have on a bi-endian capable architecture.
The fix is to check for the endianness on the ioport path the
same way we do it for mmio, which implies passing the vcpu all
the way down. Patch is a bit ugly, but aligns MMIO and ioport nicely.
Tested on arm64 and x86.
Acked-by: Will Deacon <will.deacon@arm.com>
Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
Signed-off-by: Pekka Enberg <penberg@kernel.org>
Save the CPU endianness when the device is reset. It is widely
assumed that the guest won't change its endianness after, or at
least not without reseting the device first.
A default implementation of the endianness sampling just returns
the default "host endianness" value so that unsuspecting architectures
are not affected.
Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
Signed-off-by: Pekka Enberg <penberg@kernel.org>
In order to be able to find out about the endianness of a virtual
CPU, it is necessary to pass a pointer to the kvm_cpu structure
down to the MMIO accessors.
This patch just pushes such pointer as far as required for the
MMIO accessors to have a play with the vcpu.
Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
Signed-off-by: Pekka Enberg <penberg@kernel.org>
Switch to using init/exit calls instead of the repeating call blocks in builtin-run.
Signed-off-by: Sasha Levin <levinsasha928@gmail.com>
Signed-off-by: Pekka Enberg <penberg@kernel.org>
There's no reason the array of guest specific vcpus is global. Move it into
struct kvm.
Also split up arch specific vcpu init from the generic code and call it from
the kvm_cpu initializer.
Signed-off-by: Sasha Levin <levinsasha928@gmail.com>
Signed-off-by: Pekka Enberg <penberg@kernel.org>
Remove some redundant members between struct kvm_config and struct kvm
since options are now contained within struct kvm.
Signed-off-by: Sasha Levin <levinsasha928@gmail.com>
Signed-off-by: Pekka Enberg <penberg@kernel.org>
If the guest is stopped, there is no need to run it.
This patch fixes this when running 'lkvm stop'.
KVM_RUN failed: Bad address
Signed-off-by: Asias He <asias.hejun@gmail.com>
Signed-off-by: Pekka Enberg <penberg@kernel.org>
Currently, 'lkvm stop' can not stop a pasued guest becasue
guest is blocked on the pause_lock.
This patch fixes it by un-pausing the guest before stops it.
The pthread_kill() call is not needed.
Signed-off-by: Asias He <asias.hejun@gmail.com>
Signed-off-by: Pekka Enberg <penberg@kernel.org>
The KVM_NR_CPUS define is only really used to statically size the global
kvm_cpus array, which can just as easily be allocated on startup. There is
some checking of the -c <nr cpus> value given against NR_CPUs but this is
later again checked against a dynamically-determined limit from
KVM_CAP_MAX_VCPUS anyway. The hardwired limit is arbitrary and not strictly
necessary.
This patch removes the #define, replacing the statically-sized array with
a malloc; the array is kvm->nrcpus+1 in size so that any iterator can halt
at the end (this is done in kvm_cpu__reboot, which doesn't have access to
a struct kvm* and therefore kvm->nrcpus).
An unused #define in x86/mptable.c is also removed.
Signed-off-by: Matt Evans <matt@ozlabs.org>
Signed-off-by: Pekka Enberg <penberg@kernel.org>
If we took a MMIO exit, the data in the coalesced ring should be processes
before the data in the exit itself is processed.
Doing it wrong (like we did so far) will cause ordering issues.
Signed-off-by: Sasha Levin <levinsasha928@gmail.com>
Signed-off-by: Pekka Enberg <penberg@kernel.org>
Different architectures will deal with MMIO exits differently. For example,
KVM_EXIT_IO is x86-specific, and I/O cycles are often synthesised by steering
into windows in PCI bridges on other architectures.
This patch calls arch-specific kvm_cpu__emulate_io() and kvm_cpu__emulate_mmio()
from the main runloop's IO and MMIO exit handlers. For x86, these directly
call kvm__emulate_io() and kvm__emulate_mmio() but other architectures will
perform some address munging before passing on the call.
Signed-off-by: Matt Evans <matt@ozlabs.org>
Signed-off-by: Pekka Enberg <penberg@kernel.org>
This allows triggering NMI on guests using 'kvm debug -m [cpu]'.
Please note that the default behaviour of 'kvm debug' dumping guest's cpu
state has been modified to require a '-d'/--dump.
Signed-off-by: Sasha Levin <levinsasha928@gmail.com>
Signed-off-by: Pekka Enberg <penberg@kernel.org>
kvm_cpu__run() currently die()s if KVM_RUN returns non-zero. Some architectures
may return positive values in non-error cases, whereas real errors are always
negative return values. Check for those instead.
Signed-off-by: Matt Evans <matt@ozlabs.org>
Signed-off-by: Pekka Enberg <penberg@kernel.org>
This patch creates a new function in x86/kvm-cpu.c, kvm_cpu__handle_exit(), in
which arch-specific exit reasons can be handled outside of the common runloop.
Signed-off-by: Matt Evans <matt@ozlabs.org>
Signed-off-by: Pekka Enberg <penberg@kernel.org>
Create a new arch-specific subdirectory to contain architecture-specific code
and includes.
The Makefile now adds various arch-specific objects based on detected
architecture. That aside, this patch should only contain code moves. These
include:
- x86-specific kvm_cpu setup, kernel loading, memory setup etc. now in
x86/kvm{-cpu}.c
- BIOS now lives in x86/bios/
- ioport setup
- KVM extensions are asserted in arch-specific kvm.c now, so each architecture
can manage its own dependencies.
- Various architecture-specific #defines are moved into $(ARCH)/include/kvm{-cpu}.h
such as struct kvm_cpu, KVM_NR_CPUS, KVM_32BIT_GAP_SIZE.
Signed-off-by: Matt Evans <matt@ozlabs.org>
Signed-off-by: Pekka Enberg <penberg@kernel.org>
This patch makes debug output go to a 'debug_fd' instead of stdout.
Doing so allows us to send the output to a different console when
required.
This patch also changes the behaviour of 'kvm debug' to show the debug
output in the console that executed the debug command instead of in the
console of the guest.
Signed-off-by: Sasha Levin <levinsasha928@gmail.com>
Signed-off-by: Pekka Enberg <penberg@kernel.org>
Recent kernels check for MSR_IA32_MISC_ENABLE_FAST_STRING in the
MSR_IA32_MISC_ENABLE MSR before enabling reps/movs memcpy.
So far we didn't set it, and got a slower memcpy and a warning:
[ 0.000000] Disabled fast string operations
This patch enables fast string operations.
Signed-off-by: Sasha Levin <levinsasha928@gmail.com>
Signed-off-by: Pekka Enberg <penberg@kernel.org>
Instead of exiting directly when a user enters 'ctrl x + a', go through
the regular termination path by stopping all VCPUs and letting the
main thread handle it.
Signed-off-by: Sasha Levin <levinsasha928@gmail.com>
Signed-off-by: Pekka Enberg <penberg@kernel.org>
"K. Watts" writes:
When the singlestep is enabled the ioctl to sent out when the kvm_cpu
is initialized (kvm-cpu.c in the for loop that gets each vcpu built).
When the ioct goes out the CPU is sitting at the initialization vector
of 0xf000:0xfff0 on CPU #0 and 0x000000 on the other SMP CPUs. The
new host kernel code that handles setting the TF was changed in 2.6.32
and again at 2.6.38. 2.6.32 seems just flat broken, but 2.6.38 checks
that the linear address of the RIP matches what it was when the
KVM_GUESTDBG_SINGLESTEP flag was set. Because the kvm-tool doesn't
start the CPU at the initialization vector (0xfff0) (0x7c00 for the
MBR and where ever you guys map the linux kernel to) they don't match
and the host kernel won't set the trap flag.
Basically the debug and singlestep ioclts need to happen after the CPU
has been initialized. I moved kvm_cpu__enable_singlestep to happen in
kvm_cpu__reset_vcpu after the registers are set (EIP points to boot
address) and the TRAP flags get set and all is good with the world.
Singlestepping is disabled when the guest issues a CLI because the
Linux host doesn't support features in new Intel and AMD CPUs. We can
sort of "shadow" the interrupt mask and still get the CPU to trap out
at ever instruction even when the guest has disabled interrupts on the
CPU. I have to get the bios disk handler working completely first,
but that may be my next task so that we can trace all the CPU
instructions. My current hack is to just re-enable the trap flag
every time a VMEXIT occurs. I get enough instructions to get me by.
This patch fixes the problem by moving the kvm_cpu__enable_singlestep() into
kvm_cpu__start().
Suggested-by: K. Watts <traetox@gmail.com>
Cc: Sasha Levin <levinsasha928@gmail.com>
Signed-off-by: Pekka Enberg <penberg@kernel.org>
Implement the keyboard reset method which allows guest kernel
to reboot the guest using the keyboard controller.
This will allow guest kernel to reboot the guest when it needs to,
for example - kernel panic (when passing "panic=1" as kernel parameter).
Signed-off-by: Sasha Levin <levinsasha928@gmail.com>
Signed-off-by: Pekka Enberg <penberg@kernel.org>
Coalescing MMIO allows us to avoid an exit every time we have a
MMIO write, instead - MMIO writes are coalesced in a ring which
can be flushed once an exit for a different reason is needed.
A MMIO exit is also trigged once the ring is full.
Coalesce all MMIO regions registered in the MMIO mapper.
Add a coalescing handler under kvm_cpu.
Signed-off-by: Sasha Levin <levinsasha928@gmail.com>
Signed-off-by: Pekka Enberg <penberg@kernel.org>
Allow pausing and unpausing guests running on the host.
Pausing a guest means that none of the VCPU threads are running
KVM_RUN until they are unpaused.
The following API functions are added:
void kvm__pause(void);
void kvm__continue(void);
void kvm__notify_paused(void);
Signed-off-by: Sasha Levin <levinsasha928@gmail.com>
Signed-off-by: Pekka Enberg <penberg@kernel.org>
Currently the VCPU loop would exit when the thread received any signal.
Change behaviour to exit only when SIGKVMEXIT is received. This change
prevents from the guest to terminate when unrelated signals are processed
by the thread (for example, when attaching a debugger).
Signed-off-by: Sasha Levin <levinsasha928@gmail.com>
Signed-off-by: Pekka Enberg <penberg@kernel.org>
When shutting down SMP guests only VCPU #0 will receive
a KVM_EXIT_SHUTDOWN. Waiting for all VCPU threads to exit
causes them to hang.
Instead, notify all VCPU threads once VCPU #0 thread is terminated
so they could also stop properly.
Signed-off-by: Sasha Levin <levinsasha928@gmail.com>
Signed-off-by: Pekka Enberg <penberg@kernel.org>
To make debugging easier, look up symbol from guest kernel image based on RIP
when user does 'kill -3' to the hypervisor.
Example output looks as follows:
Code:
-----
rip: [<ffffffff812cb3a0>] delay_loop+30 (/home/penberg/linux/arch/x86/lib/delay.c:32)
Cc: Asias He <asias.hejun@gmail.com>
Cc: Avi Kivity <avi@redhat.com>
Cc: Cyrill Gorcunov <gorcunov@gmail.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Prasad Joshi <prasadjoshi124@gmail.com>
Cc: Sasha Levin <levinsasha928@gmail.com>
Signed-off-by: Pekka Enberg <penberg@kernel.org>
This patch enables SMP support:
[ 0.155072] Brought up 3 CPUs
[ 0.155074] Total of 3 processors activated (15158.58 BogoMIPS).
virtio-console was being loaded no matter the cmdline options
and it was causing some hangs (have to look into that).
I'll send this patch to a larger audience once someone
else can confirm it actually works/doesn't work.
Signed-off-by: Sasha Levin <levinsasha928@gmail.com>
Signed-off-by: Pekka Enberg <penberg@kernel.org>
This patch makes the core hypervisor to use per-VCPU threads for execution.
NOTE: We only start one thread right now because we're unable to let the guest
kernel know about more cores at the moment.
Cc: Asias He <asias.hejun@gmail.com>
Cc: Cyrill Gorcunov <gorcunov@gmail.com>
Tested-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Pekka Enberg <penberg@kernel.org>