The vhost kernel driver does not support multiple queues yet.
Tweaking the queue number with "--net mode=tap,vhost=1,mq=2" fails
as below when lkvm tries to set the ring kick fd for queue 2:
VHOST_SET_VRING_KICK failed: No buffer space available
Report an error in this scenario, and override the setting with the
default single-queue configuration.
Signed-off-by: Fan Du <fan.du@intel.com>
Detach the tap device from the bridge automatically when exiting,
i.e. do the reverse of what the "script" option does.
Signed-off-by: Fan Du <fan.du@intel.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>
Due to our kernel heritage we have code in kvmtool that relies on
the (still) implicit -std=gnu89 compiler switch.
It turns out that this just affects some structure initialization,
where we currently provide a cast to the type, which upsets GCC for
anything beyond -std=gnu89 (for instance gnu99 or gnu11).
We do still need the casts where the initializer is not part of a
declaration of that type (e.g. in an assignment), so we add them
explicitly there.
This allows us to compile with all three GNU standards GCC
currently supports: gnu89/90, gnu99 and gnu11.
GCC threatens to move to gnu11 as the new default standard,
so let's fix this sooner rather than later.
(Compiling without GNU extensions still breaks, and I won't bother
fixing that without a very good reason.)
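As an illustration of the pattern, here is a minimal sketch using a hypothetical struct rb_sketch and RB_SKETCH_INIT macro (not kvmtool's actual definitions; the kernel's RB_ROOT is a similar cast-carrying initializer). The static object is initialized with a plain brace list, and the cast appears only in the assignment, where a compound literal is genuinely needed.

```c
#include <stddef.h>

/* Hypothetical structure, standing in for kernel-style types such as rb_root. */
struct rb_sketch {
	void *node;
	int count;
};

/* No cast in the macro itself: a bare brace list is a valid initializer
 * for static objects under gnu89, gnu99 and gnu11 alike. */
#define RB_SKETCH_INIT { .node = NULL, .count = 0 }

/* Writing "(struct rb_sketch)RB_SKETCH_INIT" here would turn the initializer
 * into a compound literal, which GCC only accepts for static objects in
 * gnu89 mode. */
static struct rb_sketch tree = RB_SKETCH_INIT;

/* Assignment is not initialization: here the cast is required, because it
 * turns the brace list into a compound literal of the right type. */
static void rb_sketch_reset(struct rb_sketch *t)
{
	*t = (struct rb_sketch)RB_SKETCH_INIT;
}
```

This compiles the same way with -std=gnu89, -std=gnu99 and -std=gnu11.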
Signed-off-by: Andre Przywara <andre.przywara@arm.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>
When registering the ioeventfds for a virtio-pci device, we cover both
the I/O ports and the MMIO BAR.
But as the current code advertises both as PIO, the host kernel puts
the ioeventfd for the MMIO region on the wrong bus.
Fix the issue by marking only the actual PIO area as PIO.
This fixes vhost-net on x86.
Signed-off-by: Andre Przywara <andre.przywara@arm.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>
In PCI config space there is an interrupt line field (offset 0x3c),
which is used to initially communicate the IRQ line number from
firmware to the OS. _Hardware_ should never use this information,
as the OS is free to write anything in there.
But kvmtool uses this number when it triggers IRQs in the guest,
which breaks starting with Linux 3.19-rc1, where the PCI layer starts
writing the virtual IRQ number in there.
Fix that by storing the IRQ number in a separate field in
struct virtio_pci, which is independent of the PCI config space
and cannot be influenced by the guest.
This fixes ARM/ARM64 guests using PCI with newer kernels.
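A minimal sketch of the idea, with hypothetical names (struct pci_dev_sketch, legacy_irq_line): the guest may overwrite the interrupt line byte (PCI_INTERRUPT_LINE, offset 0x3c in the standard header), while the device model keeps triggering the line it stored privately.

```c
#include <stdint.h>
#include <string.h>

#define CFG_INTERRUPT_LINE 0x3c	/* PCI_INTERRUPT_LINE in the standard header */

struct pci_dev_sketch {
	uint8_t config[256];	/* guest-visible config space */
	uint8_t legacy_irq_line;	/* private copy, never written by the guest */
};

static void pci_dev_sketch_init(struct pci_dev_sketch *d, uint8_t irq)
{
	memset(d->config, 0, sizeof(d->config));
	d->config[CFG_INTERRUPT_LINE] = irq;	/* advertised to firmware/OS */
	d->legacy_irq_line = irq;		/* what the emulation really uses */
}

/* Guest config-space write: the OS may store whatever it likes here. */
static void pci_cfg_write8(struct pci_dev_sketch *d, uint8_t off, uint8_t val)
{
	d->config[off] = val;
}

/* The line to raise comes from the private field, not from config space. */
static uint8_t pci_irq_to_trigger(const struct pci_dev_sketch *d)
{
	return d->legacy_irq_line;
}
```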
Signed-off-by: Andre Przywara <andre.przywara@arm.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>
VIRTIO_PCI_QUEUE_NOTIFY is 16-bit, and iowrite16 is used in
drivers/virtio/virtio_pci.c to notify the other side.
If the access size doesn't match, notification via an MMIO write fails.
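A minimal sketch of an access-size check for the notify register, assuming the legacy virtio-pci layout in which VIRTIO_PCI_QUEUE_NOTIFY sits at offset 16 and is two bytes wide. The handler and variable names are hypothetical; the point is that a 2-byte write has to be decoded as a 16-bit queue index.

```c
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

#define LEGACY_QUEUE_NOTIFY 16	/* VIRTIO_PCI_QUEUE_NOTIFY, 2 bytes wide */

static int last_notified_queue = -1;	/* hypothetical: records the last kick */

/* Hypothetical write handler for the legacy virtio-pci register block. */
static bool legacy_io_write(uint16_t offset, const void *data, int size)
{
	uint16_t queue;

	switch (offset) {
	case LEGACY_QUEUE_NOTIFY:
		/* The guest driver uses iowrite16(), so expect exactly 2 bytes. */
		if (size != 2)
			return false;
		memcpy(&queue, data, sizeof(queue));
		last_notified_queue = queue;
		return true;
	default:
		return false;
	}
}
```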
Signed-off-by: Andreas Herrmann <andreas.herrmann@caviumnetworks.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>
Otherwise vhost does not work if a virtio descriptor points into a
guest memory bank that was not registered as a
vhost_memory_region.
Signed-off-by: Andreas Herrmann <andreas.herrmann@caviumnetworks.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>
Since commit d2a7ddff4 (Add minimal support for macvtap), opening
the tap device might fail. lkvm shows:
Warning: Config tap device error. Are you root?
The cause is that virtio_net_request_tap passed the wrong pointer for
struct ifreq to the TUNSETIFF ioctl.
Signed-off-by: Andreas Herrmann <andreas.herrmann@caviumnetworks.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>
dpkg in the guest fails when it tries to use fsync() on a directory:
openat(AT_FDCWD, "/var/lib/dpkg", O_RDONLY|O_NONBLOCK|O_LARGEFILE|O_DIRECTORY|O_CLOEXEC) = 4
fsync(4) = -1 EINVAL (Invalid argument)
stracing lkvm shows that this is converted to:
openat(AT_FDCWD, "/root/rootfs-32//var/lib/dpkg", O_RDONLY|O_NONBLOCK|O_DIRECTORY|O_CLOEXEC) = 368
fsync(0) = -1 EINVAL (Invalid argument)
In other words, we sync against the wrong file descriptor. This case
is not handled in the kvmtool code; let's add support for it.
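The core of the fix can be sketched as follows, assuming a hypothetical fid structure that holds either a plain file descriptor or a DIR * for directories (kvmtool's own struct may differ). The sync request uses dirfd() to obtain the descriptor behind the directory stream instead of the unused file descriptor field.

```c
#define _POSIX_C_SOURCE 200809L
#include <dirent.h>
#include <stddef.h>
#include <unistd.h>

/* Hypothetical fid: regular files use fd, directories use dir. */
struct fid_sketch {
	int fd;		/* -1 when the fid refers to a directory */
	DIR *dir;	/* non-NULL when the fid refers to a directory */
};

static int fid_sketch_fsync(const struct fid_sketch *fid, int datasync)
{
	/* For directories, sync the descriptor backing the DIR stream. */
	int fd = fid->dir ? dirfd(fid->dir) : fid->fd;

	if (fd < 0)
		return -1;
	return datasync ? fdatasync(fd) : fsync(fd);
}
```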
Signed-off-by: Russell King <rmk@arm.linux.org.uk>
Signed-off-by: Will Deacon <will.deacon@arm.com>
The terminal handling thread and the virtio-net-ctrl thread don't
set their names, which ends up as follows:
terminal => lkvm
virtio-net-ctrl => kvm-cpu-X !!
Set the thread names explicitly to term-poll and virtio-net-ctrl,
respectively.
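A minimal sketch of naming the calling thread on Linux with prctl(PR_SET_NAME); the helper names are hypothetical (pthread_setname_np() is an alternative). The kernel limits names to 16 bytes including the terminating NUL, so "virtio-net-ctrl" (15 characters) just fits.

```c
#include <sys/prctl.h>

/* Hypothetical helper: name the calling thread (truncated to 15 chars). */
static int set_thread_name_sketch(const char *name)
{
	return prctl(PR_SET_NAME, (unsigned long)name, 0UL, 0UL, 0UL);
}

/* Hypothetical helper: read back the calling thread's name. */
static int get_thread_name_sketch(char buf[16])
{
	return prctl(PR_GET_NAME, (unsigned long)buf, 0UL, 0UL, 0UL);
}
```

Each thread would call the setter first thing in its start routine, since PR_SET_NAME only affects the caller.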
Signed-off-by: Suzuki K. Poulose <suzuki.poulose@arm.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>
include/kvm/types.h seems to have been in use once, but it no longer
contains any useful definition. Remove it.
Signed-off-by: Andre Przywara <andre.przywara@arm.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>
The 9p kernel code uses separate types for uid_t and gid_t. To avoid
changing too much code needlessly, copy over the kernel definitions
from uidgid.h into 9p.h.
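A sketch of such copied definitions, modeled on the kernel's include/linux/uidgid.h: wrapping the IDs in single-member structs turns uid/gid mix-ups into compile-time type errors, and the accessor helpers recover the raw value.

```c
#include <sys/types.h>

/* Distinct struct types: a kgid_t cannot be passed where a kuid_t is expected. */
typedef struct {
	uid_t val;
} kuid_t;

typedef struct {
	gid_t val;
} kgid_t;

static inline uid_t __kuid_val(kuid_t uid) { return uid.val; }
static inline gid_t __kgid_val(kgid_t gid) { return gid.val; }

/* Build a wrapped ID from a raw value (a compound literal). */
#define KUIDT_INIT(value) (kuid_t){ value }
#define KGIDT_INIT(value) (kgid_t){ value }
```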
Signed-off-by: Andre Przywara <andre.przywara@arm.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>
In order to be usable by kvmtool, a macvtap interface requires
some minimal configuration (basically setting up the offload bits).
This requires skipping some of the low-level TUN/TAP setup.
To avoid adding yet another option, we extend the 'tapif' option
to detect the use of a file (such as /dev/tap23).
Assuming you've run the following as root:
# ip link add link eth0 name kvmtap0 type macvtap mode bridge
# chgrp kvm /dev/tap$(< /sys/class/net/kvmtap0/ifindex)
# chmod g+rw /dev/tap$(< /sys/class/net/kvmtap0/ifindex)
it is fairly easy to write a script that does the following:
#!/bin/sh
addr=$(cat /sys/class/net/kvmtap0/address)
tap=/dev/tap$(cat /sys/class/net/kvmtap0/ifindex)
kvmtool/lkvm run --console virtio \
-k /boot/zImage \
-p "console=hvc0 earlyprintk" \
-n trans=mmio,mode=tap,tapif=$tap,guest_mac=$addr
and you now have your VM running, directly attached to the network.
This patch also removes the TUNSETNOCSUM ioctl, which has been declared
obsolete for quite some time now...
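One way to detect the file case is sketched below; the helper name is hypothetical and this is not necessarily how the patch does it. A tapif value that is an absolute path to a character device (such as /dev/tap23) is treated as a macvtap node to open directly, and anything else is taken as a TUN/TAP interface name.

```c
#include <stdbool.h>
#include <stddef.h>
#include <sys/stat.h>

/* Hypothetical helper: true if tapif names a character device node. */
static bool tapif_is_device_path(const char *tapif)
{
	struct stat st;

	if (tapif == NULL || tapif[0] != '/')
		return false;	/* plain interface name, e.g. "kvmtap0" */
	return stat(tapif, &st) == 0 && S_ISCHR(st.st_mode);
}
```

Requiring a leading '/' avoids misreading an interface name that happens to match a file in the current directory.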
Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>
If an open on the 9p server (host) fails with EMFILE (too many open files
for the process), we should return ENFILE (too many open files in the
system) to the guest, which reflects the actual situation within the guest.
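The translation itself is a one-liner. A sketch with a hypothetical helper name, applied to the host errno before it is placed in the 9p error reply:

```c
#include <errno.h>

/* Hypothetical helper: EMFILE describes the lkvm process's own descriptor
 * limit, which from the guest's point of view is a system-wide exhaustion. */
static int p9_guest_errno(int host_err)
{
	return host_err == EMFILE ? ENFILE : host_err;
}
```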
This was uncovered by LTP, where getdtablesize01 fails to open the maximum
number of open files:
getdtablesize01 0 TINFO : Maximum number of files a process can have opened is 1024
getdtablesize01 0 TINFO : Checking with the value returned by getrlimit...RLIMIT_NOFILE
getdtablesize01 1 TPASS : got correct dtablesize, value is 1024
getdtablesize01 0 TINFO : Checking Max num of files that can be opened by a process.Should be: RLIMIT_NOFILE - 1
getdtablesize01 2 TFAIL : getdtablesize01.c:102: 974 != 1023
For a more practical impact:
# ./getdtablesize01 &
[1] 1834
getdtablesize01 0 TINFO : Maximum number of files a process can have opened is 1024
getdtablesize01 0 TINFO : Checking with the value returned by getrlimit...RLIMIT_NOFILE
getdtablesize01 1 TPASS : got correct dtablesize, value is 1024
getdtablesize01 0 TINFO : Checking Max num of files that can be opened by a process.Should be: RLIMIT_NOFILE - 1
getdtablesize01 2 TFAIL : getdtablesize01.c:102: 974 != 1023
[--- Modified to sleep indefinitely, without closing the files --- ]
# ls
bash: /bin/ls: Too many open files
That gives bash a misleading error message: getdtablesize01 has exhausted
the system-wide limit, not bash's per-process one.
With the fix, we get:
# ls
bash: /bin/ls: Too many open files in system
Acked-by: Will Deacon <will.deacon@arm.com>
Signed-off-by: Suzuki K. Poulose <suzuki.poulose@arm.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>
Currently we describe every interrupt for each device in the FDT
as being edge triggered.
Add a parameter to the irq property generation to allow devices to
specify their interrupts as level triggered if needed.
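A sketch of what a level/edge parameter might select when filling the three-cell GIC "interrupts" property. The constants match the Linux dt-bindings headers (GIC_SPI 0, IRQ_TYPE_EDGE_RISING 1, IRQ_TYPE_LEVEL_HIGH 4); the function name, the enum, and the assumption that SPIs start at host IRQ 32 are illustrative.

```c
#include <stdint.h>

/* Values from the Linux dt-bindings interrupt-controller headers. */
#define GIC_SPI			0
#define IRQ_TYPE_EDGE_RISING	1
#define IRQ_TYPE_LEVEL_HIGH	4

/* Assumption: SPIs are numbered from 32 in the host's IRQ space. */
#define SPI_BASE		32

enum irq_trigger_sketch { IRQ_EDGE, IRQ_LEVEL };

/* Fill the three cells in host byte order; real code would convert each
 * cell to big-endian (e.g. with cpu_to_fdt32) before writing the property. */
static void fdt_irq_cells(uint32_t cells[3], uint32_t irq,
			  enum irq_trigger_sketch trig)
{
	cells[0] = GIC_SPI;
	cells[1] = irq - SPI_BASE;
	cells[2] = trig == IRQ_LEVEL ? IRQ_TYPE_LEVEL_HIGH : IRQ_TYPE_EDGE_RISING;
}
```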
Signed-off-by: Andre Przywara <andre.przywara@arm.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>
By default, lkvm sets up a virtio-pci transport for the network if none is
specified. This can be a problem on architectures (e.g. ARM64) where
virtio-pci is not supported yet, causing the following warning at exit:
# KVM compatibility warning.
virtio-net device was not detected.
This patch changes it to use the default transport method for the
architecture when none is specified. This ensures that on every arch
the network comes up by default in the VM.
Signed-off-by: Suzuki K. Poulose <suzuki.poulose@arm.com>
Acked-by: Will Deacon <will.deacon@arm.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>
The recent introduction of bi-endianness on arm/arm64 had the
odd effect of breaking virtio-pci support on these platforms, as the
device endian field defaults to being VIRTIO_ENDIAN_HOST, which
is the wrong thing to have on a bi-endian capable architecture.
The fix is to check for the endianness on the ioport path the
same way we do for MMIO, which implies passing the vcpu all
the way down. The patch is a bit ugly, but aligns MMIO and ioport nicely.
Tested on arm64 and x86.
Acked-by: Will Deacon <will.deacon@arm.com>
Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
Signed-off-by: Pekka Enberg <penberg@kernel.org>
Configure the queues to follow the guest endianness, and make sure
the configuration space is doing the same.
Extra care is taken for the handling of the virtio_net_hdr structures
on both the TX and RX ends.
Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
Signed-off-by: Pekka Enberg <penberg@kernel.org>
Configure the queues to follow the guest endianness, and make sure
the configuration space is doing the same.
Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
Signed-off-by: Pekka Enberg <penberg@kernel.org>
Configure the queues to follow the guest endianness, and make sure
the configuration space is doing the same.
Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
Signed-off-by: Pekka Enberg <penberg@kernel.org>
Configure the queues to follow the guest endianness, and make sure
the configuration space is doing the same.
Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
Signed-off-by: Pekka Enberg <penberg@kernel.org>
Save the CPU endianness when the device is reset. It is widely
assumed that the guest won't change its endianness afterwards, or at
least not without resetting the device first.
A default implementation of the endianness sampling just returns
the default "host endianness" value so that unsuspecting architectures
are not affected.
Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
Signed-off-by: Pekka Enberg <penberg@kernel.org>
Define a simple infrastructure to configure a virt_queue
depending on the guest endianness, as reported by the feature
flags. At this stage, the endianness is always the host's.
Wrap all accesses to virt_queue data structures shared between
host and guest with byte swapping helpers.
Should the architecture only support one endianness, these helpers
are reduced to the identity function.
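A minimal sketch of such helpers, with hypothetical names (struct vq_sketch, vq_to_host_u16/u32). The queue records the guest's endianness; when it matches the host's, the helpers reduce to the identity, otherwise they byte-swap. Because a byte swap is its own inverse, the same helpers also convert host values back to guest order.

```c
#include <stdint.h>

enum vq_endian { VQ_ENDIAN_LE, VQ_ENDIAN_BE };

struct vq_sketch {
	enum vq_endian endian;	/* guest endianness, sampled at device reset */
};

static enum vq_endian host_endian(void)
{
	const uint16_t probe = 1;

	/* On a little-endian host the low-order byte is stored first. */
	return *(const uint8_t *)&probe == 1 ? VQ_ENDIAN_LE : VQ_ENDIAN_BE;
}

static uint16_t vq_to_host_u16(const struct vq_sketch *vq, uint16_t v)
{
	if (vq->endian == host_endian())
		return v;	/* identity: nothing to swap */
	return (uint16_t)((v >> 8) | (v << 8));
}

static uint32_t vq_to_host_u32(const struct vq_sketch *vq, uint32_t v)
{
	if (vq->endian == host_endian())
		return v;
	return ((v & 0xffu) << 24) | ((v & 0xff00u) << 8) |
	       ((v >> 8) & 0xff00u) | (v >> 24);
}
```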
Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
Signed-off-by: Pekka Enberg <penberg@kernel.org>
In order to be able to find out about the endianness of a virtual
CPU, it is necessary to pass a pointer to the kvm_cpu structure
down to the MMIO accessors.
This patch just pushes such a pointer as far down as required for the
MMIO accessors to have a play with the vcpu.
Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
Signed-off-by: Pekka Enberg <penberg@kernel.org>
In order to overcome the fact that a TAP interface can only be created
by root, allow the use of an interface that has already been created,
configured, made persistent and owned by a specific user/group (such
as done with tunctl).
In this case, any kind of configuration can be skipped (IP, up and
running mode), and the TAP is assumed to be ready for use.
This is done by introducing the "tapif" option, as used here:
--network trans=mmio,mode=tap,tapif=blah
where "blah" is a TAP interface.
This allows the creation/configuration of the interface to be controlled
by root, and lkvm to be run as a normal user.
Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
Signed-off-by: Pekka Enberg <penberg@kernel.org>
Upstream commit "9p: Modify the stat structures to use kuid_t and kgid_t"
has modified the type of uid and gid in the stat structure, which breaks
the build for us.
This is a rather trivial conversion from u32 to kuid_t and kgid_t.
Reported-by: Ingo Molnar <mingo@kernel.org>
Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
Signed-off-by: Pekka Enberg <penberg@kernel.org>
A recent -next patch named "PCI: Ignore BAR contents when
firmware left decoding disabled" has pointed out that PCI
devices are supposed to enable I/O and/or memory decoding in
their command register for the PIO and MMIO BARs they implement;
the contents of BARs whose decoding was left disabled are ignored.
Fix it by correctly marking our emulated PCI device as PIO/MMIO
enabled.
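A sketch of deriving the initial command register value from the BARs a device exposes. The bit values match PCI_COMMAND_IO and PCI_COMMAND_MEMORY in the Linux pci_regs.h header; the helper itself is hypothetical.

```c
#include <stdint.h>

/* Command register (config offset 0x04) bits, as in pci_regs.h. */
#define CMD_IO		0x1	/* PCI_COMMAND_IO: decode I/O space */
#define CMD_MEMORY	0x2	/* PCI_COMMAND_MEMORY: decode memory space */

/* Hypothetical helper: enable decoding for each kind of BAR present. */
static uint16_t initial_pci_command(int has_io_bar, int has_mem_bar)
{
	uint16_t cmd = 0;

	if (has_io_bar)
		cmd |= CMD_IO;
	if (has_mem_bar)
		cmd |= CMD_MEMORY;
	return cmd;
}
```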
Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
Signed-off-by: Pekka Enberg <penberg@kernel.org>
This patch changes VIRTIO_DEFAULT_TRANS to take a struct kvm parameter,
allowing architectures to choose the default transport dynamically.
For ARM, this is driven by an arch-specific cmdline option.
Signed-off-by: Will Deacon <will.deacon@arm.com>
Signed-off-by: Pekka Enberg <penberg@kernel.org>
For the MMIO and PCI buses, drivers typically allocate an IRQ line for
their device before registering the device with the device tree for the
relevant bus.
This patch moves the IRQ allocation into the bus code, which is then
called directly by the device tree when a new device is registered.
IOPORT devices, however, tend to use hardcoded IRQs for legacy reasons,
so they are still required to deal with their interrupts (which also
require remapping for non-x86 architectures).
Signed-off-by: Will Deacon <will.deacon@arm.com>
Signed-off-by: Pekka Enberg <penberg@kernel.org>
Since irq__register_device no longer registers a device with anything,
rename it to irq__alloc_line, which better describes what is actually
going on.
Signed-off-by: Will Deacon <will.deacon@arm.com>
Signed-off-by: Pekka Enberg <penberg@kernel.org>
With the removal of the x86 irq rbtree, the only parameter still used
by irq__register_device merely serves to return the new line.
This patch removes all of the parameters from irq__register_device and
returns the allocated line directly.
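A sketch of the resulting shape, with a hypothetical name and an arbitrary base line: the allocator takes no parameters and returns the next free line directly, instead of filling in an out-parameter.

```c
/* Hypothetical first line handed out; real bases are arch specific. */
#define IRQ_SKETCH_BASE 5

static int next_irq_line = IRQ_SKETCH_BASE;

/* No parameters: just hand out the next free line. */
static int irq_alloc_line_sketch(void)
{
	return next_irq_line++;
}
```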
Signed-off-by: Will Deacon <will.deacon@arm.com>
Signed-off-by: Pekka Enberg <penberg@kernel.org>
In preparation for moving the irq allocation into generic code, remove
the pin parameter from irq__register_device and temporarily place the
onus on the emulation driver to allocate the pin (which is always 1 and
only used on PCI anyway).
Signed-off-by: Will Deacon <will.deacon@arm.com>
Signed-off-by: Pekka Enberg <penberg@kernel.org>
This patch adds an MMIO interface for each virtio-pci device, so that
they can be accessed without having to use an ioport. For each device, a
new memory BAR is added which corresponds to an area of MMIO space with
a shim trap handler. This handler simply translates the access into an
ioport access via kvm__emulate_io. Since guests can generate accesses
via either the ioport or MMIO regions, an ioeventfd is registered for
both.
Signed-off-by: Will Deacon <will.deacon@arm.com>
Signed-off-by: Pekka Enberg <penberg@kernel.org>
When attempting to initialise a mixture of pci and mmio virtio devices,
we cannot share an ops structure, otherwise the transport-specific
fields (init/exit and signal handling) will be globally set to the
transport of the last registered device.
This patch dynamically allocates a new ops structure for each instance
of a virtio net device.
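A minimal sketch of the per-instance approach, with hypothetical names: a const template holds the common callbacks, and each device gets its own heap copy whose transport-specific fields can be set without affecting any other device.

```c
#include <stdlib.h>

struct ops_sketch {
	const char *transport;	/* stands in for transport-specific hooks */
	int (*signal)(int vq);
};

static int dummy_signal(int vq) { return vq; }

/* Shared template: never modified after start-up. */
static const struct ops_sketch net_ops_template = {
	.transport = NULL,
	.signal = dummy_signal,
};

/* Each device instance gets its own copy of the ops. */
static struct ops_sketch *ops_sketch_new(const char *transport)
{
	struct ops_sketch *ops = malloc(sizeof(*ops));

	if (!ops)
		return NULL;
	*ops = net_ops_template;
	ops->transport = transport;	/* only this instance is affected */
	return ops;
}
```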
Signed-off-by: Will Deacon <will.deacon@arm.com>
Signed-off-by: Pekka Enberg <penberg@kernel.org>
Currently, if a ->tx or ->rx callback into the net_dev_operations
encounters an error, it returns -1 to the virtio-net code, which in turn
treats this as an unsigned (size_t) size describing the data available.
The resulting memcpy operation then quickly explodes with a SEGV.
This patch detects the error code from the low-level callbacks and
exits the thread dealing with the erroneous queue.
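A sketch of the check, assuming a hypothetical callback type that returns ssize_t (a byte count, or -1 on error) and a fake callback used only for demonstration. Testing the signed result before it is used as a length keeps -1 from turning into SIZE_MAX.

```c
#include <stdio.h>
#include <sys/types.h>

/* Hypothetical rx callback: bytes read, or -1 on error. */
typedef ssize_t (*rx_fn)(void *buf, size_t len);

/* Returns the number of packets handled before the callback failed. */
static int rx_loop_sketch(rx_fn rx, void *buf, size_t len, int max_iter)
{
	int handled = 0;

	while (handled < max_iter) {
		ssize_t n = rx(buf, len);

		if (n < 0) {
			/* Never reinterpret -1 as a size_t length: stop this queue. */
			fprintf(stderr, "rx callback failed, stopping queue\n");
			break;
		}
		/* ... copy n bytes into the guest's buffers here ... */
		handled++;
	}
	return handled;
}

/* Demonstration only: succeeds twice, then fails. */
static int fake_calls;
static ssize_t fake_rx(void *buf, size_t len)
{
	(void)buf;
	(void)len;
	return ++fake_calls <= 2 ? 64 : -1;
}
```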
Signed-off-by: Will Deacon <will.deacon@arm.com>
Signed-off-by: Pekka Enberg <penberg@kernel.org>
Despite not being used anymore, there are still traces of BAR 3 in both
the code and comments for the virtio pci msix implementation.
This patch removes the redundant code and fixes up the comments to match
what we're actually doing.
Cc: Sasha Levin <sasha.levin@oracle.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>
Signed-off-by: Pekka Enberg <penberg@kernel.org>
Despite allocating and providing code to handle accesses to the pba
structure, virtio msix blocks in fact only register the msix table with
kvm.
This patch fixes the MMIO region so that it includes the pba structure
for msix io blocks of virtio pci devices. The corresponding BAR is also
updated to advertise the full size of the io block.
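A sketch of the size arithmetic, assuming the PBA is laid out directly after the MSI-X table in the same region. Per the PCI specification each table entry is 16 bytes and the PBA holds one pending bit per vector, packed into 64-bit words. Helper names are hypothetical, and a real BAR size would additionally be rounded up to a power of two.

```c
#include <stddef.h>

#define MSIX_ENTRY_SIZE 16	/* address low/high, data, vector control */

static size_t msix_table_size(unsigned int nvec)
{
	return (size_t)nvec * MSIX_ENTRY_SIZE;
}

/* One bit per vector, rounded up to whole 64-bit words. */
static size_t msix_pba_size(unsigned int nvec)
{
	return (size_t)((nvec + 63) / 64) * 8;
}

/* Assumption: the PBA directly follows the table in the same region. */
static size_t msix_region_size(unsigned int nvec)
{
	return msix_table_size(nvec) + msix_pba_size(nvec);
}
```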
Cc: Sasha Levin <sasha.levin@oracle.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>
Signed-off-by: Pekka Enberg <penberg@kernel.org>
After feature negotiation, kvmtool should tell vhost-net that it is
using mergeable rx buffers.
Signed-off-by: Ying-Shiuan Pan <yspan@itri.org.tw>
Signed-off-by: Pekka Enberg <penberg@kernel.org>
Enabling vhost-net encountered an error:
Fatal: VHOST_NET_SET_BACKEND failed 88
The reason is that vhost-net requires tap_fd for VHOST_NET_SET_BACKEND,
but tap_fd is only opened after VIRTIO_CONFIG_S_DRIVER_OK. Since the
initialization needs to know the guest features, it can be moved to
set_guest_features(). That way, initialization is finished before the
status reaches VIRTIO_CONFIG_S_DRIVER_OK, and tap_fd is set before
vhost-net sets the backend.
Signed-off-by: Ying-Shiuan Pan <yspan@itri.org.tw>
Signed-off-by: Pekka Enberg <penberg@kernel.org>
Milan Kocian writes:
I found the crash in virtio-net-rx thread (I can reproduce it every
time by 'aptitude update' in VM):
traps: virtio-net-rx[28933] general protection ip:7f00dda3d107 sp:7f00c58f4de8 error:0 in libc-2.17.so[7f00dd90f000+1a2000]
gdb backtrace:
(gdb) bt
#0 0x00007fb6a548e107 in ?? () from /lib/x86_64-linux-gnu/libc.so.6
#1 0x000000000041259c in memcpy_toiovecend (iov=0x7fb68d346ea0, iov@entry=0x7fb68d345e90,
kdata=<optimized out>, kdata@entry=0x7fb68d346e90 "", offset=<optimized out>, len=<optimized out>)
at util/iovec.c:70
#2 0x000000000040c66d in virtio_net_rx_thread (p=0x23688a0) at virtio/net.c:117
#3 0x00007fb6a5b2ee0e in start_thread () from /lib/x86_64-linux-gnu/libpthread.so.0
#4 0x00007fb6a54489ed in clone () from /lib/x86_64-linux-gnu/libc.so.6
I tried to add some printf to diagnose it but it isn't clear to me:
virtio_net_rx_thread: before memcpy_toiovecend; copied: 0, len: 18890, iovsize: 4096, realiovsize: 4096
memcpy_toiovecend: offset: 0, len: 4096
memcpy_toiovecend: iov_len: 4096, len: 4096
virtio_net_rx_thread: before memcpy_toiovecend; copied: 4096, len: 18890, iovsize: 4096, realiovsize: 4096
memcpy_toiovecend: offset: 4096, len: 4096
memcpy_toiovecend: iov_len: 4096, len: 4096
memcpy_toiovecend: iov_len: 0, len: 4096
memcpy_toiovecend: iov_len: 0, len: 4096
.
N x memcpy_toiovecend: iov_len: 0, len: 4096
.
memcpy_toiovecend: iov_len: 0, len: 4096
memcpy_toiovecend: iov_len: 0, len: 4096
memcpy_toiovecend: iov_len: 1519143547641528320, len: 4096
memcpy_toiovecend: iov_len: 193827583623176, len: 4096
./runlkvm.sh: line 2: 16090 Segmentation fault
IMHO the problem occurs when the received len is bigger than the maximum
of the dst iovec (realiovsize). Only the iovec size is copied, and in the
next run there is no room to copy the rest of len.
Asias He writes:
We should skip the copied bytes in the buffer, not in the iov itself,
which is what memcpy_toiovecend does.
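A sketch of the corrected idea (not the actual patch): the number of bytes already copied is an offset into the source buffer, while the destination iovecs are consumed in order and advanced as they fill. The helper name is hypothetical.

```c
#include <string.h>
#include <sys/uio.h>

/* Copy len bytes from buf into iov[], advancing each iovec as it fills.
 * Returns the number of bytes copied (less than len if the iovecs run out). */
static size_t copy_to_iovec(struct iovec *iov, int iovcnt,
			    const void *buf, size_t len)
{
	const unsigned char *src = buf;
	size_t copied = 0;

	for (int i = 0; i < iovcnt && copied < len; i++) {
		size_t n = iov[i].iov_len;

		if (n == 0)
			continue;		/* already full */
		if (n > len - copied)
			n = len - copied;
		/* The offset applies to the source buffer, not the iovec. */
		memcpy(iov[i].iov_base, src + copied, n);
		iov[i].iov_base = (unsigned char *)iov[i].iov_base + n;
		iov[i].iov_len -= n;
		copied += n;
	}
	return copied;
}
```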
Reported-and-tested-by: Milan Kocian <milon@wq.cz>
Signed-off-by: Asias He <asias.hejun@gmail.com>
Signed-off-by: Pekka Enberg <penberg@iki.fi>
The asynchronous nature of the virtio input handling (using a job queue)
can result in unnecessary jobs being created if there is some delay in
handling input (the original function to handle the input returns immediately
without the file having been read, and hence poll returns immediately
informing us of data to read).
This patch adds synchronisation to the threads so that we don't start
polling input files again until we've read from the console.
Signed-off-by: Jonathan Austin <jonathan.austin@arm.com>
Acked-by: Marc Zyngier <marc.zyngier@arm.com>
Signed-off-by: Pekka Enberg <penberg@kernel.org>
A recent fix to virtio MMIO (72a7541ce305 ["kvm tools: virtio-mmio:
init_ioeventfd should use MMIO for ioeventfd__add_event()"]) highlighted
the confusing parameters expected by ioeventfd__add_event.
As per Pekka's suggestion, replace the bool parameters to this function
with a single `flags' argument instead.
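A sketch of a flags-based interface with hypothetical names: each former bool becomes one bit, so a call site such as ioevent_add_sketch(&ev, addr, IOEVENTFD_FLAG_PIO) states its intent, where add(ev, true, false) did not.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical flag names, one bit per former bool parameter. */
#define IOEVENTFD_FLAG_PIO		(1u << 0)	/* port I/O bus, else MMIO */
#define IOEVENTFD_FLAG_USER_POLL	(1u << 1)	/* also poll in userspace */

enum bus_sketch { BUS_MMIO, BUS_PIO };

struct ioevent_sketch {
	uint64_t addr;
	enum bus_sketch bus;
	bool user_poll;
};

static void ioevent_add_sketch(struct ioevent_sketch *ev, uint64_t addr,
			       unsigned int flags)
{
	ev->addr = addr;
	ev->bus = (flags & IOEVENTFD_FLAG_PIO) ? BUS_PIO : BUS_MMIO;
	ev->user_poll = (flags & IOEVENTFD_FLAG_USER_POLL) != 0;
}
```

A virtio-mmio device would simply pass 0 for the bus bit, which is the case the earlier virtio-mmio fix had to get right.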
Cc: Ying-Shiuan Pan <yingshiuan.pan@gmail.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>
Signed-off-by: Pekka Enberg <penberg@kernel.org>
This patch fixes a bug where virtio_mmio_init_ioeventfd() passed a wrong
value when it invoked ioeventfd__add_event(). A true value for the 2nd
parameter indicates that the eventfd uses the PIO bus, which is what
virtio-pci uses; for virtio-mmio, however, the value should be false.
Signed-off-by: Ying-Shiuan Pan <yspan@itri.org.tw>
Signed-off-by: Pekka Enberg <penberg@kernel.org>
When fa7226f (kvm tools: init network devices only when the virtio
driver is ready to go) was introduced, a tiny detail was overlooked:
- Initialization of the uip layer is now coming in very late (only
when the guest driver says it is ready).
- In parallel, the rx thread is created quite early (as soon as the
queues are allocated).
This causes the rx thread to call uip_rx, which calls uip_buf_get_used,
which starts using the buf_lock mutex and the buf_used_cond condition,
neither of which has been initialized yet. Tears and devastation follow, not to mention a
certain lack of network connectivity for the unsuspecting guest.
The (not so pretty) fix is to split uip_init:
- uip_static_init: initialize the lists, mutexes and conditions,
called from virtio_net__init_one.
- uip_init: perform the dynamic memory allocations, called from
notify_status.
This allows the network to be safely initialized.
Cc: Sasha Levin <sasha.levin@oracle.com>
Cc: Pekka Enberg <penberg@kernel.org>
Cc: Will Deacon <will.deacon@arm.com>
Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
Signed-off-by: Pekka Enberg <penberg@kernel.org>