The manpage of poll(2) states that the prototype of poll is defined
in <poll.h>. Use that header file instead of <sys/poll.h> to allow
compilation against musl-libc.
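As an illustration only (not code from this patch), a caller that needs poll() can include the POSIX header directly. The helper name read_if_ready is hypothetical:

```c
#include <poll.h>	/* POSIX home of poll(), as documented in poll(2) */
#include <unistd.h>

/*
 * Read up to len bytes from fd without blocking.
 * Returns the byte count from read(), 0 if nothing is pending,
 * or -1 on error.
 */
ssize_t read_if_ready(int fd, void *buf, size_t len)
{
	struct pollfd pfd = { .fd = fd, .events = POLLIN };
	int ret = poll(&pfd, 1, 0);	/* timeout 0: return immediately */

	if (ret < 0)
		return -1;
	if (ret == 0 || !(pfd.revents & POLLIN))
		return 0;
	return read(fd, buf, len);
}
```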
Signed-off-by: Andre Przywara <andre.przywara@arm.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>
Some functions in qcow.c return u64, but their results are compared
with < 0 to detect the -1 error return value. Such a comparison is
always false for an unsigned type.
Compare explicitly against -1 cast to u64 to express this properly.
GCC silently compiled the check out, but clang complained about it.
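A minimal sketch of the pattern, assuming a hypothetical lookup function (find_offset and lookup_failed are not names from qcow.c):

```c
#include <stdint.h>

typedef uint64_t u64;

/* Hypothetical lookup: returns an offset, or (u64)-1 on failure. */
static u64 find_offset(int key)
{
	if (key < 0)
		return (u64)-1;	/* error sentinel: all bits set */
	return (u64)key * 512;
}

/* Returns 1 if the lookup failed, 0 otherwise. */
int lookup_failed(int key)
{
	u64 off = find_offset(key);

	/*
	 * "off < 0" can never be true for an unsigned value, so a
	 * compiler may drop it. Comparing against the cast sentinel
	 * states the intent and is well defined.
	 */
	return off == (u64)-1;
}
```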
Signed-off-by: Andre Przywara <andre.przywara@arm.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>
Due to our kernel heritage we have code in kvmtool that relies on
the (still) implicit -std=gnu89 compiler switch.
It turns out that this just affects some structure initialization,
where we currently provide a cast to the type, which upsets GCC for
anything beyond -std=gnu89 (for instance gnu99 or gnu11).
We do need the casts when initializing structures that are not
assigned to the same type, so we put it there explicitly.
This allows us to compile with all the three GNU standards GCC
currently supports: gnu89/90, gnu99 and gnu11.
GCC threatens to move to gnu11 as the new default standard, so let's
fix this sooner rather than later.
(Compiling without GNU extensions still breaks, and I won't bother
fixing that without a very good reason.)
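The distinction can be sketched as follows. The struct, the macro and the values are made up for illustration and do not come from kvmtool:

```c
struct range {
	unsigned long start;
	unsigned long len;
};

/* Hypothetical helper macro: the cast makes it usable in assignments. */
#define RANGE(s, l)	((struct range){ .start = (s), .len = (l) })

/* Initialization of the same type: a plain brace initializer, no cast. */
static struct range boot_range = { .start = 0x1000, .len = 0x200 };

unsigned long range_end(void)
{
	struct range r = boot_range;

	/* Assignment (not initialization) needs the explicit cast. */
	r = RANGE(r.start + r.len, 0x100);
	return r.start + r.len;
}
```

Written this way, the file-scope object does not depend on the compound-literal initializer form that the message above describes as a problem beyond -std=gnu89.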
Signed-off-by: Andre Przywara <andre.przywara@arm.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>
This should fix the following warnings
builtin-stat.c:93:3: warning: format '%llu' expects argument of type 'long long unsigned int', but argument 2 has type '__u64' [-Wformat]
builtin-run.c:188:4: warning: format '%Lu' expects argument of type 'long long unsigned int', but argument 3 has type '__u64' [-Wformat]
builtin-run.c:554:3: warning: format '%llu' expects argument of type 'long long unsigned int', but argument 2 has type 'u64' [-Wformat]
builtin-run.c:554:3: warning: format '%llu' expects argument of type 'long long unsigned int', but argument 3 has type 'u64' [-Wformat]
builtin-run.c:645:3: warning: format '%Lu' expects argument of type 'long long unsigned int', but argument 4 has type 'u64' [-Wformat]
disk/core.c:330:4: warning: format '%llu' expects argument of type 'long long unsigned int', but argument 4 has type '__dev_t' [-Wformat]
disk/core.c:330:4: warning: format '%llu' expects argument of type 'long long unsigned int', but argument 5 has type '__dev_t' [-Wformat]
disk/core.c:330:4: warning: format '%llu' expects argument of type 'long long unsigned int', but argument 6 has type '__ino64_t' [-Wformat]
mmio.c:134:5: warning: format '%llx' expects argument of type 'long long unsigned int', but argument 4 has type 'u64' [-Wformat]
util/util.c:101:7: warning: format '%lld' expects argument of type 'long long int', but argument 3 has type 'u64' [-Wformat]
util/util.c:113:7: warning: format '%lld' expects argument of type 'long long int', but argument 2 has type 'u64' [-Wformat]
hw/pci-shmem.c:339:3: warning: format '%llx' expects argument of type 'long long unsigned int', but argument 2 has type 'u64' [-Wformat]
hw/pci-shmem.c:340:3: warning: format '%llx' expects argument of type 'long long unsigned int', but argument 2 has type 'u64' [-Wformat]
as observed when compiling on mips64.
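The message does not say how the warnings were fixed. One common approach, shown here purely as an assumption, is to cast the argument to the type the format expects (the PRIu64 macros from <inttypes.h> are another option):

```c
#include <stdint.h>
#include <stdio.h>

typedef uint64_t u64;	/* stand-in; may be unsigned long on 64-bit targets */

/* Format a 64-bit value portably by casting to the type %llu expects. */
int format_size(char *buf, size_t len, u64 bytes)
{
	return snprintf(buf, len, "%llu", (unsigned long long)bytes);
}
```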
Signed-off-by: Andreas Herrmann <andreas.herrmann@caviumnetworks.com>
Signed-off-by: Pekka Enberg <penberg@kernel.org>
Give threads a meaningful name. This makes debugging much easier, and
everything else much prettier.
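A minimal sketch of naming a thread on Linux, assuming prctl(PR_SET_NAME) is the mechanism. The wrapper names are hypothetical:

```c
#include <sys/prctl.h>

/* Name the calling thread; Linux keeps at most 15 characters plus NUL. */
int set_thread_name(const char *name)
{
	return prctl(PR_SET_NAME, (unsigned long)name);
}

/* Read the calling thread's name into buf (at least 16 bytes). */
int get_thread_name(char *buf)
{
	return prctl(PR_GET_NAME, (unsigned long)buf);
}
```

The name then shows up in tools that list per-thread names, which is what makes debugging easier.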
Suggested-by: Ingo Molnar <mingo@kernel.org>
Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
[ penberg@kernel.org: specify vcpu names ]
Signed-off-by: Pekka Enberg <penberg@kernel.org>
Switch to using init/exit calls instead of the repeating call blocks in builtin-run.
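The idea can be sketched with a table of hooks. This is not kvmtool's actual mechanism, and every name below is an assumption made for illustration:

```c
#include <stddef.h>

struct subsys {
	const char *name;
	int (*init)(void);
	int (*exit)(void);
};

static int initialized;	/* number of subsystems currently up */

static int demo_init(void) { initialized++; return 0; }
static int demo_exit(void) { initialized--; return 0; }
static int fail_init(void) { return -1; }

/* Run every init hook in order; on failure, unwind those that succeeded. */
int run_initcalls(const struct subsys *tbl, size_t n)
{
	size_t i;

	for (i = 0; i < n; i++) {
		if (tbl[i].init() < 0) {
			while (i-- > 0)
				tbl[i].exit();
			return -1;
		}
	}
	return 0;
}

/* Run exit hooks in reverse order of initialization. */
void run_exitcalls(const struct subsys *tbl, size_t n)
{
	while (n-- > 0)
		tbl[n].exit();
}

int subsystems_up(void)
{
	return initialized;
}
```

One loop over the table replaces a hand-written block of calls per subsystem.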
Signed-off-by: Sasha Levin <levinsasha928@gmail.com>
Signed-off-by: Pekka Enberg <penberg@kernel.org>
Move the I/O debug delay into kvm_config, move the parser out of builtin-run
into the disk code, and make the init/exit functions match the style of the
rest of the code.
Signed-off-by: Sasha Levin <levinsasha928@gmail.com>
Signed-off-by: Pekka Enberg <penberg@kernel.org>
This patch brings virtio-scsi support to kvm tool.
With the introduction of tcm_vhost (vhost-scsi) in
  "tcm_vhost: Initial merge for vhost level target fabric driver"
we can implement virtio-scsi by simply having vhost-scsi handle the
SCSI commands.
How to use:
1) Set up the tcm_vhost target through /sys/kernel/config
[Stefan Hajnoczi, thanks for the script to set up tcm_vhost]
** Set up wwpn and tpgt
$ wwpn="naa.0"
$ tpgt=/sys/kernel/config/target/vhost/$wwpn/tpgt_0
$ nexus=$tpgt/nexus
$ mkdir -p $tpgt
$ echo -n $wwpn > $nexus
** Set up lun using /dev/ram
$ n=0
$ lun=$tpgt/lun/lun_${n}
$ data=/sys/kernel/config/target/core/iblock_0/data_${n}
$ ram=/dev/ram${n}
$ mkdir -p $lun
$ mkdir -p $data
$ echo -n udev_path=${ram} > $data/control
$ echo -n 1 > $data/enable
$ ln -s $data $lun
2) Run kvm tool with the new disk option '-d scsi:$wwpn:$tpgt', e.g.
$ lkvm run -k /boot/bzImage -d ~/img/sid.img -d scsi:naa.0:0
Signed-off-by: Asias He <asias.hejun@gmail.com>
Cc: Nicholas A. Bellinger <nab@linux-iscsi.org>
Cc: Stefan Hajnoczi <stefanha@linux.vnet.ibm.com>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Pekka Enberg <penberg@kernel.org>
With Direct I/O, file reads and writes go directly from the applications
to the storage device, bypassing the operating system read and write
caches. This is useful for applications that manage their own caches.
Open a disk image with O_DIRECT:
$ lkvm run -d ~/img/test.img,direct
The original readonly flag is still supported.
Open a disk image with O_DIRECT and readonly:
$ lkvm run -d ~/img/test.img,direct,ro
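The option syntax above ("path[,direct][,ro]") could be parsed roughly as follows. This is a sketch under assumptions: struct disk_params, its fields and parse_disk_arg are hypothetical names, not the parser added by this patch:

```c
#include <stdbool.h>
#include <string.h>

/* Hypothetical parameter block; field names are illustrative only. */
struct disk_params {
	char path[256];
	bool direct;
	bool readonly;
};

/* Parse "path[,direct][,ro]" into p. Returns 0 on success, -1 on bad input. */
int parse_disk_arg(const char *arg, struct disk_params *p)
{
	const char *opt = strchr(arg, ',');
	size_t plen = opt ? (size_t)(opt - arg) : strlen(arg);

	if (plen == 0 || plen >= sizeof(p->path))
		return -1;
	memcpy(p->path, arg, plen);
	p->path[plen] = '\0';
	p->direct = p->readonly = false;

	while (opt) {
		const char *next = strchr(++opt, ',');
		size_t len = next ? (size_t)(next - opt) : strlen(opt);

		if (len == 6 && !strncmp(opt, "direct", 6))
			p->direct = true;	/* would map to O_DIRECT at open time */
		else if (len == 2 && !strncmp(opt, "ro", 2))
			p->readonly = true;
		else
			return -1;		/* unknown option */
		opt = next;
	}
	return 0;
}
```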
Signed-off-by: Asias He <asias.hejun@gmail.com>
Acked-by: Sasha Levin <levinsasha928@gmail.com>
Signed-off-by: Pekka Enberg <penberg@kernel.org>
Introduce struct disk_image_params to contain all the disk image parameters.
This is useful for adding more disk image parameters, e.g. disk image
cache mode.
Signed-off-by: Asias He <asias.hejun@gmail.com>
Signed-off-by: Pekka Enberg <penberg@kernel.org>
The queue size for virtio_blk is 256 and AIO_MAX is 32, so we may run
short of available AIO events if the guest issues more than 32 requests
simultaneously. The following error is observed when the guest runs a
stressed I/O workload:
Info: disk_image__read error: total=-11
To fix this, let's increase the aio events limit.
Signed-off-by: Asias He <asias.hejun@gmail.com>
Signed-off-by: Pekka Enberg <penberg@kernel.org>
We read and write in sectors by default. It makes little sense to add
the extra _sector string to the read and write ops/function names.
Signed-off-by: Asias He <asias.hejun@gmail.com>
Signed-off-by: Pekka Enberg <penberg@kernel.org>
Previously, we used an mmapped host root partition as the guest's root
filesystem. Now that a virtio-9p based root filesystem is supported, the
mmapped host root partition approach is no longer used.
A raw block device is still useful as the guest's disk backend for some
users, e.g. to bypass the host's filesystem layer.
This patch makes a raw block device work as a disk image. Users can read
and write the raw block device because block devices now use
DISK_IMAGE_REGULAR instead of DISK_IMAGE_MMAP.
Signed-off-by: Asias He <asias.hejun@gmail.com>
Signed-off-by: Pekka Enberg <penberg@kernel.org>
The kernel already has a pr_err helper; let's do the same.
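A sketch of what such a helper might look like in userspace. The "Error: " prefix, the buffer size and format_err are assumptions, and the macro relies on the GNU ##__VA_ARGS__ extension:

```c
#include <stdarg.h>
#include <stdio.h>

/* Build "Error: <message>" in buf; returns its length, or -1 on failure. */
int format_err(char *buf, size_t len, const char *fmt, ...)
{
	va_list ap;
	int n, m;

	n = snprintf(buf, len, "Error: ");
	if (n < 0 || (size_t)n >= len)
		return -1;
	va_start(ap, fmt);
	m = vsnprintf(buf + n, len - (size_t)n, fmt, ap);
	va_end(ap);
	return m < 0 ? -1 : n + m;
}

/* Kernel-style convenience wrapper that reports on stderr. */
#define pr_err(fmt, ...)						\
	do {								\
		char msg_[256];						\
		if (format_err(msg_, sizeof(msg_), fmt, ##__VA_ARGS__) >= 0) \
			fprintf(stderr, "%s\n", msg_);			\
	} while (0)
```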
Signed-off-by: Cyrill Gorcunov <gorcunov@gmail.com>
Signed-off-by: Pekka Enberg <penberg@kernel.org>
This patch enables allocating new refcount blocks, so kvm tools can
grow a qcow2 image much larger.
Signed-off-by: Lan Tianyu <tianyu.lan@intel.com>
Signed-off-by: Pekka Enberg <penberg@kernel.org>
The ro_ops_nowrite disk operations are supposed to contain no write op.
However, there is one. Let's remove it.
Signed-off-by: Asias He <asias.hejun@gmail.com>
Signed-off-by: Pekka Enberg <penberg@kernel.org>
ro_ops is never used after the assignment, so no need to do the
assignment.
Signed-off-by: Asias He <asias.hejun@gmail.com>
Signed-off-by: Pekka Enberg <penberg@kernel.org>
This patch reduces the number of calls to io_getevents() by getting
multiple io events at a time, instead of one, in the disk image thread.
Signed-off-by: Asias He <asias.hejun@gmail.com>
Signed-off-by: Pekka Enberg <penberg@kernel.org>
When a request writes to a cluster without the copied flag, allocate a
new cluster and write the original data, with the modification applied,
to the new cluster. This also adds write support for compressed qcow2
images. After testing, the image file passes "qemu-img check".
Performance still needs to be improved.
Signed-off-by: Lan Tianyu <tianyu.lan@intel.com>
Signed-off-by: Pekka Enberg <penberg@kernel.org>
For some reason some of the defines were set to HAS_VIRTIO instead of HAS_AIO.
This broke raw block devices.
Reported-and-tested-by: Richard Weinberger <richard@nod.at>
Signed-off-by: Sasha Levin <levinsasha928@gmail.com>
Signed-off-by: Pekka Enberg <penberg@kernel.org>
QCOW disk image async flag was erroneously enabled, while QCOW doesn't support
async ops yet.
This has caused a hang when booting QCOW images.
Reported-and-tested-by: Richard -rw- Weinberger <richard.weinberger@gmail.com>
Signed-off-by: Sasha Levin <levinsasha928@gmail.com>
Signed-off-by: Pekka Enberg <penberg@kernel.org>
This patch hooks AIO support into virtio-blk, allowing for faster IO.
Signed-off-by: Sasha Levin <levinsasha928@gmail.com>
[ penberg@kernel.org: wrap libaio include with CONFIG_HAS_AIO ]
Signed-off-by: Pekka Enberg <penberg@kernel.org>
This patch adds an optional callback to be called when a disk op completes.
Currently there's not much use for it, but it is the infrastructure for adding
aio support.
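A minimal sketch of an optional completion callback. The types and names are hypothetical, and the "read" here only simulates completion:

```c
#include <stddef.h>

/* Hypothetical hook: param is caller context, len is bytes completed. */
typedef void (*disk_req_cb)(void *param, long len);

struct disk {
	disk_req_cb cb;		/* optional; may be NULL */
};

/* Pretend to complete a read of len bytes, then notify the caller if asked. */
long disk_read(struct disk *d, void *param, long len)
{
	if (d->cb)
		d->cb(param, len);
	return len;
}

/* Example callback: accumulate completed bytes in the caller's counter. */
static void count_bytes(void *param, long len)
{
	*(long *)param += len;
}
```

With a synchronous backend the callback fires before the call returns. An asynchronous backend could invoke the same hook later from its completion path.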
Signed-off-by: Sasha Levin <levinsasha928@gmail.com>
Signed-off-by: Pekka Enberg <penberg@kernel.org>
In case a read or write op ptr is missing simply ignore it instead of
critically failing. This provides an easier way to prevent read or write
in specific scenarios.
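Sketched with hypothetical types, the dispatch could look like this. Returning 0 for a missing op is an assumption about what "ignore" means here:

```c
#include <stddef.h>

struct disk;

struct disk_ops {
	long (*read)(struct disk *d, void *buf, size_t len);
	long (*write)(struct disk *d, const void *buf, size_t len);
};

struct disk {
	const struct disk_ops *ops;
	size_t written;		/* bytes accepted so far */
};

static long mem_write(struct disk *d, const void *buf, size_t len)
{
	(void)buf;
	d->written += len;
	return (long)len;
}

/* Dispatch a write; a missing op is a no-op rather than a fatal error. */
long disk_write(struct disk *d, const void *buf, size_t len)
{
	if (!d->ops->write)
		return 0;	/* nothing written, but not a failure */
	return d->ops->write(d, buf, len);
}

static const struct disk_ops rw_ops = { .write = mem_write };
static const struct disk_ops ro_ops = { .write = NULL };
```

Leaving the write pointer unset in an ops table is then enough to make a disk effectively read-only.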
Signed-off-by: Sasha Levin <levinsasha928@gmail.com>
Signed-off-by: Pekka Enberg <penberg@kernel.org>
This patch modifies the definition and usage of ops for read only, mmap and
regular IO.
There is no longer a mix between iov and mmap, and read only no longer implies
mmap (although it will try to use it first).
This allows for more flexibility defining different ops for different
scenarios.
Signed-off-by: Sasha Levin <levinsasha928@gmail.com>
Signed-off-by: Pekka Enberg <penberg@kernel.org>
This patch adds the decompression operation for when the qcow or qcow2
image is found to be compressed. It also splits the read cluster
function into two, one for qcow and one for qcow2, to make both image
formats easier to support, and adds some macros for qcow.
Signed-off-by: Lan Tianyu <tianyu.lan@intel.com>
[ penberg@kernel.org: make zlib optional ]
Signed-off-by: Pekka Enberg <penberg@kernel.org>
This patch adds support for writing to zero refcount clusters. Refcount blocks
are cached like L2 tables and flushed upon VIRTIO_BLK_T_FLUSH and when
evicted from the LRU cache.
With this patch applied, 'qemu-img check' no longer complains about referenced
clusters with zero reference count after
dd if=/dev/zero of=/mnt/tmp
where '/mnt' is a freshly generated QCOW2 image.
Cc: Asias He <asias.hejun@gmail.com>
Cc: Cyrill Gorcunov <gorcunov@gmail.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Kevin Wolf <kwolf@redhat.com>
Cc: Prasad Joshi <prasadjoshi124@gmail.com>
Cc: Sasha Levin <levinsasha928@gmail.com>
Signed-off-by: Pekka Enberg <penberg@kernel.org>
The QCOW write support isn't stable enough for wide-spread use so force
read-only mode for QCOW images.
Signed-off-by: Pekka Enberg <penberg@kernel.org>
In preparation for refcount block caching, rename L2 table lookup functions to
use less generic names.
Cc: Prasad Joshi <prasadjoshi124@gmail.com>
Signed-off-by: Pekka Enberg <penberg@kernel.org>
In preparation for refcount block cache, move L2 cache data structures to
'struct qcow_l1_table'.
Cc: Prasad Joshi <prasadjoshi124@gmail.com>
Signed-off-by: Pekka Enberg <penberg@kernel.org>
This patch unifies qcow_read_cluster() and qcow_write_cluster() L1 and L2 table
variable names to make the code more readable.
Signed-off-by: Pekka Enberg <penberg@kernel.org>
This patch renames the ambiguous 'struct qcow_table' to 'struct qcow_l1_table'
in preparation for introducing 'struct qcow_refcount_table'.
Signed-off-by: Pekka Enberg <penberg@kernel.org>
We don't handle refcount table properly so make sure we only write to clusters
that have the "copied" flag set.
Cc: Kevin Wolf <kwolf@redhat.com>
Cc: Sasha Levin <levinsasha928@gmail.com>
Signed-off-by: Pekka Enberg <penberg@kernel.org>
Return device id when requested by virtio-blk.
Device id is currently based on the device information and the inode
number of the underlying disk image.
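A sketch of deriving such an id from stat(2). The exact format string and the function name are assumptions, not necessarily what the patch uses:

```c
#include <stdio.h>
#include <sys/stat.h>

/*
 * Build an id string from st_dev, st_rdev and st_ino of path.
 * Returns the string length, or -1 on error or truncation.
 */
int disk_id(const char *path, char *buf, size_t len)
{
	struct stat st;
	int n;

	if (stat(path, &st) < 0)
		return -1;
	n = snprintf(buf, len, "%llu/%llu/%llu",
		     (unsigned long long)st.st_dev,
		     (unsigned long long)st.st_rdev,
		     (unsigned long long)st.st_ino);
	return (n < 0 || (size_t)n >= len) ? -1 : n;
}
```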
Signed-off-by: Sasha Levin <levinsasha928@gmail.com>
Signed-off-by: Pekka Enberg <penberg@kernel.org>
This patch delays writeout for new L2 tables like we do for L1 tables. If an L2
table has non-allocated clusters, we mark that in the in-memory L2 table but
don't actually write it to disk until the L2 table is thrown out of LRU cache
or when qcow_disk_flush() is called. That makes writes to new clusters volatile
before VIRTIO_BLK_T_FLUSH is issued without corrupting the QCOW image on I/O
error.
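The delayed writeout can be pictured as a dirty flag on the cached table. The sketch below only simulates the disk write with a counter, and all names are hypothetical:

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define L2_ENTRIES 4	/* tiny table, for illustration only */

struct l2_table {
	uint64_t entry[L2_ENTRIES];
	bool dirty;		/* in-memory copy differs from disk */
};

static unsigned int disk_writes;	/* counts simulated table writeouts */

/* Record a new cluster mapping in memory only. */
void l2_set(struct l2_table *t, size_t idx, uint64_t cluster_off)
{
	t->entry[idx] = cluster_off;
	t->dirty = true;
}

/* Write the table out if needed; called on flush or cache eviction. */
void l2_flush(struct l2_table *t)
{
	if (!t->dirty)
		return;
	disk_writes++;		/* stand-in for the real table write */
	t->dirty = false;
}

unsigned int l2_writes(void)
{
	return disk_writes;
}
```

Several updates to the same table cost a single writeout, at the price of the mappings being volatile until the flush.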
Cc: Asias He <asias.hejun@gmail.com>
Cc: Cyrill Gorcunov <gorcunov@gmail.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Prasad Joshi <prasadjoshi124@gmail.com>
Cc: Sasha Levin <levinsasha928@gmail.com>
Signed-off-by: Pekka Enberg <penberg@kernel.org>