1. Feature Bits
Each virtio device offers all the features it understands. During device initialization, the driver reads this and tells the device the subset that it accepts. The only way to renegotiate is to reset the device. This allows for forwards and backwards compatibility: if the device is enhanced with a new feature bit, older drivers will not write that feature bit back to the device. Similarly, if a driver is enhanced with a feature that the device doesn’t support, it see the new feature is not offered.
struct virtio_net {
/* Frontend (QEMU) memory and memory region information */
struct rte_vhost_memory *mem;
uint64_t features;
uint64_t protocol_features;
int vid;
uint32_t flags;
uint16_t vhost_hlen;
/* to tell if we need broadcast rarp packet */
rte_atomic16_t broadcast_rarp;
uint32_t nr_vring;
int dequeue_zero_copy;
struct vhost_virtqueue *virtqueue[VHOST_MAX_QUEUE_PAIRS * 2];
#define IF_NAME_SZ (PATH_MAX > IFNAMSIZ ? PATH_MAX : IFNAMSIZ)
char ifname[IF_NAME_SZ];
uint64_t log_size;
uint64_t log_base;
uint64_t log_addr;
struct ether_addr mac;
uint16_t mtu;
struct vhost_device_ops const *notify_ops;
uint32_t nr_guest_pages;
uint32_t max_guest_pages;
struct guest_page *guest_pages;
int slave_req_fd;
} __rte_cache_aligned;
/* The feature bitmap for virtio net */
#define VIRTIO_NET_F_CSUM 0 /* Host handles pkts w/ partial csum */
#define VIRTIO_NET_F_GUEST_CSUM 1 /* Guest handles pkts w/ partial csum */
#define VIRTIO_NET_F_MTU 3 /* Initial MTU advice. */
#define VIRTIO_NET_F_MAC 5 /* Host has given MAC address. */
#define VIRTIO_NET_F_GUEST_TSO4 7 /* Guest can handle TSOv4 in. */
#define VIRTIO_NET_F_GUEST_TSO6 8 /* Guest can handle TSOv6 in. */
#define VIRTIO_NET_F_GUEST_ECN 9 /* Guest can handle TSO[6] w/ ECN in. */
#define VIRTIO_NET_F_GUEST_UFO 10 /* Guest can handle UFO in. */
#define VIRTIO_NET_F_HOST_TSO4 11 /* Host can handle TSOv4 in. */
#define VIRTIO_NET_F_HOST_TSO6 12 /* Host can handle TSOv6 in. */
#define VIRTIO_NET_F_HOST_ECN 13 /* Host can handle TSO[6] w/ ECN in. */
#define VIRTIO_NET_F_HOST_UFO 14 /* Host can handle UFO in. */
#define VIRTIO_NET_F_MRG_RXBUF 15 /* Host can merge receive buffers. */
#define VIRTIO_NET_F_STATUS 16 /* virtio_net_config.status available */
#define VIRTIO_NET_F_CTRL_VQ 17 /* Control channel available */
#define VIRTIO_NET_F_CTRL_RX 18 /* Control channel RX mode support */
#define VIRTIO_NET_F_CTRL_VLAN 19 /* Control channel VLAN filtering */
#define VIRTIO_NET_F_CTRL_RX_EXTRA 20 /* Extra RX mode control support */
#define VIRTIO_NET_F_GUEST_ANNOUNCE 21 /* Guest can announce device on the
* network */
#define VIRTIO_NET_F_MQ 22 /* Device supports Receive Flow
* Steering */
#define VIRTIO_NET_F_CTRL_MAC_ADDR 23 /* Set MAC address */
/* Do we get callbacks when the ring is completely used, even if we've
* suppressed them? */
#define VIRTIO_F_NOTIFY_ON_EMPTY 24
/* Can the device handle any descriptor layout? */
#define VIRTIO_F_ANY_LAYOUT 27
/* We support indirect buffer descriptors */
#define VIRTIO_RING_F_INDIRECT_DESC 28
#define VIRTIO_F_VERSION_1 32
#define VIRTIO_F_IOMMU_PLATFORM 33
2. Virtqueues
The memory aligment and size requirements, in bytes, of each part of the virtqueue are summarized in the following table:
Virtqueue Part | Alignment | Size |
---|---|---|
Descriptor Table | 16 | 16*(Queue Size) |
Available Ring | 2 | 6 + 2*(Queue Size) |
Used Ring | 4 | 6 + 8*(Queue Size) |
When the driver wants to send a buffer to the device, it fills in a slot in the descriptor table (or chains several together), and writes the descriptor index into the available ring. It then notifies the device. When the device has finished a buffer, it writes the descriptor index into the used ring, and sends an interrupt.
struct vring {
unsigned int num;
struct vring_desc *desc;
struct vring_avail *avail;
struct vring_used *used;
};
/* The standard layout for the ring is a continuous chunk of memory which
* looks like this. We assume num is a power of 2.
*
* struct vring {
* // The actual descriptors (16 bytes each)
* struct vring_desc desc[num];
*
* // A ring of available descriptor heads with free-running index.
* __u16 avail_flags;
* __u16 avail_idx;
* __u16 available[num];
* __u16 used_event_idx;
*
* // Padding to the next align boundary.
* char pad[];
*
* // A ring of used descriptor heads with free-running index.
* __u16 used_flags;
* __u16 used_idx;
* struct vring_used_elem used[num];
* __u16 avail_event_idx;
* };
*
* NOTE: for VirtIO PCI, align is 4096.
*/
2.1. The Virtqueue Descriptor Table
The descriptor table refers to the buffers the driver is using for the device. addr is a physical address, and the buffers can be chained via next. Each descriptor describes a buffer which is read-only for the device (“device-readable”) or write-only for the device (“device-writable”), but a chain of descriptors can contain both device-readable and device-writable buffers.
/* VirtIO ring descriptors: 16 bytes.
* These can chain together via "next". */
struct vring_desc {
uint64_t addr; /* Address (guest-physical). */
uint32_t len; /* Length. */
uint16_t flags; /* The flags as indicated above. */
uint16_t next; /* We chain unused descriptors via this. */
};
2.2. The Virtqueue Available Ring
The driver uses the available ring to offer buffers to the device: each ring entry refers to the head of a descriptor chain. It is only written by the driver and read by the device. idx field indicates where the driver would put the next descriptor entry in the ring (modulo the queue size). This starts at 0, and increases.
struct vring_avail {
uint16_t flags;
uint16_t idx;
uint16_t ring[0];
};
2.3. The Virtqueue Used Ring
The used ring is where the device returns buffers once it is done with them: it is only written to by the device, and read by the driver. Each entry in the ring is a pair: id indicates the head entry of the descriptor chain describing the buffer (this matches an entry placed in the available ring by the guest earlier), and len the total of bytes written into the buffer.
/* id is a 16bit index. uint32_t is used here for ids for padding reasons. */
struct vring_used_elem {
/* Index of start of used descriptor chain. */
uint32_t id;
/* Total length of the descriptor chain which was written to. */
uint32_t len;
};
struct vring_used {
uint16_t flags;
volatile uint16_t idx;
struct vring_used_elem ring[0];
};
3. PCI Device Discovery
Devices MUST have the PCI Vendor ID 0x1AF4. Devices MUST either have the PCI Device ID calculated by adding 0x1040 to the Virtio Device ID, or have the Transitional PCI Device ID depending on the device type, as follows:
|Transitional PCI Device ID|Virtio Device| | —— | —— | |0x1000|network card| |0x1001|block device| |0x1002|memory ballooning (traditional)| |0x1003|console| |0x1004|SCSI host| |0x1005|entropy source| |0x1009|9P transport| For example, the network card device with the Virtio Device ID 1 has the PCI Device ID 0x1041 or the Transitional PCI Device ID 0x1000.
/* VirtIO PCI vendor/device ID. */
#define VIRTIO_PCI_VENDORID 0x1AF4
#define VIRTIO_PCI_LEGACY_DEVICEID_NET 0x1000
#define VIRTIO_PCI_MODERN_DEVICEID_NET 0x1041