gogodick's blog

Find a bug in DPDK pci driver

2018-10-09

阅读:

 

I find that DPDK application crashes after attaching serveral PCI devices. Then I debug this issue and find some clues:

After attaching the second PCI device, the virtual address of mapped PCI resource for the first PCI device is corrupted, dev->mem_resource[x].addr is overridden with 0, then driver does not work correctly.

The calling sequence is: rte_eth_dev_attach()->rte_eal_hotplug_add()

Let’s take a closer look at rte_eal_hotplug_add():

	ret = bus->scan();
	if (ret)
		goto err_devarg;

	dev = bus->find_device(NULL, cmp_detached_dev_name, devname);
	if (dev == NULL) {
		RTE_LOG(ERR, EAL, "Cannot find unplugged device (%s)\n",
			devname);
		ret = -ENODEV;
		goto err_devarg;
	}

	ret = bus->plug(dev);
	if (ret) {
		RTE_LOG(ERR, EAL, "Driver cannot attach the device (%s)\n",
			dev->name);
		goto err_devarg;
	}

rte_eal_hotplug_add() will invoke bus callback scan(), find_device() and plug(), and this issue is introduces by scan().

The calling sequence for scan() is: rte_pci_scan()->pci_scan_one()

And the virtual address of mapped PCI resource is overwritten at pci_scan_one():

	/* device is valid, add in list (sorted) */
	if (TAILQ_EMPTY(&rte_pci_bus.device_list)) {
		rte_pci_add_device(dev);
	} else {
		struct rte_pci_device *dev2;
		int ret;

		TAILQ_FOREACH(dev2, &rte_pci_bus.device_list, next) {
			ret = rte_pci_addr_cmp(&dev->addr, &dev2->addr);
			if (ret > 0)
				continue;

			if (ret < 0) {
				rte_pci_insert_device(dev2, dev);
			} else { /* already registered */
				dev2->kdrv = dev->kdrv;
				dev2->max_vfs = dev->max_vfs;
				pci_name_set(dev2);
				memmove(dev2->mem_resource, dev->mem_resource,
					sizeof(dev->mem_resource));
				free(dev);
			}
			return 0;
		}

		rte_pci_add_device(dev);
	}

The virtual address of mapped PCI resource is initialized at plug(), and it will not update the virtual address for an already plugged device.

And the calling sequence for plug() is: pci_plug()->pci_probe_all_drivers()->rte_pci_probe_one_driver()->rte_pci_map_device()

I have reported this bug, and below patch can fix it:

bus/pci: fix unexpected resource mapping override

The fix is not to update any rte_pci_device’s field if the being scanned device’s driver is already probed.


Similar Posts