problem: proxmox - guests experience networking outage after vm bridge interface maintenance on the host
This post covers the situation where the host uses the ifupdown package, not the ifupdown2 package. In the older package, the ifup utilities are more likely to be disruptive during network reconfiguration, and they ignore devices/interfaces enslaved to a bridge. One could argue this is the root cause of the guest network outage that inspired this post.
The more modern ifupdown2 utilities should be non-disruptive and are aware of bridges and enslaved devices/interfaces. If you are not already using ifupdown2, I strongly recommend you test ifreload and consider migrating.
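To see which flavour a host is running, the package status can be queried with dpkg-query (ifreload ships only with ifupdown2). This is a minimal sketch assuming a Debian-based PVE host. pkg_installed is a hypothetical helper name, not a PVE tool.

```shell
#!/usr/bin/env bash
# Hypothetical helper: succeed (exit 0) if the named Debian package is installed.
# dpkg-query prints the package's status field, e.g. "install ok installed".
# Errors (unknown package, no dpkg-query) are silenced and count as "not installed".
pkg_installed() {
    dpkg-query -W -f='${Status}' "$1" 2>/dev/null | grep -q 'install ok installed'
}

# Example on a PVE host:
#   if pkg_installed ifupdown2; then echo "ifupdown2: ifreload available"; else echo "legacy ifupdown"; fi
```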
You are running a Proxmox Virtual Environment (PVE) hypervisor. You performed some maintenance on the physical interfaces behind a bridge interface, needed to ifup the bridge interface to restore host networking, and now the guests have lost networking! 😲
For example, maybe you switched the physical interface behind the bridge from one physical interface to another, perhaps upgrading from 1 GbE to 10 GbE networking.
To quote Proxmox staff member Stoiko Ivanov on this post:
"If you restart a bridge-interface - all it's child interfaces are not associated with it any more - i.e. all of your VMs tap-devices are not in the bridge any more - your VMs lose connectivity."
The ip-link man page refers to devices (interfaces) that have had their master set as being enslaved. This is what Proxmox does with KVM guests and LXC containers: their bridged network devices/interfaces are enslaved to a bridge. The legacy ifup utilities do not take care of enslaved devices, so effectively the slaves become masterless and experience a networking outage.
impact: networking downtime for guests
Naturally, guests losing their virtual network is not good, especially in sensitive or mission-critical environments.
solution: re-enslave the guest devices/interfaces to the bridge
This approach restores guest networking without requiring a host reboot or guest migrations.
Under normal operation, PVE guests using bridged networking have tap and/or veth interfaces enslaved to a given VM bridge interface, e.g. vmbr0.
Here is what the bridge looks like under normal conditions:
root@viper:~# pveversion
pve-manager/7.3-3/c3928077 (running kernel: 5.15.74-1-pve)
root@viper:~# brctl show vmbr0
bridge name     bridge id               STP enabled     interfaces
vmbr0           8000.82f29b223ad7       no              eth0
                                                        tap102i0
                                                        tap103i0
                                                        veth100i0
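brctl comes from the older bridge-utils package. iproute2 shows the same membership with `ip link show master vmbr0` or `bridge link show`, and the kernel also lists bridge ports under /sys/class/net/<bridge>/brif/. Below is a minimal sketch that reads the latter, assuming a Linux host. The function name and the SYSFS_NET override (there only so it can be tried against a fake directory tree) are hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the ports currently enslaved to a bridge, one per
# line, from the kernel's sysfs view. SYSFS_NET is an assumed override for
# testing; on a real host it defaults to /sys/class/net.
SYSFS_NET="${SYSFS_NET:-/sys/class/net}"

list_bridge_members() {
    local bridge="$1"
    local brif="$SYSFS_NET/$bridge/brif"
    # Only bridges have a brif/ directory.
    if [ ! -d "$brif" ]; then
        echo "list_bridge_members: $bridge is not a bridge" >&2
        return 1
    fi
    local port
    for port in "$brif"/*; do
        # An empty bridge leaves the glob unexpanded: skip it.
        [ -e "$port" ] || [ -L "$port" ] || continue
        basename "$port"
    done
}

# Example on a PVE host: list_bridge_members vmbr0
```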
Once the bridge has undergone maintenance and needed an ifup to get networking working again on the host, the tap/veth devices/interfaces become masterless and are no longer associated with the bridge. Here is a comparison of ip a s output:
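The masterless state can also be checked directly: `ip link show dev tap102i0` prints `master vmbr0` for an enslaved device, and sysfs exposes the same thing as a `master` symlink under /sys/class/net/<dev>/. Below is a minimal sketch that lists guest devices lacking that symlink, assuming PVE's tap<vmid>i<n> / veth<vmid>i<n> naming. The function name and the SYSFS_NET override are hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print guest devices (tap*i* / veth*i*) that exist but
# have no master, i.e. were left detached after the bridge was re-upped.
# SYSFS_NET is an assumed override for testing; defaults to /sys/class/net.
SYSFS_NET="${SYSFS_NET:-/sys/class/net}"

list_masterless_guest_devs() {
    local dev
    for dev in "$SYSFS_NET"/tap*i* "$SYSFS_NET"/veth*i*; do
        [ -e "$dev" ] || continue            # glob matched nothing
        # An enslaved device has a "master" symlink pointing at its bridge.
        if [ ! -e "$dev/master" ]; then
            basename "$dev"
        fi
    done
}

# Example on a PVE host: list_masterless_guest_devs
```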
This command determines the guest tap devices/interfaces for all kvm guests on the local node:
grep -FH bridge= /etc/pve/nodes/$(hostname)/qemu-server/*.conf | sort | uniq | perl -nle 'print "tap$1i$2 master $3" if /\/(\d+)\.conf:net(\d+):.*?bridge=(vmbr\d+)/'
To re-enslave the tap devices/interfaces, for each line of the output run:
# ip link set <paste an output line>
# for kvm 102
ip link set tap102i0 master vmbr0
# for kvm 103
ip link set tap103i0 master vmbr0
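The grep above reads every net line in the .conf files, including those in snapshot sections (hence the sort | uniq). A possible alternative, sketched here under assumptions, is to parse the output of `qm config <vmid>` (current configuration only), or `pct config <vmid>` for containers. net_lines_to_cmds is a hypothetical name. It deliberately handles only the plain case shown in this post: with the PVE firewall enabled (firewall=1) the tap device sits on an intermediate fwbr<vmid>i<n> bridge rather than on vmbr0, and a VLAN tag= needs more than a plain re-enslave, so both are reported and skipped.

```shell
#!/usr/bin/env bash
# Hypothetical helper: turn the netN lines of `qm config <vmid>` (or
# `pct config <vmid>`) read on stdin into "<dev> master <bridge>" lines,
# the same format the perl one-liners print.
#   $1 = device prefix: tap for KVM guests, veth for LXC guests
#   $2 = guest id (vmid)
net_lines_to_cmds() {
    local prefix="$1" vmid="$2" line idx opts bridge
    local net_re='^net([0-9]+): (.*)$'
    local bridge_re=',bridge=([^,]+),'
    while IFS= read -r line; do
        # Match lines such as "net0: virtio=AA:BB:...,bridge=vmbr0".
        [[ $line =~ $net_re ]] || continue
        idx="${BASH_REMATCH[1]}"
        opts=",${BASH_REMATCH[2]},"          # pad with commas to simplify matching
        # Pull out the bridge= option; NICs without one are ignored.
        [[ $opts =~ $bridge_re ]] || continue
        bridge="${BASH_REMATCH[1]}"
        # Firewall or VLAN setups need more than a re-enslave: warn and skip.
        if [[ $opts == *,firewall=1,* || $opts == *,tag=* ]]; then
            echo "skip ${prefix}${vmid}i${idx}: firewall or VLAN tag set" >&2
            continue
        fi
        echo "${prefix}${vmid}i${idx} master ${bridge}"
    done
}

# Example on a PVE host:
#   qm config 102 | net_lines_to_cmds tap 102
#   pct config 100 | net_lines_to_cmds veth 100
```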
This command determines the guest veth devices/interfaces for all lxc guests on the local node:
grep -FH bridge= /etc/pve/nodes/$(hostname)/lxc/*.conf | sort | uniq | perl -nle 'print "veth$1i$2 master $3" if /\/(\d+)\.conf:net(\d+):.*?bridge=(vmbr\d+)/'
To re-enslave the veth devices/interfaces, for each line of the output run:
# ip link set <paste an output line>
# for lxc 100
ip link set veth100i0 master vmbr0
# for lxc 101
ip link set veth101i0 master vmbr0
caveat: the grep in these commands does not consider whether the guest is running, or whether its device/interface was actually detached from the bridge. This needs to be checked by a human or programmatically.
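The "is the guest running" half of that check could be scripted by taking the running guest IDs from `qm list` or `pct list` and only feeding those guests' configs to the grep. Below is a minimal sketch. running_ids is a hypothetical helper, and the column positions (STATUS is the third column of `qm list`, Status the second of `pct list`) are assumptions based on their default output, so verify them on your PVE version.

```shell
#!/usr/bin/env bash
# Hypothetical helper: read `qm list` or `pct list` output on stdin and print
# the IDs of running guests. $1 is the (assumed) status column number:
#   qm list  -> 3 (VMID NAME STATUS ...)
#   pct list -> 2 (VMID Status Lock Name)
running_ids() {
    local col="$1"
    # Skip the header line; print column 1 (the ID) where status is "running".
    awk -v col="$col" 'NR > 1 && $col == "running" { print $1 }'
}

# Example on a PVE host: only consider running containers
#   for id in $(pct list | running_ids 2); do
#     grep -FH bridge= "/etc/pve/nodes/$(hostname)/lxc/$id.conf" | sort | uniq |
#       perl -nle 'print "veth$1i$2 master $3" if /\/(\d+)\.conf:net(\d+):.*?bridge=(vmbr\d+)/'
#   done
```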
caveat: the approach worked reliably for the tap devices/interfaces of kvm guests. I had issues restoring networking on lxc containers (veth); something else was de-configured. I decided to migrate to the ifupdown2 package, and running ifreload after re-enslaving the veth devices/interfaces did restore lxc guest networking.
Once you're comfortable with the solution, the previous step could be automated, e.g. with xargs. However, I strongly recommend migrating to ifupdown2, which handles all of this automagically.
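A possible shape for the xargs approach, as a sketch: GNU xargs with `-L 1` runs the command once per input line, so each `<dev> master <bridge>` line printed by the grep/perl commands becomes one `ip link set` call. apply_with_xargs and the APPLY switch are hypothetical; by default the wrapper only echoes the commands so they can be reviewed first.

```shell
#!/usr/bin/env bash
# Hypothetical wrapper: read "<dev> master <bridge>" lines on stdin and run
# "ip link set <dev> master <bridge>" once per line (-L 1).
# -r skips running anything on empty input (GNU xargs).
# APPLY=1 executes the commands (root needed); otherwise they are only printed.
apply_with_xargs() {
    if [ "${APPLY:-0}" = "1" ]; then
        xargs -r -L 1 ip link set
    else
        xargs -r -L 1 echo ip link set
    fi
}

# Example on a PVE host (dry run first, then APPLY=1 to execute):
#   grep -FH bridge= /etc/pve/nodes/$(hostname)/qemu-server/*.conf | sort | uniq |
#     perl -nle 'print "tap$1i$2 master $3" if /\/(\d+)\.conf:net(\d+):.*?bridge=(vmbr\d+)/' |
#     apply_with_xargs
```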
A simple script for this approach could be further developed to run as an interface post-up hook. Such a script would need some fault checking, e.g. to ensure the tap interfaces are detached before trying to reattach them. However, before trying to automate this, I'd suggest researching the ifreload command from the ifupdown2 package, mentioned below.
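The fault checking described above might look like the following sketch. It reads the `<dev> master <bridge>` lines and only re-enslaves a device if it exists (i.e. the guest is running), has no master yet, and the target is really a bridge (bridges have a `bridge/` directory in sysfs). reenslave_detached, DRY_RUN and the SYSFS_NET override are hypothetical names; it defaults to a dry run that prints the commands.

```shell
#!/usr/bin/env bash
# Hypothetical helper: read "<dev> master <bridge>" lines on stdin and
# re-enslave only devices that exist, are currently masterless, and whose
# target is an existing bridge. Skipped lines are explained on stderr.
# DRY_RUN=1 (default) prints the commands; DRY_RUN=0 runs them (root needed).
# SYSFS_NET is an assumed override for testing; defaults to /sys/class/net.
SYSFS_NET="${SYSFS_NET:-/sys/class/net}"
DRY_RUN="${DRY_RUN:-1}"

reenslave_detached() {
    local dev kw bridge
    while read -r dev kw bridge; do
        # Ignore anything that is not "<dev> master <bridge>".
        [ "$kw" = "master" ] && [ -n "$bridge" ] || continue
        if [ ! -e "$SYSFS_NET/$dev" ]; then
            echo "skip $dev: no such device (guest not running?)" >&2
        elif [ -e "$SYSFS_NET/$dev/master" ]; then
            echo "skip $dev: already has a master" >&2
        elif [ ! -d "$SYSFS_NET/$bridge/bridge" ]; then
            echo "skip $dev: $bridge is not an existing bridge" >&2
        elif [ "$DRY_RUN" = "1" ]; then
            echo "ip link set dev $dev master $bridge"
        else
            ip link set dev "$dev" master "$bridge"
        fi
    done
}
```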
credits:
Stoiko Ivanov on this post, for insights and knowledge.
smartynov on this post, for the script.
Further considerations - probably more elegant
If you're using the ifupdown2 package, it includes a command, ifreload (Debian stable man page). This command can reload interfaces non-disruptively and automagically keeps any enslaved devices/interfaces attached. It's mentioned by Stoiko Ivanov on this post. From my short experience with it, it works very well.
Edit: I've since migrated to using the ifupdown2 package; the paragraph below predates that.
I don't use the ifupdown2 package yet, but this is the second time it's been suggested as a way to improve network configuration management. I got the impression that the PVE team were going to make it the standard at some point in the PVE distribution.
WARNING: I've not tested the pvesh approach, and I'm not sure how graceful it will be. If you have network configuration outside of PVE, it's possible this command will clobber it. Use with caution: test it in a way or environment where it cannot harm your host or guests.
On the node I was working on when I wrote this post, AFAIK the network had never been configured via Proxmox or its GUI, so it's unclear what would happen. In my case, fresh snapshot(s) and a scheduled maintenance window would be required before trying it.
I also saw multiple mentions of a pvesh approach to this, which may depend on the ifupdown2 package. I haven't found official docs on this yet, but it looks like the following command reloads the PVE network configuration:
pvesh set /nodes/<node>/network
<node> is one of the nodes in your cluster, and the pvesh command supports autocomplete. I'm assuming that the PVE web interface calls this API to apply changes, and that this particular invocation applies the network config to a node, which includes an ifreload (or similar?).
The mentions of this approach I've seen are here, here, here and here.