Saturday, January 14, 2023

Proxmox: solutions to restore guest networking after vm bridge interface maintenance on the host - without a reboot

problem: proxmox - guests experience networking outage after vm bridge interface maintenance on the host

<edit>
This post covers a situation where the host is using the ifupdown package - not the ifupdown2 package. In the older package, the ifdown and ifup utilities are more likely to be disruptive during network reconfiguration and ignore enslaved devices/interfaces on a bridge. One could argue this fact is the root cause of the guest network outage which inspired this post.
The more modern ifupdown2 utilities should be non-disruptive and are aware of bridges and enslaved devices/interfaces. If you are not already using ifupdown2, I strongly recommend you test ifupdown2 and ifreload and consider migrating (a minimal install snippet follows just after this note).
</edit>
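
For context, on a Debian/PVE host the migration to ifupdown2 is normally just a package install - apt swaps out the legacy ifupdown for you. As always, test this on a non-critical host or during a maintenance window first:

# install ifupdown2 (replaces the legacy ifupdown package)
apt update
apt install ifupdown2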

You are running a Proxmox Virtual Environment / PVE hypervisor. You performed some maintenance on the physical interfaces of the bridge interface, and needed to ifdown and ifup the bridge interface to restore host networking, and now the guests have lost networking! 😲

For example, maybe you switched the physical interface behind the bridge from one physical interface to another, perhaps upgrading from 1 GbE to 10 GbE networking.
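
As a rough sketch of that kind of change (illustrative only - the addresses and the NIC names eno1/enp5s0 are placeholders, not from my actual host), the vmbr0 stanza in /etc/network/interfaces might go from bridge-ports eno1 to bridge-ports enp5s0:

# /etc/network/interfaces - illustrative sketch, placeholder names and addresses
auto vmbr0
iface vmbr0 inet static
        address 192.168.1.10/24
        gateway 192.168.1.1
        bridge-ports enp5s0
        # previously: bridge-ports eno1 (the old 1 GbE NIC)
        bridge-stp off
        bridge-fd 0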

To quote Proxmox staff member Stoiko Ivanov on this post:

"If you restart a bridge-interface - all it's child interfaces are not associated with it any more - i.e. all of your VMs tap-devices are not in the bridge any more - your VMs lose connectivity."

Edit: man ip-link refers to devices (interfaces) that have had their master set as being enslaved. This is what proxmox does with kvm and lxc guests - their bridged network devices/interfaces are enslaved to a bridge. The legacy ifupdown package's ifdown and ifup utilities do not take care of enslaved devices - effectively the slaves become masterless and experience a networking outage.
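
In ip(8) terms, enslaving and releasing a device looks like this (tap102i0 is just the example interface that appears later in this post):

# enslave a device/interface to a bridge (set vmbr0 as its master)
ip link set tap102i0 master vmbr0
# release it again (make it masterless)
ip link set tap102i0 nomaster
# check the current state - an enslaved device shows "master vmbr0" in the output
ip -o link show tap102i0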

impact: networking downtime for guests

Naturally, guests experiencing an outage because they have lost their virtual network is not good - especially in sensitive / mission-critical environments.

solution: re-enslave the guest devices/interfaces to the bridge

This approach restores guest networking without requiring a host reboot or guest migrations.

Under normal operations pve guests using bridged networking have tap and/or veth interfaces enslaved to a given vm bridge interface, e.g. vmbr0.

Here is what the bridge looks like under normal conditions:

root@viper:~# pveversion
pve-manager/7.3-3/c3928077 (running kernel: 5.15.74-1-pve)

root@viper:~# brctl show vmbr0
bridge name     bridge id               STP enabled     interfaces
vmbr0           8000.82f29b223ad7       no              eth0
                                                        tap102i0
                                                        tap103i0
                                                        veth100i0

If the bridge has undergone maintenance and needs an ifdown and ifup to get networking working again on the host, the tap/veth devices/interfaces become masterless and are no longer associated with the bridge. You can confirm this by comparing the output of brctl show vmbr0 (or ip a s) before and after the bridge was bounced.
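
To illustrate (this output is reconstructed for the post rather than captured verbatim), after an ifdown/ifup cycle the bridge typically only contains its physical port again - the guest tap/veth interfaces are gone:

root@viper:~# brctl show vmbr0
bridge name     bridge id               STP enabled     interfaces
vmbr0           8000.82f29b223ad7       no              eth0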

This command determines the guest tap devices/interfaces for all kvm guests:

grep -FH bridge= /etc/pve/nodes/*/qemu-server/*.conf | sort | uniq | perl -nle 'print "tap$1i$2 master $3" if /\/(\d+).conf:net(\d+):.*?bridge=(vmbr\d+)/'

To re-enslave the tap devices/interfaces, for each line of the output run:

# ip link set <paste an output line>

# for kvm 102
ip link set tap102i0 master vmbr0
# for kvm 103
ip link set tap103i0 master vmbr0

This command determines the guest veth devices/interfaces for all lxc guests:

grep -FH bridge= /etc/pve/nodes/$(hostname)/lxc/*.conf | sort | uniq | perl -nle 'print "veth$1i$2 master $3" if /\/(\d+).conf:net(\d+):.*?bridge=(vmbr\d+)/'

To re-enslave the veth devices/interfaces, for each line of the output run:

# ip link set <paste an output line>

# for lxc 100
ip link set veth100i0 master vmbr0
# for lxc 101
ip link set veth101i0 master vmbr0

caveat: the grep in these commands does not consider whether the guest is running, or whether its devices/interfaces were already enslaved. This needs to be checked by a human or programmatically.
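
As a rough, untested sketch of such a programmatic check (the device and bridge names are just the hypothetical example values from above), you could act on a device only if it exists on the host and currently has no master:

# rough sketch: check a single device before re-enslaving it
dev=tap102i0; br=vmbr0   # hypothetical example values
if ip link show "$dev" >/dev/null 2>&1; then
    # device exists, i.e. the guest is running; only enslave it if it is currently masterless
    ip -o link show "$dev" | grep -q " master " || ip link set "$dev" master "$br"
else
    echo "$dev not present - guest probably not running"
fi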

caveat: the approach worked reliably for tap devices/interfaces for kvm guests. I had issues restoring networking on lxc containers (veth) - something else had been de-configured. I decided to migrate to the ifupdown2 package, and I was able to run ifreload after re-enslaving the veth devices/interfaces, which did restore lxc guest networking.

It would be possible to automate the previous step once you're comfortable with the solution, e.g. with xargs (see the sketch below). However, I strongly recommend migrating to ifupdown2, which handles all this automagically.
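
For example (an untested sketch with the same caveats as above - review the generated lines before piping them anywhere), the kvm variant could be fed straight into ip link set with xargs:

# one-shot re-enslave of all kvm tap devices/interfaces - review before running!
grep -FH bridge= /etc/pve/nodes/*/qemu-server/*.conf | sort | uniq \
  | perl -nle 'print "tap$1i$2 master $3" if /\/(\d+).conf:net(\d+):.*?bridge=(vmbr\d+)/' \
  | xargs -r -L1 ip link set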

It would be possible to make a simple script for this approach. It could be further developed to run as an interface post-up hook. That script would want to have some fault checking and ensure the tap interfaces are actually detached before trying to reattach them, etc. However, before trying to automate this I'd suggest researching the ifreload command from the ifupdown2 package, mentioned below.

citation(s):

Props to: 

Stoiko Ivanov, on this post, for insights and knowledge.
smartynov, on this post, for the script.

Further considerations - probably more elegant approaches

ifupdown2 package

If you're using the ifupdown2 package, this package includes a command ifreload (Debian stable man page). This command is able to reload interfaces non-disruptively and automagically keeps any enslaved devices/interfaces attached. It's mentioned by Stoiko Ivanov on this post. It works very well in my short experience with it.
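
For example, after editing /etc/network/interfaces on an ifupdown2 host, the changes can be applied in place with:

# reload all auto interfaces non-disruptively (ifupdown2)
ifreload -a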

Edit: I've since migrated to using the ifupdown2 package. When I originally wrote this post I wasn't using ifupdown2 yet, but this was the second time it had been suggested to me as a solution for networking-related configuration management improvements. I got the impression that the pve team were going to make it a standard at some point in the pve distro.

pvesh approach

WARNING: I've not tested the pvesh approach - I'm not sure how graceful it will be. If you have network configuration outside of pve, it's possible this command will clobber it. Use with caution - test this in a way or environment where it cannot harm your host or guests.
For the node I was working on when I authored this post - AFAIK the node had never had its network configured via proxmox or its GUI - so it's unclear what would happen. In my case, fresh snapshot(s) and a scheduled maintenance window would be required before trying it.

I also saw multiple mentions of a pvesh approach to this, which may depend on the ifupdown2 package. I haven't found official docs on this yet, but it looks like the following command reloads the pve network configuration:

pvesh set /nodes/<node>/network

Where <node> is one of the nodes in your cluster. The pvesh command supports autocomplete. I'm making the assumption that the pve web interface calls this api to apply changes, and that this particular invocation applies the network config to a node and includes an ifreload (or similar?).
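
For example, run on the node itself (untested, per the warning above), the invocation would presumably look like:

# apply/reload the pve-managed network configuration on the local node - untested!
pvesh set /nodes/$(hostname)/network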

The mentions of this approach I've seen are here, here, here and here.
