We have a customer that uses the SLES Xen hypervisor (SLES 10 SP2 and now SLES 11) on their test and development server. The idea is simple: virtualize all their test and development servers on a single 2-way quad-core machine with 32 GB RAM and 1 TB of RAID-5 SATA storage. This way, they save money on the number of physical machines in their test/development environment and improve utilization (not all projects run concurrently). Their production systems are still running on bare metal without virtualization... for now.
They've been having some networking issues (i.e. packet drops in ping tests) and suspected that either the NIC hardware (a Broadcom 4-port Gb NIC with NIC bonding) is at fault or the networking setup needs tweaking. Here's a journal of what I discovered and resolved (to a certain extent; hardware is not in my scope):
1. Booted SLES 11 (x86_64) with the default (non-Xen) kernel and configured basic networking. Created a bond0 device that bonds 2 of the 4 physical ports (static IP, DNS and routing info provided by the customer). Tested the configuration and it works: we can ping other servers, and development desktops can ping the server. Opened the SSH port in SuSEfirewall2 and SSH sessions work. Used yast2 lan for configuring and verified (always a good thing) via the following files: /etc/hosts, /etc/resolv.conf, /etc/sysconfig/network/ifcfg-bond0
The following steps are done with reference to this Novell support document "Hassle-free Xen Networking" at this link.
2. Restart SLES 11 with the Xen kernel. Verify the following entry is commented out in /etc/xen/xend-config.sxp
## (network-script network-bridge)
and instead have the following:
Restart Xen via rcxend restart.
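Before restarting xend, the "commented out" check in step 2 can be automated. The following is only a minimal sketch: the function name active_network_scripts is hypothetical, and it assumes that lines starting with # in xend-config.sxp are comments. It lists any network-script entries that are still active, so an empty list means the network-bridge entry is disabled.

```python
import re


def active_network_scripts(sxp_text):
    """Return the arguments of (network-script ...) entries that are NOT commented out."""
    active = []
    for line in sxp_text.splitlines():
        stripped = line.strip()
        # Assumption: a leading '#' marks the whole line as a comment.
        if stripped.startswith("#"):
            continue
        match = re.match(r"\(network-script\s*([^)]*)\)", stripped)
        if match:
            active.append(match.group(1).strip())
    return active


if __name__ == "__main__":
    # Path taken from step 2; reading it normally requires root on the Xen host.
    with open("/etc/xen/xend-config.sxp") as handle:
        print(active_network_scripts(handle.read()))
```

If the list still contains network-bridge, xend would create its own bridge on top of the bonded setup, which is what this step is meant to avoid.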
3. Created a bridge interface called br0 and bridged it to bond0. Moved all the static IP settings from bond0 to br0. On the now-empty bond0 (no IP settings left), I set the static IP to 0.0.0.0 with subnet mask /32 (255.255.255.255). Verify that br0 is the interface with the static IP while bond0 has none. Also verify the pings between the servers and the development desktops.
You can see this in more detail as referenced earlier at this link.
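The check in step 3 (br0 holds the static IP, bond0 does not) can be scripted against the one-line output of `ip -o -4 addr show`. This is a rough sketch, not part of the Novell procedure: the function names are hypothetical, the address 192.0.2.10 is a documentation placeholder rather than the customer's IP, and an address of 0.0.0.0 is treated as "no address".

```python
def ipv4_by_interface(ip_output):
    """Map interface name -> IPv4 addresses parsed from `ip -o -4 addr show` text."""
    addresses = {}
    for line in ip_output.splitlines():
        fields = line.split()
        # Expected shape: "<index>: <ifname> inet <addr>/<prefix> ..."
        if len(fields) >= 4 and fields[2] == "inet":
            # Treat the 0.0.0.0 placeholder as an unconfigured interface.
            if not fields[3].startswith("0.0.0.0"):
                addresses.setdefault(fields[1], []).append(fields[3])
    return addresses


def bridge_owns_address(ip_output, bridge="br0", bond="bond0"):
    """True when the bridge has an IPv4 address and the bond has none."""
    found = ipv4_by_interface(ip_output)
    return bool(found.get(bridge)) and not found.get(bond)


# Illustrative input only (placeholder address, not captured from the server).
sample = (
    "1: lo    inet 127.0.0.1/8 scope host lo\n"
    "5: br0    inet 192.0.2.10/24 brd 192.0.2.255 scope global br0\n"
)
print(bridge_owns_address(sample))
```

On the real host, the text could be obtained with subprocess.check_output(["ip", "-o", "-4", "addr", "show"], text=True) and passed to bridge_owns_address.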
4. Almost there! Next, I just needed to verify and update the virtual NIC settings in all domUs (VMs). In each VM configuration file in /etc/xen/vm/, amend the vif variable (see example below):
from: vif=[ 'mac=00:16:3e:24:96:38', ]
to: vif=[ 'mac=00:16:3e:24:96:38,bridge=br0', ]
Don't forget to refresh the configuration via xm delete [VM] and xm new [VM] where [VM] is the name of the domU (VM).
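With many domUs, editing each vif line by hand gets tedious. The sketch below shows one way the from/to change in step 4 could be applied to the text of a configuration file. The helper name add_bridge is hypothetical, and it assumes each vif list sits on a single line with quoted entries, as in the example above. Entries that already name a bridge are left alone, so running it twice is harmless.

```python
import re


def add_bridge(config_text, bridge="br0"):
    """Append bridge=<bridge> to each quoted vif entry that does not name a bridge yet."""

    def fix_entry(match):
        quote, body = match.group(1), match.group(2)
        if "bridge=" in body:
            return match.group(0)  # already bridged, keep as-is
        separator = "," if body else ""
        return f"{quote}{body}{separator}bridge={bridge}{quote}"

    def fix_line(match):
        # Rewrite only the quoted entries inside the [...] list.
        return match.group(1) + re.sub(r"(['\"])(.*?)\1", fix_entry, match.group(2))

    # Assumption: the whole vif list is on one line.
    return re.sub(r"(?m)^(\s*vif\s*=\s*)(\[.*\])", fix_line, config_text)


print(add_bridge("vif=[ 'mac=00:16:3e:24:96:38', ]"))
```

Back up the files in /etc/xen/vm/ before writing any changes, and remember that the domU still has to be refreshed with xm delete and xm new as described above.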
Done! Customer can ping the host (SLES 11) and their domUs (SLES, RHEL and Windows) successfully without any packet drops.
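To make the "no packet drops" check repeatable, the loss figure can be pulled out of ping's summary line. This is a small sketch that assumes the Linux iputils summary format ("N% packet loss"). The function name is hypothetical and the sample line is illustrative, not a captured result from the customer's network.

```python
import re


def packet_loss_percent(ping_output):
    """Return the packet-loss percentage from ping's summary line, or None if absent."""
    match = re.search(r"(\d+(?:\.\d+)?)% packet loss", ping_output)
    return float(match.group(1)) if match else None


# Illustrative summary line in the assumed iputils format.
sample = "10 packets transmitted, 10 received, 0% packet loss, time 9012ms"
print(packet_loss_percent(sample))
```

Running this against the host and each domU after the change gives a simple pass/fail signal: anything above zero is worth a second look.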