Using VRF (Virtual Routing and Forwarding) on Linux

Posted on 2020-02-29

It isn’t uncommon on linux systems to have multiple (layer 3, i.e. with IP addresses, not counting the loopback device) network interfaces. For example, your main interface (with the default route) and a docker bridge. Or you run a firewall between a public network and one or many internal networks (perhaps as tagged VLAN interfaces).

But in rather rare scenarios you need multiple default routes: for example, separate management access in case your main interface is attacked or breaks for other reasons. Or you want to provide a service to different networks (which are not connected directly) from the same host.

In those cases you need to be careful: if you simply add multiple default routes, linux will just pick one and go with it. Back in the old days you had to use policy-based routing for those cases: add another routing table and select it based on the source address for example; the second routing table can have a different default route than your main table.

I have seen enterprise appliances using multiple default routes (and multiple interfaces of course), and the people running them kept wondering why it didn’t work. Well, we use ACLs to make sure systems use the correct source address to prevent IP address spoofing; even without ACLs we usually enable uRPF. If you really need multiple interfaces that can be reached from overlapping sources, you MUST set up policy-based routing (unless the outgoing interface doesn’t matter).

VRFs on linux are just a fancy way to set up policy-based routing for such special cases.

Basic idea

  • Make sure your loopback device is up; without it various things just fail: ip link set dev lo up.
  • For each VRF apart from your default context, create a VRF interface with a separate routing table, like: ip link add name blue type vrf table 10. You can assign names to those table IDs in /etc/iproute2/rt_tables or /etc/iproute2/rt_tables.d/*.conf - also handy to see which IDs are already in use by default. Each VRF needs its own table ID!
  • Start VRFs: ip link set dev blue up
  • Assign interfaces to the VRF and start them: ip link set dev eth0 vrf blue up
  • Now you can configure addresses with ip address as always; to configure routes you need to specify the VRF like: ip [-6] route ... vrf blue (or the table id with ... table 10).
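
Put together, a minimal setup could look like the following sketch (eth0, table 10 and the addresses are just placeholders here):

# bring up loopback, create VRF "blue" with table 10, enslave eth0
ip link set dev lo up
ip link add name blue type vrf table 10
ip link set dev blue up
ip link set dev eth0 vrf blue up
# example address and default route inside the VRF
ip address add 192.0.2.10/24 dev eth0
ip route add default via 192.0.2.1 vrf blue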

Sadly there are some corner cases that won’t work properly.

TLDR: run fix-vrf-rules.sh to fix it.

How it works

Routing on linux uses rules to find tables, and then looks up the target in the tables.

More precisely: it loops through all rules by preference, and if a rule matches, it executes its action. If an action doesn’t lead to a final result, it continues with the next rule.

Usually the action is a table lookup; if the lookup finds a matching route, the process ends. In short: first table with a matching entry wins!

You can take a look at the rules with: ip rule (IPv6: ip -6 rule. For everything below keep in mind IPv6 has a separate configuration).

For IPv4 they look like this by default:

0:      from all lookup local
32766:  from all lookup main
32767:  from all lookup default

(In IPv6 the lookup default line is not present by default.)

When you create a VRF (with linux kernel >= 4.8) you get this instead:

0:      from all lookup local
1000:   from all lookup [l3mdev-table]
32766:  from all lookup main
32767:  from all lookup default

It inserted a new rule with a rather magic [l3mdev-table] lookup. This looks up the table you specified when creating the VRF interface, provided the packet to be routed uses an interface that belongs to a VRF. Only one rule is needed for all VRFs!

But… sadly the lookup local comes before the VRF lookup; if you try to ping one of your main addresses from the VRF, it will find them in the local table and never reach the VRF routing table. This is almost never what you want; with RFC1918 addresses you might want to reach someone else entirely (or just want the traffic to take the proper path due to stateful firewalls: the response won’t take the shortcut!).

The solution is to move the lookup local rule down; even the “selftests” script in the kernel does this:

ip -4 rule add pref 32765 table local
ip -4 rule del pref 0
ip -6 rule add pref 32765 table local
ip -6 rule del pref 0

The kernel Documentation/networking/vrf.txt also recommends adding unreachable default routes to the VRF routing tables, so that lookups don’t fall through to the rules after the l3mdev rule (I’d recommend using vrf blue instead of table 10 though):

ip -4 route add table 10 unreachable default metric 4278198272
ip -6 route add table 10 unreachable default metric 4278198272
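
With the vrf keyword that should look something like this (same table 10 as above):

ip -4 route add vrf blue unreachable default metric 4278198272
ip -6 route add vrf blue unreachable default metric 4278198272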

An even easier solution in my opinion is to add another rule with an unreachable target which only triggers in VRF context:

ip -4 rule add pref 2000 l3mdev unreachable

Now the rules should look like this:

1000:   from all lookup [l3mdev-table]
2000:   from all lookup [l3mdev-table] unreachable
32765:  from all lookup local
32766:  from all lookup main
32767:  from all lookup default

The unreachable action overrides the table lookup, but thanks to the l3mdev selector it only triggers where a VRF lookup would have happened, and it solves the problem for all VRFs at once.
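
As IPv6 uses its own set of rules (see above), you presumably want the same rule there as well:

ip -6 rule add pref 2000 l3mdev unreachable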

My script fix-vrf-rules.sh should apply those changes in a safe way.

Demo

I created another script vrf-demo.sh (requires fix-vrf-rules.sh) to show what a VRF setup can look like.

First of all it runs in a separate network namespace, so your normal network setup doesn’t break.

Then it creates a VRF and a veth link (“virtual ethernet”, consisting of two interfaces) between the VRF and the default environment, sets up some addresses and routing, and shows connectivity.
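
The rough shape of such a setup (a simplified sketch, not the actual vrf-demo.sh; names and addresses are made up) could be:

# scratch namespace so nothing touches the real network config
ip netns add vrf-demo
ip -n vrf-demo link set dev lo up

# rule fix from above, applied inside the namespace
ip netns exec vrf-demo ip -4 rule add pref 32765 table local
ip netns exec vrf-demo ip -4 rule del pref 0

# VRF "blue" with routing table 10
ip -n vrf-demo link add name blue type vrf table 10
ip -n vrf-demo link set dev blue up

# veth pair: one end in the VRF, the other in the default context
ip -n vrf-demo link add name veth-blue type veth peer name veth-def
ip -n vrf-demo link set dev veth-blue vrf blue up
ip -n vrf-demo link set dev veth-def up
ip -n vrf-demo address add 192.0.2.1/24 dev veth-blue
ip -n vrf-demo address add 192.0.2.2/24 dev veth-def

# connectivity check from the VRF side (binds to the VRF device)
ip netns exec vrf-demo ping -c1 -I blue 192.0.2.2

# cleanup
ip netns del vrf-demo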

Using VRFs

Sadly most programs can’t use VRFs directly; they’d need support for setting the SO_BINDTODEVICE (or SO_BINDTOIFINDEX) socket option. Without that they will end up using the default VRF (binding source addresses from other VRFs does NOT work).

For incoming TCP connections you should be able to use sysctl -w net.ipv4.tcp_l3mdev_accept=1 to have a TCP server running in the “global” VRF accept connections from other VRFs as well (with the connection then living in the VRF it came from). The kernel also has sysctl options for UDP and RAW sockets, but I can’t see how that’d be useful, as they don’t result in a (VRF-specific) connection socket, and response packets wouldn’t be sent in the VRF.

You can change the default VRF for a program with ip vrf exec blue ... - this installs a hook in the kernel that basically sets SO_BINDTODEVICE every time a new socket is created.
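
For example (hypothetical commands, assuming the VRF blue from above and a host 192.0.2.1 reachable inside it):

ip vrf exec blue ping -c1 192.0.2.1
ip vrf exec blue ssh 192.0.2.1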

Comparing with policy-based routing

With the traditional policy-based routing you’d add rules like this:

ip rule add from 192.0.2.2 lookup 40

And then fill the routing table 40.
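
For example, a default route for that table could be added like this (the gateway address is just a placeholder):

ip route add default via 192.0.2.1 table 40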

As binding to a source address is a platform-independent concept, many programs support it (unlike SO_BINDTODEVICE).

On the other hand you can use the same IP addresses in different VRFs; this is not possible with plain policy-based routing (for example you could run a DNS resolver for different networks on the same IP address for easier configuration).


Complete story with comments

debian stretch: upgrade 32-bit to 64-bit

Posted on 2017-06-28

There are various reasons why one would prefer 64-bit over 32-bit (or not); basically it is about an improved ABI (passing arguments in registers) and bigger registers (can be faster) versus higher memory usage (because pointers are twice as big). In some corner cases you want 64-bit to be able to use more memory in your programs (32-bit kernels can often handle more than 4 GB of memory, but 32-bit userspace programs can’t).

For me 64-bit offers higher security as well: ASLR (address space layout randomization) should be more effective on 64-bit because it can use more entropy.

That is why I decided to upgrade some 32-bit (called “i386” in debian) machines to 64-bit (“amd64”); after some trial and error I came up with the following set of operations to do this in a “reliable” (for me) and fast way.

Note 2020-10-24: crossgrader just made it to debian unstable; it appears to support crossgrading even across CPU-incompatible architectures.

Instructions

  • This one should be obvious: make sure you have backups
  • If you can’t figure out what one of the commands below does, this is not for you. Expert mode only :)
  • Don’t blame me if it fails :)
  • Some programs store data in a platform-specific way. So far I have had this problem only with postgresql, but other databases are very likely to be affected too. Export the data in a platform-independent way beforehand and restore it afterwards.
  • If you run custom-installed stuff like rvm, virtualenv or npm, it probably won’t work after the migration.
  • Remember which packages were installed, so you can later check nothing is missing (and you didn’t install packages you didn’t want). As I use aptitude with “markauto” I also store these states:

    aptitude search ~M | cut -c5- | cut -d' ' -f1 > aptitude.auto
    dpkg -l | grep '^ii' | awk '{print $2}' > packages
    
  • Check the current architectures. The first command should show “i386”, the second either nothing or “amd64” (when multiarch is already enabled).

    dpkg --print-architecture
    dpkg --print-foreign-architectures
    
  • Now upgrade to a 64-bit kernel. Since stretch you already need a multiarch setup for this:

    dpkg --add-architecture amd64
    apt-get update
    apt-get --no-install-recommends install linux-image-amd64:amd64
    

    If you need dkms modules you also need linux-headers-amd64:amd64. Make sure the dkms modules actually got compiled before rebooting if they are essential.

  • Reboot and make sure you boot the amd64 kernel - either manipulate your bootloader config before rebooting, select it manually, or remove the 32-bit kernel beforehand.

    uname -r
    # should show 4.9.0-3-amd64
    
  • Now is a good time to remove special 32-bit only packages, like libc6-xen or the 32-bit kernel.

  • You’ll see a repeating pattern below, so I’ll explain it first. apt-get (and aptitude) are not good at resolving conflicts, and don’t understand that they don’t have to remove the :i386 packages to install the corresponding :amd64 packages. So they would start by removing the :i386 packages, which essentially breaks your system, instead of just installing the :amd64 packages.

    So we only download the packages with apt, and install them manually with dpkg. For this we need a clean /var/cache/apt/archives/ directory.

    The pattern is like this:

    apt-get clean
    apt-get -y --no-install-recommends --download-only install SOME_PACKAGES
    dpkg --install /var/cache/apt/archives/*.deb
    

    If the download command shows errors, it usually mentions missing dependencies: just repeat the command and add those at the end.

    Sometimes dpkg fails:

    • If a Pre-Depends package is missing, it won’t install the package at all. At the end dpkg will show you the *.deb filenames - just install those again with dpkg --install ....
    • If a Depends package is missing it won’t configure the package. Just run dpkg --configure -a afterwards to fix those.

    The examples below sometimes anticipate the dpkg failures and show a second command to run.

  • For whatever fucked up reason the dash preinst script uses bash; and bash of course “Pre-Depends” on dash. There is a good chance this breaks later, and if one of them gets removed you’re in big trouble (xxd -r might help you restore binaries from another system). So we update these first:

    apt-get clean
    apt-get -y --no-install-recommends --download-only install dash:amd64 bash:amd64
    dpkg --install /var/cache/apt/archives/*.deb
    dpkg --install /var/cache/apt/archives/dash_*.deb /var/cache/apt/archives/bash_*.deb
    

    Update: Good news, #604873 (and #866362 with it) got fixed in debian buster, which should make this step unnecessary.

  • Now we change the host architecture by installing the new packaging tools. This should work without aptitude too if you don’t want it.

    apt-get clean
    apt-get -y --no-install-recommends --download-only install dpkg:amd64 apt:amd64 aptitude:amd64 apt-utils:amd64
    dpkg --install /var/cache/apt/archives/*.deb
    dpkg --install /var/cache/apt/archives/dpkg_*.deb
    
  • Check the architecture indeed changed:

    dpkg --print-architecture
    dpkg --print-foreign-architectures
    
  • Now install the :amd64 version of every installed :i386 package. You can start with the libraries and perl (this sometimes keeps the list of failures shorter, but not always).

    apt-get clean
    apt-get --download-only -y --no-install-recommends install `dpkg -l | grep '^.i' | awk '{print $2}' | grep :i386 | sed -e 's/\(.*\):i386/\1:i386- \1:amd64/'`
    dpkg --install /var/cache/apt/archives/lib*.deb /var/cache/apt/archives/perl*.deb
    dpkg --configure -a
    

    The download command might fail due to missing dependencies (see note above); also packages which are not available on amd64 can’t be installed of course. You could try removing them if they are not strictly needed, or put a | grep -v PKGNAME before sed.

  • Now install everything else as :amd64.

    apt-get clean
    apt-get --download-only -y --no-install-recommends install `dpkg -l | grep '^.i' | awk '{print $2}' | grep :i386 | sed -e 's/\(.*\):i386/\1:i386- \1:amd64/'`
    dpkg --install /var/cache/apt/archives/*.deb
    

    Again you might need to fix the download command as before. Instead of re-downloading you can also just remove the already installed packages (from the previous step) from the cache, or simply install them again.

    This was basically the big step.

  • Make sure your stuff is working again. The following steps clean up the 32-bit packages, and, for example, installing postgresql-9.6:i386 to recover your database becomes more difficult once all its dependencies are already gone.

  • Now remove the :i386 packages - they shouldn’t be needed anymore. I used aptitude for this (limit package view with l, then enter ~ri386). Something like this might work too:

    apt-get autoremove
    apt-get remove `dpkg -l | grep '^.i' | awk '{print $2}' | grep :i386`
    
  • Restore the aptitude markauto state:

    aptitude markauto `cat aptitude.auto | sed -e s/:i386/:amd64/`
    
  • Check installed package differences:

    diff -u  <(sed s/:.*// packages | uniq) <(dpkg -l | grep '^ii' | awk '{print $2}' | sed s/:.*// | uniq) | less
    

    Lines with + are new packages, lines with - are removed packages.

  • Remove i386 architecture:

    dpkg --remove-architecture i386
    

    Only works if there are no i386 packages left.

  • You’re done. Reboot if you want to make sure everything is still working :)

Additional notes

Debian multiarch dependency handling

Sometimes the dependency handling is fucked up. E.g. postgresql (architecture-independent) depends on postgresql-9.6 - it isn’t multiarch-enabled as far as I can tell, and this means it wants postgresql-9.6 in the host architecture (which is completely unreasonable as it is just a meta package pulling in components).

A similar problem exists the other way around too: postgresql-9.6 (architecture-dependent) depends on locales (architecture-independent). postgresql-9.6 wants locales to be from the same architecture as itself, and apt and dpkg pretend locales is from the host architecture, which doesn’t match if you try to install postgresql-9.6:i386 while the host architecture is already amd64.

I have no solution for the first problem; in the second case just download the package with wget and install it with dpkg --install. It won’t configure it (because a dependency is missing), but it should be usable.
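
For example, something along these lines (the mirror URL and VERSION are placeholders; pick the .deb matching your release):

    wget http://deb.debian.org/debian/pool/main/p/postgresql-9.6/postgresql-9.6_VERSION_i386.deb
    dpkg --install postgresql-9.6_VERSION_i386.deb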

Useful tools

  • needrestart is a great tool to find processes which should be restarted after upgrades - or 32-bit to 64-bit migrations.
  • etckeeper is a great tool to keep track of your /etc/ changes.

Feedback

I received some mails with feedback, thanks Leszek Dubiel and Klemens:

  • It’s probably a good idea to make sure there are no pending updates before you start.
  • I pass --no-install-recommends to apt; if you enabled APT::Install-Suggests in your apt config you should add --no-install-suggests as well in those places, or simply put this into /etc/apt/apt.conf.d/99-no-auto-install.conf during the procedure:

    APT::Install-Recommends "false";
    APT::Install-Suggests "false";
    
  • You could make sure before starting that your packages are in a good state and that you don’t have any configured-but-not-installed packages. List the unclean packages with:

    dpkg -l | grep -v '^ii'
    

    Purge previously installed and still configured packages with (this should remove all configuration and data of those packages - so don’t do this if you still need those):

    apt-get purge $(dpkg -l | grep "^rc" | awk '{print $2}')
    

    Or with aptitude:

    aptitude purge '~c'
    
  • You can try to make sure beforehand that all the packages you want are available on 64-bit:

    apt-get --dry-run --download-only -y --no-install-recommends install \
      `dpkg -l | grep '^.i' | awk '{print $2}' | grep :i386 | sed -e 's/\(.*\):i386/\1:i386- \1:amd64/'`
    

    If you have packages that only work in a 32-bit environment you might be able to run them in a multiarch setup afterwards; just be careful when removing 32-bit packages above (either don’t remove them, or reinstall them later).

  • If dpkg --install fails with “too many errors” you might want to try:

    dpkg --abort-after=999999 --install ...
    
  • Due to my repeated usage of apt-get clean a fast network connection is useful, or a local caching proxy (e.g. squid-deb-proxy; better run it on a second machine though).

Complete story with comments

systemd: allow normal process to bind to privileged port

Posted on 2017-06-23

Sometimes services are started by systemd with privileges already dropped, for example inspircd.service starts as the irc user.

Such services usually cannot bind to privileged ports (< 1024) – in this case I needed it to listen on port 443 though (in addition to some high port) to allow users behind “strange” firewall configurations to connect to the server.

The solution is to add the following to the service (for example by running systemctl edit inspircd.service):

[Service]
AmbientCapabilities=CAP_NET_BIND_SERVICE
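
After adding the drop-in, restart the service so the new capability takes effect (service name as in the example above; the last command just verifies the setting):

# systemctl edit normally reloads automatically; an extra daemon-reload does not hurt
systemctl daemon-reload
systemctl restart inspircd.service
systemctl show -p AmbientCapabilities inspircd.service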

Complete story with comments