It isn’t uncommon on Linux systems to have multiple network interfaces (layer 3, i.e. with IP addresses, not counting the loopback device). For example, you might have your main interface (with the default route) and a docker bridge. Or you run a firewall between a public network and one or many internal networks (perhaps as tagged VLAN interfaces).
But in rather rare scenarios you need multiple default routes: for example, separate management access in case someone attacks your main interface or it breaks for other reasons. Or you want to provide a service to different networks (which are not connected directly) from the same host.
In those cases you need to be careful: if you simply add multiple default routes, Linux will just pick one and go with it. Back in the old days you had to use policy-based routing for those cases: add another routing table and select it based on, for example, the source address; the second routing table can have a different default route than your main table.
I have seen enterprise appliances using multiple default routes (and multiple interfaces of course), and their operators kept wondering why it didn’t work. Well, we use ACLs to make sure systems use the correct source address to prevent IP address spoofing; even without ACLs we usually enable uRPF. If you really need multiple interfaces which can be connected to from overlapping sources, you MUST set up policy-based routing (unless the outgoing interface doesn’t matter).
VRFs on Linux are just a fancy way to set up policy-based routing for such special cases.
Basic idea
- Make sure your loopback device is up; without it various things just fail:
  ip link set dev lo up
- For each VRF apart from your default context, create a VRF interface with a separate routing table, like:
  ip link add name blue type vrf table 10
  You can assign names to those IDs in /etc/iproute2/rt_tables or /etc/iproute2/rt_tables.d/*.conf (also handy to see which IDs are already in use by default). Each VRF needs its own table ID!
- Start the VRFs:
  ip link set dev blue up
- Assign interfaces to the VRF and start them:
  ip link set dev eth0 vrf blue up
- Now you can configure addresses with ip address as always; to configure routes you need to specify the VRF, like ip [-6] route ... vrf blue (or the table ID with ... table 10).
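The steps above can be collected into one small script. This is only a sketch: the helper name vrf_setup_cmds, its argument order, and the example address and gateway (192.0.2.2/24 and 192.0.2.1) are assumptions made for illustration. It only prints the commands, so it is safe to run as a normal user; once the output looks right, it could be piped into sh as root.

```shell
#!/bin/sh
# Sketch: print the commands that create one VRF and move one interface into it.
# vrf_setup_cmds, its argument order, and the example addresses are
# assumptions made for this illustration.

vrf_setup_cmds() {
    vrf="$1"; table="$2"; iface="$3"; addr="$4"; gw="$5"
    # The loopback device has to be up.
    echo "ip link set dev lo up"
    # Create the VRF device with its own routing table and start it.
    echo "ip link add name $vrf type vrf table $table"
    echo "ip link set dev $vrf up"
    # Move the interface into the VRF and start it.
    echo "ip link set dev $iface vrf $vrf up"
    # Addresses are configured as always; routes need the VRF name.
    echo "ip address add $addr dev $iface"
    echo "ip route add default via $gw vrf $vrf"
}

vrf_setup_cmds blue 10 eth0 192.0.2.2/24 192.0.2.1
```

Every additional VRF would get another call with its own name and its own table ID.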
Sadly there are some corner cases that won’t work properly.
TLDR: run fix-vrf-rules.sh to fix it.
How it works
Routing on linux uses rules to find tables, and then looks up the target in the tables.
More precisely: it loops through all rules by preference, and if a rule matches, it will execute its action. If an action doesn’t lead to a final result, it will continue with the next rule.
Usually the action is a table lookup; if the lookup finds a matching route, the process ends. In short: first table with a matching entry wins!
You can take a look at the rules with ip rule (IPv6: ip -6 rule). For everything below, keep in mind that IPv6 has a separate configuration.
For IPv4 they look like this by default:
0: from all lookup local
32766: from all lookup main
32767: from all lookup default
(In IPv6 the lookup default line is not present by default.)
When you create a VRF (with linux kernel >= 4.8) you get this instead:
0: from all lookup local
1000: from all lookup [l3mdev-table]
32766: from all lookup main
32767: from all lookup default
It inserted a new rule with a rather magic [l3mdev-table] lookup. This will look up the table you specified on creation of the VRF interface when the packet to be routed uses an interface that belongs to a VRF. Only one rule is needed for all VRFs!
But… sadly the lookup local rule comes before the VRF lookup; if you try to ping one of your main addresses from the VRF, the lookup will find them in the local table and never reach the VRF routing table. This is almost never what you wanted; with RFC1918 addresses you might want to reach someone else entirely (or just want the packet to take the proper path due to stateful firewalls: the response won’t take the shortcut!).
The solution is to move the lookup local rule down; even the “selftests” script in the kernel does this:
ip -4 rule add pref 32765 table local
ip -4 rule del pref 0
ip -6 rule add pref 32765 table local
ip -6 rule del pref 0
The kernel Documentation/networking/vrf.txt also recommends adding unreachable default routes to the VRF routing tables so that lookups don’t fall through to the tables that come afterwards (I’d recommend using vrf blue instead of table 10 though):
ip -4 route add table 10 unreachable default metric 4278198272
ip -6 route add table 10 unreachable default metric 4278198272
An even easier solution, in my opinion, is to add another rule with an unreachable target which only triggers in VRF context:
ip -4 rule add pref 2000 l3mdev unreachable
Now the rules should look like this:
1000: from all lookup [l3mdev-table]
2000: from all lookup [l3mdev-table] unreachable
32765: from all lookup local
32766: from all lookup main
32767: from all lookup default
The unreachable action overrides the lookup, but due to l3mdev it only triggers if a VRF lookup would have triggered, and so it solves the problem for all VRFs.
My script fix-vrf-rules.sh should apply those changes in a safe way.
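To check whether the rules ended up in the intended order, something like the following sketch could be used. It is not part of fix-vrf-rules.sh; check_rule_order is a hypothetical helper that reads ip rule style output on stdin and reports whether the first lookup local rule is evaluated before the first [l3mdev-table] rule. The two sample inputs are the rule listings shown above.

```shell
#!/bin/sh
# Sketch: read `ip rule` style output on stdin and report the relative
# order of the "lookup local" rule and the l3mdev (VRF) rule.
# check_rule_order is a hypothetical helper name.

check_rule_order() {
    awk '
        /lookup local[ \t]*$/ {
            if (!local_seen) { local_line = NR; local_seen = 1 }
        }
        /lookup \[l3mdev-table\]/ {
            if (!vrf_seen) { vrf_line = NR; vrf_seen = 1 }
        }
        END {
            if (!vrf_seen) { print "no-vrf-rule"; exit 0 }
            if (local_seen && local_line < vrf_line) {
                print "local-first"   # the problematic default order
                exit 1
            }
            print "vrf-first"
        }
    '
}

# Rules right after creating a VRF: local is still evaluated first.
check_rule_order <<'EOF' || true
0: from all lookup local
1000: from all lookup [l3mdev-table]
32766: from all lookup main
32767: from all lookup default
EOF

# Rules after the changes described above.
check_rule_order <<'EOF'
1000: from all lookup [l3mdev-table]
2000: from all lookup [l3mdev-table] unreachable
32765: from all lookup local
32766: from all lookup main
32767: from all lookup default
EOF
```

On a real system the input would come from ip rule | check_rule_order (and ip -6 rule | check_rule_order for IPv6); the non-zero exit status in the local-first case makes it usable in scripts.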
Demo
I created another script vrf-demo.sh (requires fix-vrf-rules.sh) to show what a VRF setup can look like.
First it runs in a separate network namespace, so your normal network doesn’t break.
Then it will create a VRF and a veth-link (“virtual ethernet”, consisting of two interfaces) between the VRF and the default environment, set up some addresses and routing, and show connectivity.
Using VRFs
Sadly most programs can’t use VRFs directly; they’d need support for setting the SO_BINDTODEVICE (or SO_BINDTOIFINDEX) socket option. Without that they will end up using the default VRF (binding to source addresses from other VRFs does NOT work).
For incoming TCP connections you should be able to use sysctl -w net.ipv4.tcp_l3mdev_accept=1 to have a TCP server running in the “global” VRF accept connections in other VRFs as well (with the connection then being in the VRF it came from). The kernel also has sysctl options for UDP and RAW sockets, but I can’t see how that’d be useful, as they don’t result in a (VRF-specific) connection socket, and response packets wouldn’t be sent in the VRF.
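Before changing anything, it can help to see what the running kernel currently has configured. The following is a minimal sketch; show_l3mdev_accept is a hypothetical helper name, and it only reads the values from /proc/sys, which needs no root (changing them with sysctl -w does). The names of the UDP and RAW options are the ones from the kernel's ip-sysctl documentation.

```shell
#!/bin/sh
# Sketch: show the current l3mdev accept settings, if the kernel exposes them.
# show_l3mdev_accept is a hypothetical helper name.

show_l3mdev_accept() {
    for proto in tcp udp raw; do
        file="/proc/sys/net/ipv4/${proto}_l3mdev_accept"
        if [ -r "$file" ]; then
            echo "net.ipv4.${proto}_l3mdev_accept = $(cat "$file")"
        else
            # Older kernels or kernels built without VRF support lack the file.
            echo "net.ipv4.${proto}_l3mdev_accept is not available on this kernel"
        fi
    done
}

show_l3mdev_accept
```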
You can change the default VRF for a program with ip vrf exec blue ... (this will install a hook in the kernel to basically set SO_BINDTODEVICE every time a new socket is created).
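In scripts where the VRF should be configurable, a small wrapper around ip vrf exec can be handy. The following is a sketch under assumptions: run_in_vrf is a hypothetical helper, and it treats the name "default" as "no VRF" and runs the command directly. The commented-out examples assume the VRF blue exists and use 192.0.2.1 as a placeholder target.

```shell
#!/bin/sh
# Sketch: run a command either directly or inside a VRF.
# run_in_vrf is a hypothetical helper; the name "default" is handled by the
# helper itself and simply means "do not switch to a VRF".

run_in_vrf() {
    vrf_name="$1"
    shift
    if [ "$vrf_name" = "default" ]; then
        "$@"
    else
        # Needs root (or suitable capabilities) and an existing VRF device.
        ip vrf exec "$vrf_name" "$@"
    fi
}

# Runs in the default context, no VRF involved.
run_in_vrf default echo "running without a VRF"

# Examples for a host where the VRF "blue" exists (left commented out):
# run_in_vrf blue ping -c 1 192.0.2.1
# ping -I blue -c 1 192.0.2.1    # ping can bind to the VRF device itself
# ip vrf pids blue               # list processes currently bound to "blue"
```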
Comparing with policy-based routing
With the traditional policy-based routing you’d add rules like this:
ip rule add from 192.0.2.2 lookup 40
And then fill the routing table 40.
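Filling table 40 could look like the following sketch. Only the source address and the table number come from the rule above; the gateway 192.0.2.1 and the helper name pbr_cmds are assumptions for illustration. As before, the function only prints the commands instead of running them.

```shell
#!/bin/sh
# Sketch: print the policy-routing commands for one source address.
# pbr_cmds and the gateway address are hypothetical, chosen for this example.

pbr_cmds() {
    src="$1"; gw="$2"; table="$3"
    # Select the extra table for packets sent from this source address.
    echo "ip rule add from $src lookup $table"
    # Give that table its own default route.
    echo "ip route add default via $gw table $table"
}

pbr_cmds 192.0.2.2 192.0.2.1 40
```

Each additional source address needs its own rule (and, if it should leave through a different gateway, its own table), which is exactly the bookkeeping the single l3mdev rule saves with VRFs.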
As binding to source addresses is a platform-independent concept, many programs support it (unlike SO_BINDTODEVICE).
On the other hand you can use the same IP addresses in different VRFs; this is not possible without VRFs (for example you could run a DNS resolver for different networks on the same IP address for easier configuration).
Complete story with comments