Unequal load balancing with BGP


It recently came up that a customer would like dual connections and load balancing. Just about everyone knows about equal cost multi-path routing and load balancing traffic over links, but I thought it would be fun to talk about how we can also balance traffic...unequally.

Before we build up to unequal load balancing, let's take a look at typical load balancing.

Below is an example of a customer that wants two connections to us.

Single router dual uplink

In this first example, we will not be load balancing. CE 1 will presumably receive the same routes from both PE 1 and PE 2 but will only select a single next-hop in the RIB and FIB.

CE 1's route table
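
If you want to follow along without a lab, the output of 'show route 100.65.0.0/24' on a vMX looks roughly like this (timers and interface names here are illustrative):

inet.0: 12 destinations, 14 routes (12 active, 0 holddown, 0 hidden)
+ = Active Route, - = Last Active, * = Both

100.65.0.0/24      *[BGP/170] 00:05:12, localpref 100
                      AS path: 65000 I, validation-state: unverified
                    > to 100.64.0.6 via ge-0/0/1.0
                    [BGP/170] 00:05:10, localpref 100
                      AS path: 65000 I, validation-state: unverified
                    > to 100.64.0.2 via ge-0/0/0.0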

Quick refresher on the output here:

  1. Destinations = What we know how to reach
  2. Routes = Different paths we can take to reach destinations
  3. Active = Best routes in the routing table
  4. "*" = Active route
  5. ">" = Next hop used for forwarding table

The output tells us we have learned the prefix 100.65.0.0/24 from two BGP neighbors in the same AS, but the best path is through 100.64.0.6. In this case the best path is the one that was active first (rule 11).

💡
The BGP Best Path algorithm tie-breakers at the end seem to slightly differ based on the Juniper source you are using. Even the official Juniper BGP guide differs from 'help topic bgp path-selection' on vMX routers which also differs from Day One BGP. In the above case, rule 11 is referencing the "oldest version" or "currently active" route depending on which source you use.

The best path has been selected and the corresponding forwarding table entry has been created.
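
A 'show route forwarding-table destination 100.65.0.0/24' comes back with a single unicast next-hop, roughly like this (indexes are illustrative):

Routing table: default.inet
Internet:
Destination        Type RtRef Next hop           Type Index    NhRef Netif
100.65.0.0/24      user     0 100.64.0.6         ucst      581     3 ge-0/0/1.0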


Now we turn on the multipath knob and see the expected results of per-prefix load balancing.

set protocols bgp group [GROUP] neighbor [NEIGHBOR] multipath
set protocols bgp multipath
Choose the first command if you only want multipath on certain peers, or the second to enable it for all peers
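
With multipath on, the RIB entry should now look something like this (again, timers and interface names are illustrative):

100.65.0.0/24      *[BGP/170] 00:08:42, localpref 100
                      AS path: 65000 I, validation-state: unverified
                    > to 100.64.0.6 via ge-0/0/1.0
                      to 100.64.0.2 via ge-0/0/0.0
                    [BGP/170] 00:08:40, localpref 100
                      AS path: 65000 I, validation-state: unverified
                    > to 100.64.0.2 via ge-0/0/0.0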

Notice that the other BGP route's next-hop has been copied into the active route's list of available next-hops, and one of them is still selected with ">"? This output is tricky: even though prefix 100.65.0.0/24 now lists two next-hops, only one of them will be used. So other than copying the other route's next-hop into the active route, what did multipath do? It enabled per-prefix load balancing. Given a sufficiently large number of routes, the Juniper hashes each prefix and spreads the prefixes across the available next-hops, but any single prefix still forwards over just one of them. We can prove this again by taking a look at the forwarding table.

Hypothetically, if you received 900k routes, roughly 450k of them would select one path and the other 450k would select the other path.


If we want the router to use both next-hops for the route, we need to tell it to program the forwarding table with more than one next-hop for a given prefix. Create your policy to select which routes you want to load balance.

policy-statement lb_100_65_0_0 {
    from {
        route-filter 100.65.0.0/24 exact;
    }
    then {
        load-balance per-packet;
    }
}

Then apply that policy to your forwarding table.

routing-options {
    forwarding-table {
        export lb_100_65_0_0;
    }
}
Certain configuration options have been excluded for brevity
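
If you prefer set-style commands, those two snippets are equivalent to:

set policy-options policy-statement lb_100_65_0_0 from route-filter 100.65.0.0/24 exact
set policy-options policy-statement lb_100_65_0_0 then load-balance per-packet
set routing-options forwarding-table export lb_100_65_0_0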

If we run the same commands, we see the same routing table output because our changes affect how the forwarding table gets programmed, not the routing table. But if we take a look at the forwarding table, we will see there are multiple next-hops listed for 100.65.0.0/24.

RIB looks the same
FIB looks different now
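
Concretely, the forwarding table now installs a unicast list (ulst) pointing at both next-hops, roughly like this (indexes are illustrative):

Routing table: default.inet
Internet:
Destination        Type RtRef Next hop           Type Index    NhRef Netif
100.65.0.0/24      user     0                    ulst  1048574     2
                              100.64.0.2         ucst      580     2 ge-0/0/0.0
                              100.64.0.6         ucst      581     3 ge-0/0/1.0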

We now have equal-cost multipath load balancing configured and working, and a refresher of what the RIB and FIB look like when traffic is balanced equally, so we can start talking about unequal load balancing.

Unequal BGP multipath load balancing is achieved by using an extended BGP community, the link-bandwidth community, to tell the router how much bandwidth each link has.

First we need to create the communities.

set policy-options community bw-10 members bandwidth:65000:10000
set policy-options community bw-90 members bandwidth:65000:90000

Then we need to update our routing policy to tag routes with our new communities.

policy-statement eBGP_policy {
    term "90%" {
        from {
            protocol bgp;
            neighbor 100.64.0.2;
        }
        then {
            community add bw-90;
        }
    }
    term "10%" {
        from {
            protocol bgp;
            neighbor 100.64.0.6;
        }
        then {
            community add bw-10;
        }
    }
}
💡
Note the use of 'add' instead of 'set' for the community. You are most likely tagging your routes with other communities already, so we want to add to that list, not replace it.

And of course apply the routing policy if you haven't already.

set protocols bgp group [GROUP] import eBGP_policy

Now let's take a look at our RIB and FIB.

RIB with no changes
FIB with no changes

So... how do we know we are load balancing per our configured 90/10 routing policy? We need to look at the extensive RIB output.
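
With the bandwidth communities attached, 'show route 100.65.0.0/24 extensive' annotates each next-hop with its balance percentage, along these lines (excerpt; fields trimmed for readability):

100.65.0.0/24 (2 entries, 1 announced)
        *BGP    Preference: 170/-101
                Next hop: 100.64.0.2 via ge-0/0/0.0 balance 90%
                Next hop: 100.64.0.6 via ge-0/0/1.0 balance 10%, selected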

💡
You may notice that even when load balancing, the ">" tag still exists on next-hops and the extensive RIB output still uses the "selected" keyword. This is a little deceiving, but basically that is the next-hop that would be used if we were not load balancing. The only way to see what is actually being used is to inspect the forwarding table.
💡
When defining the bandwidth extended community, you are specifying the link bandwidth in bytes per second. In this blog post, I simply used a ratio to make it 90/10, but if you plan on advertising aggregate bandwidth across links, it would make sense to actually do the math.
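
For example, if the two links were actually 1Gbps and 10Gbps, the math is bits per second divided by eight: 125,000,000 and 1,250,000,000 bytes per second respectively. The community names below are just examples.

set policy-options community bw-1g members bandwidth:65000:125000000
set policy-options community bw-10g members bandwidth:65000:1250000000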