
Anycast in Segment Routing

 2 years ago
source link: https://routingcraft.net/anycast-in-segment-routing/

MPLS or Anycast Routing – for a long time, you had to choose one. Segment Routing allows you to have both.

Introduction

It’s hard to overstate how important anycast routing is. DNS root servers and CDNs rely on it to make the Internet fast and reliable. VXLAN designs in data center networks use anycast VTEPs or anycast gateways, which let the network scale while VMs migrate between hosts without changing their IP addresses. In some cases, anycast routing combined with ECMP can even replace expensive load balancers. Multicast designs use techniques like anycast RP, anycast source and root node redundancy in MVPN.

Yet MPLS, until recently, was deprived of anycast routing. This is because MPLS is not a pure packet switching technology, but has a control plane based on virtual circuit switching. There is always a Label Switched Path (LSP) with one destination (see note 1). Even if the MPLS control plane just follows IGP metrics without any traffic engineering, anycast routing is still not possible.

Segment Routing

SR reinvents the MPLS control plane, making it entirely based on packet switching. Since there are no signaled LSPs, and global labels are mapped to IP prefixes, nothing prevents us from assigning the same prefix and the same global label (segment) to multiple routers. Great success.

Fig. 1

Consider figure 1: R1 and R2 are the ASBRs, both advertising routes from the Internet to AS100, which has a BGP-free core. All routers in AS100 run IS-IS with SR extensions. In traditional MPLS designs, R1 and R2 would advertise routes with their respective loopback IPs as nexthops, and other routers in AS100 would pick the best exit point based on IGP metrics. This causes multiple problems:

  1. Slow convergence if R1 or R2 fails – RR has to withdraw each BGP prefix and advertise it with the new nexthop.
  2. Suboptimal routing, because the best IGP path for RR is not always the same as for other routers in the AS. One way to solve this is BGP ORR [RFC9107].
  3. Attempting to use BGP add-path to solve these problems will increase memory usage on the RR.

With Segment Routing, it is possible to assign an anycast SID to R1 and R2 and use it as a nexthop when advertising routes throughout the AS. This way, routing always follows IGP metrics, without having to use BGP add-path or ORR. Convergence speed is prefix-independent and depends on IGP timers rather than BGP withdrawal and advertisement. If necessary, convergence speed can be further improved by enabling TI-LFA.

Anycast SID is just a prefix SID without the N flag (in IS-IS or OSPF). The same SID can be assigned to multiple routers.
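To make the "routing always follows IGP metrics" point concrete, here is a minimal Python sketch of how each router ends up at its nearest owner of an anycast prefix, and how it falls back to the other owner after a failure. The topology, metrics and the router names P1, P2, PE1 and PE2 are assumptions for illustration and are not taken from figure 1; the Dijkstra routine is a generic stand-in for an IGP SPF run.

```python
import heapq

def shortest_distances(graph, source):
    """Plain Dijkstra over graph[u][v] = IGP metric of link u -> v."""
    dist = {source: 0}
    queue = [(0, source)]
    while queue:
        d, node = heapq.heappop(queue)
        if d > dist.get(node, float("inf")):
            continue  # stale queue entry
        for neighbor, metric in graph[node].items():
            candidate = d + metric
            if candidate < dist.get(neighbor, float("inf")):
                dist[neighbor] = candidate
                heapq.heappush(queue, (candidate, neighbor))
    return dist

def nearest_anycast_owner(graph, source, owners):
    """Return the closest reachable router advertising the anycast prefix."""
    dist = shortest_distances(graph, source)
    reachable = [owner for owner in owners if owner in dist]
    if not reachable:
        return None
    # Tie-break on name only to keep the result deterministic (real routers would ECMP).
    return min(reachable, key=lambda owner: (dist[owner], owner))

def remove_node(graph, failed):
    """Copy of the graph without the failed router and its links."""
    return {
        node: {nbr: m for nbr, m in links.items() if nbr != failed}
        for node, links in graph.items()
        if node != failed
    }

# Hypothetical topology: two ASBRs (R1, R2) share one anycast SID.
GRAPH = {
    "R1": {"P1": 10},
    "R2": {"P2": 10},
    "P1": {"R1": 10, "P2": 10, "PE1": 10},
    "P2": {"R2": 10, "P1": 10, "PE2": 10},
    "PE1": {"P1": 10},
    "PE2": {"P2": 10},
}
ANYCAST_OWNERS = {"R1", "R2"}

if __name__ == "__main__":
    print(nearest_anycast_owner(GRAPH, "PE1", ANYCAST_OWNERS))
    print(nearest_anycast_owner(GRAPH, "PE2", ANYCAST_OWNERS))
    print(nearest_anycast_owner(remove_node(GRAPH, "R1"), "PE1", ANYCAST_OWNERS))
```

In this sketch the BGP nexthop never changes when R1 disappears; only the IGP result for the anycast prefix does, which is why convergence does not depend on the number of BGP prefixes.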

Note the separate BGP session between R1 and R2 in figure 1. If the eBGP session between R1 and its external peer fails, R1 will keep receiving traffic sent towards the anycast SID, so it must route that traffic towards R2. The anycast SID must not be used as the nexthop for this BGP session. Suboptimal routing in this case is a drawback of the anycast design.

An anycast SID doesn’t have to be the tail-end of the LSP; it can also sit in the middle of the label stack, allowing SR-TE policies to steer traffic via it, as in the example below.

Large Scale Interconnect

Large Scale Interconnect [RFC8604] is a very scalable SDN design with Segment Routing, an alternative to Seamless MPLS for very large networks. In theory, it can scale almost infinitely: if a million SIDs are not enough to address all routers, the same SIDs can be reused in different leaf domains. It requires a controller that sees the entire topology (e.g. using BGP-LS or streaming telemetry) and programs SR-TE policies on routers.

There are multiple flavours of this design, but usually they involve multiple IGP domains, e.g. multi-level IS-IS with blocked L1->L2 route propagation or multi-instance IS-IS [RFC8202]. The IGP domains can be completely isolated, or a few selected core (L2) prefixes can be redistributed into leaf (L1) domains. Anycast routing also comes in handy here.

Fig. 2

In the example on figure 2, the SDN controller programmed an SR-TE policy on PE1, so that it pushes the following label stack:

  1. Anycast SID of the local ABR
  2. Anycast SID of the remote ABR
  3. Prefix or node SID of the remote PE

If we redistribute the core SIDs into the leaf IGP domains, the label stack that the PE has to push can be reduced from 3 to 2 labels, but routing table usage and the amount of IGP flooding in the leaf domains will increase. So there is always a tradeoff in such designs.
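The two variants of the label stack can be sketched in a few lines of Python. The SRGB base of 16000, the SID indexes and the segment names below are all hypothetical; the only thing taken from the text is the order of the segments and the 3-versus-2 label tradeoff.

```python
# Assumed: every router uses the same SRGB base, so label = base + SID index.
SRGB_BASE = 16000

# Hypothetical SID indexes, not taken from the article's figures.
SID_INDEX = {
    "local_abr_anycast": 100,
    "remote_abr_anycast": 200,
    "remote_pe_node": 11,
}

def label_stack(segments, srgb_base=SRGB_BASE):
    """Translate an ordered list of segment names into MPLS labels, top label first."""
    return [srgb_base + SID_INDEX[name] for name in segments]

# Fully isolated leaf domains: the PE only knows SIDs of its own domain,
# so the first segment must be the anycast SID of the local ABR.
isolated = label_stack(["local_abr_anycast", "remote_abr_anycast", "remote_pe_node"])

# Core SIDs redistributed into the leaf domain: the PE can reach the remote
# ABR anycast SID directly and skips the local ABR segment.
with_core_sids = label_stack(["remote_abr_anycast", "remote_pe_node"])

if __name__ == "__main__":
    print(isolated)
    print(with_core_sids)
```

The shorter stack costs nothing in the data plane but requires every leaf router to carry the redistributed core prefixes, which is the tradeoff described above.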

IS-IS area proxy

[draft-ietf-lsr-isis-area-proxy] is an alternative solution to the IGP scale problem in large networks. Instead of splitting the network into multiple IGP domains and interconnecting them with BGP or SR-TE policies, area proxy makes it possible to create L1 islands inside L2 and represent each of them to the rest of the L2 network as just one node. This reduces the number of nodes in SPF calculations and the amount of required LSP flooding (which can be further reduced by [draft-ietf-lsr-dynamic-flooding]).

Fig. 3

Consider the topology on figure 3. Routers R1-R5 support area proxy and are all configured as L1L2 routers. Sample config (Arista EOS):

router isis 1
   net 49.0001.0001.0001.0001.00
   !
   address-family ipv4 unicast
   !
   area proxy
      net 49.0101.0101.0101.0101.00
      is-hostname AR101
      area segment 101.101.101.101/32 index 101
      router-id ipv4 101.101.101.101
      no shutdown
   !
   segment-routing mpls
      router-id 1.1.1.1
      no shutdown

Links on R1-R4 that connect to routers outside the proxy area are configured as L2-only circuits and as the proxy boundary:

interface Ethernet1
   no switchport
   ip address 10.0.0.1/30
   isis enable 1
   isis circuit-type level-2
   isis area proxy boundary
   isis network point-to-point

PE1-PE3 are oblivious of area proxy and don’t have to support it. R1-R5 elect the area leader which generates a proxy LSP. Routers outside the proxy area are not aware of area 49.0001 and see the entire proxy area as one big router in area 49.0101.

The main benefit of area proxy is reducing the number of LSPs in the area and the number of routers that are included in SPF runs. The tradeoff is losing the visibility of the entire network topology that link-state IGPs promise, which makes traffic engineering more difficult. Therefore, it makes sense to put routers from one POP or city in the same proxy area, and configure expensive long-haul links as L2 circuits, where traffic engineering is more likely to be required.

Anycast is not strictly required in area proxy designs; in the example in figure 3, traffic can travel from PE1 to PE2 with just the respective node SID as the only transport label. But for traffic engineering purposes, it is possible to allocate an anycast area SID, owned by all routers in the proxy area.

TI-LFA and anycast SID

Just like when protecting any other SID, TI-LFA attempts to steer traffic onto the post-convergence path. Depending on the topology, that path can lead to the other router sharing the anycast SID.

Fig. 4

In figure 4, R2 and R5 share anycast SID 25. If the R1-R2 link fails, R1 will reroute traffic destined to that anycast SID towards R5, because this is the new post-convergence path.

A bit less obvious example:

Fig. 5

In figure 5, node protection is enabled on R1, and a new link between R2 and R6 with a lower metric is added. In this case, using anycast SID 25 on the backup path can result in R6 sending traffic to R2 before R6 detects the failure and converges. Therefore, R1 must use the node SID of R5 instead of the anycast SID for the backup path.

Check also Topology Dependent LFA where I have written more about not-so-obvious TI-LFA scenarios.

So far it’s all been great, but there is an elephant in the room, and that is the Segment Routing Global Block (SRGB). Best practice is that all routers in the SR domain use exactly the same SRGB. But no standard mandates what the SRGB range should be, so in practice every vendor uses a different SRGB by default, leaving it to the operator to decide what to use in a multivendor network.

The Segment Routing architecture allows different SRGBs to be used on different routers. When propagating SR extensions in IS-IS, OSPF or BGP, a router advertises its SRGB base and SID indexes, so that other routers can calculate label values as SRGB base + index. But in practice, using different SRGBs causes a lot of issues, such as:

  1. Increased operational and troubleshooting complexity without any added benefit
  2. No anycast routing
  3. No TCAM optimizations

Yes, no anycast routing at all. [draft-ietf-spring-mpls-anycast-segments] proposes some solutions for anycast with different SRGB (one of these solutions is a designated SRGB for anycast SID, which should be the same on all routers). But it doesn’t even mention TI-LFA for anycast SID. I don’t know whether any vendor implemented this draft.
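A rough Python sketch of one way the problem shows up when the anycast SID sits in the middle of a label stack: the label that follows it has to be computed from the SRGB of whichever anycast owner ends up processing the packet, and the ingress router cannot know which one that will be. The SRGB bases and the SID index below are hypothetical, and the router names R2 and R5 are borrowed from figure 4 purely for illustration.

```python
# Hypothetical SID index of the segment that follows the anycast SID in the stack.
NEXT_SID_INDEX = 7

def next_label_candidates(srgb_bases, owners, sid_index):
    """Label each anycast owner would expect for the next segment: its own SRGB base + index."""
    return {owner: srgb_bases[owner] + sid_index for owner in owners}

def is_unambiguous(candidates):
    """True if every anycast owner expects the same label value."""
    return len(set(candidates.values())) == 1

ANYCAST_OWNERS = ("R2", "R5")

# Assumed multivendor case: the two owners keep different default SRGBs.
different = next_label_candidates({"R2": 16000, "R5": 900000}, ANYCAST_OWNERS, NEXT_SID_INDEX)

# Same SRGB everywhere: one label value works regardless of which owner receives the packet.
same = next_label_candidates({"R2": 16000, "R5": 16000}, ANYCAST_OWNERS, NEXT_SID_INDEX)

if __name__ == "__main__":
    print(different, is_unambiguous(different))
    print(same, is_unambiguous(same))
```

With different SRGBs, any single label value chosen by the ingress is wrong on at least one of the anycast owners, which is the kind of ambiguity a designated common SRGB for anycast SIDs would remove.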

MPLS services

Sadly, so far there are no extensions to MPLS services to support anycast. This means that even if we use Segment Routing as the MPLS transport, L3 VPNs and VPLS are still not anycast-aware and work the old way, over an end-to-end LSP.

Conclusion

Anycast routing proves yet another time that Segment Routing is superior to legacy MPLS control planes. But there are caveats to anycast designs, like SRGB issues and sometimes suboptimal routing after failures.

Notes

  1. A multipoint LSP can have multiple destinations, but not ANY destination, as in anycast.
