NSX-T East-West Routing

source link: http://just4coding.com/2022/11/22/nsxt-vdrport/

2022-11-22 | 2022-11-30 | Virtualization

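The earlier article "NSX-T路由逻辑介绍" (Introduction to NSX-T Routing Logic) mainly covered NSX-T routing logic, using the north-south network path as its example: from a logical switch/segment to the Tier-1 logical router, and then on to the Tier-0 logical router.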

This article briefly describes the east-west path, in which two logical switches communicate through a Tier-1 logical router.

The lab topology is shown below:

1.png

The IP of VM t1 is 6.6.100.11, and the IP of t2 is 6.6.200.12.

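Ports on an N-VDS or DVS are isolated from one another by Geneve VNI, so a Geneve VNI identifies a logical switch/segment. The Geneve VNIs of the two logical switches in my environment are shown below: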

2.png

As shown, the VNI of ls-geneve-100 is 65537, and the VNI of ls-geneve-200 is 65536.

Use the command net-vdl2 -l to view the VNIs:

[root@esxi-01:~] net-vdl2 -l
Global States:
Control Plane Out-Of-Sync: No
VXLAN UDP Port: 4789
Geneve UDP Port: 6081
NSX VDS: DSwitch
VDS ID: 50 02 70 16 c2 cd 74 37-fb a6 ff 0b 1b cd 0e ee
MTU: 1600
Segment ID: 10.10.10.0
Transport VLAN ID: 300
VTEP Count: 1
CDO status: enabled (deactivated)
VTEP Interface: vmk10
DVPort ID: b58c174b-a07f-43a6-b0ca-7830de39f50f
Switch Port ID: 67108877
Endpoint ID: 0
VLAN ID: 300
Label: 10292
Uplink Port ID: 2214592537
Is Uplink Port LAG: No
IP: 10.10.10.101
Netmask: 255.255.255.0
Segment ID: 10.10.10.0
GW IP: 10.10.10.1
GW MAC: ff:ff:ff:ff:ff:ff
IP Acquire Timeout: 0
Multicast Group Count: 0
Is DRVTEP: Yes
Network Count: 3
Logical Network: 65538
Routing Domain: 00000000-0000-0000-0000-000000000000
Multicast Routing Domain: 00000000-0000-0000-0000-000000000000
Replication Mode: Source Unicast
Control Plane: Enabled (Multicast Proxy,ARP proxy)
Controller: 10.44.205.85 (up)
MAC Entry Count: 0
ARP Entry Count: 0
Port Count: 1
Logical Network: 65537
Routing Domain: 98334210-1ec6-4176-a718-581908b718c5
Multicast Routing Domain: 00000000-0000-0000-0000-000000000000
Replication Mode: MTEP Unicast
Control Plane: Enabled (Multicast Proxy,ARP proxy)
Controller: 10.44.205.85 (up)
MAC Entry Count: 0
ARP Entry Count: 1
Port Count: 2
Logical Network: 65536
Routing Domain: 98334210-1ec6-4176-a718-581908b718c5
Multicast Routing Domain: 00000000-0000-0000-0000-000000000000
Replication Mode: MTEP Unicast
Control Plane: Enabled (Multicast Proxy,ARP proxy)
Controller: 10.44.205.85 (up)
MAC Entry Count: 0
ARP Entry Count: 0
Port Count: 2
Routing Domain Count: 2
Routing DomainID: 00000000-0000-0000-0000-000000000000
Routing DomainID: 98334210-1ec6-4176-a718-581908b718c5

As shown, all of the logical switches are also located on this virtual switch.
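When the output is long, the VNI-to-routing-domain mapping can be pulled out with a small script. The following is a minimal sketch, not a supported tool: the function name `parse_logical_networks` is hypothetical, and it assumes only the `Logical Network:` and `Routing Domain:` line format visible in the output above (trimmed here to a few lines).

```python
import re

def parse_logical_networks(text):
    """Return {vni: routing_domain} from `net-vdl2 -l` style output."""
    networks = {}
    current_vni = None
    for raw in text.splitlines():
        line = raw.strip()
        m = re.match(r"Logical Network:\s*(\d+)$", line)
        if m:
            current_vni = int(m.group(1))
            networks[current_vni] = None
            continue
        # Matches only "Routing Domain: <uuid>"; lines such as
        # "Multicast Routing Domain:" or "Routing DomainID:" are skipped.
        m = re.match(r"Routing Domain:\s*(\S+)$", line)
        if m and current_vni is not None and networks[current_vni] is None:
            networks[current_vni] = m.group(1)
    return networks

# Lines copied from the output above (abbreviated).
SAMPLE = """\
Logical Network: 65537
Routing Domain: 98334210-1ec6-4176-a718-581908b718c5
Multicast Routing Domain: 00000000-0000-0000-0000-000000000000
Port Count: 2
Logical Network: 65536
Routing Domain: 98334210-1ec6-4176-a718-581908b718c5
Multicast Routing Domain: 00000000-0000-0000-0000-000000000000
Port Count: 2
Routing Domain Count: 2
Routing DomainID: 98334210-1ec6-4176-a718-581908b718c5
"""

if __name__ == "__main__":
    for vni, domain in parse_logical_networks(SAMPLE).items():
        print(vni, domain)
```

In this lab, VNIs 65537 and 65536 share one routing domain UUID, which is the same UUID later reported for the DR on this host.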

View the DVS port information on ESXi-01:

[root@esxi-01:~] nsxdp-cli vswitch instance list
DvsPortset-1 (DSwitch) 50 02 70 16 c2 cd 74 37-fb a6 ff 0b 1b cd 0e ee
Total Ports:2560 Available:2540
Client PortID DVPortID MAC Uplink
Management 67108868 00:00:00:00:00:00 n/a
vmnic0 2214592520 10 00:00:00:00:00:00
Shadow of vmnic0 67108873 00:50:56:5c:37:04 n/a
vmk0 67108876 1 00:50:56:b1:59:3e vmnic0
vmk10 67108877 b58c174b-a07f-43a6-b0ca-7830de39f50f 00:50:56:69:15:41 vmnic1
vmk50 67108878 8b2a4724-274f-46d0-a99b-580352399aa9 00:50:56:61:3f:85 void
vdr-vdrPort 67108883 vdrPort 02:50:56:56:44:52 vmnic1
spf-spfPort 67108886 spfPort50027016c2cd7437 02:50:56:56:45:52 vmnic1
vmnic1 2214592537 11 00:00:00:00:00:00
Shadow of vmnic1 67108890 00:50:56:5f:1e:d7 n/a
t1.eth0 67108910 e25a8fa7-0c21-4dae-b252-6d22ef33c1c5 00:50:56:82:70:f0 vmnic1
t3.eth0 67108917 c932ef38-c49f-4e28-8672-6ca34db2b38c 00:50:56:82:a0:05 vmnic1

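As shown, all logical switch ports are attached to the same virtual switch. A logical router consists of an SR (Service Router) and a DR (Distributed Router). The DR is distributed across the transport nodes of the corresponding transport zone, while the SR is deployed on Edge nodes. The switch port vdrPort listed above is the port through which the DR instance on the ESXi host attaches to the virtual switch. It can be thought of as a trunk port: traffic from the broadcast domains of all logical switches can pass through it.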

Note that the vdrPort MAC address is the same on all transport nodes; the default is 02:50:56:56:44:52.
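One visible difference between this MAC and the VM MACs in the port list is the first octet (02 instead of 00). In IEEE 802 addressing, bit 0x02 of the first octet marks a locally administered address and bit 0x01 marks a group (multicast) address. The sketch below only decodes those two bits; the helper name `mac_flags` is hypothetical, and the article does not say why this particular value was chosen.

```python
def mac_flags(mac):
    """Decode the two flag bits in the first octet of a MAC address."""
    first_octet = int(mac.split(":")[0], 16)
    return {
        "multicast": bool(first_octet & 0x01),
        "locally_administered": bool(first_octet & 0x02),
    }

VDR_PORT_MAC = "02:50:56:56:44:52"   # vdrPort, from the port list above
T1_MAC = "00:50:56:82:70:f0"         # t1.eth0, from the port list above

if __name__ == "__main__":
    print("vdrPort:", mac_flags(VDR_PORT_MAC))
    print("t1.eth0:", mac_flags(T1_MAC))
```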

View the DR on the ESXi-01 host:

[root@esxi-01:~] nsxcli -c get logical-routers
Tue Nov 22 2022 UTC 03:53:42.083
Logical Routers Summary
------------------------------------------------------------------------------------------
VDR UUID LIF num Route num Max Neighbors Current Neighbors
98334210-1ec6-4176-a718-581908b718c5 2 2 50000 3

Next, view the DR's interface information:

[root@esxi-01:~] nsxcli -c get logical-router 98334210-1ec6-4176-a718-581908b718c5 interfaces
Tue Nov 22 2022 UTC 03:57:19.784
Logical Router Interfaces
---------------------------------------------------------------------------
IPv6 DAD Status Legend: [A: DAD_Sucess], [F: DAD_Duplicate], [T: DAD_Tentative], [U: DAD_Unavailable]

LIF UUID : 39c68523-a185-49a7-9f86-4792e6696a8f
Mode : [b'Routing']
Overlay VNI : 65536
IP/Mask : 6.6.200.1/24
Mac : 02:50:56:56:44:52
Connected DVS : DSwitch
Control plane enable : True
Replication Mode : 0.0.0.1
Multicast Routing : [b'Enabled', b'Oper Down']
State : [b'Enabled']
Flags : 0x80388
DHCP relay : Not enable
DAD-mode : ['LOOSE']
RA-mode : ['UNKNOWN']

LIF UUID : 4adea6ee-5dbf-4ff8-8fa4-6670bb70982f
Mode : [b'Routing']
Overlay VNI : 65537
IP/Mask : 6.6.100.1/24
Mac : 02:50:56:56:44:52
Connected DVS : DSwitch
Control plane enable : True
Replication Mode : 0.0.0.1
Multicast Routing : [b'Enabled', b'Oper Down']
State : [b'Enabled']
Flags : 0x80388
DHCP relay : Not enable
DAD-mode : ['LOOSE']
RA-mode : ['UNKNOWN']

As shown, both interfaces, 6.6.100.1 and 6.6.200.1, have the MAC address 02:50:56:56:44:52.
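The two LIFs give the DR one connected subnet per VNI, so choosing the egress segment for a packet amounts to finding the LIF whose subnet contains the destination. The sketch below models only that connected-route lookup with Python's `ipaddress` module; `LIFS` and `egress_lif` are hypothetical names, and a real DR would also consult its route table rather than just its interfaces.

```python
import ipaddress

# LIF data copied from the interface listing above.
LIFS = [
    {"vni": 65536, "interface": ipaddress.ip_interface("6.6.200.1/24")},
    {"vni": 65537, "interface": ipaddress.ip_interface("6.6.100.1/24")},
]

def egress_lif(dst_ip):
    """Return the LIF whose connected subnet contains dst_ip, else None."""
    dst = ipaddress.ip_address(dst_ip)
    for lif in LIFS:
        if dst in lif["interface"].network:
            return lif
    return None

if __name__ == "__main__":
    lif = egress_lif("6.6.200.12")   # t2's address
    print(lif["vni"], lif["interface"])
```

For t2's address the lookup selects the 6.6.200.1/24 LIF on VNI 65536, which is the segment the routed packet is placed on.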

Now let's look at the network path from 6.6.100.11 to 6.6.200.12.

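On t1, clear the ARP entries and then ping VM t2. Because the destination IP 6.6.200.12 is not in the same subnet, t1 first sends an ARP request to resolve the MAC address of the gateway 6.6.100.1.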
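The guest's decision can be sketched as follows. This is an illustration of ordinary IP host behavior, not NSX-T code; `arp_target` is a hypothetical name, and the /24 prefix for t1 is an assumption taken from the 6.6.100.1/24 LIF shown earlier.

```python
import ipaddress

def arp_target(src_interface, gateway, dst_ip):
    """Return the IP a host ARPs for before sending a packet to dst_ip."""
    iface = ipaddress.ip_interface(src_interface)
    dst = ipaddress.ip_address(dst_ip)
    if dst in iface.network:
        return str(dst)      # on-link: resolve the destination itself
    return gateway           # off-link: resolve the default gateway

if __name__ == "__main__":
    # t1 (assumed /24) pinging t2 in the other subnet.
    print(arp_target("6.6.100.11/24", "6.6.100.1", "6.6.200.12"))
```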

We capture packets on t1.eth0, vdrPort, and the uplink.

The ARP request is captured only on the t1.eth0 port:

[root@esxi-01:~] pktcap-uw --switchport 67108910 --dir 2 -o - | tcpdump-uw -ner -
The switch port id is 0x0400002e.
pktcap: The output file is -.
pktcap: No server port specifed, select 7799 as the port.
pktcap: Local CID 2.
pktcap: Listen on port 7799.
reading from file -, link-type EN10MB (Ethernet)
pktcap: Accept...
pktcap: Vsock connection from port 1096 cid 2.
11:45:24.879494 00:50:56:82:70:f0 > ff:ff:ff:ff:ff:ff, ethertype ARP (0x0806), length 60: Request who-has 6.6.100.1 tell 6.6.100.11, length 46
11:45:24.879541 02:50:56:56:44:52 > 00:50:56:82:70:f0, ethertype ARP (0x0806), length 60: Reply 6.6.100.1 is-at 02:50:56:56:44:52, length 46

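My guess is that the virtual switch layer answers ARP on behalf of the virtual subnet's gateway, so traffic sent to the gateway is steered to the local host's vdrPort. Although the vdrPort MAC address is identical on every ESXi host, there is no conflict, because such ARP requests are never sent to other ESXi hosts.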

Next, we run a continuous ping from VM t1 to t2 and capture packets on the vdrPort of both ESXi-01 and ESXi-02.

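On the vdrPort of ESXi-01, where the sender t1 resides, each ping shows up as two request packets (one addressed to the vdrPort MAC and one sourced from it), but there are no reply packets: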

12:17:25.381567 00:50:56:82:70:f0 > 02:50:56:56:44:52, ethertype IPv4 (0x0800), length 98: 6.6.100.11 > 6.6.200.12: ICMP echo request, id 9652, seq 11, length 64
12:17:25.381593 02:50:56:56:44:52 > 00:50:56:82:a6:ae, ethertype IPv4 (0x0800), length 98: 6.6.100.11 > 6.6.200.12: ICMP echo request, id 9652, seq 11, length 64
12:17:26.382613 00:50:56:82:70:f0 > 02:50:56:56:44:52, ethertype IPv4 (0x0800), length 98: 6.6.100.11 > 6.6.200.12: ICMP echo request, id 9652, seq 12, length 64
12:17:26.382645 02:50:56:56:44:52 > 00:50:56:82:a6:ae, ethertype IPv4 (0x0800), length 98: 6.6.100.11 > 6.6.200.12: ICMP echo request, id 9652, seq 12, length 64

On the vdrPort of ESXi-02, where VM t2 resides, there are only reply packets:

12:17:25.588603 00:50:56:82:a6:ae > 02:50:56:56:44:52, ethertype IPv4 (0x0800), length 98: 6.6.200.12 > 6.6.100.11: ICMP echo reply, id 9652, seq 11, length 64
12:17:25.588627 02:50:56:56:44:52 > 00:50:56:82:70:f0, ethertype IPv4 (0x0800), length 98: 6.6.200.12 > 6.6.100.11: ICMP echo reply, id 9652, seq 11, length 64
12:17:26.590845 00:50:56:82:a6:ae > 02:50:56:56:44:52, ethertype IPv4 (0x0800), length 98: 6.6.200.12 > 6.6.100.11: ICMP echo reply, id 9652, seq 12, length 64
12:17:26.590873 02:50:56:56:44:52 > 00:50:56:82:70:f0, ethertype IPv4 (0x0800), length 98: 6.6.200.12 > 6.6.100.11: ICMP echo reply, id 9652, seq 12, length 64

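Routing is therefore performed by the DR instance on the host of the packet's sender. Once the packet reaches the destination host, it is decapsulated and delivered directly to the destination VM.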
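The captures above can be read mechanically: a frame whose destination MAC is the vdrPort MAC is entering the DR, and a frame whose source MAC is the vdrPort MAC has just been routed. The sketch below classifies tcpdump lines on that basis. `classify` and `LINE_RE` are hypothetical names, and the regular expression assumes only the line format shown in this article.

```python
import re

VDR_MAC = "02:50:56:56:44:52"

LINE_RE = re.compile(
    r"^(?P<ts>\S+) (?P<src>[0-9a-f:]{17}) > (?P<dst>[0-9a-f:]{17}), .*?"
    r"ICMP echo (?P<kind>request|reply), id \d+, seq (?P<seq>\d+)"
)

def classify(line):
    """Return (kind, seq, stage) for one tcpdump line, or None if unparsed."""
    m = LINE_RE.match(line)
    if m is None:
        return None
    if m["dst"] == VDR_MAC:
        stage = "into DR"        # frame sent by the VM to its gateway
    elif m["src"] == VDR_MAC:
        stage = "out of DR"      # frame re-written by the DR after routing
    else:
        stage = "not routed here"
    return (m["kind"], int(m["seq"]), stage)

# Two lines copied from the ESXi-01 capture above.
CAPTURE = [
    "12:17:25.381567 00:50:56:82:70:f0 > 02:50:56:56:44:52, ethertype IPv4 (0x0800), "
    "length 98: 6.6.100.11 > 6.6.200.12: ICMP echo request, id 9652, seq 11, length 64",
    "12:17:25.381593 02:50:56:56:44:52 > 00:50:56:82:a6:ae, ethertype IPv4 (0x0800), "
    "length 98: 6.6.100.11 > 6.6.200.12: ICMP echo request, id 9652, seq 11, length 64",
]

if __name__ == "__main__":
    for line in CAPTURE:
        print(classify(line))
```

Run against the ESXi-01 lines, the same sequence number appears once entering and once leaving the DR, which matches the conclusion that routing happens on the sending host.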

The overall path is shown below:

3.png

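The vdrPort MAC address is identical on all ESXi hosts, and vdrPort can receive packets from the physical network attached to the uplink. Normally this MAC address is not exposed to the physical network. However, when an uplink on the virtual switch goes down and a standby uplink takes over, ESXi broadcasts Reverse ARP to tell the physical switch that these MACs now sit behind that port. In that case the vdrPort MAC address is exposed to the physical network, for example: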

14:53:52.919368 00:50:56:6c:e2:6a > ff:ff:ff:ff:ff:ff, ethertype Reverse ARP (0x8035), length 60: Reverse Request who-is 00:50:56:6c:e2:6a tell 00:50:56:6c:e2:6a, length 46
14:53:52.919379 02:50:56:56:44:52 > ff:ff:ff:ff:ff:ff, ethertype Reverse ARP (0x8035), length 60: Reverse Request who-is 02:50:56:56:44:52 tell 02:50:56:56:44:52, length 46
14:53:52.919397 00:50:56:6c:e2:6a > ff:ff:ff:ff:ff:ff, ethertype Reverse ARP (0x8035), length 60: Reverse Request who-is 00:50:56:6c:e2:6a tell 00:50:56:6c:e2:6a, length 46
14:53:52.919397 00:50:56:6c:e2:6a > ff:ff:ff:ff:ff:ff, ethertype Reverse ARP (0x8035), length 60: Reverse Request who-is 00:50:56:6c:e2:6a tell 00:50:56:6c:e2:6a, length 46
14:53:52.919406 00:50:56:53:71:23 > ff:ff:ff:ff:ff:ff, ethertype Reverse ARP (0x8035), length 60: Reverse Request who-is 00:50:56:53:71:23 tell 00:50:56:53:71:23, length 46
14:53:52.919409 2c:f0:5d:1d:b0:41 > ff:ff:ff:ff:ff:ff, ethertype Reverse ARP (0x8035), length 60: Reverse Request who-is 2c:f0:5d:1d:b0:41 tell 2c:f0:5d:1d:b0:41, length 46

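When different uplinks fail and multiple ESXi hosts activate different uplinks, this MAC appears on different physical switch ports, so the switches may raise mac-address flapping alarms.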
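Why the physical switch complains can be illustrated with a toy MAC-learning table. This is a simplified model, not any vendor's implementation: real switches apply time windows and thresholds before alarming, and the port names `Eth1/1` and `Eth1/2` and the function `detect_flaps` are hypothetical.

```python
def detect_flaps(events):
    """events: iterable of (switch_port, src_mac) in arrival order.

    Returns a list of (mac, old_port, new_port) for every MAC move.
    """
    table = {}   # learned MAC -> switch port
    flaps = []
    for port, mac in events:
        previous = table.get(mac)
        if previous is not None and previous != port:
            flaps.append((mac, previous, port))
        table[mac] = port
    return flaps

VDR_MAC = "02:50:56:56:44:52"

# Two hosts fail over and each announces the shared vdrPort MAC
# from its own switch port (hypothetical port names).
EVENTS = [
    ("Eth1/1", "00:50:56:6c:e2:6a"),
    ("Eth1/1", VDR_MAC),
    ("Eth1/2", "00:50:56:53:71:23"),
    ("Eth1/2", VDR_MAC),
]

if __name__ == "__main__":
    print(detect_flaps(EVENTS))
```

Only the shared vdrPort MAC moves between ports in this model; the per-host vmk MACs stay where they were learned.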

