5

Slow network after adding a second WIFI Access Point

 1 year ago
source link: https://aloiskraus.wordpress.com/2022/11/06/slow-network-after-adding-a-second-wifi-access-point/
Go to the source link to view the article. You can view the picture content, updated content and better typesetting reading experience. If the link is broken, please click the button below to view the snapshot at that time.
neoserver,ios ssh client

Alois Kraus

Performance is everything. But correctness comes first.

Networking is hard and many things can go wrong. I have bought an Oculus Quest VR headset for my daughter. Soon I have found it frustrating to connect it via an USB cable to my PC. To get rid of the cable I have bought a cheap TP-Link Archer C5 AC1200 router which I connect via my LAN cable to the PC. After becoming wireless I can send realtime VR images over my dedicated WIFI to the VR headset with stable 150 MBit/s.

image-3.png?w=1024

The “real” WIFI (yellow) which is connected to the internet is located in another room which is not capable to transfer 150 MBit/s without any interrupts which would kill the VR experience. That is the reason why nearly everyone recommends to use either an USB cable (inconvenient) or a LAN cable (blue) connected to a dedicated WIFI router for the VR headset.

That setup works great, but I have noticed that whenever I have the second router running that browsing the internet would feel very slow. Every page refresh was taking a lot longer compared to when I did turn the second WIFI router off.

Since all browsers have great development tools I just needed to press F12 to see what was blocking my browser requests. In the example below I have tried to access a far far away web site which shows in the network tab for the first request a delay 10,55s for host name resolution (Dns).

image-7.png?w=1024

On my box these timings were always around 10s so I knew that I did suffer from slow Dns queries to resolve the host name e.g. http://www.dilbert.com to the actual IP address. Knowing that page load times are blocked by 10+ seconds by Dns is great, but now what? I was suspecting that, since I have two network tabs in Task Manager, that the DNS query was first sent via the LAN cable to a dead end because my Oculus is not connected to the internet and then via WIFI to the actual internet connected router.

Initially I tried the route print command to see where why packets were routed, but that looked too complicated to configure the order for every local route. After a bit more searching I have found the powershell command Get-NetIPInterface which shows you over which network interface Windows sends network packets to the outside world. The relevant column is InterfaceMetric. If multiple network cards are are considered the network interface with the lowest InterfaceMetric number is tried first.

image-4.png?w=1024

As expected I have found my local Ethernet connection with InterfaceMetric 5 with the preferred connection to send network packets to the outside world. But what should be the other network? The easy way out is to cause some traffic in the browser and check in Task Manager in the Performance Tab which network gets all the traffic:

image-5.png?w=374

I need therefore to change the order vEthernet (New Virtual Switch) with metric 15 and interface id 49 and vEthernet (New Virtual Switch) with metric 25 and interface id 10.

Set-NetIPInterface -InterfaceIndex 10 -InterfaceMetric 5
Set-NetIPInterface -InterfaceIndex 49 -InterfaceMetric 25
image-6.png?w=1024

These two commands did switch the order of the two relevant networks (ifIndex 10 and 49). How fast are things now? Internet surfing feels normal and the change of switched network ordering did save the day. There are still Dns delays but nowhwere in the 10s region. Case solved.

The average reader can stop reading now.

What follows is a deep dive into how Dns works on Windows and how you can diagnose pretty much any Dns issue with the builtin Dns tracing of Windows with the help of ETWAnalyzer.

So far we have just looked at Dns as a black box, but you can trace all Dns requests with the help of ETW (Event Tracing for Windows) and the Microsoft-Windows-DNS-Client ETW provider. The easiest way to record the data is to download a recording profile like Multiprofile.wprp, download the file e.g. via curl from the command line

curl https://raw.githubusercontent.com/Alois-xx/FileWriter/master/MultiProfile.wprp > c:\temp\MultiProfile.wprp

Then start from an Admin command prompt tracing with

wpr -start c:\temp\MultiProfile.wprp!Network

Now surf to some web sites or do your slow use case. After you have found some slowdown you can stop recording with

wpr -start c:\temp\SlowBrowsing_%ComputerName%.etl

Good naming is hard, but if you later want to revisit your recorded files you should try to come up with a descriptive name.

After recording the data you can extract a Json file with ETWAnalyzer by extracting everything (-extract all) or just Dns data with -extract DNS

C:>EtwAnalyzer -extract all -fd c:\temp\SlowBrowsing_Skyrmion.etl -symserver ms 
1 - files found to extract.
Success Extraction of c:\temp\Extract\SlowBrowsing_Skyrmion.json
Extracted 1/1 - Failed 0 files.
Extracted: 1 files in 00 00:02:40, Failed Files 0

ETWAnalyzer supports many different queries for e.g. Disk, File, CPU and other things. Since 2.4.0.0 support for dumping Dns Queries was added. To show the aggregated top 5 slowest Dns queries you just need -fd (File or Directory) and the file name of the extracted data and -dump Dns to dump all Dns queries. To select the top 5 add -topn 5 and you get this output per process printed:

C:\temp\Extract>EtwAnalyzer -dump DNS -topn 5 -fd SlowBrowsing_Skyrmion.json   
02.10.2022 17:47:32    SlowBrowsing_Skyrmion 
NonOverlapping       Min        Max  Count TimeOut DNS Query                                                             
       Total s         s          s      #
OVRServer_x64.exe(6624) 124 128
      12,038 s   0,000 s   12,038 s      4       1 graph.oculus.com                                                      
backgroundTaskHost.exe(6692)  -ServerName:BackgroundTaskHost.WebAccountProvider
      12,043 s   0,000 s   12,043 s      2       1 login.microsoftonline.com                                             
firefox.exe(13612) 
      12,047 s   0,021 s   12,047 s      2       1 d21lj84g4rjzla.cloudfront.net                                         
      12,052 s   0,000 s   12,052 s      2       1 loadus.exelator.com                                                   
      12,052 s   0,000 s   12,052 s      2       1 cdn.adrtx.net                                                         
Totals: 60,232 s Dns query time for 12 Dns queries

We find that we have indeed some slow Dns queries which all did time out after ca. 12s. The magenta named processes are issuing all of the following DNS queries. To see more we can add -details to see the timing of every slow DNS Query

C:\temp\Extract>EtwAnalyzer -dump DNS -topn 5 -details -fd SlowBrowsing_Skyrmion.json  
02.10.2022 17:47:32    SlowBrowsing_Skyrmion 
NonOverlapping       Min        Max  Count TimeOut DNS Query                                                             
       Total s         s          s      #
      12,038 s   0,000 s   12,038 s      4       1 graph.oculus.com                                                      
        2022-10-02 17:47:38.416 Duration:  0,000 s OVRServer_x64.exe(6624)                       DnsAnswer: 
        2022-10-02 17:47:38.417 Duration:  0,000 s OVRServer_x64.exe(6624)                       DnsAnswer: 
        2022-10-02 17:47:38.417 Duration: 12,038 s OVRServer_x64.exe(6624)                       TimedOut: 192.169.0.1:53 Servers: 192.169.0.1;fe80::1;0.0.0.0 DnsAnswer: 2a03:2880:f21c:80c1:face:b00c:0:32c2;::ffff:31.13.92.49
        2022-10-02 17:47:38.417 Duration: 12,038 s OVRServer_x64.exe(6624)                       TimedOut: 192.169.0.1:53 Servers: 192.169.0.1;fe80::1;0.0.0.0 DnsAnswer: 2a03:2880:f21c:80c1:face:b00c:0:32c2;::ffff:31.13.92.49
      12,043 s   0,000 s   12,043 s      2       1 login.microsoftonline.com                                             
        2022-10-02 17:47:42.020 Duration:  0,000 s backgroundTaskHost.exe(6692)                  DnsAnswer: 
        2022-10-02 17:47:42.020 Duration: 12,043 s backgroundTaskHost.exe(6692)                  TimedOut: 192.169.0.1:53 Servers: 192.169.0.1;fe80::1;0.0.0.0 DnsAnswer: ::ffff:40.126.32.74;::ffff:20.190.160.14;::ffff:40.126.32.134;::ffff:40.126.32.72;::ffff:20.190.160.22;::ffff:40.126.32.136;::ffff:40.126.32.138;::ffff:40.126.32.68
      12,047 s   0,021 s   12,047 s      2       1 d21lj84g4rjzla.cloudfront.net                                         
        2022-10-02 17:47:54.203 Duration:  0,021 s firefox.exe(13612)                            DnsAnswer: 13.227.153.112;13.227.153.94;13.227.153.81;13.227.153.66
        2022-10-02 17:47:54.224 Duration: 12,047 s firefox.exe(13612)                            TimedOut: 192.169.0.1:53 Servers: 0.0.0.0;192.169.0.1;fe80::1 DnsAnswer: 2600:9000:225b:fc00:1e:b6b1:7b80:93a1;2600:9000:225b:3600:1e:b6b1:7b80:93a1;2600:9000:225b:2e00:1e:b6b1:7b80:93a1;2600:9000:225b:f000:1e:b6b1:7b80:93a1;2600:9000:225b:7200:1e:b6b1:7b80:93a1;2600:9000:225b:a400:1e:b6b1:7b80:93a1;2600:9000:225b:9200:1e:b6b1:7b80:93a1;2600:9000:225b:dc00:1e:b6b1:7b80:93a1
      12,052 s   0,000 s   12,052 s      2       1 loadus.exelator.com                                                   
        2022-10-02 17:47:42.091 Duration:  0,000 s firefox.exe(13612)                            DnsAnswer: 
        2022-10-02 17:47:42.091 Duration: 12,052 s firefox.exe(13612)                            TimedOut: 192.169.0.1:53 Servers: 192.169.0.1;fe80::1;0.0.0.0 DnsAnswer: ::ffff:54.78.254.47
      12,052 s   0,000 s   12,052 s      2       1 cdn.adrtx.net                                                         
        2022-10-02 17:47:42.091 Duration:  0,000 s firefox.exe(13612)                            DnsAnswer: 
        2022-10-02 17:47:42.091 Duration: 12,052 s firefox.exe(13612)                            TimedOut: 192.169.0.1:53 Servers: 192.169.0.1;fe80::1;0.0.0.0 DnsAnswer: ::ffff:34.248.125.130;::ffff:3.248.26.101
Totals: 60,232 s Dns query time for 12 Dns queries

You might ask yourself. What are NonOverlapping Totals? The ETW provider Microsoft-Windows-DNS-Client instruments dnsapi.dll!DnsQueryEx which supports synchronous and async queries. By default an IPV4 and IPV6 query is started whenever when you try to resolve a host name. If both queries need e.g. 10s, and are started at the same time the observed delay by the user is 10s (Non Overlapping time) and not 20s (that would be the sum of both query times). To estimate the observed user delay for each query ETWAnalyzer calculates the non overlapping query time which should much better reflect the actual user delays. The summary time Totals is also taking overlapping DNS query times into account which should be a better metric than a simple sum.

For more details please refer to https://github.com/Siemens-Healthineers/ETWAnalyzer/blob/main/ETWAnalyzer/Documentation/DumpDNSCommand.md.

The details view shows that every slow timed out query has a red TimedOut Dns server name (192.169.0.1:53). This is the Dns server IP for which a Dns query did time out. By looking at the interfaces with ipconfig we can indeed verify that the new TPLink Router was tried first, did time out and after that the next network (WIFI) was tried which did finally succeed.

If you are in an IPV6 network the Dns server IP might be a generic default Link Local IP which can be the same for several network interfaces. In that case you can add -ShowAdapter to show which network interfaces were tried. That should give you all information to diagnose common Dns issues. Of course you can also use WireShark to directly get your hands on the Dns queries, but with WireShark you will not know which process was waiting for how long. The ETW instrumented DnsQueryEx Api will give you a clear process correlation and also which queries (IPV4, IPV6) were tried. To get all details you can export the data to a CSV file (add -csv xxx.csv) and post process the data further if you need to. To select just one process (e.g. firefox) you can filter by process with -pn firefox which is is the shorthand notation for -processname firefox. For more information on the query syntax of ETWAnalyzer see https://github.com/Siemens-Healthineers/ETWAnalyzer/blob/main/ETWAnalyzer/Documentation/Filters.md.

image.png?w=1024

If you have just Context switch traces but no DNS Client traces not all is lost. If you suspect DNS delays you can still query the thread wait times of all dns related methods with ETWAnalyzer which also calculates for wait and ready the non overlapping times summed for all threads.

EtwAnalyzer -dump cpu -pn firefox -fd WebBrowsingRouterOn_SKYRMION.10-06-2022.19-46-16 -methods *dns* -fld s s

image-1.png?w=1024

If you have async queries in place, but you know that just one async operation was initiated you can add -fld s s to show the time difference the method was seen first and last in the trace which should give you a good estimate how long that async method was running, if upon completion it was called again (and consumed enough CPU so we could get a sample). For more details please refer to https://github.com/Siemens-Healthineers/ETWAnalyzer/blob/main/ETWAnalyzer/Documentation/DumpCPUCommand.md.

You can also use one specific method or event as zero point and let ETWAnalyzer calculate first/last timings relative to a specific event which can be useful to calculate e.g. timing of methods relative to a key event like OnLoadClicked or something similar. That advanced feature is explained at https://github.com/Siemens-Healthineers/ETWAnalyzer/blob/main/ETWAnalyzer/Documentation/ZeroTime.md in much greater detail. It is also possible to correlate traces across machines without synchronized clocks by using different zero times for different files within one query and export the data to a CSV file for further processing.

That was a lot of information, but I hope I have convinced you that you can access previously hard to get information now much easier with ETWAnalyzer. It is not only useful for performance investigations, but it might be also interesting to get an overview which servers were tried to query from your machine for forensic audits.

Loading...

About Joyk


Aggregate valuable and interesting links.
Joyk means Joy of geeK