Prevent NDP neighbor cache expiry - unsolicited neighbor advertisements?

Attention! This article is more than a year old. The content may no longer be up to date!

In my previous post I wrote about my first contact with a switched IPv6 network at Netcup and how I dealt with it. Of course, one problem was not enough.

Suddenly unreachable over IPv6

After I added the NDP proxy for my DNS LXC container, I tested the reachability over IPv6. The server was responding without any problems. I updated the glue records at my registrar and pointed them towards my new LXC container. This took me a couple of minutes, and by pure luck I re-ran the dig command towards my DNS server over IPv6. But this time it did not respond.
I checked that the nameserver software was running and bound to the right port. I checked that I had entered the right IPv6 address. I checked the NDP proxy on my host system. Everything was okay. My conclusion was that the traffic was not hitting the server. I fired up tcpdump (tcpdump -i eth0 -nn icmpv6) inside the container and on my laptop before sending out single pings back and forth. I saw the ping from the server in both tcpdumps, but the reply packet was not visible on the server. Apparently only incoming traffic was affected.
Then I started a constant ping towards the container from my laptop. The result:

16 bytes from 2001:db8:1337:420::1aa, icmp_seq=4 hlim=53 time=1129.620 ms
16 bytes from 2001:db8:1337:420::1aa, icmp_seq=6 hlim=53 time=19.456 ms
16 bytes from 2001:db8:1337:420::1aa, icmp_seq=6 hlim=53 time=15.190 ms
16 bytes from 2001:db8:1337:420::1aa, icmp_seq=6 hlim=53 time=18.300 ms

The interesting part is that the first ping took over a second. At the same time I saw an NDP Neighbor Solicitation followed by a Neighbor Advertisement for that IP in the tcpdump inside my container. After that the ping went smoothly and querying the DNS was also possible again.
Apparently the Netcup router is losing the NDP cache entry for that IP address. Switched IPv6 networks ftw!!1.
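To catch these stalls without staring at the ping output, the slow replies can be filtered out automatically. The following is only a minimal sketch, assuming the ping output format shown above (each reply line contains time=… ms). The function name slow_pings and the default threshold of 500 ms are made up for illustration.

```shell
#!/usr/bin/env bash
# Print every ping reply whose round-trip time exceeds a threshold (in ms).
# Assumes reply lines containing "time=X ms", as in the output above.
# slow_pings is a hypothetical helper name, not part of any tool.
slow_pings() {
	local threshold_ms="${1:-500}"
	awk -v limit="$threshold_ms" '
		match($0, /time=[0-9.]+/) {
			# Strip the leading "time=" (5 characters) to get the number.
			rtt = substr($0, RSTART + 5, RLENGTH - 5)
			if (rtt + 0 > limit + 0) print "slow reply (" rtt " ms): " $0
		}'
}

# Usage: pipe the output of a running ping into it, for example
#   ping6 2001:db8:1337:420::1aa | slow_pings 500
```

Fed with the four lines above, this would only report the first reply, the one that had to wait for the Neighbor Solicitation.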

Sending unsolicited neighbor advertisements

In IPv4 you can send unsolicited ARP, a.k.a. gratuitous ARP, to populate neighbor caches or update existing entries. This is used, for example, by a tool called keepalived when a slave system transitions to master state. In my naive approach I thought I could do the same with IPv6 and NDP. So sending unsolicited neighbor advertisements every minute it was.

I’ve downloaded the source code of libndp (🖇️ 🔐) (which contains ndptool), quickly compiled it on my host server and sent out two unsolicited neighbor advertisements (./ndptool -t na -T 2001:db8:1337:420::1aa -i eth0 -U send):

01:14:38.313386 IP6 fe80::1337:420 > ff02::1: ICMP6, neighbor advertisement, tgt is from 2001:db8:1337:420::1aa, length 32
01:14:40.850726 IP6 fe80::1337:420 > ff02::1: ICMP6, neighbor advertisement, tgt is from 2001:db8:1337:420::1aa, length 32

Note to my fellow Arch Linux users: While writing this blog post the libndp package from the extra repository does not provide the -T option. Also you are the best!

I was sure that the cache entry for the IP address had expired while I was researching online how to send such unsolicited neighbor advertisements. So I started a ping, and the first reply indeed took a couple of seconds again. To my surprise I saw a Neighbor Solicitation followed by a Neighbor Advertisement:

01:14:46.940820 IP6 fe80::22d8:b00:adee:ff4 > ff02::1:ff00:1af: ICMP6, neighbor solicitation, who has from 2001:db8:1337:420::1aa, length 32
01:14:46.950268 IP6 fe80::1337:420 > fe80::22d8:b00:adee:ff4: ICMP6, neighbor advertisement, tgt is from 2001:db8:1337:420::1aa, length 32

So apparently sending unsolicited neighbor advertisements does not work. I’ve opened RFC 4861 and read section 7.2.6 (🖇️ 🔐) :

In either case, neighboring nodes will immediately change the state of their Neighbor Cache entries for the Target Address to STALE, prompting them to verify the path for reachability

Okay. Not what I actually wanted. But there is more:

If the Override flag is set to one, neighboring nodes will install the new link-layer address in their caches

Cool! The override flag is actually set! But why does it not work then?

Note that because unsolicited Neighbor Advertisements do not reliably update caches in all nodes (the advertisements might not be received by all nodes), they should only be viewed as a performance optimization to quickly update the caches in most neighbors.

Fuck this shit! Sending out multiple unsolicited messages a couple of seconds apart changed nothing. But I observed that after every advertisement I send, a new solicitation comes in. So the routers at Netcup mark the cache entry as STALE instead of updating it. So this won’t solve my problem.
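To make the rules a bit more tangible, here is how I read the state handling for received Neighbor Advertisements in RFC 4861, section 7.2.5. This is a simplified sketch and not an authoritative implementation: it assumes the advertisement carries a target link-layer address option, it ignores the IsRouter flag, and the function name na_next_state and its arguments are hypothetical.

```shell
#!/usr/bin/env bash
# Simplified reading of RFC 4861, section 7.2.5: what happens to a Neighbor
# Cache entry when a Neighbor Advertisement arrives. All names are made up.
#   $1 current state: NONE, INCOMPLETE, REACHABLE, STALE, DELAY or PROBE
#   $2 Solicited flag (0 or 1)
#   $3 Override flag (0 or 1)
#   $4 1 if the advertised link-layer address differs from the cached one
na_next_state() {
	local state="$1" solicited="$2" override="$3" changed="$4"
	case "$state" in
	NONE)
		# No entry exists: the advertisement is silently discarded.
		echo NONE
		return ;;
	INCOMPLETE)
		# Address is recorded; only a solicited reply proves reachability.
		if [ "$solicited" = 1 ]; then echo REACHABLE; else echo STALE; fi
		return ;;
	esac
	if [ "$override" = 0 ] && [ "$changed" = 1 ]; then
		# Conflicting address without Override: the cache is not updated.
		if [ "$state" = REACHABLE ]; then echo STALE; else echo "$state"; fi
	elif [ "$solicited" = 1 ]; then
		echo REACHABLE
	elif [ "$changed" = 1 ]; then
		# Unsolicited and a new address was installed: verify it first.
		echo STALE
	else
		# Unsolicited, same address: the entry is left untouched.
		echo "$state"
	fi
}

# Usage: na_next_state STALE 0 1 0
```

Under these assumptions, an unsolicited advertisement (Solicited flag 0) never moves an entry to REACHABLE, and it does not create an entry if none exists. That would at least be consistent with my advertisements not keeping the router’s entry alive, although I can only guess what the Netcup routers actually do.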

The solution: Generate traffic

My DNS server serves a couple of requests over IPv6 every minute. So I’m unsure why the cache entries expire in the first place or why the router takes over a second to send out a solicitation. I’m going to open a support ticket tomorrow. In the meantime, a systemd timer inside the containers pings Google’s DNS server every minute and retries until it responds:

/etc/systemd/system/ping-google.timer:

[Unit]
Description=Ping google

[Timer]
OnCalendar=minutely

[Install]
WantedBy=basic.target

/etc/systemd/system/ping-google.service:

[Unit]
Description=Ping Google

[Service]
ExecStart=/bin/bash /usr/sbin/ping-google

/usr/sbin/ping-google:

#!/usr/bin/bash

while ! /usr/bin/ping -c 1 2001:4860:4860::8844
do
	/usr/bin/sleep 1
done

This actually helps to keep inbound IPv6 connectivity. But this is of course not a real solution.
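One caveat: the loop in the script never gives up, so it keeps running for as long as IPv6 is completely down. A bounded variant could look like the sketch below. The helper name retry_until and the numbers in the usage line are assumptions for illustration, not what I actually run.

```shell
#!/usr/bin/env bash
# Run a command until it succeeds, but give up after a fixed number of tries.
# Usage: retry_until MAX_TRIES DELAY_SECONDS COMMAND [ARGS...]
# retry_until is a hypothetical helper, not part of any package.
retry_until() {
	local max_tries="$1" delay="$2" try=1
	shift 2
	while ! "$@"; do
		# Stop once the last allowed attempt has failed.
		if [ "$try" -ge "$max_tries" ]; then
			return 1
		fi
		try=$((try + 1))
		sleep "$delay"
	done
}

# Usage inside the container, with the same target as the script above:
#   retry_until 30 1 /usr/bin/ping -c 1 2001:4860:4860::8844
```

With 30 tries and a delay of one second, the service would finish before the timer fires again, whether or not the ping ever succeeded.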

Conclusions

In my eyes there is something wrong with the Netcup routers. I hope their support can help me and solve this problem.

I’m fairly sure that the NDP proxy is not part of the problem. As soon as the neighbor solicitation arrives, a neighbor advertisement is sent out without delay and traffic flows. I can clearly see that the delay is caused by the late arrival of the neighbor solicitation. There is a chance that Netcup’s router sends out the solicitation immediately but the hypervisor delays or drops it.

Netcup also offers an additional routed IPv6 network for roughly 1 € per month. No idea if this will solve my problems, but flawlessly working IPv6 is not something I want to pay extra for.

Switched IPv6 networks are a pain in the ass and should be punished.

Update 15. January 2021

I’ve contacted the Netcup Support Team. We have exchanged a couple of mails by now. I’ve sent them traces, configs and a video showing the late arrival of the solicitation. I reminded them that according to their forums this is not happening with a routed prefix.
The plan now is to boot the server into recovery mode for 30 minutes so that an engineer can check out the situation. I told them to go for it. But in the end I made it very clear: this issue apparently goes away with a routed prefix, and I asked them to give me one for free so I can test it instead of wasting more money on a technician.

Update 06. February 2021

In the meantime I’ve closed the ticket. They booted the server into recovery and claim that they are unable to find any problems with IPv6. I’ve no energy left to prove them wrong. I migrated back to Hetzner. I was dumb enough to sign a 12-month contract (billed every 6 months), but they were willing to terminate it at the end of January. I had to remind them that I’d like to get 5 months’ worth of money back. I already got the refund receipt, but the money has not yet arrived in my bank account. But I’m fairly sure it’ll be there in no time.
Overall I’d give Netcup 3.5 out of 5 stars. The VPS itself is good, I won’t deny it, but without working IPv6 it is unusable for me.



🖇️ = Link to another website
🔐 = Website uses HTTPS (encrypted transport)