newnet
just joined
Topic Author
Posts: 9
Joined: Fri Aug 19, 2022 10:23 am

ERROR: RECV RouteRefresh with invalid subtype: 0

Sat Aug 20, 2022 11:32 am

Hello,

Since I installed the CCR2216 with 7.4.1 (stable) I keep getting the following error with our Equinix IX peering.

RECV RouteRefresh with invalid subtype: 0

It only happens with the Equinix route servers, and it seems to reset the BGP session when it happens (losing about 1 second of connectivity).

Our peers with Apple, BGPtools, Cloudflare, Hurricane, Meta over the same IX do not have this issue.

Does anyone know what is causing this? It didn't happen when I was using MikroTik CHR.

Thanks for your help!

[admin@Router] /routing/bgp/connection> print
1 name="peer-to-equinix"
remote.address=xxx.xxx.xxx.xxx/32
local.default-address=xxx.xxx.xxx.xxx .role=ebgp
routing-table=main router-id=xxx.xxx.xxx.xxx as=xxxx address-families=ip cisco-vpls-nlri-len-fmt=auto-bits
output.redistribute=connected,static,bgp .filter-chain=peer-filters .network=peering-networks .no-client-to-client-reflection=yes


[admin@Router] > /routing bgp
[admin@Router] /routing/bgp> session print
8 E name="peer-to-equinix-1"
remote.address=xx.xxx.xxx.xxx .as=24115 .id=xxx.xxx.xxx.xxx .refused-cap-opt=no .capabilities=mp,gr,as4,llgr .hold-time=4m .messages=46562 .bytes=7323827 .gr-time=120 .eor=ip
local.address=xxx.xxx.xxx.xxx .as=xxxx .id=xxx.xxx.xxx.xxx .capabilities=mp,rr,gr,as4 .messages=3 .bytes=90 .eor=""
output.procid=82 .filter-chain=peer-filters .network=peering-networks
input.procid=82 .last-notification=ffffffffffffffffffffffffffffffff001403060305 ebgp
hold-time=3m keepalive-time=1m uptime=1m21s760ms
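
For readers who want to read the `.last-notification` value above, here is a minimal Python sketch that splits it using the standard BGP message layout (16-byte marker, 2-byte length, 1-byte type, then error code, subcode and data). The function name `decode_notification` is made up for illustration, and it is an assumption that RouterOS stores the raw NOTIFICATION bytes in this field.

```python
import struct

# Error codes from RFC 4271; code 7 was added by RFC 7313.
ERROR_CODES = {
    1: "Message Header Error",
    2: "OPEN Message Error",
    3: "UPDATE Message Error",
    4: "Hold Timer Expired",
    5: "Finite State Machine Error",
    6: "Cease",
    7: "ROUTE-REFRESH Message Error",
}

# Cease subcodes from RFC 4486 (only meaningful when the error code is 6).
CEASE_SUBCODES = {
    1: "Maximum Number of Prefixes Reached",
    2: "Administrative Shutdown",
    3: "Peer De-configured",
    4: "Administrative Reset",
    5: "Connection Rejected",
    6: "Other Configuration Change",
    7: "Connection Collision Resolution",
    8: "Out of Resources",
}


def decode_notification(hex_string):
    """Split a hex-encoded BGP NOTIFICATION into its fields (hypothetical helper)."""
    raw = bytes.fromhex(hex_string)
    if len(raw) < 21:
        raise ValueError("a NOTIFICATION needs at least 21 bytes")
    marker, declared_length, msg_type = struct.unpack("!16sHB", raw[:19])
    if marker != b"\xff" * 16:
        raise ValueError("missing all-ones BGP marker")
    if msg_type != 3:
        raise ValueError("message type is not NOTIFICATION (3)")
    code, subcode = raw[19], raw[20]
    subcode_name = CEASE_SUBCODES.get(subcode, "unknown") if code == 6 else "not decoded"
    return {
        "declared_length": declared_length,
        "actual_length": len(raw),
        "code": code,
        "code_name": ERROR_CODES.get(code, "unknown"),
        "subcode": subcode,
        "subcode_name": subcode_name,
        "data": raw[21:].hex(),
    }


# The value from the session print: 16 bytes of ff followed by 001403060305.
fields = decode_notification("ff" * 16 + "001403060305")
for key, value in fields.items():
    print(f"{key}: {value}")
```

Read this way, the value is error code 6 (Cease) with subcode 3 (Peer De-configured) and one trailing data byte, 05. The length field says 20 while 22 bytes are shown, so RouterOS may not store the message exactly as it appeared on the wire; treat the decoding as tentative.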
 
newnet
just joined
Topic Author
Posts: 9
Joined: Fri Aug 19, 2022 10:23 am

Re: ERROR: RECV RouteRefresh with invalid subtype: 0

Wed Aug 24, 2022 10:02 am

Mikrotik support replied and said:

"Thank you for the report, it is a known problem and will be fixed in the future"...

So I'll have to disable my peering until it's fixed :(
 
pe1chl
Forum Guru
Posts: 9029
Joined: Mon Jun 08, 2015 12:09 pm

Re: ERROR: RECV RouteRefresh with invalid subtype: 0

Wed Aug 24, 2022 12:03 pm

It has already been a known problem in v7 for well over a year, so don't hold your breath waiting for "future"!
 
newnet
just joined
Topic Author
Posts: 9
Joined: Fri Aug 19, 2022 10:23 am

Re: ERROR: RECV RouteRefresh with invalid subtype: 0

Wed Aug 24, 2022 3:51 pm

I spent many hours and nights setting it up based on it being promoted as a BGP router... and now I have to turn BGP off and attempt to roll back to the old router.
 
pe1chl
Forum Guru
Posts: 9029
Joined: Mon Jun 08, 2015 12:09 pm

Re: ERROR: RECV RouteRefresh with invalid subtype: 0

Wed Aug 24, 2022 5:03 pm

I can understand your frustration! BGP used to work quite well in RouterOS v6 but there were some limitations that some people hit.
(e.g. CPU usage with several full internet route feeds, lack of some modern BGP features like RPKI).
For several years, MikroTik had said that they would end all trouble by writing "their own" BGP implementation instead of relying on proven software like bird or quagga, and it would perform much better. Of course we all know that doing such a project from scratch will invariably introduce bugs, and as you go along you will discover that some requested features are difficult to implement within the chosen "better" architecture.
That would not be an issue if there were still a lot of manpower working on resolving those issues, but it looks as if the developer of the new BGP has left the company or was assigned other tasks, and BGP development has come almost to a standstill.
So there we are now: a new RouterOS with a half-broken (and incomplete) BGP, and for lots of people no way to go back to v6 because it does not run on their hardware.
Not a good thing... I am in the same boat! Cannot upgrade routers to v7 due to this, even when it would be better to do it for other reasons.
 
newnet
just joined
Topic Author
Posts: 9
Joined: Fri Aug 19, 2022 10:23 am

Re: ERROR: RECV RouteRefresh with invalid subtype: 0

Fri Aug 26, 2022 10:04 am

I'm trying to see if Equinix can turn off the RouteRefresh messages on my port because my routes don't change, I only have a single subnet.... fingers crossed!

None of my other peers seem to do the same thing
 
pe1chl
Forum Guru
Posts: 9029
Joined: Mon Jun 08, 2015 12:09 pm

Re: ERROR: RECV RouteRefresh with invalid subtype: 0

Fri Aug 26, 2022 10:52 am

There is probably another issue as well, as this invalid route refresh does not make the connection fail by itself.
When a MikroTik router running v6 is connected to one running v7, and the "refresh all" button is clicked on the v6 router, the same message appears on v7.
It will ignore the refresh but otherwise it will remain connected and routing. Reported that before, but not fixed yet.
However, since you have trouble with that peer, it likely sends other things that v7 cannot handle.
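
Some background on the log message: a BGP ROUTE-REFRESH message (type 5, RFC 2918) carries a 2-byte AFI, one byte that was originally reserved, and a 1-byte SAFI. RFC 7313 (Enhanced Route Refresh) reuses the reserved byte as a "message subtype", where 0 is the ordinary refresh request, 1 is BoRR and 2 is EoRR. The Python sketch below builds and parses the basic 23-byte form only (no ORF entries). The helper names are made up for illustration, and nothing here is taken from the RouterOS code.

```python
import struct

ROUTE_REFRESH_TYPE = 5

# Message subtypes as defined in RFC 7313.
SUBTYPES = {
    0: "normal route refresh request",
    1: "beginning of route refresh (BoRR)",
    2: "end of route refresh (EoRR)",
}


def build_route_refresh(afi, safi, subtype=0):
    """Build a basic ROUTE-REFRESH message: 19-byte header plus 4-byte body."""
    body = struct.pack("!HBB", afi, subtype, safi)
    header = b"\xff" * 16 + struct.pack("!HB", 19 + len(body), ROUTE_REFRESH_TYPE)
    return header + body


def parse_route_refresh(message):
    """Return (afi, safi, subtype, description) for a basic ROUTE-REFRESH."""
    marker, length, msg_type = struct.unpack("!16sHB", message[:19])
    if marker != b"\xff" * 16:
        raise ValueError("missing all-ones BGP marker")
    if msg_type != ROUTE_REFRESH_TYPE:
        raise ValueError("not a ROUTE-REFRESH message")
    if length != 23 or len(message) != 23:
        raise ValueError("this sketch only handles the 23-byte form without ORF")
    afi, subtype, safi = struct.unpack("!HBB", message[19:23])
    return afi, safi, subtype, SUBTYPES.get(subtype, "reserved or unassigned")


# AFI 1 / SAFI 1 is IPv4 unicast; subtype 0 is a plain refresh request.
message = build_route_refresh(afi=1, safi=1, subtype=0)
print(parse_route_refresh(message))
```

By that reading, subtype 0 is what a plain "please resend your routes" request looks like on the wire, which fits the observation above that a v6 "refresh all" triggers the same log line on v7.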
 
newnet
just joined
Topic Author
Posts: 9
Joined: Fri Aug 19, 2022 10:23 am

Re: ERROR: RECV RouteRefresh with invalid subtype: 0

Tue Aug 30, 2022 4:32 pm

Equinix Support performed a packet capture of the incident, here is their response:
Based on the customer PCAP file, it showed that the local device has route refresh enabled. It reflects on the output taken from our route servers when BGP sessions were as Established. As so, it appears that there is a mismatched command as route refresh is confirmed disabled on Equinix route server settings. May we ask that you include this findings when you liaise with your vendor for them to further check? Thank you.
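
The mismatch Equinix describes also shows up in the `session print` output from the first post, where the local and remote `.capabilities` lists differ. Here is a small Python sketch of that comparison. Reading `rr` as the route refresh capability and `llgr` as long-lived graceful restart is an assumption about the RouterOS abbreviations, and `parse_capabilities` is a made-up helper.

```python
def parse_capabilities(field):
    """Turn a comma-separated capability list into a set of names."""
    return {item.strip() for item in field.split(",") if item.strip()}


# Values copied from the session print output in the first post.
local = parse_capabilities("mp,rr,gr,as4")
remote = parse_capabilities("mp,gr,as4,llgr")

only_local = sorted(local - remote)    # advertised by the router only
only_remote = sorted(remote - local)   # advertised by the route server only
shared = sorted(local & remote)

print("shared:", shared)
print("only local:", only_local)
print("only remote:", only_remote)
```

With these values the only capability advertised locally but not by the route server is `rr`. RFC 2918 lets a speaker send ROUTE-REFRESH to a peer only if that peer advertised the capability, so the capability lists on both ends are worth checking when this message shows up in the log.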
 
eduplant
Member Candidate
Posts: 122
Joined: Tue Dec 19, 2017 9:45 am

Re: ERROR: RECV RouteRefresh with invalid subtype: 0

Wed Aug 31, 2022 10:06 am

There do seem to be an abnormally high number of weird BGP bugs that I see cropping up for v7 in these forum threads. Every week I refresh "Forwarding Protocols" and there's some new cryptic interoperability or feature breakage being reported. Sometimes it's threads about people not reading the limitations of what's been implemented and what hasn't, but frequently it's just weird brokenness being reported.

I'm fortunate enough to not be using Mikrotik for my day job otherwise this would be starting to look pretty grim in terms of their ability to address it in a timely fashion. There are already some issues that affect my hobbyist use of it (namely not supporting link-local next hop addresses for BGP routes) but I don't envy anyone trying to feature-for-feature validate this against their RouterOS v6 install base.

If it doesn't start being prioritized, there will be an increasing number of these nice new hardware platforms that will be untenable for certain people's use cases because they are v7 only.

I am curious as to why Mikrotik chose to reimplement BGP, especially since I thought one of the benefits of moving to v7 was having fewer bespoke networking internals, which had been keeping them on an old version of the Linux kernel and slowing down their ability to add features to v6. You would imagine that would make it easier to slap BIRD on top of v7 rather than harder.

Has Mikrotik made a public comment about any of this? If they have I haven't seen it.
 
pe1chl
Forum Guru
Posts: 9029
Joined: Mon Jun 08, 2015 12:09 pm

Re: ERROR: RECV RouteRefresh with invalid subtype: 0

Wed Aug 31, 2022 11:56 am

Well, it is of course clear that all the weird issues and limitations (not-yet-implemented features) are the result of them starting from scratch with their own implementation...
Any experienced software project manager or designer or programmer would have predicted this beforehand. It always is more difficult in practice than envisioned before.
The reason for the redesign was that in the original BGP the whole daemon was single-threaded and so on the large CCR routers with 36 or 72 cores the BGP handling would use only a single (comparatively slow) core. Many years ago there were lots of complaints about that from users of the large routers that wanted to run several full route BGP feeds, and it was promised that the holy grail of v7 would solve all their worries by making it multithreaded and generally more efficient.

But that was said of so many other unrelated issues and limitations. Everything would be solved in v7. The v7 release was postponed time after time, and many solutions promised for v7 were eventually introduced in v6. Except that new BGP daemon.
When v7 was finally released, it appeared to the user much the same as what v6 had become in the meantime, except for the routing (route tables, policy rules, BGP, OSPF), which was restructured and rewritten. That is when the trouble started: there were many subtle bugs, and each time we had to pray that a new release was around the corner that fixed some of them.
But after an initial start where some things indeed were fixed (mainly in the multiple route table management), unfortunately it appears that focus at MikroTik has shifted away from this issue and on to other features that are new to v7 and maybe requested by more or by more important customers...

At this time, the progress is so slow that I even fear that the responsible developer(s) for this part of RouterOS have left or have been assigned other tasks.
In a week's time we will celebrate the first anniversary of the famous reply by mrz: "BFD is currently work in progress." in the v7.1rc2 topic... viewtopic.php?p=877008#p877008
As far as I understand it, at least the function of BFD as it was present in v6 (which was fine for us) is little more than what the extended "netwatch" function can do as well: send/receive some ping-like packet (in this case over UDP) and take action on persistent failure (close the peer connection immediately). I cannot understand why after a year of "work in progress" that still isn't finished, unless actually nobody is really working on that. I even suggested that the guy who extended "netwatch" adds a BFD mode to his code so we can use that. Maybe that can get some manpower assigned to this dire situation.

But indeed, as you mention, there are other issues. A responsible approach would have been to keep the v6 BGP as an optional package that could be used instead of the new one, much as is done with wireless. That would at least allow those who need a working BGP and do not care so much about the new implementation to run v7 on production routers. And that becomes even more important because new models cannot run v6 anymore.
I guess the BGP in v6 was originally based on BIRD, but it has been doctored so much that it would not be a 1-day job to compile it on v7, and therefore it was decided not to pursue that. And of course, if v6 BGP became available as an option, people would request additional features in it, taking away manpower from the v7 BGP implementation. But MikroTik should understand that "we" cannot wait two years for them to finish their new code...
 
newnet
just joined
Topic Author
Posts: 9
Joined: Fri Aug 19, 2022 10:23 am

Re: ERROR: RECV RouteRefresh with invalid subtype: 0

Thu Sep 01, 2022 1:00 pm

The image attached shows the latest response from Equinix, I am hoping they can make changes on their end so their BGP session matches what Facebook, Hurricane Electric, Cloudflare, Apple, Anycast Global Backbone, Ripe and bgp.tools do because all of those peers work perfectly with no drops.

[Image: latest response from Equinix (attachment not available)]
 
newnet
just joined
Topic Author
Posts: 9
Joined: Fri Aug 19, 2022 10:23 am

Re: ERROR: RECV RouteRefresh with invalid subtype: 0

Fri Sep 09, 2022 1:54 pm

So the peer dropping with Equinix seems to be fixed since Equinix made changes.

The error still appears in the log every 2 hours but the peer stays up now, and is stable.

The issue was fixed by Equinix, who use the "BIRD" BGP software on their end.

They said "Route refresh is now included in neighbour capabilities and the link has been stable for 17 hours"...

So I am happy for now!
