millenium7
Member
Topic Author
Posts: 448
Joined: Wed Mar 16, 2016 6:12 am

MPLS bugs, had enough

Mon Oct 18, 2021 9:15 am

Here's my last-ditch effort to see if anyone has a surefire, 100% effective method for making MPLS 'just work' with MikroTik; otherwise I'm ripping it entirely out of our network.

99.9% of the time it seems to work perfectly, but that 0.1% is just too painful. A link somewhere in our network may go down, and then routing spontaneously breaks to certain destinations. Trying to run a traceroute from RouterC to RouterJ doesn't even go one hop; it just outright fails. Trying several other routers in different locations of the network shows the same.
Everything looks totally fine in the normal routing table, and the route is 100% definitely there in OSPF.
The only fix is to disable MPLS on routers along the path, one at a time (the culprit often isn't the source or destination), until hey presto, traffic starts flowing; then I can progressively turn it back on again.

Since MikroTik has no 'refresh MPLS forwarding table' option, MPLS needs to be forcefully turned off and back on. And this can't be automated via a central monitoring platform because... those routers might be unreachable because MPLS is broken.
I've looked over the configs with a fine-tooth comb; nothing is wrong.
If there's something very obvious with a very obvious workaround, I'm happy to implement it, but I'm not going to chase my tail with this anymore. It either gets fixed or it gets ripped out entirely, replaced with EoIP tunnels for now, with routers progressively replaced with something else down the track.
 
quantumx
just joined
Posts: 2
Joined: Tue Nov 20, 2018 11:22 pm

Re: MPLS bugs, had enough

Mon Oct 18, 2021 5:42 pm

I have found that local label corruption is often the cause after such link instability. Adding a static label mapping seems to regenerate/refresh all local labels without restarting the router. Adding and then removing a static label that matches an existing dynamic label for a local loopback IP address seems to accomplish this in a minimally disruptive way. I created a script that runs periodically on all area MPLS routers, checks for OSPF adjacency changes, and adds/removes this label soon after an OSPF adjacency comes back up. I forget where, but this method was suggested somewhere on the forum.

Now, the remaining problem is stuck 'unknown' VPLS bridge interfaces, but this seems to be occurring less frequently with 6.48.4+.

If there's a better option, I'm interested as well. Hopefully ROS7 will be the answer.

Script follows:
#####################################################################
# Forces update of MPLS labels if any OSPF adjacency >30s and <155s
#
# Workaround for LDP label corruption bug
# Script to be run every 2 minutes
#
#               Dec 17, 2020
#

:global trigger 0; 

:foreach ospfNeighborObject in=[/routing ospf neighbor find where state="Full"] do={

    :local time  [/routing ospf neighbor get value-name=adjacency $ospfNeighborObject];
    
    #Convert time to seconds
    :local weekend 0;
    :local dayend 0;
    :local weeks 0;
    :local days 0; 
    
    :if ([:find $time "w" -1] > 0) do={
        :set weekend [:find $time "w" -1];
        :set weeks [:pick $time 0 $weekend];
        :set weekend ($weekend+1);
    };

    :if ([:find $time "d" -1] > 0) do={
        :set dayend [:find $time "d" -1];
        :set days [:pick $time $weekend $dayend];
    };

    :local hms [:pick $time ([:len $time]-8) [:len $time]];
    :local hours [:pick $hms 0 2];
    :local minutes [:pick $hms 3 5];
    :local seconds [:pick $hms 6 8]; 
    :local adjSeconds [($weeks*86400*7+$days*86400+$hours*3600+$minutes*60+$seconds)];
    
    #Decide whether or not to trigger a label reset
    :if ( $adjSeconds < 155 ) do={
        :if ( $adjSeconds > 30 ) do={
            :log warning "Set trigger - Resetting MPLS local label bindings due to recent OSPF adjacency change";
            :set trigger 1;
        };
    };
};
  
#Have we been triggered
:if ( $trigger > 0 ) do={

    #Delay 1 - 10 seconds
   :delay  ([{:local upTime [:tostr [/system resource get uptime]]; :local upTimeLen [:len ( $upTime ) ]; :pick $upTime ($upTimeLen - 1) $upTimeLen ; }] + 1);
   
    #Create static labels based on loopbacks
   :foreach addrObject in=[/ip address find where netmask=255.255.255.255] do={ 
        /mpls local-bindings add dst-address=[/ip address get value-name=address $addrObject] label=impl-null
    }; 
        
    #Wait 10 seconds
    :delay 10s;
        
    #Remove static labels
    /mpls local-bindings remove [find where !dynamic];
};
Test carefully.
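
The time-conversion and trigger-window logic of the script above can also be checked off-router. The following is a minimal Python sketch and not part of the original post. It assumes the adjacency value looks like `00:01:23`, `3d04:05:06` or `1w2d03:04:05`; the names `adjacency_seconds` and `should_reset` are made up for illustration, and the 30/155 second bounds are copied from the script's `:if` conditions.

```python
import re

# Assumed RouterOS-style duration: optional "<n>w", optional "<n>d", then HH:MM:SS.
_DURATION = re.compile(r"^(?:(\d+)w)?(?:(\d+)d)?(\d{2}):(\d{2}):(\d{2})$")


def adjacency_seconds(text: str) -> int:
    """Convert a duration such as '1w2d03:04:05' into whole seconds."""
    match = _DURATION.match(text)
    if match is None:
        raise ValueError(f"unrecognised duration: {text!r}")
    # Missing week/day groups are None and count as zero.
    weeks, days, hours, minutes, seconds = (
        int(group) if group else 0 for group in match.groups()
    )
    return ((weeks * 7 + days) * 24 + hours) * 3600 + minutes * 60 + seconds


def should_reset(text: str, low: int = 30, high: int = 155) -> bool:
    """True when the adjacency age lies strictly between low and high seconds."""
    return low < adjacency_seconds(text) < high
```

With a two-minute schedule, a window of 30 to 155 seconds is wider than the interval between runs, so a freshly re-established adjacency should fall inside the window on at least one run.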
 
kevinM
just joined
Posts: 18
Joined: Mon Jul 28, 2014 8:57 pm

Re: MPLS bugs, had enough

Mon Oct 18, 2021 7:22 pm

Been there, done that. We ripped out MPLS due to instability, moved everything to EoIP, and are now slowly replacing MT with Arista/VXLAN.
 
mducharme
Trainer
Posts: 1760
Joined: Tue Jul 19, 2016 6:45 pm
Location: Vancouver, BC, Canada

Re: MPLS bugs, had enough

Tue Oct 19, 2021 2:28 am

RouterOS v7 is basically completely different as far as how the routing engine is designed, and MPLS is much more integrated into the FIB so that you can see all of the MPLS label information inside the routing table as well. Unfortunately it is also one of the least stable parts of the RouterOS v7 routing engine in rc4 as I can't bring up a VPLS tunnel without a crash. The rest of it is surprisingly reliable - I've been running v7 at home for the past month with no issues, and my setup includes OSPFv2 and OSPFv3.
 
mikeeg02
Member Candidate
Posts: 162
Joined: Fri Mar 30, 2018 2:28 am
Location: Pennsylvania

Re: MPLS bugs, had enough

Tue Oct 19, 2021 2:35 am

millenium7:

Did you ever set the hello timers in MPLS to match the hello timers in OSPF? That's saved me a lot of grief over the years.
 
ste
Forum Guru
Posts: 1922
Joined: Sun Feb 13, 2005 11:21 pm

Re: MPLS bugs, had enough

Wed Oct 20, 2021 2:33 pm

> mikeeg02 wrote:
> millenium7:
>
> Did you ever set the hello timers in MPLS to match the hello timers in OSPF? That's saved me a lot of grief over the years.

That does not help in some cases, though. I just had to disable/enable MPLS again because OSPF had learned a route but MPLS had not. Even disabling the interface did not help. MPLS sends packets into nowhere while a route exists. MT has not fixed these bugs for years now.
 
ste
Forum Guru
Posts: 1922
Joined: Sun Feb 13, 2005 11:21 pm

Re: MPLS bugs, had enough

Wed Oct 20, 2021 2:35 pm

> kevinM wrote:
> Been there, done that. We ripped out MPLS due to instability, moved everything to EoIP, and are now slowly replacing MT with Arista/VXLAN.

We tried EoIP, too. It was a disaster regarding jitter.
 
mikeeg02
Member Candidate
Posts: 162
Joined: Fri Mar 30, 2018 2:28 am
Location: Pennsylvania

Re: MPLS bugs, had enough

Wed Oct 20, 2021 3:53 pm

> ste wrote:
> That does not help in some cases, though. I just had to disable/enable MPLS again because OSPF had learned a route but MPLS had not. Even disabling the interface did not help. MPLS sends packets into nowhere while a route exists. MT has not fixed these bugs for years now.

I assume you guys have also limited the label mappings with the MPLS advertise filter?
I tend to bring up the hello timers, because by default they do not match.
Between the MPLS advertise filters and matching timers, I have eliminated losing MPLS sites in my system. I may not be as big a fish as some of you, but I have almost 150 routed MPLS remote sites (not counting customer equipment). Losing connectivity/routes is a big problem for me (as I am sure it is for everyone).
 
millenium7
Member
Topic Author
Posts: 448
Joined: Wed Mar 16, 2016 6:12 am

Re: MPLS bugs, had enough

Wed Oct 20, 2021 4:23 pm


> mikeeg02 wrote:
> I assume you guys have also limited the label mappings with the MPLS advertise filter?
> I tend to bring up the hello timers, because by default they do not match.
> Between the MPLS advertise filters and matching timers, I have eliminated losing MPLS sites in my system. I may not be as big a fish as some of you, but I have almost 150 routed MPLS remote sites (not counting customer equipment). Losing connectivity/routes is a big problem for me (as I am sure it is for everyone).

In our case no, MPLS uses all the OSPF-learned routes.
Hello timers are always reduced to 1s for OSPF, but I fail to see how that would have any impact on MPLS/LDP, as they are separate processes.
MPLS should simply see any route change and then adapt labels accordingly; it doesn't matter if it's very slightly behind.
But sometimes it just flat out breaks for no reason.

At this point though, I'm not seeing any benefit to keeping MPLS around. The only reason for us was VPLS tunnels to carry PPPoE; however, they would sometimes have issues because of the MPLS problems. They would also have issues with re-routing of traffic: the PPPoE tunnels would fail to resume traffic flow and would have to time out and reconnect. This caused havoc with VoIP, so we moved to a 'route as close to the customer as possible' approach and have PPPoE terminate on the closest router, thus allowing traffic failover to work without requiring PPPoE to re-establish.

MPLS is currently in place solely to speed up traffic flow (that isn't always the case though; the RB3011 especially is really bad at MPLS and is actually quite a bit slower), but I'd rather upgrade a few pieces of hardware at key locations than deal with the problems it presents us. In the few cases where we need a tunnel, EoIP will do.
 
mikeeg02
Member Candidate
Posts: 162
Joined: Fri Mar 30, 2018 2:28 am
Location: Pennsylvania

Re: MPLS bugs, had enough

Wed Oct 20, 2021 5:22 pm

> millenium7 wrote:
> In our case no, MPLS uses all the OSPF-learned routes.
> Hello timers are always reduced to 1s for OSPF, but I fail to see how that would have any impact on MPLS/LDP, as they are separate processes.
> MPLS should simply see any route change and then adapt labels accordingly; it doesn't matter if it's very slightly behind.
> But sometimes it just flat out breaks for no reason.

Well, that will definitely fix it, by not attempting anything.

It's been documented on here quite a bit that others have had issues with the MPLS and OSPF forwarding tables getting out of sync, and oftentimes making the hello timers match fixes the problem. Yes, logically it seems like there should just be a delay before the MPLS tables repopulate, but that doesn't always seem to be the actual case; they get out of sync with OSPF.

As I mentioned before, and you confirmed my suspicion, you're using VPLS tunnels. Each tunnel creates a new remote binding in the MPLS table by default. So when the tunnel endpoints are not neighbors, you should not advertise that. You can filter that with MPLS advertise filters. They are processed in order, and once you are finished you have to remove all LDP neighbors and let them re-establish so it can refresh.

https://wiki.mikrotik.com/wiki/Manual:M ... stribution
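
To illustrate why the order of advertise filters matters, here is a small Python model. It is an illustration only, not RouterOS code. It assumes first-match evaluation, which an allow-then-deny rule list relies on, and it assumes that a prefix matching no rule is advertised. The `10.255.255.0/24` loopback range and the `advertised` helper are hypothetical.

```python
from ipaddress import ip_network

# Ordered rules as (covering prefix, advertise?) pairs. Values are illustrative.
RULES = [
    (ip_network("10.255.255.0/24"), True),   # hypothetical loopback range: advertise
    (ip_network("0.0.0.0/0"), False),        # catch-all: advertise nothing else
]


def advertised(prefix: str, rules=RULES) -> bool:
    """Return the action of the first rule whose prefix covers `prefix`."""
    net = ip_network(prefix)
    for covering, advertise in rules:
        if net.subnet_of(covering):
            return advertise
    # Assumption: with no matching rule, the binding is advertised.
    return True
```

Under these assumptions, swapping the two rules would make the catch-all match first, so even the loopbacks would stop being advertised.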

These two things have made my equipment stable; I don't have to restart things to fix issues. Most of my routers have >300 days of uptime since the last time I went through and did a software update. I only have a handful of paths between sites that are actually wired; the rest are all microwave paths, so there are plenty of OSPF changes and recalculations to be had. I also run BFD on every microwave path.

One of my customers uses 20 of my sites for voice, and their equipment does 24/7 performance monitoring and reports packet loss and IPDV in 15-minute intervals, so I'm not ignorant of problems with traffic passed on my network.
 
ste
Forum Guru
Posts: 1922
Joined: Sun Feb 13, 2005 11:21 pm

Re: MPLS bugs, had enough

Wed Oct 20, 2021 5:30 pm


> 'route as close to the customer as possible' approach and have PPPoE terminate on the closest router,

This is our next step: move the PPPoE server to the DSLAMs and then drop MPLS. We do not expect v7 MPLS to be stable enough for at least 2 years.
 
millenium7
Member
Topic Author
Posts: 448
Joined: Wed Mar 16, 2016 6:12 am

Re: MPLS bugs, had enough

Thu Oct 21, 2021 1:25 am

We haven't used VPLS in a while and still get problems with MPLS with regular routing of traffic, so advertise filters have nothing to do with it.

The only problem with moving PPPoE closer, versus having it aggregated at a central location, is that the customer router then can't tell if there's a problem upstream. To their router it appears that the connection is up and working, so if they have a backup internet service it won't kick in. However, this can be taken care of with keepalive checks on their side.
Oh, and you now need firewall rules and/or separate routing tables for your customers on all routers to ensure they can't route to your equipment. So this needs more processing power and slows things down a little bit.
There are many other benefits to regular routing of traffic though. You can do policy-based routing (QoS, or forcing different routing tables for VoIP, as it's not encapsulated inside a PPPoE tunnel); brief outages recover much faster, as the tunnel doesn't need to go down and traffic resumes the moment OSPF restores; and traceroutes indicate the actual path, so you can troubleshoot where, e.g., packet loss is occurring.

I think the benefits are much better than a single concentrated PPPoE location.
 
mducharme
Trainer
Posts: 1760
Joined: Tue Jul 19, 2016 6:45 pm
Location: Vancouver, BC, Canada

Re: MPLS bugs, had enough

Thu Oct 21, 2021 3:38 am

> millenium7 wrote:
> We haven't used VPLS in a while and still get problems with MPLS with regular routing of traffic, so advertise filters have nothing to do with it.

I don't think you understand - by using advertise filters, you can make it so that only your VPLS traffic has MPLS labels placed on it and nothing else, so all of your regular routing is still routed the old-fashioned way without labels. We use advertise filters in this way, to only advertise the loopback that we use for VPLS and nothing else, and only the VPLS tunnel traffic has MPLS labels placed on it. We have always done this and have never had these types of problems. I have never wanted label switching for any traffic except VPLS. If all you want is VPLS, why put labels on things that don't need them?

This also decreases the size of the MPLS forwarding table greatly, which may explain why we have never experienced this issue even with our VPLS tunnels. We do experience an issue with MikroTik OSPF where, when a link comes back up after an outage, the state will move to "full" without having all the routes in the routing table, and the remaining routes will slowly appear over the next half hour as the LSA timers expire and they are periodically reannounced. Often most of the remaining routes suddenly pop back in exactly a half hour later.
 
millenium7
Member
Topic Author
Posts: 448
Joined: Wed Mar 16, 2016 6:12 am

Re: MPLS bugs, had enough

Thu Oct 21, 2021 4:21 am

> mducharme wrote:
> > millenium7 wrote:
> > We haven't used VPLS in a while and still get problems with MPLS with regular routing of traffic, so advertise filters have nothing to do with it.
>
> I don't think you understand - by using advertise filters, you can make it so that only your VPLS traffic has MPLS labels placed on it and nothing else, so all of your regular routing is still routed the old-fashioned way without labels. We use advertise filters in this way, to only advertise the loopback that we use for VPLS and nothing else, and only the VPLS tunnel traffic has MPLS labels placed on it. We have always done this and have never had these types of problems. I have never wanted label switching for any traffic except VPLS. If all you want is VPLS, why put labels on things that don't need them?

Speed. MPLS doesn't even touch the routing table, so traffic carried by MPLS labels tends to gain about 30% more bandwidth (if CPU is a limiting factor), and it also reduces latency quite a bit.
Most carriers use MPLS internally.
 
mducharme
Trainer
Posts: 1760
Joined: Tue Jul 19, 2016 6:45 pm
Location: Vancouver, BC, Canada

Re: MPLS bugs, had enough

Thu Oct 21, 2021 4:48 am

> millenium7 wrote:
> Speed. MPLS doesn't even touch the routing table, so traffic carried by MPLS labels tends to gain about 30% more bandwidth (if CPU is a limiting factor), and it also reduces latency quite a bit.
> Most carriers use MPLS internally.

Yes, I am aware. More bandwidth, but loss of some control since it bypasses the firewall and other such things.

This was in response to what you said earlier: "At this point though i'm not seeing any benefit to keeping MPLS around. The only reason for us was VPLS tunnels to carry PPPoE". If all you need is VPLS tunnels, then use advertise filters, for now at least, instead of removing MPLS entirely.

In our case we deliver all customer services over VPLS tunnels - the only traffic that doesn't go over VPLS is management traffic to the sites, which is not that much traffic, and therefore the benefits of placing labels on this traffic are minimal. Even for customers who buy DIA service from us, the subnet is not routed to them. We instead have a virtual "DIA concentrator" that has the default gateway for the DIA customer, and a VPLS pseudowire bridges the port on the CPE MikroTik device to the DIA concentrator at our core. The advantages of doing this are that it simplifies firewalls/ACLs, since you are grouping your customers together in one place by tunneling them all to the central location, and then your actual firewalls at the towers can be much simpler because they don't have to protect against customers getting access to management networks.

I have to say that I am impressed by the way MikroTik is designing their routing protocols in RouterOS v7. The entire system seems to be well thought out and less thrown together with patchwork solutions. The OSPFv3 implementation in RouterOS v6 was not good at all, and didn't properly handle situations described in the RFC, resulting in weird behavior when establishing neighbors with non-MikroTik devices. OSPFv2 in RouterOS v6 is better than the v3 implementation, but our flooding issues show that there are still some serious problems there. Of course I still encounter an occasional bug with OSPF in RouterOS v7, but it feels like they have people on the team now who really know what they are doing as far as designing routing engines. As a result, I am expecting much more out of MPLS in RouterOS v7, when it stabilizes. Using MPLS for everything, as you seem to want to do, will likely be much more of a reality in v7.
 
IPANetEngineer
Trainer
Posts: 1676
Joined: Fri Aug 10, 2012 6:46 am
Location: iparchitechs.com

Re: MPLS bugs, had enough

Thu Oct 21, 2021 10:10 am

> mikeeg02 wrote:
> It's been documented on here quite a bit that others have had issues with the MPLS and OSPF forwarding tables getting out of sync, and oftentimes making the hello timers match fixes the problem. Yes, logically it seems like there should just be a delay before the MPLS tables repopulate, but that doesn't always seem to be the actual case; they get out of sync with OSPF.
>
> As I mentioned before, and you confirmed my suspicion, you're using VPLS tunnels. Each tunnel creates a new remote binding in the MPLS table by default. So when the tunnel endpoints are not neighbors, you should not advertise that. You can filter that with MPLS advertise filters. They are processed in order, and once you are finished you have to remove all LDP neighbors and let them re-establish so it can refresh.
>
> https://wiki.mikrotik.com/wiki/Manual:M ... stribution
>
> These two things have made my equipment stable; I don't have to restart things to fix issues. Most of my routers have >300 days of uptime since the last time I went through and did a software update. I only have a handful of paths between sites that are actually wired; the rest are all microwave paths, so there are plenty of OSPF changes and recalculations to be had. I also run BFD on every microwave path.

Exactly this. We've built some incredibly large MikroTik based MPLS networks and solved the same challenges using these two strategies.

This is exactly why I've been pushing for SR-MPLS with MikroTik in v7 because label distribution is integrated into the IGP instead of the need for LDP to follow OSPF and risk lagging behind.

viewtopic.php?f=1&t=171278&p=837339#p837339
 
mada3k
Long time Member
Posts: 597
Joined: Mon Jul 13, 2015 10:53 am
Location: Sweden

Re: MPLS bugs, had enough

Sat Oct 23, 2021 12:10 pm

Using MPLS/LDP without advertise-filter is very bad practice.

> millenium7 wrote:
> Most carriers use MPLS internally.

And real equipment does MPLS forwarding in hardware. That is something MikroTik should try to achieve instead of only HW L3 routing.
 
glueck05
newbie
Posts: 27
Joined: Fri Jan 26, 2018 12:49 pm

Re: MPLS bugs, had enough

Wed Apr 06, 2022 3:45 pm

Hello everyone, first of all thanks for all the information. I recreated the whole setup and wanted to clarify for myself what exactly the two strategies/advertise rules are for a stable and large MPLS network:

/mpls ldp advertise-filter add prefix=0.0.0.0/0 neighbors=[only on tunnel-endpoint-routers, not on LSR/P routers?] advertise=no

/mpls ldp advertise-filter add prefix=[loopback-ip-range from all routers or only from the current ones?] advertise=yes

/mpls ldp advertise-filter add prefix=0.0.0.0/0 advertise=no

thanks,
glueck
 
MitecNick
just joined
Posts: 7
Joined: Thu Oct 16, 2014 9:45 am

Re: MPLS bugs, had enough

Fri Apr 15, 2022 12:55 am

We had the issues described in the OP's post and we were able to resolve them by doing a few things:

1st: Running the same stable release on all routers
2nd: Matching hello and dead timers with MPLS
3rd: Setting non-overlapping label ranges on each router
4th: Using MPLS advertise filters to only advertise loopbacks

Before doing this we would see MPLS labels missing, which would cause VPLS tunnels to show as down on only one side. After doing what is listed above, we have not had any issues with MPLS/VPLS, and it runs super fast and smooth.
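
As a rough illustration of the third point, the Python sketch below hands out one contiguous, non-overlapping label block per router and checks an existing plan for overlaps. It is only a planning aid under stated assumptions: the block size of 10000 and the helper names `label_ranges` and `overlaps` are invented for this example, and the bounds 16 and 1048575 are the unreserved part of the 20-bit MPLS label space.

```python
def label_ranges(routers, first=16, last=1_048_575, size=10_000):
    """Assign each router a contiguous block of `size` labels, in list order."""
    ranges = {}
    start = first
    for name in routers:
        end = start + size - 1
        if end > last:
            raise ValueError("not enough label space for all routers")
        ranges[name] = (start, end)
        start = end + 1  # next block begins right after this one
    return ranges


def overlaps(ranges):
    """True if any two (start, end) blocks in the plan share a label."""
    ordered = sorted(ranges.values())
    return any(prev[1] >= nxt[0] for prev, nxt in zip(ordered, ordered[1:]))
```

A side effect of distinct ranges is that a label value seen in a trace or a forwarding table can be mapped back to the router that allocated it.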
 
glueck05
newbie
Posts: 27
Joined: Fri Jan 26, 2018 12:49 pm

Re: MPLS bugs, had enough

Thu Apr 21, 2022 5:26 pm

Hello, thanks for the reply. Which timers did you change, OSPF or MPLS?
2nd Matching hello and dead timers with MPLS and OSPF
/routing/ospf/interface-template
hello-interval=10 (I would change this timer to 5)
dead-interval=40 (I would change this timer to 15)


/mpls/ldp/interface
hello-interval=5 (according to the documentation 5 is the default)
hold-time=15 (according to the documentation 15 is the default)


A quick question about point 4: Are the advertise filters not yet working in ROS7?

regards,
glueck
Last edited by glueck05 on Thu Apr 21, 2022 5:40 pm, edited 1 time in total.
 
ste
Forum Guru
Posts: 1922
Joined: Sun Feb 13, 2005 11:21 pm

Re: MPLS bugs, had enough

Thu Apr 21, 2022 5:30 pm

> glueck05 wrote:
> Hello, thanks for the reply. Which timers did you change, OSPF or MPLS?
> 2nd Matching hello and dead timers with MPLS and OSPF
> /routing/ospf/interface-template
> hello-interval=10 (I would change this timer to 5)
> dead-interval=40 (I would change this timer to 15)
>
> /mpls/ldp/interface
> hello-interval=5 (according to the documentation 5 is the default)
> hold-time=15 (according to the documentation 15 is the default)
>
> regards,
> glueck

We use 5/20.
 
glueck05
newbie
Posts: 27
Joined: Fri Jan 26, 2018 12:49 pm

Re: MPLS bugs, had enough

Mon Apr 25, 2022 3:56 pm

Thanks ste!

A quick question about point 4: Are the advertise filters not yet working in ROS7?

regards,
glueck
