This article explains the cause of the "timeout communicating with routing daemon" error and provides a workaround.
When you check the routes advertised to a BGP neighbor with the "show route advertising-protocol bgp <Neighbor IP>" command, the command returns no output. Instead, it fails with the error "timeout communicating with routing daemon". For example:
lab@MX480> show route advertising-protocol bgp 2.2.2.2
error: timeout communicating with routing daemon
The following log messages are reported when the command is executed:
Dec 17 18:33:51 MX480 mgd[56990]: %DAEMON-5-UI_READ_TIMEOUT: Timeout on read of peer 'routing'
Dec 17 18:37:20 MX480 mgd[56990]: %DAEMON-5-UI_READ_TIMEOUT: Timeout on read of peer 'routing'
Dec 17 18:39:23 MX480 mgd[56990]: %DAEMON-5-UI_READ_TIMEOUT: Timeout on read of peer 'routing'
Dec 17 18:41:45 MX480 mgd[56990]: %DAEMON-5-UI_READ_TIMEOUT: Timeout on read of peer 'routing'
Dec 17 18:49:58 MX480 mgd[57160]: %DAEMON-5-UI_READ_TIMEOUT: Timeout on read of peer 'routing'
Dec 17 19:28:33 MX480 mgd[56990]: %DAEMON-5-UI_READ_TIMEOUT: Timeout on read of peer 'routing'
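To gauge how often the timeout occurs, the log lines above can be tallied offline. The following is a minimal Python sketch, assuming the messages have been saved as plain text in the format shown above; the function name `count_read_timeouts` and the sample list are hypothetical and are not part of Junos.

```python
import re
from collections import Counter

# Matches the mgd timeout message in the format shown in this article, e.g.
#   ... mgd[56990]: %DAEMON-5-UI_READ_TIMEOUT: Timeout on read of peer 'routing'
LOG_RE = re.compile(
    r"mgd\[(?P<pid>\d+)\]: %DAEMON-5-UI_READ_TIMEOUT: "
    r"Timeout on read of peer '(?P<peer>[^']+)'"
)

def count_read_timeouts(lines):
    """Count UI_READ_TIMEOUT messages per (mgd PID, peer) pair."""
    counts = Counter()
    for line in lines:
        match = LOG_RE.search(line)
        if match:
            counts[(match.group("pid"), match.group("peer"))] += 1
    return counts

# Sample lines copied from the log excerpt above.
SAMPLE_LOG = [
    "Dec 17 18:33:51 MX480 mgd[56990]: %DAEMON-5-UI_READ_TIMEOUT: Timeout on read of peer 'routing'",
    "Dec 17 18:49:58 MX480 mgd[57160]: %DAEMON-5-UI_READ_TIMEOUT: Timeout on read of peer 'routing'",
    "Dec 17 19:28:33 MX480 mgd[56990]: %DAEMON-5-UI_READ_TIMEOUT: Timeout on read of peer 'routing'",
]

if __name__ == "__main__":
    for (pid, peer), count in count_read_timeouts(SAMPLE_LOG).items():
        print(f"mgd[{pid}] timed out reading peer '{peer}' {count} time(s)")
```

Each mgd PID corresponds to one CLI session, so the tally shows which sessions hit the timeout.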
A large number of BGP routes exist:
lab@MX480> show route summary
Autonomous system number: 65000
Router ID: 1.1.1.1
inet.0: 848580 destinations, 3391293 routes (848580 active, 5 holddown, 0 hidden)
Direct: 41 routes, 41 active
Local: 41 routes, 41 active
OSPF: 366 routes, 360 active
BGP: 3390821 routes, 848114 active >>>>>>>>>>>>>>>>>>>> Huge number of routes
Static: 22 routes, 22 active
Aggregate: 1 routes, 1 active
LDP: 1 routes, 1 active
inet.3: 308 destinations, 310 routes (308 active, 0 holddown, 0 hidden)
RSVP: 2 routes, 2 active
LDP: 308 routes, 306 active
Test1.inet.0: 7 destinations, 7 routes (7 active, 0 holddown, 0 hidden)
BGP: 5 routes, 5 active
Static: 2 routes, 2 active
Test2.inet.0: 10620 destinations, 17363 routes (10610 active, 0 holddown, 10 hidden)
Direct: 18 routes, 18 active
Local: 18 routes, 18 active
BGP: 17327 routes, 10574 active
Test3.inet.0: 10311 destinations, 16944 routes (10310 active, 0 holddown, 1 hidden)
Direct: 1 routes, 1 active
Local: 1 routes, 1 active
BGP: 16942 routes, 10308 active
mpls.0: 263 destinations, 263 routes (263 active, 0 holddown, 0 hidden)
MPLS: 6 routes, 6 active
LDP: 254 routes, 254 active
VPN: 3 routes, 3 active
bgp.l3vpn.0: 17316 destinations, 17316 routes (17316 active, 0 holddown, 0 hidden)
BGP: 17316 routes, 17316 active
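To see at a glance which tables carry the bulk of the routes, the summary above can be parsed offline. The following is a minimal Python sketch, assuming the "show route summary" output has been captured as plain text in the format shown above; the function name `parse_route_summary` and the sample string are hypothetical.

```python
import re

# Matches table header lines in the format shown above, e.g.
#   inet.0: 848580 destinations, 3391293 routes (848580 active, 5 holddown, 0 hidden)
# Per-protocol lines such as "BGP: 3390821 routes, 848114 active" do not match,
# because they have no "destinations" field.
TABLE_RE = re.compile(
    r"^(?P<table>\S+): (?P<dest>\d+) destinations, (?P<routes>\d+) routes"
)

def parse_route_summary(text):
    """Return {table_name: (destinations, routes)} from captured summary text."""
    tables = {}
    for line in text.splitlines():
        match = TABLE_RE.match(line.strip())
        if match:
            tables[match.group("table")] = (
                int(match.group("dest")),
                int(match.group("routes")),
            )
    return tables

# Excerpt copied from the output above.
SAMPLE = """\
Router ID: 1.1.1.1
inet.0: 848580 destinations, 3391293 routes (848580 active, 5 holddown, 0 hidden)
BGP: 3390821 routes, 848114 active
Test1.inet.0: 7 destinations, 7 routes (7 active, 0 holddown, 0 hidden)
BGP: 5 routes, 5 active
"""

if __name__ == "__main__":
    summary = parse_route_summary(SAMPLE)
    # List tables from largest to smallest by route count.
    for name, (dest, routes) in sorted(summary.items(), key=lambda kv: -kv[1][1]):
        print(f"{name}: {routes} routes, {dest} destinations")
```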
RPD CPU utilization spikes, reaching up to 100%, while the advertised routes are being checked:
---(refreshed at 2018-12-17 07:25:53 PDT)---
13805 root 1 102 0 4991M 4123M CPU2 2 20.2H 100.00% rpd
---(*more 100%)---
---(refreshed at 2018-12-17 07:25:54 PDT)---
13805 root 1 102 0 4991M 4123M CPU1 1 20.2H 100.00% rpd
---(*more 100%)---
---(refreshed at 2018-12-17 07:25:55 PDT)---
13805 root 1 102 0 4991M 4123M CPU2 2 20.2H 100.00% rpd
---(*more 100%)---
---(refreshed at 2018-12-17 07:25:56 PDT)---
13805 root 1 103 0 4991M 4123M CPU1 1 20.2H 100.00% rpd
---(*more 100%)---
---(refreshed at 2018-12-17 07:25:57 PDT)---
13805 root 1 103 0 4991M 4123M CPU3 3 20.2H 100.00% rpd
Depending on the scale of routes and neighbors and on the number of routing instances, the command sometimes works and sometimes drives the RPD CPU to 100% and fails.
When there is a large number of routes in inet.0 and there are scaled routing instances, the "show route advertising-protocol bgp <Neighbor IP>" command, when executed on the RE, first parses through the inet.0 table and then moves on to the routing-instance tables. This takes time, and RPD CPU utilization increases while it runs. When the RPD CPU spikes while the other tables are being parsed, the command times out with the error.
In such conditions, the best practice is to specify the routing-instance table (with the table option) when querying details about a neighbor in a routing instance. For example:
lab@MX480> show route advertising-protocol bgp 2.2.2.2 table Test1.inet.0
Test1.inet.0: 10667 destinations, 35400 routes (10665 active, 0 holddown, 2 hidden)
Prefix Nexthop MED Lclpref AS path
* 10.26.93.48/28 Self 1000 ?
* 10.26.93.64/28 Self 1000 ?
* 10.93.233.64/28 Self 1000 ?
* 10.93.233.96/28 Self 1000 ?
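When several routing-instance tables need to be checked, the table-scoped form of the command can be generated for each one. The following is a minimal Python sketch that only builds the command strings; it does not connect to a router, and the function name `scoped_commands` is hypothetical. The neighbor address and table names are taken from the examples in this article.

```python
import ipaddress

def scoped_commands(neighbor, tables):
    """Build one table-scoped 'show route advertising-protocol bgp' command per table.

    Raises ValueError if 'neighbor' is not a valid IPv4 or IPv6 address.
    """
    ipaddress.ip_address(neighbor)  # validation only; the result is not used
    return [
        f"show route advertising-protocol bgp {neighbor} table {table}"
        for table in tables
    ]

if __name__ == "__main__":
    # Table names as listed in the "show route summary" output in this article.
    for command in scoped_commands("2.2.2.2", ["Test1.inet.0", "Test2.inet.0"]):
        print(command)
```

Scoping each query to one table keeps the command from parsing inet.0 first, which is the behavior described above as the cause of the timeout.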