Resolution Guide - MX - Troubleshoot Fabric Planes in Fault or Check State

Article ID: KB23173    KB Last Updated: 19 Dec 2012    Version: 9.0
Summary:

This article provides a step-by-step approach to troubleshooting Fabric Plane alarms and related issues on MX routers.


Symptoms:

  • 'Fault' or 'Check' chassis alarms like the following:

user@router> show chassis alarms
2 alarms currently active
Alarm time               Class  Description
2012-03-31 11:06:30 PST  Minor  Check CB 1 Fabric Chip 1
2012-03-31 11:06:30 PST  Major  Fault CB 2 Fabric Chip 0

  • Fabric related log messages like the following:

    SCB related message:

    chassisd[4729]: CHASSISD_FASIC_HSL_LINK_ERROR: Fchip (CB 0, ID 0): link 32 failed because of crc errors


    DPC related message:

    fpc4 CMXDPC: CRC link error detected for FPC: 4 PFE: 2 fabric plane 0


    MPC related message:

    fpc5 CMTDPC: CRC link error detected for FPC: 5 PFE: 1 fabric plane 6


NOTE: The scope of this troubleshooting guide is limited to:
  • MX routers containing MX-SCB and DPC cards
  • MX routers containing MX-SCB and mix of MPC/DPC cards
  • Junos 12.2 and below

Cause:
 
Solution:

These MX960 Switch Fabric diagrams can be used for reference in this guide. For more information on the terminology, specifications, and other platforms, refer to KB23065 - Understanding MX Fabric Planes.

MX960 with SCB and DPC


MX960 with SCB and MPC




Perform the following steps to troubleshoot Fabric Planes in Fault or Check State:

[Look for Chassis alarms]

Step 1. Run the command 'show chassis alarms'.  Does your MX router have any alarms in the Check or Fault state?

user@router> show chassis alarms
2 alarms currently active
Alarm time               Class  Description
2012-03-31 11:06:30 PST  Minor  Check CB 1 Fabric Chip 1 <-----
2012-03-31 11:06:30 PST  Major  Fault CB 1 Fabric Chip 0 <-----

As with other hardware issues, a problem with the Fabric chips on an MX router raises chassis alarms.

In the above output, two alarms are active, indicating Fabric chips in the Check and Fault states: Fabric chip 1 on CB 1 is in the Check state and Fabric chip 0 on CB 1 is in the Fault state, at the indicated timestamps.

  • Yes - Continue to Step 2

  • No - Search the Knowledge Base for other solutions

Note: If you have multiple errors, address one error at a time. Restart this Resolution Guide to address each error.
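
When alarm output is collected from many routers, the check in Step 1 can be scripted. The following is a minimal Python sketch, assuming the alarm lines have exactly the layout shown in this article; the function name fabric_alarms and the regular expression are illustrative and not part of Junos.

```python
import re

# Matches alarm lines in the layout shown in this article, for example:
#   2012-03-31 11:06:30 PST  Minor  Check CB 1 Fabric Chip 1
# Assumption: other Junos releases may format these lines differently.
ALARM_RE = re.compile(
    r"^(?P<time>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2})\s+\S+\s+"
    r"(?P<cls>Minor|Major)\s+"
    r"(?P<state>Check|Fault) CB (?P<cb>\d+) Fabric Chip (?P<chip>\d+)"
)


def fabric_alarms(cli_output):
    """Return one dict per Check/Fault Fabric Chip alarm found in the output."""
    alarms = []
    for line in cli_output.splitlines():
        m = ALARM_RE.match(line.strip())
        if m:
            alarms.append({
                "time": m.group("time"),
                "class": m.group("cls"),
                "state": m.group("state"),
                "cb": int(m.group("cb")),
                "chip": int(m.group("chip")),
            })
    return alarms


# Sample taken from the Symptoms section of this article.
SAMPLE = """\
2 alarms currently active
Alarm time               Class  Description
2012-03-31 11:06:30 PST  Minor  Check CB 1 Fabric Chip 1
2012-03-31 11:06:30 PST  Major  Fault CB 2 Fabric Chip 0
"""

for alarm in fabric_alarms(SAMPLE):
    print(alarm["time"], alarm["class"], alarm["state"],
          "CB", alarm["cb"], "chip", alarm["chip"])
```

An empty result corresponds to the "No" branch of Step 1; any entry corresponds to the "Yes" branch, and the recorded time is the timestamp to look for in the logs in Step 3.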




[Fabric Planes in Check or Fault state?]

Step 2.   Observe the status of the Fabric Planes using the command 'show chassis fabric summary'.

user@router> show chassis fabric summary
Plane      State    Uptime
 0         Online   129 days, 18 hours, 24 minutes, 9 seconds
 1         Online   129 days, 18 hours, 24 minutes, 9 seconds
 2         Check    129 days, 18 hours, 24 minutes, 9 seconds           
 3         Fault    129 days, 18 hours, 24 minutes, 8 seconds
 4         Online   129 days, 18 hours, 24 minutes, 8 seconds
 5         Online   129 days, 18 hours, 24 minutes, 8 seconds

In the above example output from an MX960 router with a mix of DPC, MPC, and SCB cards, Fabric Plane 2 is in the Check state and Fabric Plane 3 is in the Fault state. The output also shows how long each plane has been up.
Note also that planes 4 and 5 are online. MX routers with SCB cards and at least one MPC card lose Fabric redundancy: with a mix of cards, all six planes are online on an MX960 with no spare, and all eight planes are online on an MX480/MX240 with no spare. Use caution while troubleshooting; some commands should be run only during a maintenance window.

Are any of the Planes in the Check state or Fault state?
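
The question above can also be answered from saved command output. This is a minimal Python sketch, assuming the three-column layout (Plane, State, Uptime) shown in this step; the function name degraded_planes is illustrative.

```python
def degraded_planes(cli_output):
    """Return {plane_number: state} for planes in the Check or Fault state."""
    degraded = {}
    for line in cli_output.splitlines():
        fields = line.split()
        # Data rows start with the plane number followed by the state;
        # the header row and blank lines are skipped by the isdigit() test.
        if len(fields) >= 2 and fields[0].isdigit() and fields[1] in ("Check", "Fault"):
            degraded[int(fields[0])] = fields[1]
    return degraded


# Sample taken from the 'show chassis fabric summary' output in this step.
SAMPLE = """\
Plane      State    Uptime
 0         Online   129 days, 18 hours, 24 minutes, 9 seconds
 1         Online   129 days, 18 hours, 24 minutes, 9 seconds
 2         Check    129 days, 18 hours, 24 minutes, 9 seconds
 3         Fault    129 days, 18 hours, 24 minutes, 8 seconds
 4         Online   129 days, 18 hours, 24 minutes, 8 seconds
 5         Online   129 days, 18 hours, 24 minutes, 8 seconds
"""

print(degraded_planes(SAMPLE))  # {2: 'Check', 3: 'Fault'}
```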




[Fabric or DPC/MPC reporting error?]

Step 3. Analyze 'show log messages' and 'show log chassisd' at the timestamp when the chassis alarms were raised (in Step 1), and identify which component is reporting the errors: Fabric or DPC/MPC?

For example, the alarms in the Step 1 output were raised at 11:06:30, so look for Fabric Plane (Fchip for SCB) or DPC/MPC related logs at that timestamp. The log messages provide more detail about the errors associated with the Fabric Planes.

     Example of Fabric reporting errors:    (SCB and Fchip/Plane are part of the Chassis Fabric)

     2012-03-31 11:06:30 chassisd[4729]: CHASSISD_FASIC_HSL_LINK_ERROR: Fchip (CB 0, ID 0): link 32 failed because of crc errors      

     In this log message, Fchip 0 (plane 0) on SCB 0 is reporting CRC errors on link 32, which connects to one of the DPCs.

    Example of  DPC / MPC reporting errors:
     DPC:
     2012-03-31 11:06:30 fpc4 CMXDPC: CRC link error detected for FPC: 4 PFE: 2 fabric plane 0

     MPC:
     fpc5 CMTDPC: CRC link error detected for FPC: 5 PFE: 1 fabric plane 6

     In the log messages above, the DPC in slot 4 is reporting CRC errors on its PFE 2 link to plane 0, and the MPC in slot 5 is reporting CRC errors on its PFE 1 link to plane 6.

Which component is reporting the error?   

  •  DPC / MPC - Continue to Step 4

  •  Fabric - Jump to Step 6


NOTE: The component that reports the error is not necessarily the faulty one; the issue can be with the component connected to it. So, in the steps that follow, if the DPC/MPC is reporting the error, troubleshooting starts with the Fabric connected to it, and if the Fabric is reporting the error, troubleshooting starts with the connected DPC/MPC.
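
The Fabric versus DPC/MPC decision in Step 3 can be sketched in Python as a pair of pattern matches. The patterns below cover only the two message formats quoted in this article and are an assumption about their general shape; the function name classify is illustrative.

```python
import re

# SCB (Fabric) side, as quoted in this article:
#   CHASSISD_FASIC_HSL_LINK_ERROR: Fchip (CB 0, ID 0): link 32 failed because of crc errors
FABRIC_RE = re.compile(
    r"CHASSISD_FASIC_HSL_LINK_ERROR: Fchip \(CB (?P<cb>\d+), ID (?P<chip>\d+)\): "
    r"link (?P<link>\d+) failed"
)

# DPC/MPC side, as quoted in this article:
#   fpc4 CMXDPC: CRC link error detected for FPC: 4 PFE: 2 fabric plane 0
FPC_RE = re.compile(
    r"CRC link error detected for FPC: (?P<fpc>\d+) PFE: (?P<pfe>\d+) "
    r"fabric plane (?P<plane>\d+)"
)


def classify(log_line):
    """Return ("fabric", details), ("fpc", details), or None for other lines."""
    for reporter, pattern in (("fabric", FABRIC_RE), ("fpc", FPC_RE)):
        m = pattern.search(log_line)
        if m:
            return reporter, {key: int(value) for key, value in m.groupdict().items()}
    return None


LOGS = [
    "2012-03-31 11:06:30 chassisd[4729]: CHASSISD_FASIC_HSL_LINK_ERROR: "
    "Fchip (CB 0, ID 0): link 32 failed because of crc errors",
    "2012-03-31 11:06:30 fpc4 CMXDPC: CRC link error detected for FPC: 4 PFE: 2 fabric plane 0",
    "fpc5 CMTDPC: CRC link error detected for FPC: 5 PFE: 1 fabric plane 6",
]

for line in LOGS:
    print(classify(line))
```

A "fpc" result corresponds to the DPC / MPC branch (Step 4), and a "fabric" result corresponds to the Fabric branch (Step 6).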




[Steps to do when DPC/MPC is reporting error]

Step 4. Identify which planes are reporting errors in the output of 'show chassis fabric fpcs':

user@router> show chassis fabric fpcs
Jan 17 17:42:24
Fabric management FPC state
FPC 0
  PFE #0
      Plane 0: Link error
      Plane 1: Plane enabled
      Plane 2: Plane enabled
      Plane 3: Plane enabled
      Plane 4: Links ok
      Plane 5: Links ok

Note: In the above output, the number of PFEs per FPC depends on the type of card (DPC or MPC): 4 PFEs for a DPC or an MPC 16x10GE, 1 PFE for an MPC type1, and 2 PFEs for an MPC type2.

Example outputs for errors reported by a Single DPC/MPC and Multiple DPCs/MPCs are shown in KB23382 - Example outputs for DPC reporting errors.  

Are there link errors reported only from a single DPC/MPC or multiple DPCs/MPCs (for the same plane)?
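
To answer the single versus multiple question for a chassis with many FPCs, the 'show chassis fabric fpcs' output can be grouped by plane. The following is a minimal Python sketch, assuming the indentation-free keywords ("FPC n", "Plane n: Link error") appear as in the example above; the function name link_errors_by_plane is illustrative.

```python
import re
from collections import defaultdict


def link_errors_by_plane(cli_output):
    """Map each plane number to the set of FPC slots reporting 'Link error' on it."""
    errors = defaultdict(set)
    fpc = None
    for line in cli_output.splitlines():
        text = line.strip()
        m = re.match(r"FPC (\d+)$", text)
        if m:
            fpc = int(m.group(1))  # remember which FPC the following rows belong to
            continue
        m = re.match(r"Plane (\d+): Link error", text)
        if m and fpc is not None:
            errors[int(m.group(1))].add(fpc)
    return dict(errors)


# Sample taken from the 'show chassis fabric fpcs' output in this step.
SAMPLE = """\
Fabric management FPC state
FPC 0
  PFE #0
      Plane 0: Link error
      Plane 1: Plane enabled
      Plane 2: Plane enabled
      Plane 3: Plane enabled
      Plane 4: Links ok
      Plane 5: Links ok
"""

for plane, fpcs in sorted(link_errors_by_plane(SAMPLE).items()):
    scope = "a single FPC" if len(fpcs) == 1 else "multiple FPCs"
    print(f"Plane {plane}: link errors from {scope}: {sorted(fpcs)}")
```

A plane whose set contains one FPC matches the single DPC/MPC case; a plane whose set contains several FPCs matches the multiple DPCs/MPCs case.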



Step 5.  Look for CRC errors by collecting the HSL2 statistics from all the DPCs/MPCs with the following command. Run it several times for each FPC, waiting 5-10 seconds between runs, to see whether any CRC error counters are incrementing:

request pfe execute command "show hsl2 statistics" target fpc<fpc_#>
(wait 5-10 seconds)
request pfe execute command "show hsl2 statistics" target fpc<fpc_#>
(wait 5-10 seconds)
request pfe execute command "show hsl2 statistics" target fpc<fpc_#>
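
Comparing the snapshots by eye is error-prone when there are many channels. The following is a minimal Python sketch of the comparison, assuming each channel row has the layout shown in the examples in this step (channel name, "<=", remote end, cell count with last value, CRC error count with last value). The function names are illustrative, and the second snapshot is invented purely to demonstrate an incrementing counter.

```python
import re

# One channel row, for example:
#   ichip_0-chan-rx-13 <= remote/unknown   9173956724691 (415830)     24 (0)
ROW_RE = re.compile(
    r"^(?P<chan>\S+)\s+<=\s+\S+\s+\d+\s+\(\d+\)\s+(?P<crc>\d+)\s+\(\d+\)"
)


def crc_counters(hsl2_output):
    """Return {channel_name: total CRC error count} for one snapshot."""
    counters = {}
    for line in hsl2_output.splitlines():
        m = ROW_RE.match(line.strip())
        if m:
            counters[m.group("chan")] = int(m.group("crc"))
    return counters


def incrementing(first_snapshot, second_snapshot):
    """Return {channel: (old, new)} for channels whose CRC count grew."""
    before = crc_counters(first_snapshot)
    after = crc_counters(second_snapshot)
    return {
        chan: (before[chan], after[chan])
        for chan in after
        if chan in before and after[chan] > before[chan]
    }


# First snapshot: rows copied from the DPC example in this article.
SNAPSHOT_1 = """\
ichip_0 channel statistics :
    ichip_0-chan-rx-13 <= remote/unknown   9173956724691 (415830)     24 (0)
    ichip_0-chan-rx-14 <= remote/unknown   9173956724691 (415830)      0 (0)
"""

# Second snapshot: hypothetical values, not taken from a real router.
SNAPSHOT_2 = """\
ichip_0 channel statistics :
    ichip_0-chan-rx-13 <= remote/unknown   9173957140521 (415830)     31 (7)
    ichip_0-chan-rx-14 <= remote/unknown   9173957140521 (415830)      0 (0)
"""

print(incrementing(SNAPSHOT_1, SNAPSHOT_2))  # {'ichip_0-chan-rx-13': (24, 31)}
```

Only the total CRC count is compared here; a non-zero total that stays constant between snapshots indicates past errors rather than errors that are still occurring.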




Examples:

From DPC:
In the following output, CRC errors are incrementing on one of the links going to a Fabric Plane on DPC0.

user@router> request pfe execute command "show hsl2 statistics" target fpc0 
                                           Cell Received (last)      CRC Errors 
                                           -------------------------------
ichip_0 channel statistics :
    ichip_0-chan-rx-13 <= remote/unknown   9173956724691 (415830)     24 (0)
    ichip_0-chan-rx-14 <= remote/unknown   9173956724691 (415830)      0 (0)
    ichip_0-chan-rx-16 <= remote/unknown   9173927537621 (416940)      0 (0)
    ichip_0-chan-rx-17 <= remote/unknown   9173927537621 (416940)      0 (0)
    ichip_0-chan-rx-19 <= remote/unknown   9173956724627 (418209)      0 (0)
    ichip_0-chan-rx-20 <= remote/unknown   9173956724627 (418209)      0 (0)
    ichip_0-chan-rx-22 <= remote/unknown   9173927538255 (419449)      0 (0)
    ichip_0-chan-rx-23 <= remote/unknown   9173927538255 (419449)      0 (0)



From type2 MPC:

In the following output, CRC errors are incrementing on one of the links going from LUCHIP on MPC0 to the Fabric.

NPC platform (1067Mhz MPC 8548 processor, 2048MB memory, 512KB flash)
NPC0(Router-re1 vty)# show hsl2 statistics
                                           Cell Received (last)      CRC Errors (last)
                                           -------------------------------------------
<SNIP>
LUCHIP(0) channel statistics :
LUCHIP(0)-chan-rx-0 <= MQCHIP(0)-chan-tx-4 60261371309051 (7612705)   6 (6)
LUCHIP(0)-chan-rx-1 <= MQCHIP(0)-chan-tx-5 60261371309051 (7612705)   0 (0)
<SNIP>


From 16x10GE MPC:
In the following output, CRC errors are incrementing on one of the links going from MQCHIP on MPC9 to the Fabric.

NPC9(Router-re1 vty)# show hsl2 statistics
                                           Cell Received (last)       CRC Errors (last)
                                           --------------------------------------------
<SNIP>
LUCHIP(1) channel statistics :
LUCHIP(1)-chan-rx-0 <= MQCHIP(1)-chan-tx-4 156367513250852 (68082509) 0 (0)
LUCHIP(1)-chan-rx-1 <= MQCHIP(1)-chan-tx-5 156367513250852 (68082509) 0 (0)

MQCHIP(1) channel statistics :
MQCHIP(1)-chan-rx-2 <= LUCHIP(1)-chan-tx-2 1863515864 (1863515864)   12 (12)
MQCHIP(1)-chan-rx-3 <= LUCHIP(1)-chan-tx-3 1863515864 (1863515864)    0 (0)
<SNIP>


 

Are the CRC error counters incrementing on any DPC/MPC (FPC) which is connected to the Fabric Plane in the Check state?

  • Yes - There are two sets of subroutines available for further troubleshooting: MX SCB Troubleshooting Subroutine and MX DPC/MPC Troubleshooting Subroutine.
    Note the Plane and DPC/MPC #s with errors from steps 4 and 5 above, and use that information in the subroutines. For example, in the above output Plane 0 and DPC 0 are in question.

    1. First, perform the steps in KB23068 - MX SCB Troubleshooting Subroutine.

    2. If the issue is not resolved, then go to KB23069 - MX DPC/MPC Troubleshooting Subroutine.

    3. If the issue is still not resolved, then include the data from the steps you performed, collect the information in the KB22637 - [M, MX, T Routers] Data Collection Checklist under the 'MX Fabric Plane' section, and open a case with your technical support representative.

  •  No - Open a case with your technical support representative.  Optionally, you can also try to resolve the issue by going to KB23068 - MX SCB Troubleshooting Subroutine first.



[Steps to do when Fabric is reporting error]

Step 6. Review the output of 'show chassis fabric plane' and identify which DPC/MPCs (FPCs) have Link errors:

user@router> show chassis fabric plane
Fabric management PLANE state
Plane 0
  Plane state: ACTIVE
      FPC 0
          PFE 0 :Links ok
          PFE 1 :Links ok
          PFE 2 :Links ok
          PFE 3 :Links ok
      FPC 1       
          PFE 0 :Links ok
          PFE 1 :Link error
          PFE 2 :Links ok
          PFE 3 :Links ok
      FPC 2
          PFE 0 :Links ok
          PFE 1 :Links ok
      FPC 3
          PFE 0 :Links ok
          PFE 1 :Links ok
          PFE 2 :Links ok
          PFE 3 :Links ok

In the above example, Plane 0 is reporting an error on DPC1 (PFE1).
Example outputs of a single plane reporting errors and of multiple planes reporting errors are shown in KB23384 - Example outputs for Fabric reporting errors.

Do multiple planes report Link errors for the same DPC/MPC?
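
For output that covers every plane, this question can be answered by grouping the Link error rows by FPC. The following is a minimal Python sketch, assuming the "Plane n" / "FPC n" / "PFE n :Link error" layout shown in this step; the function name planes_with_errors_per_fpc is illustrative.

```python
import re
from collections import defaultdict


def planes_with_errors_per_fpc(cli_output):
    """Map each FPC slot to the set of planes reporting 'Link error' for it."""
    result = defaultdict(set)
    plane = None
    fpc = None
    for line in cli_output.splitlines():
        text = line.strip()
        m = re.match(r"Plane (\d+)$", text)
        if m:
            # A new plane section starts; forget the FPC from the previous one.
            plane, fpc = int(m.group(1)), None
            continue
        m = re.match(r"FPC (\d+)$", text)
        if m:
            fpc = int(m.group(1))
            continue
        if (plane is not None and fpc is not None
                and re.match(r"PFE \d+\s*:\s*Link error", text)):
            result[fpc].add(plane)
    return dict(result)


# Sample shortened from the 'show chassis fabric plane' output in this step.
SAMPLE = """\
Fabric management PLANE state
Plane 0
  Plane state: ACTIVE
      FPC 0
          PFE 0 :Links ok
          PFE 1 :Links ok
      FPC 1
          PFE 0 :Links ok
          PFE 1 :Link error
"""

for slot, planes in sorted(planes_with_errors_per_fpc(SAMPLE).items()):
    scope = "a single plane" if len(planes) == 1 else "multiple planes"
    print(f"FPC {slot}: link errors reported by {scope}: {sorted(planes)}")
```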



