How to prepare a standalone high-end SRX Series device to join another high-end SRX Series device that is already configured as a chassis cluster.
1. If you are replacing an SRX Series device in the cluster, power down the node being removed.
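2. On the production node, remove preempt from all redundancy groups and commit.
a. In configuration mode, execute the command:
delete chassis cluster redundancy-group 1 preempt
(repeat for every other redundancy group that has preempt configured).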
3. If you are running an affected version, configure new fab0 and fab1 interfaces on the production node,
as described in KB18189 - Replacing Routing Engine (RE) and Chassis of SRX-3400/3600 and SRX-5600/5800 in a Cluster May Result in Node Going to Disabled State.
4. Commit and exit configuration mode.
5. Prepare the new node for isolated testing (no cable connections).
6. Install all modules in the same slots as on the production node.
7. Power up the new/replacement node.
8. Take a snapshot of the production node:
a. Insert a USB memory stick into the USB port on the RE module (on an SRX3000 Services Gateway, this port is at the rear; do not use the USB ports on the switch fabric board (SFB) module at the front).
b. Execute the command:
request system snapshot media usb partition
c. When the above command completes, remove the USB memory stick from the production node.
9. Enable the cluster on the replacement node and boot from the USB snapshot:
a. Execute the command:
set chassis cluster cluster-id X node Y
(replace X with the cluster ID of the existing cluster and Y with the node ID, 0 or 1, of the node being replaced).
b. You will get a notification to reboot; do not reboot now.
c. Insert the USB memory stick into the USB port on the RE module (on an SRX3000 Services Gateway, this port is at the rear; do not use the USB ports on the SFB module at the front).
d. Execute the command:
request system reboot media usb
to boot the system from the USB snapshot.
10. After the system boots up from the USB, restore the snapshot to the node's internal storage:
a. Execute the command:
request system snapshot media compact-flash partition
(Note: If there is a failure on the compact flash, the system boots from the hard disk.)
b. Execute the command:
request system snapshot media hard-disk partition
11. After the snapshot completes, reboot the system from internal storage:
a. Execute the command:
request system reboot media compact-flash
to boot the system from the compact flash.
(Note: If there is a failure on the compact flash, the system boots from the hard disk.)
b. Execute the command:
request system reboot media hard-disk
to boot the system from the hard disk.
c. After the system is fully back online, remove the USB memory stick.
12. Perform a health check on the replacement system while it is still disconnected from the network:
a. Does
show chassis fpc pic-status
show all the modules online? (Do not proceed until all modules show Online; this takes about 5 to 10 minutes.)
b. Does
show system alarms
show no major alarms?
c. Does
show chassis cluster status
show the primary state for this node?
d. Does
show chassis fabric plane
show no errors for the fabric links?
13. The on-site engineer reintroduces the node into the cluster:
a. Power down the replacement node.
b. Fully cable the replacement node to the production network, including the control and fabric links.
c. Boot up the replacement node.
14. Perform a health check on the replacement system after it has fully booted with all network cables connected:
a. Does
show chassis fpc pic-status
show all the modules online? (Do not proceed until all modules show Online; this takes about 5 to 10 minutes.)
b. Does
show system alarms
show no major alarms?
c. Does
show chassis cluster status
show one primary and one secondary node, both with a non-zero priority, for every redundancy group?
d. Does
show chassis fabric plane
show no errors for the fabric links?
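Step 2 requires removing preempt from every redundancy group, not only redundancy group 1. As a minimal sketch, assuming you have saved the output of "show configuration chassis cluster | display set" from the production node as text, the Python function below lists one delete command per redundancy group that has preempt configured. preempt_delete_commands is a hypothetical helper written for this illustration, not a Juniper tool, and the input format it matches is an assumption; review the generated commands before entering them in configuration mode and committing.

```python
import re

# Assumption: input lines come from
#   show configuration chassis cluster | display set
# e.g. "set chassis cluster redundancy-group 1 preempt"
PREEMPT_RE = re.compile(r"^set chassis cluster redundancy-group (\d+) preempt\b")


def preempt_delete_commands(display_set_output: str) -> list[str]:
    """Hypothetical helper: one delete command per redundancy group with preempt set."""
    groups = set()
    for line in display_set_output.splitlines():
        match = PREEMPT_RE.match(line.strip())
        if match:
            groups.add(int(match.group(1)))
    # Sorted so the commands come out in redundancy-group order.
    return [f"delete chassis cluster redundancy-group {g} preempt" for g in sorted(groups)]


if __name__ == "__main__":
    saved = (
        "set chassis cluster redundancy-group 1 node 0 priority 200\n"
        "set chassis cluster redundancy-group 1 preempt\n"
        "set chassis cluster redundancy-group 2 preempt\n"
    )
    for command in preempt_delete_commands(saved):
        print(command)
```

Using a set means a group that appears on several preempt lines still produces a single delete command.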
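The module checks in steps 12a and 14a are repeated until every card is up. The Python sketch below scans saved "show chassis fpc pic-status" output and reports every slot or PIC whose state is not Online; an empty list means nothing is outstanding. The line shapes it matches (an optional "node0:" section header, then "Slot N  <state> ..." and "PIC N  <state> ..." lines) are assumptions based on typical output, so compare them with your own device first. modules_not_online is a hypothetical helper, not a Juniper tool.

```python
import re

# Assumed output shapes (verify against your own device):
#   "node0:"                              cluster-mode section header
#   "Slot 0   Online       SRX5k SPC"
#   "  PIC 0  Online       SPU Cp"
NODE_HEADER_RE = re.compile(r"^(node\d+):")
STATE_RE = re.compile(r"^\s*(Slot|PIC)\s+(\d+)\s+(\S+)")


def modules_not_online(pic_status_output: str) -> list[str]:
    """Hypothetical helper: list every slot or PIC whose state is not 'Online'."""
    problems = []
    node = ""
    slot = "?"
    seen_slot = False
    for line in pic_status_output.splitlines():
        header = NODE_HEADER_RE.match(line.strip())
        if header:
            node = header.group(1) + " "
            continue
        match = STATE_RE.match(line)
        if not match:
            continue
        kind, number, state = match.groups()
        if kind == "Slot":
            seen_slot = True
            slot = number
            label = f"{node}Slot {number}"
        else:
            # A PIC line belongs to the most recent Slot line above it.
            label = f"{node}Slot {slot} PIC {number}"
        if state != "Online":
            problems.append(f"{label}: {state}")
    if not seen_slot:
        # Empty or unrecognized output must not be mistaken for "all online".
        return ["no Slot lines found"]
    return problems
```

The explicit check for missing Slot lines keeps a truncated or wrongly pasted capture from passing as healthy.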
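Check 14c asks for exactly one primary and one secondary node, each with a non-zero priority, in every redundancy group. As a sketch under stated assumptions, the Python function below parses saved "show chassis cluster status" output and returns a list of problems; any other state, such as lost or disabled, is flagged. It assumes "Redundancy group: N , ..." header lines followed by "nodeX  <priority>  <status> ..." rows; that format is an assumption, so check it against your release. cluster_problems is a hypothetical helper, not a Juniper tool.

```python
import re

# Assumed line shapes from "show chassis cluster status":
#   "Redundancy group: 1 , Failover count: 1"
#   "    node0                   254         primary        no       no"
GROUP_RE = re.compile(r"^\s*Redundancy group:\s*(\d+)")
NODE_RE = re.compile(r"^\s*(node\d+)\s+(\d+)\s+(\S+)")


def cluster_problems(status_output: str) -> list[str]:
    """Hypothetical helper: an empty result means every redundancy group looks healthy."""
    groups = {}
    current = None
    for line in status_output.splitlines():
        group_match = GROUP_RE.match(line)
        if group_match:
            current = int(group_match.group(1))
            groups[current] = []
            continue
        node_match = NODE_RE.match(line)
        if node_match and current is not None:
            name, priority, state = node_match.groups()
            groups[current].append((name, int(priority), state))

    if not groups:
        return ["no redundancy groups found"]
    problems = []
    for group, nodes in sorted(groups.items()):
        states = sorted(state for _, _, state in nodes)
        if states != ["primary", "secondary"]:
            problems.append(f"RG {group}: expected one primary and one secondary, got {states}")
        for name, priority, _ in nodes:
            if priority == 0:
                problems.append(f"RG {group}: {name} has priority 0")
    return problems
```

The header row ("Node  Priority  Status ...") is ignored because the node pattern is case-sensitive and only rows that follow a redundancy-group line are counted.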