
[Contrail] Spare node scale-in for Contrail Cloud

Article ID: KB36763 | Last Updated: 22 Apr 2021 | Version: 1.0
Summary:

In some cases, you may need to scale down your Ceph cluster or replace a Ceph Storage node (for example, if the node is faulty). In either situation, you need to disable and rebalance the OSDs on any Ceph Storage node that you are removing from the Overcloud to ensure that no data is lost. This article explains the process for removing a Ceph Storage node.

Solution:
  1. Log in to a Controller node as the heat-admin user. The director’s stack user has an SSH key to access the heat-admin user.

     (undercloud) [stack@undercloud ~]$ openstack server list --flavor Controller
     +--------------------------------------+---------------------+--------+------------------------+----------------+------------+
     | ID                                   | Name                | Status | Networks               | Image          | Flavor     |
     +--------------------------------------+---------------------+--------+------------------------+----------------+------------+
     | c06869c4-392e-4a0d-85c2-a6402fcbd3e4 | overcloudvas-ctrl-1 | ACTIVE | ctlplane=192.168.81.18 | overcloud-full | Controller |
     | adcd3bd7-2bce-4d91-aa3a-64a54abc62a1 | overcloudvas-ctrl-2 | ACTIVE | ctlplane=192.168.81.16 | overcloud-full | Controller |
     | ae07208b-49c9-4db3-a0ce-446a47e89abd | overcloudvas-ctrl-0 | ACTIVE | ctlplane=192.168.81.9  | overcloud-full | Controller |
     +--------------------------------------+---------------------+--------+------------------------+----------------+------------+
     (undercloud) [stack@undercloud ~]$ ssh heat-admin@192.168.81.18
     Warning: Permanently added '192.168.81.18' (ECDSA) to the list of known hosts.
     [heat-admin@overcloudvas-ctrl-1 ~]$

     
  2. List the OSD tree to find the OSDs on the node that you are removing, and check that the Ceph status is active and clean.

     [heat-admin@<CTRL node> ~]$ sudo ceph osd tree
     [heat-admin@<CTRL node> ~]$ sudo ceph status
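
      To list only the OSD IDs hosted on the node you plan to remove, the same ceph osd ls-tree call that is used later in this procedure can also be run here, for example:

      [heat-admin@<CTRL node> ~]$ sudo ceph osd ls-tree <OSD node to remove>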

     
  3. Prevent Ceph from automatically rebalancing and recovering the OSDs. On a Controller node, first record the initial values of the following parameters so that they can be restored later (see step 16):

      [heat-admin@overcloudh3n-ctrl-0 ~]$ sudo ceph daemon /var/run/ceph/ceph-mon.$(hostname -s).asok config show | egrep "osd_recovery_max_active|osd_recovery_op_priority|osd_max_backfills"
         "osd_max_backfills": "2",
         "osd_recovery_max_active": "4",
         "osd_recovery_op_priority": "3",

     
  4. Set the following flags from the Controller node:

     sudo ceph osd set noout
     sudo ceph osd set norebalance
     sudo ceph osd set norecover
     sudo ceph osd set noscrub
     sudo ceph osd set nodeep-scrub
     sudo ceph tell 'osd.*' injectargs '--osd-max-backfills 1'
     sudo ceph tell 'osd.*' injectargs '--osd_recovery_max_active  1'
     sudo ceph tell 'osd.*' injectargs '--osd_recovery_op_priority  1'
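
      If the ceph CLI is not available directly on the Controller host, the same commands can be run through the ceph-mon container instead (the same pattern used elsewhere in this article), for example:

      sudo docker exec ceph-mon-$(hostname -s) ceph osd set noout
      sudo docker exec ceph-mon-$(hostname -s) ceph tell 'osd.*' injectargs '--osd-max-backfills 1'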

     
  5. Verify that flags are set:

     [heat-admin@<CTRL node> ~]$ sudo docker exec ceph-mon-$(hostname -s) ceph status
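
      As an additional, optional check, the currently set flags are also printed on the flags line of ceph osd dump:

      [heat-admin@<CTRL node> ~]$ sudo docker exec ceph-mon-$(hostname -s) ceph osd dump | grep ^flags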
     
  6. The changes to osd_max_backfills, osd_recovery_max_active, and osd_recovery_op_priority take effect on the OSDs, which run on the Ceph Storage nodes rather than on the Controller. Therefore, verify the change on a Ceph Storage node.

     [heat-admin@<CEPH node> ~]$ sudo ceph daemon /var/run/ceph/ceph-osd.
     ceph-osd.0.asok    ceph-osd.12.asok   ceph-osd.24.asok   ceph-osd.36.asok   ceph-osd.48.asok   ceph-osd.60.asok   ceph-osd.6.asok    ceph-osd.77.asok   ceph-osd.87.asok   
     ceph-osd.101.asok  ceph-osd.18.asok   ceph-osd.30.asok   ceph-osd.42.asok   ceph-osd.54.asok   ceph-osd.66.asok   ceph-osd.72.asok   ceph-osd.81.asok   ceph-osd.98.asok   
     [heat-admin@overcloudh3n-cephstorage12hw11-0 ~]$ sudo ceph daemon /var/run/ceph/ceph-osd.0.asok config show  | egrep "osd_recovery_max_active|osd_recovery_op_priority|osd_max_backfills"
         "osd_max_backfills": "1",
         "osd_recovery_max_active": "1",
         "osd_recovery_op_priority": "1",


    Note: The values on the controllers stay the same; only the values on the OSDs change.
     
  7. Log in to the Ceph Storage node you are removing as the heat-admin user and stop all the OSDs running on it.

      [heat-admin@<CEPH node> ~]$ for i in $(sudo docker ps | grep -i ceph-osd | awk -F 'ceph-osd-' '{print $2}' | sort -n ); do \
      echo "Removing OSD id : $i"; \
      sudo systemctl disable ceph-osd@$i.service --now; \
      done

     
  8. Verify that all OSDs are now stopped. There should be no ceph-osd-* containers running on the system:

     [heat-admin@<CEPH node> ~]$ sudo docker ps
     
  9. Verify that the Ceph cluster has returned all of its PGs to the active state (some of them may also be in the degraded or undersized state).

     [heat-admin@<CTRL node> ~]$ sudo ceph status
     
  10. Mark out all of the stopped OSDs that are associated with the node being removed so that Ceph no longer places data on them.

      [heat-admin@<CTRL node> ~]$ for i in $(sudo docker exec ceph-mon-$(hostname -s) ceph osd ls-tree <OSD node to remove>); do \
      echo "Putting OUT OSD id $i" ; \
      sudo docker exec ceph-mon-$(hostname -s) ceph osd out $i ; \
      done

     
  11. Unset the norecover and norebalance flags:

     [heat-admin@<CTRL node> ~]$ sudo ceph osd unset norecover
     [heat-admin@<CTRL node> ~]$ sudo ceph osd unset norebalance

     
  12. The Ceph cluster starts rebalancing. Wait for this process to complete. Follow the status with the command below until all PGs are in the active+clean state:

     [heat-admin@<CTRL node> ~]$ sudo ceph -w
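
      If you prefer an unattended wait instead of watching ceph -w, a minimal polling sketch is shown below. It assumes a Bash shell on the Controller node and a Luminous-style "ceph pg stat" output format ("<total> pgs: <n> active+clean; ..."); adjust it for your release if needed.

      [heat-admin@<CTRL node> ~]$ while true; do \
      stat=$(sudo ceph pg stat); \
      total=$(awk '{print $1}' <<< "$stat"); \
      grep -q ": ${total} active+clean" <<< "$stat" && break; \
      sleep 30; \
      done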
     
  13. Make sure that the OSDs are safe to destroy without reducing data durability.

     [heat-admin@<CTRL node> ~]$ sudo docker exec ceph-mon-$(hostname -s) ceph osd safe-to-destroy $(sudo docker exec ceph-mon-$(hostname -s) ceph osd ls-tree <OSD node to remove>)
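
      If the command reports that the OSDs are not yet safe to destroy, wait and re-run it. The retry loop below is a minimal sketch (not part of the original procedure) and assumes that safe-to-destroy returns a non-zero exit code until the OSDs are safe:

      [heat-admin@<CTRL node> ~]$ until sudo docker exec ceph-mon-$(hostname -s) ceph osd safe-to-destroy \
      $(sudo docker exec ceph-mon-$(hostname -s) ceph osd ls-tree <OSD node to remove>); do \
      sleep 60; \
      done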
     
  14. Once it is safe to remove the OSDs associated with the node being removed (all PGs should be in the active+clean state again), purge them permanently:

      [heat-admin@<CTRL node> ~]$ for i in $(sudo docker exec ceph-mon-$(hostname -s) ceph osd ls-tree <OSD node to remove>); do \
      echo "Purging OSD id $i" ; \
      sudo docker exec ceph-mon-$(hostname -s) ceph osd purge $i --yes-i-really-mean-it; \
      done

     
  15. Remove the host bucket from the CRUSH map:

     [heat-admin@<CTRL node> ~]$ sudo ceph osd crush rm <REMOVED_NODE>
     
  16. Restore the initial parameter values that were recorded in step 3 and unset the remaining flags:

     [heat-admin@<CTRL node> ~]$ sudo ceph tell 'osd.*' injectargs '--osd_recovery_op_priority  <initial_value>'
     [heat-admin@<CTRL node> ~]$ sudo ceph tell 'osd.*' injectargs '--osd_recovery_max_active  <initial_value>'
     [heat-admin@<CTRL node> ~]$ sudo ceph tell 'osd.*' injectargs '--osd-max-backfills <initial_value>'
     [heat-admin@<CTRL node> ~]$ sudo ceph osd unset nodeep-scrub
     [heat-admin@<CTRL node> ~]$ sudo ceph osd unset noscrub
     [heat-admin@<CTRL node> ~]$ sudo ceph osd unset noout
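
      To confirm that the original values are back in place, the check from steps 3 and 6 can be repeated, for example against one of the OSDs on a remaining Ceph Storage node:

      [heat-admin@<CEPH node> ~]$ sudo ceph daemon /var/run/ceph/ceph-osd.<ID>.asok config show | egrep "osd_recovery_max_active|osd_recovery_op_priority|osd_max_backfills"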

     
  17. Unregister the storage node by running the following commands on the node:

     [heat-admin@<CEPH node> ~]$ sudo subscription-manager remove --all
     [heat-admin@<CEPH node> ~]$ sudo subscription-manager unregister
     [heat-admin@<CEPH node> ~]$ sudo subscription-manager clean

     
  18. Leave the node and return to the undercloud host as the stack user.
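
      For example (the prompts shown here are illustrative):

      [heat-admin@<CEPH node> ~]$ exit
      (undercloud) [stack@undercloud ~]$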
     
  19. Disable the Ceph Storage node so the undercloud does not reprovision it.

    (undercloud) [stack@undercloud ~]$ openstack baremetal node list
    (undercloud) [stack@undercloud ~]$ openstack baremetal node maintenance set UUID


    Note: This is not required if the node is removed in the next step.
     
  20. Removing a Ceph Storage node requires an update to the overcloud stack in the director using the local template files.

    First, identify the UUID of the Overcloud stack:
    (undercloud)  [stack@undercloud ~]$ openstack stack list

    Identify the UUIDs of the Ceph Storage node you want to delete:
    (undercloud)  [stack@undercloud ~]$ openstack server list

    Run the following command to delete the node from the stack and update the plan accordingly:
    (undercloud)  [stack@undercloud ~]$ openstack overcloud node delete --stack overcloud NODE_UUID

    Wait until the stack completes its update. Monitor the stack update using the heat stack-list --show-nested command.
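
    For example (assuming the legacy heat CLI is available on the undercloud; openstack stack list --nested provides an equivalent view):
    (undercloud) [stack@undercloud ~]$ heat stack-list --show-nested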

    Run the following command to delete the node from ironic:
    (undercloud) [stack@undercloud ~]$ openstack baremetal node delete <baremetal node uuid to be deleted>
     
  21. Log in to a Controller node as the heat-admin user and check the status of the Ceph cluster, confirming that the removed node and its OSDs no longer appear.
    For example:

     [heat-admin@<CTRL node> ~]$ sudo ceph status
     [heat-admin@<CTRL node> ~]$ sudo ceph osd tree

     
  22. Remove or revert the changed storage node entry in /var/lib/contrail_cloud/config/storage-nodes.yaml and /var/lib/contrail_cloud/config/inventory.yaml on the undercloud host.
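
      Before editing, you can locate the entries for the removed node in both files, for example (the node name placeholder is illustrative):

      (undercloud) [stack@undercloud ~]$ sudo grep -n -B2 -A4 '<removed node name>' /var/lib/contrail_cloud/config/storage-nodes.yaml /var/lib/contrail_cloud/config/inventory.yaml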