[QFX] How to perform a storage health check

Article ID: KB36853 KB Last Updated: 30 Apr 2021 Version: 1.0
Summary:

This article demonstrates how to perform a health check on QFX10002 and QFX10008 device storage units with the Self-Monitoring, Analysis and Reporting Technology (SMART) system. SMART monitors drive reliability, predicts drive failures, and carries out different types of drive self-tests.

Symptoms:

By default, the QFX10000 system has one 50 GB Serial Advanced Technology Attachment (SATA) Solid State Drive (SSD) storage device. It also allows optional installation of an additional 50 GB or 100 GB SATA SSD as a secondary boot drive or for log storage. The disks appear as /dev/sda and, if an additional SSD is installed, /dev/sdb. To list them, use these commands:

request app-engine host-shell
fdisk -l

Output example:

root@QFX10008-re0> request app-engine host-shell 

root@QFX10008-re0-node:~# fdisk -l

Disk /dev/sda: 46.6 GiB, 50020540416 bytes, 97696368 sectors
Units: sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 512 bytes
I/O size (minimum/optimal): 512 bytes / 512 bytes
Disklabel type: gpt
Disk identifier: 000C0D62-85FA-4A5D-818B-FB788FBE861C

Device           Start          End   Size Type
/dev/sda1         1953       390625 189.8M EFI System
/dev/sda2       390626      2390625 976.6M Microsoft basic data
/dev/sda3      2390626     45779296  20.7G Microsoft basic data
/dev/sda4     45779297     49779296   1.9G Microsoft basic data
/dev/sda5     49779297     90564453  19.5G Microsoft basic data
/dev/sda6     90564454     93169921   1.2G Microsoft basic data
 
Disk /dev/mapper/vg0_vjunos-lv_junos: 16.7 GiB, 17918066688 bytes, 34996224 sectors
Units: sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 512 bytes
I/O size (minimum/optimal): 512 bytes / 512 bytes

Disk /dev/mapper/vg0_vjunos-lv_var_third_party: 4 GiB, 4294967296 bytes, 8388608 sectors
Units: sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 512 bytes
I/O size (minimum/optimal): 512 bytes / 512 bytes
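
As a rough illustration, the physical SSDs can be picked out of saved fdisk -l output with a short Python sketch like the one below. It assumes the output has been captured to a string and is parsed off the device, and that physical drives appear as /dev/sdX as in the example above. The function name list_physical_disks is hypothetical and not part of any Juniper tooling.

```python
import re

# Matches disk header lines such as:
#   Disk /dev/sda: 46.6 GiB, 50020540416 bytes, 97696368 sectors
DISK_LINE = re.compile(
    r"^Disk (?P<name>\S+): .*?, (?P<bytes>\d+) bytes, (?P<sectors>\d+) sectors"
)


def list_physical_disks(fdisk_output):
    """Return {device: size_in_bytes} for /dev/sdX disks in `fdisk -l` text.

    Device-mapper volumes (/dev/mapper/...) are skipped on the assumption
    that only /dev/sdX entries are physical SSDs.
    """
    disks = {}
    for line in fdisk_output.splitlines():
        match = DISK_LINE.match(line.strip())
        if match and re.fullmatch(r"/dev/sd[a-z]+", match.group("name")):
            disks[match.group("name")] = int(match.group("bytes"))
    return disks


# Lines copied from the output example above.
sample = """\
Disk /dev/sda: 46.6 GiB, 50020540416 bytes, 97696368 sectors
Units: sectors of 1 * 512 = 512 bytes
Disk /dev/mapper/vg0_vjunos-lv_junos: 16.7 GiB, 17918066688 bytes, 34996224 sectors
"""
print(list_physical_disks(sample))
```

If a secondary SSD is installed, a second /dev/sdb entry would be expected in the returned dictionary.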

To determine whether the SSD storage unit is healthy, use the smartctl tool to examine the drive's SMART test results.

Solution:

The smartctl tool controls the Self-Monitoring, Analysis and Reporting Technology (SMART) system built into most ATA/SATA and SCSI/SAS hard drives and solid-state drives.

  1. Use the smartctl tool with the --scan option to scan for devices and print each device's name, type, and protocol ([ATA] or [SCSI]).

    root@QFX10008:RE:0%
    root@QFX10008:RE:0% smartctl --scan
    /dev/ada0 -d atacam # /dev/ada0, ATA device
    /dev/ada1 -d atacam # /dev/ada1, ATA device
    /dev/ada2 -d atacam # /dev/ada2, ATA device
    /dev/ada3 -d atacam # /dev/ada3, ATA device
    root@QFX10008:RE:0%
  2. Use the smartctl tool with the --health option to print the health status of each device.

    If the device reports a failing health status, either the device has already failed or it is predicting its own failure within the next 24 hours.

    root@QFX10008:RE:0% smartctl --health /dev/ada0
    smartctl 6.4 2015-06-04 r4109 [FreeBSD JNPR-11.0-20200922.4042921_buil amd64] Junos Build
    Copyright (C) 2002-15, Bruce Allen, Christian Franke, www.smartmontools.org
     
    === START OF READ SMART DATA SECTION ===
    SMART overall-health self-assessment test result: PASSED
     
    root@QFX10008:RE:0%
    root@QFX10008:RE:0% smartctl --health /dev/ada1
    smartctl 6.4 2015-06-04 r4109 [FreeBSD JNPR-11.0-20200922.4042921_buil amd64] Junos Build
    Copyright (C) 2002-15, Bruce Allen, Christian Franke, www.smartmontools.org
     
    === START OF READ SMART DATA SECTION ===
    SMART overall-health self-assessment test result: PASSED
     
    root@QFX10008:RE:0%
    root@QFX10008:RE:0% smartctl --health /dev/ada2
    smartctl 6.4 2015-06-04 r4109 [FreeBSD JNPR-11.0-20200922.4042921_buil amd64] Junos Build
    Copyright (C) 2002-15, Bruce Allen, Christian Franke, www.smartmontools.org
     
    === START OF READ SMART DATA SECTION ===
    SMART overall-health self-assessment test result: PASSED
     
    root@QFX10008:RE:0%
    root@QFX10008:RE:0% smartctl --health /dev/ada3
    smartctl 6.4 2015-06-04 r4109 [FreeBSD JNPR-11.0-20200922.4042921_buil amd64] Junos Build
    Copyright (C) 2002-15, Bruce Allen, Christian Franke, www.smartmontools.org
     
    === START OF READ SMART DATA SECTION ===
    SMART overall-health self-assessment test result: PASSED
    
    root@QFX10008:RE:0%
  3. If a disk reports a failure, use the -a option to get more information, and copy your data off the disk to a safe location as soon as possible.

    root@QFX10008:RE:0% smartctl -a /dev/ada3
    smartctl 6.4 2015-06-04 r4109 [FreeBSD JNPR-11.0-20200922.4042921_buil amd64] Junos Build
    Copyright (C) 2002-15, Bruce Allen, Christian Franke, www.smartmontools.org
    
    === START OF INFORMATION SECTION ===
    Device Model:     QEMU HARDDISK
    Serial Number:    QM00004
    Firmware Version: 2.2.0
    User Capacity:    4,294,967,296 bytes [4.29 GB]
    Sector Size:      512 bytes logical/physical
    Device is:        Not in smartctl database [for details use: -P showall]
    ATA Version is:   ATA/ATAPI-7, ATA/ATAPI-5 published, ANSI NCITS 340-2000
    Local Time is:    Tue Apr 20 13:01:00 2021 PDT
    SMART support is: Available - device has SMART capability.
    SMART support is: Enabled
    
    === START OF READ SMART DATA SECTION ===
    SMART overall-health self-assessment test result: PASSED
    
    General SMART Values:
    Offline data collection status:  (0x82) Offline data collection activity
                                            was completed without error.
                                            Auto Offline Data Collection: Enabled.
    Self-test execution status:      (   0) The previous self-test routine completed
                                            without error or no self-test has ever 
                                            been run.
    Total time to complete Offline 
    data collection:                (  288) seconds.
    Offline data collection
    capabilities:                    (0x19) SMART execute Offline immediate.
                                            No Auto Offline data collection support.
                                            Suspend Offline collection upon new
                                            command.
                                            Offline surface scan supported.
                                            Self-test supported.
                                            No Conveyance Self-test supported.
                                            No Selective Self-test supported.
    SMART capabilities:            (0x0003) Saves SMART data before entering
                                            power-saving mode.
                                            Supports SMART auto save timer.
    Error logging capability:        (0x01) Error logging supported.
                                            No General Purpose Logging support.
    Short self-test routine 
    recommended polling time:        (   2) minutes.
    Extended self-test routine
    recommended polling time:        (  54) minutes.
    
    SMART Attributes Data Structure revision number: 1
    Vendor Specific SMART Attributes with Thresholds:
    ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
      1 Raw_Read_Error_Rate     0x0003   100   100   006    Pre-fail  Always       -       0
      3 Spin_Up_Time            0x0003   100   100   000    Pre-fail  Always       -       16
      4 Start_Stop_Count        0x0002   100   100   020    Old_age   Always       -       100
      5 Reallocated_Sector_Ct   0x0003   100   100   036    Pre-fail  Always       -       0
      9 Power_On_Hours          0x0003   100   100   000    Pre-fail  Always       -       1
     12 Power_Cycle_Count       0x0003   100   100   000    Pre-fail  Always       -       0
    190 Airflow_Temperature_Cel 0x0003   069   069   050    Pre-fail  Always       -       31 (Min/Max 31/31)
    
    SMART Error Log Version: 1
    No Errors Logged
    
    SMART Self-test log structure revision number 1
    No self-tests have been logged.  [To run self-tests, use: smartctl -t]
    
    Selective Self-tests/Logging not supported
    
    root@QFX10008:RE:0% 
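
The three steps above could be automated off the device along the lines of the following Python sketch, which parses saved smartctl output. It is a minimal illustration under stated assumptions: the output text has already been collected to strings, the function names are hypothetical, and the rule that an attribute is treated as failed when its normalized VALUE is at or below a nonzero THRESH follows the general smartmontools convention rather than anything stated in this article.

```python
import re

HEALTH_LINE = re.compile(
    r"SMART overall-health self-assessment test result:\s*(\S+)"
)


def parse_scan(scan_output):
    """Return device paths from `smartctl --scan` text (first field per line)."""
    devices = []
    for line in scan_output.splitlines():
        fields = line.split()
        if fields and fields[0].startswith("/dev/"):
            devices.append(fields[0])
    return devices


def parse_health(health_output):
    """Return the overall-health verdict (for example 'PASSED'), or None."""
    match = HEALTH_LINE.search(health_output)
    return match.group(1) if match else None


def attributes_at_or_below_threshold(all_output):
    """Return names of attributes whose VALUE is <= a nonzero THRESH.

    Assumes the column layout shown in the `smartctl -a` example:
    ID# ATTRIBUTE_NAME FLAG VALUE WORST THRESH TYPE UPDATED WHEN_FAILED RAW_VALUE
    """
    flagged = []
    for line in all_output.splitlines():
        fields = line.split()
        # Attribute rows start with a numeric ID and have at least 10 columns.
        if (len(fields) >= 10 and fields[0].isdigit()
                and fields[3].isdigit() and fields[5].isdigit()):
            value, thresh = int(fields[3]), int(fields[5])
            if thresh > 0 and value <= thresh:
                flagged.append(fields[1])
    return flagged


# Lines copied from the example output above.
scan = """\
/dev/ada0 -d atacam # /dev/ada0, ATA device
/dev/ada1 -d atacam # /dev/ada1, ATA device
"""
health = "SMART overall-health self-assessment test result: PASSED"
attrs = """\
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  1 Raw_Read_Error_Rate     0x0003   100   100   006    Pre-fail  Always       -       0
  5 Reallocated_Sector_Ct   0x0003   100   100   036    Pre-fail  Always       -       0
"""
print(parse_scan(scan))
print(parse_health(health))
print(attributes_at_or_below_threshold(attrs))
```

A verdict other than PASSED, or a non-empty list of flagged attributes, would be a cue to run smartctl -a on that device and move data off the disk, as described in step 3.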
     