Runbook: Edge Node Configuration

For edge nodes deployed in the cloud and configured with limited replication, this section offers procedures to help you recover your EC2 instances or Azure cloud VMs in case of failure or accidental termination.

Because cloud providers already provide high availability and replication for storage resources, you might not need the default storage replication of the HPE Ezmeral Data Fabric. In an edge-node configuration, you can reduce the replication that the Data Fabric performs. For example, instead of the default 3-way replication, you might configure a replication factor of 1 for data volumes and 3 for system volumes. This can reduce storage costs and make your Data Fabric public-cloud deployments more efficient.
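As a sketch of the replication change described above, the following script wraps `maprcli volume modify`. It assumes `maprcli` is on the PATH of a cluster node. The function name and the volume name in the usage example are hypothetical. When `DRY_RUN=1` is set, the script prints the command instead of running it.

```shell
#!/bin/sh
# Sketch: lower the replication of a data volume for an edge-node deployment.
# Assumption: maprcli is on the PATH of a cluster node; volume names are examples.

# Print the command instead of running it when DRY_RUN=1.
run() {
  if [ "${DRY_RUN:-0}" = "1" ]; then
    printf '%s\n' "$*"
  else
    "$@"
  fi
}

# set_replication VOLUME REPLICATION MIN_REPLICATION
# Changes the desired and minimum replication of an existing volume.
set_replication() {
  run maprcli volume modify -name "$1" -replication "$2" -minreplication "$3"
}
```

For example, `set_replication edge.data 1 1` requests a single copy of the hypothetical `edge.data` volume. The minimum replication is lowered together with the replication factor so that it does not exceed it. System volumes keep the default of 3.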

This runbook provides examples to help retain service resilience while maintaining the benefits of an edge node configuration.

AWS EC2 Instances with EBS Volumes

Installer checks:
  1. Set the Delete on Termination policy for the EBS volumes to No to prevent data loss.
  2. Set Auto-assign public IP to Disable. An auto-assigned public IP address changes when the instance is stopped and restarted.
  3. Configure the installer to allow the private IP address of a terminated instance to be assigned to a new instance. This lets you launch a replacement instance with the same private IP address after an accidental termination.
  4. In Network Interface, set Delete on Termination to No so that the network interface and its private IP address are retained after an accidental termination.
  5. Optionally, add the udev rules for the data disks to the installer.
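Installer checks 1 and 4 can also be applied to an existing instance with the AWS CLI. The following is a minimal sketch. It assumes the AWS CLI is installed and configured. The function names and all IDs are hypothetical placeholders. `DEVICE_NAME` is the block device mapping name shown for the instance (for example `/dev/sdf`), which is not necessarily the name the kernel uses. When `DRY_RUN=1` is set, the functions print the commands instead of running them.

```shell
#!/bin/sh
# Sketch: apply installer checks 1 and 4 to an existing instance with the AWS CLI.
# Assumption: the AWS CLI is installed and configured; IDs are placeholders.

# Print the command instead of running it when DRY_RUN=1.
run() {
  if [ "${DRY_RUN:-0}" = "1" ]; then
    printf '%s\n' "$*"
  else
    "$@"
  fi
}

# keep_volume_on_termination INSTANCE_ID DEVICE_NAME
# Sets DeleteOnTermination=false for the EBS volume mapped at DEVICE_NAME.
keep_volume_on_termination() {
  run aws ec2 modify-instance-attribute \
    --instance-id "$1" \
    --block-device-mappings "[{\"DeviceName\":\"$2\",\"Ebs\":{\"DeleteOnTermination\":false}}]"
}

# keep_eni_on_termination ENI_ID ATTACHMENT_ID
# Keeps the network interface when the instance it is attached to is terminated.
keep_eni_on_termination() {
  run aws ec2 modify-network-interface-attribute \
    --network-interface-id "$1" \
    --attachment "AttachmentId=$2,DeleteOnTermination=false"
}
```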
Examples: The following examples provide procedures for recovering EC2 instances with EBS volumes in case of failure or accidental termination.
  1. Core services go down on the node:
    • If core services (such as Warden) go down on a node, restart them. The node should then resume normal operation.
  2. EC2 instance is stopped by mistake:
    • If an EC2 node that contains a data volume is stopped, you can no longer read the data on that node. Start the instance again to restore access.
  3. EC2 instance is terminated by mistake:
    1. Note the data volume ID associated with the terminated node.
    2. Launch a new instance with only a root volume and no attached data volumes. Assign it the same private IP address that the terminated node used.
    3. Attach the EBS volume that was previously attached to the terminated node to the new node.
    4. Run configure.sh without the -F disk.list argument so that configure.sh does not format the reattached disk and erase its data.
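    5. Create a udev rule that gives the mapr user access to the attached disk. Open the rules file:
      sudo vim /etc/udev/rules.d/99-custom.rules
      Add the following rule on a single line (KERNEL matches the device name without the /dev/ prefix):
      KERNEL=="xvdb", SUBSYSTEM=="block", MODE="0660", GROUP="disk", OWNER="mapr"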
    6. Reload the udev rules and trigger them:
      sudo udevadm control --reload-rules
      sudo udevadm trigger
    7. Verify the permissions:
      ls -l /dev/xvdb
    8. Run the following mrconfig commands:
      1. Attach the disk:
        /opt/mapr/server/mrconfig disk load /dev/xvdb
      2. Check the disk list:
        /opt/mapr/server/mrconfig -h 127.0.0.1 -p 5660 disk list -v
      3. Check the storage pool (SP) list:
        /opt/mapr/server/mrconfig -h 127.0.0.1 -p 5660 sp list -v
        If the SP is offline, bring it online by passing the path of a disk in the pool:
        /opt/mapr/server/mrconfig sp online /dev/xvdb
    9. After a short period, the volume data becomes readable again.
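Steps 2 and 3 of the terminated-instance procedure can be scripted with the AWS CLI. This is a sketch, not the installer's own procedure. The function names are hypothetical, and the AMI, instance type, subnet, IP address, and volume IDs are placeholders. Key pair and security group options are omitted and should match the terminated instance. A volume attached as `/dev/sdf` may appear under a different kernel name, such as `/dev/xvdf`, or `/dev/nvme1n1` on Nitro-based instance types, so confirm the name with lsblk before you write the udev rule.

```shell
#!/bin/sh
# Sketch of steps 2 and 3 of the terminated-instance procedure with the AWS CLI.
# Assumption: the AWS CLI is installed and configured; all IDs are placeholders.

# Print the command instead of running it when DRY_RUN=1.
run() {
  if [ "${DRY_RUN:-0}" = "1" ]; then
    printf '%s\n' "$*"
  else
    "$@"
  fi
}

# launch_replacement AMI_ID INSTANCE_TYPE SUBNET_ID PRIVATE_IP
# Launches one instance from an AMI that defines only a root volume, using the
# terminated node's private IP address and no auto-assigned public IP address.
launch_replacement() {
  run aws ec2 run-instances \
    --image-id "$1" \
    --instance-type "$2" \
    --subnet-id "$3" \
    --private-ip-address "$4" \
    --no-associate-public-ip-address \
    --count 1
}

# attach_data_volume VOLUME_ID INSTANCE_ID DEVICE
# Reattaches the EBS volume noted in step 1 to the replacement instance.
attach_data_volume() {
  run aws ec2 attach-volume --volume-id "$1" --instance-id "$2" --device "$3"
}
```

If you kept the original network interface (installer check 4), its private IP address is still held by that interface. In that case, launch the replacement with the retained interface, for example `--network-interfaces DeviceIndex=0,NetworkInterfaceId=<eni-id>`, instead of `--subnet-id` and `--private-ip-address`.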

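The on-node part of the terminated-instance procedure (steps 5 to 8) can be gathered into one script that runs as root on the replacement node. This is a sketch, not part of the Data Fabric tooling. The function names are hypothetical, and the device argument must be the kernel name that lsblk reports for the reattached disk (xvdb in the EC2 steps). The same script can be used on an Azure VM with that VM's device name. When `DRY_RUN=1` is set, the script prints the commands instead of running them.

```shell
#!/bin/sh
# Sketch: make a reattached data disk usable by the Data Fabric again
# (steps 5-8 of the terminated-instance procedure). Run as root on the new node.
# Assumption: the argument (for example xvdb) is the kernel device name reported
# by lsblk, without the /dev/ prefix.

MRCONFIG=/opt/mapr/server/mrconfig
RULES_FILE=/etc/udev/rules.d/99-custom.rules

# Print the command instead of running it when DRY_RUN=1.
run() {
  if [ "${DRY_RUN:-0}" = "1" ]; then
    printf '%s\n' "$*"
  else
    "$@"
  fi
}

# udev_rule DEVICE: print the single-line rule; KERNEL matches the name without /dev/.
udev_rule() {
  printf 'KERNEL=="%s", SUBSYSTEM=="block", MODE="0660", GROUP="disk", OWNER="mapr"\n' "$1"
}

# restore_disk DEVICE: add the rule once, apply it, and load the disk.
restore_disk() {
  rule=$(udev_rule "$1")
  if [ "${DRY_RUN:-0}" = "1" ]; then
    printf 'append to %s: %s\n' "$RULES_FILE" "$rule"
  elif ! grep -qxF "$rule" "$RULES_FILE" 2>/dev/null; then
    # Append only if an identical rule line is not already present.
    printf '%s\n' "$rule" >> "$RULES_FILE"
  fi
  run udevadm control --reload-rules
  run udevadm trigger
  # Wait for udev to finish processing the triggered events.
  run udevadm settle
  run ls -l "/dev/$1"
  run "$MRCONFIG" disk load "/dev/$1"
  run "$MRCONFIG" -h 127.0.0.1 -p 5660 disk list -v
  run "$MRCONFIG" -h 127.0.0.1 -p 5660 sp list -v
}

# sp_online DEVICE: bring the storage pool that contains DEVICE online.
sp_online() {
  run "$MRCONFIG" sp online "/dev/$1"
}
```

Run `restore_disk xvdb`, review the sp list output, and call `sp_online xvdb` only if the storage pool is offline. Because the rule is appended only when an identical line is missing, rerunning the script does not duplicate it.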
Azure Cloud Virtual Machines with Managed Disks

Installer checks:
  1. Set the Delete with VM policy for disks to No to prevent data loss.
  2. Configure the installer to allow the private IP address of a terminated VM to be assigned to a new VM. This lets you launch a replacement VM with the same private IP address after an accidental termination.
  3. In Network Interface, select Delete public IP and NIC when VM is deleted. Deleting the NIC releases the private IP address so that it can be assigned to a replacement VM.
  4. Optionally, add the udev rules for the data disks to the installer.
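For an existing VM, installer checks 1 and 3 correspond to the `deleteOption` properties of the VM's data disks and network interface. The following is a minimal sketch with the Azure CLI. It assumes the CLI is installed and logged in and that the node has a single NIC. The function name, resource group, VM name, and disk index are hypothetical placeholders. The sketch does not change the delete option of the public IP address.

```shell
#!/bin/sh
# Sketch: keep a data disk and delete the NIC when the VM is deleted.
# Assumption: the Azure CLI is installed and logged in; names are placeholders.

# Print the command instead of running it when DRY_RUN=1.
run() {
  if [ "${DRY_RUN:-0}" = "1" ]; then
    printf '%s\n' "$*"
  else
    "$@"
  fi
}

# set_delete_options RESOURCE_GROUP VM_NAME DATA_DISK_INDEX
# Detach (keep) the data disk at the given index; delete the first NIC.
set_delete_options() {
  run az vm update \
    --resource-group "$1" \
    --name "$2" \
    --set "storageProfile.dataDisks[$3].deleteOption=Detach" \
          "networkProfile.networkInterfaces[0].deleteOption=Delete"
}
```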
Examples: The following examples provide procedures for recovering Azure Cloud VMs with managed disks in case of failure or accidental termination.
  1. Core services go down on the VM:
    • If core services (such as Warden) go down on a VM, restart them. The VM should then resume normal operation.
  2. The VM is stopped by mistake:
    • If an Azure VM that contains a data volume is stopped, you can no longer read the data on that VM. Start the VM again to restore access.
  3. The VM is terminated by mistake:
    1. Note the disk name associated with the terminated VM.
    2. Launch a new VM with the OS disk and no attached data disks. Assign it the same private IP address that the terminated VM used.
    3. Attach the data disk that was previously attached to the terminated VM to the new VM.
    4. Run configure.sh without the -F disk.list argument so that configure.sh does not format the reattached disk and erase its data.
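    5. Run lsblk to identify the device name of the attached data disk. The following steps use /dev/sdb; substitute the name that lsblk reports (on many Azure VM sizes, /dev/sdb is the temporary resource disk rather than a data disk).
    6. Create a udev rule that gives the mapr user access to the attached disk. Open the rules file:
      sudo vim /etc/udev/rules.d/99-custom.rules
      Add the following rule on a single line (KERNEL matches the device name without the /dev/ prefix):
      KERNEL=="sdb", SUBSYSTEM=="block", MODE="0660", GROUP="disk", OWNER="mapr"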
    7. Reload the udev rules and trigger them:
      sudo udevadm control --reload-rules
      sudo udevadm trigger
    8. Verify the permissions:
      ls -l /dev/sdb
    9. Run the following mrconfig commands:
      1. Attach the disk:
        /opt/mapr/server/mrconfig disk load /dev/sdb
      2. Check the disk list:
        /opt/mapr/server/mrconfig -h 127.0.0.1 -p 5660 disk list -v
      3. Check the storage pool (SP) list:
        /opt/mapr/server/mrconfig -h 127.0.0.1 -p 5660 sp list -v
        If the SP is offline, bring it online by passing the path of a disk in the pool:
        /opt/mapr/server/mrconfig sp online /dev/sdb
    10. After a short period, the volume data becomes readable again.
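Steps 2 and 3 of the Azure procedure can be sketched with the Azure CLI. The sketch assumes the CLI is installed and logged in. The function names, resource group, VM, image, subnet resource ID, IP address, and disk name are hypothetical placeholders. VM size, authentication, and public IP options are left at the CLI defaults here and should be set to match the terminated VM.

```shell
#!/bin/sh
# Sketch of steps 2 and 3 of the Azure terminated-VM procedure.
# Assumption: the Azure CLI is installed and logged in; names are placeholders.

# Print the command instead of running it when DRY_RUN=1.
run() {
  if [ "${DRY_RUN:-0}" = "1" ]; then
    printf '%s\n' "$*"
  else
    "$@"
  fi
}

# launch_replacement_vm RESOURCE_GROUP VM_NAME IMAGE SUBNET_ID PRIVATE_IP
# Creates a VM with only an OS disk and the terminated VM's private IP address.
launch_replacement_vm() {
  run az vm create \
    --resource-group "$1" \
    --name "$2" \
    --image "$3" \
    --subnet "$4" \
    --private-ip-address "$5"
}

# attach_data_disk RESOURCE_GROUP VM_NAME DISK_NAME
# Reattaches the existing managed disk noted in step 1 to the new VM.
attach_data_disk() {
  run az vm disk attach --resource-group "$1" --vm-name "$2" --name "$3"
}
```

Attach the data disk before you run configure.sh, then continue with step 4.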