Mukesh Chanderia

ACI Multi-Site Orchestrator (MSO) Tshoot - Part 2

Increasing CPU Cycle Reservation for Orchestrator VMs


Cisco ACI Multi-Site Orchestrator VMs require a dedicated amount of CPU cycles to function optimally. While new deployments automatically set the necessary CPU reservations, upgrading from a version prior to Release 2.1(1) requires manual adjustments to each Orchestrator VM's settings.


Why Increase CPU Reservation?


Properly configuring CPU cycle reservations can help resolve or prevent various unpredictable issues, such as:


Delayed GUI Loading: Orchestrator GUI elements may require multiple attempts to load.


Node Status Fluctuations: Nodes might intermittently switch to an "Unknown" status before reverting to "Ready" on their own.


# docker node ls


ID                            HOSTNAME   STATUS    AVAILABILITY   MANAGER STATUS   ENGINE VERSION
t8wl1zoke0vpxdl9fysqu9otb     node1      Ready     Active         Reachable        18.03.0-ce
kyriihdfhy1k1tlggan6e1ahs *   node2      Unknown   Active         Reachable        18.03.0-ce
yburwactxd86dorindmx8b4y1     node3      Ready     Active         Leader           18.03.0-ce


Heartbeat Failures: Transient heartbeat misses recorded in the Orchestrator logs may indicate node-to-node communication issues.



node2 dockerd: [...] level=error msg="agent: session failed" backoff=100ms

error="rpc error: code = Canceled desc = context canceled" module=node/agent [...]

node2 dockerd: [...] level=error msg="heartbeat to manager [...] failed"

error="rpc error: code = Canceled desc = context canceled" [...]


Enabling NTP for Orchestrator Nodes


Clock synchronization is crucial for Orchestrator nodes. Without it, you might encounter issues like random GUI session log-offs due to expired authentication tokens.


Procedure to Enable NTP


1) Log in directly to one of the Orchestrator VMs and navigate to the scripts directory:


cd /opt/cisco/msc/scripts


2) Configure NTP settings with the svm-msc-tz-ntp script, which sets the time zone and enables NTP.


Parameters:

-tz <time-zone>: Specify your time zone (e.g., US/Pacific).

-ne: Enable NTP.

-ns <ntp-server>: Specify your NTP server (e.g., ntp.esl.cisco.com).


Example Command:


./svm-msc-tz-ntp -tz US/Pacific -ne -ns ntp.esl.cisco.com


svm-msc-tz-ntp: Start

svm-msc-tz-ntp: Executing timedatectl set-timezone US/Pacific

svm-msc-tz-ntp: Executing sed -i 's|^server|\# server|' /etc/ntp.conf

svm-msc-tz-ntp: Executing timedatectl set-ntp true

svm-msc-tz-ntp: Sleeping 10 seconds

svm-msc-tz-ntp: Checking NTP status

svm-msc-tz-ntp: Executing ntpstat;ntpq -p

unsynchronised

polling server every 64 s

     remote           refid      st t  when poll reach   delay   offset  jitter
==============================================================================
 mtv5-ai27-dcm10 .GNSS.           1 u     -   64     1   1.581   -0.002   0.030


3) Verify NTP Configuration:


Check NTP Status:


ntpstat; ntpq -p


unsynchronised

polling server every 64 s

     remote           refid      st t  when poll reach   delay   offset  jitter
==============================================================================
*mtv5-ai27-dcm10 .GNSS.           1 u    14   64     1   3.522   -0.140   0.128


4) Confirm Date and Time:


date


Mon Jul 8 14:19:26 PDT 2019


5) Repeat for All Orchestrator Nodes:


Ensure that each Orchestrator VM undergoes the same NTP configuration process.
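
Once every node has been configured, a quick way to catch a node that was missed is to run the same verification commands against all three VMs in one pass. The loop below is a minimal sketch that assumes SSH access as root and uses placeholder node addresses; adjust both to your environment.


for node in 10.0.0.11 10.0.0.12 10.0.0.13; do     # placeholder Orchestrator node IPs
    echo "== $node =="
    ssh root@$node 'date; ntpstat; ntpq -p'
done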




Updating DNS for MSO OVA Deployments in VMware ESX


Note: This procedure is only for MSO OVA deployments in VMware ESX. It does not apply to Application Services Engine or Nexus Dashboard deployments.


Procedure to Update DNS

  1. Access the Cluster Node:

    • SSH into one of the cluster nodes using the root user account.

  2. Update DNS Configuration:

    • Use the nmcli command to set the DNS server IP address.

    • Single DNS Server:


      nmcli connection modify eth0 ipv4.dns "<dns-server-ip>"


    • Multiple DNS Servers:


      nmcli connection modify eth0 ipv4.dns "<dns-server-ip-1> <dns-server-ip-2>"


  3. Restart the Network Interface:

    • Apply the DNS changes by restarting the eth0 interface.

      nmcli connection down eth0 && nmcli connection up eth0


  4. Reboot the Node:

    • Restart the node to ensure all changes take effect.

  5. Repeat for Other Nodes:

    • Perform the same steps on the remaining two cluster nodes.
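
After updating the DNS setting on a node, you can confirm it was accepted before (or after) the reboot; for example, assuming the connection is named eth0 as in the commands above:


      nmcli connection show eth0 | grep ipv4.dns
      cat /etc/resolv.conf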


Restarting Cluster Nodes


Restarting a Single Node Temporarily Down


  1. Restart the Affected Node:

    • Simply restart the node that is down.

    • No additional steps are needed; the cluster will automatically recover.


Restarting Two Nodes Temporarily Down

  1. Backup MongoDB:

    • Before attempting recovery, back up the MongoDB to prevent data loss.

    • Important: The cluster needs at least two running nodes to remain operational; with two nodes down, the Orchestrator is unavailable until they are restored.

  2. Restart the Two Affected Nodes:

    • Restart both nodes that are down.

    • No additional steps are needed; the cluster will automatically recover.
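
Once the restarted nodes come back online, you can confirm from any node that the swarm has recovered before resuming configuration work, for example:


    docker node ls
    docker service ls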


Backing Up MongoDB for Cisco ACI Multi-Site


Recommendation: Always back up MongoDB before performing any upgrades or downgrades of the Cisco ACI Multi-Site Orchestrator.


Procedure to Back Up MongoDB


  1. Log In to the Orchestrator VM:

    • Access the Cisco ACI Multi-Site Orchestrator virtual machine.

  2. Run the Backup Script:

    • Execute the backup script to create a backup file.

      ~/msc_scripts/msc_db_backup.sh


    • A backup file named msc_backup_<date+%Y%m%d%H%M>.archive will be created.


  3. Secure the Backup File:

    • Copy the backup file to a safe location for future use.
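
For example, the archive could be copied to an external backup host over SCP (the destination host and path below are placeholders):


      scp msc_backup_<date+%Y%m%d%H%M>.archive admin@backup-host:/backups/mso/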


Restoring MongoDB for Cisco ACI Multi-Site


Procedure to Restore MongoDB


  1. Log In to the Orchestrator VM:

    • Access the Cisco ACI Multi-Site Orchestrator virtual machine.

  2. Transfer the Backup File:

    • Copy your msc_backup_<date+%Y%m%d%H%M>.archive file to the VM.

  3. Run the Restore Script:

    • Execute the restore script to restore the database from the backup file.

      ~/msc_scripts/msc_db_restore.sh

  4. Push the Schemas:

    • After restoring, push the schemas to ensure everything is up to date.

      msc_push_schemas.py
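
The transfer in step 2 can be done with SCP from wherever the backup is stored; for example (the node address below is a placeholder):


      scp msc_backup_<date+%Y%m%d%H%M>.archive root@<orchestrator-node-ip>:~/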



Custom Certificates Troubleshooting


This section describes how to resolve common issues when using custom SSL certificates with Cisco ACI Multi-Site Orchestrator.


Unable to Load the Orchestrator GUI


If you can't access the Orchestrator GUI after installing a custom certificate, the issue might be due to incorrect certificate placement on the Orchestrator nodes. Follow these steps to recover the default certificates and reinstall the new ones.


Steps to Recover Default Certificates and Reinstall Custom Certificates

  1. Log In to Each Orchestrator Node:

    • Access each node directly using SSH or your preferred method.

  2. Navigate to the Certificates Directory:

    cd /data/msc/secrets

  3. Restore Default Certificates:

    • Replace the existing certificate files with the backup copies.


    mv msc.key_backup msc.key
    mv msc.crt_backup msc.crt


  4. Restart the Orchestrator GUI Service:


    docker service update msc_ui --force


  5. Reinstall and Activate the New Certificates:

    • Follow the certificate installation procedure as outlined in previous documentation to ensure the new certificates are correctly applied.
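
If the GUI loads but you are unsure which certificate is actually being served, you can inspect it from any workstation with OpenSSL. This assumes the GUI is reachable on the standard HTTPS port 443, and <node-ip> is a placeholder for one of your Orchestrator nodes:


    echo | openssl s_client -connect <node-ip>:443 2>/dev/null | openssl x509 -noout -subject -issuer -dates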


Adding a New Orchestrator Node to the Cluster


When you add a new node to your Multi-Site Orchestrator cluster, ensure the key is activated to maintain cluster security and functionality.


Steps to Add a New Orchestrator Node

  1. Log In to the Orchestrator GUI:

    • Access the GUI using your web browser.

  2. Re-activate the Key:

    • Follow the key activation steps as described in the "Activating Custom Keyring" section to integrate the new node securely into the cluster.


Unable to Install a New Keyring After the Default Keyring Expired


If the default keyring has expired and you can't install a new one, it's likely that the custom keyring wasn't properly installed on the cluster nodes. Follow these steps to delete the old keyring and create a new one.


Steps to Create a New Keyring

  1. Access All Cluster Nodes:

    • SSH into each node in the cluster.

  2. Remove Old Keyring Files:


    cd /data/msc/secrets
    rm -rf msc.key msc.crt msc.key_backup msc.crt_backup


  3. Generate a New Keyring:

    • Create new key and certificate files using OpenSSL.


    openssl req -newkey rsa:2048 -nodes -keyout msc.key -x509 -days 365 -out msc.crt -subj '/CN=MSC'


  4. Backup the New Keyring Files:


    cp msc.key msc.key_backup
    cp msc.crt msc.crt_backup


  5. Set Proper Permissions:


    chmod 777 msc.key msc.key_backup msc.crt msc.crt_backup


  6. Force Update the Orchestrator GUI Service:


    docker service update msc_ui --force


  7. Re-install and activate the new certificates as described in the certificate installation procedure.
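
Before re-installing and activating, you can optionally confirm the subject and validity window of the keyring you just generated:


    openssl x509 -in msc.crt -noout -subject -dates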






Replacing a Single Node of the Cluster with a New Node


If one node (e.g., node1) goes down and you need to replace it with a new node, follow these steps:


Step 1: Identify the Down Node

  • On any existing node, get the ID of the node that is down by running:

    # docker node ls


    ID                           HOSTNAME  STATUS  AVAILABILITY   MANAGER STATUS
    11624powztg5tl9nlfoubydtp *  node2     Ready   Active         Leader
    fsrca74nl7byt5jcv93ndebco    node3     Ready   Active         Reachable
    wnfs9oc687vuusbzd3o7idllw    node1     Down    Active         Unreachable

    • Note the ID of the down node (node1).


Step 2: Demote the Down Node

  • Demote the down node by running:


    docker node demote <node ID>

    • Replace <node ID> with the ID from Step 1.

    • Example:

      docker node demote wnfs9oc687vuusbzd3o7idllw

    • You'll see a message: Manager <node ID> demoted in the swarm.


Step 3: Remove the Down Node

  • Remove the down node from the swarm:


    docker node rm <node ID>

    • Example:

      docker node rm wnfs9oc687vuusbzd3o7idllw


Step 4: Navigate to the 'prodha' Directory

  • On any existing node, change to the prodha directory:


    cd /opt/cisco/msc/builds/<build_number>/prodha

    • Replace <build_number> with your actual build number.


Step 5: Obtain the Swarm Join Token

  • Get the token needed to join the swarm:


    docker swarm join-token manager

    • This command will display a join command containing the token and IP address.

    Example Output:


    docker swarm join --token SWMTKN-1-... <IP_address>:2376

    • Note: Copy the entire join command or at least the token and IP address for later use.


Step 6: Note the Leader's IP Address

  • Identify the leader node by running:


    docker node ls

  • On the leader node, get its IP address:


    ifconfig

    • Look for the inet value under the appropriate network interface (e.g., eth0).


    inet 10.23.230.152 netmask 255.255.255.0 broadcast 192.168.99.255

    • Note: The IP address is 10.23.230.152.


Step 7: Prepare the New Node


  • Set the Hostname:

    hostnamectl set-hostname <new_node_name>

    • Replace <new_node_name> with the desired hostname (e.g., node1).


Step 8: Navigate to the 'prodha' Directory on New Node

  • On the new node, change to the prodha directory:


    cd /opt/cisco/msc/builds/<build_number>/prodha


Step 9: Join the New Node to the Swarm

  • Run the join command using the token and leader IP address obtained earlier:


    ./msc_cfg_join.py <token> <leader_IP_address>

    • Replace:

      • <token> with the token from Step 5.

      • <leader_IP_address> with the IP address from Step 6.

    • Example:


      ./msc_cfg_join.py SWMTKN-1-... 10.23.230.152


Step 10: Deploy the Configuration

  • On any node, navigate to the prodha directory:


    cd /opt/cisco/msc/builds/<build_number>/prodha

  • Run the deployment script:


    ./msc_deploy.py

  • Result: All services should be up, and the database replicated.
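
As a quick sanity check that the replacement node has fully joined and the services have converged, you can list the node and service states from any node; the --format string below simply trims the output to service name and replica count:


    docker node ls
    docker service ls --format '{{.Name}}: {{.Replicas}}'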


Replacing Two Existing Nodes of the Cluster with New Nodes


If two nodes are down and you need to replace them, follow these steps:


Before You Begin

  • Important: Since there's a lack of quorum (only one node is up), the Multi-Site Orchestrator won't be available.

  • Recommendation: Back up the MongoDB database before proceeding.


Step 1: Prepare New Nodes

  • Bring Up Two New Nodes:

    • Ensure they are properly set up and connected.

  • Set Unique Hostnames for Each New Node:


    hostnamectl set-hostname <new_node_name>

    • Repeat for both new nodes, using unique names.


Step 2: Remove Down Nodes from Swarm

  • SSH into the Only Live Node.

  • List All Nodes:


    docker node ls

    Example Output:

    ID                            HOSTNAME   STATUS   AVAILABILITY  MANAGER STATUS
    g3mebdulaed2n0cyywjrtum31     node2      Down     Active        Reachable
    ucgd7mm2e2divnw9kvm4in7r7     node1      Ready    Active        Leader
    zjt4dsodu3bff3ipn0dg5h3po *   node3      Down     Active        Reachable


  • Remove Nodes with 'Down' Status:


    docker node rm <node-id>

    • Example:


      docker node rm g3mebdulaed2n0cyywjrtum31
      docker node rm zjt4dsodu3bff3ipn0dg5h3po


Step 3: Re-initialize the Docker Swarm

  • Leave the Existing Swarm:


    docker swarm leave --force

  • Navigate to the 'prodha' Directory:


    cd /opt/cisco/msc/builds/<build_number>/prodha

  • Initialize a New Swarm:


    ./msc_cfg_init.py

    • This command will provide a new token and IP address.
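
Should the token printed by msc_cfg_init.py scroll out of view, the underlying Docker swarm manager token can usually be recovered from this node with:


    docker swarm join-token manager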


Step 4: Join New Nodes to the Swarm

  • On Each New Node:

    • SSH into the Node.

    • Navigate to the 'prodha' Directory:

      cd /opt/cisco/msc/builds/<build_number>/prodha

    • Join the Node to the Swarm:

      ./msc_cfg_join.py <token> <leader_IP_address>

      • Replace:

        • <token> with the token from Step 3.

        • <leader_IP_address> with the IP address of the first node (from Step 3).


Step 5: Deploy the Configuration

  • On Any Node:

    • Navigate to the 'prodha' Directory:

      cd /opt/cisco/msc/builds/<build_number>/prodha

    • Run the Deployment Script:


      ./msc_deploy.py

  • Result: The new cluster should be operational with all services up.



Relocating Multi-Site Nodes to a Different Subnet


When you need to move one or more Multi-Site nodes from one subnet to another—such as spreading nodes across different data centers—you can follow this simplified procedure.


It's important to relocate one node at a time to maintain redundancy during the migration.



Scenario: Relocating node3 from Data Center 1 (subnet 10.1.1.1/24) to Data Center 2 (subnet 11.1.1.1/24).


Steps:

  1. Demote node3 on node1:

    • On node1, run:

      docker node demote node3

  2. Power Down node3:

    • Shut down the virtual machine (VM) for node3.

  3. Remove node3 from the Cluster:

    • On node1, execute:

      docker node rm node3

  4. Deploy a New VM for node3 in Data Center 2:

    • Install the Multi-Site VM (same version as node1 and node2).

    • Configure it with the new IP settings for the 11.1.1.1/24 subnet.

    • Set the hostname to node3.

  5. Power Up node3 and Test Connectivity:

    • Start the new node3 VM.

    • Verify connectivity to node1 and node2:

      ping [node1_IP]
      ping [node2_IP]

  6. Obtain the Swarm Join Token from node1:

    • On node1, get the join token:

      docker swarm join-token manager

    • Note the provided command and token.

  7. Join node3 to the Swarm:

    • On node3, use the token to join the cluster:

      docker swarm join --token [token] [node1_IP]:2377

  8. Verify Cluster Health:

    • On any node, check the status:

      docker node ls

    • Ensure each node shows:

      • STATUS: Ready

      • AVAILABILITY: Active

      • MANAGER STATUS: One node as Leader, others as Reachable

  9. Update the Swarm Label for node3:

    • On node1, update the label:

      docker node update node3 --label-add msc-node=msc-node3

  10. Check Docker Services Status:

    • On any node, list services:

      docker service ls

    • Confirm that services are running (e.g., REPLICAS show 1/1 or 3/3).

    • Wait up to 15 minutes for synchronization if necessary.

  11. Delete the Original node3 VM:

    • After confirming everything is functioning, remove the old node3 VM from Data Center 1.
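
As a final check, you can confirm the swarm label applied in step 9 and review the overall service state from any node:


      docker node inspect node3 --format '{{ .Spec.Labels }}'
      docker service ls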


By following these steps, you can successfully relocate a Multi-Site node to a different subnet while maintaining cluster integrity and minimizing downtime.

