
Initial Fabric Setup

Mukesh Chanderia

ACI Fabric Discovery Workflow


  1. Initial Setup on APIC1 (via KVM console):

    • Provide basic configuration details (e.g., fabric name, APIC cluster size, TEP address pool).

  2. APIC1 Starts LLDP on Fabric Ports:

    • LLDP packets carry ACI-specific TLVs, including the infrastructure (infra) VLAN and an indication that the port belongs to an APIC (controller).

  3. Leaf Switch Detects APIC1:

    • Upon receiving the LLDP packets from APIC1, the leaf switch configures the infra VLAN on any port that connects to an APIC.

  4. Leaf Switch Sends DHCP Discovers:

    • Now that the infra VLAN is known, the leaf switch can send out DHCP Discover messages to obtain its TEP IP address.

  5. Register Leaf in APIC1:

    • Using APIC1’s out-of-band (OOB) IP, log in via HTTPS.

    • Go to the Fabric Membership section and register the newly discovered leaf.

  6. Leaf Receives TEP IP Address:

    • Once the leaf has been assigned a Node ID, APIC1 responds with a TEP IP from the configured pool, and the DHCP process completes.

  7. Leaf Relays DHCP for Connected Spines:

    • When spines connect to this leaf (discovered by LLDP), the leaf relays the spines’ DHCP requests to APIC1.

  8. Register and Assign IPs to Spines:

    • Spines appear in Fabric Membership as they’re discovered.

    • Once they’re registered, APIC1 assigns TEP IPs to each spine.

  9. Spines Relay DHCP for Other Nodes:

    • Spines forward DHCP traffic for remaining leaf nodes in the pod (assuming a full mesh between leaves and spines).

  10. Other APICs Join the Fabric:

    • Leaf nodes connected to APIC2 and APIC3 are similarly discovered and registered.

    • Complete the setup dialogs on APIC2 and APIC3.

  11. Cluster Formation and Validation:

    • All APICs form a cluster and communicate over TCP.

    • Verify they are fully “fit” (healthy), indicating fabric discovery is finished.
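The ordering in this workflow (a node is discovered first, and only receives a TEP IP after it has been registered with a Node ID) can be illustrated with a toy model. This is only a sketch: the ToyApic class, its method names, the serial number, and the in-order address allocation are all assumptions made for illustration and do not reflect how the APIC actually allocates addresses.

```python
import ipaddress


class ToyApic:
    """Toy model of the discovery order described above; not APIC code."""

    def __init__(self, tep_pool):
        # Hand out host addresses from the pool in order (illustrative only;
        # the real APIC allocation scheme is different).
        self._free_ips = iter(ipaddress.ip_network(tep_pool).hosts())
        self.discovered = set()   # serials seen via DHCP, not yet registered
        self.registered = {}      # serial -> (node_id, tep_ip)

    def dhcp_discover(self, serial):
        """Return a TEP IP only if the node has already been registered."""
        if serial in self.registered:
            return self.registered[serial][1]
        self.discovered.add(serial)   # node now shows up in Fabric Membership
        return None                   # no address until a Node ID is assigned

    def register(self, serial, node_id):
        """Assign a Node ID, which is what unlocks the TEP IP assignment."""
        if serial not in self.discovered:
            raise ValueError(f"{serial} has not been discovered yet")
        self.discovered.discard(serial)
        self.registered[serial] = (node_id, str(next(self._free_ips)))


apic = ToyApic("10.0.0.0/16")
first = apic.dhcp_discover("LEAF-SERIAL-1")    # before registration
apic.register("LEAF-SERIAL-1", 101)
second = apic.dhcp_discover("LEAF-SERIAL-1")   # after registration
print(first, second)
```

The first DHCP Discover gets no address, and the retry after registration does, which mirrors steps 4 through 6.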



show discoveryissues 


Below is a simplified summary of each check that appears when you run show discoveryissues on a leaf switch.


leaf101# show discoveryissues


Check01 – System State


  • Goal: Verify the switch is properly registered and in an “in-service” state.

  • Possible States:

    • in-service: The leaf has a Node ID, is registered, and has completed its bootstrap.

    • out-of-service: The leaf may be unregistered or has not completed bootstrap.

    • downloading-boot-script: The leaf is still in the process of downloading its initial configuration.

  • Troubleshooting:

    1. Run moquery -c topSystem to confirm the leaf’s state.

    2. Check whether the leaf has a valid Node ID and TEP IP address.
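If the moquery -c topSystem output is saved as text, the two checks above can be scripted. The following is a minimal sketch: the sample output is illustrative rather than captured from a real switch, and the assumption that an unregistered leaf reports id 0 and address 0.0.0.0 should be verified against your own fabric.

```python
def parse_moquery(text):
    """Parse 'key : value' lines of moquery text output into a list of dicts."""
    objects = []
    for raw in text.splitlines():
        line = raw.strip()
        if line.startswith("#"):              # e.g. '# top.System' starts an object
            objects.append({})
        elif ":" in line and objects:         # ignore lines before the first object
            key, _, value = line.partition(":")
            objects[-1][key.strip()] = value.strip()
    return objects


def system_state_problems(system):
    """Return a list of human-readable problems found in one topSystem object."""
    problems = []
    if system.get("state") != "in-service":
        problems.append(f"state is {system.get('state')!r}")
    # Assumption: an unregistered leaf shows id 0 and address 0.0.0.0.
    if system.get("id", "0") == "0":
        problems.append("no Node ID assigned")
    if system.get("address", "0.0.0.0") == "0.0.0.0":
        problems.append("no TEP IP assigned")
    return problems


# Illustrative sample, trimmed to the fields used here.
SAMPLE = """\
Total Objects shown: 1

# top.System
address : 10.0.72.67
id : 101
name : leaf101
state : in-service
"""

system = parse_moquery(SAMPLE)[0]
print(system_state_problems(system))
```

An empty list means the leaf looks registered and in service; anything else points at the field to investigate.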


Check02 – DHCP Status


  • Goal: Confirm the leaf has received a TEP IP address via DHCP from the APIC.

  • Issues:

    • “Node ID not configured” or “IP not assigned” means the leaf did not get an IP from the APIC.

  • Troubleshooting:

    1. Use tcpdump -ni kpm_inb port 67 or 68 to see if DHCP requests are being sent/received.

    2. Verify cabling and ensure the APIC is reachable on the infra VLAN.


Check03 – AV Details


  • Goal: Check the appliance vector (AV) details the leaf has recorded for the APICs in its fabric domain.

  • Troubleshooting:

    1. Use acidiag avread to confirm the leaf is part of the correct fabric domain and time synchronization is accurate.


Check04 – IP Reachability to APIC


  • Goal: Verify the leaf can reach the APIC using its TEP IP address.

  • Troubleshooting:

    1. Use iping -V overlay-1 <APIC-TEP-IP> (for example, iping -V overlay-1 10.0.0.1) to ensure end-to-end connectivity.


Check05 – Infra VLAN


  • Goal: Confirm the leaf has discovered the correct infra VLAN via LLDP.

  • Note:

    • This check passes only if the leaf is connected to a pod with at least one APIC.

    • The leaf learns the infra VLAN from the first LLDP packet it receives from another ACI device.

  • Troubleshooting:

    1. Run moquery -c lldpInst to see the infraVlan field.

    2. Check for “infra-vlan-mismatch” issues with moquery -c lldpIf -f 'lldp.If.wiringIssues!=""'


Check06 – LLDP Adjacency


  • Goal: Confirm the leaf detects connections to spine switches and APIC via LLDP.

  • Troubleshooting:

    1. Run show lldp neighbors (or moquery -c lldpAdjEp) to confirm adjacency information.

    2. Ensure physical connections are correct and that LLDP is enabled.


Check07 – Switch Version


  • Goal: Verify the leaf’s NX-OS version matches (or is compatible with) the APIC version.

  • Troubleshooting:

    1. Run show version (or vsh -c 'show version') to view the software version on the leaf.

    2. Ensure the APIC and leaf are running compatible versions.


Check08 – FPGA/EPLD/BIOS Out of Sync


  • Goal: Check if the leaf’s FPGA, EPLD, and BIOS versions align with expected levels.

  • Symptom: If too far out of date, modules or interfaces may not come online.

  • Troubleshooting:

    1. Run moquery -c firmwareCardRunning and moquery -c firmwareCompRunning to compare running vs. expected versions.

    2. Update FPGA/EPLD/BIOS if they do not match.


(none)# moquery -c firmwareCardRunning


Total Objects shown: 2

# firmware.CardRunning

biosVer : v07.66(06/11/2019)

childAction :

descr :

dn : sys/ch/supslot-1/sup/running

expectedVer : v07.65(09/04/2018)

interimVer : 14.2(1j)

internalLabel :

modTs : never

mode : normal

monPolDn : uni/fabric/monfab-default

operSt : ok

rn : running

status :

ts : 1970-01-01T00:00:00.000+00:00

type : switch

version : 14.2(1j)


# firmware.CardRunning

biosVer : v07.66(06/11/2019)

childAction :

descr :

dn : sys/ch/lcslot-1/lc/running

expectedVer : v07.65(09/04/2018)

interimVer : 14.2(1j)

internalLabel :

modTs : never

mode : normal

monPolDn : uni/fabric/monfab-default

operSt : ok

rn : running

status :

ts : 1970-01-01T00:00:00.000+00:00

type : switch

version : 14.2(1j)



(none)# moquery -c firmwareCompRunning

Total Objects shown: 2

# firmware.CompRunning

childAction :

descr :

dn : sys/ch/supslot-1/sup/fpga-1/running

expectedVer : 0x14

internalLabel :

modTs : never

mode : normal

monPolDn : uni/fabric/monfab-default

operSt : ok

rn : running

status :

ts : 1970-01-01T00:00:00.000+00:00

type : controller

version : 0x14

# firmware.CompRunning

childAction :

descr :

dn : sys/ch/supslot-1/sup/fpga-2/runnin

expectedVer : 0x4

internalLabel :

modTs : never

mode : normal

monPolDn : uni/fabric/monfab-default

operSt : ok

rn : running

status :

ts : 1970-01-01T00:00:00.000+00:00

type : controller

version : 0x4
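Comparing running and expected versions by eye gets tedious on a modular chassis, so the comparison can be scripted against saved moquery -c firmwareCompRunning output. This is a sketch under assumptions: it treats version and expectedVer as the two fields to compare, and the second record in the sample (including its dn) is hypothetical, added only to show a mismatch.

```python
def parse_moquery(text):
    """Split moquery text output into one dict per '# class.Name' record."""
    records = []
    for raw in text.splitlines():
        line = raw.strip()
        if line.startswith("#"):
            records.append({})
        elif ":" in line and records:
            key, _, value = line.partition(":")
            records[-1][key.strip()] = value.strip()
    return records


def same_version(running, expected):
    """Compare numerically when both parse as numbers (0x4 == 0x04), else as text."""
    try:
        return int(running, 0) == int(expected, 0)
    except (TypeError, ValueError):
        return running == expected


def out_of_sync(records):
    """Return the dn of each component whose running version is not the expected one."""
    return [r.get("dn") for r in records
            if not same_version(r.get("version"), r.get("expectedVer"))]


# First record is trimmed from the output above; the second is hypothetical.
SAMPLE = """\
# firmware.CompRunning
dn : sys/ch/supslot-1/sup/fpga-1/running
expectedVer : 0x14
version : 0x14

# firmware.CompRunning
dn : hypothetical/fpga-2
expectedVer : 0x4
version : 0x3
"""

mismatched = out_of_sync(parse_moquery(SAMPLE))
print(mismatched)
```

Only the hypothetical second component is reported, since its running version differs from the expected one.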


Check09 – SSL Check

  • Goal: Ensure the SSL certificate on the leaf is valid and matches its chassis serial number.

  • Troubleshooting:

    1. Verify the certificate with:

      cd /securedata/ssl

      openssl x509 -noout -subject -in server.crt

      openssl x509 -noout -dates -in server.crt

    2. Confirm subject= /serialNumber=PID:<Model> SN:<Serial>/CN=<Serial> and valid dates.
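The subject and date checks can also be done on captured openssl output. The sketch below assumes the subject line uses the slash-separated format shown above (newer OpenSSL builds may print it differently) and that dates appear as notBefore=/notAfter= lines in GMT. The model, serial number, and dates in the example are hypothetical.

```python
import re
from datetime import datetime

# Matches: subject= /serialNumber=PID:<Model> SN:<Serial>/CN=<Serial>
SUBJECT_RE = re.compile(
    r"serialNumber=PID:(?P<pid>\S+)\s+SN:(?P<sn>[^/\s]+)/CN=(?P<cn>\S+)"
)


def subject_matches(subject_line, chassis_serial):
    """True if both SN and CN in the subject equal the chassis serial number."""
    m = SUBJECT_RE.search(subject_line)
    return bool(m) and m["sn"] == chassis_serial and m["cn"] == chassis_serial


def parse_openssl_date(line):
    """Parse a 'notBefore=Jan  1 00:00:00 2020 GMT' style line (naive, in GMT)."""
    value = line.split("=", 1)[1].strip()
    return datetime.strptime(value, "%b %d %H:%M:%S %Y %Z")


def dates_valid(not_before_line, not_after_line, now):
    """True if 'now' (naive UTC datetime) falls inside the validity window."""
    return parse_openssl_date(not_before_line) <= now <= parse_openssl_date(not_after_line)


# Hypothetical values for illustration only.
subject = "subject= /serialNumber=PID:N9K-C93180YC-EX SN:FDO00000000/CN=FDO00000000"
ok_subject = subject_matches(subject, "FDO00000000")
ok_dates = dates_valid(
    "notBefore=Jan  1 00:00:00 2020 GMT",
    "notAfter=Jan  1 00:00:00 2030 GMT",
    datetime(2024, 1, 1),
)
print(ok_subject, ok_dates)
```

The chassis serial to compare against has to come from the switch itself, which is why it is passed in rather than read from the certificate.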



Check10 – Downloading Policies


  • Goal: Check that the leaf has downloaded its policy configuration from the APIC.

  • Symptom: “Registration to all PM shards is not complete” indicates the leaf has not fully received its configuration.

  • Troubleshooting:

    1. Run moquery -c pconsBootStrap and check whether completedPolRes is still no; it should read yes once the policy download has finished.

    2. Confirm leaf has full IP connectivity and is properly registered.


Check11 – Time


  • Goal: Display the current switch time for comparison to the APIC’s time.

  • Symptom: A large time difference can break discovery processes.

  • Troubleshooting:

    1. On the APIC, run date to check APIC time.

    2. Ensure NTP or manual time settings are correct.
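To put a number on the time difference, one option is to capture epoch seconds on both devices at roughly the same moment and subtract them. This sketch assumes date +%s is available on both the APIC and the switch; the 60-second threshold and the sample values are hypothetical, not a documented ACI limit.

```python
MAX_SKEW_SECONDS = 60  # hypothetical threshold chosen for illustration


def clock_skew_seconds(apic_epoch, switch_epoch):
    """Absolute difference between two 'date +%s' readings (strings or ints)."""
    return abs(int(apic_epoch) - int(switch_epoch))


def skew_is_suspicious(apic_epoch, switch_epoch, limit=MAX_SKEW_SECONDS):
    """True if the two clocks differ by more than the chosen limit."""
    return clock_skew_seconds(apic_epoch, switch_epoch) > limit


# Hypothetical readings: the switch clock is one hour behind the APIC.
skew = clock_skew_seconds("1700003600", "1700000000")
print(skew, skew_is_suspicious("1700003600", "1700000000"))
```

Because the two readings are taken by hand, a few seconds of difference is expected and should not be read as a fault.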


Check12 – Modules, PSU, Fan Check


  • Goal: Make sure hardware components (modules, power supplies, fans) are online and healthy.

  • Troubleshooting:

    1. Use show module and show environment commands to verify status.

    2. If a module is down, reseat it or check for version mismatches (FPGA/EPLD/BIOS).


Device Replacement Scenario (Leaf/Spine EPLD/FPGA mismatch, F1582)


If a leaf or spine is replaced and its EPLD/FPGA versions are incorrect or out of date, discovery may fail with fault F1582. Check for the fault, then run the two scripts below on the affected switch:


moquery -c faultInst -f 'fault.Inst.code=="F1582"'


leaf101# /bin/check-fpga.sh FpGaDoWnGrAdE


leaf101# /usr/sbin/chassis-power-cycle.sh



EPLD


  • Programmable Logic Devices (PLDs):

    • Cisco Nexus 9000 Series ACI-mode switches include multiple PLDs in each module.

    • PLDs include EPLDs, FPGAs, and CPLDs (but not ASICs).

    • The term “EPLD” is often used to refer to both FPGAs and CPLDs.

  • Why EPLDs are useful:

    • EPLDs let you update certain hardware functionalities by installing new software images.

    • This avoids the need to replace physical hardware components.

  • EPLD upgrades and traffic disruption:

    • Upgrading an I/O module’s EPLD briefly powers down that module.

    • In a modular chassis, each module is upgraded one at a time, so only that module’s traffic is affected during the upgrade.

  • EPLD image releases:

    • Cisco provides the latest EPLD images in each software release.

    • Often, these images remain the same across releases, but sometimes they are updated.

    • EPLD updates are not mandatory unless specifically stated, but they are recommended if you have a maintenance window that tolerates downtime.

    • New hardware functionality introduced by a software upgrade can require a matching EPLD upgrade.

  • Reasons to upgrade EPLDs while in ACI Mode:

    1. The device needed an EPLD upgrade before being converted from Cisco NX-OS to ACI Boot Mode, but was not upgraded.

    2. A leaf/spine was upgraded manually (not through APIC policy), so its EPLD was skipped.

    • Note: Once a leaf/spine is in the fabric, a standard policy upgrade from the APIC automatically upgrades its EPLDs.

  • Simplified upgrade process (from ACI 11.2(1m) onward):

    • Previously, you might have needed to downgrade first, then upgrade.

    • Now, you can use two shell scripts:

      • /bin/check-fpga.sh FpGaDoWnGrAdE

      • /usr/sbin/chassis-power-cycle.sh

  • Power cycling vs. software reload:

    • /usr/sbin/chassis-power-cycle.sh does a hard power reset, unlike a simple software reload.

    • Removing power completely is necessary to reprogram EPLDs.

    • If this script isn’t available or doesn’t work, you must physically disconnect power cables for at least 30 seconds, then reconnect them to restore power.

 
 
 

© 2021 by Mukesh Chanderia
 
