ACI Fabric Discovery Workflow
Initial Setup on APIC1 (via KVM console):
Provide basic configuration details (e.g., fabric name, APIC cluster size, TEP address pool).
APIC1 Starts LLDP on Fabric Ports:
The LLDP packets carry special TLVs that advertise the infrastructure (infra) VLAN and identify the sending port as belonging to an APIC (controller).
Leaf Switch Detects APIC1:
Upon receiving the LLDP packets from APIC1, the leaf switch configures the infra VLAN on any port that connects to an APIC.
Leaf Switch Sends DHCP Discovers:
Now that the infra VLAN is known, the leaf switch can send out DHCP Discover messages to obtain its TEP IP address.
Register Leaf in APIC1:
Using APIC1’s out-of-band (OOB) IP, log in via HTTPS.
Go to the Fabric Membership section and register the newly discovered leaf.
Leaf Receives TEP IP Address:
Once the leaf has been assigned a Node ID, APIC1 responds with a TEP IP from the configured pool, and the DHCP process completes.
Leaf Relays DHCP for Connected Spines:
When spines connect to this leaf (discovered by LLDP), the leaf relays the spines’ DHCP requests to APIC1.
Register and Assign IPs to Spines:
Spines appear in Fabric Membership as they’re discovered.
Once they’re registered, APIC1 assigns TEP IPs to each spine.
Spines Relay DHCP for Other Nodes:
Spines forward DHCP traffic for remaining leaf nodes in the pod (assuming a full mesh between leaves and spines).
Other APICs Join the Fabric:
Leaf nodes connected to APIC2 and APIC3 are similarly discovered and registered.
Complete setup dialogs on APIC2 and APIC3.
Cluster Formation and Validation:
All APICs form a cluster and communicate over TCP.
Verify they are fully “fit” (healthy), indicating fabric discovery is finished.
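The workflow above spreads outward from APIC1 one hop at a time: the first leaf, then its spines, then the remaining leaves and APICs. A minimal Python sketch of that ordering, modeled as a breadth-first walk over the cabling, is shown here. The node names and links are hypothetical, and the sketch ignores the manual registration step that gates each hop in a real fabric.

```python
from collections import deque

def discovery_order(links, start="apic1"):
    """Return nodes in the order a breadth-first walk from `start` reaches them."""
    neighbors = {}
    for a, b in links:
        neighbors.setdefault(a, []).append(b)
        neighbors.setdefault(b, []).append(a)
    order, seen, queue = [], {start}, deque([start])
    while queue:
        node = queue.popleft()
        order.append(node)
        for peer in neighbors.get(node, []):
            if peer not in seen:
                seen.add(peer)
                queue.append(peer)
    return order

# Hypothetical pod: APIC1 on leaf101, two spines, and a second leaf hosting APIC2.
links = [
    ("apic1", "leaf101"),
    ("leaf101", "spine201"),
    ("leaf101", "spine202"),
    ("spine201", "leaf102"),
    ("spine202", "leaf102"),
    ("leaf102", "apic2"),
]
print(discovery_order(links))
```

Each node appears only after a neighbor closer to APIC1 has been reached, which mirrors why a spine cannot obtain a TEP IP until the leaf in front of it is registered and relaying DHCP.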
show discoveryissues
Below is a simplified summary of each check that appears when you run show discoveryissues on a leaf switch.
leaf101# show discoveryissues
Check01 – System State
Goal: Verify the switch is properly registered and in an “in-service” state.
Possible States:
in-service: The leaf has a Node ID, is registered, and has completed its bootstrap.
out-of-service: The leaf may be unregistered or has not completed bootstrap.
downloading-boot-script: The leaf is still in the process of downloading its initial configuration.
Troubleshooting:
Run moquery -c topSystem to confirm the leaf’s state.
Check whether the leaf has a valid Node ID and TEP IP address.
Check02 – DHCP Status
Goal: Confirm the leaf has received a TEP IP address via DHCP from the APIC.
Issues:
“Node ID not configured” or “IP not assigned” means the leaf did not get an IP from the APIC.
Troubleshooting:
Use tcpdump -ni kpm_inb port 67 or 68 to see if DHCP requests are being sent/received.
Verify cabling and ensure the APIC is reachable on the infra VLAN.
Check03 – AV Details
Goal: Check the appliance vector (AV) details, i.e. the APIC and fabric domain information the leaf has recorded.
Troubleshooting:
Use acidiag avread to confirm the leaf is part of the correct fabric domain and time synchronization is accurate.
Check04 – IP Reachability to APIC
Goal: Verify the leaf can reach the APIC using its TEP IP address.
Troubleshooting:
Use iping -V overlay-1 <APIC-TEP-IP> (for example, iping -V overlay-1 10.0.0.1) to ensure end-to-end connectivity.
Check05 – Infra VLAN
Goal: Confirm the leaf has discovered the correct infra VLAN via LLDP.
Note:
This will only pass if the leaf is connected to a pod with at least one APIC.
The leaf learns the infra VLAN from the first LLDP packet it receives from another ACI device.
Troubleshooting:
Run moquery -c lldpInst to see the infraVlan field.
Check for “infra-vlan-mismatch” issues with moquery -c lldpIf -f 'lldp.If.wiringIssues!=""'
Check06 – LLDP Adjacency
Goal: Confirm the leaf detects connections to spine switches and APIC via LLDP.
Troubleshooting:
Run show lldp neighbors (or moquery -c lldpAdjEp) to confirm adjacency information.
Ensure physical connections are correct and that LLDP is enabled.
Check07 – Switch Version
Goal: Verify the leaf’s NX-OS version matches (or is compatible with) the APIC version.
Troubleshooting:
Run show version (or vsh -c 'show version') to view the software version on the leaf.
Ensure the APIC and leaf are running compatible versions.
Check08 – FPGA/EPLD/BIOS Out of Sync
Goal: Check if the leaf’s FPGA, EPLD, and BIOS versions align with expected levels.
Symptom: If these versions are too far out of date, modules or interfaces may not come online.
Troubleshooting:
Run moquery -c firmwareCardRunning and moquery -c firmwareCompRunning to compare running vs. expected versions.
Update FPGA/EPLD/BIOS if they do not match.
(none)# moquery -c firmwareCardRunning
Total Objects shown: 2
# firmware.CardRunning
biosVer : v07.66(06/11/2019)
childAction :
descr :
dn : sys/ch/supslot-1/sup/running
expectedVer : v07.65(09/04/2018)
interimVer : 14.2(1j)
internalLabel :
modTs : never
mode : normal
monPolDn : uni/fabric/monfab-default
operSt : ok
rn : running
status :
ts : 1970-01-01T00:00:00.000+00:00
type : switch
version : 14.2(1j)
# firmware.CardRunning
biosVer : v07.66(06/11/2019)
childAction :
descr :
dn : sys/ch/lcslot-1/lc/running
expectedVer : v07.65(09/04/2018)
interimVer : 14.2(1j)
internalLabel :
modTs : never
mode : normal
monPolDn : uni/fabric/monfab-default
operSt : ok
rn : running
status :
ts : 1970-01-01T00:00:00.000+00:00
type : switch
version : 14.2(1j)
(none)# moquery -c firmwareCompRunning
Total Objects shown: 2
# firmware.CompRunning
childAction :
descr :
dn : sys/ch/supslot-1/sup/fpga-1/running
expectedVer : 0x14
internalLabel :
modTs : never
mode : normal
monPolDn : uni/fabric/monfab-default
operSt : ok
rn : running
status :
ts : 1970-01-01T00:00:00.000+00:00
type : controller
version : 0x14
# firmware.CompRunning
childAction :
descr :
dn : sys/ch/supslot-1/sup/fpga-2/running
expectedVer : 0x4
internalLabel :
modTs : never
mode : normal
monPolDn : uni/fabric/monfab-default
operSt : ok
rn : running
status :
ts : 1970-01-01T00:00:00.000+00:00
type : controller
version : 0x4
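Comparing running and expected versions by eye gets tedious on a chassis with many components. As a rough sketch, the "key : value" text printed by moquery can be parsed and the version and expectedVer fields compared per object. The function names here are illustrative, the first sample object uses values from the output above, and the second object's version is a made-up mismatch for demonstration.

```python
def parse_moquery(text):
    """Split moquery text output into a list of {attribute: value} dicts."""
    objects = []
    for line in text.splitlines():
        line = line.strip()
        if line.startswith("# "):
            # A "# class.Name" header starts a new object.
            objects.append({"class": line[2:].strip()})
        elif ":" in line and objects:
            # Split on the first colon only, so timestamps keep their colons.
            key, _, value = line.partition(":")
            objects[-1][key.strip()] = value.strip()
    return objects

def version_mismatches(objects):
    """Return the dn of every object whose version differs from expectedVer."""
    return [o["dn"] for o in objects if o.get("version") != o.get("expectedVer")]

SAMPLE = """\
Total Objects shown: 2
# firmware.CompRunning
dn : sys/ch/supslot-1/sup/fpga-1/running
expectedVer : 0x14
version : 0x14
# firmware.CompRunning
dn : sys/ch/supslot-1/sup/fpga-2/running
expectedVer : 0x4
version : 0x3
"""

print(version_mismatches(parse_moquery(SAMPLE)))
```

The comparison applies to firmwareCompRunning objects only. For firmwareCardRunning the BIOS level is reported in biosVer, so a different field pairing would be needed.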
Check09 – SSL Check
Goal: Ensure the SSL certificate on the leaf is valid and matches its chassis serial number.
Troubleshooting:
Verify the certificate with:
cd /securedata/ssl
openssl x509 -noout -subject -in server.crt
openssl x509 -noout -dates -in server.crt
Confirm subject= /serialNumber=PID:<Model> SN:<Serial>/CN=<Serial> and valid dates.
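The two openssl outputs can also be checked programmatically. The following Python sketch assumes the one-line subject format quoted above and the usual "notBefore=... GMT" / "notAfter=... GMT" lines printed by openssl x509 -dates. Newer OpenSSL builds may print the subject differently, and the model and serial number in the sample are placeholders.

```python
import re
from datetime import datetime

# Matches the subject layout quoted above: PID, then SN, then CN.
SUBJECT_RE = re.compile(
    r"serialNumber=PID:(?P<pid>\S+)\s+SN:(?P<sn>[^/\s]+)/CN=(?P<cn>\S+)"
)

def subject_matches_serial(subject_line, chassis_serial):
    """True if both the SN and CN fields equal the chassis serial number."""
    match = SUBJECT_RE.search(subject_line)
    return bool(match) and match.group("sn") == chassis_serial == match.group("cn")

def within_validity(dates_output, now):
    """True if `now` (naive UTC datetime) falls between notBefore and notAfter."""
    fields = dict(
        line.strip().split("=", 1)
        for line in dates_output.splitlines()
        if "=" in line
    )
    fmt = "%b %d %H:%M:%S %Y"
    start = datetime.strptime(fields["notBefore"].replace(" GMT", ""), fmt)
    end = datetime.strptime(fields["notAfter"].replace(" GMT", ""), fmt)
    return start <= now <= end

# Placeholder values for illustration only.
subject = "subject= /serialNumber=PID:N9K-C93180YC-EX SN:FDO00000000/CN=FDO00000000"
dates = "notBefore=Jun 11 00:00:00 2019 GMT\nnotAfter=Jun 11 00:00:00 2029 GMT"

print(subject_matches_serial(subject, "FDO00000000"))
print(within_validity(dates, datetime(2024, 1, 1)))
```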
Check10 – Downloading Policies
Goal: Check that the leaf has downloaded its policy configuration from the APIC.
Symptom: “Registration to all PM shards is not complete” indicates the leaf has not fully received its configuration.
Troubleshooting:
Run moquery -c pconsBootStrap and check the completedPolRes field; a value of no means the policy download has not finished.
Confirm leaf has full IP connectivity and is properly registered.
Check11 – Time
Goal: Display the current switch time for comparison to the APIC’s time.
Symptom: A large time difference can break discovery processes.
Troubleshooting:
On the APIC, run date to check APIC time.
Ensure NTP or manual time settings are correct.
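To put a number on the time difference, the two clock readings can be subtracted. This is a small Python sketch assuming both times have been written as ISO 8601 strings with an explicit UTC offset. The 60-second tolerance is a hypothetical threshold, not a documented ACI limit.

```python
from datetime import datetime

def clock_skew_seconds(switch_ts, apic_ts):
    """Absolute difference in seconds between two ISO 8601 timestamps."""
    delta = datetime.fromisoformat(switch_ts) - datetime.fromisoformat(apic_ts)
    return abs(delta.total_seconds())

MAX_SKEW_SECONDS = 60  # hypothetical tolerance, adjust to your own policy

# Example readings (made up for illustration).
switch_ts = "2019-06-11T10:00:00.000+00:00"
apic_ts = "2019-06-11T10:05:30.000+00:00"

skew = clock_skew_seconds(switch_ts, apic_ts)
print(skew, skew > MAX_SKEW_SECONDS)
```

Both strings must carry an offset (or both must omit it); mixing the two forms raises a TypeError when the datetimes are subtracted.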
Check12 – Modules, PSU, Fan Check
Goal: Make sure hardware components (modules, power supplies, fans) are online and healthy.
Troubleshooting:
Use show module and show environment commands to verify status.
If a module is down, reseat it or check for version mismatches (FPGA/EPLD/BIOS).
Device Replacement Scenario (Leaf/Spine EPLD/FPGA mismatch, F1582)
If a leaf or spine is replaced and its EPLD/FPGA versions are incorrect or out of date, discovery may fail with fault code F1582. Query for the fault, then run the FPGA script and power-cycle the switch:
moquery -c faultInst -f 'fault.Inst.code=="F1582"'
leaf101# /bin/check-fpga.sh FpGaDoWnGrAdE
leaf101# /usr/sbin/chassis-power-cycle.sh
EPLD
Programmable Logical Devices (PLDs):
Cisco Nexus 9000 Series ACI-mode switches include multiple PLDs in each module.
PLDs include EPLDs, FPGAs, and CPLDs (but not ASICs).
The term “EPLD” is often used to refer to both FPGAs and CPLDs.
Why EPLDs are useful:
EPLDs let you update certain hardware functionalities by installing new software images.
This avoids the need to replace physical hardware components.
EPLD upgrades and traffic disruption:
Upgrading an I/O module’s EPLD briefly powers down that module.
In a modular chassis, each module is upgraded one at a time, so only that module’s traffic is affected during the upgrade.
EPLD image releases:
Cisco provides the latest EPLD images in each software release.
Often, these images remain the same across releases, but sometimes they are updated.
EPLD updates are not mandatory unless specifically stated, but they are recommended if you have a maintenance window that tolerates downtime.
New hardware functionality introduced by a software upgrade can require a matching EPLD upgrade.
Reasons to upgrade EPLDs while in ACI Mode:
The device needed an EPLD upgrade before being converted from Cisco NX-OS to ACI Boot Mode, but was not upgraded.
A leaf/spine was upgraded manually (not through APIC policy), so its EPLD was skipped.
Once a leaf/spine is in the fabric, a standard policy upgrade from the APIC will automatically upgrade EPLDs.
Simplified upgrade process (from ACI 11.2(1m) onward):
Previously, you might have needed to downgrade first, then upgrade.
Now, you can use two shell scripts:
/bin/check-fpga.sh FpGaDoWnGrAdE
/usr/sbin/chassis-power-cycle.sh
Power cycling vs. software reload:
/usr/sbin/chassis-power-cycle.sh does a hard power reset, unlike a simple software reload.
Removing power completely is necessary to reprogram EPLDs.
If this script isn’t available or doesn’t work, you must physically disconnect power cables for at least 30 seconds, then reconnect them to restore power.