Mukesh Chanderia

Troubleshooting ACI Constructs

Updated: Sep 19


Troubleshooting Logical Constructs


  1. Check whether the leaf has the required VRF:

Leaf# show vrf all


2. Check whether the VRF's "Policy Control Enforcement Preference" is Enforced (contracts are required between EPGs; this is the default) or Unenforced (all EPGs inside the VRF can communicate without restriction).
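To check this from the APIC CLI, you can query the VRF objects; a quick sketch (pcEnfPref is the object-model attribute that holds the enforcement preference):

apic1# moquery -c fvCtx | grep -E "name |pcEnfPref" --> shows enforced/unenforced per VRF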



3. Check whether vzAny is in use on the VRF.
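In the GUI this lives under Tenant --> Networking --> VRFs --> VRF_name --> EPG Collection for VRF. From the APIC CLI, a sketch (assuming one vzAny object is returned per VRF):

apic1# moquery -c vzAny -x rsp-subtree=children --> lists the contracts consumed/provided by vzAny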


4. Check the BD: it must have a VRF, and verify the other options, i.e., L2 Unknown Unicast set to Hardware Proxy or Flood, L3 Unknown Multicast Flooding set to Flood or Optimized Flood, and Multi Destination Flooding set to Flood in BD, Drop, or Flood in Encapsulation.



Apart from the above, also check whether "ARP Flooding" and "Limit Local IP Learning To BD/EPG Subnet" are checked.


5. BD Policy --> L3 Configurations: check whether a custom MAC address is set and whether Unicast Routing is checked.
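All of these BD settings can be dumped in one go from the APIC CLI; a sketch using the fvBD class (attribute names per the ACI object model):

apic1# moquery -c fvBD | grep -E "name |mac |unicastRoute|arpFlood|unkMacUcastAct|unkMcastAct|multiDstPktAct|limitIpLearnToSubnets"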



Next, check the subnet configured under the BD or EPG.




To see all subnets in a VRF:


Leaf # show ip interface brief vrf Sales:Presales_VRF




iping Command


iping -V Tenant1:VRF1 -s 10.0.2.254 10.0.2.1

Here, -V specifies the VRF and -s the source IP address to use, followed by the destination IP.





APIC Troubleshooting


Bond 0: In-band Management


Bond 1: OOB Management



Leaf1# show lldp neighbors --> You must see the APIC


APIC# cat /proc/net/bonding/bond0 --> bond status and active interfaces


APIC# acidiag cluster --> cluster status checks


APIC# acidiag avread --> detailed appliance vector (cluster state per APIC)


APIC# acidiag verifyapic --> verify the APIC software and certificates


APIC# cluster_health --> summary of APIC cluster health


Change cluster size






Note: If you want to reduce the cluster size, first decommission the unwanted APIC.


If there are three APICs and one of them is down, the remaining replicas will still be in read-write mode.






Troubleshooting Endpoint Learning




Spine # show coop internal info ip-db | grep -E "address|Vrf"


End Point Retention Policy


Go to BD --> Policy



Data Plane Learning is configured at the VRF level.



System --> System Settings --> Endpoint Controls


Loop Detection


Rogue EP Control


System --> System Settings --> Fabric-Wide Settings




Disable Remote EP Learning & Enforce subnet check
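Both knobs map to the infraSetPol object, so they can also be read from the APIC CLI; a sketch (class and attribute names are per the ACI object model):

apic1# moquery -c infraSetPol | grep -E "unicastXrEpLearnDisable|enforceSubnetCheck"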




On leaf-a, issue itraceroute to the APP_VM (10.0.1.2) using the source IP address of the WEB_VM (10.0.1.1):

leaf-a# itraceroute src-ip 10.0.1.1 10.0.1.2 vrf Sales:Presales_VRF encap vlan 11


Access VLAN: Classifies traffic into a security endpoint group (EPG).


Bridge Domain (BD) VLAN: Maps L2 traffic into a bridge domain VXLAN Network Identifier (VNID).


Flood Domain (FD) VLAN: A leaf-locally-significant VLAN assigned to an access VLAN within an EPG.


To check the IP and MAC addresses of all endpoints present in a VRF:


Leaf # show endpoint vrf Tenant1:VRF1 detail
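To look up a single endpoint instead of the whole VRF table, a sketch reusing an IP from the examples in this post:

Leaf # show endpoint ip 10.0.1.1 detail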



To see the VLANs provisioned on ports:


Leaf # show vlan brief


Leaf # show vlan extended



Leaf # show mac address-table vlan 20



The mapping between a specific FD_VLAN, its access encapsulation VLAN, and the BD_VLAN can be obtained as in this example for FD_VLAN 20:
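A sketch of such a lookup (assuming FD_VLAN 20 is present on this leaf):

Leaf # show vlan id 20 extended --> shows the EPG and access encap behind this internal VLAN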



The mappings between the access encapsulation and the internal VLAN for an EPG, and from EPG to BD_VLAN, can also be obtained by connecting to the module with the vsh_lc command and then issuing the show system internal eltmc info vlan brief command.
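For example (the module prompt shown after vsh_lc is indicative):

Leaf # vsh_lc

module-1# show system internal eltmc info vlan brief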



Types of VLANs Used in Cisco ACI:


  1. Encapsulation VLAN (Encap):

    • This is the VLAN or VXLAN used by the Virtual Machine Manager (VMM) for a specific endpoint group.

    • It's the VLAN assigned to the group where your connected devices (endpoints) are.


  2. Bridge Domain VLAN (BD-VLAN):

    • Acts as a bridge connecting multiple Flood Domain VLANs (FD-VLANs) and various hardware or internal VLANs.

    • Helps the Broadcom ASIC (a type of network chip) decide if traffic should stay local or be sent to the NorthStar ASIC for more processing.

    • Combines different local FD-VLANs into one bridge domain.

    • Defines the Layer 2 broadcast area on the Broadcom ASIC, which can include multiple subnets or access encapsulations (ACCESS_ENCs).


  3. Flood Domain VLAN (FD-VLAN):

    • Used by the Broadcom ASIC to forward traffic.

    • Directly linked to ACCESS_ENC and also known as the internal VLAN.

    • Represents the ACCESS_ENC without connecting directly to the BD-VLAN.

    • Allows the BD-VLAN to connect to different ACCESS_ENCs, treating them as if they're all on the same VLAN, like on an NX-OS switch.

    • When a broadcast packet arrives from the ACI fabric, the BD-VLAN can map to several FD-VLANs to send the packet out through different ports using different ACCESS_ENCs.

    • Used to learn the MAC addresses of devices on the network.


  4. Platform Independent VLAN (PI-VLAN):

    • Often seen when you run show commands on the switch.

    • Specific to each individual switch and not the same across all leaf switches in the network.


Troubleshooting L3


To check L3 drops


Tenant --> Operational --> Flows --> L3 Drop



To see the above information from the switch:


Leaf # show logging ip access-list internal packet-log deny | more



Let's say connectivity between two L3Out endpoints has to be verified.


Step 1: See if the external host is reachable through the L3Out.


Leaf # iping -V Tenant1:VRF1 -s 10.2.1.1 2.2.2.2


10.2.1.1 --> the interface IP used to form the neighborship for the L3Out


Step 2: show ip route vrf Tenant:VRF


See if the subnet for the external network is present in the routing table on the border leaf.


Step 3: Check the neighborship, as in the sketch below.
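Depending on the routing protocol used on the L3Out, for example:

Leaf # show ip ospf neighbors vrf Tenant1:VRF1

Leaf # show bgp sessions vrf Tenant1:VRF1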



Step 4: Check the interface on which the L3Out has been created.





Troubleshooting VMware




Verify that the VM's network adapter is connected to the appropriate network (port group). In the web UI, ensure that the Connected option is checked; in the client, Connected (if the VM is powered on) and Connect at power on should be selected.



The VMs using their virtual NICs (vNIC) can communicate through the virtual switch (vSwitch) on the hypervisor. There are two different types of vSwitches in VMware:

  • Standard virtual switch: Configured on each VMware ESXi host (its span is limited to that host) and does not require vCenter. This vSwitch cannot be used with VMM integration in Cisco ACI, but it can be used with Cisco ACI physical domains.

  • Distributed virtual switch (DVS): Configured in one place, while the configuration is distributed. It requires VMware vCenter and an Enterprise Plus license. Hence, the DVS is required for VMM integration in Cisco ACI.

Each vSwitch has the following elements:

  • Virtual ports:

    • Mapped to vNICs on the VMs.

  • Port groups:

    • Mapped to VLANs that are utilized on the physical network.

    • Represent a Layer 2 broadcast domain, since there is Layer 2 connectivity between virtual ports belonging to the same port group.

  • Uplinks:

    • Mapped to a physical NIC (pNIC, also referred to as a VMNIC) to provide connectivity to the outside network.

    • The vSwitch has exclusive use of the VMNIC.










Click on Physical Adapter --> click on blue icon for vmnic0




Click on VM Network




Verifying Leafs


The leaf switches in the Cisco ACI fabric provide connectivity for the servers, which can serve as hypervisor hosts in the data center. Servers can be rack-mount units, such as Cisco UCS C-Series, or blade servers, such as Cisco UCS B-Series.


To benefit from the Cisco ACI fabric functionalities, servers should be dual-homed to the leaf switches.



To utilize both uplink connections from the server to the Cisco ACI fabric, you can use MAC pinning or a Link Aggregation Control Protocol (LACP) configuration for active-active uplinks.


However, if you are using Cisco UCS blade servers, you can only implement MAC pinning on the server side for active-active configuration.


That is because Cisco UCS Fabric Interconnects do not support LACP or vLACP on the southbound ports towards the blade servers.



With Cisco UCS blade servers, the server links (vNICs on the blades) are associated with a single uplink port, which is referred to as pinning, while the selected external interface is called a pinned uplink port.


You can configure a static or dynamic pinning process when you configure the vNICs. When using LACP, the load-balancing method for active-active uplinks can be based on the IP hash.


If you use the Route Based on IP Hash load-balancing option, a port channel or vPC must be configured on the leaf switch or leaf switches, respectively.


The following are common problems during integration of Cisco ACI and VMM:

  • Wrong credentials for the VMM (such as VMware vCenter, Microsoft System Center Virtual Machine Manager [SCVMM], and so on). For example, credentials can be wrong in the first place, can have been changed, or are no longer valid.

  • Wrong permissions assigned to the account that you are using for the connection to the VMM, such as the account for the vCenter credential information in the APIC GUI.

  • Wrong data center hostname (or IP address), such as when you specify the hostname or IP address for the vCenter controller.

  • Out of VLANs in dynamic VLAN pools. For example, when you do not allocate enough VLANs in the defined range.

  • Inconsistent port group configurations, due to:

    • Disassociating the VMM domain from the EPG while VMs are still attached to the port group (order of operations).

    • Deleting the port group in VMware (wrong direction).

    • Manually changing the VLAN that is assigned to the port group.




The service graph is always associated with a contract between two EPGs. A service graph template is a sequence of Layer 4 to Layer 7 functions provided by Layer 4 to Layer 7 devices. It represents a reusable, generic representation of the expected traffic flow that defines connection points and nodes.





Cisco ACI supports different deployment modes for Layer 4-7 devices with the service graph, such as Go-To mode (also known as routed mode) where the traffic is routed on the Layer 4-7 service device (for example, it can be the default gateway for the servers).


It also can be Go-Through mode (also known as transparent mode or bridged mode), where the default gateway for the servers is the client-side bridge domain, and the Layer 4-7 device bridges the client-side bridge domain and the server-side bridge domain.



A concrete device represents a service device, for example, one load balancer, or one firewall. A concrete device can be either a physical device or a virtual machine.


  • Concrete device: Represents a service device, which can be physical or virtual.

  • Logical device: Represents a cluster of two devices, which can operate, for example, in an active/standby mode. It also defines the logical interfaces (defined in the device model) for device selection policy.


The deployment steps are the following:



  1. Define the Layer 4-7 service device. The configuration can comprise a single device or two devices (such as an active/standby high-availability pair).

  2. Create the service graph template.

  3. Attach the service graph template to the contract subject. The service graph template must be associated with the contract (between EPGs).

  4. The rendering layer determines which device should be used, if you are using two or more devices.


Layer 4–7 Service Insertion Modes


  • Unmanaged mode: Offers some configuration automation and simplification. It is a commonly used mode, where the configuration of the Layer 4–7 device is performed separately.

  • Managed mode: There is capability of pushing configuration from APIC to a service node via a device package.

  • Policy-based redirect (PBR): Utilizes PBR, as one of the main features of the service graph, where the Cisco ACI fabric can redirect traffic between security zones to Layer 4–7 devices. With PBR, the Layer 4–7 device does not need to be the default gateway for the servers.

  • Copy services: Unlike SPAN that duplicates all of the traffic, the Cisco ACI copy services feature enables selectively copying portions of the traffic between endpoint groups, according to the specifications of the contract. A copy service is configured as part of a Layer 4 to Layer 7 service graph template that specifies a copy cluster as the destination for the copied traffic.

  • Service chaining: The Layer 4–7 service insertion feature enables you to insert more than one service between two EPGs, creating a service chain between them.



The following summarizes the steps you should take while verifying the PBR configuration:


  1. Define the Layer 4–7 device (single, high-availability, or cluster) in Cisco APIC GUI using Tenants > Tenant_name > Services > L4–L7 > Devices, right-click to Create L4–L7 Devices:


  • The type of device (firewall, load balancer)

  • Where to find the device: virtual or physical (choose a domain)



2. Configure PBR policies using Tenants > Tenant_name > Policies > Protocol > L4-L7 Policy-Based Redirect, right-click to Create L4-L7 Policy-Based Redirect:


PBR policies define the next hop for the traffic that will be sent through the L4-L7 device.

Note: You may define them while applying the service graph template as well.



3. Define a service graph template, within the tenant, on Services > L4–L7 > Service Graph Templates, right-click to Create L4–L7 Service Graph Template.


  • The device that you will use.

  • The topology shows a function node that is connected to the consumer and provider EPGs.



4. Apply the service graph template by right-clicking the service graph template and choosing Apply L4–L7 Service Graph Template:


  • Choose the EPGs and contract (create a new one or reference an existing one) that instructs Cisco ACI which traffic to send to the device.

  • Choose how the device is connected:

    • Consumer connector: Redirect policy

    • Provider connector: Redirect policy




Resolution Immediacy: Used for VMM domains. This option controls when VRFs, bridge domains, and SVIs are programmed on the leaf nodes.


  • Pre-provision: The policy is pushed to the leaf regardless of any Cisco Discovery Protocol or Link Layer Discovery Protocol (LLDP) relationship, even without a host connected to the VMM switch. For example, if an EPG is associated with a VMM domain, the bridge domain and the VRF to which the EPG refers are pushed to all leaf nodes where the VMM domain is configured.

  • Immediate: Policy is configured on a leaf when a hypervisor connected to this leaf is attached to an APIC VMM DVS. A discovery protocol, such as Cisco Discovery Protocol/LLDP or the OpFlex protocol, is used to form the adjacency and discover to which leaf the virtualized host is attached.

  • On demand: Policy is configured on a leaf when a hypervisor connected to this leaf is attached to an APIC VMM DVS and at least one virtual machine on the host is connected to a port group and EPG that is associated with this physical NIC and leaf.


Deployment Immediacy: Used for both physical and VMM domains. This option controls when contracts are programmed in the hardware.


Immediate: The policy CAM is programmed on the leaf when the policy is resolved to the leaf (see Resolution Immediacy above), regardless of whether the virtual machine on the virtualized host has sent traffic.


On Demand: The policy CAM is programmed when the virtual machine sends its first packet; the first data-plane packet reaching the leaf triggers endpoint learning for the EPG.


Verify Policy CAM Status


The main methods to program zoning-rules within Cisco ACI are as follows:


  • EPG-to-EPG contracts: Typically requires at least one consumer and one provider to program zoning-rules across two or more distinct endpoint groups.


  • Preferred groups: Requires enabling grouping at the VRF level, where all members of the group can communicate freely. Non-members require contracts to allow flows to the preferred group. Each VRF can have one preferred group.


  • vzAny: An EPG collection that is defined under a given VRF. vzAny represents all EPGs in the VRF. Usage of vzAny allows flows between one EPG and all EPGs within the VRF via a single contract.


To view the available resources in Cisco ACI, choose Operations > Capacity Dashboard in the APIC GUI menu bar.


On GitHub, a fabric resource calculation tool is available: the FabricResourceCalculation/policyTCAM.py script.




Moquery Commands


Find the number of rules (recommended to stay under 50,000):


apic# moquery -c actrlRule -x rsp-subtree-include=count | grep count

count : 24504


Find the number of DNs associated with the contracts (recommended to stay under 80,000):


apic# moquery -c vzRsRFltAtt -x rsp-subtree-include=count | grep count

count : 42587



The following figure shows two different configurations in the Cisco APIC GUI: the first (left side) utilizes one filter with multiple filter entries, while the second (right side) uses individual filters for the same services.



If you use 7 individual filters, the rule count for a given contract would be 7 for just one source and destination combination; for one source and 10 destinations it would be 70.




Static Routes in the fabric


Go to Fabric > Inventory > Pod_number > Leaf_name > Protocols > IPv4, and choose IPv4 for a specific VRF (for example, VRF-T01:DB-VRF). From the work pane, choose Operational > Static Routes to inspect the static routes in the overlay, so you can find the leaked routes.



You can also use the APIC GUI via Fabric > Inventory > Pod_number > Leaf_name > Rules to inspect the security zoning-rules.




You can also observe the packet counters per EPG in the APIC GUI.




L3 Packet Drop from GUI


Tenant --> Operational --> Packets --> L3 Drop




Leaf # show logging ip access-list internal packet-log deny | more




Tenant --> Operational --> Resource IDs




Segment ID is the VXLAN ID that is assigned as a VRF tag. It is autogenerated by Cisco ACI and associated with the VRF.


Similar to the VRF Segment ID, all EPGs have a pcTag ID that is autogenerated by Cisco ACI and associated with each individual EPG. This tag is significant because it is used by Cisco ACI for communication between EPGs.



Leaf # show zoning-rule | more



Get the VXLAN ID (Segment ID) for VRF Presales_VRF


Leaf # show vrf Sales:Presales_VRF detail extended


In this example, it is 2195464.
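Since the segment ID appears in the encapsulation field of that output (assuming the usual vxlan-&lt;ID&gt; format), a quick filter:

Leaf # show vrf Sales:Presales_VRF detail extended | grep -i vxlan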





Leaf # show system internal epm endpoint ip 10.0.1.1




On the APIC, you can obtain the pcTags using the moquery command, even without learned endpoints:


apic1# moquery -d uni/tn-Sales/ap-eCommerce_AP/epg-Web_EPG
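To pull just the pcTag and the scope (the VRF segment ID) from that object, a sketch:

apic1# moquery -d uni/tn-Sales/ap-eCommerce_AP/epg-Web_EPG | grep -E "pcTag|scope"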





Leaf-a# show zoning-rule src-epg 49158 dst-epg 49156




leaf-a# show system internal policy-mgr stats | grep 2195464



SSH from the Web VM to the App VM twice and rerun the command to see the packet count increase.



