What is Cisco Nexus Dashboard?

Cisco Nexus Dashboard (ND) is a centralized, unified management platform that hosts multiple Cisco data center services behind a single interface.

Key Features:

Manages multi-fabric deployments including Cisco ACI, NDFC, and NX-OS standalone switches
Unified platform that includes services like:
- Nexus Dashboard Insights (NDI) – Real-time analytics, visibility, and assurance
- Nexus Dashboard Orchestrator (NDO) – Policy and multi-fabric automation
- Nexus Dashboard Fabric Controller (NDFC) – Centralized fabric control and automation
Microservices-based architecture for seamless upgrades and scalability
AI/ML-driven analytics for intelligent troubleshooting and proactive recommendations
Available in different form factors:
- Physical Appliance (Cisco UCS hardware)
- Virtual Machines (VMware ESXi, Linux KVM)
- Cloud (AWS, Microsoft Azure)

Key Components of Nexus Dashboard

Component	Functionality
Nexus Dashboard Insights (NDI)	Provides real-time analytics, proactive troubleshooting, and policy assurance
Nexus Dashboard Orchestrator (NDO)	Enables multi-fabric automation and policy orchestration across ACI & NDFC fabrics
Nexus Dashboard Fabric Controller (NDFC)	Manages LAN, SAN, and IP Fabric for Media (IPFM) environments

🔹 Fabric Types Supported:

ACI (Application Centric Infrastructure)
NDFC (LAN/SAN/IPFM)
NX-OS Standalone Mode

Deployment Requirements & Prerequisites

General Network & Infrastructure Requirements

DNS & NTP Servers:

Required for all deployments and upgrades
Must be reachable; incorrect entries can prevent deployment
Supports NTP authentication (MD5, SHA1, AES128CMAC)
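
A quick pre-deployment check of these prerequisites might look like the sketch below. It is a minimal example, not a Cisco tool: the function names are made up for illustration, and the NTP probe sends a plain unauthenticated SNTP request over IPv4, so it does not exercise MD5, SHA1, or AES128CMAC authentication.

```python
import socket
import struct

NTP_PORT = 123
# Seconds between the NTP epoch (1900-01-01) and the Unix epoch (1970-01-01).
NTP_EPOCH_OFFSET = 2208988800


def resolves(hostname):
    """Return True if the local resolver can resolve the hostname."""
    try:
        socket.getaddrinfo(hostname, None)
        return True
    except socket.gaierror:
        return False


def build_sntp_request():
    """48-byte SNTP client request: LI=0, version=3, mode=3 (client)."""
    return b"\x1b" + 47 * b"\x00"


def parse_sntp_transmit_time(reply):
    """Extract the server transmit timestamp (Unix seconds) from a reply."""
    if len(reply) < 48:
        raise ValueError("SNTP reply is shorter than 48 bytes")
    seconds = struct.unpack("!I", reply[40:44])[0]
    return seconds - NTP_EPOCH_OFFSET


def ntp_responds(server, timeout=2.0):
    """Return True if the server answers an unauthenticated SNTP request."""
    try:
        with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
            sock.settimeout(timeout)
            sock.sendto(build_sntp_request(), (server, NTP_PORT))
            reply, _ = sock.recvfrom(512)
        return len(reply) >= 48
    except OSError:  # covers timeouts, resolution failures, unreachable hosts
        return False
```

Running `resolves(...)` and `ntp_responds(...)` against the planned DNS names and NTP servers from a host on the management network gives an early signal before the entries are typed into the deployment wizard.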

Cluster Network Requirements:

Network	Use Case	RTT (Latency) Requirement
Management Network	GUI, SSH, DNS, NTP, APIC/NDFC communication	≤50 ms between nodes
Data Network	Service-to-service communication, telemetry, insights, orchestration	≤200 ms (≤50 ms for POAP)
External L3 Connectivity	Required for multi-cluster & external integrations	≤500 ms for ACI Orchestrator
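
The limits in the table can be turned into a small lookup for checking measured values. This is only a sketch: the thresholds are copied from the table above, while the dictionary keys and function names are illustrative choices.

```python
# RTT limits in milliseconds, copied from the table above.
# The key names are illustrative, not official Cisco terminology.
RTT_LIMITS_MS = {
    "management": 50,
    "data": 200,
    "data_poap": 50,
    "external_l3_aci_orchestrator": 500,
}


def within_limit(network, measured_ms):
    """Return True if the measured RTT meets the limit for that network."""
    return measured_ms <= RTT_LIMITS_MS[network]


def report(measurements):
    """Build one human-readable line per measured network."""
    lines = []
    for network, rtt in measurements.items():
        status = "OK" if within_limit(network, rtt) else "EXCEEDS LIMIT"
        lines.append(
            f"{network}: {rtt} ms (limit {RTT_LIMITS_MS[network]} ms) {status}"
        )
    return lines
```

For instance, `report({"management": 35, "data_poap": 120})` would flag the POAP measurement while passing the management one.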

Overlay Networks (VXLAN-based):

Application Overlay (Used for internal apps)
Service Overlay (Used for internal services)

Nexus Dashboard Form Factors & Deployment Models

Form Factor	Description	Scalability	Use Cases
Physical Appliance (ISO)	Deployed on Cisco UCS servers (C220, C225)	High	Enterprise-scale
VMware ESXi (OVA)	Virtual appliance for vSphere	High	On-prem private cloud
Linux KVM (QCOW2)	Open-source KVM virtualization	Moderate	OpenStack deployments
AWS Cloud (AMI)	Amazon EC2 instance deployment	Moderate	Public cloud-based workloads
Azure Cloud (ARM)	Microsoft Azure virtual machine	Moderate	Hybrid-cloud integration

🔹 Cluster Configuration

1-node cluster (Limited, no expansion)
3-node cluster (Standard, scalable)
Optional secondary nodes (Increase capacity)
Standby nodes (Only for physical clusters)
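
As a study aid, the four rules above can be written as a small plan validator. This is a sketch under assumptions: the class, field, and form-factor names are hypothetical, and treating both secondary and standby nodes as "expansion" of a 1-node cluster is an interpretation of the list rather than a documented rule.

```python
from dataclasses import dataclass


@dataclass
class ClusterPlan:
    base_nodes: int                 # 1 or 3, per the list above
    secondary_nodes: int = 0        # optional, increase capacity
    standby_nodes: int = 0          # physical clusters only
    form_factor: str = "physical"   # hypothetical labels: "physical", "virtual", "cloud"


def validate_plan(plan):
    """Return a list of rule violations; an empty list means the plan passes."""
    problems = []
    if plan.base_nodes not in (1, 3):
        problems.append("base cluster must have 1 or 3 nodes")
    # Assumption: adding any extra node counts as expanding a 1-node cluster.
    if plan.base_nodes == 1 and (plan.secondary_nodes or plan.standby_nodes):
        problems.append("a 1-node cluster cannot be expanded")
    if plan.standby_nodes and plan.form_factor != "physical":
        problems.append("standby nodes are only supported on physical clusters")
    return problems
```

Actual node counts should still come from Cisco's cluster sizing guidance; the validator only encodes the structural rules listed here.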

Services Deployment & Fabric Onboarding

Cisco ACI Fabrics:

Nexus Dashboard Orchestrator (NDO) required
Connectivity can be in-band or out-of-band
Contracts & policies required for ND–ACI communication

NDFC (Nexus Dashboard Fabric Controller):

Used for NX-OS-based data center fabrics
Supports LAN (VXLAN, vPC), SAN, IPFM fabrics
Requires persistent IPs for SNMP, Syslog, and EPL tracking

Standalone NX-OS Switches:

Added manually to Nexus Dashboard

Upgrades & Migration Strategies

Upgrading a Nexus Dashboard Cluster

Upload new software image (Admin > Software Management)
Validate firmware version
Click Install (Takes ~20 minutes)
For UCS hardware migration, add the new servers as standby nodes and fail over to them gradually
Convert 9-node clusters to 6-node clusters for better scalability

Migrating from DCNM to NDFC

Migration Workflow:

Back up the existing DCNM configuration
Deploy a new Nexus Dashboard cluster
Restore the configuration

Unsupported Post-Migration:
- IPv6 switch discovery
- DCNM Tracker
- Report definitions
- SAN CLI templates

Nexus Dashboard Security & Communication Ports

Secure Communication (TLS/mTLS)
Role-based Access Control (RBAC)

Service	Protocol	Port	Purpose
SSH	TCP	22	Secure access to CLI
HTTPS	TCP	443	API/UI Access
Syslog	UDP	514	Logging
SNMP	UDP	161/162	Device monitoring
Kafka Messaging	TCP	3001	Orchestrator-to-APIC/NDFC communication
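
The TCP rows of this table can be spot-checked from a workstation with a short script. The sketch below uses only the Python standard library; the function names are illustrative. The UDP services (Syslog, SNMP) are left out on purpose, because UDP has no handshake and a simple connect test cannot confirm reachability.

```python
import socket

# TCP ports taken from the table above.
TCP_PORTS = {"SSH": 22, "HTTPS": 443, "Kafka Messaging": 3001}


def tcp_port_open(host, port, timeout=2.0):
    """Return True if a TCP connection to host:port can be established."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:  # refused, timed out, unreachable, or unresolvable
        return False


def check_tcp_ports(host, ports=TCP_PORTS, timeout=2.0):
    """Map each service name to True/False depending on reachability."""
    return {
        name: tcp_port_open(host, port, timeout) for name, port in ports.items()
    }
```

A `False` result only says the connection attempt failed from where the script ran; a firewall in the path, a stopped service, and a wrong address all look the same at this level.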

🔹 Persistent IPs:

Required for SNMP, Syslog, EPL, and telemetry
Used for HA and failover scenarios

Troubleshooting & Best Practices

🔍 Common Issues & Solutions

Issue	Possible Cause	Solution
Cluster Deployment Failure	Incorrect DNS/NTP settings	Validate DNS/NTP reachability
High Latency Alerts	Suboptimal network design	Check MTU settings and RTT requirements
Fabric Not Discovering	Missing contracts in ACI	Ensure ND EPG can reach fabric in-band EPG
Service Failures	Insufficient resources	Verify cluster sizing

The study notes below are based on the Cisco Nexus Dashboard Troubleshooting document (Release 3.1.1) available on Cisco's website, supplemented with information from other Cisco resources and community best practices.

Cisco Nexus Dashboard Troubleshooting

Useful Commands & Their Purposes

A. System Health and Cluster Status

show cluster status
- Displays the current state of the Nexus Dashboard cluster.
- Helps determine if all nodes are synchronized and healthy.
show system
- Provides overall system information including software version, uptime, and resource utilization.
show tech-support
- Runs a comprehensive collection of system logs, configuration data, and performance statistics.
- Use this command when escalating issues or for in-depth analysis.

B. Container and Service Diagnostics

docker ps or podman ps (depending on the container runtime)
- Lists all running containers and their status.
- Useful for verifying that all microservices (e.g., UI, API, backend services) are operational.
docker logs <container_name>
- Retrieves logs from a specific container.
- Helps pinpoint errors or abnormal behavior within a service.
show container status (or equivalent custom command)
- Displays the health and resource usage of individual containers.

C. Network & Connectivity Checks

ping
- Tests connectivity between Nexus Dashboard nodes or to external servers (DNS, NTP, APIC).
traceroute
- Identifies network paths and detects any potential latency or routing issues.
show network interface
- Displays interface statistics (e.g., error counters, throughput) to ensure proper connectivity.
routing checks
- Validate that routing configurations (static routes or dynamic routing protocols) are correct, especially if nodes are spread across multiple subnets.

D. Log Review & Debugging

show logs
- Reviews the system and application logs stored on the Nexus Dashboard.
- Helps detect recurring error messages, warnings, or specific failure patterns.
debug commands (when necessary and with caution)
- Enable targeted debugging for network protocols or service interactions.
- Often used to trace packet flows or to capture detailed error conditions.
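
Spotting recurring messages is easier with a small script once the logs have been exported to text. The sketch below assumes a generic line format in which a level keyword (ERROR, WARN, or WARNING) precedes the message; real Nexus Dashboard log formats may differ, so the regular expression is an assumption to adapt.

```python
import re
from collections import Counter

# Assumed format: "<timestamp> ERROR: message" or "<timestamp> WARN message".
LEVEL_RE = re.compile(r"\b(ERROR|WARN(?:ING)?)\b[:\s]+(.*)")


def normalize(message):
    """Replace digit runs so messages that differ only in numbers group together."""
    return re.sub(r"\d+", "<n>", message).strip()


def recurring_messages(lines, min_count=2):
    """Return (level, message, count) for messages seen at least min_count times."""
    counts = Counter()
    for line in lines:
        match = LEVEL_RE.search(line)
        if match:
            counts[(match.group(1), normalize(match.group(2)))] += 1
    return [
        (level, message, count)
        for (level, message), count in counts.most_common()
        if count >= min_count
    ]
```

Normalizing numbers is a deliberate trade-off: it groups "node 2 timed out" with "node 3 timed out", at the cost of hiding which node was involved, so the raw lines are still needed for the follow-up.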

Common Troubleshooting Scenarios & Best Practices

A. Cluster Synchronization Issues

Symptoms:
- One or more nodes are out-of-sync, showing degraded or failed status.
Steps:
- Run “show cluster status” to check for mismatches.
- Verify network connectivity and latency between nodes.
- Restart affected services/containers if necessary.
- Check for configuration inconsistencies.

B. Service Failures or Unresponsive Applications

Symptoms:
- Specific services (e.g., Insights or Orchestrator) are not responding.
Steps:
- Use “docker ps” (or “podman ps”) to ensure containers are running.
- Retrieve container logs using “docker logs <container_name>” to pinpoint errors.
- Confirm that required ports (e.g., 443 for HTTPS, 22 for SSH) are open and not blocked by firewalls.
- Validate that persistent IP settings are correctly configured.

C. Network Connectivity & Latency Problems

Symptoms:
- High RTT (round-trip time) values, packet loss, or intermittent connectivity.
Steps:
- Execute “ping” and “traceroute” commands between nodes and external servers.
- Review “show network interface” output for errors.
- Verify MTU settings and VLAN configurations on physical switches.
- Ensure DNS and NTP servers are correctly reachable.
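
When ping output is collected from several nodes, a parser helps compare results against an RTT limit. The sketch below assumes the summary format printed by the Linux iputils `ping` (a "% packet loss" line and an "rtt min/avg/max/mdev = ..." line); other platforms print different summaries, and the function names are illustrative.

```python
import re

# Matches e.g. "rtt min/avg/max/mdev = 0.045/0.052/0.061/0.007 ms".
SUMMARY_RE = re.compile(r"= ([\d.]+)/([\d.]+)/([\d.]+)(?:/([\d.]+))? ms")
LOSS_RE = re.compile(r"([\d.]+)% packet loss")


def parse_ping_summary(output):
    """Extract packet loss (%) and average RTT (ms); None where not found."""
    loss = LOSS_RE.search(output)
    rtt = SUMMARY_RE.search(output)
    return {
        "loss_percent": float(loss.group(1)) if loss else None,
        "avg_rtt_ms": float(rtt.group(2)) if rtt else None,
    }


def link_problems(output, rtt_limit_ms):
    """Return a list of findings for one ping run against an RTT limit."""
    stats = parse_ping_summary(output)
    if stats["loss_percent"] is None:
        return ["could not parse ping output"]
    problems = []
    if stats["loss_percent"] > 0:
        problems.append(f"packet loss {stats['loss_percent']}%")
    # The rtt line is absent when every packet is lost, so avg may be None.
    avg = stats["avg_rtt_ms"]
    if avg is not None and avg > rtt_limit_ms:
        problems.append(f"average RTT {avg} ms exceeds {rtt_limit_ms} ms")
    return problems
```

The limit passed in would be the RTT requirement for the network being tested, for example 50 for node-to-node management traffic.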

D. Log & Artifact Collection for Escalation

Best Practices:
- Regularly collect and archive logs using “show tech-support” for proactive analysis.
- Document configuration changes and system events that coincide with issues.
- Follow Cisco’s recommended steps before contacting TAC, including running diagnostic commands and reviewing common error messages.

Additional Tips & Information from Cisco and Community Sources

Documentation:
- Always refer to Cisco’s official troubleshooting guides for the latest commands and procedures.
- Review Cisco’s Nexus Dashboard Cluster Sizing tool and configuration guides for best practices.
Community Forums & Cisco Live:
- Participate in Cisco forums and Cisco Live sessions to learn about common challenges and solutions from peers.
- Use Cisco’s support website to search for specific error messages or command outputs.
Automation & Monitoring:
- Consider integrating automated monitoring tools (e.g., third-party network management systems) to continuously track health metrics.
- Use periodic “tech‑support” snapshots to create baselines for normal operations.
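
One simple way to use such baselines is to record a numeric metric from each snapshot and flag values that drift far from the historical mean. The sketch below is generic statistics, not a Nexus Dashboard feature; the metric, the sample values, and the three-standard-deviation threshold are all assumptions.

```python
from statistics import mean, stdev


def build_baseline(samples):
    """Summarize historical samples of one metric (needs at least two values)."""
    return {"mean": mean(samples), "stdev": stdev(samples)}


def is_anomalous(value, baseline, k=3.0):
    """Flag a value more than k standard deviations away from the baseline mean."""
    return abs(value - baseline["mean"]) > k * baseline["stdev"]


# Hypothetical metric: CPU utilization (%) noted from five earlier snapshots.
history = [40, 42, 41, 39, 43]
baseline = build_baseline(history)
```

With this history, a new reading of 42 stays inside the band while a reading of 70 would be flagged for a closer look.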

References:

Cisco Nexus Dashboard and Services Deployment Guide
https://www.cisco.com/c/en/us/td/docs/dcn/nd/3x/deployment/cisco-nexus-dashboard-and-services-deployment-guide-321/nd-deploy-new-and-changed-32x.html

Cisco Nexus Dashboard Troubleshooting (Release 3.1.1)
https://www.cisco.com/c/en/us/td/docs/dcn/nd/3x/articles-311/nexus-dashboard-troubleshooting-311.html#_useful_commands