By Mukesh Chanderia

Kubernetes Fundamentals

What is Kubernetes?

Kubernetes (commonly referred to as "K8s") is an open-source platform designed to automate the deployment, scaling, and operation of containerized applications.


It was originally developed by Google and is now maintained by the Cloud Native Computing Foundation (CNCF). Kubernetes provides a robust framework for running distributed systems resiliently, managing container lifecycles, and enabling automated operations.


Why Kubernetes?


  1. Container Orchestration:

    • Kubernetes automates the management of containerized applications across multiple hosts, ensuring they are running efficiently and can scale as needed.

  2. Scalability:

    • Automatically scales applications horizontally by increasing or decreasing the number of containers in response to application load.

  3. Self-healing:

    • Automatically restarts failed containers, replaces them, kills containers that don't respond to user-defined health checks, and reschedules containers when nodes die.

  4. Load Balancing and Service Discovery:

    • Kubernetes can expose a container using DNS or its own IP address. If traffic to a container is high, Kubernetes can load balance and distribute network traffic to ensure stability.

  5. Infrastructure Abstraction:

    • Developers don't need to worry about infrastructure when deploying applications. Kubernetes takes care of the underlying compute, network, and storage resources for containerized applications.

  6. Automated Rollouts and Rollbacks:

    • Kubernetes lets you declare the desired state of your application and continuously reconciles the running state to match it. New versions can be rolled out automatically and rolled back if something goes wrong.


Kubernetes Architecture


Kubernetes follows a master-worker architecture, where the control plane manages the overall system, and worker nodes handle running application workloads.

1. Master Node (Control Plane)

The control plane consists of multiple components that manage the cluster and ensure the desired state is maintained.

  • API Server (kube-apiserver):

    • The front-end of the control plane. It exposes the Kubernetes REST API and is the component through which users, controllers, and nodes interact with the cluster.

  • Controller Manager (kube-controller-manager):

    • Watches the cluster state via the API server and makes changes to ensure the current state matches the desired state. Examples include replication controller, node controller, and endpoints controller.

  • Scheduler (kube-scheduler):

    • Assigns workloads to worker nodes based on resource availability and other constraints (e.g., affinity, anti-affinity rules).

  • etcd:

    • A key-value store used to store all cluster data, including configuration details, secrets, and service discovery information. It acts as the source of truth for the cluster’s state.


2. Worker Nodes

Worker nodes are responsible for running containerized applications and are managed by the master node.

  • kubelet:

    • An agent running on each worker node that ensures the containers are running in a Pod. The kubelet interacts with the API server to receive commands and report on node and pod health.

  • Container Runtime:

    • The software responsible for running containers (e.g., Docker, containerd, CRI-O). Kubernetes interacts with this runtime to manage the container lifecycle.

  • kube-proxy:

    • Maintains network rules on each worker node so that traffic addressed to a Service is routed to one of its backing Pods, whether it originates inside or outside the cluster. This provides basic load balancing across Pods.


Key Kubernetes Concepts

  1. Pod:

    • The smallest and most basic deployable unit in Kubernetes.

    • A Pod can contain one or more containers that share the same network namespace and storage volumes. Containers in the same Pod can communicate with each other via localhost (example manifests for several of the objects below follow this list).

  2. Service:

    • An abstraction that defines a logical set of Pods and a policy for accessing them.

    • Kubernetes Services allow communication between Pods and can load balance traffic across multiple Pods.

  3. ReplicaSet:

    • Ensures a specified number of Pod replicas are running at all times. If a Pod crashes, the ReplicaSet will create a new one to meet the desired replica count.

  4. Deployment:

    • Provides declarative updates for Pods and ReplicaSets. Deployments are used for rolling out new application versions, scaling Pods, and performing rollbacks.

  5. Namespace:

    • A way to divide cluster resources between different users or teams. Namespaces help organize resources and prevent name collisions.

  6. ConfigMap:

    • An API object that stores non-confidential configuration as key-value pairs, which Pods can consume as environment variables or mounted files. It decouples configuration settings from application code.

  7. Secret:

    • Similar to a ConfigMap but intended for sensitive data (e.g., passwords, API keys). Values are base64-encoded by default, and encryption at rest can be enabled for stronger protection; the sketch after this list shows a Pod consuming both a ConfigMap and a Secret.

  8. PersistentVolume (PV) & PersistentVolumeClaim (PVC):

    • Persistent storage in Kubernetes is abstracted using PersistentVolumes. A PersistentVolumeClaim is a request for storage by a user.

  9. Ingress:

    • An API object that manages external access to Services, typically HTTP/HTTPS. Ingress routes traffic to Services inside the cluster based on host and path rules (a sample Ingress follows this list).

  10. DaemonSet:

    • Ensures that a copy of a Pod is running on all or some nodes in the cluster.

  11. StatefulSet:

    • Manages stateful applications. It gives each Pod a stable identity, creates and deletes Pods in a defined order, and can provision persistent storage per Pod.

  12. Job and CronJob:

    • Job: Runs one or more Pods until a specified task completes successfully.

    • CronJob: Allows scheduling of Jobs based on time intervals (e.g., cron expressions).
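
To make Pods, ReplicaSets, Deployments, and Services concrete, here is a minimal sketch of a Deployment and a matching Service. The names (web, web-svc), the nginx image, and the port numbers are illustrative assumptions rather than part of any particular application.

apiVersion: apps/v1
kind: Deployment
metadata:
  name: web                    # illustrative name
spec:
  replicas: 3                  # the Deployment's ReplicaSet keeps three Pods running
  selector:
    matchLabels:
      app: web
  template:                    # Pod template: the smallest deployable unit
    metadata:
      labels:
        app: web
    spec:
      containers:
        - name: web
          image: nginx:1.25    # example image; substitute your own application
          ports:
            - containerPort: 80
---
apiVersion: v1
kind: Service
metadata:
  name: web-svc
spec:
  selector:
    app: web                   # the Service load-balances across Pods with this label
  ports:
    - port: 80
      targetPort: 80

Applying both objects with kubectl apply -f creates the Deployment (which manages a ReplicaSet and its Pods) and a ClusterIP Service that other Pods can reach by the DNS name web-svc.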

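As referenced in the ConfigMap and Secret entries above, this is a hedged sketch of a Pod consuming both as environment variables; the object names, keys, and values are placeholders.

apiVersion: v1
kind: ConfigMap
metadata:
  name: app-config
data:
  LOG_LEVEL: "info"            # plain, non-sensitive setting
---
apiVersion: v1
kind: Secret
metadata:
  name: app-secret
type: Opaque
stringData:
  DB_PASSWORD: "change-me"     # placeholder; stored base64-encoded by Kubernetes
---
apiVersion: v1
kind: Pod
metadata:
  name: app
spec:
  containers:
    - name: app
      image: busybox:1.36
      command: ["sh", "-c", "env && sleep 3600"]
      envFrom:
        - configMapRef:
            name: app-config   # injects LOG_LEVEL
        - secretRef:
            name: app-secret   # injects DB_PASSWORD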

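And, as mentioned under Ingress, a minimal sketch of routing external HTTP traffic to the hypothetical web-svc Service; the host name and the nginx ingress class are assumptions that depend on an NGINX ingress controller being installed in the cluster.

apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: web-ingress
spec:
  ingressClassName: nginx      # assumes an NGINX ingress controller is installed
  rules:
    - host: web.example.com    # placeholder hostname
      http:
        paths:
          - path: /
            pathType: Prefix
            backend:
              service:
                name: web-svc  # routes to the Service sketched above
                port:
                  number: 80
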
Kubernetes Networking


Kubernetes networking is a core concept, and it enables communication between containers, Pods, and external resources.

  1. Container-to-Container Communication:

    • Containers within the same Pod can communicate using localhost.

  2. Pod-to-Pod Communication:

    • Pods can communicate with other Pods across nodes using the Pod’s IP address. This is facilitated by the cluster networking model.

  3. Service Discovery:

    • Kubernetes assigns each Service a DNS name, enabling Pods to communicate with Services by using their name instead of IP addresses.

  4. ClusterIP, NodePort, and LoadBalancer:

    • ClusterIP: Exposes the Service to internal cluster traffic.

    • NodePort: Exposes the Service on a static port on each node's IP address.

    • LoadBalancer: Provisions a load balancer in cloud environments (AWS, GCP, etc.) and forwards traffic to the Service.

  5. Network Policies:

    • Define rules for how Pods can communicate with each other and with other network endpoints. They are used to enforce network security and isolate traffic (a sample policy follows this list).
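
Here is the sample policy referenced above: a minimal sketch that allows Pods labeled app: web to reach Pods labeled app: db on port 5432 and implicitly denies all other ingress to the db Pods. The labels and port are assumptions, and enforcing NetworkPolicies requires a CNI plugin that supports them (e.g., Calico or Cilium).

apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: allow-web-to-db
spec:
  podSelector:
    matchLabels:
      app: db                  # the policy applies to these Pods
  policyTypes:
    - Ingress
  ingress:
    - from:
        - podSelector:
            matchLabels:
              app: web         # only traffic from web Pods is allowed
      ports:
        - protocol: TCP
          port: 5432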


Kubernetes Storage


  1. Volumes:

    • Volumes attach storage to the containers in a Pod. Data in a volume survives container restarts within the Pod, but an ordinary volume's lifetime is tied to the Pod itself; data that must outlive the Pod belongs on a PersistentVolume (see the example claim after this list).

  2. PersistentVolume (PV) and PersistentVolumeClaim (PVC):

    • PVs are cluster-wide storage resources, and PVCs are claims to request the use of storage resources. Kubernetes abstracts the underlying storage (e.g., AWS EBS, Google Persistent Disks) and provides a consistent way to manage it.

  3. Storage Classes:

    • Define different storage types, such as standard and high-performance, and are used to provision PersistentVolumes dynamically.
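
As an illustration of the claim referenced above, here is a minimal sketch of a PersistentVolumeClaim and a Pod that mounts it. The claim name, the "standard" StorageClass, and the 1Gi request are assumptions that depend on what your cluster provides.

apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: data-claim
spec:
  accessModes:
    - ReadWriteOnce            # mountable read-write by a single node
  storageClassName: standard   # assumes a StorageClass named "standard" exists
  resources:
    requests:
      storage: 1Gi
---
apiVersion: v1
kind: Pod
metadata:
  name: data-user
spec:
  containers:
    - name: app
      image: busybox:1.36
      command: ["sh", "-c", "sleep 3600"]
      volumeMounts:
        - name: data
          mountPath: /data     # where the claimed storage appears in the container
  volumes:
    - name: data
      persistentVolumeClaim:
        claimName: data-claim  # binds the Pod to the claim above

If the StorageClass supports dynamic provisioning, Kubernetes creates a matching PersistentVolume automatically when the claim is made.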


Scaling and Autoscaling


  1. Horizontal Pod Autoscaler (HPA):

    • Automatically scales the number of Pods in a Deployment or ReplicaSet based on observed CPU utilization or other custom metrics (see the sketch after this list).

  2. Vertical Pod Autoscaler (VPA):

    • Automatically adjusts the CPU and memory requests/limits of a container to match its real-time resource needs.

  3. Cluster Autoscaler:

    • Adds or removes nodes in the cluster based on resource utilization. This is particularly useful in cloud environments where you only want to pay for the resources you need.
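
Here is the HPA sketch referenced above, targeting the hypothetical web Deployment from the earlier example; the replica bounds and the 70% CPU target are illustrative choices. Note that the HPA relies on metrics-server (or another metrics source) being installed.

apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: web-hpa
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: web                  # the Deployment to scale
  minReplicas: 2
  maxReplicas: 10
  metrics:
    - type: Resource
      resource:
        name: cpu
        target:
          type: Utilization
          averageUtilization: 70   # add Pods when average CPU utilization exceeds ~70%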


Security in Kubernetes


  1. Role-Based Access Control (RBAC):

    • Controls who can access the Kubernetes API and what actions they can perform. RBAC policies are configured with Roles and RoleBindings (or their cluster-wide counterparts, ClusterRoles and ClusterRoleBindings); a minimal example follows this list.

  2. Network Policies:

    • Define how Pods communicate with each other and with external services, enforcing traffic isolation and security at the network level.

  3. Pod Security Policies (PSP):

    • A mechanism to control the security aspects of Pods, such as whether privileged containers may run or whether the host file system can be accessed. Note that PodSecurityPolicy was deprecated and removed in Kubernetes 1.25; its role is now filled by Pod Security Admission and the Pod Security Standards.

  4. Service Accounts:

    • A Kubernetes mechanism to assign identities to Pods. Service accounts can be used to limit what APIs a Pod can access.
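
The RBAC example referenced above: a minimal Role that can read Pods in a single namespace, bound to a ServiceAccount. The dev namespace, the app-sa ServiceAccount, and the verbs are assumptions for illustration.

apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
  name: pod-reader
  namespace: dev               # Roles are namespaced; ClusterRoles apply cluster-wide
rules:
  - apiGroups: [""]            # "" is the core API group (Pods, Services, ...)
    resources: ["pods"]
    verbs: ["get", "list", "watch"]
---
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
  name: read-pods
  namespace: dev
subjects:
  - kind: ServiceAccount
    name: app-sa               # hypothetical service account used by a Pod
    namespace: dev
roleRef:
  kind: Role
  name: pod-reader
  apiGroup: rbac.authorization.k8s.io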


Deployment Strategies


  1. Rolling Update:

    • Gradually updates Pods to the new version without downtime. Kubernetes replaces old Pods with new ones while keeping the application available (see the strategy sketch after this list).

  2. Blue/Green Deployment:

    • Two environments (blue and green) are used. The new version (green) is deployed alongside the old version (blue), and traffic is shifted to the new version once it's validated.

  3. Canary Deployment:

    • A small percentage of traffic is sent to the new version while the rest continues to go to the old version. If the new version performs well, more traffic is gradually sent to it.
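
To show how a rolling update is tuned in practice, here is the strategy stanza a Deployment might carry; the maxSurge and maxUnavailable values are just one reasonable choice, not a recommendation for every workload.

apiVersion: apps/v1
kind: Deployment
metadata:
  name: web
spec:
  replicas: 4
  strategy:
    type: RollingUpdate
    rollingUpdate:
      maxSurge: 1              # at most one extra Pod above the desired count during the update
      maxUnavailable: 0        # never drop below the desired count, so the app stays available
  selector:
    matchLabels:
      app: web
  template:
    metadata:
      labels:
        app: web
    spec:
      containers:
        - name: web
          image: nginx:1.25    # changing this image is what triggers the rolling update

If the new version misbehaves, kubectl rollout undo deployment/web returns to the previous revision. Blue/green and canary releases are not separate Kubernetes primitives; they are usually built on top of Deployments and Services with labels, or with tools such as Argo Rollouts or a service mesh.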


Monitoring and Logging


  1. Kubernetes Metrics:

    • metrics-server collects resource metrics (CPU and memory) from nodes and Pods, while kube-state-metrics exposes the state of cluster objects (Deployments, Pods, nodes) as metrics. Both are commonly scraped by a monitoring system such as Prometheus.

  2. Logging:

    • Kubernetes doesn’t handle application logging directly, but tools like Fluentd or Fluent Bit are commonly run as DaemonSets to collect container logs from each node and ship them to a backend such as Elasticsearch.

