Mastering Kubernetes: Container Orchestration Beyond Docker

Development
12 Jun, 2024

What is Kubernetes?

While Docker changed how individual containers are built and run, Kubernetes (k8s for short) is a container orchestration tool that automates deploying, scaling, and managing dozens or hundreds of containers across many machines. Google released it as open source in 2014, drawing on its experience with Borg, the internal system it built to manage its own massive container infrastructure.

Just as an orchestra conductor directs numerous instruments to play music in harmony, Kubernetes is responsible for directing numerous containers to provide stable services.

Why do you need Kubernetes?

For early-stage services or small projects, managing a few Docker containers by hand is feasible. However, as the service grows and a microservice architecture (MSA) is adopted, the number of containers to manage rises quickly, and manual management stops being practical.

How do you automatically restart a container when it dies?
How do you automatically add containers (scale out) when traffic is high and remove them (scale in) when traffic drops?
How do you place containers across multiple servers (nodes) so that their resources are used efficiently?
How do you roll out a new version of an app without interrupting the service (zero downtime)?

Kubernetes solves these problems declaratively. You describe the state you want, for example “Always keep 5 web app containers running across 3 servers,” and Kubernetes continuously compares the actual state of the cluster with that description and corrects any difference.
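The declarative idea above can be sketched as a small reconciliation loop. This is a toy model, not Kubernetes code: the Cluster class, the reconcile function, and the "web-N" names are all made up for illustration, and real controllers react to change events rather than being called by hand.

```python
from dataclasses import dataclass, field
from itertools import count


@dataclass
class Cluster:
    """Toy stand-in for a cluster: just a set of running container names."""
    running: set = field(default_factory=set)
    _ids: count = field(default_factory=count, repr=False)

    def start(self) -> str:
        # Hypothetical naming scheme: web-0, web-1, ...
        name = f"web-{next(self._ids)}"
        self.running.add(name)
        return name

    def stop(self, name: str) -> None:
        self.running.discard(name)


def reconcile(cluster: Cluster, desired: int) -> None:
    """Drive the actual state toward the desired container count."""
    while len(cluster.running) < desired:
        cluster.start()
    while len(cluster.running) > desired:
        cluster.stop(sorted(cluster.running)[-1])


cluster = Cluster()
reconcile(cluster, desired=5)   # initial rollout: five containers start
cluster.stop("web-2")           # simulate one container dying
reconcile(cluster, desired=5)   # the gap is noticed and a replacement starts
print(len(cluster.running))
```

The caller never says "start one more container". It only restates the desired count, and the loop works out what has to change.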

Core objects of Kubernetes

Kubernetes uses several built-in object types to represent the state of the cluster. Understanding them is the first step toward mastering Kubernetes.

1. Pod

A Pod is the smallest deployable unit in Kubernetes. A pod can contain one or more containers, and the containers in the same pod share a network identity (one IP address) and can share storage volumes. The usual recommendation is to run one main application container per pod, optionally accompanied by auxiliary containers (sidecars) such as log collectors or proxies.
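As a rough illustration, a pod with one application container and one logging sidecar might be described as follows. The manifest is built as a Python dictionary and printed as JSON, which kubectl accepts alongside YAML. The image names, port, and mount paths are placeholders, not real artifacts.

```python
import json

pod = {
    "apiVersion": "v1",
    "kind": "Pod",
    "metadata": {"name": "web", "labels": {"app": "web"}},
    "spec": {
        # A scratch volume that both containers mount, so the sidecar
        # can read the log files the application writes.
        "volumes": [{"name": "logs", "emptyDir": {}}],
        "containers": [
            {
                "name": "app",
                "image": "example.com/web-app:1.0",  # placeholder image
                "ports": [{"containerPort": 8080}],
                "volumeMounts": [{"name": "logs", "mountPath": "/var/log/app"}],
            },
            {
                "name": "log-collector",
                "image": "example.com/log-collector:1.0",  # placeholder image
                "volumeMounts": [
                    {"name": "logs", "mountPath": "/logs", "readOnly": True}
                ],
            },
        ],
    },
}

manifest = json.dumps(pod, indent=2)
print(manifest)
```

Saved to a file, output like this could be submitted with kubectl apply -f, assuming the images actually exist in a registry the cluster can reach.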

2. ReplicaSet

A ReplicaSet ensures that a specified number of identical pods (replicas) is running at all times. If a pod crashes or a node fails, the ReplicaSet creates new pods to bring the count back to the configured number. This is how services achieve high availability.
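A ReplicaSet finds "its" pods through a label selector and compares their count with the target. The sketch below models only that bookkeeping. The function names and the pod dictionaries are illustrative assumptions, and the real controller also considers ownership and deletion state.

```python
def matches(labels: dict, selector: dict) -> bool:
    """True when every key/value pair in the selector appears in the labels."""
    return all(labels.get(key) == value for key, value in selector.items())


def replica_delta(pods: list, selector: dict, replicas: int) -> int:
    """Positive: pods to create. Negative: pods to remove. Zero: in sync."""
    active = [
        pod for pod in pods
        if matches(pod["labels"], selector)
        and pod["phase"] not in ("Succeeded", "Failed")  # finished pods don't count
    ]
    return replicas - len(active)


pods = [
    {"name": "web-a", "labels": {"app": "web"}, "phase": "Running"},
    {"name": "web-b", "labels": {"app": "web"}, "phase": "Failed"},
    {"name": "db-a", "labels": {"app": "db"}, "phase": "Running"},
]

delta = replica_delta(pods, selector={"app": "web"}, replicas=3)
print(delta)
```

Only web-a counts toward the target here: web-b has failed and db-a carries a different label, so two new pods would be needed to reach three replicas.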

3. Deployment

A Deployment is a higher-level controller that lets you declaratively update the state of ReplicaSets and Pods. In practice, applications are usually deployed through Deployments rather than by creating Pods or ReplicaSets directly. A Deployment makes it easy to perform rolling updates without downtime and to roll back to a previous version.
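To see why a rolling update can avoid downtime, here is a simplified simulation of a surge-first rollout: a new-version pod is started before an old one is retired. It assumes every new pod becomes ready immediately, which a real rollout does not, and the function name is hypothetical. In a real Deployment this behaviour roughly corresponds to the rolling update settings maxSurge: 1 and maxUnavailable: 0.

```python
def rolling_update(replicas: int, max_surge: int = 1):
    """Yield (old, new) pod counts after each step of a surge-first rollout."""
    old, new = replicas, 0
    yield old, new
    while old > 0:
        step = min(max_surge, old)
        new += step   # start new-version pods first
        yield old, new
        old -= step   # then retire the same number of old pods
        yield old, new


steps = list(rolling_update(replicas=3))
for old, new in steps:
    print(f"old={old} new={new} total={old + new}")
```

At every step the total number of pods stays between the requested replica count and that count plus the surge, so capacity never dips below the target while versions are swapped.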

4. Service

Pods are short-lived, and their IP addresses change as they are created and destroyed. A Service gives a logical set of pods, whose membership changes over time, a stable virtual IP (ClusterIP) and DNS name. Depending on the Service type, it is reachable only inside the cluster (ClusterIP) or is also exposed externally (NodePort, LoadBalancer).
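The following sketch shows the core idea: the Service's own address stays fixed while the list of pod IPs behind it is recomputed from a label selector. All IP addresses and the endpoints helper are invented for the example, and the real machinery (endpoint objects, kube-proxy) is considerably more involved.

```python
def endpoints(pods: list, selector: dict) -> list:
    """IPs of the pods whose labels satisfy the selector."""
    return sorted(
        pod["ip"] for pod in pods
        if all(pod["labels"].get(k) == v for k, v in selector.items())
    )


# The service keeps one stable address for its whole lifetime.
service = {"name": "web", "cluster_ip": "10.96.0.10", "selector": {"app": "web"}}

pods = [
    {"ip": "10.244.1.5", "labels": {"app": "web"}},
    {"ip": "10.244.2.7", "labels": {"app": "web"}},
    {"ip": "10.244.2.9", "labels": {"app": "db"}},
]
before = endpoints(pods, service["selector"])

# One web pod is replaced; its successor gets a different IP.
pods[0] = {"ip": "10.244.3.2", "labels": {"app": "web"}}
after = endpoints(pods, service["selector"])

print(service["cluster_ip"], before, after)
```

Clients keep talking to the same cluster IP or DNS name. Only the backing list changes.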

5. Ingress

An Ingress is an object that manages HTTP/HTTPS routing from outside the cluster to Services inside it. It can distribute traffic to multiple Services through a single external IP, based on host name (name-based virtual hosting) and URL path, and it can terminate SSL/TLS. An Ingress only takes effect when an ingress controller is running in the cluster.
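Host- and path-based routing can be pictured as a lookup table. The sketch below is an assumption-laden model of prefix routing: the host names and service names are made up, paths are compared segment by segment, and the longest matching prefix wins. An actual ingress controller may apply additional rules.

```python
# (host, path prefix, backend service) - all names are hypothetical
RULES = [
    ("shop.example.com", "/", "storefront"),
    ("shop.example.com", "/api", "shop-api"),
    ("blog.example.com", "/", "blog"),
]


def prefix_matches(prefix: str, path: str) -> bool:
    """Compare whole path segments, so /api matches /api/x but not /apiary."""
    wanted = [seg for seg in prefix.split("/") if seg]
    actual = [seg for seg in path.split("/") if seg]
    return actual[:len(wanted)] == wanted


def route(host: str, path: str):
    """Return the backend for a request, or None if no rule applies."""
    candidates = [
        (prefix, service)
        for rule_host, prefix, service in RULES
        if rule_host == host and prefix_matches(prefix, path)
    ]
    if not candidates:
        return None
    # Prefer the most specific (longest) matching prefix.
    return max(candidates, key=lambda c: len(c[0]))[1]


print(route("shop.example.com", "/api/orders"))
print(route("shop.example.com", "/apiary"))
print(route("blog.example.com", "/posts/1"))
```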

Kubernetes architecture summary

A Kubernetes cluster consists of nodes (servers) that play two major roles.

Control Plane / Master Node: Serves as the brain that manages and controls the entire cluster. It consists of the API server, etcd (a distributed key-value store that holds cluster data), the scheduler, and the controller manager.
Worker Node: A server where the actual application containers (pods) run. Each worker node runs a kubelet that communicates with the control plane, a container runtime (containerd, CRI-O, etc.; since Kubernetes 1.24, Docker Engine requires the separate cri-dockerd adapter), and kube-proxy, a network proxy.
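To make the scheduler's role concrete, here is a heavily simplified placement function: it filters out worker nodes that cannot fit a pod's resource request and then picks the feasible node with the most free capacity. The node names, the numbers (CPU in millicores, memory in MiB), and the scoring rule are assumptions for illustration. The real scheduler uses many more filters and scoring criteria.

```python
def schedule(request: dict, nodes: dict):
    """Pick a node for the pod, or None if no node has enough free resources."""
    # Filter: keep only nodes that can fit the request.
    feasible = {
        name: free for name, free in nodes.items()
        if free["cpu"] >= request["cpu"] and free["memory"] >= request["memory"]
    }
    if not feasible:
        return None
    # Score: prefer the node with the most free CPU, then the most free memory.
    return max(feasible, key=lambda n: (feasible[n]["cpu"], feasible[n]["memory"]))


# Free capacity per worker node (hypothetical values).
nodes = {
    "node-a": {"cpu": 500, "memory": 1024},
    "node-b": {"cpu": 2000, "memory": 4096},
    "node-c": {"cpu": 1500, "memory": 256},
}

chosen = schedule({"cpu": 1000, "memory": 512}, nodes)
print(chosen)
```

Once a node is chosen, it is the kubelet on that worker node that asks the container runtime to start the pod's containers.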

Considerations when adopting Kubernetes in practice

Kubernetes is certainly a powerful tool, but its adoption cost is high and its learning curve is steep. Rather than adopting it by default, weigh the size of the organization and the nature of the service carefully.

Building and operating a Kubernetes cluster yourself (on-premises) requires significant infrastructure expertise. For that reason, many companies use managed Kubernetes services from cloud providers (AWS EKS, Google GKE, Azure AKS, etc.). A managed service leaves the control plane to the cloud provider, which significantly reduces the operational burden and lets developers focus on worker nodes and application deployment.

Summary

While Docker solved the packaging problem of applications, Kubernetes is like a standard operating system in the cloud native era that solves the problems of deployment and operation in large-scale environments. If you want to build a modern backend system or data pipeline, understanding the Kubernetes ecosystem is becoming a necessity rather than an option.