In my routine work I use platforms like OpenStack, Kubernetes and VMware ESXi to run various LTE and 5G workloads, where a group of microservices together performs a Cloud-Native Network Function (CNF).
This blog is a compilation of information in which I try to distill the voluminous documentation; it should be especially useful for newcomers in my team. For me, technical blogging is also beneficial in subtler ways. Blog posts serve as a kind of diary of my technical growth as a Telecom cloud architect: they hone my writing skills, help me keep up to date with the latest technology trends and make my thinking more organized.
This blog gives a high-level overview of the most important components, mainly containers along with Docker, the underlying Linux components and the orchestration tool Kubernetes, and shows how they fit together. This is part 1 of the blog.
Before we start, let’s understand the difference between Docker and a container.
A Docker image is a template that defines how a container will be realized, so the image is an integral part of a container. As depicted in the figure below, a virtual machine virtualizes the physical hardware for machine-level isolation, whereas a container virtualizes the operating system. A hypervisor abstracts away the hardware so that virtual machines can run an operating system; a container engine abstracts away the operating system so that containers can run applications. The overhead of running virtual machines is much higher. As an example, you can run hundreds of containers on your desktop but only a few virtual machines.
A container is a virtual runtime environment that runs on top of a single operating system (OS) kernel and virtualizes the operating system rather than the underlying hardware. The diagram above shows a one-to-one mapping between the layers of the two models. The basic building blocks of a container are namespaces, Cgroups and a union file system (UnionFS). In a later part of the blog, we will see how Cgroups are attached to the Pods that host the containers in their respective namespaces.
Cgroups (control groups) is a Linux kernel feature that limits, accounts for and isolates the resource usage of groups of user processes.
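To make the idea of resource control a little more concrete, here is a minimal sketch of how the limits of a cgroup can be read. It assumes cgroup v2, where the kernel exposes the CPU limit in a file called `cpu.max` (a quota and a period in microseconds, with the word `max` meaning unlimited) and the memory limit in `memory.max`. The function names and the sample values are my own, chosen only for illustration.

```python
from typing import Optional


def parse_cpu_max(text: str) -> Optional[float]:
    """Parse the contents of a cgroup v2 `cpu.max` file.

    The file holds "<quota> <period>" in microseconds; the quota is the
    literal word "max" when no CPU limit is set.
    Returns the limit expressed in CPU cores, or None when unlimited.
    """
    quota, period = text.split()
    if quota == "max":
        return None
    return int(quota) / int(period)


def parse_memory_max(text: str) -> Optional[int]:
    """Parse cgroup v2 `memory.max`: a byte count, or "max" for no limit."""
    value = text.strip()
    return None if value == "max" else int(value)


if __name__ == "__main__":
    # Sample file contents (hypothetical values, not read from a real host).
    print(parse_cpu_max("50000 100000"))   # quota is half of the period
    print(parse_cpu_max("max 100000"))     # no CPU limit configured
    print(parse_memory_max("268435456"))   # limit given in bytes
```

On a real host the text would come from a file below `/sys/fs/cgroup`; the exact path depends on how the container runtime lays out its cgroups, so I have left it out here.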
Cgroups also have their own namespace type: processes inside a cgroup namespace see a virtualized view of the cgroup hierarchy, independent of processes in other namespaces. Cgroups allocate resources such as CPU time, system memory, network bandwidth, or combinations of these among user-defined groups of tasks. Namespaces provide the isolation that containers are built on; the namespace types depicted in the diagram above are described below.
PID (Process Identifier) Namespaces:
PID namespaces isolate the process ID number space, meaning that processes in different PID namespaces can have the same PID. PID namespaces allow containers to provide functionality such as suspending/resuming the set of processes in the container and migrating the container to a new host while the processes inside the container maintain the same PIDs.
Network Namespaces: These isolate the network interface controllers, iptables rules, routing tables and other lower-level networking resources. A network namespace gives processes their own private network stack, including interfaces, routing tables and sockets.
Mount Namespaces: Mount namespaces provide isolation of the list of mount points seen by the processes in each namespace instance. Thus, the processes in each of the mount namespace instances will see distinct single-directory hierarchies.
User Namespaces: These limit users within a namespace to only that namespace and avoid user ID conflicts across namespaces. User namespaces isolate security-related identifiers and attributes, in particular user IDs and group IDs, the root directory, keys and capabilities.
UTS Namespaces: These provide hostname isolation. The UTS (UNIX Time-Sharing System) namespace covers the hostname and domain name that a process sees for the system.
By placing a process in its own UTS namespace, the hostname for the process can be changed independently of the host or VM on which the process is running.
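The namespaces a process belongs to can be inspected on any Linux machine: every process has a directory `/proc/<pid>/ns` containing one symbolic link per namespace type, and each link target looks like `pid:[4026531836]`. Two processes are in the same namespace when the number in the brackets matches. Below is a small sketch that reads these links; the function names are my own, and the code assumes a Linux host with `/proc` mounted.

```python
import os
import re
from typing import Dict, Tuple

# Link targets in /proc/<pid>/ns look like "net:[4026531992]".
_NS_LINK = re.compile(r"^(?P<kind>[a-z_]+):\[(?P<inode>\d+)\]$")


def parse_ns_link(target: str) -> Tuple[str, int]:
    """Split a namespace link target into (namespace type, inode number)."""
    match = _NS_LINK.match(target)
    if match is None:
        raise ValueError(f"unexpected namespace link: {target!r}")
    return match.group("kind"), int(match.group("inode"))


def list_namespaces(pid: str = "self") -> Dict[str, int]:
    """Return {entry name: namespace inode} for one process (Linux only)."""
    ns_dir = f"/proc/{pid}/ns"
    result: Dict[str, int] = {}
    for name in sorted(os.listdir(ns_dir)):
        _, inode = parse_ns_link(os.readlink(os.path.join(ns_dir, name)))
        result[name] = inode
    return result


def same_namespace(pid_a: str, pid_b: str, name: str) -> bool:
    """True when both processes share the given namespace entry, e.g. "net"."""
    return list_namespaces(pid_a)[name] == list_namespaces(pid_b)[name]


if __name__ == "__main__":
    try:
        for name, inode in list_namespaces().items():
            print(f"{name:20s} {inode}")
    except OSError as exc:
        # Not on Linux, or /proc is not available in this environment.
        print(f"cannot read namespaces: {exc}")
```

Running this once on the host and once inside a container should show different numbers for the namespaces that the container runtime has created for the container.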
Let’s focus on the union file system now. Why do we need a union file system when we already have file systems like XFS, ext2, ext3, etc.? A union file system addresses two problems: inefficient disk space utilization and latency at bootstrap. Suppose we create 20 Docker containers from a 1 GB image on a conventional file system like ext: at least 20 GB of space would be eaten up, because every container would need its own full copy of the image.
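The disk space argument is simple arithmetic, sketched below. The 20 containers and the 1 GB image come from the example above; the 50 MB written by each container is purely an assumption of mine to make the comparison complete.

```python
def disk_usage_mb(containers: int, image_mb: int, writable_mb: int,
                  shared_image: bool) -> int:
    """Total disk usage in MB for a number of containers of one image.

    shared_image=False: every container holds a full copy of the image.
    shared_image=True:  the image is stored once and each container only
                        stores what it has written itself.
    """
    if shared_image:
        return image_mb + containers * writable_mb
    return containers * (image_mb + writable_mb)


if __name__ == "__main__":
    # 20 containers, 1 GB image, 50 MB written per container (assumed).
    full_copies = disk_usage_mb(20, 1024, 50, shared_image=False)
    shared = disk_usage_mb(20, 1024, 50, shared_image=True)
    print(f"full copies : {full_copies} MB")
    print(f"shared image: {shared} MB")
```

With these numbers the full copies need 21480 MB while the shared image needs 2024 MB, which is roughly a tenfold saving.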
Containers start in milliseconds. This is possible because a new container is created essentially by forking a process, which is a quick operation, rather than by booting an operating system.
The fork operation creates a separate address space for the child, and the child process gets an exact copy of all the memory segments of the parent process. Without a union file system, however, creating a new container would require copying all the files of the image layers into the container's namespace. A container is expected to start in a few milliseconds; if a huge payload has to be copied at start time, the bootstrap time of the container increases.
So we need some mechanism to efficiently share image data among containers rather than copying it. Union-capable file systems came into existence to address the challenges listed above.
A union file system works on top of other file systems. It gives a single, coherent and unified view of files and directories that live in separate file systems.
Let’s look at the layers of a union file system (UFS): the Base Layer (read-only), the Overlay Layer (the main user view) and the Diff Layer. The base layer is where the base files of your file system are stored; seen from the overlay view, this layer is read-only.
The overlay layer is like a playground (a kind of cache) where the user operates. It initially offers a view of the base layer and gives the user the ability to interact with files and even “write” to them. When you write to this layer, the changes are stored in the next layer, the diff layer.
Any change made in the overlay layer is automatically stored in the diff layer, leaving the base layer untouched. This is known as a copy-on-write operation and is probably the most important part of making a union file system function correctly.
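The interplay of the three layers can be imitated in a few lines of Python. This is only a toy model under my own assumptions: files are dictionary entries, a write replaces the whole file, and deletions are remembered as markers (real union file systems call these whiteouts). The class and method names are invented for this sketch and do not correspond to any real file system API.

```python
from typing import Dict, List, Set


class UnionView:
    """Toy overlay view: a shared read-only base plus a private diff layer."""

    def __init__(self, base: Dict[str, str]) -> None:
        self._base = base                  # base layer, never modified here
        self._diff: Dict[str, str] = {}    # diff layer, holds all changes
        self._deleted: Set[str] = set()    # markers hiding base files

    def read(self, path: str) -> str:
        """Look in the diff layer first, then fall back to the base layer."""
        if path in self._deleted:
            raise FileNotFoundError(path)
        if path in self._diff:
            return self._diff[path]
        if path in self._base:
            return self._base[path]
        raise FileNotFoundError(path)

    def write(self, path: str, data: str) -> None:
        """Copy-on-write: the change lands in the diff layer only."""
        self._deleted.discard(path)
        self._diff[path] = data

    def delete(self, path: str) -> None:
        """Hide a file from the view without touching the base layer."""
        if path not in self.listing():
            raise FileNotFoundError(path)
        self._diff.pop(path, None)
        if path in self._base:
            self._deleted.add(path)

    def listing(self) -> List[str]:
        """All paths visible through the overlay view."""
        return sorted((set(self._base) | set(self._diff)) - self._deleted)


if __name__ == "__main__":
    image = {"/etc/motd": "hello", "/bin/app": "v1"}   # shared base layer
    c1, c2 = UnionView(image), UnionView(image)        # two "containers"
    c1.write("/etc/motd", "changed in c1")
    c1.delete("/bin/app")
    print(c1.listing(), c1.read("/etc/motd"))
    print(c2.listing(), c2.read("/etc/motd"))
```

Both views start from the same base dictionary, so nothing is copied when a new view is created, and whatever one view writes or deletes stays invisible to the other.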
Let us explore the container ecosystem now. The following standards have been adopted for interoperability:
- Container Runtime Interface (CRI) defines an API between Kubernetes and the container runtime.
- Open Container Initiative (OCI) defines a standard for images and containers.
The diagram below shows how Docker, Kubernetes, CRI, OCI, containerd and runc fit together in this ecosystem:
CRI (Container Runtime Interface) is a Kubernetes API (Application Programming Interface) that defines how Kubernetes interacts with different container runtimes.
OCI is an open industry standard whose core specifications are the runtime specification (runtime-spec) and the image specification (image-spec).
runc is an OCI-compliant command-line tool that creates and runs containers according to the OCI runtime specification.
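To give a feel for what the runtime specification describes, the sketch below builds a stripped-down container configuration of the kind an OCI runtime reads from `config.json`. The field names (`ociVersion`, `process`, `root`, `hostname`, `linux.namespaces`) follow the runtime-spec, but the selection is my own minimal subset and has not been validated against the schema; a real configuration, such as the one generated by `runc spec`, contains more fields (mounts, capabilities and so on). The function name and default values are hypothetical.

```python
import json
from typing import Any, Dict, List


def minimal_oci_config(args: List[str], rootfs: str = "rootfs",
                       hostname: str = "demo") -> Dict[str, Any]:
    """Build a small, illustrative subset of an OCI runtime config."""
    return {
        "ociVersion": "1.0.2",
        "process": {
            "user": {"uid": 0, "gid": 0},
            "args": args,          # command to run inside the container
            "cwd": "/",
        },
        # Directory holding the unpacked image, mounted read-only here.
        "root": {"path": rootfs, "readonly": True},
        "hostname": hostname,      # takes effect inside the UTS namespace
        "linux": {
            # One new namespace of each listed type is created.
            "namespaces": [
                {"type": "pid"},
                {"type": "network"},
                {"type": "mount"},
                {"type": "uts"},
                {"type": "ipc"},
            ]
        },
    }


if __name__ == "__main__":
    print(json.dumps(minimal_oci_config(["sh"]), indent=2))
```

Notice how the namespace types from the earlier part of this blog show up directly in the configuration: the specification is largely a declarative description of which Linux building blocks to set up for the container.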
Let’s understand the relationship between a container and a Pod, and what a Kubernetes application is.
For an application to run on a Kubernetes cluster, we package the application as a container, wrap the container in a Pod and finally deploy it using a manifest file.
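As an illustration of the last step, here is a minimal sketch that generates a Pod manifest. Manifests are usually written in YAML, but `kubectl` also accepts JSON, which lets us stay within the Python standard library. The Pod name `hello-pod`, the image `nginx:1.25` and the helper function are only examples I picked.

```python
import json
from typing import Any, Dict


def pod_manifest(name: str, image: str,
                 container_name: str = "app") -> Dict[str, Any]:
    """Wrap a single container image in a Pod definition."""
    return {
        "apiVersion": "v1",
        "kind": "Pod",
        "metadata": {"name": name, "labels": {"app": name}},
        "spec": {
            "containers": [
                {"name": container_name, "image": image},
            ]
        },
    }


if __name__ == "__main__":
    # Example values only; replace them with your own application image.
    print(json.dumps(pod_manifest("hello-pod", "nginx:1.25"), indent=2))
```

Saving the output to a file such as `pod.json` and running `kubectl apply -f pod.json` against a cluster would then create the Pod.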
We cannot run a container directly on a Kubernetes cluster; containers always run inside a Pod. The Pod is the atomic unit in a Kubernetes cluster, just as the container is in the Docker world.
A Pod is a wrapper that allows containers to run on Kubernetes. All we do is define a Pod and deploy it on the cluster. Consider the example of a chocolate: it is not sold to the consumer directly but wrapped in a wrapper. Similarly, a container is wrapped, and the resulting product is the Pod.
A Pod is a group of one or more containers. It is a ring-fenced environment in which to run the containers: we ring-fence a slice of the host operating system, build a network stack, create namespaces and run the containers. It is similar to Cgroups on a Linux server. The IP address is assigned at the Pod level, and CPU and RAM limits are enforced through the Pod's Cgroup, although in a manifest they are usually declared per container. Containers running in the same Pod share an environment, including volumes, shared memory (IPC) and the network stack, so all containers in the same Pod share the same IP address.
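One practical consequence of the shared IP address is that two containers in the same Pod cannot listen on the same port. The sketch below is a hypothetical pre-check of my own (not something Kubernetes itself is guaranteed to enforce at admission time) that scans a Pod definition for containers declaring the same `containerPort`. The container names and ports are made up.

```python
from collections import defaultdict
from typing import Any, Dict, List, Tuple


def find_port_conflicts(pod: Dict[str, Any]) -> Dict[Tuple[str, int], List[str]]:
    """Return {(protocol, port): [container names]} for ports declared twice.

    Containers in one Pod share a network stack, so a (protocol, port)
    pair can only be bound by one of them.
    """
    users: Dict[Tuple[str, int], List[str]] = defaultdict(list)
    for container in pod["spec"]["containers"]:
        for port in container.get("ports", []):
            # Kubernetes treats a missing protocol as TCP.
            key = (port.get("protocol", "TCP"), port["containerPort"])
            users[key].append(container["name"])
    return {key: names for key, names in users.items() if len(names) > 1}


if __name__ == "__main__":
    example_pod = {
        "apiVersion": "v1",
        "kind": "Pod",
        "metadata": {"name": "two-containers"},
        "spec": {
            "containers": [
                {"name": "web", "image": "example/web",
                 "ports": [{"containerPort": 8080}]},
                {"name": "sidecar", "image": "example/sidecar",
                 "ports": [{"containerPort": 8080}, {"containerPort": 9090}]},
            ]
        },
    }
    print(find_port_conflicts(example_pod))
```

For the example Pod the check reports TCP port 8080 as claimed by both `web` and `sidecar`; moving one of them to another port resolves the clash.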