Project Calico, the CNI way

vikram fugro
Nov 14, 2018


When it comes to Kubernetes networking, Calico is widely used. One of the main reasons is its ease of use and the way it shapes the network fabric. Calico is a pure L3 solution, where packets are routed in just the same manner as on the regular Internet. Each node (e.g. a VM) acts like a vRouter, which means tools like traceroute, ping, tcpdump, etc. just work as expected! Whether a packet is flowing from one container to another, or from a container to another node (or vice versa), it is just treated as a flat network route (L3 hops). By default, there is no notion of overlays, tunneling or NAT. Each endpoint is actually a /32 IP in IPv4 (or the equivalent /128 in IPv6), which means a container can be assigned a public IP. All this is achieved using the Linux kernel’s existing networking capabilities, which gives great flexibility in scaling out the network fabric of a platform running atop Calico.

Components (Brief)

Calico is not limited to programming data routes: it also has a rich interface for network policies and now supports securing cross-container communication as well.

The core components of Calico are Bird, Felix and a data-store such as etcd or the Kubernetes API server. The data-store holds the configuration (ip-pools, endpoint info, network policies, etc.).

Bird is a per-node BGP daemon that exchanges route information with the BGP daemons running on other nodes. A common topology is a node-to-node mesh, where each BGP daemon peers with every other one. For large-scale deployments, this can get messy. To reduce the number of BGP-to-BGP connections, certain BGP nodes can be configured as Route Reflectors, which take care of completing the route propagation.
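
To get a feel for why a full mesh gets messy, here is a small sketch (not taken from Calico) that counts BGP sessions. It assumes a simplified topology in which the route reflectors are fully meshed with each other and every other node peers with every reflector; the function names are made up for illustration.

```python
def full_mesh_sessions(nodes: int) -> int:
    """Each node peers with every other node: n * (n - 1) / 2 sessions."""
    return nodes * (nodes - 1) // 2


def route_reflector_sessions(nodes: int, reflectors: int) -> int:
    """Simplified model: reflectors are fully meshed with each other,
    and every remaining node peers with every reflector."""
    if not 0 < reflectors <= nodes:
        raise ValueError("need at least one reflector and at most `nodes`")
    clients = nodes - reflectors
    return full_mesh_sessions(reflectors) + clients * reflectors


if __name__ == "__main__":
    # Columns: nodes, sessions in a full mesh, sessions with two reflectors.
    for n in (2, 10, 100, 500):
        print(n, full_mesh_sessions(n), route_reflector_sessions(n, 2))
```

The full mesh grows quadratically with the number of nodes, while the reflector model grows roughly linearly, which is the motivation for Route Reflectors in larger clusters.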

Felix is another per-node daemon; it configures routes and enforces network policies on the node it is running on.

Other (but equally important) components include Dikastes/Envoy to secure container-to-container communication.

We will be peeking into Calico purely as a standalone CNI plugin, independently of Kubernetes, which I believe will help us in understanding it better.

Utilities — Calico

The following utilities will be needed.

  • calico-node (v3.2.3): This is the agent that we will run inside the VMs. It includes Bird, Felix and a few other helper processes.
  • etcd (v3.3.7): Etcd server (for data-store) will be running on the host. When running on Kubernetes, the preferred way is to use CRDs (via Kube-API Server) as the data-store. One less moving part.
  • calicoctl (v3.3.0): Client utility to interact with the etcd server to read/write the configuration/status of the cluster.
  • calico/calico-ipam (v3.2.3): Calico CNI plugins.
  • cnitool (v0.6.0): Add/remove the containers to the network.

Setup

My setup includes a couple of vagrant machines (ubuntu-18.04) running with host-only (VirtualBox) networking mode. Network: 172.17.8.x.

  • VM-1 (machine-01) : 172.17.8.101
  • VM-2 (machine-02) : 172.17.8.102
  • Host : 172.17.8.1

Launch the etcd server on the host and export the environment variable ETCD_ENDPOINTS=http://172.17.8.1:2379 on all three.

user@host:~/cni$ export HostIP=172.17.8.1
user@host:~/cni$ docker run --net=host \
--name etcd-v3.3.7 \
--volume=/tmp/etcd-data:/etcd-data \
--rm \
quay.io/coreos/etcd:v3.3.7 \
/usr/local/bin/etcd \
--data-dir /etcd-data \
--listen-client-urls http://0.0.0.0:2379 \
--advertise-client-urls http://${HostIP}:2379 \
--listen-peer-urls http://0.0.0.0:2380 \
--initial-advertise-peer-urls http://${HostIP}:2380 \
--initial-cluster default=http://${HostIP}:2380 \
--initial-cluster-token my-etcd-token \
--initial-cluster-state new \
--auto-compaction-retention 1

Next, we’ll launch the calico-node (agent) containers on both the VMs. But first, let’s set CALICO_IP to the IP of the interface we want the calico/node agent to use.

vagrant@machine-01:/vagrant$ export CALICO_IP=172.17.8.101

… and now let’s launch the agent in a container on both the vagrant VMs.

vagrant@machine-01:/vagrant$ docker run --net=host --privileged \
--name=calico-node \
--rm \
-e IP=${CALICO_IP} \
-e CALICO_NETWORKING_BACKEND="bird" \
-e NO_DEFAULT_POOLS="true" \
-e CALICO_LIBNETWORK_ENABLED="false" \
-e ETCD_ENDPOINTS="http://172.17.8.1:2379" \
-v /var/log/calico:/var/log/calico \
-v /run/docker/plugins:/run/docker/plugins \
-v /lib/modules:/lib/modules \
-v /var/run/calico:/var/run/calico \
-v /var/lib/calico:/var/lib/calico \
quay.io/calico/node:v3.2.3

Note that we need to launch the agent in the VM’s network namespace, hence --net=host. We also set CALICO_LIBNETWORK_ENABLED to false, because we don’t want any Docker networking to kick in when the actual workload containers get launched; that control should be given to CNI.

Note: When using with an orchestrator, it’s the job of the orchestrator (Kubelet in Kubernetes invoking CNI) to add/remove the containers to the network. Here, we are not using any orchestrator, so we will be using a combination of calicoctl and cnitool.

Once the agent is launched, you should see the following logs:

startup.go 252: Early log level set to info
startup.go 270: Using stored node name from /var/lib/calico/nodename
startup.go 280: Determined node name: machine-01
startup.go 102: Skipping datastore connection test
startup.go 462: Using IPv4 address from environment: IP=172.17.8.101
startup.go 495: IPv4 address 172.17.8.101 discovered on interface enp0s8
startup.go 633: No AS number configured on node resource, using global value
startup.go 668: Skipping IP pool configuration
startup.go 177: Using node name: machine-01
Calico node started successfully

Check if the nodes are registered:

user@host:~/cni$ ./calico/calicoctl get nodes
NAME
machine-01
machine-02

At this point, we can hope that both the nodes (agents) have discovered each other. To verify, run the following on one of the nodes:

vagrant@machine-01:/vagrant$ sudo ./calico/calicoctl node status
Calico process is running.
IPv4 BGP status
+--------------+-------------------+-------+----------+-------------+
| PEER ADDRESS |     PEER TYPE     | STATE |  SINCE   |    INFO     |
+--------------+-------------------+-------+----------+-------------+
| 172.17.8.102 | node-to-node mesh | up    | 09:39:22 | Established |
+--------------+-------------------+-------+----------+-------------+
IPv6 BGP status
No IPv6 peers found.

This shows that the peering is established between the nodes. Now, we will create an ip-pool for the workload containers to get the IP from. Here’s the config:

apiVersion: projectcalico.org/v3
kind: IPPool
metadata:
  name: my.ippool-1
spec:
  cidr: 10.1.0.0/16
  ipipMode: Never
  natOutgoing: true
  disabled: false
  blockSize: 26

With blockSize set to 26, each node is given a /26 chunk (64 addresses) from the larger /16 CIDR, and the IPs for the containers launched on that node are assigned out of it. ipipMode=Never disables encapsulation. Encapsulation is needed for packets flowing between containers located on nodes that are in different subnets: each packet gets wrapped in another packet carrying the src and dst IPs of the nodes involved in the flow. This is useful when traversing subnet boundaries (unless the router in between is BGP-aware). natOutgoing=true masquerades traffic that is destined outside the ip-pool’s CIDR.
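
The block arithmetic can be checked with Python’s standard ipaddress module. This is only a sketch of the address math, not of how Calico’s IPAM actually picks blocks; the helper names block_for and leaves_pool are made up, and leaves_pool merely mirrors the "destined outside the ip-pool’s CIDR" description of natOutgoing.

```python
import ipaddress

POOL = ipaddress.ip_network("10.1.0.0/16")  # cidr of the ip-pool
BLOCK_SIZE = 26                             # blockSize of the ip-pool


def block_for(ip: str) -> ipaddress.IPv4Network:
    """Return the /26 block of the pool that contains the given address."""
    addr = ipaddress.ip_address(ip)
    if addr not in POOL:
        raise ValueError(f"{ip} is not inside {POOL}")
    # strict=False masks off the host bits to get the enclosing block.
    return ipaddress.ip_network(f"{ip}/{BLOCK_SIZE}", strict=False)


def leaves_pool(dst: str) -> bool:
    """True when a destination lies outside the pool, i.e. the kind of
    traffic that natOutgoing would masquerade."""
    return ipaddress.ip_address(dst) not in POOL


if __name__ == "__main__":
    blocks = list(POOL.subnets(new_prefix=BLOCK_SIZE))
    print(len(blocks), "blocks of", blocks[0].num_addresses, "addresses each")
    print(block_for("10.1.240.70"))
    print(leaves_pool("8.8.8.8"), leaves_pool("10.1.134.128"))
```

A /16 split into /26 blocks yields 1024 blocks of 64 addresses each, so the pool can serve many nodes before running out of blocks.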

Create the ip-pool.

user@host:~/cni$ cat calico/ippool.yml | \
./calico/calicoctl create -f -
Successfully created 1 'IPPool' resource(s)

Verify:

user@host:~/cni$ ./calico/calicoctl get ippool -o wide
NAME          CIDR          NAT    IPIPMODE   DISABLED
my.ippool-1   10.1.0.0/16   true   Never      false

CNI — Calico Networking

Assuming we have launched containers with --net=none on both the VMs, let’s add them to the Calico network using CNI. Before we do that, we will need the Calico CNI plugins (calico and calico-ipam) and a CNI config. For the CNI plugins, I am using https://github.com/projectcalico/cni-plugin/releases/tag/v3.2.3.

Here’s a simple CNI config that we will use:

{
  "name": "test-calico",
  "cniVersion": "0.3.1",
  "type": "calico",
  "ipam": {
    "type": "calico-ipam"
  },
  "etcd_endpoints": "http://172.17.8.1:2379"
}

We will place this config into /vagrant/net.d/ and the plugins into /vagrant/bin/ on both the VMs. To add the containers to the network, we will run the following command on both the VMs. Before we do that, we need to set the container_name to the name (or ID) of the container.

vagrant@machine-01:/vagrant$ sudo CNI_PATH=/vagrant/bin \
NETCONFPATH=/vagrant/net.d \
CNI_CONTAINERID=$(docker inspect "$container_name" | jq -r '.[0].Id') \
./cnitool add test-calico \
$(docker inspect "$container_name" | jq -r '.[0].NetworkSettings.SandboxKey')

… and we should see the following output (e.g. from one of the containers):

Calico CNI IPAM request count IPv4=1 IPv6=0
Calico CNI IPAM handle=test-calico.cni
Calico CNI IPAM assigned addresses IPv4=[10.1.240.64] IPv6=[]
Calico CNI using IPs: [10.1.240.64/32]
Calico CNI creating profile: test-calico
{
  "cniVersion": "0.3.1",
  "interfaces": [
    {
      "name": "calicni",
      "sandbox": "eth0"
    }
  ],
  "ips": [
    {
      "version": "4",
      "address": "10.1.240.64/32"
    }
  ],
  "dns": {}
}

Here, we can see that this particular container got assigned the IP 10.1.240.64. We should be able to ping this container from a container on the other VM. In the case below, the container that we are pinging from was assigned the IP 10.1.134.128.

root@a139f793e6eb:/# ip addr show eth0 | grep inet
inet 10.1.134.128/32 scope global eth0
root@a139f793e6eb:/# ping -c 3 10.1.240.64
PING 10.1.240.64 (10.1.240.64): 56 data bytes
64 bytes from 10.1.240.64: icmp_seq=0 ttl=62 time=1.073 ms
64 bytes from 10.1.240.64: icmp_seq=1 ttl=62 time=1.119 ms
64 bytes from 10.1.240.64: icmp_seq=2 ttl=62 time=1.211 ms
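
The JSON result that cnitool printed for the add operation can also be consumed by a script. A minimal sketch, assuming only the JSON part of the output (not the preceding log lines) has been captured; the function name assigned_addresses is made up for illustration.

```python
import ipaddress
import json

# The CNI result shown earlier, as printed by the add operation.
RESULT = """
{
  "cniVersion": "0.3.1",
  "interfaces": [{"name": "calicni", "sandbox": "eth0"}],
  "ips": [{"version": "4", "address": "10.1.240.64/32"}],
  "dns": {}
}
"""


def assigned_addresses(result_json: str) -> list:
    """Return the addresses from a CNI result as ip_interface objects."""
    result = json.loads(result_json)
    return [ipaddress.ip_interface(entry["address"])
            for entry in result.get("ips", [])]


if __name__ == "__main__":
    for iface in assigned_addresses(RESULT):
        # The /32 prefix confirms that each endpoint is a single host route.
        print(iface.ip, iface.network.prefixlen)
```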

For the intrigued, here’s the route programmed on one of the VMs.

On this particular VM, we have one container with IP 10.1.240.64, and to reach containers with IPs within 10.1.134.128/26, the next hop is the VM with IP 172.17.8.102. To take it further, try running traceroute 10.1.134.128.

root@b536fa4bec02:/# traceroute 10.1.134.128
traceroute to 10.1.134.128 (10.1.134.128), 30 hops max, 60 byte packets
1 10.0.2.15 (10.0.2.15) 0.023 ms 0.007 ms 0.006 ms
2 172.17.8.102 (172.17.8.102) 0.214 ms 0.116 ms 0.186 ms
3 10.1.134.128 (10.1.134.128) 0.205 ms 0.160 ms 0.180 ms

It just works like normal!

10.0.2.15 is the VM (172.17.8.101) running this container, and it serves as the first hop. The 10.0.2.0/24 network is for the NAT mode (see: VirtualBox networking options), which gets configured along with the host-only mode (172.17.8.0/24) and is set as the default route for the VMs. Every VM gets the same IP 10.0.2.15, as if each were within its own NAT. 172.17.8.0/24 is a private network that I configured in my Vagrantfile; it enables communication only among the VMs and the host.
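
The forwarding decision described above (a local /32 for the container, a /26 reachable via the other VM) boils down to a longest-prefix match. Here is a rough sketch with illustrative routes only: the entries are reconstructed from the description rather than copied from a real routing table, the interface name cali-example is a placeholder, and the next-hop strings are plain labels.

```python
import ipaddress
from typing import Optional

# Illustrative routes as seen from the VM 172.17.8.101 (assumed, not real output).
ROUTES = [
    ("10.1.240.64/32", "dev cali-example"),    # container running on this VM
    ("10.1.134.128/26", "via 172.17.8.102"),   # block owned by the other VM
    ("172.17.8.0/24", "dev enp0s8"),           # host-only network
    ("0.0.0.0/0", "default via NAT network"),  # everything else
]


def lookup(dst: str) -> Optional[str]:
    """Pick the matching route with the longest prefix, like the kernel does."""
    addr = ipaddress.ip_address(dst)
    best = None
    for cidr, hop in ROUTES:
        net = ipaddress.ip_network(cidr)
        if addr in net and (best is None or net.prefixlen > best[0].prefixlen):
            best = (net, hop)
    return best[1] if best else None


if __name__ == "__main__":
    for dst in ("10.1.240.64", "10.1.134.128", "172.17.8.1", "8.8.8.8"):
        print(dst, "->", lookup(dst))
```

Because every container is a /32 and every remote block is a plain prefix, no extra machinery is needed: the ordinary kernel route lookup is the whole data path.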

Isolation — Profiles

At this point, if we try pinging from a VM to a container on another VM, it will get rejected. This is due to a resource called Profile (test-calico, the name picked from the CNI config in this case) that is attached to the containers. A Profile is a resource type that contains a set of rules applied to endpoints such as containers or VMs.

Let’s examine this profile:

The ingress rule in this profile only allows packets from endpoints having the label test-calico. If we remove the selector and “apply” the config, communication from the VM to the container is enabled.

user@host:~/cni$ cat ./calico/profile.yml 
apiVersion: projectcalico.org/v3
kind: Profile
metadata:
  name: test-calico
spec:
  egress:
  - action: Allow
    destination: {}
    source: {}
  ingress:
  - action: Allow
    destination: {}
    source: {}
  labelsToApply:
    test-calico: ""
user@host:~/cni$ cat ./calico/profile.yml | \
./calico/calicoctl apply -f -
Successfully applied 1 'Profile' resource(s)
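
To make the effect of the selector concrete, here is a toy model of how an ingress rule’s source selector decides whether a packet is allowed. It is not Calico’s implementation: it only understands an empty selector and the has(<label>) form, and the idea that the generated profile used has(test-calico) is my assumption, based on the description of the rule.

```python
from typing import Dict, Optional


def rule_allows(source_selector: Optional[str],
                source_labels: Dict[str, str]) -> bool:
    """Toy matcher for an Allow rule: an empty selector matches any source,
    has(<label>) matches only sources carrying that label."""
    if not source_selector:
        return True
    if source_selector.startswith("has(") and source_selector.endswith(")"):
        return source_selector[4:-1] in source_labels
    raise ValueError(f"selector form not handled by this sketch: {source_selector}")


if __name__ == "__main__":
    container = {"test-calico": ""}  # label set via labelsToApply
    vm = {}                          # the VM itself carries no such label

    print(rule_allows("has(test-calico)", container))  # container to container
    print(rule_allows("has(test-calico)", vm))         # VM to container, before
    print(rule_allows(None, vm))                       # VM to container, after
```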

There are other resource types, such as Network Policy, that provide isolation like the Profile resource does. Network Policy takes precedence over Profile and is more in tune with Kubernetes constructs. Note: Calico has a network policy controller for Kubernetes.

Closing Points

I covered some basic use-cases of Calico here, and I hope this provides a good base to proceed from. An immediate set of things to explore or deep-dive into could be:

  • ipipMode, and its impact on performance due to encapsulation.
  • Dikastes, cross-container secure communication.
  • Kubernetes integration (via CNI).
  • Other resource types. There are quite a few.
  • Understanding Calico’s BGP Topologies.
