Deploy Akash Provider with kubeadm, containerd, gvisor

This write-up follows you through the Akash Provider deployment using kubeadm

12th of July 2021: originally published for Akash 0.12.0.
30th of October 2021: updated for Akash 0.14.0 and a multi-master/worker node setup. Also added HA support through very simplistic round-robin DNS A records.
5th of December 2021: updated for Akash 0.14.1. Important provider updates are included.

This write-up follows you through the necessary configuration & setup steps required for you to run the Akash Provider on your own Linux distro. (I used x86_64 Ubuntu Focal).
The steps to register and activate Akash Provider are also included.

We are going to be using containerd, so there is no need to install Docker!

I have not used kubespray either, as the official doc suggests, because I want to have more control over every gear in the system and I don't want to install Docker.

Preparation

Hostname

Set hostname to something meaningful:

hostnamectl set-hostname akash-single.domainXYZ.com

If you are going to deploy the recommended multi-master setup with 3 master (control plane) nodes and N worker nodes, you can go with the following hostnames:
akash-master-01.domainXYZ.com
akash-master-02.domainXYZ.com
akash-master-03.domainXYZ.com

akash-worker-01.domainXYZ.com
...
akash-worker-NN.domainXYZ.com
In the examples below I've used *.ingress.nixaid.com, the actual address I am using on my Akash provider. You will want to replace it with your own domain name.

Enable netfilter and kernel IP forwarding (routing)

kube-proxy needs net.bridge.bridge-nf-call-iptables enabled
cat <<EOF | tee /etc/modules-load.d/k8s.conf
br_netfilter
EOF

cat <<EOF | tee /etc/sysctl.d/k8s.conf
net.ipv4.ip_forward=1
net.bridge.bridge-nf-call-ip6tables = 1
net.bridge.bridge-nf-call-iptables = 1
EOF

modprobe br_netfilter
sysctl -p /etc/sysctl.d/k8s.conf
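To double-check that the module is loaded and the sysctls took effect (all three values should print 1), you can run:

lsmod | grep br_netfilter
sysctl net.bridge.bridge-nf-call-iptables net.bridge.bridge-nf-call-ip6tables net.ipv4.ip_forward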

Swap file

It is recommended to disable and remove the swap file.

swapon -s
swapoff -a
sed -i '/swap/d' /etc/fstab
rm /swapfile
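A quick check that swap is really gone (the Swap line should show 0B and swapon should print nothing):

free -h | grep -i swap
swapon --show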

Better entropy, better performance

Low entropy is bad and can impact any process relying on it!
To improve performance, install haveged or rng-tools package.

apt -y install haveged

Back in the day, I remember seeing the dockerd daemon stall and stop responding to commands such as docker logs and docker-compose up/logs. I figured that was because the server did not have enough entropy; you can check it by running the cat /proc/sys/kernel/random/entropy_avail command.
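For reference, here is that check; with haveged running the value should typically stay well above 1000:

cat /proc/sys/kernel/random/entropy_avail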

Install containerd

wget https://github.com/containerd/containerd/releases/download/v1.5.7/containerd-1.5.7-linux-amd64.tar.gz
tar xvf containerd-1.5.7-linux-amd64.tar.gz -C /usr/local/

wget -O /etc/systemd/system/containerd.service https://raw.githubusercontent.com/containerd/containerd/v1.5.7/containerd.service

mkdir /etc/containerd

systemctl daemon-reload
systemctl start containerd
systemctl enable containerd
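A quick sanity check that containerd is up and answering on its socket:

systemctl --no-pager status containerd
ctr version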

Install CNI plugins

Container Network Interface (CNI) - required for most pod networks.
cd
mkdir -p /etc/cni/net.d /opt/cni/bin
CNI_ARCH=amd64
CNI_VERSION=1.0.1
CNI_ARCHIVE=cni-plugins-linux-${CNI_ARCH}-v${CNI_VERSION}.tgz
wget https://github.com/containernetworking/plugins/releases/download/v${CNI_VERSION}/${CNI_ARCHIVE}
tar -xzf $CNI_ARCHIVE -C /opt/cni/bin
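You should now see the standard plugins (bridge, loopback, host-local, portmap, ...) in place:

ls -1 /opt/cni/bin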

Install crictl

Kubelet Container Runtime Interface (CRI) - required by kubeadm, kubelet.
INSTALL_DIR=/usr/local/bin
mkdir -p $INSTALL_DIR
CRICTL_VERSION="v1.22.0"
CRICTL_ARCHIVE="crictl-${CRICTL_VERSION}-linux-amd64.tar.gz"
wget "https://github.com/kubernetes-sigs/cri-tools/releases/download/${CRICTL_VERSION}/${CRICTL_ARCHIVE}"
tar -xzf $CRICTL_ARCHIVE -C $INSTALL_DIR
chown -Rh root:root $INSTALL_DIR

Update /etc/crictl.yaml with the following lines:

cat > /etc/crictl.yaml << 'EOF'
runtime-endpoint: unix:///run/containerd/containerd.sock
image-endpoint: unix:///run/containerd/containerd.sock
timeout: 10
#debug: true
pull-image-on-create: true
disable-pull-on-run: false
EOF
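Since containerd is already running, crictl should now be able to reach it over the CRI socket; if this errors, revisit the containerd setup:

crictl version
crictl info | head -n 20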

Install runc

runc is the default OCI runtime used by non-Akash deployments (i.e. standard Kubernetes containers such as the kube, etcd and calico pods).
apt install -y runc

(Only on workers) Install gVisor (runsc) and runc

gVisor (runsc) is an application kernel for containers that provides efficient defense-in-depth anywhere.
See the container runtimes comparison here.
apt -y install software-properties-common
curl -fsSL https://gvisor.dev/archive.key | apt-key add -
add-apt-repository "deb [arch=amd64,arm64] https://storage.googleapis.com/gvisor/releases release main"
apt update
apt install -y runsc
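Verify the runsc binary got installed and is on the PATH:

which runsc
runsc --version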

Configure containerd to use gVisor

Now that Kubernetes is going to use containerd (you will see this later, when we bootstrap the cluster using kubeadm), we need to configure containerd to use the gVisor runtime.

Remove the "runsc" runtime (the last two lines) on NoSchedule'able master nodes.

Update /etc/containerd/config.toml:

cat > /etc/containerd/config.toml << 'EOF'
# version MUST present, otherwise containerd won't pick the runsc !
version = 2

#disabled_plugins = ["cri"]

[plugins."io.containerd.runtime.v1.linux"]
  shim_debug = true
[plugins."io.containerd.grpc.v1.cri".containerd.runtimes.runc]
  runtime_type = "io.containerd.runc.v2"
[plugins."io.containerd.grpc.v1.cri".containerd.runtimes.runsc]
  runtime_type = "io.containerd.runsc.v1"
EOF

And restart containerd service:

systemctl restart containerd
gVisor (runsc) doesn't work with systemd-cgroup or cgroup v2 yet; there are two open issues if you wish to follow them:
systemd-cgroup support #193
Support cgroup v2 in runsc #3481
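Optionally, before wiring it into Kubernetes, you can smoke-test the runsc handler directly through containerd. This roughly follows the gVisor containerd quick start; the image and task name here are just examples:

ctr image pull docker.io/library/hello-world:latest
ctr run --runtime io.containerd.runsc.v1 -t --rm docker.io/library/hello-world:latest gvisor-smoke-test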

Install Kubernetes

Install latest stable kubeadm, kubelet, kubectl and add a kubelet systemd service

INSTALL_DIR=/usr/local/bin
RELEASE="$(curl -sSL https://dl.k8s.io/release/stable.txt)"
cd $INSTALL_DIR
curl -L --remote-name-all https://storage.googleapis.com/kubernetes-release/release/${RELEASE}/bin/linux/amd64/{kubeadm,kubelet,kubectl}
chmod +x {kubeadm,kubelet,kubectl}

RELEASE_VERSION="v0.9.0"
curl -sSL "https://raw.githubusercontent.com/kubernetes/release/${RELEASE_VERSION}/cmd/kubepkg/templates/latest/deb/kubelet/lib/systemd/system/kubelet.service" | sed "s:/usr/bin:${INSTALL_DIR}:g" | tee /etc/systemd/system/kubelet.service
mkdir -p /etc/systemd/system/kubelet.service.d
curl -sSL "https://raw.githubusercontent.com/kubernetes/release/${RELEASE_VERSION}/cmd/kubepkg/templates/latest/deb/kubeadm/10-kubeadm.conf" | sed "s:/usr/bin:${INSTALL_DIR}:g" | tee /etc/systemd/system/kubelet.service.d/10-kubeadm.conf

cd
systemctl enable kubelet
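A quick check that the binaries are in place and which version was fetched:

kubeadm version -o short
kubelet --version
kubectl version --client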

Deploy Kubernetes cluster using kubeadm

Feel free to adjust podSubnet & serviceSubnet and other control plane configuration to your needs.
For more flags refer to https://kubernetes.io/docs/setup/production-environment/tools/kubeadm/control-plane-flags/

Make sure to set "kubernetesVersion" to the version you have downloaded the binaries for (https://dl.k8s.io/release/stable.txt)

You need to run kubeadm init on 1 master only!
You will use kubeadm join to join the other master (control plane) nodes and worker nodes later.

Uncomment controlPlaneEndpoint for multi-master deployment or if you plan to scale your master nodes.
Set controlPlaneEndpoint to the same value you have set --cluster-public-hostname to. That hostname should resolve to the public IP of the Kubernetes cluster.

Pro-tip: you can register the same DNS A record multiple times, pointing to multiple Akash master nodes. And then set controlPlaneEndpoint to that DNS A record so it will get DNS round-robin balancing out-of-a-box! ;)
cat > kubeadm-config.yaml << EOF
kind: KubeletConfiguration
apiVersion: kubelet.config.k8s.io/v1beta1
cgroupDriver: cgroupfs
---
apiVersion: kubeadm.k8s.io/v1beta3
kind: InitConfiguration
nodeRegistration:
  criSocket: unix:///run/containerd/containerd.sock # --cri-socket=unix:///run/containerd/containerd.sock
  ##kubeletExtraArgs:
    ##root-dir: /mnt/data/kubelet
  imagePullPolicy: "Always"
localAPIEndpoint:
  advertiseAddress: "0.0.0.0"
  bindPort: 6443
---
kind: ClusterConfiguration
apiVersion: kubeadm.k8s.io/v1beta3
kubernetesVersion: "stable"
#controlPlaneEndpoint: "akash-master-lb.domainXYZ.com:6443"
networking:
  podSubnet: "10.233.64.0/18" # --pod-network-cidr, taken from kubespray
  serviceSubnet: "10.233.0.0/18" # --service-cidr, taken from kubespray
EOF

Download necessary dependencies to your master (control plane) node, run preflight check and pre-pull the images:

apt -y install ethtool socat conntrack
kubeadm init phase preflight --config kubeadm-config.yaml
kubeadm config images pull --config kubeadm-config.yaml

Now, if you are ready to initialize a single-node configuration (where you have only a single master node which will also be running pods):

kubeadm init --config kubeadm-config.yaml

If you are planning to run multi-master deployment, make sure to add --upload-certs to the kubeadm init command as follows:

kubeadm init --config kubeadm-config.yaml --upload-certs
You can always run kubeadm init phase upload-certs --upload-certs --config kubeadm-config.yaml followed by kubeadm token create --config kubeadm-config.yaml --print-join-command anytime later to get your kubeadm join command!

You do not need to run the upload-certs command for a single-master deployment.

If you see the "Your Kubernetes control-plane has initialized successfully!" message, everything went well and you now have your Kubernetes control-plane node at your service!

kubeadm will also print a kubeadm join command with a --token; keep it safe, as this command is required for joining more nodes (worker nodes, data nodes) later, depending on what type of architecture you want.

With multi-master node deployment, you will see kubeadm join command with additional --control-plane --certificate-key arguments! Make sure to use them when joining more master nodes to your cluster!

Check your nodes

You can either set KUBECONFIG variable whenever you want to talk to your Kubernetes cluster using the kubectl command OR you can make a symlink to ~/.kube/config.
Keep your /etc/kubernetes/admin.conf safe as it's your Kubernetes admin key, which lets you do everything with your K8s cluster.

That config will also be used by the Akash Provider service; you will see how later.

(Multi-master deployments) Newly joined master nodes will automatically receive the admin.conf file off of the source master node. So more backups for you! ;)
mkdir ~/.kube
ln -sv /etc/kubernetes/admin.conf ~/.kube/config
kubectl get nodes -o wide

Install Calico network

Kubernetes is of no use without the network plugin:

# kubectl describe node akash-single.domainXYZ.com | grep -w Ready
  Ready            False   Wed, 28 Jul 2021 09:47:09 +0000   Wed, 28 Jul 2021 09:46:52 +0000   KubeletNotReady              container runtime network not ready: NetworkReady=false reason:NetworkPluginNotReady message:Network plugin returns error: cni plugin not initialized
cd
curl https://docs.projectcalico.org/manifests/calico.yaml -O
kubectl apply -f calico.yaml

(Optional) Allow the master node to schedule PODs

By default, your K8s cluster will not schedule Pods on the control-plane (master) node for security reasons. Either remove the taints on the master so that you can schedule pods on it using the kubectl taint nodes command, OR use kubeadm join to join worker nodes which will run calico (but first make sure to perform the preparation steps on them: install CNI plugins, install crictl, configure containerd to use gVisor).

Remove the taints on the master if you are running a single-master deployment:

# kubectl describe node akash-single.domainXYZ.com |grep Taints
Taints:             node-role.kubernetes.io/master:NoSchedule

# kubectl taint nodes --all node-role.kubernetes.io/master-

Check your nodes and pods

# kubectl get nodes -o wide --show-labels
NAME          STATUS   ROLES                  AGE     VERSION   INTERNAL-IP      EXTERNAL-IP   OS-IMAGE             KERNEL-VERSION     CONTAINER-RUNTIME    LABELS
akash-single.domainXYZ.com   Ready    control-plane,master   4m24s   v1.22.1   149.202.82.160   <none>        Ubuntu 20.04.2 LTS   5.4.0-80-generic   containerd://1.4.8   beta.kubernetes.io/arch=amd64,beta.kubernetes.io/os=linux,kubernetes.io/arch=amd64,kubernetes.io/hostname=akash-single.domainXYZ.com,kubernetes.io/os=linux,node-role.kubernetes.io/control-plane=,node-role.kubernetes.io/master=,node.kubernetes.io/exclude-from-external-load-balancers=

# kubectl describe node akash-single.domainXYZ.com | grep -w Ready
  Ready                True    Wed, 28 Jul 2021 09:51:09 +0000   Wed, 28 Jul 2021 09:51:09 +0000   KubeletReady                 kubelet is posting ready status. AppArmor enabled
# kubectl get pods -A 
NAMESPACE     NAME                                       READY   STATUS    RESTARTS   AGE
kube-system   calico-kube-controllers-78d6f96c7b-kkszw   1/1     Running   0          3m33s
kube-system   calico-node-ghgz8                          1/1     Running   0          3m33s
kube-system   coredns-558bd4d5db-2shqz                   1/1     Running   0          4m7s
kube-system   coredns-558bd4d5db-t9r75                   1/1     Running   0          4m7s
kube-system   etcd-akash-single.domainXYZ.com                           1/1     Running   0          4m26s
kube-system   kube-apiserver-akash-single.domainXYZ.com                 1/1     Running   0          4m24s
kube-system   kube-controller-manager-akash-single.domainXYZ.com        1/1     Running   0          4m23s
kube-system   kube-proxy-72ntn                           1/1     Running   0          4m7s
kube-system   kube-scheduler-akash-single.domainXYZ.com                 1/1     Running   0          4m21s

(Optional, almost) Install NodeLocal DNSCache

You do not have to install NodeLocal DNSCache if you are running an akash version with this patch https://github.com/arno01/akash/commit/5c81676bb8ad9780571ff8e4f41e54565eea31fd

PR https://github.com/ovrclk/akash/pull/1440
Issue https://github.com/ovrclk/akash/issues/1339#issuecomment-889293170

Use NodeLocal DNSCache for better performance,
https://kubernetes.io/docs/tasks/administer-cluster/nodelocaldns/

NodeLocal DNSCache service installation is simple:

kubedns=$(kubectl get svc kube-dns -n kube-system -o 'jsonpath={.spec.clusterIP}')
domain="cluster.local"
localdns="169.254.25.10"

wget https://raw.githubusercontent.com/kubernetes/kubernetes/v1.22.1/cluster/addons/dns/nodelocaldns/nodelocaldns.yaml

sed -i "s/__PILLAR__LOCAL__DNS__/$localdns/g; s/__PILLAR__DNS__DOMAIN__/$domain/g; s/__PILLAR__DNS__SERVER__/$kubedns/g" nodelocaldns.yaml

kubectl create -f nodelocaldns.yaml

Now you need to tell kubelet about the new NodeLocal DNSCache service. You have to do the following on each node of your Kubernetes cluster:

Modify clusterDNS in /var/lib/kubelet/config.yaml to use 169.254.25.10 (NodeLocal DNSCache) instead of the default 10.233.0.10, and then restart the kubelet service:
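For example, the same one-liner that is used later in the scaling section of this guide does the job (it assumes the default 10.233.0.10 clusterDNS from the kubeadm config above):

sed -i 's/10.233.0.10/169.254.25.10/g' /var/lib/kubelet/config.yaml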

systemctl restart kubelet

To make sure you are using NodeLocal DNSCache, you can create a pod and check that the nameserver inside it is 169.254.25.10:

/ # cat /etc/resolv.conf 
search default.svc.cluster.local svc.cluster.local cluster.local
nameserver 169.254.25.10
options ndots:5
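For instance, reusing the dnsutils pod that is defined further below in the gVisor/DNS test section:

kubectl exec -i -t dnsutils -- cat /etc/resolv.conf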

Optional: IPVS mode.

NOTE: cross-service communication (container X to service Y within the same POD) does not work in IPVS mode due to this line https://github.com/ovrclk/akash/blob/7c39ea403/provider/cluster/kube/builder.go#L599 in the "akash-deployment-restrictions" network policy. There might be another way to make it work though; one can try the kubespray deployment with the kube_proxy_mode toggle enabled and see if it works that way.

https://www.linkedin.com/pulse/iptables-vs-ipvs-kubernetes-vivek-grover/
https://forum.akash.network/t/akash-provider-support-ipvs-kube-proxy-mode/720

If you want to run kube-proxy in IPVS mode one day (instead of the default iptables mode), you would need to repeat the steps from the "Install NodeLocal DNSCache" section above, except that you modify the nodelocaldns.yaml file using the following command instead:

sed -i "s/__PILLAR__LOCAL__DNS__/$localdns/g; s/__PILLAR__DNS__DOMAIN__/$domain/g; s/,__PILLAR__DNS__SERVER__//g; s/__PILLAR__CLUSTER__DNS__/$kubedns/g" nodelocaldns.yaml

Switch kube-proxy to the IPVS mode by setting mode: to "ipvs"

kubectl edit configmap kube-proxy -n kube-system

And restart the kube-proxy:

kubectl -n kube-system delete pod -l k8s-app=kube-proxy
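To confirm kube-proxy actually came back in IPVS mode, you can check its logs for the IPVS proxier and, if ipvsadm is installed on the node, list the virtual service table:

kubectl -n kube-system logs -l k8s-app=kube-proxy | grep -i ipvs
ipvsadm -Ln | head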

Configure Kubernetes to use gVisor

Set up the gvisor (runsc) Kubernetes RuntimeClass.
The deployments created by Akash Provider will use it by default.

cat <<'EOF' | kubectl apply -f -
apiVersion: node.k8s.io/v1
kind: RuntimeClass
metadata:
  name: gvisor
handler: runsc
EOF
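A quick check that the RuntimeClass got registered:

kubectl get runtimeclass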

Check your gVisor and K8s DNS are working as expected

cat > dnstest.yaml << 'EOF'
apiVersion: v1
kind: Pod
metadata:
  name: dnsutils
  namespace: default
spec:
  runtimeClassName: gvisor
  containers:
  - name: dnsutils
    image: gcr.io/kubernetes-e2e-test-images/dnsutils:1.3
    command:
      - sleep
      - "3600"
    imagePullPolicy: IfNotPresent
  restartPolicy: Always
EOF

# kubectl apply -f dnstest.yaml

# kubectl exec -i -t dnsutils -- sh
/ # dmesg 
[   0.000000] Starting gVisor...
[   0.459332] Reticulating splines...
[   0.868906] Synthesizing system calls...
[   1.330219] Adversarially training Redcode AI...
[   1.465972] Waiting for children...
[   1.887919] Generating random numbers by fair dice roll...
[   2.302806] Accelerating teletypewriter to 9600 baud...
[   2.729885] Checking naughty and nice process list...
[   2.999002] Granting licence to kill(2)...
[   3.116179] Checking naughty and nice process list...
[   3.451080] Creating process schedule...
[   3.658232] Ready!
/ # ip a
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
    inet 127.0.0.1/8 scope global dynamic 
    inet6 ::1/128 scope global dynamic 
2: eth0: <UP,LOWER_UP> mtu 1480 
    link/ether 9e:f1:a0:ee:8a:55 brd ff:ff:ff:ff:ff:ff
    inet 10.233.85.133/32 scope global dynamic 
    inet6 fe80::9cf1:a0ff:feee:8a55/64 scope global dynamic 
/ # ip r
127.0.0.0/8 dev lo 
::1 dev lo 
169.254.1.1 dev eth0 
fe80::/64 dev eth0 
default via 169.254.1.1 dev eth0 
/ # netstat -nr
Kernel IP routing table
Destination     Gateway         Genmask         Flags   MSS Window  irtt Iface
169.254.1.1     0.0.0.0         255.255.255.255 U         0 0          0 eth0
0.0.0.0         169.254.1.1     0.0.0.0         UG        0 0          0 eth0
/ # cat /etc/resolv.conf 
search default.svc.cluster.local svc.cluster.local cluster.local
nameserver 10.233.0.10
options ndots:5
/ # ping 8.8.8.8
PING 8.8.8.8 (8.8.8.8): 56 data bytes
64 bytes from 8.8.8.8: seq=0 ttl=42 time=5.671 ms
^C
--- 8.8.8.8 ping statistics ---
1 packets transmitted, 1 packets received, 0% packet loss
round-trip min/avg/max = 5.671/5.671/5.671 ms
/ # ping google.com
PING google.com (172.217.13.174): 56 data bytes
64 bytes from 172.217.13.174: seq=0 ttl=42 time=85.075 ms
^C
--- google.com ping statistics ---
1 packets transmitted, 1 packets received, 0% packet loss
round-trip min/avg/max = 85.075/85.075/85.075 ms
/ # nslookup kubernetes.default.svc.cluster.local
Server:		10.233.0.10
Address:	10.233.0.10#53

Name:	kubernetes.default.svc.cluster.local
Address: 10.233.0.1

/ # exit

# kubectl delete -f dnstest.yaml

If you see "Starting gVisor..." that means Kubernetes is able to run the containers using the gVisor (runsc).

You are going to see 169.254.25.10 instead of 10.233.0.10 nameserver if you are using NodeLocal DNSCache.
The network test won't work (i.e. ping 8.8.8.8 will fail) once you apply network-policy-default-ns-deny.yaml. This is expected.

(Optional) Encrypt etcd

etcd is a consistent and highly-available key value store used as Kubernetes' backing store for all cluster data.
Kubernetes uses etcd to store all its data – its configuration data, its state, and its metadata. Kubernetes is a distributed system, so it needs a distributed data store like etcd. etcd lets any of the nodes in the Kubernetes cluster read and write data.

⚠️ Storing the raw encryption key in the EncryptionConfig only moderately improves your security posture, compared to no encryption. Please use kms provider for additional security.

️️⚠️ Make sure you have the same ENCRYPTION_KEY across all control plane nodes! (the ones running kube-apiserver). Just copy /etc/kubernetes/encrypt/config.yaml file across them.

# Run this only once and remember the value of that key!
ENCRYPTION_KEY=$(head -c 32 /dev/urandom | base64)

mkdir /etc/kubernetes/encrypt
cat > /etc/kubernetes/encrypt/config.yaml <<EOF
apiVersion: apiserver.config.k8s.io/v1
kind: EncryptionConfiguration
resources:
  - resources:
      - secrets
    providers:
      - aescbc:
          keys:
            - name: key1
              secret: ${ENCRYPTION_KEY}
      - identity: {}
EOF

Update your /etc/kubernetes/manifests/kube-apiserver.yaml in the following way so kube-apiserver knows where to read the secret from:

# vim /etc/kubernetes/manifests/kube-apiserver.yaml
# diff -Nur kube-apiserver.yaml.orig /etc/kubernetes/manifests/kube-apiserver.yaml
--- kube-apiserver.yaml.orig	2021-07-28 10:05:38.198391788 +0000
+++ /etc/kubernetes/manifests/kube-apiserver.yaml	2021-07-28 10:13:51.975308872 +0000
@@ -41,6 +41,7 @@
     - --service-cluster-ip-range=10.233.0.0/18
     - --tls-cert-file=/etc/kubernetes/pki/apiserver.crt
     - --tls-private-key-file=/etc/kubernetes/pki/apiserver.key
+    - --encryption-provider-config=/etc/kubernetes/encrypt/config.yaml
     image: k8s.gcr.io/kube-apiserver:v1.22.1
     imagePullPolicy: IfNotPresent
     livenessProbe:
@@ -95,6 +96,9 @@
     - mountPath: /usr/share/ca-certificates
       name: usr-share-ca-certificates
       readOnly: true
+    - mountPath: /etc/kubernetes/encrypt
+      name: k8s-encrypt
+      readOnly: true
   hostNetwork: true
   priorityClassName: system-node-critical
   volumes:
@@ -122,4 +126,8 @@
       path: /usr/share/ca-certificates
       type: DirectoryOrCreate
     name: usr-share-ca-certificates
+  - hostPath:
+      path: /etc/kubernetes/encrypt
+      type: DirectoryOrCreate
+    name: k8s-encrypt
 status: {}

kube-apiserver will automatically restart when you save /etc/kubernetes/manifests/kube-apiserver.yaml file. (This can take a minute or two, be patient.)

# crictl ps | grep apiserver
10e6f4b409a4b       106ff58d43082       36 seconds ago       Running             kube-apiserver            0                   754932bb659c5

Don't forget to do the same across all your Kubernetes nodes!

Encrypt all secrets using the encryption key you have just added:

kubectl get secrets -A -o json | kubectl replace -f -
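To verify the encryption is effective, you can create a test secret and read it straight from etcd; the value should show up with the k8s:enc:aescbc:v1:key1 prefix rather than in plaintext. This assumes etcdctl is available on the control-plane node and the default kubeadm etcd certificate paths:

kubectl create secret generic secret-enc-test --from-literal=mykey=mydata

ETCDCTL_API=3 etcdctl \
  --cacert=/etc/kubernetes/pki/etcd/ca.crt \
  --cert=/etc/kubernetes/pki/etcd/server.crt \
  --key=/etc/kubernetes/pki/etcd/server.key \
  get /registry/secrets/default/secret-enc-test | hexdump -C | head

kubectl delete secret secret-enc-test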

(Optional) IPv6 support

If you wish to enable IPv6 support in your Kubernetes cluster, then please refer to this page https://kubernetes.io/docs/setup/production-environment/tools/kubeadm/dual-stack-support/

Configure Kubernetes for the Akash Provider service

If you are updating Akash provider from 0.12 to 0.14, please make sure to follow these steps https://github.com/ovrclk/akash/blob/9e1a7aa5ccc894e89d84d38485b458627a287bae/script/provider_migrate_to_hostname_operator.md
mkdir akash-provider
cd akash-provider

wget https://raw.githubusercontent.com/ovrclk/akash/mainnet/main/pkg/apis/akash.network/v1/crd.yaml
kubectl apply -f ./crd.yaml

wget https://raw.githubusercontent.com/ovrclk/akash/mainnet/main/pkg/apis/akash.network/v1/provider_hosts_crd.yaml
kubectl apply -f ./provider_hosts_crd.yaml

wget https://raw.githubusercontent.com/ovrclk/akash/mainnet/main/_docs/kustomize/networking/network-policy-default-ns-deny.yaml
kubectl apply -f ./network-policy-default-ns-deny.yaml

wget https://raw.githubusercontent.com/ovrclk/akash/mainnet/main/_run/ingress-nginx-class.yaml
kubectl apply -f ./ingress-nginx-class.yaml

wget https://raw.githubusercontent.com/ovrclk/akash/mainnet/main/_run/ingress-nginx.yaml
kubectl apply -f ./ingress-nginx.yaml

# NOTE: in this example the Kubernetes node is called "akash-single.domainXYZ.com" and it's going to be the ingress node too.
# In the perfect environment that would not be the master (control-plane) node, but rather the worker nodes!

kubectl label nodes akash-single.domainXYZ.com akash.network/role=ingress

# Check the label got applied:

# kubectl get nodes -o wide --show-labels 
NAME          STATUS   ROLES                  AGE   VERSION   INTERNAL-IP      EXTERNAL-IP   OS-IMAGE             KERNEL-VERSION     CONTAINER-RUNTIME    LABELS
akash-single.domainXYZ.com   Ready    control-plane,master   10m   v1.22.1   149.202.82.160   <none>        Ubuntu 20.04.2 LTS   5.4.0-80-generic   containerd://1.4.8   akash.network/role=ingress,beta.kubernetes.io/arch=amd64,beta.kubernetes.io/os=linux,kubernetes.io/arch=amd64,kubernetes.io/hostname=akash-single.domainXYZ.com,kubernetes.io/os=linux,node-role.kubernetes.io/control-plane=,node-role.kubernetes.io/master=,node.kubernetes.io/exclude-from-external-load-balancers=
git clone --depth 1 -b mainnet/main https://github.com/ovrclk/akash.git
cd akash
kubectl apply -f _docs/kustomize/networking/namespace.yaml
kubectl kustomize _docs/kustomize/akash-services/ | kubectl apply -f -

cat >> _docs/kustomize/akash-hostname-operator/kustomization.yaml <<'EOF'
images:
  - name: ghcr.io/ovrclk/akash:stable
    newName: ghcr.io/ovrclk/akash
    newTag: 0.14.1
EOF

kubectl kustomize _docs/kustomize/akash-hostname-operator | kubectl apply -f -

Get a wildcard DNS record

In my case I'm going to use <anything>.ingress.nixaid.com, which resolves to the IP of my Kubernetes node(s). Preferably only the worker nodes!

A *.ingress.nixaid.com resolves to 149.202.82.160

And akash-provider.nixaid.com is going to resolve to the IP of the Akash Provider service itself that I'm going to be running. (The Akash Provider service listens on port 8443/tcp.)

Pro-tip: you can register the same DNS A wildcard record multiple times, pointing to multiple Akash worker nodes so it will get DNS round-robin balancing out-of-a-box! ;)

Creating the Akash Provider on the Akash Blockchain

Now that we've got our Kubernetes configured, up & running, it's time to get the Akash Provider running.

NOTE: You don't have to run Akash Provider service on your Kubernetes cluster directly. You can run it anywhere. It only needs to be able to access your Kubernetes cluster over the internet.

Create Akash user

We are going to be running akash provider under the akash user.

useradd akash -m -U -s /usr/sbin/nologin
mkdir /home/akash/.kube
cp /etc/kubernetes/admin.conf /home/akash/.kube/config
chown -Rh akash:akash /home/akash/.kube

Install Akash client

su -s /bin/bash - akash

wget https://github.com/ovrclk/akash/releases/download/v0.14.1/akash_0.14.1_linux_amd64.zip
unzip akash_0.14.1_linux_amd64.zip
mv /home/akash/akash_0.14.1_linux_amd64/akash /usr/bin/
chown root:root /usr/bin/akash

Configure Akash client

su -s /bin/bash - akash

mkdir ~/.akash

export KUBECONFIG=/home/akash/.kube/config
export PROVIDER_ADDRESS=akash-provider.nixaid.com
export AKASH_NET="https://raw.githubusercontent.com/ovrclk/net/master/mainnet"
export AKASH_NODE="$(curl -s "$AKASH_NET/rpc-nodes.txt" | shuf -n 1)"
export AKASH_CHAIN_ID="$(curl -s "$AKASH_NET/chain-id.txt")"
export AKASH_KEYRING_BACKEND=file
export AKASH_PROVIDER_KEY=default
export AKASH_FROM=$AKASH_PROVIDER_KEY

Check the variables:

$ set |grep ^AKASH
AKASH_CHAIN_ID=akashnet-2
AKASH_FROM=default
AKASH_KEYRING_BACKEND=file
AKASH_NET=https://raw.githubusercontent.com/ovrclk/net/master/mainnet
AKASH_NODE=http://135.181.181.120:28957
AKASH_PROVIDER_KEY=default

Now create the default key:

$ akash keys add $AKASH_PROVIDER_KEY --keyring-backend=$AKASH_KEYRING_BACKEND

Enter keyring passphrase:
Re-enter keyring passphrase:

- name: default
  type: local
  address: akash1nxq8gmsw2vlz3m68qvyvcf3kh6q269ajvqw6y0
...

Make sure to keep your mnemonic seed somewhere safe as it's the only way to recover your account and funds on it!

If you want to restore your key from your mnemonic seed, add the --recover flag to the akash keys add ... command.
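For example, restoring the same key from its mnemonic would look like this:

akash keys add $AKASH_PROVIDER_KEY --recover --keyring-backend=$AKASH_KEYRING_BACKEND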

Configure Akash provider

$ cat provider.yaml 
host: https://akash-provider.nixaid.com:8443
attributes:
  - key: region
    value: europe  ## change this to your region!
  - key: host
    value: akash   ## feel free to change this to whatever you like
  - key: organization  # optional
    value: whatever-your-Org-is  ## change this to your org.
  - key: tier          # optional
    value: community

Fund your Akash provider's wallet

You will need about 10 AKT (Akash Token) to get you started.

Your wallet must have sufficient funding, as placing a bid on an order on the blockchain requires a 5 AKT deposit. This deposit is fully refunded after the bid is won/lost.

Purchase AKT at one of the exchanges mentioned here https://akash.network/token/

To query the balance of your wallet:

# Put your address here (the one you got when you created the key with the "akash keys add" command).
export AKASH_ACCOUNT_ADDRESS=akash1nxq8gmsw2vlz3m68qvyvcf3kh6q269ajvqw6y0

$ akash \
  --node "$AKASH_NODE" \
  query bank balances "$AKASH_ACCOUNT_ADDRESS"
Denomination: 1 akt = 1000000 uakt (akt*10^6)

Register your provider on the Akash Network

$ akash tx provider create provider.yaml \
  --from $AKASH_PROVIDER_KEY \
  --keyring-backend=$AKASH_KEYRING_BACKEND \
  --node=$AKASH_NODE \
  --chain-id=$AKASH_CHAIN_ID \
  --gas-prices="0.025uakt" \
  --gas="auto" \
  --gas-adjustment=1.15
If you want to change the parameters of your provider.yaml then use akash tx provider update command with the same arguments.

After registering your provider on the Akash Network, I was able to see my host there:

$ akash \
  --node "$AKASH_NODE" \
  query provider list -o json | jq -r '.providers[] | [ .attributes[].value, .host_uri, .owner ] | @csv' | sort -d
"australia-east-akash-provider","https://provider.akashprovider.com","akash1ykxzzu332txz8zsfew7z77wgsdyde75wgugntn"
"equinix-metal-ams1","akash","mn2-0","https://provider.ams1p0.mainnet.akashian.io:8443","akash1ccktptfkvdc67msasmesuy5m7gpc76z75kukpz"
"equinix-metal-ewr1","akash","mn2-0","https://provider.ewr1p0.mainnet.akashian.io:8443","akash1f6gmtjpx4r8qda9nxjwq26fp5mcjyqmaq5m6j7"
"equinix-metal-sjc1","akash","mn2-0","https://provider.sjc1p0.mainnet.akashian.io:8443","akash10cl5rm0cqnpj45knzakpa4cnvn5amzwp4lhcal"
"equinix-metal-sjc1","akash","mn2-0","https://provider.sjc1p1.mainnet.akashian.io:8443","akash1cvpefa7pw8qy0u4euv497r66mvgyrg30zv0wu0"
"europe","nixaid","https://akash-provider.nixaid.com:8443","akash1nxq8gmsw2vlz3m68qvyvcf3kh6q269ajvqw6y0"
"us-west-demo-akhil","dehazelabs","https://73.157.111.139:8443","akash1rt2qk45a75tjxzathkuuz6sq90jthekehnz45z"
"us-west-demo-caleb","https://provider.akashian.io","akash1rdyul52yc42vd8vhguc0t9ryug9ftf2zut8jxa"
"us-west-demo-daniel","https://daniel1q84.iptime.org","akash14jpkk4n5daspcjdzsrylgw38lj9xug2nznqnu2"
"us-west","https://ssymbiotik.ekipi365.com","akash1j862g3efcw5xcvn0402uwygrwlzfg5r02w9jw5"

Create the provider certificate

You must issue a transaction to the blockchain to create a certificate associated with your provider:

akash tx cert create server $PROVIDER_ADDRESS \
  --chain-id $AKASH_CHAIN_ID \
  --keyring-backend $AKASH_KEYRING_BACKEND \
  --from $AKASH_PROVIDER_KEY \
  --node=$AKASH_NODE \
  --gas-prices="0.025uakt" --gas="auto" --gas-adjustment=1.15

Starting the Akash Provider

The Akash provider will need the Kubernetes admin config. We have already copied it to /home/akash/.kube/config earlier.

Create the start-provider.sh file, which will start the Akash Provider.
But first, create the key-pass.txt file with the password you set when creating the provider's key.

echo "Your-passWoRd" | tee /home/akash/key-pass.txt
Make sure to set --cluster-public-hostname to the hostname that resolves to the public IP of the Kubernetes cluster. This is the same hostname you set controlPlaneEndpoint to earlier.
cat > /home/akash/start-provider.sh << 'EOF'
#!/usr/bin/env bash

export AKASH_NET="https://raw.githubusercontent.com/ovrclk/net/master/mainnet"
export AKASH_NODE="$(curl -s "$AKASH_NET/rpc-nodes.txt" | shuf -n 1)"

cd /home/akash
( sleep 2s; cat key-pass.txt; cat key-pass.txt ) | \
  /usr/bin/akash provider run \
  --chain-id akashnet-2 \
  --node $AKASH_NODE \
  --keyring-backend=file \
  --from default \
  --fees 5000uakt \
  --kubeconfig /home/akash/.kube/config \
  --cluster-k8s true \
  --deployment-ingress-domain ingress.nixaid.com \
  --deployment-ingress-static-hosts true \
  --bid-price-strategy scale \
  --bid-price-cpu-scale 0.0011 \
  --bid-price-memory-scale 0.0002 \
  --bid-price-storage-scale 0.00009 \
  --bid-price-endpoint-scale 0 \
  --bid-deposit 5000000uakt \
  --balance-check-period 24h \
  --minimum-balance 5000000 \
  --cluster-node-port-quantity 1000 \
  --cluster-public-hostname akash-master-lb.domainXYZ.com \
  --bid-timeout 10m0s \
  --withdrawal-period 24h0m0s \
  --log_level warn
EOF

Make sure it's executable:

chmod +x /home/akash/start-provider.sh

Create akash-provider.service systemd service so Akash provider starts automatically:

cat > /etc/systemd/system/akash-provider.service << 'EOF'
[Unit]
Description=Akash Provider
After=network.target

[Service]
User=akash
Group=akash
ExecStart=/home/akash/start-provider.sh
KillSignal=SIGINT
Restart=on-failure
RestartSec=15
StartLimitInterval=200
StartLimitBurst=10
#LimitNOFILE=65535

[Install]
WantedBy=multi-user.target
EOF

Start the Akash provider:

systemctl daemon-reload
systemctl start akash-provider
systemctl enable akash-provider

Check the logs:

journalctl -u akash-provider --since '5 min ago' -f
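Once it is up, the provider's public gateway should also answer over 8443; a quick probe (the /status path is assumed here, as commonly used by Akash provider setups):

curl -ks https://akash-provider.nixaid.com:8443/status | jq .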

Akash detects the node as follows:

D[2021-06-29|11:33:34.190] node resources                               module=provider-cluster cmp=service cmp=inventory-service node-id=akash-single.domainXYZ.com available-cpu="units:<val:\"7050\" > attributes:<key:\"arch\" value:\"amd64\" > " available-memory="quantity:<val:\"32896909312\" > " available-storage="quantity:<val:\"47409223602\" > "

cpu units: 7050 / 1000 = 7 CPU (the server actually has 8 CPUs; it must have reserved 1 CPU for whatever the provider node is running, which is a smart thing)
available memory: 32896909312 / (1024^3) = 30.63Gi (the server has 32Gi RAM)
available storage: 47409223602 / (1024^3) = 44.15Gi (this one is a bit odd, as I've got just 32Gi available on the rootfs "/")

Deploying on our own Akash provider

In order to get your Akash client configured on your client side, please refer to the first 4 steps in https://nixaid.com/solana-on-akashnet/ or https://docs.akash.network/guides/deploy

Now that we have our own Akash Provider running, let's try to deploy something on it.
I'll deploy the echoserver service, which returns interesting information to the client when queried over its HTTP/HTTPS port.

$ cat echoserver.yml
---
version: "2.0"

services:
  echoserver:
    image: gcr.io/google_containers/echoserver:1.10
    expose:
      - port: 8080
        as: 80
        to:
          - global: true
        #accept:
        #  - my.host123.com

profiles:
  compute:
    echoserver:
      resources:
        cpu:
          units: 0.1
        memory:
          size: 128Mi
        storage:
          size: 128Mi
  placement:
    akash:
      #attributes:
      #  host: nixaid
      #signedBy:
      #  anyOf:
      #    - "akash1365yvmc4s7awdyj3n2sav7xfx76adc6dnmlx63" ## AKASH
      pricing:
        echoserver: 
          denom: uakt
          amount: 100

deployment:
  echoserver:
    akash:
      profile: echoserver
      count: 1

Note that I've commented out the signedBy directive, which clients typically use to make sure they are deploying on a trusted provider. Leaving it commented out means you can deploy to any Akash provider, not necessarily a signed one.

You can use the akash tx audit attr create command for signing attributes on your Akash Provider if you wish your clients to use signedBy directive.

akash tx deployment create echoserver.yml \
  --from default \
  --node $AKASH_NODE \
  --chain-id $AKASH_CHAIN_ID \
  --gas-prices="0.025uakt" --gas="auto" --gas-adjustment=1.15
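The client-side commands below rely on a few variables. The dseq comes from the output of the akash tx deployment create transaction above; gseq and oseq are 1 for a simple deployment like this; the provider is the address we registered earlier. The values here are only an example matching the logs below:

export AKASH_DSEQ=1585829
export AKASH_GSEQ=1
export AKASH_OSEQ=1
export AKASH_PROVIDER=akash1nxq8gmsw2vlz3m68qvyvcf3kh6q269ajvqw6y0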

Now that the deployment has been announced to the Akash network, let's look at our Akash Provider's side.

Here is what a successful reservation looks like from the Akash provider's point of view:

Reservation fulfilled is what we are looking for.
Jun 30 00:00:46 akash1 start-provider.sh[1029866]: I[2021-06-30|00:00:46.122] syncing sequence                             cmp=client/broadcaster local=31 remote=31
Jun 30 00:00:53 akash1 start-provider.sh[1029866]: I[2021-06-30|00:00:53.837] order detected                               module=bidengine-service order=order/akash1h24fljt7p0nh82cq0za0uhsct3sfwsfu9w3c9h/1585829/1/1
Jun 30 00:00:53 akash1 start-provider.sh[1029866]: I[2021-06-30|00:00:53.867] group fetched                                module=bidengine-order order=akash1h24fljt7p0nh82cq0za0uhsct3sfwsfu9w3c9h/1585829/1/1
Jun 30 00:00:53 akash1 start-provider.sh[1029866]: I[2021-06-30|00:00:53.867] requesting reservation                       module=bidengine-order order=akash1h24fljt7p0nh82cq0za0uhsct3sfwsfu9w3c9h/1585829/1/1
Jun 30 00:00:53 akash1 start-provider.sh[1029866]: D[2021-06-30|00:00:53.868] reservation requested                        module=provider-cluster cmp=service cmp=inventory-service order=akash1h24fljt7p0nh82cq0za0uhsct3sfwsfu9w3c9h/1585829/1/1 resources="group_id:<owner:\"akash1h24fljt7p0nh82cq0za0uhsct3sfwsfu9w3c9h\" dseq:1585829 gseq:1 > state:open group_spec:<name:\"akash\" requirements:<signed_by:<> > resources:<resources:<cpu:<units:<val:\"100\" > > memory:<quantity:<val:\"134217728\" > > storage:<quantity:<val:\"134217728\" > > endpoints:<> > count:1 price:<denom:\"uakt\" amount:\"2000\" > > > created_at:1585832 "
Jun 30 00:00:53 akash1 start-provider.sh[1029866]: I[2021-06-30|00:00:53.868] Reservation fulfilled                        module=bidengine-order order=akash1h24fljt7p0nh82cq0za0uhsct3sfwsfu9w3c9h/1585829/1/1
Jun 30 00:00:53 akash1 start-provider.sh[1029866]: D[2021-06-30|00:00:53.868] submitting fulfillment                       module=bidengine-order order=akash1h24fljt7p0nh82cq0za0uhsct3sfwsfu9w3c9h/1585829/1/1 price=357uakt
Jun 30 00:00:53 akash1 start-provider.sh[1029866]: I[2021-06-30|00:00:53.932] broadcast response                           cmp=client/broadcaster response="Response:\n  TxHash: BDE0FE6CD12DB3B137482A0E93D4099D7C9F6A5ABAC597E17F6E94706B84CC9A\n  Raw Log: []\n  Logs: []" err=null
Jun 30 00:00:53 akash1 start-provider.sh[1029866]: I[2021-06-30|00:00:53.932] bid complete                                 module=bidengine-order order=akash1h24fljt7p0nh82cq0za0uhsct3sfwsfu9w3c9h/1585829/1/1
Jun 30 00:00:56 akash1 start-provider.sh[1029866]: I[2021-06-30|00:00:56.121] syncing sequence                             cmp=client/broadcaster local=32 remote=31

Now that the Akash provider has the reservation fulfilled, we should be able to see it as a bid (offer) on the client side:

$ akash query market bid list   --owner=$AKASH_ACCOUNT_ADDRESS   --node $AKASH_NODE   --dseq $AKASH_DSEQ   
...
- bid:
    bid_id:
      dseq: "1585829"
      gseq: 1
      oseq: 1
      owner: akash1h24fljt7p0nh82cq0za0uhsct3sfwsfu9w3c9h
      provider: akash1nxq8gmsw2vlz3m68qvyvcf3kh6q269ajvqw6y0
    created_at: "1585836"
    price:
      amount: "357"
      denom: uakt
    state: open
  escrow_account:
    balance:
      amount: "50000000"
      denom: uakt
    id:
      scope: bid
      xid: akash1h24fljt7p0nh82cq0za0uhsct3sfwsfu9w3c9h/1585829/1/1/akash1nxq8gmsw2vlz3m68qvyvcf3kh6q269ajvqw6y0
    owner: akash1nxq8gmsw2vlz3m68qvyvcf3kh6q269ajvqw6y0
    settled_at: "1585836"
    state: open
    transferred:
      amount: "0"
      denom: uakt
...

Let's create the leases now (accept the bid offered by the Akash Provider):

akash tx market lease create \
  --chain-id $AKASH_CHAIN_ID \
  --node $AKASH_NODE \
  --owner $AKASH_ACCOUNT_ADDRESS \
  --dseq $AKASH_DSEQ \
  --gseq $AKASH_GSEQ \
  --oseq $AKASH_OSEQ \
  --provider $AKASH_PROVIDER \
  --from default \
  --gas-prices="0.025uakt" --gas="auto" --gas-adjustment=1.15

Now we can see "lease won" on the provider's side:

Jun 30 00:03:42 akash1 start-provider.sh[1029866]: D[2021-06-30|00:03:42.479] ignoring group                               module=bidengine-order order=akash15yd3qszmqausvzpj7n0y0e4pft2cu9rt5gccda/1346631/1/1 group=akash1h24fljt7p0nh82cq0za0uhsct3sfwsfu9w3c9h/1585829/1
Jun 30 00:03:42 akash1 start-provider.sh[1029866]: I[2021-06-30|00:03:42.479] lease won                                    module=bidengine-order order=akash1h24fljt7p0nh82cq0za0uhsct3sfwsfu9w3c9h/1585829/1/1 lease=akash1h24fljt7p0nh82cq0za0uhsct3sfwsfu9w3c9h/1585829/1/1/akash1nxq8gmsw2vlz3m68qvyvcf3kh6q269ajvqw6y0
Jun 30 00:03:42 akash1 start-provider.sh[1029866]: I[2021-06-30|00:03:42.480] shutting down                                module=bidengine-order order=akash1h24fljt7p0nh82cq0za0uhsct3sfwsfu9w3c9h/1585829/1/1
Jun 30 00:03:42 akash1 start-provider.sh[1029866]: I[2021-06-30|00:03:42.480] lease won                                    module=provider-manifest lease=akash1h24fljt7p0nh82cq0za0uhsct3sfwsfu9w3c9h/1585829/1/1/akash1nxq8gmsw2vlz3m68qvyvcf3kh6q269ajvqw6y0
Jun 30 00:03:42 akash1 start-provider.sh[1029866]: I[2021-06-30|00:03:42.480] new lease                                    module=manifest-manager deployment=akash1h24fljt7p0nh82cq0za0uhsct3sfwsfu9w3c9h/1585829 lease=akash1h24fljt7p0nh82cq0za0uhsct3sfwsfu9w3c9h/1585829/1/1/akash1nxq8gmsw2vlz3m68qvyvcf3kh6q269ajvqw6y0
Jun 30 00:03:42 akash1 start-provider.sh[1029866]: D[2021-06-30|00:03:42.480] emit received events skipped                 module=manifest-manager deployment=akash1h24fljt7p0nh82cq0za0uhsct3sfwsfu9w3c9h/1585829 data=<nil> leases=1 manifests=0
Jun 30 00:03:42 akash1 start-provider.sh[1029866]: I[2021-06-30|00:03:42.520] data received                                module=manifest-manager deployment=akash1h24fljt7p0nh82cq0za0uhsct3sfwsfu9w3c9h/1585829 version=77fd690d5e5ec8c320a902da09a59b48dc9abd0259d84f9789fee371941320e7
Jun 30 00:03:42 akash1 start-provider.sh[1029866]: D[2021-06-30|00:03:42.520] emit received events skipped                 module=manifest-manager deployment=akash1h24fljt7p0nh82cq0za0uhsct3sfwsfu9w3c9h/1585829 data="deployment:<deployment_id:<owner:\"akash1h24fljt7p0nh82cq0za0uhsct3sfwsfu9w3c9h\" dseq:1585829 > state:active version:\"w\\375i\\r^^\\310\\303 \\251\\002\\332\\t\\245\\233H\\334\\232\\275\\002Y\\330O\\227\\211\\376\\343q\\224\\023 \\347\" created_at:1585832 > groups:<group_id:<owner:\"akash1h24fljt7p0nh82cq0za0uhsct3sfwsfu9w3c9h\" dseq:1585829 gseq:1 > state:open group_spec:<name:\"akash\" requirements:<signed_by:<> > resources:<resources:<cpu:<units:<val:\"100\" > > memory:<quantity:<val:\"134217728\" > > storage:<quantity:<val:\"134217728\" > > endpoints:<> > count:1 price:<denom:\"uakt\" amount:\"2000\" > > > created_at:1585832 > escrow_account:<id:<scope:\"deployment\" xid:\"akash1h24fljt7p0nh82cq0za0uhsct3sfwsfu9w3c9h/1585829\" > owner:\"akash1h24fljt7p0nh82cq0za0uhsct3sfwsfu9w3c9h\" state:open balance:<denom:\"uakt\" amount:\"5000000\" > transferred:<denom:\"uakt\" amount:\"0\" > settled_at:1585859 > " leases=1 manifests=0

Send the manifest to finally deploy the echoserver service on your Akash Provider!

akash provider send-manifest echoserver.yml \
  --node $AKASH_NODE \
  --dseq $AKASH_DSEQ \
  --provider $AKASH_PROVIDER \
  --from default

The provider has received the manifest ("manifest received"), and the kube-builder module has "created service" under the c9mdnf8o961odir96rdcflt9id95rq2a2qesidpjuqd76 namespace:

Jun 30 00:06:16 akash1 start-provider.sh[1029866]: I[2021-06-30|00:06:16.122] syncing sequence                             cmp=client/broadcaster local=32 remote=32
Jun 30 00:06:21 akash1 start-provider.sh[1029866]: D[2021-06-30|00:06:21.413] inventory fetched                            module=provider-cluster cmp=service cmp=inventory-service nodes=1
Jun 30 00:06:21 akash1 start-provider.sh[1029866]: D[2021-06-30|00:06:21.413] node resources                               module=provider-cluster cmp=service cmp=inventory-service node-id=akash-single.domainXYZ.com available-cpu="units:<val:\"7050\" > attributes:<key:\"arch\" value:\"amd64\" > " available-memory="quantity:<val:\"32896909312\" > " available-storage="quantity:<val:\"47409223602\" > "
Jun 30 00:06:26 akash1 start-provider.sh[1029866]: I[2021-06-30|00:06:26.122] syncing sequence                             cmp=client/broadcaster local=32 remote=32
Jun 30 00:06:35 akash1 start-provider.sh[1029866]: I[2021-06-30|00:06:35.852] manifest received                            module=manifest-manager deployment=akash1h24fljt7p0nh82cq0za0uhsct3sfwsfu9w3c9h/1585829
Jun 30 00:06:35 akash1 start-provider.sh[1029866]: D[2021-06-30|00:06:35.852] requests valid                               module=manifest-manager deployment=akash1h24fljt7p0nh82cq0za0uhsct3sfwsfu9w3c9h/1585829 num-requests=1
Jun 30 00:06:35 akash1 start-provider.sh[1029866]: D[2021-06-30|00:06:35.853] publishing manifest received                 module=manifest-manager deployment=akash1h24fljt7p0nh82cq0za0uhsct3sfwsfu9w3c9h/1585829 num-leases=1
Jun 30 00:06:35 akash1 start-provider.sh[1029866]: D[2021-06-30|00:06:35.853] publishing manifest received for lease       module=manifest-manager deployment=akash1h24fljt7p0nh82cq0za0uhsct3sfwsfu9w3c9h/1585829 lease_id=akash1h24fljt7p0nh82cq0za0uhsct3sfwsfu9w3c9h/1585829/1/1/akash1nxq8gmsw2vlz3m68qvyvcf3kh6q269ajvqw6y0
Jun 30 00:06:35 akash1 start-provider.sh[1029866]: I[2021-06-30|00:06:35.853] manifest received                            module=provider-cluster cmp=service
Jun 30 00:06:36 akash1 start-provider.sh[1029866]: D[2021-06-30|00:06:36.023] provider/cluster/kube/builder: created service module=kube-builder service="&Service{ObjectMeta:{echoserver      0 0001-01-01 00:00:00 +0000 UTC <nil> <nil> map[akash.network:true akash.network/manifest-service:echoserver akash.network/namespace:c9mdnf8o961odir96rdcflt9id95rq2a2qesidpjuqd76] map[] [] []  []},Spec:ServiceSpec{Ports:[]ServicePort{ServicePort{Name:0-80,Protocol:TCP,Port:80,TargetPort:{0 8080 },NodePort:0,AppProtocol:nil,},},Selector:map[string]string{akash.network: true,akash.network/manifest-service: echoserver,akash.network/namespace: c9mdnf8o961odir96rdcflt9id95rq2a2qesidpjuqd76,},ClusterIP:,Type:ClusterIP,ExternalIPs:[],SessionAffinity:,LoadBalancerIP:,LoadBalancerSourceRanges:[],ExternalName:,ExternalTrafficPolicy:,HealthCheckNodePort:0,PublishNotReadyAddresses:false,SessionAffinityConfig:nil,IPFamily:nil,TopologyKeys:[],},Status:ServiceStatus{LoadBalancer:LoadBalancerStatus{Ingress:[]LoadBalancerIngress{},},},}"
Jun 30 00:06:36 akash1 start-provider.sh[1029866]: I[2021-06-30|00:06:36.121] syncing sequence                             cmp=client/broadcaster local=32 remote=32
Jun 30 00:06:36 akash1 start-provider.sh[1029866]: D[2021-06-30|00:06:36.157] provider/cluster/kube/builder: created rules module=kube-builder rules="[{Host:623n1u4k2hbiv6f1kuiscparqk.ingress.nixaid.com IngressRuleValue:{HTTP:&HTTPIngressRuleValue{Paths:[]HTTPIngressPath{HTTPIngressPath{Path:/,Backend:IngressBackend{Resource:nil,Service:&IngressServiceBackend{Name:echoserver,Port:ServiceBackendPort{Name:,Number:80,},},},PathType:*Prefix,},},}}}]"
Jun 30 00:06:36 akash1 start-provider.sh[1029866]: D[2021-06-30|00:06:36.222] deploy complete                              module=provider-cluster cmp=service cmp=deployment-manager lease=akash1h24fljt7p0nh82cq0za0uhsct3sfwsfu9w3c9h/1585829/1/1/akash1nxq8gmsw2vlz3m68qvyvcf3kh6q269ajvqw6y0 manifest-group=akash

Let's see the lease status from the client side:

akash provider lease-status \
  --node $AKASH_NODE \
  --dseq $AKASH_DSEQ \
  --provider $AKASH_PROVIDER \
  --from default

{
  "services": {
    "echoserver": {
      "name": "echoserver",
      "available": 1,
      "total": 1,
      "uris": [
        "623n1u4k2hbiv6f1kuiscparqk.ingress.nixaid.com"
      ],
      "observed_generation": 1,
      "replicas": 1,
      "updated_replicas": 1,
      "ready_replicas": 1,
      "available_replicas": 1
    }
  },
  "forwarded_ports": {}
}

We've got it!
Let's query it:

$ curl 623n1u4k2hbiv6f1kuiscparqk.ingress.nixaid.com


Hostname: echoserver-5c6f84887-6kh9p

Pod Information:
  -no pod information available-

Server values:
  server_version=nginx: 1.13.3 - lua: 10008

Request Information:
  client_address=10.233.85.136
  method=GET
  real path=/
  query=
  request_version=1.1
  request_scheme=http
  request_uri=http://623n1u4k2hbiv6f1kuiscparqk.ingress.nixaid.com:8080/

Request Headers:
  accept=*/*
  host=623n1u4k2hbiv6f1kuiscparqk.ingress.nixaid.com
  user-agent=curl/7.68.0
  x-forwarded-for=CLIENT_IP_REDACTED
  x-forwarded-host=623n1u4k2hbiv6f1kuiscparqk.ingress.nixaid.com
  x-forwarded-port=80
  x-forwarded-proto=http
  x-real-ip=CLIENT_IP_REDACTED
  x-request-id=8cdbcd7d0c4f42440669f7396e206cae
  x-scheme=http

Request Body:
  -no body in request-

Our deployment on our own Akash provider is working as expected! Hooray!

Let's see how our deployment actually looks from the Kubernetes point of view on our Akash Provider:

# kubectl get all -A -l akash.network=true
NAMESPACE                                       NAME                      READY   STATUS    RESTARTS   AGE
c9mdnf8o961odir96rdcflt9id95rq2a2qesidpjuqd76   pod/echoserver-5c6f84887-6kh9p   1/1     Running   0          2m37s

NAMESPACE                                       NAME          TYPE        CLUSTER-IP     EXTERNAL-IP   PORT(S)   AGE
c9mdnf8o961odir96rdcflt9id95rq2a2qesidpjuqd76   service/echoserver   ClusterIP   10.233.47.15   <none>        80/TCP    2m37s

NAMESPACE                                       NAME                  READY   UP-TO-DATE   AVAILABLE   AGE
c9mdnf8o961odir96rdcflt9id95rq2a2qesidpjuqd76   deployment.apps/echoserver   1/1     1            1           2m38s

NAMESPACE                                       NAME                            DESIRED   CURRENT   READY   AGE
c9mdnf8o961odir96rdcflt9id95rq2a2qesidpjuqd76   replicaset.apps/echoserver-5c6f84887   1         1         1       2m37s

# kubectl get ing -A 
NAMESPACE                                       NAME   CLASS    HOSTS                                                  ADDRESS     PORTS   AGE
c9mdnf8o961odir96rdcflt9id95rq2a2qesidpjuqd76   echoserver    <none>   623n1u4k2hbiv6f1kuiscparqk.ingress.nixaid.com   localhost   80      8m47s

# kubectl -n c9mdnf8o961odir96rdcflt9id95rq2a2qesidpjuqd76 describe ing echoserver
Name:             echoserver
Namespace:        c9mdnf8o961odir96rdcflt9id95rq2a2qesidpjuqd76
Address:          localhost
Default backend:  default-http-backend:80 (<error: endpoints "default-http-backend" not found>)
Rules:
  Host                                                  Path  Backends
  ----                                                  ----  --------
  623n1u4k2hbiv6f1kuiscparqk.ingress.nixaid.com  
                                                        /   echoserver:80 (10.233.85.137:8080)
Annotations:                                            <none>
Events:
  Type    Reason  Age                  From                      Message
  ----    ------  ----                 ----                      -------
  Normal  Sync    8m9s (x2 over 9m5s)  nginx-ingress-controller  Scheduled for sync


# crictl pods
POD ID              CREATED             STATE               NAME                                        NAMESPACE                                       ATTEMPT             RUNTIME
4c22dba05a2c0       5 minutes ago       Ready               echoserver-5c6f84887-6kh9p                         c9mdnf8o961odir96rdcflt9id95rq2a2qesidpjuqd76   0                   runsc
...

The client can read their deployment's logs too:

akash \
  --node "$AKASH_NODE" \
  provider lease-logs \
  --dseq "$AKASH_DSEQ" \
  --gseq "$AKASH_GSEQ" \
  --oseq "$AKASH_OSEQ" \
  --provider "$AKASH_PROVIDER" \
  --from default \
  --follow

[echoserver-5c6f84887-6kh9p] Generating self-signed cert
[echoserver-5c6f84887-6kh9p] Generating a 2048 bit RSA private key
[echoserver-5c6f84887-6kh9p] ..............................+++
[echoserver-5c6f84887-6kh9p] ...............................................................................................................................................+++
[echoserver-5c6f84887-6kh9p] writing new private key to '/certs/privateKey.key'
[echoserver-5c6f84887-6kh9p] -----
[echoserver-5c6f84887-6kh9p] Starting nginx
[echoserver-5c6f84887-6kh9p] 10.233.85.136 - - [30/Jun/2021:00:08:00 +0000] "GET / HTTP/1.1" 200 744 "-" "curl/7.68.0"
[echoserver-5c6f84887-6kh9p] 10.233.85.136 - - [30/Jun/2021:00:27:10 +0000] "GET / HTTP/1.1" 200 744 "-" "curl/7.68.0"

After we are done testing, it's time to close the deployment:

akash tx deployment close \
  --node $AKASH_NODE \
  --chain-id $AKASH_CHAIN_ID \
  --dseq $AKASH_DSEQ \
  --owner $AKASH_ACCOUNT_ADDRESS \
  --from default \
  --gas-prices="0.025uakt" --gas="auto" --gas-adjustment=1.15

The provider sees it as expected: "deployment closed", "teardown request", and so on:

Jun 30 00:28:44 akash1 start-provider.sh[1029866]: I[2021-06-30|00:28:44.828] deployment closed                            module=provider-manifest deployment=akash1h24fljt7p0nh82cq0za0uhsct3sfwsfu9w3c9h/1585829
Jun 30 00:28:44 akash1 start-provider.sh[1029866]: I[2021-06-30|00:28:44.828] manager done                                 module=provider-manifest deployment=akash1h24fljt7p0nh82cq0za0uhsct3sfwsfu9w3c9h/1585829
Jun 30 00:28:44 akash1 start-provider.sh[1029866]: D[2021-06-30|00:28:44.829] teardown request                             module=provider-cluster cmp=service cmp=deployment-manager lease=akash1h24fljt7p0nh82cq0za0uhsct3sfwsfu9w3c9h/1585829/1/1/akash1nxq8gmsw2vlz3m68qvyvcf3kh6q269ajvqw6y0 manifest-group=akash
Jun 30 00:28:44 akash1 start-provider.sh[1029866]: D[2021-06-30|00:28:44.830] shutting down                                module=provider-cluster cmp=service cmp=deployment-manager lease=akash1h24fljt7p0nh82cq0za0uhsct3sfwsfu9w3c9h/1585829/1/1/akash1nxq8gmsw2vlz3m68qvyvcf3kh6q269ajvqw6y0 manifest-group=akash cmp=deployment-monitor
Jun 30 00:28:44 akash1 start-provider.sh[1029866]: D[2021-06-30|00:28:44.830] shutdown complete                            module=provider-cluster cmp=service cmp=deployment-manager lease=akash1h24fljt7p0nh82cq0za0uhsct3sfwsfu9w3c9h/1585829/1/1/akash1nxq8gmsw2vlz3m68qvyvcf3kh6q269ajvqw6y0 manifest-group=akash cmp=deployment-monitor
Jun 30 00:28:44 akash1 start-provider.sh[1029866]: D[2021-06-30|00:28:44.837] teardown complete                            module=provider-cluster cmp=service cmp=deployment-manager lease=akash1h24fljt7p0nh82cq0za0uhsct3sfwsfu9w3c9h/1585829/1/1/akash1nxq8gmsw2vlz3m68qvyvcf3kh6q269ajvqw6y0 manifest-group=akash
Jun 30 00:28:44 akash1 start-provider.sh[1029866]: D[2021-06-30|00:28:44.837] waiting on dm.wg                             module=provider-cluster cmp=service cmp=deployment-manager lease=akash1h24fljt7p0nh82cq0za0uhsct3sfwsfu9w3c9h/1585829/1/1/akash1nxq8gmsw2vlz3m68qvyvcf3kh6q269ajvqw6y0 manifest-group=akash
Jun 30 00:28:44 akash1 start-provider.sh[1029866]: D[2021-06-30|00:28:44.838] waiting on withdrawal                        module=provider-cluster cmp=service cmp=deployment-manager lease=akash1h24fljt7p0nh82cq0za0uhsct3sfwsfu9w3c9h/1585829/1/1/akash1nxq8gmsw2vlz3m68qvyvcf3kh6q269ajvqw6y0 manifest-group=akash
Jun 30 00:28:44 akash1 start-provider.sh[1029866]: D[2021-06-30|00:28:44.838] shutting down                                module=provider-cluster cmp=service cmp=deployment-manager lease=akash1h24fljt7p0nh82cq0za0uhsct3sfwsfu9w3c9h/1585829/1/1/akash1nxq8gmsw2vlz3m68qvyvcf3kh6q269ajvqw6y0 manifest-group=akash cmp=deployment-withdrawal
Jun 30 00:28:44 akash1 start-provider.sh[1029866]: D[2021-06-30|00:28:44.838] shutdown complete                            module=provider-cluster cmp=service cmp=deployment-manager lease=akash1h24fljt7p0nh82cq0za0uhsct3sfwsfu9w3c9h/1585829/1/1/akash1nxq8gmsw2vlz3m68qvyvcf3kh6q269ajvqw6y0 manifest-group=akash cmp=deployment-withdrawal
Jun 30 00:28:44 akash1 start-provider.sh[1029866]: I[2021-06-30|00:28:44.838] shutdown complete                            module=provider-cluster cmp=service cmp=deployment-manager lease=akash1h24fljt7p0nh82cq0za0uhsct3sfwsfu9w3c9h/1585829/1/1/akash1nxq8gmsw2vlz3m68qvyvcf3kh6q269ajvqw6y0 manifest-group=akash
Jun 30 00:28:44 akash1 start-provider.sh[1029866]: I[2021-06-30|00:28:44.838] manager done                                 module=provider-cluster cmp=service lease=akash1h24fljt7p0nh82cq0za0uhsct3sfwsfu9w3c9h/1585829/1/1/akash1nxq8gmsw2vlz3m68qvyvcf3kh6q269ajvqw6y0
Jun 30 00:28:44 akash1 start-provider.sh[1029866]: D[2021-06-30|00:28:44.838] unreserving capacity                         module=provider-cluster cmp=service cmp=inventory-service order=akash1h24fljt7p0nh82cq0za0uhsct3sfwsfu9w3c9h/1585829/1/1
Jun 30 00:28:44 akash1 start-provider.sh[1029866]: I[2021-06-30|00:28:44.838] attempting to removing reservation           module=provider-cluster cmp=service cmp=inventory-service order=akash1h24fljt7p0nh82cq0za0uhsct3sfwsfu9w3c9h/1585829/1/1
Jun 30 00:28:44 akash1 start-provider.sh[1029866]: I[2021-06-30|00:28:44.838] removing reservation                         module=provider-cluster cmp=service cmp=inventory-service order=akash1h24fljt7p0nh82cq0za0uhsct3sfwsfu9w3c9h/1585829/1/1
Jun 30 00:28:44 akash1 start-provider.sh[1029866]: I[2021-06-30|00:28:44.838] unreserve capacity complete                  module=provider-cluster cmp=service cmp=inventory-service order=akash1h24fljt7p0nh82cq0za0uhsct3sfwsfu9w3c9h/1585829/1/1
Jun 30 00:28:46 akash1 start-provider.sh[1029866]: I[2021-06-30|00:28:46.122] syncing sequence                             cmp=client/broadcaster local=36 remote=36

Tearing down the cluster

Just in case you want to destroy your Kubernetes cluster:

systemctl disable akash-provider
systemctl stop akash-provider

kubectl drain <node name> --delete-local-data --force --ignore-daemonsets

###kubectl delete node <node name>
kubeadm reset
iptables -F && iptables -t nat -F && iptables -t nat -X && iptables -t mangle -F && iptables -t mangle -X  && iptables -t raw -F && iptables -t raw -X && iptables -X
ip6tables -F && ip6tables -t nat -F && ip6tables -t nat -X && ip6tables -t mangle -F && ip6tables -t mangle -X && ip6tables -t raw -F && ip6tables -t raw -X && ip6tables -X
ipvsadm -C
conntrack -F

## if Weave Net was used:
weave reset   # or: ip link delete weave

## if Calico was used:
ip link
ip link delete cali*
ip link delete vxlan.calico

modprobe -r ipip

A bit of troubleshooting / getting out of the following situation:

## if you get the following error during "crictl rmp -a" (deleting all pods using crictl):
removing the pod sandbox "f89d5f4987fbf80790e82eab1f5634480af814afdc82db8bca92dc5ed4b57120": rpc error: code = Unknown desc = sandbox network namespace "/var/run/netns/cni-65fbbdd0-8af6-8c2a-0698-6ef8155ca441" is not fully closed

ip netns ls
ip -all netns delete

ps -ef|grep -E 'runc|runsc|shim'
ip r
pidof runsc-sandbox |xargs -r kill
pidof /usr/bin/containerd-shim-runc-v2 |xargs -r kill -9
find /run/containerd/io.containerd.runtime.v2.task/ -ls

rm -rf /etc/cni/net.d

systemctl restart containerd
###systemctl restart docker

Scaling your Akash provider horizontally

You can scale your Akash Provider should you want to add more space for new deployments.

To do that, acquire a new bare-metal or VPS host and repeat all the steps up to (but not including) "Deploy Kubernetes cluster using kubeadm".
Run the following commands on your new master (control-plane) or worker node:

apt update
apt -y dist-upgrade
apt autoremove

apt -y install ethtool socat conntrack

mkdir -p /etc/kubernetes/manifests

## If you are using NodeLocal DNSCache
sed -i -s 's/10.233.0.10/169.254.25.10/g' /var/lib/kubelet/config.yaml

Generate the token on your existing master (control-plane) node.
You are going to need it in order to join your new master / worker nodes.

If adding new master nodes, make sure to run the upload-certs phase:

This is to avoid copying /etc/kubernetes/pki manually from your master node to new master nodes.
kubeadm init phase upload-certs --upload-certs --config kubeadm-config.yaml

Generate the token which you will use for joining your new master or worker node to your kubernetes cluster:

kubeadm token create --config kubeadm-config.yaml --print-join-command

To join any number of the master (control-plane) nodes run the following command:

kubeadm join akash-master-lb.domainXYZ.com:6443 --token REDACTED.REDACTED --discovery-token-ca-cert-hash sha256:REDACTED --control-plane --certificate-key REDACTED

To join any number of the worker nodes run the following command:

kubeadm join akash-master-lb.domainXYZ.com:6443 --token REDACTED.REDACTED --discovery-token-ca-cert-hash sha256:REDACTED

Scale the ingress

Now that you have more than one worker node, you can scale the ingress-nginx-controller to increase the service availability.
To do this, you just need to run the following commands.

Label all the workers with the akash.network/role=ingress label:

kubectl label nodes akash-worker-<##>.nixaid.com akash.network/role=ingress

Scale the ingress-nginx-controller to the number of workers you have:

kubectl -n ingress-nginx scale deployment ingress-nginx-controller --replicas=<number of worker nodes>
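You can then verify that the controller pods got spread across the worker nodes:

kubectl -n ingress-nginx get pods -o wide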

And now register new DNS A records pointing the *.ingress.nixaid.com wildcard name at the IPs of your worker nodes running the ingress-nginx-controller :-)

Example:

$ dig +noall +answer anything.ingress.nixaid.com
anything.ingress.nixaid.com. 1707 IN  A 167.86.73.47
anything.ingress.nixaid.com. 1707 IN  A 185.211.5.95

Known issues and workarounds

Donate

Please consider donating to me if you found this article useful.

Email me or DM me on Twitter https://twitter.com/andreyarapov for the donation.

References

https://kubernetes.io/docs/setup/production-environment/tools/kubeadm/create-cluster-kubeadm/
https://kubernetes.io/docs/tasks/administer-cluster/kubeadm/kubeadm-upgrade/
https://kubernetes.io/docs/reference/setup-tools/kubeadm/kubeadm-init/
https://kubernetes.io/docs/tasks/administer-cluster/kubeadm/configure-cgroup-driver/#configuring-the-kubelet-cgroup-driver
https://storage.googleapis.com/kubernetes-release/release/stable.txt
https://gvisor.dev/docs/user_guide/containerd/quick_start/
https://github.com/containernetworking/cni#how-do-i-use-cni
https://docs.projectcalico.org/getting-started/kubernetes/quickstart
https://kubernetes.io/docs/concepts/overview/components/
https://matthewpalmer.net/kubernetes-app-developer/articles/how-does-kubernetes-use-etcd.html
https://kubernetes.io/docs/tasks/administer-cluster/configure-upgrade-etcd/
https://kubernetes.io/docs/tasks/administer-cluster/encrypt-data/
https://docs.akash.network/operator/provider
https://kubernetes.io/docs/setup/production-environment/tools/kubeadm/create-cluster-kubeadm/#tear-down