Reasons to Consider Terraform, GitOps and Portainer for Kubernetes

Introduction

Nowadays, bootstrapping Kubernetes clusters has become monumentally simpler, with Terraform across many cloud providers and Kind or kubeadm locally. However, maintaining a bash script that initialises Kubernetes clusters with the basic infrastructure and monitoring components via Helm or Kustomize has been a huge challenge for me. For instance, it was nearly impossible to identify which cluster was running which version of the ingress controller, and so on.

Then, a while ago, I got to know GitOps, pioneered by Weaveworks with Flux, and I have been a big fan since the day I adopted it. The approach worked out nicely to resolve the challenges around cluster consistency: all modifications were recorded in GitHub.
Not having to expose the Kubernetes API server to a CD tool was also a bonus for security.

However, the complexity of Kubernetes itself remained. Developers with no Kubernetes background struggled to learn it and needed something simpler. This is where Portainer comes to the rescue: it is an open-source tool for managing container-based applications across Kubernetes, Docker, Docker Swarm and Azure Container Instances (ACI) environments.

In this blog post, I will demonstrate how Terraform can be used to bootstrap a Kubernetes cluster in Azure, Flux to install a list of infrastructure and monitoring Helm releases into the cluster, and Portainer for application management.

Note: This blog post is based on Flux v1. The GitOps Toolkit version (Flux v2) will be released in the near future and will likely be covered in a Flux v1 to v2 migration blog post.

Walkthrough

Flux Repository

The following is how my base Flux repository is set up. Flux will be pointed at this repository later on. You can find the details in my GitHub repository.

infrastructure

  • coredns-config.yaml to configure a custom DNS forwarder for my own domain
  • csi-release.yaml to integrate with Azure Key Vault and retrieve the secrets
  • kured-release.yaml to automatically apply OS patching for the AKS nodes every Saturday and Sunday
  • public-ingress-release.yaml to provide ingress rules exposing HTTP/HTTPS web traffic publicly

monitoring

  • monitoring-namespace.yaml to create a namespace including certificate and PVCs for Prometheus and Grafana
  • grafana-release.yaml to deploy a Grafana instance with Azure AD integration, an ingress rule with Letsencrypt TLS and an existing PVC
  • loki-release.yaml to deploy Loki v2 with a storage account (Azure's equivalent of an S3 bucket) for indexer and chunk persistence
  • prometheus-release.yaml to deploy a Prometheus instance with an existing PVC
  • promtail-release.yaml to deploy Promtail across all nodes for log collection

portainer

  • portainer-release.yaml to create a namespace and deploy Portainer
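Each of these *-release.yaml files is a Flux v1 HelmRelease custom resource. As a rough sketch of the pattern (the chart version and values here are illustrative, not necessarily the exact ones in my repository), kured-release.yaml might look like:

```yaml
apiVersion: helm.fluxcd.io/v1
kind: HelmRelease
metadata:
  name: kured
  namespace: kured
spec:
  releaseName: kured
  chart:
    repository: https://weaveworks.github.io/kured
    name: kured
    version: 1.4.0
  values:
    nodeSelector:
      beta.kubernetes.io/os: linux
    # Restrict reboots for OS patching to the weekend window
    configuration:
      rebootDays: ["sa", "su"]
      startTime: "01:00"
      endTime: "05:00"
```

Flux watches the repository and applies these resources; the Helm Operator then reconciles each HelmRelease into an installed chart.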

Bootstrapping Kubernetes

First things first, a Kubernetes cluster is required. In this walkthrough, I will deploy an Azure Kubernetes Service (AKS) cluster with Terraform. The main AKS block in main.tf from the repository:

  • enables Managed Identity for service integrations such as Azure Container Registry
  • assigns an Azure AD group the ClusterAdmin role
  • authorises a list of IP addresses for API server access
  • creates an Azure Blob Storage account for Loki indexers and chunks
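A condensed sketch of the relevant main.tf resources covering the points above (names, location and sizes are placeholders; see the repository for the full configuration):

```hcl
resource "azurerm_kubernetes_cluster" "aks" {
  name                = "kubernetes"
  resource_group_name = "kubernetes-rg"
  location            = "australiaeast"
  dns_prefix          = "kubernetes"

  # Managed Identity for service integrations such as ACR
  identity {
    type = "SystemAssigned"
  }

  # Azure AD group granted ClusterAdmin
  role_based_access_control {
    enabled = true
    azure_active_directory {
      managed                = true
      admin_group_object_ids = [var.cluster_admin_group_id]
    }
  }

  # Authorised list of IP addresses for the API server
  api_server_authorized_ip_ranges = var.authorized_ip_ranges

  default_node_pool {
    name       = "default"
    node_count = 3
    vm_size    = "Standard_D2s_v3"
  }
}

# Blob storage account for Loki indexers and chunks
resource "azurerm_storage_account" "loki" {
  name                     = "lokistorage"
  resource_group_name      = "kubernetes-rg"
  location                 = "australiaeast"
  account_tier             = "Standard"
  account_replication_type = "LRS"
}
```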

Once terraform apply finishes, be sure to update Loki's storage_config section with the right storage account access key.
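A quick way to retrieve that access key is the Azure CLI (the resource group and storage account names below are placeholders):

```shell
# Print the primary access key of the Loki storage account.
az storage account keys list \
  --resource-group kubernetes-rg \
  --account-name lokistorage \
  --query "[0].value" --output tsv
```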

Finally, deploy Flux & Helm Operator Helm charts. Below is an example of what I use:

az aks get-credentials -n kubernetes -g kubernetes-rg

helm repo add fluxcd https://charts.fluxcd.io
kubectl apply -f https://raw.githubusercontent.com/fluxcd/helm-operator/master/deploy/crds.yaml

kubectl create namespace flux
kubectl create secret generic flux-git-auth \
  --namespace flux \
  --from-literal=GIT_AUTHUSER=${flux_git_username} \
  --from-literal=GIT_AUTHKEY=${flux_git_token} \
  --from-literal=GIT_URL=${flux_git_url}

helm upgrade -i flux fluxcd/flux \
  --set git.branch='master' \
  --set git.url='https://$(GIT_AUTHUSER):$(GIT_AUTHKEY)@$(GIT_URL)' \
  --set env.secretName=flux-git-auth \
  --set helm.versions=v3 \
  --set registry.acr.enabled=True \
  --set additionalArgs={--sync-garbage-collection} \
  --set nodeSelector."beta\.kubernetes\.io/os"=linux \
  --set memcached.nodeSelector."beta\.kubernetes\.io/os"=linux \
  --namespace flux

helm upgrade -i helm-operator fluxcd/helm-operator \
  --namespace flux \
  --set nodeSelector."beta\.kubernetes\.io/os"=linux \
  --set helm.versions=v3

Give it a few minutes and voilà! Running kubectl get pods -A, you will see all the infrastructure and monitoring components in the Running state:

NAMESPACE        NAME                                                              READY   STATUS    RESTARTS   AGE
csi              csi-secrets-store-provider-azure-rnbnt                            1/1     Running   0          10m
csi              csi-secrets-store-provider-azure-secrets-store-csi-driver-ptdqs   3/3     Running   0          10m
flux             flux-6f7d499555-ch2xw                                             1/1     Running   0          10m
flux             flux-memcached-64f7865494-s6bn2                                   1/1     Running   0          10m
flux             helm-operator-85b4584d6-5fwj2                                     1/1     Running   0          10m
grafana          grafana-787b494774-646tb                                          1/1     Running   0          10m
grafana          loki-0                                                            1/1     Running   0          10m
grafana          promtail-6z4pv                                                    1/1     Running   0          10m
kube-system      azure-cni-networkmonitor-8bndg                                    1/1     Running   0          10m
kube-system      azure-ip-masq-agent-wlhpj                                         1/1     Running   0          10m
kube-system      azure-npm-mgfws                                                   1/1     Running   0          10m
kube-system      coredns-79766dfd68-mjrb4                                          1/1     Running   0          10m
kube-system      coredns-79766dfd68-v8pkb                                          1/1     Running   0          10m
kube-system      coredns-autoscaler-66c578cddb-lstdm                               1/1     Running   0          10m
kube-system      kube-proxy-zwl4j                                                  1/1     Running   0          10m
kube-system      metrics-server-7f5b4f6d8c-74qg4                                   1/1     Running   0          10m
kube-system      tunnelfront-7867c54b79-h8ztm                                      1/1     Running   0          10m
kured            kured-r5zvr                                                       1/1     Running   0          10m
portainer        portainer-6bddd5c8bc-dqvq8                                        1/1     Running   0          10m
prometheus       prometheus-kube-state-metrics-6df5d44568-b6b4k                    1/1     Running   0          10m
prometheus       prometheus-node-exporter-mkwhv                                    1/1     Running   0          10m
prometheus       prometheus-server-674c9d8dcf-9xnct                                2/2     Running   0          10m
public-ingress   cert-manager-58c645fccd-sjjp5                                     1/1     Running   0          10m
public-ingress   cert-manager-cainjector-78fc9bb777-r7jt7                          1/1     Running   0          10m
public-ingress   cert-manager-webhook-7fdb9b4d7d-fcxzc                             1/1     Running   0          10m
public-ingress   public-nginx-ingress-controller-99f56948f-59l86                   1/1     Running   0          10m
public-ingress   public-nginx-ingress-controller-99f56948f-8cbcj                   1/1     Running   0          10m
public-ingress   public-nginx-ingress-controller-99f56948f-v7xk2                   1/1     Running   0          10m
public-ingress   public-nginx-ingress-default-backend-698d978b4b-lbkg5             1/1     Running   0          10m

Now we are ready to deploy applications in our brand new AKS cluster with Portainer.

Portainer

Now, as per our ingress rule, Portainer is accessible at http://portainer.ssbkang.io. Set up the local admin access, perform the initial Kubernetes feature configuration to meet your requirements and click Save:

From Home, you will see the AKS instance. It is marked as local because Portainer is running inside the AKS cluster itself:

Navigate to the Cluster tab for more details around the Kubernetes nodes:

Navigating to an individual Kubernetes node visualises in-depth details:

Portainer has translated a few Kubernetes terms into more generic ones. For instance, resource pools represent Kubernetes namespaces and applications represent deployments. The idea is to simplify the wording for users who are not familiar with Kubernetes terms and manifests.

Let's deploy a sample application. In this blog post, I will be using Stefan Prodan's podinfo application.

First of all, we will need a resource pool. It can be configured with resource limits, equivalent to a Kubernetes ResourceQuota but translated into a nice UI. This is also where the user can enable ingress with an associated hostname:
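For reference, the limits configured on a resource pool correspond to a standard ResourceQuota under the hood; a rough equivalent (names and values illustrative) would be:

```yaml
apiVersion: v1
kind: ResourceQuota
metadata:
  name: podinfo-quota
  namespace: podinfo
spec:
  hard:
    limits.cpu: "2"      # total CPU limit across the namespace
    limits.memory: 2Gi   # total memory limit across the namespace
```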

Inside the resource pool, the application will be created as per below:
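Behind the form, Portainer creates ordinary Kubernetes objects. A rough equivalent of the manifests it would generate for podinfo (image tag, replica count and port are assumptions based on the public podinfo image) looks like:

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: podinfo
  namespace: podinfo
spec:
  replicas: 2
  selector:
    matchLabels:
      app: podinfo
  template:
    metadata:
      labels:
        app: podinfo
    spec:
      containers:
        - name: podinfo
          image: stefanprodan/podinfo:5.0.3
          ports:
            - containerPort: 9898  # podinfo's default HTTP port
---
apiVersion: v1
kind: Service
metadata:
  name: podinfo
  namespace: podinfo
spec:
  selector:
    app: podinfo
  ports:
    - port: 9898
      targetPort: 9898
```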

Once deployed, I can then view the application status including the ingress endpoint:

Navigating to the endpoint URL, I can see my application running:

From this point, you will be able to manage this application in Portainer without having to know anything about Kubernetes manifests. This demonstrates a way to achieve YAML-less deployment in Kubernetes!

Conclusion

In this blog post, I have walked through Terraform, FluxCD (GitOps) and Portainer to demonstrate an end-to-end Kubernetes setup including application management. As a DevOps engineer or SRE, with this kind of setup you are ready to release Kubernetes clusters to developers in a much more efficient and consistent way.

Hope this blog helped! If you have any questions or want clarification, leave a comment! 😀

End to End TLS for Azure Front Door and Azure Kubernetes Service

Introduction

Whilst exploring options for publicly exposing Azure Kubernetes Service (AKS) container services behind a Web Application Firewall (WAF), I found many references on how to accomplish end-to-end TLS encrypted connections between Azure Application Gateway and AKS (specifically the Application Gateway Ingress Controller, AGIC), but not with Azure Front Door (AFD).

In this post, I will share how I achieved end-to-end TLS connectivity between AFD and AKS, including the high-level design, the issue I hit, the resolution, and how I optimised the Azure cost.

Design

In a nutshell, the diagram below gives a high-level overview of what I wanted to achieve:

  • Custom frontend domains to expose (they don’t exist, just examples):
  • Backend pool pointing at the Azure Load Balancer that fronts the ingress controller for the AKS cluster:
    • Backend host header left empty so that the request hostname determines this value.
      For instance, the equivalent curl requests would be:
      • curl -vvv -H "Host: web.ssbkang.com" "LB IP Address"
      • curl -vvv -H "Host: api.ssbkang.com" "LB IP Address"
    • HTTPS health probe hitting /healthz using the HEAD method
  • 1 x routing rule (apply URL rewrite and/or caching if required):
    • Accepted frontend protocol: HTTPS only
    • Forwarding protocol: HTTPS only
      Note: Implementing a single routing rule minimises the Azure cost, but this can scale out to multiple frontends / backend pools

Issue

I was expecting everything to work smoothly; however, I was greeted with the following in the browser:

Our services aren't available right now. 
We're working to restore all services as soon as possible. Please check back soon.

My first impression was that the AFD instance had not fully deployed, but when I dug into it further, I determined that the actual issue was the TLS certificate at the ingress controller level. I had deployed the ingress controller using the official NGINX Helm chart and, by default, it uses a self-signed certificate with the subject name ingress.local. Hence, when AFD health probed the backend (/healthz), the probe must have been failing because the hostname did not match the certificate subject name.
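The mismatch is easy to reproduce locally (against a live cluster you would run openssl s_client against the load balancer IP instead):

```shell
# Generate a throwaway self-signed certificate with the same subject that the
# NGINX ingress chart uses by default, then inspect its subject name: it can
# never match a real probed hostname such as web.ssbkang.com.
openssl req -x509 -newkey rsa:2048 -nodes -days 1 \
  -subj "/CN=ingress.local" \
  -keyout /tmp/ingress.key -out /tmp/ingress.crt 2>/dev/null
openssl x509 -in /tmp/ingress.crt -noout -subject
```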

Resolution

To resolve the above issue, two elements had to be amended: the ingress controller and a public DNS entry for it.

I had initially deployed the ingress controller as a Helm chart via FluxCD and, whilst Googling, I found an extra argument called default-ssl-certificate. It makes the controller serve the specified certificate whenever the tls section is missing from an ingress manifest.

First of all, I used openssl to extract a crt file and a key file from a pfx certificate:

openssl pkcs12 -in wildcard.pfx -clcerts -nokeys -out tls.crt
openssl pkcs12 -in wildcard.pfx -nocerts -nodes -out tls.key

And then created a secret manifest in my Flux repository:

apiVersion: v1
kind: Secret
metadata:
  name: public-ingress-tls
  namespace: public-ingress
type: kubernetes.io/tls
data:
  tls.crt: {CRT}
  tls.key: {KEY}
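The {CRT} and {KEY} placeholders are the base64-encoded contents of the two files. As a small sketch of producing them (throwaway inputs are used here for illustration; in practice, feed in the tls.crt and tls.key extracted above):

```shell
# Throwaway inputs for illustration only; substitute the real tls.crt/tls.key.
printf 'dummy certificate' > /tmp/tls.crt
printf 'dummy key' > /tmp/tls.key

# Secret .data values must be base64 on a single line, hence -w0 (GNU coreutils).
CRT=$(base64 -w0 < /tmp/tls.crt)
KEY=$(base64 -w0 < /tmp/tls.key)
echo "tls.crt: ${CRT}"
echo "tls.key: ${KEY}"
```

Alternatively, kubectl create secret tls public-ingress-tls --cert=tls.crt --key=tls.key --namespace public-ingress produces an equivalent Secret without hand-crafting the manifest.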

Then I updated the HelmRelease with the default-ssl-certificate argument as below:

apiVersion: helm.fluxcd.io/v1
kind: HelmRelease
metadata:
  name: public-nginx-ingress
  namespace: public-ingress
spec:
  releaseName: public-nginx-ingress
  targetNamespace: public-ingress
  chart:
    repository: https://kubernetes-charts.storage.googleapis.com/
    name: nginx-ingress
    version: 1.39.0
  values:
    controller:
      ingressClass: public-nginx-ingress
      useIngressClassOnly: true
      replicaCount: 3
      nodeSelector:
        beta.kubernetes.io/os: "linux"
      service:
        annotations:
          service.beta.kubernetes.io/azure-load-balancer-internal: "false"
      extraArgs:
        default-ssl-certificate: "public-ingress/public-ingress-tls"
    defaultBackend:
      nodeSelector:
        beta.kubernetes.io/os: "linux"

Finally, I added a public DNS record for my public ingress controller, aks-public-ingress.ssbkang.com, and updated the AFD backend accordingly.

After the DNS registration finished, voilà, everything started working. Magic 😀

Conclusion

In this blog post, I have discussed how to achieve end-to-end TLS encrypted connectivity between AFD and the AKS ingress controller, including how to optimise Azure cost at the AFD level.

AFD does in fact support TLS termination; however, as the ingress controller cannot be placed in a private VNet (i.e. it is a public instance), implementing end-to-end TLS encryption is highly recommended.
Another security layer to consider is updating the NSG attached to the AKS subnet to accept HTTP/HTTPS traffic only from the AFD instance (using the service tag AzureFrontDoor.Backend). This way, no one can bypass AFD and hit the ingress controller directly.

Hope this helped and if you have any questions or require clarifications, leave a comment 🙂

Azure – Intra Subscription Azure SQL Database Migration Tip

Introduction

First blog for the year! It’s been extremely busy but it means that I’ve got lots of things to share.

In this blog post, I will be sharing an efficient way of migrating (copying) an Azure SQL database from one subscription to another.

Let’s get started!

Traditional Way

Prior to the ARM model, and even after ARM was introduced, I believe many of you would still be using the following method:

  • Make a copy from a production database
  • Export the copied database to a blob storage
  • Import the .bacpac to the destination SQL server
  • Delete the database copy
  • Delete the .bacpac file

This method is completely fine, but there are some cons:

  • The export & import process takes a long time, which ends up incurring extra operational cost
  • Possibility of leaving the .bacpac file behind in the blob storage (if the process is not automated)
  • You need to know the administrator login details for both the source and destination SQL servers

New Way

The new ARM portal allows you to move a resource to a resource group in a different subscription.
Using this functionality, the following approach can be taken:

  • Create a temporary SQL server in the resource group where a production SQL server is running
  • Make a copy of the production database to the temporary SQL server
  • Move the temporary SQL server to another subscription
  • Make a copy of the copied database to the destination SQL server
  • Delete the temporary SQL server
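The steps above can be sketched with the Azure CLI (server, database, resource group and subscription names are all placeholders; the move can equally be done through the portal as described):

```shell
# 1. Create a temporary SQL server alongside the production one.
az sql server create -g prod-rg -n temp-sqlserver -l australiaeast \
  -u sqladmin -p '<password>'

# 2. Copy the production database onto the temporary server.
az sql db copy -g prod-rg -s prod-sqlserver -n proddb \
  --dest-server temp-sqlserver --dest-name proddb-copy

# 3. Move the temporary server to the destination subscription.
az resource move \
  --ids $(az sql server show -g prod-rg -n temp-sqlserver --query id -o tsv) \
  --destination-subscription-id <dest-sub-id> \
  --destination-group dest-rg

# 4. Copy the database onto the destination server, then delete the
#    temporary server.
az sql db copy -g dest-rg -s temp-sqlserver -n proddb-copy \
  --dest-server dest-sqlserver --dest-name proddb
az sql server delete -g dest-rg -n temp-sqlserver --yes
```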

What are the benefits?

Pretty much the reverse of the cons mentioned above for the traditional way.
One of the main benefits is reduced operational cost: no copied database needs to stay up for the export & import processes.

Also, no additional blob storage space is required to hold the .bacpac file.

Hope this helps and feel free to leave a comment for any questions or clarifications 🙂