Terraformed Odyssey: From Code to Day Two Operation & Beyond
They say that to master a new technology, you have to play with it. While learning, I always write down the questions that pop up in my mind and document them while trying to find answers. You can access my study notes at notes.sreboy.com.
This series of articles is a refined version of my notes. I will try to cover the most important concepts and best practices I learned from documentation, exploring source code on GitHub, GitHub issue threads, other articles, YouTube videos, and, most importantly, hands-on experimentation: embracing the creative chaos of exploration, much like assembling LEGO blocks, constantly combining different elements to learn.
In this article we will deploy as much as possible using Terraform, with the minimum ClickOps required:
- EKS cluster using terraform resources (No Modules).
- Kube Prometheus Stack with Loki (Helm).
- Two Ingress Nginx Controllers (Internal and External).
- Basic Auth for Ingress. (Soon External auth using Keycloak with Terraform).
- Configure Route53 with Split Horizon DNS.
- Install Cert-Manager and configure it to automate the dns-01 challenge.
- Restrict access to Route53 records by using IRSA.
- Utilize sealed-secrets to store sensitive data in git, and integrate it with kustomize.
- Configure AWS Client VPN and AWS IAM Identity Center with:
- SSO: SAML based Federated Authentication.
- Active Directory Authentication.
- Deploy ArgoCD with the app-of-apps pattern, and then:
- Send ArgoCD and Alert Manager Notifications to Slack.
- Utilize kustomize to:
- Deploy custom dashboards to Grafana.
- Handle Sealed Secrets and ConfigMaps.
Besides that we will discuss:
- How to build multi architecture Docker images.
- OpenVPN over Shadowsocks to bypass Deep Packet Inspection.
- Split Horizon DNS.
We will go through many concepts and some advanced configurations, but at a quick pace, because this is a From Code to Day Two Operation article at the end of the day. In the coming articles, however, we will take a deep dive into each concept, e.g.:
- Docker Engine: namespaces, cgroups, pivot_root, etc.
- Provision and monitor a highly available Kubernetes cluster from scratch, e.g. monitoring certificate expiration dates, besides applying best practices to secure the cluster.
- Maintaining a HA etcd cluster in production.
- Many more...
Before we start you can access the code at:
GoViolin
This app is written in Go. It doesn't have any database or storage dependencies. Just a simple web app that serves static content.
Run Locally
- Method One:
go run $(ls -1 *.go | grep -v _test.go)
- Method Two:
go run main.go home.go scale.go duet.go
- Method Three:
go build -o main
./main
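Whichever method you pick, a quick sanity check could look like the following sketch. It assumes the app listens on port 8080, as the Dockerfile's EXPOSE below suggests:
# Build and start the app in the background, then probe it with curl.
go build -o main && ./main &
sleep 2
# Expect an HTTP 200 from the home page if the port assumption holds.
curl -I http://localhost:8080/
kill %1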
Dockerfile
We aim for our Docker image to be as minimal as possible, so we will use multi-stage builds to achieve this. Also, supporting the amd64 and arm64 architectures is a MUST for our app. Check the REFERENCES section for useful resources. In summary, we aim for a minimal multi-stage and multi-platform Docker image:
FROM --platform=$BUILDPLATFORM golang:1.21.5 AS builder
WORKDIR /app
COPY go.mod go.sum /app/
RUN go mod download
COPY . .
ARG TARGETOS TARGETARCH
RUN CGO_ENABLED=0 GOOS=${TARGETOS} GOARCH=${TARGETARCH} go build -o main
FROM --platform=$TARGETPLATFORM scratch
WORKDIR /app
COPY --from=builder /app/ /app/
EXPOSE 8080
LABEL org.opencontainers.image.tags="ziadmmh/goviolin:v0.0.1,ziadmmh/goviolin:latest"
LABEL org.opencontainers.image.authors="ziadmansour.4.9.2000@gmail.com"
CMD ["/app/main"]
Please do NOT forget the CGO_ENABLED=0 flag, or you will face a weird error that is hard to debug. Enjoy this good read after you finish :)
The term multi-platform image refers to a bundle of images for multiple different architectures. Out of the box, the default builder for Docker Desktop doesn't support building multi-platform images.
Enabling the containerd image store lets you build multi-platform images and load them to your local image store.
The containerd image store is NOT enabled by default. To enable the feature for Docker Desktop:
- Navigate to Settings in Docker Desktop.
- In the General tab, check Use containerd for pulling and storing images.
- Select Apply & Restart.
To disable the containerd image store, clear the Use containerd for pulling and storing images checkbox. Please, do refer to the docs first.
docker info -f '{{ .DriverStatus }}'
[[driver-type io.containerd.snapshotter.v1]]
Build Image
- Containerd Image Store
- Docker Buildx
docker build --platform linux/arm64,linux/amd64 --progress plain -t ziadmmh/goviolin:v0.0.1 --push .
docker buildx build --platform linux/arm64,linux/amd64 --progress plain -t ziadmmh/goviolin:v0.0.1 --push .
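Either way, once the image is pushed you can confirm that both platforms made it into the manifest list; a minimal check:
# Inspect the pushed tag and confirm linux/amd64 and linux/arm64 manifests are present.
docker buildx imagetools inspect ziadmmh/goviolin:v0.0.1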
GitHub Actions
This is a dummy GitHub Actions workflow that extracts the image labels from the Dockerfile, builds the multi-arch image, and pushes it to the Docker Hub GoViolin repository.
It is a better idea to use the docker/metadata-action to extract the image tags, e.g. from a pushed git tag or the commit hash; see the sketch after the workflow below.
Click to expand
name: Test, Build, and Push Multi-Arch Image
on:
push:
branches:
- master
workflow_dispatch:
env:
TAGS:
TAG_VERSION:
REPOSITORY:
BRANCH_NAME:
permissions: write-all
jobs:
test-and-build:
runs-on: ubuntu-latest
steps:
- uses: actions/checkout@v4
- name: Set up QEMU
uses: docker/setup-qemu-action@v3
- name: Set up Docker Buildx
uses: docker/setup-buildx-action@v3
- name: Run Go Tests
run: go test ./...
- name: Login to Docker Hub
uses: docker/login-action@v3
with:
username: ${{ secrets.DOCKERHUB_USERNAME }}
password: ${{ secrets.DOCKERHUB_TOKEN }}
- name: Extract metadata from Dockerfile
run: |
echo "TAGS=$(awk '/^LABEL org.opencontainers.image.tags/{gsub(/"/,"",$2); gsub(".*=",""); print }' Dockerfile)" >> $GITHUB_ENV
echo "TAG_VERSION=$(echo $TAGS | cut -d: -f2)" >> $GITHUB_ENV
- name: Check if TAGS is set
run: |
if [ -z "${{ env.TAGS }}" ]; then
echo "TAGS environment variable is not set. Please set it before running this workflow."
exit 1
fi
- name: Build and Push Multi-Arch Docker Image
uses: docker/build-push-action@v5
with:
context: .
platforms: linux/amd64,linux/arm64
tags: ${{ env.TAGS }},ziadmmh/goviolin:latest
push: true
- name: Set Environment Variables
run: |
echo "REPOSITORY=ZiadMansourM/terraformed-odyssey" >> $GITHUB_ENV
echo "BRANCH_NAME=update-goviolin-image-$RANDOM" >> $GITHUB_ENV
- name: Checkout Code
uses: actions/checkout@v4
with:
repository: ${{ env.REPOSITORY }}
token: ${{ secrets.GH_CLI_TOKEN }}
ref: main
path: terraformed-odyssey
- name: Checkout Branch and Update Image Tag
working-directory: terraformed-odyssey/kubernetes/goviolin
run: |
git checkout -b "${{ env.BRANCH_NAME }}"
rm -rf live && mkdir -p live
kustomize edit set image ziadmmh/goviolin:${{ env.TAG_VERSION }}
kustomize build > live/live.yaml
- name: Update Image Tag and Send pull-request
working-directory: terraformed-odyssey/kubernetes/goviolin
run: |
git config user.name "github-actions[bot]"
git config user.email "41898282+github-actions[bot]@users.noreply.github.com"
git status
git add live/live.yaml
git commit -m "Update goviolin image tag to ${{ env.TAGS }}"
git push origin ${{ env.BRANCH_NAME }}
- name: Create Pull Request
working-directory: terraformed-odyssey/kubernetes/goviolin
run: |
echo "${{ secrets.GH_CLI_TOKEN }}" > token.txt
gh auth login --with-token < token.txt
gh pr create \
--title "Update goviolin image tag to ${{ env.TAGS }}" \
--body "This PR updates the goviolin image tag to ${{ env.TAGS }}." \
--base "main" \
--head "${{ env.BRANCH_NAME }}"
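As mentioned above, deriving the tags from git instead of Dockerfile labels is usually nicer. A rough sketch of what such a step could compute (the values are illustrative, not from the repo); in practice the docker/metadata-action can do this for you:
# Illustrative only: derive the image tag from the latest git tag (or short SHA as a fallback).
# Note: in GitHub Actions you would need actions/checkout with fetch-depth: 0 for tags to be available.
TAG_VERSION="$(git describe --tags --always)"
echo "TAGS=ziadmmh/goviolin:${TAG_VERSION},ziadmmh/goviolin:latest" >> "$GITHUB_ENV"
echo "TAG_VERSION=${TAG_VERSION}" >> "$GITHUB_ENV"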
CalcTube
I built this app to help me during my final exams. It calculates the time needed to watch a playlist on YouTube. You just need to provide the playlist URL or ID, and it returns the watch time at each playback speed.
Code
The code is super straightforward and easy to understand. It uses the pytube library to interact with YouTube. Have a look, and please reach out if you have any questions ^^.
Click to expand
from concurrent.futures import ThreadPoolExecutor
import re
import time
from fastapi import FastAPI, Request
from fastapi.responses import JSONResponse, HTMLResponse, FileResponse
from fastapi.templating import Jinja2Templates
from pydantic import BaseModel
from pytube import Playlist, YouTube
app = FastAPI()
templates = Jinja2Templates(directory="templates")
@app.get("/assets/{filename}")
async def read_item(filename: str):
return FileResponse(f"templates/assets/{filename}")
def timeit(func):
def wrapper(*args, **kwargs):
print(f"Started {func.__name__}...")
start_time = time.time()
result = func(*args, **kwargs)
end_time = time.time()
print(f"Done {func.__name__} took {(end_time - start_time)*1000:.2f} ms to execute.")
return result
return wrapper
def get_playlist_id(link: str) -> str:
video_pattern = r'(https?://)?(www\.)?(youtube\.com/watch\?v=)?([a-zA-Z0-9_-]+)&?list=([a-zA-Z0-9_-]+)'
if match := re.match(video_pattern, link):
return match[5]
playlist_pattern = r'(https?://)?(www\.)?(youtube\.com/playlist\?list=)?([a-zA-Z0-9_-]+)'
return match[4] if (match := re.match(playlist_pattern, link)) else link
def get_video_length(url: str) -> int:
yt = YouTube(url)
return yt.length
def get_playlist_duration(playlist_url: str) -> tuple[int, int, float]:
playlist_id = get_playlist_id(playlist_url)
playlist = Playlist(f"https://www.youtube.com/playlist?list={playlist_id}")
video_count = len(playlist.video_urls)
total_seconds = 0
with ThreadPoolExecutor(max_workers=video_count) as executor:
total_seconds = sum(executor.map(get_video_length, playlist.video_urls))
avg_video_length = total_seconds / video_count if video_count != 0 else 0
return total_seconds, video_count, avg_video_length
def calculate_speed_times(total_seconds: int) -> dict:
speeds = [1, 1.25, 1.5, 1.75, 2]
times = {}
for speed in speeds:
time_at_speed = total_seconds / speed
hours = int(time_at_speed // 3600)
minutes = int((time_at_speed % 3600) // 60)
seconds = int(time_at_speed % 60)
times[speed] = f"{hours} hours, {minutes} minutes, {seconds} seconds"
return times
def format_time(seconds: float) -> str:
hours = int(seconds // 3600)
minutes = int((seconds % 3600) // 60)
seconds = int(seconds % 60)
return f"{hours} hours, {minutes} minutes, {seconds} seconds"
@timeit
def run(user_input: str):
total_seconds, video_count, avg_video_length = get_playlist_duration(user_input)
times = calculate_speed_times(total_seconds)
return {
"videoCount": video_count,
"avgVideoLength": format_time(avg_video_length),
"speedTimes": dict(times.items()),
}
@app.get("/", response_class=HTMLResponse)
async def read_item(request: Request):
print("Here")
return templates.TemplateResponse("index.html", {"request": request})
class PlaylistUrl(BaseModel):
playlistUrl: str
@app.post("/calculate")
async def calculate_playlist_duration(playlist_url: PlaylistUrl):
print("there")
response_data = run(playlist_url.playlistUrl)
return JSONResponse(content=response_data)
Dockerfile
It is actually advised to use a requirements.txt file to install the dependencies with pinned versions, but for the sake of simplicity I hard-code the dependency installation directly in the Dockerfile without any version pinning (see the sketch after the Dockerfile below).
FROM python:3.11-slim
WORKDIR /app
RUN pip install --no-cache-dir fastapi
RUN pip install --no-cache-dir uvicorn
RUN pip install --no-cache-dir pytube
RUN pip install --no-cache-dir jinja2
COPY main.py /app/
COPY templates /app/templates
EXPOSE 80
LABEL org.opencontainers.image.tags="ziadmmh/calctube:v0.0.1"
LABEL org.opencontainers.image.authors="ziadmansour.4.9.2000@gmail.com"
CMD ["uvicorn", "main:app", "--host", "0.0.0.0", "--port", "80"]
GitHub Actions
Same as with the GoViolin app, we use a dummy GitHub Actions workflow that extracts the image labels from the Dockerfile, builds the multi-arch image, and pushes it to the Docker Hub CalcTube repository.
Click to expand
name: Test, Build, and Push Multi-Arch Image
on:
push:
branches:
- main
workflow_dispatch:
env:
TAGS:
TAG_VERSION:
REPOSITORY:
BRANCH_NAME:
permissions: write-all
jobs:
test-and-build:
runs-on: ubuntu-latest
steps:
- uses: actions/checkout@v4
- name: Set up QEMU
uses: docker/setup-qemu-action@v3
- name: Set up Docker Buildx
uses: docker/setup-buildx-action@v3
- name: Login to Docker Hub
uses: docker/login-action@v3
with:
username: ${{ secrets.DOCKERHUB_USERNAME }}
password: ${{ secrets.DOCKERHUB_TOKEN }}
- name: Extract metadata from Dockerfile
run: |
echo "TAGS=$(awk '/^LABEL org.opencontainers.image.tags/{gsub(/"/,"",$2); gsub(".*=",""); print }' Dockerfile)" >> $GITHUB_ENV
echo "TAG_VERSION=$(echo $TAGS | cut -d: -f2)" >> $GITHUB_ENV
- name: Check if TAGS is set
run: |
if [ -z "${{ env.TAGS }}" ]; then
echo "TAGS environment variable is not set. Please set it before running this workflow."
exit 1
fi
- name: Build and Push Multi-Arch Docker Image
uses: docker/build-push-action@v5
with:
context: .
platforms: linux/amd64,linux/arm64
tags: ${{ env.TAGS }},ziadmmh/calctube:latest
push: true
Just double-check the pushed image tags and note the supported OS/ARCH, as shown below.
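A sketch using the v0.0.1 tag from the Dockerfile labels above:
# Print the os/architecture pairs contained in the manifest list.
docker manifest inspect ziadmmh/calctube:v0.0.1 | grep -E '"(architecture|os)"'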
Voting App
The voting app is no different from the above setup. Please refer to the original repository and my fork of it for more information. And do reach out if you have any questions ^^.
Plan
Each of our apps has its own GitHub repository, and GitHub Actions is responsible for building the images and pushing them to Docker Hub; the updated manifests are then picked up by ArgoCD and deployed to the EKS cluster. So in total we aim for four repositories. The three repositories containing the application code can be configured however you like, but the terraformed-odyssey repo, which contains the application manifests according to GitOps principles, is configured as follows, and that affects how ArgoCD will deploy the applications:
Click to expand
ziadh@Ziads-MacBook-Air terraformed-odyssey % tree
.
├── argocd
│ ├── app-of-apps
│ │ ├── calctube.yaml
│ │ ├── goviolin.yaml
│ │ ├── system.yaml
│ │ └── voting-app.yaml
│ └── root-app
│ └── root-app.yaml
├── kubernetes
│ ├── README.md
│ ├── calctube
│ │ ├── 00-namespace.yaml
│ │ ├── 01-deployment.yaml
│ │ ├── 02-service.yaml
│ │ ├── 03-ingress.yaml
│ │ ├── files
│ │ │ └── auth
│ │ ├── kustomization.yaml
│ │ └── live
│ │ └── live.yaml
│ ├── goviolin
│ │ ├── 00-namespace.yaml
│ │ ├── 01-deployment.yaml
│ │ ├── 02-service.yaml
│ │ ├── 03-ingress.yaml
│ │ ├── files
│ │ │ └── auth
│ │ ├── kustomization.yaml
│ │ └ ── live
│ │ └── live.yaml
│ ├── system
│ │ ├── argocd-ingress.yaml
│ │ ├── argocd-notifications-cm.yaml
│ │ ├── components
│ │ │ └── sealed-secret-config.yaml
│ │ ├── dashboards
│ │ │ ├── argocd-14584.json
│ │ │ ├── cert-manager-20842.json
│ │ │ ├── ingress-nginx-14314.json
│ │ │ └── loki-14055.json
│ │ ├── kustomization.yaml
│ │ ├── live
│ │ │ └── live.yaml
│ │ ├── monitoring-ingress.yaml
│ │ ├── sealed-argocd-notifications-secret.yaml
│ │ └── secrets
│ │ └── argocd-notifications-secret-ignore.yaml
│ └── voting-app
│ ├── db-deployment.yaml
│ ├── db-service.yaml
│ ├── ingress.yaml
│ ├── kustomization.yaml
│ ├── live
│ │ └── live.yaml
│ ├── namespace.yaml
│ ├── redis-deployment.yaml
│ ├── redis-service.yaml
│ ├── result-deployment.yaml
│ ├── result-service.yaml
│ ├── vote-deployment.yaml
│ ├── vote-service.yaml
│ └── worker-deployment.yaml
└── terraform
├── 00_foundation
│ ├── 00-locals.tf
│ ├── 01-vpc.tf
│ ├── 02-igw.tf
│ ├── 03-subnets.tf
│ ├── 04-nat-gw-eip.tf
│ ├── 05-rt-rta.tf
│ ├── 06-eks.tf
│ ├── 07-node-group.tf
│ ├── providers.tf
│ ├── terraform.tfstate
│ ├── terraform.tfstate.backup
│ └── variables.tf
├── 10_platform
│ ├── 00-kube-prometheus-stack-loki.tf
│ ├── 01-ingress-nginx.tf
│ ├── 02-route53.tf
│ ├── 03-iam-oidc.tf
│ ├── 04-cert-manager.tf
│ ├── 05-sealed-secret.tf
│ ├── data.tf
│ ├── files
│ │ ├── cert-manager-values.yaml
│ │ ├── external-nginx-values.yaml
│ │ ├── internal-nginx-values.yaml
│ │ ├── kube-prometheus-stack-values.yaml
│ │ ├── loki-distributed-values.yaml
│ │ └── promtail-values.yaml
│ ├── outputs.tf
│ ├── providers.tf
│ ├── terraform.tfstate
│ ├── terraform.tfstate.backup
│ └── variables.tf
└── 15_platform
├── 00-argocd.tf
├── 01-vpn-acm.tf
├── 02-vpn-iam.tf
├── 03-vpn-sg.tf
├── 04-vpn-endpoint.tf
├── data.tf
├── files
│ └── argocd-values.yaml
├── metadata
│ ├── aws-client-vpn-self-service.xml
│ └── aws-client-vpn.xml
├── outputs.tf
├── providers.tf
├── terraform.tfstate
├── terraform.tfstate.backup
└── variables.tf
24 directories, 89 files
We will go through each file in detail, but for a quick overview:
- The terraform directory contains the Terraform code to provision the EKS cluster and the needed resources.
- The kubernetes directory contains the manifests for the applications we will deploy.
- The argocd directory contains the ArgoCD manifests for the applications and for ArgoCD itself.
Now let's discuss the contents of the terraform directory.
00_Foundation Layer
- VPC.
- Internet GW.
- Subnets.
- Elastic IPs.
- NAT Gateways.
- Route Tables and Route Table Associations.
- eks-cluster-role and eks-cluster-role-attachment, then the EKS Cluster.
- eks-node-group-general-role and its three different eks-node-group-general-role-attachments, then the aws_eks_node_group.
10_Platform Layer
- Kube Prometheus Stack and Loki.
- Ingress Nginx Controllers.
- Route53 Split Horizon DNS.
- Cert-Manager.
- Sealed Secrets.
15_Platform Layer
- ArgoCD.
- VPN.
Pre-requisites
First make sure you have downloaded the aws-cli and created a terraform user with programmatic access from the AWS Console.
AWS CLI
Follow this link to download the latest aws-cli version compatible with your operating system:
aws --version
aws-cli/2.15.38 Python/3.11.8 Darwin/23.4.0 exe/x86_64 prompt/off
The AWS CLI version 2 is the most recent major version of the AWS CLI and supports all of the latest features. Some features introduced in version 2 are NOT backported to version 1; you must upgrade to access them.
Terraform User
- Open the AWS Console, then navigate to the IAM service.
- Click on Users, then Create User.
- Name the user terraform.
- Click Next, then Add user to group and name it admin-access-automated-tools. Attach the AdministratorAccess policy, then click Create user group, Next again, and finally Create User.
- Navigate to the terraform user and select the Security Credentials tab.
- Click Create access key and select Command Line Interface (CLI) as the use case.
- Read Alternatives recommended; if you are okay, check I understand and click Create.
- Provide a description, e.g. Terraform Programmatic Access, then click Create access key.
- Download the .csv file and store it in a safe place.
- Never store your access key in plain text, in a code repository, or in code.
- Disable or delete access key when no longer needed.
- Enable and stick to least-privilege permissions.
- Rotate access keys regularly.
- For more details about managing access keys, see the best practices for managing AWS access keys.
cat $PATH_TO_CREDENTIALS_FILE/terraform_accessKeys.csv
# Enter region: eu-central-1
# Enter output format: json
aws configure --profile terraform
# To verify
cat ~/.aws/config
cat ~/.aws/credentials
00_Foundation
In this section we will provision the VPC, Internet GW, Subnets, Elastic IPs, NAT Gateways, Route Tables, EKS Cluster, EKS Node Groups, and the IAM roles and policies needed.
Note that provisioning the 00_foundation layer took me:
- ~12 minutes to apply.
- ~11 minutes to destroy.
We will be using the following Terraform providers:
Variables
variable "region" {
description = "The AWS region to deploy the resources."
type = string
default = "eu-central-1"
}
variable "profile" {
description = "The AWS profile to use."
type = string
default = "terraform"
}
variable "aws_vpc_main_cidr" {
description = "The CIDR block of the main VPC."
type = string
default = "10.0.0.0/16"
}
variable "cluster_name" {
description = "The name of the EKS cluster."
type = string
default = "eks-cluster-production"
}
variable "eks_master_version" {
description = "The Kubernetes version of the EKS cluster."
type = string
default = "1.28"
}
variable "worker_nodes_k8s_version" {
description = "The Kubernetes version of the EKS worker nodes."
type = string
default = "1.28"
}
variable "node_group_scaling_config" {
description = "The scaling configuration for the EKS node group."
type = object({
desired_size = number
max_size = number
min_size = number
})
default = {
desired_size = 4
max_size = 4
min_size = 4
}
}
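You can override any of these defaults without touching the .tf files via a terraform.tfvars file; a small sketch (the values shown are just examples):
# Example only: terraform.tfvars overriding a couple of the defaults above.
cat > terraform.tfvars <<'EOF'
region       = "eu-central-1"
cluster_name = "eks-cluster-production"
node_group_scaling_config = {
  desired_size = 4
  max_size     = 4
  min_size     = 4
}
EOF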
Providers
terraform {
required_providers {
aws = {
source = "hashicorp/aws"
version = "5.45.0"
}
}
}
provider "aws" {
region = var.region
profile = var.profile
}
Disclaimer
Because the main.tf file is a bit lengthy, I will break it up here to make it easier to comment on and to provide additional resources for each related resource.
A better approach would have been grouping related resources into different .tf files, but for the sake of simplicity I didn't do it.
Update: I have updated the code and article to reflect the best practices.
Local variable
locals {
tags = {
author = "ziadh"
"karpenter.sh/discovery" = var.cluster_name
}
}
Create VPC
You can see more at:
- aws_vpc terraform Resource.
- EKS Network Requirements documentation.
I chose the CIDR block to be 10.0.0.0/16; you can adjust it as per your convenience. You can visualize the subnets using a Subnet Calculator. Also remember:
Prefix | First IP Address | Last IP Address | Number of Addresses |
---|---|---|---|
10.0.0.0/8 | 10.0.0.0 | 10.255.255.255 | 16,777,216 |
172.16.0.0/12 | 172.16.0.0 | 172.31.255.255 | 1,048,576 |
192.168.0.0/16 | 192.168.0.0 | 192.168.255.255 | 65,536 |
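A quick back-of-the-envelope check of the subnet sizes we are about to create (plain shell arithmetic, nothing AWS specific):
# A /16 VPC holds 2^(32-16) addresses; each /18 subnet holds 2^(32-18).
echo $(( 2 ** (32 - 16) ))  # 65536 addresses in 10.0.0.0/16
echo $(( 2 ** (32 - 18) ))  # 16384 addresses per /18, so four /18s fill the VPC exactly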
resource "aws_vpc" "main" {
cidr_block = var.aws_vpc_main_cidr
# Makes instances shared on the host.
instance_tenancy = "default"
# Required for EKS:
# 1. Enable DNS support in the VPC.
# 2. Enable DNS hostnames in the VPC.
enable_dns_support = true
enable_dns_hostnames = true
# Additional Arguments:
assign_generated_ipv6_cidr_block = false
tags = merge(local.tags, { Name = "eks-vpc" })
}
Create Internet Gateway
- aws_internet_gateway terraform Resource.
resource "aws_internet_gateway" "main" {
vpc_id = aws_vpc.main.id
tags = merge(local.tags, { Name = "eks-igw" })
}
Subnets
We need two public and two private subnets. Read more here.
- aws_subnet terraform Resource.
- Visual Subnet Calculator.
- Public Subnet 1
- Public Subnet 2
- Private Subnet 1
- Private Subnet 2
resource "aws_subnet" "public_1" {
vpc_id = aws_vpc.main.id
cidr_block = "10.0.0.0/18"
availability_zone = "${var.region}a"
# Required for EKS: Instances launched into the subnet
# should be assigned a public IP address.
map_public_ip_on_launch = true
tags = merge(
local.tags,
{
Name = "public-${var.region}a"
"kubernetes.io/cluster/${var.cluster_name}" = "shared"
"kubernetes.io/role/elb" = "1"
}
)
}
resource "aws_subnet" "public_2" {
vpc_id = aws_vpc.main.id
cidr_block = "10.0.64.0/18"
availability_zone = "${var.region}b"
# Required for EKS: Instances launched into the subnet
# should be assigned a public IP address.
map_public_ip_on_launch = true
tags = merge(
local.tags,
{
Name = "public-${var.region}b"
"kubernetes.io/cluster/${var.cluster_name}" = "shared"
"kubernetes.io/role/elb" = "1"
}
)
}
resource "aws_subnet" "private_1" {
vpc_id = aws_vpc.main.id
cidr_block = "10.0.128.0/18"
availability_zone = "${var.region}a"
tags = merge(
local.tags,
{
Name = "private-${var.region}a"
"kubernetes.io/cluster/${var.cluster_name}" = "shared"
"kubernetes.io/role/internal-elb" = "1"
}
)
}
resource "aws_subnet" "private_2" {
vpc_id = aws_vpc.main.id
cidr_block = "10.0.192.0/18"
availability_zone = "${var.region}b"
tags = merge(
local.tags,
{
Name = "private-${var.region}b"
"kubernetes.io/cluster/${var.cluster_name}" = "shared"
"kubernetes.io/role/internal-elb" = "1"
}
)
}
Pay close attention to:
- The "kubernetes.io/role/elb" tag we had on the public subnets vs "kubernetes.io/role/internal-elb" on the private subnets. Read more in the docs here.
- The map_public_ip_on_launch = true on the public subnets ONLY.
- Without the "kubernetes.io/cluster/${var.cluster_name}" tag, the EKS cluster will not be able to communicate with the nodes (you can verify the tags below).
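This is a sketch assuming the terraform profile and the eu-central-1 region used throughout the article; after an apply it should list all four subnets:
# List subnets carrying the shared cluster tag, with their Name tag and CIDR.
aws ec2 describe-subnets \
--profile terraform --region eu-central-1 \
--filters "Name=tag:kubernetes.io/cluster/eks-cluster-production,Values=shared" \
--query 'Subnets[].{Name:Tags[?Key==`Name`]|[0].Value,Cidr:CidrBlock}' \
--output table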
Elastic IPs and NAT GWs
- aws_eip terraform Resource.
- aws_nat_gateway terraform Resource.
- Elastic IP and NAT Gw One
- Elastic IP and NAT Gw Two
resource "aws_eip" "nat_1" {
depends_on = [aws_internet_gateway.main]
}
resource "aws_nat_gateway" "gw_1" {
subnet_id = aws_subnet.public_1.id
allocation_id = aws_eip.nat_1.id
tags = merge(local.tags, { Name = "eks-nat-gw-1" })
}
resource "aws_eip" "nat_2" {
depends_on = [aws_internet_gateway.main]
}
resource "aws_nat_gateway" "gw_2" {
subnet_id = aws_subnet.public_2.id
allocation_id = aws_eip.nat_2.id
tags = merge(local.tags, { Name = "eks-nat-gw-2" })
}
RT and RTA
Route Tables and Route Tables Association section:
- aws_route_table terraform Resource.
- aws_route_table_association terraform Resource.
We will have three route tables and then associate each one of the four subnets with the appropriate route table.
- Public Route Table
- Private Route Table One
- Private Route Table Two
resource "aws_route_table" "public" {
vpc_id = aws_vpc.main.id
route {
cidr_block = "0.0.0.0/0"
gateway_id = aws_internet_gateway.main.id
}
tags = merge(local.tags, { Name = "eks-public-rt" })
}
resource "aws_route_table" "private_1" {
vpc_id = aws_vpc.main.id
route {
cidr_block = "0.0.0.0/0"
nat_gateway_id = aws_nat_gateway.gw_1.id
}
tags = merge(local.tags, { Name = "eks-private-rt-1" })
}
resource "aws_route_table" "private_2" {
vpc_id = aws_vpc.main.id
route {
cidr_block = "0.0.0.0/0"
nat_gateway_id = aws_nat_gateway.gw_2.id
}
tags = merge(local.tags, { Name = "eks-private-rt-2" })
}
And their respective associations:
resource "aws_route_table_association" "public_1" {
subnet_id = aws_subnet.public_1.id
route_table_id = aws_route_table.public.id
}
resource "aws_route_table_association" "public_2" {
subnet_id = aws_subnet.public_2.id
route_table_id = aws_route_table.public.id
}
resource "aws_route_table_association" "private_1" {
subnet_id = aws_subnet.private_1.id
route_table_id = aws_route_table.private_1.id
}
resource "aws_route_table_association" "private_2" {
subnet_id = aws_subnet.private_2.id
route_table_id = aws_route_table.private_2.id
}
IAM roles for EKS
- aws_iam_role terraform Resource.
- aws_iam_role_policy_attachment terraform Resource.
Note that we will attach the AmazonEKSClusterPolicy policy to the role; it is managed by AWS. The assume_role_policy is responsible for who can assume this role.
This role is used by the EKS control plane to make calls to AWS API operations on your behalf.
resource "aws_iam_role" "eks_cluster" {
name = "eks-cluster"
assume_role_policy = <<POLICY
{
"Version": "2012-10-17",
"Statement": [
{
"Effect": "Allow",
"Principal": {
"Service": "eks.amazonaws.com"
},
"Action": "sts:AssumeRole"
}
]
}
POLICY
}
resource "aws_iam_role_policy_attachment" "amazon_eks_cluster_policy" {
policy_arn = "arn:aws:iam::aws:policy/AmazonEKSClusterPolicy"
role = aws_iam_role.eks_cluster.name
}
EKS Cluster
- aws_eks_cluster terraform Resource.
resource "aws_eks_cluster" "eks" {
name = var.cluster_name
# Amazon Resource Name (ARN) of the IAM role that provides permission for
# the kubernetes control plane to make calls to aws API operations on your
# behalf.
role_arn = aws_iam_role.eks_cluster.arn
# Desired Kubernetes master version
version = "1.28"
vpc_config {
endpoint_private_access = false
endpoint_public_access = true
# Must be in at least two subnets in two different
# availability zones.
subnet_ids = [
aws_subnet.public_1.id,
aws_subnet.public_2.id,
aws_subnet.private_1.id,
aws_subnet.private_2.id
]
}
depends_on = [
aws_iam_role_policy_attachment.amazon_eks_cluster_policy
]
tags = local.tags
}
IAM roles for NodeGroups
We will create a role named eks-node-group-general and then attach three policies to it. We also control who can assume eks-node-group-general via the assume_role_policy below: it is the EKS worker nodes (EC2 instances) that will assume this role.
In case you were wondering why we need these policies, please follow the docs here and the links above to see exactly what each policy grants.
resource "aws_iam_role" "node_group_general" {
name = "eks-node-group-general"
assume_role_policy = <<POLICY
{
"Version": "2012-10-17",
"Statement": [
{
"Effect": "Allow",
"Principal": {
"Service": "ec2.amazonaws.com"
},
"Action": "sts:AssumeRole"
}
]
}
POLICY
}
- AmazonEKSWorkerNodePolicy
- AmazonEKS_CNI_Policy
- AmazonEC2ContainerRegistryReadOnly
resource "aws_iam_role_policy_attachment" "amazon_eks_worker_node_policy_general" {
policy_arn = "arn:aws:iam::aws:policy/AmazonEKSWorkerNodePolicy"
role = aws_iam_role.node_group_general.name
}
resource "aws_iam_role_policy_attachment" "amazon_eks_cni_policy_general" {
policy_arn = "arn:aws:iam::aws:policy/AmazonEKS_CNI_Policy"
role = aws_iam_role.node_group_general.name
}
resource "aws_iam_role_policy_attachment" "amazon_ec2_container_registry_read_only_general" {
policy_arn = "arn:aws:iam::aws:policy/AmazonEC2ContainerRegistryReadOnly"
role = aws_iam_role.node_group_general.name
}
EKS NodeGroup
- aws_eks_node_group terraform Resource.
resource "aws_eks_node_group" "nodes_general" {
cluster_name = aws_eks_cluster.eks.name
node_group_name = "nodes-general-group"
node_role_arn = aws_iam_role.node_group_general.arn
# Identifiers of EC2 subnets to associate with the EKS Node Group.
# These subnets must have the following resource tags:
# - kubernetes.io/cluster/CLUSTER_NAME
# Where CLUSTER_NAME is replaced with the name of the EKS cluster.
subnet_ids = [
aws_subnet.private_1.id,
aws_subnet.private_2.id
]
scaling_config {
desired_size = var.node_group_scaling_config.desired_size
max_size = var.node_group_scaling_config.max_size
min_size = var.node_group_scaling_config.min_size
}
# Valid Values: AL2_x86_64, BOTTLEROCKET_x86_64
# Ref: https://docs.aws.amazon.com/eks/latest/APIReference/API_Nodegroup.html#API_Nodegroup_Contents
ami_type = "BOTTLEROCKET_x86_64"
# Valid Values: ON_DEMAND, SPOT
capacity_type = "ON_DEMAND"
disk_size = 20 # GiB
# Force version update if existing Pods are unable to be drained
# due to a pod disruption budget issue.
force_update_version = false
# Docs: https://aws.amazon.com/ec2/instance-types/
instance_types = ["t3.medium"]
labels = {
role = "nodes-general"
}
# If not specified, then inherited from the EKS master plane.
version = "1.28"
depends_on = [
aws_iam_role_policy_attachment.amazon_eks_worker_node_policy_general,
aws_iam_role_policy_attachment.amazon_eks_cni_policy_general,
aws_iam_role_policy_attachment.amazon_ec2_container_registry_read_only_general
]
tags = local.tags
}
scaling_config {
desired_size = 2
max_size = 2
min_size = 2
}
We cannot have fewer than 2 worker nodes in the EKS cluster, as we will add a PodAntiAffinity rule to the ingress-nginx controllers. More on this later.
Test & Verify
terraform fmt
terraform init
terraform validate
terraform plan
terraform apply
rm ~/.kube/config # (Optional)
aws eks --region eu-central-1 update-kubeconfig --name eks-cluster-production --profile terraform
kubectl get nodes,svc
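Optionally, confirm that the node OS and architecture match the AMI type we picked (Bottlerocket, x86_64); this also matters later because our images are multi-arch:
# Show each node's architecture and OS image.
kubectl get nodes -o custom-columns='NAME:.metadata.name,ARCH:.status.nodeInfo.architecture,OS:.status.nodeInfo.osImage'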
10_Platform
In this section we will provision:
- Kube Prometheus Stack and Loki.
- Two Ingress Nginx Controllers.
- Route53 with split horizon dns.
- Cert-Manager.
- Sealed Secrets.
Provisioning the 10_platform layer took me:
- ~4 minutes to apply.
- ~2 minutes to destroy.
Vars
variable "region" {
description = "The AWS region to deploy the resources."
type = string
default = "eu-central-1"
}
variable "profile" {
description = "The AWS profile to use."
type = string
default = "terraform"
}
variable "cluster_name" {
description = "The name of the EKS cluster."
type = string
default = "eks-cluster-production"
}
Providers
terraform {
required_providers {
aws = {
source = "hashicorp/aws"
version = "5.45.0"
}
kubernetes = {
source = "hashicorp/kubernetes"
version = "2.29.0"
}
kubectl = {
source = "gavinbunney/kubectl"
version = "1.14.0"
}
helm = {
source = "hashicorp/helm"
version = "2.13.0"
}
tls = {
source = "hashicorp/tls"
version = "4.0.5"
}
}
}
provider "aws" {
region = var.region
profile = var.profile
}
provider "kubernetes" {
host = data.aws_eks_cluster.cluster.endpoint
token = data.aws_eks_cluster_auth.cluster.token
cluster_ca_certificate = base64decode(data.aws_eks_cluster.cluster.certificate_authority.0.data)
}
provider "helm" {
kubernetes {
host = data.aws_eks_cluster.cluster.endpoint
token = data.aws_eks_cluster_auth.cluster.token
cluster_ca_certificate = base64decode(data.aws_eks_cluster.cluster.certificate_authority.0.data)
}
}
provider "kubectl" {
host = data.aws_eks_cluster.cluster.endpoint
token = data.aws_eks_cluster_auth.cluster.token
cluster_ca_certificate = base64decode(data.aws_eks_cluster.cluster.certificate_authority.0.data)
load_config_file = false
}
Helm Intro
We will install everything with Terraform, but I wanted to show you how the same charts would be installed directly with Helm.
helm repo add prometheus-community https://prometheus-community.github.io/helm-charts
helm repo update
helm search repo kube-prometheus-stack --max-col-width 23
# Release name: monitoring
# Helm chart name: kube-prometheus-stack
helm install monitoring prometheus-community/kube-prometheus-stack \
--values prometheus-values.yaml \
--version 58.1.3 \
--namespace monitoring \
--create-namespace
# Later when you are done
helm uninstall monitoring -n monitoring
helm repo add ingress-nginx https://kubernetes.github.io/ingress-nginx
helm repo update
helm search repo ingress-nginx --max-col-width 23
helm install ingress-nginx ingress-nginx/ingress-nginx \
--values ingress-values.yaml \
--version 4.10.0 \
--namespace ingress-nginx \
--create-namespace
# Later when you are done
helm uninstall ingress-nginx -n ingress-nginx
helm repo add jetstack https://charts.jetstack.io
helm repo update
helm search repo cert-manager --max-col-width 23
helm install cert-manager jetstack/cert-manager \
--values cert-manager-values.yaml \
--version 1.14.4 \
--namespace cert-manager \
--create-namespace
# Later when you are done
helm uninstall cert-manager -n cert-manager
Draft Plan
The following is just us drafting the plan, as we will use Terraform, not the UI. Do not worry if you do not understand a certain part; we are just planning and will go into the details later:
- Delegate a subdomain to Route53: *.k8s.sreboy.com.
  - Create a public hosted zone in Route53 with the domain name k8s.sreboy.com.
  - Create a nameserver (NS) record in your domain registrar, e.g. Namecheap.
  - (Optional) Test the subdomain delegation with a dummy record test.k8s.sreboy.com in Route53 and try to resolve it with dig +short test.k8s.sreboy.com. The value can be anything, e.g. 10.10.10.10. You can also use whatsmydns to see the DNS propagation.
- We will use IRSA (IAM Roles for Service Accounts) to allow cert-manager to manage the Route53 hosted zone.
  - Create the OpenID Connect provider first:
    - Open the EKS service in the AWS Console, then under clusters select the cluster.
    - Under the Configuration tab, copy the OpenID Connect Provider URL.
    - Navigate to the IAM service, then Identity Providers, and select Add provider.
    - Select OpenID Connect, paste the URL, and click Get thumbprint.
    - Under Audience: sts.amazonaws.com.
    - Click Add provider.
  - Create an IAM policy. Name the policy CertManagerRoute53Access:
{
"Version": "2012-10-17",
"Statement": [
{
"Effect": "Allow",
"Action": "route53:GetChange",
"Resource": "arn:aws:route53:::change/*"
},
{
"Effect": "Allow",
"Action": [
"route53:ChangeResourceRecordSets",
"route53:ListResourceRecordSets"
],
"Resource": "arn:aws:route53:::hostedzone/<id>"
}
]
}
- Create an IAM role and associate it with the Kubernetes service account. Under Roles click Create role.
  - Select Web identity as the type of trusted entity.
  - Choose the identity provider created in step 1.
  - For Audience: sts.amazonaws.com.
  - Click Next for permissions and attach the CertManagerRoute53Access policy.
  - Name the role cert-manager-acme.
- To allow only our cert-manager Kubernetes service account to assume this role, we need to update the Trust Relationship of the cert-manager-acme role. Click Edit trust relationship:
  - First we need the name of the service account attached to cert-manager. Run kubectl -n cert-manager get sa; in our setup it is called cert-manager (in some installs it is prefixed with the release name, e.g. cert-083-cert-manager).
  - Update the trust relationship to be:
- Before
- After
{
"Version": "2012-10-17",
"Statement": [
{
"Effect": "Allow",
"Principal": {
"Federated": <OIDC_PROVIDER_ARN>
},
"Action": "sts:AssumeRoleWithWebIdentity",
"Condition": {
"StringEquals": {
"oidc.eks.eu-central-1.amazonaws.com/id/<CLUSTER_ID>:aud": "sts.amazonaws.com"
}
}
}
]
}
{
"Version": "2012-10-17",
"Statement": [
{
"Effect": "Allow",
"Principal": {
"Federated": <OIDC_PROVIDER_ARN>
},
"Action": "sts:AssumeRoleWithWebIdentity",
"Condition": {
"StringEquals": {
"oidc.eks.eu-central-1.amazonaws.com/id/<CLUSTER_ID>:sub": "system:serviceaccount:cert-manager:cert-manager"
}
}
}
]
} -
- Attach the CertManagerRoute53Access policy to the cert-manager-acme role. Remember: the assume_role_policy created inside the role defines who can assume this role.
- Install Kube Prometheus Stack with a custom values.yaml file.
- Install Ingress-Nginx with a custom values.yaml file.
- Install Cert-Manager with a custom values.yaml file.
- Install Sealed-Secrets with a custom values.yaml file.
Visualize Plan
The following is a Simplified Dependency Graph made by Mermaid.
Data
All the data sources needed from the previous layer. For example, we use the aws_caller_identity data source to get access to the effective Account ID, User ID, and ARN under which Terraform is authorized.
data "aws_eks_cluster" "cluster" {
name = var.cluster_name
}
data "aws_eks_cluster_auth" "cluster" {
name = var.cluster_name
}
# Data Source: aws_caller_identity
# https://registry.terraform.io/providers/hashicorp/aws/latest/docs/data-sources/caller_identity
data "aws_caller_identity" "current" {}
data "kubernetes_service" "external_nginx_controller" {
metadata {
name = "ingress-nginx-external-controller"
namespace = "ingress-nginx-external"
}
depends_on = [
helm_release.ingress-nginx-external
]
}
data "kubernetes_service" "internal_nginx_controller" {
metadata {
name = "ingress-nginx-internal-controller"
namespace = "ingress-nginx-internal"
}
depends_on = [
helm_release.ingress-nginx-internal
]
}
data "tls_certificate" "demo" {
url = data.aws_eks_cluster.cluster.identity.0.oidc.0.issuer
}
Providers
terraform {
required_providers {
aws = {
source = "hashicorp/aws"
version = "5.45.0"
}
kubernetes = {
source = "hashicorp/kubernetes"
version = "2.29.0"
}
kubectl = {
source = "gavinbunney/kubectl"
version = "1.14.0"
}
helm = {
source = "hashicorp/helm"
version = "2.13.0"
}
tls = {
source = "hashicorp/tls"
version = "4.0.5"
}
}
}
provider "aws" {
region = var.region
profile = var.profile
}
provider "kubernetes" {
host = data.aws_eks_cluster.cluster.endpoint
token = data.aws_eks_cluster_auth.cluster.token
cluster_ca_certificate = base64decode(data.aws_eks_cluster.cluster.certificate_authority.0.data)
}
provider "helm" {
kubernetes {
host = data.aws_eks_cluster.cluster.endpoint
token = data.aws_eks_cluster_auth.cluster.token
cluster_ca_certificate = base64decode(data.aws_eks_cluster.cluster.certificate_authority.0.data)
}
}
provider "kubectl" {
host = data.aws_eks_cluster.cluster.endpoint
token = data.aws_eks_cluster_auth.cluster.token
cluster_ca_certificate = base64decode(data.aws_eks_cluster.cluster.certificate_authority.0.data)
load_config_file = false
}
Variables
variable "region" {
description = "The AWS region to deploy the resources."
type = string
default = "eu-central-1"
}
variable "profile" {
description = "The AWS profile to use."
type = string
default = "terraform"
}
variable "cluster_name" {
description = "The name of the EKS cluster."
type = string
default = "eks-cluster-production"
}
Outputs
output "internal_nginx_dns_lb" {
description = "Internal DNS name for the NGINX Load Balancer."
value = data.kubernetes_service.internal_nginx_controller.status.0.load_balancer.0.ingress.0.hostname
}
output "ns_records" {
description = "The name servers for the public hosted zone"
value = aws_route53_zone.public.name_servers
}
output "external_nginx_dns_lb" {
description = "External DNS name for the NGINX Load Balancer."
value = data.kubernetes_service.external_nginx_controller.status.0.load_balancer.0.ingress.0.hostname
}
output "issuer_url_oidc" {
description = "Issuer URL for the OpenID Connect identity provider."
value = data.aws_eks_cluster.cluster.identity.0.oidc.0.issuer
}
output "issuer_url_oidc_replaced" {
description = "Issuer URL for the OpenID Connect identity provider without https://."
value = replace(data.aws_eks_cluster.cluster.identity.0.oidc.0.issuer, "https://", "")
}
Kube Prometheus Stack
- helm_release terraform Resource.
resource "helm_release" "kube_prometheus_stack" {
name = "monitoring"
namespace = "monitoring"
repository = "https://prometheus-community.github.io/helm-charts"
chart = "kube-prometheus-stack"
version = "58.1.3"
timeout = 300
atomic = true
create_namespace = true
values = [
"${file("files/kube-prometheus-stack-values.yaml")}"
]
}
resource "helm_release" "loki-distributed" {
name = "loki"
namespace = "monitoring"
repository = "https://grafana.github.io/helm-charts"
chart = "loki-distributed"
version = "0.79.0"
timeout = 300
atomic = true
create_namespace = true
values = [
"${file("files/loki-distributed-values.yaml")}"
]
depends_on = [helm_release.kube_prometheus_stack]
}
resource "helm_release" "promtail" {
name = "promtail"
namespace = "monitoring"
repository = "https://grafana.github.io/helm-charts"
chart = "promtail"
version = "6.15.5"
timeout = 300
atomic = true
create_namespace = true
values = [
"${file("files/promtail-values.yaml")}"
]
}
Custom values.yaml
I provided inline comments explaining each value customized in the kube-prometheus-stack-values.yaml file.
Click Me (kube-prometheus-stack-values.yaml)
---
# Ref: https://github.com/prometheus-community/helm-charts/blob/main/charts/kube-prometheus-stack/values.yaml
# Since we are using eks. The control plane is abstracted away from us.
# We do NOT need to manage ETCD, scheduler, controller-manager, and API server.
# The following will disable alerts for etcd and kube-scheduler.
defaultRules:
rules:
etcd: false
kubeScheduler: false
# Then disable servicemonitors for them
kubeControllerManager:
enabled: false
kubeScheduler:
enabled: false
kubeEtcd:
enabled: false
# Add a custom labels to discover ServiceMonitors
prometheus:
prometheusSpec:
## If true, a nil or {} value for prometheus.prometheusSpec.serviceMonitorSelector will cause the
## prometheus resource to be created with selectors based on values in the helm deployment,
## which will also match the servicemonitors created
##
serviceMonitorSelectorNilUsesHelmValues: false
serviceMonitorSelector: {}
# matchLabels:
# Prometheus will watch servicemonitors objects
# with the following label:
# e.g. app.kubernetes.io/monitored-by: prometheus
# prometheus: monitor
serviceMonitorNamespaceSelector: {}
# matchLabels:
# By default, prometheus will ONLY detect servicemonitors
# in its own namespace `monitoring`. Instruct prometheus
# to select service monitors in all namespaces with the
# following label:
# e.g. app.kubernetes.io/part-of: prometheus
# monitoring: prometheus
# Last thing update common labels.
# If you did NOT add it. Prometheus Operator
# will IGNORE default service monitors created
# by this helm chart. Consequently, the prometheus
# targets section will be empty.
# commonLabels:
# prometheus: monitor
# monitoring: prometheus
# Optionally, you can update the grafana admin password
grafana:
adminPassword: testing321
additionalDataSources:
- name: Loki
type: loki
url: http://loki-loki-distributed-query-frontend.monitoring:3100
Click Me (loki-distributed-values.yaml)
---
# Ref: https://github.com/grafana/helm-charts/blob/main/charts/loki-distributed/values.yaml
loki:
serviceMonitor:
enabled: true
Click Me (promtail-values.yaml)
---
# Ref: https://github.com/grafana/helm-charts/blob/main/charts/promtail/values.yaml
config:
clients:
- url: "http://loki-loki-distributed-gateway/loki/api/v1/push"
Ingress Nginx
- External Ingress
- Internal Ingress
resource "helm_release" "ingress-nginx-external" {
name = "ingress-nginx-external"
namespace = "ingress-nginx-external"
repository = "https://kubernetes.github.io/ingress-nginx"
chart = "ingress-nginx"
version = "4.0.1"
timeout = 300
atomic = true
create_namespace = true
depends_on = [
helm_release.kube_prometheus_stack
]
values = [
"${file("files/external-nginx-values.yaml")}"
]
}
resource "helm_release" "ingress-nginx-internal" {
name = "ingress-nginx-internal"
namespace = "ingress-nginx-internal"
repository = "https://kubernetes.github.io/ingress-nginx"
chart = "ingress-nginx"
version = "4.0.1"
timeout = 300
atomic = true
create_namespace = true
depends_on = [
helm_release.kube_prometheus_stack
]
values = [
"${file("files/internal-nginx-values.yaml")}"
]
}
Click Me (external-nginx-values.yaml)
---
# Ref: https://github.com/kubernetes/ingress-nginx/blob/main/charts/ingress-nginx/values.yaml
controller:
# name: controller
# -- Election ID to use for status update, by default it uses the controller name combined with a suffix of 'leader'
# electionID: ""
config:
# https://github.com/kubernetes/ingress-nginx/blob/main/docs/user-guide/nginx-configuration/configmap.md#compute-full-forwarded-for
compute-full-forwarded-for: "true"
# https://github.com/kubernetes/ingress-nginx/blob/main/docs/user-guide/nginx-configuration/configmap.md#use-forwarded-headers
use-forwarded-headers: "true"
# https://github.com/kubernetes/ingress-nginx/blob/main/docs/user-guide/nginx-configuration/configmap.md#proxy-body-size
proxy-body-size: "0"
# This name we will reference this particular ingress controller
# incase you have multiple ingress controllers, you can use
# `ingressClassName` to specify which ingress controller to use.
# ALSO: For backwards compatibility with ingress.class annotation, use ingressClass. Algorithm is as follows, first ingressClassName is considered, if not present, controller looks for ingress.class annotation.
# Ref: https://github.com/kubernetes/ingress-nginx/tree/main/charts/ingress-nginx
# E.g. very often we have `internal` and `external` ingresses in the same cluster.
ingressClass: external-nginx
# New kubernetes APIs starting from 1.18 let us create an ingress class resource
ingressClassResource:
name: external-nginx
# ENABLED: Create the IngressClass or not
enabled: true
# DEFAULT: If true, Ingresses without ingressClassName get assigned to this IngressClass on creation. Ingress creation gets rejected if there are multiple default IngressClasses. Ref: https://kubernetes.io/docs/concepts/services-networking/ingress/#default-ingress-class
default: false
# Ref: https://kubernetes.github.io/ingress-nginx/user-guide/multiple-ingress/#using-ingressclasses
controllerValue: "k8s.io/ingress-nginx-external"
# Pod Anti-Affinity Role: deploys nginx ingress pods on a different nodes
# very helpful if you do NOT want to disrupt services during kubernetes rolling
# upgrades.
# IMPORTANT: try always to use it.
affinity:
podAntiAffinity:
requiredDuringSchedulingIgnoredDuringExecution:
- labelSelector:
matchExpressions:
- key: app.kubernetes.io/name
operator: In
values:
- ingress-nginx
topologyKey: "kubernetes.io/hostname"
# Should at least be 2 or configured auto-scaling
replicaCount: 1
# Admission webhooks: verifies the configuration before applying the ingress.
# E.g. syntax error in the configuration snippet annotation, the generated
# configuration becomes invalid
admissionWebhooks:
enabled: true
# Ingress is always deployed with some kind of a load balancer. You may use
# annotations supported by your cloud provider to configure it. E.g. in AWS
# you can use `aws-load-balancer-type` as the default is `classic`.
service:
annotations:
# Ref: https://kubernetes-sigs.github.io/aws-load-balancer-controller/v2.2/guide/service/annotations/
service.beta.kubernetes.io/aws-load-balancer-name: "load-balancer-external"
service.beta.kubernetes.io/aws-load-balancer-type: nlb
# service.beta.kubernetes.io/aws-load-balancer-cross-zone-load-balancing-enabled: "true"
# We want to enable prometheus metrics on the controller
metrics:
enabled: true
serviceMonitor:
enabled: true
# additionalLabels:
# prometheus: monitor
Click Me (internal-nginx-values.yaml)
---
# Ref: https://github.com/kubernetes/ingress-nginx/blob/main/charts/ingress-nginx/values.yaml
controller:
# name: controller
# -- Election ID to use for status update, by default it uses the controller name combined with a suffix of 'leader'
# electionID: ""
config:
# https://github.com/kubernetes/ingress-nginx/blob/main/docs/user-guide/nginx-configuration/configmap.md#compute-full-forwarded-for
compute-full-forwarded-for: "true"
# https://github.com/kubernetes/ingress-nginx/blob/main/docs/user-guide/nginx-configuration/configmap.md#use-forwarded-headers
use-forwarded-headers: "true"
# https://github.com/kubernetes/ingress-nginx/blob/main/docs/user-guide/nginx-configuration/configmap.md#proxy-body-size
proxy-body-size: "0"
# This name we will reference this particular ingress controller
# incase you have multiple ingress controllers, you can use
# `ingressClassName` to specify which ingress controller to use.
# ALSO: For backwards compatibility with ingress.class annotation, use ingressClass. Algorithm is as follows, first ingressClassName is considered, if not present, controller looks for ingress.class annotation.
# Ref: https://github.com/kubernetes/ingress-nginx/tree/main/charts/ingress-nginx
# E.g. very often we have `internal` and `external` ingresses in the same cluster.
ingressClass: internal-nginx
# New kubernetes APIs starting from 1.18 let us create an ingress class resource
ingressClassResource:
name: internal-nginx
# ENABLED: Create the IngressClass or not
enabled: true
# DEFAULT: If true, Ingresses without ingressClassName get assigned to this IngressClass on creation. Ingress creation gets rejected if there are multiple default IngressClasses. Ref: https://kubernetes.io/docs/concepts/services-networking/ingress/#default-ingress-class
default: true
# Ref: https://kubernetes.github.io/ingress-nginx/user-guide/multiple-ingress/#using-ingressclasses
controllerValue: "k8s.io/ingress-nginx-internal"
# Pod Anti-Affinity Role: deploys nginx ingress pods on a different nodes
# very helpful if you do NOT want to disrupt services during kubernetes rolling
# upgrades.
# IMPORTANT: try always to use it.
affinity:
podAntiAffinity:
requiredDuringSchedulingIgnoredDuringExecution:
- labelSelector:
matchExpressions:
- key: app.kubernetes.io/name
operator: In
values:
- ingress-nginx
topologyKey: "kubernetes.io/hostname"
# Should at least be 2 or configured auto-scaling
replicaCount: 1
# Admission webhooks: verifies the configuration before applying the ingress.
# E.g. syntax error in the configuration snippet annotation, the generated
# configuration becomes invalid
admissionWebhooks:
enabled: true
# Ingress is always deployed with some kind of a load balancer. You may use
# annotations supported by your cloud provider to configure it. E.g. in AWS
# you can use `aws-load-balancer-type` as the default is `classic`.
service:
external:
enabled: false
internal:
enabled: true
annotations:
# Ref: https://kubernetes-sigs.github.io/aws-load-balancer-controller/v2.2/guide/service/annotations/
# if you want to have an internal load balancer with only private
# IP address. That you can use within your VPC. you can use:
# Ref: https://docs.aws.amazon.com/elasticloadbalancing/latest/userguide/how-elastic-load-balancing-works.html
service.beta.kubernetes.io/aws-load-balancer-type: nlb
service.beta.kubernetes.io/aws-load-balancer-name: "load-balancer-internal"
service.beta.kubernetes.io/aws-load-balancer-scheme: "internal"
# service.beta.kubernetes.io/aws-load-balancer-cross-zone-load-balancing-enabled: "true"
# We want to enable prometheus metrics on the controller
metrics:
enabled: true
serviceMonitor:
enabled: true
# additionalLabels:
# prometheus: monitor
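Once both releases are applied, you can verify the two ingress classes and their load balancer services (a quick sketch):
# Expect both external-nginx and internal-nginx classes, with internal-nginx marked as default.
kubectl get ingressclass
# Each controller gets its own NLB; the internal one should have a private DNS name.
kubectl -n ingress-nginx-external get svc
kubectl -n ingress-nginx-internal get svc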
Route53
It is time to create the public and private hosted zones in Route53; as I said before, we will implement split horizon DNS. I registered my domain name with Namecheap, and we will delegate the subdomain k8s.sreboy.com to Route53.
Why do we need to delegate the subdomain to Route53? Because we want cert-manager to manage the Route53 hosted zone, which is done using IAM Roles for Service Accounts (IRSA). This is much easier to do with Route53 than with Namecheap.
Basically, the steps are:
- Create Public Hosted Zone in Route53.
- Create an NS record in Namecheap to delegate the subdomain to Route53.
Resources Used:
- aws_route53_zone terraform Resource.
- Public Hosted Zone
- Private Hosted Zone
resource "aws_route53_zone" "public" {
name = "k8s.sreboy.com"
}
resource "aws_route53_record" "wildcard_cname" {
zone_id = aws_route53_zone.public.zone_id
name = "*"
type = "CNAME"
ttl = "300"
records = [
data.kubernetes_service.external_nginx_controller.status.0.load_balancer.0.ingress.0.hostname
]
}
resource "aws_route53_zone" "private" {
name = "k8s.sreboy.com"
vpc {
vpc_id = data.aws_eks_cluster.cluster.vpc_config.0.vpc_id
}
}
resource "aws_route53_record" "internal_wildcard_cname" {
zone_id = aws_route53_zone.private.zone_id
name = "*"
type = "CNAME"
ttl = "300"
records = [
data.kubernetes_service.internal_nginx_controller.status.0.load_balancer.0.ingress.0.hostname
]
}
Namecheap has a very frustrating API. See their docs. They require you to whitelist the IP address of the server you are calling their API from: you can NOT add a CIDR, only a static IP address, and you only have 10 IP addresses to whitelist. Alongside other questionable API design decisions, e.g. while adding or updating a record you can DELETE all your previous records if you forget to switch the mode from OVERWRITE to MERGE, and if you are calling the API raw you have to include all your previous records in the call. It is a joke (a bad one).
So, instead of using terraform:
- namecheap_domain_records terraform resource.
resource "namecheap_domain_records" "delegate_to_route53" {
domain = "sreboy.com"
for_each = toset(aws_route53_zone.public.name_servers)
record {
hostname = "k8s"
type = "NS"
address = each.value
}
}
I will do it from the UI of Namecheap once the public hosted zone is created in Route53.
You can use shadowsocks if, like me, your ISP does not provide a static IP address service: just whitelist the Elastic IP of the shadowsocks server. I learned that after I finished writing the article :)
Cert-Manager
Now it is time to install cert-manager. We will use cert-manager to manage and automate obtaining and renewing SSL certificates for our services.
This is the same part as the block called Cert-Manager Configuration in the graph above, but I will divide it into separate blocks for better understanding.
resource "aws_iam_openid_connect_provider" "eks_oidc" {
url = data.aws_eks_cluster.cluster.identity.0.oidc.0.issuer
client_id_list = ["sts.amazonaws.com"]
thumbprint_list = [data.tls_certificate.demo.certificates[0].sha1_fingerprint]
}
resource "aws_iam_policy" "cert_manager_route53_access" {
name = "CertManagerRoute53Access"
description = "Policy for cert-manager to manage Route53 hosted zone"
depends_on = [
aws_route53_zone.public,
aws_route53_zone.private,
]
policy = <<EOF
{
"Version": "2012-10-17",
"Statement": [
{
"Effect": "Allow",
"Action": "route53:GetChange",
"Resource": "arn:aws:route53:::change/*"
},
{
"Effect": "Allow",
"Action": [
"route53:ChangeResourceRecordSets",
"route53:ListResourceRecordSets"
],
"Resource": [
"arn:aws:route53:::hostedzone/${aws_route53_zone.public.zone_id}",
"arn:aws:route53:::hostedzone/${aws_route53_zone.private.zone_id}"
]
}
]
}
EOF
# [1]: The first Statement is to be able to get the current state
# of the request, to find out if dns record changes have been
# propagated to all route53 dns servers.
# [2]: The second statement is to update dns records, such as the TXT
# record for the acme challenge. Here the hosted zone ids are interpolated by Terraform.
}
resource "aws_iam_role" "cert_manager_acme" {
name = "cert-manager-acme"
assume_role_policy = <<EOF
{
"Version": "2012-10-17",
"Statement": [
{
"Effect": "Allow",
"Principal": {
"Federated": "arn:aws:iam::${data.aws_caller_identity.current.account_id}:oidc-provider/${replace(data.aws_eks_cluster.cluster.identity.0.oidc.0.issuer, "https://", "")}"
},
"Action": "sts:AssumeRoleWithWebIdentity",
"Condition": {
"StringEquals": {
"${replace(data.aws_eks_cluster.cluster.identity.0.oidc.0.issuer, "https://", "")}:sub": "system:serviceaccount:cert-manager:cert-manager"
}
}
}
]
}
EOF
}
resource "aws_iam_role_policy_attachment" "cert_manager_acme" {
role = aws_iam_role.cert_manager_acme.name
policy_arn = aws_iam_policy.cert_manager_route53_access.arn
}
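Before wiring this role into the Helm release below, you can sanity-check it from the AWS CLI. A minimal sketch, assuming your local credentials are allowed to read IAM (the role and policy names match the resources above):
# Inspect the trust policy: it should show the OIDC federated principal and the
# system:serviceaccount:cert-manager:cert-manager subject condition.
aws iam get-role --role-name cert-manager-acme --query 'Role.AssumeRolePolicyDocument'
# Confirm the Route53 policy is attached to the role.
aws iam list-attached-role-policies --role-name cert-manager-acme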
resource "helm_release" "cert-manager" {
name = "cert-manager"
namespace = "cert-manager"
repository = "https://charts.jetstack.io"
chart = "cert-manager"
version = "1.14.4"
timeout = 300
atomic = true
create_namespace = true
depends_on = [
aws_iam_role_policy_attachment.cert_manager_acme,
]
values = [
<<YAML
installCRDs: true
# Helm chart will create the following CRDs:
# - Issuer
# - ClusterIssuer
# - Certificate
# - CertificateRequest
# - Order
# - Challenge
# Enable prometheus metrics, and create a service
# monitor object
prometheus:
# Ref: https://github.com/cert-manager/cert-manager/blob/master/deploy/charts/cert-manager/README.template.md#prometheusenabled--bool
enabled: true
servicemonitor:
enabled: true
# In case we have more than one Prometheus instance
# prometheusInstance: monitor
# DNS-01 Route53
serviceAccount:
annotations:
eks.amazonaws.com/role-arn: ${aws_iam_role.cert_manager_acme.arn}
extraArgs:
# Allow Issuers and ClusterIssuers to use ambient credentials,
# i.e. the IAM role attached to the service account above:
- --cluster-issuer-ambient-credentials
- --issuer-ambient-credentials
# - --enable-certificate-owner-ref=true
- --dns01-recursive-nameservers-only
- --dns01-recursive-nameservers=8.8.8.8:53,1.1.1.1:53
YAML
]
}
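After the chart is applied, it is worth verifying that IRSA is actually wired up. A quick sketch, assuming your kubectl context points at the cluster; the AWS_ROLE_ARN and AWS_WEB_IDENTITY_TOKEN_FILE environment variables are injected into the pod by the EKS pod identity webhook once the service account annotation is in place:
# The service account should carry the role-arn annotation set in the Helm values.
kubectl -n cert-manager get serviceaccount cert-manager -o yaml | grep role-arn
# The controller pod should have the web identity token and role ARN injected.
kubectl -n cert-manager get pod -l app.kubernetes.io/name=cert-manager -o yaml | grep -E 'AWS_ROLE_ARN|AWS_WEB_IDENTITY_TOKEN_FILE'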
- Public Cluster Issuer
- Private Cluster Issuer
resource "kubectl_manifest" "cert_manager_cluster_issuer_public" {
depends_on = [
helm_release.cert-manager
]
yaml_body = yamlencode({
"apiVersion" = "cert-manager.io/v1"
"kind" = "ClusterIssuer"
"metadata" = {
"name" = "letsencrypt-dns01-production-cluster-issuer-public"
}
"spec" = {
"acme" = {
"server" = "https://acme-v02.api.letsencrypt.org/directory"
"email" = "ziadmansour.4.9.2000@gmail.com"
"privateKeySecretRef" = {
"name" = "letsencrypt-production-dns01-public-key-pair"
}
"solvers" = [
{
"dns01" = {
"route53" = {
"region" = "${var.region}"
"hostedZoneID" = "${aws_route53_zone.public.zone_id}"
}
}
}
]
}
}
})
}
resource "kubectl_manifest" "cert_manager_cluster_issuer_private" {
depends_on = [
helm_release.cert-manager
]
yaml_body = yamlencode({
"apiVersion" = "cert-manager.io/v1"
"kind" = "ClusterIssuer"
"metadata" = {
"name" = "letsencrypt-dns01-production-cluster-issuer-private"
}
"spec" = {
"acme" = {
"server" = "https://acme-v02.api.letsencrypt.org/directory"
"email" = "ziadmansour.4.9.2000@gmail.com"
"privateKeySecretRef" = {
"name" = "letsencrypt-production-dns01-private-key-pair"
}
"solvers" = [
{
"dns01" = {
"route53" = {
"region" = "${var.region}"
"hostedZoneID" = "${aws_route53_zone.private.zone_id}"
}
}
}
]
}
}
})
}
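Before moving on to the Test & Verify steps below, it is worth confirming that both ClusterIssuers registered an ACME account with Let's Encrypt and report Ready. A quick sketch:
# Both cluster issuers should show READY=True once the ACME account is registered.
kubectl get clusterissuers
# Drill into one of them if it stays NotReady; the Conditions explain why.
kubectl describe clusterissuer letsencrypt-dns01-production-cluster-issuer-public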
Test & Verify
- Verify that the DNS delegation for the subdomain is working successfully:
  - Use whatsmydns to check the DNS propagation. Enter k8s.sreboy.com and see if the NS records are propagated. You should see the same output produced by the ns_records output; run terraform output ns_records to see them again.
- Verify that the wildcard CNAME record is created in Route53:
  - Run dig +short test.k8s.sreboy.com and see if it resolves to the external load balancer of the ingress-nginx controller. Or try any other subdomain, since it is a wildcard: dig +short <*>.k8s.sreboy.com.
- Before you move to the NEXT layer, run:
cd terraformed-odyssey/kubernetes/system
# Create Secret
kubectl create secret generic argocd-notifications-secret -n argocd --from-literal slack-token=<slack-token> --dry-run=client -o yaml > secrets/argocd-notifications-secret.yaml
# Do NOT Forget to add annotations as in here:
# Ref: https://github.com/ZiadMansourM/terraformed-odyssey/blob/main/kubernetes/system/secrets/.gitkeep
# Then Seal the Secret
kubeseal --controller-name sealed-secrets --controller-namespace sealed-secrets --format yaml < secrets/argocd-notifications-secret-ignore.yaml > sealed-argocd-notifications-secret.yaml
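Once the sealed secret is applied to the cluster, you can confirm that the sealed-secrets controller unsealed it into a regular Secret. A quick sketch, assuming the generated SealedSecret kept the argocd-notifications-secret name and the argocd namespace already exists:
# The SealedSecret object should exist and show no unsealing errors in its status.
kubectl -n argocd get sealedsecret argocd-notifications-secret
# The controller should have produced the plain Secret that the ArgoCD notifications controller reads.
kubectl -n argocd get secret argocd-notifications-secret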
15_Platform
In this layer, we will deploy:
- ArgoCD using app-of-apps pattern.
- AWS Client VPN using various authentication methods and over a proxy.
You can use kustomize, or hard-code the values inside the YAML files directly. Do whatever you feel comfortable with; I will use plain YAML files for simplicity.
Visualize Plan
It is an overly simplified graph, just to help you visualize the plan and see in which namespace each object lives, e.g. where the dashboards are deployed, etc.
- Cluster Issuer
- Namespaced Issuer
Cert-Manager Issuers
- Cluster Issuer
- Monitoring Issuer
- Goviolin Issuer
- Voting Issuer
apiVersion: cert-manager.io/v1
kind: ClusterIssuer
metadata:
name: letsencrypt-dns01-production-cluster-issuer
spec:
acme:
server: https://acme-v02.api.letsencrypt.org/directory
email: ziadmansour.4.9.2000@gmail.com
privateKeySecretRef:
name: letsencrypt-production-dns01-key-pair
solvers:
- dns01:
route53:
region: eu-central-1
hostedZoneID: Z10172763D2LB47VXDFP9
apiVersion: cert-manager.io/v1
kind: Issuer
metadata:
name: letsencrypt-dns01-production
namespace: monitoring
spec:
acme:
server: https://acme-v02.api.letsencrypt.org/directory
email: ziadmansour.4.9.2000@gmail.com
privateKeySecretRef:
name: letsencrypt-production-dns01-key-pair
solvers:
- dns01:
route53:
region: eu-central-1
hostedZoneID: Z10172763D2LB47VXDFP9
apiVersion: cert-manager.io/v1
kind: Issuer
metadata:
name: letsencrypt-dns01-production
namespace: goviolin
spec:
acme:
server: https://acme-v02.api.letsencrypt.org/directory
email: ziadmansour.4.9.2000@gmail.com
privateKeySecretRef:
name: letsencrypt-production-dns01-key-pair
solvers:
- dns01:
route53:
region: eu-central-1
hostedZoneID: Z10172763D2LB47VXDFP9
apiVersion: cert-manager.io/v1
kind: Issuer
metadata:
name: letsencrypt-dns01-production
namespace: voting
spec:
acme:
server: https://acme-v02.api.letsencrypt.org/directory
email: ziadmansour.4.9.2000@gmail.com
privateKeySecretRef:
name: letsencrypt-production-dns01-key-pair
solvers:
- dns01:
route53:
region: eu-central-1
hostedZoneID: Z10172763D2LB47VXDFP9
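After applying these manifests, a short check (just a sketch) to confirm that each namespaced Issuer is Ready in its namespace:
# Every issuer should report READY=True; a missing one usually means its namespace does not exist yet.
kubectl get issuers --all-namespaces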
Monitoring Namespace Ingress
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
name: monitoring-ns-ingress
namespace: monitoring
annotations:
cert-manager.io/cluster-issuer: letsencrypt-dns01-production-cluster-issuer
spec:
ingressClassName: external-nginx
tls:
- hosts:
- grafana.k8s.sreboy.com
secretName: grafana-goviolin-k8s-sreboy-com-key-pair
- hosts:
- prometheus.k8s.sreboy.com
secretName: prometheus-goviolin-k8s-sreboy-com-key-pair
rules:
- host: grafana.k8s.sreboy.com
http:
paths:
- path: /
pathType: Prefix
backend:
service:
name: monitoring-grafana
port:
number: 80
- host: prometheus.k8s.sreboy.com
http:
paths:
- path: /
pathType: Prefix
backend:
service:
name: monitoring-kube-prometheus-prometheus
port:
number: 9090
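A quick check (again just a sketch) that cert-manager issued both certificates referenced in the tls section above:
# Both certificates should become Ready once the dns-01 challenges pass.
kubectl -n monitoring get certificates grafana-goviolin-k8s-sreboy-com-key-pair prometheus-goviolin-k8s-sreboy-com-key-pair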
Goviolin Namespace
apiVersion: v1
kind: Namespace
metadata:
name: goviolin
---
apiVersion: apps/v1
kind: Deployment
metadata:
name: goviolin
namespace: goviolin
spec:
replicas: 3
selector:
matchLabels:
app: goviolin
template:
metadata:
labels:
app: goviolin
spec:
containers:
- name: goviolin
image: ziadmmh/goviolin:v0.0.1
ports:
- containerPort: 8080
---
apiVersion: v1
kind: Service
metadata:
name: goviolin
namespace: goviolin
spec:
selector:
app: goviolin
ports:
- protocol: TCP
port: 80
targetPort: 8080
type: ClusterIP
---
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
name: goviolin
namespace: goviolin
annotations:
cert-manager.io/cluster-issuer: letsencrypt-dns01-production-cluster-issuer
spec:
ingressClassName: external-nginx
tls:
- hosts:
- goviolin.k8s.sreboy.com
secretName: goviolin-k8s-sreboy-com-key-pair
rules:
- host: goviolin.k8s.sreboy.com
http:
paths:
- path: /
pathType: Prefix
backend:
service:
name: goviolin
port:
number: 80
---
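After the manifests are applied, you can check (a rough sketch, assuming DNS and the external ingress controller are already in place) that cert-manager issued the certificate and that the app answers over HTTPS:
# ingress-shim names the Certificate after the secretName in the Ingress tls section.
kubectl -n goviolin get certificate goviolin-k8s-sreboy-com-key-pair
# The app should answer over HTTPS with the issued certificate.
curl -sSI https://goviolin.k8s.sreboy.com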
Custom Dashboards
We are using the Kube Prometheus Stack
. It is the de facto solution to deploy Prometheus and Grafana on Kubernetes, and it comes with default targets and dashboards already configured.
Also, it has a unique and easy way to add custom dashboards:
- Visit Grafana Dashboards.
- Choose the dashboards you like, say:
  - For Cert-Manager: id 20842, and download the JSON as cert-manager-20842.json.
  - For Ingress Nginx: id 14314, and download the JSON as ingress-nginx-14314.json.
- Run the following commands:
kubectl create configmap cert-manager-dashboard-20842 --from-file=$PWD/dashboards/cert-manager-20842.json --dry-run=client -o yaml > cert-manager-dashboard-20842.yaml
kubectl create configmap ingress-nginx-dashboard-14314 --from-file=$PWD/dashboards/ingress-nginx-14314.json --dry-run=client -o yaml > ingress-nginx-dashboard-14314.yaml
We are NOT finished yet:
ziadh@Ziads-MacBook-Air files % tree
.
├── cert-manager-dashboard-20842.yaml
├── dashboards
│ ├── cert-manager-20842.json
│ └── ingress-nginx-14314.json
├── goviolin.yaml
├── ingress-nginx-dashboard-14314.yaml
├── issuers.yaml
└── monitoring.yaml
1 directory, 7 files
Now you need to vi cert-manager-dashboard-20842.yaml
and ingress-nginx-dashboard-14314.yaml
and add the following lines under metadata:
labels:
  grafana_dashboard: "1"
This is how Grafana discovers the dashboards and registers them.
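If you prefer not to edit the generated files by hand, you can also stamp the label on with kubectl itself. A small sketch with the same file names as above (the *-labeled.yaml output names are just an example):
# Add the grafana_dashboard label locally (no API call) and write out the labeled manifest.
kubectl label --local -f cert-manager-dashboard-20842.yaml grafana_dashboard=1 -o yaml > cert-manager-dashboard-20842-labeled.yaml
kubectl label --local -f ingress-nginx-dashboard-14314.yaml grafana_dashboard=1 -o yaml > ingress-nginx-dashboard-14314-labeled.yaml
Either way, the final manifests look like the following: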
cert-manager-dashboard-20842.yaml
apiVersion: v1
data:
cert-manager-20842.json: |-
{
"annotations": {
"list": [
{
"builtIn": 1,
"datasource": {
"type": "grafana",
"uid": "-- Grafana --"
},
"enable": true,
"hide": true,
"iconColor": "rgba(0, 211, 255, 1)",
"name": "Annotations & Alerts",
"type": "dashboard"
}
]
},
"description": "The dashboard gives an overview of the SSL certs managed by cert-manager in Kubernetes",
"editable": true,
"fiscalYearStartMonth": 0,
"gnetId": 20842,
"graphTooltip": 0,
"links": [],
"panels": [
{
"datasource": {
"type": "prometheus",
"uid": "${datasource}"
},
"description": "The number if available certificates",
"fieldConfig": {
"defaults": {
"color": {
"mode": "thresholds"
},
"mappings": [],
"noValue": "0",
"thresholds": {
"mode": "absolute",
"steps": [
{
"color": "green",
"value": null
}
]
}
},
"overrides": []
},
"gridPos": {
"h": 8,
"w": 8,
"x": 0,
"y": 0
},
"id": 1,
"options": {
"colorMode": "value",
"graphMode": "none",
"justifyMode": "auto",
"orientation": "auto",
"reduceOptions": {
"calcs": [
"lastNotNull"
],
"fields": "",
"values": false
},
"showPercentChange": false,
"textMode": "value",
"wideLayout": true
},
"pluginVersion": "10.4.0",
"targets": [
{
"datasource": {
"type": "prometheus",
"uid": "${datasource}"
},
"disableTextWrap": false,
"editorMode": "code",
"exemplar": false,
"expr": "count(certmanager_certificate_ready_status{condition=\"True\", exported_namespace=~\"$namespace\"})",
"fullMetaSearch": false,
"includeNullMetadata": true,
"instant": true,
"legendFormat": "__auto",
"range": false,
"refId": "A",
"useBackend": false
}
],
"title": "Valid Certificates",
"type": "stat"
},
{
"datasource": {
"type": "prometheus",
"uid": "${datasource}"
},
"description": "The number of certificates that will expire within the next 14 days",
"fieldConfig": {
"defaults": {
"color": {
"mode": "thresholds"
},
"mappings": [],
"noValue": "0",
"thresholds": {
"mode": "absolute",
"steps": [
{
"color": "green",
"value": null
},
{
"color": "#EAB839",
"value": 1
}
]
}
},
"overrides": []
},
"gridPos": {
"h": 8,
"w": 8,
"x": 8,
"y": 0
},
"id": 3,
"options": {
"colorMode": "value",
"graphMode": "none",
"justifyMode": "auto",
"orientation": "auto",
"reduceOptions": {
"calcs": [
"lastNotNull"
],
"fields": "",
"values": false
},
"showPercentChange": false,
"textMode": "auto",
"wideLayout": true
},
"pluginVersion": "10.4.0",
"targets": [
{
"datasource": {
"type": "prometheus",
"uid": "${datasource}"
},
"disableTextWrap": false,
"editorMode": "code",
"exemplar": false,
"expr": "count(certmanager_certificate_expiration_timestamp_seconds{exported_namespace=~\"$namespace\"} < (time()+(14*24*3600)))",
"fullMetaSearch": false,
"includeNullMetadata": true,
"instant": true,
"legendFormat": "{{exported_namespace}}/{{name}}",
"range": false,
"refId": "A",
"useBackend": false
}
],
"title": "Expiring Certificates",
"type": "stat"
},
{
"datasource": {
"type": "prometheus",
"uid": "${datasource}"
},
"description": "Total number of HTTP requests",
"fieldConfig": {
"defaults": {
"color": {
"mode": "thresholds"
},
"mappings": [],
"noValue": "0",
"thresholds": {
"mode": "absolute",
"steps": [
{
"color": "text",
"value": null
}
]
}
},
"overrides": []
},
"gridPos": {
"h": 8,
"w": 8,
"x": 16,
"y": 0
},
"id": 2,
"options": {
"colorMode": "value",
"graphMode": "none",
"justifyMode": "auto",
"orientation": "auto",
"reduceOptions": {
"calcs": [
"lastNotNull"
],
"fields": "",
"values": false
},
"showPercentChange": false,
"textMode": "auto",
"wideLayout": true
},
"pluginVersion": "10.4.0",
"targets": [
{
"datasource": {
"type": "prometheus",
"uid": "${datasource}"
},
"disableTextWrap": false,
"editorMode": "code",
"exemplar": false,
"expr": "sum(certmanager_http_acme_client_request_count)",
"fullMetaSearch": false,
"includeNullMetadata": true,
"instant": true,
"legendFormat": "__auto",
"range": false,
"refId": "A",
"useBackend": false
}
],
"title": "Total ACME Requests",
"type": "stat"
},
{
"datasource": {
"type": "prometheus",
"uid": "${datasource}"
},
"description": "Time before the certificates expire",
"fieldConfig": {
"defaults": {
"color": {
"mode": "thresholds"
},
"mappings": [],
"min": 0,
"thresholds": {
"mode": "absolute",
"steps": [
{
"color": "red",
"value": null
},
{
"color": "orange",
"value": 14
},
{
"color": "green",
"value": 30
},
{
"color": "dark-green",
"value": 60
}
]
},
"unit": "d"
},
"overrides": []
},
"gridPos": {
"h": 10,
"w": 12,
"x": 0,
"y": 8
},
"id": 5,
"options": {
"displayMode": "gradient",
"maxVizHeight": 300,
"minVizHeight": 16,
"minVizWidth": 8,
"namePlacement": "left",
"orientation": "horizontal",
"reduceOptions": {
"calcs": [
"lastNotNull"
],
"fields": "",
"values": false
},
"showUnfilled": true,
"sizing": "auto",
"valueMode": "color"
},
"pluginVersion": "10.4.0",
"targets": [
{
"datasource": {
"type": "prometheus",
"uid": "${datasource}"
},
"editorMode": "code",
"exemplar": false,
"expr": "sort_desc(certmanager_certificate_expiration_timestamp_seconds{exported_namespace=~\"$namespace\"} - time())/(24*3600)",
"format": "time_series",
"instant": true,
"legendFormat": "{{name}}",
"range": false,
"refId": "A"
}
],
"title": "Time to Expiration",
"type": "bargauge"
},
{
"datasource": {
"type": "prometheus",
"uid": "${datasource}"
},
"description": "Time before the certificates are automatically renewed",
"fieldConfig": {
"defaults": {
"color": {
"mode": "thresholds"
},
"mappings": [],
"min": 0,
"thresholds": {
"mode": "absolute",
"steps": [
{
"color": "green",
"value": null
}
]
},
"unit": "d"
},
"overrides": []
},
"gridPos": {
"h": 10,
"w": 12,
"x": 12,
"y": 8
},
"id": 6,
"options": {
"displayMode": "gradient",
"maxVizHeight": 300,
"minVizHeight": 16,
"minVizWidth": 8,
"namePlacement": "left",
"orientation": "horizontal",
"reduceOptions": {
"calcs": [
"lastNotNull"
],
"fields": "",
"values": false
},
"showUnfilled": true,
"sizing": "auto",
"valueMode": "color"
},
"pluginVersion": "10.4.0",
"targets": [
{
"datasource": {
"type": "prometheus",
"uid": "${datasource}"
},
"editorMode": "code",
"exemplar": false,
"expr": "sort(certmanager_certificate_renewal_timestamp_seconds{exported_namespace=~\"$namespace\"} - time())/(24*3600)",
"format": "time_series",
"instant": true,
"legendFormat": "{{name}}",
"range": false,
"refId": "A"
}
],
"title": "Time to Automatic Renewal",
"type": "bargauge"
},
{
"datasource": {
"type": "prometheus",
"uid": "${datasource}"
},
"description": "Time before the certificates expire",
"fieldConfig": {
"defaults": {
"color": {
"mode": "palette-classic"
},
"custom": {
"axisBorderShow": false,
"axisCenteredZero": false,
"axisColorMode": "text",
"axisLabel": "",
"axisPlacement": "auto",
"barAlignment": 0,
"drawStyle": "line",
"fillOpacity": 0,
"gradientMode": "none",
"hideFrom": {
"legend": false,
"tooltip": false,
"viz": false
},
"insertNulls": false,
"lineInterpolation": "linear",
"lineWidth": 1,
"pointSize": 5,
"scaleDistribution": {
"type": "linear"
},
"showPoints": "auto",
"spanNulls": false,
"stacking": {
"group": "A",
"mode": "none"
},
"thresholdsStyle": {
"mode": "off"
}
},
"mappings": [],
"thresholds": {
"mode": "absolute",
"steps": [
{
"color": "green",
"value": null
}
]
},
"unit": "d"
},
"overrides": []
},
"gridPos": {
"h": 9,
"w": 24,
"x": 0,
"y": 18
},
"id": 4,
"options": {
"legend": {
"calcs": [],
"displayMode": "list",
"placement": "right",
"showLegend": true
},
"tooltip": {
"mode": "single",
"sort": "none"
}
},
"targets": [
{
"datasource": {
"type": "prometheus",
"uid": "${datasource}"
},
"editorMode": "code",
"expr": "(certmanager_certificate_expiration_timestamp_seconds{exported_namespace=~\"$namespace\"} - time())/(24*3600)",
"instant": false,
"legendFormat": "{{name}}",
"range": true,
"refId": "A"
}
],
"title": "Time to Expiration",
"type": "timeseries"
}
],
"schemaVersion": 39,
"tags": [
"k8s",
"cert-manager"
],
"templating": {
"list": [
{
"current": {
"selected": true,
"text": "Prometheus",
"value": "PBFA97CFB590B2093"
},
"hide": 0,
"includeAll": false,
"multi": false,
"name": "datasource",
"options": [],
"query": "prometheus",
"queryValue": "",
"refresh": 1,
"regex": "",
"skipUrlSync": false,
"type": "datasource"
},
{
"allValue": ".*",
"current": {
"selected": true,
"text": "All",
"value": "$__all"
},
"datasource": {
"type": "prometheus",
"uid": "${datasource}"
},
"definition": "label_values(exported_namespace)",
"hide": 0,
"includeAll": true,
"multi": false,
"name": "namespace",
"options": [],
"query": {
"qryType": 1,
"query": "label_values(exported_namespace)",
"refId": "PrometheusVariableQueryEditor-VariableQuery"
},
"refresh": 1,
"regex": "",
"skipUrlSync": false,
"sort": 0,
"type": "query"
}
]
},
"time": {
"from": "now-6h",
"to": "now"
},
"timepicker": {},
"timezone": "browser",
"title": "Cert-manager-Kubernetes",
"uid": "cdhrcds8aosg0c",
"version": 1,
"weekStart": ""
}
kind: ConfigMap
metadata:
labels:
grafana_dashboard: "1"
namespace: cert-manager
name: cert-manager-dashboard-20842
ingress-nginx-dashboard-14314.yaml
apiVersion: v1
data:
ingress-nginx-14314.json: |
{
"__inputs": [
{
"name": "DS_PROMETHEUS",
"label": "Prometheus",
"description": "",
"type": "datasource",
"pluginId": "prometheus",
"pluginName": "Prometheus"
}
],
"__requires": [
{
"type": "grafana",
"id": "grafana",
"name": "Grafana",
"version": "6.7.0"
},
{
"type": "datasource",
"id": "prometheus",
"name": "Prometheus",
"version": "5.0.0"
},
{
"type": "panel",
"id": "singlestat",
"name": "Singlestat",
"version": "5.0.0"
}
],
"annotations": {
"list": [
{
"builtIn": 1,
"datasource": "-- Grafana --",
"enable": true,
"hide": true,
"iconColor": "rgba(0, 211, 255, 1)",
"name": "Annotations & Alerts",
"type": "dashboard"
},
{
"datasource": "",
"enable": true,
"expr": "sum(changes(nginx_ingress_controller_config_last_reload_successful_timestamp_seconds{instance!=\"unknown\",controller_class=~\"$controller_class\",namespace=~\"$namespace\"}[30s])) by (controller_class)",
"hide": false,
"iconColor": "rgba(255, 96, 96, 1)",
"limit": 100,
"name": "Config Reloads",
"showIn": 0,
"step": "30s",
"tagKeys": "controller_class",
"tags": [],
"titleFormat": "Config Reloaded",
"type": "tags"
}
]
},
"editable": true,
"gnetId": 14314,
"graphTooltip": 0,
"id": 35,
"iteration": 1619515274866,
"links": [],
"panels": [
{
"collapsed": false,
"datasource": null,
"gridPos": {
"h": 1,
"w": 24,
"x": 0,
"y": 0
},
"id": 31,
"panels": [],
"title": "Overview",
"type": "row"
},
{
"cacheTimeout": null,
"colorBackground": false,
"colorValue": false,
"colors": [
"#299c46",
"rgba(237, 129, 40, 0.89)",
"#d44a3a"
],
"datasource": "",
"decimals": 1,
"description": "This is the total number of requests made in this period (top-right period selected)",
"fieldConfig": {
"defaults": {
"custom": {}
},
"overrides": []
},
"format": "short",
"gauge": {
"maxValue": 100,
"minValue": 0,
"show": false,
"thresholdLabels": false,
"thresholdMarkers": true
},
"gridPos": {
"h": 3,
"w": 3,
"x": 0,
"y": 1
},
"id": 8,
"interval": null,
"links": [],
"mappingType": 1,
"mappingTypes": [
{
"name": "value to text",
"value": 1
},
{
"name": "range to text",
"value": 2
}
],
"maxDataPoints": 100,
"nullPointMode": "connected",
"nullText": null,
"postfix": "",
"postfixFontSize": "50%",
"prefix": "",
"prefixFontSize": "50%",
"rangeMaps": [
{
"from": "null",
"text": "N/A",
"to": "null"
}
],
"sparkline": {
"fillColor": "rgba(31, 118, 189, 0.18)",
"full": false,
"lineColor": "rgb(31, 120, 193)",
"show": false
},
"tableColumn": "",
"targets": [
{
"expr": "sum(increase(nginx_ingress_controller_requests{ controller_class=~\"$controller_class\", ingress=~\"$ingress\", namespace=~\"$namespace\", controller_pod=~\"$pod\"}[${__range_s}s]))",
"format": "time_series",
"intervalFactor": 1,
"legendFormat": "",
"refId": "A"
}
],
"thresholds": "",
"timeFrom": null,
"timeShift": null,
"title": "Requests (period)",
"type": "singlestat",
"valueFontSize": "100%",
"valueMaps": [
{
"op": "=",
"text": "N/A",
"value": "null"
}
],
"valueName": "current"
},
{
"cacheTimeout": null,
"colorBackground": false,
"colorValue": false,
"colors": [
"#299c46",
"rgba(237, 129, 40, 0.89)",
"#d44a3a"
],
"datasource": null,
"decimals": 1,
"description": "This is the percentage of successful requests over the entire period in the top-right hand corner.\n\nNOTE: Ignoring 404s in this metric, since a 404 is a normal response for errant/invalid request. This helps prevent this percentage from being affected by typical web scanners and security probes.",
"fieldConfig": {
"defaults": {
"custom": {}
},
"overrides": []
},
"format": "percentunit",
"gauge": {
"maxValue": 100,
"minValue": 0,
"show": false,
"thresholdLabels": false,
"thresholdMarkers": true
},
"gridPos": {
"h": 3,
"w": 2,
"x": 3,
"y": 1
},
"id": 14,
"interval": null,
"links": [],
"mappingType": 1,
"mappingTypes": [
{
"name": "value to text",
"value": 1
},
{
"name": "range to text",
"value": 2
}
],
"maxDataPoints": 100,
"nullPointMode": "connected",
"nullText": null,
"postfix": "",
"postfixFontSize": "50%",
"prefix": "",
"prefixFontSize": "50%",
"rangeMaps": [
{
"from": "null",
"text": "N/A",
"to": "null"
}
],
"sparkline": {
"fillColor": "rgba(31, 118, 189, 0.18)",
"full": false,
"lineColor": "rgb(31, 120, 193)",
"show": true
},
"tableColumn": "",
"targets": [
{
"expr": "sum(\n rate(\n nginx_ingress_controller_requests{status!~\"[4-5].*\", controller_class=~\"$controller_class\", ingress=~\"$ingress\", namespace=~\"$namespace\", controller_pod=~\"$pod\"}[${__range_s}s]\n )\n ) \n/ \n(\n sum(\n rate(\n nginx_ingress_controller_requests{ controller_class=~\"$controller_class\", ingress=~\"$ingress\", namespace=~\"$namespace\", controller_pod=~\"$pod\"}[${__range_s}s]\n )\n ) - \n (\n sum(\n rate(\n nginx_ingress_controller_requests{status=~\"404|499\", controller_class=~\"$controller_class\", ingress=~\"$ingress\",namespace=~\"$namespace\", controller_pod=~\"$pod\"}[${__range_s}s]\n )\n ) \n or vector(0)\n )\n)",
"format": "time_series",
"interval": "",
"intervalFactor": 1,
"legendFormat": "",
"refId": "A"
}
],
"thresholds": "",
"timeFrom": null,
"timeShift": null,
"title": "% Success (period)",
"type": "singlestat",
"valueFontSize": "80%",
"valueMaps": [
{
"op": "=",
"text": "N/A",
"value": "null"
}
],
"valueName": "current"
},
{
"cacheTimeout": null,
"colorBackground": false,
"colorValue": false,
"colors": [
"#299c46",
"rgba(237, 129, 40, 0.89)",
"#d44a3a"
],
"datasource": null,
"decimals": 0,
"description": "This is the number of new connections made to the controller in the last minute. NOTE: This metric does not support the Ingress, Namespace variables, as this is at a lower-level than the actual application. It does support the others though (Env, Controller Class, Pod)",
"fieldConfig": {
"defaults": {
"custom": {}
},
"overrides": []
},
"format": "none",
"gauge": {
"maxValue": 100,
"minValue": 0,
"show": false,
"thresholdLabels": false,
"thresholdMarkers": true
},
"gridPos": {
"h": 3,
"w": 2,
"x": 5,
"y": 1
},
"id": 6,
"interval": null,
"links": [],
"mappingType": 1,
"mappingTypes": [
{
"name": "value to text",
"value": 1
},
{
"name": "range to text",
"value": 2
}
],
"maxDataPoints": 100,
"nullPointMode": "connected",
"nullText": null,
"postfix": "",
"postfixFontSize": "50%",
"prefix": "",
"prefixFontSize": "50%",
"rangeMaps": [
{
"from": "null",
"text": "N/A",
"to": "null"
}
],
"sparkline": {
"fillColor": "rgba(31, 118, 189, 0.18)",
"full": false,
"lineColor": "rgb(31, 120, 193)",
"show": false
},
"tableColumn": "",
"targets": [
{
"expr": "sum(avg_over_time(nginx_ingress_controller_nginx_process_connections{state=~\"active\", state=~\"active\", controller_class=~\"$controller_class\", controller_pod=~\"$pod\"}[$__interval]))",
"format": "time_series",
"interval": "2m",
"intervalFactor": 1,
"legendFormat": "{{ingress}}",
"refId": "A"
}
],
"thresholds": "",
"timeFrom": null,
"timeShift": null,
"title": "Conns (2m)",
"type": "singlestat",
"valueFontSize": "80%",
"valueMaps": [
{
"op": "=",
"text": "N/A",
"value": "null"
}
],
"valueName": "current"
},
{
"cacheTimeout": null,
"colorBackground": false,
"colorValue": false,
"colors": [
"#299c46",
"rgba(237, 129, 40, 0.89)",
"#d44a3a"
],
"datasource": null,
"decimals": 0,
"description": "The number of HTTP requests made in the last 1 minute window",
"fieldConfig": {
"defaults": {
"custom": {}
},
"overrides": []
},
"format": "short",
"gauge": {
"maxValue": 100,
"minValue": 0,
"show": false,
"thresholdLabels": false,
"thresholdMarkers": true
},
"gridPos": {
"h": 3,
"w": 2,
"x": 7,
"y": 1
},
"id": 7,
"interval": null,
"links": [],
"mappingType": 1,
"mappingTypes": [
{
"name": "value to text",
"value": 1
},
{
"name": "range to text",
"value": 2
}
],
"maxDataPoints": 100,
"nullPointMode": "connected",
"nullText": null,
"postfix": "",
"postfixFontSize": "50%",
"prefix": "",
"prefixFontSize": "50%",
"rangeMaps": [
{
"from": "null",
"text": "N/A",
"to": "null"
}
],
"sparkline": {
"fillColor": "rgba(31, 118, 189, 0.18)",
"full": false,
"lineColor": "rgb(31, 120, 193)",
"show": false
},
"tableColumn": "",
"targets": [
{
"expr": "sum(increase(nginx_ingress_controller_requests{ controller_class=~\"$controller_class\", ingress=~\"$ingress\", namespace=~\"$namespace\", controller_pod=~\"$pod\"}[$__interval]))",
"format": "time_series",
"interval": "2m",
"intervalFactor": 1,
"legendFormat": "",
"refId": "A"
}
],
"thresholds": "",
"timeFrom": null,
"timeShift": null,
"title": "Reqs (2m)",
"type": "singlestat",
"valueFontSize": "80%",
"valueMaps": [
{
"op": "=",
"text": "N/A",
"value": "null"
}
],
"valueName": "current"
},
{
"cacheTimeout": null,
"colorBackground": true,
"colorValue": false,
"colors": [
"#d44a3a",
"rgba(237, 129, 40, 0.89)",
"#299c46"
],
"datasource": null,
"description": "This is the percentage of successful requests over the last minute.\n\nNOTE: Ignoring 404s in this metric, since a 404 is a normal response for errant requests",
"fieldConfig": {
"defaults": {
"custom": {}
},
"overrides": []
},
"format": "percentunit",
"gauge": {
"maxValue": 100,
"minValue": 0,
"show": false,
"thresholdLabels": false,
"thresholdMarkers": true
},
"gridPos": {
"h": 3,
"w": 3,
"x": 9,
"y": 1
},
"id": 13,
"interval": null,
"links": [],
"mappingType": 1,
"mappingTypes": [
{
"name": "value to text",
"value": 1
},
{
"name": "range to text",
"value": 2
}
],
"maxDataPoints": 100,
"nullPointMode": "connected",
"nullText": null,
"postfix": "",
"postfixFontSize": "50%",
"prefix": "",
"prefixFontSize": "50%",
"rangeMaps": [
{
"from": "null",
"text": "N/A",
"to": "null"
}
],
"sparkline": {
"fillColor": "rgba(31, 118, 189, 0.18)",
"full": false,
"lineColor": "rgb(31, 120, 193)",
"show": true
},
"tableColumn": "",
"targets": [
{
"expr": "sum(rate(nginx_ingress_controller_requests{status!~\"[4-5].*\", controller_class=~\"$controller_class\", ingress=~\"$ingress\", namespace=~\"$namespace\", controller_pod=~\"$pod\"}[$__interval])) / \n(sum(rate(nginx_ingress_controller_requests{ controller_class=~\"$controller_class\", ingress=~\"$ingress\", namespace=~\"$namespace\", controller_pod=~\"$pod\"}[$__interval])) - \n(sum(rate(nginx_ingress_controller_requests{status=~\"404|499\", controller_class=~\"$controller_class\", ingress=~\"$ingress\", namespace=~\"$namespace\", controller_pod=~\"$pod\"}[$__interval])) or vector(0)))",
"format": "time_series",
"interval": "2m",
"intervalFactor": 1,
"legendFormat": "",
"refId": "A"
}
],
"thresholds": "0.8,0.9",
"timeFrom": null,
"timeShift": null,
"title": "% Success (2m)",
"type": "singlestat",
"valueFontSize": "100%",
"valueMaps": [
{
"op": "=",
"text": "N/A",
"value": "null"
}
],
"valueName": "current"
},
{
"cacheTimeout": null,
"colorBackground": false,
"colorValue": true,
"colors": [
"#73BF69",
"#73BF69",
"#73BF69"
],
"datasource": "",
"decimals": 0,
"description": "This is the number of successful requests in the last minute. Successful being 1xx or 2xx by the standard HTTP definition.",
"fieldConfig": {
"defaults": {
"custom": {}
},
"overrides": []
},
"format": "short",
"gauge": {
"maxValue": 100,
"minValue": 0,
"show": false,
"thresholdLabels": false,
"thresholdMarkers": true
},
"gridPos": {
"h": 3,
"w": 3,
"x": 12,
"y": 1
},
"id": 12,
"interval": null,
"links": [],
"mappingType": 1,
"mappingTypes": [
{
"name": "value to text",
"value": 1
},
{
"name": "range to text",
"value": 2
}
],
"maxDataPoints": 100,
"nullPointMode": "connected",
"nullText": null,
"postfix": "",
"postfixFontSize": "50%",
"prefix": "",
"prefixFontSize": "50%",
"rangeMaps": [
{
"from": "null",
"text": "N/A",
"to": "null"
}
],
"sparkline": {
"fillColor": "rgba(31, 118, 189, 0.18)",
"full": true,
"lineColor": "rgb(31, 120, 193)",
"show": true
},
"tableColumn": "",
"targets": [
{
"expr": "sum(increase(nginx_ingress_controller_requests{status=~\"(1|2).*\", controller_class=~\"$controller_class\", ingress=~\"$ingress\", namespace=~\"$namespace\", controller_pod=~\"$pod\"}[$__interval])) or vector(0)",
"format": "time_series",
"interval": "2m",
"intervalFactor": 1,
"legendFormat": "",
"refId": "A"
}
],
"thresholds": "",
"title": "HTTP 1/2xx (2m)",
"transparent": true,
"type": "singlestat",
"valueFontSize": "150%",
"valueMaps": [
{
"op": "=",
"text": "N/A",
"value": "null"
}
],
"valueName": "current"
},
{
"cacheTimeout": null,
"colorBackground": false,
"colorPrefix": false,
"colorValue": true,
"colors": [
"#3274D9",
"#3274D9",
"#3274D9"
],
"datasource": "",
"decimals": 0,
"description": "This is the number of 3xx requests in the last minute.",
"fieldConfig": {
"defaults": {
"custom": {}
},
"overrides": []
},
"format": "short",
"gauge": {
"maxValue": 100,
"minValue": 0,
"show": false,
"thresholdLabels": false,
"thresholdMarkers": true
},
"gridPos": {
"h": 3,
"w": 3,
"x": 15,
"y": 1
},
"id": 10,
"interval": null,
"links": [],
"mappingType": 1,
"mappingTypes": [
{
"name": "value to text",
"value": 1
},
{
"name": "range to text",
"value": 2
}
],
"maxDataPoints": 100,
"nullPointMode": "connected",
"nullText": null,
"postfix": "",
"postfixFontSize": "50%",
"prefix": "",
"prefixFontSize": "50%",
"rangeMaps": [
{
"from": "null",
"text": "N/A",
"to": "null"
}
],
"sparkline": {
"fillColor": "rgba(31, 118, 189, 0.18)",
"full": true,
"lineColor": "rgb(31, 120, 193)",
"show": true
},
"tableColumn": "",
"targets": [
{
"expr": "sum(increase(nginx_ingress_controller_requests{status=~\"3.*\", controller_class=~\"$controller_class\", ingress=~\"$ingress\", namespace=~\"$namespace\", controller_pod=~\"$pod\"}[2m])) or vector(0)",
"format": "time_series",
"interval": "$__interval",
"intervalFactor": 1,
"legendFormat": "",
"refId": "A"
}
],
"thresholds": "",
"title": "HTTP 3xx (2m)",
"transparent": true,
"type": "singlestat",
"valueFontSize": "150%",
"valueMaps": [
{
"op": "=",
"text": "N/A",
"value": "null"
}
],
"valueName": "current"
},
{
"cacheTimeout": null,
"colorBackground": false,
"colorValue": true,
"colors": [
"#FF9830",
"#FF9830",
"#FF9830"
],
"datasource": "",
"decimals": 0,
"description": "This is the number of 4xx requests in the last minute.",
"fieldConfig": {
"defaults": {
"custom": {}
},
"overrides": []
},
"format": "short",
"gauge": {
"maxValue": 100,
"minValue": 0,
"show": false,
"thresholdLabels": false,
"thresholdMarkers": true
},
"gridPos": {
"h": 3,
"w": 3,
"x": 18,
"y": 1
},
"id": 18,
"interval": null,
"links": [],
"mappingType": 1,
"mappingTypes": [
{
"name": "value to text",
"value": 1
},
{
"name": "range to text",
"value": 2
}
],
"maxDataPoints": 100,
"nullPointMode": "connected",
"nullText": null,
"postfix": "",
"postfixFontSize": "50%",
"prefix": "",
"prefixFontSize": "50%",
"rangeMaps": [
{
"from": "null",
"text": "N/A",
"to": "null"
}
],
"sparkline": {
"fillColor": "rgba(31, 118, 189, 0.18)",
"full": true,
"lineColor": "rgb(31, 120, 193)",
"show": true
},
"tableColumn": "",
"targets": [
{
"expr": "sum(increase(nginx_ingress_controller_requests{status=~\"4.*\", controller_class=~\"$controller_class\", ingress=~\"$ingress\", namespace=~\"$namespace\", controller_pod=~\"$pod\"}[$__interval])) or vector(0)",
"format": "time_series",
"interval": "2m",
"intervalFactor": 1,
"legendFormat": "",
"refId": "A"
}
],
"thresholds": "",
"title": "HTTP 4xx (2m)",
"transparent": true,
"type": "singlestat",
"valueFontSize": "150%",
"valueMaps": [
{
"op": "=",
"text": "N/A",
"value": "null"
}
],
"valueName": "current"
},
{
"cacheTimeout": null,
"colorBackground": false,
"colorValue": true,
"colors": [
"#F2495C",
"#F2495C",
"#F2495C"
],
"datasource": "",
"decimals": 0,
"description": "This is the number of 5xx requests in the last minute.",
"fieldConfig": {
"defaults": {
"custom": {}
},
"overrides": []
},
"format": "short",
"gauge": {
"maxValue": 100,
"minValue": 0,
"show": false,
"thresholdLabels": false,
"thresholdMarkers": true
},
"gridPos": {
"h": 3,
"w": 3,
"x": 21,
"y": 1
},
"id": 11,
"interval": null,
"links": [],
"mappingType": 1,
"mappingTypes": [
{
"name": "value to text",
"value": 1
},
{
"name": "range to text",
"value": 2
}
],
"maxDataPoints": 100,
"nullPointMode": "connected",
"nullText": null,
"postfix": "",
"postfixFontSize": "50%",
"prefix": "",
"prefixFontSize": "50%",
"rangeMaps": [
{
"from": "null",
"text": "N/A",
"to": "null"
}
],
"sparkline": {
"fillColor": "rgba(31, 118, 189, 0.18)",
"full": true,
"lineColor": "rgb(31, 120, 193)",
"show": true
},
"tableColumn": "",
"targets": [
{
"expr": "sum(increase(nginx_ingress_controller_requests{status=~\"5.*\", controller_class=~\"$controller_class\", ingress=~\"$ingress\", namespace=~\"$namespace\", controller_pod=~\"$pod\"}[$__interval])) or vector(0)",
"format": "time_series",
"interval": "2m",
"intervalFactor": 1,
"legendFormat": "",
"refId": "A"
}
],
"thresholds": "",
"title": "HTTP 5xx (2m)",
"transparent": true,
"type": "singlestat",
"valueFontSize": "150%",
"valueMaps": [
{
"op": "=",
"text": "N/A",
"value": "null"
}
],
"valueName": "current"
},
{
"aliasColors": {},
"bars": false,
"dashLength": 10,
"dashes": false,
"datasource": null,
"description": "This is a total number of requests broken down by the ingress. This can help get a sense of scale in relation to each other.",
"fieldConfig": {
"defaults": {
"custom": {},
"links": []
},
"overrides": []
},
"fill": 1,
"fillGradient": 0,
"gridPos": {
"h": 8,
"w": 8,
"x": 0,
"y": 4
},
"hiddenSeries": false,
"id": 2,
"legend": {
"avg": false,
"current": false,
"max": false,
"min": false,
"show": false,
"total": false,
"values": false
},
"lines": true,
"linewidth": 1,
"links": [],
"nullPointMode": "null",
"options": {
"alertThreshold": true
},
"paceLength": 10,
"percentage": false,
"pluginVersion": "7.4.3",
"pointradius": 2,
"points": false,
"renderer": "flot",
"seriesOverrides": [],
"spaceLength": 10,
"stack": false,
"steppedLine": false,
"targets": [
{
"expr": "sum(increase(nginx_ingress_controller_requests{ controller_class=~\"$controller_class\", ingress=~\"$ingress\", namespace=~\"$namespace\", controller_pod=~\"$pod\"}[$__interval])) by (ingress)",
"format": "time_series",
"interval": "2m",
"intervalFactor": 1,
"legendFormat": "{{ingress}}",
"refId": "A"
}
],
"thresholds": [],
"timeFrom": null,
"timeRegions": [],
"timeShift": null,
"title": "HTTP Requests / Ingress",
"tooltip": {
"shared": true,
"sort": 0,
"value_type": "individual"
},
"type": "graph",
"xaxis": {
"buckets": null,
"mode": "time",
"name": null,
"show": true,
"values": []
},
"yaxes": [
{
"$$hashKey": "object:3838",
"format": "short",
"label": null,
"logBase": 1,
"max": null,
"min": null,
"show": true
},
{
"$$hashKey": "object:3839",
"format": "short",
"label": null,
"logBase": 1,
"max": null,
"min": null,
"show": true
}
],
"yaxis": {
"align": false,
"alignLevel": null
}
},
{
"aliasColors": {
"HTTP 101": "dark-green"
},
"bars": false,
"dashLength": 10,
"dashes": false,
"datasource": null,
"description": "The breakdown of the various HTTP status codes of the requests handled within' this period that matches the variables chosen above.\n\nThis chart helps notice and dive into which service is having failures and of what kind.",
"fieldConfig": {
"defaults": {
"custom": {},
"links": []
},
"overrides": []
},
"fill": 1,
"fillGradient": 0,
"gridPos": {
"h": 8,
"w": 8,
"x": 8,
"y": 4
},
"hiddenSeries": false,
"id": 3,
"legend": {
"avg": false,
"current": false,
"max": false,
"min": false,
"show": true,
"total": false,
"values": false
},
"lines": true,
"linewidth": 1,
"links": [],
"nullPointMode": "null as zero",
"options": {
"alertThreshold": true
},
"paceLength": 10,
"percentage": false,
"pluginVersion": "7.4.3",
"pointradius": 2,
"points": false,
"renderer": "flot",
"seriesOverrides": [
{
"$$hashKey": "object:154",
"alias": "/HTTP [1-2].*/i",
"color": "#37872D"
},
{
"$$hashKey": "object:155",
"alias": "/HTTP 4.*/i",
"color": "#C4162A"
},
{
"$$hashKey": "object:156",
"alias": "HTTP 404",
"color": "#FF9830"
},
{
"$$hashKey": "object:285",
"alias": "HTTP 499",
"color": "#FA6400"
},
{
"$$hashKey": "object:293",
"alias": "/HTTP 5.*/i",
"color": "#C4162A"
}
],
"spaceLength": 10,
"stack": false,
"steppedLine": false,
"targets": [
{
"expr": "sum(increase(nginx_ingress_controller_requests{ controller_class=~\"$controller_class\", ingress=~\"$ingress\", namespace=~\"$namespace\", controller_pod=~\"$pod\"}[$__interval])) by (status)",
"format": "time_series",
"interval": "2m",
"intervalFactor": 1,
"legendFormat": "HTTP {{status}}",
"refId": "A"
}
],
"thresholds": [],
"timeFrom": null,
"timeRegions": [],
"timeShift": null,
"title": "HTTP Status Codes",
"tooltip": {
"shared": true,
"sort": 0,
"value_type": "individual"
},
"type": "graph",
"xaxis": {
"buckets": null,
"mode": "time",
"name": null,
"show": true,
"values": []
},
"yaxes": [
{
"$$hashKey": "object:182",
"format": "short",
"label": null,
"logBase": 1,
"max": null,
"min": null,
"show": true
},
{
"$$hashKey": "object:183",
"format": "short",
"label": null,
"logBase": 1,
"max": null,
"min": null,
"show": true
}
],
"yaxis": {
"align": false,
"alignLevel": null
}
},
{
"aliasColors": {},
"bars": true,
"dashLength": 10,
"dashes": false,
"datasource": null,
"description": "The total number of HTTP requests made within' each period",
"fieldConfig": {
"defaults": {
"custom": {},
"links": []
},
"overrides": []
},
"fill": 1,
"fillGradient": 0,
"gridPos": {
"h": 8,
"w": 8,
"x": 16,
"y": 4
},
"hiddenSeries": false,
"id": 4,
"legend": {
"avg": false,
"current": false,
"max": false,
"min": false,
"show": false,
"total": false,
"values": false
},
"lines": false,
"linewidth": 1,
"links": [],
"nullPointMode": "null",
"options": {
"alertThreshold": true
},
"paceLength": 10,
"percentage": false,
"pluginVersion": "7.4.3",
"pointradius": 2,
"points": false,
"renderer": "flot",
"seriesOverrides": [],
"spaceLength": 10,
"stack": false,
"steppedLine": false,
"targets": [
{
"expr": "sum(increase(nginx_ingress_controller_requests{ controller_class=~\"$controller_class\", ingress=~\"$ingress\", namespace=~\"$namespace\", controller_pod=~\"$pod\"}[$__interval]))",
"format": "time_series",
"interval": "5m",
"intervalFactor": 1,
"legendFormat": "{{ingress}}",
"refId": "A"
}
],
"thresholds": [],
"timeFrom": null,
"timeRegions": [],
"timeShift": null,
"title": "Total HTTP Requests",
"tooltip": {
"shared": true,
"sort": 0,
"value_type": "individual"
},
"type": "graph",
"xaxis": {
"buckets": null,
"mode": "time",
"name": null,
"show": false,
"values": []
},
"yaxes": [
{
"format": "short",
"label": null,
"logBase": 1,
"max": null,
"min": null,
"show": true
},
{
"format": "short",
"label": null,
"logBase": 1,
"max": null,
"min": null,
"show": false
}
],
"yaxis": {
"align": false,
"alignLevel": null
}
},
{
"collapsed": false,
"datasource": null,
"gridPos": {
"h": 1,
"w": 24,
"x": 0,
"y": 12
},
"id": 33,
"panels": [],
"title": "Latency",
"type": "row"
},
{
"aliasColors": {},
"bars": false,
"dashLength": 10,
"dashes": false,
"datasource": null,
"decimals": 1,
"description": "This graph can help assess and help us meet SLA requirements as far as the responsive time of our services.\n\nFor a more detailed latency graph broken out by ingress please open the closed tab at the bottom because it is very CPU intensive.",
"fieldConfig": {
"defaults": {
"custom": {},
"links": []
},
"overrides": []
},
"fill": 1,
"fillGradient": 0,
"gridPos": {
"h": 8,
"w": 12,
"x": 0,
"y": 13
},
"hiddenSeries": false,
"id": 29,
"legend": {
"alignAsTable": true,
"avg": true,
"current": false,
"hideEmpty": true,
"hideZero": true,
"max": true,
"min": true,
"rightSide": true,
"show": true,
"total": false,
"values": true
},
"lines": true,
"linewidth": 1,
"links": [],
"nullPointMode": "null",
"options": {
"alertThreshold": true
},
"percentage": false,
"pluginVersion": "7.4.3",
"pointradius": 2,
"points": false,
"renderer": "flot",
"seriesOverrides": [
{
"$$hashKey": "object:294",
"alias": "Average",
"color": "#F2495C",
"fill": 0,
"points": true
},
{
"$$hashKey": "object:316",
"alias": "0.95",
"color": "rgb(44, 0, 182)"
},
{
"$$hashKey": "object:422",
"alias": "0.9",
"color": "#1F60C4"
},
{
"$$hashKey": "object:430",
"alias": "0.75",
"color": "#8AB8FF",
"fill": 1
},
{
"$$hashKey": "object:440",
"alias": "0.5",
"color": "rgb(255, 255, 255)",
"fill": 0
},
{
"$$hashKey": "object:4144",
"alias": "0.99",
"color": "#8F3BB8",
"fill": 0
}
],
"spaceLength": 10,
"stack": false,
"steppedLine": false,
"targets": [
{
"expr": "histogram_quantile(\n 0.99,\n sum by (le)(\n rate(\n nginx_ingress_controller_request_duration_seconds_bucket{\n status!=\"404|500|304|499\",\n controller_class=~\"$controller_class\",\n ingress=~\"$ingress\",\n namespace=~\"$namespace\",\n controller_pod=~\"$pod\"\n }[$__interval]\n )\n )\n)",
"format": "time_series",
"interval": "5m",
"intervalFactor": 1,
"legendFormat": "0.99",
"refId": "A"
},
{
"expr": "histogram_quantile(\n 0.95,\n sum by (le)(\n rate(\n nginx_ingress_controller_request_duration_seconds_bucket{\n status!=\"404|500|304|499\",\n controller_class=~\"$controller_class\",\n ingress=~\"$ingress\",\n namespace=~\"$namespace\",\n controller_pod=~\"$pod\"\n }[$__interval]\n )\n )\n)",
"format": "time_series",
"hide": false,
"interval": "5m",
"intervalFactor": 1,
"legendFormat": "0.95",
"refId": "B"
},
{
"expr": "histogram_quantile(\n 0.9,\n sum by (le)(\n rate(\n nginx_ingress_controller_request_duration_seconds_bucket{\n status!=\"404|500|304|499\",\n controller_class=~\"$controller_class\",\n ingress=~\"$ingress\",\n namespace=~\"$namespace\",\n controller_pod=~\"$pod\"\n }[$__interval]\n )\n )\n)",
"format": "time_series",
"hide": false,
"interval": "5m",
"intervalFactor": 1,
"legendFormat": "0.9",
"refId": "C"
},
{
"expr": "histogram_quantile(\n 0.5,\n sum by (le)(\n rate(\n nginx_ingress_controller_request_duration_seconds_bucket{\n status!=\"404|500|304|499\",\n controller_class=~\"$controller_class\",\n ingress=~\"$ingress\",\n namespace=~\"$namespace\",\n controller_pod=~\"$pod\"\n }[$__interval]\n )\n )\n)",
"format": "time_series",
"hide": false,
"interval": "5m",
"intervalFactor": 1,
"legendFormat": "0.5",
"refId": "D"
},
{
"expr": "histogram_quantile(\n 0.75,\n sum by (le)(\n rate(\n nginx_ingress_controller_request_duration_seconds_bucket{\n status!=\"404|500|304|499\",\n controller_class=~\"$controller_class\",\n ingress=~\"$ingress\",\n namespace=~\"$namespace\",\n controller_pod=~\"$pod\"\n }[$__interval]\n )\n )\n)",
"format": "time_series",
"hide": false,
"interval": "5m",
"intervalFactor": 1,
"legendFormat": "0.75",
"refId": "E"
},
{
"expr": "(\n\n(sum(increase(nginx_ingress_controller_request_duration_seconds_bucket{\n status!=\"404|500|304\",\n controller_class=~\"$controller_class\",\n ingress=~\"$ingress\",\n namespace=~\"$namespace\",\n controller_pod=~\"$pod\",\n le=\"0.01\"\n}[$__interval]))\n* 0.01)\n\n+\n\n((sum(increase(nginx_ingress_controller_request_duration_seconds_bucket{\n status!=\"404|500|304\",\n controller_class=~\"$controller_class\",\n ingress=~\"$ingress\",\n namespace=~\"$namespace\",\n controller_pod=~\"$pod\",\n le=\"0.1\"\n}[$__interval]))\n-\nsum(increase(nginx_ingress_controller_request_duration_seconds_bucket{\n status!=\"404|500|304\",\n controller_class=~\"$controller_class\",\n ingress=~\"$ingress\",\n namespace=~\"$namespace\",\n controller_pod=~\"$pod\",\n le=\"0.01\"\n}[$__interval])))\n* 0.1)\n\n+\n\n((sum(increase(nginx_ingress_controller_request_duration_seconds_bucket{\n status!=\"404|500|304\",\n controller_class=~\"$controller_class\",\n ingress=~\"$ingress\",\n namespace=~\"$namespace\",\n controller_pod=~\"$pod\",\n le=\"1\"\n}[$__interval]))\n-\nsum(increase(nginx_ingress_controller_request_duration_seconds_bucket{\n status!=\"404|500|304\",\n controller_class=~\"$controller_class\",\n ingress=~\"$ingress\",\n namespace=~\"$namespace\",\n controller_pod=~\"$pod\",\n le=\"0.1\"\n}[$__interval])))\n* 1)\n\n+\n\n((sum(increase(nginx_ingress_controller_request_duration_seconds_bucket{\n status!=\"404|500|304\",\n controller_class=~\"$controller_class\",\n ingress=~\"$ingress\",\n namespace=~\"$namespace\",\n controller_pod=~\"$pod\",\n le=\"10\"\n}[$__interval]))\n-\nsum(increase(nginx_ingress_controller_request_duration_seconds_bucket{\n status!=\"404|500|304\",\n controller_class=~\"$controller_class\",\n ingress=~\"$ingress\",\n namespace=~\"$namespace\",\n controller_pod=~\"$pod\",\n le=\"1\"\n}[$__interval])))\n* 10 )\n\n+\n\n((sum(increase(nginx_ingress_controller_request_duration_seconds_bucket{\n status!=\"404|500|304\",\n controller_class=~\"$controller_class\",\n ingress=~\"$ingress\",\n namespace=~\"$namespace\",\n controller_pod=~\"$pod\",\n le=\"30\"\n}[$__interval]))\n-\nsum(increase(nginx_ingress_controller_request_duration_seconds_bucket{\n status!=\"404|500|304\",\n controller_class=~\"$controller_class\",\n ingress=~\"$ingress\",\n namespace=~\"$namespace\",\n controller_pod=~\"$pod\",\n le=\"10\"\n}[$__interval])))\n* 30 )\n\n+\n\n((sum(increase(nginx_ingress_controller_request_duration_seconds_bucket{\n status!=\"404|500|304\",\n controller_class=~\"$controller_class\",\n ingress=~\"$ingress\",\n namespace=~\"$namespace\",\n controller_pod=~\"$pod\",\n le=\"60\"\n}[$__interval]))\n-\nsum(increase(nginx_ingress_controller_request_duration_seconds_bucket{\n status!=\"404|500|304\",\n controller_class=~\"$controller_class\",\n ingress=~\"$ingress\",\n namespace=~\"$namespace\",\n controller_pod=~\"$pod\",\n le=\"30\"\n}[$__interval])))\n* 60 )\n\n+\n\n((sum(increase(nginx_ingress_controller_request_duration_seconds_bucket{\n status!=\"404|500|304\",\n controller_class=~\"$controller_class\",\n ingress=~\"$ingress\",\n namespace=~\"$namespace\",\n controller_pod=~\"$pod\",\n le=\"+Inf\"\n}[$__interval]))\n-\nsum(increase(nginx_ingress_controller_request_duration_seconds_bucket{\n status!=\"404|500|304\",\n controller_class=~\"$controller_class\",\n ingress=~\"$ingress\",\n namespace=~\"$namespace\",\n controller_pod=~\"$pod\",\n le=\"60\"\n}[$__interval])))\n* 120 )\n\n) / \n\nsum(increase(nginx_ingress_controller_request_duration_seconds_bucket{\n 
status!=\"404|500|304\",\n controller_class=~\"$controller_class\",\n ingress=~\"$ingress\",\n namespace=~\"$namespace\",\n controller_pod=~\"$pod\",\n le=\"+Inf\"\n}[$__interval]))\n",
"format": "time_series",
"hide": false,
"interval": "5m",
"intervalFactor": 1,
"legendFormat": "Average",
"refId": "F"
}
],
"thresholds": [],
"timeFrom": null,
"timeRegions": [],
"timeShift": null,
"title": "Latency (Average Percentiles)",
"tooltip": {
"shared": true,
"sort": 2,
"value_type": "individual"
},
"type": "graph",
"xaxis": {
"buckets": null,
"mode": "time",
"name": null,
"show": true,
"values": []
},
"yaxes": [
{
"$$hashKey": "object:1035",
"format": "s",
"label": null,
"logBase": 1,
"max": null,
"min": null,
"show": true
},
{
"$$hashKey": "object:1036",
"format": "short",
"label": null,
"logBase": 1,
"max": null,
"min": null,
"show": false
}
],
"yaxis": {
"align": false,
"alignLevel": null
}
},
{
"cards": {
"cardPadding": null,
"cardRound": null
},
"color": {
"cardColor": "#C4162A",
"colorScale": "linear",
"colorScheme": "interpolateTurbo",
"exponent": 0.5,
"mode": "spectrum"
},
"dataFormat": "tsbuckets",
"datasource": null,
"description": "This graph can help assess and help us meet SLA requirements as far as the responsive time of our services.\n\nFor a more detailed latency graph broken out by ingress please open the closed tab at the bottom because it is very CPU intensive.",
"fieldConfig": {
"defaults": {
"custom": {}
},
"overrides": []
},
"gridPos": {
"h": 8,
"w": 12,
"x": 12,
"y": 13
},
"heatmap": {},
"hideZeroBuckets": false,
"highlightCards": true,
"id": 27,
"legend": {
"show": true
},
"links": [],
"pluginVersion": "7.4.3",
"reverseYBuckets": false,
"targets": [
{
"expr": "sum by (le)(\n increase(\n nginx_ingress_controller_request_duration_seconds_bucket{\n status!=\"404\",status!=\"500\",\n controller_class =~ \"$controller_class\",\n namespace =~ \"$namespace\",\n ingress =~ \"$ingress\"\n }[$__interval]\n )\n)",
"format": "time_series",
"hide": false,
"interval": "5m",
"intervalFactor": 1,
"legendFormat": "{{le}}",
"refId": "D"
}
],
"timeFrom": null,
"timeShift": null,
"title": "Latency Heatmap",
"tooltip": {
"show": true,
"showHistogram": false
},
"type": "heatmap",
"xAxis": {
"show": true
},
"xBucketNumber": null,
"xBucketSize": null,
"yAxis": {
"decimals": 0,
"format": "s",
"logBase": 1,
"max": null,
"min": null,
"show": true,
"splitFactor": null
},
"yBucketBound": "auto",
"yBucketNumber": null,
"yBucketSize": null
},
{
"collapsed": false,
"datasource": null,
"gridPos": {
"h": 1,
"w": 24,
"x": 0,
"y": 21
},
"id": 35,
"panels": [],
"title": "Connections",
"type": "row"
},
{
"aliasColors": {
"New Connections": "purple"
},
"bars": true,
"dashLength": 10,
"dashes": false,
"datasource": null,
"description": "NOTE: This does not work per ingress/namespace\n\nThis is the number of new connections opened by the controller",
"fieldConfig": {
"defaults": {
"custom": {},
"links": []
},
"overrides": []
},
"fill": 1,
"fillGradient": 0,
"gridPos": {
"h": 8,
"w": 12,
"x": 0,
"y": 22
},
"hiddenSeries": false,
"id": 5,
"legend": {
"avg": false,
"current": false,
"max": false,
"min": false,
"show": false,
"total": false,
"values": false
},
"lines": false,
"linewidth": 1,
"links": [],
"nullPointMode": "null",
"options": {
"alertThreshold": true
},
"paceLength": 10,
"percentage": false,
"pluginVersion": "7.4.3",
"pointradius": 2,
"points": false,
"renderer": "flot",
"seriesOverrides": [],
"spaceLength": 10,
"stack": false,
"steppedLine": false,
"targets": [
{
"expr": "sum(increase(nginx_ingress_controller_nginx_process_connections{state=~\"active\", controller_class=~\"$controller_class\", controller_pod=~\"$pod\"}[$__interval]))",
"format": "time_series",
"interval": "2m",
"intervalFactor": 1,
"legendFormat": "New Connections",
"refId": "A"
}
],
"thresholds": [],
"timeFrom": null,
"timeRegions": [],
"timeShift": null,
"title": "New Connections Opened (Controller / Ingress Pod)",
"tooltip": {
"shared": true,
"sort": 0,
"value_type": "individual"
},
"type": "graph",
"xaxis": {
"buckets": null,
"mode": "time",
"name": null,
"show": false,
"values": []
},
"yaxes": [
{
"$$hashKey": "object:3252",
"format": "short",
"label": null,
"logBase": 1,
"max": null,
"min": null,
"show": true
},
{
"$$hashKey": "object:3253",
"format": "short",
"label": null,
"logBase": 1,
"max": null,
"min": null,
"show": false
}
],
"yaxis": {
"align": false,
"alignLevel": null
}
},
{
"aliasColors": {
"Connections": "rgb(255, 200, 4)"
},
"bars": true,
"dashLength": 10,
"dashes": false,
"datasource": null,
"description": "NOTE: This does not work per ingress/namespace\n\nThe total number of connections opened to our ingresses. If you have a CDN in front of our services, it is not unusual for this to be very low. If/when we use something like websockets with a persistent connection this can/will be very high.",
"fieldConfig": {
"defaults": {
"custom": {},
"links": []
},
"overrides": []
},
"fill": 1,
"fillGradient": 0,
"gridPos": {
"h": 8,
"w": 12,
"x": 12,
"y": 22
},
"hiddenSeries": false,
"id": 22,
"legend": {
"avg": false,
"current": false,
"max": false,
"min": false,
"show": false,
"total": false,
"values": false
},
"lines": false,
"linewidth": 1,
"links": [],
"nullPointMode": "null",
"options": {
"alertThreshold": true
},
"paceLength": 10,
"percentage": false,
"pluginVersion": "7.4.3",
"pointradius": 2,
"points": false,
"renderer": "flot",
"seriesOverrides": [],
"spaceLength": 10,
"stack": false,
"steppedLine": false,
"targets": [
{
"expr": "sum(avg_over_time(nginx_ingress_controller_nginx_process_connections{state=~\"active\", state=~\"active\", controller_class=~\"$controller_class\", controller_pod=~\"$pod\"}[$__range]))",
"format": "time_series",
"intervalFactor": 1,
"legendFormat": "Connections",
"refId": "A"
}
],
"thresholds": [],
"timeFrom": null,
"timeRegions": [],
"timeShift": null,
"title": "Total Connections Open (Controller / Ingress Pod)",
"tooltip": {
"shared": true,
"sort": 0,
"value_type": "individual"
},
"type": "graph",
"xaxis": {
"buckets": null,
"mode": "time",
"name": null,
"show": false,
"values": []
},
"yaxes": [
{
"$$hashKey": "object:3098",
"format": "short",
"label": null,
"logBase": 1,
"max": null,
"min": null,
"show": true
},
{
"$$hashKey": "object:3099",
"format": "short",
"label": null,
"logBase": 1,
"max": null,
"min": null,
"show": false
}
],
"yaxis": {
"align": false,
"alignLevel": null
}
},
{
"collapsed": true,
"datasource": null,
"gridPos": {
"h": 1,
"w": 24,
"x": 0,
"y": 30
},
"id": 24,
"panels": [
{
"aliasColors": {},
"bars": false,
"dashLength": 10,
"dashes": false,
"datasource": null,
"description": "",
"fieldConfig": {
"defaults": {
"custom": {},
"links": []
},
"overrides": []
},
"fill": 1,
"fillGradient": 0,
"gridPos": {
"h": 9,
"w": 24,
"x": 0,
"y": 38
},
"hiddenSeries": false,
"id": 25,
"legend": {
"alignAsTable": true,
"avg": true,
"current": false,
"max": true,
"min": true,
"rightSide": true,
"show": true,
"total": false,
"values": true
},
"lines": true,
"linewidth": 1,
"links": [],
"nullPointMode": "null",
"options": {
"alertThreshold": true
},
"paceLength": 10,
"percentage": false,
"pluginVersion": "7.4.3",
"pointradius": 2,
"points": false,
"renderer": "flot",
"seriesOverrides": [],
"spaceLength": 10,
"stack": false,
"steppedLine": false,
"targets": [
{
"expr": "sum(\n rate(\n nginx_ingress_controller_requests{status!~\"[4-5].*\", controller_class=~\"$controller_class\", ingress=~\"$ingress\", namespace=~\"$namespace\", controller_pod=~\"$pod\"}[${__range_s}s]\n )\n ) by (ingress)\n/ \n(\n sum(\n rate(\n nginx_ingress_controller_requests{ controller_class=~\"$controller_class\", ingress=~\"$ingress\", namespace=~\"$namespace\", controller_pod=~\"$pod\"}[${__range_s}s]\n )\n ) by (ingress)\n - \n (\n sum(\n rate(\n nginx_ingress_controller_requests{status=~\"404|499\", controller_class=~\"$controller_class\", ingress=~\"$ingress\",namespace=~\"$namespace\", controller_pod=~\"$pod\"}[${__range_s}s]\n )\n ) by (ingress)\n or vector(0)\n )\n)",
"format": "time_series",
"interval": "",
"intervalFactor": 1,
"legendFormat": "{{ingress}}",
"refId": "A"
}
],
"thresholds": [],
"timeFrom": null,
"timeRegions": [],
"timeShift": null,
"title": "Percentage of Success (non-2xx) - By Ingress",
"tooltip": {
"shared": true,
"sort": 0,
"value_type": "individual"
},
"type": "graph",
"xaxis": {
"buckets": null,
"mode": "time",
"name": null,
"show": true,
"values": []
},
"yaxes": [
{
"$$hashKey": "object:108",
"decimals": null,
"format": "percentunit",
"label": null,
"logBase": 1,
"max": "1",
"min": "0",
"show": true
},
{
"$$hashKey": "object:109",
"format": "short",
"label": null,
"logBase": 1,
"max": null,
"min": null,
"show": false
}
],
"yaxis": {
"align": false,
"alignLevel": null
}
},
{
"aliasColors": {},
"bars": false,
"dashLength": 10,
"dashes": false,
"datasource": null,
"fieldConfig": {
"defaults": {
"custom": {},
"links": []
},
"overrides": []
},
"fill": 1,
"fillGradient": 0,
"gridPos": {
"h": 13,
"w": 24,
"x": 0,
"y": 47
},
"hiddenSeries": false,
"id": 16,
"legend": {
"alignAsTable": true,
"avg": true,
"current": false,
"max": true,
"min": true,
"rightSide": false,
"show": true,
"sort": "avg",
"sortDesc": false,
"total": false,
"values": true
},
"lines": true,
"linewidth": 1,
"links": [],
"nullPointMode": "null",
"options": {
"alertThreshold": true
},
"percentage": false,
"pluginVersion": "7.4.3",
"pointradius": 2,
"points": false,
"renderer": "flot",
"seriesOverrides": [],
"spaceLength": 10,
"stack": false,
"steppedLine": false,
"targets": [
{
"expr": "histogram_quantile(0.99, sum(rate(nginx_ingress_controller_request_duration_seconds_bucket{status!=\"404\",status!=\"500\", controller_class=~\"$controller_class\", ingress=~\"$ingress\", namespace=~\"$namespace\", controller_pod=~\"$pod\"}[5m])) by (le, ingress))",
"format": "time_series",
"intervalFactor": 1,
"legendFormat": "p99 {{ ingress }}",
"refId": "A"
},
{
"expr": "histogram_quantile(0.95, sum(rate(nginx_ingress_controller_request_duration_seconds_bucket{status!=\"404\",status!=\"500\", controller_class=~\"$controller_class\", ingress=~\"$ingress\", namespace=~\"$namespace\", controller_pod=~\"$pod\"}[5m])) by (le, ingress))",
"format": "time_series",
"intervalFactor": 1,
"legendFormat": "p95 {{ ingress }}",
"refId": "B"
},
{
"expr": "histogram_quantile(0.90, sum(rate(nginx_ingress_controller_request_duration_seconds_bucket{status!=\"404\",status!=\"500\", controller_class=~\"$controller_class\", ingress=~\"$ingress\", namespace=~\"$namespace\", controller_pod=~\"$pod\"}[5m])) by (le, ingress))",
"format": "time_series",
"intervalFactor": 1,
"legendFormat": "p90 {{ ingress }}",
"refId": "C"
}
],
"thresholds": [],
"timeFrom": null,
"timeRegions": [],
"timeShift": null,
"title": "Latency (per ingress)",
"tooltip": {
"shared": true,
"sort": 0,
"value_type": "individual"
},
"type": "graph",
"xaxis": {
"buckets": null,
"mode": "time",
"name": null,
"show": true,
"values": []
},
"yaxes": [
{
"format": "s",
"label": null,
"logBase": 1,
"max": null,
"min": null,
"show": true
},
{
"format": "short",
"label": null,
"logBase": 1,
"max": null,
"min": null,
"show": false
}
],
"yaxis": {
"align": false,
"alignLevel": null
}
}
],
"title": "CPU Intensive / Optional Graphs",
"type": "row"
}
],
"refresh": "1m",
"schemaVersion": 27,
"style": "dark",
"tags": [
"ingress",
"nginx",
"networking",
"services",
"k8s"
],
"templating": {
"list": [
{
"allValue": ".*",
"current": {
"selected": true,
"text": [
"All"
],
"value": [
"$__all"
]
},
"datasource": "",
"definition": "label_values(nginx_ingress_controller_config_hash, controller_class) ",
"description": null,
"error": null,
"hide": 0,
"includeAll": true,
"label": "Controller Class",
"multi": true,
"name": "controller_class",
"options": [],
"query": {
"query": "label_values(nginx_ingress_controller_config_hash, controller_class) ",
"refId": "prometheus-controller_class-Variable-Query"
},
"refresh": 1,
"regex": "",
"skipUrlSync": false,
"sort": 1,
"tagValuesQuery": "",
"tags": [],
"tagsQuery": "",
"type": "query",
"useTags": false
},
{
"allValue": ".*",
"current": {
"selected": true,
"tags": [],
"text": [
"All"
],
"value": [
"$__all"
]
},
"datasource": "",
"definition": "label_values(nginx_ingress_controller_requests{ controller_class=~\"$controller_class\"},namespace)",
"description": null,
"error": null,
"hide": 0,
"includeAll": true,
"label": "Namespace",
"multi": true,
"name": "namespace",
"options": [],
"query": {
"query": "label_values(nginx_ingress_controller_requests{ controller_class=~\"$controller_class\"},namespace)",
"refId": "prometheus-namespace-Variable-Query"
},
"refresh": 1,
"regex": "",
"skipUrlSync": false,
"sort": 1,
"tagValuesQuery": "",
"tags": [],
"tagsQuery": "",
"type": "query",
"useTags": false
},
{
"allValue": null,
"current": {
"selected": true,
"text": [
"All"
],
"value": [
"$__all"
]
},
"datasource": "",
"definition": "label_values(nginx_ingress_controller_requests{namespace=~\"$namespace\",controller_class=~\"$controller_class\"}, ingress) ",
"description": null,
"error": null,
"hide": 0,
"includeAll": true,
"label": "Ingress",
"multi": true,
"name": "ingress",
"options": [],
"query": {
"query": "label_values(nginx_ingress_controller_requests{namespace=~\"$namespace\",controller_class=~\"$controller_class\"}, ingress) ",
"refId": "prometheus-ingress-Variable-Query"
},
"refresh": 2,
"regex": "",
"skipUrlSync": false,
"sort": 1,
"tagValuesQuery": "",
"tags": [],
"tagsQuery": "",
"type": "query",
"useTags": false
},
{
"allValue": ".*",
"current": {
"selected": true,
"text": [
"All"
],
"value": [
"$__all"
]
},
"datasource": "",
"definition": "label_values(nginx_ingress_controller_config_hash{controller_class=~\"$controller_class\"}, controller_pod) ",
"description": null,
"error": null,
"hide": 0,
"includeAll": true,
"label": "Ingress Pod",
"multi": true,
"name": "pod",
"options": [],
"query": {
"query": "label_values(nginx_ingress_controller_config_hash{controller_class=~\"$controller_class\"}, controller_pod) ",
"refId": "StandardVariableQuery"
},
"refresh": 1,
"regex": "",
"skipUrlSync": false,
"sort": 1,
"tagValuesQuery": "",
"tags": [],
"tagsQuery": "",
"type": "query",
"useTags": false
}
]
},
"time": {
"from": "now-3h",
"to": "now"
},
"timepicker": {
"refresh_intervals": [
"10s",
"30s",
"1m",
"5m",
"15m",
"30m",
"1h",
"2h",
"1d"
],
"time_options": [
"5m",
"15m",
"1h",
"6h",
"12h",
"24h",
"2d",
"7d",
"30d"
]
},
"timezone": "",
"title": "Kubernetes Nginx Ingress Prometheus NextGen",
"description": "Nginx Ingress Controller via Prometheus Metrics Dashboard created for DevOps Nirvana @ https://github.com/DevOps-Nirvana",
"uid": "k8s-nginx-ingress-prometheus-ng",
"version": 27
}
kind: ConfigMap
metadata:
  labels:
    grafana_dashboard: "1"
  namespace: ingress-nginx
  name: ingress-nginx-dashboard-14314
Apply
kubectl apply -f 15_platform/files/
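Grafana should pick up this ConfigMap automatically because it carries the grafana_dashboard: "1" label that the kube-prometheus-stack dashboard sidecar watches. If the dashboard does not appear, a minimal spot check could look like this (the monitoring namespace, the kube-prometheus-stack release name, and the grafana-sc-dashboard container name are assumptions from a default install; adjust them to your setup):
# Confirm the ConfigMap exists and carries the sidecar label
kubectl -n ingress-nginx get configmap ingress-nginx-dashboard-14314 --show-labels
# Tail the Grafana dashboard sidecar and look for the dashboard file being written
kubectl -n monitoring logs deploy/kube-prometheus-stack-grafana -c grafana-sc-dashboard --tail=20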
Verify 15_Platform Layer
- Cert-Manager is able to solve challenges:
kubectl get certificates -A -w
kubectl get challenges -A -w
- (Important) Cert-Manager is able to create TXT records in Route53 (spot checks are sketched after this list).
- External Access through:
- grafana.k8s.sreboy.com
- prometheus.k8s.sreboy.com
- goviolin.k8s.sreboy.com
- Verify Default Prometheus Targets are created.
- Verify Default Grafana Dashboards are created.
- Verify Cert Manager and Ingress Targets, and Custom Dashboards, are created.
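For the Route53 and external-access items above, a few hedged spot checks (the hostnames match this setup; HOSTED_ZONE_ID is a placeholder for your Route53 hosted zone ID):
# While a challenge is pending, cert-manager publishes a TXT record under _acme-challenge.<host>
dig +short TXT _acme-challenge.grafana.k8s.sreboy.com
# Or list the TXT records in the hosted zone directly (requires AWS credentials)
aws route53 list-resource-record-sets --hosted-zone-id "$HOSTED_ZONE_ID" --query "ResourceRecordSets[?Type=='TXT']"
# Once certificates are issued, the external endpoints should answer over HTTPS
for host in grafana prometheus goviolin; do
  curl -s -o /dev/null -w "%{http_code} https://${host}.k8s.sreboy.com\n" "https://${host}.k8s.sreboy.com"
done
A 401 from the Basic-Auth-protected hosts still confirms that DNS, TLS, and ingress routing work end to end.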