At this point in our series, we’ve discussed:
- Overall requirements for NSX Application Platform (NAPP)
- Standing up a Harbor repository (a NAPP requirement)
Let’s take a look at our next NAPP requirement, which is a Kubernetes (K8s) cluster where NAPP will reside. As stated in our first post, the presumption for this entire series is you do not utilize Kubernetes today; that is, the requirement for K8s is a “first” for your environment.
From a VMware perspective, an ideal way to adopt K8s is the purchase of VMware Tanzu; in fact, many discussions and blogs around the deployment of NAPP make the presumption that you already have VMware Tanzu. However, if purchasing VMware Tanzu is not a possibility, we want to provide you with an alternative way to utilize NAPP. We also presume that you do not have licensing available for the NSX Advanced Load Balancer.
If you already have VMware Tanzu or another K8s solution in your environment, you can work with your cloud native infrastructure team to get a K8s cluster deployed for NAPP. In this case, you can mostly skip what we’ll be covering and jump ahead to the final part of our series (once it’s made available).
Likewise, if you do have ‘NSX Advanced Load Balancer’ edition licensing available, you may use it to provide load balancing for K8s services where applicable. In the future, we may demonstrate leveraging the NSX Advanced Load Balancer, but for now, we presume it is not available to you.
VMware Tanzu Community Edition
While there are many ways to deploy a K8s cluster to your vSphere environment, a streamlined option is leveraging VMware Tanzu Community Edition (TCE). First released in September 2021, TCE is an open source release of the same technology that powers VMware Tanzu.
Obviously, the commercial version of Tanzu has some big advantages; the tight integration with vSphere and the VMware support infrastructure, to name a few. However, in a situation where purchasing VMware Tanzu isn’t a possibility, TCE gives you a straightforward method to provision K8s clusters.
Tanzu Community Edition – Common terms and definitions
While we can’t provide a full education around how K8s works in this series, we do want to define a few terms used when discussing TCE. Our hope is that, for those who have little K8s experience, this will ease some potential confusion.
- Node – a virtual machine that hosts containerized applications (via pods in K8s). Think of it like a hypervisor; where a hypervisor runs virtual machines, a node in K8s runs containers.
- K8s Cluster – a deployment of K8s nodes that work together as a logical construct. K8s clusters are comprised of control plane nodes (which act as the command and control interface to a cluster) and worker nodes (where the containerized applications reside).
- Bootstrap or Local Bootstrap – This is the machine on which you download and install Tanzu; it is utilized to deploy your Tanzu Management Cluster.
- Tanzu Management Cluster – A K8s cluster that is responsible for management and operations of TCE. For our purposes, the Management Cluster is responsible for the provisioning and scaling of Workload Clusters.
- Tanzu Workload Cluster – A K8s cluster where your containerized applications reside.
- Tanzu Standalone Cluster – This is a feature of TCE, still in development, that allows you to deploy a Workload Cluster without first deploying a Management Cluster. Since we want to ensure you can make any necessary alterations to your Workload Cluster in the future (such as scaling up your node count), we will not be utilizing a Standalone Cluster; it’s only mentioned here so you understand what it is when you encounter it in the TCE documentation.
For NAPP, we will deploy a Management Cluster and a Workload Cluster to our vSphere environment; each of these clusters is comprised of control plane nodes and worker nodes. Once we’ve provisioned both a Management Cluster and a Workload Cluster, NAPP will be deployed upon the worker nodes of the Workload Cluster.
From a vSphere perspective, K8s clusters are nothing more than disparate virtual machines. As vSphere consumers, we often hear “cluster” and immediately think of vSphere clusters, which are comprised of ESXi hosts. TCE clusters are not denoted in the vSphere Web Client; the “cluster” aspect of these virtual machines is defined in K8s, not vSphere.
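To make that concrete, here’s a small, hypothetical sketch: once a Workload Cluster exists (the cluster name below is just an example), you can list its nodes from the K8s side and match them to ordinary VMs in your vSphere inventory.

```bash
# Hypothetical example; the context name typically follows Tanzu's <cluster>-admin@<cluster> convention
kubectl config use-context napp-workload-admin@napp-workload
kubectl get nodes -o wide   # each node listed here is simply a virtual machine in vSphere
```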
Understanding Tanzu Bootstrap to Management Cluster
When beginning the installation, you will first install TCE on your local machine or a VM (this is Step 1 from the below ‘Installing Tanzu Community Edition on vSphere’ section). This machine will be known as the ‘bootstrap’ machine for your TCE deployment. In our case, we opted to utilize Ubuntu Desktop 21.04, but you can use Windows or macOS as you like.
Your bootstrap machine creates a local K8s cluster via kind (Kubernetes in Docker), which is utilized to provision the Tanzu Management Cluster that will reside in vSphere. To be clear, this local K8s cluster exists solely on the bootstrap machine; once the Tanzu Management Cluster (which resides in vSphere) has been created, this local K8s cluster will be deleted.
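If you’re curious, you can watch this temporary cluster come and go on the bootstrap machine while the Management Cluster is being built; nothing here is required for the install, it’s just a way to see what’s happening under the hood:

```bash
docker ps           # the temporary kind "node" runs as a container on the bootstrap machine
kind get clusters   # if you have the kind CLI installed, it will list the temporary cluster
```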
The creation of the Tanzu Management Cluster results in the deployment of virtual machines (either Photon OS or Ubuntu). These are the K8s nodes of the Tanzu Management cluster, and consist of control plane nodes and worker nodes. As mentioned previously, this cluster is dedicated to the management of Tanzu Workload Clusters; as such, no workloads of your own should be deployed to it.
Once the Tanzu Management Cluster is up, the Tanzu CLI (which is installed on the ‘bootstrap’ machine) is utilized to instruct the Tanzu Management Cluster to create a Tanzu Workload Cluster. This results in additional virtual machines (comprising the Tanzu Workload Cluster) being deployed to vSphere.
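To set expectations for what that looks like, here’s a hedged sketch of the flow from the bootstrap machine; the cluster name and configuration file below are hypothetical examples, and we’ll walk through the real commands in the upcoming posts:

```bash
tanzu management-cluster get                                     # confirm the Management Cluster is healthy
tanzu cluster create napp-workload --file workload-config.yaml   # ask the Management Cluster to build a Workload Cluster
tanzu cluster list                                               # the new Workload Cluster appears once provisioning completes
```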
Installing Tanzu Community Edition on vSphere
As we did in our previous post about Harbor, rather than taking you through a step-by-step deployment of TCE, we’ll be using the existing installation instructions as a guide and providing additional clarification around the installation. From the TCE on vSphere installation guide, for our deployment of NAPP, we recommend focusing on the first four steps:
- Install Tanzu Community Edition
- Prepare to deploy clusters
- Deploy a management cluster
- Deploy a workload cluster
As you grow more comfortable with TCE, you certainly should look over all of the documentation on the community site. However, for NAPP, we only need to complete the above four steps. Let’s take a look at each step now:
1. Install Tanzu Community Edition
The steps as listed on the site are straightforward in providing instruction on installing TCE, as well as Docker and kubectl. Simply choose the OS of your bootstrap machine and follow the listed directions carefully; it can be easy to inadvertently skip steps.
For example, as we used Ubuntu Desktop for our bootstrap machine, we had to ensure we added our non-root user account to the docker user group. This action is called out in the above instructions on the TCE site, but our first time through, we were so focused on just getting Docker installed that we missed this step. We don’t want you to have any of the same issues!
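For reference, the relevant commands on an Ubuntu bootstrap machine look roughly like the following (your user account and versions will differ; always defer to the TCE instructions):

```bash
sudo usermod -aG docker $USER   # add your non-root user to the docker group
newgrp docker                   # pick up the new group membership without logging out
docker run hello-world          # confirm Docker works without sudo
tanzu version                   # confirm the Tanzu CLI is installed
kubectl version --client        # confirm kubectl is installed
```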
2. Prepare to deploy clusters
This step provides a list of everything required to get TCE Management (and Workload) clusters installed to your vSphere environment. Please ensure you meet these requirements.
In the ‘Procedure’ section, you’ll find specific actions that must be taken, such as downloading an approved K8s node OVA (which may be downloaded from the VMware Customer Connect site) and creating a template from it. You’ll also find instructions on creating an SSH key pair used to communicate with the Tanzu Management Cluster.
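As a sketch, the SSH key pair portion of the procedure looks something like this (the key comment is just an example; the TCE documentation remains the authoritative reference):

```bash
ssh-keygen -t rsa -b 4096 -C "tce-admin@example.com"  # create the key pair
eval "$(ssh-agent -s)"                                # start the SSH agent
ssh-add ~/.ssh/id_rsa                                 # add the private key to the agent
cat ~/.ssh/id_rsa.pub                                 # the public key is pasted into the cluster configuration later
```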
For the purposes of this series, we deployed TCE Management and Workload Clusters to a native vSphere Distributed Switch port group. A DHCP server/scope must be available on this port group to provide nodes with IP addressing. As there’s no mechanism to assign static IP addresses to the TCE nodes, DHCP is a must.
The DHCP scope should provide enough addressing for all of your TCE Management and Workload nodes. If deploying the barest minimum, you must have at least six (6) IP addresses (this would mean a TCE Management cluster of 1 control plane node and 1 worker node as well as a Workload cluster comprised of 1 control plane node and 3 worker nodes).
However, we recommend your DHCP scope be at least twenty (20) IP addresses to ensure you can add additional TCE Workload cluster nodes if needed. The breakdown of this recommendation is below:
- the TCE Management Cluster to have three (3) control plane nodes and one (1) worker node (total of 4 IP addresses)
- the TCE Workload Cluster to have three (3) control plane nodes and three (3) worker nodes (total of 6 IP addresses)
- An additional ten (10) IP addresses to allow for additional TCE Workload Cluster worker nodes as required
Next, you must reserve two IP addresses in the same network range as the nodes, as each K8s cluster you have (in our case two – one Management and one Workload) requires an IP address that is used to access the cluster’s K8s control plane.
As noted in the NAPP System Requirements, you’ll also need at least five (5) additional IP addresses in this same subnet for the K8s services that are utilized by NAPP. If you scale the TCE Workload cluster where NAPP resides in the future, then you will need up to an additional ten (10) IP addresses, for a total of 15. If possible, identify a range of 15 addresses in this subnet.
Lastly, make sure the subnet is routable within your environment. The following is expected from a routing/firewall perspective (a few example connectivity checks follow this list):
- Your ‘bootstrap’ machine can reach the cluster IP addresses (the two (2) reserved addresses mentioned above) on port 6443.
- This subnet can reach your private Harbor instance (or the publicly provided VMware Harbor instance)
- TCE nodes can reach vCenter (specifically on port 443) and the NSX-T Manager
- NTP servers are reachable from the node subnet
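Here are a few example reachability checks you can run from the bootstrap machine (or a test VM on the node subnet). The IP addresses and hostnames are the example values used in this post, so substitute your own:

```bash
nc -zv 192.168.0.51 6443         # a reserved cluster control plane endpoint (responds once a cluster is up)
nc -zv vcenter.example.com 443   # vCenter reachability from the node subnet
nc -zv harbor.example.com 443    # your Harbor instance (private or the VMware-provided one)
ntpdate -q pool.ntp.org          # NTP reachability (or use chronyc if ntpdate isn't installed)
```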
Since we’ve discussed some specific requirements for the L2/L3 network upon which your TCE nodes will reside, we’d like to identify exactly what is needed, minus the explanations as to why. Below is a breakdown of what is required for our deployment, along with example data, to ensure everything is clear (a sketch of how these values are used later follows the list):
- A distributed switch port group for the VLAN where your TCE Management and Workload cluster nodes will reside.
- Identify a target subnet (ex. 192.168.0.0/24); this subnet must have a gateway configured on it to allow nodes to route out of the network.
- A DHCP server or relay reachable from this target distributed switch port group.
- A DHCP scope for the TCE Management and Workload cluster nodes; the scope should have at least six (6) IP addresses in your target subnet. However, we suggest your DHCP scope be no less than twenty (20) IP addresses. (ex. 192.168.0.10-192.168.0.50)
- 2 additional IP addresses in the same subnet as your DHCP scope but not part of the scope itself. (ex. 192.168.0.51 and 192.168.0.52).
- An additional reserved range of no less than five (5) and up to fifteen (15) IP addresses. Like the previous item, these addresses need to be in the same subnet as the DHCP scope, but not in the scope itself. These will be put to use later (ex. 192.168.0.60-192.168.0.74)
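To show how these example values eventually get used, below is a partial, hypothetical sketch of a TCE cluster configuration file that references them. The installer generates the complete file for you; the variable names here are the Tanzu cluster configuration settings as we understand them, so don’t treat this snippet as a working config:

```bash
# Partial example only; values mirror the example data above
cat <<'EOF' >> workload-config.yaml
CLUSTER_NAME: napp-workload
CLUSTER_PLAN: prod
VSPHERE_NETWORK: TCE-PortGroup                 # the distributed switch port group for your TCE nodes
VSPHERE_CONTROL_PLANE_ENDPOINT: 192.168.0.52   # one of the two reserved addresses outside the DHCP scope
CONTROL_PLANE_MACHINE_COUNT: 3
WORKER_MACHINE_COUNT: 3
EOF
```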
Wrap-Up…wait, where are Steps 3 and 4??
At this point we’ve:
- Discussed Tanzu Community Edition (TCE) and common terms around it
- Identified what a ‘bootstrap’ machine is in relation to TCE
- Reviewed data you’ll need to prepare for deploying TCE clusters
- Created a vSphere template and an SSH key pair, following the prescribed procedure from the TCE documentation
As this post is already pretty large, we’re going to wrap things up for today. Steps 3 and 4 will each have their own post in this series. In fact, our very next post covers Step 3, which is deploying our TCE Management Cluster. See you soon!