Automate multi-cloud service failover with sameness groups
Admin partitions are a multi-tenancy solution in Consul datacenters. You can use them to peer clusters across different datacenters. When services share the same names across Consul deployments with peered clusters, you can configure sameness groups to add automatic failover between services.
A sameness group is a logical collection of local admin partitions and remote admin partitions on cluster peers, where services with the same names are treated as the same service in terms of service failover. With sameness groups, you can set up and manage automatic service failover with fewer configuration steps.
In this tutorial, you will peer clusters deployed to different cloud providers and then configure a sameness group to automate multi-cloud failover in your Consul service mesh.
Scenario overview
HashiCups is a coffee-shop demo application. It has a microservices architecture and uses Consul service mesh to securely connect the services. In this tutorial, you will deploy HashiCups services on Kubernetes clusters deployed in two different cloud providers. By peering the Consul clusters, the services in one cluster can communicate with the services in the other. With sameness groups, requests for a service that is unavailable on one peer automatically fall back to the same service on another peer.
HashiCups uses the following microservices:
- The nginx service is an NGINX instance that routes requests to the frontend microservice and serves as a reverse proxy to the public-api service.
- The frontend service provides a React-based UI.
- The public-api service is a GraphQL public API that communicates with the product-api and the payments services.
- The product-api service stores the core HashiCups application logic, including authentication, coffee (product) information, and orders.
- The product-api-db service is a Postgres database instance that stores user, product, and order information.
- The payments service is a gRPC-based Java application service that handles customer payments.
Prerequisites
To complete this tutorial, you should already be familiar with admin partitions and cluster peering in Consul.
Enterprise Only
The functionality described in this tutorial requires Consul Enterprise. To explore Consul Enterprise features, you can sign up for a free 30-day trial.
To complete this tutorial, you need:
- A valid Consul Enterprise license
- An HCP account configured for use with Terraform
- An AWS account configured for use with Terraform
- aws-cli v2.0 or later
- A Google Cloud account configured for use with Terraform
- gcloud CLI v461.0.0 or later with the gke-gcloud-auth-plugin plugin installed
- kubectl v1.27 or later
- git v2.0 or later
- terraform v1.2 or later
- consul-k8s v1.3.3
- jq v1.6 or later
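Before you continue, you may want to confirm that the command-line tools in the list above are on your PATH. The following is a minimal sketch, not part of the tutorial repository. The missing_tools function is a hypothetical helper, and it only checks that each binary exists, not that its version meets the minimum.

```shell
# Print the name of every command in the argument list that is not on PATH.
missing_tools() {
  for tool in "$@"; do
    if ! command -v "$tool" >/dev/null 2>&1; then
      printf '%s\n' "$tool"
    fi
  done
}

# Binaries named in the prerequisites list.
missing=$(missing_tools aws gcloud gke-gcloud-auth-plugin kubectl git terraform consul-k8s jq)
if [ -n "$missing" ]; then
  # $missing is intentionally unquoted so the names print on one line.
  echo "Missing tools:" $missing
else
  echo "All required tools are installed."
fi
```

Check versions separately with each tool's own version command, for example terraform version or kubectl version --client.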
This tutorial uses Terraform automation to deploy the demo environment. You do not need to know Terraform to successfully complete this tutorial.
Clone example repository
Clone the GitHub repository containing the configuration files and resources.
$ git clone https://github.com/hashicorp-education/learn-consul-sameness-groups
Change directories to the newly cloned repository.
$ cd learn-consul-sameness-groups
The repository has the following structure:
- The dc1-aws directory contains Terraform configuration to deploy an HCP Consul Dedicated cluster and an AWS EKS cluster in us-west-2.
- The dc2-gcloud directory contains Terraform configuration to deploy a GKE cluster in us-central1-a.
- The consul-peering directory contains Terraform configuration to automate peering of two Consul clusters.
- The k8s-yamls directory contains Consul custom resource definitions (CRDs) that support this tutorial.
- The hashicups-v1.0.2 directory contains YAML configuration files for deploying HashiCups.
Deploy Kubernetes clusters
In this section, you will deploy the infrastructure for this tutorial. You will use Terraform to create an HCP Consul cluster, deploy a Kubernetes cluster on each cloud provider, and deploy Consul dataplanes alongside services in each Kubernetes cluster.
Deploy HCP Consul Dedicated and the first Kubernetes cluster on AWS
Initialize the Terraform configuration for dc1-aws to download the necessary providers and modules.
$ terraform -chdir=dc1-aws init
Initializing the backend...
## ...
Initializing provider plugins...
## ...
Terraform has been successfully initialized!
## ...
By default, dc1-aws deploys to us-west-2. You can change the region your workloads run in by using the terraform.tfvars.example template file to create a terraform.tfvars file.
$ cp dc1-aws/terraform.tfvars.example dc1-aws/terraform.tfvars
dc1-aws/terraform.tfvars
vpc_region = "us-west-2"
hvn_region = "us-west-2"
Deploy the resources for dc1. Confirm the run by entering yes.
$ terraform -chdir=dc1-aws apply
## ...
Plan: 112 to add, 0 to change, 0 to destroy.
## ...
Do you want to perform these actions?
Terraform will perform the actions described above.
Only 'yes' will be accepted to approve.
Enter a value: yes
## ...
Apply complete! Resources: 112 added, 0 changed, 0 destroyed.
Outputs:
cluster_name = "learn-consul-sameness-dc1"
consul_datacenter = "learn-consul-sameness-dc1"
consul_token = <sensitive>
hcp_consul_ca = <sensitive>
region = "us-west-2"
It takes about 15 minutes to deploy your infrastructure. To save time while waiting, you may proceed to the next section of this tutorial and begin the second datacenter deployment in parallel.
Note
If your HCP account has access to multiple organizations or projects, you may encounter a Terraform error related to an unexpected number of organizations or projects. If you receive this error, use the HCP_PROJECT_ID environment variable to specify which HCP project you want your Consul cluster deployed in. For more information, refer to the Terraform HCP provider documentation.
After you deploy the first datacenter, configure the kubectl tool to interact with it. The following command stores the cluster connection information in the dc1 alias.
$ aws eks \
update-kubeconfig \
--region $(terraform -chdir=dc1-aws output -raw region) \
--name $(terraform -chdir=dc1-aws output -raw cluster_name) \
--alias=dc1
Deploy the second Kubernetes cluster
Place the contents of your Consul Enterprise license into a file named consul.hclic. A Consul license is a requirement for the second cluster only. The first cluster does not need a license because it uses HCP Consul Dedicated, which already contains the Enterprise feature set.
$ touch consul.hclic
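The touch command only creates an empty file, so you still need to paste your license text into consul.hclic before you deploy. As a minimal sketch, the following check confirms that the file exists and is not empty. The license_ready function is a hypothetical helper and does not validate the license itself.

```shell
# Succeeds only when the given file exists and is not empty.
license_ready() {
  [ -s "$1" ]
}

if license_ready consul.hclic; then
  echo "consul.hclic contains data."
else
  echo "consul.hclic is missing or empty; paste your license into it." >&2
fi
```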
Next, initialize the Terraform configuration for dc2-gcloud to download the necessary providers and modules.
$ terraform -chdir=dc2-gcloud init
Initializing the backend...
## ...
Initializing provider plugins...
## ...
Terraform has been successfully initialized!
## ...
Use the terraform.tfvars.example template file to create a terraform.tfvars file. Then set your Google Cloud project ID in the project variable. By default, dc2-gcloud deploys to us-central1-a. You have the option to change the zone in the variables file.
$ cp dc2-gcloud/terraform.tfvars.example dc2-gcloud/terraform.tfvars
dc2-gcloud/terraform.tfvars
project = "XXXXXXXXXXXXXXXXXXXXXXX"
zone = "us-central1-a"
Then deploy the resources for dc2. To confirm the run, enter yes. It takes about 10 minutes to deploy your infrastructure.
$ terraform -chdir=dc2-gcloud apply
## ...
Plan: 45 to add, 0 to change, 0 to destroy.
## ...
Do you want to perform these actions?
Terraform will perform the actions described above.
Only 'yes' will be accepted to approve.
Enter a value: yes
## ...
Apply complete! Resources: 45 added, 0 changed, 0 destroyed.
Outputs:
get-credentials_command = "gcloud container clusters get-credentials --zone us-central1-a learn-consul-sameness-dc2"
project_id = "hc-xxxxxxxxxxxxxxxxxxxxxxxxxxxxx"
rename-context_command = "kubectl config rename-context gke_hc-xxxxxxxxxxxxxxxxxxxxxxxxxxxxx_us-central1-a_learn-consul-sameness-dc2 dc2"
set-project_command = "gcloud config set project hc-xxxxxxxxxxxxxxxxxxxxxxxxxxxxx"
zone = "us-central1-a"
After you deploy the second datacenter, set the active Google Cloud project to reference the deployment in dc2.
$ gcloud config set project $(terraform -chdir=dc2-gcloud output -raw project_id)
Updated property [core/project].
Next, obtain the Google Cloud credentials required to interact with the dc2 deployment.
$ gcloud container clusters get-credentials --zone $(terraform -chdir=dc2-gcloud output -raw zone) learn-consul-sameness-dc2
Fetching cluster endpoint and auth data.
kubeconfig entry generated for learn-consul-sameness-dc2.
Configure kubectl to use the dc2 alias for the second datacenter.
$ kubectl config rename-context gke_$(terraform -chdir=dc2-gcloud output -raw project_id)_$(terraform -chdir=dc2-gcloud output -raw zone)_learn-consul-sameness-dc2 dc2
Context "gke_hc-xxxxxxxxxxxxxxxxxxxxxxxxxxxxx_us-central1-a_learn-consul-sameness-dc2" renamed to "dc2".
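The rename command above relies on the context name that gcloud generates, which joins the project ID, zone, and cluster name with underscores after a gke prefix. The following sketch shows that format in isolation. The gke_context_name function and the hc-example-project value are hypothetical and only illustrate the pattern.

```shell
# Build the kubeconfig context name that gcloud generates for a GKE cluster:
# gke_<project>_<zone>_<cluster>
gke_context_name() {
  project=$1
  zone=$2
  cluster=$3
  printf 'gke_%s_%s_%s\n' "$project" "$zone" "$cluster"
}

# Hypothetical project ID, with the zone and cluster name used in this tutorial.
gke_context_name "hc-example-project" "us-central1-a" "learn-consul-sameness-dc2"
```

If you changed the zone in terraform.tfvars, the generated context name changes with it, which is why the tutorial command reads both values from the Terraform outputs.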
Review infrastructure and service deployments
Terraform deploys Consul on both your Kubernetes platforms. By default, Consul deploys into its own dedicated namespace (consul). The following settings in the Consul Helm chart are mandatory for cluster peering.
global:
  ##...
  peering:
    enabled: true # mandatory for cluster peering
  tls:
    enabled: true # mandatory for cluster peering
  ##...
meshGateway:
  enabled: true # mandatory for k8s cluster peering
  ##...
Inspect the Kubernetes pods in the consul namespace to verify that Terraform deployed Consul in dc1. Notice that there are no Consul servers in this Kubernetes deployment because they are running on the HCP platform.
$ kubectl --context=dc1 --namespace=consul get pods
NAME READY STATUS RESTARTS AGE
consul-api-gateway-96bbfdb55-kc2zc 1/1 Running 0 41m
consul-connect-injector-6b9b644469-vfvnv 1/1 Running 0 42m
consul-mesh-gateway-7856f98dd8-bkt2x 1/1 Running 0 42m
consul-webhook-cert-manager-84d694f9c9-bmr25 1/1 Running 0 42m
prometheus-server-8455cbf87d-sdcxc 2/2 Running 0 42m
The Consul API Gateway enables browser access to the HashiCups application. Terraform deploys the API Gateway in both datacenters. For more information on Consul API Gateway, refer to the Consul API Gateway tutorial.
Next, inspect the Kubernetes pods in the consul namespace to verify that Terraform deployed Consul in dc2. Because dc2 is a self-managed installation of Consul Enterprise, it includes Consul server pods in the Running state. If any pods are in a failed state, run the kubectl --context=dc2 --namespace=consul logs <PODNAME> command to inspect the logs and get more information.
$ kubectl --context=dc2 --namespace=consul get pods
NAME READY STATUS RESTARTS AGE
consul-api-gateway-7c85b597b-lnr6w 1/1 Running 0 16m
consul-connect-injector-6d5864f969-tpmwf 1/1 Running 0 19m
consul-mesh-gateway-788cbd8448-7xjdb 1/1 Running 0 19m
consul-server-0 1/1 Running 0 19m
consul-server-1 1/1 Running 0 19m
consul-server-2 1/1 Running 0 19m
consul-webhook-cert-manager-84d694f9c9-km4g5 1/1 Running 0 19m
prometheus-server-8455cbf87d-zth86 2/2 Running 0 19m
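If the pod list is long, you can filter it instead of reading it by eye. The following is a minimal sketch that assumes the default column layout shown above, where STATUS is the third column. The pods_not_running function is a hypothetical helper.

```shell
# Read `kubectl get pods` output on stdin and print the name of every pod
# whose STATUS column is not "Running". The header row is skipped.
pods_not_running() {
  awk 'NR > 1 && $3 != "Running" { print $1 }'
}

# Usage against the second datacenter (requires cluster access):
#   kubectl --context=dc2 --namespace=consul get pods | pods_not_running
```

An empty result means every pod reports Running. Any pod name that the filter prints is a candidate for the kubectl logs command described above.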
Explore HashiCups in browser (optional)
Open HashiCups from dc1 in your browser and verify that it is operational. If you receive an error, wait a few minutes for the deployment to be ready before trying again.
$ echo http://$(kubectl --context=dc1 --namespace=consul get services consul-api-gateway -o jsonpath='{.status.loadBalancer.ingress[0].hostname}')
http://a336ea2854e1c4f3294470eed4975c42-388180783.us-west-2.elb.amazonaws.com
Open HashiCups from dc2 in your browser and verify that it is also operational. It may also take a few minutes for this deployment to be ready.
$ echo http://$(kubectl --context=dc2 --namespace=consul get services consul-api-gateway -o jsonpath='{.status.loadBalancer.ingress[0].ip}')
http://1.2.3.4
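Because both deployments can take a few minutes to become ready, you may prefer to poll the URL from a terminal instead of refreshing the browser. The following sketch defines a small retry helper. The retry function, the attempt count, and the delay are assumptions for illustration and are not part of the tutorial repository.

```shell
# Run a command until it succeeds, up to a maximum number of attempts,
# sleeping between attempts. Returns 0 on success, 1 if every attempt failed.
retry() {
  attempts=$1
  delay=$2
  shift 2
  n=1
  while [ "$n" -le "$attempts" ]; do
    if "$@"; then
      return 0
    fi
    n=$((n + 1))
    # Only sleep when another attempt remains.
    if [ "$n" -le "$attempts" ]; then
      sleep "$delay"
    fi
  done
  return 1
}

# Usage (requires cluster access): poll a gateway URL every 15 seconds, up to 20 times.
#   retry 20 15 curl --silent --fail --output /dev/null "http://<GATEWAY_ADDRESS>"
```

Replace <GATEWAY_ADDRESS> with the hostname or IP address that the echo commands above print.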
Peer the Consul clusters
Before you can use sameness groups with the two Consul clusters, you need to peer them. Cluster peering lets you connect two or more independent admin partitions so that services deployed to different Consul datacenters can communicate. In this section of the tutorial, you will use Terraform to peer the HCP Consul Dedicated cluster in dc1 with the self-managed cluster in dc2. For more information, including how to peer clusters manually, refer to the Connect services between Consul datacenters with cluster peering tutorial.
Initialize the Terraform configuration to download the necessary providers and modules.
$ terraform -chdir=consul-peering init
Initializing the backend...
## ...
Initializing provider plugins...
## ...
Terraform has been successfully initialized!
## ...
To peer the two clusters, Terraform needs a variables file with the endpoint addresses and a valid token for dc1 and dc2. Run the following command to generate a variables file.
$ cat <<EOF \
>consul-peering/terraform.tfvars
dc1_address = "$(terraform -chdir=dc1-aws output -raw consul_public_url)"
dc1_token = "$(terraform -chdir=dc1-aws output -raw consul_token)"
dc2_address = "$(kubectl --context=dc2 get svc -n consul consul-ui -o jsonpath='{.status.loadBalancer.ingress.*.ip}')"
dc2_token = "$(kubectl --context=dc2 get secrets -n consul consul-bootstrap-acl-token -o go-template='{{.data.token | base64decode}}')"
dc2_certificateauthority = "$(kubectl --context=dc2 get secrets -n consul consul-ca-cert -o jsonpath='{.data.tls\.crt}')"
EOF
Inspect the contents of the generated variables file in consul-peering/terraform.tfvars. It contains the address of each Consul cluster, a token for each cluster to perform the peering operations, and the CA certificate of dc2.
consul-peering/terraform.tfvars
dc1_address = "https://learn-consul-sameness-dc1.consul.3eecc579-d274-4792-ae72-22d2035a4c1d.aws.hashicorp.cloud"
dc1_token = "<sensitive>"
dc2_address = "34.135.181.34"
dc2_token = "<sensitive>"
dc2_certificateauthority = "..." # output trimmed for brevity
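The command that generates this file fills each value from a command substitution, so a value can end up empty if one of the deployments is not finished or a lookup fails. As a minimal sketch, the following check lists any variable whose value is an empty string. The empty_tfvars function is a hypothetical helper that assumes the simple one-line key = "value" layout shown above.

```shell
# Print the name of every variable in a tfvars file whose value is an empty string.
empty_tfvars() {
  awk -F '=' '{
    key = $1
    value = $0
    sub(/^[^=]*=[ \t]*/, "", value)   # drop the key and the first equals sign
    sub(/[ \t]+$/, "", value)         # drop trailing whitespace
    gsub(/[ \t]/, "", key)            # drop whitespace around the key
    if (value == "\"\"") print key
  }' "$1"
}

if [ -f consul-peering/terraform.tfvars ]; then
  empty_tfvars consul-peering/terraform.tfvars
fi
```

No output means every variable has a value. If a name is printed, rerun the command that generates the file after the related deployment is ready.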
Deploy the cluster peering. Confirm the run by entering yes.
$ terraform -chdir=consul-peering apply
## ...
Plan: 6 to add, 0 to change, 0 to destroy.
## ...
Do you want to perform these actions?
Terraform will perform the actions described above.
Only 'yes' will be accepted to approve.
Enter a value: yes
## ...
Apply complete! Resources: 6 added, 0 changed, 0 destroyed.
Create a sameness group
Sameness groups make automatic failover possible between services with identical names in different datacenters. If Consul namespaces are enabled, they must also match in order for the failover to happen. For more information about preparing your Consul network for sameness groups, refer to the recommendations for sameness groups.
To use sameness groups in your network, you need to perform the following steps in each datacenter:
- Create the sameness group
- Export services to sameness group members
- Create service intentions
To configure this sameness group to fail over to other service instances in the sameness group by default, create a configuration entry for the sameness group that sets spec.defaultForFailover=true and lists the group members in the order you want to use in a failover scenario. Refer to failover with sameness groups for more information.
The following CRDs with these configurations are included in this tutorial's repository:
dc1-sg-hashicups.yaml
apiVersion: consul.hashicorp.com/v1alpha1
kind: SamenessGroup
metadata:
  name: hashicups
spec:
  defaultForFailover: true
  members:
    - partition: default
    - peer: learn-consul-sameness-dc2-default
The spec.members.peer stanza contains the peered datacenter name learn-consul-sameness-dc2 from the cluster peering page in HCP Consul, joined with a dash and the peered partition name default. If you formatted cluster names differently, use the consul peering CLI command to return a cluster's active cluster peering connections. For a full list of attributes for the SamenessGroup CRD, refer to the Sameness Group configuration entries.
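If your peer names differ from the ones in this tutorial, you can generate the manifest instead of editing it by hand. The following sketch prints a SamenessGroup manifest with the same fields as dc1-sg-hashicups.yaml. The render_sameness_group function is a hypothetical helper, and it assumes a group with exactly two members: the local default partition followed by one peer.

```shell
# Print a SamenessGroup manifest that lists the local default partition first
# and a single cluster peer second. Member order is the failover order.
render_sameness_group() {
  group=$1
  peer=$2
  cat <<EOF
apiVersion: consul.hashicorp.com/v1alpha1
kind: SamenessGroup
metadata:
  name: ${group}
spec:
  defaultForFailover: true
  members:
    - partition: default
    - peer: ${peer}
EOF
}

# Reproduces the dc1 manifest shown above.
render_sameness_group hashicups learn-consul-sameness-dc2-default
```

You can review the output and then pipe it to kubectl apply -f - with the appropriate --context value. The tutorial itself applies the files from the k8s-yamls directory.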
Apply the dc1-sg-hashicups.yaml resource to the first Consul cluster:
$ kubectl --context=dc1 apply -f k8s-yamls/dc1-sg-hashicups.yaml
samenessgroup.consul.hashicorp.com/hashicups created
Apply the dc2-sg-hashicups.yaml resource to the second Consul cluster:
$ kubectl --context=dc2 apply -f k8s-yamls/dc2-sg-hashicups.yaml
samenessgroup.consul.hashicorp.com/hashicups created
Export service to other partition in the sameness group
The goal of this tutorial is for the HashiCups application in dc1 to fall back to the public-api service in dc2. Because this fallback goes in one direction, you only need to apply the exported services CRD in dc2.
To make the public-api service available to other members of the sameness group, apply an exported services CRD. In this CRD, the sameness group is the consumer for the exported services.
The following configuration file demonstrates how to format an ExportedServices CRD. In this example, Consul exports the public-api service in the local default namespace to the sameness group. The metadata.name stanza refers to the local partition that the service is being exported from. The spec.services.[].namespace stanza reflects the local partition's namespace from which the service is exported.
apiVersion: consul.hashicorp.com/v1alpha1
kind: ExportedServices
metadata:
  name: default
spec:
  services:
    - name: public-api
      namespace: default
      consumers:
        - samenessGroup: hashicups
For more information about exporting services, including examples of CRDs that export multiple services at the same time, refer to the exported services configuration entry reference.
Apply the ExportedServices CRD in dc2.
$ kubectl --context=dc2 apply -f k8s-yamls/exp-hashicups.yaml
exportedservices.consul.hashicorp.com/default created
Verify that the service from dc2 was exported to dc1 successfully. The following command queries the dc1 cluster about its peering connection with dc2:
$ curl \
--silent \
--header "X-Consul-Token: $(terraform -chdir=dc1-aws output -raw consul_token)" \
"$(terraform -chdir=dc1-aws output -raw consul_public_url)/v1/peering/learn-consul-sameness-dc2-default" \
| jq
The following example of this command's output highlights the name of the datacenter peer, the state of the peering connection, and a list of imported services.
{
"ID": "be47e9d9-f3bb-6df5-0f76-99a456f18858",
"Name": "learn-consul-sameness-dc2-default",
"Partition": "default",
"State": "ACTIVE",
"PeerCAPems": [ "OMITTED FOR BREVITY" ],
"StreamStatus": {
"ImportedServices": [
"default/default/public-api"
],
"ExportedServices": null,
"LastHeartbeat": "2024-02-27T10:43:40.240719241Z",
"LastReceive": "2024-02-27T10:43:40.240719241Z",
"LastSend": "2024-02-26T14:48:05.907301818Z"
},
"CreateIndex": 4251,
"ModifyIndex": 22038,
"Remote": {
"Partition": "default",
"Datacenter": "learn-consul-sameness-dc2",
"Locality": {
"Region": "us-central1",
"Zone": ""
}
}
}
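To reduce this response to the three fields of interest, you can narrow the jq filter. The following is a minimal sketch that assumes the response has the shape shown above. The peering_summary function is a hypothetical helper.

```shell
# Read a peering object (the JSON returned by /v1/peering/<name>) on stdin and
# print its name, its state, and each imported service on separate lines.
# A null ImportedServices list is treated as empty.
peering_summary() {
  jq -r '.Name, .State, (.StreamStatus.ImportedServices // [] | .[])'
}

# Usage (requires cluster access): replace the final "| jq" in the curl
# command above with "| peering_summary".
```

For a healthy peering with the exported service in place, the state line reads ACTIVE and the list includes default/default/public-api.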
Then, access the Consul UI for dc1 to verify that the sameness group actively supports datacenter failover for the public-api service.
$ echo "$(terraform -chdir=dc1-aws output -raw consul_public_url)/ui/$(terraform -chdir=dc1-aws output -raw consul_datacenter)/services/public-api/routing"
https://learn-consul-sameness-dc1.consul.3eecc579-d274-4792-ae72-22d2035a4c1d.aws.hashicorp.cloud/ui/learn-consul-sameness-dc1/services/public-api/routing
In the Consul UI, click Services. Click public-api and then Routing.
The Resolvers for public-api lists the local instance, as well as the name of the cluster peer that resolves the remote instance: learn-consul-sameness-dc2-default.
Simulate and observe service failure
Retrieve the API gateway URL of the first datacenter. Open it in your browser to view the HashiCups application.
$ export APIGW_URL=$(kubectl --context=dc1 get services --namespace=consul consul-api-gateway -o jsonpath='{.status.loadBalancer.ingress[0].hostname}') && echo "http://$APIGW_URL"
http://a336ea2854e1c4f3294470eed4975c42-388180783.us-west-2.elb.amazonaws.com
The nginx service connects to public-api to retrieve a list of coffees. When the services can communicate with each other, HashiCups displays a selection of coffees in your browser.
To simulate failure, delete the public-api service from dc1.
$ kubectl --context=dc1 delete -f hashicups-v1.0.2/public-api.yaml
service "public-api" deleted
serviceaccount "public-api" deleted
servicedefaults.consul.hashicorp.com "public-api" deleted
deployment.apps "public-api-v1" deleted
Verify that there are no active public-api deployments in dc1.
$ kubectl --context=dc1 get deployments public-api
Error from server (NotFound): deployments.apps "public-api" not found
Refresh the HashiCups UI in your web browser.
At this point, the nginx service in dc1 is configured to connect to its local instance of the public-api service, but there is currently no instance of public-api in dc1. Before Consul can fail over to the instance of public-api hosted in the sameness group, you must configure a Consul service intention for the public-api service in dc2 to authorize traffic.
Authorize traffic between sameness group members and observe recovery
The ExportedServices CRD does not automatically grant permission to accept traffic from a remote service. You must also create a service intention that references the sameness group.
The following configuration file demonstrates how to format a ServiceIntentions CRD so that a service named public-api becomes available to all instances of nginx deployed in all members of the sameness group. In the following example, public-api is deployed to the default namespace and default partition in both the local datacenter and the remote sameness group. The ServiceIntentions CRD includes two rules. One rule authorizes traffic to the local service and the other authorizes traffic to the sameness group.
intentions-samenessgroup.yaml
apiVersion: consul.hashicorp.com/v1alpha1
kind: ServiceIntentions
metadata:
  name: public-api
  namespace: default
spec:
  sources:
    - name: nginx
      namespace: default
      samenessGroup: hashicups
      action: allow
    - name: nginx
      namespace: default
      action: allow
  destination:
    name: public-api
Refer to create and manage intentions for more information about how to create and apply service intentions in Consul.
Apply the ServiceIntentions CRD on the destination cluster dc2.
$ kubectl --context=dc2 apply -f k8s-yamls/intentions-samenessgroup.yaml
serviceintentions.consul.hashicorp.com/public-api configured
Refresh the HashiCups UI in your web browser.
With a service intention that allows the connection, Consul is now able to route the request for the public-api service in dc1 to the public-api instance in dc2. Because these services are in the sameness group and defaultForFailover is enabled, the HashiCups UI automatically recovers and shows the list of coffees.
Without sameness groups, you must define exported services and service intentions for each service and each admin partition. The number of CRDs required therefore grows with every service and partition you add, and changes to a service in a single datacenter require re-applying the CRDs after each change. In multi-cloud deployments, you would apply these updates separately for each cloud provider.
Sameness groups allow you to export and authorize services in a point-to-multipoint approach by managing one sameness group membership and intention per datacenter. The consumers of the service can reside in different regions or cloud providers without complicating the connectivity matrix. Enabling connectivity with sameness groups is not affected by the number of datacenters involved and always consists of two steps: one sameness group CRD per datacenter and one service intention per destination service.
Clean up environments
After you complete this tutorial, you can stop Consul and the demo application, and remove the Kubernetes cluster in each datacenter. Begin the cleanup by removing the peering between the two Consul clusters.
$ terraform -chdir=consul-peering destroy
##...
Plan: 0 to add, 0 to change, 6 to destroy.
##...
Do you really want to destroy all resources?
Terraform will destroy all your managed infrastructure, as shown above.
There is no undo. Only 'yes' will be accepted to confirm.
Enter a value: yes
##...
Destroy complete! Resources: 6 destroyed.
Next, destroy the supporting infrastructure in your second datacenter. Due to race conditions with the various cloud resources created in this tutorial, you may need to run the destroy operation twice to ensure all resources have been properly removed.
$ terraform -chdir=dc2-gcloud destroy
##...
Plan: 0 to add, 0 to change, 45 to destroy.
##...
Do you really want to destroy all resources?
Terraform will destroy all your managed infrastructure, as shown above.
There is no undo. Only 'yes' will be accepted to confirm.
Enter a value: yes
##...
Destroy complete! Resources: 45 destroyed.
Next, destroy the supporting infrastructure in your first datacenter. Due to race conditions with the various cloud resources created in this tutorial, you may need to run the destroy operation twice to ensure all resources have been properly removed.
$ terraform -chdir=dc1-aws destroy
##...
Plan: 0 to add, 0 to change, 112 to destroy.
##...
Do you really want to destroy all resources?
Terraform will destroy all your managed infrastructure, as shown above.
There is no undo. Only 'yes' will be accepted to confirm.
Enter a value: yes
##...
Destroy complete! Resources: 112 destroyed.
Next steps
In this tutorial, you used Consul sameness groups to automate failover across two Consul clusters in different cloud providers. In the process, you learned about the benefits of using sameness groups for highly available applications across multiple cloud provider deployments.
Feel free to explore these tutorials and collections to learn more about Consul service mesh, microservices, and Kubernetes security.