This post offers recommendations for platform teams that manage Temporal Cloud deployments with Terraform.
Introduction
Over the past few months, I’ve worked with multiple teams building business-critical systems on Temporal Cloud, helping them operationalize their Temporal Cloud deployments using Terraform as the primary infrastructure-as-code interface. Temporal Cloud offers two methods for infrastructure automation: Terraform and the Cloud Operations API.
When choosing between Terraform and the Cloud Ops API, my recommendation is to use Terraform unless it doesn't address your requirements. Terraform is declarative and reproducible, and it has a built-in mechanism to detect configuration drift, so you can keep the declared state aligned with the actual deployed state. These capabilities are difficult to achieve when you build on top of imperative API calls, such as those of the Cloud Ops API.
The Temporal Cloud Terraform provider is itself implemented using the Cloud Ops API.
In this post, I’ll walk through my recommendations for managing Temporal Cloud using the Terraform Temporal Cloud provider.
Terraform foundation
There’s no shortage of guidance on how to write and organize Terraform code. To keep things DRY, I’ll point to the resources that have meaningfully influenced how I structure Terraform projects:
- Coding style
- Adoption framework
Below, I will highlight a few essential Terraform best practices.
📌 Pin provider versions explicitly
In the past, I used ~>, >=, and <= to constrain provider versions in production, and that caused a deployment incident when one of the providers published a backward-incompatible update.
Because most Terraform providers are maintained by engineers from third-party organizations or the community, it is inevitable that mistakes will be made.
Hence, my recommendation is to always pin exact provider versions and upgrade only after testing, to avoid breakage from backward-incompatible changes.
terraform {
  required_providers {
    temporalcloud = {
      source  = "temporalio/temporalcloud"
      version = "= 0.8.3"
    }
  }
}
To streamline provider version upgrades, I recommend using automated dependency tooling such as Dependabot.
🌐 Use remote state
By default, Terraform uses a local backend, which stores state as a file on local disk.
This makes coordination between team members and automation difficult and insecure.
Hence, I recommend using a remote backend (such as azurerm, gcs, or s3) and restricting access to the remote state file to the user and service accounts that need it.
Best practices for using remote state in Terraform are out of scope for this article. If you choose a remote backend, I recommend reviewing its documentation.
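As a rough illustration of the recommendation above, here is a minimal sketch of an s3 backend block. The bucket name, object key, and region are hypothetical placeholders, and access restrictions on the bucket itself (IAM policies, for example) would be managed outside this snippet.

```hcl
terraform {
  backend "s3" {
    bucket  = "example-org-terraform-state"      # hypothetical bucket name
    key     = "temporal-cloud/terraform.tfstate" # hypothetical object key
    region  = "us-east-1"                        # region of the state bucket, not of Temporal Cloud
    encrypt = true                               # request server-side encryption for the state object
  }
}
```

The azurerm and gcs backends follow the same idea with their own arguments.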
Temporal Cloud provider recommendations
When you are using the Temporal Cloud Terraform provider, you will have to make certain design decisions. I will highlight some of the decisions I encountered and share my recommendations.
📁 Namespace naming convention
Use the following pattern to name Temporal namespaces: <use-case>-<domain>-<region>-<environment>
Use the following rules to ensure that a namespace name doesn’t exceed 39 characters:
- Use at most 10 characters for use case (e.g. payments, fulfill)
- Use at most 10 characters for domain (e.g. checkout, notify)
- Use at most 5 characters for region (e.g. aps1, apse1)
- Use at most 3 characters for environment (e.g. dev, prd)
Examples: payments-checkout-aps1-dev, payments-checkout-aps1-prd, fulfill-notify-apse1-prd
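One way to enforce the convention is to assemble the name from validated input variables. The sketch below uses plain Terraform only; the variable names (use_case, domain, region_code, environment) are hypothetical and not part of the provider.

```hcl
# Hypothetical inputs; each length limit mirrors the rules listed above.
variable "use_case" {
  type = string
  validation {
    condition     = length(var.use_case) > 0 && length(var.use_case) <= 10
    error_message = "The use_case value must be between 1 and 10 characters."
  }
}

variable "domain" {
  type = string
  validation {
    condition     = length(var.domain) > 0 && length(var.domain) <= 10
    error_message = "The domain value must be between 1 and 10 characters."
  }
}

variable "region_code" {
  type = string
  validation {
    condition     = length(var.region_code) > 0 && length(var.region_code) <= 5
    error_message = "The region_code value must be between 1 and 5 characters."
  }
}

variable "environment" {
  type = string
  validation {
    condition     = length(var.environment) > 0 && length(var.environment) <= 3
    error_message = "The environment value must be between 1 and 3 characters."
  }
}

locals {
  # Worst case: 10 + 10 + 5 + 3 characters plus 3 hyphens = 31, within the 39-character limit.
  namespace_name = join("-", [var.use_case, var.domain, var.region_code, var.environment])
}

output "namespace_name" {
  value = local.namespace_name
}
```

Because every component is capped, the joined name can never exceed 31 characters, which leaves headroom under the 39-character limit.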
Why this pattern?
- Simple and easy to understand
- Complies with Temporal Cloud namespace naming requirements
- Clearly separates environments (e.g. dev, prd)
- Groups related services under domains that your organization has defined
- Isolates namespace-level system limits across services and environments
A Temporal Cloud account can have up to 100 namespaces. This is a soft limit, which can be increased by opening a support ticket.
🔎 Include standard custom search attributes
As you are designing a naming convention for your namespaces, you should consider a standard set of custom search attributes for all of your workflow executions.
To define ownership of workflow executions, it is recommended that you define an owner attribute of type Keyword that links workflow executions to their directly responsible individuals (DRIs).
This reduces the time required to find the DRI, and therefore the mean time to repair, when a workflow exhibits faulty behavior.
When defining custom search attributes in Temporal Cloud, carefully select the appropriate data type based on your query patterns: use Keyword for exact matches, Text for full-text search, and numeric types for range queries.
For the full list of supported types, see Custom Search Attributes | Temporal Doc.
🤖 Use a service account for Terraform
When you are using the Temporal Cloud Terraform provider, you have to supply a Temporal Cloud API key.
Avoid hard-coding API keys and credentials in Terraform configurations.
Temporal recommends passing the API key to the provider via an environment variable (i.e. TEMPORAL_CLOUD_API_KEY).
For infrastructure automation, it is recommended that you create a service account and generate an API key for that service account. Avoid using a user account API key, because your infrastructure automation would likely break when that employee leaves your company.
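In practice this can be as small as an empty provider block. The sketch below assumes the provider's local name is temporalcloud and that the service account's key has been exported in the shell or CI environment that runs Terraform.

```hcl
# Assumes the key was exported before running Terraform, for example:
#   export TEMPORAL_CLOUD_API_KEY="<service account API key>"
provider "temporalcloud" {
  # api_key is intentionally not set here, so no secret lives in version control.
}
```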
🔄 Rotate your API keys
The maximum expiration time for an API key is 2 years. In line with your security policy, it is recommended that you create a plan to rotate your API keys regularly. The steps to rotate your API key are documented here.
🔒 Protect namespaces from deletion
When managing Temporal Cloud namespaces using the UI and tcld, deletion protection can be enabled to prevent accidental namespace deletion.
At the time of writing, deletion protection is not available for the temporalcloud_namespace resource.
Hence, it is recommended to use the Terraform meta-argument prevent_destroy to protect business-critical namespaces (e.g. production) from accidental deletion.
A feature request to add deletion protection support to temporalcloud_namespace is filed here.
When the feature is shipped, I recommend switching from the prevent_destroy approach to the new feature.
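The sketch below shows prevent_destroy on a production namespace, assuming the temporalcloud provider is already declared. The namespace name, region, authentication mode, and retention period are illustrative values, and the argument names reflect my reading of the provider schema, so verify them against the version you pin.

```hcl
resource "temporalcloud_namespace" "payments_checkout_prd" {
  name           = "payments-checkout-aps1-prd" # illustrative name following the convention above
  regions        = ["aws-ap-south-1"]           # illustrative region
  api_key_auth   = true                         # assumes API key authentication rather than mTLS
  retention_days = 30                           # illustrative retention period

  lifecycle {
    # Terraform rejects any plan that would destroy this resource.
    prevent_destroy = true
  }
}
```

Note that prevent_destroy also blocks plans that replace the resource, since a replacement includes a destroy step.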
👤 Prefer SSO and SCIM for user management
The Temporal Cloud Terraform provider supports temporalcloud_user for managing Temporal Cloud users.
When possible, it is recommended to use SCIM and SSO, instead of Terraform, to manage users.
By using these protocols, you eliminate the operational overhead of maintaining separate user credentials and permissions within Temporal Cloud, and centralize user management in your organization's identity provider.
SSO eliminates password fatigue and enforces your organization's authentication policies (including MFA requirements).
SCIM inherits the benefits of using SSO. In addition, SCIM automates user account provisioning and de-provisioning, ensuring consistent access between your identity platform and Temporal Cloud.
SSO is available for Business, Enterprise, and Mission Critical users (note that Business plan users require an add-on fee). SCIM is accessible to Enterprise and Mission Critical users.
📈 Configure metrics endpoint
Use the temporalcloud_metrics_endpoint resource to configure your Temporal Cloud metrics endpoint.
If you are using Datadog, follow this to export Temporal Cloud metrics to Datadog.
If you are using Prometheus and Grafana, you should reference this to configure scraping and Grafana dashboards.
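A minimal sketch of the metrics endpoint resource, assuming the temporalcloud provider is already declared. The certificate path is hypothetical, and the accepted_client_ca argument and uri attribute reflect my understanding of the provider schema rather than a verified reference.

```hcl
resource "temporalcloud_metrics_endpoint" "this" {
  # Base64-encoded PEM CA certificate used to authenticate metrics clients.
  # The file path is a hypothetical placeholder.
  accepted_client_ca = base64encode(file("${path.module}/certs/metrics-ca.pem"))
}

# Exposes the endpoint address (assumed attribute name) for your scraper configuration.
output "metrics_endpoint_uri" {
  value = temporalcloud_metrics_endpoint.this.uri
}
```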
🧪 Test your Temporal Cloud setup
After provisioning your Temporal Cloud namespaces with Terraform, validate your deployment by executing a simple "hello world" workflow against your new namespaces to confirm that authentication, network connectivity, and configurations are established properly. A sample "hello world" workflow is available here.
Gotchas
🔑 temporalcloud_apikey in Terraform state
As of v0.8.0, the API key value of the temporalcloud_apikey resource is stored in the Terraform state.
If you are using temporalcloud_apikey, I recommend following the best practices for securing your remote Terraform state.
If this is a problem for you, upvote this feature request.
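To make the exposure concrete, here is a sketch of an API key resource and an output that reads it, assuming the temporalcloud provider is already declared. The service_account_id variable, display name, and expiry date are hypothetical, and the argument and attribute names (owner_type, owner_id, expiry_time, token) are my reading of the provider schema.

```hcl
variable "service_account_id" {
  type        = string
  description = "ID of an existing Temporal Cloud service account (hypothetical input)."
}

resource "temporalcloud_apikey" "worker" {
  display_name = "worker-key"           # hypothetical name
  owner_type   = "service-account"
  owner_id     = var.service_account_id
  expiry_time  = "2026-01-01T00:00:00Z" # illustrative expiry within the 2-year maximum
}

output "worker_api_key" {
  value     = temporalcloud_apikey.worker.token
  sensitive = true # redacts the value in CLI output only
}
```

Marking the output as sensitive hides it in plan and apply output, but the value is still written to the state file, which is why the state itself needs to be protected.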
⛰️ Changing namespace region
As of v0.8.0, your namespace will be replaced (destroy-then-recreate) when its region is updated. This operation will terminate running workflow executions and erase workflow execution history.
Updating the region of an existing namespace is currently not supported in Temporal Cloud.
If this is a problem for you, upvote this feature request.
Summary
To recap, here are the key takeaways for Terraforming Temporal Cloud effectively:
- Pin Terraform provider versions explicitly
- Store Terraform state remotely
- Create a naming convention for your namespaces
- Define standard custom search attributes
- Use a service account for Terraform
- Rotate your API keys
- Protect business critical namespaces from accidental deletion
- Use SSO and SCIM whenever possible
- Configure metrics endpoint
- Test your Temporal Cloud setup
You can find a reference Terraform implementation of Temporal Cloud at https://github.com/kawofong/temporal-terraform
I'd love to hear how you're tackling infrastructure challenges in your own Temporal Cloud deployments! Share your biggest Terraform + Temporal Cloud learning (and pain points) in the comments below.