Cloud KMS Fundamentals for Enterprise: Part 2
TL;DR
As a security architect, you might assume that you need to centralize your KMS keys in a single project for security, because that is how you would centralize key administration on-prem. When you move to Google Cloud, however, you’re probably better off decentralizing. By this I mean letting application owners create and manage their own KMS keys, while empowering security engineers to build best practices and templates that ensure app owners are doing what they are supposed to do. Using this “trust but verify” paradigm, you’re building guardrails, not roadblocks. OK, without any further clichés or buzzwords, let’s start with some background.
Background
In the previous article, we covered the fundamentals of KMS and of encryption in general. If you are not already familiar with them, please read that article first. In this article, we’ll look at several scaling strategies and the pros and cons of each.
If you already know how GCP IAM permissions relate to Cloud KMS, feel free to skip ahead to the Two Models as a Spectrum section.
Project Hierarchy
For those who are fairly new to Google Cloud Platform: resources (compute, data stores, services, etc.) are grouped into projects. Projects belong to a tree of folders that sits underneath a single root node called the organization. Permissions can be granted at the organization, folder, project, or resource level, and they are additive. This means that, as of the time of this writing, you cannot explicitly deny a permission that has been granted at a higher level.
This is important to understand when we talk about scaling out KMS usage since key and key ring usage permissions can be granted at any one of these levels. For more information on this, see the docs on Resource Hierarchy.
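The additive rule can be illustrated with a small Python model. This is a minimal sketch. It assumes a simplified policy in which each node of the hierarchy maps a role to a set of members, and it ignores IAM Conditions and deny policies. The project, group, and service account names are hypothetical.

```python
from typing import Dict, List, Set

# Simplified policy model: role -> set of members (no conditions, no deny).
Policy = Dict[str, Set[str]]


def effective_roles(ancestry: List[Policy], member: str) -> Set[str]:
    """Return every role `member` holds on the last node of `ancestry`.

    `ancestry` runs from the organization down to the resource itself.
    Because grants are additive, a binding on any ancestor applies to every
    descendant, and nothing lower in the tree can take it away.
    """
    roles: Set[str] = set()
    for policy in ancestry:
        for role, members in policy.items():
            if member in members:
                roles.add(role)
    return roles


# Hypothetical hierarchy: organization -> folder -> project -> key ring -> key.
APP1_SA = "serviceAccount:app1@app1-prod.iam.gserviceaccount.com"
org: Policy = {}
folder: Policy = {"roles/cloudkms.viewer": {"group:security@example.com"}}
project: Policy = {"roles/cloudkms.cryptoKeyEncrypterDecrypter": {APP1_SA}}
key_ring: Policy = {}
key: Policy = {}

# The service account can use the key even though the key's own policy is empty.
print(effective_roles([org, folder, project, key_ring, key], APP1_SA))
# prints {'roles/cloudkms.cryptoKeyEncrypterDecrypter'}
```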
KMS Resources
Cloud KMS keys (commonly referred to as CMEKs, or customer-managed encryption keys) are organized into key rings within a project. As noted above, if a user needs to use a CMEK, permissions to access that key can be granted at the organization, folder, or project level, as well as at the key ring or key level. You might be thinking, “I follow the Principle of Least Privilege, so access should only be granted at the key level.” The answer may not be so simple, though. You also have to consider how those keys are managed, how they are used, and how you can audit their usage.
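Keys and key rings are addressed by hierarchical resource names, so the places a grant could come from can be read off the name itself. This is a minimal Python sketch that assumes the `projects/PROJECT/locations/LOCATION/keyRings/KEY_RING/cryptoKeys/KEY` name format. The function names are hypothetical. Folders and the organization are left out because they cannot be derived from a key name.

```python
import re
from typing import Dict, List

# projects/PROJECT/locations/LOCATION/keyRings/KEY_RING/cryptoKeys/KEY
_KEY_NAME = re.compile(
    r"^projects/(?P<project>[^/]+)"
    r"/locations/(?P<location>[^/]+)"
    r"/keyRings/(?P<key_ring>[^/]+)"
    r"/cryptoKeys/(?P<key>[^/]+)$"
)


def parse_key_name(name: str) -> Dict[str, str]:
    """Split a CryptoKey resource name into its components."""
    match = _KEY_NAME.match(name)
    if match is None:
        raise ValueError(f"not a CryptoKey resource name: {name!r}")
    return match.groupdict()


def grant_scopes(name: str) -> List[str]:
    """List the project, key ring, and key where a grant on this key can live.

    Locations are part of the name but are not IAM resources, so they are
    not listed as a scope of their own.
    """
    p = parse_key_name(name)
    project = f"projects/{p['project']}"
    key_ring = f"{project}/locations/{p['location']}/keyRings/{p['key_ring']}"
    return [project, key_ring, f"{key_ring}/cryptoKeys/{p['key']}"]
```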
Two Models as a Spectrum
If you are in an enterprise with multiple data centers, you may be trying to follow a traditional key management system model, such as NIST SP 800-152. A lot can go wrong in cryptographic key generation, such as insecure defaults or keys transported in an insecure manner. Many organizations solve these problems by centralizing key management in one team. That team is responsible for the key generation equipment, for key quality, and for delivering keys to the users and applications that need them.
In a Cloud setting, however, high-quality key generation systems with secure defaults are available to every application. And, by using a model like Cloud KMS where the key material never leaves the service, security issues due to insecure transmission can be avoided. This gives organizations options to move key usage closer to the users and systems that need them, and allows better cost and resource accounting.
One concern many organizations might have is that auditing can appear to be easier if there is one organization controlling all cryptographic keys. However, Cloud systems like Google Cloud have asset management systems that make tracking the use of Cloud KMS easier, so that audit teams can maintain good visibility into how and where the keys are used. The audits can also confirm the parameters used for key generation so that the keys are known to be fit for purpose.
There are probably far more ways to manage KMS keys than these two methods, but solutions usually fall somewhere between a Centralized and Decentralized model, which we’ll explain below.
Centralized Model
Many enterprises choose the centralized model because it is closest to their data center tooling, such as a KMS server or HSM. In this model, one or more centralized projects are used specifically to manage keys in KMS. When you first look at this model, you may think it is the most secure option because you have “Separation of Duties”: one party manages access to keys and another party manages access to data. This does adhere to separation of duties, but it is not the only way to achieve it, and it doesn’t necessarily make the setup more secure. More on this in a moment.
For this model to work at scale, you need a central ticketing system, such as ServiceNow or Jira, that allows keys to be generated automatically. The application owner fills in some information about her application. After an approval process, a key is created and access is granted to the users and service accounts she specified.
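The ticket-driven flow could be sketched roughly as follows. Everything here is hypothetical and stands in for whatever a real ServiceNow or Jira integration would produce: the `KeyRequest` fields, the naming scheme (one key ring per application and one key per environment), and the `plan_key` helper. The sketch only computes what to create. It does not call any Google Cloud API.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class KeyRequest:
    """Hypothetical fields an app owner might submit in a key-request ticket."""
    app: str                  # application name, e.g. "app1"
    environment: str          # e.g. "dev" or "prod"
    owner: str                # app owner's e-mail, kept as audit context
    members: List[str] = field(default_factory=list)  # IAM members to grant
    approved: bool = False    # set by the approval step


def plan_key(request: KeyRequest, key_project: str, location: str) -> Dict[str, object]:
    """Turn an approved request into the key ring, key, and binding to create."""
    if not request.approved:
        raise PermissionError("request has not been approved")
    if not request.members:
        raise ValueError("request must name at least one member")
    key_ring = f"projects/{key_project}/locations/{location}/keyRings/{request.app}"
    return {
        "key_ring": key_ring,
        "key": f"{key_ring}/cryptoKeys/{request.app}-{request.environment}",
        "binding": {
            "role": "roles/cloudkms.cryptoKeyEncrypterDecrypter",
            "members": sorted(request.members),
        },
        "labels": {"app": request.app, "env": request.environment},
        "owner": request.owner,
    }
```

Every field the Key Admin team later relies on for auditing has to come in through `KeyRequest`. That is the weak point discussed in the auditing section below.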
Variation 1: Single Shared Key Project Per Organization
This first variation resembles the way things are done on-prem, where one KMS cluster or HSM generates keys for everything. In this variation of the centralized model, we have to put ALL our trust in the teams or admins who manage key permissions, hereafter called “Key Admins”. The team managing keys is separate from the teams using them, so you do have separation of duties in that sense. You do not, however, have separation between data classifications or environments, which leaves you open to a breach caused by human error. In setups like this it is very common to accidentally grant dev identities access to prod keys, or to make other misconfigurations. That is why you probably shouldn’t use this variation.
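In a shared key project, one audit a Key Admin team might run is a search for cross-environment grants. This Python sketch assumes a hypothetical convention in which project IDs end in `-dev`, `-test`, `-staging`, or `-prod`, and each key carries an `env` label. The input maps each key name to its labels and to the bindings from its IAM policy.

```python
from typing import Dict, List, Optional, Tuple

ENVIRONMENTS = ("dev", "test", "staging", "prod")


def member_env(member: str) -> Optional[str]:
    """Guess a service account's environment from its project ID.

    Assumes user-managed service accounts
    (serviceAccount:NAME@PROJECT_ID.iam.gserviceaccount.com) and a
    hypothetical convention where PROJECT_ID ends in "-<env>".
    """
    if not member.startswith("serviceAccount:") or "@" not in member:
        return None
    project = member.split("@", 1)[1].split(".", 1)[0]
    for env in ENVIRONMENTS:
        if project.endswith(f"-{env}"):
            return env
    return None


def cross_env_grants(keys: Dict[str, Dict]) -> List[Tuple[str, str]]:
    """Return (key name, member) pairs whose environment differs from the key's env label."""
    findings = []
    for key_name, key in keys.items():
        key_env = key.get("labels", {}).get("env")
        for binding in key.get("bindings", []):
            for member in binding.get("members", []):
                env = member_env(member)
                if key_env and env and env != key_env:
                    findings.append((key_name, member))
    return findings
```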
While it’s not a recommendation, it’s important to understand this variation since it is on the far “fully centralized” end of this spectrum.
Variation 2: Single Shared Key Project Per Environment
In the next variation, we have a single key project per environment (e.g. dev, test, staging, prod) or per data classification (e.g. public, internal, confidential, secret). As in the first variation, we are sharing keys between projects. This is more secure, though, because we break out the key project per environment. That gives us much stronger isolation between critical environments, such as dev and production, or between data classifications, such as internal and secret. However, as we’ll find out, this variation has a downside as well.
Variation 3: Single Shared Key Project Per Application
This variation moves even further toward the decentralized side of the spectrum, with a key project per application. You can break this out further still, with a key project per application and per environment. This gives us much stronger isolation between applications, but it can be more difficult to manage. The application owner (i.e. the user of the key) still has to request that a key be created, so this dramatically increases the number of projects the key management team needs to maintain and know about. Giving every application its own key management project also generally makes your organizational hierarchy harder to reason about.
Like Variation 2, this variation definitely needs automation to work at scale. You would need to automate not only the creation of each key, but also the creation of the KMS project itself for every new application.
Auditing in a Centralized Model
Regardless of the level of centralization, the same number of keys needs to be created and managed (assuming least privilege). You may therefore think it should be easier to audit usage if all the keys are in one place. The premise is correct, but the conclusion does not necessarily follow. If keys are stored in a single shared project, or even in a small number of them, auditing requires:
- The contextual information about the key (to determine how and where the key is used) to exist as metadata alongside the key itself.
- The contextual information about who should be granted access to that particular key to be stored alongside the key or in some auditable store, such as a ticketing system.
- The users of that key to supply all that information when the key is created, and also to keep it up to date.
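The first two requirements can at least be checked mechanically. The Python sketch below assumes a hypothetical policy that every key carry `app`, `owner-team`, and `data-class` labels. Note its limits: it detects missing metadata, not stale metadata, and stale metadata is exactly the gap the third requirement describes.

```python
from typing import Dict, Iterable, List

# Hypothetical label keys a Key Admin team might require on every key.
REQUIRED_LABELS = ("app", "owner-team", "data-class")


def missing_metadata(keys: Iterable[Dict]) -> Dict[str, List[str]]:
    """Map each key name to the required labels it lacks (empty values count as missing)."""
    report = {}
    for key in keys:
        labels = key.get("labels") or {}
        missing = [label for label in REQUIRED_LABELS if not labels.get(label)]
        if missing:
            report[key["name"]] = missing
    return report
```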
If you’re like me and have done audits on infrastructure (cloud or otherwise), you know that application owners tend to do the bare minimum when it comes to security processes. If you ask them to fill out a form to get a key, they’ll give you only a vague description of what it’s for. You’ll then have a ton of work to do to get them to keep that information up to date. If you get a request a year later saying that some new person or service needs access, you’ll likely have to go back to the original app owner for that information, assuming they haven’t left the company. Moreover, when app owners request a key, they often don’t know exactly which service accounts or human users will need access to it. As a result, you’re likely to go through this process multiple times.
Think about the following scenario for a second. Assume you are a Key Admin and want to do an audit of all your keys to make sure that they all have the least privilege possible. What this would look like in a centralized model is something like:
- For each key ring/key, determine who is using it. Your only option is to look at the metadata for these keys, which assumes that metadata is accurate. In a large enterprise, it may not be.
- Now you’d want to look at how those keys are being used. Since you don’t have insight into the application to which these keys are tied, you are limited to log messages in your SIEM that correspond to Encrypt/Decrypt operations.
- Next, you’d need to somehow verify that only the users who are supposed to have access to the keys actually do. This becomes tricky because you, as the Key Admin with properly separated duties, have no idea what the application does beyond what the application owner tells you. There’s no real way to verify it, so you have to rely on an email back to the app owner, which effectively delegates trust to the app owner.
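The verification step boils down to a set comparison between who has access and who the app owner says should have access. This is a minimal Python sketch. It assumes the policy is in the JSON shape that `gcloud kms keys get-iam-policy ... --format=json` returns. `declared` is a hypothetical set supplied by the app owner, so the trust described above remains: the code only makes it explicit.

```python
from typing import Dict, List, Set


def granted_members(policy: Dict) -> Set[str]:
    """Collect every member that appears in an IAM policy document."""
    return {m for b in policy.get("bindings", []) for m in b.get("members", [])}


def access_drift(policy: Dict, declared: Set[str]) -> Dict[str, List[str]]:
    """Compare actual grants with the members the app owner says should have access."""
    actual = granted_members(policy)
    return {
        "unexpected": sorted(actual - declared),  # granted but not declared
        "missing": sorted(declared - actual),     # declared but not granted
    }
```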
So if we have to trust the app owner to enter the right information and to tell us who should have access to the keys, what benefit do we get from centrally managing and controlling the keys?
Why do we need to control keys?
Back in the data center, this model made sense because you had one KMS system or HSM, or a small set of them, that a single team had to manage. This was primarily due to complexity and compliance. For example: “I need a key that is FIPS 140-2 Level 3 validated” or “I need to ensure that my key uses AES-256 and is rotated every 30 days”. The Key Admin team was made responsible for meeting these kinds of requirements because of how on-prem KMS systems work, e.g. a limited number of licenses or difficulty of use. In GCP, however, that’s no longer the case.
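Requirements like these can be expressed as checks against a key’s configuration. The following Python sketch assumes the key is in the Cloud KMS REST `CryptoKey` JSON shape (`purpose`, `rotationPeriod`, `versionTemplate.algorithm`, `versionTemplate.protectionLevel`). It treats `GOOGLE_SYMMETRIC_ENCRYPTION` as meeting the AES-256 requirement and the `HSM` protection level as a stand-in for the FIPS 140-2 Level 3 requirement. Confirm both mappings against current Google documentation before relying on them.

```python
from typing import Dict, List

THIRTY_DAYS_S = 30 * 24 * 60 * 60  # 2,592,000 seconds


def duration_seconds(value: str) -> float:
    """Parse a JSON-encoded protobuf Duration such as "2592000s"."""
    if not value.endswith("s"):
        raise ValueError(f"unexpected duration: {value!r}")
    return float(value[:-1])


def check_key(key: Dict) -> List[str]:
    """Return the ways a CryptoKey violates the example requirements (empty if compliant)."""
    problems = []
    template = key.get("versionTemplate", {})
    if key.get("purpose") != "ENCRYPT_DECRYPT":
        problems.append("purpose is not ENCRYPT_DECRYPT")
    if template.get("algorithm") != "GOOGLE_SYMMETRIC_ENCRYPTION":
        problems.append("algorithm is not GOOGLE_SYMMETRIC_ENCRYPTION")
    if template.get("protectionLevel") != "HSM":
        problems.append("protection level is not HSM")
    period = key.get("rotationPeriod")
    if period is None:
        problems.append("automatic rotation is not configured")
    elif duration_seconds(period) > THIRTY_DAYS_S:
        problems.append("rotation period exceeds 30 days")
    return problems
```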
What we really want to control about our keys is generally, who can use them, for what purposes, and with what configuration. With Cloud KMS, you don’t have to have a single project to control all of these. Rather, you can empower application owners to create and manage keys for their own applications, while the Key Admin or security team simply provides best practices for key usage, patterns or tools to make this easy to comply with, and enforcement mechanisms to ensure that these standards are kept.
At the end of the day, you’re likely already trusting your application owners to tell you who needs to use keys and for what purpose. With Cloud KMS, you can also give them the tools they need to create and manage their own keys. With HashiCorp Terraform (or other IaC), you can create patterns as code for proper KMS key generation. Finally, with Cloud Asset Inventory, you can monitor all the keys centrally, whether they are managed centrally or by app owners, to ensure they are all configured correctly. With that introduction, let’s take a look at a decentralized model and whether it might work for your enterprise.
Decentralized Model
In the decentralized model, an application owner takes control of the key rings and keys they need for their application. The rationale is that application owners know the most about the applications they maintain and about how keys need to be consumed. They are therefore the source of truth for who “should” have access and for what purpose. This does not mean security loses visibility, however. With Google’s Cloud Asset Inventory, security teams can see a snapshot of all the resources in the organization and their configuration, even when management is decentralized and delegated to application owners. Likewise, because Cloud Logging can aggregate logs into a single pane of glass, you have full visibility into key usage.
Auditing in a Decentralized Model
From a security perspective, it is actually far simpler to ensure proper key usage in a decentralized model. Let’s say I know that App1 and App2 live in separate GCP projects and have corresponding service accounts. When a service account uses a key, my audit logs give me some tangible fields to go on:
- Service account name
- Project Name
- Key/Keyring Id and Location
- Any custom labels I’ve created
In a centralized model, I would necessarily have to grant access outside my Key Management project to Encrypt/Decrypt with a key that I generated, as we’ve discussed. If a service account from one project is expected to use a key from another project, the only way we can tell if that operation is allowed is by parsing the name of the project, service account and/or key.
Conversely, in a decentralized model, it is incredibly easy to write a rule in your SIEM of choice that flags any attempt by a service account belonging to App1 to use a KMS key meant for App2, because the two apps live in separate projects. You could control this even more tightly by using VPC Service Controls. By disallowing the sharing of keys and other unnecessary resources between projects, you build much stronger isolation between them. This allows a very simple log query to verify that all KMS Decrypt operations (for example) happen only from within the same project:
protoPayload.methodName="Decrypt" AND NOT protoPayload.authenticationInfo.principalEmail:"$PROJECT_ID"
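The query uses the `:` (has) operator, which is a substring match, so it can be fooled in both directions. A principal from project `app1-dev` contains the string `app1` and would therefore not be flagged against a key in project `app1`. Conversely, Google-managed service agents (such as the Cloud Storage service agent that performs CMEK operations for a bucket) may use emails based on the project number rather than the project ID, so they would be flagged. The Python sketch below applies the same rule with an exact comparison. It assumes exported entries in the Cloud Audit Logs JSON shape, with the key’s project in `resource.labels.project_id`, and it only recognizes user-managed service accounts (`NAME@PROJECT_ID.iam.gserviceaccount.com`). Like the query, it still flags service agents, so a real rule would need an allowlist for them.

```python
from typing import Dict, Iterable, List, Optional

SA_SUFFIX = ".iam.gserviceaccount.com"


def sa_project(email: str) -> Optional[str]:
    """Return the project ID of a user-managed service account, else None."""
    if not email.endswith(SA_SUFFIX) or "@" not in email:
        return None
    return email[: -len(SA_SUFFIX)].split("@", 1)[1]


def cross_project_decrypts(entries: Iterable[Dict]) -> List[Dict[str, str]]:
    """Return Decrypt calls whose caller is not a service account from the key's own project."""
    findings = []
    for entry in entries:
        payload = entry.get("protoPayload", {})
        if payload.get("methodName") != "Decrypt":
            continue
        email = payload.get("authenticationInfo", {}).get("principalEmail", "")
        key_project = entry.get("resource", {}).get("labels", {}).get("project_id")
        # Exact match, unlike the substring match of the ':' operator.
        if sa_project(email) != key_project:
            findings.append({"principal": email, "key_project": key_project or ""})
    return findings
```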
This also prevents the kind of misconfiguration where a key is used to encrypt data of different levels or a different environment. For example, if the process to obtain a key takes a long time, maybe an app owner takes some liberties and uses a key they meant for dev data for prod data. This kind of thing happens all the time in large enterprises. However, in a decentralized model, app owners do not need to wait on a long process to create a key, they simply use a pre-configured template to create one for their purpose. If security enforces that keys should only be used within their own projects, they can also enforce that keys in a dev project will not be used in a prod project.
Finally, IAM permissions are much easier to manage if all the CMEKs in a project are used only within that project. This may mean you can grant KMS encryption access at the project level, whereas in the centralized model you would need to scope that access to the key level. This matters because project-level permissions are much easier to see at a glance in the console. To get the same kind of view of key-level permissions as a list, you’d need to write a script that uses the gcloud utility to programmatically retrieve the IAM policy document for each key and key ring. This can be fairly complex, and the output can be difficult to parse.
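A script along those lines might look like the following Python sketch. It assumes the Cloud SDK is installed and authenticated. It shells out to `gcloud kms keyrings list`, `gcloud kms keys list`, and the corresponding `get-iam-policy` commands with `--format=json`. The helper names are hypothetical. The command runner is injectable so the flattening logic can be exercised without calling Google Cloud.

```python
import json
import subprocess
from typing import Callable, Dict, List, Sequence

Runner = Callable[[Sequence[str]], str]


def run_gcloud(args: Sequence[str]) -> str:
    """Run a gcloud command with JSON output and return its stdout."""
    return subprocess.run(
        ["gcloud", *args, "--format=json"],
        check=True, capture_output=True, text=True,
    ).stdout


def key_level_bindings(project: str, location: str,
                       run: Runner = run_gcloud) -> List[Dict[str, str]]:
    """Flatten every key ring and key IAM policy in one location into rows."""
    rows: List[Dict[str, str]] = []
    scope = [f"--project={project}", f"--location={location}"]

    def add(resource: str, policy: Dict) -> None:
        for binding in policy.get("bindings", []):
            for member in binding.get("members", []):
                rows.append({"resource": resource, "role": binding["role"], "member": member})

    for ring in json.loads(run(["kms", "keyrings", "list", *scope])):
        ring_id = ring["name"].rsplit("/", 1)[1]
        add(ring["name"], json.loads(run(["kms", "keyrings", "get-iam-policy", ring_id, *scope])))
        for key in json.loads(run(["kms", "keys", "list", f"--keyring={ring_id}", *scope])):
            key_id = key["name"].rsplit("/", 1)[1]
            policy = run(["kms", "keys", "get-iam-policy", key_id, f"--keyring={ring_id}", *scope])
            add(key["name"], json.loads(policy))
    return rows
```

The rows can then be sorted or loaded into a spreadsheet to approximate the at-a-glance view the console gives for project-level permissions.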
Templates for best practices
Using Infrastructure as Code (IaC), i.e. Terraform or something similar, you can create a “module” for a KMS key that includes all the necessary configuration for compliance. Here’s an example of some Terraform code that generates a key with specific parameters.
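As a language-neutral illustration of what such a module pins down, here is a hypothetical Python stand-in (not Terraform). It builds a request body in the Cloud KMS REST `CryptoKey` shape. The security team fixes the purpose, algorithm, protection level, and 30-day rotation, and the app owner supplies only labels. These specific values are assumptions carried over from the example requirements discussed earlier.

```python
from datetime import datetime, timedelta, timezone
from typing import Dict, Optional

# Assumed from the example requirement "rotated every 30 days".
ROTATION = timedelta(days=30)


def compliant_key_body(labels: Dict[str, str],
                       now: Optional[datetime] = None) -> Dict:
    """Build a CryptoKey request body with the security team's settings fixed.

    App owners choose only the labels. Everything else comes from the
    template. If `now` is given, it must be timezone-aware.
    """
    now = now or datetime.now(timezone.utc)
    next_rotation = (now + ROTATION).astimezone(timezone.utc)
    return {
        "purpose": "ENCRYPT_DECRYPT",
        "versionTemplate": {
            "algorithm": "GOOGLE_SYMMETRIC_ENCRYPTION",
            "protectionLevel": "HSM",
        },
        # Cloud KMS requires nextRotationTime whenever rotationPeriod is set.
        "rotationPeriod": f"{int(ROTATION.total_seconds())}s",
        "nextRotationTime": next_rotation.strftime("%Y-%m-%dT%H:%M:%SZ"),
        "labels": dict(labels),
    }
```

A body like this could be passed to the REST `cryptoKeys.create` method, which takes the key ID as the `cryptoKeyId` query parameter.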
As you can probably imagine, this can be modularized and included as a dependency of application infrastructure code. If an application owner bypasses this module and creates a key that isn’t compliant, you can also trigger an alert, or even an auto-remediation, based on the Cloud Asset Inventory mentioned above.
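The alerting side could start with a scan of Cloud Asset Inventory results. This Python sketch assumes assets in the JSON shape returned by the Cloud Asset Inventory API (`name`, `assetType`, `resource.data`). It also assumes a hypothetical `managed-by: kms-module` label that the module stamps on every key. Anyone with permission to update a key could set that label, so treat it as a first-pass signal to combine with a configuration check, not as proof of compliance.

```python
from typing import Dict, Iterable, List

CRYPTO_KEY_TYPE = "cloudkms.googleapis.com/CryptoKey"
# Hypothetical label that the key module stamps on every key it creates.
MODULE_LABEL_KEY, MODULE_LABEL_VALUE = "managed-by", "kms-module"


def keys_outside_module(assets: Iterable[Dict]) -> List[str]:
    """Return the asset names of CryptoKeys that do not carry the module's label."""
    flagged = []
    for asset in assets:
        if asset.get("assetType") != CRYPTO_KEY_TYPE:
            continue  # skip key rings, key versions, and everything else
        labels = asset.get("resource", {}).get("data", {}).get("labels", {})
        if labels.get(MODULE_LABEL_KEY) != MODULE_LABEL_VALUE:
            flagged.append(asset["name"])
    return flagged
```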
Conclusion
In this article we talked about a spectrum of key management solutions ranging from fully centralized to fully decentralized and a few options in between. I have seen customers be successful and run into issues with all of these. However, from a security perspective, I tend to see the most issues stemming from using a centralized model due to lack of auditability and contextual awareness on the part of the managing team. It is definitely possible to have a secure cloud infrastructure using a centralized model, you just need to be incredibly careful to ensure you have all the context you need when the keys are created and automate as much as you can through ticketing systems.
Using Google Cloud KMS, a decentralized model allows you to:
- Delegate trust to app owners and give them the tools they need to create and manage their own keys securely
- Create patterns as code for proper KMS key generation
- Monitor all the keys using Cloud Asset Inventory to ensure they are all configured correctly
- Write better heuristics for security alerting
All the while, you’ll save the business time and money by removing a roadblock and replacing it with a guardrail. Remember: just because something works on-prem doesn’t mean it’s the best way to do it in the cloud.