This post represents an interesting departure for me. Typically, I cover technical topics related to applications, Kubernetes, and portions of the “cloud native” world. In this piece, however, I’d like to address an issue much more steeped in people and process concerns: cloud operations.
My colleagues and I spend quite a bit of time interacting with administrators and operators representing a wide variety of organizations. Regardless of which technologies or projects these conversations begin with, the discussion inevitably turns to the differences in operating across public cloud and software-as-a-service environments. From these conversations, our team has observed a few common patterns in organizations successfully running production environments in the public cloud. In sharing what we’ve learned here, I hope we can contribute to the community conversation on this topic.
Admittedly, “public cloud operations” is quite a broad topic. I could write hundreds of pages and barely scratch the surface of the complex web of personnel, process, and tooling factors in play. In this piece, I’ll focus specifically on how organizations grant, constrain, and assure access to domains and services in public cloud environments. This is a different process in the public cloud than it is in a private data center environment, as the level of control the consuming organization has over its deployed assets is different. In the public cloud, the consumption and configuration motions are programmable. They don’t typically require the same amount of overhead to manage as they did on-premises. The policy aspect, however, becomes increasingly important with the ease of consumption in public cloud environments. To outline these concepts, I will rely on a metaphor composed of two entities: fences and gates.
In this context, fences represent hard boundaries. These are the organizational guardrails set around environments and services. As an example, cloud regions (such as AWS us-west-2) or sets of approved services (e.g. VPC/EC2/EBS/RDS) can serve as “fences”. These boundaries are intended to be as stringently and broadly applied as possible. Within these guardrails, the intent is to allow users to deploy and configure as they need to. I think it’s important to note that fences aren’t any particular cloud construct, but rather a policy boundary. The most pervasive method for enforcing these policy domains is via the cloud provider’s identity and access management (IAM) capabilities. One of the most effective fences customers have identified for us is a small (in number) and well-defined set of IAM roles with carefully designed permissions. When the number of roles increases, operational complexity appears to increase exponentially. A relatively small number of roles tends to allow for lower-friction onboarding of new services and environments, as fewer policies need to be modified.
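To make the idea of a fence as a policy boundary a little more concrete, here is a minimal sketch in Python. It is not a real IAM policy or any provider's API; the `Fence` class, its field names, and the "production" example are hypothetical, and the region and service names are simply the ones used above. The sketch only shows the shape of the check: a request is inside the fence when both its region and its service are on the approved lists.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Fence:
    """Hypothetical policy boundary: approved regions and approved services."""
    name: str
    allowed_regions: frozenset
    allowed_services: frozenset

    def permits(self, region: str, service: str) -> bool:
        # A request is inside the fence only if BOTH dimensions are approved.
        return region in self.allowed_regions and service in self.allowed_services


# Illustrative fence built from the examples in the text.
production = Fence(
    name="production",
    allowed_regions=frozenset({"us-west-2"}),
    allowed_services=frozenset({"vpc", "ec2", "ebs", "rds"}),
)

print(production.permits("us-west-2", "ec2"))  # inside the fence
print(production.permits("eu-west-1", "ec2"))  # region not approved
print(production.permits("us-west-2", "sqs"))  # service not approved
```

In practice this kind of boundary would be expressed in the cloud provider's own IAM policy language rather than in application code. The point of the sketch is that the fence is broad and simple, and anything inside it is left to the user.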
Being a visual person, I believe an image might be of assistance here. As I was beginning to write this piece, an aerial view of farmland came to mind. It’s been a rather common sight for me in my travels over the years, and I think it encapsulates the idea of fences well. The fields tend to be large, with well-defined borders. Also, the contents of the individual plots, at least from the air, appear to be uniform within their boundaries.
But wait, weren’t there also gates involved in this metaphor? Why yes, astute reader, there were. The gates referenced above are the interconnection points between the areas contained by fences. In essence, they are where policy domains meet. As a result, gates must have the ability to enforce policy as well. Software appliances such as load balancers, proxies, or security appliances represent one class of “gates”. These constructs exist in the data plane, as transit points for the traffic into, out of, or between environments. However, IAM may act as a gate as well. With much of the policy language of the public cloud enforced in the IAM layers of cloud providers, IAM becomes the construct that is modified in order to accommodate alterations in a user’s or group’s privileges, or to enable access to a new service. In this way it is a gate with respect to the processes and tooling implementing policies, with the ability to enforce policy and connect multiple policy domains (IAM roles).
Since we’ve established my affinity for visual examples, I’d like to introduce another. In the previous image, I can easily imagine crossings on the roads interconnecting the fields. These could be examples of the gate concept I’ve laid out here. However, I have another image I’d like to use, as I personally find it a bit more apt. In a Venn diagram, the large circles could also be used to represent the concept of fences. If we accept that, then I think of the overlapping portions of these circles as gates. These sections are where both sets of policy are simultaneously enforced, and altering the policies described by the circles fundamentally alters which objects (users in our metaphor) fall into which sections of the diagram.
As discussed above, there are advantages to keeping the number of “fences” utilized by an organization as constrained as possible. By doing so, the number of gates should also be effectively kept in check. We have observed a correlation between the breadth of a “fenced” environment, and the complexity of the “gates” that connect it to other domains. While it may be tempting to use powerful IAM layers to assign very granular permissions to individual users, this can very easily become operationally untenable.
To highlight this last point, I’d like to include the following image:
Imagine for a moment that a single user is represented by a dartboard. Let’s say that we have narrowed the ideal policy for that user to a single sector of the board, such as the bull’s eye. Now imagine adding a user, and thus a second dartboard, to the equation. That user has a separate and distinct ideal policy represented by a different sector, such as the ‘triple 20’ space on the board. Finally, add dartboards for every other user in your organization and wallpaper them across the aerial crop view we discussed in the “fences” section. Attempting to look at all of these hypothetical dartboards at once reveals little to no actionable information, while focusing in on a single board hides the rest of the organization from view. Either way, information is lost, and savvy policy design becomes extremely difficult. If the fences become broader and the domains bigger as a result, a landscape featuring an entire organization’s worth of users becomes easy to interpret as a complete set.
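For a rough sense of scale, here is a small back-of-the-envelope sketch in Python. It rests on a simplifying assumption of mine, not something we have measured: that any two policy domains might eventually need a gate between them, so the upper bound on gates is the number of distinct pairs of fences. The function name `max_pairwise_gates` is hypothetical, and the fence counts are illustrative only.

```python
from math import comb


def max_pairwise_gates(fence_count: int) -> int:
    """Upper bound on gates, ASSUMING every pair of fences may need one."""
    return comb(fence_count, 2)


# A handful of broad fences versus one narrow fence per user (illustrative counts).
for fence_count in (3, 10, 200):
    print(fence_count, "fences ->", max_pairwise_gates(fence_count), "possible gates")
```

Under that assumption, the bound grows with the square of the number of fences, which is one way to read why per-user policies become hard to reason about while a few broad domains stay legible.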
I hope these “fences and gates” provide a useful metaphor to frame part of the challenge organizations face in modern cloud operations. I’ll be interested to hear if these patterns match what the rest of the community is experiencing, and how those of you responsible for public cloud administration and operations are handling these forces. I look forward to hearing and discussing these thoughts online or in person at an upcoming event.