On Establishing a Cloud Security Program

May 18, 2021. Tags: 2021, cloud, kubernetes, strategy

Congratulations! You have been tasked with establishing a cloud security strategy. Now what?

In this post, I’ll walk through actionable advice for establishing a cloud security program aimed at protecting a cloud native, service provider agnostic, container-based offering.

The Goal: a Roadmap for Cloud Security Teams

Security strategies focusing on cloud native solutions are becoming prominent within the industry, but it feels like everyone, due to a lack of shared knowledge, is reinventing the wheel every time.

In fact, there are not many public resources describing how to approach this topic. Although different resources cover specific aspects of specific use cases (e.g., how to do container scanning, or how to deploy Open Policy Agent), there is no single holistic view of how to integrate everything together.

In this post, I will start with the foundations, and go through the different milestones (or maturity levels) required to reach a “best in class” solution to support and secure a product that spans multiple service providers (hence the requirement of not being tied to platform-specific solutions), runs on Kubernetes, and must comply with strict regulations (like the ones that apply to fintech companies).

The North Star

Before jumping into the details, I think it is important to define a “North Star” that can be used as a reference point (and driver) for the definition of your strategy.

These are the high-level goals that will then be reflected within the roadmap and mapped to actual controls that can be implemented. For cloud native solutions, I grouped these main pillars by the five functions of the NIST Cybersecurity Framework: Identify, Protect, Detect, Respond, and Recover.

Identify

Area: Architecture definition
Goals:

Define and document architecture decisions, such as network architecture diagrams that clearly identify high-risk environments and data flows, and threat model documentation to support the architecture definition.
Define and document a data classification scheme that classifies data according to its sensitivity and is used to ensure the implemented security controls are consistent, sufficient, and proportional.

Area: Immutable infrastructure
Goals:

Embed Infrastructure as Code (IaC) principles throughout the development, release, and deployment processes, to ensure consistency and auditability of the resulting infrastructure.
Follow Secure Software Development Life Cycle (SSDLC) practices for IaC, and perform code reviews to validate that no change to the infrastructure reduces the security controls in place.
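To make the data classification goal more concrete, here is a minimal sketch of how a classification scheme could drive the minimum set of controls expected for a data store. The sensitivity levels, the control names, and the `missing_controls` helper are all hypothetical and only illustrate the idea of proportional controls; your own scheme will likely look different.

```python
from enum import IntEnum


class Sensitivity(IntEnum):
    """Hypothetical classification levels, ordered from least to most sensitive."""
    PUBLIC = 0
    INTERNAL = 1
    CONFIDENTIAL = 2
    RESTRICTED = 3


# Hypothetical mapping: the minimum controls expected at each level.
REQUIRED_CONTROLS = {
    Sensitivity.PUBLIC: set(),
    Sensitivity.INTERNAL: {"access_logging"},
    Sensitivity.CONFIDENTIAL: {"access_logging", "encryption_at_rest"},
    Sensitivity.RESTRICTED: {
        "access_logging",
        "encryption_at_rest",
        "customer_managed_keys",
    },
}


def missing_controls(level, implemented):
    """Return the controls required for `level` that are not implemented yet."""
    return REQUIRED_CONTROLS[level] - set(implemented)


if __name__ == "__main__":
    # A data store holding RESTRICTED data with only access logging enabled.
    print(sorted(missing_controls(Sensitivity.RESTRICTED, {"access_logging"})))
```

Keeping the mapping in code (or in a versioned config file) means the classification scheme itself can be reviewed like any other change.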

Protect

Area: Known good state
Goals:

Configure each core component of the infrastructure according to a known and approved secure baseline, based on industry standards such as those from the Center for Internet Security (CIS), Cloud Security Alliance (CSA), and National Institute of Standards and Technology (NIST).
Programmatically enforce the known good state, by ensuring there are no deviations from the baseline.

Area: Zero Trust model
Goals:

Treat all hosting environments as hostile, encrypting data at rest and in flight, and retaining control of the associated cryptographic material.
Enforce strong account authentication.

Area: Micro blast radius
Goals:

Contain and respond to potential breaches, segregate networks, and provision accounts following least privilege principles.

Area: Strong authentication
Goals:

Implement authentication schemes to ensure that principals are strongly authenticated and that the strength of each authentication mechanism increases proportionally with the criticality of the asset it protects.
Configure Identity and Access Management (IAM) to enforce strict account segregation and to require Multi-Factor Authentication (MFA) for sensitive operations and privileged accounts.
Utilize Role-Based Access Control (RBAC) to manage access to resources and workloads.
Continuously validate the known good state through regular scanning of account privileges, to ensure no privilege creep or permission drift arises.

Area: Continuous secure baseline validation
Goals:

Continuously validate the approved secure baseline with an automated process, integrated within the CI/CD pipeline, that provides an inventory of assets as well as validation of cloud deployments and cluster configurations.
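The “known good state” and continuous validation goals above essentially come down to comparing what is deployed against an approved baseline and reporting every deviation. A minimal sketch of that comparison follows, assuming both the baseline and the observed configuration are available as nested dictionaries; the `find_drift` helper and the setting names are hypothetical.

```python
def find_drift(baseline, actual, path=""):
    """Return (setting, expected, found) for every baseline setting that differs.

    Settings present in `actual` but absent from `baseline` are ignored:
    the baseline only states what must hold.
    """
    deviations = []
    for key, expected in baseline.items():
        location = f"{path}.{key}" if path else key
        found = actual.get(key)
        if isinstance(expected, dict):
            # Recurse into nested sections; a missing section counts as empty.
            nested = found if isinstance(found, dict) else {}
            deviations.extend(find_drift(expected, nested, location))
        elif found != expected:
            deviations.append((location, expected, found))
    return deviations


if __name__ == "__main__":
    # Hypothetical baseline and observed state for a single account.
    baseline = {
        "bucket": {"public_access_block": True, "encryption": "aws:kms"},
        "audit_logging": True,
    }
    actual = {"bucket": {"public_access_block": False, "encryption": "aws:kms"}}
    for setting, expected, found in find_drift(baseline, actual):
        print(f"{setting}: expected {expected!r}, found {found!r}")
```

Run on a schedule or as a pipeline step, a check like this could fail the build or raise an alert whenever the returned list is not empty.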

Detect

Area: Assumed breach
Goals:

Assume breach: at any given time, your product, infrastructure, or any account (even an administrative one) could be compromised.
Deploy controls to anticipate common Tactics, Techniques, and Procedures (TTPs) of attackers and identify potential Indicators of Compromise (IOCs).
Monitor the entire tech stack and thoroughly log events.
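As an illustration of turning logged events into detections, the sketch below matches events against a small set of indicators. The `Event` shape, the `triage` helper, and the indicator lists are assumptions made for the example; `StopLogging` and `DeleteTrail` are AWS CloudTrail API actions, used here as an example of log tampering, and the IP address comes from a range reserved for documentation.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Event:
    """Hypothetical, simplified shape of a normalized audit log event."""
    source_ip: str
    principal: str
    action: str


# Illustrative indicators only; real lists would come from threat intelligence.
KNOWN_BAD_IPS = {"203.0.113.7"}
LOG_TAMPERING_ACTIONS = {"StopLogging", "DeleteTrail"}


def triage(event):
    """Return the list of indicator names that the event matches."""
    findings = []
    if event.source_ip in KNOWN_BAD_IPS:
        findings.append("known-bad-ip")
    if event.action in LOG_TAMPERING_ACTIONS:
        findings.append("log-tampering")
    return findings


if __name__ == "__main__":
    event = Event(source_ip="203.0.113.7", principal="admin", action="StopLogging")
    print(triage(event))
```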

Respond

Area: Containment
Goals:

Leverage security monitoring to provide actionable events to trigger (semi-)automated containment.
After containment is triggered, embed mechanisms for the forensic collection of evidence and recovery from the breach.

Area: Business continuity
Goals:

Test business continuity and security incident response plans at planned intervals, or upon significant organizational or environmental changes.
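The “(semi-)automated” part of the containment goal above can be read as: some actions run immediately, while others wait for a human decision. Below is a minimal sketch of that split, assuming a hypothetical playbook keyed by alert type; the alert types, action names, and the `plan_containment` helper are invented for illustration.

```python
# Hypothetical playbook: each alert type maps to ordered (action, is_automatic) pairs.
# Evidence collection comes first so that containment does not destroy it.
PLAYBOOK = {
    "compromised_credentials": [
        ("revoke_sessions", True),
        ("rotate_keys", False),
    ],
    "compromised_workload": [
        ("snapshot_for_forensics", True),
        ("isolate_network", True),
        ("terminate_workload", False),
    ],
}


def plan_containment(alert_type):
    """Split the playbook for `alert_type` into automatic and approval-gated actions."""
    automatic, needs_approval = [], []
    for action, is_automatic in PLAYBOOK.get(alert_type, []):
        (automatic if is_automatic else needs_approval).append(action)
    return automatic, needs_approval


if __name__ == "__main__":
    run_now, ask_first = plan_containment("compromised_workload")
    print("run now:", run_now)
    print("needs approval:", ask_first)
```

Unknown alert types deliberately produce an empty plan, leaving the decision to a human rather than guessing.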

Recover

Area: Strong auditability and accountability
Goals:

Consistently audit and assure immutable logs and traceability of the entire security solution.

Building the Roadmap

As mentioned, these high-level goals provide macro-areas that can be worked against, but they are very general (and open to interpretation). Taking a step further, how can they be applied to a cloud native platform, where multiple cloud service providers and Kubernetes clusters are involved?

Ideally, we would like to use a framework which:

Allows an agile approach (with multiple iterations, which enable continuous improvement).
Is transparent to other engineering teams (i.e., security teams should be low friction and not be blockers).
Will ultimately lead to a solution that is compliant with industry regulations (e.g., ISO27001, PCI DSS, etc.) by “default”.

Hence, I took the Cloud Security Alliance (CSA) Cloud Controls Matrix (CCM) and performed a gap analysis, using a RACI matrix to map controls to security teams and selecting the areas directly applicable to a cloud security team (i.e., excluding controls like the physical security of a data center, which are usually not directly applicable to such teams). Then, I enhanced this list by adding cloud-specific controls that I consider essential for a comprehensive program (usually also backed by CNCF), and re-organized them into areas of interest.

In the sections below I will explain in detail the main areas (Domains), workstreams (Controls), and actionable Tasks which compose the Roadmap: from the definition of high-level security policies, network architecture, IAM, and asset inventory; to monitoring, code provenance, and policy as code; up to automatic enforcement of security policies, runtime anomaly detection, and business continuity.

Domains

Domains can be considered as “macro-areas” which can be used to group sets of Controls:

[1] Policies & Standards: Definition of Security Policies and Standards which provide reference documentation on best practices for cloud security, with a particular focus on cloud providers and containerization solutions.

[2] Architecture: Definition and review of architectural decisions, with particular focus on network architecture, identity and access management, secrets management, and data classification.

[3] Verification: Continuously verify and enforce that all cloud resources abide by the policies and expected baseline configuration.

[4] Supply Chain Security: Enforce security controls throughout the pipeline:

Image/Pod Security: enforcement of hardened base images and linting.
Continuous Integration (CI): IaC scanning (Dockerfiles, Kubernetes manifests, Terraform, etc.).
Continuous Delivery (CD): protect Supply Chain Integrity.
In-Cluster Controls: preventative controls like admission controllers.
Cloud Provider-Specific Controls: deploy guardrails (SCPs/Org Policies), restrict access.

[5] Monitoring and Alerting: Implement logging, monitoring, and alerting systems to gain visibility into activities and/or changes affecting the environments.

[6] Incidents and Remediation: Implement processes for containment, forensics, and automatic remediation of security violations.

[7] Business Continuity: Prepare countermeasures for unexpected incidents or disasters.

Controls

These domains can then be fleshed out into a variety of workstreams (or Controls).

Before exploring them in detail, it is worth noting that, generally speaking, a cloud security program can be implemented through a series of maturity levels. The sub-sections below provide an overview of the main initiatives that, for each Domain, could be undertaken at each level of maturity.

Maturity Level 1 - The foundations

Definition of Security Policies: start by defining some overarching policies that will define your overall approach and that the business will have to abide by (e.g., Cloud Security Policy, Vulnerability/Patch Management Standard).
Architecture: review the network architecture and ensure proper segregation of environments (especially production), review the Identity and Access Management Framework, as well as how secrets management is performed.
Verification: start with the so-called “low-hanging fruit” by validating that no obvious misconfigurations (both at the CSP and K8s level) are present, and by starting to compile a list of public endpoints.
Supply Chain: deploy container image scanning, and start restricting access to privileged AWS/GCP users.
Monitoring: start defining a security logging strategy (I provided examples for both AWS and GCP).
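To give a flavor of the “obvious misconfigurations” check at the Kubernetes level mentioned under Verification, here is a minimal sketch that inspects a pod spec (already parsed into a dictionary) for a few well-known risky settings. `hostNetwork`, `securityContext.privileged`, and `allowPrivilegeEscalation` are standard Kubernetes fields; the `pod_findings` helper and the choice of checks are my own assumptions, and a dedicated scanner would cover far more.

```python
def pod_findings(pod_spec):
    """Return human-readable findings for a few risky settings in a pod spec dict."""
    findings = []
    if pod_spec.get("hostNetwork"):
        findings.append("pod uses hostNetwork")
    for container in pod_spec.get("containers", []):
        name = container.get("name", "<unnamed>")
        context = container.get("securityContext") or {}
        if context.get("privileged"):
            findings.append(f"{name}: privileged container")
        # Treat an unset value as risky: escalation must be explicitly disabled.
        if context.get("allowPrivilegeEscalation", True):
            findings.append(f"{name}: privilege escalation not disabled")
    return findings


if __name__ == "__main__":
    spec = {
        "hostNetwork": True,
        "containers": [
            {
                "name": "app",
                "securityContext": {
                    "privileged": True,
                    "allowPrivilegeEscalation": False,
                },
            },
            {"name": "sidecar"},
        ],
    }
    for finding in pod_findings(spec):
        print(finding)
```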

Maturity Level 2

Definition of Security Standards: continue developing standards covering more “advanced” topics like Key Management/Generation and Data Handling/Labeling.
Architecture: depending on the current state of IAM and Secrets management (found in Level 1), you might want to tackle processes like credentials management and user access provisioning.
Verification: start deploying a solution that can continuously provide an up-to-date asset inventory (for example, see “Mapping Moving Clouds: How to stay on top of your ephemeral environments with Cartography”). Improve the validation of the environments by deploying automation that can continuously report misconfigurations and drift.
Supply Chain: start working on securing the images used (define a list of base images and harden them). Enforce the use of these secure images in the CI/CD pipeline, and add automation able to scan Infrastructure as Code for security issues. Work with your Application Security team to ensure a system to prevent the leaking of secrets through the codebase is integrated into the pipeline.
Monitoring: deploy the security logging solution designed at Level 1, and ensure logs are collected from all environments. Start defining monitoring and alerting rules to act on indicators of compromise and/or known classes of issues.
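The secrets-leak prevention mentioned under Supply Chain is usually handled by dedicated tooling, but the core idea can be sketched in a few lines: scan text for patterns that look like credentials. The two patterns below (AWS access key IDs, which start with `AKIA` or `ASIA` followed by 16 uppercase alphanumeric characters, and PEM private key headers) are only a small, illustrative subset, and `scan_text` is a hypothetical helper.

```python
import re

# Illustrative patterns only; real scanners ship with many more rules.
AWS_ACCESS_KEY_ID = re.compile(r"\b(?:AKIA|ASIA)[0-9A-Z]{16}\b")
PRIVATE_KEY_HEADER = re.compile(r"-----BEGIN (?:RSA |EC |OPENSSH )?PRIVATE KEY-----")


def scan_text(text):
    """Return (line_number, rule_name) for every line that looks like a secret."""
    hits = []
    for number, line in enumerate(text.splitlines(), start=1):
        if AWS_ACCESS_KEY_ID.search(line):
            hits.append((number, "aws-access-key-id"))
        if PRIVATE_KEY_HEADER.search(line):
            hits.append((number, "private-key"))
    return hits


if __name__ == "__main__":
    # AKIAIOSFODNN7EXAMPLE is the placeholder key ID used in AWS documentation.
    sample = 'region = "eu-west-1"\naws_key = "AKIAIOSFODNN7EXAMPLE"\n'
    print(scan_text(sample))
```

Wired into a pre-commit hook or a CI step, a non-empty result would block the change before the secret reaches the repository history.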

Maturity Level 3

Definition of Security Standards: keep extending standards to cover Identity and Access Management, Encryption, Key Management/Generation, Data Handling/Labeling, Change Management.
Verification: provide continuous identification of deviations from defined Security Policies and compliance frameworks (e.g., via AWS Security Hub and GCP Security Command Center), with a process integrated within the security pipeline (i.e., your SIEM). Start deploying guardrails (e.g., SCPs and Org Policies) to prevent entire classes of misconfigurations.
Supply Chain: ensure the configuration of the Kubernetes clusters and running containers is automatically validated, to detect any misconfiguration. Address hardening of the AWS/GCP organizations.
Monitoring: start aggregating and reporting on both logged data and anomalies, and create visualizations/dashboards to facilitate their consumption. Deploy processes and tools to detect cases of credential compromise.
Remediation: employ processes to automate the remediation of (at least) the most common types of misconfigurations.
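As an example of the guardrails mentioned under Verification, the sketch below generates an AWS Service Control Policy (SCP) document that denies requests outside a set of approved regions, using the `aws:RequestedRegion` condition key. The `region_guardrail` helper, the region list, and the list of exempted global services are assumptions for illustration; the exemption list in particular should be checked against the AWS documentation before anything like this is deployed.

```python
import json


def region_guardrail(allowed_regions, exempt_actions):
    """Build an SCP document denying actions outside `allowed_regions`.

    `exempt_actions` lists global services that are not tied to a region
    and would otherwise be blocked by the region condition.
    """
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Sid": "DenyOutsideApprovedRegions",
                "Effect": "Deny",
                "NotAction": sorted(exempt_actions),
                "Resource": "*",
                "Condition": {
                    "StringNotEquals": {
                        "aws:RequestedRegion": sorted(allowed_regions)
                    }
                },
            }
        ],
    }


if __name__ == "__main__":
    policy = region_guardrail(
        allowed_regions={"eu-west-1", "eu-central-1"},  # hypothetical choice
        exempt_actions={"iam:*", "sts:*", "organizations:*", "support:*"},
    )
    print(json.dumps(policy, indent=2))
```

Generating the policy from code keeps the approved region list in one reviewed place, which fits the IaC principles discussed earlier.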

Maturity Level 4

Business Continuity: start tackling Business Continuity issues (Audit Planning, Business Continuity Planning, Incident Management).
Monitoring: any change made to production should be logged and, where appropriate, alerted on. In addition, file integrity (host) and network intrusion detection (IDS) tools should be deployed to facilitate timely detection, investigation through root cause analysis, and response to incidents. In particular, processes and tools shall be put in place to implement a runtime anomaly detection solution, aligned with MITRE ATT&CK for Cloud.
Remediation: start creating playbooks to define detailed processes to follow in case of an incident. Timely de-provisioning of user access to data and systems should be implemented.
Business Continuity: a Disaster Recovery Plan should be outlined, to cover the outage/failure of one or more core components of the infrastructure (e.g., failure of an AZ or Region).

Maturity Level 5

Supply Chain: utilize a framework (like TUF, in-toto, providence) to protect the integrity of the Supply Chain.
Monitoring: a solution should be put in place to detect exfiltration of data, by monitoring egress traffic.
Remediation: automated processes should be put in place to automate the containment of (at least) the most common compromise types, and to automate the forensic collection of evidence after the declaration of a security incident.
Business Continuity: tabletop exercises and live tests should be conducted to test the effectiveness of the controls put in place to mitigate a potential failure of one or more core components of the infrastructure.
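For the egress monitoring item under Monitoring, one simple heuristic is to sum the bytes sent to destinations outside an allowlist and flag the pairs that exceed a threshold. The sketch below assumes flow records are available as `(source, destination_ip, bytes_sent)` tuples; the record shape, the allowlist, the threshold, and the `suspicious_egress` helper are all hypothetical.

```python
import ipaddress
from collections import defaultdict


def suspicious_egress(flows, allowed_networks, byte_threshold):
    """Return {(source, destination): total_bytes} for unapproved heavy egress.

    flows: iterable of (source, destination_ip, bytes_sent) tuples.
    allowed_networks: iterable of CIDR strings that egress may target freely.
    """
    networks = [ipaddress.ip_network(cidr) for cidr in allowed_networks]
    totals = defaultdict(int)
    for source, destination, sent in flows:
        address = ipaddress.ip_address(destination)
        if not any(address in network for network in networks):
            totals[(source, destination)] += sent
    return {pair: total for pair, total in totals.items() if total >= byte_threshold}


if __name__ == "__main__":
    flows = [
        ("pod-a", "10.1.2.3", 10**9),   # internal traffic, allowed
        ("pod-b", "203.0.113.9", 600),  # documentation-range address
        ("pod-b", "203.0.113.9", 500),
        ("pod-c", "203.0.113.10", 100),
    ]
    print(suspicious_egress(flows, ["10.0.0.0/8"], byte_threshold=1000))
```

A fixed threshold is a crude starting point; a per-workload baseline would likely produce fewer false positives.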

Tasks

At first glance, the list of initiatives outlined above might seem quite dense (and not super-actionable). That’s why I expanded them into a set of Tasks (94 at the time of writing), which can be individually worked upon.

Having almost a hundred controls in a blog post wouldn’t be practical, though, so I created a micro-website to host them in a spreadsheet-style format.

A Cloud Security Roadmap Template

Each row represents a Task, and has the following attributes:

Domain: The Domain the Task belongs to
Control: The Control the Task belongs to
Task: The Task name
Description: A description of what the Task involves
Status: To keep track of progress (NOT STARTED, IN PROGRESS, BLOCKED, DONE)
Priority: The Maturity Level the Task belongs to (1-5)
Maturity: How mature is the deployment/rollout of the Task, once you started working on it
Layer: Whether it affects a Cloud Provider, Kubernetes cluster, or both
Epic: Link to Jira/Issue Tracker, to keep track of progress
Deliverable: Type of deliverable for the Task (Documentation, Tooling, etc.)
Artifact: Link to the final deliverable for the Task
Useful Resources: Some useful resources that can help during the implementation phase
Metrics: Metrics that can be used to track the success of the Task
CSA CCM: Reference to the related entry in the CSA CCM, if any

From there, you’ll have the ability to export it as CSV and tailor it to your needs.

I’d like to stress that you don’t have to follow the tasks in order, but you should use the Priority column to define your own priorities, which can change based on your business priorities and industry.
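Once exported, the CSV can be sliced with a few lines of code. The sketch below picks the tasks that are not yet done, up to a given Priority, and sorts them by Priority. It assumes the CSV column headers match the attribute names listed above (`Task`, `Status`, `Priority`), which I have not verified against an actual export; the sample rows and the `next_tasks` helper are made up for the example.

```python
import csv
import io


def next_tasks(csv_text, max_priority):
    """Return open tasks with Priority <= max_priority, lowest Priority first."""
    rows = csv.DictReader(io.StringIO(csv_text))
    open_rows = [
        row
        for row in rows
        if row["Status"] != "DONE" and int(row["Priority"]) <= max_priority
    ]
    # sorted() is stable, so tasks with equal Priority keep their CSV order.
    return sorted(open_rows, key=lambda row: int(row["Priority"]))


if __name__ == "__main__":
    # Hypothetical rows; column names are assumed to match the attribute list.
    sample = (
        "Domain,Task,Status,Priority\n"
        "Verification,Asset inventory,NOT STARTED,2\n"
        "Architecture,Review network segregation,IN PROGRESS,1\n"
        "Policies & Standards,Cloud Security Policy,DONE,1\n"
        "Monitoring,Egress monitoring,NOT STARTED,5\n"
    )
    for row in next_tasks(sample, max_priority=2):
        print(row["Priority"], row["Task"])
```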

Putting It All Together: The Roadmap

The detailed list of Controls can be found at: roadmap.cloudsecdocs.com

Conclusions

In this post I outlined some actionable advice for establishing a cloud security program aimed at protecting a cloud native, service provider agnostic, container-based offering.

It represents my perspective and reflects my experience, so it definitely won’t be “one size fits all”, but I hope it can serve as a useful baseline.

I hope you found this post useful and interesting, and I’m keen to get feedback on it! If you found the information useful, if something is missing, or if you have ideas on how to improve it, please let me know on Twitter.
