Nicolas • Published 2022-06-17 15:20:26
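Infrastructure as code (IaC) means using code, rather than manual processes, to define and manage infrastructure. More importantly, IaC is about bringing software engineering principles and approaches to cloud infrastructure. This article explores the fundamentals of IaC and how to set up an environment for it.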
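Before IaC, infrastructure was (and, in some cases, still is) provisioned by pointing and clicking in a user interface, by batch scripts, and by configuration management tools, none of which were designed for the modern cloud.
It is also true that much of what is called IaC today is closer to "infrastructure as text." Because it is written as structured text, it is repeatable and can be versioned, but it does not support the software engineering practices used with application code. For example, there is no support for standard development tools, testing frameworks, or package management.
A truly modern approach to IaC uses a platform designed for infrastructure in the cloud, so that standard software engineering practices and tools can be applied to that infrastructure.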
Why does IaC matter?
IaC matters for three main reasons:
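The first is the wholesale move to the cloud. More and more workloads are moving from on-premises data centers to the cloud, and nothing suggests this trend will stop. Cloud computing alone, however, is not a panacea for keeping infrastructure reliable and scalable. It is just as possible to have an inconsistent, poorly documented set of scripts for cloud infrastructure as for a physical data center. Because IaC enforces proven engineering practices, it is how you bring order to that chaos.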
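The second is greater sophistication in how people use the cloud. Companies are changing architectures, patterns, and ways of working to optimize the benefits they get. The question is no longer simply CapEx versus OpEx; it is how to incorporate all the practices that make up the engineering lifecycle, such as versioning and testing, to unlock all the value the cloud can provide. Applying engineering practices is how a company takes full advantage of the cloud's potential and innovates faster to drive the business.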
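The third is that the burden of managing infrastructure in the cloud keeps growing. The number of available cloud services grows every year, and more companies are adopting modern cloud architectures (such as containers or serverless), which often have many loosely coupled and interdependent components. As a result, the number of cloud resources that engineers must manage is rising at a tremendous pace. This is certainly a good thing, because it means companies are getting more value from the cloud to drive their business forward, but the consequence is an increase in complexity and scale.
For example, one way to get more value from the cloud is to take advantage of the ever-growing number of services that cloud vendors provide. Those services can speed up innovation and delivery, but every new service brings new APIs and therefore adds complexity to the infrastructure.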
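As scale and complexity increase, a modern approach to IaC becomes a necessity for building, deploying, and managing infrastructure. If you manage between 1 and 10 resources, pointing and clicking probably works fine. Between 10 and 100 resources, "infrastructure as text" or legacy IaC tools might still suffice.
But what happens when you have hundreds, thousands, or tens of thousands of resources? That is not at all uncommon today. On top of that, those resources change not once a month but several times a day. The best way to manage all of this is to put in place the same software engineering practices and tools that you use for application code.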
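Ask yourself:
(1) How do I ensure that my infrastructure can scale, change, and evolve quickly to support the business and create a competitive advantage?
(2) How do I maintain visibility into my cloud infrastructure and every change made to it?
(3) How do I put policies and data guardrails in place to ensure security and reliability?
(4) How do I best empower my team to build, deploy, and manage infrastructure through better collaboration and processes?
Answering these questions requires a modern approach to IaC, which is how the potential of the cloud is fully harnessed.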
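The IaC platform you choose is critical. If your goal is to use standard software engineering tools and practices that are already in place, look for the following qualities when you evaluate your choices:
Good support for standard languages means that developers can define and configure infrastructure in the same languages they use to write application code, for example common languages such as TypeScript, Go, Python, and C#. Many older IaC tools have their own domain-specific language (DSL), which can be problematic: developers often find that common programming constructs are missing.
The platform you choose should let engineers easily create strongly typed, structured configurations and use features they have always relied on, such as loops, constants, and functions. Another advantage of standard languages is that developers already know them and can begin coding right away. Learning the idiosyncrasies and limitations of a DSL can be time-consuming and frustrating.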
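Using standard programming languages means that developers can also use standard development tools such as IDEs. One advantage is, again, familiarity: developers work in an environment they already understand. The other is that those environments are designed to help them author, debug, test, and deploy code easily.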
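It is important that infrastructure is tested thoroughly, just as applications are. A suitable IaC platform should support standard testing frameworks and should also help teams expand the types of tests they perform.
Standard ops testing focuses on acceptance tests. That means the ops team spins up infrastructure in the cloud and then tests it to see whether it is correct. If it was not spun up correctly, the team has to destroy and redeploy it. That is not an optimal approach because, depending on how quickly the team reacts, something that should not have happened may already have happened. A suitable IaC platform should help teams "shift risk left" through frequent testing before and during deployment. If your teams are not already running them, these are the types of tests they should be able to perform with an IaC platform: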
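(1) Unit tests
Unit tests evaluate the behavior of infrastructure in isolation. External dependencies, such as databases, are replaced by mocks in order to check resource configuration and responses. Mocks can be used because responses from cloud providers are well known and tested: the tester already knows how the provider will respond to a given set of parameters.
Unit tests run in memory without any out-of-process calls, which makes them very fast. Use them for fast feedback loops during development. They help developers catch problems early in the infrastructure lifecycle.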
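(2) Integration tests
Integration testing (also known as black-box testing) comes after unit testing and takes a different approach. Integration tests deploy cloud resources and validate their actual behavior, but in an ephemeral environment. An ephemeral environment is a short-lived environment that mimics production. It is usually simpler and includes only the first-level dependencies of the code under test. Once the integration tests are finished, the ephemeral infrastructure can be destroyed.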
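(3) Security tests
Too often, security tests are left until the last minute, or code that is considered "finished" is thrown over the wall to a security team that has been left out of the entire development process. That approach is courting disaster.
A modern IaC platform should encrypt sensitive configuration data and make it easy to follow standard security practices such as key rotation. Check whether the platform encrypts state metadata and ensures that secret values are never exposed in plain text. The platform should also integrate easily with the security services offered by cloud providers.
In addition, as with other types of tests, the IaC platform should help developers include security tests they write themselves in their workflow. Just as you start testing code early with unit tests, you should start testing early to find security problems. Those tests belong in the CI/CD pipeline, so that infrastructure is thoroughly tested for vulnerabilities before it is released.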
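As an illustration of the encryption points above, the following is a minimal Terraform sketch, not taken from the original article. It assumes state is kept in an S3 backend; the state bucket name, the state key, and the `db_password` variable are hypothetical. Marking a variable as sensitive only redacts it from CLI output, and the value is still written to state, which is why encrypting the state itself matters.

```hcl
# Sketch: keep state encrypted at rest and mark a secret input as sensitive.
terraform {
  backend "s3" {
    bucket  = "example-terraform-state"    # hypothetical state bucket
    key     = "envs/dev/terraform.tfstate" # hypothetical state path
    region  = "us-east-1"
    encrypt = true # server-side encryption for the stored state object
  }
}

variable "db_password" {
  description = "Database password, supplied at run time rather than committed to the repository"
  type        = string
  sensitive   = true # redacted in plan and apply output
}
```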
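Reusable components mean that developers can build higher-level resources out of individual ones, creating useful abstractions that can be reused elsewhere. These components can be written with the company's best practices built in and shared both internally and with the community. Using reusable components helps create repeatable, reliable infrastructure, so check carefully whether the platform you are considering makes them easy to create.
If you create reusable components, you need a way to package them so they can be shared easily. Along with standard tools, you will want support for standard package managers. For example, developers might want to put a component into a GitHub repository and publish it through NPM. The IaC platform should make that a simple task.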
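In Terraform, one way to consume a packaged component is to reference a versioned module directly from a Git repository. The sketch below is an assumption-laden illustration: the repository URL, the `s3-bucket` subdirectory, the `v1.2.0` tag, and the module inputs are all hypothetical.

```hcl
# Sketch: consume a shared module from a Git repository at a pinned tag.
module "shared_bucket" {
  # "//s3-bucket" selects a subdirectory of the repository; "ref" pins a tag.
  source = "git::https://github.com/example-org/terraform-modules.git//s3-bucket?ref=v1.2.0"

  # Inputs depend on the module being consumed; these are placeholders.
  bucket = "shared-component-demo"
  tags   = { Owner = "platform-team" }
}
```

Pinning `ref` to a tag rather than a branch keeps every consumer on a known version until they choose to upgrade.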
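Central visibility across all infrastructure resources, together with a historical view of past changes, is important for both accountability and collaboration. The platform should provide visibility across the whole infrastructure by supporting audit logs and the ability to see diffs when cloud resources change (similar to how teams use collaborative tools such as Git). It should also allow fine-grained controls over who can access and change the infrastructure.
Not every company wants to use multiple cloud vendors, but it is something to consider. If you want to keep that option open, choose an IaC platform that will not lock you into a single provider.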
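Another often-overlooked facet of IaC is policy as code. A modern IaC platform should allow developers to apply software engineering principles and approaches to policies, just as it does to infrastructure. The benefits of policy as code are much the same as those of infrastructure as code. Policies continuously enforce the organization's cloud governance in terms of security, compliance, and cost control. They are unambiguous, can be written with standard languages and tools, and can be versioned, tested, and finally integrated into the CI/CD pipeline so that all infrastructure follows the company's best practices.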
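Dedicated policy engines evaluate rules across a whole deployment. As a much lighter approximation, a single rule can be encoded inside a Terraform module with a variable `validation` block. The sketch below assumes a hypothetical company rule that every resource must carry an `Owner` tag; it is an illustration of the idea, not a full policy-as-code setup.

```hcl
# Sketch: reject any plan whose tags do not include an Owner key.
variable "tags" {
  description = "Common tags to be applied to all resources"
  type        = map(string)

  validation {
    condition     = contains(keys(var.tags), "Owner")
    error_message = "Every resource must carry an Owner tag (hypothetical company policy)."
  }
}
```

Because the rule lives in code, it is versioned and reviewed with the module and fails during `terraform plan`, before anything is deployed.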
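Many open-source IaC tools can be used to automate resource provisioning, deployment, and management. The key is choosing the infrastructure automation tool that fits your use case. Common IaC categories and tools include the following: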
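Common types of IaC tools

| IaC tool category | Description | IaC tools |
|---|---|---|
| Configuration management tools | Manage software on existing servers | Chef, Puppet, Ansible |
| Server templating tools | Provision infrastructure using an image (VMs and containers) | Docker, Vagrant |
| Container orchestration tools | Orchestrate container workloads | Kubernetes, Docker Swarm |
| Provisioning tools | Provision resources on any cloud | Terraform |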
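Terraform is an open-source, platform-agnostic tool that lets developers codify infrastructure as declarative configuration files. Terraform supports many providers and can provision resources on major cloud platforms such as AWS, Google Cloud, Azure, and Oracle.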

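With Terraform, engineers can quickly scale the provisioning of infrastructure resources. Building automation into the deployment process improves the productivity of development teams, who can then deploy infrastructure changes safely and with confidence. Terraform also helps reduce dependence on a centralized infrastructure team and lets development teams move faster, shortening the cycle time of business features.
To provision resources with Terraform, use the following commands: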
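Resource provisioning commands in Terraform

| Terraform command | Functionality |
|---|---|
| `terraform init` | Initializes the working directory that contains the Terraform configuration files |
| `terraform plan` | Creates an execution plan showing the changes Terraform will make to the infrastructure |
| `terraform apply` | Applies the proposed changes to the infrastructure and updates the state file |
| `terraform destroy` | Destroys the infrastructure resources created by the Terraform configuration files |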
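The concept of a Terraform module is straightforward: you write code inside a module and reuse it in multiple places throughout the codebase. With modules, infrastructure can be built quickly with a few lines of code. As infrastructure grows and similar resources have to be deployed in different environments (such as dev and staging), nobody wants to copy and paste the same code over and over.
Configurations built from Terraform modules are easier to read. Modules also enforce the best practice of not hardcoding values in Terraform files. To make a module reusable by different teams and suitable for various use cases, it has to be configurable, so that callers can pass different parameters to the resources in each environment. Because centralized modules can be rigorously tested and documented, infrastructure built from them is more reliable.
As a best practice, start thinking about infrastructure as reusable modules. Terraform modules promote code reuse, avoid duplication, and make it easier to share modules within the organization. That frees up time to invest in improving the quality of the centralized, reusable modules.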
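Example code:
The following steps create AWS S3 buckets in several environments using a Terraform module. First, we need the AWS provider in order to interact with the required resources. The code below configures it: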
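```hcl
# Pin the AWS provider to the 4.x series, starting at 4.9.
terraform {
  required_providers {
    aws = {
      source  = "hashicorp/aws"
      version = "~> 4.9"
    }
  }
}

# Default region for all AWS resources in this configuration.
provider "aws" {
  region = "us-east-1"
}
```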
Now create a Terraform module that provisions an S3 bucket resource:
```hcl
resource "aws_s3_bucket" "s3-bucket" {
  bucket = var.bucket
  policy = var.policy != null ? var.policy : null
  tags   = merge(var.tags, { Name = "${var.bucket}-bucket" })

  # Encrypt objects at rest with S3-managed keys.
  server_side_encryption_configuration {
    rule {
      apply_server_side_encryption_by_default {
        sse_algorithm = "AES256"
      }
    }
  }

  # Attach an expiration rule only when expire-days is positive.
  dynamic "lifecycle_rule" {
    for_each = var.expire-days > 0 ? [var.expire-days] : []
    content {
      id      = "expire"
      enabled = true
      expiration {
        days = var.expire-days
      }
    }
  }
}
```
The module supports several arguments: `bucket`, `policy`, `expire-days`, and `tags`:
```hcl
variable "bucket" {
  description = "S3 Bucket Name"
  type        = string
}

variable "policy" {
  description = "Optional S3 bucket policy to apply. Should be a valid JSON string"
  type        = string
  default     = null
}

variable "expire-days" {
  description = "If set to positive number, lifecycle policy for expiring the objects after specified number of days will be attached to the bucket"
  type        = number
  default     = 0
}

variable "tags" {
  description = "Common tags to be applied to all resources"
  type        = map(any)
}
```
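The module above defines inputs but no outputs. If callers need to refer to the bucket it creates, for example to grant an application access to it, the module could expose a few attributes. The following is a possible addition, not part of the original example; the output names are hypothetical.

```hcl
# Sketch: expose attributes of the bucket so calling configurations can reference them.
output "bucket_id" {
  description = "Name of the bucket created by this module"
  value       = aws_s3_bucket.s3-bucket.id
}

output "bucket_arn" {
  description = "ARN of the bucket, for use in IAM policies"
  value       = aws_s3_bucket.s3-bucket.arn
}
```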
With the reusable S3 module in place, we can call it from different environments, such as dev or live, and pass in the required variables:
```hcl
module "dev-dzone-bucket" {
  source      = "../modules/s3-bucket"
  bucket      = "dev-dzone-iac-bucket"
  policy      = null
  expire-days = 7
  tags        = local.tags
}

module "live-dzone-bucket" {
  source      = "../modules/s3-bucket"
  bucket      = "live-dzone-iac-bucket"
  policy      = null
  expire-days = 14
  tags        = local.tags
}
```
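Both module calls reference `local.tags`, which the example does not define. A `locals` block along the following lines would have to exist in the calling configuration; the tag keys and values shown here are illustrative assumptions, not taken from the original.

```hcl
# Sketch: shared tags referenced as local.tags by the module calls.
locals {
  tags = {
    Project   = "dzone-iac" # illustrative value
    ManagedBy = "terraform" # illustrative value
  }
}
```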
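The file layout of the Terraform project can look something like the figure below, with separate folders for the development and production environments, each containing AWS resources, under "terraform-modules."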
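The unit-testing idea described earlier (mocked dependencies, no calls to the cloud provider) can be tried against the S3 module with Terraform's built-in test framework. The following is a minimal sketch rather than part of the original example. It assumes Terraform 1.7 or later, which is needed for `mock_provider`, and a hypothetical file `tests/s3_bucket.tftest.hcl` inside the module directory; the input values are illustrative.

```hcl
# Run from the module directory with: terraform test

# Replace the real AWS provider with a mock, so no credentials or API calls are needed.
mock_provider "aws" {}

# Inputs shared by every run block in this file (illustrative values).
variables {
  bucket      = "unit-test"
  expire-days = 7
  tags        = { Team = "platform" }
}

run "bucket_name_and_tags" {
  # Plan only: evaluate the configuration without creating anything.
  command = plan

  assert {
    condition     = aws_s3_bucket.s3-bucket.bucket == "unit-test"
    error_message = "The bucket name should come straight from var.bucket."
  }

  assert {
    condition     = aws_s3_bucket.s3-bucket.tags["Name"] == "unit-test-bucket"
    error_message = "The Name tag should be the bucket name plus the -bucket suffix."
  }
}
```

Only values that are set directly in the configuration are asserted here, because attributes computed by the provider are not meaningful under a mock.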

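Bringing a modern IaC platform into a startup, or into a company with many greenfield projects, may not be difficult. For most companies, however, it is not so straightforward. Many companies, large and small, have a lot of infrastructure that was created by pointing and clicking in a cloud provider's console. That is how many new projects get started. Then, one day, an ops engineer wakes up and realizes that the new project is now production infrastructure. To make it more "official," the team writes a runbook or a wiki page that describes which buttons to click to perform a common task. Another common situation is that Bash or PowerShell scripts are floating around that only one or two people understand. What do you do if that is your situation?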
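(1) Stay calm
Remember that change can be scary. Many people feel paralyzed at the thought of touching their infrastructure: it seems too complicated, and they do not understand how it works. Take the time to build up your confidence.
(2) Define what "good" looks like
Before you begin to evaluate tools and approaches, define what "good" looks like for your company. Achieving that ideal depends on understanding which assumptions will remain true regardless of the tools you use. A team made up of all the stakeholders is one way to define what the company wants to achieve with its cloud infrastructure.
(3) Choose tools to evaluate
After thinking through the critical points listed above, narrow the search for the right platform down to a few candidates. You might design a small project whose only purpose is to test a platform and see how well it meets your needs.
(4) Import existing infrastructure
Once you have selected a tool, try importing some existing infrastructure. With the right platform, this should be straightforward.
(5) Integrate with existing engineering practices
Once the infrastructure code is integrated with the continuous delivery pipeline, you can start instituting the same best practices you use for application code.
(6) Start small
Start with a new service or a non-critical one, something that will not disrupt the business if it fails. Pick a project where you will see value early, then iterate.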
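For step (4), Terraform 1.5 and later supports declarative `import` blocks, which bring a resource created outside Terraform under its management. The sketch below is illustrative only: the bucket name and the resource address `aws_s3_bucket.legacy` are hypothetical, and the resource block has to be adjusted until it matches the real bucket.

```hcl
# Sketch: adopt a bucket that was originally created through the console.
import {
  to = aws_s3_bucket.legacy
  id = "existing-console-created-bucket" # for S3 buckets, the import ID is the bucket name
}

resource "aws_s3_bucket" "legacy" {
  bucket = "existing-console-created-bucket"
}
```

Running `terraform plan` afterwards shows the pending import and any differences between the written configuration and the real resource, so drift can be reviewed before anything changes.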
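A modern approach to IaC is an effective way to reduce cloud complexity, unlock the potential of the cloud, and innovate faster. With the right platform and a modern IaC approach, developers can apply standard software engineering practices and tools to infrastructure. In brief, you can expect the following benefits.
1. Faster innovation, velocity, and agility
With modern IaC, teams can apply the same practices, testing rigor, and automation of modern software development to cloud infrastructure. This increases the rate and reliability of releases, so that companies can react to customer feedback and iterate quickly.
2. Lower infrastructure risk
Because developers can use standard testing frameworks, IaC "shifts risk left." Early, frequent, and thorough testing becomes part of the authoring process and of the CI/CD pipeline. Since policy and security requirements are also written as code, compliance and security are tested automatically with every deployment.
3. Stronger collaboration
Modern IaC platforms use standard tools and languages, which can break down silos between infrastructure, application development, and security teams. Shared practices and tools increase collaboration between teams.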
Reference: https://dzone.com/refcardz/getting-started-with-iac