From d5049b65da441b6e7eea7c447d6a8334bf9bd8c5 Mon Sep 17 00:00:00 2001 From: Mitchell Hashimoto Date: Fri, 25 Jul 2014 21:37:22 -0700 Subject: [PATCH] website: internals --- .../docs/internals/architecture.html.markdown | 117 ------------------ website/source/docs/internals/graph.html.md | 98 +++++++++++++++ website/source/docs/internals/index.html.md | 19 +++ .../source/docs/internals/lifecycle.html.md | 58 +++++++++ website/source/layouts/docs.erb | 24 +--- 5 files changed, 179 insertions(+), 137 deletions(-) delete mode 100644 website/source/docs/internals/architecture.html.markdown create mode 100644 website/source/docs/internals/graph.html.md create mode 100644 website/source/docs/internals/index.html.md create mode 100644 website/source/docs/internals/lifecycle.html.md diff --git a/website/source/docs/internals/architecture.html.markdown b/website/source/docs/internals/architecture.html.markdown deleted file mode 100644 index f5c6e8267..000000000 --- a/website/source/docs/internals/architecture.html.markdown +++ /dev/null @@ -1,117 +0,0 @@ ---- -layout: "docs" -page_title: "Terraform Architecture" -sidebar_current: "docs-internals-architecture" ---- - -# Terraform Architecture - -Terraform is a complex system that has many different moving parts. To help -users and developers of Terraform form a mental model of how it works, this -page documents the system architecture. - -
-Advanced Topic! This page covers technical details of -the internals of Terraform. You don't need to know these details to effectively -operate and use Terraform. These details are documented here for those who wish -to learn about them without having to go spelunking through the source code. -
- -## Glossary - -Before describing the architecture, we provide a glossary of terms to help -clarify what is being discussed: - -* Agent - An agent is the long running daemon on every member of the Terraform cluster. -It is started by running `terraform agent`. The agent is able to run in either *client*, -or *server* mode. Since all nodes must be running an agent, it is simpler to refer to -the node as either being a client or server, but there are other instances of the agent. All -agents can run the DNS or HTTP interfaces, and are responsible for running checks and -keeping services in sync. - -* Client - A client is an agent that forwards all RPCs to a server. The client is relatively -stateless. The only background activity a client performs is taking part of LAN gossip pool. -This has a minimal resource overhead and consumes only a small amount of network bandwidth. - -* Server - An agent that is server mode. When in server mode, there is an expanded set -of responsibilities including participating in the Raft quorum, maintaining cluster state, -responding to RPC queries, WAN gossip to other datacenters, and forwarding queries to leaders -or remote datacenters. - -* Datacenter - A datacenter seems obvious, but there are subtle details such as multiple -availability zones in EC2. We define a datacenter to be a networking environment that is -private, low latency, and high bandwidth. This excludes communication that would traverse -the public internet. - -* Consensus - When used in our documentation we use consensus to mean agreement upon -the elected leader as well as agreement on the ordering of transactions. Since these -transactions are applied to a FSM, we implicitly include the consistency of a replicated -state machine. Consensus is described in more detail on [Wikipedia](http://en.wikipedia.org/wiki/Consensus_(computer_science)), -as well as our [implementation here](/docs/internals/consensus.html). 
- -* Gossip - Terraform is built on top of [Serf](http://www.serfdom.io/), which provides a full -[gossip protocol](http://en.wikipedia.org/wiki/Gossip_protocol) that is used for multiple purposes. -Serf provides membership, failure detection, and event broadcast mechanisms. Our use of these -is described more in the [gossip documentation](/docs/internals/gossip.html). It is enough to know -gossip involves random node-to-node communication, primarily over UDP. - -* LAN Gossip - This is used to mean that there is a gossip pool, containing nodes that -are all located on the same local area network or datacenter. - -* WAN Gossip - This is used to mean that there is a gossip pool, containing servers that -are primary located in different datacenters and must communicate over the internet or -wide area network. - -* RPC - RPC is short for a Remote Procedure Call. This is a request / response mechanism -allowing a client to make a request from a server. - -## 10,000 foot view - -From a 10,000 foot altitude the architecture of Terraform looks like this: - -![Terraform Architecture](/images/terraform-arch.png) - -Lets break down this image and describe each piece. First of all we can see -that there are two datacenters, one and two respectively. Terraform has first -class support for multiple datacenters and expects this to be the common case. - -Within each datacenter we have a mixture of clients and servers. It is expected -that there be between three to five servers. This strikes a balance between -availability in the case of failure and performance, as consensus gets progressively -slower as more machines are added. However, there is no limit to the number of clients, -and they can easily scale into the thousands or tens of thousands. - -All the nodes that are in a datacenter participate in a [gossip protocol](/docs/internals/gossip.html). -This means there is a gossip pool that contains all the nodes for a given datacenter. 
This serves -a few purposes: first, there is no need to configure clients with the addresses of servers, -discovery is done automatically. Second, the work of detecting node failures -is not placed on the servers but is distributed. This makes the failure detection much more -scalable than naive heartbeating schemes. Thirdly, it is used as a messaging layer to notify -when important events such as leader election take place. - -The servers in each datacenter are all part of a single Raft peer set. This means that -they work together to elect a leader, which has extra duties. The leader is responsible for -processing all queries and transactions. Transactions must also be replicated to all peers -as part of the [consensus protocol](/docs/internals/consensus.html). Because of this requirement, -when a non-leader server receives an RPC request it forwards it to the cluster leader. - -The server nodes also operate as part of a WAN gossip. This pool is different from the LAN pool, -as it is optimized for the higher latency of the internet, and is expected to only contain -other Terraform server nodes. The purpose of this pool is to allow datacenters to discover each -other in a low touch manner. Bringing a new datacenter online is as easy as joining the existing -WAN gossip. Because the servers are all operating in this pool, it also enables cross-datacenter requests. -When a server receives a request for a different datacenter, it forwards it to a random server -in the correct datacenter. That server may then forward to the local leader. - -This results in a very low coupling between datacenters, but because of failure detection, -connection caching and multiplexing, cross-datacenter requests are relatively fast and reliable. - -## Getting in depth - -At this point we've covered the high level architecture of Terraform, but there are much -more details to each of the sub-systems. 
The [consensus protocol](/docs/internals/consensus.html) is -documented in detail, as is the [gossip protocol](/docs/internals/gossip.html). The [documentation](/docs/internals/security.html) -for the security model and protocols used are also available. - -For other details, either terraformt the code, ask in IRC or reach out to the mailing list. - diff --git a/website/source/docs/internals/graph.html.md b/website/source/docs/internals/graph.html.md new file mode 100644 index 000000000..70b49a45b --- /dev/null +++ b/website/source/docs/internals/graph.html.md @@ -0,0 +1,98 @@ +--- +layout: "docs" +page_title: "Resource Graph" +sidebar_current: "docs-internals-graph" +--- + +# Resource Graph + +Terraform builds a +[dependency graph](http://en.wikipedia.org/wiki/Dependency_graph) +from the Terraform configurations, and walks this graph to +generate plans, refresh state, and more. This page documents +the details of what is contained in this graph, what types +of nodes there are, and how the edges of the graph are determined. + +
+Advanced Topic! This page covers technical details +of Terraform. You don't need to understand these details to +effectively use Terraform. The details are documented here for +those who wish to learn about them without having to go +spelunking through the source code. +
+ +## Graph Nodes + +There are only a handful of node types that can exist within the +graph. We'll cover these first before explaining how they're +determined and built: + + * **Resource Node** - Represents a single resource. If you have + the `count` metaparameter set, then there will be one resource + node for each count. The configuration, diff, state, etc. of + the resource under change are attached to this node. + + * **Provider Configuration Node** - Represents the time to fully + configure a provider. This is when the provider configuration + block, such as AWS security credentials, is given to a provider. + + * **Resource Meta-Node** - Represents a group of resources, but + does not represent any action on its own. This is done to + simplify dependencies and to make a prettier graph. This + node is only present for resources that have a `count` + parameter greater than 1. + +When visualizing a configuration with `terraform graph`, you can +see all of these nodes present. + +## Building the Graph + +Building the graph is done in a series of sequential steps: + + 1. Resource nodes are added based on the configuration. If a + diff (plan) or state is present, that meta-data is attached + to each resource node. + + 1. Resources are mapped to provisioners if they have any + defined. This must be done after all resource nodes are + created so resources with the same provisioner type can + share the provisioner implementation. + + 1. Explicit dependencies from the `depends_on` meta-parameter + are used to create edges between resources. + + 1. If a state is present, any "orphan" resources are added to + the graph. Orphan resources are any resources that are no + longer present in the configuration but are present in the + state file. Orphans never have any configuration associated + with them, since the state file does not store configuration. + + 1. Resources are mapped to providers. 
Provider configuration + nodes are created for these providers, and edges are created + such that the resources depend on their respective provider + being configured. + + 1. Interpolations are parsed in resource and provider configurations + to determine dependencies. References to resource attributes + are turned into dependencies from the resource with the interpolation + to the resource being referenced. + + 1. Create a root node. The root node points to all resources and + is created so there is a single root to the dependency graph. When + traversing the graph, the root node is ignored. + + 1. If a diff is present, traverse all resource nodes and find resources + that are being destroyed. These resource nodes are split into two: + one node that destroys the resource and another that creates + the resource (if it is being recreated). The nodes must + be split because the destroy order is often different from the + create order, and so they can't be represented by a single graph + node. + + 1. Validate that the graph has no cycles and has a single root. + +## Walking the Graph + +To walk the graph, a standard depth-first traversal is done. Graph +walking is done with as much parallelism as possible: a node is walked +as soon as all of its dependencies are walked. diff --git a/website/source/docs/internals/index.html.md b/website/source/docs/internals/index.html.md new file mode 100644 index 000000000..ad2a08192 --- /dev/null +++ b/website/source/docs/internals/index.html.md @@ -0,0 +1,19 @@ +--- +layout: "docs" +page_title: "Internals" +sidebar_current: "docs-internals" +--- + +# Terraform Internals + +This section covers the internals of Terraform and explains how +plans are generated, the lifecycle of a provider, etc. The goal +of this section is to remove any notion of "magic" from Terraform. +We want you to be able to trust and understand what Terraform is +doing in order to function. + +
+Note: Knowledge of Terraform internals is not +required to use Terraform. If you aren't interested in the internals +of Terraform, you may safely skip this section. +
diff --git a/website/source/docs/internals/lifecycle.html.md b/website/source/docs/internals/lifecycle.html.md new file mode 100644 index 000000000..aa072dfd9 --- /dev/null +++ b/website/source/docs/internals/lifecycle.html.md @@ -0,0 +1,58 @@ +--- +layout: "docs" +page_title: "Resource Lifecycle" +sidebar_current: "docs-internals-lifecycle" +--- + +# Resource Lifecycle + +Resources have a strict lifecycle, and can be thought of as basic +state machines. Understanding this lifecycle helps explain +how Terraform generates an execution plan, how it safely executes that +plan, and what the resource provider is doing throughout all of this. + +
+Advanced Topic! This page covers technical details +of Terraform. You don't need to understand these details to +effectively use Terraform. The details are documented here for +those who wish to learn about them without having to go +spelunking through the source code. +
+ +## Lifecycle + +A resource roughly follows the steps below: + + 1. `ValidateResource` is called to do a high-level structural + validation of a resource's configuration. The configuration + at this point is raw and the interpolations have not been processed. + The value of any key is not guaranteed, so this step is just meant to be + a quick structural check. + + 1. `Diff` is called with the current state and the configuration. + The resource provider inspects this and returns a diff, outlining + all the changes that need to occur to the resource. The diff includes + details such as whether or not the resource is being destroyed, what + attribute necessitates the destroy, old values and new values, whether + a value is computed, etc. It is up to the resource provider to + have this knowledge. + + 1. `Apply` is called with the current state and the diff. `Apply` does + not have access to the configuration. This is a safety mechanism + that limits the possibility that a provider changes a diff on the + fly. `Apply` must apply a diff as prescribed and do nothing else + to remain true to the Terraform execution plan. `Apply` returns the + new state of the resource (or nil if the resource was destroyed). + + 1. If a resource was just created and did not exist before, and the + apply succeeded without error, then the provisioners are executed + in sequence. If any provisioner errors, the resource is marked as + _tainted_, so that it will be destroyed on the next apply. + +## Partial State and Error Handling + +If an error happens at any stage in the lifecycle of a resource, +Terraform stores a partial state of the resource. This behavior is +critical for Terraform to ensure that you don't end up with any +_zombie_ resources: resources that were created by Terraform but +are no longer managed by Terraform due to a loss of state. 
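The lifecycle described in `lifecycle.html.md` above can be sketched as a small state machine. The Python below is only an illustration under stated assumptions, not Terraform's implementation: `ToyProvider`, `run_lifecycle`, the dictionary-shaped state and diff, and the `tainted` key are all hypothetical. Only the ordering (validate, diff, apply, then provisioners for newly created resources, with tainting on provisioner failure) is taken from the text.

```python
from typing import Callable, Dict, List, Optional, Tuple

State = Dict[str, str]
Diff = Dict[str, Tuple[Optional[str], str]]  # key -> (old value, new value)


class ToyProvider:
    """Hypothetical in-memory provider; not a real Terraform API."""

    def validate(self, config: Dict[str, str]) -> List[str]:
        # Structural check only: values may still be raw interpolations.
        return [] if "name" in config else ["missing required key: name"]

    def diff(self, state: Optional[State], config: Dict[str, str]) -> Diff:
        # Compare the current state with the desired configuration.
        old = state or {}
        return {k: (old.get(k), v) for k, v in config.items() if old.get(k) != v}

    def apply(self, state: Optional[State], diff: Diff) -> State:
        # Apply receives only the state and the diff, never the configuration.
        new_state = dict(state or {})
        for key, (_old, new) in diff.items():
            new_state[key] = new
        return new_state


def run_lifecycle(
    provider: ToyProvider,
    state: Optional[State],
    config: Dict[str, str],
    provisioners: List[Callable[[State], None]],
) -> State:
    errors = provider.validate(config)
    if errors:
        raise ValueError("; ".join(errors))

    created = state is None  # provisioners run only for new resources
    changes = provider.diff(state, config)
    new_state = provider.apply(state, changes)

    if created:
        for provision in provisioners:
            try:
                provision(new_state)
            except Exception:
                # Assumption: tainting is modelled as a flag in the state.
                new_state["tainted"] = "true"
                break
    return new_state
```

In this sketch, calling `run_lifecycle(ToyProvider(), None, {"name": "web"}, [])` creates the resource, while passing an existing state skips the provisioners entirely, mirroring the "just created" condition in the last lifecycle step.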
diff --git a/website/source/layouts/docs.erb b/website/source/layouts/docs.erb index 5b4bc5aa8..22b63d664 100644 --- a/website/source/layouts/docs.erb +++ b/website/source/layouts/docs.erb @@ -158,28 +158,12 @@ > Internals