website: internals
parent 9ff8856fe8
commit d5049b65da

@ -1,117 +0,0 @@
---
layout: "docs"
page_title: "Terraform Architecture"
sidebar_current: "docs-internals-architecture"
---

# Terraform Architecture

Terraform is a complex system that has many different moving parts. To help
users and developers of Terraform form a mental model of how it works, this
page documents the system architecture.

<div class="alert alert-block alert-warning">
<strong>Advanced Topic!</strong> This page covers technical details of
the internals of Terraform. You don't need to know these details to effectively
operate and use Terraform. These details are documented here for those who wish
to learn about them without having to go spelunking through the source code.
</div>

## Glossary

Before describing the architecture, we provide a glossary of terms to help
clarify what is being discussed:

* Agent - An agent is the long-running daemon on every member of the Terraform cluster.
It is started by running `terraform agent`. The agent is able to run in either *client*
or *server* mode. Since all nodes must be running an agent, it is simpler to refer to
the node as either being a client or server, but there are other instances of the agent. All
agents can run the DNS or HTTP interfaces, and are responsible for running checks and
keeping services in sync.

* Client - A client is an agent that forwards all RPCs to a server. The client is relatively
stateless. The only background activity a client performs is taking part in the LAN gossip pool.
This has a minimal resource overhead and consumes only a small amount of network bandwidth.

* Server - A server is an agent running in server mode. When in server mode, there is an expanded set
of responsibilities including participating in the Raft quorum, maintaining cluster state,
responding to RPC queries, exchanging WAN gossip with other datacenters, and forwarding queries to leaders
or remote datacenters.

* Datacenter - The definition of a datacenter may seem obvious, but there are subtle details such as multiple
availability zones in EC2. We define a datacenter to be a networking environment that is
private, low latency, and high bandwidth. This excludes communication that would traverse
the public internet.

* Consensus - In our documentation, consensus means agreement upon
the elected leader as well as agreement on the ordering of transactions. Since these
transactions are applied to an FSM, we implicitly include the consistency of a replicated
state machine. Consensus is described in more detail on [Wikipedia](http://en.wikipedia.org/wiki/Consensus_(computer_science)),
as well as in our [implementation here](/docs/internals/consensus.html).

* Gossip - Terraform is built on top of [Serf](http://www.serfdom.io/), which provides a full
[gossip protocol](http://en.wikipedia.org/wiki/Gossip_protocol) that is used for multiple purposes.
Serf provides membership, failure detection, and event broadcast mechanisms. Our use of these
is described more in the [gossip documentation](/docs/internals/gossip.html). It is enough to know that
gossip involves random node-to-node communication, primarily over UDP.

* LAN Gossip - Refers to the LAN gossip pool, which contains nodes that
are all located on the same local area network or datacenter.

* WAN Gossip - Refers to the WAN gossip pool, which contains only servers. These servers are
primarily located in different datacenters and typically communicate over the internet or
wide area network.

* RPC - Remote Procedure Call. This is a request/response mechanism
allowing a client to make a request of a server.

## 10,000 foot view

From a 10,000 foot altitude the architecture of Terraform looks like this:

![Terraform Architecture](/images/terraform-arch.png)

Let's break down this image and describe each piece. First of all, we can see
that there are two datacenters, labeled "one" and "two" respectively. Terraform has first-class
support for multiple datacenters and expects this to be the common case.

Within each datacenter we have a mixture of clients and servers. It is expected
that there be between three and five servers. This strikes a balance between
availability in the case of failure and performance, as consensus gets progressively
slower as more machines are added. However, there is no limit to the number of clients,
and they can easily scale into the thousands or tens of thousands.

All the nodes that are in a datacenter participate in a [gossip protocol](/docs/internals/gossip.html).
This means there is a gossip pool that contains all the nodes for a given datacenter. This serves
a few purposes: first, there is no need to configure clients with the addresses of servers;
discovery is done automatically. Second, the work of detecting node failures
is not placed on the servers but is distributed. This makes failure detection much more
scalable than naive heartbeating schemes. Thirdly, it is used as a messaging layer to notify
when important events such as leader election take place.

The servers in each datacenter are all part of a single Raft peer set. This means that
they work together to elect a leader, which has extra duties. The leader is responsible for
processing all queries and transactions. Transactions must also be replicated to all peers
as part of the [consensus protocol](/docs/internals/consensus.html). Because of this requirement,
when a non-leader server receives an RPC request, it forwards it to the cluster leader.

The server nodes also operate as part of a WAN gossip pool. This pool is different from the LAN pool,
as it is optimized for the higher latency of the internet and is expected to contain only
other Terraform server nodes. The purpose of this pool is to allow datacenters to discover each
other in a low-touch manner. Bringing a new datacenter online is as easy as joining the existing
WAN gossip pool. Because the servers are all operating in this pool, it also enables cross-datacenter requests.
When a server receives a request for a different datacenter, it forwards it to a random server
in the correct datacenter. That server may then forward to the local leader.

This results in very low coupling between datacenters, but because of failure detection,
connection caching, and multiplexing, cross-datacenter requests are relatively fast and reliable.

## Getting in depth

At this point we've covered the high-level architecture of Terraform, but there are many
more details to each of the sub-systems. The [consensus protocol](/docs/internals/consensus.html) is
documented in detail, as is the [gossip protocol](/docs/internals/gossip.html). The [documentation](/docs/internals/security.html)
for the security model and protocols used is also available.

For other details, either consult the code, ask in IRC, or reach out to the mailing list.

@ -0,0 +1,98 @@
---
layout: "docs"
page_title: "Resource Graph"
sidebar_current: "docs-internals-graph"
---

# Resource Graph

Terraform builds a
[dependency graph](http://en.wikipedia.org/wiki/Dependency_graph)
from the Terraform configurations, and walks this graph to
generate plans, refresh state, and more. This page documents
the details of what is contained in this graph, what types
of nodes there are, and how the edges of the graph are determined.

<div class="alert alert-block alert-warning">
<strong>Advanced Topic!</strong> This page covers technical details
of Terraform. You don't need to understand these details to
effectively use Terraform. The details are documented here for
those who wish to learn about them without having to go
spelunking through the source code.
</div>

## Graph Nodes

There are only a handful of node types that can exist within the
graph. We'll cover these first before explaining how they're
determined and built:

* **Resource Node** - Represents a single resource. If you have
the `count` metaparameter set, then there will be one resource
node for each count. The configuration, diff, state, etc. of
the resource under change is attached to this node.

* **Provider Configuration Node** - Represents the time to fully
configure a provider. This is when the provider configuration
block is given to a provider, such as AWS security credentials.

* **Resource Meta-Node** - Represents a group of resources, but
does not represent any action on its own. This is done for
convenience of dependency handling and to make a prettier graph. This
node is only present for resources that have a `count`
parameter greater than 1.

When visualizing a configuration with `terraform graph`, you can
see all of these nodes present.

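As a rough model of these node types, the Go sketch below uses hypothetical structures invented for this page (a `Node` struct and an `expandResource` helper, not Terraform's internal types) to show how a resource with `count` set expands into one node per index plus a grouping meta-node, each depending on its provider configuration node:

```go
package main

import "fmt"

// Node is a hypothetical graph vertex used only for illustration.
type Node struct {
	Name string
	Deps []string // names of the nodes this node depends on
}

// expandResource models the expansion described above: a resource with
// count > 1 becomes one node per index plus a meta-node that groups them,
// and every resource node depends on its provider configuration node.
func expandResource(name, provider string, count int) []Node {
	providerNode := Node{Name: "provider." + provider}
	nodes := []Node{providerNode}

	if count <= 1 {
		return append(nodes, Node{Name: name, Deps: []string{providerNode.Name}})
	}

	meta := Node{Name: name} // the meta-node performs no action of its own
	for i := 0; i < count; i++ {
		child := Node{
			Name: fmt.Sprintf("%s.%d", name, i),
			Deps: []string{providerNode.Name},
		}
		meta.Deps = append(meta.Deps, child.Name)
		nodes = append(nodes, child)
	}
	return append(nodes, meta)
}

func main() {
	for _, n := range expandResource("aws_instance.web", "aws", 3) {
		fmt.Println(n.Name, "->", n.Deps)
	}
}
```

Here the meta-node exists only so that other nodes can depend on the whole group at once, which matches the convenience role described above.
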
## Building the Graph

Building the graph is done in a series of sequential steps:

1. Resource nodes are added based on the configuration. If a
diff (plan) or state is present, that metadata is attached
to each resource node.

1. Resources are mapped to provisioners if they have any
defined. This must be done after all resource nodes are
created so resources with the same provisioner type can
share the provisioner implementation.

1. Explicit dependencies from the `depends_on` meta-parameter
are used to create edges between resources.

1. If a state is present, any "orphan" resources are added to
the graph. Orphan resources are any resources that are no
longer present in the configuration but are present in the
state file. Orphans never have any configuration associated
with them, since the state file does not store configuration.

1. Resources are mapped to providers. Provider configuration
nodes are created for these providers, and edges are created
such that the resources depend on their respective provider
being configured.

1. Interpolations are parsed in resource and provider configurations
to determine dependencies. References to resource attributes
are turned into dependencies from the resource with the interpolation
to the resource being referenced.

1. A root node is created. The root node points to all resources and
is created so there is a single root to the dependency graph. When
traversing the graph, the root node is ignored.

1. If a diff is present, all resource nodes are traversed to find resources
that are being destroyed. These resource nodes are split into two:
one node that destroys the resource and another that creates
the resource (if it is being recreated). The reason the nodes must
be split is that the destroy order is often different from the
create order, so they can't be represented by a single graph
node.

1. The graph is validated to ensure it has no cycles and has a single
root (see the sketch after this list).

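To ground the last two steps, here is a minimal Go sketch. The `addRoot` and `validate` helpers are hypothetical names written for this page rather than Terraform's actual code; they only demonstrate adding a single root node and rejecting cycles or multiple roots:

```go
package main

import (
	"errors"
	"fmt"
)

// Graph maps each node to the nodes it depends on.
type Graph map[string][]string

// addRoot adds a synthetic root node that depends on every other node,
// so the dependency graph always has a single entry point.
func addRoot(g Graph) {
	var names []string
	for name := range g {
		names = append(names, name)
	}
	g["root"] = names
}

// validate rejects cycles and requires exactly one node (the root)
// that nothing else depends on.
func validate(g Graph) error {
	const unvisited, visiting, done = 0, 1, 2
	state := map[string]int{}
	var visit func(n string) error
	visit = func(n string) error {
		state[n] = visiting
		for _, dep := range g[n] {
			switch state[dep] {
			case visiting:
				return fmt.Errorf("cycle detected at %q", dep)
			case unvisited:
				if err := visit(dep); err != nil {
					return err
				}
			}
		}
		state[n] = done
		return nil
	}
	for n := range g {
		if state[n] == unvisited {
			if err := visit(n); err != nil {
				return err
			}
		}
	}

	// Single-root check: only one node may have nothing depending on it.
	hasParent := map[string]bool{}
	for _, deps := range g {
		for _, dep := range deps {
			hasParent[dep] = true
		}
	}
	roots := 0
	for n := range g {
		if !hasParent[n] {
			roots++
		}
	}
	if roots != 1 {
		return errors.New("graph must have exactly one root")
	}
	return nil
}

func main() {
	g := Graph{
		"aws_instance.web":            {"aws_security_group.firewall"},
		"aws_security_group.firewall": nil,
	}
	addRoot(g)
	fmt.Println(validate(g)) // <nil>
}
```
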
## Walking the Graph

To walk the graph, a standard depth-first traversal is done. Graph
walking is done with as much parallelism as possible: a node is walked
as soon as all of its dependencies are walked.

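The following Go sketch shows one way to get that behavior. It is a hypothetical walker written for this page, not Terraform's implementation, and it assumes the graph has already been validated to be acyclic:

```go
package main

import (
	"fmt"
	"sync"
)

// Walk visits every node with maximum parallelism: each node starts as
// soon as all of the nodes it depends on have finished. deps maps a
// node to the nodes it depends on; the graph must be acyclic.
func Walk(deps map[string][]string, visit func(string)) {
	finished := make(map[string]chan struct{}, len(deps))
	for name := range deps {
		finished[name] = make(chan struct{})
	}

	var wg sync.WaitGroup
	for name := range deps {
		wg.Add(1)
		go func(name string) {
			defer wg.Done()
			for _, dep := range deps[name] {
				<-finished[dep] // block until the dependency has been walked
			}
			visit(name)
			close(finished[name]) // unblock the nodes that depend on this one
		}(name)
	}
	wg.Wait()
}

func main() {
	deps := map[string][]string{
		"root":                        {"aws_instance.web"},
		"aws_instance.web":            {"aws_security_group.firewall"},
		"aws_security_group.firewall": nil,
	}
	Walk(deps, func(n string) { fmt.Println("walking", n) })
}
```

In this example the security group is walked first, then the instance, and the root last, mirroring the dependency order described above.
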
@ -0,0 +1,19 @@
---
layout: "docs"
page_title: "Internals"
sidebar_current: "docs-internals"
---

# Terraform Internals

This section covers the internals of Terraform and explains how
plans are generated, the lifecycle of a provider, etc. The goal
of this section is to remove any notion of "magic" from Terraform.
We want you to be able to trust and understand what Terraform is
doing in order to function.

<div class="alert alert-block alert-info">
<strong>Note:</strong> Knowledge of Terraform internals is not
required to use Terraform. If you aren't interested in the internals
of Terraform, you may safely skip this section.
</div>

@ -0,0 +1,58 @@
---
layout: "docs"
page_title: "Resource Lifecycle"
sidebar_current: "docs-internals-lifecycle"
---

# Resource Lifecycle

Resources have a strict lifecycle, and can be thought of as basic
state machines. Understanding this lifecycle can help you better understand
how Terraform generates an execution plan, how it safely executes that
plan, and what the resource provider is doing throughout all of this.

<div class="alert alert-block alert-warning">
<strong>Advanced Topic!</strong> This page covers technical details
of Terraform. You don't need to understand these details to
effectively use Terraform. The details are documented here for
those who wish to learn about them without having to go
spelunking through the source code.
</div>

## Lifecycle

A resource roughly follows the steps below:

1. `ValidateResource` is called to do a high-level structural
validation of a resource's configuration. The configuration
at this point is raw and the interpolations have not been processed.
The value of any key is not guaranteed; this step is only meant to be
a quick structural check.

1. `Diff` is called with the current state and the configuration.
The resource provider inspects these and returns a diff, outlining
all the changes that need to occur to the resource. The diff includes
details such as whether or not the resource is being destroyed, what
attribute necessitates the destroy, old values and new values, whether
a value is computed, etc. It is up to the resource provider to
have this knowledge.

1. `Apply` is called with the current state and the diff. Apply does
not have access to the configuration. This is a safety mechanism
that limits the possibility that a provider changes a diff on the
fly. `Apply` must apply the diff as prescribed and do nothing else
to remain true to the Terraform execution plan. Apply returns the
new state of the resource (or nil if the resource was destroyed).

1. If a resource was just created and did not exist before, and the
apply succeeded without error, then the provisioners are executed
in sequence. If any provisioner errors, the resource is marked as
_tainted_, so that it will be destroyed on the next apply.

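To make the calling sequence concrete, here is a toy Go provider whose method names follow the steps above. The types and signatures are invented for this page and do not match Terraform's real provider API:

```go
package main

import (
	"errors"
	"fmt"
)

// Placeholder types standing in for the data described above.
type Config map[string]interface{} // raw, uninterpolated configuration
type State map[string]string       // attribute values; nil means the resource does not exist
type Diff map[string]string        // attribute changes to apply

// provider is a toy provider whose methods are called in lifecycle order:
// ValidateResource, then Diff, then Apply.
type provider struct{}

// ValidateResource performs only a quick structural check; values may
// still contain unprocessed interpolations.
func (provider) ValidateResource(cfg Config) []error {
	if _, ok := cfg["ami"]; !ok {
		return []error{errors.New(`missing required key "ami"`)}
	}
	return nil
}

// Diff compares the current state with the desired configuration and
// returns the changes that need to occur.
func (provider) Diff(state State, cfg Config) Diff {
	return Diff{"ami": fmt.Sprintf("%v", cfg["ami"])}
}

// Apply only sees the diff, never the configuration, and must apply it
// exactly as prescribed.
func (provider) Apply(state State, diff Diff) (State, error) {
	newState := State{}
	for k, v := range diff {
		newState[k] = v
	}
	return newState, nil
}

// runProvisioners stands in for step 4; here it always succeeds.
func runProvisioners(State) error { return nil }

func main() {
	p := provider{}
	cfg := Config{"ami": "ami-12345"}

	if errs := p.ValidateResource(cfg); len(errs) > 0 {
		fmt.Println("validation failed:", errs)
		return
	}

	diff := p.Diff(nil, cfg)         // the resource does not exist yet
	state, err := p.Apply(nil, diff) // create it
	tainted := false
	if err == nil && runProvisioners(state) != nil {
		tainted = true // a provisioner failure taints a freshly created resource
	}
	fmt.Println("state:", state, "err:", err, "tainted:", tainted)
}
```
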
## Partial State and Error Handling

If an error happens at any stage in the lifecycle of a resource,
Terraform stores a partial state of the resource. This behavior is
critical for Terraform to ensure that you don't end up with any
_zombie_ resources: resources that were created by Terraform but
are no longer managed by Terraform due to a loss of state.

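A minimal sketch of that idea, with hypothetical helper names rather than Terraform's actual code: whatever partial state an apply manages to produce is recorded before the error is surfaced, so a half-created resource is never lost from the state file.

```go
package main

import (
	"errors"
	"fmt"
)

type State map[string]string

// applyAndRecord persists whatever state the apply produced, even when
// the apply itself failed, so a partially created resource stays tracked.
func applyAndRecord(stateFile map[string]State, name string, apply func() (State, error)) error {
	newState, err := apply()
	if newState != nil {
		stateFile[name] = newState // record the partial state before reporting the error
	}
	return err
}

func main() {
	stateFile := map[string]State{}
	err := applyAndRecord(stateFile, "aws_instance.web", func() (State, error) {
		// The instance was created, but a later step failed; return what
		// is known about it along with the error.
		return State{"id": "i-abc123"}, errors.New("attaching volume failed")
	})
	fmt.Println("error:", err)
	fmt.Println("recorded state:", stateFile["aws_instance.web"])
}
```
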
@ -158,28 +158,12 @@
<li<%= sidebar_current("docs-internals") %>>
  <a href="/docs/internals/index.html">Internals</a>
  <ul class="nav">
    <li<%= sidebar_current("docs-internals-architecture") %>>
      <a href="/docs/internals/architecture.html">Architecture</a>
    <li<%= sidebar_current("docs-internals-graph") %>>
      <a href="/docs/internals/graph.html">Resource Graph</a>
    </li>

    <li<%= sidebar_current("docs-internals-consensus") %>>
      <a href="/docs/internals/consensus.html">Consensus Protocol</a>
    </li>

    <li<%= sidebar_current("docs-internals-gossip") %>>
      <a href="/docs/internals/gossip.html">Gossip Protocol</a>
    </li>

    <li<%= sidebar_current("docs-internals-sessions") %>>
      <a href="/docs/internals/sessions.html">Sessions</a>
    </li>

    <li<%= sidebar_current("docs-internals-security") %>>
      <a href="/docs/internals/security.html">Security Model</a>
    </li>

    <li<%= sidebar_current("docs-internals-jepsen") %>>
      <a href="/docs/internals/jepsen.html">Jepsen Testing</a>
    <li<%= sidebar_current("docs-internals-lifecycle") %>>
      <a href="/docs/internals/lifecycle.html">Resource Lifecycle</a>
    </li>
  </ul>
</li>