Like almost everyone these days, Splitgraph has a pretty elaborate CI/CD system. It handles everything from building our Docker images, to running tests, to staging/production deployments. To achieve this, we use GitLab's extensive CI/CD tooling. Through a combination of a base `.gitlab-ci.yml` configuration file and arbitrary scripts, it lets us perform a wide variety of DevOps actions.
One such example that recently attracted our attention was something GitLab calls review apps. The concept is pretty straightforward: a CI/CD job that deploys code changes from a branch, integrated with GitLab's UI. The deployment could be a work-in-progress or a proof-of-concept. GitLab review apps let developers, designers and product owners preview and iterate on their work faster.
We decided to make use of this functionality in our own workflow. We call it preview environments, as the name is a better fit for our use case (more on that below).
Our proprietary product, Splitgraph Cloud, can be deployed to private as well as public clouds. In the latter case, we use a set of multi-cloud Terraform templates that cover the public cloud triumvirate of Azure, AWS and GCP.
Terraform allows us to write maintainable and re-usable infrastructure-as-code in a cloud-agnostic way. Thus our preview environments not only have to deal with application-level deployments, but also with provisioning the cloud infrastructure during the initial setup (as well as cleaning up on tear-down). This makes them veritable environments, in parallel to staging and production.
Terraform also allows us to persist the state of all our existing preview environments between CI/CD job runs. For this, it uses the remote backend configuration (which in our case is set to a GCS bucket).
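As a rough sketch, assuming a standard GCS remote backend setup (the bucket name and prefix here are illustrative, not from the source), the configuration might look like this:

```hcl
terraform {
  backend "gcs" {
    bucket = "example-terraform-state" # illustrative bucket name
    prefix = "preview-envs"            # namespace for preview environment state
  }
}
```

Combined with Terraform workspaces, each branch's state file lives under the prefix keyed by the workspace name, which is what lets an environment persist between CI/CD job runs.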
Here's an example of the preview environment job spec in the base `.gitlab-ci.yml` file:

```yaml
deploy_preview:
  image: registry.gitlab.com/splitgraph/splitgraph-cloud/cd-environment:development
  stage: preview
  rules:
    - if: $CI_PIPELINE_SOURCE == "push" && $CI_COMMIT_BRANCH != $CI_DEFAULT_BRANCH
      when: manual
      allow_failure: true
  environment:
    name: preview/$CI_COMMIT_REF_SLUG
    url: https://www.$CI_COMMIT_REF_SLUG.splitgraph.io
    deployment_tier: development
    auto_stop_in: 3 days
    on_stop: terminate_preview
  script:
    - source "$CI_PROJECT_DIR"/.ci/envs/preview.env
    - $CI_PROJECT_DIR/.ci/preview_env.sh deploy
```
A couple of noteworthy points:
- We use a custom "Docker-in-Docker" image with Terraform to manage the infrastructure. We also use `yq` and our open source `sgr` client to seed all our preview environments with a whole lot of data upon the initial deploy.
- The job can only run if the previous build and test stage succeeded.
- The job can be run with or without an open Merge Request, and it needs to be manually triggered. It cannot run on our main branch.
- The name and domain for the preview environment are generated based on the `CI_COMMIT_REF_SLUG` predefined variable.
The actual provisioning and deployment is handled through the `preview_env.sh` script. Before invoking it, we source a `.env` file to populate some environment variables. These include Terraform variables, as well as some variables controlling app-level settings. One more thing we had to resort to here is trimming the name of the preview environment for branches with long names. This avoids the length constraints imposed on us by the cloud providers and Let's Encrypt, since we use this name for various resources too.
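A minimal sketch of such a trimming step in shell; the 24-character limit and the `PREVIEW_NAME` variable are assumptions for illustration, not from the source:

```shell
# Derive a length-limited environment name from the branch slug.
# 24 chars is an assumed cap; actual limits vary per cloud provider.
CI_COMMIT_REF_SLUG="feature-very-long-branch-name-with-many-words"
PREVIEW_NAME="${CI_COMMIT_REF_SLUG:0:24}"
# Strip a trailing hyphen if truncation happened to leave one.
PREVIEW_NAME="${PREVIEW_NAME%-}"
echo "$PREVIEW_NAME"   # -> feature-very-long-branch
```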
When provisioning/deploying, the `preview_env.sh` script first checks whether a Terraform workspace for the given branch already exists:

```bash
workspace_exists=0
terraform workspace select "$CI_COMMIT_REF_SLUG" && workspace_exists=1
```
If the workspace doesn't exist, it means that the environment has not been provisioned, so the job then executes:
```bash
terraform workspace new "$CI_COMMIT_REF_SLUG"
terraform apply -auto-approve
source ./first_deploy.sh
```
Terraform then creates the required resources. This includes the VPC, firewalls, subnetwork, virtual machines and disks, as well as Cloudflare domain records and GitLab project variables (for per-environment secrets, e.g. SSH keys).
In the case of preview environments, we always deploy a two-node cluster. The main VM hosts our DBs and the backend services, while the lighter auxiliary VM hosts analytics and monitoring services.
`first_deploy.sh` subsequently takes care of a number of other tasks:
- generating a deploy artifact
- uploading and executing it
- spinning up all our containerized services in the cluster reliably (with the help of Nomad)
- registering admin/test users
- renewing certificates using Let's Encrypt
If the workspace does exist, then the job has been triggered on a pre-existing preview environment. This means we only need to upgrade our app to pick up the latest changes for the given branch.
The complete initial setup takes about 15 minutes. A redeploy (if all services are updated) takes about 10.
Finally, in either case we use chatops to distribute/persist some environment-specific user details (e.g. URL, user credentials, etc.) in a Markdown-formatted message. If a merge request is open, the job posts a note on it with the details. If not, the job pushes them directly to a corresponding Mattermost channel through a hook.
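As an illustrative sketch of the webhook path (not the actual script — the variable names and message contents are made up), building the Markdown payload and posting it to a Mattermost incoming webhook could look like this:

```shell
# Hypothetical chatops step: assemble a Markdown message with the
# environment details and post it to a Mattermost incoming webhook.
PREVIEW_URL="https://www.my-branch.splitgraph.io"
MESSAGE="#### Preview environment ready\n* URL: ${PREVIEW_URL}\n* User: \`admin\`"
PAYLOAD=$(printf '{"text": "%s"}' "$MESSAGE")
echo "$PAYLOAD"
# Posting is commented out here; MATTERMOST_HOOK_URL would come from CI secrets:
# curl -s -X POST -H 'Content-Type: application/json' -d "$PAYLOAD" "$MATTERMOST_HOOK_URL"
```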
There is one more job, called `terminate_preview`, which is used for de-provisioning the environment. Besides being executable manually, note that it is specified as the `environment:on_stop` reference of the `deploy_preview` job. This means that once the `environment:auto_stop_in` period since the last deploy job run expires, the job gets triggered automatically by GitLab.
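The post doesn't show the `terminate_preview` job spec itself; assuming it mirrors `deploy_preview`, a sketch might look like the following (the `destroy` script argument and the exact rules are assumptions, while `environment:action: stop` is what GitLab requires of an `on_stop` job):

```yaml
terminate_preview:
  image: registry.gitlab.com/splitgraph/splitgraph-cloud/cd-environment:development
  stage: preview
  rules:
    - if: $CI_PIPELINE_SOURCE == "push" && $CI_COMMIT_BRANCH != $CI_DEFAULT_BRANCH
      when: manual
      allow_failure: true
  environment:
    name: preview/$CI_COMMIT_REF_SLUG
    action: stop
  script:
    - source "$CI_PROJECT_DIR"/.ci/envs/preview.env
    - $CI_PROJECT_DIR/.ci/preview_env.sh destroy
```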
It sources the same `preview.env` file and then runs:

```bash
terraform destroy -auto-approve
terraform workspace select default
terraform workspace delete "$CI_COMMIT_REF_SLUG"
```
thereby ensuring that all resources are cleaned up, including the Terraform workspace.
In summary, we use preview environments for smoke testing and rapid iteration in an on-demand, fully fledged, production-grade Splitgraph instance. The infrastructure provisioning and application deployment in this case exercise the same code that we run for our customer environments and production releases. For this reason, we view preview environments as an extension of our integration tests and, in fact, are planning to start using them for automated end-to-end tests.