Dec 20, 2021 · By Marko Grujić
READING TIME: 5 min

Preview Environments: Spinning up temporary Splitgraph instances from any commit

We talk about how we use GitLab's review apps functionality to preview and test Splitgraph Cloud deployments.

Like almost everyone these days, Splitgraph has a pretty elaborate CI/CD system. It handles everything from building our Docker images to running tests to staging and production deployments. To achieve this, we use GitLab CI/CD. Through a combination of a base .gitlab-ci.yml configuration file and arbitrary scripts, it lets us perform a wide variety of DevOps actions.

One feature that recently caught our attention is what GitLab calls review apps. The concept is pretty straightforward: a CI/CD job that deploys code changes from a branch, integrated with GitLab's UI. This deployment could be a work-in-progress or a proof-of-concept. GitLab review apps let developers, designers or product owners preview and iterate on their work faster.

We decided to make use of this functionality in our own workflow. We call it preview environments, as the name is a better fit for our use case (more on that below).

Splitgraph infrastructure

Our proprietary product, Splitgraph Cloud, can be deployed to private as well as public clouds. In the latter case, we use a set of multi-cloud Terraform templates that cover the public cloud triumvirate of Azure, AWS and GCP.

Terraform allows us to write maintainable and re-usable infrastructure-as-code in a cloud-agnostic way. Thus our preview environments not only have to deal with application-level deployments, but also with provisioning the cloud infrastructure during the initial setup (as well as cleaning up on tear-down). This makes them veritable environments, in parallel to staging and production.

Terraform also allows us to persist the state of all our existing preview environments between CI/CD job runs. For this, it uses the remote backend configuration (which in our case is set to a GCS bucket).
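
As a rough illustration of how a CI job might point Terraform at such a bucket, here is a minimal sketch. It assumes the Terraform configuration declares an empty gcs backend block (a partial configuration) that is completed at init time. The function name and the bucket and prefix arguments are placeholders, not our actual values.

```shell
# Hypothetical helper: initialise Terraform against the GCS bucket that holds
# the state of every preview environment.
# Assumes the Terraform config contains an empty `backend "gcs" {}` block.
init_preview_backend() {
  bucket="$1"   # placeholder: name of the state bucket
  prefix="$2"   # placeholder: object prefix inside that bucket
  terraform init -input=false \
    -backend-config="bucket=${bucket}" \
    -backend-config="prefix=${prefix}"
}
```

With the GCS backend, each workspace gets its own state object under the given prefix, which is what lets separate CI/CD job runs find the state of an existing preview environment.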

Setup

Here's an example of the preview environment job spec in the .gitlab-ci.yml configuration file:

deploy_preview:
  image: registry.gitlab.com/splitgraph/splitgraph-cloud/cd-environment:development
  stage: preview
  rules:
  - if: $CI_PIPELINE_SOURCE == "push" && $CI_COMMIT_BRANCH != $CI_DEFAULT_BRANCH
    when: manual
    allow_failure: true
  environment:
    name: preview/$CI_COMMIT_REF_SLUG
    url: https://www.$CI_COMMIT_REF_SLUG.splitgraph.io
    deployment_tier: development
    auto_stop_in: 3 days
    on_stop: terminate_preview
  script:
  - source "$CI_PROJECT_DIR"/.ci/envs/preview.env
  - $CI_PROJECT_DIR/.ci/preview_env.sh deploy

A couple of noteworthy points:

  • We use a custom "Docker-in-Docker" image with Terraform to manage the infrastructure. We also use yq and our open source sgr client to seed all our preview environments with a whole lot of data upon the initial deploy.
  • The job can only run if the previous build and test stage succeeded.
  • The job can be run with or without an open Merge Request, and it needs to be manually triggered. It cannot run on our main branch.
  • The name and domain for the preview environment are derived from the URL-friendly CI_COMMIT_REF_SLUG predefined variable.

The actual provisioning and deployment are handled by the preview_env.sh script. Before running it, we source a .env file to populate some environment variables. These include Terraform variables, as well as some variables controlling app-level settings. One more thing we had to resort to here is trimming the name of the preview environment for branches with long names. We use this name for various resources too, so trimming it keeps us within the length constraints imposed by the cloud providers and Let's Encrypt.
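
The trimming step could look something like the sketch below. The function name and the default limit of 20 characters are assumptions made for illustration; the real limit depends on the strictest resource the name is used for.

```shell
# Hypothetical helper: shorten a branch slug so it fits resource name limits.
# The default of 20 characters is an illustrative assumption.
trim_env_name() {
  slug="$1"
  max_len="${2:-20}"
  # Keep at most max_len characters of the slug.
  trimmed="$(printf '%s' "$slug" | cut -c "1-${max_len}")"
  # Drop trailing hyphens, since a name ending in "-" is not a valid DNS label.
  while [ "${trimmed%-}" != "$trimmed" ]; do
    trimmed="${trimmed%-}"
  done
  printf '%s\n' "$trimmed"
}
```

For example, trim_env_name "$CI_COMMIT_REF_SLUG" 12 would turn feature-add-very-long-branch-name into feature-add. Note that two branches sharing a long prefix would map to the same name under this scheme.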

Deployment

When provisioning or deploying, the preview_env.sh script first checks whether a Terraform workspace for the given branch already exists:

workspace_exists=0
terraform workspace select "$CI_COMMIT_REF_SLUG" && workspace_exists=1

If the workspace doesn't exist, it means that the environment has not been provisioned, so the job then executes:

terraform workspace new "$CI_COMMIT_REF_SLUG"
terraform apply -auto-approve

source ./first_deploy.sh

Terraform then creates the required resources. These include the VPC, firewalls, subnetwork, virtual machines and disks, as well as Cloudflare domain records and GitLab project variables (for per-environment secrets, e.g. SSH keys).

For preview environments we always deploy a 2-node cluster. The main VM hosts our DBs and the backend services, while the lighter, auxiliary VM hosts analytics and monitoring services.

The script first_deploy.sh subsequently takes care of a number of other important tasks:

  • generating a deploy artifact
  • uploading and executing it
  • spinning up all our containerized services in the cluster reliably (with the help of Nomad)
  • registering admin/test users
  • renewing certificates using Let's Encrypt

If the workspace does exist, then the job has been triggered on a pre-existing preview environment. This means we only need to upgrade our app to pick up the latest changes for the given branch:

"$CI_PROJECT_DIR"/.ci/deploy/deploy.sh

The complete initial setup takes about 15 minutes. A redeploy (if all services are updated) takes about 10 minutes.
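
Putting the two branches together, the deploy path can be sketched roughly as follows. This is not our actual preview_env.sh: the function names are hypothetical, and the existence check is a variant that lists the workspaces instead of relying on the exit code of terraform workspace select, so that an unrelated failure is less likely to be mistaken for a missing workspace.

```shell
# Succeeds if the named Terraform workspace already exists.
# `terraform workspace list` prints one workspace per line and marks the
# current one with "*", so the marker and indentation are stripped first.
workspace_exists() {
  terraform workspace list | sed 's/^[* ] *//' | grep -Fxq -- "$1"
}

# Hypothetical wrappers around the two scripts mentioned in this post.
run_first_deploy() { source ./first_deploy.sh; }
run_redeploy() { "$CI_PROJECT_DIR"/.ci/deploy/deploy.sh; }

# Provision on the first run, otherwise only upgrade the app.
deploy_preview_env() {
  workspace="$1"
  if workspace_exists "$workspace"; then
    terraform workspace select "$workspace"
    run_redeploy
  else
    terraform workspace new "$workspace"
    terraform apply -auto-approve
    run_first_deploy
  fi
}
```

The job would then call deploy_preview_env "$CI_COMMIT_REF_SLUG".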

Output

Finally, in either case we use chatops to distribute and persist some environment-specific details (such as the URL and user credentials) in a Markdown-formatted message. If a merge request is open, the job posts a note on it with the details. If not, the job pushes them directly to a corresponding Mattermost channel through a hook.
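
A minimal sketch of this notification step is shown below, assuming curl and jq are available in the job image. The function name and the GITLAB_API_TOKEN and MATTERMOST_WEBHOOK_URL variables are hypothetical, while CI_API_V4_URL and CI_PROJECT_ID are GitLab predefined variables. How the job finds the open merge request is not covered here, so its IID is simply passed in as an optional argument.

```shell
# Hypothetical helper: post the environment details either as a merge
# request note (when an MR IID is given) or to a Mattermost incoming webhook.
post_env_details() {
  message="$1"      # Markdown-formatted details
  mr_iid="${2:-}"   # empty when no merge request is open

  if [ -n "$mr_iid" ]; then
    # GitLab notes API: POST .../merge_requests/:iid/notes with a "body" field.
    curl --fail --silent --show-error --request POST \
      --header "PRIVATE-TOKEN: ${GITLAB_API_TOKEN}" \
      --data-urlencode "body=${message}" \
      "${CI_API_V4_URL}/projects/${CI_PROJECT_ID}/merge_requests/${mr_iid}/notes"
  else
    # Mattermost incoming webhooks accept a JSON payload with a "text" field;
    # jq takes care of escaping quotes and newlines in the message.
    jq -cn --arg text "$message" '{text: $text}' |
      curl --fail --silent --show-error --request POST \
        --header "Content-Type: application/json" \
        --data-binary @- \
        "${MATTERMOST_WEBHOOK_URL}"
  fi
}
```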

Termination

There is one more job, called terminate_preview, which de-provisions the environment. Besides being executable manually, it is specified as the environment:on_stop reference for the deploy_preview job. This means that GitLab triggers it automatically once the environment:auto_stop_in period has elapsed since the last deploy job ran.

It sources the same preview.env and then executes $CI_PROJECT_DIR/.ci/preview_env.sh terminate:

terraform destroy -auto-approve
terraform workspace select default
terraform workspace delete "$CI_COMMIT_REF_SLUG"

thereby ensuring that all resources are cleaned up, including the Terraform workspace.

Conclusion

In summary, we use preview environments for smoke testing and rapid iteration in an on-demand, fully fledged, production-grade Splitgraph instance. The infrastructure provisioning and application deployment involved exercise the same code that we run for our customer environments and production releases. For this reason we view preview environments as an extension of integration tests and, in fact, are planning to start using them for automated end-to-end tests.
