GTA: Detecting affected dependent Go packages

Posted 2021-01-12 in Engineering

Today we are announcing the open sourcing of gta, which we use to understand the downstream dependencies of Go packages changed in pull requests to our monorepo, cthulhu. Technically, gta stands for Go Test Auto, but a more fitting name might be Go Transitive Analysis.

In this article, we'll go through the primary use case for gta, its options, and how it can improve build times on feature branches by targeting only packages impacted by the changes on the feature branch.

Matt Layher first introduced gta in his blog article about cthulhu, where he discussed the motivation and positive impact that gta had on DigitalOcean's build times of monorepo pull requests. In short, gta compares the current branch against its merge base with the destination branch to determine what has changed on the branch. It then finds every package that depends on those changes and outputs the import paths of all the affected packages.

After experiencing slow and unreliable builds, one of our engineers, Justin Hines, set out to solve the problem once and for all. After a few hours of work, he authored a build tool called gta, designed to inspect the git history to determine which files changed between the merge base of the destination branch and a feature branch. It uses this information to determine which packages must be tested for a given build, including packages that import the changed package.

As an example, suppose a commit modifies the package do/teams/example/droplet, and that this package is imported by another package, do/teams/example/hypervisor. Gta inspects the git history and determines that both packages must be tested, even though only the first package was changed.
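The following Go sketch illustrates the core idea; it is not gta's code. It assumes a reverse import graph (each package mapped to the packages that import it) has already been built, for example from go/build's Package.Imports, and walks that graph breadth-first from the changed packages. The function affected and the package do/teams/example/scheduler are hypothetical; the latter is added only to show a transitive, two-hop dependent.

```go
package main

import (
	"fmt"
	"sort"
)

// affected returns every package in changed plus every package that
// directly or transitively imports one of them. importedBy maps an import
// path to the packages that import it (a reverse import graph).
func affected(changed []string, importedBy map[string][]string) []string {
	seen := make(map[string]bool)
	queue := append([]string(nil), changed...)
	for len(queue) > 0 {
		pkg := queue[0]
		queue = queue[1:]
		if seen[pkg] {
			continue
		}
		seen[pkg] = true
		// Every importer of an affected package is itself affected.
		queue = append(queue, importedBy[pkg]...)
	}
	out := make([]string, 0, len(seen))
	for pkg := range seen {
		out = append(out, pkg)
	}
	sort.Strings(out)
	return out
}

func main() {
	// Hypothetical reverse import graph: droplet is imported by hypervisor,
	// which in turn is imported by an illustrative scheduler package.
	importedBy := map[string][]string{
		"do/teams/example/droplet":    {"do/teams/example/hypervisor"},
		"do/teams/example/hypervisor": {"do/teams/example/scheduler"},
	}
	fmt.Println(affected([]string{"do/teams/example/droplet"}, importedBy))
	// prints [do/teams/example/droplet do/teams/example/hypervisor do/teams/example/scheduler]
}
```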

The introduction of gta into our CI build process dramatically reduced the amount of time taken by builds. When gta was introduced in early 2016, the average build time dropped from 20 minutes to 2-3 minutes! This tool is now used almost everywhere in our build pipeline, including static analysis checks, code compilation and testing, and artifact builds and deployment.

There are cases where building everything is still useful regardless of which files have actually changed. To support that use case, our build pipelines bypass gta when either the name of the branch being tested contains -force-test or the pull request has a force-test label, restoring the old default behavior of “build everything for every change.”
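A minimal sketch of that bypass check, assuming the CI system exposes the branch name and the pull request's labels as plain strings. The function name bypassGTA is hypothetical, and this is not DigitalOcean's pipeline code.

```go
package main

import (
	"fmt"
	"strings"
)

// bypassGTA reports whether a CI run should skip gta and build everything:
// either the branch name contains "-force-test" or the pull request carries
// the "force-test" label.
func bypassGTA(branch string, labels []string) bool {
	if strings.Contains(branch, "-force-test") {
		return true
	}
	for _, l := range labels {
		if l == "force-test" {
			return true
		}
	}
	return false
}

func main() {
	fmt.Println(bypassGTA("fix-droplet-force-test", nil))         // true
	fmt.Println(bypassGTA("fix-droplet", []string{"force-test"})) // true
	fmt.Println(bypassGTA("fix-droplet", []string{"bug"}))        // false
}
```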

input

Gta needs a list of changed files. In the usual case, gta determines which files have changed by running git diff --name-only --no-renames. It also supports a --changed-files flag that accepts a file containing a newline-separated list of absolute paths of changed files, for cases where the file list needs to be prefiltered.

analysis

The --tags flag is a comma-separated list of build tags to consider satisfied while analyzing the changes. You will be familiar with this if you rely on build constraints to build your packages.
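To illustrate what treating a build tag as satisfied means, this sketch uses the standard library's go/build/constraint package to evaluate a //go:build line against an explicit set of tags. It is not how gta is implemented, and satisfied is a hypothetical helper. Note that a real build context also treats tags such as the current GOOS and GOARCH as satisfied, which this sketch ignores.

```go
package main

import (
	"fmt"
	"go/build/constraint"
)

// satisfied reports whether a //go:build line holds when exactly the given
// tags are considered satisfied, mirroring the idea behind gta's --tags.
func satisfied(line string, tags []string) (bool, error) {
	expr, err := constraint.Parse(line)
	if err != nil {
		return false, err
	}
	set := make(map[string]bool, len(tags))
	for _, t := range tags {
		set[t] = true
	}
	return expr.Eval(func(tag string) bool { return set[tag] }), nil
}

func main() {
	ok, _ := satisfied("//go:build integration && !race", []string{"integration"})
	fmt.Println(ok) // true
	ok, _ = satisfied("//go:build integration", nil)
	fmt.Println(ok) // false
}
```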

output

The --include flag is used as a filter to control which packages are output. Its value is a comma-separated list of import path prefixes; an affected package is output only if its import path begins with one of them. Go developers will be familiar with this concept: gta essentially appends ... to each entry in the list. A value of net/ would cause gta to output any affected package whose import path begins with net/ (e.g. net/http, net/http/httputil, or net/url).
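A sketch of the prefix filtering that --include describes, assuming plain string-prefix matching on import paths. include is a hypothetical helper, empty entries in the list are ignored, and the sketch does not model what gta does when --include is unset.

```go
package main

import (
	"fmt"
	"strings"
)

// include keeps only the import paths that begin with at least one of the
// comma-separated prefixes, preserving the input order.
func include(paths []string, prefixes string) []string {
	var out []string
	list := strings.Split(prefixes, ",")
	for _, p := range paths {
		for _, prefix := range list {
			if prefix != "" && strings.HasPrefix(p, prefix) {
				out = append(out, p)
				break
			}
		}
	}
	return out
}

func main() {
	// "netip" does not match "net/" because the slash is part of the prefix.
	affected := []string{"net/http", "net/url", "os/exec", "netip"}
	fmt.Println(include(affected, "net/")) // [net/http net/url]
}
```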

Two flags, --buildable-only and --json, control the output format. The former, --buildable-only, is a boolean flag that cannot be enabled when --json is enabled. Because --buildable-only is on by default, it must be explicitly set to false when --json is used.

The --buildable-only flag causes gta to output a newline-separated list of buildable packages that were affected. This flag will elide any packages that were fully deleted or that were fully excluded by build constraints (i.e. --tags). The latter, --json, outputs a JSON object that fully describes the changes, including deleted packages.
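Because the --buildable-only output is a plain newline-separated list, it composes easily with go test. This sketch turns that output into arguments for go test and returns nil when nothing was affected, so a caller can skip the test run entirely or otherwise pass the result to exec.Command("go", args...). goTestArgs is a hypothetical helper, and the sample output is made up.

```go
package main

import (
	"fmt"
	"strings"
)

// goTestArgs turns gta's newline-separated --buildable-only output into the
// argument list for `go test`. It returns nil when no packages were affected.
func goTestArgs(gtaOutput string) []string {
	// Import paths contain no whitespace, so Fields splits them cleanly
	// and drops any blank lines.
	pkgs := strings.Fields(gtaOutput)
	if len(pkgs) == 0 {
		return nil
	}
	return append([]string{"test"}, pkgs...)
}

func main() {
	out := "do/teams/example/droplet\ndo/teams/example/hypervisor\n"
	fmt.Println(goTestArgs(out)) // [test do/teams/example/droplet do/teams/example/hypervisor]
}
```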

When --json is used, the output will be a JSON object with three properties: dependencies, changes, and all_changes. When piped to jq, gta's JSON output can be transformed as needed.

The dependencies property is a JSON object whose keys are the import paths of packages that have changed. Each key's value is a JSON array of strings whose values are import paths of the packages dependent on the package identified in the key. The changes property is a JSON array of strings whose values are the import paths of the packages that have changed. The final property, all_changes, is a JSON array of strings whose values are the import paths of all packages affected by changes.

The --merge and --base flags are used to control the left-hand side of the git diff operation. The former, --merge, causes gta to use the most recent merge commit on the current branch as the left-hand side. The latter, --base, causes gta to use the provided git revision as the left-hand side.
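The sketch below shows one way a tool could resolve the left-hand side for each mode using ordinary git commands, then build the diff invocation. It is an illustration under assumptions, not gta's implementation: lhsCommand and diffArgs are hypothetical, the destination branch (origin/master in the usage) is a placeholder, and giving an explicit base precedence over the merge mode is a choice made only for this sketch.

```go
package main

import "fmt"

// lhsCommand returns the git arguments a tool could run to pick the
// left-hand side of the diff:
//   - base != "": verify and use that revision (like --base)
//   - merge:      most recent merge commit reachable from HEAD (like --merge)
//   - otherwise:  merge base of the destination branch and HEAD
func lhsCommand(base string, merge bool, destination string) []string {
	switch {
	case base != "":
		return []string{"rev-parse", "--verify", base}
	case merge:
		return []string{"rev-list", "--merges", "-n", "1", "HEAD"}
	default:
		return []string{"merge-base", destination, "HEAD"}
	}
}

// diffArgs lists files changed between lhs and the working tree,
// reporting renames as a deletion plus an addition.
func diffArgs(lhs string) []string {
	return []string{"diff", "--name-only", "--no-renames", lhs}
}

func main() {
	fmt.Println(lhsCommand("", false, "origin/master")) // [merge-base origin/master HEAD]
	fmt.Println(lhsCommand("", true, "origin/master"))  // [rev-list --merges -n 1 HEAD]
	fmt.Println(diffArgs("abc123"))                     // [diff --name-only --no-renames abc123]
}
```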

gotchas

Gta assumes that the source control system is git, and it is unlikely that other systems will be supported. However, the --changed-files flag can be used to supply the list of files to inspect, skipping gta's git operations entirely.

Gta considers a package to have changed even when none of the changed files in its directory are Go files; as long as the directory contains a valid Go package, gta treats that package as changed. This is intentional: it helps ensure that when tests use those files, or go generate needs to be run, the tests or build scripts are informed of the package change. We believe the tradeoff of sometimes being overly aggressive is worth the practical guarantee it provides.

Gta does not report a package as having changed if files in its testdata directory have changed. For consistency with how non-Go files in a package directory are handled, we are reconsidering how changed files in a testdata directory should affect gta's output.
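The two rules above (non-Go files still mark their package as changed, while testdata changes currently do not) can be sketched as follows. changedPackages and inTestdata are hypothetical helpers, the paths are made up, and the set of directories containing a valid Go package is taken as an input rather than computed.

```go
package main

import (
	"fmt"
	"path/filepath"
	"sort"
	"strings"
)

// changedPackages maps changed files to the package directories they belong
// to. Any file, Go or not, in a directory holding a Go package marks that
// package as changed; files under a testdata directory are ignored.
// pkgDirs is the set of directories known to contain a valid Go package.
func changedPackages(files []string, pkgDirs map[string]bool) []string {
	seen := make(map[string]bool)
	for _, f := range files {
		dir := filepath.Dir(f)
		if inTestdata(dir) {
			continue
		}
		if pkgDirs[dir] {
			seen[dir] = true
		}
	}
	out := make([]string, 0, len(seen))
	for d := range seen {
		out = append(out, d)
	}
	sort.Strings(out)
	return out
}

// inTestdata reports whether dir is, or is nested inside, a testdata directory.
func inTestdata(dir string) bool {
	for _, part := range strings.Split(filepath.ToSlash(dir), "/") {
		if part == "testdata" {
			return true
		}
	}
	return false
}

func main() {
	pkgDirs := map[string]bool{"/src/droplet": true}
	files := []string{
		"/src/droplet/README.md",        // non-Go file: package still counts as changed
		"/src/droplet/testdata/in.json", // testdata: not reported
		"/src/docs/guide.md",            // no Go package in this directory
	}
	fmt.Println(changedPackages(files, pkgDirs)) // [/src/droplet]
}
```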

To get the full benefit of using gta to reduce build times, it is important to structure your Go packages efficiently. When possible, put interface definitions in a separate package from implementations, program against the interfaces, and reference the implementations of those interfaces in main packages. This is not always practical or desirable; the important thing is to design your package layout thoughtfully and be aware that some package changes will necessarily affect a large number of dependents.

Conclusion

DigitalOcean was able to dramatically reduce the time required to build and test a pull request while still ensuring complete analysis and testing by focusing only on the packages that are affected by the changes in the pull request. Thanks to Go's excellent support for static analysis, gta is able to determine which packages are affected by those changed packages with a high degree of confidence. We hope gta will be able to streamline your builds, too.

Billie Cleek is a Staff Engineer in the PaaS group where he supports teams building DigitalOcean's PaaS product line and internal tools to provide a consistent deployment surface for DigitalOcean's microservices. In his spare time, Billie is the maintainer of vim-go and an infrequent contributor to other open source projects, and he can be found working on his 100-year-old house, sailing, or in the forests of the Pacific Northwest regardless of the weather. You may also find Billie on GitHub and Twitter.