The tricky thing with git bisect run is that you usually have to have your test script not be part of your git checkout for the process to work.
This is because of the particular situation that bisect is valuable in; you find a regression that doesn’t have a test, and you aren’t sure when it was committed.
First, it has to be a regression; if it is just a bug, then there was no previous version that didn’t have the bug, it just hadn’t been found until now. It has to be something that worked before and now doesn’t.
Second, it has to not have had a test before the regression was found. If you had a test for it already, it would have been found by CI as soon as it was committed, and you wouldn’t need to figure out which commit broke it.
So if you try to add a test to the normal test suite of the code and commit it, git bisect run is not going to work; as soon as you check out the older code, your new test won’t be there and the tests will pass because your new test of the breakage doesn’t run. You have to have the new test persist across git checkouts. This is not trivial, because you can’t just exclude your test files from being updated by git bisect, since other tests will also be changing through versions. You need to have your tests always include some non-version controlled file, and you need to have added that include PRIOR to your last known good version.
The only other use case would be if you are making a lot of commits without running tests, so you actually could break a pre-existing test and not know which commit broke it. If that is your situation, you should probably change your workflow to test every commit instead of trying to get git bisect to work.
For these reasons, I have never found ‘git bisect run’ to be as valuable as it seemed when I first learned about it.
As long as your test suite just finds and runs all files in a given folder (without needing to explicitly "enable" them in some index file), this should work:
- create a NEW test file in `some/path/to/test.ext` (and back it up outside repo just in case)
- do NOT commit it in the repo
- `git bisect`
That way, bisect would check out different commits, but without touching `some/path/to/test.ext` because it's not tracked by git.
It can also be helpful to make git hide some file changes from diff/status:

`git update-index --assume-unchanged path/to/some/file`

to trick git into thinking the file didn't change. (Although when you check out a commit that did modify that file, git will fail rather than proceed, and you have to resolve it manually.)
This is exactly what I do: I put my test file outside the git repo's tree. This does not introduce any complications to testing and is minimally annoying.
Another good case is for rolling back a single bad commit from a batch that got merged into main at the same time.
Doing batch merges with a merge queue can speed things up if you have a ton of long-running end-to-end and integration tests. But then if a test fails, you need to identify which commit in the batch is causing it so you don't reject the entire batch.
Now with AI test frameworks like Stagehand it's actually possible to write end-to-end tests after a bug appears that are backwards compatible, as long as changes to the DOM are not too extreme. And things like broken selectors won't be an issue.
I wrote about that here: https://qckfx.com/blog/ai-powered-stagehand-git-bisect-findi...
git bisect run can take a script, so you can easily script adding a test case (either to an existing file or a new file) and running the test.
I've always wished for a "git trisect", or a "git n-sect" in general, that can try multiple commits in parallel. The use case would be for testing changes in software that has a long, single-threaded component in the build process (e.g., a heavily overloaded configure script). For long-running projects where each bisect takes over a dozen steps, those components lead to lots of thumb-twiddling.
The magic of bisect is that you rule out half of your remaining commits every time you run it, so even if you have 1000 commits, it takes at most 10 runs. An n-sect wouldn't be that much faster, and it could be slower, because you will not always be able to rule out half your commits.
The idea is, suppose I did a trisect, splitting the range [start,end) into [start,A), [A,B), and [B,end). At each step, I test commits A and B in parallel. If both A and B are bad, I continue with [start,A). If A is good and B is bad, I continue with [A,B). If both A and B are good, I continue with [B,end).
This lets me rule out two thirds of the commits, in the same time that an ordinary bisect would have ruled out half. (I'm assuming that the tests don't benefit from having additional cores available.) In general, for an n-sect, you'd test n - 1 commits in parallel, and divide the number of remaining commits by n each time.
Yes, but I could also see the case where you have 10 commits to check and each bisect step takes 20 minutes: a binary search needs about four steps, so 80 minutes to find the problem.
Or 20 minutes, in a single round, if you had a 10-sect.
I think you'd have to hack it together on a per-project basis, but could you do it with containers? You'd have to identify the point where the build process diverges, make n copies of the container...
(If I'm understanding you correctly).
But there's not much that's faster than a binary search.
The problem isn't actually running the builds, so much as selecting the commits. Ordinary bisect gives you one commit to test at each step, and one result to report. But I want to be given n - 1 commits at each step, evenly spread out along the search range, and I want to report all n - 1 results at once, to cut the range by a factor of 1/n. If the testing process doesn't utilize all available cores, this will be faster than just testing commits one at a time.
> Because ephemeral environments are reproducible on demand (via Docker images, Kubernetes pods, or a cloud VM), you can guarantee that each bisect step sees the same conditions. This drastically reduces "works on my machine" fiascos.
Agree on this pattern for all code changes. Hard to overstate the amount of time we've saved by testing against the full prod-like environment right away. An ephemeral env implementation makes this easy and low stakes, so diving right into E2E testing a copy of your real infra isn't wildly unreasonable. However, I work for Shipyard (https://shipyard.build) so I'm a bit biased on these processes.
Very cool! This is a great example of how ephemeral environments can help for a lot more than just fast inner loops or manual verification.
TL;DR: Lazy, compute-intensive way to find which commit broke your tests... if your tests are any good.
Lazy, or just the value of human time prioritized over the value of computer time?
I'd rather use git bisect over checking a whole bunch of possibilities manually.