StaleBots are fine, NudgeBots are better

I don’t know how we got here, but the idea of a GitHub StaleBot has somehow become a controversial one over the past few years.

This might be a vocal minority, as most of the devs I talk to day-to-day don’t appear to care one iota about how projects triage their tickets. The ones that do care seem to REALLY care, though.

What even is a StaleBot?

The default GitHub stale action will iterate through the open issues and pull requests in a repository and label them as stale after 60 days, then close them after 7 more days if there are no comments/updates.

It's a very well documented and configurable action with around 55 configuration options.

Rather than running when a specific repository event occurs, this is the sort of action that will likely run daily using the schedule event. Each run is limited to a fixed number of operations, so if you have even a mild backlog, the action will take a few days to work through it before starting again from the beginning. The action only became stateful in version 9.x, about 2 years ago.
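
As a rough back-of-the-envelope sketch of that "few days" claim: the helper below estimates how many days a scheduled run needs to visit every open item once. The function name and the `operations_per_item` parameter are hypothetical. The real action counts API operations rather than issues, so the per-item cost is an assumption you would have to measure for your own repository.

```python
def days_to_cover_backlog(
    open_items: int,
    operations_per_run: int,
    operations_per_item: int = 1,  # assumption: real cost per issue varies
    runs_per_day: int = 1,
) -> int:
    """Estimate the days needed for a scheduled run to visit every open item once."""
    if open_items < 0:
        raise ValueError("open_items must not be negative")
    if operations_per_run <= 0 or operations_per_item <= 0 or runs_per_day <= 0:
        raise ValueError("operation and run counts must be positive")

    # How many items fit into one run's operation budget.
    items_per_run = operations_per_run // operations_per_item
    if items_per_run == 0:
        raise ValueError("a single item costs more than one run's budget")

    # Integer ceiling division, to avoid floating point rounding.
    runs_needed = -(-open_items // items_per_run)
    return -(-runs_needed // runs_per_day)


if __name__ == "__main__":
    # 1100 open issues and 100 operations per run are the figures used in this post.
    print(days_to_cover_backlog(1100, 100))
    print(days_to_cover_backlog(1100, 100, operations_per_item=3))
```

The point is only that the pass length scales with the backlog divided by the per-run budget, so a large backlog with a small budget can take weeks to cover.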

I literally don’t care

I’m solidly in the group of devs who don’t particularly care how other projects/teams manage their issue backlogs, especially if I’m not contributing my time or money to their project. When there is something about a project that I don't like - a project that I'm basically freeloading on, mind you - I’ll either fork it and fix it, look for an alternative, or build the subset I need myself. That’s not always possible - but I’d wager it should be more common than it is.

And just to get ahead of the:

"Well, are you going to make a new browser, or a new language, or a new OS?"

Or other strawman examples... The philosophy of "find another or build my own" is for the 80% of random used-in-one-place dependencies or micro-libraries which somehow creep into a project over the years, not the 20% of foundational bedrock. Am I going to swap out the Tokio runtime or re-write the Linux kernel? Obviously not... Will I drop a procedural macro library that was used in one debugging message? Of course. Will I vendor left-pad? ... Well, I wouldn't ever pull this into my code intentionally, but I think my point is made.

It's always Microsoft

As with everything, there are caveats with what I just wrote. But, instead of listing the caveats directly, I’ll give the example of what I consider to be the worst stale-bot around. Unsurprisingly, it belongs to Microsoft. Because of course it does.

OfficeJS is an extension framework that allows developers to create plugins for Microsoft Office products. It’s a neat idea, and although it was very rough around the edges when I was using it, it was still pretty reasonable overall. The Microsoft maintainers were responsive to issues, but not responsive with fixes. I opened a handful of issues: some were resolved, some were just abandoned, and some were maybe fixed or maybe closed as not planned - hard to tell.

The stalebot, though, that was a piece of work. More accurately, stalebot(s), as they went through several names over the years. Here is the message you get today when your issue is marked stale:

This issue has been automatically marked as stale because it is marked as needing author feedback but has not had any activity for 4 days. It will be closed if no further activity occurs within 3 days of this comment. Thank you for your interest in Office Add-ins!

That’s right. 4 days to reply, 3 more days before it’s closed. So, never go on vacation with an open ticket...

But hold on, this is the microsoft-github-policy-service bot. I've never interacted with that. Back in my day it had a different name. They must have updated this policy... Let me check out one of my issues from 4 years ago:

This issue has been automatically marked as stale because it is marked as needing author feedback but has not had any activity for 4 days. It will be closed if no further activity occurs within 3 days of this comment. Thank you for your interest in Office Add-ins!

Oh damn, deleted bot (ghost) but exact same stale comment.

The first stalebot comment I could find in that repo goes back to August 2019 with the exact same message. I'd bet Microsoft has even earlier instances of it in other repositories but I don't care enough to look.

To be completely fair, this stalebot is strictly an annoyance, since you can bump or re-open issues. But it is the prototypical example I give of how not to make a stalebot. In fact, I think it's the ONLY example I personally have of a bad stalebot. Here's why:

  • I am a paying customer of the product/service (MS Office)
  • The product/services (MS Office and the OfficeJS internal bindings) are closed source so I can’t fork, I can't vendor, I can’t PR
  • The maintainers are employees/contractors paid by Microsoft to work on this project
  • There can be months/years before a fix is issued, and then more months/years before a fix makes it into my hands due to release cycles
  • 4 day stale + 3 day close is incredibly aggressive, given the fix and release timeline

If any of those bullet points change in the future, it would prompt me to re-consider my position. But I don't anticipate any of that ever changing.

NudgeBot

Any of this is relevant because I recently added similar functionality to Pants. Even though I don’t think stalebots are inherently bad, I decided to ease into the underlying problem: spare-time volunteers triaging 1100 open tickets spanning 8 years and a major re-write.

So, instead of a "stale bot", I decided to build something closer to a "nudge bot". It entirely ignores pull requests (which are a more tractable problem). For issues, it does not automatically close anything. Instead, it applies a stale label and a comment asking the author for action if there hasn't been activity for over a year, AND if the issue is not already labeled as a bug.

This issue has been open for over one year without activity and is not labeled as a bug. It has been labeled as stale to invite any further updates. If you can confirm whether the issue is still applicable to the latest version of Pants, appears in other contexts, or its priority has changed, please let us know. Please feel free to close this issue if it is no longer relevant.

I “borrowed” the idea of a drawn-out stale duration from OCaml.

NudgeBot's GitHub Action

# https://github.com/actions/stale

name: Label inactive issues
on:
  schedule:
    - cron: "42 4 * * *"
  workflow_dispatch:

jobs:
  label-inactive-issues:
    runs-on: ubuntu-latest
    if: github.repository_owner == 'pantsbuild'
    permissions:
      issues: write
    steps:
      - uses: actions/stale@v10
        with:
          days-before-issue-stale: 366
          exempt-issue-labels: 'bug'

          stale-issue-label: 'stale'
          stale-issue-message: >
            This issue has been open for over one year without activity and is not labeled as a bug.
            It has been labeled as stale to invite any further updates.
            If you can confirm whether the issue is still applicable to the latest version of Pants,
            appears in other contexts, or its priority has changed, please let us know.
            Please feel free to close this issue if it is no longer relevant.

          # Don't close Issues
          days-before-issue-close: -1

          # Don't touch PRs
          days-before-pr-stale: -1
          days-before-pr-close: -1

          operations-per-run: 100
          repo-token: ${{ secrets.GITHUB_TOKEN }}
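
Read as plain logic, the configuration above boils down to a single yes/no decision per item. The following is a minimal sketch of that decision, not the action's actual implementation: the `Item` type and the `should_nudge` function are hypothetical names, and the exact boundary handling (for example, whether the threshold is inclusive) is an assumption.

```python
from dataclasses import dataclass, field
from datetime import datetime, timedelta, timezone


@dataclass
class Item:
    """Hypothetical, stripped-down view of an issue or pull request."""
    is_pull_request: bool
    last_activity: datetime
    labels: set[str] = field(default_factory=set)


def should_nudge(
    item: Item,
    now: datetime,
    stale_after_days: int = 366,
    exempt_labels: frozenset[str] = frozenset({"bug"}),
    stale_label: str = "stale",
) -> bool:
    """Return True if the item should get the stale label and the nudge comment."""
    if item.is_pull_request:
        return False  # pull requests are ignored entirely
    if item.labels & exempt_labels:
        return False  # issues labeled as bugs are never nudged
    if stale_label in item.labels:
        return False  # already nudged, don't comment again
    # Assumption: the threshold is inclusive.
    return now - item.last_activity >= timedelta(days=stale_after_days)


if __name__ == "__main__":
    now = datetime(2025, 1, 1, tzinfo=timezone.utc)
    old = now - timedelta(days=400)
    print(should_nudge(Item(False, old), now))           # quiet, unlabeled issue
    print(should_nudge(Item(False, old, {"bug"}), now))  # exempt because of the bug label
```

Note that nothing here ever closes an item, which is the whole difference between a nudge and the usual stale-then-close flow.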

Results so far

Because we have so many open issues and I’m slow-rolling the labeling, only a few hundred issues have been processed so far. Of those, 97 have been marked stale, 5 have been closed, and about 8 have since been updated.

We’ve only processed about 2 years of issues, out of 8 years total. The closed tickets appear to be ones that were previously completed but their associated tickets weren’t closed off.

A nice benefit, for me personally, is that I’m reminded of issues that I’ve been involved with that I’ve just forgotten over time. Not just ones I’ve created, but ones I’ve been interested in or an active contributor to. Could I do this by looking for open tickets where I'm a commenter? Yes - for sure. Will I? Probably not. That link was the first time I've ever done it.
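
For anyone who does want to run that search, here is a small sketch that builds the query URL. It relies on GitHub's `commenter:` issue search qualifier. The function name is hypothetical, and the repository and username in the example are placeholders chosen for illustration, so swap in your own.

```python
from urllib.parse import urlencode


def commenter_search_url(repo: str, username: str) -> str:
    """Build a GitHub search URL for open issues that `username` has commented on."""
    query = f"repo:{repo} is:issue is:open commenter:{username}"
    return "https://github.com/search?" + urlencode({"q": query, "type": "issues"})


if __name__ == "__main__":
    # Placeholder values: adjust the repository and username for your own case.
    print(commenter_search_url("pantsbuild/pants", "octocat"))
```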

Next steps

I’m going to let this NudgeBot run for a few months to see how the tickets naturally shake themselves out, and I’ve been periodically reviewing the stale tickets to see if there is anything we’ve already completed. I anticipate that when we start hitting tickets older than 4-5 years, there will be more attrition - since the code has gone through so much churn, re-factoring, and some major changes in that time.

I was going to add more automation to check whether the next action is on the maintainers or the author, but just using who "last commented" is a sledgehammer, as it doesn't take the content of the comment into account. Since we're not currently auto-closing anything and eventually someone will triage stale tickets, I don't think it's worth the brainpower to update the automation yet.
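
To make the sledgehammer concrete, here is a minimal sketch of the "last commenter" heuristic. The `Comment` type, the `next_action_on` function, and the usernames are all hypothetical, and this is not something that currently runs against the repository.

```python
from dataclasses import dataclass


@dataclass
class Comment:
    author: str
    body: str


def next_action_on(maintainers: set[str], comments: list[Comment]) -> str:
    """Guess who owes the next reply, looking only at who commented last."""
    if not comments:
        return "maintainers"  # nobody has responded to the issue yet
    if comments[-1].author in maintainers:
        return "author"  # a maintainer spoke last, so assume the ball is with the author
    return "maintainers"


if __name__ == "__main__":
    maintainers = {"maint"}
    # A maintainer asked a question: the guess of "author" is right.
    print(next_action_on(maintainers, [Comment("maint", "Which version are you on?")]))
    # A maintainer promised a fix: the guess is still "author", which is wrong.
    print(next_action_on(maintainers, [Comment("maint", "I'll pick this up next week")]))
```

Both calls return the same answer because the body of the comment is never inspected, which is exactly the weakness.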

My off-the-cuff guess is that the total number of actionable, valid tickets is closer to the 500-700 range (at most). That’s still a lot of tickets, but it would also mean that roughly 35-55% of the 1100 open tickets are open just for the sake of being open.