Open source repositories tagged with #terminal-bench, ranked by health score.
Harbor is a framework for running agent evaluations and for creating and using reinforcement learning (RL) environments.