A minimalistic test runner

Published: 2022-11-30 by Lars. Tags: tool, test, code

This is part 1 of Build your own test runner.

Registering and running individual tests

A test is a piece of code that either fails or succeeds. Here is an example (still too simple to run). It calls an add-function, which is called the code under test.

if (add(2, 2) !== 4) throw new Error();

If the add-function is implemented like this, the test should succeed:

function add(a, b) {
  return a + b;
}

A test runner needs to know which code to treat as "tests", and most test runners provide syntax to register a test and provide a descriptive name. Here we will use a registration function named it (a popular name among modern test runners):

it("should calculate the sum", () => {
  if (add(2, 2) !== 4) throw new Error();
});

Our very first primitive test runner could thus be implemented like this:

const testList = [];
function it(name, fn) {
  testList.push({ name, fn });
}
const run = () => {
  for (const test of testList) test.fn();
};
run();

Here we leverage the fact that JavaScript provides first-class functions, allowing us to keep the test functions fn in a list. In other languages without first-class functions, we might have to use reflection, if available. If all else fails we can fall back to a switch statement calling the right test function for a given test name.

Putting this code into a single run-tests.js file, we can run it and see it succeed with:

node run-tests.js

If we introduce a bug in the add function and re-run the test we will get an error message:

  if (add(2, 2) !== 4) throw new Error();
                             ^

Note that for this first step we simply put the test, the code under test, and the test runner all in the same source file. We will continue this approach for the next few sections, and then properly separate out things in the section on "Loading test files".

Failing tests

In the example in the previous section, the error message given when a test fails is not very informative: for example, it would be useful if it showed the actual return value (the result of add(2, 2)).

There is an entire class of existing tools to help with that: assertion libraries. For Node.js there is a built-in library node:assert. Other popular libraries include Chai and Unexpected. Some test runners, such as Jest and Vitest, come with their own assertion library built in.

Rewriting our test from before using node:assert looks like this:

// Strict mode, so that assert.equal compares with ===.
import assert from "node:assert/strict";

it("should calculate the sum", () => {
  assert.equal(add(2, 2), 4);
});

Assertion libraries implement a well-defined contract that allows them to work with any test runner. Basically, when the assertion is not met, they throw a specific exception, AssertionError, providing an informative message. This allows test runners to distinguish between test failures (a failed assertion) and test errors (any other exception).

To be minimalistic, we want our test runner to support existing assertion libraries, instead of adding our own.

We need to ensure that our test runner will continue to run the remaining tests, even when one or more tests are failing, which we neglected in the first primitive version above. This requires us to catch the exception thrown by a failing test, and do something with it, for example write out the message. However, test runners are used in many different environments (e.g. in a terminal, from an IDE, on a CI-server), and we want different output depending on the environment. So to avoid hard-coding a specific output format, we will supply a reporting module. Unfortunately there is no well-defined API for test runner reporting modules, so here we will come up with our own simple API.

A special form of output that all existing test runners provide is the exit code of the test runner process, where 0 indicates success to the calling process. We will follow the convention, used by some test runners, of making the exit code the number of failing tests, so it is 0 only if all tests succeed.

We can extend our test runner to catch exceptions and tell a reporter module about them by changing the run method to this:

import { AssertionError } from "node:assert";
import { inspect } from "node:util";

function run(reporter) {
  for (const test of testList) {
    const { name, fn } = test;
    try {
      fn();
      reporter({ type: "success", name });
    } catch (ex) {
      if (ex instanceof AssertionError) {
        // Test failure: an assertion was not met.
        const { message } = ex;
        reporter({ type: "failure", name, message });
      } else {
        // Test error: any other exception.
        reporter({ type: "error", name, message: inspect(ex) });
      }
    }
  }
}

A simple reporter module that just outputs messages can be added like this:

function consoleReporter(event) {
  const { type, name, message } = event;
  switch (type) {
    case "error":
      console.log("!", name, message);
      break;
    case "failure":
      console.log("x", name, message);
      break;
    case "success":
      console.log("✔", name);
      break;
  }
}

To produce the exit code, we need to count the errors and failures. We can create another reporter module for that, so we don't have to change the run method any further:

let failureCount = 0;
const failureAggregator = ({ type }) => {
  if (["failure", "error"].includes(type)) ++failureCount;
};

Since we now have 2 reporter modules, we also need a way to combine a list of reporters into a single reporter, again to keep the run method unchanged:

const combineReporters = (reporters) => (event) => {
  reporters.forEach((report) => report(event));
};
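To see the fan-out in isolation, here is a small sketch that feeds two hand-written events through a combined reporter. The nameCollector reporter and the events are made up for this illustration.

```javascript
// Same definition as above.
const combineReporters = (reporters) => (event) => {
  reporters.forEach((report) => report(event));
};

// Hypothetical reporter that just remembers the test names it has seen.
const seenNames = [];
const nameCollector = ({ name }) => seenNames.push(name);

// Same failure counter as above.
let failureCount = 0;
const failureAggregator = ({ type }) => {
  if (["failure", "error"].includes(type)) ++failureCount;
};

const reporter = combineReporters([nameCollector, failureAggregator]);

// Every reporter in the list receives every event, in list order.
reporter({ type: "success", name: "should calculate the sum" });
reporter({ type: "failure", name: "should fail", message: "4 !== 5" });

console.log(seenNames, failureCount);
```

Because the combined reporter has the same shape as a single reporter, the run method never needs to know how many reporters are attached.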

Then we need to invoke the run method with the combined reporters:

const reporter = combineReporters([consoleReporter, failureAggregator]);
run(reporter);

And finally we can exit the process with the correct exit code:

process.exit(failureCount);
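One caveat when using the failure count directly: on POSIX systems the parent process only sees the low 8 bits of the exit status, so a count of exactly 256 would wrap around to 0 and look like success. A minimal guard could look like the sketch below, where the helper name toExitCode is made up for this illustration.

```javascript
// Hypothetical helper: keep the exit code within the range 0..255.
function toExitCode(failureCount) {
  return Math.min(failureCount, 255);
}

console.log(toExitCode(0)); // 0
console.log(toExitCode(3)); // 3
console.log(toExitCode(256)); // 255

// In the runner this would replace the plain call:
// process.exit(toExitCode(failureCount));
```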

To demonstrate the error handling, let's also add a failing test:

it("should fail", () => {
  assert.equal(add(2, 2), 5);
});

Combining all this into report-failures.js we can run it (here using Bash) with:

node report-failures.js; echo "exit code is $?"

and we will get this output:

x should fail Expected values to be strictly equal:

4 !== 5

✔ should calculate the sum
exit code is 1

Waiting for asynchronous tests

Until now, we have only tested synchronous code, but we should also be able to test asynchronous code that needs to be awaited. An example of asynchronous code could be a timer which asynchronously sends a "ring" event when the specified time has expired:

import { EventEmitter } from "node:events";

function createTimer(ms) {
  const timer = new EventEmitter();
  setTimeout(() => timer.emit("ring"), ms);
  return timer;
}

A simple test that a "ring" event is actually emitted can then be written using await:

it("should eventually ring", async () => {
  const timer = createTimer(50);
  await new Promise((resolve) => timer.on("ring", resolve));
});

Now we will have to make our run method asynchronous, so it can wait for such an async test. To know whether a test is asynchronous or not, we can use the built-in isPromise function (available in Node.js as util.types.isPromise) to inspect the value returned from the test function. (The exception handling code from the previous section has been left out here for brevity.)

async function run() {
  for (const test of testList) {
    const result = test.fn();
    if (isPromise(result)) {
      await result;
    }
  }
}

And we will have to await the call of the run method:

await run();

Combining this with the code from the previous section, we get await-async.js.

Running this will succeed, and to see that the test runner actually waits for the test to complete, you can extend the timer to 5000 milliseconds instead of just 50.

Grouping tests

Having hundreds of tests in one long list quickly becomes impractical. Test runners must also allow tests to be grouped into test suites, where suites can be nested inside each other. Usually developers like to make the structure of test suites follow the structure of their code, so test suites correspond to namespaces, source code folders, classes, etc. As an example, we might want to group our previous tests into two small suites, like this:

describe("add", () => {
  it("should calculate the sum", () => {
    assert.equal(add(2, 2), 4);
  });
});

describe("createTimer", () => {
  it("should eventually ring", async () => {
    const timer = createTimer(50);
    await new Promise((resolve) => timer.on("ring", resolve));
  });
});

To implement support for this we will need to make three main changes in our test runner:

  1. Registration of tests will no longer simply build a list, but a tree structure reflecting the nested suites.
  2. The run function will become recursive, so it can traverse this tree of nested suites.
  3. When sending events to reporters, we will no longer identify a test with just its name, but with the full list of names of the test and all its parent suites.

First, when building the tree of tests, we will need to distinguish between suites and tests, and will therefore introduce a type field. We will also need to keep track of what the currentDescribe is, so that inner tests are added in the right place in the tree. To ensure that there is always only a single root node, we will create an implicit, nameless, outer "describe" that contains everything else.

const root = { type: "describe", name: "", testList: [] };
let currentDescribe = root;

function it(name, fn) {
  const it = { type: "it", name, fn };
  currentDescribe.testList.push(it);
}

function describe(name, fn) {
  const describe = { type: "describe", name, testList: [] };
  currentDescribe.testList.push(describe);
  const previousDescribe = currentDescribe;
  currentDescribe = describe;
  fn();
  currentDescribe = previousDescribe;
}
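To check that registration really produces a tree, the sketch below registers a few made-up suites, nested two levels deep, and prints an outline of the result. The outline helper and the suite names are assumptions for this illustration only.

```javascript
const root = { type: "describe", name: "", testList: [] };
let currentDescribe = root;

function it(name, fn) {
  currentDescribe.testList.push({ type: "it", name, fn });
}

function describe(name, fn) {
  const suite = { type: "describe", name, testList: [] };
  currentDescribe.testList.push(suite);
  const previousDescribe = currentDescribe;
  currentDescribe = suite;
  fn();
  currentDescribe = previousDescribe;
}

// Hypothetical suites for illustration.
describe("math", () => {
  describe("add", () => {
    it("should calculate the sum", () => {});
  });
  it("should be registered on math", () => {});
});
it("should be registered on the root", () => {});

// Hypothetical helper: one line per node, indented by depth.
function outline(node, depth = 0) {
  const line = `${"  ".repeat(depth)}${node.type} ${JSON.stringify(node.name)}`;
  const children = node.testList ?? [];
  return [line, ...children.flatMap((child) => outline(child, depth + 1))];
}

console.log(outline(root).join("\n"));
```

Note that currentDescribe is only restored if fn returns normally. A more defensive version might wrap the call in try/finally so that an exception thrown while registering one suite does not misplace the tests that follow it.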

Second, when running the tests, we will traverse down suites to find the tests to run, keeping track of the list of parent tests as we go. Note that we skip the blank name of the implicit root "describe" when sending events to the reporter. (Again, error handling has been left out here for brevity.)

async function run(reporter) {
  await runTest(reporter, root, [root]);
}

async function runTest(reporter, test, parentTests) {
  const { type } = test;
  switch (type) {
    case "describe": {
      const { testList } = test;
      for (const childTest of testList) {
        await runTest(reporter, childTest, [...parentTests, childTest]);
      }
      break;
    }
    case "it": {
      const { fn } = test;
      const names = parentTests.slice(1).map(({ name }) => name);
      const result = fn();
      if (isPromise(result)) {
        await result;
      }
      reporter({ type: "success", names });
      break;
    }
  }
}

Finally, our console reporter should now simply concatenate the list of names, like this:

function consoleReporter(event) {
  const { type, names, message } = event;
  const fullName = names.join(" - ");
  switch (type) {
    case "error":
      console.log("!", fullName, message);
      break;
    case "failure":
      console.log("x", fullName, message);
      break;
    case "success":
      console.log("✔", fullName);
      break;
  }
}

Combining all these changes with the code from the previous section, we get group-suites.js, and when running it we get this output:

✔ add - should calculate the sum
✔ createTimer - should eventually ring

Setting up and tearing down

This section is yet to be written. An example implementation can be seen here.

Loading test files

Until now we have kept our test runner code in the same file as our test code and the code under test. This is obviously not going to work for a real test runner. We want to be able to keep test code in test files separate from the code under test, and especially to have the test runner as a completely separate code base.

So an important feature of a test runner is that it can load test files and code under test from other files. We will also want to let the user specify which test files to load, so that different runs of the test runner can load different test files.

Until now, by keeping everything in a single file, we have been able to ignore an issue of circular dependencies: our tests depend on the test runner (for the definition of describe and it) and the test runner depends on the tests (otherwise it would have nothing to run). Now we need to solve that, by extracting the definition of describe and it and the root of the data structure we build. Then both the test files and the test runner can import these definitions, and we no longer have a circular dependency. We will put the definitions into index.js, and no changes are needed to the definitions themselves.

We will then extract our tests and the code under test into 4 proper files:

To tell the test runner which test files to load, we will let the user provide the names of the test files on the command-line, which means that we can get an array of test file paths by adding this line to the test runner:

const testFiles = process.argv.slice(2);

To actually load the files we can leverage asynchronous module import in Node.js:

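import { pathToFileURL } from "node:url";

// import() needs a module specifier, so convert each path to a file: URL.
for (const path of testFiles) {
  await import(pathToFileURL(path).href);
}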
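The reason for going through pathToFileURL is that import() expects a module specifier rather than a file system path: a bare name like calc.test.js would be looked up as a package, and a relative specifier is resolved against the importing module, not the current working directory. The sketch below, using made-up file names in place of real command-line arguments, shows that pathToFileURL resolves a relative path against the current working directory and produces an absolute file: URL.

```javascript
import { resolve } from "node:path";
import { pathToFileURL } from "node:url";

// Hypothetical arguments, standing in for process.argv.slice(2).
const testFiles = ["calc.test.js", "tests/timer.test.js"];

const urls = testFiles.map((path) => pathToFileURL(path).href);
for (const [index, path] of testFiles.entries()) {
  console.log(`${path} -> ${urls[index]}`);
}

// A relative path gives the same URL as its absolute counterpart.
const sameAsAbsolute = testFiles.every(
  (path, index) => urls[index] === pathToFileURL(resolve(path)).href
);
console.log(sameAsAbsolute);
```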

Combining these simple additions with the code from the previous section, we get load-files.js, and we can run it with the names of our 2 test files like this:

node load-files.js calc.test.js timer.test.js

Running tests in parallel

Until now, our run function will run a single test at a time, waiting for it to complete before running the next test. On modern computers with multiple CPUs this is not very time efficient. We want to be able to run the tests in parallel, potentially speeding up a full test run by a factor close to the number of CPUs.

This implementation will be slightly more involved compared to the features we implemented above. We will use Node.js worker threads to run each test file in its own thread. In a simple attempt to maximize CPU utilization, we will use the p-queue library to limit the number of concurrent threads to the number of CPUs on the machine, as reported by os.cpus().

To use worker threads, we need a worker.js file to use as the entry point for each thread. The main program can send data to the worker thread via the built-in workerData variable. The thread can send back events to the main program via the built-in postMessage() method and we will utilize our Reporter interface to create a reporter to do just that. We will move the run method into worker.js and call it like this, after using the asynchronous module import to load the test code into the worker thread:

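import { parentPort, workerData } from "node:worker_threads";
import { pathToFileURL } from "node:url";

const path = workerData;
await import(pathToFileURL(path).href);
// Forward every reporter event to the main program.
const postReporter = (event) => parentPort?.postMessage(event);
await run(postReporter);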

To create a worker thread for each test file, we need to do a number of things: figure out the number of CPUs and create a queue to throttle the number of running threads; then for each test file, create a worker thread for that file and forward any events from it to the actual reporters; and finally wait for all the worker threads to complete their work.

We can get the number of CPUs and a queue that limits the number of running threads with this:

import os from "node:os";
import PQueue from "p-queue";
const concurrency = os.cpus().length;
const queue = new PQueue({ concurrency });
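To illustrate what such a queue does for us, here is a dependency-free stand-in. It is a simplified sketch written for this explanation and is not how p-queue is implemented; the names createQueue and sleepTask are made up. Like the queue above, its add method takes a function returning a promise and never runs more than the configured number of them at once.

```javascript
// Hypothetical stand-in for a concurrency-limited queue; not p-queue's code.
function createQueue(concurrency) {
  let running = 0;
  const waiting = [];

  const next = () => {
    if (running >= concurrency || waiting.length === 0) return;
    const { task, resolve, reject } = waiting.shift();
    running++;
    // Starting from a resolved promise turns a synchronous throw into a rejection.
    Promise.resolve()
      .then(task)
      .then(resolve, reject)
      .finally(() => {
        running--;
        next(); // a slot is free, so start the next waiting task
      });
  };

  return {
    add(task) {
      return new Promise((resolve, reject) => {
        waiting.push({ task, resolve, reject });
        next();
      });
    },
  };
}

// Track how many tasks overlap, to check that the limit is respected.
let active = 0;
let maxActive = 0;
const sleepTask = (id) => async () => {
  active++;
  maxActive = Math.max(maxActive, active);
  await new Promise((resolve) => setTimeout(resolve, 20));
  active--;
  return id;
};

const queue = createQueue(2);
const done = Promise.all(
  [1, 2, 3, 4, 5].map((id) => queue.add(sleepTask(id)))
).then((results) => {
  console.log(results, maxActive);
  return { results, maxActive };
});
```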

To create a worker, we need a URL to the worker.js file:

const workerUrl = new URL("./worker.js", import.meta.url);

To loop over all the test files and get a list of promises to wait for, we can do this:

const threads = [];
for (const path of testFiles) {
  const thread = queue.add(/* create worker for "path", see code below */);
  threads.push(thread);
}
await Promise.all(threads);

To create a worker to process a specified test file path and forward the posted events to the actual reporters, we will create an instance of the built-in Worker class, and pass in the path of the test file as its workerData. The Worker class implements the EventEmitter interface to send error and exit events as well as the custom message event that we use for our own reporter events. To be able to wait for the worker to complete, either successfully or with an error, we return a promise. The promise will be resolved when the worker exits with code 0, and the promise will be rejected when the worker fails either through an error event or through a non-zero exit code.

async () => {
  const workerData = path;
  const worker = new Worker(workerUrl, { workerData });
  return new Promise((resolve, reject) => {
    worker.on("message", reporter);
    worker.on("error", reject);
    worker.on("exit", (code) => {
      if (code === 0) {
        resolve();
      } else {
        reject(new Error(`Worker stopped with exit code ${code}`));
      }
    });
  });
};
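The same pattern can be tried out in a single file. The sketch below is a simplified variation and not the article's worker.js: it uses the eval option of the Worker constructor so that the worker source can be given as a string, and the helper name runWorker and the posted event are made up for this illustration.

```javascript
import { Worker } from "node:worker_threads";

// Hypothetical helper wrapping the worker lifecycle in a promise.
function runWorker(source, workerData, reporter) {
  // eval: true makes the first argument a script instead of a file path.
  const worker = new Worker(source, { eval: true, workerData });
  return new Promise((resolve, reject) => {
    worker.on("message", reporter);
    worker.on("error", reject);
    worker.on("exit", (code) => {
      if (code === 0) {
        resolve();
      } else {
        reject(new Error(`Worker stopped with exit code ${code}`));
      }
    });
  });
}

// Stand-in for worker.js: posts a single reporter event and finishes.
const workerSource = `
  const { parentPort, workerData } = require("node:worker_threads");
  parentPort.postMessage({ type: "success", names: [workerData, "should run"] });
`;

const events = [];
await runWorker(workerSource, "a.test.js", (event) => events.push(event));
console.log(events);
```

When a worker throws, both the error and the exit event are emitted. Since a promise can only be settled once, the second call to reject has no effect.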

Combining these refactorings and additions with our previous code, we have now created an implementation spanning these two files (plus index.js, with it and describe definitions, which is unchanged):

To verify the speed-up, we create a number of simple and equivalent tests which all take a second to run, see a.test.js:

describe("a", () => {
  it("should take a second", async () => {
    await new Promise((resolve) => setTimeout(resolve, 1000));
  });
});

When running with no concurrency, 8 tests take a little more than 8 seconds to run (here using Bash):

$ time ls -1 tests/*.test.js | xargs node run-concurrently.js
✔ a - should take a second
✔ b - should take a second
✔ c - should take a second
✔ d - should take a second
✔ e - should take a second
✔ f - should take a second
✔ g - should take a second
✔ h - should take a second

real    0m8.189s

When running with full concurrency on a machine with 8 CPUs, the 8 tests take only slightly more than 1 second to run:

$ time ls -1 tests/*.test.js | xargs node run-concurrently.js
✔ g - should take a second
✔ d - should take a second
✔ e - should take a second
✔ a - should take a second
✔ b - should take a second
✔ h - should take a second
✔ f - should take a second
✔ c - should take a second

real    0m1.202s

This completes our minimalistic test runner. Every other feature that we might want a full-blown test runner to have can be implemented with existing tools and libraries and combined with the minimalistic test runner presented here. You can read more about how to do that in Part 2.
