Introduction

Resilience4ts is a distributed-first fault tolerance library for TypeScript inspired by resilience4j, Hystrix, and Polly. Like its namesake resilience4j, Resilience4ts aims to be a transparent fault-tolerance layer built from higher-order functions (decorators). Decorators can be stacked into reusable pipelines and applied to any asynchronous function or method.
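
To make the idea of stackable higher-order functions concrete, here is a minimal, library-independent sketch. The `pipeline` helper and the `logged` decorator are hypothetical names invented for this illustration and are not part of the Resilience4ts API.

```typescript
// A decorator here is a higher-order function: it takes an async function
// and returns a new async function with the same signature.
type AsyncFn<A extends unknown[], R> = (...args: A) => Promise<R>;
type Decorator<A extends unknown[], R> = (fn: AsyncFn<A, R>) => AsyncFn<A, R>;

// Hypothetical helper: composes decorators so that the first one listed
// becomes the outermost wrapper.
function pipeline<A extends unknown[], R>(
  ...decorators: Decorator<A, R>[]
): Decorator<A, R> {
  return (fn) =>
    decorators.reduceRight<AsyncFn<A, R>>(
      (wrapped, decorate) => decorate(wrapped),
      fn,
    );
}

// Toy decorator that records when it is entered and exited.
const trace: string[] = [];
const logged =
  (label: string): Decorator<[number], number> =>
  (fn) =>
  async (...args) => {
    trace.push(`enter ${label}`);
    try {
      return await fn(...args);
    } finally {
      trace.push(`exit ${label}`);
    }
  };

const double = async (n: number) => n * 2;

// A reusable pipeline, applied to an ordinary async function.
const decorated = pipeline(logged('outer'), logged('inner'))(double);
```

Calling `decorated(21)` resolves to `42`, and `trace` records the outer decorator entering first and exiting last, which is the nesting you would expect from stacked wrappers.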

Modularization

Resilience4ts decorators are modularized into separate packages, each with its own peer dependencies. This allows you to install only the decorators you need, and to avoid installing unnecessary dependencies.

All Core Modules and Pipeline Decorators

@forts/resilience4ts-all

Core Modules

@forts/resilience4ts-bulkhead
@forts/resilience4ts-cache
@forts/resilience4ts-circuit-breaker
@forts/resilience4ts-concurrent-lock
@forts/resilience4ts-concurrent-queue
@forts/resilience4ts-fallback
@forts/resilience4ts-hedge
@forts/resilience4ts-rate-limiter
@forts/resilience4ts-retry
@forts/resilience4ts-timeout

Framework Modules

@forts/resilience4ts-nestjs

Core Concepts

Across all modules under the @forts/resilience4ts namespace, there are a few core concepts that are shared to provide a consistent experience.

Core Configuration

All @forts/resilience4ts decorators are backed by the ResilienceProviderService class, which provides a unified interface for interacting with the underlying logging and persistence mechanisms. The ResilienceProviderService must be initialized before any of the decorators are used. This can be done by calling the ResilienceProviderService.forRoot method, which takes a ResilienceConfig object, or by defining a resilience.toml file in the root of your project.

Example `resilience.toml` file:

[resilience]
serviceName = "my-service"
collectResourceUsage = true
observationInterval = 3000
maxUtilization = 0.9
maxSafeUtilization = 0.75
maxCpuUtilization = 0.9
maxSafeCpuUtilization = 0.75
delimiter = "::"

[redis]
redisHost = "localhost"
redisPort = 6379
redisPrefix = "local"
maxConnectionAttempts = 100
maxBackoff = 3000
maxIncrBackoff = 500

Shape of the `ResilienceConfig` object:

type ResilienceConfig = {
  resilience: {
    serviceName: string;
    serviceVersion?: string;
    delimiter?: string;
    collectResourceUsage?: boolean;
    observationInterval?: number;
    maxUtilization?: number;
    maxSafeUtilization?: number;
    maxCpuUtilization?: number;
    maxSafeCpuUtilization?: number;
  };
  redis: {
    redisHost: string;
    redisPort: number;
    redisPassword?: string;
    redisUser?: string;
    redisPrefix?: string;
    maxConnectionAttempts?: number;
    maxBackoff?: number;
    maxIncrBackoff?: number;
    rejectUnauthorized?: boolean;
    useTls?: boolean;
  };
};

Example `ResilienceProviderService.forRoot` call:

import { ResilienceProviderService } from '@forts/resilience4ts-core';

async function bootstrap() {
  const svc = ResilienceProviderService.forRoot({
    resilience: {
      serviceName: 'r4t-test',
    },
    redis: {
      redisHost: 'localhost',
      redisPort: 6379,
      redisPassword: 'pwd',
      redisUser: 'user',
      redisPrefix: 'r4t-test',
    },
  });
  await svc.start();
}

bootstrap();

PredicateBuilder

A PredicateBuilder is a function that takes in a Predicate and returns a Predicate. A Predicate is a function that takes in a Context and returns a boolean. In the context of a resilience4ts decorator, the Context is typically the result of the decorated function. PredicateBuilders are commonly used to create Predicates that check the result of the decorated function for a certain value, or to check the Context for a certain value. An example of this can be found in the @forts/resilience4ts-fallback module, where the optional shouldHandle property on the Fallback decorator config takes a PredicateBuilder to determine whether or not the fallback action should be executed based on the result of the decorated function.

import { 
  PredicateBuilder, 
  OperationCancelledException
} from '@forts/resilience4ts-core';
import { Fallback } from '@forts/resilience4ts-fallback';

const fallback = Fallback.of('my-fallback', {
  shouldHandle: new PredicateBuilder()
    .isnot(OperationCancelledException),
  fallbackAction: () => 'fallback',
});

const result = await fallback.on(async () => {
  // do something
})();

Core Modules

Bulkhead

Introduction

Resilience4ts provides two implementations of the bulkhead pattern: Distributed and Instance. The Distributed bulkhead is a distributed-first implementation of the bulkhead pattern, while the Instance bulkhead is an instance-scoped implementation. The Distributed implementation is backed by Redis, and will work across multiple instances of your application. The Instance implementation is backed by a simple in-memory store, and will only limit the number of concurrent executions within a single instance of your application.

Defaults to Distributed bulkhead.
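
As a rough illustration of what an instance-scoped bulkhead does, the sketch below limits concurrency with an in-memory permit counter. It is not the library's implementation: the class name `InstanceBulkheadSketch` is hypothetical, and the assumption that callers wait up to `maxWait` milliseconds for a permit before being rejected is a reading of the options described in this section.

```typescript
// Minimal in-memory bulkhead: at most `maxConcurrent` calls run at once.
// Extra callers wait up to `maxWait` ms for a permit, then are rejected.
class InstanceBulkheadSketch {
  private active = 0;
  private readonly waiters: Array<() => void> = [];

  constructor(
    private readonly maxConcurrent: number,
    private readonly maxWait: number,
  ) {}

  private acquire(): Promise<void> {
    if (this.active < this.maxConcurrent) {
      this.active++;
      return Promise.resolve();
    }
    return new Promise<void>((resolve, reject) => {
      const grant = (): void => {
        clearTimeout(timer);
        resolve();
      };
      const timer = setTimeout(() => {
        // Waited too long: leave the wait list and give up.
        const i = this.waiters.indexOf(grant);
        if (i >= 0) this.waiters.splice(i, 1);
        reject(new Error('bulkhead full'));
      }, this.maxWait);
      this.waiters.push(grant);
    });
  }

  private release(): void {
    const next = this.waiters.shift();
    if (next) next(); // hand the permit straight to the next waiter
    else this.active--;
  }

  // Wraps an async function so every call goes through the bulkhead.
  on<A extends unknown[], R>(fn: (...args: A) => Promise<R>) {
    return async (...args: A): Promise<R> => {
      await this.acquire();
      try {
        return await fn(...args);
      } finally {
        this.release();
      }
    };
  }
}
```

A distributed bulkhead would keep the permit count in a shared store such as Redis instead of in the `active` field, so the limit holds across all instances.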

Create and Configure a Bulkhead

import { Bulkhead } from '@forts/resilience4ts-bulkhead';

const bulkhead = Bulkhead.of('my-bulkhead', {
  maxConcurrent: 10,
  maxWait: 1000,
});

const result = await bulkhead.on(async () => {
  // do something
})();

Options

Config Property	Default Value	Description
getUniqueId		Function that returns a unique id for the call from the decorated function args.
maxConcurrent	10	Maximum number of calls that are allowed to execute concurrently.
executionTimeout	1000	Maximum duration in milliseconds that a call is allowed to wait for execution.
maxWait	1000	Maximum duration in milliseconds that a call is allowed to wait for execution.
kind	`BulkheadStrategy.Distributed`	Strategy to use for bulkhead.

Cache

Introduction

Resilience4ts provides decorators for two caching strategies along with a decorator for busting cached values.

Installation

npm i @forts/resilience4ts-cache

Distributed Cache

The Cache decorator is a distributed-first implementation of the cache pattern. It is backed by Redis, and will work across multiple instances of your application.

import { Cache } from '@forts/resilience4ts-cache';

const cache = Cache.of('my-cache', {
  // Signature: (...args: Parameters<MyDecoratedMethod>) => UniqueId
  // Returns a unique id for the call from the decorated function args.
  extractKey: (id: string) => id,
  ttl: 1000, // Time to live in milliseconds.
  maxCapacity: 100, // Maximum number of entries in the cache.
});

// `id` is a placeholder argument for your own decorated method.
const result = await cache.on(async (id: string) => {
  // do something
})('user-1');

Request-Scoped Cache

The RequestScopedCache decorator is an instance-scoped implementation of the cache pattern. It is backed by a simple in-memory store, and will only cache values within the lifecycle of a single request. Once the configured RequestContext object is garbage-collected, any cached values under that context will be garbage-collected as well.

import { RequestScopedCache } from '@forts/resilience4ts-cache';

const cache = RequestScopedCache.of('my-cache', {
  // Signature: (...args: Parameters<MyDecoratedMethod>) => Record<string, any>
  // Returns a "scope" to associate with the cache entry from the decorated function args.
  extractScope: (ctx: Record<string, any>, id: string) => ctx,
  // Signature: (...args: Parameters<MyDecoratedMethod>) => UniqueId
  // Returns a unique id for the call from the decorated function args.
  extractKey: (ctx: Record<string, any>, id: string) => id,
});

// `ctx` and `id` are placeholder arguments for your own decorated method.
const requestContext = { requestId: 'req-1' };
const result = await cache.on(async (ctx: Record<string, any>, id: string) => {
  // do something
})(requestContext, 'user-1');
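
The garbage-collection behavior described for the request-scoped cache can be pictured with a `WeakMap` keyed by the scope object. The sketch below is one plausible way to get that behavior, not necessarily how the library does it; `requestScopedCacheSketch` and its parameters are hypothetical names.

```typescript
// Wraps `fn` so that results are cached per scope object and per key.
// Entries live in a WeakMap, so they become collectable as soon as the
// scope object (for example, a request context) is no longer reachable.
function requestScopedCacheSketch<A extends unknown[], R>(
  fn: (...args: A) => Promise<R>,
  extractScope: (...args: A) => object,
  extractKey: (...args: A) => string,
): (...args: A) => Promise<R> {
  const scopes = new WeakMap<object, Map<string, Promise<R>>>();

  return (...args: A) => {
    const scope = extractScope(...args);
    let entries = scopes.get(scope);
    if (!entries) {
      entries = new Map<string, Promise<R>>();
      scopes.set(scope, entries);
    }
    const key = extractKey(...args);
    let cached = entries.get(key);
    if (!cached) {
      // Cache the promise itself so concurrent callers share one execution.
      // Simplification: a rejected promise stays cached for this scope.
      cached = fn(...args);
      entries.set(key, cached);
    }
    return cached;
  };
}
```

Two calls with the same scope object and key run the wrapped function once; a call with a different scope object runs it again.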

Cache Buster

The CacheBuster decorator is used to bust cached values and is used as a companion to the distributed @Cache decorator. It can be used to bust one or more cached values based on the result of the decorated function.

import { CacheBuster } from '@forts/resilience4ts-cache';

const cacheBuster = CacheBuster.of('my-cache-buster', {
  // Signature: (...args: Parameters<MyDecoratedMethod>) => string | string[]
  // Returns a key or list of keys to bust from the cache.
  invalidatesKeys: (id: string) => [id],
});

// `id` is a placeholder argument for your own decorated method.
const result = await cacheBuster.on(async (id: string) => {
  // do something
})('user-1');

A CacheBuster can optionally take a PredicateBuilder via the shouldInvalidate property to determine whether or not the cache should be busted based on the result of the decorated function.

import { PredicateBuilder, OperationCancelledException } from '@forts/resilience4ts-core';
import { CacheBuster } from '@forts/resilience4ts-cache';

const cacheBuster = CacheBuster.of('my-cache-buster', {
  // Returns a key or list of keys to bust from the cache.
  invalidatesKeys: (id: string) => [id],
  // Optional. Determines whether the cache should be busted based on the result of the decorated function.
  shouldInvalidate: new PredicateBuilder().isnot(OperationCancelledException),
});

By default, the CacheBuster will only bust the cache if the decorated function does not throw an error.

Circuit Breaker

Introduction

The CircuitBreaker is implemented via a finite state machine with three normal states: CLOSED, OPEN and HALF_OPEN. The CLOSED state is the normal state of the circuit breaker. In this state, the circuit breaker is allowing executions of the decorated function. If the decorated function fails, the circuit breaker will record the failure. If the number of failures exceeds the configured threshold, the circuit breaker will transition to the OPEN state. In the OPEN state, the circuit breaker will not allow executions of the decorated function. After the configured interval has elapsed, the circuit breaker will transition to the HALF_OPEN state. In the HALF_OPEN state, the circuit breaker will allow a configurable number of executions of the decorated function. If all executions succeed, the circuit breaker will transition back to the CLOSED state. If any executions fail, the circuit breaker will transition back to the OPEN state.
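
The state transitions described above can be sketched as a small state machine. This is an illustration only: `CircuitBreakerSketch` is a hypothetical class, its constructor parameters borrow the names of the config properties in this section, and the simple failure counter stands in for the sliding windows described below.

```typescript
type BreakerState = 'CLOSED' | 'OPEN' | 'HALF_OPEN';

class CircuitBreakerSketch {
  state: BreakerState = 'CLOSED';
  private failures = 0;
  private halfOpenCalls = 0; // trial calls admitted while HALF_OPEN
  private halfOpenSuccesses = 0;
  private openedAt = 0;

  constructor(
    private readonly threshold: number, // failures tolerated before opening
    private readonly interval: number, // ms to stay OPEN
    private readonly halfOpenLimit: number, // trial calls allowed in HALF_OPEN
    private readonly now: () => number = () => Date.now(),
  ) {}

  async execute<R>(fn: () => Promise<R>): Promise<R> {
    // OPEN -> HALF_OPEN once the configured interval has elapsed.
    if (this.state === 'OPEN' && this.now() - this.openedAt >= this.interval) {
      this.state = 'HALF_OPEN';
      this.halfOpenCalls = 0;
      this.halfOpenSuccesses = 0;
    }
    if (this.state === 'OPEN') throw new Error('circuit is open');
    if (this.state === 'HALF_OPEN' && this.halfOpenCalls++ >= this.halfOpenLimit) {
      throw new Error('half-open trial limit reached');
    }
    try {
      const result = await fn();
      this.onSuccess();
      return result;
    } catch (err) {
      this.onFailure();
      throw err;
    }
  }

  private onSuccess(): void {
    // HALF_OPEN -> CLOSED once every trial call has succeeded.
    if (this.state === 'HALF_OPEN' && ++this.halfOpenSuccesses >= this.halfOpenLimit) {
      this.state = 'CLOSED';
      this.failures = 0;
    }
  }

  private onFailure(): void {
    // Any HALF_OPEN failure, or too many CLOSED failures, opens the circuit.
    if (this.state === 'HALF_OPEN' || ++this.failures > this.threshold) {
      this.state = 'OPEN';
      this.openedAt = this.now();
    }
  }
}
```

The injectable `now` function is there only to make the timing testable; by default the sketch uses `Date.now()`.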

Count-based sliding window

The count-based sliding window is implemented with a circular array of N measurements. If the count window size is 10, the circular array always holds 10 measurements. The sliding window incrementally updates a total aggregation whenever a new call outcome is recorded. When the oldest measurement is evicted, it is subtracted from the total aggregation and its slot is reset (subtract-on-evict).
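
A minimal sketch of such a count-based window follows. The class name `CountWindowSketch` and its methods are hypothetical and only illustrate the circular-array and subtract-on-evict idea.

```typescript
// Count-based sliding window over the last `size` call outcomes.
class CountWindowSketch {
  // true = failure, false = success, undefined = slot not used yet
  private readonly slots: Array<boolean | undefined>;
  private head = 0; // index of the oldest slot, which is overwritten next
  private recorded = 0;
  private failures = 0; // the running "total aggregation"

  constructor(private readonly size: number) {
    this.slots = new Array<boolean | undefined>(size).fill(undefined);
  }

  record(failed: boolean): void {
    // Subtract-on-evict: remove the oldest measurement from the total first.
    if (this.slots[this.head] === true) this.failures--;
    this.slots[this.head] = failed;
    if (failed) this.failures++;
    this.head = (this.head + 1) % this.size;
    this.recorded = Math.min(this.recorded + 1, this.size);
  }

  failureRate(): number {
    return this.recorded === 0 ? 0 : this.failures / this.recorded;
  }
}
```

Recording an outcome is O(1) because only the evicted slot and the running total are touched; the window is never rescanned.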

Time-based sliding window

The time-based sliding window is implemented with a circular array of N partial aggregations (buckets). If the time window size is 10 seconds, the circular array always holds 10 buckets. Every bucket aggregates the outcomes of all calls that happen in a certain epoch second (partial aggregation). The head bucket of the circular array stores the call outcomes of the current epoch second; the other buckets store the call outcomes of the previous seconds. The sliding window does not store call outcomes individually, but incrementally updates the buckets and a total aggregation. The total aggregation is updated incrementally when a new call outcome is recorded. When the oldest bucket is evicted, the partial aggregation of that bucket is subtracted from the total aggregation and the bucket is reset (subtract-on-evict).
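
The bucket scheme can be sketched as follows. `TimeWindowSketch` is a hypothetical class written for this explanation; for simplicity it scans all buckets for expired ones on each call rather than evicting only the oldest.

```typescript
interface Bucket {
  epochSecond: number; // which second this bucket currently represents
  calls: number;
  failures: number;
}

// Time-based sliding window with one bucket per second.
class TimeWindowSketch {
  private readonly buckets: Bucket[];
  private totalCalls = 0; // total aggregation
  private totalFailures = 0;

  constructor(private readonly windowSeconds: number) {
    this.buckets = Array.from({ length: windowSeconds }, () => ({
      epochSecond: -1,
      calls: 0,
      failures: 0,
    }));
  }

  record(failed: boolean, nowMs: number = Date.now()): void {
    const second = Math.floor(nowMs / 1000);
    this.evictOlderThan(second - this.windowSeconds);
    // The head bucket holds the outcomes of the current epoch second.
    const head = this.buckets[second % this.windowSeconds];
    head.epochSecond = second;
    head.calls++;
    this.totalCalls++;
    if (failed) {
      head.failures++;
      this.totalFailures++;
    }
  }

  failureRate(nowMs: number = Date.now()): number {
    this.evictOlderThan(Math.floor(nowMs / 1000) - this.windowSeconds);
    return this.totalCalls === 0 ? 0 : this.totalFailures / this.totalCalls;
  }

  // Subtract-on-evict: remove expired buckets from the totals and reset them.
  private evictOlderThan(cutoffSecond: number): void {
    for (const b of this.buckets) {
      if (b.calls > 0 && b.epochSecond <= cutoffSecond) {
        this.totalCalls -= b.calls;
        this.totalFailures -= b.failures;
        b.calls = 0;
        b.failures = 0;
      }
    }
  }
}
```

Memory use is fixed at one bucket per second of window, no matter how many calls are recorded.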

Failure Rate Threshold

The state of the CircuitBreaker changes from CLOSED to OPEN when the failure rate is equal to or greater than a configurable threshold, for example when 50% or more of the recorded calls have failed. By default, all exceptions count as failures. You can define a list of exceptions that should count as failures; all other exceptions are then counted as successes, unless they are ignored. Exceptions can also be ignored so that they count as neither a failure nor a success.

The failure rate can only be calculated once a minimum number of calls has been recorded. For example, if the minimum number of required calls is 10, then at least 10 calls must be recorded before the failure rate can be calculated. If only 9 calls have been evaluated, the CircuitBreaker will not trip open even if all 9 calls have failed.
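
The rule in the two paragraphs above fits in a few lines. The function and parameter names here are hypothetical and do not map one-to-one onto the config properties listed under Options.

```typescript
// Decides whether a CLOSED circuit should trip OPEN.
function shouldTripOpen(
  failedCalls: number,
  recordedCalls: number,
  failureRateThreshold: number, // fraction between 0 and 1
  minimumCalls: number,
): boolean {
  // Not enough data yet: the failure rate is not calculated at all.
  if (recordedCalls < minimumCalls) return false;
  return failedCalls / recordedCalls >= failureRateThreshold;
}
```

With a minimum of 10 calls and a threshold of 0.5, nine failures out of nine calls do not trip the circuit, while five failures out of ten do.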

Create and Configure a CircuitBreaker

import { CircuitBreaker, CircuitBreakerStrategy } from '@forts/resilience4ts-circuit-breaker';

const circuitBreaker = CircuitBreaker.of('my-circuit-breaker', {
  strategy: CircuitBreakerStrategy.Percentage,
  threshold: 0.5,
  interval: 1000 * 60 * 15,
  minimumFailures: 3,
  whitelist: [],
  circuitConnectionRetries: 3,
  halfOpenLimit: 3,
});

const result = await circuitBreaker.on(async () => {
  // do something
})();

Options

Config Property	Default Value	Description
strategy	`CircuitBreakerStrategy.Percentage`	Strategy to use for circuit breaker.
threshold	0.5	Threshold for circuit breaker. When `strategy` is `Percentage`-based, this threshold represents the maximum allowable failure rate as a fraction between 0 and 1 (0.5 means 50%). When `strategy` is `Volume`-based, this threshold represents the maximum allowable number of failures in the configured time window.
interval	1000 * 60 * 15	Time in milliseconds that the circuit breaker stays in the `OPEN` state before transitioning to the `HALF_OPEN` state.
minimumFailures	3	Minimum number of failures that must be recorded before the circuit breaker can trip open.
whitelist	[]	Error[]. If the decorated method throws an error that is in the whitelist, the circuit breaker will not record it as a failure.
circuitConnectionRetries	3	Number of times to retry connecting to the circuit breaker store.
halfOpenLimit	3	Number of executions allowed in the `HALF_OPEN` state.

Default Circuit Breaker Config

const DefaultCircuitBreakerConfig = {
  strategy: CircuitBreakerStrategy.Percentage,
  threshold: 0.5,
  interval: 1000 * 15,
  minimumFailures: 3,
  whitelist: [],
  circuitConnectionRetries: 3,
  halfOpenLimit: 3,
};

Concurrent Lock

Introduction

The ConcurrentLock module provides a distributed lock implementation. At a high level, there are two reasons why you might want a lock in a distributed application: for efficiency or for correctness [2]. To distinguish these cases, you can ask what would happen if the lock failed:

Efficiency: Taking a lock saves you from unnecessarily doing the same work twice (e.g. some expensive computation). If the lock fails and two nodes end up doing the same piece of work, the result is a minor increase in cost (you end up paying 5 cents more to AWS than you otherwise would have) or a minor inconvenience (e.g. a user ends up getting the same email notification twice).
Correctness: Taking a lock prevents concurrent processes from stepping on each others’ toes and messing up the state of your system. If the lock fails and two nodes concurrently work on the same piece of data, the result is a corrupted file, data loss, permanent inconsistency, the wrong dose of a drug administered to a patient, or some other serious problem.

Installation

npm i @forts/resilience4ts-concurrent-lock

Create and Configure a Lock

import { ConcurrentLock } from '@forts/resilience4ts-concurrent-lock';

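const lock = ConcurrentLock.of('my-lock', {
  // Signature: (...args: Parameters<MyDecoratedMethod>) => UniqueId
  // Returns a unique id for the call from the decorated function args.
  withKey: (id: string) => id,
});

// `id` is a placeholder argument for your own decorated method.
const result = await lock.on(async (id: string) => {
  // do something
})('order-1');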

Options

Config Property	Default Value	Description
withKey		Function that returns a unique id for the call from the decorated function args.
duration		Duration in milliseconds to wait for the lock to be released.
extensible		Whether the lock is extensible.

Concurrent Queue

Introduction

The ConcurrentQueue decorator wraps a function with a distributed, blocking queue that ensures only one instance of the function is running at a time. If the function is called while another instance is running, the function will be queued and executed when the previous instance completes.

Because the queue is blocking, the caller will wait until the function completes before continuing. If the caller fails to acquire the lock, a QueueWaitExceeded exception will be thrown. The use cases for a blocking queue are limited, so consider your application's needs before using this module.

Installation

npm i @forts/resilience4ts-concurrent-queue

Create and Configure a Queue

import { ConcurrentQueue } from '@forts/resilience4ts-concurrent-queue';

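const queue = ConcurrentQueue.of('my-queue', {
  // Signature: (...args: Parameters<MyDecoratedMethod>) => UniqueId
  // Returns a unique id for the call from the decorated function args.
  withKey: (id: string) => id,
});

// `id` is a placeholder argument for your own decorated method.
const result = await queue.on(async (id: string) => {
  // do something
})('order-1');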

Options

Config Property	Default Value	Description
withKey		Function that returns a unique id for the call from the decorated function args.
maxAttempts	10	Maximum number of attempts to acquire the lock and execute the function.
backoff	0.01	Backoff factor to use when retrying to acquire the lock.

Fallback

Introduction

The fallback strategy provides an interface to define a callback that will be executed if the decorated function fails. This strategy is useful when you want to provide a default value or behavior in the event of a failure. For example, you may want to return a cached value or a default value from a configuration file. The fallback strategy is also useful for providing a graceful degradation of functionality when a service is unavailable, although you should consider using the circuit breaker strategy for this purpose as it provides more control over the failure state.
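
Conceptually, a fallback is a `try`/`catch` around the decorated function. The sketch below shows the idea without the library; `withFallbackSketch` is a hypothetical helper, and its `shouldHandle` parameter is a plain predicate function rather than the library's `PredicateBuilder`.

```typescript
// Wraps `fn` so that a failure produces a fallback value instead of an error.
function withFallbackSketch<A extends unknown[], R>(
  fn: (...args: A) => Promise<R>,
  fallbackAction: (error: unknown, ...args: A) => R | Promise<R>,
  shouldHandle: (error: unknown) => boolean = () => true,
): (...args: A) => Promise<R> {
  return async (...args: A) => {
    try {
      return await fn(...args);
    } catch (error) {
      // Errors the predicate rejects are rethrown unchanged.
      if (!shouldHandle(error)) throw error;
      return fallbackAction(error, ...args);
    }
  };
}
```

Passing a predicate lets you fall back only for failures you expect, such as a downstream outage, while programming errors still surface.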

Installation

npm i @forts/resilience4ts-fallback

Create and Configure a Fallback

import { Fallback } from '@forts/resilience4ts-fallback';

const fallback = Fallback.of('my-fallback', {
  fallbackAction: async () => {
    return "my fallback value";
  },
});

const result = await fallback.on(async () => {
  // do something
})();

Options

Config Property	Default Value	Description
fallbackAction		Function that returns a fallback value or executes a fallback action.
shouldHandle		`PredicateBuilder` that evaluates to a boolean indicating whether the fallback should be executed.

Hedge

Introduction

The hedging strategy enables the re-execution of a user-defined callback if the previous execution takes too long. This approach gives you the option to either run the original callback again or specify a new callback for subsequent hedged attempts. Implementing a hedging strategy can boost the overall responsiveness of the system. However, it's essential to note that this improvement comes at the cost of increased resource utilization. If low latency is not a critical requirement, you may find the retry strategy is more appropriate.
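
The following sketch shows one way hedging can work: if an attempt has not settled after `delay` milliseconds, another attempt is started, and the first successful result wins. `hedgeSketch` is a hypothetical function; the parameter names `maxAttempts` and `delay` follow the example in this section, but the exact semantics are assumptions.

```typescript
// Runs `attempt` and launches additional hedged attempts while earlier ones
// are still pending. Resolves with the first success; rejects only after
// all `maxAttempts` attempts have failed.
function hedgeSketch<R>(
  attempt: (attemptNumber: number) => Promise<R>,
  maxAttempts: number,
  delay: number,
): Promise<R> {
  return new Promise<R>((resolve, reject) => {
    let started = 0;
    let failed = 0;
    let settled = false;
    let timer: ReturnType<typeof setTimeout> | undefined;

    const launch = (): void => {
      if (settled || started >= maxAttempts) return;
      const n = ++started;
      // If nothing has settled after `delay` ms, start the next attempt.
      timer = setTimeout(launch, delay);
      attempt(n).then(
        (value) => {
          if (settled) return;
          settled = true;
          clearTimeout(timer);
          resolve(value);
        },
        (error) => {
          if (++failed === maxAttempts && !settled) {
            settled = true;
            clearTimeout(timer);
            reject(error);
          }
        },
      );
    };

    launch();
  });
}
```

The attempt number is passed to the callback so a caller can run the same operation again or switch to a different one for hedged attempts. Losing attempts are not cancelled in this sketch; their results are simply ignored.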

Installation

npm i @forts/resilience4ts-hedge

Create and Configure a Hedge

import { Hedge } from '@forts/resilience4ts-hedge';

const hedge = Hedge.of('my-hedge', {
  maxAttempts: 3,
  delay: 1000,
});

const result = await hedge.on(async () => {
  // do something
})();

Rate Limiter

Introduction

Rate limiting is an essential technique for preparing your API for scale and keeping your service available and reliable. There are several ways to handle requests that exceed the limit, and several choices about which requests to limit. You can simply reject the over-limit request, queue it for later execution, or combine the two approaches.

The @forts/resilience4ts-rate-limiter module provides strategies for Distributed and Instance-scoped rate limiting. The Distributed implementation is backed by Redis, and will work across multiple instances of your application. The Instance implementation will only limit calls within a single instance of your application.
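
As an illustration of instance-scoped rate limiting, the sketch below issues at most `permitLimit` permits per fixed time window and rejects the rest. It is a simplified stand-in, not the library's algorithm: `FixedWindowRateLimiterSketch` is a hypothetical class and it omits queuing.

```typescript
// Fixed-window rate limiter: at most `permitLimit` permits per `windowMs`.
class FixedWindowRateLimiterSketch {
  private windowStart = 0;
  private issued = 0;

  constructor(
    private readonly permitLimit: number,
    private readonly windowMs: number,
    private readonly now: () => number = () => Date.now(),
  ) {}

  // Returns true if the call may proceed, false if it is over the limit.
  tryAcquire(): boolean {
    const t = this.now();
    if (t - this.windowStart >= this.windowMs) {
      // A new window begins: reset the permit counter.
      this.windowStart = t;
      this.issued = 0;
    }
    if (this.issued >= this.permitLimit) return false;
    this.issued++;
    return true;
  }
}
```

A distributed variant would keep the counter in a shared store such as Redis so that all instances draw from the same pool of permits.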

Installation

npm i @forts/resilience4ts-rate-limiter

Create and Configure a Rate Limiter

import { RateLimiter } from '@forts/resilience4ts-rate-limiter';

const rateLimiter = RateLimiter.of('my-rate-limiter', {
  permitLimit: 10,
  queueLimit: 1000,
  window: 1000,
});

const result = await rateLimiter.on(async () => {
  // do something
})();

Options

Config Property	Default Value	Description
requestIdentifier		Function that returns a unique id for the call from the decorated function args.
permitLimit	10	Maximum number of permits to issue per window.
queueLimit	1000	Maximum number of requests to queue.
window	1000	Duration in milliseconds of the window over which `permitLimit` permits are issued.
scope	`RateLimiterStrategy.Distributed`	Strategy to use for rate limiting.

Retry

Introduction

The retry strategy enables the re-execution of a user-defined callback if the previous execution fails. This approach gives you the option to either run the original callback again or specify a new callback for subsequent attempts. Implementing a retry strategy can boost the overall reliability of the system. However, it's essential to note that this improvement comes at the cost of increased resource utilization. If high availability is not a critical requirement, you may find the hedging strategy is more appropriate.
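
A minimal sketch of retrying with a wait between attempts follows. `retrySketch` is a hypothetical function; it assumes `maxAttempts` counts the first call as well, and that linear mode grows the wait by `wait` milliseconds per attempt, capped at `maxInterval`.

```typescript
// Calls `fn` until it succeeds or `maxAttempts` attempts have been made.
async function retrySketch<R>(
  fn: () => Promise<R>,
  maxAttempts: number,
  wait: number,
  maxInterval: number,
  mode: 'linear' | 'exponential' = 'linear',
): Promise<R> {
  let lastError: unknown;
  for (let attempt = 1; attempt <= maxAttempts; attempt++) {
    try {
      return await fn();
    } catch (error) {
      lastError = error;
      if (attempt === maxAttempts) break;
      // Linear: wait, 2*wait, 3*wait ... Exponential: wait, 2*wait, 4*wait ...
      const backoff = mode === 'linear' ? wait * attempt : wait * 2 ** (attempt - 1);
      await new Promise<void>((resolve) =>
        setTimeout(resolve, Math.min(backoff, maxInterval)),
      );
    }
  }
  // Every attempt failed: surface the last error to the caller.
  throw lastError;
}
```

Capping the wait with `maxInterval` keeps exponential backoff from growing without bound on long retry chains.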

Installation

npm i @forts/resilience4ts-retry

Create and Configure a Retry

import { Retry } from '@forts/resilience4ts-retry';

const retry = Retry.of('my-retry', {
  maxAttempts: 3,
  wait: 1000,
});

const result = await retry.on(async () => {
  // do something
})();

Options

Config Property	Default Value	Description
wait	500	Wait in milliseconds before retrying.
maxAttempts	3	Maximum number of attempts to retry.
maxInterval	60000	Maximum wait in milliseconds between retries.
retryMode	`RetryMode.Linear`	Strategy to use for calculating backoff.
validateResult		Function returning a boolean indicating whether the result should be retried.
whitelistErrors		Array of errors that should be ignored, skipping retry.
onRuntimeError		Callback function to execute when an error occurs.

Timeout

The timeout module provides a way to limit the amount of time a function may take to execute. If the function does not complete within the specified time, the module will throw an error.
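
The behavior described above can be sketched by racing the function against a timer. `withTimeoutSketch` is a hypothetical helper, not the library's implementation, and the error it throws is a plain `Error` chosen for illustration.

```typescript
// Wraps `fn` so that the returned promise rejects if `fn` has not settled
// within `timeoutMs` milliseconds.
function withTimeoutSketch<A extends unknown[], R>(
  fn: (...args: A) => Promise<R>,
  timeoutMs: number,
): (...args: A) => Promise<R> {
  return (...args: A) =>
    new Promise<R>((resolve, reject) => {
      const timer = setTimeout(
        () => reject(new Error(`timed out after ${timeoutMs} ms`)),
        timeoutMs,
      );
      // Whichever settles first wins; always clear the timer afterwards.
      fn(...args)
        .then(resolve, reject)
        .finally(() => clearTimeout(timer));
    });
}
```

Note that this approach only stops waiting for the result. The underlying work keeps running unless the wrapped function supports cancellation itself.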

Installation

npm i @forts/resilience4ts-timeout

Create and Configure a Timeout

import { Timeout } from '@forts/resilience4ts-timeout';

const timeout = Timeout.of('my-timeout', {
  timeout: 1000,
});

const result = await timeout.on(async () => {
  // do something
})();

Options

Config Property	Default Value	Description
timeout		Timeout in milliseconds.

Frameworks

NestJS

@forts/resilience4ts-nestjs

Getting started with resilience4ts + NestJS

Introduction

While resilience4ts works well as a standalone library, it also provides a set of decorators for NestJS. These decorators can be used to decorate any NestJS controller or service method. @forts/resilience4ts-nestjs wraps all the core resilience4ts decorators plus the @forts/resilience4ts-all decorator into a single package, and re-exports all of them along with convenient method decorators for use with NestJS controllers and services.

Installation

npm i @forts/resilience4ts-nestjs

Adding Decorators to a NestJS Injectable Service

Taken from the NestJS example

import {
  Bulkhead,
  CircuitBreaker,
  Fallback,
  CircuitBreakerImpl,
  CircuitBreakerStrategy,
} from '@forts/resilience4ts-nestjs';
import { PredicateBuilder } from '@forts/resilience4ts-core';
import { Inject, Injectable, UnauthorizedException } from '@nestjs/common';
import { AppGateway } from './app.gateway';

type HelloWorldArgs = {
  id: string;
};

@Injectable()
export class AppService {
  constructor(
    @Inject('AppGateway')
    private readonly appGateway: AppGateway,
  ) {}

  // decorators can be stacked, and will be applied in the order they are listed
  @Bulkhead({
    getUniqueId: (args: HelloWorldArgs) => args.id,
    maxConcurrent: 1,
    maxWait: 250,
  })
  @Fallback({
    shouldHandle: new PredicateBuilder(UnauthorizedException).or(
      (e: Error) => e.message === 'asdf',
    ),
    fallbackAction: async () => 'fallback',
  })
  @CircuitBreaker({
    strategy: CircuitBreakerStrategy.Percentage,
    threshold: 0.2,
  })
  async getHello(args: HelloWorldArgs) {
    // The original, functional decorators are also available
    // To use them import them as their name + Impl
    // e.g. CircuitBreakerImpl, CacheImpl, etc.
    // The arrow function keeps `this` bound to appGateway when the call runs.
    return await CircuitBreakerImpl.of('gateway.call', {
      strategy: CircuitBreakerStrategy.Percentage,
      threshold: 0.2,
    }).on((a: HelloWorldArgs) => this.appGateway.getHello(a))(args);
  }
}