Introduction
Resilience4ts is a distributed-first fault tolerance library for TypeScript inspired by resilience4j, Hystrix, and Polly. Following in the footsteps of its namesake resilience4j, Resilience4ts also aims to be a transparent fault-tolerance layer via higher-order functions (decorators). Decorators can be stacked to create reusable pipelines of decorators, and can be applied to any asynchronous function or method.
Modularization
Resilience4ts decorators are modularized into separate packages, each with its own peer dependencies. This allows you to install only the decorators you need, and to avoid installing unnecessary dependencies.
All Core Modules and Pipeline Decorators
- @forts/resilience4ts-all
Core Modules
- @forts/resilience4ts-bulkhead
- @forts/resilience4ts-cache
- @forts/resilience4ts-circuit-breaker
- @forts/resilience4ts-concurrent-lock
- @forts/resilience4ts-concurrent-queue
- @forts/resilience4ts-fallback
- @forts/resilience4ts-hedge
- @forts/resilience4ts-rate-limiter
- @forts/resilience4ts-retry
- @forts/resilience4ts-timeout
Framework Modules
- @forts/resilience4ts-nestjs
Core Concepts
Across all modules under the @forts/resilience4ts
namespace, there are a few core concepts that are shared to provide a consistent experience.
Core Configuration
All @forts/resiliencets
decorators are backended by the ResilienceProviderService
class, which is responsible for providing a unified interface for interacting with the underlying logging and persistence mechanisms. The ResilienceProviderService
needs to be initialized prior to using any of the decorators. This can be done by calling the ResilienceProviderService.forRoot
, method, which takes in a ResilienceConfig
object, or by defining a resilience.toml
file in the root of your project.
Example resilience.toml
file:
[resilience]
serviceName = "my-service"
collectResourceUsage = true
observationInterval = 3000
maxUtilization = 0.9
maxSafeUtilization = 0.75
maxCpuUtilization = 0.9
maxSafeCpuUtilization = 0.75
delimiter = "::"
[redis]
redisHost = "localhost"
redisPort = 6379
redisPrefix = "local"
maxConnectionAttempts = 100
maxBackoff = 3000
maxIncrBackoff = 500
Example ResilienceConfig
object:
type ResilienceConfig = {
resilience: {
serviceName: string;
serviceVersion?: string;
delimiter?: string;
collectResourceUsage?: boolean;
observationInterval?: number;
maxUtilization?: number;
maxSafeUtilization?: number;
maxCpuUtilization?: number;
maxSafeCpuUtilization?: number;
};
redis: {
redisHost: string;
redisPort: number;
redisPassword?: string;
redisUser?: string;
redisPrefix?: string;
maxConnectionAttempts?: number;
maxBackoff?: number;
maxIncrBackoff?: number;
rejectUnauthorized?: boolean;
useTls?: boolean;
};
};
Example ResilienceProviderService.forRoot
call:
import { ResilienceProviderService } from '@forts/resilience4ts-core';
async function bootstrap() {
svc = ResilienceProviderService.forRoot({
resilience: {
serviceName: 'r4t-test',
},
redis: {
redisHost: 'localhost',
redisPort: 6379,
redisPassword: 'pwd',
redisUser: 'user',
redisPrefix: 'r4t-test',
},
});
await svc.start();
}
bootstrap();
PredicateBuilder
A PredicateBuilder
is a function that takes in a Predicate
and returns a Predicate
. A Predicate
is a function that takes in a Context
and returns a boolean
. In the context of a resilience4ts decorator, the Context
is typically the result of the decorated function. PredicateBuilder
s are commonly used to create Predicate
s that check the result of the decorated function for a certain value, or to check the Context
for a certain value. An example of this can be found in the @forts/resilience4ts-fallback
module, where the optional shouldHandle
property on the Fallback
decorator config takes a PredicateBuilder
to determine whether or not the fallback action should be executed based on the result of the decorated function.
import {
PredicateBuilder,
OperationCancelledException
} from '@forts/resilience4ts-core';
import { Fallback } from '@forts/resilience4ts-fallback';
const fallback = Fallback.of('my-fallback', {
shouldHandle: new PredicateBuilder()
.isnot(OperationCancelledException),
fallbackAction: () => 'fallback',
});
const result = await fallback.on(async () => {
// do something
})();
Core Modules
- Bulkhead
- Cache
- Circuit Breaker
- Concurrent Lock
- Concurrent Queue
- Fallback
- Hedge
- Rate Limiter
- Retry
- Timeout
Bulkhead
Introduction
Resilience4ts provides two implementations of the bulkhead pattern: Distributed
and Instance
. The Distributed
bulkhead is a distributed-first implementation of the bulkhead pattern, while the Instance
bulkhead is an instance-scoped implementation. The Distributed
implementation is backed by Redis, and will work across multiple instances of your application. The Instance
implementation is backed by a simple in-memory store, and will only limit the number of concurrent executions within a single instance of your application.''
Defaults to Distributed
bulkhead.
Create and Configure a Bulkhead
import { Bulkhead } from '@forts/resilience4ts-bulkhead';
const bulkhead = Bulkhead.of('my-bulkhead', {
maxConcurrentCalls: 10,
maxWait: 1000,
});
const result = await bulkhead.on(async () => {
// do something
})();
Options
Config Property | Default Value | Description |
---|---|---|
getUniqueId | Function that returns a unique id for the call from the decorated function args. | |
maxConcurrent | 10 | Maximum duration in milliseconds that a call is allowed to wait for a permit to be issued. |
executionTimeout | 1000 | Maximum duration in milliseconds that a call is allowed to wait for execution. |
maxWait | 1000 | Maximum duration in milliseconds that a call is allowed to wait for execution. |
kind | BulkheadStrategy.Distributed | Strategy to use for bulkhead. |
Cache
Introduction
Resilience4ts provides decorators for two caching strategies along with a decorator for busting cached values.
Installation
npm i @forts/resilience4ts-cache
Distributed Cache
The DistributedCache
decorator is a distributed-first implementation of the cache pattern. It is backed by Redis, and will work across multiple instances of your application.
import { Cache } from '@forts/resilience4ts-cache';
const cache = Cache.of('my-cache', {
extractKey: (...args: Parameters<MyDecoratedMethod>) => UniqueId, // Function that returns a unique id for the call from the decorated function args.
ttl: 1000, // Time to live in milliseconds.
maxCapacity: 100, // Maximum number of entries in the cache.
});
const result = await cache.on(async () => {
// do something
})();
Request-Scoped Cache
The RequestScopedCache
decorator is an instance-scoped implementation of the cache pattern. It is backed by a simple in-memory store, and will only cache values within the lifecycle of a single request. Once the configured RequestContext
object is garbage-collected, any cached values under that context will be garbage-collected as well.
import { RequestScopedCache, RequestScopedCacheType } from '@forts/resilience4ts-cache';
const cache = RequestScopedCache.of('my-cache', {
extractScope: (...args: Parameters<MyDecoratedMethod>) => Record<string, any>, // Function that returns a "scope" to associate with the cache entry from the decorated function args.
extractKey: (...args: Parameters<MyDecoratedMethod>) => UniqueId, // Function that returns a unique id for the call from the decorated function args.
});
const result = await cache.on(async () => {
// do something
})();
Cache Buster
The CacheBuster
decorator is used to bust cached values and is used as a companion to the distributed @Cache
decorator. It can be used to bust one or more cached values based on the result of the decorated function.
import { CacheBuster } from '@forts/resilience4ts-cache';
const cacheBuster = CacheBuster.of('my-cache-buster', {
invalidatesKeys: (...args: Parameters<MyDecoratedMethod>) => string | string[], // Function that returns a key or list of keys to bust from the cache.
});
const result = await cacheBuster.on(async () => {
// do something
})();
A CacheBuster
can optionally take a PredicateBuilder
via the shouldInvalidate
property to determine whether or not the cache should be busted based on the result of the decorated function.
import { CacheBuster, PredicateBuilder } from '@forts/resilience4ts-cache';
const cacheBuster = CacheBuster.of('my-cache-buster', {
invalidatesKeys: (...args: Parameters<MyDecoratedMethod>) => string | string[], // Function that returns a key or list of keys to bust from the cache.
shouldInvalidate: new PredicateBuilder().isnot(OperationCancelledException), // Optional. Function that returns a boolean to determine whether or not the cache should be busted based on the result of the decorated function.
});
By default, the CacheBuster
will only bust the cache if the decorated function does not throw an error.
Circuit Breaker
Introduction
The CircuitBreaker is implemented via a finite state machine with three normal states: CLOSED
, OPEN
and HALF_OPEN
. The CLOSED
state is the normal state of the circuit breaker. In this state, the circuit breaker is allowing executions of the decorated function. If the decorated function fails, the circuit breaker will record the failure. If the number of failures exceeds the configured threshold, the circuit breaker will transition to the OPEN
state. In the OPEN
state, the circuit breaker will not allow executions of the decorated function. After the configured interval has elapsed, the circuit breaker will transition to the HALF_OPEN
state. In the HALF_OPEN
state, the circuit breaker will allow a configurable number of executions of the decorated function. If all executions succeed, the circuit breaker will transition back to the CLOSED
state. If any executions fail, the circuit breaker will transition back to the OPEN
state.
Count-based sliding window
The count-based sliding window is implemented with a circular array of N measurements. If the count window size is 10, the circular array has always 10 measurements. The sliding window incrementally updates a total aggregation. The total aggregation is updated when a new call outcome is recorded. When the oldest measurement is evicted, the measurement is subtracted from the total aggregation and the bucket is reset. (Subtract-on-Evict)
Time-based sliding window
The time-based sliding window is implemented with a circular array of N partial aggregations (buckets). If the time window size is 10 seconds, the circular array has always 10 partial aggregations (buckets). Every bucket aggregates the outcome of all calls which happen in a certain epoch second. (Partial aggregation). The head bucket of the circular array stores the call outcomes of the current epoch second. The other partial aggregations store the call outcomes of the previous seconds. The sliding window does not store call outcomes individually, but incrementally updates partial aggregations (bucket) and a total aggregation. The total aggregation is updated incrementally when a new call outcome is recorded. When the oldest bucket is evicted, the partial total aggregation of that bucket is subtracted from the total aggregation and the bucket is reset. (Subtract-on-Evict)
Failure Rate Threshold
The state of the CircuitBreaker changes from CLOSED
to OPEN
when the failure rate is equal or greater than a configurable threshold. For example when more than 50% of the recorded calls have failed.
By default all exceptions count as a failure. You can define a list of exceptions which should count as a failure. All other exceptions are then counted as a success, unless they are ignored. Exceptions can also be ignored so that they neither count as a failure nor success.
The failure rate can only be calculated, if a minimum number of calls were recorded. For example, if the minimum number of required calls is 10, then at least 10 calls must be recorded, before the failure rate can be calculated. If only 9 calls have been evaluated the CircuitBreaker will not trip open even if all 9 calls have failed.
Create and Configure a CircuitBreaker
import { CircuitBreaker, CircuitBreakerStrategy } from '@forts/resilience4ts-circuit-breaker';
const circuitBreaker = CircuitBreaker.of('my-circuit-breaker', {
strategy: CircuitBreakerStrategy.Percentage,
threshold: 0.5,
interval: 1000 * 60 * 15,
minimumFailures: 3,
whitelist: [],
circuitConnectionRetries: 3,
halfOpenLimit: 3,
});
const result = await circuitBreaker.on(async () => {
// do something
})();
Options
Config Property | Default Value | Description |
---|---|---|
strategy | CircuitBreakerStrategy.Percentage | Strategy to use for circuit breaker. |
threshold | 0.5 | Threshold for circuit breaker. When strategy is Percentage -based, this threshold represents the maximum allowable failure rate as a percent. When strategy is Volume -based, this threshold represents the maximum allowable failures in the configured time window |
interval | 1000 * 60 * 15 | Interval in milliseconds that the circuit breaker will transition to the HALF_OPEN state after being in the OPEN state. |
minimumFailures | 3 | Minimum number of failures that must be recorded before the circuit breaker can trip open. |
whitelist | [] | Error[]. If the decorated method throws an error that is in the whitelist, the circuit breaker will not record it as a failure. |
circuitConnectionRetries | 3 | Number of times to retry connecting to the circuit breaker store. |
halfOpenLimit | 3 | Number of executions allowed in the HALF_OPEN state. |
Default Circuit Breaker Config
const DefaultCircuitBreakerConfig = {
strategy: CircuitBreakerStrategy.Percentage,
threshold: 0.5,
interval: 1000 * 15,
minimumFailures: 3,
whitelist: [],
circuitConnectionRetries: 3,
halfOpenLimit: 3,
};
Concurrent Lock
Introduction
The ConcurrentLock
module provides a distributed lock implementation. At a high level, there are two reasons why you might want a lock in a distributed application: for efficiency or for correctness [2]. To distinguish these cases, you can ask what would happen if the lock failed:
- Efficiency: Taking a lock saves you from unnecessarily doing the same work twice (e.g. some expensive computation). If the lock fails and two nodes end up doing the same piece of work, the result is a minor increase in cost (you end up paying 5 cents more to AWS than you otherwise would have) or a minor inconvenience (e.g. a user ends up getting the same email notification twice).
- Correctness: Taking a lock prevents concurrent processes from stepping on each others’ toes and messing up the state of your system. If the lock fails and two nodes concurrently work on the same piece of data, the result is a corrupted file, data loss, permanent inconsistency, the wrong dose of a drug administered to a patient, or some other serious problem.
Installation
npm i @forts/resilience4ts-concurrent-lock
Create and Configure a Lock
import { ConcurrentLock } from '@forts/resilience4ts-concurrent-lock';
const lock = ConcurrentLock.of('my-lock', {
withKey: (...args: Parameters<MyDecoratedMethod>) => UniqueId,
});
const result = await lock.on(async () => {
// do something
})();
Options
Config Property | Default Value | Description |
---|---|---|
withKey | Function that returns a unique id for the call from the decorated function args. | |
duration | Duration in milliseconds to wait for the lock to be released. | |
extensible | Whether the lock is extensible. |
Concurrent Queue
Introduction
The ConcurrentQueue
decorator wraps a function with a distributed, blocking queue that ensures only one instance of the function is running at a time. If the function is called while another instance is running, the function will be queued and executed when the previous instance completes.
Because the queue is blocking, the caller will wait until the function completes before continuing. If the caller fails to acquire the lock, a QueueWaitExceeded
exception will be thrown. The use-cases for this are limited, but it can be useful in some situations so please consider your application's needs before using this module.
Installation
npm i @forts/resilience4ts-concurrent-queue
Create and Configure a Queue
import { ConcurrentQueue } from '@forts/resilience4ts-concurrent-queue';
const queue = ConcurrentQueue.of('my-queue', {
withKey: (...args: Parameters<MyDecoratedMethod>) => UniqueId,
});
const result = await queue.on(async () => {
// do something
})();
Options
Config Property | Default Value | Description |
---|---|---|
withKey | Function that returns a unique id for the call from the decorated function args. | |
maxAttempts | 10 | Maximum number of attempts to acquire the lock and execute the function. |
backoff | 0.01 | Backoff factor to use when retrying to acquire the lock. |
Fallback
Introduction
The fallback strategy provides an interface to define a callback that will be executed if the decorated function fails. This strategy is useful when you want to provide a default value or behavior in the event of a failure. For example, you may want to return a cached value or a default value from a configuration file. The fallback strategy is also useful for providing a graceful degradation of functionality when a service is unavailable, although you should consider using the circuit breaker strategy for this purpose as it provides more control over the failure state.
Given the limitations of Typescript's type system, specifically when dealing with method decoators, you may find it most appropriate to type the decorated function to use a Result
or Either
monad. This approach will allow you to gracefully handle both the success and failure cases. Helpfull packages for this strategy include oxide.ts or neverthrow.
Installation
npm i @forts/resilience4ts-fallback
Create and Configure a Fallback
import { Fallback } from '@forts/resilience4ts-fallback';
const fallback = Fallback.of('my-fallback', {
fallbackAction: async () => {
return "my fallback value";
},
});
const result = await fallback.on(async () => {
// do something
})();
Options
Config Property | Default Value | Description |
---|---|---|
fallbackAction | Function that returns a fallback value or executes a fallback action. | |
shouldHandle | PredicateBuilder that evaluates to a boolean indicating whether the fallback should be executed. |
Hedge
Introduction
The hedging strategy enables the re-execution of a user-defined callback if the previous execution takes too long. This approach gives you the option to either run the original callback again or specify a new callback for subsequent hedged attempts. Implementing a hedging strategy can boost the overall responsiveness of the system. However, it's essential to note that this improvement comes at the cost of increased resource utilization. If low latency is not a critical requirement, you may find the retry strategy is more appropriate.
Installation
npm i @forts/resilience4ts-hedge
Create and Configure a Hedge
import { Hedge } from '@forts/resilience4ts-hedge';
const hedge = Hedge.of('my-hedge', {
maxAttempts: 3,
delay: 1000,
});
const result = await hedge.on(async () => {
// do something
})();
Rate Limiter
Introduction
Rate limiting is an imperative technique to prepare your API for scale and establish high availability and reliability of your service. But also, this technique comes with a whole bunch of different options of how to handle a detected limits surplus, or what type of requests you want to limit. You can simply decline this over limit request, or build a queue to execute them later or combine these two approaches in some way.
The @forts/resilience4ts-rate-limiter
module provides strategies for Distributed
and Instance
-scoped rate limiting. The Distributed
implementation is backed by Redis, and will work across multiple instances of your application. The Instance
implementation will only limit the number of concurrent executions within a single instance of your application.
Installation
npm i @forts/resilience4ts-rate-limiter
Create and Configure a Rate Limiter
import { RateLimiter } from '@forts/resilience4ts-rate-limiter';
const rateLimiter = RateLimiter.of('my-rate-limiter', {
permitLimit: 10,
queueLimit: 1000,
window: 1000,
});
const result = await rateLimiter.on(async () => {
// do something
})();
Options
Config Property | Default Value | Description |
---|---|---|
requestIdentifier | Function that returns a unique id for the call from the decorated function args. | |
permitLimit | 10 | Maximum number of permits to issue per window. |
queueLimit | 1000 | Maximum number of requests to queue. |
window | 1000 | Duration in milliseconds that a call is allowed to wait for a permit to be issued. |
scope | RateLimiterStrategy.Distributed | Strategy to use for rate limiting. |
Retry
Introduction
The retry strategy enables the re-execution of a user-defined callback if the previous execution fails. This approach gives you the option to either run the original callback again or specify a new callback for subsequent attempts. Implementing a retry strategy can boost the overall reliability of the system. However, it's essential to note that this improvement comes at the cost of increased resource utilization. If high availability is not a critical requirement, you may find the hedging strategy is more appropriate.
Installation
npm i @forts/resilience4ts-retry
Create and Configure a Retry
import { Retry } from '@forts/resilience4ts-retry';
const retry = Retry.of('my-retry', {
maxAttempts: 3,
backoff: 1000,
});
const result = await retry.on(async () => {
// do something
});
Options
Config Property | Default Value | Description |
---|---|---|
wait | 500 | Wait in milliseconds before retrying. |
maxAttempts | 3 | Maximum number of attempts to retry. |
maxInterval | 60000 | Maximum wait in milliseconds between retries. |
retryMode | RetryMode.Linear | Strategy to use for calculating backoff. |
validateResult | Function returning a boolean indicating whether the result should be retried. | |
whitelistErrors | Array of errors that should be ignored, skipping retry. | |
onRuntimeError | Callback function to execute when an error occurs. |
Timeout
The timeout module provides a way to limit the amount of time a function may take to execute. If the function does not complete within the specified time, the module will throw an error.
Installation
npm i @forts/resilience4ts-timeout
Create and Configure a Timeout
import { Timeout } from '@forts/resilience4ts-timeout';
const timeout = Timeout.of('my-timeout', {
timeout: 1000,
});
const result = await timeout.on(async () => {
// do something
})();
Options
Config Property | Default Value | Description |
---|---|---|
timeout | Timeout in milliseconds. |
Frameworks
@forts/resilience4ts-nestjs
Getting started with resilience4ts + NestJS
Introduction
While resilience4ts works well as a standalone library, it also provides a set of decorators for NestJS. These decorators can be used to decorate any NestJS controller or service method. @forts/resilience4ts-nestjs
wraps all the core resilience4ts decorators plus the @forts/resilience4ts-all
decorator into a single package, and re-exports all of them along with convenient method decorators for use with NestJS controllers and services.
Installation
npm i @forts/resilience4ts-nestjs
Adding Decorators to a NestJS Injectable Service
Taken from the NestJS example
import {
Bulkhead,
CircuitBreaker,
Fallback,
CircuitBreakerImpl,
CircuitBreakerStrategy,
} from '@forts/resilience4ts-nestjs';
import { Inject, Injectable, UnauthorizedException } from '@nestjs/common';
import { AppGateway } from './app.gateway';
type HelloWorldArgs = {
id: string;
};
@Injectable()
export class AppService {
constructor(
@Inject('AppGateway')
private readonly appGateway: AppGateway,
) {}
// decorators can be stacked, and will be applied in the order they are listed
@Bulkhead({
getUniqueId: (args: HelloWorldArgs) => args.id,
maxConcurrent: 1,
maxWait: 250,
})
@Fallback({
shouldHandle: new PredicateBuilder(UnauthorizedException).or(
(e: Error) => e.message === 'asdf',
),
fallbackAction: async () => 'fallback',
})
@CircuitBreaker({
strategy: CircuitBreakerStrategy.Percentage,
threshold: 0.2,
})
async getHello(args: Record<'id', string>) {
// The original, functional decorators are also available
// To use them import them as their name + Impl
// e.g. CircuitBreakerImpl, CacheImpl, etc.
return await CircuitBreakerImpl.of('gateway.call', {
strategy: CircuitBreakerStrategy.Percentage,
threshold: 0.2,
}).on(this.appGateway.getHello)(args);
}
}