Allocator library#

This library provides primitives for efficiently managing dynamic allocation of the memory containers used to schedule accelerated kernel execution. To use it, include the following header:

#include <metalchat/allocator.h>

Hardware memory allocator#

class hardware_memory_allocator#

This class creates tracked buffer resources directly from the device.

This is the default implementation of the hardware memory allocator. All resources are tracked and shared with the CPU. In some workloads this implementation may perform suboptimally because memory is frequently allocated, deallocated, and wired.

Hardware heap allocator#

template<typename T> class hardware_heap_allocator#

This class creates a fixed-size heap in memory shared between the GPU and the CPU.

This allocator pre-allocates a fixed-size contiguous block of shared memory and makes it resident. All subsequent allocations are made within that block and are added to the residency set. When an allocation is deleted, it is also freed from the heap and removed from the residency set.

When there is not enough memory in the heap to allocate the requested amount of memory, the implementation throws a metalchat::alloc_error exception.
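The exhaustion behaviour described above can be illustrated with a minimal, self-contained sketch. The `fixed_heap` class and the local `alloc_error` type are hypothetical stand-ins written for this illustration, not the metalchat implementation; the sketch only does byte accounting and assumes nothing about how a real Metal heap is managed.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <stdexcept>

// Hypothetical stand-in for metalchat::alloc_error.
struct alloc_error : std::runtime_error {
    using std::runtime_error::runtime_error;
};

// Hypothetical fixed-capacity heap that only does byte accounting.
class fixed_heap {
    struct state {
        std::size_t capacity;
        std::size_t used;
    };

public:
    explicit fixed_heap(std::size_t capacity)
    : _state(std::make_shared<state>(state{capacity, 0}))
    {}

    // Reserves `size` bytes. The bytes return to the heap when the last
    // copy of the returned handle is destroyed.
    std::shared_ptr<std::size_t>
    allocate(std::size_t size)
    {
        if (size > _state->capacity - _state->used) {
            throw alloc_error("fixed_heap: not enough memory");
        }
        _state->used += size;
        auto st = _state; // the handle keeps the accounting state alive
        return std::shared_ptr<std::size_t>(new std::size_t(size), [st](std::size_t* p) {
            assert(st->used >= *p);
            st->used -= *p;
            delete p;
        });
    }

    std::size_t used() const { return _state->used; }

private:
    std::shared_ptr<state> _state;
};
```

In this sketch a request that does not fit fails immediately instead of growing the heap, and releasing a handle makes its bytes available to later requests.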

Hardware resident allocator#

template<hardware_allocator Allocator> class hardware_resident_allocator#

This class template moves all allocations to the residency set. On container destruction, allocations are removed from the residency set. When all allocations are removed, the set ends its residency.

All containers produced by this allocator keep pointers to the residency set, so it is safe to use this class within a scope.

Users can explicitly call hardware_resident_allocator::detach when the underlying set should be made resident. Residency ends automatically once all allocations are removed. The allocator also makes all containers resident when the allocator object is destroyed.

Example:

using namespace metalchat;

using Container = hardware_memory_container<void>;
std::shared_ptr<Container> c1;
std::shared_ptr<Container> c2;

auto gpu = hardware_accelerator();
{
   auto alloc0 = gpu.get_allocator();
   auto alloc = hardware_resident_allocator(alloc0, gpu.get_metal_device());

   c1 = alloc.allocate(10);
   c2 = alloc.allocate(20);

   // Scope ends, c1 and c2 become resident. This could be done explicitly
   // by calling alloc.detach();
}


c1 = nullptr;
c2 = nullptr;

// Containers are deleted, end of the residency happens here.
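The lifetime rules in the example above can be modelled without Metal. The following is a minimal sketch in which `residency_set`, `buffer`, and `resident_allocator` are hypothetical stand-ins invented for illustration; it assumes only what the text states: containers keep the set alive, destroying the allocator requests residency, and residency ends when the last container is released.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <utility>

// Hypothetical stand-in for a residency set: counts tracked allocations.
struct residency_set {
    std::size_t count = 0;
    bool resident = false;

    void remove()
    {
        assert(count > 0);
        if (--count == 0) {
            resident = false; // last allocation gone, residency ends
        }
    }
};

// Hypothetical stand-in for a memory container.
struct buffer {
    std::size_t size;
};

class resident_allocator {
public:
    explicit resident_allocator(std::shared_ptr<residency_set> set)
    : _set(std::move(set))
    {}

    // Destroying the allocator requests residency for its allocations.
    ~resident_allocator() { detach(); }

    std::shared_ptr<buffer>
    allocate(std::size_t size)
    {
        auto set = _set; // each container keeps the set alive
        auto ptr = std::shared_ptr<buffer>(new buffer{size}, [set](buffer* b) {
            set->remove();
            delete b;
        });
        set->count++;
        return ptr;
    }

    // Requests residency explicitly.
    void detach()
    {
        if (_set->count > 0) {
            _set->resident = true;
        }
    }

private:
    std::shared_ptr<residency_set> _set;
};
```

Because every container captures a shared pointer to the set, containers may safely outlive the allocator that produced them, which is the property the scoped example relies on.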

Template Parameters: Allocator – The type of the wrapped allocator.

Public Functions

inline hardware_resident_allocator(const Allocator &alloc, metal::shared_device device, std::size_t capacity = 256)#

Constructs hardware_resident_allocator from a base allocator.

Parameters:

alloc – A base allocator. All allocation requests are delegated to this allocator.
device – A shared pointer to the Metal device instance.
capacity – The maximum number of allocations that the residency set can hold.

inline hardware_resident_allocator(Allocator &&alloc, metal::shared_device device, std::size_t capacity = 256)#

Constructs hardware_resident_allocator from a base allocator.

Parameters:

alloc – A base allocator. All allocation requests are delegated to this allocator.
device – A shared pointer to the Metal device instance.
capacity – The maximum number of allocations that the residency set can hold.

hardware_resident_allocator(hardware_resident_allocator &&other) = default#: The move constructor of the hardware_resident_allocator.

hardware_resident_allocator(const hardware_resident_allocator &other) = default#: The copy constructor of the hardware_resident_allocator.

inline void detach()#: Permits allocations to be moved to resident memory and used independently of the given allocator.

Polymorphic hardware allocator#

template<typename T> struct basic_hardware_allocator#

This class template is an abstract base class that should be inherited by allocator implementations intended for use within a polymorphic hardware memory allocator.

The virtual methods in this class are exactly the methods required by the metalchat::allocator concept, so any allocator that satisfies the concept implements this interface automatically once it inherits from this struct.

Example of usage:

using namespace metalchat;

template <typename T> struct custom_hardware_allocator :
public basic_hardware_allocator<T> {

    using value_type = T;
    using pointer = value_type*;
    using const_pointer = const value_type*;
    using size_type = std::size_t;
    using container_type = hardware_memory_container<value_type>;
    using container_pointer = std::shared_ptr<container_type>;

    container_pointer
    allocate(size_type size)
    {
        // allocate a new container.
    }

    container_pointer
    allocate(const_pointer ptr, size_type size)
    {
        // allocate a new container and initialize with data from ptr.
    }
};

Note

Alternatively, you can use hardware_allocator_wrapper to avoid creating a custom type.

Public Functions

virtual container_pointer allocate(size_type) = 0#: Allocates size * sizeof(T) bytes of uninitialized memory by calling an outer allocator type.

virtual container_pointer allocate(const_pointer, size_type) = 0#: Allocates size * sizeof(T) bytes and initializes them with the data stored at ptr.

template<typename T> class polymorphic_hardware_allocator#

The class template is a metalchat::allocator that exhibits different allocation behaviour depending on the particular implementation of metalchat::basic_hardware_allocator it is given.

This allocator is used to avoid creating separate device and thread instances when kernels of different types (bf16, float, double) are expected to be scheduled within a single device.

Example:

using namespace metalchat;

// Create a default hardware accelerator, then decorate the default allocator
// with a no-copy allocator (keeps all CPU allocations shared with the GPU) and
// a resident allocator (moves all allocations to a resident set on request).
auto gpu = hardware_accelerator("metalchat.metallib");
auto alloc0 = gpu.get_allocator();
auto alloc1 = nocopy_allocator(alloc0, gpu.get_metal_device());
auto alloc2 = hardware_resident_allocator(alloc1, gpu.get_metal_device());
auto alloc3 = polymorphic_hardware_allocator(alloc2);

// Update device allocator with a new implementation of the allocator.
// std::make_shared cannot deduce its type, so it is named explicitly.
auto alloc_ptr = std::make_shared<decltype(alloc3)>(std::move(alloc3));
gpu.set_allocator(alloc_ptr);

Template Parameters: T – Scalar type of the container data.

Public Functions

inline polymorphic_hardware_allocator(std::shared_ptr<outer_allocator_type> alloc)#: Construct a new allocator instance given an implementation of the basic_hardware_allocator.

inline container_pointer allocate(size_type size)#: Allocates size * sizeof(T) bytes of uninitialized memory by calling an outer allocator type.

inline container_pointer allocate(const_pointer ptr, size_type size)#: Allocates size * sizeof(T) bytes and initializes them with the data stored at ptr.
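The relationship between the abstract interface, a wrapper, and the type-erased allocator can be sketched with standard C++ only. All names below (`basic_allocator`, `allocator_wrapper`, `vector_allocator`, `erased_allocator`) are hypothetical and exist only for this illustration; the sketch uses `std::vector` as the container, so memory is value-initialized rather than uninitialized, and it makes no claim about how metalchat implements these classes.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <utility>
#include <vector>

// Hypothetical abstract interface with the two allocate overloads.
template <typename T> struct basic_allocator {
    using container_pointer = std::shared_ptr<std::vector<T>>;

    virtual container_pointer allocate(std::size_t size) = 0;
    virtual container_pointer allocate(const T* ptr, std::size_t size) = 0;
    virtual ~basic_allocator() = default;
};

// Hypothetical wrapper that adapts any concrete allocator to the interface.
template <typename Allocator, typename T>
struct allocator_wrapper : basic_allocator<T> {
    using container_pointer = typename basic_allocator<T>::container_pointer;

    explicit allocator_wrapper(Allocator alloc)
    : _alloc(std::move(alloc))
    {}

    container_pointer allocate(std::size_t size) override
    {
        return _alloc.allocate(size);
    }

    container_pointer allocate(const T* ptr, std::size_t size) override
    {
        return _alloc.allocate(ptr, size);
    }

    Allocator _alloc;
};

// Concrete example allocator backed by std::vector.
template <typename T> struct vector_allocator {
    std::shared_ptr<std::vector<T>> allocate(std::size_t size)
    {
        return std::make_shared<std::vector<T>>(size);
    }

    std::shared_ptr<std::vector<T>> allocate(const T* ptr, std::size_t size)
    {
        return std::make_shared<std::vector<T>>(ptr, ptr + size);
    }
};

// Type-erased allocator: behaviour depends on the implementation it holds.
template <typename T> class erased_allocator {
public:
    using container_pointer = typename basic_allocator<T>::container_pointer;

    explicit erased_allocator(std::shared_ptr<basic_allocator<T>> alloc)
    : _alloc(std::move(alloc))
    {
        assert(_alloc != nullptr);
    }

    container_pointer allocate(std::size_t size) { return _alloc->allocate(size); }

    container_pointer allocate(const T* ptr, std::size_t size)
    {
        return _alloc->allocate(ptr, size);
    }

private:
    std::shared_ptr<basic_allocator<T>> _alloc;
};
```

The type of `erased_allocator<float>` does not mention the concrete allocator, so code that stores it does not need to be templated on every decorator in the chain.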

Scalar memory allocator#

template<typename T> struct scalar_memory_allocator#

Random memory allocator#

template<typename T> struct random_memory_allocator#

Aliasing allocator#

template<allocator Allocator> class aliasing_allocator#

Rebind allocator#

template<typename T, allocator_t<void> Allocator> struct rebind_allocator#

This allocator casts the element type of contiguous memory that was allocated with an incomplete allocator type. An allocator is incomplete when Allocator::value_type is void.

The implementation only allows casting from an incomplete allocator type, since the parent allocator might use a different memory alignment depending on the underlying type.

Example:

auto gpu = hardware_accelerator("metalchat.metallib");
auto alloc = rebind_allocator<float>(gpu.get_allocator());
auto floats_container_ptr = alloc.allocate(10);
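The idea of rebinding an untyped allocator can be sketched as follows. The types `void_allocator`, `untyped_container`, `typed_container`, and `rebound_allocator` are hypothetical and written only for this illustration; the sketch assumes the untyped allocator measures sizes in bytes and that the rebound allocator multiplies the element count by `sizeof(T)`, as the function descriptions in this section state.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <memory>
#include <new>
#include <type_traits>
#include <utility>

// Hypothetical untyped storage: sizes are measured in bytes.
struct untyped_container {
    std::shared_ptr<void> storage;
    std::size_t bytes;
};

// Hypothetical "incomplete" allocator: value_type is void.
struct void_allocator {
    using value_type = void;

    untyped_container allocate(std::size_t bytes)
    {
        auto p = std::shared_ptr<void>(::operator new(bytes), [](void* q) {
            ::operator delete(q);
        });
        return {p, bytes};
    }
};

// Typed view over untyped storage.
template <typename T> struct typed_container {
    std::shared_ptr<void> storage;
    std::size_t count;

    T* data() const { return static_cast<T*>(storage.get()); }
    std::size_t size() const { return count; }
};

template <typename T, typename Allocator> class rebound_allocator {
    static_assert(std::is_void<typename Allocator::value_type>::value,
                  "only an untyped allocator can be rebound");
    static_assert(std::is_trivially_copyable<T>::value,
                  "the sketch copies elements with memcpy");

public:
    explicit rebound_allocator(Allocator alloc)
    : _alloc(std::move(alloc))
    {}

    // Requests size * sizeof(T) bytes from the untyped allocator.
    typed_container<T> allocate(std::size_t size)
    {
        auto raw = _alloc.allocate(size * sizeof(T));
        return {raw.storage, size};
    }

    // Same as above, then copies `size` elements from ptr.
    typed_container<T> allocate(const T* ptr, std::size_t size)
    {
        assert(ptr != nullptr || size == 0);
        auto c = allocate(size);
        if (size > 0) {
            std::memcpy(c.data(), ptr, size * sizeof(T));
        }
        return c;
    }

private:
    Allocator _alloc;
};
```

The first `static_assert` mirrors the restriction described above: the cast is only permitted when the wrapped allocator has not yet committed to an element type.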

Public Functions

inline container_pointer allocate(size_type size)#

Allocates size * sizeof(T) bytes of uninitialized memory by calling an underlying allocator.

Use of this function is ill-formed if T is an incomplete type.

inline container_pointer allocate(const_pointer ptr, size_type size)#: Allocates size * sizeof(T) bytes and initializes them with the data stored at ptr.

Pooling allocator adapter#

template<allocator_t<void> Allocator> class pooling_allocator_adapter#

This class creates buffer resources with an offset from the specified buffer.

Use this class when you want to maintain a single buffer (potentially mapped to other memory, such as a memory-mapped file) and allocate containers that point into the same underlying buffer with different sizes and offsets.

When the specified pointer does not belong to any of the memory pools, the implementation throws a metalchat::alloc_error exception.
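The offset-based lookup can be sketched with plain host memory. The names `pooled_view`, `pooling_adapter`, and the local `alloc_error` are hypothetical stand-ins for this illustration; the sketch covers only allocations that point into an existing pool and omits the proxy allocator used for allocations without backing memory.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <memory>
#include <stdexcept>
#include <utility>
#include <vector>

// Hypothetical stand-in for metalchat::alloc_error.
struct alloc_error : std::runtime_error {
    using std::runtime_error::runtime_error;
};

using pool_pointer = std::shared_ptr<std::vector<char>>;

// A view into a pool: shares ownership of the pool, adds offset and size.
struct pooled_view {
    pool_pointer pool;
    std::size_t offset;
    std::size_t size;

    const char* data() const { return pool->data() + offset; }
};

class pooling_adapter {
public:
    explicit pooling_adapter(std::vector<pool_pointer> pools)
    : _pools(std::move(pools))
    {}

    // Returns a view when [ptr, ptr + size) lies inside one of the pools,
    // otherwise throws alloc_error.
    pooled_view allocate(const char* ptr, std::size_t size) const
    {
        const auto addr = reinterpret_cast<std::uintptr_t>(ptr);
        for (const auto& pool : _pools) {
            assert(pool != nullptr);
            const auto base = reinterpret_cast<std::uintptr_t>(pool->data());
            if (addr < base || addr - base > pool->size()) {
                continue; // pointer lies outside this pool
            }
            const std::size_t offset = addr - base;
            if (size <= pool->size() - offset) {
                return pooled_view{pool, offset, size};
            }
        }
        throw alloc_error("pointer does not belong to any memory pool");
    }

private:
    std::vector<pool_pointer> _pools;
};
```

No bytes are copied: each view records an offset and a size and keeps the pool alive through shared ownership, which is what makes the approach suitable for a large memory-mapped buffer.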

Public Functions

inline pooling_allocator_adapter(const Allocator &alloc, container_pointer container_ptr)#

Constructs a new pooling allocator with the specified container. All allocations with “new” semantics are proxied to the specified allocator.

Parameters:

alloc – Proxy allocator for allocations without backing memory.
container_ptr – Underlying container from which allocations are created.

inline pooling_allocator_adapter(Allocator &&alloc, container_pointer container_ptr)#

Constructs a new pooling allocator with the specified container.

Parameters:

alloc – Proxy allocator for allocations without backing memory.
container_ptr – Underlying container from which allocations are created.

inline pooling_allocator_adapter(const Allocator &alloc, std::vector<container_pointer> containers)#

Constructs a new pooling allocator with the specified sequence of containers.

Parameters:

alloc – Proxy allocator for allocations without backing memory.
containers – Underlying containers from which allocations are created.

inline pooling_allocator_adapter(Allocator &&alloc, std::vector<container_pointer> containers)#

Constructs a new pooling allocator with the specified sequence of containers.

Parameters:

alloc – Proxy allocator for allocations without backing memory.
containers – Underlying containers from which allocations are created.

No-copy allocator#

template<typename T, allocator Allocator> class nocopy_allocator#: An allocator that creates a shallow container resource for allocations with memory-copy semantics. Containers created this way do not manage the underlying memory (specified by a const_pointer); the caller is responsible for managing and deallocating the original memory.
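A shallow, non-owning container can be sketched as below. `shallow_container` and `shallow_allocator` are hypothetical names used only for illustration; the sketch assumes nothing about the real nocopy_allocator beyond what is stated above, namely that the container refers to caller-owned memory and never frees it.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>

// Hypothetical shallow container: refers to memory it does not own.
template <typename T> struct shallow_container {
    const T* ptr;
    std::size_t size;

    const T* data() const { return ptr; }
};

template <typename T> struct shallow_allocator {
    // Wraps caller-owned memory without copying it. The caller must keep
    // the memory alive for as long as the container is in use.
    std::shared_ptr<shallow_container<T>>
    allocate(const T* ptr, std::size_t size) const
    {
        assert(ptr != nullptr || size == 0);
        return std::make_shared<shallow_container<T>>(shallow_container<T>{ptr, size});
    }
};
```

Because no copy is made, changes to the original memory are visible through the container, and destroying the container leaves the original memory untouched.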

File-buffered memory allocator#

template<typename T> struct filebuf_memory_allocator#

Public Functions

inline container_pointer allocate(size_type size)#: Allocates a block of memory of the specified size and immediately writes it to an anonymous file buffer. The lifetime of the backing file buffer ends with the lifetime of the returned container.

Paginated allocator adapter#

template<allocator_t<void> Allocator> class paginated_allocator_adapter#

Allocator concepts#

template<typename Allocator> concept allocator#

#include <allocator.h>

The concept specifies the requirements for a type to allocate elements contiguously stored in the memory. The allocator is used to allocate underlying memory for a tensor.

Depending on the tensor type, memory can be allocated on the stack, in random-access memory, or in memory that is shared between the CPU and the GPU, using different implementations of hardware allocators.

template<typename Allocator, typename T> concept allocator_t#: #include <allocator.h>

The concept specifies the requirements for a type to allocate elements of type T contiguously stored in the memory.

template<typename Allocator> concept hardware_allocator#: #include <allocator.h>

The concept specifies the requirements for a type to allocate elements contiguously stored in the hardware (Metal) memory.

template<typename Allocator, typename T> concept hardware_allocator_t#: #include <allocator.h>

The concept specifies the requirements for a type to allocate elements of type T contiguously stored in the hardware (Metal) memory.