DynaSOAr: A Parallel Memory Allocator for Object-oriented Programming on GPUs with Efficient Memory Access (ECOOP 2019)

Who

Matthias Springer, Hidehiko Masuhara

Track

ECOOP 2019 Research Papers

Time Zone

The program is currently displayed in (GMT+01:00) Belfast.

The GMT offsets shown reflect the offsets at the moment of the conference.

When

Thu 18 Jul 2019 15:40 - 16:00 at Mancy - Concurrency and Parallelism. Chair(s): Stephen Kell

Abstract

Object-oriented programming has long been regarded as too inefficient for SIMD high-performance computing, despite the fact that many important applications in HPC have an inherent object structure. On SIMD accelerators including GPUs, this is mainly due to performance problems with memory allocation: There are a few libraries that support parallel memory allocation directly on accelerator devices, but all of them suffer from uncoalesced memory accesses.

In this work, we present DynaSOAr, a C++/CUDA data layout DSL for object-oriented programming, combined with a parallel dynamic object allocator. DynaSOAr was designed for a class of object-oriented programs that we call Single-Method Multiple Objects (SMMO), in which parallelism is expressed over a set of objects. DynaSOAr is the first GPU object allocator that provides a parallel do-all operation, which is the foundation of SMMO applications.
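As a rough illustration of the do-all idea described above, the following host-side C++ sketch applies one method to every allocated object in a small fixed-capacity block. It is not DynaSOAr's API: the names `BodyBlock`, `allocate`, `deallocate`, `do_all`, and `move` are hypothetical, the fields are assumed to be stored one array per field, and the sequential loop stands in for what would be one GPU thread per object.

```cpp
#include <array>
#include <cassert>
#include <cstddef>

// Hypothetical capacity of one block of objects (an assumption for this sketch).
constexpr std::size_t kCapacity = 8;

// One block of "Body" objects. Each field lives in its own array, so the
// same field of neighboring objects is contiguous in memory.
struct BodyBlock {
    std::array<float, kCapacity> pos{};   // position of every object
    std::array<float, kCapacity> vel{};   // velocity of every object
    std::array<bool, kCapacity> used{};   // true if the slot holds a live object
};

// Place a new object in the first free slot. Returns the slot index,
// or kCapacity if the block is full.
std::size_t allocate(BodyBlock& b, float pos, float vel) {
    for (std::size_t i = 0; i < kCapacity; ++i) {
        if (!b.used[i]) {
            b.used[i] = true;
            b.pos[i] = pos;
            b.vel[i] = vel;
            return i;
        }
    }
    return kCapacity;
}

// Free a previously allocated slot.
void deallocate(BodyBlock& b, std::size_t i) {
    assert(i < kCapacity && b.used[i]);
    b.used[i] = false;
}

// do-all: run the same method on every allocated object. On a GPU, each
// iteration would be a separate thread; here it is a plain loop.
template <typename Method>
void do_all(BodyBlock& b, Method method) {
    for (std::size_t i = 0; i < kCapacity; ++i) {
        if (b.used[i]) method(b, i);
    }
}

// Example method of the hypothetical Body class: advance by one time step.
inline void move(BodyBlock& b, std::size_t i, float dt) {
    b.pos[i] += b.vel[i] * dt;
}

// Small usage example: allocate three objects, free one, move the rest.
float demo_do_all() {
    BodyBlock block;
    const std::size_t a = allocate(block, 0.0f, 1.0f);
    const std::size_t c = allocate(block, 10.0f, -2.0f);
    const std::size_t d = allocate(block, 5.0f, 0.5f);
    deallocate(block, c);  // freed objects are skipped by do_all
    do_all(block, [](BodyBlock& b, std::size_t i) { move(b, i, 2.0f); });
    return block.pos[a] + block.pos[d];
}
```

Because `move` touches `pos[i]` and `vel[i]` for consecutive `i`, threads with consecutive indices would read consecutive addresses, which is the access pattern that GPUs can coalesce.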

DynaSOAr improves the usage of allocated memory with a Structure of Arrays (SOA) data layout and achieves low memory fragmentation through efficient management of free and allocated memory blocks with lock-free, hierarchical bitmaps. In our benchmarks, DynaSOAr achieves a significant speedup of application code of up to 3x over state-of-the-art allocators. Moreover, DynaSOAr manages heap memory more efficiently than other allocators, allowing programmers to run up to 2x larger problem sizes with the same amount of memory.
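To make the notion of a lock-free, hierarchical bitmap more concrete, here is a minimal two-level sketch in standard C++. It is an assumption-laden illustration, not the data structure from the paper: the class name `FreeBlockBitmap`, the sizes, and the retry strategy are all hypothetical. A set leaf bit means "this block is free"; each top-level bit is a conservative hint that the corresponding leaf word may contain a free block.

```cpp
#include <atomic>
#include <cassert>
#include <cstdint>

constexpr int kBitsPerWord = 64;
constexpr int kNumWords = 64;   // 64 * 64 = 4096 blocks tracked in total
constexpr int kNoBlock = -1;    // returned when no free block is found

// Index of the lowest set bit; x must be non-zero.
inline int lowest_set_bit(std::uint64_t x) {
    assert(x != 0);
    int i = 0;
    while ((x & 1u) == 0) { x >>= 1; ++i; }
    return i;
}

class FreeBlockBitmap {
public:
    // Starts with no free blocks; blocks become available via release().
    FreeBlockBitmap() : top_(0) {
        for (auto& w : leaf_) w.store(0);
    }

    // Mark block `pos` as free.
    void release(int pos) {
        assert(pos >= 0 && pos < kNumWords * kBitsPerWord);
        const int w = pos / kBitsPerWord;
        const std::uint64_t bit = std::uint64_t{1} << (pos % kBitsPerWord);
        const std::uint64_t old = leaf_[w].fetch_or(bit);
        assert((old & bit) == 0);  // block must not already be free
        // Publish the hint after the leaf bit is visible.
        top_.fetch_or(std::uint64_t{1} << w);
    }

    // Atomically claim some free block, or return kNoBlock if none is found.
    int claim() {
        for (;;) {
            const std::uint64_t top = top_.load();
            if (top == 0) return kNoBlock;
            const int w = lowest_set_bit(top);
            const std::uint64_t word = leaf_[w].load();
            if (word == 0) {
                // Stale hint: clear it, then re-check the leaf in case a
                // concurrent release() slipped in between.
                top_.fetch_and(~(std::uint64_t{1} << w));
                if (leaf_[w].load() != 0) top_.fetch_or(std::uint64_t{1} << w);
                continue;
            }
            const int b = lowest_set_bit(word);
            const std::uint64_t bit = std::uint64_t{1} << b;
            // Only the thread that actually flips the bit owns the block.
            if (leaf_[w].fetch_and(~bit) & bit) return w * kBitsPerWord + b;
            // Another thread claimed this bit first; retry.
        }
    }

private:
    std::atomic<std::uint64_t> top_;               // one hint bit per leaf word
    std::atomic<std::uint64_t> leaf_[kNumWords];   // one bit per block
};
```

In this sketch a search inspects one top-level word and one leaf word instead of scanning all 4096 bits, and all updates are single atomic read-modify-write operations, so no thread ever holds a lock. A real GPU allocator would additionally need to spread concurrent threads across different words to limit contention, which this example does not attempt.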

Link to Preprint

https://arxiv.org/pdf/1810.11765

DOI

https://doi.org/10.4230/LIPIcs.ECOOP.2019.17

Matthias Springer

Tokyo Institute of Technology

Germany

Hidehiko Masuhara

Tokyo Institute of Technology

Japan
