Volume 29, Issue 15 e4111
SPECIAL ISSUE PAPER

JParEnt: Parallel entropy decoding for JPEG decompression on heterogeneous multicore architectures

Wasuwee Sodsong, Minyoung Jung, Jinwoo Park, and Bernd Burgstaller (corresponding author)

Department of Computer Science, Yonsei University, Seoul, Korea

Correspondence to: Bernd Burgstaller, Department of Computer Science, Yonsei University, Seoul, Korea. Email: [email protected]
First published: 10 July 2017
Citations: 3
This paper extends work presented at the PMAM'16 workshop to use integrated graphics processors (IGPs) for parallel entropy decoding. We introduce a novel, dynamic partitioning scheme for JPEG decoding that takes into account the considerably lower compute power of IGPs relative to the on-chip CPU. We present new insights from an extended experimental evaluation that includes the achieved energy savings on an i7-6700k CPU with an HD Graphics 530 IGP. We report on the performance improvements achieved on a Tesla K80 GPU.

Summary

The JPEG format employs Huffman codes to compress the entropy data of an image. Huffman codewords are of variable length, which makes parallel entropy decoding a difficult problem. To determine the start position of a codeword in the bitstream, the previous codeword must be decoded first. We present JParEnt, a new approach to parallel entropy decoding for JPEG decompression on heterogeneous multicores. JParEnt conducts JPEG decompression in two steps: (1) an efficient sequential scan of the entropy data on the CPU to determine the start-positions (boundaries) of coefficient blocks in the bitstream, followed by (2) a parallel entropy decoding step on the graphics processing unit (GPU). The block boundary scan constitutes a reinterpretation of the Huffman-coded entropy data to determine codeword boundaries in the bitstream. We introduce a dynamic workload partitioning scheme to account for GPUs of low compute power relative to the CPU. This configuration has become common with the advent of SoCs with integrated graphics processors (IGPs). We leverage additional parallelism through pipelined execution across CPU and GPU. For systems providing a unified address space between CPU and GPU, we employ zero-copy to completely eliminate the data transfer overhead.
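The two-step structure can be illustrated with a minimal CUDA sketch. This is our own structural illustration, not the JParEnt sources: a toy byte-granular, length-prefixed code stands in for JPEG's bit-level Huffman codes, and all identifiers below (scan_block_boundaries, decode_blocks_kernel, and so on) are hypothetical.

// Structural sketch of the two-step idea, not the JParEnt implementation.
// A toy length-prefixed variable-length code stands in for JPEG's bit-level
// Huffman codes; all identifiers are hypothetical.
#include <cstdio>
#include <vector>
#include <cuda_runtime.h>

// Step 1: sequential boundary scan on the CPU. A "codeword" is one length
// byte followed by that many value bytes; a block holds coeffs_per_block
// codewords. Only the start offset of each complete block is recorded.
static std::vector<int> scan_block_boundaries(const unsigned char* data,
                                              int size, int coeffs_per_block) {
    std::vector<int> starts;
    int pos = 0;
    for (;;) {
        int block_start = pos, c = 0;
        for (; c < coeffs_per_block && pos < size; ++c)
            pos += 1 + data[pos];            // skip length byte + value bytes
        if (c < coeffs_per_block || pos > size) break;  // truncated block
        starts.push_back(block_start);
    }
    return starts;
}

// Step 2: parallel decode on the GPU; one thread decodes one block, starting
// at the boundary recorded by the CPU scan. Here "decoding" just sums the
// value bytes of each codeword.
__global__ void decode_blocks_kernel(const unsigned char* data,
                                     const int* starts, int num_blocks,
                                     int coeffs_per_block, int* out) {
    int b = blockIdx.x * blockDim.x + threadIdx.x;
    if (b >= num_blocks) return;
    int pos = starts[b];
    for (int c = 0; c < coeffs_per_block; ++c) {
        int len = data[pos++], v = 0;
        for (int i = 0; i < len; ++i) v += data[pos++];
        out[b * coeffs_per_block + c] = v;
    }
}

int main() {
    const int coeffs_per_block = 4, num_blocks = 8;

    // Build a synthetic entropy stream of variable-length codewords.
    std::vector<unsigned char> stream;
    for (int b = 0; b < num_blocks; ++b)
        for (int c = 0; c < coeffs_per_block; ++c) {
            int len = 1 + (b + c) % 3;
            stream.push_back((unsigned char)len);
            for (int i = 0; i < len; ++i)
                stream.push_back((unsigned char)(b + c));
        }

    // Step 1 on the CPU, step 2 on the GPU.
    std::vector<int> starts =
        scan_block_boundaries(stream.data(), (int)stream.size(), coeffs_per_block);

    unsigned char* d_data; int* d_starts; int* d_out;
    cudaMalloc(&d_data, stream.size());
    cudaMalloc(&d_starts, starts.size() * sizeof(int));
    cudaMalloc(&d_out, starts.size() * coeffs_per_block * sizeof(int));
    cudaMemcpy(d_data, stream.data(), stream.size(), cudaMemcpyHostToDevice);
    cudaMemcpy(d_starts, starts.data(), starts.size() * sizeof(int),
               cudaMemcpyHostToDevice);
    decode_blocks_kernel<<<1, 128>>>(d_data, d_starts, (int)starts.size(),
                                     coeffs_per_block, d_out);

    std::vector<int> out(starts.size() * coeffs_per_block);
    cudaMemcpy(out.data(), d_out, out.size() * sizeof(int),
               cudaMemcpyDeviceToHost);
    // Block 3, coeff 2: len = 1 + (3+2)%3 = 3, value bytes all 5 -> sum 15.
    printf("block 3, coeff 2 = %d (expect 15)\n", out[3 * coeffs_per_block + 2]);
    cudaFree(d_data); cudaFree(d_starts); cudaFree(d_out);
    return 0;
}

The essential property the sketch preserves is that the start of block k cannot be known without scanning blocks 0..k-1 (the codewords are variable-length), which is why the boundary scan stays sequential on the CPU while the per-block decode parallelizes on the GPU.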

Our experimental evaluation of JParEnt was conducted on six heterogeneous multicore systems: one server and two desktops with dedicated GPUs, one desktop with an IGP, and two embedded systems. For a selection of more than 1000 JPEG images, JParEnt outperforms the SIMD implementation of the libjpeg-turbo library by up to a factor of 4.3x, and the previously fastest JPEG decompression method for heterogeneous multicores by up to a factor of 2.2x. JParEnt's entropy data scan consumes 45% of the entropy decoding time of libjpeg-turbo on average. Given this new ratio for the sequential part of JPEG decompression, JParEnt achieves up to 97% of the maximum attainable speedup (95% on average).
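One way to put the 45% figure in perspective (a back-of-the-envelope reading on our part, not a derivation from the paper) is through an Amdahl-style bound: if the sequential boundary scan accounts for a fraction $s$ of the baseline entropy-decoding time and the remainder parallelizes perfectly across $p$ processing elements, the speedup of the entropy-decoding stage is limited to

$$S(p) = \frac{1}{s + \frac{1-s}{p}} \;\le\; \frac{1}{s}.$$

With $s = 0.45$, the entropy-decoding stage alone is capped at roughly $1/0.45 \approx 2.2\times$ under these assumptions, which suggests why reaching 95-97% of the attainable speedup is a strong result.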

On the IGP-based desktop platform, JParEnt achieves energy savings of up to 45% compared to libjpeg-turbo's SIMD implementation.
