# DYNAMICALLY RECONFIGURABLE DCT ARCHITECTURES BASED ON BITRATE, POWER, AND IMAGE QUALITY CONSIDERATIONS

Yuebing Jiang and Marios Pattichis

University of New Mexico, Department of Electrical and Computer Engineering, Albuquerque, NM 87106 {yuebing, pattichis}@ece.unm.edu

### ABSTRACT

We propose a dynamically reconfigurable DCT architecture system that can be used to optimize performance objectives while meeting real-time constraints on power, image quality, and bitrate. The proposed system can be dynamically reconfigured between 4 different modes: (i) minimum power mode, (ii) minimum bitrate mode, (iii) maximum image quality mode, and (iv) typical mode.

The proposed system relies on efficient DCT implementations that are parameterized by the word-length of the DCT transform coefficients, and on different quantization quality factors. Optimal DCT architectures and quality factors are pre-computed on a training dataset. The proposed system is validated on the LIVE database using leave-one-out. The results show that the real-time constraints can be met for the majority of the test images while optimizing for each of the 4 modes of operation.

*Index Terms*— DCT, finite word length, quantization, Dynamic Partial Reconfiguration.

# 1. INTRODUCTION

Modern mobile devices require image processing solutions that can scale with available power. Wireless networks impose specific constraints on available bandwidth, while users are interested in the quality of the transmitted images. Taken together, we thus have opposing constraints on power, bitrate, and image quality. The design of image compression hardware usually attempts to provide a static hardware solution that can balance these constraints. Unfortunately, the constraints can change in real-time. For example, there is a need for a low-power solution when the battery level of a mobile device is significantly depleted. Network requirements can vary based on location. Furthermore, requirements on image quality can vary based on the changing interests of the user. We propose the development of a dynamically reconfigurable framework that allows the hardware to adapt to these requirements.

To address some of these issues, we focus on the development of a dynamically reconfigurable system for the DCT. Thus, our proposed system is compatible with image and video compression standards that rely on the DCT. In fact, we demonstrate our system as part of baseline JPEG. In terms of practical usage, we introduce new DCT hardware configurations that reflect realistic scenarios for image compression. These scenarios include: (i) minimum power mode, (ii) minimum bitrate mode, (iii) maximum image quality mode, and (iv) typical mode. All modes are subject to constraints on the maximum available power, the minimum acceptable image quality, and the maximum bitrate. Details are given in Section 2.

The proposed system can dynamically reconfigure the hardware from one of these modes to another. Furthermore, changing the constraints yields different hardware solutions.

The use of dynamic partial reconfiguration with 2D DCT architectures has also been considered in [4, 5]. In [5], the authors use dynamic partial reconfiguration to modify the DCT architecture based on an estimate of the number of DCT coefficients that will be zero. In [2], the authors considered the effect of varying the number of bits used to represent the data path. Compared to prior work in this area, this paper introduces a new multi-objective optimization framework for evaluating DCT architectures in terms of image quality, dynamic power, and bitrate, and trains and validates the method on the LIVE Image Quality database [3].

The rest of the paper is organized as follows. The methodology is given in Section 2. Implementation results are given in Section 3. Concluding remarks are given in Section 4.

# 2. METHODOLOGY

A block diagram of the proposed system is shown in Fig. 1. The real-time constraints and the selected mode of operation are input to the dynamic reconfiguration (DR) controller, which selects the corresponding precomputed Pareto-optimal DCT architecture and quality factor. The FPGA hardware is dynamically reconfigured to implement the optimal DCT architecture, while the quality factor is used to control the level of quantization.

**Fig. 1**. Dynamically reconfigurable DCT architecture tested with baseline JPEG.

We perform dynamic reconfiguration using the internal configuration access port (ICAP) provided by Xilinx. We refer to our related work in [6] for a high-speed dynamic partial reconfiguration controller that can also be used in this application.

To introduce the different modes, we first provide the necessary definitions. Let the objective functions be defined as follows: (i) DP is the dynamic power, which is affected by dynamic reconfiguration, (ii) SSIM measures image quality [7], and (iii) BPS is the number of bits per pixel. Each word-length value yields a different DCT architecture, denoted by $\mathcal{A}(WL)$. The objective functions depend on the word-length (WL) and the quality factor (QF). For example, we write BPS($\mathcal{A}(WL), QF$), and similarly for DP and SSIM. For a more compact notation, we omit the arguments of the objective functions. Furthermore, let $P_{max}$, $B_{max}$, and $Q_{min}$ denote the bounds on dynamic power, bitrate, and image quality, respectively. Using this compact notation, we define the four modes as follows:

**1. Minimum power mode (mode=0):**

$$\min_{WL,QF} \text{DP} \text{ subj. to } (\text{SSIM} \ge Q_{min}) \,\&\, (\text{BPS} \le B_{max}). \tag{1}$$

**2. Minimum bitrate mode (mode=1):**

$$\min_{WL,QF} \text{BPS} \text{ subj. to } (\text{SSIM} \ge Q_{min}) \,\&\, (\text{DP} \le P_{max}). \tag{2}$$

**3. Maximum image quality mode (mode=2):**

$$\max_{WL,QF} \text{SSIM} \text{ subj. to } (\text{BPS} \le B_{max}) \,\&\, (\text{DP} \le P_{max}). \tag{3}$$

**4. Typical mode (mode=3):**

$$\max_{WL,QF} \; \alpha \cdot \text{SSIM} - \beta \cdot \text{BPS} - \gamma \cdot \text{DP} \text{ subj. to } (\text{BPS} \le B_{max}) \,\&\, (\text{SSIM} \ge Q_{min}) \,\&\, (\text{DP} \le P_{max}). \tag{4}$$
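Because the candidate set is small (7 word-lengths and 20 quality factors), the four modes (1)-(4) can be solved by exhaustive search over precomputed estimates. The sketch below assumes a dictionary keyed by (WL, QF) that holds estimated (DP, SSIM, BPS) values; the names `select_config` and `est`, and all numbers, are hypothetical illustrations rather than the authors' implementation.

```python
from typing import Dict, Optional, Tuple

Config = Tuple[int, int]               # (WL, QF)
Estimate = Tuple[float, float, float]  # (DP in mW, SSIM, BPS)


def select_config(est: Dict[Config, Estimate], mode: int,
                  p_max: float, q_min: float, b_max: float,
                  weights: Tuple[float, float, float] = (1.0, 1.0, 1.0),
                  ) -> Optional[Config]:
    """Exhaustive search over (WL, QF) for modes 0-3 of (1)-(4).

    Returns None if no configuration satisfies the mode's constraints.
    """
    alpha, beta, gamma = weights
    best: Optional[Config] = None
    best_score = float("-inf")
    for cfg, (dp, ssim, bps) in est.items():
        ok_p, ok_q, ok_b = dp <= p_max, ssim >= q_min, bps <= b_max
        if mode == 0 and ok_q and ok_b:
            score = -dp                      # (1): minimize dynamic power
        elif mode == 1 and ok_q and ok_p:
            score = -bps                     # (2): minimize bits per pixel
        elif mode == 2 and ok_b and ok_p:
            score = ssim                     # (3): maximize SSIM
        elif mode == 3 and ok_p and ok_q and ok_b:
            score = alpha * ssim - beta * bps - gamma * dp  # (4)
        else:
            continue                         # infeasible for this mode
        if score > best_score:               # strict: first maximum wins ties
            best, best_score = cfg, score
    return best


# Toy estimates (made-up numbers, not results from the paper).
est = {
    (3, 50): (85.0, 0.65, 0.60),
    (4, 75): (88.0, 0.85, 0.80),
    (6, 90): (139.0, 0.92, 1.20),
    (7, 20): (156.0, 0.75, 0.25),
    (9, 100): (203.0, 0.95, 2.50),
}
for mode in range(3):
    print(mode, select_config(est, mode, p_max=180.0, q_min=0.7, b_max=1.5))
# prints: 0 (4, 75) / 1 (7, 20) / 2 (6, 90)
```

A `None` result signals that the bounds cannot be met by any configuration, a case a DR controller would have to handle separately, for example by relaxing one of the bounds.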

To implement the **typical** mode, we need appropriate weighting factors $\alpha$, $\beta$, $\gamma$. Since the objective functions in (4) do not share the same range of values, each objective is scaled according to its estimated variation over the training set. To avoid making any assumptions about the underlying distributions, we measure this variation with the interquartile range of each objective function: $\operatorname{range}(Z) = 75^{\mathrm{th}}\operatorname{percentile}(Z) - 25^{\mathrm{th}}\operatorname{percentile}(Z)$, where $Z$ is SSIM, BPS, or DP. We then set the corresponding weighting factor to $1/\operatorname{range}(Z)$. For example, $\alpha = 1/\operatorname{range}(\mathrm{SSIM})$.
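A minimal sketch of this weighting rule, assuming the training-set samples of each objective are available as plain lists. The interquartile range is computed with Python's `statistics.quantiles` using the `inclusive` method (linear interpolation between order statistics); other percentile conventions would give slightly different weights. The function names and sample values are hypothetical.

```python
import statistics
from typing import Sequence, Tuple


def iqr(values: Sequence[float]) -> float:
    """75th minus 25th percentile, linearly interpolated between order statistics."""
    q1, _median, q3 = statistics.quantiles(values, n=4, method="inclusive")
    return q3 - q1


def typical_weights(ssim: Sequence[float], bps: Sequence[float],
                    dp: Sequence[float]) -> Tuple[float, float, float]:
    """alpha, beta, gamma = 1 / range(Z) for Z = SSIM, BPS, DP."""
    return 1.0 / iqr(ssim), 1.0 / iqr(bps), 1.0 / iqr(dp)


def typical_score(ssim: float, bps: float, dp: float,
                  weights: Tuple[float, float, float]) -> float:
    """Objective of the typical mode (4): alpha*SSIM - beta*BPS - gamma*DP."""
    alpha, beta, gamma = weights
    return alpha * ssim - beta * bps - gamma * dp


# Toy training-set samples (made-up values).
w = typical_weights(ssim=[0.70, 0.80, 0.85, 0.90, 0.95],
                    bps=[0.20, 0.50, 0.80, 1.00, 1.40],
                    dp=[85.0, 90.0, 110.0, 140.0, 160.0])
print(w)  # roughly (10.0, 2.0, 0.02)
```

If an objective barely varies over the training set, its interquartile range approaches zero and the weight blows up, so a practical version would need a guard for that case.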

To perform the optimization, we need estimates of the objective functions. For each (WL, QF) configuration pair, we estimate each objective function by its median value over the training set. We test the performance of the proposed approach using leave-one-out on the LIVE database.
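The per-configuration median estimates and the leave-one-out loop can be sketched as follows. The nested-dictionary layout `measured[image][(WL, QF)] = (DP, SSIM, BPS)`, the function names, and the image names are assumptions made for illustration.

```python
import statistics
from typing import Dict, Iterator, List, Tuple

Config = Tuple[int, int]              # (WL, QF)
Metrics = Tuple[float, float, float]  # (DP, SSIM, BPS)


def median_estimates(measured: Dict[str, Dict[Config, Metrics]],
                     training: List[str]) -> Dict[Config, Metrics]:
    """Per-configuration median of each objective over the training images."""
    configs = measured[training[0]].keys()
    est: Dict[Config, Metrics] = {}
    for cfg in configs:
        # Regroup per-image tuples into one column per objective.
        dp, ssim, bps = zip(*(measured[img][cfg] for img in training))
        est[cfg] = (statistics.median(dp), statistics.median(ssim),
                    statistics.median(bps))
    return est


def leave_one_out(images: List[str]) -> Iterator[Tuple[str, List[str]]]:
    """Yield (test_image, training_images) pairs, one per image."""
    for i, test in enumerate(images):
        yield test, images[:i] + images[i + 1:]


# Toy measurements for a single configuration (made-up values).
measured = {
    "img_a": {(4, 75): (88.0, 0.80, 0.90)},
    "img_b": {(4, 75): (88.0, 0.90, 0.70)},
    "img_c": {(4, 75): (88.0, 0.85, 0.80)},
    "img_d": {(4, 75): (88.0, 0.95, 0.60)},
}
for test, train in leave_one_out(sorted(measured)):
    print(test, median_estimates(measured, train))
```

In each fold, the estimates computed from the training images would feed the mode-selection step, and the selected configuration would then be evaluated on the held-out image.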

To satisfy the constraints for a variety of test images, we perform the optimization over the training set using tighter bounds. This approach is intended to ensure that the constraints are satisfied by the majority of the test images. Here, we use $0.9 \cdot B_{max}$ for the bitrate bound and $1.1 \cdot Q_{min}$ for the minimum image quality bound, where $B_{max}$ and $Q_{min}$ denote the target bounds to be met by the test images.
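The tightened training bounds and the check against the original targets on test images can be sketched as below. The sketch assumes that each mode is checked only against the constraints appearing in its own definition (1)-(4), which is an assumption about how satisfaction is counted; the function names and toy data are hypothetical.

```python
from typing import List, Tuple

Result = Tuple[float, float, float]  # measured (DP in mW, SSIM, BPS) on a test image


def tightened_bounds(b_max: float, q_min: float) -> Tuple[float, float]:
    """Training-time bounds: 0.9 * B_max and 1.1 * Q_min (P_max is unchanged)."""
    return 0.9 * b_max, 1.1 * q_min


def satisfies(mode: int, r: Result, p_max: float, q_min: float, b_max: float) -> bool:
    """Check only the constraints that appear in the given mode's definition."""
    dp, ssim, bps = r
    ok_p, ok_q, ok_b = dp <= p_max, ssim >= q_min, bps <= b_max
    return {0: ok_q and ok_b, 1: ok_q and ok_p,
            2: ok_b and ok_p, 3: ok_p and ok_q and ok_b}[mode]


def satisfaction_rate(mode: int, results: List[Result],
                      p_max: float, q_min: float, b_max: float) -> float:
    """Fraction of test images that meet the original (untightened) targets."""
    hits = sum(satisfies(mode, r, p_max, q_min, b_max) for r in results)
    return hits / len(results)


b_train, q_train = tightened_bounds(b_max=1.5, q_min=0.7)
print(b_train, q_train)  # approximately 1.35 and 0.77
```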

Our DCT implementation is based on Chen's algorithm [1]. To describe the implementation, let $C_i = 0.5 \cos(i\pi/16)$, and define $b = C_1, c = C_2, d = C_3, a = C_4, e = C_5, f = C_6$, and $g = C_7$. Then, if $X(0), \ldots, X(7)$ denotes an input column, the column DCT is given by:

$$\begin{bmatrix} Y(0) \\ Y(2) \\ Y(4) \\ Y(6) \end{bmatrix} = \begin{bmatrix} a & a & a & a \\ c & f & -f & -c \\ a & -a & -a & a \\ f & -c & c & -f \end{bmatrix} \begin{bmatrix} X(0) + X(7) \\ X(1) + X(6) \\ X(2) + X(5) \\ X(3) + X(4) \end{bmatrix}, \tag{5}$$

$$\begin{bmatrix} Y(1) \\ Y(3) \\ Y(5) \\ Y(7) \end{bmatrix} = \begin{bmatrix} b & d & e & g \\ d & -g & -b & -e \\ e & -b & g & d \\ g & -e & d & -b \end{bmatrix} \begin{bmatrix} X(0) - X(7) \\ X(1) - X(6) \\ X(2) - X(5) \\ X(3) - X(4) \end{bmatrix}. \tag{6}$$
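The even/odd structure of (5)-(6) can be checked numerically. The floating-point sketch below applies the two 4×4 matrices to the butterfly sums and differences, compares the result with the direct DCT-II definition (scaled so that $Y(0) = a \sum_n X(n)$, consistent with $a = C_4$), and builds the 2D transform from column DCTs followed by row DCTs. It ignores finite word-length effects, and the function names are hypothetical.

```python
import math
from typing import List, Sequence

# C_i = 0.5 * cos(i * pi / 16) and the coefficient names used in (5)-(6).
C = [0.5 * math.cos(i * math.pi / 16) for i in range(8)]
a, b, c, d, e, f, g = C[4], C[1], C[2], C[3], C[5], C[6], C[7]

EVEN = [[a, a, a, a], [c, f, -f, -c], [a, -a, -a, a], [f, -c, c, -f]]  # rows of (5)
ODD = [[b, d, e, g], [d, -g, -b, -e], [e, -b, g, d], [g, -e, d, -b]]   # rows of (6)


def dct8_chen(x: Sequence[float]) -> List[float]:
    """8-point column DCT via the even/odd decomposition of (5)-(6)."""
    s = [x[i] + x[7 - i] for i in range(4)]  # X(i) + X(7-i)
    t = [x[i] - x[7 - i] for i in range(4)]  # X(i) - X(7-i)
    y = [0.0] * 8
    for r in range(4):
        y[2 * r] = sum(EVEN[r][k] * s[k] for k in range(4))     # Y(0), Y(2), Y(4), Y(6)
        y[2 * r + 1] = sum(ODD[r][k] * t[k] for k in range(4))  # Y(1), Y(3), Y(5), Y(7)
    return y


def dct8_direct(x: Sequence[float]) -> List[float]:
    """Reference: Y(k) = (c_k / 2) * sum_n X(n) cos((2n + 1) k pi / 16)."""
    out = []
    for k in range(8):
        ck = 1 / math.sqrt(2) if k == 0 else 1.0
        out.append(ck / 2 * sum(x[n] * math.cos((2 * n + 1) * k * math.pi / 16)
                                for n in range(8)))
    return out


def dct2d(block: Sequence[Sequence[float]]) -> List[List[float]]:
    """Separable 8x8 2D DCT: column DCTs, transpose, then row DCTs."""
    cols = [dct8_chen([block[r][j] for r in range(8)]) for j in range(8)]
    return [dct8_chen([cols[j][k] for j in range(8)]) for k in range(8)]


x = [52, 55, 61, 66, 70, 61, 64, 73]
assert all(abs(p - q) < 1e-9 for p, q in zip(dct8_chen(x), dct8_direct(x)))
print(round(dct2d([[128.0] * 8 for _ in range(8)])[0][0], 6))  # 1024.0
```

The even/odd split uses 16 multiplications per 8-point column (two 4×4 products) instead of the 64 of the direct form, which is what keeps the coefficient multipliers, and hence the word-length dependent logic, small.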

It takes 4 cycles to compute the 1D column DCT and store it in the transpose memory. Similarly, it takes 4 cycles for the 1D row DCTs that follow. It takes a total of 22 cycles to compute the first $8 \times 8$ DCT. Once the pipeline is filled, each subsequent $8 \times 8$ 2D DCT takes 16 cycles. We generate different DCT architectures by varying the word-length (*WL*) of $\pm a, \pm b, \pm c, \pm d, \pm e, \pm f, \pm g$ using two's complement representations. Based on the range of values of the DCT coefficients, we use $WL = 3, 4, \ldots, 9$. The input images are 8-bit. The outputs of the column DCTs are truncated to 11 bits, while the final results are truncated to 14 bits.
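The word-length parameter can be illustrated by rounding the coefficients $\pm a$ through $\pm g$ to WL-bit two's complement integers. The exact fixed-point format is not specified here, so this sketch assumes that all WL bits carry the value $k/2^{WL}$ (range $[-0.5,\ 0.5 - 2^{-WL}]$), with round-to-nearest and saturation. It does not model the 11-bit and 14-bit truncation of the intermediate and final results, and the names are hypothetical.

```python
import math
from typing import Dict

# Coefficients of (5)-(6): C_i = 0.5 * cos(i * pi / 16) for b, c, d, a, e, f, g.
COEFFS: Dict[str, float] = {
    name: 0.5 * math.cos(i * math.pi / 16)
    for name, i in zip("bcdaefg", range(1, 8))
}


def quantize(value: float, wl: int) -> int:
    """Nearest WL-bit two's complement integer k (read as k / 2**wl), saturated."""
    lo, hi = -(1 << (wl - 1)), (1 << (wl - 1)) - 1
    return max(lo, min(hi, round(value * (1 << wl))))


def max_coeff_error(wl: int) -> float:
    """Largest |quantized - exact| over the signed coefficients +/-a ... +/-g."""
    return max(abs(quantize(s * v, wl) / (1 << wl) - s * v)
               for v in COEFFS.values() for s in (1.0, -1.0))


for wl in range(3, 10):
    print(wl, quantize(COEFFS["a"], wl), round(max_coeff_error(wl), 4))
```

Under this assumed format, the largest coefficient $b = C_1 \approx 0.490$ saturates for $WL \le 5$, so the shortest word-lengths lose accuracy from both coarse rounding and clipping.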

# 3. RESULTS

[Figure: multi-objective optimization spaces (a)-(d) and reconstructed test images (e)-(h). Panel titles: (a) Min power mode: QF=75, WL=4; (b) Min bitrate mode: QF=20, WL=7; (d) Typical mode: QF=80, WL=5; (e) Minimum dynamic power mode; (f) Minimum bitrate mode; (g) Maximum image quality mode; (h) Typical mode.]

**Fig. 2**. Multi-objective optimization spaces and reconstruction results for the different modes. Here, the "Woman hat" image is used as the test image, and the remaining 28 LIVE database reference images are used as the training set. The multi-objective optimization plots for the different modes are shown in (a)-(d), and the resulting test images are shown in (e)-(h). The optimal mode results were: (1) minimum dynamic power mode: DP = 88.5 mW, BPS = 0.80, SSIM = 0.872; (2) minimum bitrate mode: BPS = 0.22, SSIM = 0.767, DP = 156.3 mW; (3) maximum image quality mode: SSIM = 0.916, BPS = 0.88, DP = 138.6 mW; (4) typical mode: DP = 109.2 mW, BPS = 0.88, SSIM = 0.907.

We present results for the DCT architectures in Table 1. From the table, it is clear that all architectures can be operated at 200 MHz. Furthermore, the dynamic power varies from 85 to 203 mW. As mentioned earlier, we use the Xilinx ICAP to load the DCT architecture bitstreams using dynamic partial reconfiguration. The partial reconfiguration area was defined to occupy SLICE\_X54Y90:SLICE\_X95Y148. Each partial reconfiguration bitstream required 689 KB. At an ideal reconfiguration speed of 400 MB/s, the reconfiguration overhead is approximately 1.7 ms.
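To put the 1.7 ms reconfiguration overhead in perspective, the cycle counts given in Section 2 (22 cycles for the first $8 \times 8$ block, 16 for each subsequent one) can be turned into a per-image DCT time at 200 MHz. This sketch assumes that blocks are streamed back-to-back at the steady-state rate, ignores quantization and entropy coding, and uses a hypothetical 768×512 grayscale image; the function names are illustrative.

```python
import math


def dct_cycles(num_blocks: int, first: int = 22, steady: int = 16) -> int:
    """Clock cycles for num_blocks back-to-back 8x8 2D DCTs."""
    if num_blocks <= 0:
        return 0
    return first + steady * (num_blocks - 1)


def dct_time_s(width: int, height: int, f_clk_hz: float = 200e6) -> float:
    """DCT time for one grayscale image, padding partial blocks up to 8x8."""
    blocks = math.ceil(width / 8) * math.ceil(height / 8)
    return dct_cycles(blocks) / f_clk_hz


t = dct_time_s(768, 512)  # hypothetical 768 x 512 image: 6144 blocks
print(f"{t * 1e6:.2f} us per image; 1.7 ms overhead ~ {1.7e-3 / t:.1f} images")
```

Under these assumptions, one such image needs 98,310 cycles (about 0.49 ms), so a single reconfiguration costs roughly as much time as the DCTs of 3.5 images. This suggests reconfiguring only when the mode or the constraints actually change.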

To generate the optimal solutions for the different modes, we consider the performance achieved by the 7 DCT architectures $\mathcal{A}(WL = 3, ..., 9)$ for 20 values of the quality factor, QF = 5, 10, ..., 100. For each of the resulting 140 configurations, we estimate the bitrate, SSIM, and dynamic power by taking the median value over the training set. An example is shown in Fig. 2. In this example, we implement the four modes of (1)-(4) using a bitrate constraint of $B_{max} = 1.5$ bits per sample, a minimum SSIM image quality of $Q_{min} = 0.7$, and a maximum dynamic power constraint of $P_{max} = 180$ mW. Note that after the optimization, a single DCT architecture is selected. As expected, the minimum power mode requires the least power, the maximum image quality mode provides the highest image quality, and the minimum bitrate mode requires the fewest bits per sample. The typical mode provides a relatively high image quality at a reasonable bitrate and dynamic power.

For the same constraints as in Fig. 2, we present results over the entire LIVE image database in Fig. 3. Here, we estimate the optimal DCT architecture and quality factor for each mode and then apply them to the test images using leave-one-out. Over the entire LIVE database, the constraints were satisfied for $22/29 \approx 75.9\%$ of the test images in the minimum power mode, $28/29 \approx 96.6\%$ in the minimum bitrate mode, and $21/29 \approx 72.4\%$ in both the maximum image quality and typical modes. The optimal parameters are also shown in Fig. 3. Note that the selected quality factors are high (low quantization) for all modes except the minimum bitrate mode. It is also interesting to note that the optimal DCT architectures are WL = 4 for the minimum power mode, WL = 5, 6 for the minimum bitrate mode, WL = 6 for the maximum image quality mode, and WL = 5 for the typical mode. Thus, among the selected architectures, the maximum image quality mode uses the longest word-length, while the minimum power mode uses the shortest.

| WL | Slices | Dynamic power (mW) | Frequency (MHz) |
|----|--------|--------------------|-----------------|
| 3  | 807    | 85.2               | 274.88          |
| 4  | 894    | 88.48              | 272.93          |
| 5  | 1082   | 109.16             | 206.74          |
| 6  | 1212   | 138.64             | 203.09          |
| 7  | 1332   | 156.28             | 202.47          |
| 8  | 1532   | 194.97             | 201.13          |
| 9  | 1657   | 203.19             | 200.08          |

**Table 1.** DCT architecture results on the XC5VLX110T (Virtex-5) device, which has 17,280 slices.

# 4. CONCLUSION

In this paper, we presented a dynamically reconfigurable DCT architecture system that can be used to optimize performance objectives subject to real-time constraints on power, image quality, and bitrate. The proposed system is validated on the LIVE image database for maximum image quality, minimum power, minimum bitrate, and typical modes.

**Fig. 3**. Dynamically reconfigurable DCT architecture results over the LIVE database. We show boxplots for the minimum power mode (A), the minimum bitrate mode (B), the maximum image quality mode (C), and the typical mode (D). The boxplots were generated on the test images using leave-one-out. The optimal DCT configurations are indexed by their word-lengths WL. The corresponding quality factor results are also shown.

# 5. REFERENCES

- [1] Wen-Hsiung Chen, C. Smith, and S. Fralick. A fast computational algorithm for the discrete cosine transform. *IEEE Transactions on Communications*, 25(9):1004–1009, Sep. 1977.
- [2] Jiun-In Guo, Rei-Chin Ju, and Jia-Wei Chen. An efficient 2-D DCT/IDCT core design using cyclic convolution and adder-based realization. *IEEE Transactions on Circuits and Systems for Video Technology*, 14(4):416–428, Apr. 2004.
- [3] H.R. Sheikh, M.F. Sabir, and A.C. Bovik. A statistical evaluation of recent full reference image quality assessment algorithms. *IEEE Transactions on Image Processing*, 15:3440–3451, Nov. 2006.
- [4] Jian Huang and Jooheung Lee. A self-reconfigurable platform for scalable DCT computation using compressed partial bitstreams and BlockRAM prefetching. *IEEE Transactions on Circuits and Systems for Video Technology*, 19(11):1623–1632, 2009.
- [5] Jian Huang and Jooheung Lee. Reconfigurable architecture for ZQDCT using computational complexity prediction and bitstream relocation. *IEEE Embedded Systems Letters*, 3(1):1–4, Mar. 2011.
- [6] John C. Hoffman and Marios S. Pattichis. A high-speed dynamic partial reconfiguration controller using direct memory access through a multiport memory controller and overclocking with active feedback. *International Journal of Reconfigurable Computing*, 2011, 2011.
- [7] Zhou Wang, A.C. Bovik, H.R. Sheikh, and E.P. Simoncelli. Image quality assessment: from error visibility to structural similarity. *IEEE Transactions on Image Processing*, 13(4):600–612, 2004.