
Application of TMS320C6201 in MPEG-4 video decoder

Source:davidli88
Category:Communication And Network
2023-05-30 03:39:15

Abstract: The TMS320C6201 is a high-performance digital signal processor from Texas Instruments (TI). This paper describes how to decode MPEG-4 SVP video with a single TMS320C6201 DSP. It discusses the structure, algorithms, memory allocation and program optimization of the decoder, and it gives the decoder's overall characteristics.

Keywords: TMS320C6201; video compression; MPEG-4; VOP; MB; IDCT; motion compensation

Introduction

With the development of network and multimedia technology, the importance of and demand for visual communication have increased dramatically. Examples include desktop video conferencing, video on mobile devices, and Internet-based audio and video communication. This growth has been accompanied by rapid progress in video compression technology and a steady stream of new video compression standards. The Moving Picture Experts Group (MPEG) has produced MPEG-1, MPEG-2 and MPEG-4. MPEG-4, which MPEG finalized in November 1998, is a compression standard for multimedia applications, and its scope is much broader than that of MPEG-1 and MPEG-2. It covers applications from mobile video phones to video editing and supports both natural and computer-generated images. Most importantly, it supports interactivity, because, unlike other standards, MPEG-4 describes images in terms of objects. Research and development on MPEG-4 applications is currently gaining momentum in China. After studying the MPEG-4 video standard, the author has made full use of the TMS320C6201's hardware resources, together with software optimization, to implement an embedded real-time MPEG-4 video decoder.

1 MPEG-4 video stream and its main algorithms

MPEG-4 uses object-based compression coding. Before encoding, the video sequence is analyzed and each video object is segmented from the original image. Each object is then encoded separately. Every object has its own shape, motion and texture information, and encoding a video object means encoding these three kinds of information. MPEG-4 removes the temporal redundancy between consecutive frames by motion prediction and motion compensation. These can operate at integer-pixel, half-pixel or quarter-pixel accuracy, and overlapped block motion compensation is also available. Shape-related algorithms include context-based arithmetic coding, horizontal and vertical padding, extended padding, and so on. Texture-coding algorithms include the discrete cosine transform (DCT), quantization, DC/AC prediction of DCT coefficients, zig-zag scanning, run-length coding, Huffman variable-length coding, and so on.

The author implements decoding of the MPEG-4 SVP (Simple Visual Profile). In this profile all video frames are rectangular, so there is no arbitrary-shape coding. The bitstream is organized hierarchically into the video object layer (VOL), the video object plane (VOP) and the macroblock (MB). One VOL contains one or more VOPs, and one VOP contains one or more MBs. The MB is the basic unit of the stream and is classified as an intra MB, inter MB, inter4V MB, etc. In an I-VOP all macroblocks are intra MBs. Macroblocks in a P-VOP can be intra, inter or inter4V MBs. The stream of an inter or inter4V MB in a P-VOP is structured as follows:

MB Stream=MB Shape+MB Header Information+MV+DCT Texture Information (Y1+Y2+Y3+Y4+U+V)

Because the frames are rectangular, the MB shape part carries no shape code.

The MB header information has four main parameters:
not_coded: whether this MB is coded.
mcbpc: the macroblock type, and whether the chrominance blocks U and V are coded.
cbpy: whether the luminance blocks Y1-Y4 are coded.
dquant: the increment of the quantizer step for the DCT coefficients in this MB.

MV is the motion vector. Because motion vectors are differentially coded, what is actually written to the stream is the motion vector difference (ΔMV). An inter MB has a single motion vector, so only one ΔMV is transmitted in the stream. An inter4V MB has four motion vectors, one per 8 × 8 luminance block, so four ΔMVs (mvd1, mvd2, mvd3 and mvd4) are transmitted.

The DCT texture information is a series of DCT coefficients in the order Y1, Y2, Y3, Y4, U, V. The coefficients have been quantized, zig-zag scanned, run-length coded and Huffman variable-length coded.

2 Introduction to the TMS320C6201 DSP and EVM board

2.1 TMS320C6201 DSP

The TMS320C6201, released in March 1997, is the first member of TI's new-generation TMS320C6000 DSP family. It is a 32-bit fixed-point DSP with eight independent functional units. At a 200 MHz CPU clock it runs at up to 1600 MIPS (8 instructions per cycle × 200 MHz). Its main features include:

* The core uses TI's advanced VelociTI™ very long instruction word (VLIW) architecture, which offers a high degree of parallelism and fast execution. It can execute up to eight 32-bit instructions per clock cycle, and every instruction can be executed conditionally;

* A rich instruction set that can operate on bytes and supports 16-bit multiplication;

* Four independent high-speed DMA channels that support various forms of data transfer;

* 64 KB of on-chip data memory and 64 KB of on-chip program memory, with support for 8-bit, 16-bit and 32-bit data widths, and a 32-bit external memory interface that connects directly to off-chip SDRAM, SBSRAM and SRAM.

These features allow the TMS320C6201 to meet the real-time requirements of video processing. For example, computing one 8 × 8 IDCT (inverse discrete cosine transform) on the TMS320C6201 takes only 168 + 62 = 230 clock cycles, or 1.15 μs at 200 MHz.

2.2 EVM board

The TMS320C6201 EVM board is a PCI card. Besides plugging into a PCI slot on a PC motherboard, it can also work as a standalone module debugged through an XDS510 emulator. The board carries one TMS320C6201 DSP running at 160 MHz.

The off-chip memory on the EVM board consists of one bank of 64K × 32-bit (256 KB), 133 MHz SBSRAM, mapped to CE0. There are also two banks of 4 MB, 100 MHz SDRAM, mapped to CE2 and CE3 respectively. Storage can be extended further through the board's external memory interface (EMIF), using the space mapped to CE1.

3 Implementing MPEG-4 SVP decoding with the TMS320C6201

3.1 MPEG-4 Video Decoding Principle

Figure 1 shows the decoding process of an MPEG-4 VOP, by which the decoder recovers video objects from the coded bitstream. The decoder consists of three main parts: a shape decoder, a motion decoder and a texture decoder.

3.2 Program Flow

The program has a modular design and is written mainly in optimized C. For reasons of space, only the main program flow (Figure 2) and the MB decoding process (Figure 3) are given.

After initialization, the main program extracts the VOL and VOP headers from the stream. It then decodes the VOP macroblock by macroblock according to the header information. MB decoding is a separate function. Its first step is to decode the MB header, which determines the macroblock type: intra MB, inter MB or inter4V MB.

For an intra MB, texture decoding is performed block by block into block[6][64], and the resulting texture values are stored in the decoded macroblock line cache.

The two inter MB types share the same steps:
1. The motion vector(s) are decoded.
2. Motion compensation based on the MV(s) produces the prediction, which is stored in the macroblock line cache.
3. Texture decoding is performed block by block, with the residual stored in block[6][64].
4. The result is obtained by adding block[6][64] to the prediction in the macroblock line cache.

The difference between the two is that an inter MB decodes one MV, while an inter4V MB decodes four. Motion compensation is therefore performed per macroblock in the first case and per 8 × 8 block in the second.

Finally, an MB in a P-VOP may be uncoded (not_coded = 1), in which case the stream carries no data for it. It is treated as MV = 0 with all DCT coefficients equal to 0. In other words, the co-located macroblock in the previous frame is copied as the result for the current macroblock.

3.3 Memory Allocation

The MPEG-4 SVP decoder is programmed on the EVM board. The TMS320C6201 has only 64 KB of on-chip data memory, while image processing involves large amounts of data. A key problem in the decoder design is therefore to partition memory sensibly and efficiently. Part of the internal 64 KB is reserved for data used frequently during decoding, as listed in Table 1.

Table 1 Internal data storage space allocation

Global variable                               Space occupied (bytes)
Variable-length decoding (VLD) tables         4906
Zig-zag scan                                  192
VOL, VOP and MB header information            108
DC/AC prediction and MB mode                  5560
MV prediction                                 9504
Quantization step                             396
Decoded output cache (1 macroblock line)      8448
Input compressed bitstream cache              10K
Total                                         38.3K

Both the input compressed stream and the decoded video are stored off-chip. Before the program starts, the PC transfers the compressed stream to the external memory of the EVM board. During decoding, DMA copies it on-chip in batches into a compressed-bitstream cache. The decoded video sequence is stored in external memory, with one macroblock line cached on-chip. After each macroblock line is decoded, DMA transfers it to external memory. For CIF, one macroblock line is 22 macroblocks × 384 bytes (256 luma + 128 chroma) = 8448 bytes, which matches the output-cache size in Table 1.

3.4 Program Optimization

(1) Software development process and tools

The whole program was written and debugged following the C6000 software development flow, which has three stages: developing the C code, optimizing the C code, and writing linear assembly. The development tool is TI's integrated development environment CCS (Code Composer Studio), in which software can be edited, compiled, debugged and profiled.

(2) Program optimization measures

To optimize the program, the following measures are taken:

(1) The C code is written using the optimization techniques supported by the C6000 tools, which helps the C compiler generate efficient assembly code.

(2) Library functions supplied by TI are used, which greatly improves programming efficiency.

(3) DMA is used for data transfers, which frees the CPU and improves its efficiency.

The following data are transmitted by DMA in the decoder:

* Bitstream input: the stream is transferred from off-chip to on-chip memory;

* Decoded output: after each macroblock line is decoded, the result is transferred from on-chip to off-chip memory;

* Padding at the top and bottom of the frame;

* Motion compensation: reference blocks located off-chip are transferred on-chip.

(4) Linear assembly is used to further optimize selected program segments.

To improve performance, the code segments that most affect the application's performance can be rewritten in linear assembly.

3.5 Features and test results of the MPEG-4 SVP decoder

The MPEG-4 SVP video decoder implemented as described above fully conforms to the MPEG-4 SVP specification, and its features are listed in Table 2. The input image resolution can be QCIF or CIF, and the input bit rate is 64, 128 or 384 kbit/s. The output image format is 4:2:0 YUV, and the decoding rate is 30 frames/s.

Table 2 MPEG-4 SVP video decoder characteristics table

Compression standard          MPEG-4 SVP
Input image resolution        QCIF (176 × 144), CIF (352 × 288)
Pixel depth                   8 bit/pixel
Scan format                   Progressive
Input bit rate (kbit/s)       64, 128, 384
Decoded frame rate            30 frames/s
Output image format           4:2:0 YUV

The decoding software was debugged on the EVM board, and decoding time was measured in the CCS environment. Decoding time varies from image to image. The author tested streams covering many situations, and the decoder can decode 25-30 frames per second or more, achieving real-time decoding.

Concluding remarks

Based on a study of the MPEG-4 video coding and decoding algorithms, the author has implemented real-time MPEG-4 SVP decoding on the TMS320C6201 EVM board. This lays the foundation for the final design of a standalone MPEG-4 decoder. The decoder can be embedded in devices such as PDAs, set-top boxes and home gateways to decode MPEG-4 streams. Paired with a corresponding encoder, it can also be used for remote monitoring.



Source:Xiang Xueqin