SIMD-VITERBI 3
NAME
create_viterbi27, set_viterbi27_polynomial, init_viterbi27, update_viterbi27_blk, chainback_viterbi27, delete_viterbi27, create_viterbi29, set_viterbi_29_polynomial, init_viterbi29, update_viterbi29_blk, chainback_viterbi29, delete_viterbi29, create_viterbi39, set_viterbi_39_polynomial, init_viterbi39, update_viterbi39_blk, chainback_viterbi39, delete_viterbi39, create_viterbi615, set_viterbi615_polynomial, init_viterbi615, update_viterbi615_blk, chainback_viterbi615, delete_viterbi615 - IA32 SIMD-assisted Viterbi decoders
SYNOPSIS
#include "fec.h"
void *create_viterbi27(int blocklen);
void set_viterbi27_polynomial(int polys[2]);
int init_viterbi27(void *vp,int starting_state);
int update_viterbi27_blk(void *vp,unsigned char syms[],int nbits);
int chainback_viterbi27(void *vp, unsigned char *data,unsigned int nbits,unsigned int endstate);
void delete_viterbi27(void *vp);
void *create_viterbi29(int blocklen);
void set_viterbi29_polynomial(int polys[2]);
int init_viterbi29(void *vp,int starting_state);
int update_viterbi29_blk(void *vp,unsigned char syms[],int nbits);
int chainback_viterbi29(void *vp, unsigned char *data,unsigned int nbits,unsigned int endstate);
void delete_viterbi29(void *vp);
void *create_viterbi39(int blocklen);
void set_viterbi39_polynomial(int polys[3]);
int init_viterbi39(void *vp,int starting_state);
int update_viterbi39_blk(void *vp,unsigned char syms[],int nbits);
int chainback_viterbi39(void *vp, unsigned char *data,unsigned int nbits,unsigned int endstate);
void delete_viterbi39(void *vp);
void *create_viterbi615(int blocklen);
void set_viterbi615_polynomial(int polys[6]);
int init_viterbi615(void *vp,int starting_state);
int update_viterbi615_blk(void *vp,unsigned char syms[],int nbits);
int chainback_viterbi615(void *vp, unsigned char *data,unsigned int nbits,unsigned int endstate);
void delete_viterbi615(void *vp);
DESCRIPTION
These functions implement high performance Viterbi decoders for four convolutional codes: a rate 1/2 constraint length 7 (k=7) code ("viterbi27"), a rate 1/2 k=9 code ("viterbi29"), a rate 1/3 k=9 code ("viterbi39") and a rate 1/6 k=15 code ("viterbi615"). The decoders use the Intel IA32 or PowerPC SIMD instruction sets, if available, to improve decoding speed. On the IA32 there are three different SIMD instruction sets. The first and most common is MMX, introduced on later Intel Pentiums and then on the Intel Pentium II and most Intel clones (AMD K6, Transmeta Crusoe, etc). SSE was introduced on the Pentium III and later implemented in the AMD Athlon 4 (AMD calls it "3D Now! Professional"). Most recently, SSE2 was introduced in the Intel Pentium 4, and has been adopted by more recent AMD CPUs. The presence of SSE2 implies the existence of SSE, which in turn implies MMX. Altivec is the PowerPC SIMD instruction set. It is roughly comparable to SSE2. Altivec was introduced to the general public in the Apple Macintosh G4; it is also present in the G5. Altivec is actually a Motorola trademark; Apple calls it "Velocity Engine" and IBM calls it "VMX". All refer to the same thing. When built for the IA32 or PPC architectures, the functions automatically use the most powerful SIMD instruction set available. If no SIMD instructions are available, or if the library is built for a non-IA32, non-PPC machine, a portable C version is executed instead.
USAGE
Four versions of each function are provided, one for each code. In the following discussion, change "viterbi" to "viterbi27", "viterbi29", "viterbi39" or "viterbi615" as desired. Before Viterbi decoding can begin, an instance must first be created with create_viterbi(). This function creates and returns a pointer to an internal control structure containing the path metrics and the branch decisions. create_viterbi() takes one argument that gives the length of the data block in bits. You must not attempt to decode a block longer than the length given to create_viterbi(). Before decoding a new frame, init_viterbi() must be called to reset the decoder state. It accepts the instance pointer returned by create_viterbi() and the initial starting state of the convolutional encoder (usually 0). If the initial starting state is unknown or incorrect, the decoder will still function but the decoded data may be incorrect at the start of the block. Blocks of received symbols are processed with calls to update_viterbi_blk(). The nbits parameter specifies the number of data bits (not channel symbols) represented by the syms buffer. (For rate 1/2 codes, the number of symbols in syms is twice nbits, and so on.) Each symbol is expected to range from 0 through 255, with 0 corresponding to a "strong 0" and 255 corresponding to a "strong 1". The caller is responsible for determining the proper pairing of input symbols (commonly known as decoder symbol phasing). At the end of the block, the data is recovered with a call to chainback_viterbi(). The arguments are the pointer to the decoder instance, a pointer to a user-supplied buffer into which the decoded data is to be written, the number of data bits (not bytes) that are to be decoded, and the terminal state of the convolutional encoder at the end of the frame (usually 0). If the terminal state is incorrect or unknown, the decoded data bits at the end of the frame may be unreliable. The decoded data is written in big-endian order, i.e., the first bit in the frame is written into the high order bit of the first byte in the buffer. If the frame is not an integral number of bytes long, the low order bits of the last byte in the frame will be unused. Note that the decoders assume the use of a tail, i.e., the encoding and transmission of a sufficient number of padding bits beyond the end of the user data to force the convolutional encoder into the known terminal state given to chainback_viterbi(). The tail is always one bit less than the constraint length of the code, so the k=7 code uses 6 tail bits (12 tail symbols), the k=9 code uses 8 tail bits (16 tail symbols) and the k=15 code uses 14 tail bits (84 tail symbols). The tail bits are not included in the length arguments to create_viterbi() and chainback_viterbi(). For example, if the block contains 1000 user bits, then this would be the length parameter given to create_viterbi27() and chainback_viterbi27(), and update_viterbi27_blk() would be called with a total of 2012 symbols - the last 12 encoded symbols representing the tail bits. After the call to chainback_viterbi(), the decoder may be reset with a call to init_viterbi() and another block can be decoded. Alternatively, delete_viterbi() can be called to free all resources used by the Viterbi decoder. The set_viterbi_polynomial() function allows use of other than the default code generator polynomials. Although only one set of polynomials are generally used with each code, there can are different conventions as to their order and symbol polarity, and these functions simplifies their use. The default polynomials for the viterbi27 routes are those of the NASA-JPL convention without symbol inversion. The NASA-JPL convention normally inverts the first symbol. The CCSDS/NASA-GSFC convention swaps the two symbols and inverts the second. To set the NASA-JPL convention with symbol inversion:
int polys[2] = { -V27POLYA,V27POLYB };
set_viterbi27_polynomial(polys);
and to set the CCSDS convention with symbol inversion:
int polys[2] = { V27POLYB,-V27POLYA };
set_viterbi27_polynomial(polys);
The default polynomials for the viterbi615 routines are those used by the Cassini spacecraft without symbol inversion. Mars Pathfinder (MPF) and STEREO swap the third and fourth polynomials. Both conventions invert the first, third and fifth symbols. Refer to fec.h for the polynomial constant definitions. To set the Cassini convention with symbol inversion, do the following:
int polys[6] = { -V615POLYA,V615POLYB,-V615POLYC,V615POLYD,-V615POLYE,V615POLYF };
set_viterbi615_polynomial(polys);
and to set the MPF/STEREO convention with symbol inversion:
int polys[6] = { -V615POLYA,V615POLYB,-V615POLYD,V615POLYC,-V615POLYE,V615POLYF };
set_viterbi615_polynomial(polys);
For performance reasons, calling this function changes the code generator polynomials for all instances of corresponding Viterbi decoder, including those already created.
ERROR PERFORMANCE
These decoders have all been extensively tested and found to provide performance consistent with that expected for soft-decision Viterbi decoding with 8-bit symbols. Due to internal differences, the implementations vary slightly in error performance. In general, the portable C versions exhibit the best error performance because they use full-sized branch metrics, and the MMX versions exhibit the worst because they use 8-bit branch metrics with modulo comparisons. The SSE, SSE2 and Altivec implementations of the r=1/2 k=7 and r=1/2 k=9 codes use unsigned 8-bit branch metrics, and are almost as good as the C versions. The r=1/3 k=9 and r=1/6 k=15 codes are implemented with 16-bit path metrics in all SIMD versions.
DIRECT ACCESS TO SPECIFIC FUNCTION VERSIONS
Calling the functions listed above automatically calls the appropriate version of the function depending on the CPU type and available SIMD instructions. A particular version can also be called directly by appending the appropriate suffix to the function name. The available suffixes are "_mmx", "_sse", "_sse2", "_av" and "_port", for the MMX, SSE, SSE2, Altivec and portable versions, respectively. For example, the SSE2 version of the update_viterbi27_blk() function can be invoked as update_viterbi27_blk_sse2(). Naturally, the _av functions are only available on the PowerPC and the _mmx, _sse and _sse2 versions are only available on IA-32. Calling a SIMD-enabled function on a CPU that doesn't support the appropriate set of instructions will result in an illegal instruction exception.
RETURN VALUES
create_viterbi returns a pointer to the structure containing the decoder state. The other functions return -1 on error, 0 otherwise.
AUTHOR & COPYRIGHT
Phil Karn, KA9Q (karn@ka9q.net)
LICENSE
This software may be used under the terms of the GNU Limited General Public License (LGPL).