AXI Stream Compressor

Inspired by a project I did for a job interview. I've deliberately obscured the code to protect the company's privacy.

Purpose

The purpose of this module is to take in streaming data, compress it, and feed the results to a streaming output. Each streaming data element consists of a variable-width (floating) field, a delimiter, and a set of fixed-width fields. For example, a baseball statistic stream element might have a name, batting average, at-bats, hits, walks, doubles, triples, and home runs. The name is floating width because players have names of different lengths. The rest of the statistics are fixed width.
Name            Avg      AB       H        BB       2B      3B      HR      K        SB
Char[16]        16-bits  16-bits  16-bits  16-bits  8-bits  8-bits  8-bits  16-bits  8-bits
Don Mattingly   0.332    512      170      41       30      7       7       75       12
Ken Griffey     0.300    512      154      12       23      4       8       75       12
Chipper Jones   0.281    512      144      77       21      2       18      75       13
Dave Winfield   0.304    500      154      64       19      4       0       75       13

An entry might be represented as:
Fields: Name, Delim, AVG, AB, H, BB, 3B
Bytes 0-7:    'D'   'o'   'n'   ' '   'M'   'a'   't'   't'
Bytes 8-15:   'i'   'n'   'g'   'l'   'y'   0x2C  0x01  0x4C
Bytes 16-23:  0x00  0xAA  0x00  0x2C  0x00  0x1E  0x00  0x28
Bytes 24-31:  0x1F  0x00  0x4B  0x0C
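The byte layout above can be reproduced with a few lines of Python. This is only a sketch: the function name `pack_entry` is made up for illustration, the field widths are read off the byte grouping in the example, and big-endian ordering is assumed because the average appears as 0x01 followed by 0x4C (0x014C is 332, which looks like 0.332 scaled by 1000).

```python
DELIMITER = 0x2C  # ASCII ',' as used in the example layout


def pack_entry(name, fixed_fields):
    """Pack one entry as name bytes, delimiter, then big-endian fixed fields.

    fixed_fields is a list of (value, width_in_bytes) pairs.
    pack_entry is a hypothetical helper, not part of the original design.
    """
    out = bytearray(name.encode("ascii"))
    out.append(DELIMITER)
    for value, width in fixed_fields:
        out += value.to_bytes(width, "big")
    return bytes(out)


# Values copied from the example layout; widths are an assumption
# based on how the bytes are grouped there.
mattingly = pack_entry(
    "Don Mattingly",
    [(0x014C, 2), (0x00AA, 2), (0x002C, 2), (0x001E, 2), (0x0028, 2),
     (0x1F, 1), (0x00, 1), (0x4B, 1), (0x0C, 1)],
)

# Show the entry the way it would appear on an 8-byte-wide bus.
for offset in range(0, len(mattingly), 8):
    print(offset, mattingly[offset:offset + 8].hex(" "))
```

The name occupies bytes 0 through 12 and the delimiter lands on byte 13, so the fixed fields start mid-cycle rather than on a bus boundary.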

Data was passed in on an 8-byte bus, packed and unaligned. This means that the first entry might start on byte 0, but if it were, say, 22 bytes long, the second entry would start on byte 22 (byte 6 of the third bus cycle). So some level of multiplexing was necessary to get the bits into the compressor.
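To make the alignment arithmetic concrete, here is a small sketch that maps an absolute byte index to a bus cycle and a byte lane. It assumes the bus carries 8 bytes per cycle (which is what "byte 6" of a later cycle implies), counts cycles and lanes from 0, and uses hypothetical names `locate` and `entry_starts` and made-up entry lengths.

```python
def locate(byte_index, bus_bytes=8):
    """Return (cycle, lane) for an absolute byte index, both counted from 0."""
    return divmod(byte_index, bus_bytes)


def entry_starts(entry_lengths, bus_bytes=8):
    """Yield the (cycle, lane) at which each back-to-back entry begins."""
    position = 0
    for length in entry_lengths:
        yield locate(position, bus_bytes)
        position += length


# Three packed entries of 22, 28 and 19 bytes (lengths chosen for illustration).
starts = list(entry_starts([22, 28, 19]))
for index, (cycle, lane) in enumerate(starts):
    print(f"entry {index} starts in cycle {cycle}, lane {lane}")
```

Because an entry can begin in any lane, the receiver needs a mux (or barrel shift) on every byte lane, which is the multiplexing cost mentioned above.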

Discussion

There are some obvious points where we could optimize to reduce the size of the message:
The challenges here are:

Approach

My original approach was to implement two 64-deep FIFOs, one for input and one for output, and attempt to process the data "in flight". This approach led to several issues:
Throughout the exercise I assumed I had to process the data at the input clock rate (i.e., I couldn't have a clock running at a higher rate to "subprocess" the data).
For my second attempt I decided to make a design with a multi-element receiver. Each stream element holds one entry from above. A token is passed from each stream element to the next as the element's data arrives.
For Design Verification I rely on Verilator, which in my experience has faster compile and simulation performance than xsim/xelab.

For example, upon reset, stream element 0 holds the token. Suppose there are three stream entries coming in:
The rough sequence of events is (starting with stream element 0 holding the token):
Once data is compressed, it is routed to a return FIFO which outputs the data.
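The token-passing idea can be modelled in software. The sketch below is a behavioural model only, not the RTL: the class names, the choice of four elements, the 14-byte fixed field, and the sample names are all assumptions made for illustration. It also hands each entry off the moment it completes, whereas the hardware would hold it until the compressor has drained the element.

```python
class StreamElement:
    """Collects the bytes of one entry while it holds the token."""

    def __init__(self, delimiter, fixed_bytes):
        self.delimiter = delimiter
        self.fixed_bytes = fixed_bytes
        self.reset()

    def reset(self):
        self.data = bytearray()
        self.remaining = None  # fixed-field bytes still expected; None until the delimiter is seen

    def push(self, byte):
        """Accept one byte; return True once the entry is complete."""
        self.data.append(byte)
        if self.remaining is None:
            if byte == self.delimiter:
                self.remaining = self.fixed_bytes
        else:
            self.remaining -= 1
        return self.remaining == 0


class TokenRing:
    """Round-robin receiver: only the element holding the token accepts bytes."""

    def __init__(self, n_elements=4, delimiter=0x2C, fixed_bytes=14):
        self.elements = [StreamElement(delimiter, fixed_bytes) for _ in range(n_elements)]
        self.token = 0        # element 0 holds the token after reset
        self.completed = []   # (element index, entry bytes) in arrival order

    def push_beat(self, beat):
        """Feed one bus cycle's worth of bytes."""
        for byte in beat:
            element = self.elements[self.token]
            if element.push(byte):
                self.completed.append((self.token, bytes(element.data)))
                element.reset()
                # Pass the token to the next element in the ring.
                self.token = (self.token + 1) % len(self.elements)


# Three entries packed back to back, then cut into 8-byte beats.
entries = [name + b"," + bytes(14) for name in (b"Al", b"Barbara", b"Cy")]
stream = b"".join(entries)
ring = TokenRing()
for offset in range(0, len(stream), 8):
    ring.push_beat(stream[offset:offset + 8])

for index, entry in ring.completed:
    print(f"element {index} captured {len(entry)} bytes")
```

Note that a single beat can finish one entry and start the next, so the token can move in the middle of a bus cycle; in hardware that means two elements may need to latch bytes from the same cycle.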

Source

Github Source

Compilation Parameters

DATA_BYTE_WIDTH            Maximum byte width taken in by the streaming interface at once
ALGORITHM_SELECTED         Algorithm to be ifdef-compiled in
VARIABLE_FIELD_DELIMITER   ASCII value of the delimiter that marks the end of the floating field
VARIABLE_FIELD_MAX_BYTES   Maximum VARIABLE_FIELD length (not including the delimiter)
FIXED_FIELD_MAX_BYTES      Byte width of the fixed field
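As a rough illustration of how these parameters interact, the sketch below derives the worst-case entry size and the number of bus cycles one entry can span. The values are inferred from the baseball example (a 16-character name, a ',' delimiter, 112 bits of fixed statistics, 8 bytes per cycle); they are not the repository's defaults, and `CompressorParams` is a hypothetical name. ALGORITHM_SELECTED is left out because its allowed values are not listed here.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CompressorParams:
    # Illustrative values inferred from the example table, not real defaults.
    data_byte_width: int = 8               # DATA_BYTE_WIDTH
    variable_field_delimiter: int = 0x2C   # VARIABLE_FIELD_DELIMITER (',')
    variable_field_max_bytes: int = 16     # VARIABLE_FIELD_MAX_BYTES
    fixed_field_max_bytes: int = 14        # FIXED_FIELD_MAX_BYTES (112 bits)

    def max_entry_bytes(self):
        """Longest possible entry: variable field, one delimiter byte, fixed field."""
        return self.variable_field_max_bytes + 1 + self.fixed_field_max_bytes

    def max_cycles_per_entry(self):
        """Worst-case bus cycles one entry can touch, starting in the last lane."""
        span = (self.data_byte_width - 1) + self.max_entry_bytes()
        return -(-span // self.data_byte_width)  # ceiling division


params = CompressorParams()
print(params.max_entry_bytes(), params.max_cycles_per_entry())
```

Numbers like these help size each stream element's storage and decide how many elements the receiver needs.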

Design Verification Plan

Since this is a hobby/demo project, I did not do a full-scale DV effort. Such an effort would involve adding the following points:
This would become part of a Jenkins/Kubernetes environment which would run nightly as part of CI/CD.

Timing and Logic Levels

So far the best timing I've been able to achieve is about 9 ns (roughly 111 MHz if taken as the clock period). This is on a Zynq device, which is built on a slower process.

Next Steps

Generalize the input interface to be AXI rather than a continuous stream. This would involve adding FRAME ends.