Frame Coprocessor

Processing video frames is a common problem for FPGAs and ASICs. Presenting a raw 1080p60 stream on a TV screen requires moving roughly 6 MB of data (1920 × 1080 pixels × 3 bytes per pixel ≈ 6.2 MB) every 60th of a second, or about 370 MB/s.
Common video problems include encoding, decoding, upscaling and downscaling, framerate conversion, colorspace conversion, filtering, and image classification; each is covered below.

Relevant work experience

At RESI I evaluated numerous GPU-based and some FPGA-based solutions for encoding and decoding. At Texas Instruments I worked on numerous chip projects, including Davinci and our cellular products.

Common approaches and tradeoffs

General Purpose Software Processing

Using a general-purpose datapath is the most flexible way to handle image processing, since software is much easier to iterate on than hardware. The main disadvantage is that, especially for large canvases, it is difficult to reach real-time speeds. That may be fine for archival work, but for live feeds real-time performance is essential. Canvases at or below 720p can generally be processed in real time; higher resolutions are more of a challenge.
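As a rough illustration of where the time goes, here is a minimal sketch (plain C, assuming interleaved 8-bit RGB frames; the function name and the brightness operation are just illustrative) of a single per-pixel pass and how many times its inner loop runs at 1080p60:

```c
#include <stdint.h>
#include <stddef.h>

/* Naive per-pixel brightness adjustment on an interleaved 8-bit RGB frame.
 * For 1080p the loop body runs 1920 * 1080 * 3 ≈ 6.2 million times per
 * frame, i.e. ~373 million times per second at 60 fps, before any more
 * expensive work (transforms, filtering, motion search) is even attempted. */
void adjust_brightness(uint8_t *frame, size_t width, size_t height, int delta)
{
    size_t n = width * height * 3;                /* 3 bytes per RGB pixel */
    for (size_t i = 0; i < n; i++) {
        int v = frame[i] + delta;                 /* apply offset */
        frame[i] = (uint8_t)(v < 0 ? 0 : v > 255 ? 255 : v);  /* clamp */
    }
}
```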

GPU Processing with Specific Hardware

GPU processing with dedicated hardware (NVENC/NVDEC or other encode/decode blocks) would be the fastest option for encoding and decoding. The obvious tradeoff is that the hardware cannot be iterated on in a two-week sprint: innovation may take several multi-year design cycles, large development teams, and the substantial expense of fabricating an ASIC.

GPU Processing with GPU Cores

Processing frames with GPU cores would be a useful middle ground between dedicated hardware and general-purpose software: it remains iterable in software, while offering a large number of dedicated cores that sacrifice some of the classic operations of CPUs for a simpler datapath. For simpler problems like downscaling, colorspace conversion, or framerate changes this is probably very workable, by simply assigning each GPU core its own section of the frame.
There are not many GPU-based encoding engines out there. My presumption is that encoding might be practical on a GPU if bitrate were not a concern, since each core could be assigned a separate section of the picture and encode it independently. However, encoding algorithms are usually judged on quality versus bitrate, so the GPU would need expensive synchronization passes after the initial encode to decide which sections of the image are most worth spending bits on to keep the bitrate down.
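To make the one-core-per-section idea concrete, here is a sketch of the tiling arithmetic in plain C. The tile size and the placeholder process_tile operation are assumptions; on a real GPU each tile would be handed to its own core or thread block rather than iterated sequentially as it is here:

```c
#include <stdint.h>

#define TILE_W 64
#define TILE_H 64

/* Process one tile independently of its neighbours; this is the unit of
 * work that would be handed to a single GPU core / thread block. */
static void process_tile(uint8_t *frame, int stride,
                         int x0, int y0, int w, int h)
{
    for (int y = y0; y < y0 + h; y++)
        for (int x = x0; x < x0 + w; x++)
            frame[y * stride + x] ^= 0xFF;   /* placeholder: invert luma */
}

/* Partition a width x height luma plane into independent tiles. */
void process_frame_tiled(uint8_t *frame, int width, int height)
{
    for (int ty = 0; ty < height; ty += TILE_H)
        for (int tx = 0; tx < width; tx += TILE_W) {
            int w = (tx + TILE_W > width)  ? width  - tx : TILE_W;
            int h = (ty + TILE_H > height) ? height - ty : TILE_H;
            process_tile(frame, width, tx, ty, w, h);
        }
}
```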

FPGA Processing

An FPGA implementation of a GPU core would likely be wasted effort (except as an ASIC qualification exercise). An FPGA implementation at any given lithography node (say, 10 nm) is likely to be much slower than an ASIC at that same node, because the ASIC can be custom-routed. However, an FPGA implementation of an encoder circuit does allow you to iterate on encoding quality, and Xilinx and other FPGA IP providers offer encoder IP cores.

Data Formats

Picture data is normally provided in two formats: RGB and YUV. The main advantages of the RGB format are that it maps directly to display hardware and keeps all three color components at full resolution, so per-pixel operations are simple. The main advantages of the YUV format are that it separates luma from chroma, which allows chroma subsampling (4:2:2 or 4:2:0) to cut bandwidth substantially, and that it is the native format of most cameras and codecs. With that in mind, here is how each of the problems above maps onto FPGAs.
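For a sense of why the choice matters to bandwidth, a small sketch (plain C, assuming 8 bits per sample and 4:2:0 chroma subsampling for the YUV case) of the per-frame storage each format needs:

```c
#include <stddef.h>

/* Bytes needed for one frame in interleaved 8-bit RGB. */
size_t rgb_frame_bytes(size_t w, size_t h)
{
    return w * h * 3;                        /* 1080p: ~6.2 MB */
}

/* Bytes needed for one frame in planar YUV 4:2:0 (Y at full resolution,
 * U and V each subsampled 2x horizontally and vertically). */
size_t yuv420_frame_bytes(size_t w, size_t h)
{
    return w * h + 2 * ((w / 2) * (h / 2));  /* 1080p: ~3.1 MB */
}
```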

Encoding

Encoding would involve receiving an entire frame into pre-set sections of memory and running a DCT or other transform over specific sections of the frame to produce a good set of frame data. The tradeoff is that evaluating more candidate encodings requires more hardware. A good FPGA-based solution might become the basis for an ASIC in future development.
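As a sketch of the per-block math involved, here is a straightforward, unoptimized 8x8 2-D DCT-II of the kind used by JPEG/MPEG-style codecs, assuming 8-bit input samples. A real FPGA or ASIC implementation would use a fast fixed-point variant rather than this floating-point form:

```c
#include <math.h>
#include <stdint.h>

#ifndef M_PI
#define M_PI 3.14159265358979323846
#endif

#define N 8

/* Naive 2-D DCT-II of one 8x8 block of samples.  in[] holds 8-bit samples
 * (level-shifted by 128 before the transform), out[] receives the
 * coefficients.  Production hardware would use a separable, fixed-point
 * fast DCT instead of this O(N^4) reference form. */
void dct_8x8(const uint8_t in[N][N], double out[N][N])
{
    for (int u = 0; u < N; u++) {
        for (int v = 0; v < N; v++) {
            double sum = 0.0;
            for (int x = 0; x < N; x++)
                for (int y = 0; y < N; y++)
                    sum += (in[x][y] - 128) *
                           cos((2 * x + 1) * u * M_PI / (2.0 * N)) *
                           cos((2 * y + 1) * v * M_PI / (2.0 * N));
            double cu = (u == 0) ? 1.0 / sqrt(2.0) : 1.0;
            double cv = (v == 0) ? 1.0 / sqrt(2.0) : 1.0;
            out[u][v] = 0.25 * cu * cv * sum;
        }
    }
}
```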

Decoding

Decoding is a largely deterministic process that could certainly be done with an FPGA. However, since the problem is easier, by the time an FPGA image was developed there would likely already be a faster ASIC-based solution available. The main reason to do it anyway is if you are evaluating a new encoding algorithm.

Upscaling/Downscaling

Upscaling and downscaling (as long as the scale factor is an integer) are fairly straightforward and usually involve either replicating pixels (upscaling) or averaging/maxing pixels (downscaling). A full frame buffer is probably not necessary, but you may need a few lines of the image stored in memory.
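A minimal sketch of a 2x box-filter downscale of a single 8-bit plane (plain C, assuming even dimensions); note that each output line only touches two input lines, which is why a small line buffer is enough:

```c
#include <stdint.h>

/* 2x downscale of an 8-bit plane by averaging each 2x2 block of input
 * pixels into one output pixel.  src is width x height, dst is
 * (width/2) x (height/2); width and height are assumed even. */
void downscale_2x(const uint8_t *src, uint8_t *dst, int width, int height)
{
    for (int y = 0; y < height / 2; y++) {
        const uint8_t *r0 = src + (2 * y)     * width;   /* even source line */
        const uint8_t *r1 = src + (2 * y + 1) * width;   /* odd source line  */
        for (int x = 0; x < width / 2; x++) {
            int sum = r0[2 * x] + r0[2 * x + 1] + r1[2 * x] + r1[2 * x + 1];
            dst[y * (width / 2) + x] = (uint8_t)((sum + 2) / 4);  /* rounded avg */
        }
    }
}
```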

Changing Framerate

Changing framerate can either be a simple process with poor results or a more complicated process with better results. For example, to convert a picture from 23.98 fps to 60 fps there are a few options: simply repeat source frames in a pulldown-style cadence (cheap, but introduces judder), blend adjacent frames together (smoother motion at the cost of ghosting), or perform motion-compensated interpolation (the best results, and by far the most hardware).
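The cheapest of these options, frame repetition, reduces to picking the right source frame for each output instant. A minimal sketch, assuming the rates are given as exact ratios (the function name is illustrative):

```c
#include <stdint.h>

/* Simplest framerate conversion: for each output frame, reuse the source
 * frame that covers that instant in time (frame repetition, no blending).
 * Rates are expressed as exact ratios (e.g. 24000/1001 and 60000/1001) to
 * avoid drift.  For 23.976 -> 59.94 this reduces to the familiar 3:2
 * pulldown cadence. */
int64_t source_frame_for(int64_t out_index,
                         int64_t in_num,  int64_t in_den,    /* e.g. 24000, 1001 */
                         int64_t out_num, int64_t out_den)   /* e.g. 60000, 1001 */
{
    /* output time = out_index * out_den / out_num (seconds);
       source index = floor(output time * in_num / in_den)   */
    return (out_index * out_den * in_num) / (out_num * in_den);
}
```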

Changing Colorspace

The biggest challenge in changing color spaces is that you would need to have the entire frame loaded (for YUV-to-RGB conversion) or a place to save it (for RGB-to-YUV). There would also be some integer-times-floating-point math involved. However, an (8-bit integer) × (small floating-point coefficient) multiply and a few adds is probably not a major area or speed challenge.
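As a sketch of the math itself, here is a per-pixel YUV-to-RGB conversion assuming full-range BT.601 coefficients (other standards such as BT.709 just change the constants); an FPGA version would typically fold the constants into fixed point so each multiply is sample-times-constant:

```c
#include <stdint.h>

static uint8_t clamp8(int v)
{
    return (uint8_t)(v < 0 ? 0 : v > 255 ? 255 : v);
}

/* Full-range BT.601 YUV -> RGB for one pixel.  Chroma samples are centered
 * around 128; results are clamped back to the 8-bit range. */
void yuv_to_rgb(uint8_t y, uint8_t u, uint8_t v,
                uint8_t *r, uint8_t *g, uint8_t *b)
{
    int c = y;
    int d = u - 128;
    int e = v - 128;
    *r = clamp8((int)(c + 1.402 * e));
    *g = clamp8((int)(c - 0.344 * d - 0.714 * e));
    *b = clamp8((int)(c + 1.772 * d));
}
```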

Filtering

Simple pixel-wide filters such as color filters could easily be done on the fly. More complicated filters such as edge detection require neighborhood operations. A major challenge in FPGAs is scarce routing resources, so redundant hardware would likely be dedicated to specific sections of the frame in order to reduce routing and increase performance.
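As an example of a neighborhood filter, a minimal Sobel edge-magnitude sketch over an 8-bit luma plane (plain C); since each output pixel only needs a 3x3 window, an FPGA only has to buffer two lines plus a few pixels:

```c
#include <stdint.h>
#include <stdlib.h>

/* 3x3 Sobel edge magnitude on an 8-bit luma plane.  Border pixels are
 * skipped for simplicity; gradient magnitude uses the cheap |gx| + |gy|
 * approximation and is clamped to 8 bits. */
void sobel(const uint8_t *src, uint8_t *dst, int width, int height)
{
    for (int y = 1; y < height - 1; y++) {
        for (int x = 1; x < width - 1; x++) {
            const uint8_t *p = src + y * width + x;
            int gx = -p[-width - 1] + p[-width + 1]
                     - 2 * p[-1]    + 2 * p[1]
                     - p[width - 1] + p[width + 1];
            int gy = -p[-width - 1] - 2 * p[-width] - p[-width + 1]
                     + p[width - 1] + 2 * p[width]  + p[width + 1];
            int mag = abs(gx) + abs(gy);
            dst[y * width + x] = (uint8_t)(mag > 255 ? 255 : mag);
        }
    }
}
```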

Image Classification

Imagine you had a 720p60 video of a highway and you wanted to identify license plates so you could read them and see if any cars on the highway are stolen.
My approach would be:

Key design considerations: