Background
With my background in ASIC design, I decided to start pursuing FPGA design as a serious hobby. I acquired a ZYNQ TUL-2 Board, which is a processor-FPGA hybrid containing a dual-core ARM processor interfacing to an FPGA fabric.
Similarities with ASIC design
The initial steps of FPGA and ASIC design are essentially the same:
- Design and Simulate Verilog. FPGAs tend to come with packaged IPs specific to their use which presumably use the LUTs/DSP slices more effectively. I’ve been using Verilator because I believe it is faster and gives me a better interface to compiled software which might talk to an FPGA.
- Elaborate Design – turn the design into a “generic” technology to make sure your Verilog is synthesizable.
- Synthesize Design – map the design into ASIC Cells or LUTs/FPGA Assets.
- Simulate gate-level netlist, to verify the RTL was properly mapped to gates.
- Implement Design – take your design and implement it (choose the actual locations for cell placement or LUT settings).
- Floorplan Design – give the implementer ‘hints’ on where to place circuits for optimal timing. Give the tool only as much information as it needs to get the job done.
- Re-simulate with Backannotation. I have usually done this as a final QC step. Among the other cases where I’ve needed to do it:
  - I once had a synthesis library which allowed a flop to have both its SET and CLEAR bits set. This caused unknowns in gate-level simulation which we wouldn’t have caught in RTL simulation.
  - Verification of external memory or peripheral access timings against a vendor-provided model.
- Back-annotated simulations come with drawbacks:
  - They are very slow, because you are simulating many more elements.
  - You probably didn’t capture waveforms for every internal signal you needed, forcing a rerun (I’ve had this take weeks).
  - Your SDF file may not match your gate-level generic parameters. This is less likely in an FPGA because the process libraries are far more widely used, so errors are likely caught a lot earlier.
  - If you are dealing with asynchronous clock domains, you will probably need to squelch the setup/hold checks on the clock-boundary-crossing flops or run with a synchronous clock (or run a two-state simulation). Otherwise you will probably violate setup on a flop, and the simulators I have used have not modeled metastability.
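The SET/CLEAR library problem mentioned above can be illustrated with a minimal Python sketch of four-state simulation. Everything here is an assumption made for illustration: the function names are hypothetical, set and clear are taken to be active-high, the RTL is assumed to give clear priority over set, and the library cell is assumed to report an unknown when both are asserted. It is not a model of any particular vendor library.

```python
X = "x"  # unknown value, as in a four-state Verilog simulation

def rtl_flop(d, set_, clr):
    """RTL-style behavior: an if/else chain gives clear priority over set (assumed)."""
    if clr:
        return 0
    if set_:
        return 1
    return d

def gate_flop(d, set_, clr):
    """Hypothetical library cell: set and clear asserted together yields unknown."""
    if set_ and clr:
        return X
    if clr:
        return 0
    if set_:
        return 1
    return d

def and_gate(a, b):
    """Four-state AND: a 0 on either input masks an unknown on the other."""
    if a == 0 or b == 0:
        return 0
    if a == X or b == X:
        return X
    return 1

# Same stimulus, different answers: RTL resolves it, the gate-level model does not.
rtl_q = rtl_flop(d=1, set_=1, clr=1)
gate_q = gate_flop(d=1, set_=1, clr=1)
print(rtl_q, gate_q, and_gate(1, gate_q), and_gate(0, gate_q))  # 0 x x 0
```

The point of the sketch is that the RTL simulation never sees the unknown, so only the gate-level run exposes it, and from there it spreads through any logic that does not mask it.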
In FPGAs, your final implementation comes in the form of a bit/xsvf file which is uploaded to the FPGA, either as an SRAM pattern or a Flash/Fuse pattern. You are bound by the number of FPGA assets (LUTs, DSP multipliers, etc.). In ASICs, you produce a set of reticle layer patterns (GDSII files).
Differences
The differences I’ve noted so far:
- ASIC designs are normally modeled in an FPGA-board implementation (we used Quickturn). This is obviously not necessary for an FPGA-based design; verification is likely handled as a software task which feeds test inputs to the FPGA.
- In ASIC designs we tended to make our testbenches synthesizable so they could be tested in an FPGA implementation. This would likely not be efficient in a single-FPGA design, as that testbench would use a significant chunk of the FPGA’s resources. Plus, the FPGA is likely part of a board implementation with its own system integration plan.
- ASIC designs usually run faster for the same process. This is because they are custom-laid out for a specific application. The price you pay for the reconfigurability of the FPGA is that it simply won’t go as fast within the same semiconductor process.
- ASIC designs require a manufacturing validation step (often a few). You need to develop test vectors for a tester, which applies those patterns to each manufactured chip, both at multiprobe (pre-packaging) and final test (post-packaging).
- ASIC designs normally include on-board test structures, including:
  - Oscillators for process speed tests.
  - Parametric “scribe” structures (between chips) for test.
  - ATPG scan insertion. In ASIC designs we tend to avoid asynchronous resets because of ATPG scan insertion: a toggling bit in a scan chain might set or clear a flop.
- Verilog synthesis of RTL does not always produce an optimal circuit. In those cases (such as datapaths or memories), Hard Macros or custom bar designs are normally required to achieve the necessary speeds. FPGAs supply hard IP (DSP slices/memories) for those specific purposes, but obviously bar design is not possible.
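The concern about asynchronous resets and scan chains noted in the list above can be sketched with a toy Python model. The function name, the chain length, and the wiring (the output of one flop driving the asynchronous clear of another) are all hypothetical; this is a simplification to show the failure mode, not a description of any real DFT flow.

```python
def scan_shift(pattern, async_clear_from=None, async_clear_to=None):
    """Shift `pattern` into a scan chain one bit per clock and return the chain.

    If both optional arguments are given, flop[async_clear_to] has an
    asynchronous clear driven by the Q output of flop[async_clear_from]
    (hypothetical functional wiring that stays live during scan).
    """
    chain = [0] * len(pattern)
    for bit in reversed(pattern):      # last pattern bit enters the chain first
        chain = [bit] + chain[:-1]     # one scan clock: everything moves one place
        if async_clear_from is not None and chain[async_clear_from] == 1:
            chain[async_clear_to] = 0  # the shifting data itself fires the clear
    return chain

pattern = [1, 0, 1, 1]
clean = scan_shift(pattern)
corrupted = scan_shift(pattern, async_clear_from=0, async_clear_to=2)
print(clean)      # [1, 0, 1, 1]
print(corrupted)  # [1, 0, 0, 1]
```

Without the asynchronous path the chain ends up holding exactly the pattern that was shifted in. With it, a 1 passing through flop 0 clears flop 2 mid-shift, so the loaded state no longer matches what the test pattern intended.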
Relevant Work Experience
Most of my work experience involving Verilog was ASIC design at Texas Instruments. I worked on several projects, including being the DV lead for the TMS320C6201, 02 and 03 processors (I worked on the HPIF, DMA and Data Memory Controller, plus did much of the design verification). I also trained multiple engineers on the OMAP 3430 and worked on the DaVinci processor doing design and verification. I spent a few years in a Quality and Test group doing Soft Error work with Robert Baumann. At Detechtion I co-developed our FPGA solution for our main hub.
All of my hobby experience is with a ZYNQ TUL-2 Board and is what the rest of this page is based on.
Processor/FPGA Analysis
The ZYNQ 7000 FPGA fabric contains 13000 logic slices, each with four 6-input LUTs and 8 flip-flops, plus 630 KB of RAM and 220 DSP slices. This is too small for a pipelined datapath for a full processor, but usable for processor offloading such as z-transforms, HDMI, and some small processor logic.
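To make the 6-input LUT concrete, here is a minimal Python sketch that treats a LUT as a 64-entry truth table indexed by its inputs, then totals the fabric resources from the rounded figures above. The names `lut6` and `make_init` are hypothetical, and the bit ordering (first input is the most significant index bit) is an assumption chosen for the example.

```python
def lut6(init, inputs):
    """Evaluate a 6-input LUT: `init` is a 64-bit truth table, `inputs` is six bits, MSB first."""
    index = 0
    for bit in inputs:
        index = (index << 1) | bit
    return (init >> index) & 1

def make_init(func):
    """Build the 64-bit truth table for any 6-input boolean function."""
    init = 0
    for index in range(64):
        bits = tuple((index >> k) & 1 for k in range(5, -1, -1))
        if func(*bits):
            init |= 1 << index
    return init

# Any function of six inputs costs exactly one LUT, whether simple or not.
and_init = make_init(lambda a, b, c, d, e, f: a & b & c & d & e & f)
parity_init = make_init(lambda a, b, c, d, e, f: a ^ b ^ c ^ d ^ e ^ f)
print(hex(and_init))                           # 0x8000000000000000
print(lut6(parity_init, (1, 0, 1, 1, 0, 0)))   # 1 (odd number of ones)

# Fabric totals from the rounded slice count quoted in the text.
SLICES = 13000
total_luts = SLICES * 4
total_ffs = SLICES * 8
print(total_luts, total_ffs)                   # 52000 104000
```

The totals are only as precise as the rounded slice count, but they give a feel for the budget: roughly fifty thousand small truth tables and twice as many flip-flops.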
Tutorials
I struggled with the PYNQ board at first because the use of the PL with Verilog was not well documented (they were focused on HLS). I puzzled through it and ended up writing some tutorials:
- Creating a module overlay
- Creating a Verilog Overlay with Bidirectional Pins
- Interfacing to streaming interfaces
The tutorials got enough attention that the site made me a moderator so I could maintain them!
Reference Projects
These are projects which act as notes on how individual features work.
- AXI I2C 1602 module: Simple LCD driver project. This is more for the Python; I only instantiated an already-provided IP.
- Asynchronous Crossings
More complex projects
- AXIStreamCompressor: A module for streaming data to be compressed (work in progress).
- SubtractBranchNegative: A very simple datapath.
- FrameGenerator: Simple video frame generator.
- RandomNumberGenerator: Random number generator using a chip parametric structure.