This section is a quick overview of the Pipe and PipeSpec topics. Subsequent sections take it more slowly.
For FPGA modules to communicate, conventions must be adopted.
The information exchanged in these interactions can be systematized to the point that a large proportion of module functionality can be written using a small handful of conventions.
Some of the fields adopted in SpokeFPGA modules follow:
- data: the actual data that flows from producer to consumer.
- start: flag indicating that the current data word is the start of a message.
- stop: flag indicating that the current data word is the last word in a message.
- valid: flag indicating when the data value is legitimate.
- ready: flag indicating that the data will be consumed if it is provided.
Routing these signals from module to module is error-prone and tedious, so SpokeFPGA provides some help to bundle them all up into a single array which can be easily managed.
If, for example, the data field were 16 bits wide, then the fields would fit into a Pipe as follows.
The pipe would be 20 bits wide (16 data bits plus start, stop, valid and ready), all flowing in one direction except the ready signal, which always flows backwards. All signals are optional except valid and ready.
With the pipe data just being one array, it can be declared and used very conveniently.
wire [PipeWidth-1:0] pipe_p2c; producer p( pipe_p2c ); consumer c( pipe_p2c );
Users of the pipe don’t have to know what’s inside at all, just its width.
While users don’t need to know what’s in a pipe, the modules that actually do the work certainly do. We need something that describes what is actually in a pipe. This is just a data structure with a set of bitfields that describes fields that make up the pipeline. This PipeSpec is simply an integer and is typically passed as a parameter to the module that’s using the pipe. The module can then ensure that it is set up correctly to process the pipe that will be connected.
It looks like this:
    // create a PipeSpec for 16 bit data, and Start and Stop signals, using `PS_ macros
    localparam PipeSpec = `PS_DATA( 16 ) | `PS_START_STOP;

    // declare an array (the pipe) of the correct width
    wire [`P_w(PipeSpec)-1:0] pipe_p2c;

    // tell the modules what kind of data to expect, then just use the pipe to move data around.
    producer #( PipeSpec ) p( pipe_p2c );
    consumer #( PipeSpec ) c( pipe_p2c );
PipeSpec itself looks a lot like the data pipe:
- 8 bits to describe the pipe data width (meaning there can be 0 - 255 bits of data)
- 1 bit to indicate whether or not the pipe has start and stop signals
- ready and valid signals are compulsory, so they don’t have a field of their own
In the code, macro helpers construct the
PipeSpec and obtain information from it (like how wide the actual pipe is).
- `PS_DATA( 16 ) declares that the data width will be 16 bits. The value 16 is put in the data_width field.
- `PS_START_STOP declares that start and stop signals will be used.
PipeSpec constructor macros are or’ed together to build up a spec, so the
PipeSpec from the above will look like the following:
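As a rough sketch of how OR'ed constructor macros might build a spec, the fragment below uses made-up `EX_` macros and made-up bit positions (data width in the low 8 bits, a start/stop flag in bit 8). These names and positions are assumptions for illustration; the real `PS_` definitions live in the SpokeFPGA header and may be laid out differently.

```verilog
// Hypothetical layout, for illustration only; not the SpokeFPGA header.
// Bits 7:0 hold the data width, bit 8 says start and stop are present.
`define EX_PS_DATA_MASK   32'h0000_00FF
`define EX_PS_DATA( w )   ( ( w ) & `EX_PS_DATA_MASK )
`define EX_PS_START_STOP  ( 32'h1 << 8 )

module ex_spec_demo;
    // Constructor macros are OR'ed together to form one integer.
    localparam ExSpec = `EX_PS_DATA( 16 ) | `EX_PS_START_STOP;

    initial begin
        // Show the packed spec and the fields recovered from it.
        $display( "spec       = 0x%08h", ExSpec );
        $display( "data width = %0d", ExSpec & `EX_PS_DATA_MASK );
        $display( "start/stop = %0d", ( ExSpec & `EX_PS_START_STOP ) ? 1 : 0 );
    end
endmodule
```

Because the spec is an ordinary constant, it costs nothing in the synthesized design unless a module deliberately uses it.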
The PipeSpec can be expanded quite significantly to hold much more data. While the spec will get wider, the pipes that hold the data will only contain the fields that are required. Some examples of additional fields in the spec are:
- data_size: one bit in the spec; when set, the pipe carries a field describing how many bits are legal in the data field. This allows larger data fields to be partially filled, including being empty.
- command: 5 bits specifying up to 32 bits of command flowing in the same direction as the data, additional to the data path, defined by the application.
- result: 5 bits specifying up to 32 bits of result flowing in the opposite direction to the data, defined by the application.
- request: 5 bits specifying up to 32 bits of request flowing in the opposite direction to the data, defined by the application.
- reverse: a big one. If set, this says that there are mirrored signals of data all working in the opposite direction.
- address: 5 bits specifying up to 32 bits of address, the final field that permits the Pipe to be a simple memory bus.
Information about all fields (like their locations in the pipe and their widths) can be obtained by macro, for example:
    `P_w( PipeSpec )      // provides the width of the Pipe described by PipeSpec
    `P_Data_w( PipeSpec ) // provides the width of the Data field
To make it easy to use pipes specified in this flexible way, packer and unpacker modules are provided for all fields. This means that inside a module, appropriate helper modules are used to unpack (extract) and pack (insert) data. Packers take the PipeSpec as a parameter, the relevant fields as inputs, and the pipe that the field has to be inserted into. Unpackers also take the PipeSpec, but they take the pipe first then the fields as out-parameters.
Remember the pipes themselves are bidirectional - they have constituent bits that go in both directions.
ready and valid pack and unpack, which are the only compulsory fields, and which are done together. Note that there is a tricky reversal for the ready field: it runs in the opposite direction from the others.
module p_pack_valid_ready #( parameter PipeSpec = `PS_d8s ) ( input valid, output ready, inout [`P_w(PipeSpec)-1:0] pipe ); ... endmodule module p_unpack_valid_ready #( parameter PipeSpec = `PS_d8s ) ( inout [`P_w(PipeSpec)-1:0] pipe, output valid, input ready ); ... endmodule
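To make the reversal of ready concrete, here is a minimal sketch of what the bodies of such a packer and unpacker could look like. It is not the SpokeFPGA implementation: the `ex_` module names, the plain `PipeWidth` parameter, and the choice of the top two bits for ready and valid are all assumptions made for the example.

```verilog
// Sketch only: assumes valid sits in bit PipeWidth-2 and ready in bit
// PipeWidth-1. The real helpers derive positions from the PipeSpec.
module ex_pack_valid_ready #( parameter PipeWidth = 10 ) (
    input                  valid,
    output                 ready,
    inout  [PipeWidth-1:0] pipe
);
    // valid is driven forwards, from producer to consumer.
    assign pipe[PipeWidth-2] = valid;
    // ready is read back, from consumer to producer.
    assign ready = pipe[PipeWidth-1];
endmodule

module ex_unpack_valid_ready #( parameter PipeWidth = 10 ) (
    inout  [PipeWidth-1:0] pipe,
    output                 valid,
    input                  ready
);
    // The unpacker mirrors the packer: it reads valid and drives ready.
    assign valid = pipe[PipeWidth-2];
    assign pipe[PipeWidth-1] = ready;
endmodule
```

Each bit of the pipe has exactly one driver, which is why the port can be declared inout even though no single bit is actually bidirectional.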
start and stop pack and unpack, which are also always done together
module p_pack_start_stop #( parameter PipeSpec = `PS_d8s ) ( input start, input stop, inout [`P_w(PipeSpec)-1:0] pipe ); ... endmodule module p_unpack_start_stop #( parameter PipeSpec = `PS_d8s ) ( inout [`P_w(PipeSpec)-1:0] pipe, output start, output stop ); ... endmodule
Finally for these examples,
data pack and unpack
module p_pack_data #( parameter PipeSpec = `PS_d8s ) ( input [`P_Data_w(PipeSpec)-1:0] data, inout [`P_w(PipeSpec)-1:0] pipe ); ... endmodule module p_unpack_data #( parameter PipeSpec = `PS_d8s ) ( inout [`P_w(PipeSpec)-1:0] pipe, output [`P_Data_w(PipeSpec)-1:0] data ); ... endmodule
If a packer module is invoked on a field that doesn’t exist in the pipe, the value is ignored. If an unpacker module is used to extract a field that doesn’t exist, 0 is returned.
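The "missing field" behaviour described above can be sketched with a generate block. In this illustration the `HasStartStop` parameter and the bit positions are hypothetical stand-ins for information the real helpers would extract from the PipeSpec.

```verilog
// Sketch only: HasStartStop and the bit positions are stand-ins for
// information the real helpers would derive from the PipeSpec.
module ex_unpack_start_stop #(
    parameter PipeWidth    = 12,
    parameter HasStartStop = 1
) (
    inout  [PipeWidth-1:0] pipe,
    output                 start,
    output                 stop
);
    generate
        if ( HasStartStop ) begin : g_present
            // Field exists: pass the pipe bits straight through.
            assign start = pipe[0];
            assign stop  = pipe[1];
        end else begin : g_absent
            // Field not in the pipe: report constant 0.
            assign start = 1'b0;
            assign stop  = 1'b0;
        end
    endgenerate
endmodule
```

A module that instantiates this helper can refer to start and stop unconditionally; when the field is absent the synthesizer sees constants and can prune the dependent logic.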
With these tools, modules can be written that cover an enormous range of functionality, with specs indicating which modules can be connected together and allowing mismatches to be handled safely.
That was an overview of the pipeline system. Let’s take a wider perspective for a moment, to see what we are aiming for.
New FPGA developers get some idea of how things work by programming a blinking LED (without library help!), but almost immediately the next step is a soft CPU, IP with Wishbone interfaces, C compilers and C code.
Another effect of a lack of library behavior is that even companies providing FPGA add-on hardware often don’t provide low level HDL to use the devices, preferring instead to offer nothing at all or generic bus interfaces.
Pipelines are a convention that aims to make connecting Verilog modules easier and to promote re-use. They do this by formalizing how modules connect together, building on the well known Ready/Valid signals, and then offering a shorthand to make this easier.
Pretending for a minute that the above problems did not exist, and that we had a decade or more of compatible IP library development, what would it be reasonable to expect to be able to do? Dreaming a little…
A simple character echo would have been so reassuring for a first timer.
USB CDC module would be generic and reusable. Note how thinking about it as a pipeline component and not a bus slave means interesting things can be done with it directly. In this case just connecting
While first learning, wouldn’t it have been great to be able to write a math module and then test it in real hardware with a terminal program?
USB CDC module is being reused. It is not too far fetched to think of
remove int string and
add int string as general purpose reusable functions. They would be customizable - with definable (via parameter) delimiters, etc.
unique would clearly have many uses.
The only code that would need to be written is the new code in the function module.
The handshaking on all the modules has small overhead and permits them to all govern their own execution.
It would have been nice to be able to run some message passing code on the host, and have real delimited packets be available for local code. Also, ideally if local code created packets, something to process them and get them back to the host, that would have been great, too.
Again, the only code that would need to be written in this case is the highlighted code : the app code on the host and the Verilog internal code on the FPGA.
Let’s use the following shorthand for this kind of host communication:
to Host and
from Host ports will be assumed to take and provide messages from the host code.
FPGA pins can trivially toggle at hundreds of megahertz, so why aren’t there off-the-shelf, architecture independent modules to create fast, reliable message communication?
comms module would create a bi-directional, reliable link and would transparently deliver messages from one FPGA to another.
A small extension to the
comms module would allow a ring network to be created between nodes. Distributed applications could be built with these tools.
There are so many incredible SPI chips. Wouldn’t it have been great if it were possible to experiment with these devices from a host?
spi master accepts messages and puts their content out to the SPI bus. Received data is returned.
Of particular interest here is that no functional code is written for the FPGA at all. All the functionality is provided by reusable library modules.
Could we go further? Take an IMU as an example - wouldn’t it have been great if it were possible to wrap the
spi master module with code that handles the IMU’s initialization and interface requirements and then just get IMU data out of it?
Of course there would need to be some configuration, etc. but this module should exist - one for each type of IMU.
ADC’s in general are another kind of SPI-based device that would be incredible to have code libraries available for. The list of the devices is long.
Another good point to be emphasized here is that the pipeline code needs to be in pure Verilog, because it needs to be possible to wrap pipeline modules in code to create new pipeline modules. In the above example, there would be an SPI Master pipeline module inside the
imu spi module, doing all the SPI master stuff. The code around it would handle initialization, and getting data in and out of the device.
Of course there are thousands of other applications like this.
- PID Controller
- PWM generator
- I2C Master
- Display Controller
The foregoing have been heavily biased towards interfacing and communicating but there are many internal functions that could benefit from the Pipeline approach. For example,
- multiplication, division, CORDIC functions, filtering, etc.
- memory - fifo’s, caches, etc
- parsing and generating structured messages
Things that are often specific to a particular FPGA family could be wrapped up providing a way to isolate libraries needing to know these details. For example,
- LVDS IO
- Clock PLLs
- FPGA internals
Finally various combinations of pipeline components could be put together to make even more interesting functionality. The new modules so formed would themselves be pipeline modules, reusable, and with all the other desirable pipeline characteristics.
The hope in providing so many examples is to build the case for a Pipelining approach. But is it even feasible and convenient to express these ideas in FPGAs? Let’s now turn to how we might do that.
A cornerstone principle of component reuse is that component interfaces have to be compatible. One big problem when working with FPGA code is that different teams adopt different techniques.
For one system to be able to communicate with another, either someone has to learn Apple language, Orange language and write some glue code, or the teams need to agree on some standards.
What if there were a way to gather together functionality that could, with some assumptions, talk together? What would that look like? We’re going to call this integration Pipelines. The idea will be to first define a minimal standard for communication, and then to optionally draw other details into the standard. We will start with the Ready and Valid signals, but then expand much further. At each step we’ll show examples that help motivate the changes.
Let’s assume the functionality that needs to be connected is based on latched data transfers (i.e. not combinatorial hardware). So the direction would be to find some way for code to transfer data one time per clock tick. One approach is called the Valid-Ready technique. We’ll start there. Let’s call the data sender the Producer and the data receiver the Consumer.
In this diagram, the edge is drawn thick, because the data is mostly more than one bit wide. We can also use edge labels to describe how big the data field is if necessary.
But when is this data actually available?
After a reset it might take a while for data to be valid, more might turn up, but then there may be nothing for a while, then more data could become available, etc. Think about a UART, for example. Every now and again a character just appears!
How is the consumer to know when the data being presented is good?
The accepted solution to this problem is to provide a signal alongside the data, often called valid. The valid signal is raised when the data being provided by the Producer is valid.
This is great. Now a Consumer can tell when another valid data item is available.
However, there is another problem here. What if the Consumer is not ready? It may take a little while after a reset to be receptive to data. Some data may cause it to pause for a little while, etc. If the Producer sends a value, but the Consumer can’t accept it at that cycle, the data gets lost. To solve this problem we give the Consumer a signal, ready.
The ready signal is raised when the Consumer is ready to receive data.
Ready and Valid together form a handshake.
This is an improvement, however, the transfer situation just got a lot more complex.
d1 and d2 look reasonable, but look at what happens to d3. The Consumer is not ready when the Producer is, so the Producer has to hang onto its data until the Consumer is ready. When the Consumer finally is ready, the Producer may release the data.
It should be obvious from here that data is only transferred when both the ready and valid signals are true: in the example above, at times 3, 5 and 10.
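The transfer rule can be written as a single expression. The small module below is an illustrative sketch (the `ex_transfer_counter` name and its ports are invented for this example) that counts how many words actually cross a Ready/Valid interface.

```verilog
// Sketch only: counts completed handshakes on a ready/valid link.
module ex_transfer_counter (
    input             clock,
    input             reset,
    input             valid,
    input             ready,
    output reg [31:0] transfers
);
    // A word moves only on a cycle where both sides agree.
    wire transfer = valid && ready;

    always @( posedge clock ) begin
        if ( reset )
            transfers <= 0;
        else if ( transfer )
            transfers <= transfers + 1;
    end
endmodule
```

The same `valid && ready` term tends to appear in every pipeline module, on both the producing and the consuming side.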
Codewise, if you’re a producer and you have data, you’ll be in a producer-valid kind of state until you see a
ready from the consumer. In Verilog, this might look like the following:
    ...
    case ( state )
        ...
        STATE_DATA_VALID:
            // out_data is set up
            // out_valid is high
            if ( out_ready ) begin
                state <= STATE_DO_THE_NEXT_THING;
                out_valid <= 0;
            end
        ...
On the consumer side, when you’re in a consumer-ready sort of state, you’ll sit and wait until you get a
valid from the producer. In Verilog this might look like this:
    ...
    case ( state )
        ...
        STATE_READY_TO_RECEIVE:
            // in_ready is high
            if ( in_valid ) begin
                internal_data <= in_data;
                state <= STATE_DO_THE_NEXT_THING;
                in_ready <= 0;
            end
        ...
One of the great things about the Ready-Valid connection is that if both ready and valid are held high, transfers occur every clock cycle. To someone used to laboriously reading or toggling data from one place to another, this is pretty exciting.
Good Pipeline modules ought to be able to do this whenever possible, but it does make things even more complex. Modules supporting this kind of fast transfer have to take into account data appearing at every clock cycle, and have to be careful about getting in and out of the “one transfer every cycle” mode. See the Appendix for more of the darkness that awaits the programmer of a module in the middle of a pipeline.
The details of a full pipeline module require their own article. For an excellent overview of handshaking in general, see the ZipCPU article “Strategies for Pipelining”.
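One widely used pattern for sustaining one transfer per cycle through a middle module is a skid buffer: a single spare register that catches the word already in flight when the downstream ready drops. The sketch below is a generic illustration of that pattern, not SpokeFPGA code; the module and port names are hypothetical and it uses separate signals rather than a packed pipe.

```verilog
// Sketch only: a generic skid buffer on a ready/valid link.
module ex_skid_buffer #( parameter DataWidth = 8 ) (
    input                  clock,
    input                  reset,
    input  [DataWidth-1:0] in_data,
    input                  in_valid,
    output                 in_ready,
    output [DataWidth-1:0] out_data,
    output                 out_valid,
    input                  out_ready
);
    reg [DataWidth-1:0] skid_data;
    reg                 skid_full;

    // Upstream may send whenever the spare register is empty.
    assign in_ready  = !skid_full;
    // Downstream sees the stored word first, otherwise the live input.
    assign out_valid = skid_full || in_valid;
    assign out_data  = skid_full ? skid_data : in_data;

    always @( posedge clock ) begin
        if ( reset )
            skid_full <= 0;
        else if ( !skid_full ) begin
            // A word arrived but downstream refused it: catch it.
            if ( in_valid && !out_ready ) begin
                skid_data <= in_data;
                skid_full <= 1;
            end
        end else if ( out_ready )
            // The stored word has been taken: resume pass-through.
            skid_full <= 0;
    end
endmodule
```

Because in_ready comes from a register, a stall downstream reaches the upstream module one cycle later, and the spare register absorbs the one word that arrives in between.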
Many application areas are appropriate for Ready-Valid co-ordination.
Communication is an obvious great application for simple synchronization.
In this example, if
usb_serial were written to support Pipelines, we could have confidence that we could write code to that interface, that we could change
usb_serial implementations and the implementation of the
Internals independently without concern that they will become incompatible.
If we agree on how data is transferred, we can build general utilities. Here’s a FIFO (First In, First Out) memory unit.
Producer P creates data. When it is available (raised
valid) the FIFO can accept data until it is full in which case it can signal (lowering
ready) that it’s busy. On the other side, in the beginning, the Consumer C is signaled “nothing available” from the FIFO (
valid is low). When characters are available,
valid goes high. If the Consumer is ready (
ready is high) information is transferred.
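The FIFO behaviour described here can be sketched in a few lines of Verilog. This is a minimal illustration with invented names (`ex_fifo`, `AddrWidth`, and so on) and separate signals rather than a packed pipe; a production FIFO would typically use a registered read to map onto block RAM.

```verilog
// Sketch only: a small synchronous FIFO with ready/valid on both sides.
module ex_fifo #(
    parameter DataWidth = 8,
    parameter AddrWidth = 4              // depth is 2**AddrWidth words
) (
    input                  clock,
    input                  reset,
    // upstream side (from the Producer)
    input  [DataWidth-1:0] in_data,
    input                  in_valid,
    output                 in_ready,
    // downstream side (to the Consumer)
    output [DataWidth-1:0] out_data,
    output                 out_valid,
    input                  out_ready
);
    reg [DataWidth-1:0] mem [0:(1<<AddrWidth)-1];
    reg [AddrWidth:0]   wr_ptr;          // one extra bit tells full from empty
    reg [AddrWidth:0]   rd_ptr;

    wire empty = ( wr_ptr == rd_ptr );
    wire full  = ( wr_ptr[AddrWidth-1:0] == rd_ptr[AddrWidth-1:0] ) &&
                 ( wr_ptr[AddrWidth]     != rd_ptr[AddrWidth] );

    assign in_ready  = !full;            // lowered when there is no room
    assign out_valid = !empty;           // raised when a word is waiting
    assign out_data  = mem[ rd_ptr[AddrWidth-1:0] ];

    always @( posedge clock ) begin
        if ( reset ) begin
            wr_ptr <= 0;
            rd_ptr <= 0;
        end else begin
            // Accept a word from the Producer.
            if ( in_valid && in_ready ) begin
                mem[ wr_ptr[AddrWidth-1:0] ] <= in_data;
                wr_ptr <= wr_ptr + 1;
            end
            // Release a word to the Consumer.
            if ( out_valid && out_ready )
                rd_ptr <= rd_ptr + 1;
        end
    end
endmodule
```

Neither side needs to know anything about the other beyond the handshake, which is what makes the FIFO a drop-in utility.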
These are two simple examples of how agreement on a few conventions can make parts transparently interconnectable.
Start + Stop
Very often sequential data items in a pipeline are parts of multiword Messages (also known as Packets or Frames). In order to delineate these we add appropriate signals to the pipeline.
Not surprisingly, the start and stop signals mark the words that begin and end the message.
With this scheme it is possible to have one word messages, but not zero word messages, since there is always a word alongside the start and stop flags.
Buses sometimes omit the
start signal, and have a
stop signal only (often called
eof or “end of frame”) to indicate that what has come before, since the last
stop was part of a message and now it has ended. But this means that non message and message data can’t be mixed, and any stray words get prepended to the next message, so here, at the expense of an additional signal line, we have both start and stop to make it clear.
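As an illustration of how a consumer might use the two flags, the sketch below counts the words in each message and ignores stray words that arrive outside a start/stop pair. The module name and ports are hypothetical, and separate signals are used instead of a packed pipe.

```verilog
// Sketch only: measures the length of each start/stop delimited message.
module ex_message_length #( parameter CountWidth = 16 ) (
    input                       clock,
    input                       reset,
    input                       valid,
    input                       ready,
    input                       start,
    input                       stop,
    output reg [CountWidth-1:0] length,        // words in the last complete message
    output reg                  length_valid   // high for one cycle after a message ends
);
    reg                  in_message;
    reg [CountWidth-1:0] count;

    wire transfer = valid && ready;

    always @( posedge clock ) begin
        length_valid <= 0;
        if ( reset ) begin
            in_message <= 0;
            count      <= 0;
            length     <= 0;
        end else if ( transfer ) begin
            if ( start && stop ) begin
                // One word message: start and stop on the same word.
                length       <= 1;
                length_valid <= 1;
                in_message   <= 0;
            end else if ( start ) begin
                count      <= 1;
                in_message <= 1;
            end else if ( in_message ) begin
                if ( stop ) begin
                    length       <= count + 1;
                    length_valid <= 1;
                    in_message   <= 0;
                end else
                    count <= count + 1;
            end
            // Words seen outside a message (no start yet) are ignored.
        end
    end
endmodule
```

With a stop-only scheme the stray-word branch could not exist, since every word would be assumed to belong to the next message.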
Often, and especially with larger data fields, it is useful to be able to specify how much of the data field is being used. This is a count in bits.
This is especially useful when converting from serial pipelines to parallel ones and vice versa. Since all pipelines are fixed width, this permits messages of less than the full size of the parallel data field to be transferred.
Let’s imagine that we have such a pipeline:
In this setup, there is a module
p1 that produces messages of data delineated with start and stop signals. This serial message is converted to a parallel one, and the data size is reported as the number of words in the message x word size in bits.
The data_size field can also solve the “can’t make an empty message” problem by setting the data_size on the included data item to zero for an empty message. This comes at the expense of lugging around a data_size field, so this is not an ideal technique.
Collectively, all the non-handshaking fields make up the Payload.
Tools exist (as you’ll see below) to manipulate the whole payload at once.
This is very handy for modules, like a fifo or many communication modules, that don’t much care what the various fields are and just want to handle the entire payload correctly and transparently.
The last module diagram was a bit of an eyeful and the textual Verilog needed to make all these connections is even worse.
Visually many tools let you make short cuts in these circumstances. The diagram above might become something more like the following:
So much better! The thicker lines indicate that what is being transferred is a bundle of signals, or Pipe (or bus!), and the edge labels say what kind of pipeline we are implementing.
Visually, the convention of the thicker lines is a simpler way to draw pipelines. How can we help ourselves in code too? In almost any other language a data structure would be the natural way to handle this situation. Put all the fields into a structure and pass it around as one unified object. The compiler might even type check it for us if we’re so lucky. The (very) bad news is that Verilog does not permit structures. They are one of the benefits of System Verilog, but questions about uniform support for System Verilog in vendor and OSS tools result in smirks and head shakes.
So what can we do in Verilog itself? The one thing Verilog does allow is arrays of wires, so could we try to do that?
Let’s take a look at a realistic module header in Verilog
module sample #( parameter DataWidth = 8 )( input clock, input reset, input [DataWidth-1:0] in_data, input in_valid, output in_ready, ... ); ... endmodule
What we want is something more like:
module sample #( parameter PipeWidth = 8 )( input clock, input reset, inout [PipeWidth-1:0] in_pipe, ... ); ... endmodule
Where all the wires of the pipe are packed into the one array:
This is definitely an improvement, but we just spent all the sections above adding new (and optional) wires to our idea of a pipeline, so how do we work out the width?
The answer: Macros. (Sorry! They’re nasty, but in Verilog, they’re all we have)
To use them we include a header before our module declaration. It needs to be before because we are able to use some of its functionality in the module header itself.
First we need a way to specify the Pipes we’re using. If this were just a single value, that would be great. Let’s try to pack the various configuration options into a single 32bit value called a PipeSpec. This value will only be used during synthesis, and will not make it into our designs unless we explicitly use it.
This is not the bundle of wires we’re passing around as our pipe; it’s a single 32b value encoding what’s in our pipe, with a lot of room for growth.
Note that there is no mention of
ready_valid since these signals must be present in all pipes.
There is a shorthand for these PipeSpecs: not all combinations are defined but the pattern is
`PS_ + d + data_size + [optionally] s + [optionally] z
- dn means a data field n bits wide
- s means start and stop bits
- z means a data_size field
Let’s look at a few.
A pipe with just 8 data bits (and valid and ready, of course)
The PipeSpec for a pipe with 8 bits of data, and no other features, is just 8.
The field with the size of data is the first one, so this is really,
8 << 0
The shorthand for this is `PS_d8.
Anywhere you need a PipeSpec, you can use this shorthand to define a pipe with 8 bits of data and no other additional features.
A pipe with start and stop, 8 data bits (and the compulsory valid and ready)
The PipeSpec is
8 | `PS_START_STOP
where PS_START_STOP is defined as
The shorthand for this is `PS_d8s.
A pipe with 64 bits of data, start stop and data size is
The Pipespec that defines it is built up like this:
64 | `PS_START_STOP | `PS_DATA_SIZE
where PS_DATA_SIZE is defined as
Now we can specify pipes in a single integer, we can add more macros to do things for us. For a start, and most importantly, we can do what we set out to do earlier, we can calculate Pipe widths!
`P_w( PipeSpec )
Calculates the pipe width given the PipeSpec. This is pretty straightforward - it just adds up the widths of all the enabled fields.
So finally we can define a module using this system.
module sample #( parameter PipeSpec = `PS_d8 )( input clock, input reset, inout [`P_w(PipeSpec)-1:0] in_pipe, ... ); ... endmodule
The in_pipe port will be the right size, and we have the definition of the pipe, in the form of a PipeSpec, available for later use, as we will see.
Where are we? Here’s how connecting modules together by pipe used to look:
... localparam DataSize = 8; wire xfer_start; wire xfer_stop; wire [DataSize-1:0] xfer_data; wire xfer_valid; wire xfer_ready; ... producer #( .DataSize( DataSize ) ) p ( ... .out_start( xfer_start ), .out_stop( xfer_stop ), .out_data( xfer_data ), .out_valid( xfer_valid ), .out_ready( xfer_ready ), ... ); consumer #( .DataSize( DataSize ) ) c ( ... .in_start( xfer_start ), .in_stop( xfer_stop ), .in_data( xfer_data ), .in_valid( xfer_valid ), .in_ready( xfer_ready ), ... ); ...
Exhausting. The two modules are connected, but at what price to your sanity? Forget about linking chains of them up.
Fortunately, having all the present and future features of pipes being passed around as a single packed array means that using pipeline modules is very easy.
... localparam XferPipeSpec = `PS_d8s; wire [`P_w(XferPipeSpec)-1:0] xferPipe; producer #( .PipeSpec( XferPipeSpec ) ) p ( ... .out_pipe( xferPipe ) ... ); consumer #( .PipeSpec( XferPipeSpec ) ) c ( ... .in_pipe( xferPipe ) ... ); ...
With these few lines the modules are connected. So much better. Even if we have to tolerate a macro or two.
Pipe Macro Summary
Here are the PipeSpec shorthand macros we can use:
    `define PS_d8    ( 8 )                                      // 8 bit data only
    `define PS_d8s   ( 8 | `PS_START_STOP )                     // 8 bit data, and start stop signals
    `define PS_d8sz  ( 8 | `PS_START_STOP | `PS_DATA_SIZE )     // 8 bit data, start stop signals and data size
    `define PS_d16   ( 16 )                                     // 16 bit data only
    `define PS_d16s  ( 16 | `PS_START_STOP )                    // 16 bit data, and start stop signals
    `define PS_d16sz ( 16 | `PS_START_STOP | `PS_DATA_SIZE )    // 16 bit data, start stop signals and data size
    `define PS_d32   ( 32 )                                     // 32 bit data only
    `define PS_d32s  ( 32 | `PS_START_STOP )                    // 32 bit data, and start stop signals
    `define PS_d32sz ( 32 | `PS_START_STOP | `PS_DATA_SIZE )    // 32 bit data, start stop signals and data size
    `define PS       `PS_d8s
Here are the Macros for calculating widths. Note that if the field is not present, the width is 0.
    `define P_Data_w( spec )     ( ( spec ) & `PS_DATA )
    `define P_DataSize_w( spec ) ( ( ( spec ) & `PS_DATA_SIZE ) ? $clog2( `P_Data_w( spec ) ) + 1 : 0 )
    `define P_Start_w( spec )    ( ( ( spec ) & `PS_START_STOP ) ? 1 : 0 )
    `define P_Stop_w( spec )     ( ( ( spec ) & `PS_START_STOP ) ? 1 : 0 )
    `define P_Error_w( spec )    ( ( ( spec ) & `PS_ERROR ) ? 1 : 0 )
    `define P_Valid_w( spec )    ( 1 )
    `define P_Ready_w( spec )    ( 1 )
The only vaguely interesting thing is the calculation of the width of the data_size field. If the data_size flag is not set in the spec, the data_size width is zero. Otherwise, the data_size width is ( log2(data width) + 1 ). There needs to be room to express a 100% full data field, and log2(data width) bits is not enough. For example, for a data width of 8 (a common number), the legal sizes are 0 (empty) through 8 (full). The data_size width couldn’t be log2(8) = 3 bits, since a three bit register can’t contain 8 (only 0 - 7)! Hence the extra bit.
Here’s the width of the payload part of the pipe. Recall that the payload is all the actual data in a pipe.
`define P_Payload_w( spec ) ( `P_w( spec ) - 2 )
It’s just the width of the whole pipe less the two handshake signals.
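The width arithmetic can be checked in simulation. The sketch below re-creates the width macros under a `DEMO_` prefix with assumed flag positions (bits 8 and 9), since the real `PS_` constant values are not shown here, and a whole-pipe macro that simply sums the enabled fields plus valid and ready. Treat the layout as a guess made for the example.

```verilog
// Hypothetical spec bits, for illustration only; not the SpokeFPGA header.
// Bits 7:0 hold the data width; bits 8 and 9 are feature flags.
`define DEMO_PS_DATA        32'h0000_00FF
`define DEMO_PS_START_STOP  32'h0000_0100
`define DEMO_PS_DATA_SIZE   32'h0000_0200

`define DEMO_Data_w( spec )      ( ( spec ) & `DEMO_PS_DATA )
`define DEMO_DataSize_w( spec )  ( ( ( spec ) & `DEMO_PS_DATA_SIZE ) ? $clog2( `DEMO_Data_w( spec ) ) + 1 : 0 )
`define DEMO_StartStop_w( spec ) ( ( ( spec ) & `DEMO_PS_START_STOP ) ? 2 : 0 )
// Whole pipe: every enabled field plus the valid and ready bits.
`define DEMO_w( spec )           ( `DEMO_Data_w( spec ) + `DEMO_DataSize_w( spec ) + `DEMO_StartStop_w( spec ) + 2 )
`define DEMO_Payload_w( spec )   ( `DEMO_w( spec ) - 2 )

module demo_widths;
    localparam D8   = 8;
    localparam D16S = 16 | `DEMO_PS_START_STOP;
    localparam D8SZ = 8  | `DEMO_PS_START_STOP | `DEMO_PS_DATA_SIZE;

    initial begin
        // 8 data + valid + ready = 10 wide, payload 8
        $display( "d8   : pipe %0d, payload %0d", `DEMO_w( D8 ),   `DEMO_Payload_w( D8 ) );
        // 16 data + start + stop + valid + ready = 20 wide, payload 18
        $display( "d16s : pipe %0d, payload %0d", `DEMO_w( D16S ), `DEMO_Payload_w( D16S ) );
        // 8 data + 4 data_size + start + stop + valid + ready = 16 wide, payload 14
        $display( "d8sz : pipe %0d, payload %0d", `DEMO_w( D8SZ ), `DEMO_Payload_w( D8SZ ) );
    end
endmodule
```

The d16s case reproduces the 20 bit pipe from the overview, and the d8sz case shows the extra data_size bit: $clog2(8) + 1 = 4.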
Pipe Helper Modules
Turning our attention from module use to module creation, how are we going to get the data out of these packed arrays? This time tiny sub modules are the way to go. Instead of having to do elaborate bit location calculations there are module helpers.
Declare the data values you need, and then instantiate the helper to pack or unpack the pipe field you want. A great part of this story is that these packing and unpacking functions will work even if the fields are not there in the PipeSpec. For example, unpacking Start and Stop fields from a pipe that according to the PipeSpec doesn’t have them results in registers being defined that are always 0. Similarly, attempting to pack Start and Stop fields into a pipe that doesn’t have them just silently doesn’t do it. This cuts down on conditional code in the rest of the module.
The best part of the pipe packer and unpacker story is that these modules are mostly just helpers. They add almost nothing to the final design.
What is this going to look like?
Somehow, inside the module, some code is going to take the packed arrays that contain all the pipe data and handshake signals and expand them into something useful.
Here’s the top of a simple producer module:
    module producer #( parameter PipeSpec = `PS_d8 )(
        input clock,
        input reset,
        inout [`P_w(PipeSpec)-1:0] out_pipe, // using the `P_w( ) macro returning a pipe's overall width
        ...
    );

        localparam PipeData_w = `P_Data_w(PipeSpec); // using the `P_Data_w( ) macro returning a pipe's data width

        reg                  out_start;
        reg                  out_stop;
        reg [PipeData_w-1:0] out_data;
        reg                  out_valid;
        wire                 out_ready;

        p_pack_start_stop  #( .PipeSpec( PipeSpec ) ) p_pp_ss( .start(out_start), .stop(out_stop), .pipe(out_pipe) );
        p_pack_data        #( .PipeSpec( PipeSpec ) ) p_pp_d( .data(out_data), .pipe(out_pipe) );
        p_pack_valid_ready #( .PipeSpec( PipeSpec ) ) p_pp_vr( .valid(out_valid), .ready(out_ready), .pipe(out_pipe) );

        // out_start, out_stop, out_data, out_valid, out_ready are all now automatically bundled into out_pipe
        ...
    endmodule
Here, as required, the separate signals that are used internal to the module are bundled by helper submodules into a single array,
out_pipe for routing outside.
Note that the default PipeSpec (PS_d8) has no Start Stop flags, but the code can still work with them if they’re there. The code can assign to them but there is no further effect.
Here’s what a consumer looks like:
    module consumer #( parameter PipeSpec = `PS_d8 )(
        input clock,
        input reset,
        inout [`P_w(PipeSpec)-1:0] in_pipe, // using the `P_w( ) macro returning a pipe's overall width
        ...
    );

        localparam PipeIn_Data_w = `P_Data_w(PipeSpec); // using the `P_Data_w( ) macro returning a pipe's data width

        wire                     in_start;
        wire                     in_stop;
        wire [PipeIn_Data_w-1:0] in_data;
        wire                     in_valid;
        reg                      in_ready;

        p_unpack_start_stop  #( .PipeSpec( PipeSpec ) ) p_upp_ss( .pipe(in_pipe), .start(in_start), .stop(in_stop) );
        p_unpack_data        #( .PipeSpec( PipeSpec ) ) p_upp_d( .pipe(in_pipe), .data(in_data) );
        p_unpack_valid_ready #( .PipeSpec( PipeSpec ) ) p_upp_vr( .pipe(in_pipe), .valid(in_valid), .ready(in_ready) );

        // in_start, in_stop, in_data, in_valid, in_ready are all now automatically available for use in the rest of the module
        ...
    endmodule
Similarly, as required, the single
in_pipe array from outside is unbundled by helper submodules into separate signals to be used internal to the module.
Note again that the default PipeSpec (PS_d8) has no Start Stop flags, but the code can still work with them if they’re there. They will appear as always off wires to the synthesizer.
Pipe Helper Module Summary
Here are the various packers and unpackers.
Payload as a whole pack and unpack
module p_pack_payload #( parameter PipeSpec = `PS ) ( input [`P_Payload_w(PipeSpec)-1:0] payload, inout [`P_w(PipeSpec)-1:0] pipe ); ... endmodule module p_unpack_payload #( parameter PipeSpec = `PS ) ( inout [`P_w(PipeSpec)-1:0] pipe, output [`P_Payload_w(PipeSpec)-1:0] payload ); ... endmodule
Ready Valid pack and unpack
module p_pack_valid_ready #( parameter PipeSpec = `PS ) ( input valid, output ready, inout [`P_w(PipeSpec)-1:0] pipe ); ... endmodule module p_unpack_valid_ready #( parameter PipeSpec = `PS ) ( inout [`P_w(PipeSpec)-1:0] pipe, output valid, input ready ); ... endmodule
Start Stop pack and unpack
module p_pack_start_stop #( parameter PipeSpec = `PS ) ( input start, input stop, inout [`P_w(PipeSpec)-1:0] pipe ); ... endmodule module p_unpack_start_stop #( parameter PipeSpec = `PS ) ( inout [`P_w(PipeSpec)-1:0] pipe, output start, output stop ); ... endmodule
Data pack and unpack
module p_pack_data #( parameter PipeSpec = `PS ) ( input [`P_Data_w(PipeSpec)-1:0] data, inout [`P_w(PipeSpec)-1:0] pipe ); ... endmodule module p_unpack_data #( parameter PipeSpec = `PS ) ( inout [`P_w(PipeSpec)-1:0] pipe, output [`P_Data_w(PipeSpec)-1:0] data ); ... endmodule
Data Size pack and unpack
module p_pack_data_size #( parameter PipeSpec = `PS ) ( input [`P_DataSize_w(PipeSpec)-1:0] data_size, inout [`P_w(PipeSpec)-1:0] pipe ); ... endmodule module p_unpack_data_size #( parameter PipeSpec = `PS ) ( inout [`P_w(PipeSpec)-1:0] pipe, output [`P_DataSize_w(PipeSpec)-1:0] data_size ); ... endmodule
There are now many more fields!
Since there is no runtime downside to having extra fields that are not being used, it is tempting to wonder about supporting other fields. This has to be approached with caution, however, since adding features creates a requirement that existing modules support them.
- Address - When forming a read or write operation to a memory, it might be handy to have an Address field.
- Meta Character - When communicating over a lossy channel, it is frequently desirable to have access to meta characters (message start, message end, message crc follows, etc.)
- Error - Perhaps it might be useful to send an error signal that many different kinds of module could interpret
- Flags - sometimes a tiny bit of extra data is critical to have alongside a data word. Could a general facility be developed around a generic “flags” field of a certain width?
- Under Icarus, a kind of conditional compile time error can be created that can be used to cause errors when necessary. Errors of configuration could stop the build process with an error message. This would be very handy to allow modules to insist that connected pipes have certain features, for example, that their data width is greater or less than a certain amount, that the Start Stop signals are supported, etc. What is a technique that works universally?
- the macros are a little scruffy still, they need another pass or two
- we need a “rich library of components”
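On the compile time check question in the list above, one commonly used plain Verilog-2001 trick is to instantiate a deliberately undefined module inside a generate branch that is only taken when the configuration is wrong. The sketch below is an assumption about what might work, not something SpokeFPGA is known to use. The module and parameter names are hypothetical, and in a pipe module the width would presumably come from `P_Data_w(PipeSpec) rather than a plain parameter.

```verilog
// Sketch only: needs_wide_data, DataWidth and MinDataWidth are hypothetical.
module needs_wide_data #(
    parameter DataWidth    = 16,
    parameter MinDataWidth = 8
) ();
    generate
        if ( DataWidth < MinDataWidth ) begin : config_check
            // This module is intentionally never defined. Elaborating this
            // branch should fail, and the undefined module's name then shows
            // up in the tool's error message.
            ERROR_data_width_is_too_small config_error();
        end
    endgenerate
endmodule
```

Tools generally elaborate only the generate branch that is taken, so a correct configuration should build cleanly, though the exact wording of the failure differs from tool to tool. Where SystemVerilog is available, the elaboration system tasks `$error` and `$fatal` can be placed in the same generate branch to give a proper message.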
Fast Pipeline Programming
Ready - Valid handshaking is a great way for modules to connect together, at its best permitting controlled data transfers on every clock cycle. Pure pipeline Producers and Consumers are relatively easy to implement, but when a module is both receiving and sending, things get complex.
There is the ugly possibility that the middle module in a chain, while receiving valid data from upstream, has its ready signal withdrawn by the downstream module, so the middle module must retain its current data and also store the next word. Effects ripple back up the chain.
Let’s see what’s happening in this train wreck. The trouble starts at Cycle 7 when p3 decides to be not ready for a cycle.
| Cycle | p1 -> p2 | p2 -> p3 |
|---|---|---|
| <3 | Ready, No Data | Ready, No Data |
| 3 | d1 transferred | Ready, No Data |
| 4 | d2 transferred | d1 transferred |
| 5 | d3 transferred | d2 transferred |
| 6 | d4 transferred | d3 transferred |
| 7 | d5 transferred | d4 waiting, p3 not ready, nothing transferred |
| 8 | d6 waiting, p2 not ready, d5 stored in overflow | d4 transferred, p3 ready again |
| 9 | d6 transferred, p2 ready again, overflow cleared | d5 transferred |
| 10 | d7 transferred | d6 transferred |
| 11 | Ready, No Data | d7 transferred |
| >11 | Ready, No Data | Ready, No Data |
The consequences of the hold-up propagate back up through the pipeline, one module per cycle.
Codewise, this isn’t too dire for a Producer or Consumer. They just wait for conditions to be right again for transfers. Any module in the middle, however, has commitments both downstream and upstream, so it has to have provisions for stalling. And unstalling. It can be quite a mind-bender.
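As a rough illustration of how little a pure Producer has to do, here is a sketch of one that offers a counting value and simply holds it until the consumer is ready. It uses bare out_data, out_valid and out_ready signals rather than a packed pipe, and the module name count_producer is hypothetical.

```verilog
// Sketch only: a producer that holds its word until it is consumed.
module count_producer #( parameter Width = 16 ) (
    input                  clock,
    input                  reset,
    output reg [Width-1:0] out_data,
    output reg             out_valid,
    input                  out_ready
);
    always @( posedge clock ) begin
        if ( reset ) begin
            out_data  <= 0;
            out_valid <= 0;
        end else if ( !out_valid ) begin
            out_valid <= 1;               // offer the first word
        end else if ( out_ready ) begin
            out_data <= out_data + 1;     // word was consumed, offer the next
        end
        // otherwise: valid but not ready, so keep out_data unchanged and wait
    end
endmodule
```

The producer never needs to store more than the one word it is offering, which is why the stall in the table above costs it nothing more than a wait.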
Codewise, pipeline modules with pipes in and out will have states much like the following:
- STATE_STARVING - we are ready for data (in_ready is high), but the upstream module doesn’t have any (in_valid was low), there may be out_ready when we set
- STATE_TRANSFERING - we are ready for data (in_ready is high), there was valid data, and out_ready was asserted by the downstream module, if all is well, we transfer the data to
- STATE_STALLED - we are not ready for new data (in_ready was low). And we have data waiting (out_valid is true), but the downstream module was not ready (out_ready was low)
- STATE_OVERFLOWED - we are not ready for in_data, since we’re full up. We have data in data_overflow. The downstream module was not ready (out_ready was low). We have out_data waiting to be transferred and
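The behaviour described by these states can be sketched as a single pipeline stage with one overflow register. This is a minimal illustration under stated assumptions, not the SpokeFPGA implementation: it uses bare in_ and out_ signals instead of packed pipes, the module name overflow_stage and the overflow_valid flag are hypothetical, and the four states are folded into two flags (out_valid and overflow_valid) instead of an explicit state machine.

```verilog
// Sketch only: a pass-through stage that survives a downstream stall
// by parking one extra word in data_overflow.
module overflow_stage #( parameter Width = 16 ) (
    input                  clock,
    input                  reset,
    // upstream side
    input      [Width-1:0] in_data,
    input                  in_valid,
    output                 in_ready,
    // downstream side
    output reg [Width-1:0] out_data,
    output reg             out_valid,
    input                  out_ready
);
    reg [Width-1:0] data_overflow;
    reg             overflow_valid;   // hypothetical flag: overflow holds a word

    // We can accept a word whenever the overflow register is empty.
    assign in_ready = !overflow_valid;

    always @( posedge clock ) begin
        if ( reset ) begin
            out_valid      <= 0;
            overflow_valid <= 0;
        end else if ( out_ready || !out_valid ) begin
            // Output slot is empty or is being consumed this cycle.
            if ( overflow_valid ) begin
                // Leaving the overflowed condition: drain the parked word first.
                out_data       <= data_overflow;
                out_valid      <= 1;
                overflow_valid <= 0;
            end else begin
                // Transferring (in_valid high) or starving (in_valid low).
                out_data  <= in_data;
                out_valid <= in_valid;
            end
        end else if ( in_valid && in_ready ) begin
            // Stalled downstream, but upstream delivered a word: park it.
            data_overflow  <= in_data;
            overflow_valid <= 1;
        end
        // else: stalled with nothing new arriving, or overflowed; hold everything
    end
endmodule
```

Because in_ready depends only on a register, the stall takes one cycle to reach the upstream module, which matches the one module per cycle ripple seen in the table.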
It can (did) take many weeks of sober contemplation to get this all straight.
If you are kindly reviewing this, thank you!
What do you think about the general concept?
What do you think about the general presentation?
Is a macro scheme to cut back on typing worth it?
Is the macro scheme presented here reasonable?
Is the pipeline macro and function naming too terse?
Any other comments?
- @ me on Twitter - @davidthings
- leave issues in the repo
- email me - email@example.com