I had Been trying to come up with a good problem to illustrate design trade offs when I came across this:

**Design a module that can sort four numbers.**

I’ve only done a few sorts in BASIC ages ago and more recently in OpenCL. Off the top of my head, I can think of multiple approaches:

**Completely parallel.**

Compare all four numbers against each other and use the resulting outputs of the six comparators to mux the input registers to the output registers. This results in a pipelined design which can process comparisons every clock cycle. The disadvantage is that we are comparing and muxing in a single cycle. This limits the fmax of the design.

The next post will detail this in more detail.

**Pipelined parallel**

We can insert one pipe stage into the design to make the Fmax as high as possible.

**Even-Odd pipelined**

This is a very expensive approach. It yields a great fmax, but because it requires two even and two odd pipe stages, it will use much more design area than the completely parallel or pipelined parallel approach.

**Even-Odd sequential**

This will result in the smallest design, however, it will have to stall while processing. This approach is not good for this problem definition.

**Now if we expand the problem to a large number of inputs, say 1000, what is the best approach.**

Completely parallel now becomes prohibitively expensive and unwieldy to implement. The only approach that makes sense is now the Even odd sequential approach. (There may be other approaches that would be better and I’d be happy to hear about them.) The solution to this phase of the problem will be discussed in more detail in a subsequent post.

*note: I’m updating the repo on github as I work through the problems. I’m learning SystemC verification at the same time which is why this is taking longer than my other posts.