
The Dangers of Over-Optimization

Recently I dealt with some code in a design where the original designer had tried to outsmart the synthesis engine. The code was so dense it was nearly unreadable; it was impossible to determine its function, and he had optimized it down to exactly the number of registers needed to store the final value. You would think something like this would produce the best results; however, that is not always the case.

Many years ago, when Design Compiler was on version 3 or 4 (in the mid 1990s), it didn't always have the best optimization engine. It did a fairly good job for most things, but you could often outsmart the compiler. One example of this was when we designed the first floating-point units for the #9 Imagine series of chips. We needed a leading-ones detector for normalization of the results. The original code looked as follows (this was before synthesizable loop support, so it could be written much more compactly today):

function [5:0] find_first_one;
  input [24:0] a;
  reg [5:0] ffo;
  reg [31:0] b;
  reg [15:0] c;
  reg [7:0] d;
  reg [3:0] e;
  reg [1:0] f;
  reg g;
  begin
    find_first_one = 0;
    casex(a) /* synopsys full_case */
      25'b1xxxxxxxxxxxxxxxxxxxxxxxx: find_first_one = 0;
      25'b01xxxxxxxxxxxxxxxxxxxxxxx: find_first_one = 1;
      25'b001xxxxxxxxxxxxxxxxxxxxxx: find_first_one = 2;
      25'b0001xxxxxxxxxxxxxxxxxxxxx: find_first_one = 3;
      // cases 4 through 23 follow the same pattern (elided here)
      25'b0000000000000000000000001: find_first_one = 24;
    endcase // casex(a)
  end
endfunction // find_first_one
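The behavior this casex describes is a simple priority encoder. A quick way to sanity-check it outside of simulation is a software reference model; the following Python sketch is my own model of the function's behavior, not part of the original design:

```python
def find_first_one(a: int, width: int = 25) -> int:
    """Reference model of the Verilog casex priority encoder.

    Returns the position of the leading (most significant) one,
    counted from the MSB: bit 24 set -> 0, bit 0 set -> 24.
    An all-zero input returns 0, matching the default assignment
    before the casex.
    """
    for pos in range(width):
        if (a >> (width - 1 - pos)) & 1:
            return pos
    return 0  # no one found; the default assignment applies
```

For a `25'b0000000000000000000000001` input this returns 24, matching the last casex arm.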
The engineer who optimized the function did what I described at the start, forcing the tool to implement an "optimal" design:

    b = {a,7'h0};
    c = {b[31],b[29],b[27],b[25],b[23],b[21],b[19],b[17],
         b[15],b[13],b[11],b[9],b[7],b[5],b[3],b[1]} |
        {b[30],b[28],b[26],b[24],b[22],b[20],b[18],b[16],
         b[14],b[12],b[10],b[8],b[6],b[4],b[2],b[0]};
    d = {c[15],c[13],c[11],c[9],c[7],c[5],c[3],c[1]} |
        {c[14],c[12],c[10],c[8],c[6],c[4],c[2],c[0]};
    e = {d[7],d[5],d[3],d[1]} | {d[6],d[4],d[2],d[0]};
    f = {e[3],e[1]} | {e[2],e[0]};
    g = f[1] | f[0];

    ffo[5] = g;    
    ffo[4] = ~f[1];    
    // b[31:16]    
    if (f[1]) begin      
      ffo[3] = ~e[3];      
      // b[31:24]      
      if (e[3]) begin 
        ffo[2] = ~d[7]; 
        // b[31:28] 
        if (d[7]) begin  
          ffo[1] = ~c[15];  
          if (c[15])
            ffo[0] = ~b[31];
          else
            ffo[0] = ~b[29];
        end // if (d[7]) 
        // b[27:24] 
        else begin  
          ffo[1] = ~c[13];  
          if (c[13])
            ffo[0] = ~b[27];
          else
            ffo[0] = ~b[25];
        end // else: !if(d[7])      
      end // if (e[3])      
      // b[23:16]      
      else begin 
        ffo[2] = ~d[5]; 
        // b[23:20] 
        if (d[5]) begin  
          ffo[1] = ~c[11];  
          if (c[11])
            ffo[0] = ~b[23];
          else
            ffo[0] = ~b[21];
        end // if (d[5])
        // remaining branches (b[19:16], and the else for f[1]
        // covering b[15:0]) follow the same pattern (elided here)
      end // else: !if(e[3])
    end // if (f[1])

    find_first_one = ffo;
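Structurally, the hand-optimized version is an OR-reduction tree followed by a binary descent: each level (c, d, e, f, g in the Verilog) ORs adjacent pairs of bits, and the if/else ladder walks back down, choosing at each level the half that contains a one. The following Python sketch is my own model of that technique; it returns the same MSB-relative position as the casex version rather than reproducing the exact ffo bit encoding of the original:

```python
def find_first_one_tree(a: int) -> int:
    """Model of the hand-optimized leading-ones detector.

    Pads the 25-bit input to 32 bits (b = {a, 7'h0}), ORs adjacent
    bit pairs at each level (c, d, e, f, g in the Verilog), then
    descends the tree, taking the upper half whenever it contains a
    one.  Result is only meaningful when the input is nonzero
    (g, the root of the tree, is the any-one flag).
    """
    b = (a << 7) & 0xFFFFFFFF      # b = {a, 7'h0}
    levels = [b]
    width = 32
    while width > 1:               # build c (16), d (8), e (4), f (2), g (1)
        prev = levels[-1]
        width //= 2
        nxt = 0
        for i in range(width):
            if (prev >> (2 * i)) & 0b11:   # OR of each adjacent bit pair
                nxt |= 1 << i
        levels.append(nxt)
    idx = 0                        # bit index at the current tree level
    for lvl in range(len(levels) - 2, -1, -1):
        upper = 2 * idx + 1        # upper child covers the higher bits
        idx = upper if (levels[lvl] >> upper) & 1 else 2 * idx
    return 31 - idx                # distance of the leading one from b[31]
```

The first step of the descent corresponds to `ffo[4] = ~f[1]` in the Verilog: take the upper 16 bits when f[1] is set. For any nonzero input this model agrees with the simple casex version.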

The end result was better performance for the ASIC in question with this code than with the simpler version.

Shortly afterwards, I took a job at Synopsys, and after another year of tool improvements the original casex had become the better solution, improving both area and performance given the same target library.

Similarly, on the FPGA I'm working on now, I recently removed the "optimized" code and was able to close timing much more easily: giving the tool the freedom to replicate, optimize, and route produced a better final implementation than the hand-"optimized" version did. The code is also cleaner to read and easier to change.

The moral of this is that optimizations can be great, but do not throw out the original code: the tools keep getting better, and you may eventually achieve superior results from the code you once considered less than optimal.
