Folding (DSP Implementation)
   HOME

TheInfoList



OR:

Folding is a transformation technique using in DSP architecture implementation for minimizing the number of functional blocks in synthesizing DSP architecture. Folding was first developed by
Keshab K. Parhi Keshab is a village in the Ardabil Province of Iran. References Tageo Populated places in Ardabil Province {{Ardabil-geo-stub ...
and his students in 1992. Its concept is contrary to unfolding. Folding transforms an operation from a unit-time processing to ''N'' unit-times processing where ''N'' is called folding factor. Therefore, multiple same operations (less than ''N'') used in original system could be replaced with a signal operation block in transformed system. Thus, in ''N'' unit-times, a functional block in transformed system could be reused to perform ''N'' operations in original system. While the folding transformation reduces the number of functional units in the architecture, it needs more memory element to store the temporary data. The reason is that multiple data produced from an operation block needs to be distinguished from ''N'' data produced from original operations. Therefore, the number of register may be increased. Furthermore, it needs additional multiplexer for switching different operation paths. Hence, the number of switching elements would also be increased. To counterattack such issues, the considerations of folding is * How to schedule multiple operations into an operation block. * How to schedule the memory element for reducing the number of registers and multiplexers.


Example

The following graph shows the example of folding transformation. The original DSP system produces ''y''(''n'') at each unit time. The transformed DSP system produces ''y''(''n'') in each 2 ''l'' where each 2 ''l'' increase 1 ''n'', index of ''y''. The resource used in original system are 2 adders, and the resource used in transformed system are 1 adder, 1 register, 3 multiplexer. The functional block, adder, is therefore reduced.


Algorithm

The DSP implementation in the folding algorithm is a Data flow graph(DFG), which is a graph composed of functional nodes and delay edges. Another input for folding algorithm is folding set which is the function maps an operation unit of original DFG to an operation of transformed DFG with the number ''n'' <= ''N'' indicated the order of reused operation. Given a DFG, a folding factor ''N'', and folding set. The transformation is performing: # Creating folded nodes which are the node of the
image An image is a visual representation of something. It can be two-dimensional, three-dimensional, or somehow otherwise feed into the visual system to convey information. An image can be an artifact, such as a photograph or other two-dimensiona ...
of folding set. # Computing the delay elements for storing the distinguished data among different operation cycles as equation: #:D_F(U\xrightarrowV)=Nw(e)-P_U+v-u where #* \scriptstyle D_F is the number of delay elements needed between element \scriptstyle U,V, the operation units of original DFG. #* \scriptstyle w(e) is the delay elements used in original DFG between \scriptstyle U,V. #* \scriptstyle u is the order of \scriptstyle U in the transformed operation block. #* \scriptstyle v is the order of \scriptstyle V in the transformed operation block. #* \scriptstyle P_U is the internal delay in transformed operation of \scriptstyle U. #:  # Merging the delay elements forms the data path between the functional elements in transformed DFG.


Biquad filter Electronic filter topology defines electronic filter circuits without taking note of the values of the components used but only the manner in which those components are connected. Filter design characterises filter circuits primarily by their t ...
example

The following graph show the example of folding algorithm. The folding set is \scriptstyle \ where \scriptstyle S_i is the transformed operator and \scriptstyle j is the order of such operator. Therefore, the image of the folding set are \scriptstyle S_1,S_2 representing adder and multiplier respectively. Furthermore, in this example, we use the pipelining adder and multiplier which have 1 and 2 delay respectively in right graph. Next, we compute the delay elements for storing the data. :\scriptstyle D_F(1\rightarrow 2)= 4(1)-1+1-3=1 :\scriptstyle D_F(1\rightarrow 5)= 4(1)-1+0-3=0 :\scriptstyle D_F(1\rightarrow 6)= 4(1)-1+2-3=2 :\scriptstyle D_F(1\rightarrow 7)= 4(1)-1+3-3=3 :\scriptstyle D_F(1\rightarrow 8)= 4(2)-1+1-3=5 :\scriptstyle D_F(3\rightarrow 1)= 4(0)-1+3-2=0 :\scriptstyle D_F(4\rightarrow 2)= 4(0)-1+1-0=0 :\scriptstyle D_F(5\rightarrow 3)= 4(0)-2+2-0=0 :\scriptstyle D_F(6\rightarrow 4)= 4(1)-2+0-2=0 :\scriptstyle D_F(7\rightarrow 3)= 4(1)-2+2-3=1 :\scriptstyle D_F(8\rightarrow 4)= 4(1)-2+0-1=1 After computing the delay element needed, we construct the data path to connect the functional blocks with corresponding multiplexer. The final graph is shown as below where \scriptstyle \ represents the switching moment.


Register minimization

K.K. Parhi, "Systematic synthesis of DSP data format converters using life time analysis and forward-backward register allocation," IEEE Trans. on Circuits and Systems -II, vol. 39, no. 7, pp.423-440, July 1992 In the above example, if we perform register minimization, we could reduce the number of register significantly. The technique for minimizing register is call lifetime analysis, which analyzes the time for when a data is produced and when a data finally s consumed. The time for producing a data is denoted \scriptstyle T_, and the time for the last consumed data is denoted \scriptstyle T_. :\scriptstyle T_=u+P_U where u is the folding order of \scriptstyle U and \scriptstyle P_U is the number of pipelining stages in the functional unit that executes \scriptstyle u. :\scriptstyle T_ for the node \scriptstyle U is \scriptstyle u+P_U+max_V\ Therefore, we could perform life time analysis from the above example as following table. {, class="wikitable" , - ! node!! \scriptstyle T_{input} !! \scriptstyle T_{output} , - , 1 , , 4 , , 9 , - , 2 , , 2 , , 2 , - , 3 , , 3 , , 3 , - , 4 , , 1 , , 1 , - , 5 , , 2 , , 2 , - , 6 , , 4 , , 4 , - , 7 , , 5 , , 6 , - , 8 , , 3 , , 4 From the life time analyzing above, we could analyze the minimal register needed. In this case, we construct the lifetime chart corresponding to the lifetime table in above. For node 1, we plot a horizontal line from cycle 4 to 9 indicating that the data is need to be stored from cycle 4 to cycle 9. In the same method, we could construct the chart to indicating that how many data need to be stored in each cycle. Hence, cycle 6 needs to store 2 data. Maximum number of data need to be store d in this example is 2. Hence, we allocate 2 delay element for constructing the transformed data path. After allocating 2 delay element for storing the temporary data, we need to schedule data stored at which register. The following table shows the data stored in each register R1 and R2, such that the number of multiplexer could be minimized. Finally, we could reconstruct the data path with fewer delay element and switching element in the folded design.


See also

*
Unfolding (DSP implementation) Unfolding is a transformation technique of duplicating the functional blocks to increase the throughput of the DSP program in such a way that preserves its functional behavior at its outputs. Unfolding was first proposed by Keshab K. Parhi and Dav ...
* High-level synthesis


References

Digital signal processing