  • Research Article
  • Open access

Rapid VLIW Processor Customization for Signal Processing Applications Using Combinational Hardware Functions

Abstract

This paper presents an architecture that combines VLIW (very long instruction word) processing with application-specific customized instructions and highly parallel combinational hardware functions to accelerate signal processing applications. To support this architecture, a compilation and design automation flow is described for algorithms written in C. The key contributions of this paper are as follows: (1) a 4-way VLIW processor implemented in an FPGA, (2) large speedups through hardware functions, (3) a hardware/software interface with zero overhead, (4) a design methodology for implementing signal processing applications on this architecture, and (5) tractable design automation techniques for extracting and synthesizing hardware functions. Several design tradeoffs for the architecture were examined, including the number of VLIW functional units and the register file size. The architecture was implemented on an Altera Stratix II FPGA, selected because it offers a large number of high-speed DSP (digital signal processing) blocks that execute multiply-accumulate operations. We tested the methodology and architecture by accelerating software from the MediaBench benchmark suite. The combined VLIW processor with hardware functions was compared against software executing on a RISC processor, specifically the soft-core embedded NIOS II processor. For software kernels converted into hardware functions, we show a hardware performance multiplier of up to times that of software with an average times faster. For the entire application, in which only a portion of the software is converted to hardware, the performance improvement is as much as 30X over the nonaccelerated application, with a 12X improvement on average.
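
The gap between kernel-level and whole-application speedup reported above follows from Amdahl's law: only the fraction of run time spent in the converted kernels benefits from the hardware functions. The sketch below is illustrative only. The function names `overall_speedup` and `required_kernel_fraction` are hypothetical, and the numbers in the usage lines are made-up inputs, not measurements from the paper.

```python
def overall_speedup(kernel_fraction: float, kernel_speedup: float) -> float:
    """Amdahl's law: speedup of the whole application when only
    `kernel_fraction` of the original run time is accelerated by
    `kernel_speedup` (both names are assumptions for this sketch)."""
    if not 0.0 <= kernel_fraction <= 1.0:
        raise ValueError("kernel_fraction must lie in [0, 1]")
    if kernel_speedup <= 0.0:
        raise ValueError("kernel_speedup must be positive")
    # The unaccelerated part keeps its cost; the kernel part shrinks.
    return 1.0 / ((1.0 - kernel_fraction) + kernel_fraction / kernel_speedup)


def required_kernel_fraction(target_overall: float, kernel_speedup: float) -> float:
    """Invert Amdahl's law: fraction of run time that must be spent in
    accelerated kernels to reach `target_overall` speedup."""
    if kernel_speedup <= 1.0:
        raise ValueError("kernel_speedup must exceed 1")
    if not 1.0 <= target_overall <= kernel_speedup:
        raise ValueError("target_overall must lie in [1, kernel_speedup]")
    return (1.0 - 1.0 / target_overall) / (1.0 - 1.0 / kernel_speedup)


if __name__ == "__main__":
    # Hypothetical inputs: 90% of run time in kernels, kernels 50x faster.
    print(f"overall speedup: {overall_speedup(0.9, 50.0):.2f}x")
    # Fraction needed for a 12x overall gain with hypothetical 50x kernels.
    print(f"kernel fraction needed: {required_kernel_fraction(12.0, 50.0):.3f}")
```

Under this model the whole-application speedup can never exceed 1 / (1 - kernel_fraction), however fast the hardware functions are, which is why application-level gains are smaller than kernel-level gains.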


Author information

Corresponding author

Correspondence to Raymond R. Hoare.

Rights and permissions

Open Access This article is distributed under the terms of the Creative Commons Attribution 2.0 International License ( https://creativecommons.org/licenses/by/2.0 ), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

About this article

Cite this article

Hoare, R.R., Jones, A.K., Kusic, D. et al. Rapid VLIW Processor Customization for Signal Processing Applications Using Combinational Hardware Functions. EURASIP J. Adv. Signal Process. 2006, 046472 (2006). https://doi.org/10.1155/ASP/2006/46472

  • DOI: https://doi.org/10.1155/ASP/2006/46472
