SciELO - Scientific Electronic Library Online


Research on Biomedical Engineering

Print version ISSN 2446-4732; On-line version ISSN 2446-4740

Res. Biomed. Eng. vol.34 no.1 Rio de Janeiro Jan./Mar. 2018  Epub Jan 15, 2018 

Technical communication

Modeling and FPGA-based implementation of an efficient and simple envelope detector using a Hilbert Transform FIR filter for ultrasound imaging applications

Amauri Amorin Assef 1,2,*

Breno Mendes Ferreira 2

Joaquim Miguel Maia 1,3

Eduardo Tavares Costa 4

1 Graduate Program in Electrical and Computer Engineering, Electrical Engineering Department, Federal University of Technology - Parana, Curitiba, PR, Brazil.

2 Graduate Program in Energy Systems, Electrical Engineering Department, Federal University of Technology - Parana, Curitiba, PR, Brazil.

3 Graduate Program in Biomedical Engineering, Electronic Engineering Department, Federal University of Technology - Parana, Curitiba, PR, Brazil.

4 Biomedical Engineering Department, School of Electrical and Computing Engineering, State University of Campinas, Campinas, SP, Brazil.



Although envelope detection is a widely used method in medical ultrasound (US) imaging to demodulate the amplitude of the received echo signal before any back-end processing, novel hardware-based approaches have been proposed to reduce its computational cost and complexity. In this paper, we present the modeling and FPGA implementation of an efficient envelope detector based on a Hilbert Transform (HT) approximation for US imaging applications.


The proposed model exploits both the symmetry and the alternating zero-valued coefficients of an HT finite impulse response (FIR) filter to generate the in-phase and quadrature components that are necessary for the envelope computation. The hardware design was synthesized for a Stratix IV FPGA using Simulink and the integrated DSP Builder toolbox, and implemented on a Terasic DE4-230 board. The accuracy of our algorithm was evaluated with the normalized root mean square error (NRMSE) cost function, in comparison with the conventional method based on the absolute value of the discrete-time analytic signal computed via FFT.


Excellent agreement was achieved between the theoretical simulations and the experimental results. The NRMSE was 0.42% and the overall FPGA utilization was less than 1.5%. Additionally, the proposed envelope detector is capable of generating envelope data at every FPGA clock cycle after 19 cycles (0.48 µs) of latency.


The presented results corroborate the simplicity, flexibility and efficiency of our model for generating US envelope data in real-time, while reducing the hardware cost by up to 75%.

Keywords: Ultrasound; Envelope detection; Hilbert transform; FPGA; Simulink


In medical ultrasound (US) imaging, envelope detection is a commonly used digital signal processing (DSP) technique to extract the magnitude of the oscillating broadband radio-frequency (RF) signals from the high-frequency echo carrier before any back-end processing (Chang et al., 2007). Typically, three conventional methods can be used to extract the low-frequency envelope of the received echo signals: Hilbert Transform (HT) based demodulation, squaring and filtering based demodulation, and quadrature demodulation (Zhou and Zheng, 2015). As discussed by Levesque and Sawan (2009), although HT is more accurate and efficient than other mixing methods, filtering and quadrature algorithms are generally preferred because of their lower computational requirements.

To reduce the complexity of HT based demodulation, several hardware-based algorithms, typically implemented in programmable logic devices such as Field Programmable Gate Arrays (FPGAs), have been proposed by the research community (Chang et al., 2007; Hassan and Kadah, 2013; Levesque and Sawan, 2009; Qiu et al., 2012). Such demodulation algorithms involve extracting the analytic signal of the received RF signal using the HT. The analytic signal is a complex signal whose real part (I, in-phase) is the original signal and whose imaginary part (Q, quadrature) is the HT of the original signal, with a 90-degree phase shift in the operation band; its magnitude is calculated as the square root of the sum of the squares of the I and Q components (Schlaikjer et al., 2003). For example, Chang et al. (2007) proposed an envelope detector using the look-up table (LUT) method to compute the magnitude of the I/Q signals extracted from the quadrature demodulation. On the other hand, Levesque and Sawan (2009) presented a fully hardware-based quadrature demodulation processor that combines two finite impulse response (FIR) filters and a piecewise linear function to complete a square root unit. Qiu et al. (2012) used the same approach but included a CORDIC algorithm to calculate the modulus of the I/Q data. All these studies reported significant storage and/or arithmetic requirements, which can be a problem in terms of power consumption, delay and FPGA chip area occupation.

In this study, we present the modeling, validation and implementation of a fully FPGA-based digital envelope detector based on an optimized HT FIR filter for US imaging applications using the Matlab/Simulink (MathWorks, USA) software. In addition to the inherent symmetry of the coefficients, the proposed envelope detector exploits the alternating zero-valued coefficients in the HT FIR filter impulse response to achieve a cost-efficient hardware implementation with low complexity and latency. The discrete system is built using the DSP Builder development tool (Intel Corp., USA) and synthesized for an Intel Stratix IV FPGA. The accuracy of the design is evaluated by the normalized root mean square error (NRMSE), and its operating time is estimated.


HT based demodulation algorithms can be built using different methods and techniques (Levesque and Sawan, 2009; Qiu et al., 2012). However, due to its inherent stability, linear phase and ease of realization (DeBrunner and Wang, 2006; Soderstrand et al., 2000), a digital FIR filter is used in this model implementation.

The conventional digital FIR filter with constant coefficients of order N can be expressed in the following discrete convolution sum form:

$$y(n)=a_0x(n)+a_1x(n-1)+\cdots+a_Nx(n-N)=\sum_{i=0}^{N}a_i\,x(n-i) \tag{1}$$

where $n$ is the sample index ($n = 0, 1, 2, \ldots, N$), $x(n)$ is the input signal, $y(n)$ is the output signal and $a_i$ are the $N+1$ coefficients of the order-$N$ FIR filter. By taking advantage of the odd anti-symmetry of the impulse response (i.e., $a_0=-a_N$, $a_1=-a_{N-1}$, $\ldots$) that we are considering in this work, Equation 1 can be written as

$$y(n)=a_0[x(n)-x(0)]+a_1[x(n-1)-x(1)]+\cdots+a_{N/2}\,x(N/2)+\cdots+a_{N/2-1}[x(N/2+1)-x(N/2-1)]=a_{N/2}\,x(N/2)+\sum_{i=0}^{N/2-1}a_i[x(n-i)-x(i)] \tag{2}$$

As discussed by Zhou and Zheng (2015), the conventional FIR convolution algorithm can also be applied to an HT approximation, which is characterized by an impulse response with interleaved zero-valued coefficients (i.e., $a_0=a_2=\cdots=a_N=0$). Therefore, assuming an even order $N$ ($N+1$ taps) chosen so that zero-valued coefficients form the first and last entries of the impulse response, which requires $N=4+4n$, it can be shown that

$$y(n)=a_1[x(n-1)-x(1)]+a_3[x(n-3)-x(3)]+\cdots+a_{N/2-1}[x(N/2+1)-x(N/2-1)]=\sum_{i=1}^{N/4}a_{2i-1}[x(n-2i+1)-x(2i-1)]. \tag{3}$$

From (3), the proposed cost-efficient hardware-based HT FIR filter architecture is shown in Figure 1. The output signal Q(n) is the quadrature component of the signal produced by the HT FIR filter and I(n) is the in-phase component, which corresponds to the input signal x(n) delayed by an appropriate number of cycles to compensate for the phase delay of the FIR process employed for generating the Q(n) output. As described before, this scheme exploits the properties of the HT FIR filter coefficients to reduce the required number of multiplication operations and adders to N/4, and the number of shift registers to N/2-1, in the convolution sum algorithm.

Figure 1 Proposed HT FIR filter architecture to produce an efficient hardware realization. 
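
The reduction described above can be checked numerically. The following is a minimal Python sketch, not the authors' DSP Builder design: it reads the term x(2i-1) of Equation 3 as the sample x(n-N+2i-1) of the running input, assumes a zero initial filter state, and uses a hypothetical 9-tap coefficient set (`A_DEMO`) chosen only because it is anti-symmetric with interleaved zeros. The function names are illustrative.

```python
def fir_direct(a, x):
    """Direct-form FIR (Equation 1): y(n) = sum over i of a[i] * x(n - i)."""
    y = []
    for n in range(len(x)):
        acc = 0.0
        for i, coeff in enumerate(a):
            if n - i >= 0:          # samples before n = 0 are taken as zero
                acc += coeff * x[n - i]
        y.append(acc)
    return y


def fir_folded_hilbert(a, x):
    """Folded form: one multiplication per pair of anti-symmetric odd taps."""
    order = len(a) - 1
    if order % 4 != 0:
        raise ValueError("order must be a multiple of 4 (N = 4 + 4n)")

    def sample(k):
        # Zero initial state (assumption of this sketch).
        return x[k] if k >= 0 else 0.0

    y = []
    for n in range(len(x)):
        acc = 0.0
        for i in range(1, order // 4 + 1):
            k = 2 * i - 1           # odd tap index 1, 3, ..., N/2 - 1
            acc += a[k] * (sample(n - k) - sample(n - order + k))
        y.append(acc)
    return y


def in_phase(x, order):
    """I(n): the input delayed by N/2 samples to match the filter delay."""
    half = order // 2
    return [x[n - half] if n >= half else 0.0 for n in range(len(x))]


# Hypothetical 9-tap (N = 8) coefficient set, for illustration only.
A_DEMO = [0.0, -0.2, 0.0, -0.6, 0.0, 0.6, 0.0, 0.2, 0.0]
```

For an anti-symmetric coefficient set with zero-valued even taps, `fir_folded_hilbert` returns the same output as `fir_direct` while performing only N/4 multiplications per output sample.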

For simplicity, and following Levesque and Sawan (2009) and Qiu et al. (2012), who investigated a satisfactory trade-off between delay and filter order for envelope detectors, we chose to evaluate a 32nd-order HT FIR filter in this study. The filter coefficient values were calculated using the Matlab FDATool with the equiripple FIR filter design method (McClellan et al., 1973). The HT FIR filter impulse response with a normalized pass-band of 0.05 to 0.95 is presented in Figure 2.

Figure 2 Impulse response of the 33-tap HT FIR filter with negative odd symmetry and interleaved zero-valued coefficients. 
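
The two coefficient properties visible in Figure 2 can be reproduced with a much simpler design than the one used in the paper. The sketch below is an assumption-laden stand-in: it builds a Hamming-windowed ideal Hilbert transformer rather than the equiripple design obtained with FDATool, so the coefficient values differ from the authors', but the anti-symmetry and the interleaved zeros are the same. The function name `windowed_hilbert_taps` is hypothetical.

```python
import math


def windowed_hilbert_taps(order):
    """Hamming-windowed ideal Hilbert transformer with order + 1 taps.

    Not the equiripple design used in the paper; it only illustrates the
    anti-symmetry and the zero-valued even-indexed coefficients.
    """
    if order % 4 != 0:
        raise ValueError("order must be a multiple of 4 (N = 4 + 4n)")
    half = order // 2
    taps = []
    for k in range(order + 1):
        m = k - half                                  # index relative to the center tap
        ideal = 2.0 / (math.pi * m) if m % 2 else 0.0 # ideal response is zero for even m
        window = 0.54 - 0.46 * math.cos(2.0 * math.pi * k / order)
        taps.append(ideal * window)
    return taps


taps = windowed_hilbert_taps(32)                      # 33 taps, as in the text
unique_multipliers = [taps[k] for k in range(1, 16, 2)]
```

With N = 32, only the eight odd-indexed coefficients on one side of the center (`unique_multipliers`) need a multiplier, which matches the N/4 figure given in the text.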

The FPGA-based envelope detector was modeled in Simulink using the integrated DSP Builder toolbox, which allows fast and automatic generation of hardware description language (HDL) code. Figure 3 shows the top-level design of the proposed hardware model. Initially, the “Input” block casts double-precision RF data loaded from the Matlab workspace into a 16-bit signed fixed-point representation for hardware efficiency. Then, the HT FIR filter structure computes the I and Q signals of the input RF_signal, exploiting the coefficient properties to minimize the amount of computation needed. As a result, the implemented DSP modeling function was reduced to: (1) 9 “Parallel Adder Subtractor” blocks, where the + and - operators determine whether each input is added to or subtracted from the total; (2) 15 “Delay” blocks with two pipeline stages; and (3) 8 shared “Multiplier” blocks based on predefined signed fractional coefficients, which are imported from the Matlab workspace and stored as constants. Following the 8-input “Parallel Adder” block, the output bit width was limited to 16 bits to save hardware resources in the subsequent signal processing chain. After the HT FIR filter, the sum of the squares of the I and Q signals was computed by the “Multiply Add” block with two multipliers. Before the next operation, an “AltBus” module was used to limit the bit width of the output signal to 32 bits. Finally, the “Square Root” block returns the square root of its argument with 16-bit resolution, which is delivered through the envelope “Output” block. The generated output signals I, Q and envelope data, labeled as In, Qn and Envelope, respectively, were exported to the Matlab workspace for subsequent off-line analysis. To complete the hardware implementation, a “ROM” block that maps data to an embedded RAM in the FPGA was included to store the echo signal.

Figure 3 Architecture implementation of the FPGA-based envelope detector using the DSP Builder Blockset. 
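
The magnitude stage of this chain (squares, sum, square root) can be sketched in integer arithmetic. The Python fragment below is only an illustration of the bit-width reasoning: the 16-bit signed inputs and the 32-bit sum follow the description above, whereas the Q15 scaling in `to_q15`, the saturation behavior and the use of an integer square root are assumptions of this sketch and are not taken from the DSP Builder model.

```python
import math


def to_q15(value):
    """Quantize a value in [-1, 1) to a 16-bit signed integer, saturating.

    The Q15 scaling is an assumption; the text only states 16-bit signed
    fixed-point.
    """
    scaled = int(round(value * 32768))
    return max(-32768, min(32767, scaled))


def envelope_fixed_point(i_sample, q_sample):
    """Envelope of one 16-bit I/Q pair: floor(sqrt(I^2 + Q^2)).

    The sum of squares is at most 2 * 2**30 = 2**31, so it fits in an
    unsigned 32-bit word, and its square root fits in 16 bits.
    """
    sum_of_squares = i_sample * i_sample + q_sample * q_sample
    return math.isqrt(sum_of_squares)
```

For example, `envelope_fixed_point(to_q15(0.3), to_q15(0.4))` gives the envelope of one sample pair in the same integer scale as the inputs.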

To evaluate the performance of the proposed model, we used real US data (2000 samples) captured by a 3.2 MHz central frequency AT3C52B (Broadsound Corp., Taiwan) convex array transducer connected to a US research system that has been developed at our university (Assef et al., 2012). This RF signal was acquired from a tissue-cyst mimicking phantom (84-317, Nuclear Associates) and sampled at 40 MHz with 12-bit resolution. The complete acquisition setup is described by Assef et al. (2016). Once verified and validated in both Simulink and ModelSim (Intel Corp., USA), the generated VHDL (Very High Speed Integrated Circuits HDL) output files were synthesized and compiled with the Quartus II (Intel Corp., USA) software. For the experimental implementation, we used a DE4-230 FPGA development board (Terasic Tech., Taiwan), containing a Stratix IV EP4SGX230KF40C2 FPGA, and the SignalTap II Logic Analyzer, available in Quartus II, for data acquisition.

The accuracy of the envelope detector model was evaluated graphically and quantified by the NRMSE cost function in comparison with the ideal HT (Chang et al., 2007), calculated from the absolute value of the discrete-time analytic signal (DTAS) via the fast Fourier transform (FFT) (Marple, 1999) in Matlab.
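
The reference computation can be sketched as follows. This is a plain-Python illustration, not the Matlab code used in the study: `analytic_envelope` follows the frequency-domain construction of the discrete-time analytic signal (keep the DC and Nyquist bins, double the positive-frequency bins, zero the negative-frequency bins), using a direct O(n²) DFT for self-containment. The text does not state how the RMSE is normalized, so dividing by the range of the reference in `nrmse_percent` is an assumption.

```python
import cmath
import math


def dft(x, inverse=False):
    """Direct O(n^2) discrete Fourier transform (adequate for short records)."""
    n = len(x)
    sign = 1 if inverse else -1
    out = []
    for k in range(n):
        acc = 0j
        for t in range(n):
            acc += x[t] * cmath.exp(sign * 2j * math.pi * k * t / n)
        out.append(acc / n if inverse else acc)
    return out


def analytic_envelope(x):
    """Envelope as the magnitude of the discrete-time analytic signal."""
    n = len(x)
    if n % 2:
        raise ValueError("this sketch handles even-length records only")
    spectrum = dft(x)
    for k in range(1, n // 2):
        spectrum[k] *= 2            # double the positive frequencies
    for k in range(n // 2 + 1, n):
        spectrum[k] = 0j            # remove the negative frequencies
    return [abs(z) for z in dft(spectrum, inverse=True)]


def nrmse_percent(reference, estimate):
    """RMSE normalized by the reference range (normalization is assumed)."""
    squared = [(r - e) ** 2 for r, e in zip(reference, estimate)]
    rmse = math.sqrt(sum(squared) / len(squared))
    return 100.0 * rmse / (max(reference) - min(reference))
```

For records as long as the 2000-sample signal used here, an FFT routine would normally replace the direct DFT; the construction of the analytic signal is unchanged.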


Figures 4a and 4b show the graphical comparison between the Matlab simulation and the experimental results; Figure 4b is an enlargement of Figure 4a from 11 to 14 µs, for better comparison. The original input data (RF signal), the simulated envelope obtained with the reference method (DTAS-FFT response) and the resulting envelope information acquired by the FPGA (FPGA HT FIR response) are presented. As can be seen, excellent agreement was achieved between the reference simulation and our method, which was also confirmed by the calculated NRMSE of 0.42% and by the respective frequency spectrum responses (Figure 5).

Figure 4 Comparison between the simulation and experimental results. (a) Original input RF signal and comparison between the HT envelope information simulated in Matlab by the DTAS-FFT method and processed by the HT FIR hardware implementation in the FPGA device. (b) Enlargement of (a) from 11 to 14 µs, for better illustration. 

Figure 5 Comparison between the magnitude responses of the Matlab simulation (DTAS-FFT response) and experimental results (FPGA HT FIR response).  

In terms of FPGA clock cycles, the latency is 16 clock cycles for the I and Q computation and 3 clock cycles for the square and square root operations. As a result, the proposed 33-tap HT FIR filter model is capable of generating real-time envelope data at every FPGA clock cycle after 19 clock cycles (N/2 + 3) of latency. Therefore, 2019 clock cycles at 40 MHz are needed to complete the envelope detection of the 2000-sample record, so that the fastest triggering frequency is about 19.81 kHz.

Additionally, other filters with order 8, 16, 64 and 128 were also tested using this method. As expected, the latency values were 0.18, 0.28, 0.88 and 1.68 µs, respectively. However, compared with the ideal HT, the filters with order 8 and 16 were not accurate enough to compute the envelope data, resulting in a poor NRMSE of 47.45% and 11.45%, respectively. On the other hand, although the higher-order filters produced an excellent NRMSE (<1%), no significant difference in the results was found, and they came at the expense of additional computation time.
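
The latency and trigger-rate figures above follow from the N/2 + 3 cycle count and the 40 MHz clock. A short Python sketch of that arithmetic is given below; the function names are illustrative, and the 2000-sample record length is taken from the acquisition described earlier.

```python
CLOCK_HZ = 40e6  # FPGA clock used in the text


def latency_cycles(order):
    """N/2 cycles for the I/Q computation plus 3 for squaring and square root."""
    return order // 2 + 3


def latency_ns(order):
    """Latency in nanoseconds at the 40 MHz clock (25 ns per cycle)."""
    return latency_cycles(order) * 1e9 / CLOCK_HZ


def max_trigger_rate_hz(samples, order):
    """Fastest trigger rate when each record takes samples + latency cycles."""
    return CLOCK_HZ / (samples + latency_cycles(order))
```

For orders 8, 16, 32, 64 and 128 this gives 7, 11, 19, 35 and 67 cycles, that is 175, 275, 475, 875 and 1675 ns, which the text reports rounded to two decimal places in µs; `max_trigger_rate_hz(2000, 32)` corresponds to the 2019-cycle case.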

The FPGA resource utilization of the hardware design can be summarized as follows: maximum operating frequency of 301.84 MHz; 529 ALUTs (<1%); 496 dedicated logic registers (<1%); 1 PLL (13%); and 18 18-bit DSP block elements (1.4%).


The fully hardware-based digital envelope detector model presented here offers an excellent alternative to other demodulation methods that require, for example, the use of complex FFT algorithms (Hans, 2005) or mixing with sine and cosine functions (Qiu et al., 2012) to compute the analytic signal of the RF echo data in real time. The proposed model is simpler, easier to implement and computationally more efficient in terms of hardware requirements, while yielding similar results. Additionally, as DSP Builder automatically translates the Simulink design into VHDL code, biomedical students and researchers do not need previous knowledge of HDL programming to implement, simulate and synthesize the algorithm in an FPGA. Consequently, this methodology accelerates the investigation of new DSP algorithms and shortens the development cycle considerably.

In comparison with other conventional solutions (Hassan and Kadah, 2013; Schlaikjer et al., 2003), the symmetry properties and interleaved zero-valued tap coefficients of the HT FIR filter impulse response were exploited to reduce by approximately 75% the number of 18x18 DSP blocks needed in the FPGA to realize the filter. Consequently, our method consumed only 16 DSP blocks to multiply the filter coefficients (two DSP blocks for each multiplication) and two DSP blocks to square the incoming I and Q signals. However, optimized filter designs with more coefficients N and/or more bits can be evaluated to increase performance (Schlaikjer et al., 2003; Soderstrand et al., 2000), resulting in the usage of N/2+2 DSP blocks. On the other hand, as the latency of the envelope detection depends on the number of taps needed for the HT FIR filter (Chang et al., 2007; DeBrunner and Wang, 2006), the major drawback of this method is that an increase in the filter order also increases the delay. Consequently, a longer time will be required for the envelope computation process.
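
The N/2+2 figure can be traced back to the counts given above: N/4 coefficient multiplications at two DSP blocks each, plus two DSP blocks for squaring I and Q. A small Python sketch of this bookkeeping follows; the function name and its default arguments are illustrative and simply encode the numbers stated in the text.

```python
def dsp_blocks(order, dsp_per_multiply=2, squaring_blocks=2):
    """DSP blocks used by the folded HT FIR filter plus the I/Q squaring stage."""
    multipliers = order // 4        # one multiplier per unique nonzero coefficient
    return multipliers * dsp_per_multiply + squaring_blocks
```

For the 32nd-order filter this gives 8 multipliers and 18 DSP blocks, in agreement with the reported utilization.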

In this work, we modeled a 33-tap HT FIR filter, which generates envelope data with a total latency of 0.48 µs. Theoretically, considering a frame with 128 scanlines, for example, the total time to obtain the envelope information of the frame is 6.46 ms, which corresponds to a frame rate of about 154 frames per second and thus satisfies the requirement for real-time US imaging (Chang et al., 2007; Jensen et al., 2005). Obviously, this frame rate tends to decrease when the other stages of the echo signal processing chain, such as logarithmic compression and scan conversion, are taken into account.
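
As a worked check of these figures, assuming each scanline consists of the same 2000 samples plus the 19 cycles of latency (2019 clock cycles at 40 MHz, as in the results above):

$$T_{\text{line}}=\frac{2000+19}{40\times10^{6}\ \text{Hz}}=50.475\ \mu\text{s},\qquad T_{\text{frame}}=128\times T_{\text{line}}\approx 6.46\ \text{ms},\qquad \frac{1}{T_{\text{frame}}}\approx 154.8\ \text{frames/s}$$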

As the signals shown in Figure 4 are very close, the NRMSE was a good index to assess the agreement between the results obtained by our model algorithm and those of the reference method. According to the scientific literature, the performance of a model is considered excellent if the NRMSE is less than 10%, which corroborates the effectiveness of the proposed envelope detector model. Also, it can be seen in Figure 5 that there are minor differences between the magnitude responses within the pass-band and that there is no frequency-dependent attenuation, as expected (Levesque and Sawan, 2009; Zhou and Zheng, 2015). The minor differences can be explained by the coefficient rounding and by the chosen filter length, which can be adjusted to improve the HT FIR filter response, in addition to the difference between the double-precision Matlab implementation and the 16-bit fixed-point precision used in the model, as discussed by DeBrunner and Wang (2006) and Hassan and Kadah (2013).

In conclusion, we have successfully modeled and evaluated an efficient FPGA-based envelope detector for US imaging applications. The HT FIR filter algorithm was realized easily and quickly by combining Matlab/Simulink and the DSP Builder tool, and proved able to produce accurate results at a lower computational cost.


The authors would like to thank CNPq and CAPES for their financial support. We would also like to thank the Intel Corporation for the donation of the DE4-230 FPGA board, which was used in this work.

How to cite this article: Assef AA, Ferreira BM, Maia JM, Costa ET. Modeling and FPGA-based implementation of an efficient and simple envelope detector using a Hilbert Transform FIR filter for ultrasound imaging applications. Res Biomed Eng. 2018; 34(1):. DOI: 10.1590/2446-4740.02417


Assef AA, Maia JM, Costa ET. Initial experiments of a 128-channel FPGA and PC-based ultrasound imaging system for teaching and research activities. In: Proceedings of the 38th Annual International Conference of the Engineering in Medicine and Biology Society (IEEE 2016); 2016 Aug 16-20; Orlando, FL. USA: IEEE; 2016. p. 5172-5.

Assef AA, Maia JM, Schneider FK, Costa ET, Button VL. Design of a 128-channel FPGA-based ultrasound imaging beamformer for research activities. In: Proceedings of the 2012 IEEE International Ultrasonics Symposium (IUS 2012); 2012 Oct 7-10; Dresden, Germany. USA: IEEE; 2012. p. 635-8. 10.1109/ULTSYM.2012.0158.

Chang JH, Yen JT, Shung KK. A novel envelope detector for high-frame rate, high-frequency ultrasound imaging. IEEE Trans Ultrason Ferroelectr Freq Control. 2007; 54(9):1792-801. PMid: 17941385.

DeBrunner LS, Wang Y. Optimizing filter order and coefficient length in the design of high performance FIR filters for high throughput FPGA implementations. In: 4th Digital Signal Processing Workshop. Proceedings of the 12th-Signal Processing Education Workshop; 2006 Sep 24-27; Teton National Park, WY. USA: IEEE; 2006. p. 608-12.

Hans V. Signal processing of complex modulated ultrasonic signals. In: Merzkirch W, editor. Fluid mechanics of flow metering. Germany: Springer Berlin Heidelberg; 2005. p. 79-94.

Hassan MA, Kadah YM. Digital signal processing methodologies for conventional digital medical ultrasound imaging system. Am J Biomed Eng. 2013; 3(1):14-30.

Jensen JA, Holm O, Jensen LJ, Bendsen H, Nikolov SI, Tomov BG, Munk P, Hansen M, Salomonsen K, Hansen J, Gormsen K, Pedersen HM, Gammelmark KL. Ultrasound research scanner for real-time synthetic aperture data acquisition. IEEE Trans Ultrason Ferroelectr Freq Control. 2005; 52(5):881-91. PMid: 16048189.

Levesque P, Sawan M. Real-time hand-held ultrasound medical-imaging device based on a new digital quadrature demodulation processor. IEEE Trans Ultrason Ferroelectr Freq Control. 2009; 56(8):1654-65. PMid: 19686981.

Marple L. Computing the discrete-time "analytic" signal via FFT. IEEE Trans Sig Process. 1999; 47(9):2600-3.

McClellan J, Parks TW, Rabiner L. A computer program for designing optimum FIR linear phase digital filters. IEEE Trans Audio Electroacoust. 1973; 21(6):506-26.

Qiu W, Yu Y, Tsang FK, Sun L. An FPGA-based open platform for ultrasound biomicroscopy. IEEE Trans Ultrason Ferroelectr Freq Control. 2012; 59(7):1432-42. PMid: 22828839.

Schlaikjer M, Bagge JP, Sorensen OM, Jensen JA. Trade off study on different envelope detectors for B-mode imaging. In: Proceedings of the 2003 IEEE International Ultrasonics Symposium (IUS 2003); 2003 Oct 5-8; Honolulu, HI. USA: IEEE; 2003. p. 1938-41.

Soderstrand MA, Johnson LG, Arichanthiran H, Hoque MD, Elangovan R. Reducing hardware requirement in FIR filter design. In: Proceedings of the 2000 IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP '00); 2000 June 5-9; Istanbul, Turkey. USA: IEEE; 2000. p. 3275-8.

Zhou H, Zheng YF. An efficient quadrature demodulator for medical ultrasound imaging. Front Inf Technol Electr Eng. 2015; 16(4):301-10.

Received: August 14, 2017; Accepted: December 06, 2017

*Corresponding author: Amauri Amorin Assef, Graduate Program in Energy Systems, Graduate Program in Electrical and Computer Engineering, Electrical Engineering Department, Federal University of Technology - Parana, Av. Sete de Setembro, 3165, CEP 80230-901, Curitiba, PR, Brazil. E-mail:

This is an Open Access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.