Experiment Design Regularization-Based Hardware/Software Codesign for Real-Time Enhanced Imaging in Uncertain Remote Sensing Environment
 A. Castillo Atoche^{1, 2},
 D. Torres Roman^{1} and
 Y. Shkvarko^{1}
DOI: 10.1155/2010/254040
© A. Castillo Atoche et al. 2010
Received: 14 July 2009
Accepted: 13 January 2010
Published: 9 March 2010
Abstract
A new aggregated Hardware/Software (HW/SW) codesign approach is addressed for optimizing the digital signal processing techniques for enhanced imaging with real-world uncertain remote sensing (RS) data, based on the concept of descriptive experiment design regularization (DEDR). We consider the applications of the developed approach to typical single-look synthetic aperture radar (SAR) imaging systems operating in real-world uncertain RS scenarios. The software design is aimed at the algorithmic-level decrease of the computational load of the large-scale SAR image enhancement tasks. The innovative algorithmic idea is to incorporate into the DEDR-optimized fixed-point iterative reconstruction/enhancement procedure the convex convergence enforcement regularization via constructing the proper multilevel projections onto convex sets (POCS) in the solution domain. The hardware design is performed via systolic array computing based on a Xilinx Field Programmable Gate Array (FPGA) XC4VSX35-10ff668 and is aimed at implementing the unified DEDR-POCS image enhancement/reconstruction procedures in a computationally efficient multilevel parallel fashion that meets the (near) real-time image processing requirements. Finally, we comment on the simulation results, which indicate significantly increased performance efficiency, both in resolution enhancement and in computational complexity reduction, gained with the proposed aggregated HW/SW codesign approach.
1. Introduction
In this paper, we address a new aggregated Hardware/Software (HW/SW) codesign approach to optimization of the digital signal and image processing techniques required for enhanced remote sensing (RS) of the environment with high-resolution array radar and synthetic aperture radar (SAR) systems. At the algorithm-design level, the RS imaging problem is treated as an ill-posed nonlinear inverse problem of reconstructing the spatial spectrum pattern (SSP) of the backscattered field distributed over the remotely sensed scene via processing the SAR data signals distorted in the uncertain stochastic measurement channel [1–6]. The operational scenario uncertainties are attributed to inevitable random signal perturbations in the inhomogeneous propagation medium [1, 2, 4], possible imperfect radar/SAR system calibration [1, 3], and SAR carrier trajectory deviations [3, 5, 6]. The unified approach that we address to solve such a problem is based on the recently proposed concept of descriptive experiment design regularization (DEDR) [7, 8]. The general DEDR method constructed in [7, 8] incorporates into the minimum risk (MR) nonparametric estimation strategy [4] the experiment-design-motivated constraints of image identifiability for the discrete-form signal formation operator (SFO) specified by the employed signal modulation format [4–6]. On the one hand, a considerable advantage of the DEDR paradigm is its flexibility in designing the desirable error metrics in the corresponding image representation space via defining different descriptive cost functions [7, 9]. On the other hand, the crucial limitations of the DEDR method relate to the necessity of simultaneously performing the solution-dependent SFO inversion operations and the adaptive adjustments of the degrees of freedom of the overall DEDR image enhancement techniques ruled by the employed fixed-point iterative process [8].
For real-world large-scale RS scenes, such an adaptive full-format DEDR-optimal method turns out to be computationally extremely expensive and therefore cannot be recommended as a practical technique realizable in (near) real time [10]. The innovative idea of this paper is to aggregate the DEDR-optimal fixed-point iterative reconstruction/enhancement procedures developed in the previous studies [7, 8, 10] with the multilevel robustness and convergence enforcing regularization via constructing the proper projections onto convex sets (POCS) in the solution domain. The established POCS-regularized iterative DEDR technique is performed separately along the range and azimuth directions over the scene frame, making optimal use of the range-azimuth sparseness properties of the employed radar/SAR modulation format. Thus, at the SW codesign stage, we address two conceptually innovative propositions that distinguish our approach from the previous studies [7, 10]. First, two possible observation scenarios (instead of one) are now unified under the DEDR paradigm for the HW/SW codesign, namely, (i) the regular case without model uncertainties and (ii) the uncertain scenario with random perturbations in the SFO. Second, the POCS regularization is proposed to be performed in an aggregated multilevel fashion to make optimal use of the nontrivial RS system model information for constructing the corresponding robustness and convergence enforcing POCS operators. In particular, the positivity and range-azimuth orthogonalization projectors of [10] are aggregated with the point spread function (PSF) sparseness enforcing sliding window projectors, which act in parallel over both range and azimuth image frames and set the corresponding PSF pixel values to zero outside their specified support regions.
Such aggregated POCS regularization drastically speeds up the resulting fixed-point iterative DEDR techniques, making them well suited to the systolic computational implementation form; that is, it provides the SW algorithmic base for the further HW codesign level of the problem treatment.
At the HW codesign stage, we propose to pursue the System-on-Chip (SoC) single Field Programmable (FP) unit integration approach [9–14], which allows efficient coupling/integration of a number of predetermined complex components. Such a programmable unit is a viable solution for rapid prototyping and digital implementation of the radar/SAR image enhancement techniques developed at the SW codesign stage, instead of designing the process in a common personal computer (PC) [11–14]. The main advantage of the proposed FPSoC platform is that all required component designs, including the embedded processor unit, memory, and peripherals, are algorithmically "adapted" for the particular developed POCS-regularized iterative fixed-point DEDR image enhancement techniques. Therefore, at the HW design stage, the novel contribution of this study is twofold: first, the addressed HW/SW codesign methodology is aimed at an HW implementation of the developed software using systolic arrays as coprocessor units; second, the proposed systolic-based processing architecture is particularly adapted for implementation of the unified DEDR-POCS techniques in a computationally efficient fashion that meets the (near) real-time overall RS imaging system requirements. We conclude this study with the analysis of the simulation results related to enhancement of the real-world degraded large-scale SAR imagery (i.e., acquired in uncertain operational scenarios), which indicate the significantly increased reconstruction efficiency gained with the proposed HW/SW codesign approach.
2. Background
2.1. Continuous-Form Problem Model
The random kernel of the perturbed random signal formation operator (SFO) given by (1) defines the signal field formation model. Its mean is referred to as the nominal modulation law in the data formation channel defined by the time-space modulation of signals employed in a particular imaging radar/SAR system [3], and the variation about the mean models the stochastic perturbations of the wavefield at different propagation paths through a zero-mean multiplicative noise term that models random propagation perturbations in the medium (the so-called general Rytov model [3, 5, 6]). The fields in (1) are assumed to be zero-mean complex-valued Gaussian random fields [3]. Next, we assume an incoherent nature of the backscattered field. This is naturally inherent to the RS experiments [1, 3, 5, 6] and determines the form of the object field correlation function, in which e(x) and b(x) are referred to as the object random complex scattering function and its average power scattering function or spatial spectrum pattern (SSP), respectively.
The problem of enhanced RS imaging is to develop a signal processing method for performing the efficient estimation of the SSP b(x) by processing the available radar/SAR measurements of the data wavefield u( y ). Such estimate of the SSP b(x) is referred to as the desired reconstructed RS image of the remotely sensed scene.
2.2. Discrete-Form Problem Model
Here, the matrix-form SFO is the discrete-form approximation of the integral SFO defined by the EO (1), and e, n, u represent zero-mean vectors composed of the corresponding decomposition (sampling) coefficients [7]. These vectors are characterized by their correlation matrices: R_{e} = D = D(b) = diag(b) (a diagonal matrix with vector b at its principal diagonal) for e, R_{n} for n, and an averaged signal term plus R_{n} for u, where the averaging is performed over the randomness of the SFO perturbations, characterized by a probability density function unknown to the observer, and superscript ^{+} stands for Hermitian conjugate (conjugate transpose). Vector b is composed of the elements b_{k}; k = 1, ..., K, and is referred to as a K-D vector-form representation of the SSP. The SSP vector b is associated with the so-called lexicographically ordered image pixels [7, 9]. The corresponding conventional K_{y} × K_{x} rectangular frame-ordered scene image B relates to its lexicographically ordered vector-form representation b via the standard row-by-row expansion (so-called lexicographical reordering) procedure, B = L{b} [9]. Note that in the simple case of a certain operational scenario [1, 3, 7], the discrete-form (i.e., matrix-form) SFO S is assumed to be deterministic, in which case the random perturbation term in (3) is irrelevant.
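The lexicographical reordering between the frame B and the vector b can be illustrated with a minimal NumPy sketch. The function names `lexicographic_vector` and `frame_from_vector` are hypothetical and introduced only for this illustration; the sketch assumes the row-by-row expansion described above.

```python
import numpy as np

def lexicographic_vector(frame):
    """Stack the rows of a K_y x K_x frame into a K-D vector (row-by-row expansion)."""
    return np.asarray(frame).reshape(-1)

def frame_from_vector(vector, k_y, k_x):
    """Inverse reordering: rebuild the K_y x K_x frame from the K-D vector."""
    return np.asarray(vector).reshape(k_y, k_x)

# Toy 2 x 3 scene frame (values are arbitrary illustration data).
B = np.arange(6).reshape(2, 3)
b = lexicographic_vector(B)
# diag(b): diagonal matrix with the vector b on its principal diagonal.
D = np.diag(b)
```

For a K_{y} × K_{x} frame the vector has K = K_{y} K_{x} elements, so diag(b) is a K × K matrix; this is one reason why the full-format operators become very large for realistic scenes.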
The digital enhanced RS imaging problem is formally stated as follows: to reconstruct the scene pixel-frame image via lexicographical reordering of the SSP vector estimate obtained from whatever discrete measurements of the recorded radar/SAR data u are available. The reconstructed SSP vector is an estimate of the second-order statistics of the scattering vector e observed through the perturbed SFO (3), contaminated with additive noise n and also corrupted with signal-dependent multiplicative noise. Hence, the enhanced RS imaging problem at hand must be qualified and treated as a statistical nonlinear inverse problem with model uncertainties. High-resolution sensing implies formation of the RS image based on some statistically optimal solution of such an inverse problem that is robust against the problem model uncertainties. In this paper we propose to unify the POCS regularization with the DEDR method originally developed in [7, 8].
3. Unified DEDR Method
3.1. DEDR Strategy for Certain Operational Scenario
where defines the vector composed of the principal diagonal of the embraced matrix.
that implies the minimization of the weighted sum of the systematic and fluctuation errors in the desired estimate where the selection (adjustment) of the regularization parameter and the weight matrix A provide the additional experiment design degrees of freedom incorporating any descriptive properties of a solution if those are known a priori [3, 7]. It is easy to recognize that the strategy (6) is a structural extension of the statistical minimum risk estimation strategy [4] for the nonlinear spectral estimation problem at hand because in both cases the balance between the gained spatial resolution and the noise suppression in the resulting estimate is to be optimized.
3.2. Extended DEDR Strategy for Uncertain Scenario
where the regularization parameter and the metrics inducing weight matrix A compose the processing level "degrees of freedom" of the DEDR method.
3.3. DEDR-Optimal Solution Operators
 (1) SO for certain operational scenario follows directly from the solution to the optimization problem (6) found in the previous study [7], which results in (14), where (15)
represents the so-called regularized reconstruction operator [7]; is the noise whitening filter, and the adjoint (i.e., Hermitian transpose) SFO defines the matched spatial filter in the conventional signal processing terminology [1, 3];
 (2) SO for uncertain operational scenario follows as a structural extension of (14) for the augmented (diagonal loaded) that yields [8] (16), where (17)
represents the robustified reconstruction operator for the uncertain scenario.
3.4. DEDR-Related Imaging Techniques
 (1)
 (2) RSF. The RSF method implies no preference to any prior model information (i.e., A = I) and balanced minimization of the systematic and noise error measures in (9), (11) by adjusting the regularization parameter to the inverse of the signal-to-noise ratio (SNR). In that case the SO becomes the Tikhonov-type robust spatial filter (RSF) [7]: (19)
in which the RSF regularization parameter _{ RSF } is adjusted to a particular operational scenario model, namely, ( / ) for the case of a certain operational scenario [7], and ( / ) in the uncertain operational scenario case [8], respectively, where represents the white observation noise power density, is the average a priori SSP value, and + corresponds to the augmented noise power density in the correlation matrix specified by (13).
 (3) RASF. In the Bayesian statistically optimal problem treatment, and A are adjusted in an adaptive fashion following the Bayesian minimum risk strategy [8], that is, diag( ), the diagonal matrix with the estimate at its principal diagonal, in which case the SOs (14), (16) themselves become solution-dependent operators that result in the following robust adaptive spatial filters (RASFs):
(20) for the certain operational scenario [7], and (21) for the uncertain operational scenario [8], respectively. Next, in all practical RS scenarios [1–3] (and specifically, in SAR uncertain imaging applications [2, 7, 8]), it is a common practice to accept the robust white additive noise model, that is, , attributing the unknown correlated noise component as well as multiplicative speckle noise to the composite uncertain noise term in (2), in which case I with the composite noise power density , the initial observation noise variance augmented by the loading factor specified by (13).
with = ; , and , , respectively. Any other feasible adjustments of the DEDR degrees of freedom (the regularization parameters , , and the weight matrix A) provide other possible DEDRrelated SSP reconstruction techniques, that we do not consider in this paper.
4. POCS-Regularized DEDR Method
Because of the extremely high dimension 10^{12} of the operator inversions required to form the corresponding SOs specified by (20), (21), it is questionable to recommend the general-form DEDR-optimal method (22) as a practical enhanced RS imaging technique realizable in (near) real computational time. Hence, one has to proceed from the conventional-form RSF and RASF algorithms (which require the cumbersome operator inversions (20)–(22)) to more computationally efficient iterative techniques that do not involve the large-scale operator inversions and that incorporate the convergence enforcement regularization into the DEDR procedure via constructing the proper projections onto convex sets (POCS) in the solution domain. In the RS imaging applications considered here, such POCS is aimed at factorizing the overall procedures over the orthogonal range (y) and azimuth (x) coordinates in the scene frame, making also optimal use of the sparseness properties of the employed radar/SAR modulation format. Thus, the innovative idea is to perform the POCS regularization in an aggregated multilevel fashion. In particular, we propose to aggregate the positivity and range-azimuth orthogonalization projectors constructed previously in [10] with the point spread function (PSF) sparseness enforcing sliding window projectors, which act in parallel over both range and azimuth image frames and set the corresponding PSF pixel values to zero outside their specified support regions. In this section, we address such a unified multilevel POCS-regularized iterative DEDR method as an extension of the previously proposed single-level DEDR-POCS [10], which we develop here in two stages.
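Two of the projectors named above can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the operators constructed in [10]: the function names are hypothetical, the PSF support region is assumed to be a band of half-width `half_width` around the principal diagonal of a square matrix, and the range-azimuth orthogonalization projector is not modeled.

```python
import numpy as np

def project_positive(image):
    """Positivity projector: clip negative pixel values to zero."""
    return np.maximum(image, 0.0)

def project_band_support(psf_matrix, half_width):
    """Sparseness projector (assumed band support): zero all entries
    farther than half_width from the principal diagonal."""
    rows, cols = np.indices(psf_matrix.shape)
    inside = np.abs(rows - cols) <= half_width
    return np.where(inside, psf_matrix, 0.0)

# Toy data for illustration only.
clipped = project_positive(np.array([-1.0, 2.0, -0.5]))
banded = project_band_support(np.ones((4, 4)), half_width=1)
```

Both maps are idempotent (applying them twice changes nothing) and the sets they project onto are convex, which is what a POCS scheme requires of its component operators.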
4.1. First Stage: Fixed-Point Iterative DEDR Algorithm
is formed as an outcome of the MSF algorithm from the DEDR family (22) specified for the adjoint SFO solution operator .
4.2. Second Stage: Multilevel POCS Regularization
in which the zero-step iteration L is formed using the conventional (i.e., low-resolution) MSF imaging algorithm (30), the aggregated convergence enforcing POCS regularizing operator is constructed by (31), and the matrix-form fixed-point iteration operator is specified by (27).
4.3. DEDR-POCS Convergence
and is guaranteed to converge to the point in the intersection of the convex sets specified by provided for all 1, 2, 3 regardless of the initialization, which is a direct consequence of the fundamental theorem of POCS [9, page 1066]. Note that the employed specifications of the projectors in (33), that is, ; ; ; with 1 for all and L , satisfy these POCS convergence conditions, in which case the formal convergent POCS procedure (34) becomes the fixed-point DEDR-POCS algorithm developed above and given by (32).
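The convergence behavior stated above can be observed on a toy problem. The sketch below is an illustration only and does not use the paper's projectors: it assumes two simple convex sets (the nonnegative orthant and the hyperplane of vectors whose elements sum to one), unit relaxation, and hypothetical function names.

```python
import numpy as np

def project_nonnegative(x):
    """Projection onto the convex set {x : x_k >= 0}."""
    return np.maximum(x, 0.0)

def project_unit_sum(x):
    """Projection onto the hyperplane {x : sum(x) = 1}."""
    return x - (x.sum() - 1.0) / x.size

def pocs(x0, projectors, iterations=200):
    """Cyclically apply the projectors, starting from x0."""
    x = np.asarray(x0, dtype=float)
    for _ in range(iterations):
        for project in projectors:
            x = project(x)
    return x

projectors = [project_unit_sum, project_nonnegative]
x_a = pocs([-2.0, 3.0, 0.5], projectors)
x_b = pocs([5.0, 5.0, 5.0], projectors)
```

Both runs end at a point that satisfies both constraints, although the two limit points differ: POCS guarantees convergence into the intersection, not to a unique point in it.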
Now we are ready to proceed with the hardware codesign implementation stage of our development.
5. Hardware/Software Codesign Methodology
The all-software execution of the prescribed RS image formation and reconstruction operations in modern high-speed personal computers (PC) or any existing digital signal processors (DSP) may be highly time consuming [15]. The high computational complexity of the general-form DEDR-POCS algorithms makes them unacceptable for real-time PC-aided implementation.
When a coprocessor-based solution is employed in the HW/SW codesign architecture, the computational time can be drastically reduced [16]. As an introductory example, consider computation of the matrix product AB, where A and B are matrices of sizes k × m and m × p, respectively. To execute this product in a conventional sequential way, k·m·p multiply-accumulate (MAC) operations are required. Therefore, the computational time required by a sequential processor or a high-speed PC for the all-software execution of the matrix product is of the order O(kmp). With the incorporation of a parallel and/or pipelined coprocessor alongside an embedded processor, the required computational time is reduced to O(kmp/n), where n defines the employed parallelism level.
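The operation count in this example can be checked with a short sketch. The function names are hypothetical, and the parallel figure is an idealized estimate that assumes n MAC units are kept fully busy with no communication overhead.

```python
import math

def matmul_with_count(A, B):
    """Sequential matrix product that also counts the MAC operations."""
    k, m, p = len(A), len(B), len(B[0])
    C = [[0.0] * p for _ in range(k)]
    macs = 0
    for i in range(k):
        for j in range(p):
            for t in range(m):
                C[i][j] += A[i][t] * B[t][j]  # one multiply-accumulate
                macs += 1
    return C, macs

def parallel_steps(k, m, p, n):
    """Idealized number of time steps with n MAC units working concurrently."""
    return math.ceil(k * m * p / n)

C, macs = matmul_with_count([[1, 2], [3, 4]], [[5, 6], [7, 8]])
steps = parallel_steps(2, 2, 2, n=4)
```

For the 2 × 2 example the sequential loop performs 2·2·2 = 8 MACs, and four ideal parallel units would need two steps.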
In the HW design, we use a precision of 32 bits for performing all fixed-point operations, in particular, a 9-bit integer part and a 23-bit fractional part for the implementation of each coprocessor. Such precision guarantees numerical computational errors less than 10^{-5} with reference to the MATLAB Fixed-Point Toolbox [17]. Using this toolbox, we generated all the numerical test sequences required to verify computationally the proposed HW/SW codesign methodology (i.e., test sequences for performing the SW simulation and for the HW verifications). The results of the SW simulation and HW performance analysis are presented and discussed in Sections 6.3 and 6.4. Finally, the host processor (the standard MicroBlaze embedded processor [18] in this study) performs the following functions: loading and storing of images, data transfer to the HW coprocessors, and data formatting for performing the corresponding mathematical operations.
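The 32-bit word with 23 fractional bits can be emulated in software to see the size of the quantization error. The sketch below is an assumption-laden illustration: it treats the word as signed two's complement with the sign bit counted among the 9 integer bits, rounds to nearest, and saturates on overflow; none of these choices is stated in the text, and the helper names are hypothetical.

```python
WORD_BITS = 32
FRACTION_BITS = 23
SCALE = 1 << FRACTION_BITS

def to_fixed(value):
    """Quantize a real number to a signed 32-bit word with 23 fractional bits."""
    raw = int(round(value * SCALE))
    lowest = -(1 << (WORD_BITS - 1))
    highest = (1 << (WORD_BITS - 1)) - 1
    return max(lowest, min(highest, raw))  # saturate instead of wrapping

def to_float(raw):
    """Convert the fixed-point word back to a real number."""
    return raw / SCALE

x = 3.14159265
error = abs(to_float(to_fixed(x)) - x)
```

The resolution of this format is 2^{-23} (about 1.2 × 10^{-7}), so a single rounding error is well below the 10^{-5} bound quoted above; accumulated error over many operations would need a separate analysis.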
5.1. Algorithmic Implementation
In this section, we develop the procedures for computational implementation of the DEDR-POCS-related RSF and RASF algorithms in the MATLAB platform. This reference implementation scheme will next be compared with the proposed HW/SW codesign architecture based on the use of a single Field Programmable Gate Array chip.
Computational scheme for implementing the POCS-regularized RSF and RASF algorithms.
(i)  Data acquisition 
(ii)  Formation of the current RS correlation data matrix Y (4) 
(iii)  Specification of the observation noise correlation model 
(iv)  Specification of the POCS model parameters 
(v)  Specification of the POCS operator components 
(vi)  Specification of the azimuth-range SFOs 
(vii)  Computation of the azimuth-range PSMs (25) 
(viii)  Formation of the azimuth-range POCS operators, see (31) 
(ix)  Formation of the MSF image L (30) 
(x)  Iterative POCS-RSF image enhancement (using (32)) with the robust updating ( ) of the iterative reconstruction operator (27) 
(xi)  Iterative POCS-RASF image reconstruction (using (32)) employing the adaptive updating ( ) of the iterative reconstruction operator (27) 
(xii)  Termination of the iteration procedures 
(xiii)  Image enhancement/reconstruction performance analysis using the adopted quality metrics 
 (i) First, the PSMs (25), and factorized over the azimuth and range axes, can be calculated concurrently, which we refer to as , where the symbol now specifies the concurrent execution of the corresponding computational operations.
 (ii) Second, the zero-step iteration (MSF image) can be computed using the same factorized structure analogous to .
 (iii) Third, the reconstructed image , at the current (i + 1)st iteration step is an iteratively updated function of computed at the previous i-th iteration, which also admits the factorized computing.
5.2. Partitioning Phase
One of the challenging problems of the HW/SW codesign is to perform an efficient HW/SW partitioning of the computational tasks. The goal of the partitioning stage is to find which computational tasks can be implemented in an efficient parallelized HW/SW architecture, seeking balanced area-time tradeoffs between different admissible design solutions [18–20]. In this study, the iterative fixed-point POCS-DEDR regularized algorithm has been partitioned at the algorithmic level to minimize the overall signal processing (SP) time by transferring some required reconstructive SP functions from the SW to the HW. The solution to this problem requires, first, the definition of a partitioning model that meets all the specification requirements (functionality, goals, and constraints).
The system partitioning is clearly influenced by the target architecture onto which the HW and the SW will be mapped. The target architecture proposed in this study consists of one 32-bit RISC embedded processor (MicroBlaze) running the software and three dedicated coprocessors implemented by systolic processor arrays.
 (i) In order to ensure a viable solution, the system must always satisfy the constraints: max ha, max ht, for each i-th hardware coprocessor and max St, for the embedded processor . These three hardware coprocessors and the embedded processor compose the target architecture , for the preselected FPGA with the corresponding predetermined architecture constraints : [18].
 (ii) Each block implementation must satisfy the predefined execution time performance requirements [18]: and conditioned by the architecture constraints specified above, , and , correspondingly.
where represents the execution time required for implementing the corresponding DEDR-POCS-related RSF and RASF algorithms in the standard MATLAB computational environment.
5.3. Mapping Phase
First, to achieve the desired maximal possible parallelism in an algorithm, we perform the analysis of the data dependencies in the corresponding computations. Then, the algorithm is transformed into a single assignment algorithm without global communication. A dependence graph (DG) is used to analyze these data dependencies of the corresponding algorithms [21]. Following [21], DG is defined as , where P represents a set of nodes and E is a set of arcs (or edges), that is, each edge connects the corresponding pair of nodes and the connection is formalized by .
that maps the N-dimensional DG (G^{N}) onto the (N – 1)-dimensional SFG ( ).
Here, defines a (1 × p) vector composed of the first row of that determines the time scheduling. This vector indicates the normal direction of the equitemporal hyperplanes in the DG, "equitemporal" being understood in the sense that all the nodes on the same hyperplane must be processed at the same time [22]. The submatrix of dimension (the remaining rows of ) determines the space (processor) allocation. With this mapping, we are now ready to proceed with the construction of the required regular (N – 1)-dimensional systolic arrays.
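The role of the schedule row and the processor rows can be illustrated on a small two-dimensional dependence graph. The sketch below is hypothetical and is not the mapping used for the coprocessors of this study: it assumes a DG whose nodes (i, j) are the multiply-accumulate operations of a matrix-vector product, and a transformation matrix, here named `T`, whose first row is the time schedule and whose second row is the processor assignment.

```python
import numpy as np

# Assumed space-time transformation for a 2-D dependence graph:
# first row  -> time schedule  (t = i + j),
# second row -> processor index (pe = i), i.e. projection along j.
T = np.array([[1, 1],
              [1, 0]])

def map_nodes(n_rows, n_cols, transform=T):
    """Map every DG node (i, j) to a (time step, processing element) pair."""
    mapping = {}
    for i in range(n_rows):
        for j in range(n_cols):
            t, pe = transform @ np.array([i, j])
            mapping[(i, j)] = (int(t), int(pe))
    return mapping

mapping = map_nodes(3, 3)
processors = {pe for _, pe in mapping.values()}
```

Nodes with the same value of i + j lie on one equitemporal hyperplane and execute in the same time step, while each row i of the graph collapses onto one processing element, so the 3 × 3 graph maps onto a linear array of three PEs with no two nodes sharing both a time step and a PE.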
5.4. HW Implementation
Once the HW/SW codesign has been defined, the three coprocessors employed in the architecture exemplified in Figure 4 can be implemented using the HW systolic arrays. In this study, we use the Xilinx MicroBlaze soft processor, which employs the On-Chip Peripheral Bus (OPB) for transferring the data between the memory and the coprocessor [23]. The OPB is a fully synchronous bus that connects other separate 32-bit data buses. This system architecture (based on the FPGA XC4VSX35-10ff668 with the embedded processor and the OPB buses) restricts the corresponding processing frequency to 100 MHz. The typical rate of the OPB bus is 133 MByte/s, so that each 32-bit data transfer is accomplished in 30.05 ns [23]. Next, to avoid multiple data transfers from the embedded processor data memory to the coprocessors, a register file is to be implemented inside each coprocessor.
The MSF coprocessor systolic architecture of Figure 6(b) consists of identical linearly connected processing elements (PEs). In our case, the internal structure of each PE contains a multiplier and an adder. Each PE receives 32-bit operands and generates a 64-bit product. The product is then truncated to 32 bits with the adopted fixed-point representation of 9 integer bits and 23 fractional bits. Next, since the band-Toeplitz type matrix S_{a} is preloaded, the incoming data are transmitted in parallel to the corresponding PEs. After 2 clock cycles, the data outputs are produced and transferred to the registers (gray blocks in Figure 6(a)). Once the first stage of the triple matrix product is completed, the data transfer to the second array begins. The control unit block guarantees the correct synchronization between the arrival of the input data and the computations for each PE. The result buffer of Figure 6(b) consists of a shift buffer used to store the elements generated in parallel by the boundary PEs. Finally, the bus interface unit realizes the communication between the systolic array and the embedded processor.
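The arithmetic performed inside one PE can be emulated with integer operations. The sketch is a behavioral model under stated assumptions, not the synthesized design: it assumes signed operands with 23 fractional bits, truncation of the 64-bit product by dropping its 23 least significant bits, and wrap-around of the 32-bit accumulator on overflow. The names `pe_mac` and `wrap32` are hypothetical.

```python
FRACTION_BITS = 23
MASK32 = (1 << 32) - 1

def wrap32(value):
    """Reduce an integer to a signed 32-bit two's complement value."""
    value &= MASK32
    return value - (1 << 32) if value & (1 << 31) else value

def pe_mac(accumulator, a, b):
    """One multiply-accumulate step of a PE on fixed-point words."""
    product = a * b                         # full-width product, 46 fractional bits
    truncated = product >> FRACTION_BITS    # back to 23 fractional bits
    return wrap32(accumulator + truncated)

ONE = 1 << FRACTION_BITS                    # fixed-point encoding of 1.0
result = pe_mac(0, 3 * ONE // 2, 2 * ONE)   # 1.5 * 2.0
```

Chaining such steps along a row of PEs reproduces an inner product, which is the basic operation of the matrix products computed by the systolic arrays.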
In summary, the developed systolic architectures implement parallel and pipelined schemes that exploit the mapping methodology proposed above. These architectures provide the necessary HW-level implementation of the SW-optimized complex multipurpose RS imaging algorithms.
6. Simulations and Performance Analysis
6.1. Simulation Experiment Specifications
In the verification simulation experiments, we considered a conventional single-look SAR with the fractionally synthesized aperture as an RS imaging system [1, 2]. Recall that the signal formation operator (SFO) of such a SAR is factored along two axes in the image plane [3]: the azimuth or cross-range coordinate (horizontal axis, x) and the slant range (vertical axis, y), respectively. We considered the conventional triangular SAR range ambiguity function (AF) [3] and a Gaussian approximation [5, 6] of the SAR azimuth AF with the adjustable fractional parameter, a. Note that in the imaging radar applications [3, 4], an AF is referred to as the continuous-form approximation of the PSM defined by (25) and serves as an equivalent to the point spread function in the conventional image processing terminology [9]. The image degradation and noising effects were incorporated to simulate the process of formation of the degraded speckle-corrupted MSF images. First, following [1, 3], the degradation in the spatial resolution due to the fractional aperture synthesis mode was simulated via blurring the original image with the range AF along the y axis and with the azimuth AF along the x axis, respectively. Next, the degradations at the image-formation level due to the propagation and calibration uncertainties were simulated using the statistical model of SAR image defocusing [2, 3]. For the considered single-look SAR, the conventional MSF image formation algorithm (30) implies, first, application of the regular adjoint SFO to the zero-mean Gaussian data realization u, and second, performing the element-by-element (i.e., pixel-by-pixel) squared detection of S^{+}u to compose the corresponding SSP pixel estimates. Consequently, the MSF pixel estimates are chi-squared distributed with two degrees of freedom, and such a distribution is a negative exponential Rayleigh distribution [2, 9].
Thus, to comply with the technically motivated MSF image formation scheme, the composite multiplicative noise was simulated as a realization of the distributed random variables with the pixel mean value assigned to the actual degraded scene image pixel, which directly obeys the statistical speckle model [2, 5, 6]. Such signal-dependent multiplicative image noise dominates the additive noise component in the data; hence, the estimate performed empirically via the application of the local statistics method [2] was used to adjust the regularization degrees of freedom (regularization factors) in all simulated DEDR-related SSP reconstruction procedures.
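The degradation model described above (separable blurring followed by multiplicative speckle) can be sketched as follows. This is an illustrative reconstruction, not the simulation code of the study: the kernel half-widths, the Gaussian width `sigma`, and all function names are assumptions, and the speckle is drawn as unit-mean negative exponential noise so that each noisy pixel has the blurred pixel value as its mean.

```python
import numpy as np

def triangular_kernel(half_width):
    """Normalized triangular kernel (assumed stand-in for the range AF)."""
    k = half_width + 1 - np.abs(np.arange(-half_width, half_width + 1))
    return k / k.sum()

def gaussian_kernel(half_width, sigma):
    """Normalized Gaussian kernel (assumed stand-in for the azimuth AF)."""
    x = np.arange(-half_width, half_width + 1)
    k = np.exp(-0.5 * (x / sigma) ** 2)
    return k / k.sum()

def blur_separable(image, range_kernel, azimuth_kernel):
    """Blur along y (axis 0) with the range kernel, then along x (axis 1)."""
    out = np.apply_along_axis(np.convolve, 0, image, range_kernel, mode="same")
    return np.apply_along_axis(np.convolve, 1, out, azimuth_kernel, mode="same")

def add_speckle(image, rng):
    """Multiply each pixel by unit-mean negative exponential noise."""
    return rng.exponential(scale=1.0, size=image.shape) * image

rng = np.random.default_rng(0)
scene = np.full((64, 64), 5.0)  # flat toy scene
blurred = blur_separable(scene, triangular_kernel(2), gaussian_kernel(3, 1.0))
noisy = add_speckle(blurred, rng)
```

On a flat scene the blur leaves the interior unchanged while the speckle produces pixel values whose standard deviation equals their mean, which is the characteristic signature of single-look intensity imagery.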
We ran the simulation experiments for both certain and uncertain operational scenarios. In both scenarios, we considered the MSF, RSF, and RASF algorithms from the DEDR-POCS family (22). Also, to compare the developed algorithms with the conventional SAR image enhancement techniques [1–3], the celebrated Lee adaptive despeckling filter based on the local statistics method [2] was simulated. The four simulated techniques were numbered p = 1, ..., 4. The first one (p = 1) relates to the conventional MSF estimator (30) that employs the adjoint SO. This degraded MSF image was then postprocessed by applying the Lee adaptive despeckling filter [2], yielding what we refer to as the adaptively despeckled MSF image, that is, p = 2. Next, the nonadaptive RSF algorithm with the solution operator defined by (19) was applied to enhance the original MSF image employing the iterative DEDR-RSF version of the unified fixed-point iterative procedure (32); the resulting DEDR-RSF enhanced image was numbered as p = 3. Last, the fourth simulated technique corresponds to the adaptive DEDR-RASF method (32) with the optimal solution operator given by (21); the resulting adaptively enhanced DEDR-RASF image was numbered correspondingly as p = 4. In the second (uncertain) simulated scenario, the system AF was additionally distorted over the azimuth frame within a realistic interval that corresponds to the partially uncompensated carrier trajectory deviations interval [2, 10]. For both scenarios, the simulations were run for different composite signal-to-noise ratios (SNR) μ, defined as the ratio of the average signal component in the rough image formed using the MSF algorithm (30) to the relevant noise component in the same image, where the signal component is characterized by the average gray level of the original scene image.
6.2. Performance Metrics
According to these quality metrics, the higher the IOSNR and the lower the MAE, the better the improvement of the image enhanced/reconstructed with the particular employed algorithm.
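The two metrics can be computed as in the following sketch. The definitions used here are assumptions made for illustration: IOSNR is taken as ten times the base-10 logarithm of the ratio of the squared error of the MSF image to the squared error of the enhanced image, and MAE as the mean absolute pixel error, both measured against the original scene. The function names are hypothetical.

```python
import numpy as np

def iosnr_db(original, msf_image, enhanced):
    """Assumed IOSNR: error energy of the MSF image over that of the enhanced image, in dB."""
    original = np.asarray(original, dtype=float)
    error_in = np.sum((np.asarray(msf_image, dtype=float) - original) ** 2)
    error_out = np.sum((np.asarray(enhanced, dtype=float) - original) ** 2)
    return 10.0 * np.log10(error_in / error_out)

def mae(original, enhanced):
    """Assumed MAE: mean absolute pixel error against the original scene."""
    original = np.asarray(original, dtype=float)
    return np.mean(np.abs(np.asarray(enhanced, dtype=float) - original))

# Toy data: the enhanced image halves the pixel error of the MSF image.
scene = np.array([1.0, 2.0, 3.0, 4.0])
gain = iosnr_db(scene, scene + 2.0, scene + 1.0)
error = mae(scene, scene + 1.0)
```

Under these definitions, halving the pixel error everywhere gives an IOSNR of 10 log10(4), about 6.02 dB, and a positive IOSNR always means that the enhanced image is closer to the scene than the MSF image.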
6.3. Simulations
IOSNR values provided with three simulated DEDR-related methods, p = 2, 3, 4: (2) adaptive despeckling filter; (3) DEDR-RSF; (4) DEDR-RASF. Results are reported for the certain and uncertain simulated scenarios.
SNR (dB)  First (certain) scenario, p = 2  p = 3  p = 4  Second (uncertain) scenario, p = 2  p = 3  p = 4 
5  1.83  3.73  9.12  1.31  3.45  6.36 
10  2.47  4.80  10.11  1.96  4.14  7.93 
15  3.25  7.87  11.12  3.21  6.67  8.52 
20  4.64  9.05  13.42  4.02  8.32  10.28 
6.4. HW/SW Codesign Performance Analysis
Synthesis metrics. Specifications for data matrices of size and two BandToeplitz PSF matrices of the same pixel size with equal bandwidths of 2 3 and 2 3.
Synthesis metrics | MSF coprocessor | PSM coprocessor | Iterative POCS coprocessor
--- | --- | --- | ---
Number of slices | 905 (5.89%)^{c} | 242 (1.57%) | 216 (1.41%)
Number of DSP48s^{a} | 192 (100%) | 9 (4.68%) | 12 (6.25%)
Number of LUTs^{b} | 932 (3.03%) | — | —
Number of flip-flops | 1845 (6.01%) | 480 (1.56%) | 432 (1.40%)
Maximum frequency | 115.30 MHz | 152.20 MHz | 148.69 MHz
Maximum pin delay | 8.67 ns | 6.57 ns | 6.72 ns
Next, the reported metrics of Table 3 specify the area and timing behavior of the corresponding hardware systolic arrays, that is, the MSF, the PSM, and the iterative POCS architectures specified above in Section 5. From the analysis of the data reported in Table 3 and Figures 14(a) through 14(c), one can deduce the following. With the proposed HW/SW codesign architecture (in which the embedded processors properly iterate the corresponding SP procedures), the DEDR-POCS-related algorithms can be efficiently implemented in an iterative fixed-point fashion also for realistic large-scale scenes. Following the proposed systolic computing architecture concept, the increased scene dimensionality requires proper segmentation of the scene frame, with the parallelized computing performed over the partitioned segments followed by the relevant integration of the partially processed data. Such partitioned systolic HW/SW codesign-oriented processing can be performed directly following the architecture design concept proposed and specified in Section 5.4.
Additionally, the scalability in terms of flip-flops, slices, and LUTs (i.e., the HW resources of the FPGA) for the proposed MSF, PSM, and iterative POCS coprocessors is reported in Figures 14(a) through 14(c). In fact, the corresponding DEDR-related SP algorithms can be efficiently implemented in a Field Programmable System on Chip (FPSoC) mode instead of employing conventional systems based on multi-FPGA platforms or PC clusters [12–14, 16]. This is practically motivated and desirable for a wide range of RS and general SP applications owing to the high density of existing FPGAs, which incorporate huge resources of logic gates, block RAM modules, and soft or hard embedded processors integrated on the same chip with the relevant custom coprocessing HW blocks.
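The segment-process-integrate flow described above for large scenes can be illustrated in software. The following is a minimal sketch only, not the systolic hardware design of Section 5.4: the function names are hypothetical, `enhance_segment` is a placeholder for the per-segment processing, a thread pool stands in for the parallel coprocessors, and segment borders are not overlapped (a PSF-aware implementation would presumably need overlap between neighboring segments).

```python
from concurrent.futures import ThreadPoolExecutor

def split_into_segments(scene, seg_rows, seg_cols):
    """Return (row0, col0, block) tiles that cover a 2-D scene given as a list of rows."""
    segments = []
    for r0 in range(0, len(scene), seg_rows):
        for c0 in range(0, len(scene[0]), seg_cols):
            block = [row[c0:c0 + seg_cols] for row in scene[r0:r0 + seg_rows]]
            segments.append((r0, c0, block))
    return segments

def enhance_segment(block):
    """Hypothetical stand-in for the per-segment processing (here: clip negatives to zero)."""
    return [[max(0, v) for v in row] for row in block]

def process_scene(scene, seg_rows, seg_cols, workers=4):
    """Process all segments concurrently, then integrate them back into one frame."""
    segments = split_into_segments(scene, seg_rows, seg_cols)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # pool.map preserves input order, so results line up with `segments`.
        processed = list(pool.map(enhance_segment, (blk for _, _, blk in segments)))
    out = [[None] * len(scene[0]) for _ in scene]
    for (r0, c0, _), blk in zip(segments, processed):
        for i, row in enumerate(blk):
            out[r0 + i][c0:c0 + len(row)] = row
    return out

# Toy 3 x 5 scene split into 2 x 2 segments (edge segments are smaller).
toy_scene = [[-1, 2, 3, -4, 5], [6, -7, 8, 9, -10], [1, 1, -1, 1, 1]]
print(process_scene(toy_scene, 2, 2))
```

The integration step writes each processed block back at its original offset, so the output frame has the same dimensions as the input regardless of whether the scene size is a multiple of the segment size.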
For example, an alternative approach for high-speed computational implementation of the reconstructive RS image processing based on the use of clusters of PCs was presented in [12–14]. In [12], the cluster-based NSPO Parallel TestBed was implemented for performing parallel radiometric and geometrical corrections of large-scale 3600 × 2944-pixel RS images. The reconstructive image processing was conducted using a PC cluster composed of three PCs, each with a 550 MHz Pentium III processor and 128 MB of RAM, connected via a 100 Mbps Fast Ethernet LAN. The processing time achieved with this three-PC cluster was only 33.3 seconds (near real time for conventional RS users), while the corresponding processing performed with a single processor required 84.65 seconds. In [13, 14], another kind of parallel architecture was implemented for morphological classification of hyperspectral RS imagery at NASA's Goddard Space Flight Center. The parallel classifier of [14] uses a 256-processor Beowulf cluster (the Thunderhead cluster) with hybrid neural parallelism that enables such a system to perform an accurate classification of hyperspectral RS scenes in only 17 seconds.
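For orientation, the two timings quoted from [12] imply the following speedup and parallel efficiency for the three-PC cluster. This is plain arithmetic on the reported figures; the symbols S, E, T_1, and T_3 are introduced here only as shorthand.

```latex
S = \frac{T_{1}}{T_{3}} = \frac{84.65\ \text{s}}{33.3\ \text{s}} \approx 2.54,
\qquad
E = \frac{S}{3} \approx 0.85
```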
As a result, advances in high-performance computing as well as in specialized high-performance hardware modules are required to achieve near-real-time processing performance for complex RS algorithms.
Processing times required for implementing the conventional DEDR and the developed POCS-regularized (SW/HW codesign-based) unified DEDR-POCS techniques (RSF and RASF).
Implementation method | RSF processing time per iteration [seconds] | RASF processing time per iteration [seconds]
--- | --- | ---
Hypothetical full-format implementation (evaluated PC-oriented implementation) | 5171.6 | 5655
Factorized fixed-point POCS-regularized implementation (PC-oriented) | 19.70 | 20.05
Previous HW/SW codesign-based implementation [10] (without systolic arrays) | 7.82 | 7.985
Proposed HW/SW codesign-based implementation (with systolic arrays) | 2.51 | 2.56
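The relative gains implied by the processing times above can be made explicit with a few lines of arithmetic. The sketch below only divides the tabulated per-iteration times by those of the proposed systolic implementation; the dictionary keys and the function name are illustrative labels, not terminology from the paper.

```python
# Per-iteration processing times in seconds, copied from the table above.
times = {
    "full-format (PC)": {"RSF": 5171.6, "RASF": 5655.0},
    "factorized POCS (PC)": {"RSF": 19.70, "RASF": 20.05},
    "HW/SW codesign [10]": {"RSF": 7.82, "RASF": 7.985},
    "HW/SW codesign, systolic": {"RSF": 2.51, "RASF": 2.56},
}

def speedups(table, reference="HW/SW codesign, systolic"):
    """Speedup of the reference implementation over every other row of the table."""
    ref = table[reference]
    return {
        name: {alg: t / ref[alg] for alg, t in row.items()}
        for name, row in table.items()
        if name != reference
    }

for name, row in speedups(times).items():
    print(f"{name}: RSF x{row['RSF']:.1f}, RASF x{row['RASF']:.1f}")
```

The ratios come out at roughly 2 × 10^3 against the hypothetical full-format implementation, close to 8 against the factorized PC-oriented implementation, and slightly above 3 against the earlier codesign without systolic arrays.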
7. Concluding Remarks
The principal result of the undertaken study relates to the digital signal processing-oriented solution of the RS image enhancement/reconstruction problems in a (near) real-time computing mode ("near real time" being understood in the context of conventional RS users) via exploiting the aggregated hardware/software (HW/SW) codesign paradigm, which results in an efficient hardware implementation architecture based on systolic array processors. We have approached the goal of (near) real-time computational implementation of the enhancement/reconstruction of RS imagery from two directions. First, we have analytically established that, to alleviate the ill-posedness of the problem and reduce the overall computational load of the large-scale image enhancement/reconstruction tasks at the algorithmic processing level, some special form of descriptive experiment design projection-type numerical regularization must be employed. This stage was developed and addressed here as the unified DEDR method, and the efficient fixed-point numerical iterative technique that incorporates the proper construction of the relevant orthogonally factorized regularizing projector onto convex sets (POCS) in the solution domain was designed and specified for the particular employed RS sensor system, namely, the side-looking imaging synthetic aperture radar (SAR) operating in both certain and uncertain scenarios. We have also examined how such a SAR-adapted POCS-regularized fixed-point iterative technique can be executed concurrently over the orthogonal range-azimuth coordinates with optimal use of the sparseness properties of the overall SAR system point spread function characteristics.
The algorithmic-level advantages of such unified DEDR-POCS-regularized RS image enhancement/reconstruction techniques relate to the theoretically guaranteed convergence of the corresponding fixed-point iterative process with the proper factorization of the numerical reconstructive procedures over the orthogonal range-azimuth directions in the image representation frame.
Second, we have shown that by pursuing the proposed HW/SW codesign paradigm and employing the systolic arrays as coprocessing units, the (near) real-time image processing requirements can be met because the corresponding computations are performed in an efficient systolic architecture mode. The unified algorithmic (software-level, SW) and systematic (hardware-level, HW) codesign approach was verified via computer simulation experiments indicative of its efficiency for performing the RS image enhancement and reconstruction tasks in (near) real computational time. The tested DEDR-POCS-related techniques implemented numerically using the proposed HW/SW codesigned computational architecture outperform the previously developed methods both in the attained reconstruction quality and in the achievable computational complexity, that is, they manifest a substantially reduced overall computational time (e.g., by up to three orders of magnitude with respect to the schemes that do not aggregate the POCS regularization with the systolic computing). We believe that by pursuing the DEDR-POCS-related HW/SW codesign paradigm with systolic array hardware accelerators, one could closely approach the real-time computational requirements while performing the reconstructive processing of large-scale RS imagery and attaining enhancement/reconstruction performance gains close to the limiting bounds.
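To make the idea of a projection-regularized fixed-point iteration factorized over range and azimuth more concrete, the sketch below uses a generic projected Landweber iteration as a stand-in. It is not the DEDR-POCS procedure (32) of this paper. The assumptions are: a separable model Q = A_r B A_a^T with band-Toeplitz PSF matrices, a three-tap PSF, a unit step size, and non-negativity of the image as the convex set. All names are hypothetical, and pure Python is used only to keep the example dependency-free.

```python
def band_toeplitz(n, taps):
    """n x n band-Toeplitz matrix built from an odd-length, symmetric list of PSF taps."""
    half = len(taps) // 2
    return [[taps[j - i + half] if abs(j - i) <= half else 0.0 for j in range(n)]
            for i in range(n)]

def matmul(a, b):
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]

def transpose(a):
    return [list(col) for col in zip(*a)]

def forward(a_r, a_a, b):
    """Separable blur: the range PSF acts on columns, the azimuth PSF on rows."""
    return matmul(matmul(a_r, b), transpose(a_a))

def residual_norm(q, a_r, a_a, b):
    """Frobenius norm of Q - A_r B A_a^T."""
    model = forward(a_r, a_a, b)
    return sum((qv - mv) ** 2 for qr, mr in zip(q, model) for qv, mv in zip(qr, mr)) ** 0.5

def projected_iteration(q, a_r, a_a, steps=50, tau=1.0):
    """B <- P_+(B + tau * A_r^T (Q - A_r B A_a^T) A_a), starting from a zero image."""
    b = [[0.0] * len(q[0]) for _ in q]
    for _ in range(steps):
        model = forward(a_r, a_a, b)
        resid = [[qv - mv for qv, mv in zip(qr, mr)] for qr, mr in zip(q, model)]
        # Gradient-type correction, factorized over the range and azimuth directions.
        corr = matmul(matmul(transpose(a_r), resid), a_a)
        # Projection onto the convex set of non-negative images.
        b = [[max(0.0, bv + tau * cv) for bv, cv in zip(br, cr)]
             for br, cr in zip(b, corr)]
    return b

# Toy example: two point scatterers blurred by assumed three-tap PSFs.
psf = [0.25, 0.5, 0.25]
a_r, a_a = band_toeplitz(4, psf), band_toeplitz(5, psf)
scene = [[0.0, 0.0, 0.0, 0.0, 0.0],
         [0.0, 4.0, 0.0, 0.0, 0.0],
         [0.0, 0.0, 0.0, 8.0, 0.0],
         [0.0, 0.0, 0.0, 0.0, 0.0]]
q = forward(a_r, a_a, scene)
estimate = projected_iteration(q, a_r, a_a)
print(residual_norm(q, a_r, a_a, estimate))
```

Because the PSF matrices are banded, each matrix product touches only a few neighbors per pixel, and the range and azimuth products are independent of each other. That is the kind of structure that lends itself to concurrent processing along the two directions.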
References
1. Wehner DR: High-Resolution Radar. 2nd edition. Artech House, Boston, Mass, USA; 1994.
2. Lee JS: Speckle suppression and analysis for synthetic aperture radar images. Optical Engineering 1986, 25(5):636–643.
3. Henderson FM, Lewis AV: Principles and applications of imaging radar. In Manual of Remote Sensing. 3rd edition. John Wiley & Sons, New York, NY, USA; 1998.
4. Shkvarko YV: Estimation of wavefield power distribution in the remotely sensed environment: Bayesian maximum entropy approach. IEEE Transactions on Signal Processing 2002, 50(9):2333–2346. 10.1109/TSP.2002.801916
5. Shkvarko YV: Unifying regularization and Bayesian estimation methods for enhanced imaging with remotely sensed data - part I: theory. IEEE Transactions on Geoscience and Remote Sensing 2004, 42(5):923–931.
6. Shkvarko YV: Unifying regularization and Bayesian estimation methods for enhanced imaging with remotely sensed data - part II: implementation and performance issues. IEEE Transactions on Geoscience and Remote Sensing 2004, 42(5):932–940.
7. Shkvarko YV: From matched spatial filtering towards the fused statistical descriptive regularization method for enhanced radar imaging. EURASIP Journal on Applied Signal Processing 2006, 2006:9.
8. Shkvarko YV, Perez-Meana H, Castillo-Atoche A: Enhanced radar imaging in uncertain environment: a descriptive experiment design regularization paradigm. International Journal of Navigation and Observation 2008, 2008:11.
9. Barrett HH, Myers KJ: Foundations of Image Science. John Wiley & Sons, New York, NY, USA; 2004.
10. Castillo A, Shkvarko YV, Torres D, Perez HM: Convex regularization-based hardware/software codesign for real-time enhancement of remote sensing imagery. International Journal of Real Time Image Processing 2008, 4(3):261–272.
11. Ponomaryov VI: Real-time 2D-3D filtering using order statistics based algorithms. Journal of Real-Time Image Processing 2007, 1(3):173–194. 10.1007/s11554-007-0021-5
12. Yang CT, Chang CL, Hung CC, Wu F: Using a Beowulf cluster for a remote sensing application. Proceedings of the 22nd Asian Conference on Remote Sensing, November 2001, Singapore 1:
13. Thunderhead System http://newton.gsfc.nasa.gov/thunderhead/
14. Plaza A, Plaza J: Parallel morphological classification of hyperspectral imagery using extended opening and closing by reconstruction operations. Proceedings of the International Geoscience and Remote Sensing Symposium (IGARSS '08), July 2008, Boston, Mass, USA I-58–I-61.
15. Meyer-Baese U: Digital Signal Processing with Field Programmable Gate Array. Springer, Berlin, Germany; 2001.
16. Greco J, Cieslewski G, Jacobs A, Troxel IA, George AD: Hardware/software interface for high-performance space computing with FPGA coprocessors. Proceedings of the IEEE Aerospace Conference, March 2006, Big Sky, Mont, USA 10–25.
17. Fixed-Point Toolbox User's Guide, MATLAB http://www.mathworks.com/
18. EDK 9.1 MicroBlaze tutorial in Virtex-4, Xilinx, http://www.xilinx.com/
19. Marquardt A, Betz V, Rose J: Speed and area tradeoffs in cluster-based FPGA architectures. IEEE Transactions on Very Large Scale Integration (VLSI) Systems 2000, 8(1):84–93.
20. López-Vallejo M, López JC: On the hardware-software partitioning problem: system modeling and partitioning techniques. ACM Transactions on Design Automation of Electronic Systems 2003, 8(3):269–297. 10.1145/785411.785412
21. Lo SC, Jean SN: Mapping algorithms to VLSI array processors. Proceedings of the International Conference on Acoustics, Speech, and Signal Processing (ICASSP '88), 1988, New York, NY, USA 2033–2036.
22. Kung SY: VLSI Array Processors. Prentice Hall, Upper Saddle River, NJ, USA; 1988.
23. Xilinx Application Note XAPP967: creating an OPB IPIF-based IP and using it in EDK 2007.
24. Space Imaging GeoEye, 2008, http://www.euspaceimaging.com/
Copyright
This article is published under license to BioMed Central Ltd. This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.