
High-precision reconstruction method based on MTS-GAN for electromagnetic environment data in SAGIoT

Abstract

Equipment failures and communication interruptions of satellites, aircraft and ground devices lead to data loss in the Space-Air-Ground Integrated Internet of Things (SAGIoT). Incomplete data degrade the accuracy of data modeling, decision-making and spectrum prediction, so reconstructing incomplete electromagnetic environment data is a significant task in SAGIoT. Most spectrum data completion algorithms suffer from limited accuracy and slow iterative optimization. In light of these challenges, a novel high-precision reconstruction method for electromagnetic environment data based on a multi-component time series generative adversarial network (MTS-GAN) is proposed in this paper. MTS-GAN transforms the reconstruction of electromagnetic environment data into a data generation problem over multiple time series, extracting the joint time–frequency features and the overall distribution of the data. To improve reconstruction precision, MTS-GAN simulates the time irregularity of incomplete time series by applying an improved gated recurrent unit that adapts to the attenuation effect of discontinuous time series observations. Experimental results show that the proposed MTS-GAN provides high completion accuracy and achieves better results than competitive data completion algorithms.

1 Introduction

With the development of the Space-Air-Ground Integrated Internet of Things (SAGIoT), various heterogeneous signals from satellites, aircraft and ground devices are fused into a large-scale electromagnetic spectrum dataset. However, due to equipment failures and communication interruptions, the acquired spectrum data often contain missing values and gaps. Spectrum data from satellite communications contain gaps caused by changes in relative geographical position and obstruction by obstacles. Spectrum data transmitted by aircraft contain gaps caused by electromagnetic interference, adverse weather and other factors. Spectrum data from ground devices are missing due to distance limitations and equipment aging. Completing these missing spectrum data is crucial for improving the performance of downstream analysis tasks such as target identification and anomaly detection. Generative adversarial networks (GANs) provide a promising solution for spectrum completion by learning the underlying distribution of spectrum data.

With the rapid improvement of communication technology and the exponential growth of communication devices [1], the electromagnetic environment exhibits complex and transient characteristics [2]. Scientifically establishing electromagnetic environment models and adequately mining their evolution play a crucial role in the fused mining of potential information and correlations, accurate cognition of electromagnetic activity and frequency behavior [3], and auxiliary spectrum allocation and decision-making. The spectrum data of the electromagnetic environment refer to data related to the radio environment [4]. Large, complete and accurate spectrum data are the premise of electromagnetic environment modeling and monitoring [5].

Early data completion methods analyzed the features of and distances to known data near a missing item, introducing hand-crafted features, kernel functions and confidence measures to estimate the missing item. In recent years, these methods have been combined with deep learning to better extract features and improve completion accuracy. For example, Tang et al. [6] constructed a learning-guided convolutional network based on kernel weight prediction, which automatically generates spatially varying convolutional kernels according to the input and extracts depth image features through a spatially variant channel-wise convolution stage and a spatially invariant cross-channel convolution stage. Bao et al. [7] and Sattari et al. [8] considered the influence of data from the same site in the same period and used kernel-based support vector regression and Gaussian process regression to complete missing rainfall data, but did not further exploit the correlation of data across different time periods [9]. In addition, Eldesokey et al. [10] introduced a normalized convolution layer with confidence propagation for unguided depth completion. However, without multimodal data these methods are limited and lose depth detail and semantic information [11]. Zhao et al. [12] applied an attention mechanism to graph propagation, recovering multi-scale features by applying these propagations to different graphs derived from the observed pixels [13].

The completion of tensor data is generally carried out by tensor decomposition and nuclear norm minimization. Yokota et al. [14] added smoothness constraints and a low-rank approximation to low-rank tensor decomposition under high missing proportions, effectively selecting the model that minimizes the tensor rank and performing smooth PARAFAC decomposition of incomplete tensors [15]. For missing data containing noise, Qiu et al. [16] adopted the tensor ring nuclear norm (TRNN) and a least squares estimator to regularize the underlying tensor and the observation terms, respectively, and proposed an effective noisy tensor completion model.

In addition, data completion based on generative adversarial networks has come into view, guaranteeing the global consistency of data through the interaction of generator and discriminator. Iizuka et al. [17] used GANs with global and local context discriminators to distinguish real images from completed ones, making the completed image data consistent both locally and globally. Ehsani et al. [18] proposed SeGAN, an appearance completion scheme based on an improved generative adversarial network for occluded objects [19]; realistic synthetic data are generated to obtain the exact boundary of the invisible region, and on this basis the segmentation and generation of the invisible part of the object are jointly optimized. Dhamo et al. [20] proposed using a CNN to jointly obtain deep features and a foreground separation mask, learning the standard depth map and foreground-background mask with a fully convolutional network and filling in the color and depth of the missing region with a conditional GAN generator [21]. Further, Kortylewski et al. [22] proposed an improved deep feature learning model that recognizes masked regions and extracts features from the unmasked regions [23]. Zheng et al. [24] used an improved GAN for image occlusion completion, segmenting and completing the input samples at the same time to optimize model performance [25]. Zhou et al. [26] further introduced human posture to handle segmentation-mask occlusion and human body occlusion with invisible appearance content by improving the GAN [27].

For electromagnetic spectrum data, because current electromagnetic environment monitoring and mining still rely on long-term complete spectrum data, completing missing spectral data is a key problem in spectrum data preprocessing. In order to fill the gaps in the spectrum data and use the completed data to aid the study of a complicated electromagnetic environment [28], Sun et al. [29] developed a spectrum tensor completion strategy based on HaLRTC, but it mainly addresses randomly missing entries. Sun et al. [30] proposed a new method for long-term spectrum prediction based on tensor completion (LSP-TC) [31], which visualizes the spectrum data of various frequency bands or frequency points over the whole day in various time intervals. Ding et al. [32] developed a robust online spectrum prediction (ROSP) framework for incomplete and corrupted observations by considering possible anomalies [33], omissions and data to be predicted in the measured spectrum data, and carried out joint optimization of matrix completion and recovery from a two-dimensional spectrum-time perspective. However, the above studies only considered the impact of individual known elements on the missing time series and did not consider inter-sequence effects in the time and frequency domains. From the perspective of data missing over multiple time slots in a wide band, the decay of the influence of past observations over time is also rarely considered. Therefore, the contributions of this paper are as follows:

  1. In this paper, the problem of electromagnetic environment data completion is transformed into a problem of multi-component time series data generation and reconstruction, and a multi-component time series generative adversarial network (MTS-GAN) is proposed. The accuracy of electromagnetic environment data reconstruction is improved by extracting multiple sequences and their features and learning the correlation between the time domain and the frequency domain.

  2. Because the influence of past observations in a gated recurrent unit gradually decreases with time, an improved gated recurrent unit is adopted to simulate the time irregularity of incomplete time series and adapt to the attenuation effect of discontinuous observations, ensuring accurate extraction of the temporal data distribution and robust, accurate reconstruction under severe data missingness.

  3. A comprehensive evaluation with multiple indicators is used to ensure the reliability and robustness of the results. The results are compared with the tensor-based statistical methods SiLRTC and FaLRTC and the machine learning method SAEs under different missing rates, which fully demonstrates the effectiveness and superiority of the proposed method.

The rest of this article is structured as follows: In Sect. 2, we first explain the principle of GAN-based data completion and then outline the overall completion process. In Sect. 3, the principle of spectral data completion based on the multi-component time series generative adversarial network is explained in detail. In Sect. 4, we demonstrate the advantages of the proposed method in reconstruction accuracy and convergence speed through experiments. Section 5 concludes the paper.

2 Data completion based on GAN

2.1 Generative adversarial network

Generative adversarial networks, as a novel form of generative neural network, have exhibited significant performance advantages in addressing data incompleteness. By employing an adversarial learning framework, a GAN constrains the learned distribution of the model, thus ensuring it captures the true data distribution. Typically, a GAN consists of a generator (generative network) and a discriminator (discriminative network). During training, the generator tries to capture the data distribution, while the discriminator aims to assign a high probability of authenticity to real samples and a low probability to generated ones. When the generator has captured the data distribution and the discriminator can no longer distinguish generated samples from real ones, training has converged.

The generator network \(\textit{G}\) takes a noise vector g as input and produces synthesized samples G(g) that aim to mimic the distribution of real samples as closely as possible, where a denotes real samples drawn from the real data. Both real samples and generated samples are fed into the discriminator network \(\textit{D}\). The task of the discriminator network is to decide whether the input is a real sample or a generated one: it outputs 1 for genuine samples and 0 for generated samples. The generator network, in turn, aims to estimate the distribution p(a) of the real data and to make the discriminator output for generated samples D(G(g)) as close as possible to that for real samples D(a). The objective function of the GAN is given by:

$$\begin{aligned} \mathop {\min }\limits _G \mathop {\max }\limits _D F(G,D) = \mathbb {E}_{a \sim {P_\mathrm{{data}}}(a)}[\log D(a)] + \mathbb {E}_{g \sim {P_g}(g)}[\log (1 - D(G(g)))] \end{aligned}$$
(1)

In the formula, \(P_{\text {data} }(\textit{a})\) represents the distribution of the real data, and \(P_{\textit{g}} (\textit{g})\) represents the distribution of the samples produced by the generator. The Nash equilibrium of this minimax game is reached when neither the generator \(\textit{G}\) nor the discriminator \(\textit{D}\) can further reduce its own loss. At that point, the discriminator is unable to tell generated samples from genuine ones, and the generator produces synthetic data that closely follows the distribution of the original samples. In theory, this occurs when \(P_{\textit{g}} (\textit{g})\) equals \(P_{\text {data} }(\textit{a})\). In practice, however, training GANs can be challenging, as they often face convergence difficulties.
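To make Eq. (1) concrete, the short NumPy sketch below evaluates the value function for one batch of discriminator outputs. The arrays d_real and d_fake are illustrative stand-ins for D(a) and D(G(g)), not values from this paper.

```python
import numpy as np

# Numeric sketch of the GAN value function in Eq. (1); D(.) is assumed to
# output probabilities in (0, 1). The discriminator maximizes this quantity,
# while the generator minimizes it.
def gan_value(d_real, d_fake):
    return np.mean(np.log(d_real)) + np.mean(np.log(1.0 - d_fake))

d_real = np.array([0.9, 0.8, 0.95])   # D(a) on a batch of real samples
d_fake = np.array([0.1, 0.2, 0.05])   # D(G(g)) on a batch of generated samples
print(gan_value(d_real, d_fake))
```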

Fig. 1

Common data filling procedure based on generative adversarial network

2.2 Data completion

To formalize the electromagnetic data completion problem, the data filling process is shown in Fig. 1,

where \(\tilde{x}\) is the original data after pre-filling, \(\bar{x}\) is the complete time series data generated by the generator, and \(\hat{x}\) is the final filled time series data. After filling, the existing entries of the original data are retained and the missing part is replaced by the generator output. Here, \(L_D\) is the discriminator loss function, \(L_{rc}\) is the generator reconstruction loss, \(L_A\) is the adversarial loss, and M is the data missing mask matrix. The following definitions are made:

  1. Definition: \(X=\left\{ x_i \mid x_i \in R^s, i=1,2, \ldots , n\right\}\) is an incomplete time–frequency dataset with n samples and s attributes. The ith sample, denoted \(x_i=\left[ x_{i 1}, x_{i 2}, \ldots , x_{i s}\right] ^T(i=1,2, \ldots , n)\), represents the observed values of the s attributes at time \(t_{\textit{i}}\). The observed value of the jth attribute at time \(t_{\textit{i}}\) is denoted \(x_{\textit{ij} }\).

  2. Let M be a binary mask matrix that describes the missing data pattern:

    $$\begin{aligned} M_{i j}=\left\{ \begin{array}{l} 0, x_{i j}=? \\ 1, x_{i j} \ne ? \end{array}\right. \end{aligned}$$
    (2)

    where \(x_{i j}=?\) denotes that \(x_{i j}\) is missing; \(M_{i j}\) is 0 when the entry at the corresponding position of the time series is missing and 1 when it is observed.

During training, the generator network parameters are fixed first, the pre-filled time series data are input into the generator to produce complete time series data that fill the original missing entries, and the resulting completed data are fed into the discriminator network for binary classification training. The training labels are taken from the missing-data mask matrix M: entries with \(M_{i j}=1\) are labeled 1 and entries with \(M_{i j}=0\) are labeled 0. Once the discriminator can distinguish real entries from filled ones, its training stops and the generator network is trained.

Training the generator requires the participation of the discriminator, which judges whether the generated samples look like real data. During this phase, the parameters of the trained discriminator are fixed, the generator and discriminator are connected in series to form a joint model, and the pre-filled time series data are input into the model for training. The fixed discriminator is used only to compute the error that is fed back to the generator for parameter updates. Training ends when the discriminator can no longer distinguish the true and false samples produced by the generator, and the complete time series data generated by the generator are sufficient to fill in the missing part of the original dataset.

The goal of missing data imputation is to find reasonable imputed values for each missing value in an incomplete dataset. These imputed values replace the corresponding missing values, resulting in a complete dataset that maintains a distribution and scale similar to the original dataset. This process can be expressed as:

$$\begin{aligned} X=\left[ \begin{array}{ccc} 3 & 10 & ? \\ ? & 3 & 4 \\ ? & 9 & ? \end{array}\right] , M=\left[ \begin{array}{ccc} 1 & 1 & 0 \\ 0 & 1 & 1 \\ 0 & 1 & 0 \end{array}\right] \Rightarrow \hat{X}=\left[ \begin{array}{ccc} 3 & 10 & \text { filled value } \\ \text { filled value } & 3 & 4 \\ \text { filled value } & 9 & \text { filled value } \end{array}\right] \end{aligned}$$
(3)
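As a minimal illustration of Eqs. (2) and (3), the sketch below builds the mask M from an incomplete matrix (missing entries stored as NaN) and fills the gaps with values from a placeholder reconstruction; X_gen is a hypothetical stand-in for the generator output, not the actual model.

```python
import numpy as np

# Build the mask M of Eq. (2) and perform the filling of Eq. (3).
X = np.array([[3.0, 10.0, np.nan],
              [np.nan, 3.0, 4.0],
              [np.nan, 9.0, np.nan]])
M = (~np.isnan(X)).astype(float)      # 1 where observed, 0 where missing

X_gen = np.full_like(X, 6.0)          # placeholder for generated values G(g)
X_hat = np.where(M == 1, X, X_gen)    # keep observed entries, fill the rest
print(M)
print(X_hat)
```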

In the context of electromagnetic environment data reconstruction, incorporating an appropriate correlation analysis to exploit the correlations present in the data is one of the challenging problems that the network design needs to solve.

3 High-precision reconstruction of electromagnetic environment data based on generative adversarial network

Based on observation and analysis of the time and frequency characteristics of spectral data, a multi-component time series generative adversarial network (MTS-GAN) is proposed. For incomplete time series data, it extracts the features of the multiple component sequences and of each individual sequence, better capturing the correlation between the time domain and the frequency domain of electromagnetic environment data. It uses a generative adversarial network to generate and discriminate incomplete multivariate time series information, learning more of the latent relationships between time-domain and frequency-domain observations.

In addition, this paper uses an improved gated recurrent unit (GRUI) to simulate the time irregularity of incomplete time series, ensuring accurate extraction of the time series distribution features and robust, accurate reconstruction under severe data loss. The model can be trained with incomplete samples to learn the distribution pattern and related information of the data over time and frequency. Once trained, it can generate new data to fill in missing values.

Under the completion framework of Fig. 1, we propose an improved generative adversarial completion model for electromagnetic signal data, as shown in Fig. 2. A generator network based on the gated recurrent unit GRUI for data interpolation reconstructs the complete electromagnetic environment data from the originally incomplete electromagnetic spectrum elements and fills in the missing parts of the original dataset. The filling process is as follows:

The electromagnetic environment data completion method based on MTS-GAN proceeds in two stages. The first is the pre-training stage, which contains only the generator: false samples are generated from random noise, the error is computed and the pre-trained model is updated. The GAN is then trained on top of this pre-training. The generator network is built from the gated recurrent unit GRUI for data interpolation and a fully connected layer. The missing electromagnetic environment data are first segmented to form the incomplete dataset, and the pre-filled data are passed through the generator to produce complete time series data. Combined with the missing-data mask matrix M, the missing values of the frequency band sequences in the original electromagnetic spectrum are filled in, yielding the generated complete data. The discriminator network also consists of an improved gated recurrent unit and a fully connected network and estimates the probability that the input data are real. Its input consists of two parts: the existing values that are not missing from the original data, and the originally missing part that has been filled by the generator's output. The output of the discriminator is a sequence of values between 0 and 1, indicating the probability that each input entry belongs to the real data.

Fig. 2

The MTS-GAN electromagnetic environment data completion network structure

Due to the mode collapse issue, a conventional GAN is difficult to train. WGAN is an alternative training method for GANs that uses the Wasserstein distance and is easier to train than the original GAN. WGAN improves the stability of model learning and makes the optimization of GAN models easier. Its loss functions are defined as follows:

$$\begin{aligned} L_G&= \mathbb {E}_{g}[-D(G(g))], \\ L_D&= \mathbb {E}_{g}[D(G(g))]-\mathbb {E}_{a}[D(a)] \end{aligned}$$
(4)

where \(\mathbb {E}\) denotes the expectation, D is the discriminator, G is the generator, g is the random noise, and a is the real data.
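For reference, here is a minimal sketch of the two WGAN losses in Eq. (4), assuming the discriminator acts as an unconstrained critic that returns real-valued scores; d_real and d_fake are illustrative arrays standing in for D(a) and D(G(g)).

```python
import numpy as np

def wgan_generator_loss(d_fake):
    # L_G = E_g[-D(G(g))]
    return -np.mean(d_fake)

def wgan_critic_loss(d_real, d_fake):
    # L_D = E_g[D(G(g))] - E_a[D(a)]
    return np.mean(d_fake) - np.mean(d_real)

d_real = np.array([1.2, 0.8, 1.5])    # critic scores on real samples
d_fake = np.array([-0.4, 0.1, -0.2])  # critic scores on generated samples
print(wgan_generator_loss(d_fake), wgan_critic_loss(d_real, d_fake))
```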

If a variable has been missing for a period of time, the influence of its previous observations should diminish over time. Because of the limited scale of the data, there can be significant differences in the time lags between two consecutive valid observations, making the traditional GRU or LSTM cell less effective. To accommodate the decaying effect of historical observations, we adopt the GRUI, a gated recurrent unit for irregular time intervals, to simulate the temporal irregularity of incomplete time series and learn hidden information from the time gaps. The time lag is computed as follows, with the result for the sample dataset shown on the right:

$$\begin{aligned} \delta _{t_i}^j=\left\{ \begin{array}{ll} t_i-t_{i-1}, & M_{t_{i-1}}^j==1 \\ \delta _{t_{i-1}}^j+t_i-t_{i-1}, & M_{t_{i-1}}^j==0 \ \& \ i>0 \\ 0, & i==0 \end{array}\right. ; \quad \delta =\left[ \begin{array}{cccc} 0 & 0 & 0 & 0 \\ 5 & 5 & 5 & 5 \\ 8 & 13 & 8 & 13 \end{array}\right] \end{aligned}$$
(5)

where \(\delta \in R^{ n \times d }\) is the time lag matrix, which records how much time has passed between the last valid observation of each variable and the current time, and M indicates the missing state.
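The recursion in Eq. (5) can be sketched directly; the function below computes the time lag matrix from the sampling times t and the mask M and reproduces the example matrix shown above. Shapes and variable names are illustrative only.

```python
import numpy as np

# Compute the time lag matrix delta of Eq. (5) from timestamps t and mask M
# (1 = observed, 0 = missing), for n time steps and d variables.
def time_lag_matrix(t, M):
    n, d = M.shape
    delta = np.zeros((n, d))
    for i in range(1, n):
        for j in range(d):
            if M[i - 1, j] == 1:
                delta[i, j] = t[i] - t[i - 1]
            else:
                delta[i, j] = delta[i - 1, j] + (t[i] - t[i - 1])
    return delta

t = np.array([0, 5, 13])              # irregular sampling times
M = np.array([[1, 1, 1, 1],
              [1, 0, 1, 0],
              [1, 1, 1, 1]])
print(time_lag_matrix(t, M))
# [[ 0.  0.  0.  0.]
#  [ 5.  5.  5.  5.]
#  [ 8. 13.  8. 13.]]
```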

To account for the decaying impact of earlier observations, we introduce a time decay vector computed from the time lag \(\delta _{ t_{i} }\), which captures the interactions between variables:

$$\begin{aligned} \beta _{t_i}=1 / e^{\max \left( 0, W_\beta \delta _{t_i}+b_\beta \right) } \end{aligned}$$
(6)

In this model, the time decay vector, denoted as \(\beta\), is a vector with elements ranging from 0 to 1. The larger the parameter \(\delta\), the smaller the decay vector \(\beta\). The parameters \(W _{\beta }\) and \(b _{\beta }\) need to be learned, and \(W _{\beta }\) is a fully connected matrix.

The GRU hidden state is updated by multiplying it element-wise with the decay factor once we have obtained the decay vector. We also apply batch normalization to ensure that the hidden state \(k\) remains below 1.

Based on the multiplication decay approach, the update equation for GRUI is as follows:

$$\begin{aligned} k_{{t_{i - 1}}}^\prime&= {\beta _{{t_i}}} \times {k_{{t_{i - 1}}}}, \\ {\varepsilon _{{t_i}}}&= \delta \left( {{W_\varepsilon }\left[ {k_{{t_{i - 1}}}^\prime ,{a_{{t_i}}}} \right] + {b_\varepsilon }} \right) , \\ {r_{{t_i}}}&= \delta \left( {{W_r}\left[ {k_{{t_{i - 1}}}^\prime ,{a_{{t_i}}}} \right] + {b_r}} \right) , \\ {{\tilde{k}}_{{t_i}}}&= \tanh \left( {{W_{\tilde{k}}}\left[ {{r_{{t_i}}} \times k_{{t_{i - 1}}}^\prime ,{a_{{t_i}}}} \right] + {b_{\tilde{k}}}} \right) ,\quad {k_{{t_i}}} = \left( {1 - {\varepsilon _{{t_i}}}} \right) \times k_{{t_{i - 1}}}^\prime + {\varepsilon _{{t_i}}} \times {{\tilde{k}}_{{t_i}}} \end{aligned}$$
(7)

where \(\varepsilon\) represents the update gate, \(r\) represents the reset gate, \({\tilde{k}}\) represents the candidate hidden state, and \(\delta\) represents the sigmoid activation function. The other variables are trainable parameters.
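The following is a compact NumPy sketch of one GRUI step, combining the decay of Eq. (6) with the gate updates of Eq. (7). All weight matrices are random stand-ins packed per gate for readability; a real implementation would learn them during training.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def grui_step(k_prev, a_t, delta_t, W_beta, b_beta, W_eps, b_eps, W_r, b_r, W_k, b_k):
    beta = 1.0 / np.exp(np.maximum(0.0, W_beta @ delta_t + b_beta))  # Eq. (6)
    k_decayed = beta * k_prev                                        # decay old hidden state
    z = np.concatenate([k_decayed, a_t])
    eps = sigmoid(W_eps @ z + b_eps)                                 # update gate
    r = sigmoid(W_r @ z + b_r)                                       # reset gate
    z_tilde = np.concatenate([r * k_decayed, a_t])
    k_tilde = np.tanh(W_k @ z_tilde + b_k)                           # candidate state
    return (1.0 - eps) * k_decayed + eps * k_tilde                   # new hidden state

h, d = 4, 3                                    # hidden size, number of variables
rng = np.random.default_rng(0)
shapes = [(h, d), (h,), (h, h + d), (h,), (h, h + d), (h,), (h, h + d), (h,)]
params = [rng.standard_normal(s) * 0.1 for s in shapes]
k = grui_step(np.zeros(h), rng.standard_normal(d), np.ones(d), *params)
print(k)
```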

When the fake time series produced by the generator is fed in, each row of its \(\delta\) matrix has the same value, since the generated series contains no missing values. The generator likewise consists of a GRUI layer and a fully connected layer, so that the produced sample has the same time-latency structure as the original sample. Since the generator is a self-feeding network, its most recent output is fed into the same cell at the next iteration. The random noise vector \({ g }\) is the generator's initial input, and each row of the fake sample's \(\delta\) matrix has a constant value.

The results generated by the generator, along with the mask matrix, are simultaneously fed into the discriminator for training. The discriminator assesses the probabilities of a given position in the matrix representing either a real value or a missing value. Additionally, it is possible to incorporate weighted position losses for sensitive missing and non-missing positions in the input matrix. These weighted losses are used to calculate the overall loss, along with the inclusion of well-suited regularization terms.

The first component of the discriminator is a GRUI layer that processes partial or complete time series. The final hidden state of the GRUI is then passed to a fully connected layer, to which dropout is applied to avoid overfitting.

By updating the model, the error between the data generated from the random noise vector and the original two-dimensional matrix is gradually reduced. Two losses are defined to obtain the optimal generation model, and an adaptive regression measurement is used to calculate them. First, the masked reconstruction loss is defined as the squared error between the non-missing entries of the original incomplete frequency–time matrix and the corresponding entries of the generated matrix:

$$\begin{aligned} {L_r}(g) = ||a \times M - G(g) \times M|{|_2} \end{aligned}$$
(8)

On the other hand, the discriminator loss is the loss obtained when the generated samples are fed into the discriminator; it is the negative of the discriminator score and reflects the authenticity of the generated sample G(g):

$$\begin{aligned} L_d(g)=-D(G(g)) \end{aligned}$$
(9)

By combining the aforementioned reconstruction loss and discriminator loss, we obtain the loss function for this framework:

$$\begin{aligned} L_{\text {imputation }}(g)=L_r(g)+\lambda L_d(g) \end{aligned}$$
(10)

For each sample, we draw a noise vector from a Gaussian distribution and feed it into the trained generator G. We then use backpropagation to optimize this noise vector under the above loss. After the loss converges to the optimal solution, the intact matrix is obtained according to the following formula:

$$\begin{aligned} a_{\text {imputed }}=x \times M+(1-M) \times G(g) \end{aligned}$$
(11)
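Below is a minimal sketch of Eqs. (8)-(11): the masked reconstruction loss, the adversarial term, their weighted combination, and the final filling step. G_g and d_fake are placeholder arrays standing in for the generator output G(g) and the discriminator score D(G(g)); lambda_ corresponds to the weighting factor in Eq. (10).

```python
import numpy as np

def imputation_loss(a, G_g, M, d_fake, lambda_=0.05):
    L_r = np.linalg.norm(a * M - G_g * M)   # Eq. (8): error on observed entries
    L_d = -np.mean(d_fake)                  # Eq. (9): adversarial term
    return L_r + lambda_ * L_d              # Eq. (10)

def fill(x, M, G_g):
    return x * M + (1.0 - M) * G_g          # Eq. (11): keep observed, fill missing

a = np.array([[3.0, 10.0, 0.0], [0.0, 3.0, 4.0]])
M = np.array([[1.0, 1.0, 0.0], [0.0, 1.0, 1.0]])
G_g = np.array([[3.2, 9.5, 6.1], [2.8, 3.1, 4.2]])
print(imputation_loss(a, G_g, M, d_fake=np.array([0.3])))
print(fill(a, M, G_g))
```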

4 Experiments

4.1 Dataset description

The electromagnetic spectrum environment dataset collected in this study is named the "Qingdao Offshore Measurement Dataset." It was collected simultaneously by monitoring devices at three different geographical locations; data from the different spatial locations share the same acquisition start and end times and the same time- and frequency-domain resolution. Specifically, we choose the Qingdao offshore measurement data with a frequency resolution of 250 kHz and select the GSM1800 downlink frequency band, whose time span is 1650 min. The measured points are averaged over time to obtain a 9900 × 300 two-dimensional time–frequency matrix. This matrix is divided into a training set and a test set with an 8:2 ratio along the time dimension. Then, 400 s of data are taken in turn from the training and test sets to form training and test samples, each of size 40 × 300.
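The split and windowing described above can be sketched as follows; the random array is only a placeholder for the measured 9900 × 300 time–frequency matrix, and the window length of 40 rows corresponds to one 400 s sample.

```python
import numpy as np

data = np.random.rand(9900, 300)          # placeholder for the measured matrix
split = int(0.8 * data.shape[0])          # 8:2 split along the time dimension
train, test = data[:split], data[split:]

def to_samples(x, window=40):
    # Cut consecutive, non-overlapping 40 x 300 samples from a 2D matrix.
    n = x.shape[0] // window
    return x[:n * window].reshape(n, window, x.shape[1])

print(to_samples(train).shape, to_samples(test).shape)   # (198, 40, 300) (49, 40, 300)
```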

4.2 Training settings

The discriminator mainly consists of a recurrent neural network layer and a fully connected layer. Before training begins, the data are normalized, and the column means and variances are extracted and saved. The incomplete frequency–time matrix x, the generated complete frequency–time matrix G(g) and the corresponding \(\delta\) are fed into the recurrent layer. The first input to the generator is the random noise \({ g }\), whose dimension is set to 256. Before GAN training, the generator is pre-trained for a few epochs (set to 10 here), with the loss used to update the pre-trained model to predict the next value in the matrix. For the Qingdao offshore monitoring dataset, the input frequency–time matrix size is 40 × 300, the batch size is 32, and the GRUI in both the generator and the discriminator has 64 hidden units. The model was implemented in TensorFlow and optimized with the Adam optimizer using the following hyperparameters: 40 training epochs (including pre-training), a learning rate of 0.002 and a generator loss factor of 0.05. The experiments were run on Windows 11 with PyCharm as the editor; the main libraries include TensorFlow and math, and the hardware consists mainly of an NVIDIA RTX 3060 GPU and an AMD R7-5800 CPU.
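The hyperparameters listed above can be collected in a single configuration sketch; any value not stated in the text (for example, the Adam momentum terms) is deliberately omitted, since including it would be an assumption.

```python
# Training configuration as described in Sect. 4.2 (illustrative sketch only).
config = {
    "input_shape": (40, 300),   # frequency-time matrix per sample
    "noise_dim": 256,           # dimension of the random noise g
    "hidden_units": 64,         # GRUI hidden size in generator and discriminator
    "batch_size": 32,
    "pretrain_epochs": 10,      # generator pre-training epochs
    "total_epochs": 40,         # including pre-training
    "learning_rate": 0.002,     # Adam optimizer
    "lambda_d": 0.05,           # generator loss factor in Eq. (10)
}
print(config)
```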

4.3 Comparison method

Tensor-based statistical computing methods are common traditional approaches to this kind of problem. We adopt two low-rank tensor completion models [36], simple low-rank tensor completion (SiLRTC) [34] and fast low-rank tensor completion (FaLRTC) [35], as comparisons. In addition, on the deep learning side, we choose a missing-data completion algorithm based on stacked autoencoders (SAEs) [37] as a comparison.

A tensor is essentially a storage form for data elements, a multi-dimensional extension of vectors and matrices. Following the rank-minimization framework of low-rank matrix completion, the tensor rank is represented by the tensor nuclear norm, yielding the low-rank tensor completion optimization problem. On this basis, the SiLRTC model solves a relaxed nuclear norm minimization problem with a block coordinate descent algorithm, while the FaLRTC model converts the nuclear norm minimization into a smoothed problem and solves it with an efficient algorithm to improve the convergence speed of low-rank tensor completion.

The autoencoder maps m observations x and a latent variable z to each other through nonlinear functions, where the latent variable z is a low-dimensional representation of the matrix, and generates the missing completions of the input data in the reconstruction stage. When approximating nonlinear functions, multi-hidden-layer neural networks are more effective than single-hidden-layer networks, so stacked autoencoders form a deep autoencoder network for incomplete data, trained in a greedy layer-wise manner. To complete the missing data, X is compressed to Z through a multi-stage nonlinear mapping and Z is mapped back to X through another multi-stage nonlinear mapping, while the missing entries of X and the network parameters are jointly optimized to minimize the reconstruction error.

4.4 Result

For the evaluation indicators, we selected a series of classic performance metrics, including mean square error (MSE), root-mean-square error (RMSE), mean absolute error (MAE), mean absolute percentage error (MAPE), normalized mean square error (NMSE) and the time required for data completion (in seconds). By integrating multiple indicators, the strengths and weaknesses of the comparison methods and the proposed method are evaluated comprehensively.
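For clarity, the sketch below computes these error metrics for a pair of true/predicted vectors; whether the evaluation is restricted to the imputed (missing) entries is not specified here, so the inputs are generic placeholders.

```python
import numpy as np

def metrics(y_true, y_pred):
    err = y_true - y_pred
    mse = np.mean(err ** 2)
    return {
        "MSE": mse,
        "RMSE": np.sqrt(mse),
        "MAE": np.mean(np.abs(err)),
        "MAPE": np.mean(np.abs(err / y_true)) * 100.0,   # in percent
        "NMSE": mse / np.mean(y_true ** 2),
    }

y_true = np.array([3.0, 10.0, 9.0, 4.0])
y_pred = np.array([3.2, 9.5, 8.7, 4.1])
print(metrics(y_true, y_pred))
```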

Fig. 3

Error curves for different indicators with missing rates ranging from 20 to 80%

Figure 3a shows the MAE results of the electromagnetic environment data completion experiment for the proposed method and the baseline methods under missing percentages of 20–80%. MAE is the average of the absolute errors between the actual observations and the model predictions. The completed spectral data output by the MTS-GAN model have the smallest MAE with respect to the original complete data. As the proportion of missing elements increases from 20 to 80%, the MAE of each model generally increases, yet the proposed method maintains stable reconstruction performance.

Figure 3b shows the RMSE results of the electromagnetic environment data completion experiment. RMSE is the square root of the MSE and is more sensitive to larger errors. As the missing ratio increases, the RMSE of MTS-GAN increases by 0.1014, that of SAEs by 0.1039, that of FaLRTC by 0.9196, and that of SiLRTC by 0.2621. Under the same missing degree, the maximum error of MTS-GAN data completion is at least 49% lower than that of SAEs, at least 58% lower than that of FaLRTC, and at least 19% lower than that of SiLRTC. This shows that MTS-GAN captures the time–frequency correlation of the multivariate time series in the electromagnetic environment data, so the completion of most entries does not deviate significantly.

Figure 3c shows the MSE results of the electromagnetic environment data completion experiment. MSE is the mean of the squared errors between the actual observations and the model predictions. In the low-missing case (40% missing), the MSE of the proposed method is 0.10105; at higher missing rates it rises only slightly to 0.14516, which is less than 43% of that of the other comparison methods. The proposed MTS-GAN method therefore shows clear robustness in electromagnetic environment data completion.

Figure 3d shows the MAPE results of the electromagnetic environment data completion experiment. MAPE is the average of the percentage errors between the actual observations and the model predictions; expressing the error as a percentage normalizes it for a uniform comparison across models. The MAPE of the proposed MTS-GAN method is at most 0.15425%, an improvement of 66% over the baseline methods. Therefore, considering the average error, maximum error and normalized error, the MTS-GAN method has the highest reconstruction accuracy.

The proposed method showed excellent performance on completion error indicators such as RMSE and MAE under different missing rates and achieved the best completion and reconstruction of electromagnetic spectrum data. As the missing rate increased, it still maintained significant accuracy, showing strong reconstruction ability and robustness even with severely missing data. The trend curves of each evaluation indicator further confirm that the approach handles the challenge of missing-data completion for large-scale electromagnetic environment data well.

Fig. 4

Comparison of the time required to complete missing data using different methods

Figure 4 shows the time required by our method and the comparison methods to complete the electromagnetic environment data at different missing rates. The completion times of the traditional tensor-based methods are about 120 s and 160 s, respectively, much higher than those of the machine-learning-based methods, and they vary considerably across different missing patterns due to data differences. Among the machine learning methods, the time required by our method remains around 24.5 s with small fluctuation, whereas the stacked autoencoder takes longer and is also somewhat unstable. These results show that our method achieves fast data reconstruction and completion and improves the efficiency of the completion algorithm compared with previous methods.

The above experimental results show that this study not only proposes an electromagnetic environment data completion method for practical scenarios, but also compares it with the tensor-based statistical calculation methods and the stacked autoencoder machine learning method. Throughout, a comprehensive evaluation was adopted to ensure the reliability and robustness of the results, fully demonstrating the effectiveness and superiority of our method.

4.5 Ablation experiment

An ablation experiment was then set up to evaluate the effect of the time attenuation factor in the proposed improved GRUI unit on high-precision reconstruction performance. First, we replace the recurrent units in the generator and discriminator with two basic cells, LSTMCell and GRUCell, and build time series recurrent neural networks as comparative baselines. The GRUICell proposed in this paper builds on GRUCell by introducing a time attenuation factor that accounts for the temporal distribution of missing data. We compare the high-precision reconstruction performance of the improved GRUI unit with that of the basic RNN units, as well as the effectiveness of extracting data distribution features from irregular time series. All other parameter settings remain the same in the comparison.

Figure 5 shows the MAPE, NMSE and MSE results of the generator and discriminator experiments for the three different RNN units. The errors obtained with the LSTMCell and GRUCell units are essentially the same under different missing rates, indicating that the feature extraction performance of gated recurrent units is similar to that of long short-term memory networks when the data volume is relatively small. The error of the MTS-GAN-based high-precision reconstruction method proposed in this paper is clearly better than the other two, and the reconstruction error of the GRUICell is always lower than that of the first two methods at the same missing rate. The mean MAPE error decreases by 0.10687 compared with the basic cells, an improvement of 8.6% over the baseline; the average NMSE decreases by 1.043167e\(-\)6, an improvement of 10%; and the MSE decreases by 0.017764 on average, an improvement of 16.5%. Therefore, considering the average error, maximum error and normalized error, the MTS-GAN-based high-precision reconstruction method has the highest reconstruction accuracy, fully indicating that the improved gated recurrent unit better supports feature extraction in the generative adversarial network under irregularly missing data and achieves higher reconstruction accuracy.

Fig. 5

Error curves for different indicators in the ablation experiment with missing rates ranging from 20 to 80%

The above experimental results show that this study not only proposes a high-precision reconstruction method for electromagnetic environment data in practical scenarios, but also compares it with the tensor-based statistical calculation methods and the stacked autoencoder machine learning method, using a comprehensive evaluation to ensure the reliability and robustness of the results and fully demonstrating the effectiveness and superiority of the method. The ablation experiment on the proposed improved GRUI unit further examines the effect of the time attenuation factor on high-precision reconstruction. Across different missing rates, the reconstruction error of the GRUICell is always lower than that of the basic cell methods, proving that the proposed improvement effectively increases reconstruction precision.

5 Conclusion

In this paper, MTS-GAN was proposed to achieve fast and high-precision reconstruction of electromagnetic spectrum data. To handle the discontinuity of time series, an RNN model with an improved gated recurrent unit was proposed to extract the distribution characteristics of time–frequency signal data more accurately and to reduce the influence of random losses in the time series. Reconstruction experiments on measured spectrum data with 20–80% missing entries showed that, compared with the tensor completion and SAEs methods, the error of the proposed MTS-GAN method was reduced by about 19%. In addition, the average running time of MTS-GAN was 24.5 s, which is 20.4% of that of the tensor completion method and 15.3% of that of the SAEs method, improving the efficiency of reconstruction in SAGIoT. In future work, we will further refine the network architecture and investigate more efficient training algorithms and better hyperparameter tuning to enhance model performance.

Availability of data and materials

Please contact author for data requests.

References

  1. X. Liu, B. Lai, B. Lin, V.C.M. Leung, Joint communication and trajectory optimization for multi-UAV enabled mobile internet of vehicles. IEEE Trans. Intell. Transp. Syst. 23(9), 15354–15366 (2022)

  2. X. Liu, Q. Sun, W. Lu, C. Wu, H. Ding, Big-data-based intelligent spectrum sensing for heterogeneous spectrum communications in 5G. IEEE Wirel. Commun. 27(5), 67–73 (2020)

  3. T. Ya, L. Yun, Z. Haoran, J. Zhang et al., Large-scale real-world radio signal recognition with deep learning. Chin. J. Aeronaut. 35(9), 35–48 (2022)

  4. C. Hou, G. Liu, Q. Tian, Z. Zhou et al., Multisignal modulation classification using sliding window detection and complex convolutional network in frequency domain. IEEE Internet Things J. 9(19), 19438–19449 (2022)

  5. X. Liu, H. Ding, S. Hu, Uplink resource allocation for NOMA-based hybrid spectrum access in 6G-enabled cognitive Internet of Things. IEEE Internet Things J. 8(20), 15049–15058 (2021)

  6. J. Tang, F.P. Tian, W. Feng et al., Learning guided convolutional network for depth completion. IEEE Trans. Image Process. 30, 1116–1129 (2020)

  7. Z. Bao, Y. Lin, S. Zhang, Z. Li et al., Threat of adversarial attacks on DL-based IoT device identification. IEEE Internet Things J. 9(11), 9012–9024 (2021)

  8. M.T. Sattari, K. Falsafian, A. Irvem et al., Potential of kernel and tree-based machine-learning models for estimating missing data of rainfall. Eng. Appl. Comput. Fluid Mech. 14(1), 1078–1094 (2020)

  9. Y. Lin, Y. Tu, Z. Dou et al., Contour Stella image and deep learning for signal recognition in the physical layer. IEEE Trans. Cogn. Commun. Netw. 7(1), 34–46 (2020)

  10. A. Eldesokey, M. Felsberg, F.S. Khan, Confidence propagation through CNNs for guided sparse depth regression. IEEE Trans. Pattern Anal. Mach. Intell. 42(10), 2423–2436 (2019)

  11. Y. Lin, H. Zhao, X. Ma, Y. Tu, M. Wang, Adversarial attacks in modulation recognition with convolutional neural networks. IEEE Trans. Reliab. 70(1), 389–401 (2020)

  12. S. Zhao, M. Gong, H. Fu et al., Adaptive context-aware multi-modal network for depth completion. IEEE Trans. Image Process. 30, 5264–5276 (2021)

  13. Y. Lin, Y. Tu, Z. Dou, An improved neural network pruning technology for automatic modulation classification in edge devices. IEEE Trans. Veh. Technol. 69(5), 5703–5706 (2020)

  14. T. Yokota, Q. Zhao, A. Cichocki, Smooth PARAFAC decomposition for tensor completion. IEEE Trans. Signal Process. 64(20), 5423–5436 (2016)

  15. Y. Lin, H. Zha, Y. Tu, S. Zhang, W. Yan, C. Xu, GLR-SEI: green and low resource specific emitter identification based on complex networks and Fisher pruning. IEEE Trans. Emerg. Top. Comput. Intell. (2023)

  16. Y. Qiu, G. Zhou, Q. Zhao et al., Noisy tensor completion via low-rank tensor ring. IEEE Trans. Neural Netw. Learn. Syst. (2022)

  17. S. Iizuka, E. Simo-Serra, H. Ishikawa, Globally and locally consistent image completion. ACM Trans. Graph. (ToG) 36(4), 1–14 (2017)

  18. K. Ehsani, R. Mottaghi, A. Farhadi, SeGAN: segmenting and generating the invisible, in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (2018), pp. 6144–6153

  19. H. Long, W. Xiang, J. Wang, Y. Zhang, W. Wang, Cooperative jamming and power allocation with untrusty two-way relay nodes. IET Commun. 8(13), 2290–2297 (2014)

  20. H. Dhamo, K. Tateno, I. Laina et al., Peeking behind objects: layered depth prediction from a single image. Pattern Recogn. Lett. 125, 333–340 (2019)

  21. Z. Mao, W. Xiang, H. Long, W. Wang, Proportional fair resource partition for LTE-Advanced networks with type I relay nodes, in Proceedings of IEEE International Conference on Communications (ICC) (2011), pp. 1–5

  22. A. Kortylewski, J. He, Q. Liu et al., Compositional convolutional neural networks: a deep architecture with innate robustness to partial occlusion, in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (2020), pp. 8940–8949

  23. J. Fu, C. Hou, W. Xiang, L. Yan, Y. Hou, Generalised spatial modulation with multiple active transmit antennas, in Proceedings of IEEE Globecom, Miami (2010), pp. 839–844

  24. C. Zheng, D.S. Dao, G. Song et al., Visiting the invisible: layer-by-layer completed scene decomposition. Int. J. Comput. Vision 129, 3195–3215 (2021)

  25. W. Xiang, N. Wang, Y. Zhou, An energy-efficient routing algorithm for software-defined wireless sensor network. IEEE Sens. J. 16(20), 7393–7400 (2016)

  26. Q. Zhou, S. Wang, Y. Wang et al., Human de-occlusion: invisible perception and recovery for humans, in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (2021), pp. 3691–3701

  27. A. Frühstück, K.K. Singh, E. Shechtman et al., InsetGAN for full-body image generation, in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (2022), pp. 7723–7732

  28. L. Sun, Y. Lin, Spectrum completion based on HaLRTC, in 2022 9th International Conference on Dependable Systems and Their Applications (2022), pp. 986–992

  29. J. Xue, Y. Zhao, S. Huang et al., Multilayer sparsity-based tensor decomposition for low-rank tensor completion. IEEE Trans. Neural Netw. Learn. Syst. 33(11), 6916–6930 (2021)

  30. J. Sun, J. Wang, G. Ding et al., Long-term spectrum state prediction: an image inference perspective. IEEE Access 6, 43489–43498 (2018)

  31. P. Chauhan, S.K. Deka, B.C. Chatterjee et al., Cooperative spectrum prediction-driven sensing for energy constrained cognitive radio networks. IEEE Access 9, 26107–26118 (2021)

  32. G. Ding, F. Wu, Q. Wu et al., Robust online spectrum prediction with incomplete and corrupted historical observations. IEEE Trans. Veh. Technol. 66(9), 8022–8036 (2017)

  33. F. Shen, Z. Wang, G. Ding et al., 3D compressed spectrum mapping with sampling locations optimization in spectrum-heterogeneous environment. IEEE Trans. Wirel. Commun. 21(1), 326–338 (2021)

  34. J. Liu, P. Musialski, P. Wonka et al., Tensor completion for estimating missing values in visual data. IEEE Trans. Pattern Anal. Mach. Intell. 35(1), 208–220 (2012)

  35. A. Chen, Z. Xu, A. Geiger et al., TensoRF: tensorial radiance fields, in European Conference on Computer Vision (Springer, Cham, 2022), pp. 333–350

  36. Y. Panagakis, J. Kossaifi, G.G. Chrysos et al., Tensor methods in computer vision and deep learning. Proc. IEEE 109(5), 863–890 (2021)

  37. J. Fan, T. Chow, Deep learning based matrix completion. Neurocomputing 266, 540–549 (2017)


Acknowledgements

The authors would like to acknowledge the anonymous reviewers and editors of this paper for their valuable comments and suggestions.

Funding

This material is based upon unfunded work.

Author information


Contributions

LG contributed to the design and writing of the study, supervised the study, and advised on and provided input to the revision of the draft manuscript. YL contributed the data. YL provided comments on the manuscript. KY reviewed the revision of the manuscript. All authors have read and approved the manuscript.

Corresponding author

Correspondence to Yuchao Liu.

Ethics declarations

Ethics approval and consent to participate

Not applicable.

Consent for publication

The picture materials quoted in this article have no copyright requirements, and the source has been indicated.

Competing interests

The authors declare no competing interests.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.


About this article


Cite this article

Guo, L., Liu, Y., Li, Y. et al. High-precision reconstruction method based on MTS-GAN for electromagnetic environment data in SAGIoT. EURASIP J. Adv. Signal Process. 2023, 125 (2023). https://doi.org/10.1186/s13634-023-01085-0


  • DOI: https://doi.org/10.1186/s13634-023-01085-0

Keywords