NOVEL HYBRID ALGORITHM USING CONVOLUTIONAL AUTOENCODER WITH SVM FOR ELECTRICAL IMPEDANCE TOMOGRAPHY AND ULTRASOUND COMPUTED TOMOGRAPHY

Abstract: This paper presents a new hybrid algorithm that combines multiple support vector machine (SVM) models with a convolutional autoencoder for image reconstruction in electrical impedance tomography and ultrasound computed tomography. The final hybrid solution uses multiple SVM models to convert input measurements into individual autoencoder codes representing a given scene; the decoder part of the autoencoder then reconstructs the scene.


Introduction
Ultrasound Transmission Tomography (UTT) [11,12,27] is a process that enables, among other things, reconstruction of a scene from ultrasonic signal measurements. Another paper on ultrasound tomography [7] presents an image reconstruction algorithm based on a convolutional neural network (CNN) for imaging two-phase materials, which has also been verified with experimental data. Finally, the paper [4] describes research into applying a dual-domain network to ultrasound tomography.
Electrical Impedance Tomography (EIT) reconstruction [16] consists of reconstructing the conductivity from measurement vectors obtained with electrically conductive electrodes. The best results can be obtained using neural networks (especially deep networks). Several works have described the EIT method extensively, including [2,6,8,9,13,20,21,25].
Several machine-learning methods for EIT reconstruction have been discussed in the literature. In the article [18], the authors explored logistic regression using an elastic net. The paper [16] used neural networks for EIT image reconstruction, where N neural networks, each trained separately for one output pixel, were used to reconstruct the image pixel values. Finally, the paper [15] applies a convolutional neural network structure to EIT reconstruction.
Many ANN-based reconstruction methods rely on deep and convolutional autoencoders. Paper [10] describes a solution that starts from EIT reconstructions obtained with a deterministic algorithm (D-bar) and applies the U-Net convolutional model to correct these initial reconstructions.
Another method based on deep convolutional autoencoders is described in [24]. This algorithm uses a deep autoencoder to reconstruct a lung object based on EIT. The method consists of several steps: 1. The cost function of the autoencoder is minimized on lung images. 2. Fully connected networks are trained on the measurements and the outputs of the encoder. 3. Finally, the two networks are combined: the network from step 2 processes the measurements, and the decoder of the trained autoencoder outputs the reconstruction.
Several numerical methods have also been proposed [5,14,17,19,22,23,24]. In this work, the authors have developed a hybrid algorithm for image reconstruction in impedance tomography and ultrasonic transmission tomography. The solution uses the encoder part of a convolutional autoencoder (AE) and multiple Support Vector Machine [3,26] models to convert EIT or UTT measurements into individual autoencoder codes. In the case of EIT, only synthetic data were used, consisting of generated scenes and EIT simulations. The autoencoder is therefore trained on generated scenes, and the encoder part is used to encode these scenes. In the final solution, the multiple SVM models convert potential difference vectors from EIT simulations into AE codes representing the corresponding scenes. In the case of UTT, the convolutional autoencoder is trained on augmented real-scene images, and the encoder part is then used to encode the real scenes; next, multiple SVM models are trained to convert real UTT measurements into codes representing real scenes. In both cases, the final hybrid algorithm consists of multiple SVM models at the input, whose outputs are sent to the decoder part of the AE, which reconstructs the image of the scene. Such a hybrid algorithm is particularly useful when the training data are too scarce (e.g., when they come from real measurements) to train a pure deep learning solution, as in our UTT experiments.

EIT synthetic data
The model for the synthetic data in Electrical Impedance Tomography was prepared with 16 electrodes placed outside the area. The model uses a finite element mesh built with 4717 elements. The EIT training dataset used in this experiment consists entirely of synthetic data. The data generation algorithm generated 50,000 cases containing inclusions (one or two inclusions of elliptical or rectangular shape). An EIT simulation was performed for each case to generate 192 measurements between electrodes.
To generate the data, we used an EIT simulation algorithm based on the finite element method with square geometry to obtain potential vectors for the generated cases with elliptical and rectangular inclusions. The parameters of the simulation algorithm were adjusted to make the synthetic data as close as possible to real data. Finally, the resulting potential vectors were used to calculate the voltage as a difference of potentials: $U_i = V_i - V_i^{0}$, where $V$ is the potential vector for the case with synthetic inclusions, $V^{0}$ is the potential vector for the empty tank, $U$ is the voltage vector, and $i$ is the index of the potential and voltage vector elements.
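The difference calculation described above can be sketched in a few lines of NumPy. This is only an illustration: the function name, the variable names and the random stand-in vectors are assumptions, and real potential vectors would come from the finite element simulation.

```python
import numpy as np

def voltage_difference(v_inclusion, v_empty):
    """Element-wise difference between the potential vector of a scene
    with inclusions and the potential vector of the empty tank."""
    v_inclusion = np.asarray(v_inclusion, dtype=float)
    v_empty = np.asarray(v_empty, dtype=float)
    if v_inclusion.shape != v_empty.shape:
        raise ValueError("potential vectors must have the same shape")
    return v_inclusion - v_empty

# Hypothetical stand-in data: 192 values per simulation, as in the text.
rng = np.random.default_rng(0)
v_empty = rng.normal(size=192)
v_inclusion = v_empty + rng.normal(scale=0.01, size=192)

u = voltage_difference(v_inclusion, v_empty)
```

The resulting vector plays the role of one input sample X for the regression stage.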
The 50,000 samples generated from reference images (Y) and potential difference vectors (X) were divided into a training dataset (40,000 samples) and a test dataset (10,000 samples).

UTT Input data
The data used in this research was obtained in the laboratory under specific conditions. On the bottom of a polypropylene bucket, 76 positions numbered 1-77 were marked (no. 18 was omitted). The bucket was filled with chlorinated water. Round phantoms in plastic tubes filled with air were placed in marked positions.
An application was written to facilitate data collection. It indicated the positions at which the phantoms had to be placed; these positions were stored in a predefined list. The application also controlled the measurements: when the phantom was placed at the indicated position and the MEASURE button was pressed, the application saved the most recent measurement frame to memory. After the measurement, the application pointed to the next position where the phantom had to be set. In this way, the list of pairs (measurement matrix, position numbers with phantoms) was completed.
Then, using a hand-prepared set of photos (Fig. 2) and the position numbers, the actual positions of the phantoms were reconstructed as binary images. Each image marked with a single inclusion was segmented into a binary image with a circle at that phantom position. The image of the phantom positions for settings with several phantoms was obtained by summing the appropriate segmented position patterns. The resulting image was scaled down to a resolution of 64x64 pixels in the "nearest" interpolation mode.
The active probe of the ultrasonic tomograph performs measurements with a single piezo transducer using the absorption method. This transducer can operate as both a transmitter and a receiver of the ultrasonic wave. Its resonant frequency is 40 kHz. The PCB allows an external transducer to be connected via an SMB socket. The probe has an integrated signal processing circuit and a microcontroller with an integrated A/D converter. Each probe can adjust the gain of the received signal using a programmable digital potentiometer. The transit time of an ultrasonic wave from one probe to another is measured by connecting all probes to an additional communication line. When a low state occurs on this line, all probes except the transmitting probe start timing, and each stops timing when it receives the ultrasound wave. Each receiving probe then sends the measurement result to the tomograph controller. Based on which probe was the transmitter and which probe sent the result, the measurement value is stored in the corresponding cell of the measurement matrix. The probes were designed to be placed very close to each other. The power supply lines, the communication bus and the interrupt lines necessary for correct timing from sending to receiving on the other probes were routed through RJ12 connectors. There were 21 sensors with a frequency of 40 kHz in the bucket area (Fig. 1).
The 2121 size M measurement matrix contains the ToF (Time of Fly) values of the ultrasonic wave between all sensor pairs in microseconds. The diagonal is filled with zeros only to ensure that the value of M i,j stands for the value measured between the i-th and j-th sensors and has no effect on the measurements, i.e., the sensor transmitting the wave does not measure the time.
After obtaining the data was split into training (2617 samples) and test (656 samples) datasets.

Augmentation of images representing scenes from multimodal tomography (UTT)
Because the collected transmission tomography data contain only a few thousand samples, which is too few to train a convolutional autoencoder, the scene image dataset had to be augmented. Augmenting image data is simpler than augmenting tomographic measurements. The augmentation of scene images was realized by three transforms performed on the images (rotation, scaling and translation) using random parameters. During the augmentation, 25,000 images were generated: 20% of the images are empty, and the remaining 80% were generated by random affine transformations with a maximal rotation of 15 degrees, a maximal scale change of 20% and a maximal translation of 10 pixels. After augmentation, the data was split into a training dataset (20,000 images) and a test dataset (5,000 images).
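The augmentation described above can be illustrated with a small NumPy sketch. It is not the implementation used in the experiments: the function names, the uniform sampling of parameters, the nearest-neighbour resampling and the way empty scenes are inserted are all assumptions; only the limits (15 degrees, 20%, 10 pixels, 20% empty images) are taken from the text.

```python
import numpy as np

def random_affine(image, rng, max_rot_deg=15.0, max_scale=0.20, max_shift=10):
    """Random rotation, scaling and translation with nearest-neighbour sampling."""
    h, w = image.shape
    angle = np.deg2rad(rng.uniform(-max_rot_deg, max_rot_deg))
    scale = 1.0 + rng.uniform(-max_scale, max_scale)
    shift = rng.uniform(-max_shift, max_shift, size=2)  # (row, col) in pixels

    # Forward transform about the image centre: y = scale * R (x - c) + c + shift
    rot = np.array([[np.cos(angle), -np.sin(angle)],
                    [np.sin(angle), np.cos(angle)]])
    inv = np.linalg.inv(scale * rot)
    centre = np.array([(h - 1) / 2.0, (w - 1) / 2.0])

    rows, cols = np.meshgrid(np.arange(h), np.arange(w), indexing="ij")
    out_coords = np.stack([rows.ravel(), cols.ravel()], axis=0).astype(float)
    # Inverse mapping: locate the source pixel of every output pixel.
    src = inv @ (out_coords - centre[:, None] - shift[:, None]) + centre[:, None]
    src = np.rint(src).astype(int)

    valid = (src[0] >= 0) & (src[0] < h) & (src[1] >= 0) & (src[1] < w)
    out = np.zeros(h * w, dtype=image.dtype)
    out[valid] = image[src[0, valid], src[1, valid]]
    return out.reshape(h, w)

def augment(images, n_out, rng, empty_fraction=0.20):
    """Return n_out images: a fixed share of empty scenes plus random affine variants."""
    n_empty = int(round(empty_fraction * n_out))
    empty = [np.zeros_like(images[0]) for _ in range(n_empty)]
    warped = [random_affine(images[rng.integers(len(images))], rng)
              for _ in range(n_out - n_empty)]
    return np.stack(empty + warped)

# Hypothetical 64x64 binary scene with a single square "phantom".
rng = np.random.default_rng(0)
scene = np.zeros((64, 64), dtype=np.uint8)
scene[28:36, 28:36] = 1
augmented = augment([scene], n_out=10, rng=rng)
```

Nearest-neighbour sampling keeps the augmented images binary, which matches the binary reference images described earlier.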

Training of convolutional autoencoder for EIT data
The convolutional autoencoder for EIT data was trained using synthetic scenes. Table 1 presents the details of the convolutional autoencoder model structure.
The Reshape layer (row number 18) in this model reshapes the output of the previous layer to 5×5×1024 to provide the proper input shape for the next layer.

Training of convolutional autoencoder for UTT data
The convolutional autoencoder for UTT data was trained on the augmented scene image dataset, which resembles the real scene images. The model architecture is similar to the one used for EIT, but the Dense layer with id 17 has 16384 neurons instead of the 25600 used for EIT. In addition, the Reshape layer (id 18) has a target shape of 4×4×1024 instead of the 5×5×1024 used in the EIT version of the autoencoder.
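The Dense and Reshape sizes quoted for the two autoencoders are mutually consistent, since 5·5·1024 = 25600 and 4·4·1024 = 16384. A small NumPy check, with a hypothetical helper name and zero-filled stand-in vectors, illustrates the reshape step independently of any deep learning framework.

```python
import numpy as np

def dense_to_feature_map(dense_output, side, channels=1024):
    """Reshape a flat Dense-layer output into a (side, side, channels) feature map."""
    dense_output = np.asarray(dense_output)
    expected = side * side * channels
    if dense_output.size != expected:
        raise ValueError(f"expected {expected} values, got {dense_output.size}")
    return dense_output.reshape(side, side, channels)

# Stand-in Dense outputs with the neuron counts given in the text.
eit_map = dense_to_feature_map(np.zeros(25600), side=5)  # EIT variant
utt_map = dense_to_feature_map(np.zeros(16384), side=4)  # UTT variant
```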

Fig. 4. Sample results of encoding UTT reference images by the convolutional autoencoder. In each triplet, the reference image is on the left, the autoencoder output is in the centre, and the thresholded autoencoder output is on the right
After training, a Mean Absolute Error (MAE) of 0.000826 and a DICE of 95.37% were achieved on the training dataset, while an MAE of 0.0011 and a DICE of 93.12% were achieved on the test dataset. Then, the encoder part was extracted from the autoencoder pre-trained in the previous step, and the scene images from the real dataset were encoded. The resulting code vectors, each containing 512 elements, are used in the SVM training process. Figure 3 shows 10 sample results of EIT reference scene image restoration. Each sample contains a reference image, the restored reference image and the thresholded restoration result. These samples were generated on the test dataset. The outputs are thresholded because the dataset contains empty scenes, so thresholding was necessary to visualize the output properly.
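For readers unfamiliar with the two metrics, a minimal NumPy sketch of MAE and a thresholded DICE coefficient is given here. The threshold of 0.5 and the convention of returning 1.0 when both images are empty are assumptions; the text does not state which values were used.

```python
import numpy as np

def mae(reference, prediction):
    """Mean absolute error between two images."""
    ref = np.asarray(reference, dtype=float)
    pred = np.asarray(prediction, dtype=float)
    return float(np.mean(np.abs(ref - pred)))

def dice(reference, prediction, threshold=0.5):
    """DICE coefficient between a reference and a thresholded prediction."""
    ref = np.asarray(reference) > threshold
    pred = np.asarray(prediction) > threshold
    total = ref.sum() + pred.sum()
    if total == 0:
        return 1.0  # both images empty: treated as perfect agreement (assumption)
    intersection = np.logical_and(ref, pred).sum()
    return float(2.0 * intersection / total)

# Hypothetical 4x4 example: a 2x2 object and one false-positive pixel.
reference = np.zeros((4, 4))
reference[1:3, 1:3] = 1.0
prediction = 0.9 * reference
prediction[0, 0] = 0.8

mae_value = mae(reference, prediction)
dice_value = dice(reference, prediction)
```

In this toy case the thresholded prediction has five foreground pixels, four of which overlap the reference, so the DICE value is 8/9.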

Training of multiple SVM models for regression of codes representing EIT and UTT scenes
The final training stage of the solution uses Support Vector Machines to regress the codes of scene images from Electrical Impedance Tomography differential potential vectors. The same training stage is performed for Ultrasound Transmission Tomography measurements.
For each of the 512 codes, a separate SVM model is trained to regress that code from the input measurements.
Fig. 5 to Fig. 8 present the four stages of the algorithm. Fig. 5 presents the training of the convolutional autoencoder; the input and output of this autoencoder are the same reference image. As a side effect of learning to reproduce the reference images, the autoencoder produces a compressed vector representation of each image in its latent space. This representation can be regressed more easily than the original image. Fig. 6 presents the encoding of all reference images using the encoder part extracted from the full convolutional autoencoder.

Fig. 6. Reference images encoding (conversion of all reference images into vector
Fig. 7 presents the training of the 512 SVM models, each of which regresses one of the 512 autoencoder codes from the input measurement vector (EIT or UTT).
Fig. 8 presents the reconstruction process. First, the input measurement vector is fed as the same input to each of the 512 SVM models. Then, each i-th SVM model predicts the i-th value of the autoencoder code vector. Finally, the code vector is used as input to the extracted decoder part of the convolutional autoencoder, which predicts the output image (the final reconstruction).
During the creation of the solution, the epsilon value of the SVM models was optimized, and the best result was obtained for an epsilon value of 0.01 for both EIT and UTT data. Epsilon is an SVM hyperparameter that defines the width of the margin used for regression: training tries to fit as many samples as possible inside one "street" with as few margin violations as possible. Smaller epsilon values were not tested because of the long training time and higher memory usage.
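The role of epsilon can be illustrated with the standard epsilon-insensitive loss used in support vector regression: residuals inside the "street" cost nothing, and residuals outside it are penalized linearly. The sketch uses hypothetical target and prediction values.

```python
import numpy as np

def epsilon_insensitive_loss(y_true, y_pred, epsilon=0.01):
    """Zero inside the epsilon tube, linear in the excess outside it."""
    residual = np.abs(np.asarray(y_true, dtype=float) - np.asarray(y_pred, dtype=float))
    return np.maximum(0.0, residual - epsilon)

# Hypothetical code values: residuals of 0.005, 0.02 and 0.1.
y_true = np.array([0.50, 0.50, 0.50])
y_pred = np.array([0.505, 0.52, 0.40])

loss = epsilon_insensitive_loss(y_true, y_pred)
# The first sample lies inside the tube, so its loss is zero;
# the other two are penalized by their distance beyond epsilon.
```

A smaller epsilon narrows the tube, so more training samples become support vectors, which is consistent with the longer training time and higher memory usage mentioned above.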

Using a decoder with SVM models for transmission tomography image reconstruction for EIT and UTT
The solution for scene image reconstruction has the same form for EIT and UTT. After the autoencoder is pretrained, the decoder part is extracted. The reconstruction process for EIT and UTT has the following form:
1. The multiple SVM models (trained earlier) regress the individual autoencoder codes from the input measurement (EIT or UTT); each i-th SVM model regresses the i-th code in the autoencoder latent space.
2. The vector of codes generated by the SVM models is used as input to the decoder extracted from the pretrained autoencoder, which reconstructs the scene image (for EIT or UTT).
3. The settings of the epsilon parameter are described in paragraphs 9 and 10. The other settings of a single SVM model are the following:
a) The input is the vector of EIT or UTT measurements.
b) The output is a single regressed value representing one autoencoder code.
c) The kernel type is RBF.
d) The gamma parameter is calculated as $\gamma = 1 / (n \cdot \sigma^2)$, where $n$ is the number of features and $\sigma^2$ is the input data variance.
e) The tolerance parameter is set to 0.001.
f) The C parameter is set to 1.
g) There is no iteration limit.
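The settings listed above map naturally onto scikit-learn's `SVR` class. A minimal sketch of the per-code regression stage follows, assuming `gamma="scale"` (scikit-learn's option that computes gamma from the number of features and the input variance) and `max_iter=-1` for no iteration limit. The toy data, the reduced code length of 4 instead of 512, and the helper names are assumptions for illustration only.

```python
import numpy as np
from sklearn.svm import SVR

def train_code_regressors(X, codes, epsilon=0.01):
    """Fit one RBF support vector regressor per latent code element."""
    models = []
    for i in range(codes.shape[1]):
        model = SVR(kernel="rbf", gamma="scale", C=1.0,
                    epsilon=epsilon, tol=1e-3, max_iter=-1)
        model.fit(X, codes[:, i])
        models.append(model)
    return models

def predict_codes(models, X):
    """Stack the predictions of all regressors into one code vector per sample."""
    return np.column_stack([m.predict(X) for m in models])

# Hypothetical toy data: 60 measurement vectors with 12 features each,
# and 4 code values per sample standing in for the 512 autoencoder codes.
rng = np.random.default_rng(0)
X = rng.normal(size=(60, 12))
codes = np.tanh(X @ rng.normal(size=(12, 4)))

models = train_code_regressors(X, codes)
predicted_codes = predict_codes(models, X)

# Value that gamma="scale" corresponds to: 1 / (n_features * variance of X).
gamma_scale = 1.0 / (X.shape[1] * X.var())
```

In the full pipeline, `predicted_codes` would be passed to the extracted decoder of the autoencoder to obtain the reconstructed image; the decoder is omitted here to keep the sketch independent of a deep learning framework.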

EIT reconstruction results
The numerical results for the final EIT reconstruction solution are presented in table 4. Figure 9 shows 12 pairs of original and reconstructed scenes obtained with the best SVM models (no. 4 in table 2). The table presents the results of four models, identified by the "models no." column. The second column, "epsilon," gives the hyperparameter value used in the models. Finally, the "DICE train" and "DICE test" columns show the DICE coefficient, a metric used to evaluate the performance of image reconstruction models, for the training and test sets, respectively.
As the epsilon value decreases (from 0.2 to 0.01), the DICE coefficient increases for both the training and test sets. This suggests that as epsilon becomes smaller, the models become more precise in their reconstruction.

UTT reconstruction results
The numerical results for the final UTT reconstruction solution were also computed using SVM models trained with epsilon equal to 0.01. The MSE, MAE and DICE metrics are presented in table 5, with one row per dataset, identified by the "dataset" column. The "MSE" and "MAE" columns show the mean squared error and the mean absolute error, two metrics used to evaluate regression models. The "DICE" column shows the DICE coefficient, a metric used to evaluate image reconstruction models.
The results show that the model performed better on the training dataset than on the test dataset: the MSE and MAE values are lower and the DICE value is higher for the training dataset.
On the training dataset, the MSE is 0.01337, the MAE is 0.01582, and the DICE coefficient is 51.536%. On the test dataset, the MSE is 0.01404, the MAE is 0.01647, and the DICE coefficient is 47.556%. Figure 10 presents 8 sample results of Transmission Tomography reconstruction. Columns 1 and 4 show the original scene image, columns 2 and 5 show the reconstructed scene, and columns 3 and 6 show the thresholded results.

Conclusion
This paper presents a new hybrid algorithm for EIT and UTT scene reconstruction in which all models are trained separately. The algorithm consists of multiple SVM models that regress the individual codes of a convolutional autoencoder from EIT or UTT measurements. Since the convolutional autoencoder can be trained on augmented image data, this hybrid algorithm is well suited to real datasets, where data are often limited (as in our UTT experiments).
This is our initial research on EIT and UTT scene reconstruction, so the experimental setup is relatively simple. However, we are planning to use this hybrid method for medical applications. Since convolutional autoencoders are capable of reconstructing complex shapes, we expect this hybrid solution to be useful for realistic medical application setups as well.
The models were implemented in Python with the GPU version of TensorFlow (for the convolutional autoencoder) and scikit-learn (for the SVM models).