Underwater radio frequency image sensor using progressive image compression and region of interest

The increasing demand for underwater robotic intervention systems in several application domains around the world requires more versatile and inexpensive systems. By using a wireless communication system, supervised semi-autonomous robots gain freedom of movement; however, the limited and varying bandwidth of underwater radio frequency (RF) channels is a major obstacle for the operator to get camera feedback and supervise the intervention. This paper proposes the use of progressive (embedded) image compression and regions of interest (ROI) for the design of an underwater image sensor to be installed in an autonomous underwater vehicle, especially when there are constraints on the available bandwidth, allowing a more agile data exchange between the vehicle and a human operator supervising the underwater intervention. The operator can dynamically decide the size, quality, frame rate, or resolution of the received images so that the available bandwidth is used to its fullest potential with the minimum required latency. The paper first describes the system, which uses a camera, an embedded Linux system, and an RF emitter installed in an OpenROV housing cylinder. The RF receiver is connected to a computer on the user side, which controls the camera monitoring parameters, including the compression inputs, such as the ROI, the size of the image, and the frame rate. The paper then focuses on the compression subsystem and does not attempt to improve the communications physical media for better underwater RF links. Instead, it proposes a unified system that uses well-integrated modules (compression and transmission) to provide the scientific community with a higher-level protocol for image compression and transmission in sub-sea robotic interventions.


Introduction
In the context of the MERBOTS research project (http://www.irs.uji.es/merbots/), a 3-year coordinated project funded by the Spanish government for the period 2015-2017 under grant DPI2014-57746-C3 [28], one of the objectives is to build a wireless communication system that provides freedom of movement to the underwater robot and, at the same time, allows the operator to get feedback and supervise the intervention (Fig. 1). The robotic system under development will assist archeologists in the detailed work of monitoring, characterization, study, reconstruction, and preservation of archaeological sites, always under the continuous supervision of the human expert.
One of the objectives of the MERBOTS project is to provide different communication technologies that allow the operation of a vehicle without any physical connection to the surface operators, who supervise and control an intervention task. This differentiates the project from previous national and international research projects in the field of underwater robotic intervention (e.g., RAUVI [30], TRITON [29], EU FP7 TRIDENT [31]).
The present article describes the current state of the wireless underwater vision system, which is able to transmit telemetry data as well as compressed low-resolution images, allowing further implementation of cooperative intervention missions. The depth embedded block tree (DEBT) compression system has been designed specifically to maximize the efficiency of the underwater intervention application, taking into account the following facts:

• It uses a progressive compression technique designed to reduce latency, which is an essential part of the whole robotic control system. High-quality, even lossless, images can be stored locally while only a prefix of any size is transmitted, so that the original images can be retrieved later for more detailed study or archival.
• The compression algorithm has been implemented for very low-bandwidth scenarios. It can be applied to underwater radio frequency communications and other robotic applications.
• An implementation of the compression algorithm has been realized that is capable of compressing more than 30 fps on low-power embedded computers (e.g., Raspberry Pi 3 Model B).
• The system is able to send usable images using only a few hundred bytes per frame. The stream can simply be truncated to send a lower-quality version of the image.
• The user is able to select one or more regions of interest (ROI) to get more quality in specific parts of the image. This article explains how this technique has been implemented.
• The compression algorithm was tested with a battery of underwater images to better adjust the compression parameters for underwater intervention missions, remembering that JPEG2000 (Joint Photographic Experts Group 2000) is optimized for larger packets that are unrealistic for underwater acoustic or RF transmissions [12].
The use of underwater radio frequency (RF) links is a prime example of a low-bandwidth scenario, but the techniques described here can be applied to any kind of bandwidth-constrained scenario. They can also greatly enhance usability in normal-bandwidth scenarios, especially when dealing with remote image searching: a low-quality or low-resolution version of the original image is used first, and more detail is requested only when the target image has been found. The RF link was used to test a low-bandwidth communication channel in a real way, not in simulated mode, with less bandwidth than an ultrasonic modem commonly used in underwater applications. RF modems are less expensive, and no acoustic modem was available for testing. Thus, the RF modem allowed us to test and validate the DEBT algorithm with progressive image compression and ROI under lower-bandwidth conditions, even with the distance limitation of RF transmission. That is, if the DEBT algorithm works properly with RF links, it will certainly work with acoustic modems. Table 1 gives a small comparison of the most important differences between the DEBT algorithm [25] and the well-known JPEG (Joint Photographic Experts Group) [21] and JPEG2000 [41] algorithms. JPEG is an aging algorithm that, although being quite fast, performs poorly under high compression and does not possess the necessary features. JPEG2000, on the other hand, is a quite complex and sophisticated algorithm that compresses well under almost all conditions and has most of the needed features, but is quite slow. A more detailed explanation of the compression features of the DEBT algorithm will be given in Sect. 4.

Fig. 1 Search and recovery envisioned concept in the context of archeology. A wireless RF link provides feedback to the user supervising the intervention; Autonomous Underwater Vehicles (AUVs) Girona 500 and Sparus II

Related works
Suzuki and Sasaki [39] proposed the first system to demonstrate image transmission over a vertical path, which was developed in Japan. The JPEG standard discrete cosine transform (DCT) was used to encode 256 × 256 pixel still images with 2 bits per pixel. Transmission of about one frame every 10 seconds was achieved using 4-DPSK (differential phase-shift keying) at 16 kbps. Remarkable results obtained with this system included a video of a slowly moving crab, transmitted acoustically from a 6500 m-deep ocean trench. Another vertical path image transmission system was developed in France and successfully tested in 2000 m-deep water. This system was also based on the JPEG standard and used binary DPSK for transmission at 19 kbps. An image transmission system was developed in a Portuguese effort called ASIMOV [8]. In this project, a vertical transmission link is secured by the coordinated operation of an AUV and an autonomous surface craft (ASC). Once the site is chosen and the vehicles are positioned, transmission of a sequence of still images at about 2 frames/s is accomplished at 30 kbps using an 8-PSK (phase-shift keying) modulation method. Another underwater video transmission system, developed in Japan [39], employs 4-PSK, 8-PSK, and 16-QAM (quadrature amplitude modulation) signals with 40 kHz bandwidth to achieve transmissions up to 128 kbps. The system uses a 100 kHz carrier frequency and was tested over a short vertical path of 30 m.
Because underwater images have low contrast, their information is concentrated at low frequencies. Thus, by decomposing the image information into low- and high-frequency subbands, and encoding the low bands with more precision, it is possible to achieve higher compression ratios. This is the basic motivation behind the work in [10], which used the DWT (discrete wavelet transform) in place of the standard DCT. This algorithm was applied to a sequence of underwater images, taken at 30 frames per second, each having 256 × 256 8-bit pixels. The achieved compression ratio of 100:1 provided very good quality monochrome video. The resulting bit rate needed to support such high quality is on the order of 160 kbps, which surpasses the capabilities of current acoustic modem technology.
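The subband-energy argument above can be illustrated with a toy single-level 2D Haar decomposition. This is a minimal pure-Python sketch with hypothetical helper names (not the coder from [10]): for a smooth, low-contrast image, nearly all the signal energy collapses into the low-pass (LL) band, which is what makes precise coding of the low band so effective.

```python
def haar_1d(row):
    """One level of the 1D Haar transform: averages (low) then differences (high)."""
    low = [(row[2 * i] + row[2 * i + 1]) / 2 for i in range(len(row) // 2)]
    high = [(row[2 * i] - row[2 * i + 1]) / 2 for i in range(len(row) // 2)]
    return low + high

def haar_2d(img):
    """Apply the 1D transform to every row, then to every column."""
    rows = [haar_1d(r) for r in img]
    cols = [haar_1d(list(c)) for c in zip(*rows)]
    return [list(r) for r in zip(*cols)]

def ll_energy_fraction(coeffs, n):
    """Fraction of total energy in the top-left n//2 x n//2 (LL) band."""
    total = sum(c * c for row in coeffs for c in row)
    ll = sum(coeffs[y][x] ** 2 for y in range(n // 2) for x in range(n // 2))
    return ll / total

# A smooth, low-contrast 8x8 gradient, similar in character to underwater imagery.
n = 8
image = [[100 + x + y for x in range(n)] for y in range(n)]
coeffs = haar_2d(image)
print(ll_energy_fraction(coeffs, n))  # very close to 1: the LL band dominates
```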
Another system that exploits wavelet-based compression together with motion compensation is proposed [17]. Although it attains approximately the same compression ratio (100:1) as [10], it has better visual intelligibility because it employs a generalized dynamic image model (GDIM) that decouples the geometric and photometric variations in an image sequence commonly encountered in deep sea imagery. This approach is in contrast with ordinary terrestrial motion-compensated algorithms, where steady and uniform illumination is the underlying assumption. Using 128 × 128 pixel frames at 30 frames/s, the resulting bit rates needed to support real-time video transmission were on the order of 40 kbps.
Pelekanakis [20] presents a high bit rate acoustic link for underwater video transmission. Currently, encoding standards support video transmission at bit rates as low as 64 kbps. While this rate is still above the limit of commercially available acoustic modems, prototype acoustic modems based on phase coherent modulation/detection have demonstrated successful transmission at 30 kbps over a deep water channel.
An experimental system [20], based on DCT and Huffman entropy coding for image compression and variable-rate M-ary quadrature amplitude modulation (QAM), was implemented. Phase-coherent equalization is accomplished by the joint operation of a decision feedback equalizer (DFE) and a second-order phase-locked loop (PLL). System performance is demonstrated experimentally, using a transmission rate of 25,000 symbols/s at a carrier frequency of 75 kHz over a 10 m vertical path. Excellent results were obtained, demonstrating bit rates as high as 150 kbps, which are sufficient for real-time transmission of compressed video.

Eastwood et al. [4] presents techniques for compression of laser line scan and camera images, as well as format-specific data compression for quick-look sonar mapping data. For image compression, both JPEG and a wavelet-based technique called the Efficient Pyramid Image Coder (EPIC) are examined. JPEG is found to be less efficient than the wavelet transform, but has the advantage of being robust with respect to lost data packets. The wavelet-based transform is more efficient at high compression rates, though above a certain rate both offer similar performance.
Walter et al. [44] presents a new wavelet-based image compression system. The compression system is based on a particular type of compressed encoding of wavelet transforms called wavelet difference reduction (WDR) and describes experimental results in applying a compression algorithm to a suite of underwater camera images. These underwater camera images were required to be compressed at very high compression ratios (400:1, 200:1, 100:1, and 50:1) and the algorithm produced very high-fidelity decompressions. In fact, it performed at a comparable level to a system based on the celebrated Daub CDF-9/7 system (used in JPEG2000 [13]), yet employing 256 times less RAM (random access memory) and a 16-bit dynamic range (with 8-bit images) instead of a 32-bit dynamic range.
Murphy [16] presents an analysis of the unique considerations facing telemetry systems for free-roaming autonomous underwater vehicles used in exploration. These considerations include high-cost vehicle nodes with persistent storage and significant computation capabilities, combined with human surface operators monitoring each node. He then proposes mechanisms for interactive, progressive communications of data across multiple acoustic hops. These mechanisms include wavelet-based embedded coding methods and a novel image compression scheme based on texture classification and synthesis. The specific characteristics of underwater communication channels, including high latency, intermittent communication, the lack of instantaneous end-to-end connectivity, and a broadcast medium, were taken into consideration.
Kaeli in his PhD thesis [12] shows that the fundamental problem in autonomous underwater robotics is the high latency between the capture of image data and the time at which operators are able to gain a visual understanding of the survey environment. Typical missions can generate imagery at rates hundreds of times greater than highly compressed images can be transmitted acoustically, delaying the understanding until after the vehicle had been recovered and the data analyzed. His thesis presents a lightweight framework for processing imagery in real time aboard a robotic vehicle. The work implements a framework on real underwater datasets and demonstrates how it can be used to select summary images for the purpose of creating low-bandwidth semantic maps capable of being transmitted acoustically.
Kaeli [12] compares JPEG, JPEG2000, and set partitioning in hierarchical trees (SPIHT). JPEG is a common example of a lossy compression format which uses the DCT on each 8 × 8 block to achieve roughly 10:1 compression without major perceptual changes in the image. JPEG2000 employs variable compression rates using progressive encoding, meaning that a compressed image can be transmitted in pieces or packets that independently add finer detail to the received image. This is particularly well suited to underwater applications, where acoustic channels are noisy and subject to high packet loss; however, it is optimized for larger packets that are unrealistic for underwater acoustic transmissions. Recent work has focused on optimizing similar wavelet decomposition techniques for underwater applications using smaller packet sizes with SPIHT.
Zheng et al. [49] presents a special application of delay-tolerant networks (DTNs). Efficient data collection in the deep sea poses some unique challenges, due to the need for timely data reporting and the delay of acoustic transmission in the ocean. Autonomous underwater vehicles deployed in the deep sea frequently have to transmit data collected from sensors (in a two-dimensional or three-dimensional search space) to the surface stations; however, additional delay occurs at each resurfacing.
Senapati et al. [32] presents a listless implementation of a wavelet-based block tree coding (WBTC) algorithm of varying root block sizes. The WBTC algorithm improves the image compression performance of SPIHT at lower rates by efficiently encoding both inter- and intra-scale correlation using block trees. Though WBTC lowers the memory requirement by using block trees compared to SPIHT, it makes use of three ordered auxiliary lists. The proposed algorithm is combined with DCT and DWT to show its superiority over DCT- and DWT-based embedded coders, including JPEG2000 at lower rates. The compression performance on most of the standard test images is nearly the same as WBTC, but it outperforms SPIHT by a wide margin, particularly at lower bit rates.
Pearlman et al. [18] proposes an embedded, block-based, image wavelet transform coding algorithm of low complexity. It uses a recursive set partitioning procedure to sort subsets of wavelet coefficients by maximum magnitude with respect to thresholds that are integer powers of two. It exploits two fundamental characteristics of an image transform: the well-defined hierarchical structure and energy clustering in frequency and in space. They describe the use of this coding algorithm in several implementations, including reversible (lossless) coding and its adaptation for color images, and show extensive comparisons with other state-of-the-art coders, such as SPIHT and JPEG2000.
Zhang et al. [48] presents a new underwater video compression technique based on adaptive hybrid wavelets and directional filter banks to achieve both high coding efficiency and good reconstruction quality at very low bit rates. A key application is the real-time transmission of video through acoustic channels with limited bandwidth from an autonomous underwater vehicle to a surface station, e.g., for man-in-the-loop monitoring and inspection operations.
According to Esmaiel [6,9], the SPIHT coder based on the wavelet algorithm is probably the most widely used for image compression, as well as being a basic standard of compression for all subsequent algorithms [5,27,42]. In SPIHT, the information bits are sorted according to the bit information significance. The protection level of transmitted data must take this feature into account and progressive protection is provided to the transmitted bits. This methodology is used to reduce the distortion in the reconstructed image (reduce the difference between the original and the reconstructed images). After image decomposition with the CDF-9/7 wavelet, the general SPIHT coding algorithm encodes images by splitting the decomposed image into considerable sections on the basis of the significance classification function [38].
Mohammed and Hamada [14] propose a new scheme for efficient rate allocation in conjunction with reducing the peak-to-average power ratio (PAPR) in orthogonal frequency-division multiplexing (OFDM) systems. A modification of the SPIHT image coder is proposed to generate four different groups of bit streams according to their significance. The significant bits, the sign bits, the set bits, and the refinement bits are transmitted in four different groups.
None of the presented references use a solution based on both progressive image compression and region of interest (ROI); together, these are the main contributions of the currently developed algorithm, DEBT. Rubino et al. [25] presents some initial results for the Raspberry Pi Model 2B platform that allowed the validation of the proposed approach in a simulated way, without using a real RF link. In this work, we present real results obtained with an RF link using the Raspberry Pi 3B platform, and also describe the system in more depth, with experimental tests carried out at the University of Girona using two AUVs, the Girona 500 and Sparus II (http://cirs.udg.edu/auvs-technology/auvs/), Fig. 1.

The intervention domain
Robotic applications in general, and autonomous underwater vehicles for intervention (I-AUVs) in particular, use images from their built-in camera(s) as one of their main data sources, among others, to control their internal algorithms. In a supervised system, these images should reach the operator with the lowest possible latency and the highest possible quality, so that the operator can interact with the system and adjust the task execution in a supervised manner.
As an example, this kind of control was demonstrated in the FP7 TRIDENT project, where autonomous visually guided grasping was performed at sea [22].
Besides this, communication is a crucial subsystem in any robotic application, especially in those that permit the user to interact remotely with the system. Because of that, image compression and transmission are necessary to send the required information with the lowest latency and without compromising the network or the whole system.
Although recent studies demonstrate that, using the most efficient modulation methods, it is possible to transmit video through an underwater channel using acoustic signals [19,24] and blue light [7], neither acoustic nor optical signals can pass through solid objects that may block the line of sight between the wireless transceivers. Moreover, the performance of these methods depends heavily on the characteristics of the underwater scenario and the type of channel. On the one hand, acoustic systems are greatly affected by multi-path propagation when the link is horizontal, and also by the acoustic noise originating from human activity, sea waves, animals, and other sources. The acoustic noise constrains the range of typical frequencies used in acoustic systems to between 8 and 155 kHz [36], which makes it very difficult to achieve high data rates. On the other hand, communication methods based on visible light only work well in very clear waters, are greatly affected by scattering, suffer attenuation by absorption, and usually need accurate alignment.
Nevertheless, RF-based solutions are not as affected by the typical problems of the acoustic and optical methods, and are much cheaper. Moreover, RF signals propagate more easily from one medium to another, allowing a communication link to be established with an underwater transducer from the surface.
The main problem of using RF is the high attenuation that the waves suffer when traveling through water. However, different studies [34,35,47] indicate that, with the necessary antennas, at lower frequencies, and using the best modulation methods, it is possible to set up a communication link of up to several tens of meters through water.
It is worth mentioning that the objective of the present work is not to improve the communications physical media for better underwater RF links, but to design a unified system that uses well-integrated modules (i.e., compression and transmission) to provide the scientific community with a higher-level protocol for image compression and transmission in sub-sea robotic interventions.
The application of the most advanced progressive image compression algorithms, such as the ones presented in this document, allows image transmission rates of several frames per second at the typical latency of radio frequency communications.
In the proposed system, a progressive image compression technique and the use of ROI are demonstrated.

Overall system description
As can be seen in Fig. 2, the system has two main parts: (1) the sensor side and (2) the operator side. At the vehicle side, the developed circuitry is installed in the OpenROV (https://www.openrov.com/) housing cylinder. It includes a Raspberry Pi computer running Linux, which hosts the camera acquisition, the compressor and ROI, and the transport layer modules. This is connected to an Arduino board that controls the Radiometrix RF transmitter.
At the user side, the underwater RF receiver is connected to the user computer through a universal serial bus (USB) port, and provides the user interface that enables the operator to get the compressed images and select the corresponding region of interest for further inspection. The RF transmission system is at an early stage and, for evaluation purposes, a low-power radio module has been used. Further work will concentrate on antennas and transceivers better suited for longer distances. The RF devices used for this experiment are the commercial low-power UHF (ultra-high frequency) modules BiM3B (http://www.radiometrix.com/content/bim3b), which operate at 868.3 MHz with 25 mW, and 1/4-wave antennas. On the transmitter side, the electronics involved are the RPi2B and RPi3B, the RaspiCam, an Arduino Pro Micro, and an RF module. On the receiver side, an Arduino and an RF module are used. All the electronics were properly encapsulated within a watertight container. The transmitter and receiver containers were attached and fixed to a wooden stick, which was immersed at a depth of approximately 15 cm (Fig. 2).
For this experiment, 100 encoded images were transmitted for each distance point at 20, 40, 60, 70, 80, and 100 cm. A prefix of 400 bytes of each encoded image was transmitted, each one within a single protocol data unit (PDU). The prefix corresponds to the whole encoded image truncated to a particular size (e.g., 400 bytes) for a given quality. The number of reception errors (packets lost plus packets with errors) and the received signal strength indicator (RSSI) were measured for each distance. The receiver (Arduino + RF module) was connected to a PC (personal computer) through a USB bus, where a process decoded and displayed each received image and also showed an error counter. The RF link was established at 9600 baud, and each packet has an overhead length of 15 bytes. With this preliminary implementation, it was possible to transmit each of the 100 images without errors at a maximum distance of 60 cm in freshwater. Once a distance of 70 cm was reached, only 36 of the 100 images were received properly. For larger ranges (>70 cm), it was not possible to receive any image using the transceivers and antennas used in this particular experiment. Figure 3 shows the RSSI sampling at each position and Fig. 4 shows the FPS (frames per second) obtained with the same configuration of the protocol used in the water experiments, for different lengths of the encoded image prefix. As shown in the figure, with a length less than or equal to 800 bytes, transmission at a frame rate greater than 1 FPS is possible. Usually, a length of 800 bytes allows an operator on the surface to properly monitor the camera sensor input.
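The frame rates discussed above can be roughly reproduced with a back-of-the-envelope calculation. The sketch below is an idealized model, assuming 8N1 serial framing (10 bits on the wire per byte), one PDU per image prefix, the 15-byte packet overhead stated in the text, and no retransmissions; `max_fps` is a hypothetical helper name, not part of the described system.

```python
def max_fps(prefix_bytes, baud=9600, overhead=15, bits_per_byte=10):
    """Idealized upper bound on the frame rate for one image prefix per PDU."""
    bytes_per_second = baud / bits_per_byte      # 9600 baud, 8N1 -> 960 B/s
    return bytes_per_second / (prefix_bytes + overhead)

for prefix in (400, 800, 1600):
    print(f"{prefix:5d} B prefix -> {max_fps(prefix):.2f} FPS")
```

Consistent with the experiment, an 800-byte prefix still yields slightly more than 1 FPS, while a 400-byte prefix roughly doubles the rate.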

Image compression
Image compression is a transformation applied to an image to reduce its size as much as possible so that it can be stored or transmitted more efficiently. There is a clear distinction between lossless and lossy compression. In lossless compression, the decompressed image is exactly the same as the original image, while in lossy compression the decompressed image is an approximation of the original image. Digital images usually have three color components: what we perceive as a single color image is, in fact, composed of a luminance channel (a black-and-white version of the color image) and two color-difference channels (which can usually be sub-sampled without much visual loss).
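The luminance/color-difference decomposition described above can be sketched as follows. This is a minimal illustration using the standard BT.601 conversion weights; the helper names are ours, and an actual sensor pipeline may use different coefficients or subsampling patterns.

```python
def rgb_to_ycbcr(r, g, b):
    """Map an 8-bit RGB pixel to luminance (Y) and two color-difference
    channels (Cb, Cr), using the BT.601 weights."""
    y = 0.299 * r + 0.587 * g + 0.114 * b
    cb = 128 - 0.168736 * r - 0.331264 * g + 0.5 * b
    cr = 128 + 0.5 * r - 0.418688 * g - 0.081312 * b
    return y, cb, cr

def subsample_2x(channel):
    """Keep every second sample in each row: 4:2:2-style horizontal
    subsampling of a chroma channel, halving its size."""
    return [row[::2] for row in channel]

# Pure white: full luminance, neutral chroma (Cb = Cr = 128).
y, cb, cr = rgb_to_ycbcr(255, 255, 255)
print(y, cb, cr)
```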
As opposed to video compression, which takes advantage of the high temporal correlation between adjacent frames in a video sequence and creates inter-frames which are dependent on previous frames, image compression exclusively creates intra-frames, which are independently compressed frames. Intra-frames have the advantage of being able to rapidly adapt to changing conditions in the communications channel as well as increased flexibility in dynamically changing the frame rate and quality parameters, which are of great importance in low-bandwidth and low-latency communications.
Compression in general, and image compression in particular, is a very application-specific task, with many available trade-offs and many different algorithms that try to maximize (or minimize) some design criteria. Most image compression algorithms are lossy algorithms designed with the sole purpose of minimizing the resulting size of the image, with minimal regard for other constraints, and usually all of the compressed data is necessary for decompression. A prime example of this class of algorithms is the JPEG algorithm, which is a de facto standard but performs very poorly under high compression.
With progressive or embedded image compression, it is trivial and very inexpensive in terms of processing power (there is no need to decompress and recompress the image) to supply an image which is either a lower-resolution or a lower-quality approximation of the original image. Preferably, the compressed image can simply be truncated at any point, yielding a lower-resolution or lower-quality version of the original image (in this sense, progressive lossless streams become lossy by simple truncation). In the case of color images, the image can also be prepared in such a way that a monochrome version of it can be obtained with the same progressive characteristics.
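The truncation property can be illustrated with a toy bit-plane coder. This is a didactic sketch, not the DEBT bitstream format: coefficients are emitted one bit plane at a time, most significant first, so any prefix of the stream decodes to a coarser approximation, while the full stream decodes losslessly.

```python
def encode_bitplanes(coeffs, planes=8):
    """Emit one list of bits per plane, most significant plane first."""
    return [[(c >> p) & 1 for c in coeffs] for p in reversed(range(planes))]

def decode_prefix(stream, kept_planes, planes=8):
    """Reconstruct from only the first kept_planes planes of the stream."""
    out = [0] * len(stream[0])
    for i, plane in enumerate(stream[:kept_planes]):
        p = planes - 1 - i
        for j, bit in enumerate(plane):
            out[j] |= bit << p
    return out

coeffs = [200, 13, 97, 55]
stream = encode_bitplanes(coeffs)
print(decode_prefix(stream, 8))  # all 8 planes: exact (lossless) reconstruction
print(decode_prefix(stream, 3))  # truncated stream: a coarse approximation
```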

The depth embedded block tree (DEBT) algorithm
The main properties sought for a proper implementation of our communications framework are:

• Quality, resolution, and color channel scalability: truncating the stream should result in the "best" approximation of the original image or, by rearranging and truncating the stream, in the "best" approximation to a scaled, monochrome, or color version of the image;
• Region of interest (ROI): definition of certain areas of the image that should be compressed with lower distortion than the rest of the image, allowing for very high compression ratios while keeping these areas at high quality. The ROI areas could be chosen either automatically by an object recognition mechanism, or by the user, who desires higher quality in a region that is currently not detailed enough;
• Embedded lossless: allow for lossless compression with a stream that yields the "best" possible image at any truncation point, interpreting the truncated stream as a lossy version of the compressed image. The image could be stored losslessly for later archival, but any desired truncated part of it could be transmitted in real time, compressing the image only once and giving the flexibility of dynamically choosing the transmitted quality or size;
• Fast and parallelizable: it should perform well on low-power, small single-board computers (SBCs) with optional hardware floating-point support, and with compression speeds comparable to the joint photographic experts group (JPEG) algorithm. Also, the algorithm should be parallelizable to take advantage of current and future multi-core processors, allowing both lower latency and higher throughput;
• High compression: while this seems an obvious property for any image compression algorithm, the goal is to be competitive with current state-of-the-art image compressors.
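One common way to realize an ROI in an embedded coder is coefficient scaling, in the spirit of JPEG2000's "maxshift" method. The sketch below is an illustration of that general idea, not the exact DEBT mechanism: wavelet coefficients inside the ROI are shifted up before bit-plane coding, so their bits surface in earlier (more significant) planes and survive any truncation point.

```python
def apply_roi_shift(coeffs, roi_mask, shift=4):
    """Left-shift coefficients flagged by roi_mask so their bits are coded
    in earlier bit planes (i.e., transmitted first)."""
    return [c << shift if in_roi else c
            for c, in_roi in zip(coeffs, roi_mask)]

def undo_roi_shift(coeffs, roi_mask, shift=4):
    """Reverse the scaling at the decoder, using the same mask and shift."""
    return [c >> shift if in_roi else c
            for c, in_roi in zip(coeffs, roi_mask)]

coeffs = [12, 3, 40, 7]
mask = [False, True, False, True]    # coefficients 1 and 3 belong to the ROI
shifted = apply_roi_shift(coeffs, mask)
assert undo_roi_shift(shifted, mask) == coeffs
```

Because the shifted ROI coefficients dominate the most significant bit planes, a heavily truncated stream still carries the ROI at high quality while the background is reconstructed coarsely.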
Scalable compression usually takes advantage of multiresolution signal decomposition, which is natural for dyadic wavelet decomposition but can also be obtained with the DCT [46] or other block transforms by simply rearranging their coefficients. There are two major classes of transform-based image compression algorithms. The first follows the transform-model-code paradigm, with a very distinctive separation of the three main steps: the transformation (either a block or wavelet transform), followed by statistical modeling of the coefficients and bit allocation, followed by entropy coding in the form of some sort of context-adaptive arithmetic coding. All JPEG coders are in this category and depend strongly on the final step, which is usually quite slow and needs many operations for each output bit. The JPEG2000 standard, which is the current state-of-the-art image compressor, is an example of this traditional scheme [it is based on the embedded block coding with optimized truncation (EBCOT) [40] algorithm].
On the other hand, there are other algorithms which do not have a clear distinction between the model and code steps and do not rely on any sort of final entropy coding, which should make them quite fast as well as good candidates for a parallel implementation. However, most of them rely on orthogonal wavelets which makes them unsuitable for lossless compression and most of them have implementation issues dealing with list manipulation and high memory use. Also, none of these algorithms have an efficient implementation available and none of them possess all our requirements simultaneously. Some of the best known algorithms in this class are EZW [33] (embedded zerotree wavelet), SPIHT [26] (set partitioning in hierarchical trees), SPECK [18] (set partition embedded block), HBC [45] (hybrid block coder), WBTC [15] (wavelet block tree coder), and GTW [11] (group testing for wavelets), among many others.
The DEBT algorithm has been designed according to this second class of algorithms and possesses all the properties stated above. Basically, it consists of:

1. Wavelet transform: transform the image using a wavelet transform in N levels (Fig. 5 shows a three-level dyadic wavelet decomposition). Currently, the 5/3, 9/3, 9/7, and 13/7 symmetric biorthogonal integer transforms built from the interpolating Deslauriers-Dubuc scaling functions [2] were implemented, along with the popular real-valued CDF-9/7 symmetric biorthogonal transform [2] (Figs. 5, 6);
2. Variable-depth trees and blocks: the concepts of variable-depth trees (simply referred to as trees) and variable-depth blocks (simply referred to as blocks) are introduced; these are the main data structures used to group coefficients of similar magnitude, so that the necessary significance information can be coded compactly. In fact, the DEBT algorithm can be viewed as a superset, generalization, unification, and improvement of many of the existing set partition algorithms (SPIHT, SPECK, HBC, WBTC, and others) by simply changing a single parameter;
3. Set partitioning: blocks are always partitioned into lower-depth variable-depth subblocks (simply called subblocks);
4. Embedded bit allocation: to achieve good embeddedness (bit allocation), a model of the coefficient distribution must be taken into account so that the instant distortion reductions for significant and refinement coefficients are predicted, which leads to the desired distortion reduction per bit. Currently, the ordering is done assuming a Laplacian distribution for the wavelet coefficients, but a more precise model, using a generalized power distribution, is under investigation and should yield a better ordering and, therefore, better embeddedness.
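To make the wavelet step concrete, the sketch below shows a one-level, one-dimensional lifting implementation of the reversible 5/3 integer transform (a minimal illustration with symmetric boundary extension; the function names are ours, and the actual DEBT implementation, which works in two dimensions and also supports the 9/3, 9/7, and 13/7 transforms, may differ in detail):

```python
def dwt53_1d(x):
    """One level of the 5/3 integer (lifting) wavelet transform.

    Returns (lowpass, highpass) integer subbands for an even-length
    signal, using symmetric extension at the boundaries.
    """
    n = len(x)
    assert n % 2 == 0
    # Predict step: d[i] = x[2i+1] - floor((x[2i] + x[2i+2]) / 2)
    d = []
    for i in range(n // 2):
        right = x[2 * i + 2] if 2 * i + 2 < n else x[n - 2]
        d.append(x[2 * i + 1] - ((x[2 * i] + right) >> 1))
    # Update step: s[i] = x[2i] + floor((d[i-1] + d[i] + 2) / 4)
    s = []
    for i in range(n // 2):
        left = d[i - 1] if i > 0 else d[0]
        s.append(x[2 * i] + ((left + d[i] + 2) >> 2))
    return s, d

def idwt53_1d(s, d):
    """Inverse of dwt53_1d: reconstructs the original samples exactly."""
    n = 2 * len(s)
    x = [0] * n
    for i in range(len(s)):              # undo the update step first
        left = d[i - 1] if i > 0 else d[0]
        x[2 * i] = s[i] - ((left + d[i] + 2) >> 2)
    for i in range(len(d)):              # then undo the predict step
        right = x[2 * i + 2] if 2 * i + 2 < n else x[n - 2]
        x[2 * i + 1] = d[i] + ((x[2 * i] + right) >> 1)
    return x
```

Because both lifting steps use only integer arithmetic and are undone exactly in reverse order, the transform is lossless, which is what makes embedded lossless coding possible.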
Table 2 shows the square root of the instant distortion decrease per bit for significant coefficients (diagonal values) and refinement coefficients. (The Laplacian distribution, just like the uniform distribution, has the nice property that all refinement distortion decreases for bits in the same bit plane are equal, irrespective of the coefficient's significant bit plane.) For a Laplacian distribution with variance σ², the distortion decrease per significant coefficient in an interval [a, b) is given by Eq. 1, and the distortion decrease per refinement coefficient that is significant in an interval [a, b) is given by Eq. 2, where n represents the refinement level.
The number of bits per significant coefficient, for coefficients that are significant in the interval [a, b), is given by Eq. 4, where H(x) is the binary entropy function (in bits), defined for 0 ≤ x ≤ 1 as H(x) = −x log2(x) − (1 − x) log2(1 − x), and E is the entropy function (in bits), defined as E(0) = 0 and, for 0 < x ≤ 1, as E(x) = −x log2(x). The distortion decrease per bit for significant coefficients is then calculated by dividing Eq. 1 by Eq. 4, i.e., ΔD/g. In the case of the DEBT algorithm (and most set partitioning algorithms), the number of bits per refinement coefficient is 1, even though this is not their real entropy.
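The two entropy functions named above translate directly into code; in this sketch, H is the standard binary entropy, while the one-term form of E is our assumption, consistent with the stated boundary condition E(0) = 0:

```python
import math

def H(x):
    """Binary entropy (in bits): -x*log2(x) - (1-x)*log2(1-x), with
    the usual convention H(0) = H(1) = 0."""
    if x in (0.0, 1.0):
        return 0.0
    return -x * math.log2(x) - (1 - x) * math.log2(1 - x)

def E(x):
    """Entropy function (in bits): E(0) = 0; the -x*log2(x) form for
    0 < x <= 1 is assumed here (the paper's displayed equation was lost)."""
    return 0.0 if x == 0 else -x * math.log2(x)
```

For example, H(0.5) = 1 bit, the cost of a maximally uncertain significance decision.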
5. Subband weighting: in general, most DWTs are not energy preserving, so each subband contributes differently to the total distortion. The weight for each subband can be calculated as a function of its respective reconstruction filter [43] and should be used as a factor for all the values in Table 2 for the respective subband. Table 3 shows the weights for each subband for the ibior-13/7 wavelet;
6. Embedded bit allocation: precise distortion decrease assignment to each significant and refinement (subband, bit plane) pair indicates the most important one to send, serving as an embedded bit allocation. Figure 6 shows an example of a six-level transform whose coefficients' maximum absolute value has 9 bits, ranging from bit planes 0 to 8. Each cell contains the n-th column of Table 2 (where n corresponds to the bit plane level) scaled by the respective subband gain. As there is a single column for the LL subband and N columns for each of the HL, LH, and HH subbands (N is the number of decompositions), there is a total of (3N + 1)B(B + 1)/2 weighted distortion-reduction-per-bit values in the general case, where B is the number of bit planes. In the case of a Laplacian distribution, as all refinement values are the same for each column, the number of weighted distortion-reduction-per-bit values can be reduced to (3N + 1)(2B − 1);
7. Scanning: scan all weighted distortion-reduction-per-bit values in decreasing order, output the necessary significance and refinement information, and keep the set partition and decomposition (addressing) information. All housekeeping is done on a fixed-size memory pool whose size depends on the image dimensions. Roughly speaking, for an N × M 8-bit image, the DEBT algorithm needs one N × M 16-bit array for the transform coefficients, one N × M 16-bit array for coefficient management, and one N/2 × M/2 32-bit array for set management.
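The two counts at the end of item 6 are easy to check numerically; the small sketch below evaluates them for the example in Fig. 6 (N = 6 decomposition levels, B = 9 bit planes):

```python
def num_dd_values(N, B):
    """General case: (3N + 1) subband columns times B(B + 1)/2
    (significance plus refinement) entries per column."""
    return (3 * N + 1) * B * (B + 1) // 2

def num_dd_values_laplacian(N, B):
    """Laplacian case: refinement values within a column coincide,
    leaving B significant plus B - 1 refinement values per column."""
    return (3 * N + 1) * (2 * B - 1)

print(num_dd_values(6, 9))            # 19 * 45 = 855
print(num_dd_values_laplacian(6, 9))  # 19 * 17 = 323
```

The Laplacian assumption thus shrinks the scan list from 855 to 323 entries in this example.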
There are many ways in which different compression algorithms can be evaluated and compared. For quantifying the error between images, two measures are commonly used: the mean square error (MSE) and the peak signal-to-noise ratio (PSNR). The MSE between an image {y_k} and its approximation {ŷ_k} is given by MSE = (1/N) Σ_k (y_k − ŷ_k)², where N is the total number of pixels in each image. The PSNR between two (8 bpp) images, in decibels, is given by PSNR = 10 log10(255² / MSE), and is used more often since it is a logarithmic measure and the human brain seems to respond logarithmically to changes in intensity. Increasing PSNR means increasing the fidelity of compression and, as a rule of thumb, when the PSNR is greater than or equal to 40 dB, the two images are said to be virtually indistinguishable by human observers.
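These two measures translate directly into code; a minimal sketch for flat (one-dimensional) pixel lists:

```python
import math

def mse(img, approx):
    """Mean square error between two equally sized images (flat lists)."""
    n = len(img)
    return sum((y - yh) ** 2 for y, yh in zip(img, approx)) / n

def psnr(img, approx, peak=255):
    """PSNR in dB for 8-bpp images; infinite when the images match."""
    e = mse(img, approx)
    return math.inf if e == 0 else 10 * math.log10(peak ** 2 / e)
```

For identical images the MSE is zero and the PSNR is reported as infinite, matching the lossless case discussed later.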
To compare the compression ratio obtained by the DEBT algorithm, standard test images were used. Figure 9 shows the standard ''lena'' and ''barbara'' 512 × 512, 8-bit gray level images, and Fig. 10 shows the respective PSNR curves obtained with the CDF-9/7 wavelet transform and six levels of decomposition using the DEBT algorithm. Table 4 compares DEBT with JPEG2000 at various compression rates. The resulting sizes were obtained by applying the respective rates to the JPEG2000 compressor; DEBT was then made to compress to exactly those sizes. Even though the current DEBT algorithm is not yet finished and still needs tuning, it compresses the test images better than the JPEG2000 algorithm by up to 0.5 dB, without using any kind of entropy coding and while being much faster.
Embedded image compression is a very efficient way to cope with varying transmission bandwidth in hard real-time systems, where a low-quality version of the current image is preferable to a high-quality version of an old image. The main idea behind using embedded image compression in the current scenario is to group the source of the data (the image) and the transmission channel into one manageable whole, increasing the adaptability of the whole system by varying the amount of data transmitted as the channel capacity changes, i.e., increasing the image quality when bandwidth is available and decreasing it when it is not, to meet a predefined maximum latency or bandwidth. Also, to cope with the need for very low latency, the current algorithm has been developed with parallelism in mind, being able to use many threads of execution to decrease the latency as much as possible.
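Since any prefix of an embedded stream is a valid lower-quality image, per-frame rate control reduces to slicing bytes; a minimal sketch (the function name and the channel/latency figures are illustrative, not from the system described here):

```python
def truncation_budget(stream, bandwidth_bps, max_latency_s):
    """Truncate an embedded stream so its transmission fits a latency
    budget: the prefix still decodes to the best image of that size."""
    budget = int(bandwidth_bps * max_latency_s / 8)  # bytes per frame
    return stream[:budget]

frame = bytes(range(256)) * 40     # stand-in for a compressed frame
sent = truncation_budget(frame, bandwidth_bps=64_000, max_latency_s=0.1)
print(len(sent))                   # 800 bytes at 64 kbit/s, 100 ms budget
```

The same mechanism raises quality when the channel improves: a larger measured bandwidth simply yields a longer prefix of the already-compressed stream.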

Underwater test images
To test and tune some parameters of the algorithm for underwater imagery, we used a set of 1258 underwater grayscale images from the Australian Centre for Field Robotics (http://marine.acfr.usyd.edu.au/datasets/data/TasCPC/TasCPC_LC.tar.gz). A low bit rate compression performance comparison against the JPEG2000 algorithm was done using the same parameters we used for the lena and barbara images. The PSNR difference between the DEBT and JPEG2000 algorithms was plotted for four different rates (0.0005, 0.001, 0.005, and 0.01). All images are 1360 × 1024 pixels and were numbered in lexical order from 1 up to 1258. Figure 11 shows the compression difference in dB for all images at rates 0.0005 and 0.001. It is quite clear that the DEBT algorithm performs better for all images, without exception, at these lower rates, by a significant margin, reaching a difference of 2.69 dB in the 0.0005 case and 2.19 dB in the 0.001 case. On average, the gain of DEBT over JPEG2000 is 0.43 dB and 0.30 dB for rates 0.0005 and 0.001, respectively.
As the rate goes up (less compression, more quality), the DEBT algorithm is still superior to JPEG2000 for all 1258 images in this dataset, except for 1 image in the 0.005 case and 2 in the 0.01 case; in all these exceptions the compression quality was very high (the images were very dark), 40 dB or higher for both algorithms. Figure 12 shows the results for these rates. On average, the gain of DEBT over JPEG2000 is 0.23 and 0.25 dB for rates 0.005 and 0.01, respectively.
A few images were randomly selected from this dataset and are presented in Fig. 13. Each row shows the original image (1st column) and compressed with the DEBT algorithm at exactly 500, 1000, and 2000 bytes on the second, third, and fourth columns, respectively.
An important point to note is that the DEBT algorithm makes it very easy, by simply changing the weights on each cell shown in Fig. 6, to create a stream which is not optimal in the MSE sense, but which could be better suited to highlight other characteristics of the image at very low bit rates. These alternate metrics should be the subject of further investigation.

Region of interest (ROI)
In a low-bandwidth scenario, or when the images are highly compressed, there may be circumstances where the image is still not good enough for an operator to distinguish the necessary details. In this case, the use of ROI is an elegant solution to the problem of seeing the necessary details in a specific part of the image. For ROI processing, there must be a way for the decoder to know which regions were encoded with higher priority than others. A method known as maxshift [3, 37] is commonly used, in which the bit planes of the ROI region are encoded in their entirety before any bit planes of the rest of the image (the background) (see Fig. 14). This has the advantage of almost no overhead (only the number of extra bit planes is sent, so that, after reaching this number of bit planes, the decoder knows it should unscale the received coefficients by the number of bit planes remaining), but has the disadvantage of having to send the whole ROI, with all its details, before receiving a single bit from the rest of the image.
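A minimal sketch of the maxshift idea on quantized coefficient magnitudes (the function names are ours; the standard operates per bit plane inside the coder, whereas this illustration simply scales the values):

```python
def maxshift_encode(coeffs, roi_mask):
    """Maxshift ROI coding on nonnegative coefficient magnitudes.

    The shift s is chosen so that 2**s exceeds every background
    coefficient; every nonzero ROI coefficient, once shifted, is then
    larger than any background one, so only s (and no ROI map) has to
    be sent to the decoder. Assumes at least one background coefficient.
    """
    s = max(c for c, in_roi in zip(coeffs, roi_mask) if not in_roi).bit_length()
    shifted = [c << s if in_roi else c for c, in_roi in zip(coeffs, roi_mask)]
    return shifted, s

def maxshift_decode(shifted, s):
    # Any coefficient at or above 2**s must belong to the ROI.
    return [c >> s if c >= (1 << s) else c for c in shifted]
```

Because the shifted ROI coefficients occupy strictly higher bit planes, a bit-plane coder emits the whole ROI before any background data, which is exactly the behavior described above.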
A more useful method, known as scaling [3, 37], consists of simply shifting the ROI coefficients by a certain number of bits, so that they fool the bit allocation algorithm into thinking they are more important than they actually are, coding them before the coefficients that became relatively smaller due to the scaling (see Fig. 14). In effect, this blends the ROI coefficients with background coefficients that are also important (of the same order of magnitude), so that the ROI is seen with good quality over a lower-quality background. The main disadvantage of this method is that an ROI map must be sent to the decoder as extra information (overhead), increasing the minimum number of bits necessary to recover a suitable approximation of the original image.
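The scaling method, by contrast, uses a fixed shift and therefore needs the ROI map on the decoder side; a minimal sketch (the 4-bit shift mirrors the one used for the example in Fig. 16; the function names are ours):

```python
def roi_scale(coeffs, roi_mask, shift=4):
    """Boost ROI coefficient magnitudes by a fixed shift so the bit-plane
    coder visits them earlier; the ROI map itself is sent as overhead."""
    return [c << shift if in_roi else c for c, in_roi in zip(coeffs, roi_mask)]

def roi_unscale(coeffs, roi_mask, shift=4):
    # The decoder uses the transmitted ROI map to undo the scaling.
    return [c >> shift if in_roi else c for c, in_roi in zip(coeffs, roi_mask)]
```

With a moderate shift, boosted ROI coefficients land among the larger background coefficients rather than strictly above all of them, producing exactly the blending behavior described in the text.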
This map information can consist of object coordinates (rectangles, ellipses, or arbitrary polygons) or, in the case of an arbitrary region, a bit map of the ROI. In this last case, to reduce the amount of overhead, the map could be given on the last decomposition subband (LL), significantly decreasing the bit map size but at the cost of a coarser scale, depending on the number of dyadic decompositions (for an n-level dyadic decomposition, each grid cell would be 2^n × 2^n pixels). This bit map usually consists of a small region and is, therefore, a good candidate for some form of run-length encoding (Fig. 15).
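A simple run-length scheme for such a bit map could look as follows (a sketch only; the actual encoding used in the system may differ):

```python
def rle_encode(bits):
    """Run-length encode a binary ROI map as (first bit, run lengths)."""
    runs, cur, count = [], bits[0], 0
    for b in bits:
        if b == cur:
            count += 1
        else:
            runs.append(count)
            cur, count = b, 1
    runs.append(count)
    return bits[0], runs

def rle_decode(first, runs):
    """Rebuild the bit map by alternating runs starting from `first`."""
    out, bit = [], first
    for r in runs:
        out.extend([bit] * r)
        bit ^= 1
    return out
```

A compact ROI on a coarse LL-subband grid yields only a handful of runs, so the overhead stays small compared with a full-resolution map.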
Other methods, which interleave the ROI coefficients with the background coefficients in a predetermined, alternating order, have also been devised to minimize the transmission of overhead information, but most of them require fundamental algorithmic changes so that both the encoder and the decoder scan the bit planes in the same order, and they are more complex than the maxshift and scaling methods. The examples used in this paper were prepared with the scaling algorithm and arbitrary coarse regions, as described above.
It should be observed that, in general, the use of ROI negatively impacts the PSNR of the whole reconstructed image but significantly improves the fidelity within the ROI itself. In our example, Fig. 16 shows the difference between coding the region around the text with 4 bits of shifting and coding the image without any ROI. The image used for this is the monochrome 1920 × 1080 image used in Fig. 17.

Implementation
A working implementation has been developed in the C programming language, with special vector code for both Intel and ARM processors to speed up the inner loop of the wavelet transform routine. A few wavelet transforms were implemented: the integer wavelet transforms (interpolating biorthogonal integer transforms 5/3, 9/3, 9/7, and ibior-13/7) and the CDF-9/7 real-valued transform. The wavelet used for the examples in this paper was the ibior-13/7 transform, which has a highpass filter with 7 taps and a lowpass filter with 13 taps. This wavelet allows for lossless compression and can also be truncated at any point, yielding performance within 0.5 dB of the CDF-9/7 real-valued transform while being much faster due to the all-integer arithmetic and 16-bit coefficients. For the 1920 × 1080 pixel, 8 bpp gray level image used (Fig. 17), we ran the compressor on both a Raspberry Pi 2 Model B (RPi2B) and a Raspberry Pi 3 Model B (RPi3B). The RPi2B is based on a 1.0 GHz quad core ARM processor (quad core ARM Cortex-A7 with 512 kB L2 cache) manufactured by Broadcom (BCM2836 SoC), while the RPi3B, the third-generation Raspberry Pi, uses a Broadcom BCM2837 SoC (quad core ARM Cortex-A53 with 512 kB of L2 cache) operating at 1.3 GHz with 1 GB of DDR2 RAM. Both use a 32-bit memory bus operated at 500 MHz.
The timings for each board for compressing the image in the upper left corner of Fig. 17, with and without ROI, are displayed in Tables 5 and 6. The algorithm has a parameter that specifies either the maximum size or a ''quality'' factor (which bears some inverse relation to the PSNR). The normal usage (if lossless compression is not required) is to use a ''good'' quality parameter for local storage and to transmit any prefix of this file for lower-quality versions of the compressed image. The ''quality'' values used for the lines of the tables were 0, 4, 8, 12, 16, and 24. The wavelet used was the ibior-13/7 b-spline interpolating integer transform with six decomposition levels. The first line of each table (where the ''quality'' parameter is 0) is for the lossless case, where MSE = 0 (PSNR = ∞), and all timings were based on the single-threaded version of the algorithm.
The column labeled ''pre'' (Tables 5, 6) is the time (in milliseconds) for the wavelet transformation and all other tasks needed before the compression algorithm can actually start running (mean extraction, significance map, etc.). This step is independent of the number of bits output and is in fact a lower bound for images of this size (it is almost independent of the contents of the image itself and depends mostly on the image dimensions alone).
The column labeled ''code'' (Tables 5, 6) is the time (in milliseconds) for the actual compression algorithm and is directly proportional to the amount of bits output. Therefore, the specification of a quality factor or a maximum size will have a great impact on this part and in the total running time for the image compression.
7.1 Benchmark: DEBT × JPEG2000

Table 7 compares DEBT with JPEG2000. The JPEG2000 implementation used was the ''JasPer'' program [1], version 2.0.12, with six levels of decomposition and the CDF-9/7 wavelet transform. To make a ''fair'' comparison, we have also included a run of our algorithm with the same number of decompositions (6) and the same wavelet (CDF-9/7), along with the previously used six levels of decomposition and the ibior-13/7 integer wavelet transform (used for the timing results in the previous tables).
It is important to note that, because JPEG2000 does not have an option for an exact output size, the number of bytes used in the comparison was given by the resulting size of the JPEG2000 file obtained with: jasper --input tank.pgm --output tank.jpc --output-format jpc -O rate=X, where X is the rate (first column) for the compression. The resulting file size was then used to compress the same image with our algorithm to this exact size, once with our current parameters (6 decomposition levels and the ibior-13/7 DWT) and again with the same parameters as in the JPEG2000 case (6 decomposition levels and the CDF-9/7 DWT).
The results show that, for the example image used, our algorithm is quite competitive with the current state-of-the-art JPEG2000 codec, even when using the ibior-13/7 DWT, and is vastly superior at very small rates using either DWT. In our example, DEBT consistently outperforms JPEG2000 when using the same number of decompositions (6) with both transforms, while being much faster.

Conclusions and further work
This paper proposes the use of progressive image compression and region of interest (ROI) for an RF underwater image sensor to be installed in a robotic platform. The operator can dynamically decide the size, quality, frame rate, and resolution of the received images so that the available bandwidth is utilized to its fullest potential and with the required minimum latency.
Fig. 16 Effect of ROI on PSNR: without ROI in black and with ROI in red
Fig. 17 Comparison of compressed image with and without ROI (DEBT)
The system is capable of dynamically and precisely adjusting the image compression to either a predefined size or quality, and it proved capable of sending good-quality images using only 400-800 bytes. The frame rate is directly proportional to the channel capacity and can be precisely established by varying the size of the compressed images. Quality is enhanced by letting the operator specify a region of interest in the camera input, which means that more details can be observed in this specific part of the image while maintaining the final image size and, hence, the consumed network bandwidth.
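The frame-rate trade-off above is straightforward to quantify; in this sketch the channel capacity figure is hypothetical, while the 400-800-byte image sizes are those reported in the text:

```python
def frame_rate(channel_bytes_per_s, image_bytes):
    """Achievable frame rate when each frame is compressed to a fixed size."""
    return channel_bytes_per_s / image_bytes

# Hypothetical 12 kB/s RF channel:
print(frame_rate(12_000, 800))  # 15.0 fps at 800 bytes per frame
print(frame_rate(12_000, 400))  # 30.0 fps at 400 bytes per frame
```

Halving the per-frame budget doubles the frame rate at the cost of image quality, which is precisely the knob the operator is given.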
Regarding the compression algorithm, which is explained in detail in the previous sections, the results show that one core of an RPi3B can compress full-HD images (1920 × 1080) at very high quality. It can be seen from Table 5 that it can process close to 30 frames per second at a PSNR of around 40 dB, leaving three more cores available for other processes.
Currently, a vectorized and parallel implementation of the DEBT algorithm (https://tinyurl.com/y722ecbf), including a parallel wavelet transform implementation for the above-mentioned transforms, is being developed, which should make it possible to process very large images in real time on embedded multi-processor single-board computers (SBCs). Also, the use of a better PDF match for the DWT coefficients instead of the Laplace PDF is being implemented (work is being done on the exponential power distribution, also known as the generalized Gaussian or generalized normal distribution), which should improve the rate-distortion curve of the compression, allowing for an optimized stream in case the coefficients really are modeled by such a PDF. The RPi2B is a less powerful board for images of this dimension, but it is still able to compress around 15 frames per second at the same 40 dB PSNR using a single core. In this case, the quality, frame size, or frame rate can be tuned so that the desired rate is achieved. Also, once the parallel versions of both the DWT and DEBT are implemented, all cores could be used, yielding a substantially faster compression rate.
As long as the ROI is a small region, there is not much difference in encoding time when using it, even though there is a small penalty to pay in compression efficiency for the whole image, as expected.
In summary, a specially designed progressive compression algorithm has been implemented that possesses quality and resolution scalability as well as ROI support, and is simple and fast enough to be usable on resource-limited computers while producing compression comparable to or better than that of current state-of-the-art compressors. The results show that it is quite competitive with state-of-the-art compression algorithms like JPEG2000 while being an order of magnitude faster.
Further work will concentrate on improving the communications physical layer to obtain communication distances around 5 m, according to the project needs. Also, the higher-level transport protocol will be enhanced by using congestion techniques that obtain a better use of the available bandwidth.