P-GAN enables visualization of cellular structure from a single speckled image
The overall goal was to learn a mapping between the single speckled and averaged images (Fig. 1b) using a paired training dataset. Inspired by the ability of traditional GAN networks to recover aspects of the cellular structure (Supplementary Fig. 4), we sought to further improve upon these networks with P-GAN. In our network architecture (Supplementary Fig. 2), the twin and the CNN discriminators were designed to ensure that the generator faithfully recovered both the local structural details of the individual cells and the overall global mosaic of the RPE cells. In addition, we incorporated a weighted feature fusion (WFF) strategy into the twin discriminator that concatenated features from different layers of the twin CNN with appropriate weights, facilitating effective comparison and learning of the complex cellular structures and global patterns of the images.
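The full layer configurations are given in Supplementary Fig. 2. Purely to illustrate the parallel-discriminator idea described above, a minimal PyTorch sketch is shown below; the class names, channel counts, and layer choices are illustrative assumptions rather than the published P-GAN architecture, and the WFF step is only indicated by a comment.

```python
# Minimal PyTorch sketch of the parallel-discriminator wiring; channel counts,
# layer choices, and class names are illustrative assumptions, not the
# published P-GAN configuration (see Supplementary Fig. 2 for the actual one).
import torch
import torch.nn as nn

class TwinDiscriminator(nn.Module):
    """Shared-weight (twin) CNN that compares a recovered image against the
    paired averaged image and scores their structural similarity."""
    def __init__(self, channels: int = 1):
        super().__init__()
        self.encoder = nn.Sequential(  # shared feature extractor for both inputs
            nn.Conv2d(channels, 32, 4, stride=2, padding=1), nn.LeakyReLU(0.2),
            nn.Conv2d(32, 64, 4, stride=2, padding=1), nn.LeakyReLU(0.2),
            nn.Conv2d(64, 128, 4, stride=2, padding=1), nn.LeakyReLU(0.2),
        )
        self.head = nn.Sequential(nn.AdaptiveAvgPool2d(1), nn.Flatten(),
                                  nn.Linear(128, 1))

    def forward(self, recovered, reference):
        f_rec = self.encoder(recovered)   # both branches reuse the same weights
        f_ref = self.encoder(reference)
        # P-GAN additionally fuses features from several encoder layers (WFF)
        # before the similarity score; only the last layer is used here.
        return self.head(torch.abs(f_rec - f_ref))

class CNNDiscriminator(nn.Module):
    """Conventional discriminator judging whether an image looks like an average."""
    def __init__(self, channels: int = 1):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(channels, 64, 4, stride=2, padding=1), nn.LeakyReLU(0.2),
            nn.Conv2d(64, 128, 4, stride=2, padding=1), nn.LeakyReLU(0.2),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(), nn.Linear(128, 1),
        )

    def forward(self, image):
        return self.net(image)
```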
P-GAN was successful in recovering the retinal cellular structure from the speckled images (Fig. 1d and Supplementary Movie 1). Toggling between the averaged RPE images (obtained by averaging 120 acquired AO-OCT volumes) and the P-GAN recovered images showed similarity in the cellular structure (Supplementary Movie 2). Qualitatively, P-GAN showed better cell recovery capability than other competitive deep learning networks (U-Net41, GAN25, Pix2Pix30, CycleGAN31, medical image translation using GAN (MedGAN)42, and uncertainty guided progressive GAN (UP-GAN)43), with clearer visualization of the dark cell centers and bright cell surroundings of the RPE cells (e.g., magenta arrows in Supplementary Fig. 4 and Supplementary Movie 3), possibly due to the twin discriminator's similarity assessment (additional details about the network architectures and training are provided in the Other network architectures section in Supplementary Methods and in Supplementary Table 4, respectively). Notably, CycleGAN was able to generate some cells that were perceptually similar to the averaged images, but in certain areas, undesirable artifacts were introduced (e.g., the yellow circle in Supplementary Fig. 4).
Quantitative comparison between P-GAN and the off-the-shelf networks (U-Net41, GAN25, Pix2Pix30, CycleGAN31, MedGAN42, and UP-GAN43) using objective performance metrics (PieAPP34, LPIPS35, DISTS36, and FID37) further corroborated our findings on the performance of P-GAN (Supplementary Table 5). There was an average reduction of at least 16.8% in PieAPP and 7.3% in LPIPS for P-GAN compared to the other networks, indicating improved perceptual similarity of P-GAN recovered images with the averaged images. Likewise, P-GAN also achieved the best DISTS and FID scores among all networks, demonstrating better structural and textural correlations between the recovered and the ground truth averaged images. Overall, these results indicated that P-GAN outperformed existing AI-based methods and could be used to successfully recover cellular structure from speckled images.
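As a rough illustration of how such full-reference perceptual scores can be computed on recovered/averaged image pairs, the sketch below assumes the open-source `piq` package (its LPIPS, DISTS, and PieAPP classes); it is not the evaluation code used for Supplementary Table 5, and FID is omitted because it is computed over feature distributions rather than individual image pairs.

```python
# Hedged sketch: full-reference perceptual metrics on recovered/averaged image
# pairs using the open-source `piq` package (assumed available); not the
# authors' evaluation pipeline.
import torch
import piq

def perceptual_scores(recovered: torch.Tensor, averaged: torch.Tensor) -> dict:
    """recovered, averaged: (N, 1, H, W) tensors scaled to [0, 1]."""
    rec3 = recovered.repeat(1, 3, 1, 1)  # metrics expect 3-channel inputs
    avg3 = averaged.repeat(1, 3, 1, 1)
    return {
        "LPIPS": piq.LPIPS()(rec3, avg3).item(),    # lower = more similar
        "DISTS": piq.DISTS()(rec3, avg3).item(),    # structure + texture similarity
        "PieAPP": piq.PieAPP()(rec3, avg3).item(),  # pairwise-preference error
    }
```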
Our preliminary explorations of the off-the-shelf GAN frameworks showed that these methods have the potential for recovering cellular structure and contrast but alone are insufficient to recover the fine local cellular details in extremely noisy conditions (Supplementary Fig. 4). To further reveal and validate the contribution of the twin discriminator, we trained a series of intermediate models and observed the cell recovery outcomes. We began by training a conventional GAN, comprising the generator, G, and the CNN discriminator, D2. Although GAN (G+D2) showed promising RPE visualization (Fig. 2c) relative to the speckled images (Fig. 2a), the individual cells were hard to discern in certain areas (yellow and orange arrows in Fig. 2c). To improve the cellular visualization, we replaced D2 with the twin discriminator, D1. Indeed, a 7.7% reduction in DISTS was observed, with clear improvements in the visualization of some of the cells (orange arrows in Fig. 2c, d).
a Single speckled image compared to images of the RPE obtained via b average of 120 volumes (ground truth), c generator with the convolutional neural network (CNN) discriminator (G+D2), d generator with the twin discriminator (G+D1), e generator with CNN and twin discriminators without the weighted feature fusion (WFF) module (G+D2+D1-WFF), and f P-GAN. The yellow and orange arrows indicate cells that are better visualized using P-GAN compared to the intermediate models. g–i Comparison of the recovery performance using deep image structure and texture similarity (DISTS), perceptual image error assessment through pairwise preference (PieAPP), and learned perceptual image patch similarity (LPIPS) metrics. The bar graphs indicate the average values of the metrics across sample size, n = 5 healthy participants (shown in circles), for different methods. The error bars denote the standard deviation. Scale bar: 50 µm.
Having shown the outcomes of training D1 and D2 independently with G, we showed that combining both D1 and D2 with G (P-GAN) boosted the performance even further, evident in the improved values (lower scores implying better perceptual similarity) of the perceptual measures (Fig. 2g–i). For this combination of D1 and D2, we replaced the WFF block, which concatenated features from different layers of the twin CNN with appropriate weights, with global average pooling of the last convolutional layer (G+D2+D1-WFF). Without the WFF, the model did not adequately extract powerful discriminative features for similarity assessment and hence resulted in poor cell recovery performance. This was observed both qualitatively (yellow and orange arrows in Fig. 2e, f) and quantitatively with the higher objective scores (indicating low perceptual similarity with the ground truth averaged images) for G+D2+D1-WFF compared to P-GAN (Fig. 2g–i).
Taken together, this established that the CNN discriminator (D2) helped to ensure that recovered images were closer to the statistical distribution of the averaged images, while the twin discriminator (D1), working in conjunction with D2, ensured structural similarity of local cellular details between the recovered and the averaged images. The adversarial learning of G with D1 and D2 ensured that the recovered images not only had global similarity to the averaged images but also shared nearly identical local features.
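As a schematic of this joint adversarial training, the sketch below shows a single generator update against both discriminators plus a pixel-wise term commonly used in paired image translation; the loss weights and the inclusion of an L1 term are placeholders, not the settings used for P-GAN.

```python
# Schematic of one generator update against both discriminators (PyTorch).
# The loss weights and the L1 reconstruction term are placeholders, not the
# settings used for P-GAN.
import torch
import torch.nn.functional as F

def generator_step(G, D1, D2, speckled, averaged, opt_G,
                   lambda_twin=1.0, lambda_cnn=1.0, lambda_l1=100.0):
    opt_G.zero_grad()
    recovered = G(speckled)

    # D2: conventional adversarial term -> push recovered images toward the
    # statistical distribution of the averaged images (global realism).
    d2_out = D2(recovered)
    loss_cnn = F.binary_cross_entropy_with_logits(d2_out, torch.ones_like(d2_out))

    # D1: twin-discriminator term -> enforce local structural similarity to the
    # paired averaged image.
    d1_out = D1(recovered, averaged)
    loss_twin = F.binary_cross_entropy_with_logits(d1_out, torch.ones_like(d1_out))

    # Pixel-wise term commonly used in paired image translation (assumption).
    loss_l1 = F.l1_loss(recovered, averaged)

    loss = lambda_cnn * loss_cnn + lambda_twin * loss_twin + lambda_l1 * loss_l1
    loss.backward()
    opt_G.step()
    return loss.item()
```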
Finally, experimentation using different weighting configurations in the WFF revealed that fusing the intermediate layers, weighted by 0.2, with the last convolutional layer proved complementary in extracting shape and texture information for improved performance (Supplementary Tables 2, 3). These ablation experiments indicated that the global perceptual closeness (offered by D2) and the local feature similarity (offered by D1 and WFF) were both important for faithful cell recovery.
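A minimal sketch of such a fusion step is shown below: globally pooled intermediate-layer features are scaled by 0.2 and concatenated with the pooled last-layer features. The exact fusion point and tensor shapes in P-GAN may differ; only the 0.2 weighting comes from the ablation above.

```python
# Sketch of the weighted feature fusion (WFF): globally pooled intermediate-layer
# features are down-weighted (0.2 here, per the ablation above) and concatenated
# with the pooled last-layer features. Shapes and fusion point are illustrative.
import torch

def weighted_feature_fusion(intermediate_feats, last_feat, w=0.2):
    """intermediate_feats: list of (N, C_i, H_i, W_i) tensors; last_feat: (N, C, H, W)."""
    pooled = [f.mean(dim=(2, 3)) for f in intermediate_feats]  # global average pooling
    pooled_last = last_feat.mean(dim=(2, 3))
    fused = torch.cat([w * p for p in pooled] + [pooled_last], dim=1)
    return fused  # (N, sum(C_i) + C) descriptor fed to the similarity head
```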
Given the relatively recent demonstration of RPE imaging using AO-OCT in 2016 (ref. 12), and the long durations needed to generate these images, there are currently no publicly available datasets for image analysis. Therefore, we acquired a small dataset using our custom-built AO-OCT imager13, consisting of seventeen retinal locations obtained by imaging up to four different retinal locations for each of the five participants (Supplementary Table 1). To obtain this dataset, a total of 84 h was needed (~2 h for image acquisition followed by 82 h of data processing, which included conversion of raw data to 3D volumes and correction for eye motion-induced artifacts). After performing traditional augmentation (horizontal flipping), this resulted in an initial dataset of only 136 speckled and averaged image pairs. However, considering that this and all other existing AO-OCT datasets that we are aware of are insufficient in size compared to the training datasets available for other imaging modalities44,45, it was not surprising that P-GAN trained on this initial dataset yielded very low objective perceptual similarity (indicated by the high scores of DISTS, PieAPP, LPIPS, and FID in Supplementary Table 6) between the recovered and the averaged images.
To overcome this limitation, we leveraged the natural eye motion of the participants to augment the initial training dataset. The involuntary fixational eye movements, which are typically faster than the imaging speed of our AO-OCT system (1.6 volumes/s), resulted in two types of motion-induced artifacts. First, due to bulk tissue motion, a displacement of up to hundreds of cells between acquired volumes could be observed. This enabled us to create averaged images of different retinal locations containing slightly different cells within each image. Second, due to the point-scanning nature of the AO-OCT system compounded by the presence of continually occurring eye motion, each volume contained unique intra-frame distortions. The unique pattern of the shifts in the volumes was desirable for creating slightly different averaged images without losing the fidelity of the cellular information (Supplementary Fig. 3). By selecting a large number of distinct reference volumes onto which the remaining volumes were registered, we were able to create a dataset containing 2984 image pairs (a 22-fold augmentation compared to the initial limited dataset), which was further augmented by an additional factor of two using horizontal flipping, resulting in a final training dataset of 5996 image pairs for P-GAN (also described in Data for training and validating AI models in Methods). Using the augmented dataset for training P-GAN yielded high perceptual similarity between the recovered and the ground truth averaged images, which was further corroborated by improved quantitative metrics (Supplementary Table 6). By leveraging eye motion for data augmentation, we were able to obtain a sufficiently large training dataset from a recently introduced imaging technology to enable P-GAN to generalize well to never-seen experimental data (Supplementary Table 1 and Experimental data for RPE assessment from the recovered images in Methods).
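Conceptually, this augmentation amounts to re-running registration and averaging with many different reference volumes; the sketch below illustrates only that bookkeeping, with a toy phase-correlation registration standing in for the actual AO-OCT eye-motion correction pipeline.

```python
# Conceptual sketch of the eye-motion-based augmentation: each distinct
# reference volume yields a slightly different averaged image, which is then
# paired with a single speckled acquisition. The toy phase-correlation
# registration below is a stand-in for the full AO-OCT eye-motion correction.
import numpy as np
from scipy.ndimage import shift as nd_shift
from skimage.registration import phase_cross_correlation

def register_to_reference(moving, reference):
    """Toy rigid registration (translation only), used only for illustration."""
    offset, _, _ = phase_cross_correlation(reference, moving)
    return nd_shift(moving, shift=offset)

def build_augmented_pairs(volumes, reference_indices):
    """volumes: list of en face RPE images (2D arrays), one per AO-OCT volume."""
    pairs = []
    for ref_idx in reference_indices:
        reference = volumes[ref_idx]
        registered = [register_to_reference(v, reference) for v in volumes]
        averaged = np.mean(np.stack(registered), axis=0)  # ground-truth-style image
        speckled = reference                              # single noisy acquisition
        pairs.append((speckled, averaged))
        # Traditional augmentation on top: horizontal flipping doubles the pairs.
        pairs.append((np.fliplr(speckled), np.fliplr(averaged)))
    return pairs
```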
In addition to the structural and perceptual similarity that we demonstrated between P-GAN recovered and averaged images, here, we objectively assessed the degree to which cellular contrast was enhanced by P-GAN compared to averaged images and other AI methods. As expected, examination of the 2D power spectra of the images revealed a bright ring in the power spectra (indicative of the fundamental spatial frequency present within the healthy RPE mosaic arising from the regularly repeating pattern of individual RPE cells) for the recovered and averaged images (insets in Fig. 3b–i).
a Example speckled image acquired from participant S1. Recovered images using b U-Net, c generative adversarial network (GAN), d Pix2Pix, e CycleGAN, f medical image translation using GAN (MedGAN), g uncertainty guided progressive GAN (UP-GAN), and h parallel discriminator GAN (P-GAN). i Ground truth averaged image (obtained by averaging 120 adaptive optics optical coherence tomography (AO-OCT) volumes). Insets in (a–i) show the corresponding 2D power spectra of the images. A bright ring, representing the fundamental spatial frequency of the retinal pigment epithelial (RPE) cells, can be observed in the power spectra of the U-Net, GAN, Pix2Pix, CycleGAN, MedGAN, UP-GAN, P-GAN, and averaged images; its radius corresponds to the cell spacing. j Circumferentially averaged power spectral density (PSD) for each of the images. A visible peak corresponding to the RPE cell spacing was observed for the U-Net, GAN, Pix2Pix, CycleGAN, MedGAN, UP-GAN, P-GAN, and averaged images. The vertical line indicates the approximate location of the fundamental spatial frequency associated with the RPE cell spacing. The height of the peak (defined as peak distinctiveness (PD)) indicates the RPE cellular contrast, measured as the difference in the log PSD between the peak and the local minimum to the left of the peak (inset in (j)). Scale bar: 50 µm.
Interestingly, although this ring was not readily apparent on the speckled single image (inset in Fig. 3a), it was present in all the recovered images, reinforcing our observation of the potential of AI to decipher the true pattern of the RPE mosaic from the speckled images. Furthermore, the radius of the ring, representative of the approximate cell spacing (computed from the peak frequency of the circumferentially averaged PSD) (Quantification of cell spacing and contrast in Methods), showed consistency among the different methods (shown by the black vertical line along the peak of the circumferentially averaged PSD in Fig. 3j and Table 1), indicating high fidelity of the recovered cells in comparison to the averaged images.
The height of the local peak of the circumferentially averaged power spectra (which we defined as peak distinctiveness) provided an opportunity to objectively quantify the degree to which cellular contrast was enhanced. Among the different AI methods, the peak distinctiveness achieved by P-GAN was closest to that of the averaged images, with a minimal absolute error of 0.08 compared to ~0.16 for the other methods (Table 1), which agrees with our earlier results indicating the improved performance of P-GAN. In particular, P-GAN achieved a contrast enhancement of 3.54-fold over the speckled images (0.46 for P-GAN compared with 0.13 for the speckled images). These observations demonstrate P-GAN's effectiveness in boosting cellular contrast in addition to structural and perceptual similarity.
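The exact procedure is given in Quantification of cell spacing and contrast in Methods; as a hedged sketch of the general idea (circumferentially averaging the 2D power spectrum and measuring the height of the cell-frequency peak above the local minimum to its left), one could compute:

```python
# Sketch of a circumferentially averaged power spectral density (PSD) and a
# simple peak-distinctiveness estimate; a schematic of the general idea only,
# not the exact procedure described in Methods.
import numpy as np

def radial_log_psd(image):
    """Return the circumferentially averaged log10 PSD indexed by radius (pixels)."""
    f = np.fft.fftshift(np.fft.fft2(image - image.mean()))
    psd = np.abs(f) ** 2
    cy, cx = np.array(psd.shape) // 2
    y, x = np.indices(psd.shape)
    r = np.hypot(y - cy, x - cx).astype(int)
    counts = np.maximum(np.bincount(r.ravel()), 1)
    radial_mean = np.bincount(r.ravel(), weights=psd.ravel()) / counts
    return np.log10(radial_mean + 1e-12)

def peak_distinctiveness(log_psd, peak_idx):
    """Height of the cell-spacing peak above the local minimum just to its left."""
    i = peak_idx
    while i > 0 and log_psd[i - 1] <= log_psd[i]:
        i -= 1  # walk down the left flank of the peak to the local minimum
    return log_psd[peak_idx] - log_psd[i]
```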
Having demonstrated the efficacy and reliability of P-GAN on test data, we wanted to evaluate the performance of P-GAN on experimental data from never-seen human eyes across an experimental dataset (Supplementary Table 1), which, to the best of our knowledge, covered the largest extent of AO-OCT imaged RPE cells reported (63 overlapping locations per eye). This feat was made possible using the AI-enhanced AO-OCT approach developed and validated in this paper. Using the P-GAN approach, in our hands, it took 30 min (including time needed for rest breaks) to acquire single volumes from 63 separate retinal locations, compared to only 4 non-overlapping locations imaged in nearly the same duration using the repeated averaging process (a 15.8-fold increase in the number of locations). Scaling up the averaging approach from 4 to 63 locations would have required nearly 6 h to acquire the same amount of RPE data (note that this does not include any data processing time), which is not readily achievable in clinical practice. This fundamental limitation explains why AO-OCT RPE imaging is currently performed only on a small number of retinal locations12,13.
Leveraging P-GAN's ability to successfully recover cellular structures from never-seen experimental data, we stitched together overlapping recovered RPE images to construct montages of the RPE mosaic (Fig. 4 and Supplementary Fig. 5). To further validate the accuracy of the recovered RPE images, we also created ground truth averaged images by acquiring 120 volumes from four of these locations per eye (12 locations total) (Experimental data for RPE assessment from the recovered images in Methods). The AI-enhanced and averaged images for the experimental data at the 12 locations were similar in appearance (Supplementary Fig. 6). Objective assessment using PieAPP, DISTS, LPIPS, and FID also showed good agreement with the averaged images (shown by comparable objective scores for experimental data in Supplementary Table 7 and test data in Supplementary Table 5) at these locations, confirming our previous results and illustrating the reliability of performing RPE recovery for other never-seen locations as well (P-GAN was trained using images obtained from up to 4 retinal locations across all participants). The cell spacing estimated using the circumferentially averaged PSD between the recovered and the averaged images (Supplementary Fig. 7 and Supplementary Table 8) at the 12 locations showed an error of 0.6 ± 1.1 µm (mean ± SD). We further compared the RPE cell spacing from the montages of the recovered RPE from the three participants (S2, S6, and S7) with previously published in vivo studies (obtained using different imaging modalities) and histological values (Fig. 5)12,46,47,48,49,50,51. Considering the range of values in Fig. 5, the metric exhibited inter-participant variability, with cell spacing varying up to 0.5 µm across participants at any given retinal location. Nevertheless, overall our measurements were within the expected range compared to the published normative data12,46,47,48,49,50,51. Finally, peak distinctiveness computed at 12 retinal locations of the montages demonstrated similar or better performance of P-GAN compared to the averaged images in improving the cellular contrast (Supplementary Table 8).
The image shows the visualization of the RPE mosaic using the P-GAN recovered images (this montage was manually constructed from up to 63 overlapping recovered RPE images from the left eye of participant S2). The white squares (a–e) indicate regions that are further magnified for better visualization at retinal locations a 0.3 mm, b 0.8 mm, c 1.3 mm, d 1.7 mm, and e 2.4 mm temporal to the fovea, respectively. Additional examples of montages from two additional participants are shown in Supplementary Fig. 5.
Symbols in black indicate cell spacing estimated from P-GAN recovered images for three participants (S2, S6, and S7) at different retinal locations. For comparison, data in gray denote the mean and standard deviation values from previously published studies (adaptive optics infrared autofluorescence (AO-IRAF)48, adaptive optics optical coherence tomography (AO-OCT)12, adaptive optics with short-wavelength autofluorescence (AO-SWAF)49, and histology46,51).
Voronoi analysis performed on P-GAN and averaged images at 12 locations (Supplementary Fig. 8) resulted in similar shapes and sizes of the Voronoi neighborhoods. Cell spacing computed from the Voronoi analysis (Supplementary Table 9) fell within the expected ranges and showed an average error of 0.5 ± 0.9 µm. These experimental results demonstrate the possibility of using AI to transform the way in which AO-OCT is used to visualize and quantitatively assess the contiguous RPE mosaic across different retinal locations directly in the living human eye.
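As a generic illustration of how cell spacing can be derived from marked cell centers, the sketch below uses SciPy's Voronoi and k-d tree utilities; it is not the analysis code behind Supplementary Table 9, and the nearest-neighbor distance is used here only as a simple proxy for spacing.

```python
# Generic sketch of Voronoi / nearest-neighbour analysis of marked RPE cell
# centres using SciPy; not the analysis code behind Supplementary Table 9.
import numpy as np
from scipy.spatial import Voronoi, cKDTree

def nearest_neighbour_spacing_um(cell_centres_um):
    """Mean centre-to-nearest-neighbour distance (µm) for an (N, 2) array of points."""
    tree = cKDTree(cell_centres_um)
    dists, _ = tree.query(cell_centres_um, k=2)  # k=2: the closest point is itself
    return dists[:, 1].mean()

def voronoi_neighbourhoods(cell_centres_um):
    """Voronoi tessellation whose region shapes and sizes can be compared."""
    return Voronoi(cell_centres_um)
```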