1. Introduction

The progressive scaling of transistors in semiconductor manufacturing pushes the limits of lithographic techniques toward patterning features well below the 7-nm node. Electron-beam imaging has been established as the technology of choice for in-line and off-line metrology tasks. Massive metrology results are necessary to evaluate the process quality of printed resist patterns. With the scaling of semiconductor manufacturing technology, the requirements for metrology and the tolerance for edge placement error (EPE) are becoming increasingly demanding. High accuracy, precision, repeatability, and fast turn-around time are desirable. As the patterns scale, a lower electron dosage is needed to avoid photoresist damage. Under such conditions, scanning electron microscope (SEM) images contain an excessive amount of noise and blurring. Electronic, thermal, mechanical, and quantization noise all contribute to images with low signal-to-noise ratio (SNR). Furthermore, e-beam aberration as well as relative movement between the wafer stage and detector introduces optical and motion blur to the images and causes high uncertainty in the edge profiles of printed resist patterns. These effects pose challenges to SEM image pre-processing algorithms for high-quality metrology tasks. In extreme conditions, the SEM images are simply unmeasurable with edge profile-based metrology algorithms. Hence, an effective image restoration methodology is crucial to turning unmeasurable SEM images metrology-ready and effectively improving metrology performance to meet demanding specifications for massive metrology. Below, we give a high-level review of existing technology for image quality (IQ) enhancement and restoration.

1.1. Non-Deep Learning Based Image Restoration

Before machine learning (ML) became the dominant methodology in science and technology, the problem of image restoration had already been extensively studied. Non-deep-learning (DL) image deblurring methodology can be classified into non-blind deblurring and blind deblurring. The first category makes explicit assumptions about the form of the underlying blurring kernel and uses iterative optimization or Fourier-domain inverse filtering to solve for the restored image.1,2 The second category does not explicitly assume a functional form for the kernel and usually follows a Bayesian approach; complex optimization algorithms are typically needed to obtain the solution.3 Both approaches are considered ill-posed inverse problems. Most image restoration research models the image degradation process as a convolution of the underlying sharp image with a degradation kernel, plus an additive noise term introduced by the SEM image acquisition process, as in Eq. (1):

$y = k \otimes x + n$,  (1)

where $x$ represents the latent sharp and clean image, $k$ models the kernel, $n$ models noise intrinsic to the SEM image, and $y$ is the captured SEM image. In the signal restoration process, the goal is to recover the underlying image $x$. Figure 1 illustrates the image degradation process of Eq. (1). This is an ill-posed, highly under-constrained problem, hence it is necessary to reduce the solution space to obtain meaningful solutions. Assumptions are usually made on the noise form, the degradation kernel, and sometimes on the image itself.
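As a point of reference, the degradation model of Eq. (1) can be simulated directly. The following is a minimal NumPy/SciPy sketch that assumes a Gaussian blur kernel and additive Gaussian noise purely for illustration; the true SEM kernel and noise statistics are unknown and are exactly what the methods discussed below try to handle.

```python
import numpy as np
from scipy.signal import fftconvolve

def degrade(x, kernel_sigma=2.0, kernel_size=15, noise_sigma=0.05, seed=0):
    """Simulate Eq. (1): y = k (*) x + n, with an assumed Gaussian kernel
    and additive Gaussian noise (illustrative only; real SEM blur/noise differ)."""
    rng = np.random.default_rng(seed)
    # Build a normalized 2D Gaussian blur kernel k
    ax = np.arange(kernel_size) - kernel_size // 2
    xx, yy = np.meshgrid(ax, ax)
    k = np.exp(-(xx**2 + yy**2) / (2 * kernel_sigma**2))
    k /= k.sum()
    # Convolve the latent image x with the kernel and add noise n
    y = fftconvolve(x, k, mode="same") + noise_sigma * rng.standard_normal(x.shape)
    return y, k

# Example: degrade a synthetic line/space pattern
x = np.tile(np.repeat([0.2, 0.8], 16), 32)[None, :].repeat(256, axis=0)[:, :256]
y, k = degrade(x)
```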
An alternative approach is to solve the maximum a posteriori (MAP) problem of Eq. (2), in which the latent image is described by a conditional probability density given the kernel assumption and the degraded image:

$\hat{x} = \arg\max_x p(x \mid y, k) \propto \arg\max_x p(y \mid x, k)\, p(x)$.  (2)

The proper choice of the prior term $p(x)$ has been researched extensively to ensure a correct model. Among many proposed terms, the most widely used is the $\ell_1$ norm and its variants. This is based on the observation that the gradient intensity of natural images follows a Laplace distribution,4 as in Eq. (3):

$p(\nabla x) \propto \exp(-\lambda \lVert \nabla x \rVert_1)$.  (3)

To recover the sharp image, many non-blind deblurring approaches solve an optimization problem with an objective function of the form of Eq. (4):

$\hat{x} = \arg\min_x \lVert y - k \otimes x \rVert_2^2 + \lambda\, \phi(x)$,  (4)

where $\phi(\cdot)$ is the regularization (prior) term. The total variation (TV) prior on sparse image gradients as well as norm priors on image pixel intensity are popular choices.2,5,6 Improved regularization terms have been proposed to replace the $\ell_1$ or $\ell_2$ penalty alone, both of which tend to over-regularize the gradient, whereas the $\ell_1$ term scaled by the $\ell_2$ term has a better chance of converging to a sharp restored image instead of a smooth and blurry solution.4 All the above-mentioned methodologies assume an initial kernel form and then iteratively optimize the kernel so that it converges to the underlying true kernel, using an appropriate image prior as regularization.

More recent research has focused on the blind restoration task, where no kernel form is assumed.7,8 This is an even more challenging task, since both the kernel and the latent image distribution must be modeled given only the degraded image. From a Bayesian perspective, a generative model is needed to describe the joint distribution of kernel and latent image. It has been pointed out that the full posterior of Eq. (5),

$p(x, k \mid y) \propto p(y \mid x, k)\, p(x)\, p(k)$,  (5)

can never be modeled from enough measurements, because the number of unknowns (every pixel of $x$ plus the kernel entries) always exceeds the number of measured pixels in $y$ and grows with image size; hence, instead of modeling $p(x, k \mid y)$, one can model the marginal $p(k \mid y)$, which has much lower dimension and far fewer unknowns, since the kernel is much smaller than the image itself.9 After solving for $k$, $x$ can then be recovered by Fourier techniques. In Ref. 10, instead of modeling an image prior, the authors modeled image gradients. It has been observed that image gradients obey a heavy-tailed distribution, with most of the mass concentrated at small values but significantly higher probability at large values than a Gaussian distribution, due to abrupt intensity variations caused by strong image features such as edges and corners. That work approached the restoration problem from a Bayesian perspective, modeling the latent image gradients as a mixture of Gaussians and assuming an exponential distribution for the kernel to induce sparsity and smoothness,10 as in Eq. (6):

$p(k, \nabla x \mid \nabla y) \propto p(\nabla y \mid k, \nabla x)\, p(\nabla x)\, p(k)$,  (6)

where the first term on the right-hand side is a Gaussian term on the gradient of the degraded image, the second term is a mixture of Gaussians on the restored image gradient, and the third term is the exponential assumption on the kernel. The authors approximate the full posterior distribution and then compute the kernel with maximum marginal probability.

In real situations, the blurring kernel can be arbitrary, without a specific form, and this is especially true for random motion blur. Hence, non-blind deblurring often suffers in performance due to simplistic or simply wrong assumptions about the kernel. Another challenge is to apply the optimization algorithm with an appropriate regularization strength so that it converges to the desired solution. On the other hand, blind deblurring loosens the strong prior term but requires more sophisticated variational inference algorithms. The two solutions carry an intrinsic trade-off between over- and under-regularization. The next section goes through some typical optimization techniques used in non-blind deblurring tasks.
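Before that, as a concrete illustration of the non-blind objective in Eq. (4), the following minimal PyTorch sketch minimizes the data fidelity term plus an anisotropic TV penalty by gradient descent for a known kernel. It is a toy example of the classical formulation, not the method proposed later in this paper.

```python
import torch
import torch.nn.functional as F

def tv(x):
    """Anisotropic total variation of a 2D image tensor."""
    return (x[1:, :] - x[:-1, :]).abs().sum() + (x[:, 1:] - x[:, :-1]).abs().sum()

def nonblind_deblur(y, k, lam=1e-3, iters=300, lr=0.1):
    """Minimize ||k (*) x - y||^2 + lam * TV(x) for a known kernel k,
    i.e., an Eq. (4)-style objective with a TV prior (toy example only)."""
    x = y.clone().requires_grad_(True)              # initialize latent image with the degraded image
    w = torch.flip(k, dims=[0, 1])[None, None]      # conv2d is correlation; flip kernel for convolution
    opt = torch.optim.Adam([x], lr=lr)
    for _ in range(iters):
        opt.zero_grad()
        pred = F.conv2d(x[None, None], w, padding="same")[0, 0]
        loss = F.mse_loss(pred, y) + lam * tv(x)
        loss.backward()
        opt.step()
    return x.detach()
```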
1.2. Iterative Optimization for Non-Blind Deblurring

Non-blind deblurring usually resorts to optimization algorithms such as half-quadratic splitting (HQS) or the alternating direction method of multipliers (ADMM).3,5 A routine practice is to initialize the degradation kernel as a 2D Gaussian function and the degraded image as the initial latent image, then alternately solve for the kernel and the latent image while holding the other term fixed. The authors of Ref. 11 took advantage of the denoising effect of U-Net and used dilated convolutional layers to increase the receptive field of the convolutional neural networks (CNNs).12 Such an optimization process can be formulated as Eqs. (7) and (8), in which an auxiliary variable $z$ splits the problem into a data fidelity subproblem and a prior (denoising) subproblem:

$x_{t+1} = \arg\min_x \lVert y - k \otimes x \rVert_2^2 + \mu \lVert x - z_t \rVert_2^2$,  (7)

$z_{t+1} = \arg\min_z \tfrac{\mu}{2} \lVert z - x_{t+1} \rVert_2^2 + \lambda\, \phi(z)$.  (8)

Both methods plug CNN priors into the HQS iterations. The data fidelity subproblem, Eq. 9(a), is solved for the sharp image in closed form using Fourier-domain inverse filtering,

$x_{t+1} = \mathcal{F}^{-1}\!\left[\dfrac{\overline{\mathcal{F}(k)}\,\mathcal{F}(y) + \mu\,\mathcal{F}(z_t)}{|\mathcal{F}(k)|^2 + \mu}\right]$,  (9a)

while Eq. 9(b),

$z_{t+1} = \mathrm{Denoiser}\!\left(x_{t+1}, \sqrt{\lambda/\mu}\right)$,  (9b)

corresponds to Gaussian denoising of $x_{t+1}$ with noise level $\sqrt{\lambda/\mu}$; hence, any CNN-based Gaussian denoiser can be plugged into the formula to solve Eq. 9(b). This methodology has been extended to other image restoration problems such as demosaicking and image super-resolution.11 A prior regularization term is routinely added to the optimization formulation to reduce the solution space and stabilize convergence, avoiding the noise amplification or trivial solutions that are common for the unregularized counterpart.

1.3. Deep Learning Based Image Restoration

Due to the high heterogeneity of image statistics, a fixed prior form usually cannot model the true image prior, which leads to model mismatch and unsatisfactory results. This problem led researchers to experiment with data-driven approaches to model the prior term. In Refs. 6 and 13, the authors used CNNs to model the prior of the training images. In Ref. 14, the authors generalized shrinkage fields by removing unnecessary parameter sharing and replacing pixel-wise shrinkage functions with a CNN applied to the entire image. These approaches outperform the results obtained with hand-crafted prior terms, illustrating the flexible modeling capability of CNNs. To further exploit this capability, end-to-end image restoration frameworks have been proposed and researched. In such frameworks, the CNN is treated as a black box that takes a low-quality image as input and outputs a restored high-quality image. Convolutional layers followed by element-wise rectified linear unit (ReLU) layers, with the sum of a data fidelity term and a regularization term as the loss function, have been used.15,16 Researchers have also used powerful generative adversarial networks to restore images by optimizing an adversarial loss.17 Reference 18 used a CNN as a feature extraction module that extracts image features from degraded images and then estimates the kernel and latent image. Reference 19 used a six-layer CNN to learn the gradient map, obtained the kernel using the Fourier transform, and applied HQS to obtain the restored image. CNN-empowered approaches, when properly combined with a physical model, can achieve state-of-the-art results compared with methods that assume an explicit prior, and they can provide end-to-end solutions.20,21 However, these supervised methods carry crucial pitfalls that make them undesirable for the SEM image restoration task: a trained model suffers from poor generalization to images and kernel forms it was not trained on.
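To connect the HQS iteration of Eqs. (7)-(9) with the plug-in denoiser idea discussed above, here is a minimal NumPy sketch. It assumes a circular convolution model and uses a Gaussian filter merely as a stand-in for a learned CNN denoiser; it is not one of the cited implementations.

```python
import numpy as np
from numpy.fft import fft2, ifft2
from scipy.ndimage import gaussian_filter

def psf2otf(k, shape):
    """Zero-pad the kernel to the image shape and center it for FFT-based filtering."""
    pad = np.zeros(shape)
    pad[:k.shape[0], :k.shape[1]] = k
    pad = np.roll(pad, (-(k.shape[0] // 2), -(k.shape[1] // 2)), axis=(0, 1))
    return fft2(pad)

def hqs_plug_and_play(y, k, mu=0.05, lam=0.005, iters=30):
    """Plug-and-play HQS: closed-form data step (Eq. 9a) + denoiser prior step (Eq. 9b).
    A Gaussian filter stands in for the CNN denoiser used in the cited works."""
    K = psf2otf(k, y.shape)
    Y = fft2(y)
    z = y.copy()
    for _ in range(iters):
        # Data fidelity subproblem: Wiener-like inverse filtering in the Fourier domain
        x = np.real(ifft2((np.conj(K) * Y + mu * fft2(z)) / (np.abs(K) ** 2 + mu)))
        # Prior subproblem: denoising with strength tied to sqrt(lam / mu)
        z = gaussian_filter(x, sigma=np.sqrt(lam / mu))
    return z
```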
Such poor generalization limits the real-world applicability of these approaches, since realistic blurring kernels in SEM images can be arbitrary, without fixed patterns.22,23 Another crucial challenge in restoring low-quality SEM images is the lack of ground-truth images, and supervised ML models do not work without ground truth. It therefore becomes clear that a pre-training-free, self-supervised, generative methodology that imposes just enough prior assumptions is strongly preferred for SEM image restoration applications. In this paper, we present a thorough investigation of such an SEM image restoration methodology and a detailed evaluation of its performance.

2. Dataset and Methodology

In this section, we describe in detail (1) the SEM image dataset, (2) the architecture of the neural networks, (3) the IQ evaluation metrics, and (4) model regularization and convergence.

2.1. SEM Image Dataset

SEM images were collected from 10 dies, with 10 runs per die and four images captured per run, resulting in 400 images. The images were captured using an ASML HMI eP5 metrology and inspection machine. The patterns include after-etch inspection (AEI) and after-develop inspection (ADI) line/space (LS) and contact hole (CH) patterns. The pixel size is 1 nm; each LS image contains 19 line patterns within the field of view, and each CH image contains 1024 contact holes. To create a dataset with improving image quality, we used frame averaging to obtain 1-, 2-, 3-, and 4-frame averaged images from the original datasets. Figure 2 summarizes the entire dataset.

2.2. Neural Network Architecture

The proposed image restoration framework is composed of a generative network $G_k$, which generates the kernel, and a generative network $G_x$, which generates the latent image. Because the blurring kernel $k$ has much lower dimension than the latent image, $G_k$ is modeled with a lightweight fully connected network (FCN). The FCN takes a one-dimensional (1D) noise vector with 200 dimensions as input and has a hidden layer of 1000 nodes and an output layer with one node per kernel entry. To guarantee the non-negativity constraint, the SoftMax nonlinearity is applied to the output layer of $G_k$. Finally, the 1D output is reshaped to a 2D blurring kernel. Table 1 shows the architecture details.

Table 1. Architecture of generative network $G_k$.
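For illustration, the following is a minimal PyTorch sketch of a kernel generator with the structure described above: a 200-dimensional noise input, a 1000-node hidden layer, and a SoftMax output reshaped to a 2D kernel. The hidden ReLU activation and the kernel_size value are assumptions made for this sketch and do not reproduce Table 1 exactly.

```python
import torch
import torch.nn as nn

class KernelGenerator(nn.Module):
    """FCN G_k sketch: 200-d noise -> 1000-node hidden layer -> SoftMax -> 2D kernel.
    kernel_size and the ReLU hidden activation are placeholders, not the paper's exact values."""
    def __init__(self, noise_dim=200, hidden=1000, kernel_size=15):
        super().__init__()
        self.kernel_size = kernel_size
        self.net = nn.Sequential(
            nn.Linear(noise_dim, hidden),
            nn.ReLU(),
            nn.Linear(hidden, kernel_size * kernel_size),
            nn.Softmax(dim=-1),          # non-negative entries that sum to 1
        )

    def forward(self, z):
        k = self.net(z)
        return k.view(self.kernel_size, self.kernel_size)

# Example: sample a fixed noise vector and generate a kernel
z_k = torch.randn(200)
kernel = KernelGenerator()(z_k)
```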
Generative network $G_x$ is an asymmetric autoencoder with skip connections and is used to generate the latent clean image.24 The first five layers of the encoder are skip-connected to the last five layers of the decoder. Finally, a convolutional output layer is used to generate the latent clean image. Since the output image needs to have positive values, the Sigmoid nonlinearity is applied to the output layer. Figure 3 illustrates the $i$'th unit of the encoder–decoder architecture; for each unit we specify the number of filters, the kernel size, and the padding of its convolutions. The filter size in the last convolutional layer is fixed at $1 \times 1$, since we apply the Sigmoid activation function to each pixel without any spatial averaging to avoid blurring the image. Down-sampling and up-sampling both use a stride of 2, and bilinear interpolation is used for up-sampling. The filter number at the $i$'th skip-connection layer is proportional to the filter number at the $i$'th encoder layer, with a fixed ratio that determines the proportion of information the $i$'th decoder layer obtains directly from the $i$'th encoder layer. The convolutional kernel size is fixed throughout the network, with the padding fixed at 1.

Neural network architecture usually has a major impact on the results, in this case the quality of the restored image. The key architectural hyperparameters are the input dimension, the filter number combination for each layer, and the ratio of the filter number between skip layer and encoder layer. We experimented with a series of choices and chose the best combination guided by the IQ metrics. Since SEM images are single-channel gray-level images, the input has a single channel. The architecture details are presented in Table 2, and the network is illustrated in Fig. 4. The choice of model architecture is determined by factors such as pattern complexity, computational latency, and restoration performance. For 1D patterns, the filter numbers are (4, 4, 8, 16, 16), while for 2D patterns, the filter numbers are (8, 8, 16, 32, 32). We also set the skip-connection filter ratio to the value that provides the best restored image while still maintaining enough pattern style from the low-dosage image.

Table 2. Architecture of generative network $G_x$.
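A simplified PyTorch sketch of such a skip-connected encoder-decoder is given below. The 3x3 kernels, BatchNorm/LeakyReLU blocks, skip channel count, and exact up/down-sampling arrangement are assumptions made for illustration and do not reproduce Table 2, but the filter numbers follow the 1D-pattern setting (4, 4, 8, 16, 16) and the output uses a 1x1 convolution with Sigmoid as described.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

def conv_block(c_in, c_out, stride=1):
    """Conv -> BatchNorm -> LeakyReLU; 3x3 kernel with padding 1 is an assumption."""
    return nn.Sequential(
        nn.Conv2d(c_in, c_out, kernel_size=3, stride=stride, padding=1),
        nn.BatchNorm2d(c_out),
        nn.LeakyReLU(0.1, inplace=True),
    )

class ImageGenerator(nn.Module):
    """Simplified skip-connected encoder-decoder G_x (a sketch, not the exact Table 2 network)."""
    def __init__(self, in_ch=1, filters=(4, 4, 8, 16, 16), skip_ch=4):
        super().__init__()
        self.encoders, self.skips, self.decoders = nn.ModuleList(), nn.ModuleList(), nn.ModuleList()
        c_prev = in_ch
        for c in filters:
            self.encoders.append(conv_block(c_prev, c, stride=2))   # downsample by 2
            self.skips.append(conv_block(c, skip_ch))                # skip branch
            c_prev = c
        c_prev = filters[-1]
        for c in reversed(filters):
            self.decoders.append(conv_block(c_prev + skip_ch, c))    # fuse skip features
            c_prev = c
        self.out = nn.Sequential(nn.Conv2d(c_prev, 1, kernel_size=1), nn.Sigmoid())

    def forward(self, z):
        feats, x = [], z
        for enc, skip in zip(self.encoders, self.skips):
            x = enc(x)
            feats.append(skip(x))
        for dec, s in zip(self.decoders, reversed(feats)):
            x = F.interpolate(x, size=s.shape[-2:], mode="bilinear", align_corners=False)
            x = dec(torch.cat([x, s], dim=1))
        x = F.interpolate(x, scale_factor=2, mode="bilinear", align_corners=False)  # back to input size
        return self.out(x)

# Example: a fixed noise input with the same spatial size as the SEM image crop
z_x = torch.randn(1, 1, 256, 256)
x_hat = ImageGenerator()(z_x)    # values in (0, 1), shape (1, 1, 256, 256)
```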
The image restoration network is composed of $G_k$ and $G_x$: the latent image $G_x(z_x)$ is convolved with the kernel $G_k(z_k)$, resulting in a blurred image, and we use its mean squared error (MSE) with respect to the low-quality SEM image as the image fidelity term. We formulate the image restoration as an unconstrained optimization problem. As seen in the introduction, appropriate regularization strength is crucial to obtaining satisfactory results, so we add TV regularization terms for both the image and the kernel and weigh them by coefficients $\lambda_x$ and $\lambda_k$, respectively, to encourage sparsity. The choice of $\lambda_x$ should be related to the noise level in the image, with a larger $\lambda_x$ for a higher noise level to prevent overfitting the noise. The choice of $\lambda_k$ should reflect the complexity of the kernel; since the kernel is expected to be simple for SEM images, this regularization term can be fixed at a higher value. Our image restoration loss function is composed of the MSE term plus the two TV regularization terms, and the goal is to minimize the total loss, which can be written as Eq. (10):

$\mathcal{L} = \lVert G_k(z_k) \otimes G_x(z_x) - y \rVert_2^2 + \lambda_x\, \mathrm{TV}\big(G_x(z_x)\big) + \lambda_k\, \mathrm{TV}\big(G_k(z_k)\big)$.  (10)

The optimization of Eq. (10) can be viewed as "zero-shot," self-supervised learning,25 in which both generative networks are trained using only the low-quality SEM image and no ground-truth clean image. To optimize the networks, we adopt joint optimization, which takes advantage of automatic differentiation: the gradients with respect to the parameters of $G_k$ and $G_x$ are derived and the network parameters updated. Alternating minimization is not used because this is an unconstrained problem and it is easy to get stuck at a saddle point due to the highly non-convex nature of the loss function. We picked the ADAM optimization algorithm to update $G_k$ and $G_x$ simultaneously in one step, due to its gradient- and learning-rate-adaptive nature achieved through momentum.26 Table 3 shows the pseudo code for the joint optimization process. Both generative networks and the training process are schematically illustrated in Fig. 5.

Table 3. Pseudo code for joint optimization.
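A minimal sketch of the joint optimization loop is shown below; it is consistent with the description above but does not reproduce the actual pseudo code of Table 3. It reuses the KernelGenerator and ImageGenerator sketches from the previous code blocks, and the regularization weights shown are placeholders rather than tuned values.

```python
import torch
import torch.nn.functional as F

def tv2d(t):
    """Anisotropic total variation of a tensor whose last two dims are spatial."""
    return (t[..., 1:, :] - t[..., :-1, :]).abs().mean() + \
           (t[..., :, 1:] - t[..., :, :-1]).abs().mean()

def restore(y, g_k, g_x, lam_x=1e-4, lam_k=1e-2, lr=0.002, iters=120, noise_std=0.01):
    """Jointly train G_k and G_x on a single low-quality image y (shape 1x1xHxW),
    minimizing Eq. (10). lam_x and lam_k here are placeholder values."""
    z_k = torch.randn(200) * noise_std                       # fixed noise input for G_k
    z_x = torch.randn_like(y) * noise_std                    # fixed noise input for G_x
    opt = torch.optim.Adam(list(g_k.parameters()) + list(g_x.parameters()), lr=lr)
    for _ in range(iters):
        opt.zero_grad()
        k = g_k(z_k)                                         # 2D kernel
        x = g_x(z_x)                                         # latent clean image
        y_hat = F.conv2d(x, k[None, None], padding="same")   # blurred reconstruction
        loss = F.mse_loss(y_hat, y) + lam_x * tv2d(x) + lam_k * tv2d(k)
        loss.backward()
        opt.step()                                           # one joint ADAM step for both networks
    return x.detach(), k.detach()
```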
2.3. Image Quality Evaluation Metrics

We describe several popular IQ evaluation metrics that will be used to evaluate the image restoration process and guide the learning process.

2.3.1. Peak signal-to-noise ratio

Peak signal-to-noise ratio (PSNR) is calculated by Eq. (11), where $m$ and $n$ are the image dimensions, MAX is the maximum signal value (255 for grayscale 8-bit SEM images), and $I$ and $K$ represent the pixel value matrices of the reference image and the image to be measured, respectively:

$\mathrm{PSNR} = 10 \log_{10}\!\left(\dfrac{\mathrm{MAX}^2}{\frac{1}{mn}\sum_{i=1}^{m}\sum_{j=1}^{n}\left[I(i,j) - K(i,j)\right]^2}\right)$.  (11)

A high PSNR value indicates a low noise level in the measured image compared with the reference image. We picked this metric because we want to denoise low-dosage SEM images and hope to achieve a high PSNR.

2.3.2. Structural similarity index measure

The structural similarity index measure (SSIM) is calculated by Eq. (12), where $\mu_I$, $\mu_K$, $\sigma_I^2$, and $\sigma_K^2$ are the means and variances of the image matrices $I$ and $K$, $\sigma_{IK}$ is their covariance, and $c_1$ and $c_2$ are two scalars that numerically stabilize the division:

$\mathrm{SSIM}(I, K) = \dfrac{(2\mu_I\mu_K + c_1)(2\sigma_{IK} + c_2)}{(\mu_I^2 + \mu_K^2 + c_1)(\sigma_I^2 + \sigma_K^2 + c_2)}$.  (12)

SSIM measures the relative geometrical similarity of the test image with respect to the reference image, and its value ranges from 0 to 1. We picked this metric because we want to restore the underlying patterns from the low-dosage SEM image with high integrity. Since SSIM is sensitive to pattern shift, this metric helps to identify potential pattern misalignment, which could cause issues in metrology.

2.3.3. Pattern sharpness

Pattern sharpness (PS) is a measure used to evaluate the quality of pattern edges. Figure 6 shows how this measure is calculated. It is computed from the gray-level profile extracted at a pattern edge, and the value is in units of nm; a smaller value means a faster rise in the edge slope, hence sharper edges. The x-axis is the pixel location and the y-axis is the gray-level value. The calculated measure is converted to a score in nm by multiplying by the pixel size, which is 1 nm in this case. We picked this metric because we want the restoration framework to deblur, or sharpen, the image, especially at the pattern edges. Since we have 10 runs for each die and each run has four frames, we average these 40 images and use the result as the reference image when calculating SSIM and PSNR.

2.4. Model Architecture

As computer vision tasks have become more challenging over the years, the complexity of neural network architectures has increased drastically to model complex functions. The search space for optimal hyperparameters and training parameters is so high-dimensional that optimizing by trial and error becomes infeasible. Neural architecture search has become an active research area aiming to provide an automatic way to select an optimized architecture.27 This approach usually relies on cross-validation performance to guide the automatic selection. However, due to the lack of ground-truth images and the self-supervised nature of our methodology, using cross-validation to optimize the model architecture is infeasible. Alternatively, since we want to restore the IQ as much as possible while keeping the intrinsic geometrical patterns undistorted, we use the IQ metrics introduced in the previous sections to guide architecture selection. The neural network architectures under evaluation are listed in Table 4. To better balance underfitting and overfitting and to maximize the application scenarios across different pattern complexities, we picked a network of intermediate complexity. Table 5 shows how all IQ metrics improve significantly after restoration and reach values comparable to the four-frame averaged image, suggesting successful restoration.
In the tables, an up-arrow means that a higher value indicates better IQ, and vice versa.

Table 4. Neural network architectures under evaluation.
Table 5. IQ metrics before and after restoration.
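For reference, the PSNR and SSIM values reported in Table 5 can be computed against the frame-averaged reference with scikit-image, as in the following sketch; the pattern-sharpness score depends on the edge-profile extraction of Fig. 6 and is not reproduced here.

```python
import numpy as np
from skimage.metrics import peak_signal_noise_ratio, structural_similarity

def iq_metrics(restored, frames):
    """PSNR/SSIM of a restored 8-bit image against the average of repeated frames.
    `frames` is a list/array of co-aligned 8-bit images of the same feature."""
    reference = np.mean(np.asarray(frames, dtype=np.float64), axis=0)
    restored = np.asarray(restored, dtype=np.float64)
    psnr = peak_signal_noise_ratio(reference, restored, data_range=255)
    ssim = structural_similarity(reference, restored, data_range=255)
    return psnr, ssim
```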
2.5. Model Regularization and Convergence

In this section, we discuss how regularization affects convergence. We use an AEI pattern for restoration and fix the training parameters with an input noise standard deviation of 0.01, a fixed learning rate of 0.002, and a total of 120 iterations. The ADAM optimizer was used for all learning processes. As discussed previously, we can add a prior on the image or the kernel as a form of regularization so that the inverse optimization does not easily converge to trivial or random solutions. We explore how the TV regularization strengths on the image and the kernel affect the convergence behavior, experimenting with different regularization coefficient combinations to better understand how the strength of each term affects convergence as well as the learned kernel and the restored image. Figure 7 shows the loss function versus iterations for several selected regularization coefficient combinations to illustrate their effects on the learned kernels; the loss is plotted on a log scale for better visualization. We use zero regularization (blue curve) as the baseline condition and study the effects of the regularization coefficients by varying them individually. The kernel snapshots are taken at iterations 20, 60, and 120, respectively. When the kernel coefficient $\lambda_k$ increases from 1 to 100 (orange to green), the kernel becomes sparser and shows pixelation characteristics. This is due to the typical sparsity-inducing effect of TV regularization; nevertheless, over-regularization leads to failure of kernel convergence. With a moderate $\lambda_k$, we observe the kernel becoming less noisy and sparser as the iterations increase, while still converging to the characteristics of the underlying blurring kernel. Note that the kernel snapshots are artificially enlarged for better visualization. This experiment confirms the divergent nature of this inverse problem and the importance of choosing an appropriate regularization strength.

We further examine whether the IQ metrics can be used as a proxy for the quality of the restored latent images. Figure 8 shows SSIM and PSNR under different image regularization strengths. When $\lambda_x$ is too large (over-regularized), the model converges to a trivial solution containing no pattern, because the over-regularization strongly prefers an image with few features such as edges and corners; this correlates well with low IQ metrics. Comparatively, with an appropriately regularized $\lambda_x$, the model successfully restores a low-noise, sharp latent image with high structural similarity to the reference image, which correlates well with high IQ metrics. Hence, appropriate regularization strength is crucial to successful restoration, and IQ metrics such as PSNR and SSIM can be used to evaluate convergence and determine stopping criteria from an IQ perspective when proper cross-validation is not feasible.

3. Results and Analysis

This section analyzes line edge roughness (LER) and critical dimension (CD), which provides insight into how image restoration affects metrology results. It has been established that the CD histogram of the image after restoration shows a much smaller standard deviation and suppressed outliers compared with that of the low-dosage image, as demonstrated in a previous conference paper.28 This is because poor IQ leads to inaccurate CD measurements with a higher standard deviation (1.676 nm), which is reduced after restoration (1.057 nm). The restoration also biases the mean CD value by only a small amount (47.08 nm before versus 46.78 nm after). These are metrology results of AEI patterns from Fig. 9.
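The CD comparison above reduces to simple per-gauge statistics; a minimal sketch, assuming hypothetical arrays of per-gauge CD measurements in nanometers extracted from the low-dosage and restored images, is given below.

```python
import numpy as np

def cd_summary(cds, label=""):
    """Mean CD, CD sigma, and outlier count (> 3 sigma from the mean) for a 1D array
    of per-gauge CD measurements in nm. Input data here are hypothetical."""
    cds = np.asarray(cds, dtype=float)
    mean, sigma = cds.mean(), cds.std(ddof=1)
    outliers = int(np.sum(np.abs(cds - mean) > 3 * sigma))
    print(f"{label}: mean CD = {mean:.2f} nm, sigma = {sigma:.3f} nm, outliers = {outliers}")
    return mean, sigma, outliers
```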
We conduct LER characterization for the left and right EPE using power spectral density (PSD) estimation to further confirm our observations. We place measurement gauges along the direction of the line patterns with a 2-nm interval for both the left and right edges to obtain the EPEs, and then we average the left and right LERs over all line patterns from all images to obtain a better estimate. The PSD estimate is given by Eq. (13), where $f$ is the spatial frequency, $x_n$, $n = 0, \ldots, N-1$, are the edge-position signals sampled at discrete positions along the line for a total of $N$ samples, and $\Delta$ is the sampling interval; we study the squared magnitude since we care about the spectral amplitude:29,30

$\mathrm{PSD}(f) = \dfrac{\Delta}{N}\left|\sum_{n=0}^{N-1} x_n\, e^{-i 2\pi f n \Delta}\right|^2$.  (13)

Note that the true PSD is obtained only in the limit of infinitely many samples, where the expected value can be estimated accurately. In real-world cases, the number of measurement samples is finite, hence averaging the PSD over many trials is necessary to estimate the underlying physical process more accurately.29,30

Due to the stochastic nature of LER measurement, we break the variance into three parts, as in Eq. (14):31–33

$\sigma_{\mathrm{total}}^2 = \sigma_{\mathrm{process}}^2 + \sigma_{\mathrm{metrology}}^2 + \sigma_{\mathrm{noise}}^2$,  (14)

where $\sigma_{\mathrm{process}}^2$ refers to the stochasticity of LER profiles caused by process variation, $\sigma_{\mathrm{metrology}}^2$ is the measurement uncertainty introduced by the metrology algorithm and, in our case, influenced by IQ, and $\sigma_{\mathrm{noise}}^2$ refers to extrinsic noise contributions from factors such as SEM shot noise, SEM tool stage movement, and beam profile. Unbiasing can be used to remove the high-frequency extrinsic noise, and the results are shown in Fig. 9.31 Comparing the unbiased PSD spectra in Fig. 9, we can see that the middle frequency range is drastically attenuated after restoration. This is due to the reduction of metrology noise resulting from the improvement in IQ, in accordance with the observation that the CD standard deviation is reduced after restoration. Hence, this image restoration method can be used to reduce metrology noise by improving IQ without introducing a mean shift, thereby exposing the intrinsic stochasticity of the pattern edges. This might provide valuable process information if the underlying process of the patterns is known. The relationship between the process and the corresponding PSD characteristics is complicated, beyond the scope of this paper, and not discussed here.

4. Conclusion

Effective restoration of low-quality SEM images is critical for future high-performance metrology applications as users push for higher throughput and faster turn-around time. This paper introduced a new methodology based on a self-supervised, generative neural network model. A major advantage of this approach is that it does not require a high-fidelity "ground truth" image for training, making it especially desirable for low-dosage metrology applications, where such ground-truth data are usually unavailable. A detailed description of the model architecture and regularization was provided. It has been shown that, by applying the proposed framework, IQ can be improved greatly while preserving the intrinsic pattern geometry. CD precision, mean CD, and the overall distribution confirm the effectiveness in metrology applications. Extension to 2D patterns is also promising: the image is transformed from a non-measurable state to one that enables reliable metrology results.28 PSD-based LER analysis suggests that the restoration method can reduce metrology noise by improving IQ, thereby exposing the intrinsic process-induced stochasticity of the line edge profile, which is of great value as process stochasticity becomes more prominent while device features keep shrinking. We believe that more use cases of this restoration framework remain to be discovered.
Code, Data, and Material Availability

The data utilized in this study were obtained from wafers manufactured by our customer, who does not permit sharing the images freely. The source code that supports the findings of this article is not publicly available because the work is patented and is considered internal IP of ASML. Nevertheless, to foster collaboration and research, the code can be requested by contacting the author at zijian.du@asml.com.

Acknowledgments

This article is based on a conference proceedings paper presented at 2022 SPIE Advanced Lithography (Ref. 28) and is a detailed extension of it. The authors would like to thank Dr. Rui Yuan for his technical help in conducting the LER analysis and for general discussions of research ideas.

References

1. L. B. Lucy, "An iterative technique for the rectification of observed distributions," Astron. J. 79(6), 745 (1974). https://doi.org/10.1086/111605
2. S. Boyd et al., "Distributed optimization and statistical learning via the alternating direction method of multipliers," Found. Trends Mach. Learn. 3(1), 1–122 (2010). https://doi.org/10.1561/2200000016
3. W. H. Richardson, "Bayesian-based iterative method of image reconstruction," J. Opt. Soc. Am. 62(1), 55–59 (1972). https://doi.org/10.1364/JOSA.62.000055
4. L. Xu, S. Zheng, and J. Jia, "Unnatural L0 sparse representation for natural image deblurring," http://www.cse.cuhk.edu.hk/leojia/projects/l0deblur/ (2013).
5. A. Chakrabarti, "A neural approach to blind motion deblurring," Lect. Notes Comput. Sci. 9907, 221–235 (2016). https://doi.org/10.1007/978-3-319-46487-9_14
6. S. H. Chan et al., "An augmented Lagrangian method for total variation video restoration," IEEE Trans. Image Process. 20(11), 3097–3111 (2011). https://doi.org/10.1109/TIP.2011.2158229
7. D. Krishnan, T. Tay, and R. Fergus, "Blind deconvolution using a normalized sparsity measure," in CVPR (2011).
8. R. Wang and D. Tao, "Recent progress in image deblurring," in SIGGRAPH Asia (2013).
9. C. J. Schuler et al., "A machine learning approach for non-blind image deconvolution," in CVPR (2013).
10. A. Levin et al., "Understanding and evaluating blind deconvolution algorithms," in CVPR (2009).
11. J. Kruse, C. Rother, and U. Schmidt, "Learning to push the limits of efficient FFT-based image deconvolution," in ICCV (2017).
12. K. Zhang et al., "Plug-and-play image restoration with deep denoiser prior," in CVPR (2019).
13. K. Zhang et al., "Learning deep CNN denoiser prior for image restoration," in CVPR (2017).
14. S. Xie et al., "Non-blind image deblurring method by the total variation deep network," IEEE Access 7, 37536–37544 (2019). https://doi.org/10.1109/ACCESS.2019.2891626
15. D. Gong et al., "Self-paced kernel estimation for robust blind image deblurring," in ICCV (2017).
16. Y. Nan, Y. Quan, and H. Ji, "Variational-EM-based deep learning for noise-blind image deblurring," in CVPR (2020).
17. M. Hradis et al., "Convolutional neural networks for direct text deblurring," in BMVC (2015).
18. O. Kupyn et al., "DeblurGAN: blind motion deblurring using conditional adversarial networks," in CVPR (2018).
19. M. L. Green, "Statistics of images, the TV algorithm of Rudin-Osher-Fatemi for image denoising and an improved denoising algorithm," https://ww3.math.ucla.edu/camreport/cam02-55.pdf
20. C. J. Schuler et al., "Learning to deblur," IEEE Trans. Pattern Anal. Mach. Intell. 38(7), 1439–1451 (2016). https://doi.org/10.1109/TPAMI.2015.2481418
21. X. Xu et al., "Motion blur kernel estimation via deep learning," IEEE Trans. Image Process. 27(1), 194–205 (2018). https://doi.org/10.1109/TIP.2017.2753658
22. R. Fergus et al., "Removing camera shake from a single photograph," ACM Trans. Graphics 25(3), 787–794 (2006). https://doi.org/10.1145/1141911.1141956
23. A. Levin et al., "Efficient marginal likelihood optimization in blind deconvolution," in CVPR (2011).
24. O. Ronneberger, P. Fischer, and T. Brox, "U-Net: convolutional networks for biomedical image segmentation," in Medical Image Computing and Computer-Assisted Intervention – MICCAI 2015 (2015).
25. A. Shocher, N. Cohen, and M. Irani, "Zero-shot super-resolution using deep internal learning," in 2018 IEEE/CVF Conf. Comput. Vision and Pattern Recognit. (CVPR), 3118–3126 (2018). https://doi.org/10.1109/CVPR.2018.00329
26. D. P. Kingma and J. L. Ba, "Adam: a method for stochastic optimization," in ICLR (2015).
27. T. Elsken, J. H. Metzen, and F. Hutter, "Neural architecture search: a survey," J. Mach. Learn. Res. 20, 1–21 (2019).
28. Z. Du et al., "Low dosage SEM image processing for metrology applications," Proc. SPIE 12053, 1205309 (2022). https://doi.org/10.1117/12.2614281
29. A. Hiraiwa and A. Nishida, "Spectral analysis of line edge and line-width roughness with long-range correlation," J. Appl. Phys. 108, 034908 (2010). https://doi.org/10.1063/1.3466777
30. R. Bonam et al., "Comprehensive analysis of line-edge and line-width roughness for EUV lithography," Proc. SPIE 10143, 101431A (2017). https://doi.org/10.1117/12.2258194
31. L. Pu et al., "Analyze line roughness sources using power spectral density (PSD)," Proc. SPIE 10959, 109592W (2019). https://doi.org/10.1117/12.2516570
32. C. A. Mack, "Reducing roughness in extreme ultraviolet lithography," Proc. SPIE 10450, 104500P (2017). https://doi.org/10.1117/12.2281605
33. A. Hiraiwa and A. Nishida, "Spectral analysis of line edge and line-width roughness with long-range correlation," J. Appl. Phys. 108, 034908 (2010). https://doi.org/10.1063/1.3466777
Biography

Zijian Du obtained his PhD in electrical engineering from Arizona State University in 2019 and joined ASML Silicon Valley as a senior software engineer. His research interests include applying machine learning and deep learning based techniques to SEM image quality enhancement as well as contour-based edge placement error (EPE) massive-metrology applications for high-volume manufacturing (HVM).