## Haze Imaging Model

In computer vision and computer graphics, the model widely used to describe the formation of a haze image is as follows:

$$\mathbf{I}(\mathbf{x})=\mathbf{J}(\mathbf{x}) t(\mathbf{x})+\mathbf{A}(1-t(\mathbf{x})) \tag{1}$$

where $\mathbf{I}$ is the observed intensity, $\mathbf{J}$ is the scene radiance, $\mathbf{A}$ is the global atmospheric light, and $t$ is the medium transmission describing the portion of the light that is not scattered and reaches the camera. The goal of haze removal is to recover $\mathbf{J}$, $\mathbf{A}$, and $t$ from $\mathbf{I}$.
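As a quick illustration, Equation (1) can be applied per color channel to synthesize a hazy image from a clean one. A minimal NumPy sketch (the function name and array layout here are our own choices):

```python
import numpy as np

def hazy_image(J, t, A):
    """Synthesize a hazy image via Eq. (1): I = J*t + A*(1 - t).

    J: (H, W, 3) scene radiance, t: (H, W) transmission in [0, 1],
    A: length-3 atmospheric light. All arrays are float.
    """
    t3 = t[..., None]              # broadcast transmission over channels
    return J * t3 + A * (1.0 - t3)
```

With $t=1$ the model returns the scene radiance unchanged; with $t=0$ every pixel collapses to the atmospheric light.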

## Dark Channel Prior

The dark channel prior is based on the following observation on haze-free outdoor images: in most of the non-sky patches, at least one color channel has very low intensity at some pixels. In other words, the minimum intensity in such a patch should have a very low value. Formally, for an image $\mathbf{J}$, we define

$$J^{d a r k}(\mathbf{x})=\min _{c \in\{r, g, b\}}\left(\min _{\mathbf{y} \in \Omega(\mathbf{x})}\left(J^{c}(\mathbf{y})\right)\right) \tag{2}$$

where $J^c$ is a color channel of $\mathbf{J}$ and $\Omega(\mathbf{x})$ is a local patch centered at $\mathbf{x}$. Our observation says that except for the sky region, the intensity of $J^{dark}$ is low and tends to be zero, if $\mathbf{J}$ is a haze-free outdoor image. We call $J^{dark}$ the dark channel of $\mathbf{J}$, and we call the above statistical observation or knowledge the dark channel prior.
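Equation (2) is just two nested minimum filters: a per-pixel minimum over the RGB channels followed by a spatial minimum over each local patch. A minimal NumPy sketch (a patch size of 15 is a typical choice; `sliding_window_view` requires NumPy ≥ 1.20):

```python
import numpy as np

def dark_channel(img, patch=15):
    """Dark channel of Eq. (2) for an (H, W, 3) float image."""
    min_ch = img.min(axis=2)                       # min over color channels
    pad = patch // 2
    padded = np.pad(min_ch, pad, mode='edge')      # replicate image borders
    windows = np.lib.stride_tricks.sliding_window_view(padded, (patch, patch))
    return windows.min(axis=(2, 3))                # min over each local patch
```

The border padding keeps the output the same size as the input; other padding conventions work as well.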

## Estimating the Atmospheric Light

We can use the dark channel to improve the atmospheric light estimation. We first pick the top 0.1% brightest pixels in the dark channel. These pixels are the most haze-opaque. Among these pixels, the pixel with the highest intensity in the input image $\mathbf{I}$ is selected as the atmospheric light.
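The two-step selection above can be sketched as follows, assuming the dark channel of the hazy input has already been computed (the function name is our own):

```python
import numpy as np

def estimate_atmospheric_light(img, dark):
    """Estimate A from the top 0.1% brightest dark-channel pixels.

    img: (H, W, 3) hazy image, dark: (H, W) its dark channel.
    Among the candidates, the input pixel with the highest total
    intensity is taken as the atmospheric light.
    """
    k = max(dark.size // 1000, 1)              # top 0.1%, at least one pixel
    idx = np.argsort(dark.ravel())[-k:]        # most haze-opaque positions
    candidates = img.reshape(-1, 3)[idx]
    return candidates[candidates.sum(axis=1).argmax()]
```

Note that this returns a pixel from the input image, not the brightest pixel overall, which makes the estimate robust to white objects that are not actually haze.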

## Estimating the Transmission

We assume that the transmission in a local patch $\Omega(\mathbf{x})$ is constant and denote it as $\tilde{t}(\mathbf{x})$. Taking the min operation in the local patch on the haze imaging Equation (1), we have:

$$\min _{\mathbf{y} \in \Omega(\mathbf{x})}\left(I^{c}(\mathbf{y})\right)=\tilde{t}(\mathbf{x}) \min _{\mathbf{y} \in \Omega(\mathbf{x})}\left(J^{c}(\mathbf{y})\right)+(1-\tilde{t}(\mathbf{x})) A^{c} \tag{3}$$

Notice that the min operation is performed on three color channels independently. This equation is equivalent to:

$$\min _{\mathbf{y} \in \Omega(\mathbf{x})}\left(\frac{I^{c}(\mathbf{y})}{A^{c}}\right)=\tilde{t}(\mathbf{x}) \min _{\mathbf{y} \in \Omega(\mathbf{x})}\left(\frac{J^{c}(\mathbf{y})}{A^{c}}\right)+(1-\tilde{t}(\mathbf{x})) \tag{4}$$

Then, we take the min operation among three color channels on the above equation and obtain:

$$\min _{c}\left(\min _{c \in \Omega(\mathbf{x})}\left(\frac{I^{c}(\mathbf{y})}{A^{c}}\right)\right)=\tilde{t}(\mathbf{x}) \min _{c}\left(\min _{\mathbf{y} \in \Omega(\mathbf{x})}\left(\frac{J^{c}(\mathbf{y})}{A^{c}}\right)\right)+(1-\tilde{t}(\mathbf{x})) \tag{5}$$

According to the dark channel prior, the dark channel $J^{dark}$ of the haze-free radiance $\mathbf{J}$ should tend to be zero:

$$J^{d a r k}(\mathbf{x})=\min _{c}\left(\min _{\mathbf{y} \in \Omega(\mathbf{x})}\left(J^{c}(\mathbf{y})\right)\right)=0 \tag{6}$$

As $A^c$ is always positive, this leads to:

$$\min _{c}\left(\min _{\mathbf{y} \in \Omega(\mathbf{x})}\left(\frac{J^{c}(\mathbf{y})}{A^{c}}\right)\right)=0 \tag{7}$$

Putting Equation (7) into Equation (5), we can estimate the transmission $\tilde{t}$ simply by:

$$\tilde{t}(\mathbf{x})=1-\min _{c}\left(\min _{\mathbf{y} \in \Omega(\mathbf{x})}\left(\frac{I^{c}(\mathbf{y})}{A^{c}}\right)\right) \tag{8}$$

If we remove the haze thoroughly, the image may seem unnatural and the feeling of depth may be lost. So we can optionally keep a very small amount of haze for distant objects by introducing a constant parameter $\omega$ $(0<\omega\le 1)$ into Equation (8):

$$\tilde{t}(\mathbf{x})=1-\omega \min _{c}\left(\min _{\mathbf{y} \in \Omega(\mathbf{x})}\left(\frac{I^{c}(\mathbf{y})}{A^{c}}\right)\right) \tag{9}$$
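Equation (9) is simply one minus $\omega$ times the dark channel of the per-channel normalized image $\mathbf{I}/\mathbf{A}$. A minimal NumPy sketch ($\omega=0.95$ is a commonly used value; the dark-channel helper is repeated here so the snippet is self-contained):

```python
import numpy as np

def dark_channel(img, patch=15):
    """Dark channel of Eq. (2) for an (H, W, 3) float image."""
    min_ch = img.min(axis=2)
    pad = patch // 2
    padded = np.pad(min_ch, pad, mode='edge')
    w = np.lib.stride_tricks.sliding_window_view(padded, (patch, patch))
    return w.min(axis=(2, 3))

def estimate_transmission(img, A, omega=0.95, patch=15):
    """Eq. (9): t~ = 1 - omega * dark channel of the normalized image."""
    return 1.0 - omega * dark_channel(img / A, patch)
```

A patch that looks exactly like the atmospheric light yields $\tilde{t}=1-\omega$ (a trace of haze is kept), while a patch containing a dark pixel yields $\tilde{t}\approx 1$.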

## Recovering the Scene Radiance

With the transmission map, we can recover the scene radiance according to Equation (1). But the direct attenuation term $\mathbf{J}(\mathbf{x}) t(\mathbf{x})$ can be very close to zero when the transmission $t(\mathbf{x})$ is close to zero, so the directly recovered scene radiance $\mathbf{J}$ is prone to noise. Therefore, we restrict the transmission $t(\mathbf{x})$ to a lower bound $t_0$, which means that a small amount of haze is preserved in very dense haze regions. The final scene radiance $\mathbf{J}(\mathbf{x})$ is recovered by:

$$\mathbf{J}(\mathbf{x})=\frac{\mathbf{I}(\mathbf{x})-\mathbf{A}}{\max \left(t(\mathbf{x}), t_{0}\right)}+\mathbf{A} \tag{10}$$

A typical value of $t_0$ is 0.1.
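The recovery step of Equation (10) is a per-pixel inversion of the haze model with the transmission clamped from below. A minimal NumPy sketch (the function name is our own):

```python
import numpy as np

def recover_radiance(I, A, t, t0=0.1):
    """Eq. (10): invert the haze model, clamping t from below by t0.

    I: (H, W, 3) hazy image, A: length-3 atmospheric light,
    t: (H, W) transmission map.
    """
    t_clamped = np.maximum(t, t0)[..., None]   # broadcast over channels
    return (I - A) / t_clamped + A
```

When the true transmission is known and above $t_0$, this inversion recovers the scene radiance exactly; clamping only kicks in for the densest haze regions.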

## Guided Image Filter

A general linear translation-variant filtering process, which involves a guidance image $I$, an input image $p$, and an output image $q$, can be defined as:

$$q_{i}=\sum_{j} W_{i j}(I) p_{j} \tag{11}$$

where $i$ and $j$ are pixel indexes. The filter kernel $W_{ij}$ is a function of the guidance image $I$ and independent of $p$. This filter is linear with respect to $p$.

The key assumption of the guided filter is a local linear model between the guidance $I$ and the filter output $q$. We assume that $q$ is a linear transform of $I$ in a window $\omega_k$ centered at the pixel $k$, and define the guided filter as:

$$q_{i}=a_{k} I_{i}+b_{k}, \forall i \in \omega_{k} \tag{12}$$

where $(a_k,b_k)$ are some linear coefficients assumed to be constant in $\omega_k$. We use a square window of a radius $r$. This local linear model ensures that $q$ has an edge only if $I$ has an edge, because $\nabla q=a \nabla I$.

To determine the linear coefficients, we seek a solution to (12) that minimizes the difference between $q$ and the filter input $p$. Specifically, we minimize the following cost function in the window:

$$E\left(a_{k}, b_{k}\right)=\sum_{i \in \omega_{k}}\left(\left(a_{k} I_{i}+b_{k}-p_{i}\right)^{2}+\epsilon a_{k}^{2}\right) \tag{13}$$

Here $\epsilon$ is a regularization parameter preventing $a_k$ from being too large. The solution to optimization problem (13) can be given by linear regression:

$$\begin{aligned} a_{k} &=\frac{\frac{1}{|\omega|} \sum_{i \in \omega_{k}} I_{i} p_{i}-\mu_{k} \bar{p}_{k}}{\sigma_{k}^{2}+\epsilon}\\ b_{k} &=\bar{p}_{k}-a_{k} \mu_{k} \end{aligned} \tag{14}$$

Here, $\mu_{k}$ and $\sigma_{k}^{2}$ are the mean and variance of $I$ in $\omega_{k}$, $|\omega|$ is the number of pixels in $\omega_{k}$, and $\bar{p}_{k}=\frac{1}{|\omega|} \sum_{i \in \omega_{k}} p_{i}$ is the mean of $p$ in $\omega_{k}$.

Next we apply the linear model to all local windows in the entire image. After computing $\left(a_{k}, b_{k}\right)$ for all windows $\omega_k$ in the image, we compute the filter output by:

$$q_{i}=\frac{1}{|\omega|} \sum_{k: i \in \omega_{k}}\left(a_{k} I_{i}+b_{k}\right)=\bar{a}_{i} I_{i}+\bar{b}_{i} \tag{15}$$

where $\bar{a}_{i}=\frac{1}{|\omega|} \sum_{k \in \omega_{i}} a_{k}$ and $\bar{b}_{i}=\frac{1}{|\omega|} \sum_{k \in \omega_{i}} b_{k}$ are the averages of the coefficients over all windows covering pixel $i$.

The relationship among $I$, $p$, and $q$ given by (14) and (15) is indeed in the form of image filtering (11). In fact, $a_k$ in (14) can be rewritten as a weighted sum of $p$: $a_{k}=\sum_{j} A_{k j}(I) p_{j}$, where the weights $A_{kj}$ depend only on $I$. Similarly, we also have $b_{k}=\sum_{j} B_{k j}(I) p_{j}$ from (14) and $q_{i}=\sum_{j} W_{i j}(I) p_{j}$ from (15). It can be proven that the kernel weights can be explicitly expressed by:

$$W_{i j}(I)=\frac{1}{|\omega|^{2}} \sum_{k:(i, j) \in \omega_{k}}\left(1+\frac{\left(I_{i}-\mu_{k}\right)\left(I_{j}-\mu_{k}\right)}{\sigma_{k}^{2}+\epsilon}\right) \tag{16}$$

Some further computations show that $\sum_{j} W_{i j}(I)=1$. No extra effort is needed to normalize the weights.
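Equations (13)–(15) reduce to a handful of box-filtered means, which is what makes the guided filter fast in practice. A minimal NumPy sketch for single-channel float images (the `box_mean` helper and the default $r$ and $\epsilon$ are our own choices):

```python
import numpy as np

def box_mean(x, r):
    """Mean over each (2r+1)x(2r+1) window, with replicated borders."""
    padded = np.pad(x, r, mode='edge')
    w = np.lib.stride_tricks.sliding_window_view(padded, (2*r + 1, 2*r + 1))
    return w.mean(axis=(2, 3))

def guided_filter(I, p, r=8, eps=1e-3):
    """Guided filter of Eqs. (13)-(15), single-channel guidance and input."""
    mu = box_mean(I, r)                    # mean of guidance in each window
    p_bar = box_mean(p, r)                 # mean of input in each window
    var = box_mean(I * I, r) - mu * mu     # variance of guidance, sigma_k^2
    cov = box_mean(I * p, r) - mu * p_bar  # covariance between I and p
    a = cov / (var + eps)                  # linear coefficients, Eq. (14)
    b = p_bar - a * mu
    return box_mean(a, r) * I + box_mean(b, r)   # averaged output, Eq. (15)
```

With a flat guidance image the variance vanishes, $a_k \to 0$, and the output degenerates to a box-filtered version of $p$; with $p = I$ and a tiny $\epsilon$, $a_k \to 1$ and the filter approximately reproduces its input, which matches the edge-preserving behavior predicted by $\nabla q = \bar{a}\,\nabla I$.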

Last modification: December 5th, 2019 at 02:51 pm