Haze Imaging Model

In computer vision and computer graphics, the model widely used to describe the formation of a haze image is as follows:

$$ \mathbf{I}(\mathbf{x})=\mathbf{J}(\mathbf{x}) t(\mathbf{x})+\mathbf{A}(1-t(\mathbf{x})) \tag{1} $$

where $\mathbf{I}$ is the observed intensity, $\mathbf{J}$ is the scene radiance, $\mathbf{A}$ is the global atmospheric light, and $t$ is the medium transmission describing the portion of the light that is not scattered and reaches the camera. The goal of haze removal is to recover $\mathbf{J}$, $\mathbf{A}$, and $t$ from $\mathbf{I}$.
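The forward model in Equation (1) is straightforward to simulate, which is useful for testing a dehazing pipeline on synthetic data. A minimal NumPy sketch (the function name is illustrative, not from the original work):

```python
import numpy as np

def add_haze(J, A, t):
    """Synthesize a hazy image via Eq. (1): I = J * t + A * (1 - t)."""
    t = t[..., np.newaxis]          # broadcast transmission over color channels
    return J * t + A * (1.0 - t)

# Toy example: a black 2x2 RGB scene under white airlight, uniform t = 0.5.
J = np.zeros((2, 2, 3))             # scene radiance
A = np.array([1.0, 1.0, 1.0])       # global atmospheric light
t = np.full((2, 2), 0.5)            # medium transmission
I = add_haze(J, A, t)               # each pixel is half radiance, half airlight
```

With $t = 0.5$ every observed pixel is an equal blend of the (black) radiance and the (white) airlight, i.e. mid-gray.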

Dark Channel Prior

The dark channel prior is based on the following observation on haze-free outdoor images: in most of the non-sky patches, at least one color channel has very low intensity at some pixels. In other words, the minimum intensity in such a patch should have a very low value. Formally, for an image $\mathbf{J}$, we define

$$ J^{d a r k}(\mathbf{x})=\min _{c \in\{r, g, b\}}\left(\min _{\mathbf{y} \in \Omega(\mathbf{x})}\left(J^{c}(\mathbf{y})\right)\right) \tag{2} $$

where $J^c$ is a color channel of $\mathbf{J}$ and $\Omega(\mathbf{x})$ is a local patch centered at $\mathbf{x}$. Our observation says that except for the sky region, the intensity of $J^{dark}$ is low and tends to be zero, if $\mathbf{J}$ is a haze-free outdoor image. We call $J^{dark}$ the dark channel of $\mathbf{J}$, and we call the above statistical observation or knowledge the dark channel prior.
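Equation (2) amounts to a per-pixel minimum over color channels followed by a grayscale min filter over the patch $\Omega(\mathbf{x})$. A minimal NumPy sketch (the `patch_size` default of 15 is a common choice but an assumption here, and edge replication at borders is a simplification):

```python
import numpy as np

def dark_channel(J, patch_size=15):
    """Eq. (2): min over {r, g, b}, then min over a local patch Omega(x)."""
    min_rgb = J.min(axis=2)                       # per-pixel channel minimum
    r = patch_size // 2
    padded = np.pad(min_rgb, r, mode='edge')      # replicate borders
    H, W = min_rgb.shape
    out = np.empty_like(min_rgb)
    for i in range(H):
        for j in range(W):
            out[i, j] = padded[i:i + patch_size, j:j + patch_size].min()
    return out
```

On a haze-free outdoor image, the prior predicts this output is close to zero outside sky regions; on a hazy image, its brightness grows with haze density.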

Estimating the Atmospheric Light

We can use the dark channel to improve the atmospheric light estimation. We first pick the top 0.1% brightest pixels in the dark channel. These pixels are the most haze-opaque. Among them, the pixel with the highest intensity in the input image $\mathbf{I}$ is selected as the atmospheric light.
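This selection rule can be sketched as follows; `estimate_atmospheric_light` is a hypothetical helper that takes the input image and its precomputed dark channel:

```python
import numpy as np

def estimate_atmospheric_light(I, dark):
    """Pick the top 0.1% brightest dark-channel pixels, then return the
    input pixel with the highest overall intensity among them as A."""
    H, W, _ = I.shape
    n = max(1, int(H * W * 0.001))                # top 0.1%, at least one pixel
    idx = np.argpartition(dark.ravel(), -n)[-n:]  # brightest dark-channel pixels
    candidates = I.reshape(-1, 3)[idx]
    return candidates[candidates.sum(axis=1).argmax()]
```

Using the sum of the three channels as "intensity" is one reasonable reading of "highest intensity"; the per-channel maximum over the candidates is another common variant.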

Estimating the Transmission

We assume that the transmission in a local patch $\Omega(\mathbf{x})$ is constant and denote the patch's transmission as $\tilde{t}(\mathbf{x})$. Taking the min operation in the local patch on the haze imaging Equation (1), we have:

$$ \min _{\mathbf{y} \in \Omega(\mathbf{x})}\left(I^{c}(\mathbf{y})\right)=\tilde{t}(\mathbf{x}) \min _{\mathbf{y} \in \Omega(\mathbf{x})}\left(J^{c}(\mathbf{y})\right)+(1-\tilde{t}(\mathbf{x})) A^{c} \tag{3} $$

Notice that the min operation is performed on three color channels independently. This equation is equivalent to:

$$ \min _{\mathbf{y} \in \Omega(\mathbf{x})}\left(\frac{I^{c}(\mathbf{y})}{A^{c}}\right)=\tilde{t}(\mathbf{x}) \min _{\mathbf{y} \in \Omega(\mathbf{x})}\left(\frac{J^{c}(\mathbf{y})}{A^{c}}\right)+(1-\tilde{t}(\mathbf{x})) \tag{4} $$

Then, we take the min operation among three color channels on the above equation and obtain:

$$ \min _{c}\left(\min _{c \in \Omega(\mathbf{x})}\left(\frac{I^{c}(\mathbf{y})}{A^{c}}\right)\right)=\tilde{t}(\mathbf{x}) \min _{c}\left(\min _{\mathbf{y} \in \Omega(\mathbf{x})}\left(\frac{J^{c}(\mathbf{y})}{A^{c}}\right)\right)+(1-\tilde{t}(\mathbf{x})) \tag{5} $$

According to the dark channel prior, the dark channel $J^{dark}$ of the haze-free radiance $\mathbf{J}$ should tend to be zero:

$$ J^{d a r k}(\mathbf{x})=\min _{c}\left(\min _{\mathbf{y} \in \Omega(\mathbf{x})}\left(J^{c}(\mathbf{y})\right)\right)=0 \tag{6} $$

As $A^c$ is always positive, this leads to:

$$ \min _{c}\left(\min _{\mathbf{y} \in \Omega(\mathbf{x})}\left(\frac{J^{c}(\mathbf{y})}{A^{c}}\right)\right)=0 \tag{7} $$

Putting Equation (7) into Equation (5), we can estimate the transmission t simply by:

$$ \tilde{t}(\mathbf{x})=1-\min _{c}\left(\min _{\mathbf{y} \in \Omega(\mathbf{x})}\left(\frac{I^{c}(\mathbf{y})}{A^{c}}\right)\right) \tag{8} $$

If we remove the haze thoroughly, the image may look unnatural and the feeling of depth may be lost. So we can optionally keep a very small amount of haze for the distant objects by introducing a constant parameter $\omega$ $(0<\omega\le 1)$ into Equation (8):

$$ \tilde{t}(\mathbf{x})=1-\omega \min _{c}\left(\min _{\mathbf{y} \in \Omega(\mathbf{x})}\left(\frac{I^{c}(\mathbf{y})}{A^{c}}\right)\right) \tag{9} $$
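Equation (9) can be implemented by applying the same channel-and-patch minimum to the normalized image $\mathbf{I}/\mathbf{A}$. A minimal sketch, assuming $\mathbf{A}$ is given as one value per channel and using $\omega = 0.95$ as a typical choice:

```python
import numpy as np

def estimate_transmission(I, A, patch_size=15, omega=0.95):
    """Eq. (9): t~(x) = 1 - omega * min_c min_{y in Omega(x)} I^c(y)/A^c."""
    min_rgb = (I / A).min(axis=2)                 # normalize per channel, then channel min
    r = patch_size // 2
    padded = np.pad(min_rgb, r, mode='edge')      # replicate borders
    H, W = min_rgb.shape
    dark = np.empty_like(min_rgb)
    for i in range(H):
        for j in range(W):
            dark[i, j] = padded[i:i + patch_size, j:j + patch_size].min()
    return 1.0 - omega * dark
```

Note the limiting case: where the input equals the atmospheric light (pure airlight), the normalized dark channel is 1, so $\tilde{t} = 1 - \omega$ rather than exactly 0.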

Recovering the Scene Radiance

With the transmission map, we can recover the scene radiance according to Equation (1). But the direct attenuation term $\mathbf{J}(\mathbf{x}) t(\mathbf{x})$ can be very close to zero when the transmission $t(\mathbf{x})$ is close to zero, so the directly recovered scene radiance $\mathbf{J}$ is prone to noise. Therefore, we restrict the transmission $t(\mathbf{x})$ to a lower bound $t_0$, which means that a small amount of haze is preserved in very dense haze regions. The final scene radiance $\mathbf{J}(\mathbf{x})$ is recovered by:

$$ \mathbf{J}(\mathbf{x})=\frac{\mathbf{I}(\mathbf{x})-\mathbf{A}}{\max \left(t(\mathbf{x}), t_{0}\right)}+\mathbf{A} \tag{10} $$

A typical value of $t_0$ is 0.1.
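Equation (10) is then a per-pixel inversion with the clamped transmission. A minimal sketch:

```python
import numpy as np

def recover_radiance(I, A, t, t0=0.1):
    """Eq. (10): J = (I - A) / max(t, t0) + A; clamping t to t0 avoids
    amplifying noise where the haze is very dense."""
    t = np.maximum(t, t0)[..., np.newaxis]        # clamp, broadcast over channels
    return (I - A) / t + A
```

When the true transmission is above $t_0$, this inverts Equation (1) exactly: hazing a known radiance and then recovering it returns the original values.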

Guided Image Filter

A general linear translation-variant filtering process, which involves a guidance image $I$, an input image $p$, and an output image $q$, can be defined as:

$$ q_{i}=\sum_{j} W_{i j}(I) p_{j} \tag{11} $$

where $i$ and $j$ are pixel indexes. The filter kernel $W_{ij}$ is a function of the guidance image $I$ and independent of $p$. This filter is linear with respect to $p$.

The key assumption of the guided filter is a local linear model between the guidance $I$ and the filter output $q$. We assume that $q$ is a linear transform of $I$ in a window $\omega_k$ centered at the pixel $k$, and define the guided filter as:

$$ q_{i}=a_{k} I_{i}+b_{k}, \forall i \in \omega_{k} \tag{12} $$

where $(a_k,b_k)$ are some linear coefficients assumed to be constant in $\omega_k$. We use a square window of radius $r$. This local linear model ensures that $q$ has an edge only if $I$ has an edge, because $\nabla q=a \nabla I$.

To determine the linear coefficients, we seek a solution to (12) that minimizes the difference between $q$ and the filter input $p$. Specifically, we minimize the following cost function in the window:

$$ E\left(a_{k}, b_{k}\right)=\sum_{i \in \omega_{k}}\left(\left(a_{k} I_{i}+b_{k}-p_{i}\right)^{2}+\epsilon a_{k}^{2}\right) \tag{13} $$

Here $\epsilon$ is a regularization parameter preventing $a_k$ from being too large. The solution to optimization problem (13) can be given by linear regression:

$$ \begin{aligned} a_{k} &=\frac{\frac{1}{|\omega|} \sum_{i \in \omega_{k}} I_{i} p_{i}-\mu_{k} \bar{p}_{k}}{\sigma_{k}^{2}+\epsilon}\\ b_{k} &=\bar{p}_{k}-a_{k} \mu_{k} \end{aligned} \tag{14} $$

Here, $\mu_{k}$ and $\sigma_{k}^{2}$ are the mean and variance of $I$ in $\omega_{k}$, $|\omega|$ is the number of pixels in $\omega_{k}$, and $\bar{p}_{k}=\frac{1}{|\omega|} \sum_{i \in \omega_{k}} p_{i}$ is the mean of $p$ in $\omega_{k}$.

Next we apply the linear model to all local windows in the entire image. After computing $\left(a_{k}, b_{k}\right)$ for all patches $\omega_k$ in the image, we compute the filter output by:

$$ q_{i}=\frac{1}{|\omega|} \sum_{k: i \in \omega_{k}}\left(a_{k} I_{i}+b_{k}\right)=\bar{a}_{i} I_{i}+\bar{b}_{i} \tag{15} $$

where $\bar{a}_{i}=\frac{1}{|\omega|} \sum_{k \in \omega_{i}} a_{k}$ and $\bar{b}_{i}=\frac{1}{|\omega|} \sum_{k \in \omega_{i}} b_{k}$.
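Equations (14) and (15) reduce to a handful of box (mean) filters over the image, which is what makes the guided filter fast in practice. The sketch below uses a naive mean filter with edge replication (an approximation at image borders, where the derivation above truncates windows):

```python
import numpy as np

def box_filter(x, r):
    """Mean of x over a (2r+1)x(2r+1) window, edges replicated."""
    k = 2 * r + 1
    padded = np.pad(x, r, mode='edge')
    out = np.empty_like(x, dtype=float)
    H, W = x.shape
    for i in range(H):
        for j in range(W):
            out[i, j] = padded[i:i + k, j:j + k].mean()
    return out

def guided_filter(I, p, r=8, eps=1e-3):
    """Eqs. (14)-(15): per-window linear coefficients, then averaging."""
    mean_I = box_filter(I, r)
    mean_p = box_filter(p, r)
    corr_Ip = box_filter(I * p, r)
    var_I = box_filter(I * I, r) - mean_I ** 2
    a = (corr_Ip - mean_I * mean_p) / (var_I + eps)   # Eq. (14)
    b = mean_p - a * mean_I
    mean_a = box_filter(a, r)                          # bar{a}_i, bar{b}_i in Eq. (15)
    mean_b = box_filter(b, r)
    return mean_a * I + mean_b
```

For a constant input $p$, Equation (14) gives $a_k = 0$ and $b_k = \bar{p}_k$, so the output reproduces the input exactly, consistent with the kernel weights summing to one. For dehazing, $p$ would be the coarse transmission map of Equation (9) and $I$ a grayscale version of the hazy input.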

The relationship among $I$, $p$, and $q$ given by (14) and (15) is indeed in the form of image filtering (11). In fact, $a_k$ in (14) can be rewritten as a weighted sum of $p$: $a_{k}=\sum_{j} A_{k j}(I) p_{j}$, where the weights $A_{kj}$ depend only on $I$. Similarly, we also have $b_{k}=\sum_{j} B_{k j}(I) p_{j}$ from (14) and $q_{i}=\sum_{j} W_{i j}(I) p_{j}$ from (15). It can be proven that the kernel weights can be explicitly expressed by:

$$ W_{i j}(I)=\frac{1}{|\omega|^{2}} \sum_{k:(i, j) \in \omega_{k}}\left(1+\frac{\left(I_{i}-\mu_{k}\right)\left(I_{j}-\mu_{k}\right)}{\sigma_{k}^{2}+\epsilon}\right) \tag{16} $$

Some further computations show that $\sum_{j} W_{i j}(I)=1$. No extra effort is needed to normalize the weights.

Last modification: December 5th, 2019 at 02:51 pm