arXiv:1411.0326v2 [cs.CV] 3 Jun 2015

High Dynamic Range Imaging by Perceptual Logarithmic Exposure Merging

Corneliu Florea — CORNELIU.FLOREA@UPB.RO
Constantin Vertan — CONSTANTIN.VERTAN@UPB.RO
Laura Florea — LAURA.FLOREA@UPB.RO
Image Processing and Analysis Laboratory, University "Politehnica" of Bucharest, Splaiul Independenței 313, Bucharest, Romania

Abstract

In this paper we emphasize a similarity between the Logarithmic-Type Image Processing (LTIP) model and the Naka-Rushton model of the Human Visual System (HVS). LTIP is a derivation of Logarithmic Image Processing (LIP), which further replaces the logarithmic function with a ratio of polynomial functions. Based on this similarity, we show that it is possible to present a unifying framework for the High Dynamic Range (HDR) imaging problem, namely that performing the exposure merging under the LTIP model is equivalent to standard irradiance map fusion. The resulting HDR algorithm is shown to provide high quality in both subjective and objective evaluations.

1. Introduction

Motivated by the limitation of digital cameras in capturing real scenes with a large lightness dynamic range, a category of image acquisition and processing techniques, collectively named High Dynamic Range (HDR) Imaging, gained popularity. To acquire HDR scenes, typically consecutive frames with different exposures are acquired and combined into an HDR image that is viewable on regular displays and printers.

In parallel, Logarithmic Image Processing (LIP) models were introduced as an alternative to image processing with real-based operations. While initially modelled from the cascade of two transmitting filters (Jourlin and Pinoli, 1987), it was later shown that the LIP models can be generated by the homomorphic theory and that they have a cone space structure (Deng et al., 1995). The initial model was shown to be compatible with the Weber-Fechner perception law (Pinoli and Debayle, 2007), which is not unanimously accepted (Stevens, 1961). Currently, most global Human Visual System (HVS) models are extracted from the Naka-Rushton equation of photoreceptor absorption of incident energy, followed by further modelling of the local adaptation. We will show in this paper that the extension of the LIP model introduced in (Vertan et al., 2008) is consistent with the global human perception as described by the Naka-Rushton model. The generative function of this model is no longer logarithmic but only logarithmic-like, hence the model will be named the Logarithmic Type Image Processing (LTIP) model. In such a case, the LTIP model transfers the radiometric energy domain into the human eye compatible image domain; thus it mimics, by itself and by its inverse, both the camera response function and the human eye lightness perception.

The current paper claims three contributions. Firstly, we show that the previously introduced LTIP model is compatible with the Naka-Rushton/Michaelis-Menten model of the eye global perception. Secondly, based on this finding, we show that it is possible to treat the two contrasting HDR approaches in a unitary framework if the LTIP model is assumed. Thirdly, the reinterpretation of the exposure merging algorithm (Mertens et al., 2007) under the LTIP model produces a new algorithm that leads to qualitative results.

The paper is constructed as follows: in Section 2 we present a short overview of the existing HDR trends and we emphasize their correspondence with human perception. In Section 3, state of the art results in the LIP framework and the usage of the newly introduced LTIP framework for the generation of models compatible with human perception are discussed. In Section 4 we derive and motivate the proposed HDR imaging algorithm, such that in Section 5 we discuss implementation details and achieved results, ending the paper with discussion and conclusions.

2. Related work

The typical acquisition of an HDR image relies on the "Wyckoff Principle", that is, differently exposed images of the same scene capture different information due to the differences in exposure (Mann and Picard, 1995). Bracketing techniques are used in practice to acquire pictures of the same subject but with consecutive exposure values. These pictures are, then, fused to create the HDR image.

For the fusion step two directions are envisaged. The first direction, named irradiance fusion, acknowledges that the camera recorded frames are non-linearly related to the scene reflectance and, thus, it relies on the retrieval of the irradiance maps from the acquired frames, by inverting the camera response function (CRF), followed by fusion in the irradiance domain. The fused irradiance map is compressed via a tone mapping operator (TMO) into a displayable low dynamic range (LDR) image. The second direction, called exposure fusion, aims at simplicity and directly combines the acquired frames into the final image. A short comparison between these two is presented in Table 1 and detailed in the following paragraphs.

2.1. Irradiance fusion

Originating in the work of Debevec and Malik (1997), the schematic of the irradiance fusion may be followed in Fig. 1 (a). Many approaches were devised for determining the CRF (Grossberg and Nayar, 2004). We note that the dominant shape is that of a gamma function (Mann and Mann, 2001), a trait required by the compatibility with the HVS. After reverting the CRF, the irradiance maps are combined, typically by a convex combination (Debevec and Malik, 1997), (Robertson et al., 1999). For proper displaying, a tone mapping operator (TMO) is then applied on the HDR irradiance map to ensure that all image details are preserved in the compression process. For this last step, following Ward's proposal (Ward et al., 1997), typical approaches adopt a HVS-inspired function for domain compression, followed by local contrast enhancement. For a survey of the TMOs we refer to the paper of Ferradans et al. (2012) and to the book by Banterle et al. (2011).

Among other TMO attempts, a notable one was proposed by Reinhard et al. (2002) which, inspired by Ansel Adams' Zone System, firstly applied a logarithmic scaling to mimic the exposure setting of the camera, followed by dodging-and-burning (selectively and artificially increasing and decreasing image values for better contrast) for the actual compression. Durand and Dorsey (2002) separated, by means of a bilateral filter, the HDR irradiance map into a base layer that encoded large scale variations (thus needing range compression) and a detail preserving layer, to form an approximation of the image pyramid. Fattal et al. (2002) attenuated the magnitude of large gradients based on a Poisson equation. Drago et al. (2003) implemented a logarithmic compression of luminance values that matches the HVS. Krawczyk et al. (2005) implemented the Gestalt based anchoring theory of Gilchrist et al. (1999) to divide the image into frameworks and performed range compression by ensuring that the frameworks are well-preserved. Banterle et al. (2012) segmented the image into luminance components and applied independently the TMOs introduced in (Drago et al., 2003) and in (Reinhard et al., 2005) for further adaptive fusion based on the previously found areas. Ferradans et al. (2012) proposed an elaborated model of the global HVS response and pursued local adaptation with an iterative variational algorithm.

Yet, as irradiance maps are altered with respect to the reality by the camera optical systems, additional constraints are required for a perfect match with the HVS. Hence, this category of methods, while being theoretically closer to the pure perceptual approach, requires supplementary and costly constraints and significant computational resources for the CRF estimation and for the TMO implementation.

2.2. Exposure merging

Noting the high computational cost of the irradiance maps fusion, Mertens et al. (2007) proposed to implement the fusion directly in the image domain; this approach is described in Fig. 1 (b). The method was further improved for robustness to ghosting artifacts and details preservation in HDR composition by Pece and Kautz (2010).

Other developments addressed the method of computing local contrast to preserve edges and local high dynamic range. Another expansion has been introduced by Zhang et al. (2012), who used the direction of the gradient in a partial derivatives type of framework and two local quality measures to achieve local optimality in the fusion process. Bruce (2014) replaced the contrast computed by Mertens et al. (2007) onto a Laplacian pyramid with the entropy calculated in a flat circular neighborhood for deducing weights that maximize the local contrast.

Table 1. Comparison of the two main approaches to the HDR problem.

Method                                      | CRF recovery | Fused components | Fusion method               | Perceptual
Irradiance fusion (Debevec and Malik, 1997) | Yes          | Irradiance maps  | Weighted convex combination | Yes (via TMO)
Exposure fusion (Mertens et al., 2007)      | No           | Acquired frames  | Weighted convex combination | No


Figure 1. HDR imaging techniques: (a) irradiance maps fusion as described in Debevec and Malik (1997) and (b) exposure fusion as described in Mertens et al. (2007). Irradiance map fusion relies on inverting the Camera Response Function (CRF) in order to return to the irradiance domain, while the exposure fusion works directly in the image domain, thus avoiding the CRF reversal.

The exposure fusion method is the inspiration source for many commercial applications. Yet, in such cases, exposure fusion is followed by further processing that increases the visual impact of the final image. The post-processing includes contrasting, dodging-and-burning, edge sharpening, all merged and tuned in order to produce a surreal/fantasy-like aspect of the final image.

While being sensibly faster, the exposure fusion is not physically motivated, nor perceptually inspired. However, while the academic world tends to favor perceptual approaches as they lead to images that are correct from a perceptual point of view, the consumer world naturally tends to favor images that are photographically more spectacular, and the exposure merging solution pursues this direction.

3. Logarithmic Type Image Processing

Typically, image processing operations are performed using real-based algebra, which proves its limitations under specific circumstances, like upper range overflow. To deal with such situations, non-linear techniques have been developed (Markovic and Jukic, 2013). Such examples are the LIP models. The first LIP model was constructed by Jourlin and Pinoli (1987) starting from the equation of light passing through transmitting filters. The LIP model was further developed into a robust mathematical structure, namely a cone/vector space. Subsequently many practical applications have been presented, and an extensive review of advances and applications for the classical LIP model is presented in (Pinoli and Debayle, 2007). In parallel, other logarithmic models and logarithmic-like models were reported. In this particular work we are mainly interested in the logarithmic-like model introduced by Vertan et al. (2008), which has a cone space structure and is named the LTIP model. A summary of existing models may be followed in (Navarro et al., 2013). Recently, parametric extensions of the LTIP models were also introduced (Panetta et al., 2011), (Florea and Florea, 2013). The LTIP models are summarized in Table 2.

3.1. Relation between LIP models and HVS

From its introduction in the 80s, the original LIP model had a strong argument in being similar with the Weber-Fechner law of contrast perception. This similarity was thoroughly discussed in (Pinoli and Debayle, 2007), where it was shown that the logarithmic subtraction models the increment of sensation caused by the increment of luminance with the quantity existing in the subtraction. Yet the logarithmic model of the global perceived luminance contrast assumed by the Weber-Fechner model was vigorously challenged (Stevens, 1961) and arguments hinted to power-law rules (Stevens and Stevens, 1963). Thus, we note that the Stevens model is more in line with the LTIP model. On the other hand, Stevens' experiments were also questioned (Macmillan and Creelman, 2005), so there does not seem to be a definite answer in this regard.

Table 2. The classical LIP model introduced by Jourlin and Pinoli, the logarithmic type (LTIP) model with the basic operations and the parametric extension of the LTIP model. D is the upper bound of the image definition set (typically D = 255 for unsigned int representation or D = 1 for float image representation).

Model           | Domain                  | Isomorphism                              | Addition $u \oplus v$                                          | Scalar multiplication $\alpha \otimes u$
LIP             | $\phi = (-\infty; D]$   | $\Phi(x) = -D\log\frac{D-x}{D}$          | $u + v - \frac{uv}{D}$                                         | $D - D\left(1-\frac{u}{D}\right)^{\alpha}$
LTIP            | $\phi = [0; 1)$         | $\Phi(x) = \frac{x}{1-x}$                | $1 - \frac{(1-u)(1-v)}{1-uv}$                                  | $\frac{\alpha u}{1+(\alpha-1)u}$
Parametric LTIP | $\phi_m = [0; 1)$       | $\Phi_m(x) = \frac{x^m}{1-x^m}$          | $\sqrt[m]{1 - \frac{(1-u^m)(1-v^m)}{1-u^m v^m}}$               | $\sqrt[m]{\frac{\alpha u^m}{1+(\alpha-1)u^m}}$
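To make the operations in Table 2 concrete, below is a minimal NumPy sketch of the LTIP addition and scalar multiplication (the function names are ours); the check at the end illustrates the closing property within [0, 1) that the paper relies on later.

    import numpy as np

    def ltip_add(u, v):
        """LTIP addition from Table 2: u (+) v = 1 - (1-u)(1-v)/(1-uv)."""
        return 1.0 - (1.0 - u) * (1.0 - v) / (1.0 - u * v)

    def ltip_smul(alpha, u):
        """LTIP scalar multiplication from Table 2: alpha (x) u."""
        return alpha * u / (1.0 + (alpha - 1.0) * u)

    # Closure check: for gray levels in [0, 1) the results stay in [0, 1).
    u, v = np.random.rand(2, 10000) * 0.999
    assert np.all((ltip_add(u, v) >= 0.0) & (ltip_add(u, v) < 1.0))
    assert np.all((ltip_smul(2.5, u) >= 0.0) & (ltip_smul(2.5, u) < 1.0))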

Still, lately, the evidence seems to favor the Naka-Rushton/Michaelis-Menten model of retinal adaptation (Ferradans et al., 2012); thus, an important class of TMO techniques follows this model for the global adaptation step. The Naka-Rushton equation is a particular case of the Michaelis-Menten model, which expresses the hyperbolic relationship between the initial velocity and the substrate concentration in a number of enzyme-catalyzed reactions. Such a process is the change of the electric potential of a photoreceptor (e.g. the eye cones) membrane, r(I), due to the absorption of light of intensity I. The generic form, called the Michaelis-Menten equation (Valeton and van Norren, 1983), is:

$r(I) = \frac{\Delta V(I)}{\Delta V_{max}} = \frac{I^n}{I^n + I_S^n}$   (1)

where $\Delta V_{max}$ is the maximum difference of potential that can be generated, $I_S$ is the light level at which the photoreceptor response is half maximal (the semisaturation level) and $n$ is a constant. Valeton and van Norren (1983) determined that n = 0.74 for the rhesus monkey. If n = 1, the Naka-Rushton equation (Naka and Rushton, 1966) is retrieved as a particular case of the Michaelis-Menten model:

$r(I) = \frac{\Delta V(I)}{\Delta V_{max}} = \frac{I}{I + I_S}$   (2)

For the TMO application, it is considered that the electric voltage in the right-hand side is a good approximation of the perceived brightness (Ferradans et al., 2012). Also, it is not uncommon (Meylan et al., 2007) to depart from the initial meaning of semisaturation for $I_S$ (the average light reaching the light field) and to replace it with a conveniently chosen constant. TMOs that aimed to mimic the Naka-Rushton model (Reinhard et al., 2002), (Tamburino et al., 2008) assumed that the HDR map input was I and obtained the output as r(I).

On the other hand, the generative function of the LIP model maps the image domain onto the real number set. The inverse function acts as a homomorphism between the real number set and the closed space that defines the domain of LIP. For the LTIP model, the generative function is $\Phi_V(x) = \frac{x}{1-x}$, while the inverse is:

$\Phi_V^{-1}(y) = \frac{y}{y+1}$   (3)

The inverse function (Eq. (3)) mimics the Naka-Rushton model, Eq. (2), with the difference that instead of the semisaturation $I_S$, as in the original model, it uses full saturation. Given this observation, we interpret the logarithmic-like model as the mapping of irradiance intensities (which are defined over the real number set) onto photoreceptor acquired intensities, i.e. human observable chromatic intensities. While the logarithmic-like model is only similar and not identical with the Naka-Rushton model of the human eye, it has the strong advantage of creating a rigorous mathematical framework of a cone space.

3.2. Relation between the LIP models and CRF

The dominant non-linear transformation in the camera pipeline is the gamma adjustment, necessary to adapt the image to the non-linearity of the display and, respectively, of the human eye. The entire pipeline is described by the Camera Response Function (CRF) which, typically, has a gamma shape (Grossberg and Nayar, 2004).

The similarity between the LTIP generative function and the CRF was previously pointed out in (Florea and Florea, 2013). To show the actual relation between the LTIP generative function and the CRF, we considered the Database of Response Functions (DoRF) (Grossberg and Nayar, 2004), which consists of 200 recorded response functions of digital still cameras and analogue photographic films. These functions are shown in Fig. 2 (a); to emphasize the relation, in subplot (b) of the same figure we represent only the LTIP generative function and the average CRF. As one may see, while the LTIP generative function is not identical to the average CRF, there do exist cameras and films that have a response function identical to the LTIP generative function.

To improve the contrast and the overall appearance of the image, some camera models add an S-shaped tone mapping that no longer follows the Naka-Rushton model. In such a case, a symmetrical LTIP model, such as the one described in (Navarro et al., 2013), has greater potential to lead to better results.
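As a quick numerical illustration of the similarity claimed above, the following sketch (function names are ours) checks that the inverse LTIP generative function of Eq. (3) coincides with the Naka-Rushton response of Eq. (2) when the semisaturation constant is set to full saturation, $I_S = 1$:

    import numpy as np

    def phi_inverse(y):
        """Inverse LTIP generative function, Eq. (3)."""
        return y / (y + 1.0)

    def naka_rushton(I, I_S):
        """Naka-Rushton photoreceptor response, Eq. (2)."""
        return I / (I + I_S)

    # Irradiance samples spanning four orders of magnitude.
    I = np.logspace(-2, 2, 9)
    print(np.allclose(phi_inverse(I), naka_rushton(I, I_S=1.0)))  # True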

Figure 2. The relation between the LTIP generative function $\Phi_V$ and camera response functions as recorded in the DoRF database: (a) full database and (b) average CRF (black) with respect to the LTIP function (green).

4. HDR by Perceptual Exposure Merging

Once the camera response function, g, has been found, the acquired images $f_i$ are turned into irradiance maps $E_i$ by (Debevec and Malik, 1997):

$E_i(k,l) = \frac{g^{-1}(f_i(k,l))}{\Delta t}$   (4)

where $\Delta t$ is the exposure time and $(k,l)$ is the pixel location. Further, the HDR irradiance map is calculated as the weighted sum of the acquired irradiance maps (Debevec and Malik, 1997), (Robertson et al., 1999):

$E_{HDR}(k,l) = \frac{\sum_{i=1}^{N} w(f_i(k,l)) \cdot E_i(k,l)}{\sum_{i=1}^{N} w(f_i(k,l))}$   (5)

where $w(f_i(k,l))$ are weights depending on the chosen algorithm and N is the number of frames.

However, we stress that the weights are scalars with respect to image values. This means that their sum is also a scalar and we denote it by:

$\eta = \sum_{i=1}^{N} w(f_i(k,l))$   (6)

Taking into account that the CRF may be approximated by the LTIP generative function g and, also, that the final image is achieved by a tone mapping operator from the HDR irradiance map, we may write that:

$f_{HDR}(k,l) = g(E_{HDR}(k,l))$   (7)

If one expands the HDR irradiance map using Eq. (5), one obtains:

$f_{HDR}(k,l) = g\left(\frac{1}{\eta}\sum_{i=1}^{N} w(f_i(k,l)) \cdot E_i(k,l)\right)$
$\quad = \frac{1}{\eta} \otimes g\left(\sum_{i=1}^{N} w(f_i(k,l)) \cdot E_i(k,l)\right)$
$\quad = \frac{1}{\eta} \otimes \left(\bigoplus_{i=1}^{N} g\big(w(f_i(k,l)) \cdot E_i(k,l)\big)\right)$
$\quad = \frac{1}{\eta} \otimes \left(\bigoplus_{i=1}^{N} w(f_i(k,l)) \otimes g(E_i(k,l))\right)$
$\quad = \frac{1}{\eta} \otimes \left(\bigoplus_{i=1}^{N} w(f_i(k,l)) \otimes f_i(k,l)\right)$   (8)

where $\otimes$ and $\oplus$ are the LTIP operations shown in Table 2, while $\bigoplus_{i=1}^{N} u_i$ stands for:

$\bigoplus_{i=1}^{N} u_i = u_1 \oplus u_2 \oplus \cdots \oplus u_N$

Eq. (8) shows that one may avoid the conversion of the input images to irradiance maps, as the HDR image may be simply computed using additions and scalar multiplications in the logarithmic domain. Furthermore, we accentuate that Eq. (8), if written with real-based operations, matches the exposure fusion introduced in (Mertens et al., 2007); yet we have started our calculus from the irradiance maps fusion. Thus, the use of LTIP operations creates a unifying framework for both approaches. In parallel, it adds partial motivation, by compatibility with the HVS, for the exposure fusion variant. The motivation is only partial, as the LTIP model follows only the global HVS transfer function and not the local adaptation.
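As a minimal sketch of the merging rule in Eq. (8), reusing ltip_add and ltip_smul from the sketch after Table 2; the frame and weight handling here is illustrative, not the authors' Matlab implementation:

    import numpy as np

    def ltip_merge(frames, weights):
        """Exposure merging by Eq. (8):
        f_HDR = (1/eta) (x) [ (+)_i  w_i (x) f_i ].

        frames  : list of N images with values in [0, 1)
        weights : list of N per-pixel weight maps, same shape as the frames
        """
        eta = np.sum(weights, axis=0)      # Eq. (6), computed per pixel
        acc = np.zeros_like(frames[0])     # 0 is the neutral element of (+)
        for f, w in zip(frames, weights):
            acc = ltip_add(acc, ltip_smul(w, f))
        return ltip_smul(1.0 / eta, acc)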

The weights, w(f(k,l)), should complement the global tone mapping by performing local adaptation. In (Mann and Mann, 2001) these weights are determined by derivation of the CRF, while in (Mertens et al., 2007) they are extracted so as to properly encode contrast, saturation and well-exposedness. More precisely:

• contrast, $w_C$, is determined by considering the response of Laplacian operators; this is a measure of the local contrast, which exists in human perception as the center-surround ganglion field organization.

• saturation, $w_S$, is computed as the standard deviation of the R, G and B values at each pixel location. This component favors photographic effects, since normal consumers are more attracted to vivid images, and has no straightforward correspondence to human perception.

• well-exposedness, $w_E$, is computed by giving small weights to values in the mid-range and large weights to outliers, favoring the glistening aspect of consumer approaches. More precisely, one assumes that a perfect image is modelled by a Gaussian histogram with mean µ and variance σ², and the weight of each pixel is the back-projected probability of its intensity given the named Gaussian.

We will assume the same procedure of computing the weights, with some small adjustments: while in (Mertens et al., 2007) for well-exposedness both outliers were weighted symmetrically, we favor darker tones to compensate for the tendency of LIP models to favor bright tones, caused by their closing property. Details about the precise implementation parameter values will be provided in Subsection 6.1.
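The sketch below illustrates one possible implementation of the three weight maps; the multiplicative combination and the asymmetric handling of the well-exposedness (full weight below µ, Gaussian fall-off above it) are our assumptions for illustration, not the authors' exact formulation.

    import numpy as np
    from scipy.ndimage import laplace

    def fusion_weights(frame, mu=0.37, sigma2=0.2):
        """Per-pixel weights in the spirit of Mertens et al. (2007),
        with the darker-tone bias described in the text above.

        frame: H x W x 3 RGB image with values in [0, 1).
        """
        gray = frame.mean(axis=2)
        w_c = np.abs(laplace(gray))                     # contrast
        w_s = frame.std(axis=2)                         # saturation
        w_e = np.where(gray <= mu, 1.0,                 # favor darker tones
                       np.exp(-(gray - mu) ** 2 / (2.0 * sigma2)))
        return (w_c * w_s * w_e + 1e-12)[..., None]     # avoid zero eta

    # Usage with the Eq. (8) sketch:
    #   weights = [fusion_weights(f) for f in frames]
    #   hdr = ltip_merge(frames, weights)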

5. Implementation and evaluation procedure

Implementation. We have implemented the HDR algorithm described mainly by equation (8) within the LTIP model, with weights similar to the procedure described in (Mertens et al., 2007), in Matlab. The actual values of the weights are standard for contrast, $w_C = 1$, and saturation, $w_S = 1$, but differ for well-exposedness, where the mid range (the parameters of the Gaussian distribution modelling it) is µ = 0.37 and σ² = 0.2. The choices are based on maximizing the objective metrics and will be further explained in Sections 6.1 and 6.3. An example of the achieved extended dynamic range image may be seen in Fig. 3.

The common practice is to evaluate HDR methods using a few publicly available images. We adopted the same principle, using more extensive public imagery data, such as the ones from (Čadík et al., 2008), the OpenCV examples library and from (Drago et al., 2003). We have evaluated the proposed algorithm on a database containing 22 sets of HDR frames acquired from various Internet sources, being constrained by the fact that the proposed method requires also the original frames and not only the HDR image. We made public the full results and the code to obtain them¹, so as to encourage other people to further test it.

¹ The code and supplementary results are available at http://imag.pub.ro/common/staff/cflorea/LIP

Evaluation. The problem of evaluating HDR images is still open, as HDR techniques include two categories: irradiance map fusion, which aims at correctness, and exposure fusion, which aims at pleasantness. As mentioned in Section 2.2, the irradiance map is physically supported and typical evaluation is performed with objective metrics that are inspired from human perception. Thus the evaluation with such objective metrics will show how realistic one method is (i.e. how close the produced image is to the human perception of the scene).

On the other hand, the exposure fusion methods inspired from (Mertens et al., 2007) are much simpler, and produce results without physical motivation, but which are visually pleasant for the average user; consumer applications further process these images to enhance the surreal effects, which are desired, albeit fake. Thus, the subjective evaluation and no-reference objective metrics that evaluate the overall appearance will positively appreciate such images, although they are not a realistic reproduction of the scene. Thus, to have a complete understanding of a method's performance, we will evaluate the achieved results with two categories of methods: subjective evaluation and evaluation based on objective metrics.

5.1. Objective Evaluation

While not unanimously accepted, several metrics were created for the evaluation of TMOs in particular and HDR images in general. Here, we will refer to the metrics introduced in (Aydin et al., 2008) and, respectively, the more recent one from (Yeganeh and Wang, 2013).

The metric from (Aydin et al., 2008), entitled "Dynamic Range (In)dependent Image Metrics" (DRIM), uses a specific model of the HVS to construct a virtual low dynamic range (LDR) image from the HDR reference and compares the contrast of the subject LDR image to the virtual one. In fact, the HVS model and the comparison can be merged together, so that the matching is between the subject LDR image and the HDR reference, skipping the virtual LDR image. The comparison takes into consideration three categories: artificial amplification of contrast, artificial loss of contrast and reversal of contrast. The metric points to pixels that are different from their standard perception according to the authors' aforethought HVS modelling and a typical monitor setting (γ = 2.2, 30 pixels per degree and viewing distance of 0.5 m). For each test image we normalized the error image by the original image size (as described in (Ferradans et al., 2012)). The metric only assigns one type of error (the predominant one) and has two shortcomings: it heavily penalizes global amplification error (which is not so disturbing from a subjective point of view) and it merely penalizes artifacts (such as areas with completely wrong luminance), which, for a normal viewer, are extremely annoying. Thus the metric, in fact, assigns a degree of perceptualness (in the sense of how close that method is to the human contrast transfer function) to a certain HDR method.


Figure 3. Example of HDR imaging: initial, differently exposed frames (a-e) and the HDR image obtained by using the proposed algorithm (f).

A more robust procedure for evaluation was proposed in (Yeganeh and Wang, 2013), where, in fact, three scores are introduced:

• Structural fidelity, S, which uses the structural similarity image metric (SSIM) (Wang et al., 2004) to establish differences between the LDR image and the original HDR one, and a bank of non-linear filters based on the human contrast sensitivity function (CSF) (Barten, 1999). The metric points to structural artifacts of the LDR image with respect to the HDR image and has a 0.7912 correlation with subjective evaluation according to Yeganeh and Wang (2013).

• Statistical naturalness, N, which gives a score of the closeness of the image histogram to a normal distribution, which was found to match average human opinion.

• Overall quality, Q, which integrates the structural fidelity, S, and the statistical naturalness, N, by:

$Q = aS^{\alpha} + (1-a)N^{\beta}$   (9)

where a = 0.8012, α = 0.3046, β = 0.7088, as found in (Yeganeh and Wang, 2013). The metric has a 0.818 correlation with human opinion.

The structural fidelity appreciates how close a TMO is to the CSF function, thus being a theoretically oriented measure, while the statistical naturalness is a subjective measure and shows how close a TMO is to consumer preferences.
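For reference, Eq. (9) with the reported constants is a one-liner; S and N are the structural fidelity and statistical naturalness scores in [0, 1]:

    def overall_quality(S, N, a=0.8012, alpha=0.3046, beta=0.7088):
        """Overall quality Q of Eq. (9), constants from (Yeganeh and Wang, 2013)."""
        return a * S ** alpha + (1.0 - a) * N ** beta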
5.2. Subjective evaluation

On the subjective evaluation, for HDR images, Čadík et al. (2008) indicated the following criteria as being relevant: luminosity, contrast, color and detail reproduction, and lack of image artifacts. The evaluation was performed in two steps. First we analyzed comparatively, by means of examples, the overall appearance and existence of artifacts with respect to the five named criteria in the tested methods; next, we followed with subjective evaluation, where an external group of persons graded the images.

To perform the external evaluation, we instructed 18 students in the 20-24 years age range to examine and rank the images on their personal displays, taking into account the five named criteria. We note that the students follow computer science or engineering programmes, but are not closely related to image processing. Thus, the subjective evaluation could be biased towards groups with technical expertise.

The testing was partially blind: the subjects were aware of the theme (i.e. HDR), but they were not aware about the source of each image. While we introduced them to a method for monitor calibration and discussed aspects about view angle and distance to the monitor, we did not impose these as strict requirements, since the average consumer does not follow rigorous criteria for visualization. Thus, the subjective evaluation was more related to how appealing an image is.

6. Results

6.1. Algorithm parameters

To determine the best parameters of the proposed method we resorted to empirical validation.

Logarithmic Model. The first choice of the proposed method is related to the specific logarithmic image processing model used. While we discuss this aspect by means of an example shown in Fig. 4, top row (b-d), we have to stress that all the results fall in the same line. The LTIP model provides the best contrast, while the classical LIP model (Jourlin and Pinoli, 1987) leads to very similar results, with marginal differences like a slightly flatter sky and less contrast on the forest. The symmetrical model (Patrascu and Buzuloiu, 2001) produces over-saturated images. Given the choice between our solution and the one based on (Jourlin and Pinoli, 1987), as the differences are rather small, the choice relies solely on the perceptual motivation detailed in Section 4.

Next, given the parametric extension of the LTIP model from (Florea and Florea, 2013), we asked which is the best value for the parameter m. As shown in Fig. 4, bottom row (e-h), the best visual results are obtained for m = 1, which corresponds to the original LTIP model. Choices different from m = 1 use direct or inverse transformations that are too concave and, respectively, too convex, thus distorting the final results. Also, the formulas become increasingly complex and precise computation more expensive. Concluding, the best results are achieved with models that are closer to the human perception.

Algorithm weights. In Section 4 we nominated three categories of weights (contrast $w_C$, saturation $w_S$ and well-exposedness) that interfere with the algorithm. For the first two categories, values different from the standard ones ($w_C = 1$ and $w_S = 1$) have little impact.

(a) Normally exposed frame (b) LTIP, m = 1 (c) LIP (d) symmetrical LIP

(e) LTIP, m = 0.5 (f) LTIP, m = 0.75 (g) LTIP, m = 1.33 (h) LTIP, m = 2

Figure 4. Normally exposed image frame (a) and the resulting images obtained with the standard LTIP model (b), the classical LIP model (c) and the symmetrical model introduced by Patrascu (d). Images obtained with the parametric extension of the LTIP model (e-h).

Table 3. HDR image evaluation by the average values of structural fidelity (S), statistical naturalness (N) and overall quality (Q), as detailed in Section 5.1. With bold letters we marked the best result in each category, while italics mark the second one.

Method                  | S[%] | N[%]  | Q[%]
Ward et al. (1997)      | 66.9 | 14.38 | 72.7
Fattal et al. (2002)    | 59.9 | 6.4   | 61.0
Durand et al. (2002)    | 81.7 | 41.0  | 85.4
Drago et al. (2003)     | 82.3 | 50.2  | 87.0
Reinhard et al. (2005)  | 83.1 | 50.5  | 87.5
Krawczyk et al. (2005)  | 71.7 | 36.8  | 76.6
Banterle et al. (2012)  | 83.7 | 52.1  | 87.8
Mertens et al. (2007)   | 81.7 | 64.2  | 89.4
Zhang et al. (2012)     | 77.6 | 59.7  | 83.4
Proposed, µ = 0.5       | 81.0 | 39.2  | 84.5
Proposed, µ = 0.4       | 81.6 | 52.1  | 87.3
Proposed, µ = 0.37      | 81.5 | 57.4  | 88.0
Proposed, µ = 0.32      | 81.4 | 53.7  | 87.4

The well-exposedness, which is described mainly by the central value µ of the "mid range", has significant impact. As one can see in Fig. 5, the best result is achieved for µ = 0.37, while for larger values (µ > 0.37) the image is too bright and, respectively, for smaller ones (µ < 0.37) it is too dark. While µ = 0.4 produced similar results, the objective metrics reach their optimum at µ = 0.37. Changing the variance also has little impact. These findings were confirmed by objective testing, as further shown in Section 6.3.

6.2. Comparison with state of the art

To test against various state of the art methods, we used the HDR irradiance map (stored as a .hdr file), which was either delivered with the images (and typically produced using the method from (Robertson et al., 1999)), or produced with some online available code².

For comparative results we considered the exposure fusion (Mertens et al., 2007) and the variant modified according to (Zhang and Cham, 2012), using the author-released code, and the TMOs applied on the .hdr images described in (Ward et al., 1997), (Fattal et al., 2002), (Durand and Dorsey, 2002), (Drago et al., 2003), (Reinhard et al., 2005), (Krawczyk et al., 2005) and (Banterle et al., 2012), as they are the foremost such methods. The code for the TMOs is taken from the Matlab HDR Toolbox (Banterle et al., 2011) and is available online³. The implemented algorithms were optimized by the Toolbox creators to match the results reported in the initial articles and for better performance; hence, we used the implicit values for the algorithms' parameters. We note that the envisaged TMO solutions include both global operators and local adaptation. A set of examples with the results produced with all the methods is presented in Fig. 6.

² The HDR creator package is available at http://cybertron.cg.tu-berlin.de/pdci09/hdr_tonemapping/download.html
³ The HDR toolbox may be retrieved from http://www.banterle.com/hdrbook/downloads/HDR_Toolbox_current.zip

6.3. Objective metrics

Structure and Naturalness. We started the evaluation using the set of three objective metrics from (Yeganeh and Wang, 2013). The results obtained are presented in Table 3. The best performing version of the proposed method was for µ = 0.37.

(a) µ = 0.32 (b) µ = 0.37 (c) µ = 0.4 (d) µ = 0.5

Figure 5. Output images when various values of the mid-range (µ) in the well-exposedness weight were used. The preferred choice is µ = 0.37.

(a) Proposed (b) Pece and Kautz (2010) (c) Banterle et al. (2012) (d) Krawczyk et al. (2005)

(e) Reinhard et al. (2005) (f) Drago et al. (2003) (g) Durand et al. (2002) (h) Fattal et al. (2002)

Figure 6. The resulting images obtained with state of the art HDR imaging techniques (irradiance maps fusion followed by TMO, and exposure fusion).

The proposed method, when compared with the various TMOs, ranked first according to the overall quality and statistical naturalness, and it ranked fifth according to structural fidelity (after (Banterle et al., 2012), (Reinhard et al., 2005), (Drago et al., 2003) and (Durand and Dorsey, 2002)). Our method was penalized when compared to other TMOs due to their general adaptation being closer to the standard contrast sensitivity function (CSF) (Barten, 1999). Yet we stress that some TMOs, (Krawczyk et al., 2005) or (Banterle et al., 2012), work only for calibrated images in the specific scene luminance domain.

When compared with the other exposure fusion methods, (Mertens et al., 2007) and (Zhang and Cham, 2012), it ranked second for the overall quality, after (Mertens et al., 2007). This is an expected result, as the standard exposure fusion was built to match subjective opinion scores, as the envisaged metrics did too. Yet the proposed method outperformed the overall performance of the exposure fusion introduced in (Zhang and Cham, 2012). Furthermore, a more recent algorithm, namely ExpoBlend (Bruce, 2014), reports the overall quality on two images that we used too ("Memorial" and "Lamp"). On these images, the proposed method outperformed ExpoBlend: on "Memorial" we reach 95.5% compared to 93.2%, while on "Lamp" we reach 90.1% compared to 89.4%, as reported in (Bruce, 2014).

Furthermore, the proposed method is the closest to the standard exposure fusion result (Mertens et al., 2007), which is currently the state of the art method for consumer applications building HDR images. This aspect is shown in Table 4, where we computed the natural logarithm of the root mean square error to the image resulting from the standard exposure fusion and, respectively, the structural similarity when compared to the same image.

Perceptualness. One claim of the current paper is that the proposed method adds perceptualness to the exposure fusion. To test this, we compared our method against the standard exposure fusion (Mertens et al., 2007) using the perceptual DRIM metric from (Aydin et al., 2008). Over the considered database, the proposed method produced an average total error (the sum of the three categories) 2% smaller than the standard exposure fusion (64.5% compared to 66.8%). On individual categories, the proposed method produced a smaller amount of amplification of contrast, with comparable results on loss and reversal of contrast. Thus, overall, the results confirm the claim.
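A sketch of the two Table 4 scores against the standard exposure fusion result follows; the text says "natural logarithm" while the table header reads dB, so the exact scaling and normalization are our assumptions here, and skimage's SSIM stands in for the original implementation:

    import numpy as np
    from skimage.metrics import structural_similarity

    def scores_vs_exposure_fusion(img, ref):
        """Distance of a result `img` to the exposure fusion result `ref`
        (both grayscale, 8-bit), in the spirit of Table 4: log-RMSE and SSIM.
        A dB-style 20*log10 scaling of the RMSE is assumed for illustration."""
        err = img.astype(np.float64) - ref.astype(np.float64)
        log_rmse = 20.0 * np.log10(np.sqrt(np.mean(err ** 2)) + 1e-12)
        ssim = structural_similarity(img, ref)
        return log_rmse, ssim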

Table 4. HDR image evaluation by taking the log of the root mean square error to the standard exposure fusion. Best values are marked with bold letters.

Method                  | logRMSE[dB] | SSIM
Ward et al. (1997)      | 149.9       | 83.5
Fattal et al. (2002)    | 281.1       | 38.1
Durand et al. (2002)    | 160.5       | 74.4
Drago et al. (2003)     | 148.4       | 74.7
Reinhard et al. (2005)  | 148.4       | 74.8
Krawczyk et al. (2005)  | 136.9       | 66.4
Banterle et al. (2012)  | 147.9       | 75.2
Proposed                | 72.1        | 93.8

6.4. Artifacts

The HDR-specific objective metrics have the disadvantage of not properly weighting the artifacts that appear in images, while human observers are very disturbed by them. This fact was also pointed out by (Čadík et al., 2008), and to compensate we performed visual inspection to identify disturbing artifacts. The proposed method never produced any artifact in the tested image sets. Examples of state of the art methods and the artifacts they produce may be seen in Fig. 7.

(a) Durand et al. (2002) (b) Proposed

Figure 7. Examples of artifacts produced by state of the art methods compared to the robustness of the proposed method. Close-ups point to artifact areas.

In direct visual inspection, when compared against the standard exposure fusion method (Mertens et al., 2007), our algorithm shows details in bright areas, while normal, real-based operations do not. This improvement is due to the closing property of the logarithmic addition and, respectively, scalar amplification. This aspect is also visible when comparing with the most robust TMO based method, namely (Banterle et al., 2012). Examples that illustrate these facts are presented in Fig. 8.

6.5. Subjective ranking

The non-experts ranked the images produced with the proposed method, the standard exposure fusion, and the methods from (Banterle et al., 2012), (Pece and Kautz, 2010), (Drago et al., 2003) and (Reinhard et al., 2005). Regarding the results, the proposed method was selected as the best one by 10 users, the exposure fusion (Mertens et al., 2007) by 7, while the rest won 1 case each. Also, the second place was monopolized by the "glossier" exposure fusion based method.

When compared to the direct exposure fusion proposed by Mertens et al. (2007), due to the perceptual nature of the proposed method, a higher percentage of the scene dynamic range is in the visible domain; direct fusion loses information in the dark-tones domain and, respectively, in the very bright part; this is in fact the explanation for the narrow margin of our method's advance.

7. Discussion and Conclusions

In this paper we showed that the LTIP model is compatible with the Naka-Rushton equation modelling the light absorption in the human eye and similar with the CRF of digital cameras. Upon these findings, we asserted that it is possible to treat different approaches to HDR imaging unitarily. Implementation of the weighted sum of input frames is characteristic both to irradiance map fusion and to exposure fusion. If implemented using LTIP operations, perceptualness is added to the more popular exposure fusion. Finally, we introduced a new HDR imaging technique that adapts the standard exposure fusion to the logarithmic type operations, leading to an algorithm which is consistent in both theoretical and practical aspects. The closing property of the LTIP operations ensures that details are visible even in areas with high luminosity, as previously shown.

The method maintains the simplicity of implementation typical to the exposure fusion, since the principal difference is the redefinition of the standard operations and different parameter values. The supplemental calculus associated with the non-linearity of the LTIP operations could easily be trimmed out by the use of look-up tables, as shown in (Florea and Florea, 2013).

The evaluation results re-affirmed that in an objective assessment aiming at the naturalness and pleasantness of the image, the proposed method outperforms irradiance map fusion followed by TMOs, as these try to mimic a theoretical model which is not perfect and is not how the normal user expects HDR images to look. The same conclusion was emphasized by the subjective evaluation, where methods developed in the image domain are preferred, as the resulting images are more "appealing". The method outperformed, even if by a small margin, the standard exposure fusion when evaluated with the DRIM metric, showing that it is more HVS oriented. The proposed method, having a HVS inspired global adaptation and a "glossy" tuned local adaptation, ranks best, by a narrow margin, in the subjective evaluation.

(a) Banterle et al. (2012) (b) Proposed (c) Mertens et al. (2007) (d) Proposed

Figure 8. Examples of loss of details produced by state of the art methods compared to the robustness of the proposed method.

Acknowledgments

The authors wish to thank Martin Cadik for providing the frames for testing. We also would like to thank Tudor Iorga for running tests and providing valuable ideas. The work has been partially funded by the Sectoral Operational Programme Human Resources Development 2007-2013 of the Ministry of European Funds through the Financial Agreement POSDRU/159/1.5/S/134398.

References

Aydin, T., Mantiuk, R., Myszkowski, K. and Seidel, H. (2008). Dynamic range independent image quality assessment, ACM Transactions on Graphics (TOG) 27(3): 1–10.

Banterle, F., Artusi, A., Debattista, K. and Chalmers, A. (2011). Advanced High Dynamic Range Imaging: Theory and Practice, AK Peters (CRC Press), Natick, MA, USA.

Banterle, F., Artusi, A., Sikudova, E., Edward, T., Bashford-Rogers, W., Ledda, P., Bloj, M. and Chalmers, A. (2012). Dynamic range compression by differential zone mapping based on psychophysical experiments, ACM Symposium on Applied Perception (SAP), pp. 39–46.

Barten, P. G. J. (1999). Contrast Sensitivity of the Human Eye and Its Effects on Image Quality, SPIE, Washington, DC.

Bruce, N. D. (2014). ExpoBlend: Information preserving exposure blending based on normalized log-domain entropy, Computers & Graphics 39: 12–23.

Čadík, M., Wimmer, M., Neumann, L. and Artusi, A. (2008). Evaluation of HDR tone mapping methods using essential perceptual attributes, Computers & Graphics 32(3): 330–349.

Debevec, P. and Malik, J. (1997). Recovering high dynamic range radiance maps from photographs, ACM SIGGRAPH, pp. 369–378.

Deng, G., Cahill, L. W. and Tobin, G. R. (1995). A study of logarithmic image processing model and its application to image enhancement, IEEE Transactions on Image Processing 4(4): 506–512.

Drago, F., Myszkowski, K., Annen, T. and Chiba, N. (2003). Adaptive logarithmic mapping for displaying high contrast scenes, Computer Graphics Forum 22(3): 419–426.

Durand, F. and Dorsey, J. (2002). Fast bilateral filtering for the display of high-dynamic-range images, ACM Transactions on Graphics 21(3): 257–266.

Fattal, R., Lischinski, D. and Werman, M. (2002). Gradient domain high dynamic range compression, ACM Transactions on Graphics (TOG) 21(3): 249–256.

Ferradans, S., Bertalmio, M., Provenzi, E. and Caselles, V. (2012). An analysis of visual adaptation and contrast perception for tone mapping, IEEE Transactions on Pattern Analysis and Machine Intelligence 33(10): 2002–2012.

Florea, C. and Florea, L. (2013). Parametric logarithmic type image processing for contrast based auto-focus in extreme lighting conditions, International Journal of Applied Mathematics and Computer Science 23(3): 637–648.

Gilchrist, A., Kossyfidis, C., Bonato, F., Agostini, T., Cataliotti, J., Li, X., Spehar, B., Annan, V. and Economou, E. (1999). An anchoring theory of lightness perception, Psychological Review 106(4): 795–834.

Grossberg, M. D. and Nayar, S. K. (2004). Modeling the space of camera response functions, IEEE Transactions on Pattern Analysis and Machine Intelligence 26(10): 1272–1282.

Jourlin, M. and Pinoli, J. C. (1987). Logarithmic image processing, Acta Stereologica 6: 651–656.

Krawczyk, G., Myszkowski, K. and Seidel, H.-P. (2005). Lightness perception in tone reproduction for high dynamic range images, Computer Graphics Forum, Vol. 24, pp. 635–645.

Macmillan, N. and Creelman, C. (eds) (2005). Detection Theory: A User's Guide, Lawrence Erlbaum.

Mann, S. and Mann, R. (2001). Quantigraphic imaging: Estimating the camera response and exposures from differently exposed images, Proc. of IEEE Computer Vision and Pattern Recognition, Vol. 1, pp. 842–849.

Mann, S. and Picard, R. (1995). Being 'undigital' with digital cameras: Extending dynamic range by combining differently exposed pictures, Proceedings of IS&T's 48th Annual Conference, Vol. 1, pp. 422–428.

Markovic, D. and Jukic, D. (2013). On parameter estimation in the bass model by nonlinear least squares fitting the adoption curve, International Journal of Applied Mathematics and Computer Science 23(1): 145–155.

Mertens, T., Kautz, J. and Reeth, F. V. (2007). Exposure fusion, Proceedings of Pacific Graphics, pp. 382–390.

Meylan, L., Alleysson, D. and Susstrunk, S. (2007). Model of retinal local adaptation for the tone mapping of color filter array images, Journal of Optical Society of America A 24: 2807–2816.

Naka, K.-I. and Rushton, W. A. H. (1966). S-potentials from luminosity units in the retina of fish (Cyprinidae), The Journal of Physiology 185: 587–599.

Navarro, L., Courbebaisse, G. and Deng, G. (2013). The symmetric logarithmic image processing model, Digital Signal Processing 23(5): 1337–1343.

Panetta, K., Zhou, Y., Agaian, S. and Wharton, E. (2011). Parameterized logarithmic framework for image enhancement, IEEE Transactions on Systems, Man, and Cybernetics - Part B: Cybernetics 41(2): 460–472.

Patrascu, V. and Buzuloiu, V. (2001). Color image enhancement in the framework of logarithmic models, Proceedings of the 8th IEEE International Conference on Telecommunications, Vol. 1, Bucharest, Romania, pp. 199–204.

Pece, F. and Kautz, J. (2010). Bitmap movement detection: HDR for dynamic scenes, Proceedings of Conference on Visual Media Production, pp. 1–8.

Pinoli, J. C. and Debayle, J. (2007). Logarithmic adaptive neighborhood image processing (LANIP): Introduction, connections to human brightness perception, and application issues, EURASIP Journal on Advances in Signal Processing 1: 114–114.

Reinhard, E., Stark, M., Shirley, P. and Ferwerda, J. (2002). Photographic tone reproduction for digital images, ACM Transactions on Graphics 21: 267–276.

Reinhard, E., Ward, G., Pattanaik, S. and Debevec, P. (2005). High Dynamic Range Imaging: Acquisition, Display and Image-Based Lighting, Morgan Kaufmann Publishers, San Francisco, California, USA.

Robertson, M., Borman, S. and Stevenson, R. (1999). Dynamic range improvement through multiple exposures, Proceedings of International Conference on Image Processing, pp. 159–163.

Stevens, J. and Stevens, S. (1963). Brightness functions: Effects of adaptation, Journal of Optical Society of America A 53: 375–385.

Stevens, S. (1961). To honor Fechner and repeal his law, Science 133: 80–86.

Tamburino, D., Alleysson, D., Meylan, L. and Süsstrunk, S. (2008). Digital camera workflow for high dynamic range images using a model of retinal processing, Proceedings of SPIE, Vol. 6817.

Valeton, J. and van Norren, D. (1983). Light adaptation of primate cones: an analysis based on extracellular data, Vision Research 23(12): 1539–1547.

Vertan, C., Oprea, A., Florea, C. and Florea, L. (2008). A pseudo-logarithmic framework for edge detection, Advanced Concepts for Intelligent Vision Systems, Vol. 5259, pp. 637–644.

Wang, Z., Bovik, A. C., Sheikh, H. R. and Simoncelli, E. P. (2004). Image quality assessment: From error visibility to structural similarity, IEEE Transactions on Image Processing 13(4): 600–612.

Ward, G., Rushmeier, H. and Piatko, C. (1997). A visibility matching tone reproduction operator for high dynamic range scenes, IEEE Transactions on Visualization and Computer Graphics 3: 291–306.

Yeganeh, H. and Wang, Z. (2013). Objective quality assessment of tone mapped images, IEEE Transactions on Image Processing 22(2): 657–667.

Zhang, W. and Cham, W.-K. (2012). Gradient-directed multi-exposure composition, IEEE Transactions on Image Processing 21(4): 2318–2323.

8. Biographies

Corneliu Florea, born in 1980, got his master degree from University "Politehnica" of Bucharest in 2004 and the PhD from the same university in 2009. There, he lectures on statistical signal and image processing, and has introductory courses in computational photography and computer vision. His research interests include non-linear image processing algorithms for digital still cameras and computer vision methods for portrait understanding. He is author of more than 30 peer-reviewed papers and of 20 US patents and patent applications.

Constantin Vertan — Professor Constantin Vertan holds an image processing and analysis tenure at the Image Processing and Analysis Laboratory (LAPI) at the Politehnica University of Bucharest. For his contributions in image processing he was awarded the UPB "In tempore opportuno" award (2002) and the Romanian Research Council "In hoc signo vinces" award (2004), and was promoted to IEEE Senior Member (2008). His research interests are: general image processing and analysis, content-based image retrieval, fuzzy and medical image processing applications. He authored more than 50 peer-reviewed papers. He is the secretary of the Romanian IEEE Signal Processing Chapter and associate editor at the EURASIP Journal on Image and Video Processing.

Laura Florea received her PhD in 2009 and M.Sc. in 2004 from University "Politehnica" of Bucharest.
From 2004 she teaches classes in the same university, where she is currently a Lecturer. Her interests include image processing algorithms for digital still cameras, automatic understanding of human behavior by the analysis of portrait images, medical image processing and statistical signal processing theory. Previously she worked on computer aided diagnosis for hip joint replacement. She is author of more than 30 peer-reviewed journal and conference papers.