A Model for Colour Naming and Comparing based on Conceptual Neighbourhood. An Application for Comparing Art Compositions

A computational model for Qualitative Colour Description, named the QCD model, is defined using the Hue, Saturation and Luminance colour space. This model can name rainbow colours, pale, light and dark colours, and colours in the grey scale, and it has been parameterised by participants of a study in two universities in Spain: University Jaume I and University of Sevilla. The relational structure of the QCD model is analysed by means of a conceptual neighbourhood diagram, which is used to formulate a measure of similarity for solving absolute and relative comparisons of qualitative colours. Moreover, a similarity measure between colour compositions, called SimQCDI, is also developed. A survey test on several art compositions is carried out and the results obtained by the participants are analysed and compared to the computational results provided by SimQCDI. Also, a comparison to the standard RGB Colour Histogram similarity method is carried out, which shows that the proposed similarity is more intuitive and that the results obtained are similar with respect to quantification. Finally, the cognitive adequacy of the QCD model is also analysed.

with each colour label of the M colour sets.

The HSL colour space distributes colours in the following way. The rainbow colours are located in the horizontal central circle. The colour luminance changes in the vertical direction,

values between r ul MAX and 100 or between r ul and r ul MIN, respectively.

It is worth noting that the parameters l (number of selected colour names for the grey scale) and r (number of chosen colour names for the rainbow scale) depend on the granularity that an expert needs in each scenario. The higher the values of these parameters, the more subjective the description; the lower the values, the more universal the description.

As an example, taking the Natural Colour System (NCS) [31] as a reference, the QCD model may establish three pairs of elementary colours (white-black, green-red and yellow-blue). Accordingly, the minimal values for these parameters would be assumed to be l ≥ 2 (white and black) and r ≥ 4 (green, red, yellow and blue). Therefore, the values l = 2 and r = 4 would be more universal than, for example, a value of l = 30, where colour names such as ivory (a kind of white) could appear as needed in a more specific use case (e.g. for a snow expert or a fashion designer).

According to Steels and Belpaeme [32], when grounding colour categories, multiple sources of constraints act: (i) constraints from embodiment, since each visual sensory system can vary for every individual; (ii) constraints coming from the world, since individuals must be adapted to the environment and its statistical regularity has to be taken into account to reach viable performance; and (iii) constraints coming from cultural negotiation, or collective decisions made by a population (e.g. a population may decide to combine the blue and green categories, as many cultures have done).

The QCD model can adapt its parameters l and r to fulfil these constraints for the case under study.

System, a test was carried out on 534 participants (students and teachers) at Universitat Jaume I and Universidad de Sevilla in Spain. A computer application was implemented which showed 10 different colours selected randomly and uniformly using their HSL coordinates. For each colour selected, participants were asked whether they considered the colour to be in the grey or rainbow scale.
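The sampling step of the survey application can be illustrated with a minimal sketch. It assumes that hue is expressed in degrees (0 to 360) and that saturation and lightness are percentages (0 to 100); the function names `random_hsl_colours` and `hsl_to_rgb255` are hypothetical and are not taken from the application used in the study.

```python
import colorsys
import random


def random_hsl_colours(n, seed=None):
    """Draw n colours uniformly in HSL coordinates.

    Assumed ranges (for illustration only): hue in degrees [0, 360],
    saturation and lightness as percentages [0, 100].
    """
    rng = random.Random(seed)
    return [
        (rng.uniform(0, 360), rng.uniform(0, 100), rng.uniform(0, 100))
        for _ in range(n)
    ]


def hsl_to_rgb255(h, s, l):
    """Convert HSL (degrees, percent, percent) to 8-bit RGB for display."""
    # colorsys expects the order (h, l, s) with every component in [0, 1].
    r, g, b = colorsys.hls_to_rgb(h / 360.0, l / 100.0, s / 100.0)
    return tuple(round(c * 255) for c in (r, g, b))


# Ten colours per participant, as in the survey described above.
samples = random_hsl_colours(10, seed=0)
for h, s, l in samples:
    print(f"HSL=({h:6.1f}, {s:5.1f}, {l:5.1f}) -> RGB={hsl_to_rgb255(h, s, l)}")
```

Sampling uniformly in HSL rather than in RGB means that every hue, saturation and lightness band has the same chance of being shown to a participant.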

For those colours classified in the grey scale, participants were asked whether the colour was white, light grey, grey, dark grey or black, that is, l = 5. For those colours classified in the rainbow scale, participants were asked whether the colour was red, orange, yellow, green, turquoise, blue, purple or pink, that is, r = 8, and whether it was light, pale or dark. Thus, a total of 37 colour names were considered.
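The count of 37 follows from 5 grey-scale names plus 8 rainbow names, each used either plain or with one of the three modifiers. The short sketch below enumerates them; the way the names are spelled (modifier followed by hue) is an assumption made for illustration.

```python
GREY_NAMES = ["white", "light grey", "grey", "dark grey", "black"]  # l = 5
RAINBOW_NAMES = ["red", "orange", "yellow", "green",
                 "turquoise", "blue", "purple", "pink"]              # r = 8
PREFIXES = ["", "light", "pale", "dark"]  # plain hue plus three modifiers


def colour_names():
    """List every colour name: grey scale first, then each prefixed hue."""
    names = list(GREY_NAMES)
    for prefix in PREFIXES:
        for hue in RAINBOW_NAMES:
            names.append(f"{prefix} {hue}".strip())
    return names


NAMES = colour_names()
print(len(NAMES))  # 5 + 8 * 4 = 37
```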

Let us justify the parameters selected: (i) l = 5 because the least saturated and most extreme colours in luminance are white and black and, according to the M sets defined, there are two more gradations in lightness (light- and dark-) and one more in saturation (pale-), which correspond to light grey, dark grey and grey, respectively; and (ii) r = 8 since there are 7 rainbow/spectral colours and the majority of the participants in the test suggested adding pink as well.

From the survey, a dataset with 5340 colour names and their corresponding HSL coordinates was obtained. Then, a supervised discretisation algorithm, AMEVA [33], was used to calculate the classes of the intervals corresponding to each colour name. This algorithm was chosen because its main aim is to maximise the dependency relationship between the class labels,

pants measured from the contingency coefficient between colours and intervals. Note also that the AMEVA algorithm discretises each variable independently from the others. However, the dependency constraint between the unit of Saturation and the unit of Lightness in the HSL colour space has also been taken into account.

As a result, Table 1 shows the values extracted by AMEVA for parameterising the QCD model, taking into account the topological structure of the HSL colour space shown by the QCRS, and Figure 2 shows the colour values assigned to each colour name, which correspond to the central value of each interval in HSL.

Figure 3 shows that the QCD model gives the same colour category to different colour intensities in the same way as suggested by participants. It is straightforward to see that most people would agree to name any of the colours in each grid with the name given by the QCD model.

The relational structure of the QCD model is studied by analysing the conceptual neighbourhood of the qualitative colours defined. Freksa

The dissimilarity between qualitative colours in the QCD model, denoted by dsColour(·, ·), is calculated as the minimal path between the nodes of the CND in Figure 4. In this CND, the paths connecting pairs of adjacent nodes that map to continuous transformations can be assigned the following positive weights in order to establish priorities:

• $w_1$ is the weight assigned to the transition between a colour name and the same colour

According to the importance of these transitions, the following relations hold:

• $w_1$ is given to the transition between a colour name and the same colour name (same hue) with a different lightness or saturation, whereas $w_2$ is given to transitions between different colour names (different hues). From a cognitive point of view, the difference in colour perception is higher when the hue changes than when it does not; in fact, not perceiving the difference between some hues is considered a disease (i.e. colour blindness). Hence $w_1 \le w_2$ is considered.

• $w_3$ is given to the transition between a colour name (denoted by any hue) and

only two distinctions in light is more significant than having a range of grey perception; hence $w_3 \le w_4$ is considered.

Therefore, the priorities established must verify:

Hence, given two qualitative colours, denoted by $QC_A$ and $QC_B$, a similarity between them, SimQCD, is defined from dsColour($QC_A$, $QC_B$), the previously defined dissimilarity, and MaxDsColour, the maximum dissimilarity over all colour names.
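One normalised form that is consistent with this description, and with a similarity of 1 between identical colour names and a null similarity between the two most dissimilar ones, is sketched below. It is a reconstruction under that assumption and not necessarily the exact expression used by the authors.

```latex
\[
  \mathrm{SimQCD}(QC_A, QC_B)
    = 1 - \frac{\mathrm{dsColour}(QC_A, QC_B)}{\mathrm{MaxDsColour}},
  \qquad
  0 \le \mathrm{SimQCD}(QC_A, QC_B) \le 1 .
\]
```

Under this form, a pair of colour names separated by the maximum dissimilarity obtains a similarity of 0, and a colour name compared with itself obtains a similarity of 1.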

The main properties of this similarity measure are:

The SimQCD calculus is parameterised by assigning, as a baseline, the following values to the weights: $w_1 = 1$, $w_2 = 3$, $w_3 = 5$ and $w_4 = 6$. Hence, MaxDsColour = 14, which is obtained between the black and white colours.
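The minimal-path computation behind dsColour can be sketched with Dijkstra's algorithm. The graph below is a hypothetical fragment, not the CND of Figure 4: it only contains a few rainbow nodes joined by $w_1$ and $w_2$ transitions, and the normalisation `1 - ds / MaxDsColour` is an assumed form. The names `TOY_EDGES`, `ds_colour` and `sim_qcd` are illustrative.

```python
import heapq

# Baseline weights quoted in the text for same-hue and different-hue transitions.
W1, W2 = 1, 3
MAX_DS_COLOUR = 14  # baseline maximum dissimilarity quoted in the text

# Hypothetical fragment of a CND; the real diagram has 37 nodes.
TOY_EDGES = {
    ("red", "light red"): W1,
    ("red", "dark red"): W1,
    ("red", "orange"): W2,
    ("orange", "light orange"): W1,
    ("orange", "yellow"): W2,
}


def build_graph(edges):
    """Turn an undirected weighted edge list into an adjacency mapping."""
    graph = {}
    for (a, b), weight in edges.items():
        graph.setdefault(a, {})[b] = weight
        graph.setdefault(b, {})[a] = weight
    return graph


def ds_colour(graph, source, target):
    """Weight of the minimal path between two nodes (Dijkstra)."""
    best = {source: 0}
    queue = [(0, source)]
    while queue:
        dist, node = heapq.heappop(queue)
        if node == target:
            return dist
        if dist > best.get(node, float("inf")):
            continue  # stale queue entry
        for neighbour, weight in graph[node].items():
            candidate = dist + weight
            if candidate < best.get(neighbour, float("inf")):
                best[neighbour] = candidate
                heapq.heappush(queue, (candidate, neighbour))
    raise ValueError(f"no path between {source!r} and {target!r}")


def sim_qcd(graph, a, b, max_ds=MAX_DS_COLOUR):
    """Assumed normalisation: 1 minus dissimilarity over the maximum."""
    return 1 - ds_colour(graph, a, b) / max_ds


GRAPH = build_graph(TOY_EDGES)
print(ds_colour(GRAPH, "light red", "yellow"))  # 1 + 3 + 3 = 7
print(sim_qcd(GRAPH, "light red", "yellow"))    # 1 - 7/14 = 0.5
```

Because hue changes cost more than lightness or saturation changes, two variants of the same hue stay closer to each other than to any neighbouring hue.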

The adequacy of this parameterisation is tested by:

• the null similarity is given between white and black.

• the similarity given between any rc and black/white or any pale rc and black/white is the same.

• the same similarity is given between any light rc and white and any dark rc and black.

• the same similarity is given between any light rc and dark and any light rc and black.

• the similarity given between any rc and the same dark, pale or light rc is the same.

• the same similarity is given between any prefix (pale, dark or light) of the same rc.

• the similarity given between any pale rc and grey, and between any light rc and light grey, and between any dark rc and dark grey is the same.

• any light rc is more similar to white than any pale rc is to white and, in the same way, any dark rc is more similar to black than any pale rc is to black.

Let us denote the set of the 37 representative colour names of the QCD model as $C = \{QC_1, \dots, QC_{37}\}$. Thus, the similarity SimQCD can be collected in a $37 \times 37$ matrix $S$ with entries SimQCD($QC_i$, $QC_j$), which is symmetric and whose main diagonal contains 1 values.

Let us consider Y as the set of the colour compositions/images to compare. If Image represents a colour composition, the system obtains a colour histogram $I = (f_1, \dots, f_{37})$, where $f_i$ corresponds to the percentage of the colour $QC_i$ within the Image ($f_i \ge 0$). Therefore, each image is assigned a unique vector, that is, Image ≡ $I$ where $I \in \mathbb{R}^{37}$. Note that two images or colour compositions are equal in the system presented if they have the same representation as an $\mathbb{R}^N$ vector.
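The histogram representation can be sketched as follows. The sketch assumes that a colour-naming step has already assigned one colour name to every pixel; the function `colour_histogram` and the shortened name list are hypothetical.

```python
from collections import Counter


def colour_histogram(pixel_labels, names):
    """Return the percentage of pixels carrying each colour name.

    pixel_labels: one colour name per pixel (assumed output of a
    colour-naming step); names: ordered list of colour names.
    """
    counts = Counter(pixel_labels)
    total = sum(counts.values())
    if total == 0:
        raise ValueError("empty image")
    unknown = set(counts) - set(names)
    if unknown:
        raise ValueError(f"unknown colour names: {sorted(unknown)}")
    return [100.0 * counts[name] / total for name in names]


# Shortened name list for the example; the model itself uses 37 names.
names = ["white", "grey", "black", "red", "blue"]
pixels = ["red"] * 6 + ["white"] * 3 + ["blue"]
hist = colour_histogram(pixels, names)
print(hist)  # [30.0, 0.0, 0.0, 60.0, 10.0]
```

Two images with different pixel layouts but the same proportions of colour names map to the same vector, which is why they are treated as equal in this representation.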

In order to define a similarity measure, let us consider the following matrix $S^*$ associated to $S$ and defined as follows:

Thus, a Quadratic Form is considered as follows: $QF(x) = x^{T} S^{*} x$,

and given an image

The $S^*$ matrix is positive definite since all its eigenvalues are positive (see Table 2). Therefore, QF defines a norm in $\mathbb{R}^{37}$ as follows: $\|x\| = \sqrt{QF(x)}$ for any $x \in \mathbb{R}^{37}$, and hence a 'quasi'-distance in Y is defined as $d(\text{Image1}, \text{Image2}) = \|I_1 - I_2\|$, where Image1 = $I_1 = (f_1, \dots, f_{37})$ and Image2 = $I_2 = (f'_1, \dots, f'_{37})$.
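A distance induced by a quadratic form can be sketched in a few lines. The 3×3 matrix `TOY_S` below is a hypothetical symmetric positive definite matrix chosen for illustration; it is not the $S^*$ matrix of the model, and the histograms are given as fractions rather than percentages.

```python
import math


def quadratic_form(matrix, x):
    """Return x^T M x for a square matrix given as a list of rows."""
    n = len(x)
    return sum(x[i] * matrix[i][j] * x[j] for i in range(n) for j in range(n))


def qf_distance(matrix, hist_a, hist_b):
    """Norm of the difference vector under the quadratic form."""
    diff = [a - b for a, b in zip(hist_a, hist_b)]
    return math.sqrt(quadratic_form(matrix, diff))


# Hypothetical matrix: colours 0-1 and 1-2 are treated as related,
# colours 0-2 as unrelated. Its eigenvalues are all positive.
TOY_S = [
    [1.0, 0.5, 0.0],
    [0.5, 1.0, 0.5],
    [0.0, 0.5, 1.0],
]

only_first = [1.0, 0.0, 0.0]
only_second = [0.0, 1.0, 0.0]
only_third = [0.0, 0.0, 1.0]

near = qf_distance(TOY_S, only_first, only_second)
far = qf_distance(TOY_S, only_first, only_third)
print(near, far)
```

With this toy matrix, an image made only of the first colour is closer to an image made only of the second colour than to one made only of the third, whereas a plain Euclidean distance would rate both pairs the same.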

Furthermore, it holds that for any i, j, and:

From the distance, d(·, ·), a similarity measure between two images regarding only their colour compositions $I_1$ and $I_2$ is obtained as follows:

The main properties of the SimQCDI similarity are:

• If $I_1 = I_2$ then $d(I_1, I_2) = 0$ and hence SimQCDI($I_1$, $I_2$) = 1, that is, the maximum similarity.

• Diego Velázquez (1599-1660) was one of the most important painters of the Spanish Golden Age in the contemporary Baroque period.

• Salvador Dalí (1904-1989) was a prominent Catalan-Spanish surrealist painter.

Results obtained when comparing the art compositions in Figure 8 are given in Table 3. The mean and the standard deviation of the similarities are given in Table 4.

Comparing the Similarity Results to the Survey Results
The results obtained by the computational models QCD and SimQCDI are compared with the main results provided by the participants of the survey. To simplify, the results obtained in the survey are presented in each corresponding item where they are discussed.

The survey asked the participants which pairs of art pieces by the same author were more similar according to their colours:

• When comparing the art pieces D1-D4-D5, the results in Table 5 were obtained. From

• When comparing the art pieces G1-G2-G3, the answers gathered were those in Table 6. In this case, the SimQCDI similarity agrees completely with the participants of the survey, since the difference in similarity between (80.93, 16) and (80.55, 17) is not very significant.

• When comparing the art pieces H1-H2-H4, the votes were those indicated in Table 7. In this case, all the similarities obtained by SimQCDI are very high, and they agree with the opinion of the participants of the survey: the higher the similarity in colours between art pieces, the higher the number of votes.

The survey also asked the participants to compare pairs of art pieces by different authors, and the following results were obtained:

• When comparing the art pieces V1-G2 versus V1-D4, the results in Table 8 were obtained. Each pair was chosen by 50% of the participants, which coincides with the similarity values obtained, which are relatively close.

• When comparing the art pieces D1-M2 versus D1-H2, the results in Table 9 were obtained. In this case, note that an inverse control question was asked, that is, which pair of art pieces was less similar. The opinion of the participants agrees with the dissimilarity values calculated as 1 − SimQCDI. The fact that the participants noticed whether the survey was asking for 'more' or 'less' similar pairs confirms that they completed the survey thoughtfully. Therefore, according to these answers, the survey results were validated.

• When comparing art pieces D4-H2 versus D4-V1, the results in Table 10 were provided. This comparison was asked for similarity but also for dissimilarity, as a control. Hence, 71% of the participants (67% in the inverse question, 'less' similar) answered that D4 and V1 were more similar than D4 and H2, which contrasts with the similarity values obtained. Probably the contrasting colours in H2 are perceived differently by the participants than the pale colours in D4 and V1.

Regarding the similarities obtained between an art piece and a group of compositions by different authors, the results were the following:

• The survey asked the participants if M4 was more similar to D4-D5 or to M2-M5, and the participants' votes are summarised in Table 11. 49% of the participants said that

• The survey asked the participants if D2 was more similar to G1-G2 or to V1-V3, and the results gathered were those in Table 12. The similarity of pale colours in D2 and V1-V3 was obvious for 90% of the participants in the survey, while 10% found that D2 was more similar to G1-G2. In this case, the high similarity in colours between the art pieces D2 can condition the criterion of the participants for classifying into groups.

Discussion
In this section, first the results obtained by the SimQCDI and the survey results are discussed.

from those on the foreground and therefore it is affected by the percentage of the most popular colour in the paintings.

In order to find out the adequacy of SimQCDI to discriminate art compositions without taking into account the background, the following proof of concept has been carried out on the art compositions in Table 13. SimQCDI has been calculated after extracting the background colour from the histogram and normalising it. Table 14 shows the results of this proof of concept, where it can be seen that the SimQCDI obtained between the art compositions is higher when the background is not considered, in the same way as the participants of the survey could automatically do.
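The background-removal step of this proof of concept can be sketched as below. The sketch assumes that the background is the single most frequent colour name in the histogram, which may not match how the background was identified in the study; `dominant_index` and `drop_background` are hypothetical helper names.

```python
def dominant_index(hist):
    """Index of the most frequent colour, taken here as the background."""
    return max(range(len(hist)), key=hist.__getitem__)


def drop_background(hist, background_index):
    """Zero one histogram entry and rescale the rest to sum to 100."""
    rest = [0.0 if i == background_index else value
            for i, value in enumerate(hist)]
    total = sum(rest)
    if total == 0:
        raise ValueError("nothing left after removing the background")
    return [100.0 * value / total for value in rest]


hist = [50.0, 30.0, 20.0]        # percentages; first colour dominates
background = dominant_index(hist)
foreground = drop_background(hist, background)
print(background, foreground)    # 0 [0.0, 60.0, 40.0]
```

After renormalisation the remaining colours keep their relative proportions, so two paintings with different backgrounds but similar foreground palettes end up with closer histograms.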

However, it is still a challenge to distinguish pixels from the background from those in

The approach presented in this paper obtains a colour model and a similarity value between colour names taking into account the spatial relational structure of the colour model selected.

To the best of our knowledge, there are no works in the literature which explore the conceptual

guishes between neighbouring colours. For the rest of the non-neighbouring colours, the given distance is the maximum (1.0); therefore the discrimination between colour names is poorer than that provided by SimQCD.

With respect to the colour naming metric, the work by Mojsilovic [16] defined a distance based on the geometric property of the HSL system, where (H, S, L) are the components of the HSL colour system, which holds:

Thus, when the saturation component is incremented by 1 unit, the distance is also incremented by 1. The same happens for lightness. Therefore, the same significance is given to a change in saturation as to a change in lightness, whereas the SimQCD colour model can be tuned to give more importance to the changes in colour saturation which determine the limit between grey colours and rainbow colours. Moreover, the distance defined by Mojsilovic [16] is not normalised; therefore a distance of 24 units obtained when calculating the similarity between two similar red colours cannot be assigned a high or low significance. In contrast, the SimQCD presented in this paper is normalised.

can be used [53], and it has been the one selected in this comparison.

The quantised RGB histogram (Figure 10 (c)) is more similar to the QCD histogram (Figure 10 (a)). However, the advantage of the QCD histogram is that the colour name (semantic information) of each colour appearing in the image is obtained, whereas the quantised RGB histogram needs further interpretation of the groups of colours obtained.
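A quantised RGB histogram and a Euclidean comparison between two such histograms can be sketched as follows. The number of levels per channel (4) and the normalisation by √2, the largest Euclidean distance between two histograms that each sum to 1, are assumptions made for illustration; they are not necessarily the settings used in this comparison.

```python
import math


def quantised_rgb_histogram(pixels, levels=4):
    """Histogram over levels**3 bins for 8-bit (r, g, b) pixels."""
    if 256 % levels != 0:
        raise ValueError("levels must divide 256")
    if not pixels:
        raise ValueError("empty image")
    step = 256 // levels
    hist = [0.0] * (levels ** 3)
    for r, g, b in pixels:
        index = ((r // step) * levels + (g // step)) * levels + (b // step)
        hist[index] += 1.0
    return [count / len(pixels) for count in hist]


def euclidean(a, b):
    """Euclidean distance between two equally sized vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def sim_rgb(hist_a, hist_b):
    """Similarity in [0, 1] under the assumed sqrt(2) normalisation."""
    return 1.0 - euclidean(hist_a, hist_b) / math.sqrt(2.0)


reds = quantised_rgb_histogram([(250, 10, 10)] * 4)
blues = quantised_rgb_histogram([(10, 10, 250)] * 4)
print(sim_rgb(reds, reds), sim_rgb(reds, blues))
```

Each bin is only an index into the quantised cube, so reading the histogram requires mapping bins back to colours, in contrast with a histogram whose entries are colour names.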

For each art composition in Figure 8, the quantised RGB colour histogram has been obtained and the Euclidean distance between these RGB histograms has been computed [53] and normalised (see the Appendix); the resulting similarity is denoted by SimRGB. Then, SimRGB and SimQCDI are compared in order to analyse which of these methods is closer to the results of the survey described previously in Section 10.4:

• When comparing the art pieces D1-D4-D5, the results obtained are shown in Table 16.

Hence, the SimQCDI is more coherent with the participants of the survey.

• When comparing the art pieces G1-G2-G3, the results obtained are those in Table 17. Considering the opinion of the participants surveyed, the most intuitive order of similarity

voting, in the same order but a bit far from the opinion of the participants in the survey.

• When comparing the art pieces V1-G2 versus V1-D4, the results were those in Table 19. In this situation, SimQCDI and SimRGB have similar performance.

• When comparing the art pieces D1-M2 versus D1-H2, the results obtained are those in

in colour than SimRGB, as the participants do.

presented and proved to name colours in a general and adaptive way by distinguishing rainbow colours, pale, light, and dark colours and colours in the grey scale. The relational structure of the QCD model is also analysed by means of a conceptual neighbourhood diagram.

A measure of similarity between colour names has also been defined taking into account the

We thank the reviewers for their comments, which helped us to improve this paper. We