v.1, n.1, p.2-18, 2018. The use of artificial intelligence in the search for structural parameters of clusters of young

A. Hetem (1), S. E. Matsuda Sampa (1), J. L. Lima Berretta (1)
(1) Federal University of ABC

ABSTRACT

Given the huge amount of data now available to astrophysicists, it is imperative to use state-of-the-art tools in the many processes involving data reduction, data transformation and generation of high-level indicators before advancing to the physical analysis. With the rise of massively parallel computing resources, it is possible to chain together the many Artificial Intelligence tools needed to model astrophysical data. We present a realization of this idea applied to a real astrophysical study: the analysis of a large sample of clusters of young stars in order to investigate the inherent properties of clustering and the dynamic evolution of stellar components. To obtain the statistical parameter Q for each cluster, the original data set must pass through a series of processes, each with its own characteristics, needs and behaviour. We present a set of results obtained with the proposed method and some perspectives for future work along this path.

Keywords: artificial intelligence; genetic algorithm; cross-entropy; physical-mathematical model; young star cluster.

INTRODUCTION

Clusters of young stars

In our Galaxy it is possible to find groups of stars that move as a unit when compared to the other stars. In studies of stellar evolution, open clusters are very important objects because their members (stars) have similar chemical composition and the same age. Other properties, such as distance, metallicity and extinction, can be determined more easily than for isolated stars. Figures 1 and 2 present examples of such objects.

When a survey of the characteristics and dimensions of groups of young stars is made, a broad spectrum of values and qualities is observed [1]. Such groups can be found both in the form of large associations of young stars and as compact concentrations of protostars embedded in regions of star formation. Taking into account the current models and processes of stellar formation [2], the need to investigate the natural connection between these various stellar group scales is evident. Most likely, all stellar formation processes are connected, despite the scales involved.


Figure 1. NGC 265: an example of an open cluster in the constellation of Tucana. One can see the almost spherical agglomeration of stars in the center of the picture. (Image Credit: [3])

Figure 2. Mel 111, also known as the Coma star cluster: an open star cluster in the constellation of Coma Berenices. (Image Credit: [4])


In previous works [5][6], a large sample of groups of young stars was studied in order to investigate the inherent properties of the stellar components. In these studies, special attention was given to the statistical parameter Q [7], measured for stellar groups, and its possible correlations with the fractal dimension estimated for the projected clouds [8]. The conclusions of these studies show that more than 50% of the studied sample has substructures (or subgroups) that, once analyzed from a statistical and geometric point of view, tend to reproduce simulations of artificial star distributions [9]. An increasing number of publications in recent years has shown that this subject is of great interest, both in the studies of galaxy star groups and in nearby galaxies [10]. As a consequence, fractal-statistical analysis tools have evolved and allowed new interpretations and modeling [11][12][13].

METHODOLOGY

This work was accomplished by combining several known methodologies: membership and kinematic determination from proper motion; age and mass from colour-magnitude diagrams and theoretical evolutionary models; validation with previous results; fractal analysis and dynamical evolution parameters; and correlation of the obtained cluster properties. Fig. 3 presents a pictorial explanation of the sequence of data acquisition, calculation and manipulation; each step is explained in the following subsections.

Selected objects

The main data relate to the stars belonging to each target cluster. Position, proper motion and parallax (and respective uncertainties) were obtained from GAIA DR2 [13]. JHK magnitudes were obtained from the 2MASS catalogue [14]. All the selected young clusters have intermediate distances of d > 2 kpc and similar angular sizes (R < 20 arcmin) for most of the objects.

Table 1 presents the fields extracted via SQL queries from the GAIA DR2 archive, including the 2MASS JHK magnitudes cross-referenced by the online search tool. Table 2 lists the young star clusters analyzed in the present work.

Cluster membership determination

The data extracted for a given cluster consist of the stars belonging to that cluster contaminated by background stars. We applied artificial intelligence to select, from the obtained data, the stars that belong to the cluster and set them apart from the field stars. A reliable way to evaluate whether a star belongs to a cluster is to consider its motion, that is, radial velocity and/or proper motion. The motions of cluster stars follow the velocity components of the cluster's center of mass, because the members are gravitationally bound. A usual reference on this subject is [15], who proposed a method for identifying cluster stars from their proper motions, modelled in a maximum-likelihood framework.

The cross-entropy (CE) technique was first introduced by [16] and later modified by the same author [17] to deal with discrete combinatorial and continuous multi-extremal optimization problems. The CE method aims to estimate the probabilities of rare events in complex stochastic networks. It is asymptotically


Figure 3. Flowchart of the methods: 1) Membership and kinematic properties from proper motion data, using a Bayesian distribution model whose parameters are obtained by cross-entropy and/or genetic algorithms. 2) Age and mass from JHK unreddened magnitudes and theoretical evolutionary models. 3) Method validation by comparison with previous results. 4) Fractal analysis and dynamical evolution parameters. 5) Correlation of cluster properties (age, mass, crossing time, tidal radius, fractal parameters, etc.).

Table 1. Fields extracted from GAIA DR2 catalog.

GAIA DR2 field            description
------------------------  -------------------------------------------------
Gaia2                     Unique source designation
ext_source_id             2MASS Catalogue source identifier
ra [deg]                  Right ascension
ra_error [mas]            Standard error of ra
dec [deg]                 Declination
dec_error [mas]           Standard error of dec
parallax [mas]            Parallax
parallax_error [mas]      Standard error of parallax
pmra [mas/yr]             Proper motion in ra direction
pmra_error [mas/yr]       Standard error of proper motion in ra direction
pmdec [mas/yr]            Proper motion in dec direction
pmdec_error [mas/yr]      Standard error of proper motion in dec direction
Qflag                     Photometric quality flag
j_m [mag]                 Default J-band magnitude
h_m [mag]                 Default H-band magnitude
ks_m [mag]                Default Ks-band magnitude


Table 2. Sample analysed in the present work, with information from the literature. Non-evident columns are: NT is the number of member stars; n is the density (stars per square parsec); R is the radius of the cluster; and rc is the core radius.

cluster         NT    age (Myr)  d (pc)  n (pc^-2)  R (pc)  rc (pc)
--------------  ----  ---------  ------  ---------  ------  -------
Berkeley 86       78      3       1600     1.5       4        -
Collinder 205    114      3       1900     1.5       4.9      0.3
Hogg 10           21      2       2300     0.9       2.7      0.46
Hogg 22           28      3       1700     1.5       2.5      1.9
Lynga 14          10      5       1000     4.6       0.8      0.22
Markarian 38      23      3       1600     2.5       1.7      0.13
NGC 2244         292      3       1600     1.8       7.1      -
NGC 2264         337      3        740     3.4       5.6      -
NGC 2302          17      9       1600     1.6       1.9      0.54
NGC 2362          96      4       1400     8         2        0.26
NGC 2367          32      3       2200     1.4       2.7      0.34
NGC 2645          71      5       1900     2.3       3.2      0.22
NGC 2659         200      4       2000     2.2       5.3      1.93
NGC 3572          50      3       2000     1.9       5.3      0.16
NGC 3590          19      8       1700     1.1       3.7      0.29
NGC 5606          54      5       2400     1         4        0.51
NGC 6178          38      2       1400     0.8       2.7      0.3
NGC 6530          75      1       1300     0.7       5        -
NGC 6604          58      3       1700     1.9       2.5      0.5
NGC 6613          66      4       1600     3         2.8      0.12
Ruprecht 79       71      3       2700     1.9       3.1      2.2
Stock 13          16      3       2000     1.9       3.3      0.11
Stock 16          82      2       2000     0.8       5.4      1.2
Trumpler 18      144      3       2800     0.6       3        0.38
Trumpler 28        9      2       1100     1.2       4.7      0.84

Observation: A parsec (symbol pc) corresponds to 3.26156 light-years or 3.0857 × 10^16 m.

convergent, as demonstrated in the works of [18], and efficient in solving continuous multi-extremal optimization problems [19]. An extended list of applications of the CE method and its formalism is presented in [20].

Cross-Entropy method

In this work, we applied the CE global optimization procedure to fit the observed distribution of proper motions and to obtain the probability that a given star belongs to the cluster. Being a stochastic method, the CE technique behaves like a simple, general adaptive method for estimating optimal parameter values, and it does not require deep prior knowledge of the model to be built into the implementation.

The iterative CE method consists of the following steps. Initially, the first generation is established as sets of parameters chosen randomly according to some pre-defined criteria, which depend on the problem being analyzed. The physical/mathematical model that rules the problem is applied to these sets. At this step some sets turn out to be better candidates than others, and they are selected as the seeds of the next iteration. The procedure is repeated until a maximum number of iterations is reached or a prespecified stopping criterion is fulfilled.

Formally, the continuous multi-extremal optimization performed by the CE method is to find a set of parameters $\vec{\Psi}^{*}_i$ ($i = 1$ to $m$), with $m$ the number of parameters to be found, for which the model provides the best description of the data. Before the iteration loop starts, $N$ independent sets of parameters are randomly generated, $\Xi = (\vec{\Psi}_1, \vec{\Psi}_2, ..., \vec{\Psi}_N)$, where $\vec{\Psi} = (x_1, x_2, ..., x_m)$ and $x_i$ represents parameter number $i$. $\Gamma(\Xi)$ is the objective function that measures the quality of the fit during the procedure, and its behavior must be such that, when the method converges to the exact solution, one observes $\Gamma \to 0$ and then $\vec{\Psi} \to \vec{\Psi}^{*}$.

The starting set $\Xi_0$ is obtained by choosing $(x_1, x_2, ..., x_m)$ with each $x_i$ in the range $\rho_{x_i} = \{\xi^{min}_{x_i}, \xi^{max}_{x_i}\}$, the extremes of the possible values of $x_i$, supposing a normal distribution for each $i$th parameter. The regular iterative loop starts by evaluating $\Gamma$ for each $\vec{\Psi}$ and establishing $\Xi_E$, a subset of $\Xi$ with the $N_E$ candidates of lowest $\Gamma$, named elite candidates. Then $\mu_{x_i,k}$ and $\sigma_{x_i,k}$, the mean and standard deviation of $x_i$ over $\Xi_E$ at the $k$th iteration, are evaluated, and these values lead to a new $\rho_{x_i,k+1}$ used to generate $\vec{\Psi}_{j,k+1}$.

As observed by [21], the CE method as described has an intrinsically rapid (exponential) convergence, which can lead to a sub-optimal solution at a local, non-global minimum. To prevent this and to achieve a polynomial speed of convergence, we implemented a fixed smoothing scheme for $\mu_{x_i,k}$ and $\sigma_{x_i,k}$, following the suggestions of [22]:

$$\hat{\mu}_{x_i,k} = a_0\,\mu_{x_i,k} + (1 - a_0)\,\mu_{x_i,k-1} \qquad (1)$$

and

$$\hat{\sigma}_{x_i,k} = b(k)\,\sigma_{x_i,k} + [1 - b(k)]\,\sigma_{x_i,k-1} \qquad (2)$$

where $a_0$ is a constant smoothing parameter ($0 < a_0 < 1$) and $b(k)$ is a dynamic smoothing parameter at the $k$th iteration, given by $b(k) = a - a(1 - 1/k)^q$, with $0 < a < 1$ and $q$ an integer typically between 5 and 10.
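The CE loop with the smoothing of Eqs. (1) and (2) can be sketched as follows. This is a minimal illustration, not the implementation used in this work: the function name `cross_entropy_minimize`, the default values of `n_samples`, `n_elite`, `a0`, `a` and `q`, the clipping of candidates to the allowed ranges, and the choice of taking the previous iteration's sampling parameters as the $k-1$ terms are all assumptions.

```python
import math
import random


def cross_entropy_minimize(gamma, bounds, n_samples=200, n_elite=20,
                           a0=0.7, a=0.8, q=5, n_iter=60, seed=0):
    """Minimise gamma(params) with a Gaussian CE loop and smoothed updates.

    bounds is a list of (min, max) pairs, one per parameter.
    All default values are illustrative assumptions.
    """
    rng = random.Random(seed)
    # Initial sampling distribution: centred in each range and broad enough to cover it.
    mu = [(lo + hi) / 2.0 for lo, hi in bounds]
    sigma = [(hi - lo) / 2.0 for lo, hi in bounds]
    for k in range(1, n_iter + 1):
        # Draw N candidate parameter sets, clipped to the allowed ranges.
        samples = [
            [min(max(rng.gauss(mu[i], sigma[i]), lo), hi)
             for i, (lo, hi) in enumerate(bounds)]
            for _ in range(n_samples)
        ]
        # Elite set: the N_E candidates with the lowest objective value.
        elite = sorted(samples, key=gamma)[:n_elite]
        b_k = a - a * (1.0 - 1.0 / k) ** q  # dynamic smoothing parameter b(k)
        for i in range(len(bounds)):
            values = [candidate[i] for candidate in elite]
            mean_i = sum(values) / n_elite
            std_i = math.sqrt(sum((v - mean_i) ** 2 for v in values) / n_elite)
            # Eqs. (1) and (2): blend elite statistics with the previous iteration
            # (assumption: the previous values are the ones used for sampling).
            mu[i] = a0 * mean_i + (1.0 - a0) * mu[i]
            sigma[i] = b_k * std_i + (1.0 - b_k) * sigma[i]
    return mu
```

Because $b(k)$ decreases with $k$, the standard deviations shrink more and more slowly, which is what keeps the search from collapsing too early onto a local minimum.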

Likelihood Model

As cited, we start from the hypothesis that observational data obtained in the vicinity of the (previously known) center of the cluster contain stars of the field and of the cluster itself. The segregation model relies on the high-accuracy proper motions of cluster and field stars available in the GAIA DR2 catalog.

As stated by [23], it is possible to carry out a segregation procedure on the basis of a mixed bivariate density function model for proper motions, considering that the cluster and the field are independent. A mixed multivariate density function for a set of parameters is given by

$$\Phi(\vec{x}) = \frac{1}{(2\pi)^{m/2}\,|\Sigma|^{1/2}} \exp\left\{-\frac{1}{2}\left[(\vec{x} - \vec{\mu})^{T} \cdot \Sigma^{-1} (\vec{x} - \vec{\mu})\right]\right\} \qquad (3)$$

where $\vec{x} = (x_1, x_2, ..., x_m)$ is the vector of parameters, the vector $\vec{\mu}$ contains the averages of the parameters, and $\Sigma$ is the covariance matrix with elements $\sigma_i^2$ if $i = j$ and $\rho_{ij} = \rho_{ji}$ if $i \neq j$, where $\sigma_i$ is the standard deviation of parameter $i$ and $\rho_{ij}$ is the correlation coefficient of the normal bivariate distribution between parameters $x_i$ and $x_j$. This mixed density satisfies the normalization relation

$$\int_{1}\int_{2} \cdots \int_{m} \Phi(x_1, x_2, ..., x_m)\,dx_1\,dx_2 \cdots dx_m = 1 \qquad (4)$$

We followed the formalism presented by [24], so $\Phi_{c,i}(\mu_{x_i}, \mu_{y_i})$ and $\Phi_{f,i}(\mu_{x_i}, \mu_{y_i})$ denote the probability density function (PDF) of star $i$ with respect to the cluster and to the field, respectively, and the total PDF for star $i$ is $\Phi_i = \Phi_{c,i} + \Phi_{f,i}$. The parameter vector to be fit is $\{\mu_{x,c}, \mu_{y,c}, \sigma_{x,c}, \sigma_{y,c}, \rho_c, \mu_{x,f}, \mu_{y,f}, \sigma_{x,f}, \sigma_{y,f}, \rho_f, n_c\}$, with $n_c$ and $n_f$ the normalized fractions of stars belonging to the cluster and to the field, respectively. The correlation coefficients between $\mu_x$ and $\mu_y$ for the cluster and the field are denoted by $\rho_c$ and $\rho_f$. For the proper calculation of deviates, one should consider the expressions $\sqrt{\sigma_{x,c}^2 + e_{x_i}^2}$ and $\sqrt{\sigma_{y,c}^2 + e_{y_i}^2}$, where $e_{x_i}$ and $e_{y_i}$ are the formal observational errors in the proper motion.
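As an illustration of how the mixture above yields a per-star membership probability, the sketch below evaluates the two bivariate components for one star. Several points are assumptions rather than statements of this work: the components are weighted by $n_c$ and $n_f = 1 - n_c$, the membership probability is taken as $\Phi_{c,i}/\Phi_i$, the observational errors broaden the field dispersions as well as the cluster ones, and the function and argument names are hypothetical.

```python
import math


def bivariate_normal(x, y, mx, my, sx, sy, rho):
    """Bivariate normal PDF, i.e. Eq. (3) with m = 2 and correlation rho."""
    zx = (x - mx) / sx
    zy = (y - my) / sy
    one_minus_r2 = 1.0 - rho * rho
    norm = 2.0 * math.pi * sx * sy * math.sqrt(one_minus_r2)
    expo = -(zx * zx - 2.0 * rho * zx * zy + zy * zy) / (2.0 * one_minus_r2)
    return math.exp(expo) / norm


def membership_probability(pmx, pmy, ex, ey, cluster, field, n_c):
    """Probability that a star with proper motion (pmx, pmy) belongs to the cluster.

    cluster and field are tuples (mu_x, mu_y, sigma_x, sigma_y, rho).
    ex and ey are the formal errors of the star's proper motion.
    """
    def component(params):
        mx, my, sx, sy, rho = params
        # Dispersions broadened by the observational errors of this star.
        sx_eff = math.sqrt(sx * sx + ex * ex)
        sy_eff = math.sqrt(sy * sy + ey * ey)
        return bivariate_normal(pmx, pmy, mx, my, sx_eff, sy_eff, rho)

    phi_c = n_c * component(cluster)          # assumed weighting by n_c
    phi_f = (1.0 - n_c) * component(field)    # assumed n_f = 1 - n_c
    return phi_c / (phi_c + phi_f)
```

A star whose proper motion sits at the cluster mean receives a probability close to 1 when the cluster dispersion is much narrower than the field dispersion, which is the situation visible as a compact core in proper-motion space.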

Genetic Algorithm method

As the CE method is very sensitive to initial conditions, a tool is needed to act before it and provide initial values for the parameters. We chose the Genetic Algorithm (GA) method, as we have used it in previous works.

GAs are a family of computer models inspired by natural evolution. Their basis is to assume a potential solution for a specific problem, viewed as a chromosome-like structure, to which genetic operators (mutation, crossover, adaptation and evolution) are applied. The use of this technique simplifies the formulation and solution of optimisation problems, and a parallel, simultaneous approach is implicit in the method, which evaluates the viability of a parameter set as a possible solution for complex problems [25][26][27].

Considering that the GA nomenclature can seem out of place in the astrophysical context, we first present the translation from one field to the other. A parameter (e.g. mean velocity) corresponds to the concept of a “gene”, and a change in a parameter is a “mutation”. A parameter set that yields a possible solution corresponds to a “chromosome”. An “individual” is a solution, composed of one parameter set and two additional GA control variables. One of these variables is χ2, which represents the “adaptation” level. The term “generation” means “all the individuals” (or all the solutions) present in a given iteration.

Essentially, the GA method presented herein implements a χ2 minimisation of the PDF fitting provided by the Bayesian model. There are three main advantages in using GA for this task: (i) the GA method potentially browses the whole permitted parameter space, better avoiding the “traps” of local minima; (ii) the method is not affected by changes in the model; (iii) the GA implementation does not need to compute the derivatives of χ2 (like ∂χ2/∂RD, for example) required by the usual methods. This fact simplifies the code and minimises numerical errors due to gradient calculations. A full explanation of this technique is given in [28].
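A derivative-free χ2 minimisation of this kind can be sketched in a few lines. The version below is only a generic illustration under stated assumptions: the selection rule (keep the better half), the uniform crossover, the mutation step of 10% of each parameter range, and the name `genetic_minimize` are all hypothetical choices, not the operators of the code used in this work.

```python
import random


def genetic_minimize(chi2, bounds, pop_size=60, n_gen=80,
                     mutation_rate=0.2, seed=1):
    """Minimise chi2(params) with a small, derivative-free genetic algorithm.

    bounds is a list of (min, max) pairs, one per gene (parameter).
    """
    rng = random.Random(seed)
    # First generation: individuals drawn uniformly inside the allowed ranges.
    population = [[rng.uniform(lo, hi) for lo, hi in bounds]
                  for _ in range(pop_size)]
    for _ in range(n_gen):
        # Adaptation: rank individuals by chi2 (lower is fitter), keep the best half.
        population.sort(key=chi2)
        parents = population[: pop_size // 2]
        children = []
        while len(parents) + len(children) < pop_size:
            mother, father = rng.sample(parents, 2)
            # Crossover: each gene is copied from one of the two parents.
            child = [rng.choice(pair) for pair in zip(mother, father)]
            # Mutation: occasionally perturb a gene, staying inside its range.
            for i, (lo, hi) in enumerate(bounds):
                if rng.random() < mutation_rate:
                    step = rng.gauss(0.0, 0.1 * (hi - lo))
                    child[i] = min(max(child[i] + step, lo), hi)
            children.append(child)
        population = parents + children
    return min(population, key=chi2)
```

Only values of χ2 are compared, never its gradient, so the model inside `chi2` can be replaced without touching the optimiser. The returned individual is a natural starting point for a subsequent CE refinement.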


Finding Age and Mass

In the recent literature, it is common to find authors who estimate ages and masses of stars through interpolation on isochrones and evolutionary tracks. A simple visual inspection can bring interesting results, but this practice should be avoided because of its intrinsic subjectivity. We constructed a simple interpolation method based on algebraic fitting inside triangles.

The ages and masses of the stars are estimated from the intrinsic colours in a colour-magnitude diagram (CMD), with the distribution of observed colours expressed by $\sigma_j[x_j, y_j]$, the position occupied by star $j$ in the CMD, with $j = 1...m$. In this case, $m$ is the total number of cluster members. We defined a set of curve points, $Z_k[mag_i, cor_i]$, from the $k$th unreddened theoretical Siess model [29] in the magnitude vs. colour plane. The procedure first looks for an existing triangle $T_j$ in $Z$ space that contains a given star $j$, with coordinates $(mag_j, cor_j)$.

If $T_j$ exists, the point $(mag_j, cor_j)$ lies in a region between two isochrones and two evolutionary tracks, and from the corners of $T_j$ we establish a Jacobian transformation matrix to perform the translation $(mag_j, cor_j) \to (age_j, mass_j)$. The same procedure is used to determine the uncertainties in age and mass. Figure 4 presents the result of applying this methodology.
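One way to picture the interpolation inside a triangle is through barycentric weights, which give the same affine mapping as a transformation matrix built from the three corners. The sketch below is an illustration only: the function names and the layout of the `corners` argument are assumptions, and it is not claimed to reproduce the exact algebra of the code used here.

```python
def barycentric_weights(p, a, b, c):
    """Weights (wa, wb, wc) of point p with respect to the triangle a, b, c."""
    det = (b[1] - c[1]) * (a[0] - c[0]) + (c[0] - b[0]) * (a[1] - c[1])
    if det == 0.0:
        raise ValueError("degenerate triangle")
    wa = ((b[1] - c[1]) * (p[0] - c[0]) + (c[0] - b[0]) * (p[1] - c[1])) / det
    wb = ((c[1] - a[1]) * (p[0] - c[0]) + (a[0] - c[0]) * (p[1] - c[1])) / det
    return wa, wb, 1.0 - wa - wb


def interpolate_age_mass(star, corners):
    """Interpolate (age, mass) for a star at CMD position star = (colour, mag).

    corners holds three pairs ((colour, mag), (age, mass)) taken from the
    model grid. Returns None when the star lies outside the triangle.
    """
    (pa, va), (pb, vb), (pc, vc) = corners
    wa, wb, wc = barycentric_weights(star, pa, pb, pc)
    if min(wa, wb, wc) < 0.0:
        return None  # not inside this triangle: try another one
    age = wa * va[0] + wb * vb[0] + wc * vc[0]
    mass = wa * va[1] + wb * vb[1] + wc * vc[1]
    return age, mass
```

The sign test on the weights doubles as the point-in-triangle check, so the search for $T_j$ and the interpolation can share the same computation.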

As we suppose a normal distribution for age and mass, the final step is to fit a Gaussian on the age and mass histograms, resulting in an average age and mass for the cluster and their deviates.

The Q parameter of the clusters

In the technique proposed by [7], the fractal parameter Q is related to the geometrical structure of the point distribution and statistically quantifies the fractal substructures. Studies of the hierarchical structure in young clusters have used the Q parameter to distinguish fragmented from smooth distributions [30].

Two parameters are involved in the Q estimation: m, the mean edge length, which is related to the surface density of the points distribution, and s which is the mean separation of the points. Defining Q = m/s, distributions with large-scale radial clustering, which causes more variation on s than m, are expected to have Q > 0.8. However, Q < 0.8 is indicative of small-scale fractal subclustering, where the variation in m is larger than in s.

The parameter m is given by

$$m = \frac{1}{(A_N N)^{1/2}} \sum_{i=1}^{N-1} m_i \qquad (5)$$

where $N$ is the total number of considered points, $m_i$ is the length of edge $i$ in the minimum spanning tree (described below) and $A_N$ is the area of the smallest circle or convex hull that contains all points projected on the plane of the cluster. Each smallest circle was determined by adopting the algorithm proposed by [31].

The minimal spanning tree is defined as the unique network of straight lines that can connect a set of points


Figure 4. $M_{J_0}$ versus $(H - K)_0$ diagram from 2MASS magnitudes for cluster NGC 2367. Continuous lines represent the isochrones (age in $10^6$ years) and dashed lines are mass evolutionary models (in solar masses).

without closed loops, such that the sum of the lengths of these lines (or edges) is the minimum possible. To construct the minimum spanning tree, we used the algorithm given by [32]. Figure 5 gives examples of the smallest circle, the convex hull and the respective spanning tree, which were constructed using the method described by [33] in order to obtain m for each cluster.

The value of s is given by

$$s = \frac{2}{N(N-1)R_N} \sum_{i=1}^{N-1} \sum_{j=1+i}^{N} |\vec{r}_i - \vec{r}_j| \qquad (6)$$

where ri is the vector position of point i, and RN is the radius of the smallest circle or convex hull that contains all points.
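Equations (5) and (6) can be evaluated directly from the projected positions. The sketch below is a simplified illustration with two stated substitutions: the minimum spanning tree is built with Prim's algorithm instead of the algorithm of [32], and $R_N$ is approximated by the distance from the mean position to the farthest point (with $A_N = \pi R_N^2$) instead of the smallest enclosing circle or convex hull. The function names are hypothetical.

```python
import math


def mst_edge_lengths(points):
    """Edge lengths of the minimum spanning tree (Prim's algorithm, O(N^2))."""
    n = len(points)
    in_tree = [False] * n
    in_tree[0] = True
    # Cheapest known link from the growing tree to every other point.
    best = [math.dist(points[0], p) for p in points]
    edges = []
    for _ in range(n - 1):
        u = min((i for i in range(n) if not in_tree[i]), key=best.__getitem__)
        in_tree[u] = True
        edges.append(best[u])
        for v in range(n):
            if not in_tree[v]:
                best[v] = min(best[v], math.dist(points[u], points[v]))
    return edges


def q_parameter(points):
    """Return (Q, m, s) for a list of (x, y) positions."""
    n = len(points)
    cx = sum(p[0] for p in points) / n
    cy = sum(p[1] for p in points) / n
    # Assumption: R_N is the distance from the mean position to the farthest point.
    radius = max(math.dist(p, (cx, cy)) for p in points)
    area = math.pi * radius ** 2
    m_bar = sum(mst_edge_lengths(points)) / math.sqrt(area * n)   # Eq. (5)
    pair_sum = sum(math.dist(points[i], points[j])
                   for i in range(n - 1) for j in range(i + 1, n))
    s_bar = 2.0 * pair_sum / (n * (n - 1) * radius)                # Eq. (6)
    return m_bar / s_bar, m_bar, s_bar
```

Both m and s are normalised by the cluster size, so Q does not depend on the distance scale of the input coordinates.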

Correction of elongation

Some authors suggest refining the definition of the cluster area by using the normalised convex hull rather than a circular or rectangular area around the objects.

It is not possible to be sure all clusters present a circular distribution, since many of the model clusters are strongly elongated, causing a large difference in the cluster area depending on whether it is defined by the enclosing circle or by the normalised convex hull. So, we calculated the elongation measure ξ as the ratio

$$\xi = \frac{R_{circ}}{R_{hull}} \qquad (7)$$

proposed by [34]. A value ξ ≈ 1 indicates that the distribution is almost circular; the distribution is more and more elongated as ξ increases.

In order to find the convex hull and its properties (area, equivalent radius, etc) of our clusters, we followed the algorithm proposed by [35]. The equivalent radius of a circle with the same area of the obtained convex hull replaces RN in expression 6. Fig. 5 presents a representation of both minimal circle and convex hull for cluster NGC 2264 as an example of application of the techniques explained above.
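The elongation measure of Eq. (7) can be illustrated with a short convex hull computation. In this sketch the hull is built with Andrew's monotone chain rather than the algorithm of [35], the area comes from the shoelace formula, and the radius of the enclosing circle is passed in as the argument `r_circ` instead of being computed. All function names are hypothetical.

```python
import math


def convex_hull(points):
    """Andrew's monotone chain; returns hull vertices in counter-clockwise order."""
    pts = sorted(set(points))
    if len(pts) < 3:
        return pts

    def cross(o, a, b):
        return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])

    def half_hull(sequence):
        hull = []
        for p in sequence:
            # Drop the last vertex while it would make a clockwise or straight turn.
            while len(hull) >= 2 and cross(hull[-2], hull[-1], p) <= 0:
                hull.pop()
            hull.append(p)
        return hull[:-1]  # the last point starts the other half

    return half_hull(pts) + half_hull(reversed(pts))


def polygon_area(vertices):
    """Area of a simple polygon from the shoelace formula."""
    total = 0.0
    for (x1, y1), (x2, y2) in zip(vertices, vertices[1:] + vertices[:1]):
        total += x1 * y2 - x2 * y1
    return abs(total) / 2.0


def elongation(points, r_circ):
    """Eq. (7): ratio of the enclosing-circle radius to the hull equivalent radius."""
    r_hull = math.sqrt(polygon_area(convex_hull(points)) / math.pi)
    return r_circ / r_hull
```

The quantity `r_hull` is the equivalent radius mentioned above, i.e. the radius of a circle with the same area as the convex hull.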

Dynamical parameters

Along with the cluster parameters described above, some other quantities are calculated. The description of each one is given in the following paragraphs.

Using artificial distributions of points, [9] performed a statistical analysis that gives inferences on the fractal dimension measured on grey-scale images generated by models of star clusters and clouds. They used data sets varying from 64 to 65536 points, which give a parameter space of Q as a function of radial profiles (index α) and fractal dimension (D3) that is similar to, but larger than, the calculations performed by [7] and [36], for instance.


Figure 5. Spanning tree, minimal circle and convex hull for cluster NGC 2264.

Rather than the cluster radius available in the literature, we adopt the effective radius resulting from the smallest circle/convex hull calculations (Subsec. ). The position and distance are obtained from the average of the respective properties of cluster members with membership probability > 50%. The age of the cluster is the average of the ages of its stars, and the mass of the cluster is the sum of their masses.

The crossing times of the clusters were estimated from the expression $T_{cr} = 10\,(R^3/GM)^{1/2}$, where $M$ and $R$ are the total mass and the cluster radius. [37] suggested there is a boundary that distinguishes stellar groups under different dynamical conditions, which is expressed by the dynamical age, also called parameter Π, given by the ratio of age and crossing time. Unbound associations (expanding objects) have Π < 1, while bound star clusters have Π > 1.
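As a worked illustration of these two quantities, the sketch below evaluates the crossing time in Myr for a radius in parsecs and a mass in solar masses. The unit choices, the constant names and the example values are assumptions made for the illustration; they are not taken from the pipeline of this work.

```python
import math

# Gravitational constant in pc (km/s)^2 / Msun.
G_PC_KMS2_PER_MSUN = 4.30091e-3
# One parsec divided by one km/s, expressed in Myr.
MYR_PER_PC_PER_KMS = 0.9778


def crossing_time_myr(radius_pc, mass_msun):
    """T_cr = 10 (R^3 / G M)^(1/2), returned in Myr."""
    t_code_units = 10.0 * math.sqrt(radius_pc ** 3 / (G_PC_KMS2_PER_MSUN * mass_msun))
    return t_code_units * MYR_PER_PC_PER_KMS


def dynamical_age(age_myr, t_cr_myr):
    """Parameter Pi: ratio of age and crossing time."""
    return age_myr / t_cr_myr
```

For example, a hypothetical cluster with R = 3 pc and M = 100 solar masses has a crossing time of roughly 77 Myr under these unit conventions; with an age of a few Myr this gives Π well below 1, the regime of unbound associations.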

DEVELOPMENT

Figure 6 presents an example of the first result of the AI pipeline. Each star is assigned a membership probability, which expresses the degree of certainty of its association with the cluster. The membership probabilities are calculated by the AI procedures (GA and CE), and the subsequent calculations use this value to recognize the star as a cluster member. Figure 7 presents another way of confirming the selection, by comparing one of the components of proper motion with the parallax (distance) of each star.

The mass and age results are exemplified in Figures 8 and 9. The histograms reflect the distributions of age and mass found as described in the subsections above. Only the trusted membership stars are considered in these calculations.


Figure 6. Results from GA for cluster NGC 2645. The left panel shows the stars in their original (GAIA) positions (units: degrees). The right panel shows the velocities of each star (units: milliarcseconds per year). In both panels, black dots are stars excluded from the analysis for some reason (bad error bars or erroneous data); the other dots are coloured according to their membership probability (colour scale at right) as established by the AI procedures. The "green core" in the right panel is composed of the trusted membership stars.

Figure 7. Proper motion and parallax for cluster NGC 2645. The velocity in right ascension is plotted against the parallax (distance) for the stars. The trusted membership stars (green dots) stand out as a vertical line near (units: milliarcseconds per year and ).

FINDINGS AND DISCUSSION

As this is an ongoing work, we do not have a definitive set of results. However, the scenario obtained up to the date of preparation of this document gives us some assurance that the presented values are very close to what is expected of this line of research.

Table 3 presents an overall view of the present results. These values are better appreciated when plotted. Figure 10 presents the behaviour of m and s, superimposed on the region established by [38] in their Fig. 4. These authors generated a huge amount of artificial star clusters using several techniques: box-fractal model, radial density profile model, and fractional Brownian motion model. The different models define some regions on the m × s plane and, as they are generated with known parameters, these regions can be used to


Figure 8. Mass distribution for cluster NGC 2645. This histogram presents the distribution of the masses of the stars. The black solid line represents the normal distribution, with its parameters at the top (units: solar masses).

Figure 9. Age distribution for cluster NGC 3572. This histogram presents the distribution of the ages as found by the AI routines. The black solid lines represent the best fit for normal distributions. In this case there is an interesting result: two different Gaussians are needed to fit the data (see the Discussion Section).

Table 3. Overall view of the present results. Columns are: Q is the fractal parameter; m is the mean edge length; s is the mean separation of the points; n is the star density (calculated); Tcr is the crossing time; and Π is the dynamical age.

cluster         Q               m               s               age (Myr)  n (pc^-2)  Tcr (Myr)  Π
--------------  --------------  --------------  --------------  ---------  ---------  ---------  -----
Berkeley 86     0.737 ± 0.003   0.699 ± 0.003   0.948 ± 0.001      9.8       1.502      111      0.088
Collinder 205   0.823 ± 0.003   0.665 ± 0.005   0.808 ± 0.005      8.3       3.125       84      0.099
Hogg 10         0.781 ± 0.002   0.639 ± 0.003   0.818 ± 0.002      3.7       3.397       49      0.075
Hogg 22         0.737 ± 0.002   0.676 ± 0.003   0.917 ± 0.005      3.8       3.198       60      0.064
Lynga 14        0.835 ± 0.008   0.779 ± 0.011   0.933 ± 0.017     10.2       4.352       59      0.172
Markarian 38    0.787 ± 0.002   0.684 ± 0.002   0.869 ± 0.001      3.8       4.795       46      0.083
NGC 2244        0.781 ± 0.001   0.640 ± 0.001   0.819 ± 0.000      6.3       2.441      151      0.041
NGC 2264        0.819 ± 0.001   0.575 ± 0.001   0.702 ± 0.000      6.2       4.827      108      0.057
NGC 2302        0.814 ± 0.002   0.711 ± 0.002   0.874 ± 0.001      8.5       1.951       78      0.109
NGC 2362        0.827 ± 0.001   0.622 ± 0.001   0.753 ± 0.001      5.0       9.633       31      0.163
NGC 2367        0.800 ± 0.001   0.665 ± 0.001   0.831 ± 0.001      5.7       1.879      104      0.055
NGC 2645        0.812 ± 0.002   0.635 ± 0.002   0.782 ± 0.002      6.9       3.393       57      0.121
NGC 2659        0.823 ± 0.002   0.670 ± 0.002   0.814 ± 0.002      5.1       2.925       80      0.064
NGC 3572        0.787 ± 0.002   0.675 ± 0.002   0.858 ± 0.002      7.3       2.784       88      0.083
NGC 3590        0.665 ± 0.002   0.747 ± 0.002   1.124 ± 0.003      5.2       2.194       66      0.079
NGC 5606        0.753 ± 0.002   0.702 ± 0.003   0.933 ± 0.004      5.2       1.413       96      0.054
NGC 6178        0.748 ± 0.003   0.752 ± 0.004   1.004 ± 0.005      3.7       3.326       78      0.047
NGC 6530        0.828 ± 0.002   0.635 ± 0.002   0.766 ± 0.001      1.9       6.365       53      0.035
NGC 6604        0.777 ± 0.003   0.679 ± 0.005   0.873 ± 0.004      3.6       3.349       72      0.050
NGC 6613        0.831 ± 0.003   0.682 ± 0.003   0.820 ± 0.001      5.5       3.534       70      0.079
Ruprecht 79     0.762 ± 0.001   0.668 ± 0.002   0.877 ± 0.001      2.6       1.664      122      0.021
Stock 13        0.781 ± 0.002   0.739 ± 0.003   0.947 ± 0.003      5.5       1.555      121      0.045
Stock 16        0.710 ± 0.002   0.649 ± 0.002   0.913 ± 0.002      3.4       2.346      105      0.033
Trumpler 18     0.720 ± 0.001   0.689 ± 0.002   0.957 ± 0.001      4.1       1.217      146      0.028
Trumpler 28     0.792 ± 0.002   0.637 ± 0.002   0.804 ± 0.002      2.5       3.796       43      0.059

"measure" parameters of real (observed) clusters.

As can be seen in Fig. 10, the (m, s) points of some clusters fall outside the grey region, despite the fact that their Q = m/s values are very close to those of their neighbours. Motivated by this result, we decided to carry out an extensive (not yet finalized) study of how m and s react to variations of some of the other parameters.

The parameter with the most significant effect is the number of members of the cluster. We conducted an experiment on the NGC 2244 cluster, on which we imposed a distance variation such that the fainter stars would no longer be detected by the GAIA sensors. Thus, as the cluster was moved farther away, fewer stars were visible, and we measured the fractal parameters corresponding to each distance. Figure 11 illustrates this. As the number of stars decreases, there are minimal variations in Q, m and s while N remains greater than 100. For N < 100, the perturbations grow larger and larger until the fractal parameters lose their meaning.

CONCLUSIONS

The use of artificial intelligence routines proved to be very effective in the calculation and identification of member stars in each cluster. In particular, the Genetic Algorithm and Cross-Entropy techniques play an important role in accurately separating the stars belonging to each cluster from the background stars.

On the other hand, it was also evident the need to accurately evaluate the total number of stars in the cluster,


Figure 10. m × s for the sample, superimposed on the region established by [Lomax et al. 2018] in their Fig. 4. This region is the locus of artificial star clusters with fractional Brownian motion, whose generating parameters are known. The continuous line represents Q = 0.8, separating the region of small-scale fractal subclustering from that of large-scale radial clustering. The dashed lines are ±10% ranges.

Figure 11. Behaviour of Q, m and s with the number of members.

since N < 100 can lead to degenerate fractals, although the value of Q remains consistent.

This research continues to advance, this being an initial yet consistent result. The next steps are to increase the number of clusters and improve our technique of choosing member stars.

ACKNOWLEDGEMENTS

This work was supported by FAPESP Proc. No. 2017/19458-8 (AHJ). This work has made use of data from the European Space Agency (ESA) mission Gaia (https://www.cosmos.esa.int/gaia), processed by the Gaia Data Processing and Analysis Consortium (DPAC, https://www.cosmos.esa.int/web/gaia/dpac/consortium). Funding for the DPAC has been provided by national institutions, in particular the institutions participating in the Gaia Multilateral Agreement.

REFERENCES

[1] Lada, C. J., & Lada, E. A. 2003, Annual Review of Astronomy and Astrophysics, 41, 57
[2] Elmegreen, B. G., 2011, EAS Publications Series, 51, 31
[3] E. Olszewski (2006) ESA and NASA - University of Arizona - http://spacetelescope.org/images/heic0603b/
[4] Donald R. Pettit/NASA (2003) - International Space Station Imagery. https://spaceflight.nasa.gov/gallery/images/station/crew-6/html/iss006e40537.html
[5] Gregorio-Hetem, J., Hetem, A., Santos-Silva, T., Fernandes, B., 2015, Monthly Notices of the Royal Astronomical Society, v. 448, p. 2504:2513
[6] Fernandes, B., Gregorio-Hetem, J., Hetem, A. 2012, Astronomy & Astrophysics, 541, A95
[7] Cartwright, A., & Whitworth, A. P., 2004, Monthly Notices of the Royal Astronomical Society, 348, 589:598
[8] Hetem, A. & Lépine, J. R. D., 1993, Astronomy & Astrophysics, 270, 451
[9] Lomax, O., Whitworth, P., & Cartwright, A. 2011, Monthly Notices of the Royal Astronomical Society, 412:627
[10] Davidge, T. J., 2017, Astrophysical Journal, 837-178
[11] Parker, R. J., & Dale, J. E., 2015, Monthly Notices of the Royal Astronomical Society, 451, 3664:3670
[12] Alfaro, E. J., & González, M., 2016, Monthly Notices of the Royal Astronomical Society, 456, 2900:2906
[13] Gaia Collaboration et al. (2016) The Gaia mission. Astronomy & Astrophysics, 595, pp. A1.
[14] Jaffa, S. E., Whitworth, A. P., Lomax, O., 2017, Monthly Notices of the Royal Astronomical Society, 466, 1082:1092
[15] Cutri, R. M. et al., 2003, The IRSA 2MASS All-Sky Point Source Catalog, NASA/IPAC Infrared Science Archive
[16] Sanders, W. L. 1971, Astronomy & Astrophysics, 14, 226
[17] Rubinstein, R. Y., 1997, Eur. J. Operat. Res., 99, 89
[18] Rubinstein, R. Y., 1999, J. Method. Comp. Appl. Prob., 1, 127
[19] Margolin, L., 2005, "On the Convergence of the Cross-Entropy Method", Annals of Operations Research 134, 201:214
[20] Kroese, D. P., P. S. & Rubinstein, R. Y., 2006, Methodol. Comput. Appl. Probab., 8, 383


[21] de Boer, P. T., Kroese, D. P., & Rubinstein, R. Y., 2004. "A Fast Cross-Entropy Method for Estimating Buffer Overflows in Queueing Networks", Management Science 50(7), 883:895
[22] Monteiro, H., Dias, W. S., Caetano, T. C., 2010, Astronomy & Astrophysics, 516 A2
[23] Uribe, A., & Brieva, E. 1994, Ap&SS, 214, 171
[24] Dias, W. S., Monteiro, H., Caetano T. C., Lépine, J. R. D., Assafin, M. & Oliveira, A. F. 2014, Astronomy & Astrophysics 564, A79
[25] Koza J. R., 1992, Genetic Programming: On the Programming of Computers by Means of Natural Selection. MIT Press
[26] Koza J. R., 1994, Genetic Programming II: Automatic Discovery of Reusable Programs. MIT Press
[27] Holland J. H., 1992, Adaptation in Natural and Artificial Systems. MIT Press
[28] Hetem, A. & Gregorio-Hetem, J., 2007, Monthly Notices of the Royal Astronomical Society, 382, 1707:1818
[29] Siess L., Dufour E., Forestini M., 2000, Astronomy & Astrophysics, 358, 593
[30] Elmegreen, B. G., 2010, in Star Clusters: basic galactic building blocks, Proceedings of IAU Symp. No. 266, 2009, R. de Grijs & J. R. D. Lépine, eds., p. 3
[31] Megiddo, N. 1983, SIAM J. Comput., 12, 759
[32] Kruskal, J. B. J. 1956, Proc. Amer. Math. Soc., 7, 48
[33] Gower J.C., Ross G.J.S., 1969, Appl. Stat., 18, 54
[34] Schmeja, S. & Klessen, R. S. 2006, Astronomy & Astrophysics 449, 151
[35] Graham, R. L., 1972, Information Processing Letters 1, 132
[36] Sánchez, N., & Alfaro, E. J. 2009, Astrophysical Journal, 696:2086-2093
[37] Gieles M., Portegies Zwart S. F., 2011, Monthly Notices of the Royal Astronomical Society, 410, L6
[38] Lomax, O., Bates, M. L., & Whitworth, A. P., 2018, arXiv:1804.06844v3 [astro-ph.GA] 17 Jul 2018
