Reconstruction of 3D surfaces from incomplete digitisations using statistical shape models for manufacturing processes

Digitization of large parts with tight geometric tolerances is a time-consuming process that requires a detailed scan of the outer surface and the acquisition and processing of massive data. In this work, we propose a methodology for fast digitization using a partial scan in which large regions remain unmeasured. Our approach capitalizes on a database of fully scanned parts from which we extract a low-dimensional description of the shape variability using Statistical Shape Analysis. This low-dimensional description allows an accurate representation of any sample in the database with few independent parameters. Therefore, we propose a reconstruction algorithm that takes as input an incomplete measurement (faster than a complete digitization), identifies the statistical shape parameters and outputs a full scan reconstruction. We showcase an application to the digitization of large aeronautical fuselage panels. A statistical shape model is constructed from a database of 793 shapes that were completely digitized, with a point cloud of about 16 million points for each shape. Tests carried out at the manufacturing facility showed an overall reduction in the digitization time by 80% (using a partial digitization of 3 million points per shape) while keeping a high accuracy (reconstruction precision of 0.1 mm) on the reconstructed surface.


Introduction
The so-called fourth industrial revolution, Industry 4.0, is undergoing a fast development thanks to the advances in Industrial Internet of Things and artificial intelligence  (Cohen et al. 2019). As a typical application of Industry 4.0, smart industrial systems includes a wide range of sensors and processors all over the manufacturing process (Kurfess et al. 2020) to monitor equipment and timely discover faults. In this framework artificial intelligence algorithms are used for automated quality control and surface defect inspection tasks in the manufacturing industry (Gao et al. 2006;Hao et al. 2020;Pimenov et al. 2018;Tabernik et al. 2020;Židek et al. 2020). In such applications, digitization plays an increasingly important role, as it is essential to create accurate digital representations of the as-manufactured part that could then be sent to software for quality assessment or defect detection (Teutsch 2007). In many industrial applications such as the one showcased in this paper, digitization takes place at an intermediate stage of the manufacturing chain and not necessarily on the finished product. Interestingly, the objective of intermediate digitization would not be to make a decision on whether to accept, reject or rework a part, but rather to adapt the subsequent manufacturing operations to the specifics of the part being produced in an automated way. Digitization, in combination with machine learning and artificial intelligence algorithms, has already demonstrated its potential to bring new levels of accuracy and productivity to the industry (Jack Feng and Wang 2002;Židek et al. 2020).
Despite the benefits it brings in terms of shape identification, we must include the digitization time in the total manufacturing workflow time, which includes the acquisition time plus the data processing time. Sometimes the digitization time may be negligible compared to the overall processing cycle. However, in many industrial cases digitization is concerned with large parts that have to meet tight geometric tolerances, requiring a detailed scan of the outer surface and the acquisition and processing of massive data (Drouot et al. 2018;Poulhaon et al. 2014). In this sense, the introduction of a complete digitization in the manufacturing process may have a negative effect in the total production time, which is in conflict with the desired requirements for the introduction of artificial intelligence in the industry (Lee et al. 2020).
In this work, we consider as an application the mechanical machining of large aeronautical fuselage panels which are about six meters long and two meters wide, with small thickness in the order of few millimeters and double curvature, see Fig. 1. These panels undergo a milling process to reduce the thickness locally in order to optimize the overall weightresistance ratio of the panel. Because of their pronounced slenderness, the milling process requires sophisticated technology, including special supports and the use of a mirror milling system (Panczuk and Foissac 2008), with a milling cutter on the machined face and a counter-bearing on the opposite side.
Thickness tolerances are in the order of 0.1 millimeters. Significant geometric variability is observed from one panel to another, even if they belong to the same batch. Presumably, variability originates from, at least, two sources: (i) previous manufacturing stages, including a metal forming process that precedes milling; and (ii) the positioning of the part on the support system, as the part is likely to deform significantly under its own weight. As a consequence, the milling toolpath cannot be designed according to a CAD model as it would not be an accurate representation of the real part. Instead, panels must be fully digitized before machining in order to adapt the milling path to their actual geometry. Both the acquisition time and data processing (generation of CAD files and adaptation of the machining path) have a negative impact on productivity.
In light of this practical case, it is reasonable to assume that there are similar issues in other manufacturing processes and even beyond the industrial sector. For example, the interested reader can refer to the references below related to the medical imaging community. Leaving aside possible technological improvements regarding the data acquisition procedure (see e.g. D'Apuzzo 2006; Khalfaoui et al. 2013;Mian and Al-Ahmari 2019;Qian et al. 2021 in this regard), it is straightforward that a reduced sampling of the surface leads to a shorter scanning time, at the cost of having an incomplete measurement of the panel. The challenge we face is that of designing a digitization methodology that takes as input an incomplete data acquisition of a surface in a reduced time, and outputs a complete reconstruction of the surface without negatively impacting accuracy.
The research field of surface reconstruction from digitally scanned shapes has achieved considerable progress in the last decades, with contributions mainly found in applications involving computer vision, e.g. Belongie et al. (2002), Pauly et al. (2005), Marton et al. (2009), Guo et al. (2013) and especially, the medical imaging and bioengineering community, see Heimann andMeinzer (2009), Wang et al. (2016), Bernard et al. (2017), Ballester et al. (2017), Zhang et al. (2017), Lauzeral et al. (2019) to cite only a few. All methods rely on some assumptions, or priors, that are imposed either on the scanned point cloud or on the scanned surface. A complete review of such methods and different priors can be found in Berger et al. (2014). One popular approach is that of Statistical Shape Analysis, which capitalizes on previously digitized samples of the same class (considered as training data) to build a statistical shape model (SSM). The SSM is then able to represent any shape in the object class by a mean shape and the variability observed in the training set. Among the different possibilities to build an SSM, we find Point Distribution Models (Cootes et al. 1992), in which the shapes are represented as a set of points or landmarks distributed over the surface. Within this approach, the creation of an SSM is achieved in three main steps: (i) registration, that is, the identification of the landmark points in each training shape; (ii) computation of the mean shape; and (iii) extraction of a low-dimensional description of the shape variability using Principal Component Analysis (PCA) (Chatterjee 2000;Jolliffe 2002;Jolliffe and Cadima 2016).
Previous works in surface reconstruction from incomplete data can be found in Guo et al. (2013), which uses a database of shapes for 2D contour completion and in Pauly et al. (2005) for 3D objects surface completion. In Shen et al. (2012) highquality 3D models are reconstructed from low-quality data by assembling templates from an object database. These works, however, assume that the incomplete regions are small, thus the objective is to fill in the gaps that may appear in the acquired data, whereas the aim of this work is to identify a particular shape by digitizing only a portion of its surface. The work in Bernard et al. (2017) presents a method for surface reconstruction from sparse point clouds, where the input consists of a reduced number of points which are registered onto the template of the SSM using a Gaussian mixture model approach. The interested reader is also referred to this last reference for an in-depth review of the state of the art in this topic. The idea of building object reconstructions from reduced samplings is already present in Bernard et al. (2017), however, the digitization outputs in high accuracy manufac- turing, which is the main scope of this work, usually consist of clouds with a large amount of data points. Indeed, even with a partial scan of the surface, the number of scanned points is in the order of millions. Another relevant difference is that the existing literature on surface reconstruction deals with closed surfaces of 3D solid objects (Lauzeral et al. 2019), whereas our interest relies on curved shapes that can be treated as 2D surfaces in a 3-dimensional space.
In this work we propose a methodology for fast 3D surface digitization that works as follows: first, an SSM is computed from a database of full samples, using prior information of the surface's boundary. Then each new sample is partially digitized and reconstructed by fitting the SSM to the partial measure solving a non-linear minimization problem. After the new surface is identified in the SSM, the proposed methodology outputs a reconstruction point cloud that resembles that of a complete surface scan. With this choice, the introduction of our proposed methodology in the manufacturing workflow is straightforward, as all the following data processing phases in the manufacturing line remain unchanged, (see Fig. 2).
The article is structured as follows: in "Data" section we introduce the data used in this work together with the relevant features of the data acquisition. The details of the SSM creation and reconstruction algorithm are presented in "Methodology" section, together with error and validation metrics definitions. Finally, the numerical results concerning the application case are shown in "Results" and "Conclusions" sections contains the conclusions.

Data
All data used in this work has been provided by Stelia Aerospace (St. Nazaire, France). In this section, we present the data acquisition process and then the different datasets used in this work, together with its features. We denote a point in the 3D space as x = {x, y, z} ∈ R 3 , and a 3D cloud of points corresponding to a panel shape is represented with capital letters, S = {x 1 , ..., x n } T ∈ R n×3 . The Euclidean distance is denoted by · .

Data acquisition method and data format
The data acquisition process is as follows: a laser beam travels along a predefined path that sweeps the panel, storing 3D coordinates of points on the panel's surface. The beam is 7.5 cm wide, it reads 400-450 points at a time (acquisition step), and the device performs acquisition steps at a constant frequency as it travels along the measured surface. A complete scan of a panel requires a scanning path around 180 m long, yielding a point cloud of approximately 16 million points. This data is then stored in a file that contains the coordinates of the entire point cloud with a numerical precision of 0.001mm, which will be the lower bound when measuring the reconstruction error in "Results" section. All points in the sample are considered for the statistical shape model construction, and the proposed reconstruction algorithm will provide surface reconstructions with the same point density as that of a completely scanned panel. To help the shape's location in the 3D space, the component in consideration contains two small holes that are drilled for aligning purposes during the different manufacturing processes undergoing, serving as reference marks. The coordinates of both marks are obtained during the scanning phase, and they are denoted as Φ 12 and Φ 14 (see Fig. 3a).

Datasets
Three different datasets have been considered along with this work, and are referenced as complete scan, partial sampling, Fig. 2 Scheme of the original manufacturing workflow, in blue, where a complete digitization of the surface is performed before machining due to high quality requirements. The proposed modification, consisting of a partial digitization and a shape reconstruction Algorithm, is highlighted in green. In order to be as less intrusive as possible in the manufacturing workflow, the output of the proposed fast digitization methodology tries to resemble the point cloud of a complete scan acquisition (Color figure  online) and verification datasets. The features of each one are the following: Complete scan dataset This is the main database, consisting of 746 panels that were completely scanned at Stelia Aerospace's workshop between 2014 and 2019 from 7 different laser data acquisition sets working in parallel. The data is considered to be homogeneous, this is, no distinctions are made between each acquisition set. A thorough review of the raw database was performed to dismiss corrupted data. Overall, there were missing files in 53 samples (either the point cloud file or any of the reference markers' locations), and 138 samples were found to have flaws in the acquired data (either point cloud with empty regions or incoherent measures). All these defective samples were dismissed, having a final training set consisting of n s = 555 panels.
Partial sampling dataset To test the reconstruction algorithm, a partial sample dataset is built from the complete scan database. The partial scan trajectory ( Fig. 3b) has been arbitrarily chosen, partially covering all the panel's surface with a trajectory similar to a triangular wave. This has been selected to fulfill the technological constraints of the laser data acquisition devices at Stelia Aerospace, so that an insitu validation can be performed. This partial scan path is near 35 m long and provides an incomplete point cloud of 3.5 million points, this is, roughly a 20% of the data contained in a complete scan.

Verification dataset
The last set consists of 9 new panels, different from those in the complete scan dataset, which are used in the verification of the methodology. These samples have been digitized twice, using both the complete and partial scan strategies. As the data comes from the same sample, the reference landmarks Φ 12 , Φ 14 should have the same coordinates in the partial and complete files of the same sample. However, we have noticed differences around 0.1 mm in these values between the partial and complete data. This is prob-ably because the scanning machine resets its home position between scans, and that process may introduce a small offset in the local reference system. Despite those differences being small with respect to the total size of the component, they will be relevant when evaluating the reconstruction, which is meant to be within the same order of magnitude.

Methodology
The method in this work is divided into two stages: first, an SSM of the panel is built in an offline stage, and then the online stage generates a reconstruction from an incomplete panel measure using the shape parametrization. (a) Alignment phase. A point cloud is arbitrarily chosen to serve as template shape (S 0 ). Then, the alignment of each sample in the training set (target shape) is performed in two steps: a rigid registration of the target onto the template followed by a non-rigid registration of the template onto the target. The output of this step is the template's deformation field such that it fits each target shape. (b) A dimensionality reduction step, using techniques such as the Principal Component Analysis (PCA) (Jolliffe 2002), to extract a low-dimensional description of the shape variability. Specifically, the principal modes of deformation of the template are extracted from the registered data. It is worth mentioning that the PCA returns an ordered set of modes according to how much each of them contributes to explain the input data. Therefore, the user can establish a tradeoff between accuracy and model size, expressed by the number of modes considered. By selecting the set of modes that best describe variability, the PCA provides a statistical description of any shape with a restricted number of modes. New shapes x can be written as the sum of the mean componentx plus a linear combination of the selected principal modes φ i and its shape parameters α i (Jolliffe 2002): 2. Online stage panel reconstruction. At the online stage, the input data is an incomplete scan of a geometry outside the SSM's samples, thus the online phase is divided into two steps: shape parameter identification and geometry reconstruction.
(a) The shape parameters in Eq. (1) corresponding to the new geometry are identified through an iterative minimization problem. As a result, we obtain a deformation field of S 0 that fits the shape of the partially scanned sample. (b) After obtaining the shape parameters that best fit the incomplete data, the SSM is particularized, with those parameters, at a fine cloud of arbitrarily distributed points to create a virtual reconstruction of the sample with the same format of a completely scanned panel. This is called the oversampling step.
The offline and online phases are now detailed in "Offline training" and "Online reconstruction" sections respectively.

Offline training
The purpose of this stage is to build a Statistical Shape Model from a set of previously scanned complete shapes, which is our training set. Following the PDM building procedure, we first align all shapes in the training set to an arbitrarily chosen template shape. We divide this alignment into rigid and nonrigid registration steps. Then the final SSM is obtained after a dimensionality reduction step.

Rigid registration
Each of the 7 different scanning machines used to acquire all the samples in the database has its own coordinate system, which introduces considerable differences in the scanned coordinates of each panel. The rigid registration step aims to minimize rotations and displacements between samples so that the differences are mainly due to shape variations. A simple rigid registration can be performed using the reference marks Φ 12 , Φ 14 , the samples are aligned so that Φ 14 is located at the origin of the new reference system, and Φ 12 is located on the x-axis. Let x s be the coordinate of a point belonging to a given sample, then the rigid transformation is written as: where θ and β are the angles between the segment Φ 14 −Φ 12 and the XY , X Z planes respectively, with their corresponding rotation matrices R θ and R β .

Template shape
The construction of the SSM requires an arbitrarily chosen template shape, denoted as S 0 . The geometry under consideration is a nearly rectangular-shaped shell which, despite having a curvature, is mainly aligned with the reference coordinate system, z being the out-of-plane direction. Considering this topology we define the template as a flat surface on the x y plane. After performing the rigid registration on all samples of the training set, we define a rectangular bounding box that covers all {x, y} coordinates (see Fig. 4). The bounding box is then meshed with a regular structured grid of 1000 by 250 nodes, thus S 0 ∈ R 2×n S 0 , with n S 0 = 250000.

Non-rigid registration
From now on and to simplify the notation we will use the following decomposition for any point in the space: x ≡ { p, z}, with p representing the 2D coordinates {x, y}. Let T i be the i-th sample in the complete scan dataset. The objective in this step is the creation of a mapping a i : R 2 → R 3 such that a i (S 0 ) fits the target shape T i . Following again the in-plane / out-of-plane approach we divide the non-rigid registration into two phases: first we obtain a mapping f 1 : R 2 → R 2 that fits the in-plane coordinates p i , then we compute a scalar mapping f 2 : R 2 → R such that f 2 • f 1 fits z i . The complete non-rigid registration mapping can be written as: XY registration The first step of the non-rigid registration is performed using the thin plate spline model (Duchon 1977;Meinguet 1979). This is an efficient tool commonly used for shape matching tasks in the framework of object recognition methods (Belongie et al. 2002;Bookstein 1997;Lauzeral et al. 2019). In its 2D version, the TPS defines a function f (x, y) that minimizes the bending energy (Duchon 1977;Meinguet 1979): subject to f ( p i ) = p i , where i = 1, ..., N are some previously defined correspondences between the target shape p i and its transformations p i , also known as landmark points.
There are different options to establish those landmarks when there is no prior knowledge of the target shape available, such as closest point metrics (Lauzeral et al. 2019) or the shape context (Belongie et al. 2002), which establish as landmarks a set of points arbitrarily distributed at the shape's boundary.
In both works, the correspondences are set in an iterative process that alternates between setting the landmarks and solving the TPS problem (4) until convergence is reached. The landmarks can also be directly specified if previous knowledge of the shapes being matched is available, with the benefit of avoiding the iterative process. In the application case at hand, the reference marks {Φ 12 , Φ 14 } provide very little information of the shape to perform a non-rigid registration with high accuracy in terms of shape match. However, as the panel's shape is simple (the projection on the x, y plane is a slightly deformed rectangle), we can set the landmarks to solve the TPS problem as follows. The boundary of the panel is the convex hull of all the plane coordinates p T ∈ T (Fig. 4a).
The first landmarks are the corners of the square-shaped target T i , which are found using a corner detection algorithm and matched to the correspondent corners of the template S 0 . Then we can re-parametrize the target's boundary so that it has the same amount of points as that of the template's boundary. This provides enough landmarks to solve the TPS problem (4) and obtain f 1 (S 0 ). An example of this mapping is shown in Fig. 4, where the template's mesh has been coarsened for the sake of clarity.

Z registration
To present the second phase in the non-rigid registration step we describe the computation of f 2 • f 1 at a given node n ∈ S 0 with coordinates p n = f 1 p n , which is illustrated in Fig. 5a. First we identify all points P = { p ∈ T : p − p n < r }, within the searching radius r = h √ 2/2, and h being a representative element size of S 0 (in this application, h = 6.3 mm). The coordinate z n is then computed as a weighted average of all z i ∈ P and using a Gaussian filter (Shapiro and Stockman 2000): where we have considered a standard deviation σ = r /2. The data acquisition procedure presented some specific features that influenced the quality of the z-coordinate registration near the panel boundaries. On one hand, the samples analyzed in this work contained a much higher point density near the panel boundaries (as a consequence of the scanning device travelling at a slower speed near the boundaries while keeping the acquisition frequency constant), and on the other hand, the scan device's trajectory produced a saw-toothed pattern of the boundary instead of the smooth curve of the actual panel, as shown in the detail of Fig. 5b. To address these 1. The Gaussian filter's standard deviation is reduced to σ = r /8 for nodes along the boundary of S 0 , and σ = r /6 for nodes at the two following inner layers. With this measure, we give more importance to the points located near p n without reducing the number of considered points, since there is a higher number of points in the boundary. 2. The nodes at the boundary of S 0 laying outside the sawtoothed pattern don't have any point within a distance r . The non-rigid registration at these nodes is computed employing a least squares fitting of the z values obtained at the adjacent nodes.

Statistical shape model: dimensionality reduction
When all targets are registered into S 0 we build the matrix A = [a 1 , ..., a n s ], and compute the average displacement fieldx. Then the SVD is applied to matrix A, which is obtained by subtractingx from A. Finally, the shape parametrization x of the shape generation model is written as Jolliffe (2002), Lauzeral et al. (2019): where we have used compact notation, α ∈ R m being the m shape parameters and Φ the truncated basis with [φ 1 , ..., φ m ] modes. The SSM is defined at the discrete set of points in the template's mesh S 0 . However, the online reconstruction stage, which is presented in the following section, requires a continuous definition of the SSM. This can be easily obtained by including the traditional FE interpolation in Eq. (6), with linear shape functions. Thus, the SSM at a given point p inside the template shape S 0 with interpolation coordinates ξ p ∈ R 2 is computed as: where B is the linear interpolation operator.

Online reconstruction
Let T be a new sample outside the SSM original data from which only a partial scan with coordinates X t ∈ T is available. The goal is to generate a reconstruction of the panel from the incomplete data acquired using the SSM from the offline training. This is achieved in two steps: identification of the shape parameters and oversampling.

Shape identification from a partial scan
The location of the reference marks {Φ 12,t , Φ 14,t } is also available in the new sample T , thus the rigid registration in Eq.
(2) is applied to obtain x t . Now we must identify the shape parameters of T that best fit the partial measures x t . For this, the following minimization problem is proposed: Introducing Eq. (7) in (8), and using the simplified notation B t = B ξ t , we can obtain the solution of the minimization problem: The problem in (8) is non linear, as the interpolation coordinates ξ t corresponding to the points x t also depend on the unknowns α. In other words, while the p t coordinates remain constant the underlying mesh S 0 is being modified by α, so the sample's points can change their location in the interpolation. For the solution of the problem, we propose a fixed-point procedure presented in Algorithm 1. After setting an initial guess α 0 the mapping ϕ(α) is defined, and the interpolation coordinates of the sample's points, ξ t , can be directly obtained. Having the interpolation coordinates Eq. (9) becomes a linear system of m equations with m unknowns that can be easily solved, yielding new shape parameters. The process is repeated until stagnation of the shape parameters.

Algorithm 1 Fixed-point iteration to identify shape parameters
Find the interpolation coordinates ξ t Evaluate B t Solve α i Eq. (9) end while We propose two different alternatives to set the initial guess α 0 : 1. Assume α 0 = 0. This is equivalent to setting the mean value of the reduced basis,x, as the initial guess for the SSM in (6). 2. We also propose a more problem-specific approach. We first obtain the perimeter of the partially scanned sample as the convex hull of the p t coordinates. For these points, we can obtain interpolation coordinates assuming the same procedure as in the non-rigid registration ("XY registration" section) for the generation of landmarks. Then the problem in Eq. (9) can be solved considering only those points along the boundary to obtain the initial values α 0 .

Oversampling
Having solved problem (8), the SSM's parameters α t are identified, and the mapping ϕ(α t ) is fitted to represent the partially scanned sample T . We can create a new point cloud x r that resembles that of a complete scanned sample (i.e. with around 16 million points), as this data is later used in the manufacturing workflow of the panel. For that purpose, we evaluate and store the interpolation coordinates ξ re f of a sample arbitrarily chosen from the database used to generate the SSM in the offline stage. Then, the SSM is evaluated at the reference coordinates and the rigid registration transformation in Eq.

Validation and error metrics
The purpose of the validation is twofold. First, to ensure the quality of the SSM obtained from the complete scan dataset. Second, to verify the reconstruction algorithm from an incomplete sample. Here we present the metrics used for validation, and the results are shown in "Results" section.

SSM performance metrics
The quality of the SSM is evaluated with the compactness, generalization, and specificity metrics introduced in Davies (2002). The compactness is defined as: with λ i being the i-th proper value associated with the SVD. This metric represents the contribution of each additional shape parameter in the reduced model. The fewer parameters required to properly describe any shape, the more compact the model is.
The generalization can be seen as the ability of the SSM to represent a shape out of the training set. This is evaluated using a leave-one-out approach: a sample T from the complete scan dataset is removed and a new SSM is built using the n T − 1 remaining shapes. Then, T is registered to obtain a t , and the shape parameters corresponding to the new SSM can be obtained as follows: where Φ m stands for the truncated basis with the corresponding amount of modes m andx the average field. A reconstruction T (m) of sample T is computed using Eq. (6) and the estimated shape parameters. Finally, the approximation error is measured as the average point-wise distance between all n s 0 points in the reconstruction and the sample registration: This process is then repeated for each sample in the complete scan dataset, and the generalization is calculated as a function of the number of modes as in Davies (2002): Finally, the specificity measures the ability of the SSM to generate shapes that are similar to those in the training set. A random set of N shapes R s is generated, with a variable number of modes m. Then, for each shape S i (m) ∈ R s the distance to the closest registered sample in the complete dataset T s is computed as in Eq. (13), and the specificity is written as Davies (2002):

Reconstruction error metrics
To evaluate the quality of the panel reconstruction algorithm we will consider a complete scan C and a partial scan P.
The reconstruction process will output a new reconstructed point cloud, R. The complete and partial samples may come from the complete dataset, with its respective sample in the partial sampling dataset, or from the complete and partial scans in the verification dataset. As noted in section the data in the verification dataset presents some differences in the reference landmarks' Φ 12 , Φ 14 location, which would introduce an artificial source of error, since in a real production workflow a sample wouldn't be scanned twice. To exclude this error source from the reconstruction error metrics we will evaluate the reconstruction error after performing the rigid registration step in "Rigid registration" section where all shapes are aligned, thus we will work with the coordinates x c and x r . The error between the reconstruction R and the complete scan C can be defined as the distance between both shapes, which we denote as Dist(C, R). The following definition for the distance between two surfaces was used in Xu et al. (2013), Lauzeral et al. (2019) for SSM validations: where N c and N r stand for the number of points at C and R respectively. Although being a robust, unbiased function, this method has a high computational cost as the number of points increases. Due to the high density of the point clouds involved in this application (around 16 million), we propose an alternative definition that takes advantage again of the geometry's shape to reduce the computational cost. First we define ϕ s ≡ { p s , z s } as the solution of problem (8) applied to obtain the reconstruction R. Then an interpolant of the z-coordinates is built using again the linear interpolation operator B: Finally, the distance function between the reconstructed and completely scanned shapes is defined as: Using this distance function and the partial sampling dataset we introduce an alternative computation of the generalization metric, G * (m), which considers not only the SSM but also the reconstruction algorithm's performance in identifying shapes that are outside the training dataset. For that, we will use the same leave-one-out approach, but now the approximation error is computed between each sample C i in the complete scan dataset and a reconstruction R i (m) built from its corresponding sample in the partial sampling dataset P i using m modes. Therefore, the new generalization metric can be written as: where the distance between shapes is calculated using Eq. (18). With this measure, we can evaluate the quality of the complete process, which includes the registration, the model order reduction, and the non-linear reconstruction algorithm.

Results
The performance of the proposed method has been analysed in three different ways: 1. Via compactness, which measures the effect of truncation in the ability of the model to reproduce the training dataset. 2. Via generalization, which measures the ability of the model to represent unseen data using a leave-one-out procedure. 3. Via in-situ verification tests, which demonstrate the ability of the methodology to perform in a relevant environment, as the resulting output of the proposed methodology has been introduced in the actual manufacturing workflow, with successful results in the following quality control procedures.
The results concerning the SSM performance are shown in "SSM performance evaluation" section, whereas the in-situ verification test is presented in "Online shape reconstruction verification test" section.

SSM performance evaluation
The SSM metrics are presented in Figs. 6 and 7. All these results were calculated using the complete scan dataset, with the partial sampling dataset being also used for the computation of G * (m) and a random set of 1000 generated shapes for the computation of the specificity. The compactness graph indicates that starting at around 50 modes the contribution of each additional parameter drastically diminishes. Metrics S(m), G(m) and G * (m) were calculated for m = {1, 2, 5, 10, 20, 50, 100, 200, 500}. In this test, when solving the non-linear minimization problem in Eq. (9) the boundary-specific approach was used to set the initial guess of the shape parameters. A comparison between the proposed alternatives is shown in the following section. Both generalization factors show a similar trend, with a logarithmic descent of the average error up to 0.044 and 0.053mm respectively. This means that the reconstruction algorithm from a partial sample provides reliable results, similar to projecting the intrinsic ability of the SSM. The bars in the graphs represent one standard deviation. Although in both cases the variability decreases with the number of modes, the G * (m) metric presents always higher variability than G(m). In both cases the generalization metrics are stabilized at m = 100 modes, showing a slight improvement for 200 and 500 modes. In light of these results, it seems appropriate to choose a truncated basis with 100 modes for this application, which represents a balance between accuracy and computational cost.

Online shape reconstruction verification test
After testing the performance of the method with the complete scan and the virtually generated partial sampling datasets, in this section we validate the presented methodology in an industrial environment. For this, 9 different samples were digitized twice in the manufacturing plant, both partially and completely (for error measurements), as presented in "Datasets" section. These samples form the verification dataset. Then, for each panel in the verification dataset, we obtained a surface reconstruction from the partial digitization using the algorithm proposed in this work, and the output reconstruction point cloud was introduced in the manufacturing workflow, substituting the complete digitization point cloud, as presented in Fig. 2. The experiment is designed and executed as follows for each panel: (i) run the full scan program (ii) stop the execution and enter in manual execution mode, to avoid the milling step to be started after scanning (iii) log on the industrial computer and download a copy of the full scan for later comparison (iv) load the reduced scan program (v) log on the industrial computer and download a copy of the partial scan (vi) give the partial scan as input to our reconstruction subroutines, running on a laptop (vii) compare the reconstructed to the fully scanned panel (viii) transfer a copy of the reconstructed panel to the industrial computer (ix) go back to automated execution mode by moving to the next manufacturing step (milling).
The graph in Fig. 8 shows the reconstruction error between the complete scans and its reconstruction for all samples in the verification test, using the distance function defined in Eq. (18). The performance of both alternatives for the initial guess to solve the non-linear problem presented in "Shape identification from a partial scan" section is also compared. The horizontal black line in the graph represents the generalization metric of the SSM for 100 modes, to serve as a reference.
The use of boundary information to build the initial guess provides a better reconstruction overall, with error values near the generalization ability of the SSM. Although using the mean panel as an initial guess (i.e. α 0 = 0) provides slightly worse results for most of the samples, there is a considerable discrepancy in samples 4, 7, and 9. Figure 9 shows the reconstruction error color map for sample 9 using both strategies. The oversampling solution using the mean panel clearly fails to adapt along the boundary of the surface, and 1.6% of the points in the oversampling cloud have reconstruction errors over 1 mm, whereas the resulting cloud in Fig. 9a has no points with such error levels. Similar behavior is found in samples 4 and 7. Finally, Fig. 10 shows the reconstruction error color map for all panels in the verification dataset using the boundary-specific initial guess. Fig. 9 Verification dataset, sample 9. Reconstruction error using boundary-specific procedure a and mean panel b as the initial guess of the fixed-point algorithm

Discussion
The typical applications of statistical shape models are found in 2D images and 3D volumes with closed surfaces. The application in this work presents a new challenge, which is the identification of an open surface in a 3D space. Whereas the construction of Statistical Shape Models for 3D objects can be achieved by properly detecting the boundary of the solid and then performing a dimension reduction (as in Lauzeral et al. 2019 for example), that is not enough when it comes to open surfaces in a three-dimensional space. In this case the contours of the open surface must be identified, there is no such solid's bulk, and the surface's curvature cannot be captured with the identification of its contours. The proposed method is designed to solve this new challenge by building the SSM in two steps: first, identification of the surface's contours and then mapping of the interior curvatures. The performed tests in "Results" section show that the quality of the generated SSM is in the order of 0.01mm, this is just one order of magnitude over the data numerical precision, thus we have achieved the maximum possible accuracy to describe any shape in the provided database, with a reduced basis of 100 modes in this particular case. Regarding the surface reconstruction from a partially digitized sample, we compared the performance of the proposed algorithm using different strategies for the initial guess in the fixed-point iteration in the test using the panels from the verification dataset. The results show that the identification of the surface's boundary as initial step of the non-linear problem solution (presented in "Shape identification from a partial scan" section) is crucial to obtain an accurate shape parameter identification. Not only because the average reconstruction error is lower in all test cases, but also because using the mean panel of the SSM as initial guess can lead to stagnation of the solver at local minima with worse adaption along the boundaries of the surface, resulting in a higher reconstruction error. The case of sample number 6 in Fig. 10f is worth to remark, since the performance of the reconstruction algorithm was clearly worse in that particular case when compared to the other samples. The average reconstruction error was significantly higher, and the error map showed a vertical line pattern with higher reconstruction error. After a thorough study of that sample by the industrial partner, an error in the work-shop's scanning device was detected, thus the complete scan of that sample wasn't representing its actual surface. Therefore, we think that the methodology presented in this work could also open the possibility of using the part's SSM as an aditional tool to check for anomalies in the manufacturing or digitization workflow.
In the performed tests we observed that using 100 modes in the initial fixed-point step when using only the contour of the partial digitization may lead to overfitting problems. This is because, in that particular case, the number of points used to solve the fitting problem in (8) is considerably lower than that of the whole partial sample. To avoid this problem we performed the initial iteration of the fixed-point algorithm using only 10 modes and the boundary of the partial sample in the tests shown in this work, which lead to satisfactory results.
The total reconstruction error of the proposed reconstruction algorithm from a partial sample is slightly higher than the generalization ability of the SSM (which is measured with complete samples), with values between 0.05 and 0.1mm. Although we have successfully achieved high-quality shape reconstructions from incomplete data, we think that the trajectory of the partial scan (Fig. 3b), which was arbitrarily chosen, might affect in the total reconstruction error, and it also explains the slightly worse generalization (and higher variability) of the SSM when it is taken into account (Fig.  7), since that trajectory is not related to the features existing in the computed SSM. The choice of an optimal set of measuring locations [using for example the EIM ( Barrault et al. 2004;Chaturantabut and Sorensen 2010) or a data-driven approach (Poulhaon et al. 2014)] for the SSM would lead to lower variability in the surface reconstruction algorithm.
Although we have presented this surface reconstruction methodology using a specific application case of an aircraft manufacturing component, the proposed approach can be extended for the reconstruction of any other surface, provided there is a database of completely digitized samples to build a Statistical Shape Model, and enough information of the surface's boundary topology for the non-rigid registration step in "Non-rigid registration" section. After building the SSM that captures the variability of the new surface, the application of the presented reconstruction algorithm is straightforward.
Finally, concerning the computational cost of this method, an implementation in MATLAB (The Mathworks Inc., USA) takes 1 second per iteration in the fixed-point algorithm. The shape identification problem required between 15 and 20 iterations for all samples, and the whole reconstruction algorithm, from reading the input file to obtaining the oversampling reconstruction, takes between 35 and 45 s, depending on the number of iterations. Considering the time spent in the partial scan acquisition and the reconstruction algorithm the proposed method can reach an 80% reduction in the surface digitization time, according to estimations by Stelia Aerospace's experts.

Conclusions
We presented a methodology for 3D surface reconstruction combining partial sampling and SSMs. For the computation of the SSM, we make use of prior known information of the surface, especially in which concerns boundary detection and mapping. We propose a non-linear fitting problem to identify the shape parameters of a partially scanned sample, which is solved with a fixed-point algorithm. The results show that the obtained SSM can describe the surfaces using 100 modes, with an accuracy only one order of magnitude higher than the scanning device's numerical precision. The fixed-point algorithm's efficiency is increased if the boundary of the partial sample is used for the initial iteration, and the total reconstruction error of the proposed reconstruction algorithm, this is, including SSM and fixed-point algorithm, remains between 0.05 and 0.1mm of average error reconstruction for all tested samples.
Whereas typical applications of statistical shape models are found in 2D images and 3D volumes with closed surfaces, we present a method for the fast digitization of open surfaces in a 3D space using incomplete measurements, which are obtained in a significantly shorter time than complete digitizations, achieving high-quality final reconstructed shapes. The implementation of this methodology in an industrial environment achieved an estimated time reduction of 80% in the shown application case, provided that there exists a database with samples that have been completely digitized. The search of optimal partial measuring strategies for the surface reconstruction algorithm will be studied in future works, together with the possibility of building the SSM as panel data is collected, this is, without having a previously obtained sample database.