Comparative Analysis of Predictive Performance in Nonparametric Functional Regression: A Case Study of Spectrometric Fat Content Prediction
DOI: https://doi.org/10.6000/1929-6029.2023.12.22
Keywords:
Nonparametric regression, Functional data, Kernel function, Functional covariates, KNN estimator, Semi-metrics
Abstract
Objective: This research compares two nonparametric functional regression models, the Kernel Model and the K-Nearest Neighbor (KNN) Model, for predicting scalar responses from functional covariates. Predictions are computed with two semi-metrics, one based on second derivatives and the other on Functional Principal Component Analysis (FPCA). The accuracy of each model is assessed by its Mean Squared Error (MSE), and real-data applications illustrate the methods.
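As a rough illustration of the two semi-metrics named above, the Python sketch below computes pairwise distances between curves sampled on a common grid. It is not the authors' code, nor the R routines behind funopare.kernel.cv and funopare.knn.gcv. It assumes that second derivatives can be approximated by finite differences (`np.gradient` applied twice) instead of spline smoothing, and that the FPCA semi-metric projects the curves on the first `q` eigenvectors of the empirical covariance. The function names `deriv2_semimetric`, `pca_semimetric` and `mse` are hypothetical.

```python
import numpy as np


def _trapezoid(y, t):
    """Trapezoidal rule along the last axis of y, on grid t."""
    dt = np.diff(t)
    return np.sum(0.5 * (y[..., 1:] + y[..., :-1]) * dt, axis=-1)


def deriv2_semimetric(X1, X2, t):
    """Second-derivative semi-metric between discretized curves.

    X1: (n1, p) and X2: (n2, p) curves sampled on the common grid t (p,).
    Assumption: second derivatives are approximated by applying np.gradient
    twice (a finite-difference stand-in for spline smoothing).
    Returns the (n1, n2) matrix of L2 distances between second derivatives.
    """
    d1 = np.gradient(np.gradient(X1, t, axis=1), t, axis=1)
    d2 = np.gradient(np.gradient(X2, t, axis=1), t, axis=1)
    diff = d1[:, None, :] - d2[None, :, :]          # shape (n1, n2, p)
    return np.sqrt(_trapezoid(diff ** 2, t))


def pca_semimetric(X1, X2, q, X_ref=None):
    """FPCA-based semi-metric between discretized curves.

    Curves are projected on the first q eigenvectors of the empirical
    covariance of X_ref (default: X1); distances are Euclidean distances
    between the resulting q-dimensional score vectors.
    """
    X_ref = X1 if X_ref is None else X_ref
    mean = X_ref.mean(axis=0)
    # Right singular vectors of the centered data = covariance eigenvectors,
    # already sorted by decreasing singular value
    _, _, Vt = np.linalg.svd(X_ref - mean, full_matrices=False)
    V = Vt[:q].T                                     # shape (p, q)
    s1, s2 = (X1 - mean) @ V, (X2 - mean) @ V
    diff = s1[:, None, :] - s2[None, :, :]
    return np.sqrt(np.sum(diff ** 2, axis=2))


def mse(y_true, y_pred):
    """Mean squared error between observed and predicted responses."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    return float(np.mean((y_true - y_pred) ** 2))
```

Both functions are semi-metrics rather than metrics: two curves that differ only by a linear trend have identical second derivatives and hence distance zero, and two curves that differ only outside the span of the first `q` components have identical scores.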
Method: The study considers nonparametric functional regression in which the response variable (Y) is scalar and the covariate (x) is a function. Predictions are produced with the Kernel Model (funopare.kernel.cv) and the KNN Model (funopare.knn.gcv). The Kernel Model selects its bandwidth automatically by cross-validation, whereas the KNN Model uses a global smoothing parameter. Both models are evaluated by MSE under each of the two semi-metrics.
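The two estimators can be sketched as follows, assuming a precomputed semi-metric distance matrix (for instance from a second-derivative or FPCA semi-metric). The kernel estimator is a functional Nadaraya-Watson weighted average with one global bandwidth `h` chosen by leave-one-out cross-validation. The kNN estimator follows the idea of Burba, Ferraty and Vieu: it replaces `h` by a curve-specific bandwidth determined by the `k` nearest neighbours and chooses one global `k` by leave-one-out cross-validation. The quadratic kernel on [0, 1], the midpoint bandwidth rule, and all function names are illustrative assumptions, not the implementation used in the study.

```python
import numpy as np


def quad_kernel(u):
    """Quadratic kernel supported on [0, 1]; its normalizing constant cancels."""
    return np.where((u >= 0) & (u <= 1), 1.0 - u ** 2, 0.0)


def kernel_predict(D, y_train, h):
    """Functional kernel (Nadaraya-Watson type) predictions.

    D: (m, n) semi-metric distances from m new curves to n training curves.
    Returns m predictions; nan where no training curve lies within distance h.
    """
    W = quad_kernel(D / h)
    den = W.sum(axis=1)
    with np.errstate(invalid="ignore", divide="ignore"):
        return np.where(den > 0, (W @ y_train) / den, np.nan)


def kernel_cv(D_train, y_train, h_grid):
    """Global bandwidth chosen by leave-one-out cross-validation."""
    D_loo = np.array(D_train, dtype=float)
    np.fill_diagonal(D_loo, np.inf)            # a curve never predicts itself
    best_h, best_err = None, np.inf
    for h in h_grid:
        pred = kernel_predict(D_loo, y_train, h)
        if np.isnan(pred).any():               # h too small for some curve
            continue
        err = np.mean((y_train - pred) ** 2)
        if err < best_err:
            best_h, best_err = h, err
    return best_h, best_err


def knn_predict(D, y_train, k):
    """Functional kNN predictions with a curve-specific bandwidth.

    Assumption: for each new curve the bandwidth is the midpoint of its k-th
    and (k+1)-th smallest distances, so (barring ties) exactly its k nearest
    training curves get positive weight. Requires k + 1 finite distances per row.
    """
    Ds = np.sort(D, axis=1)
    h = 0.5 * (Ds[:, k - 1] + Ds[:, k])        # one bandwidth per new curve
    W = quad_kernel(D / h[:, None])
    return (W @ y_train) / W.sum(axis=1)


def knn_gcv(D_train, y_train, k_grid):
    """Global number of neighbours k chosen by leave-one-out cross-validation."""
    D_loo = np.array(D_train, dtype=float)
    np.fill_diagonal(D_loo, np.inf)
    errs = {k: np.mean((y_train - knn_predict(D_loo, y_train, k)) ** 2)
            for k in k_grid}
    best_k = min(errs, key=errs.get)
    return best_k, errs[best_k]
```

In use, one would compute training and test distance matrices with the same semi-metric, select `h` and `k` on the training curves only, predict the test responses with `kernel_predict` and `knn_predict`, and compare the resulting test MSEs. The kNN bandwidth adapts to the local density of curves, which is one plausible reason a single global `k` can be easier to tune than a single global `h`.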
Results: The KNN Model achieves a lower MSE than the Kernel Model, indicating better prediction accuracy. Model performance also depends on the choice of semi-metric (second-derivative or FPCA-based). Two real-data applications, predicting fat content from spectrometric data and predicting precipitation from Canadian weather station data, demonstrate the practical use of the models.
Conclusion: This research offers insight into nonparametric functional regression methods for predicting scalar responses from functional covariates. Compared with the Kernel Model, the KNN Model offers superior predictive performance, and selecting an appropriate semi-metric is essential for accuracy. Future research may extend these models to multivariate responses and account for interactions among response components.
License
This work is licensed under a Creative Commons Attribution-NonCommercial 4.0 International License.