Comparison between Mexican and International Medical Graduates’ scores in the ENARM Competing for Clinical Specialities in Mexico during 2012-2019: Data Visualization, Trends and Forecasting Analyses

Objectives: Because ENARM scores are heterogeneous between Mexican and international medical graduates (IMG) in the eight direct-entry clinical specialities (Anesthesiology, Emergency Medicine, Geriatrics, Internal Medicine, Medical Genetics, Pediatrics, Pneumology, and Psychiatry), we aimed to evaluate those scores. We hypothesized that Mexican test-takers achieve higher scores than IMG, with significant growth trends in their exam scores. Methods: This cross-sectional study used historical data from the annual public report of the ENARM for eight years (2012 to 2019). We compared the minimum (MinSco) and maximum (MaxSco) scores of each speciality using ANOVA. Mexican versus IMG scores were evaluated with an independent Student's t-test, trends with Spearman’s correlation coefficient, and a 5-year forecasting trend. Results: There was a significant difference among the MinSco for the eight clinical specialities; F (7, 115) = 26.611, p < .001; the global mean of MinSco was 69.133; specialities above this mean were Internal Medicine, Anesthesiology, Pediatrics, and Pneumology. The global mean for MaxSco was 79.422; five specialities were above it: Internal Medicine, Pneumology, Geriatrics, Psychiatry, and Medical Genetics. We did not find a significant difference in the MinSco between Mexicans and IMG, but a significant difference was found in the MaxSco between both groups. Conclusions: ENARM represents a market of high-performance test-takers across the clinical specialities. Mexicans and IMG achieved similar entrance scores, but Mexicans showed a higher MaxSco than IMG in all clinical specialities.


Education of Graduated Doctors
Residency is a critical step in graduated doctors' education since 90% aspire to a postgraduate degree or medical speciality [1]. In the USA, up to 88% of general practitioners will eventually study a medical speciality; this percentage decreases to 35% in Mexico [2]. The score that a general practitioner (GP) obtains in the National Evaluation for Medical Residency Applicants (ENARM, Examen Nacional de Aspirantes a Residencias Medicas) is the entrance door to a specialization course endorsed by a Mexican University [3,4].

Logistics of the ENARM
The ENARM is a one-step-only exam that uses multiple-choice questions and computerized patient cases to assess examinees' knowledge related to foundational science concepts and to the application of medical and scientific theories to clinical medicine; details concerning the logistics of the exam have been published previously [5,6].
*Address correspondence to this author at the Directorate of Research, Hospital General de Mexico, Dr. Balmis 148, Colonia Doctores, Delegacion Cuauhtemoc, 06726 Ciudad de Mexico, Mexico; Tel: +52155-2789-2000 ext. 1149; E-mail: ernest.roldan@usa.net
In Mexico, the Interinstitutional Commission for Human Resources Training for Health (CIFRHS, Comisión Interinstitucional para la Formación de Recursos Humanos para la Salud), which issues the reports, is an inter-institutional consultation, advisory, and technical support organization of the Ministry of Public Education and the Ministry of Health [7]; it considers 27 medical specialities with direct entry [8]. For Mexican educational institutions, the ENARM scores and the percentage of their graduates selected are indicators of efficiency and a source of prestige, and even of publicity among those aspiring to study medicine [9].
were more than 57,000 applicants, and only 9,668 Mexican and international medical graduates (IMG) were selected [11]. Several problems about the ENARM have been addressed in recent publications, for example, the number of Mexican test-takers and accepted GPs belonging to each Mexican medical school registered in the ENARM [3]; the logistics and transparency of the ENARM exam [5]; the performance of private versus public schools using a summary measures method, exploring significant differences in the performance based on geographic regions and socioeconomic level of the Mexican states to which each school belongs [3,12]; and the assessment of the assumption of equity in the ENARM [6].
There is an educational problem in Mexico related to the heterogeneous ENARM scores of applicants to clinical specialities [1,13,14]. The academic performance of the eight direct-entry clinical specialities is not known: Anesthesiology, Emergency Medicine, Geriatrics, Internal Medicine, Medical Genetics, Pediatrics, Pneumology, and Psychiatry [14].
We aimed to assess these eight direct-entry clinical specialities' performance and compare the scores of Mexican versus IMG in each speciality; we also included a trend analysis over eight years (2012-2019). We hypothesized that Mexican test-takers achieve higher scores than IMG with significant growth trends in their exam scores.

Study Design and Data Acquisition
This study was cross-sectional and used historical data that did not require approval by an Institutional Review Board. We based our analyses on the annual public report of the ENARM for eight years from 2012 to 2019 issued by the CIFRHS. The reports contained quantitative information on each medical speciality's academic performance from graduate physicians who took the ENARM; these reports are freely available as PDF files at the CIFRHS website [11]. Original data are included as an online-only supplementary file.

Logistics of ENARM and Assessed Variables
Five test forms are created each year, each comprising 450 multiple-choice single-best answer items; no item is used in more than one test form. All test forms contain the same number of items per area of knowledge (speciality/subspecialty), with an approximate item distribution of 37.5% internal medicine, 25% paediatrics, 22% gynaecology-obstetrics, and 15% surgery. Applicants for each speciality are ranked from highest to lowest according to their total ENARM score. Ranked applicants receive a 'pass' certificate until the quota is met according to that speciality's available positions [6].
For each year (2012-2019), we recorded the minimum and maximum scores (calculated by dividing the absolute number of correct answers by the total number of items) clustered by nationality (Mexican or IMG) and chosen speciality (8 direct-entry specialities) that coincidentally appear in the annual CIFRHS report.
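The score calculation and the grouping by speciality and nationality can be sketched as follows. This is only an illustration: the function names and the three sample applicants are hypothetical, and expressing the score as a percentage is an assumption suggested by the magnitude of the global means reported later.

```python
# Minimal sketch; names and sample records are hypothetical.
TOTAL_ITEMS = 450  # items per test form, as stated in the text


def percent_score(correct_answers, total_items=TOTAL_ITEMS):
    """Score = correct answers / total items, expressed as a percentage (assumed scale)."""
    if not 0 <= correct_answers <= total_items:
        raise ValueError("correct_answers must lie between 0 and total_items")
    return 100.0 * correct_answers / total_items


def min_max_by_group(records):
    """Map (speciality, nationality) -> (MinSco, MaxSco).

    `records` is an iterable of (speciality, nationality, correct_answers).
    """
    grouped = {}
    for speciality, nationality, correct in records:
        grouped.setdefault((speciality, nationality), []).append(percent_score(correct))
    return {key: (min(values), max(values)) for key, values in grouped.items()}


# Made-up applicants, for illustration only.
records = [
    ("Pediatrics", "Mexican", 315),
    ("Pediatrics", "Mexican", 360),
    ("Pediatrics", "IMG", 324),
]
summary = min_max_by_group(records)
```

In the study itself the minimum and maximum scores were read directly from the CIFRHS reports, so this sketch only mirrors the definition given above.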

Part I, Comparison of the Minimum and Maximum Scores among Clinical Specialities
In the first part of our analysis, we compared the minimum (MinSco) and maximum (MaxSco) scores of the eight direct-entry clinical specialities evaluated by the ENARM (Anesthesiology, Emergency Medicine, Geriatrics, Internal Medicine, Medical Genetics, Pediatrics, Pneumology, Psychiatry); the Kolmogorov-Smirnov and Shapiro-Wilk tests showed a nonsignificant p-value for each speciality, which indicated a normal distribution of data in both variables (MinSco and MaxSco). Then, we performed a one-way ANOVA to reveal the differences in the scores achieved by each speciality; variables were tested for homogeneity of variance, and post hoc tests used the LSD (least significant difference) method. To test the assumption that MinSco and MaxSco increase every year, we assessed whether there was a significant linear trend for the scores to increase across the specialities. For this assessment, we used the Polynomial option (in the ANOVA menu of SPSS) and chose the Degree: Linear (default) option in its Contrast box. Detailed descriptions of the ANOVA test in clinical settings have been previously published by our group [15,16]. Descriptive statistics with 95% confidence intervals (C.I.) were used for each variable [17]. The effect size (the proportion of the variance in the dependent variable that the independent variable can explain) of each result was obtained using the partial eta squared (η²), defined as the ratio of the variance associated with an effect to the sum of that variance and its associated error variance. The values of η² were classified into three groups: 0.01 to 0.06 = small effect, 0.06 to 0.14 = moderate effect, and > 0.14 = substantial effect [18].
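The F statistic and the effect size described above can be illustrated with a short, dependency-free sketch. It is not the SPSS procedure used in the study: the function names and the toy data are hypothetical, the handling of the boundary values 0.06 and 0.14 is an assumption, and the "negligible" label for values below 0.01 is added only for completeness. In a one-way design, partial eta squared coincides with eta squared, SS_between / (SS_between + SS_within).

```python
def one_way_anova(groups):
    """Return (F, df_between, df_within, eta_squared) for a list of samples."""
    all_values = [value for group in groups for value in group]
    n_total = len(all_values)
    grand_mean = sum(all_values) / n_total

    # Between-groups and within-groups sums of squares.
    ss_between = sum(
        len(group) * (sum(group) / len(group) - grand_mean) ** 2 for group in groups
    )
    ss_within = sum(
        (value - sum(group) / len(group)) ** 2 for group in groups for value in group
    )

    df_between = len(groups) - 1
    df_within = n_total - len(groups)
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    eta_squared = ss_between / (ss_between + ss_within)
    return f_stat, df_between, df_within, eta_squared


def effect_size_label(eta_squared):
    """Classify eta squared with the cut-offs quoted in the text (boundaries assumed)."""
    if eta_squared > 0.14:
        return "substantial"
    if eta_squared >= 0.06:
        return "moderate"
    if eta_squared >= 0.01:
        return "small"
    return "negligible"  # label not in the source; added for completeness


# Toy data: three groups of three observations each (not study data).
groups = [[1.0, 2.0, 3.0], [2.0, 3.0, 4.0], [5.0, 6.0, 7.0]]
f_stat, df_between, df_within, eta_squared = one_way_anova(groups)
```

The p-value is omitted because it requires the F distribution, which is not available in the Python standard library.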
To visualize the results, we used line graphs showing the evolution of MinSco and MaxSco every year for each speciality. We also drew bar graphs with the global means, indicating those specialities whose means were above or below the global mean for all specialities.

Part II, Comparison of the Minimum and Maximum Scores between Mexican and IMG, Correlations, Trend Lines and Forecasting Analyses
For the second part of our analysis, we looked for significant differences between Mexican and IMG in their scores by independently analyzing each speciality.
The comparison of means was made using the independent-samples t-test. Pearson's correlation coefficient helped us reveal the direction of the trends: positive for scores increasing (↑) with every year (2012 to 2019) or negative for scores decreasing (↓).
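A minimal sketch of these two calculations is given below, assuming the pooled-variance (Student) form of the t-test. The function names and the sample values are hypothetical and are not study data.

```python
import math


def independent_t(sample_a, sample_b):
    """Student's t statistic with pooled variance; returns (t, degrees_of_freedom)."""
    n_a, n_b = len(sample_a), len(sample_b)
    mean_a, mean_b = sum(sample_a) / n_a, sum(sample_b) / n_b
    ss_a = sum((x - mean_a) ** 2 for x in sample_a)
    ss_b = sum((x - mean_b) ** 2 for x in sample_b)
    df = n_a + n_b - 2
    pooled_variance = (ss_a + ss_b) / df
    standard_error = math.sqrt(pooled_variance * (1 / n_a + 1 / n_b))
    return (mean_a - mean_b) / standard_error, df


def pearson_r(xs, ys):
    """Pearson's correlation coefficient between two equally long sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def trend_direction(r):
    """Arrow notation used in the text: positive r means increasing scores."""
    return "↑" if r > 0 else "↓" if r < 0 else "="


# Illustrative values only.
t_stat, df = independent_t([1.0, 2.0, 3.0], [2.0, 3.0, 4.0])
years = list(range(2012, 2020))
rising_scores = [60 + 0.5 * i for i in range(len(years))]
r = pearson_r(years, rising_scores)
```

As with the ANOVA sketch, the p-values are left out because they depend on the t distribution, which would need a statistics library.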

Linear Trend Lines
We calculated the trend of the MinSco and MaxSco every year for each speciality. Linear trend lines are lines of best fit used to estimate a linear relationship in the data. They have the following form: $Y = \beta_0 + \beta_1 X$, where Y is the dependent variable, X is the independent variable that affects it, $\beta_0$ is the intercept, and $\beta_1$ is the slope. They represent the simplest trend line model in that they estimate a relationship that is increasing or decreasing at a steady rate $\beta_1$ and are therefore best used when the trend of the data resembles a linear pattern. We reported the p-values and the R-squared (a measure of how well the trend line fits the data); the latter was considered the best indicator of model performance.
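An ordinary least-squares fit of such a trend line, together with its R-squared, can be sketched as follows. The function name and the example series are hypothetical; the study obtained its trend lines from the analysis software, not from this code.

```python
def fit_linear_trend(xs, ys):
    """Least-squares fit of Y = b0 + b1 * X; returns (b0, b1, r_squared)."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x

    # R-squared = 1 - residual sum of squares / total sum of squares.
    ss_residual = sum((y - (intercept + slope * x)) ** 2 for x, y in zip(xs, ys))
    ss_total = sum((y - mean_y) ** 2 for y in ys)
    r_squared = 1.0 - ss_residual / ss_total  # undefined when all ys are equal
    return intercept, slope, r_squared


# Illustrative series: a score that rises by 2 points per time step.
b0, b1, r2 = fit_linear_trend([0, 1, 2, 3], [1.0, 3.0, 5.0, 7.0])
```

A slope above zero corresponds to an increasing trend and a slope below zero to a decreasing one; R-squared close to 1 indicates that the straight line describes the yearly scores well.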

Forecasting Analyses
We forecasted our quantitative time-series data using a triple exponential smoothing method, which is also called Holt-Winters exponential smoothing [19,20]. It was applied using Tableau software. This method is used for forecasting a univariate time series when the data might have both a linear trend and a seasonal pattern. In Holt-Winters exponential smoothing, recent observations are given relatively more weight than older observations; it is suitable for short-term forecasting and uses the maximum likelihood function for estimating parameters [21]. We calculated models that captured the evolving trend or seasonality of the data and extrapolated them into the future five-year period with 95% prediction intervals.
The model used to generate the forecast had three components: Level, Trend, and Season. The value for each component might be one of the following:

1. None: The component is not present in the model.
2. Additive: The component is present and is added to the other components to create the overall forecast value.
3. Multiplicative: The component is present and is multiplied by the other components to create the overall forecast value.
The quality of the model was evaluated with five statistical values. The smoothing coefficients were optimized to weigh more recent data values over older ones, such that within-sample one-step-ahead forecast errors were minimized. Alpha is the level smoothing coefficient, Beta is the trend smoothing coefficient, and Gamma is the seasonal smoothing coefficient.
The closer a smoothing coefficient was to 1.00, the less smoothing was performed, allowing for rapid component changes and heavy reliance on recent data. The closer a smoothing coefficient was to 0.00, the more smoothing was performed, allowing for gradual component changes and less reliance on recent data [22].
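For orientation, the additive form of triple exponential smoothing is commonly written as shown below. This is the standard textbook formulation, given here as an assumption; it is not necessarily the exact parameterization implemented by the software. Here $y_t$ is the observed score, $\ell_t$ the level, $b_t$ the trend, $s_t$ the seasonal component, $m$ the season length, $h$ the forecast horizon, and $\alpha$, $\beta$, $\gamma$ the smoothing coefficients named above.

```latex
\begin{align*}
\ell_t &= \alpha\,(y_t - s_{t-m}) + (1-\alpha)\,(\ell_{t-1} + b_{t-1}) \\
b_t &= \beta\,(\ell_t - \ell_{t-1}) + (1-\beta)\,b_{t-1} \\
s_t &= \gamma\,(y_t - \ell_{t-1} - b_{t-1}) + (1-\gamma)\,s_{t-m} \\
\hat{y}_{t+h\mid t} &= \ell_t + h\,b_t + s_{t+h-m(k+1)}, \qquad k = \left\lfloor \frac{h-1}{m} \right\rfloor
\end{align*}
```

With a coefficient close to 1, the corresponding component follows the most recent observation almost entirely; with a coefficient close to 0, it changes only gradually, which matches the description of the coefficients given above.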
The forecasting method calculated a 5-year trend in the MinSco and MaxSco of each speciality; we detected a crossing point between Mexican and IMG for each medical speciality.
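A simplified sketch of such a forecast and of the search for a crossing point is shown below. It assumes an additive level and trend with no seasonal component (one of the component combinations listed above) and uses fixed smoothing coefficients, whereas the software optimizes them. The function names and all series are hypothetical.

```python
def holt_forecast(series, alpha, beta, horizon):
    """Additive level + trend smoothing (Season = None); returns `horizon` forecasts."""
    level = series[0]
    trend = series[1] - series[0]  # simple initialization (an assumption)
    for observed in series[1:]:
        previous_level = level
        level = alpha * observed + (1 - alpha) * (previous_level + trend)
        trend = beta * (level - previous_level) + (1 - beta) * trend
    return [level + step * trend for step in range(1, horizon + 1)]


def first_crossing(years, series_a, series_b):
    """First year in which the two series meet or swap order, or None."""
    differences = [a - b for a, b in zip(series_a, series_b)]
    for index, difference in enumerate(differences):
        if difference == 0 or (index > 0 and differences[index - 1] * difference < 0):
            return years[index]
    return None


# Made-up histories for two groups of test-takers (not study data).
forecast_years = [2020, 2021, 2022, 2023, 2024]
group_a = holt_forecast([70.0, 71.0, 72.0, 73.0], alpha=0.5, beta=0.5, horizon=5)
group_b = holt_forecast([73.0, 73.5, 74.0, 74.5], alpha=0.5, beta=0.5, horizon=5)
crossing_year = first_crossing(forecast_years, group_a, group_b)
```

In this toy example the first group starts lower but rises faster, so the two forecast lines meet within the forecast window; two series that never meet return None.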
We used our previously calculated global means for the MinSco and MaxSco to group the Mexican and IMG in specialities that lay above or below each speciality's mean.

Scores Included in the Analysis
For each score (MinSco and MaxSco), we evaluated 128 measures, 16 for each speciality (8 scores for Mexicans and 8 for IMG for the years 2012 to 2019), with a total of 256 measures included.
However, from the total of 256 scores, we subtracted 24 scores corresponding to years in which some specialities did not have test-takers; thus, a total of 232 scores were included in the analysis.
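The counts stated above follow from simple arithmetic, written out here for clarity:

```latex
\begin{align*}
\text{measures per score type} &= 8 \text{ specialities} \times 2 \text{ groups} \times 8 \text{ years} = 128 \\
\text{total measures (MinSco and MaxSco)} &= 2 \times 128 = 256 \\
\text{scores included in the analysis} &= 256 - 24 = 232
\end{align*}
```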

Grouping of Specialities above or below a Global Mean
We calculated a MinSco global mean of 69.133. Specialities above this mean were Internal Medicine, Anesthesiology, Pediatrics, and Pneumology. Specialities below the mean corresponded to Psychiatry, Geriatrics, Medical Genetics, and Emergency Medicine.
The global mean for the MaxSco was 79.422, and five specialities were above this mark: Internal Medicine, Pneumology, Geriatrics, Psychiatry, and Medical Genetics. The other three specialities, below the global mean, were Pediatrics, Anesthesiology, and Emergency Medicine. Figures 1A and B show the scores above or below the global mean for the clinical specialities.
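The grouping around a global mean can be sketched as follows. The sketch assumes that the global mean is the mean of the speciality means (it could also be computed over all individual scores); the function name, the speciality labels, and the values are hypothetical.

```python
def split_by_global_mean(speciality_means):
    """Return (global_mean, above, below) for a {speciality: mean score} mapping."""
    global_mean = sum(speciality_means.values()) / len(speciality_means)
    above = sorted(name for name, mean in speciality_means.items() if mean > global_mean)
    below = sorted(name for name, mean in speciality_means.items() if mean <= global_mean)
    return global_mean, above, below


# Placeholder labels and values, for illustration only.
example_means = {
    "Speciality A": 70.0,
    "Speciality B": 68.0,
    "Speciality C": 72.0,
    "Speciality D": 66.0,
}
global_mean, above, below = split_by_global_mean(example_means)
```

A speciality whose mean equals the global mean exactly is placed in the "below" group here; that tie rule is a choice made for the sketch.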

Comparison of Minimum and Maximum Scores Achieved by Clinical Specialities
The one-way ANOVA showed a significant difference among the minimum scores achieved by the eight clinical specialities; F (7, 115) = 26.611, p < .001; the η² = 0.632 indicated a large effect size. Post hoc tests showed significant differences between each clinical speciality (Bonferroni-adjusted p-value = .006). Only two pairs of speciality comparisons were non-significant: Anesthesiology vs Medical Genetics (p = 0.010) and Anesthesiology vs Pediatrics (p = 0.039). There was a significant linear trend for the scores to increase with every year, F (7, 115) = 4.167, p < .044; the η² = 0.033 indicated a small effect size.
We also found a significant ANOVA test in the comparison of the MaxSco between clinical specialities, F (7, 115) = 5.561, p < 0.001, which indicated a difference in the MaxSco among the eight specialities; the η² = 0.264 indicated a large effect size.

Comparison of Minimum and Maximum Scores between Mexicans and IMG in each Clinical Speciality
For the MinSco, the IMG obtained higher scores in all clinical specialities. However, Anesthesiology was the only speciality with a significant difference between Mexicans and IMG. For Mexicans, the highest score was in Internal Medicine, whereas for IMG it was in Pneumology.
For the MaxSco, we observed exactly the reverse trend: Mexicans obtained the higher scores in all the specialities, and the differences between scores were all statistically significant. For both Mexicans and IMG, the highest score was in Internal Medicine; the lowest was in Emergency Medicine for Mexicans and in Geriatrics for IMG. Table 2 shows the means, SD, and standard error of the mean for Mexicans and IMG in each clinical speciality; p-values were calculated with the independent t-test.

Modelling of Linear Trends
All linear trend models were computed for the median maximum or minimum score given the year, according to the formula: Type of test-taker * (Year of years + Intercept). Table 4 shows the R-squared and p-values of the trend lines for the minimum and maximum scores grouped by the eight selected specialities. Figure 2 depicts the mathematical model for each trend line grouped by medical speciality. Figure 3 shows the graphical representation of the observed means and linear trends for both Min and Max scores.

Comparison of 5-Year Forecasting Trends between the Minimum and Maximum Scores of Mexicans and IMG
We identified convergent and divergent forecasting trends between each speciality's minimum and maximum scores, depending on whether the lines will or will not eventually touch each other during or after the 5-year forecasted period (2020-2024).
Five specialities showed a convergent pattern for Mexicans between the MinSco and MaxSco: Anesthesiology, Internal medicine, Medical Genetics, Geriatrics, and Pneumology; three showed a divergent pattern: Emergency medicine, Pediatrics, and Psychiatry.
In IMG, one speciality depicted a convergent trend: Pediatrics; five specialities had a divergent tendency: Anesthesiology, Internal medicine, Medical Genetics, Emergency medicine, Pediatrics, Psychiatry. For Geriatrics and Pneumology, the software could not calculate forecasting graphs because there were not test-takers in every evaluated year. Figure 4 shows the forecasted trends between MinSco and MaxSco for Mexicans and IMG. Figure 5 presents the description of the forecasted models grouped by speciality (definitions for the different components were described in the methods section).

Figure 5: Description of the forecasting models grouped by speciality. Geriatrics and Pneumology were not included because the fewer years available did not allow reliable models to be calculated.

Comparison of Clinical Specialities' Scores Obtained in the ENARM. International Journal of Statistics in Medical Research, 2021, Vol. 10, 57

Ranking of Specialities between Mexicans and IMG
Additionally, we ranked the specialities based on the MinSco between Mexicans and IMG for each speciality. Adjacent rows with connecting arrows show the displacement in the ranking from the initial rank each speciality reached for Mexicans compared with their position for IMG.
For the MinSco, it was evident that the ranking of medical specialities was different between both groups: three specialities in the Mexican ranking (Pneumology, Psychiatry, and Medical genetics) went up when compared with the IMG; three moved down (Internal medicine, Geriatrics, and Pediatrics), and only two (Anesthesiology and Emergency medicine) had the same ranking for Mexicans and IMG.
For the MaxSco, the ranking of medical specialities differed between both groups in almost all the specialities: four specialities went up in the Mexican ranking (Anesthesiology, Pneumology, Emergency medicine, and Medical genetics) when compared with the IMG; three moved down (Psychiatry, Geriatrics, and Pediatrics), and only Internal medicine had the same ranking for Mexicans and IMG. Figure 6 shows the ranking displacement in Mexican specialities (MinSco and MaxSco) when we compared them with the scores of IMG.
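The ranking displacement described above can be sketched as follows. The function names, the speciality labels, and the scores are hypothetical; the sign convention (a positive value means the speciality ranks higher among Mexicans than among IMG) is an assumption made for this illustration.

```python
def rank_by_score(scores):
    """Return {speciality: rank}, where rank 1 is the highest score."""
    ordered = sorted(scores, key=scores.get, reverse=True)
    return {name: position for position, name in enumerate(ordered, start=1)}


def rank_displacement(mexican_scores, img_scores):
    """IMG rank minus Mexican rank per speciality (positive = higher among Mexicans)."""
    mexican_ranks = rank_by_score(mexican_scores)
    img_ranks = rank_by_score(img_scores)
    return {name: img_ranks[name] - mexican_ranks[name] for name in mexican_ranks}


# Placeholder labels and values, for illustration only.
mexican_scores = {"Speciality A": 80.0, "Speciality B": 78.0, "Speciality C": 75.0}
img_scores = {"Speciality A": 77.0, "Speciality B": 70.0, "Speciality C": 74.0}
displacement = rank_displacement(mexican_scores, img_scores)
```

A displacement of 0 corresponds to a speciality that holds the same rank in both groups. Tied scores keep their input order in this sketch, so a dedicated tie rule would be needed for real data.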

DISCUSSION
Residency is a critical step in a physician's education; matching into a residency program is a competitive process of selection by both applicants and program directors [23]. Residency program directors usually do not make a decision based only on the test scores of the applicants. They need a more comprehensive evaluation and therefore receive large amounts of information about applicants, including academic transcripts, the medical student performance assessment, letters of recommendation, and others [24]; a 2006 survey showed that 2,528 program directors chose their top academic selection criteria based on clinical performance [25].
Thus, the results will benefit four groups of actors interested in the processes of a successful match: ENARM applicants, education department directors, medical school advisors, and medical students who are planning to enter a residency program. The strengths of our study lie in the different approaches used to analyze the information. We compared the means of eight clinical specialities and the differences between Mexican and IMG scores, and we calculated correlations and linear trends, 5-year forecasts, and the ranking displacement for Mexicans and IMG in each speciality. Reporting information about a pattern in the assessments across specialities has been considered valuable to residents and program directors [26].

Educational Framework
The preparation for the exam should: motivate the learner through improvement in real-life, final performance; take into account the learner's preexisting knowledge (learning curve); allow repetition of the skills multiple times; be accompanied by immediate feedback; and be varied (mixed) across content areas. We think the significantly different scores between Mexicans and IMG might primarily represent a lack of practice and of direct supervision of skills acquisition (answering previous exam models). Knowing the clinical field scores in advance is relevant to predicting performance during residency. As a 2019 article showed, performance on USMLE Step 2 CK correlated with higher scores on residency tests and with better clinical performance [24].
Publications about the Mexican ENARM have triggered great interest in the medical community in recent years; some authors have published descriptive reports about the scores of schools and faculties of medicine [3]; other authors have revealed flaws in the design of the ENARM that produce inequity [6,27]; a recent study was published about the performance of IMG in the ENARM but without a comparison with Mexicans [8]. To the best of our knowledge, there are no publications about the ENARM that have presented a comparison of scores in clinical specialities between Mexicans and IMG; that situation did not allow us to compare most of our results with other literature.

Grouping of Specialities above or below a Global Mean
Using an overall mean to compare values above or below this mark helps reflect the performance of eight different groups of test-takers and revealed which specialities had the students with the best scores. The ENARM global mean for the minimum score (from 2012 to 2019) was 69.133, a score above the previous observation made in a study by de la Garza-Aguilar [4]; this number is also above the mean of 57.29 for the last seven years of the test known as MIR (Medical Intern Resident) in Spain, reported by the Ministry of Health [28,29]. Our findings showed that the clinical specialities whose applicants achieved scores above this mean were Internal medicine, Anesthesiology, Pediatrics, and Pneumology. This observation of high scores at the ENARM contrasts with the matching program results in the USA [30,31]. The specialities below the mean corresponded to Emergency Medicine and Anesthesiology.

Comparison of Minimum and Maximum Scores Achieved by Clinical Specialities
During the eight years assessed, it was evident that the ranking of the eight clinical specialities was preserved for the MinSco (Figure 1D): the specialities with the upper values were Internal Medicine and Pneumology, and those with the lower values were Emergency Medicine and Anesthesiology. In contrast, for the MaxSco, although an entanglement of scores was evident over the eight years, representing changes in the ranking of the clinical specialities in different years, Internal Medicine and Emergency Medicine were dominant with the upper and lower scores, respectively (Figure 1C).

Comparison of Minimum and Maximum Scores between Mexicans and IMG in each Clinical Speciality
Our findings revealed that Mexicans and IMG obtained mostly similar passing grades, which might indicate an equivalent level of education in their medical schools; however, for Pneumology, Anesthesiology, and Emergency Medicine, the IMG obtained scores up to 2 percentage points higher (Table 2). This finding differs from a previous report from the USA covering 8 years of orthopaedic surgery residency applicants, in which national applicants obtained better scores than IMG [32]. The absence of significant differences in the minimum scores between Mexicans and IMG in most specialities can also be interpreted as high competitiveness across all specialities (Table 2). However, the MaxSco revealed the superiority of Mexicans over IMG for all specialities, and all specialities showed a significant difference (Table 2), which reflected a better level of preparation for this exam. This score revealed a significant gap in knowledge between Mexican and IMG test-takers [33].

Positive and Negative Trends in the Minimum and Maximum Scores between Mexicans and IMG in each Clinical Speciality
The limited information about trends for applicants matching into USA specialities has been previously addressed. Most foreign articles describe the performance of specific specialities without comparing their nationals and IMG [34]. We learned from our findings that there is still missing information: we do not know which speciality scores are driven by the applicants each year and which by the level of difficulty of the exam. An additional analysis will be necessary to understand how the number of residency positions influences the scores in each medical speciality.

Comparison of 5-Year Forecasting Trends between the Minimum and Maximum Scores of Mexicans and IMG
The predictive graphs help us understand that, for Mexicans, the gap between MinSco and MaxSco will decrease for Anesthesiology, Internal medicine, and Medical genetics; for IMG, however, it will decrease for Pediatrics and Medical genetics. This means that only 3 of the 8 clinical specialities (Emergency medicine, Medical genetics, and Psychiatry) share the same learning trend between Mexicans and IMG.

Ranking of Specialities between Mexicans and IMG
From this analysis, we learned that Mexicans achieved higher scores for MaxSco in the eight clinical specialities; in contrast, IMG obtained higher values for their MinSco (Figure 4). For the MaxSco, the speciality with the highest scores is Internal medicine. This fact represents a challenge for future applicants, as they would have to obtain the best scores to be selected for a residency position (Figure 4).

Limitations of the Study
Several limitations need to be acknowledged for this study. With the ENARM, the Mexican Secretariat of Health selects the best candidates each year with reasonable confidence, but a number of applicants much higher than the number accepted is left without entering a medical speciality; we did not analyze those numbers, as this topic was outside the scope of this study. Also, we did not comment on the context regarding the supply and demand of Mexican physicians per number of inhabitants; in 2015, Mexico had 2.2 physicians per 1,000 population, including professionals in the private sector, and these numbers represent a significant disparity in the distribution of human health resources in the country. We could not determine which medical schools the test-takers with the highest scores came from, as this information was not available in the annual CIFRHS reports. Our assessment did not examine subgroup performance differences by age, gender, test-takers' race, or English as a second language because these items were not publicly available. The same limitations have been addressed in previous reports for the USMLE. Residency program directors look in the ENARM results for the best candidates for their programs, considering all aspects of a student's application and an interview; however, we did not take into account intangible factors such as away rotations, personal interactions, membership, and research experience, although all of them might influence the chance of matching [23]. Other topics not included in this study were the need to examine whether there is an ideal applicant-to-position ratio that would allow clinical residency coordinators to remain selective in their choices, and whether increasing the number of clinical residency positions would dilute the quality of successful candidates.
In conclusion, our study provides objective and valuable information for residency program directors looking for the best candidates for their programs and also to applicants, revealing that ENARM represents a market of high-performance test-takers across the clinical specialities. Mexicans and IMG achieved similar entrance scores, but Mexicans showed a higher MaxSco than IMG in all clinical specialities. The comparisons using scores will allow program directors to compare academic performance across specialities and understand their competitiveness and evolution in recent years. Future studies are needed to explore if ENARM scores can predict performance on subsequent speciality assessments in training and certification examinations.