FORENSIC CORNER - ORIGINAL ARTICLE
|Year : 2021 | Volume : 25
| Issue : 3 | Page : 554-555
Inter-observer agreement in the radiographic interpretation of Demirjian's developmental stages in the mandibular second and third molars – A comparative study
Jayasankar P Pillai1, Debesh Nilendu2, Namitha Thomas3, Sugandha Nagpal4, Lakshmi Sai Sneha Nedunari5
1 Department of Oral and Maxillofacial Pathology, Government Dental College and Hospital, Ahmedabad, India
2 Medical Officer, Health India Insurance TPA, Vadodara, Gujarat, India
3 Forensic Odontologist & Consultant Dental Surgeon, Kannur, Kerala, India
4 Scientific Officer, Sherlock Institute of Forensic Science, New Delhi, India
5 General Forensic Assistant, Delhi Police Crime Branch, New Delhi, India
|Date of Submission||15-Mar-2021|
|Date of Acceptance||15-Nov-2021|
|Date of Web Publication||11-Jan-2022|
Jayasankar P Pillai
Department of Oral and Maxillofacial Pathology, Government Dental College and Hospital, Ahmedabad - 380 016, Gujarat
Source of Support: None, Conflict of Interest: None
| Abstract|| |
Background: The developmental stages of the teeth in radiographs are graded on an ordinal scale. The present study was conducted using 123 digital orthopantomograms from individuals in the age group of 5 to 22 years to analyze and evaluate the inter-observer agreement in grading the developmental stages of the second and third molars. Four observers with different levels of practical experience in age estimation participated in the study. The developmental stages of both molars in the lower left quadrant (3rd quadrant) were assigned based on Demirjian's 10-stage chart. The percentage agreement and Kappa statistics were used to test the agreement between the observers. The data of observer 1 were used as the standard for comparison.
Results: There was 70.0%–75.6% agreement among the observers in staging the second molar and 52%–68.3% for the third molar. There was an excellent agreement (k > 0.80) between observer 1 and observer 2 and a good agreement (k = 0.60–0.79) between observer 1 and the other two observers for both molars. The Fleiss Kappa revealed a moderate to good overall agreement for both molars (k = 0.51–0.66). Friedman's test revealed a significant difference in the grading of the third molar between all the raters (χ2 = 25.02, df = 3, P < 0.001), whereas for the second molar the difference was not significant (χ2 = 3.89, df = 3, P > 0.05). The stage-wise overall agreement was fair for Stage 3 in the second molar and Stage 9 and Stage 4 in the third molar.
Conclusions: Proper training in the radiographic interpretation of developmental stages may minimize errors during age estimation.
Keywords: Demirjian's method, dental age estimation, inter-observer agreement, kappa statistics, orthopantomograms, percentage agreement
|How to cite this article:|
Pillai JP, Nilendu D, Thomas N, Nagpal S, Sneha Nedunari LS. Inter-observer agreement in the radiographic interpretation of Demirjian's developmental stages in the mandibular second and third molars – A comparative study. J Oral Maxillofac Pathol 2021;25:554-5
|How to cite this URL:|
Pillai JP, Nilendu D, Thomas N, Nagpal S, Sneha Nedunari LS. Inter-observer agreement in the radiographic interpretation of Demirjian's developmental stages in the mandibular second and third molars – A comparative study. J Oral Maxillofac Pathol [serial online] 2021 [cited 2022 Aug 18];25:554-5. Available from: https://www.jomfp.in/text.asp?2021/25/3/554/335552
| Introduction|| |
In forensic dentistry, age estimation from radiographs for medico-legal purposes is one of the main domains, and panoramic radiographs are often used for this purpose. The reliability and acceptability of age estimation methods based on graded interpretation of dental radiographs depend on the repeatability and the agreement between observers when a particular radiograph is graded by more than one observer (inter-observer) or repeatedly by the same observer (intra-observer). Hence, studies testing the reliability and agreement are also important. The quality of such studies depends on the amount of error inherent to scoring or grading the developmental stages of teeth as observed in the radiographs. Using the radiographs, dental maturity, the degree of mineralization, and the extent of root formation are assessed based on certain preexisting standards. The eight-stage (A-H) standards developed by Demirjian et al. and later modified into a 10-stage numeric scale (0–10) by Chaillet and Demirjian are the most widely accepted methods worldwide. These scales also formed the basis for developing population-specific and tooth-specific age estimation methods. As ordinal scales are used to grade the developmental stages of teeth, there exists a chance for subjectivity in grading by different raters or observers. Radiographic studies based on heterogeneous groups of observers are thus needed. Discrepancies between observers in the range of ± one stage have been observed while grading the teeth using radiographs. Studies have found significant effects of intra- and inter-observer variations on age estimation from the assessment of the stage of development; the variation is greater between observers, with the majority confined to one developmental stage.
Furthermore, the level of agreement can be improved by using reference radiographs for calibration and by prior and repeated calibrations of the observers. The level of practical experience by the observers also plays an important role while interpreting the developmental stage of teeth for age estimation purposes. In the dental undergraduate curriculum, dental students are exposed to the theoretical and practical training on dental age estimation methods using dental models and X-rays. This may be an academic approach to teach dental age estimation methods, but when dealing with real-life forensic cases, the forensic odontologists or any qualified dentists need to be highly trained to confidently and accurately handle the age estimation cases. The dental age estimation by radiographic methods is one of the sought after topics in several fellowships and hands-on training programs in forensic odontology. However, the issue of subjectivity or inter-observer variations in grading the developmental stages is often encountered both during research and practical applications. The level of agreement between observers when dealing with ordinal data is usually tested using Kappa statistics. The kappa values of <0.20 suggest poor agreement, 0.21–0.40 fair agreement, 0.41–0.60 moderate agreement, 0.61–0.80 good agreement, and >0.81 excellent agreement. The purpose of this study was to investigate how consistently multiple observers assign the dental developmental stages for mandibular second and third molars using the digital orthopantomograms (OPGs). The present study was conducted with four independent observers, with different levels of experience in dental age estimation. 
The first observer was an experienced forensic odontologist with additional training in dental age estimation who has performed dental age estimation in several medico-legal cases and conducted hands-on training in dental age estimation as a trainer in several continuing dental education programs. The second and the third observers were dental graduates with postgraduate training in forensic odontology. Both had different levels of training in dental age estimation before the start of the study. Observer 2 had both theoretical and hands-on practical training in age estimation, whereas observer 3 had only theoretical training, without much hands-on practical exposure. The fourth observer was a general dentist without any basic training in dental age estimation methods.
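The verbal bands for kappa quoted in the Introduction can be written as a small lookup. This is a minimal sketch, not part of the study's analysis: the function name `interpret_kappa` is hypothetical, and because the quoted ranges leave small gaps (for example between 0.20 and 0.21), the sketch assumes each band's upper bound is inclusive.

```python
def interpret_kappa(kappa: float) -> str:
    """Map a kappa value to the verbal bands quoted in the Introduction."""
    # Assumption: a value exactly on a boundary (e.g. 0.20) takes the lower
    # band, and values in the gaps between the quoted ranges (e.g. 0.205)
    # take the higher band. The article does not specify this.
    bands = [(0.20, "poor"), (0.40, "fair"), (0.60, "moderate"), (0.80, "good")]
    for upper, label in bands:
        if kappa <= upper:
            return label
    return "excellent"


# Overall Fleiss kappa values reported in the Results section.
for value in (0.66, 0.51):
    print(value, interpret_kappa(value))
```

Under these assumptions, the reported overall values of 0.66 and 0.51 fall in the good and moderate bands, respectively.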
| Methods|| |
The sample comprised digital panoramic radiographs from a total of 123 individuals (63 females, 60 males) aged 5 to 22 years with a mean age of 13.79 ± 3.62 years. Participants with similar socioeconomic status and ethnic origin were included in the study. The radiographs were retrieved from the archives of the departments of orthodontia and pedodontia of the institute and came from patients from diverse regions of Gujarat seeking orthodontic and pediatric dental treatments. The radiographs were pre-treatment in nature and belonged to healthy individuals with no obvious developmental anomalies, especially in the mandibular second and third molar regions. The developmental stages of the mandibular second and third molars in the lower left quadrant were evaluated based on Demirjian and Chaillet's 10-stage chart and description. The radiographs were randomly selected by one of the authors and were sent to the remaining four observers, who were blinded to the sex and age of the patients. Thus, the four observers independently assigned the developmental stages of both molars. The grading of observer 1 was considered the reference standard for comparison. The chronological age was calculated from the date of birth and the date of X-ray using an Excel sheet (Microsoft®, Redmond, Washington, U.S). To assess the intra-observer error in grading the stages of teeth, a set of 20 OPGs from the existing set of OPGs was re-evaluated by all four observers for both molars.
The inter-observer reliability was determined using Kappa statistics, based on the data from each observer. The degree of agreement between each pair of observers was assessed by calculation of weighted Kappa (κ) statistics. The overall agreement in the grading of stages among all the four raters was tested using Fleiss Kappa. The intra-observer difference in grading the stages was tested using the Wilcoxon signed-rank test. The overall difference in the grading between the observers for all the four teeth was tested using Friedman's non-parametric test, which is an alternative to the one-way ANOVA with repeated measures. This test compares the mean ranks between the related groups and indicates how the groups differ. The analyses were performed using the Statistical Package for the Social Sciences (SPSS) software version 26 (IBM Corp., Armonk, NY). P < 0.05 was set as the significance threshold. Microsoft Office Excel® 2007 (Microsoft®, Redmond, Washington, U.S) was used for the descriptive statistics and to calculate the percentage agreement between the pairs of raters.
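The chronological age calculation described above (date of X-ray minus date of birth) was done in Excel by the authors. The following is only an illustrative sketch of the same arithmetic; the function name `chronological_age` and the 365.25 days-per-year convention are assumptions, since the article does not state the convention it used.

```python
from datetime import date


def chronological_age(date_of_birth: date, date_of_xray: date) -> float:
    """Age in decimal years on the day the radiograph was taken."""
    if date_of_xray < date_of_birth:
        raise ValueError("radiograph date precedes date of birth")
    # Assumption: an average year length of 365.25 days.
    return (date_of_xray - date_of_birth).days / 365.25


# Hypothetical dates, for illustration only.
age = chronological_age(date(2010, 1, 1), date(2020, 1, 1))
print(round(age, 2))
```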
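The pairwise measures named above (percentage agreement and weighted kappa) can be sketched in plain Python. This is a hedged illustration, not the SPSS or Excel procedure the authors ran: the function names are hypothetical, stages are assumed to be coded as consecutive integers starting at 0, and linear disagreement weights are assumed because the article does not say which weighting scheme was used.

```python
from collections import Counter


def percent_agreement(ref, other):
    """Percentage of radiographs given the identical stage by both observers."""
    if len(ref) != len(other) or not ref:
        raise ValueError("both observers must grade the same non-empty set")
    same = sum(a == b for a, b in zip(ref, other))
    return 100.0 * same / len(ref)


def weighted_kappa(ref, other, n_stages=10):
    """Linearly weighted Cohen's kappa for stages coded 0..n_stages-1."""
    n = len(ref)
    if n != len(other) or n == 0:
        raise ValueError("both observers must grade the same non-empty set")
    max_dist = n_stages - 1
    # Observed mean disagreement: |stage difference| scaled to the range 0..1.
    observed = sum(abs(a - b) for a, b in zip(ref, other)) / (n * max_dist)
    # Disagreement expected by chance, from each observer's stage frequencies.
    ref_counts, other_counts = Counter(ref), Counter(other)
    expected = sum(
        ref_counts[i] * other_counts[j] * abs(i - j)
        for i in ref_counts
        for j in other_counts
    ) / (n * n * max_dist)
    if expected == 0:
        raise ValueError("kappa is undefined when both observers use one stage only")
    return 1.0 - observed / expected


# Hypothetical gradings of eight radiographs, for illustration only.
observer_1 = [3, 4, 5, 6, 7, 7, 8, 9]
observer_2 = [3, 4, 5, 7, 7, 6, 8, 9]
print(percent_agreement(observer_1, observer_2))
print(weighted_kappa(observer_1, observer_2))
```

With linear weights, a one-stage disagreement is penalized less than a three-stage disagreement, which suits an ordinal staging chart where most discrepancies are expected to fall within one stage.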
| Results|| |
The intra-observer agreement in grading the test and retest samples showed a significant moderate to near-excellent agreement (k = 0.512–0.876; P < 0.001) for both molars by all the observers [Table 1]. The Wilcoxon signed-rank test also showed no significant difference (P > 0.05) in the grading between the test and the retest samples by all the observers for both teeth. [Table 2] and [Table 3] illustrate the frequency distribution of the second and the third molar, respectively, according to their developmental stages by all the four observers. The percentage agreement in grading the stages between the pairs of observers, with observer 1's data as reference, ranged between 70.7% and 75.6% in the second molars and between 52% and 68.3% in the third molars [Table 4] and [Table 5]. The disagreement was maximum for stage 7 in the second molar and stage 3 in the third molar. Cohen's weighted kappa statistics revealed a good (k = 0.61–0.80) to an excellent agreement (k ≥ 0.80) between the reference data set of observer 1 and the other three observers for both molars [Table 6]. The overall agreement among all the observers, tested using the Fleiss kappa statistics, yielded kappa (k) values of 0.66 and 0.51 for the second and third molars, respectively [Table 7]. The nonparametric tests revealed a significant difference in the grading among all the observers for the third molar, whereas the second molar grading did not show any significant difference [Table 8]. The pair-wise comparison of grading by observer 1 (reference data) with that of the other observers, done using the Wilcoxon signed-rank test, revealed a significant difference with observer 3 and observer 4 for the third molar [Table 9]. The Fleiss Kappa agreement on individual stage categories among all the observers in the second molar revealed an insignificant poor agreement for Stage 3. Stages 4 and 5 showed an excellent significant agreement, whereas Stages 7 and 8 showed a moderate agreement.
For the third molar, stage 4 showed fair agreement. Stage 0 showed a good agreement and the remaining stages showed moderate agreement among the observers [Table 10].
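The overall and stage-wise Fleiss kappa values reported above can be illustrated with a short sketch. It follows the standard Fleiss formulas and is an assumption-laden illustration, not a reproduction of the SPSS output: the function name `fleiss_kappa` and the input layout (one list of observer gradings per radiograph) are hypothetical.

```python
from collections import Counter


def fleiss_kappa(ratings):
    """Return (overall kappa, {stage: per-stage kappa}).

    ratings: one list per radiograph, holding the stage given by each observer.
    """
    n_subjects = len(ratings)
    n_raters = len(ratings[0]) if ratings else 0
    if n_raters < 2 or any(len(r) != n_raters for r in ratings):
        raise ValueError("every radiograph needs the same number (>= 2) of gradings")
    counts = [Counter(r) for r in ratings]
    stages = sorted({s for r in ratings for s in r})
    total = n_subjects * n_raters
    # Share of all gradings that fell into each stage.
    p = {s: sum(c[s] for c in counts) / total for s in stages}
    # Mean proportion of agreeing observer pairs per radiograph.
    p_bar = sum(
        (sum(v * v for v in c.values()) - n_raters) / (n_raters * (n_raters - 1))
        for c in counts
    ) / n_subjects
    p_exp = sum(v * v for v in p.values())
    if p_exp == 1:
        raise ValueError("kappa is undefined when only one stage is ever used")
    overall = (p_bar - p_exp) / (1 - p_exp)
    per_stage = {}
    for s in stages:
        # Disagreeing pairs where exactly one observer of the pair chose stage s.
        disagree = sum(c[s] * (n_raters - c[s]) for c in counts)
        per_stage[s] = 1 - disagree / (total * (n_raters - 1) * p[s] * (1 - p[s]))
    return overall, per_stage


# Hypothetical gradings by four observers, for illustration only.
example = [[4, 4, 4, 4], [5, 5, 5, 4], [7, 8, 7, 7], [8, 8, 9, 8]]
print(fleiss_kappa(example))
```

The per-stage values show which stages drive the overall figure, which is how a single poorly agreed stage (such as Stage 3 in the second molar) can be identified.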
|Table 1: Intra-observer agreement in staging the development stages of #37 and #38 tested using Cohen's kappa test and Wilcoxon signed rank test|
|Table 2: The frequency distribution of the second molar (#37) based on their developmental stages by all the observers|
|Table 3: The frequency distribution of the third molars (#38 and #48) based on their developmental stages by all the observers|
|Table 4: The percentage agreement in the grading of development stages of second molar (#37) between the observer 1 and other observers|
|Table 5: The percentage agreement in the grading of development stages of third molar (#38) between the observer 1 and other observers|
|Table 6: The results of the weighted Kappa statistics between pairs of observers|
|Table 7: The results of the Fleiss Kappa statistics among all the four observers|
|Table 8: The results of the Friedman's nonparametric test comparing the developmental stages recorded by all the four observers (tooth-wise)|
|Table 9: The pairwise comparisons of the grading of developmental status of molars between the reference observer 1 and other observers|
|Table 10: The overall Fleiss Kappa agreement on individual stage categories in both the teeth|
| Discussion|| |
In the context of dental age estimation using ordinal scores, it is very important to ensure that individual assessors can assign the tooth development stage reliably and consistently. Demirjian's method of age estimation has been considered a landmark in radiographic age assessment techniques globally, and researchers have gradually come up with modifications through repeated use in different ancestries. However, this method leaves room for subjectivity when assigning the developmental stages. The pioneers of the method did assess the impact that the agreement between observers could have on the accuracy of the estimated age. The inter-examiner variability in their study was <10% with a difference of only one stage. Furthermore, the examiners were well trained by one of them, which improved the agreement of the results. Thus, the extent of the agreement between observers could have a direct effect on the accuracy of the estimated age. The agreement is based on the closeness of the stages between two observers. An extensive work by Levesque et al. in 1980 reported an accuracy rate of 80% in assigning the tooth development stages, and in the remaining 20%, the stage assessment between observers varied by plus or minus one stage. The inter-observer agreement in assigning ordinal scales, like the one in Demirjian's method, is usually assessed using the Kappa statistics, which test the closeness or equality of grades between the observers. Cohen's kappa coefficient compares the observed probability of disagreement to the probability of disagreement expected by chance. The kappa coefficients vary between −1 and 1. Perfect agreement (k = 1) is obtained when no disagreement is observed. A value of zero (k = 0) indicates that the observed probability of disagreement equals that expected by chance, while negative values indicate that the observed probability of disagreement is larger than what is expected by chance.
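The disagreement-based reading of kappa described above can be written compactly. The notation here is an assumption introduced for illustration and does not appear in the article: $p_{ij}$ is the proportion of radiographs graded stage $i$ by one observer and stage $j$ by the other, $p_{i\cdot}$ and $p_{\cdot j}$ are the two observers' marginal proportions, and $w_{ij}$ is a disagreement weight with $w_{ii} = 0$.

```latex
\kappa_w \;=\; 1 \;-\; \frac{\sum_{i,j} w_{ij}\, p_{ij}}{\sum_{i,j} w_{ij}\, p_{i\cdot}\, p_{\cdot j}}
```

The numerator is the observed disagreement and the denominator is the disagreement expected by chance, so $\kappa_w = 1$ when no disagreement is observed and $\kappa_w = 0$ when the two are equal. Setting $w_{ij} = 1$ for all $i \neq j$ gives the unweighted Cohen's kappa, while a choice such as $w_{ij} = |i - j|$ penalizes larger stage differences more heavily.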
In the present study, the data of observer 1 were used as the standard for comparisons. Although there was a good agreement when the data of all the observers were compared together, the pair-wise comparisons revealed an excellent agreement (k > 0.80) between the reference data (observer 1) and observer 2, who had undergone hands-on training in grading and had more experience than the remaining observers at the time of the study. The percentage agreement with observer 1 was least for observer 4, who had neither theoretical nor practical training in the radiographic dental age estimation method. A reliability study on grading the stages of both the upper and lower third molars was reported earlier. That study compared the inter- and intra-observer agreement levels in four different age estimation methods. It showed that Demirjian's method had a good agreement when compared to other methods and that the mandibular third molars showed the highest agreement between observers. The present study assigned the stages of only the mandibular second and third molars. These two molars are considered valuable age indicators in adolescents and young adults. The overall interobserver agreement was higher for the second molar. Inter-observer variability of one stage was reported in Norwegian children. In that study, the two observers differed by one earlier stage in 10.5% of the scores and by one later stage in 11.6% of the cases. In the present study, it was observed that while grading the stage of third molars, observers 2, 3, and 4 selected an advanced stage (positive ranks) in 19.5%–43.09% of cases when compared to observer 1. The study also points out that agreement or near-perfect identification of the correct stage could be possible with additional hands-on training in identifying the correct stage using the standard reference.
Between observer 1 and observer 2, it was observed that observer 2 gave an advanced grading in the second molar, whereas in the third molar, observer 2 gave a lesser grading when compared with observer 1. In another study comparing the results of applying Demirjian's and Nolla's methods, the overall inter-observer agreement was 0.98 for both methods. The present study compared only Demirjian's staging in second and third molars and found an overall inter-observer agreement of 0.66 and 0.51 for second and third molars, respectively. The intra-observer agreement in grading the development stages in molars ranged from 0.51 to 0.876 for all the four observers. Studies have shown an intra-observer agreement from 0.79 to 0.94. The interobserver variability in staging the development stages of teeth using the intraclass coefficient revealed a Cronbach's alpha value of 0.759 and 0.259 for second and third molars, respectively. In the present study too, the interobserver agreement in the third molar was lower than in the second molar. Commonly in various studies, transitory stages of Demirjian and Chaillet's methods, such as stages D and E, have been found to have maximum observer agreement. This is consistent with the present study. Surprisingly, although a high rate of discrepancies has been reported in scoring the beginning or end stages (Stage A, Stage G, or Stage H), there was inter-observer reliability at Stage 8 in the present study. Furthermore, if the traits of the various stages of the method used are more defined and dichotomized, observers find it easier to give consistent and accurate scores. The scientific community recommends the use of different methods and repetitive observations to enhance the reproducibility among examiners. The use of more reference radiographs together with assisted magnification and digital images has been shown to provide fairly higher precision due to the calibration of the readings.
The third molars exhibit marked differences in terms of formation, eruption morphology, and agenesis compared to the second molar. This could account for the significant difference in scoring and the lesser agreement by all the observers in this study. Conversely, it forms the basis of many other certified methods. This again necessitates the use of more projections for calibration and a better outcome. Therefore, in the case of third molars, a scoring method should comprise the entirety of the maturation sequence with clear-cut demarcations. A sound statistical approach in conjunction with a practical one, which not only renders appropriate estimation but also avoids misassumptions, is hence imperative for age evaluation practices. More studies such as the present one, with interdisciplinary observer error assessment, would be welcome to improve the approach towards age estimation. A properly applied statistical approach and the experience of the forensic odontologist are essential for the success of any age estimation method. The more stages a scoring method has, the better its precision, owing to well-defined objective measures. However, after a point, additional stages decrease the reproducibility as a result of increased confusion among observers. Thus, a balance between detailed and sufficient staging along with practical feasibility with fair consistency and precision is always desirable for best results. This makes a method straightforward enough to eliminate stage overlaps and facilitate better learning among forensic odontologists. Good inter-observer agreement is also essential in other age estimation methods where radiographs are not applicable. Budding forensic odontologists may undergo exclusive training on the radiographic grading of tooth development for age estimation from experienced forensic odontologists.
From a legal perspective, age estimation methods need to provide assessments that are accurate and reproducible with minimal or nil interobserver variability. This is possible through the correct and uniform identification of the development stages of teeth by forensic odontologists irrespective of their practical experience in age estimation.
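The share of cases in which an observer selected a more advanced stage than the reference (the positive ranks mentioned above) can be tallied as follows. This is a minimal sketch with hypothetical names and data; it only counts the direction of each difference and does not reproduce the Wilcoxon signed-rank test itself.

```python
def rank_directions(ref, other):
    """Percentage of cases graded higher, lower, or equal relative to `ref`."""
    n = len(ref)
    if n != len(other) or n == 0:
        raise ValueError("both observers must grade the same non-empty set")
    higher = sum(b > a for a, b in zip(ref, other))  # positive ranks
    lower = sum(b < a for a, b in zip(ref, other))   # negative ranks
    ties = n - higher - lower
    return {
        "positive": 100.0 * higher / n,
        "negative": 100.0 * lower / n,
        "ties": 100.0 * ties / n,
    }


# Hypothetical third-molar gradings, for illustration only.
reference = [3, 4, 5, 6]
observer = [4, 4, 5, 5]
print(rank_directions(reference, observer))
```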
| Conclusions|| |
Based on the above study, it can be concluded that:
- The grading of the developmental stages differed significantly among observers for the third molar and did not differ significantly for the second molar
- There was moderate to substantial agreement between the reference data and the data from the other three observers for the second molar; for the third molar, there was fair to substantial agreement between the reference data and the data from the other three observers
- Proper hands-on training in radiographic interpretation of developmental stages may help minimize errors during age estimation.
The authors wish to thank Dr. Rajarajeswari, Consultant Dental Surgeon, Ahmedabad, for her contribution to this study. The authors also express their gratitude to Dr. Puneet Gupta, Associate Professor, Public Health Dentistry, Govt. Dental College and Hospital, Indore, for the statistical assistance in the study.
Financial support and sponsorship
Conflicts of interest
There are no conflicts of interest.
| References|| |
Nolla CM. The development of permanent teeth. J Dent Child 1960;27:254-66.
Moorrees CF, Fanning EA, Hunt EE Jr. Age variation of formation stages for ten permanent teeth. J Dent Res 1963;42:1490-502.
Demirjian A, Goldstein H, Tanner JM. A new system of dental age assessment. Hum Biol 1973;45:211-27.
Chaillet N, Demirjian A. Dental maturity in South France: A comparison between Demirjian's method and polynomial functions. J Forensic Sci 2004;49:1059-66.
Mincer HH, Harris EF, Berryman HE. The ABFO study of third molar development and its use as an estimator of chronological age. J Forensic Sci 1993;38:379-90.
Willems G, Van Olmen A, Spiessens B, Carels C. Dental age estimation in Belgian children: Demirjian's technique revisited. J Forensic Sci 2001;46:893-5.
Acharya AB. Age estimation in Indians using Demirjian's 8-teeth method. J Forensic Sci 2011;56:124-7.
Moness Ali AM, Ahmed WH, Khattab NM. Applicability of Demirjian's method for dental age estimation in a group of Egyptian children. BDJ Open 2019;5:2.
Yassin SM. Accuracy of Demirjian's four methods of dental age estimation in a sample of Saudi Arabian population. Aust J Forensic Sci 2020;27:1-4.
Levesque GY, Demirjian A. The inter-examiner variation in rating dental formation from radiographs. J Dent Res 1980;59:1123-6.
Pillai JP, Chokkalingam TS, Aasaithambi B, Nuzzolese E. Establishment of the forensic odontology department: A proposed model for the basic infrastructure and forensic odontology kit. J Forensic Dent Sci 2019;11:64-72.
Kottner J, Audige L, Brorson S, Donner A, Gajewski BJ, Hróbjartsson A, et al. Guidelines for Reporting Reliability and Agreement Studies (GRRAS) were proposed. Int J Nurs Stud 2011;48:661-71.
Fleiss JL, Levin B, Paik MC. Statistical Methods for Rates and Proportions. John Wiley & Sons, Hoboken. 2013.
Stine W. Interobserver relational agreement. Psychol Bull 1989;106:341-7.
Vanbelle S. A new interpretation of the weighted kappa coefficients. Psychometrika 2016;81:399-410.
Dhanjal KS, Bhardwaj MK, Liversidge HM. Reproducibility of radiographic stage assessment of third molars. Forensic Sci Int 2006;159 Suppl 1:S74-7.
Lee SS, Byun YS, Park MJ, Choi JH, Yoon CL, Shin KJ. The chronology of second and third molar development in Koreans and its application to forensic age estimation. Int J Legal Med 2010;124:659-65.
Nykänen R, Espeland L, Kvaal SI, Krogstad O. Validity of the Demirjian method for dental age estimation when applied to Norwegian children. Acta Odont Scand 1998;56:238-44.
Paz Cortés MM, Rojo R, Alía García E, Mourelle Martínez MR. Accuracy assessment of dental age estimation with the Willems, Demirjian and Nolla methods in Spanish children: Comparative cross-sectional study. BMC Pediatr 2020;20:361.
Günen Yılmaz S, Harorlı A, Kılıç M, Bayrakdar İŞ. Evaluation of the relationship between the Demirjian and Nolla methods and the pubertal growth spurt stage predicted by skeletal maturation indicators in Turkish children aged 10-15: investigation study. Acta Odontol Scand 2019;77:107-13.
Tomás LF, Mónico LS, Tomás I, Varela-Patiño P, Martin-Biedma B. The accuracy of estimating chronological age from Demirjian and Nolla methods in a Portuguese and Spanish sample. BMC Oral Health 2014;14:160.
Stella A, Jeevarathan J, Ennamuel DS, Selvi T. Analyzing the Interobserver Variability in Stages of Tooth Development with Orthopantomogram (OPG). Int J Recent Technol Eng 2019;7:256-9.
Willems G, Moulin-Romsee C, Solheim T. Non-destructive dental-age calculation methods in adults: Intra-and inter-observer effects. Forensic Sci Int 2002;126:221-6.
De Tobel J, Phlypo I, Fieuws S, Politis C, Verstraete KL, Thevissen PW. Forensic age estimation based on development of third molars: A staging technique for magnetic resonance imaging. J Forensic Odontostomatol 2017;35:117-40.
Lynnerup N, Belard E, Buch-Olsen K, Sejrsen B, Damgaard-Pedersen K. Intra-and interobserver error of the Greulich-Pyle method as used on a Danish forensic sample. Forensic Sci Int 2008;179:242e1-6.
Borrman H, Solheim T, Magnusson B, Kvaal SI, Stene-Johansen W. Inter-examiner variation in the assessment of age-related factors in teeth. Int J Legal Med 1995;107:183-6.