HRS-Swin: A Hierarchical Representation Separation Swin Transformer for Automated Neonatal Auricular Deformity Classification
DOI: https://doi.org/10.54097/825wrs61

Keywords: Vision Transformer, Progressive representation reconstruction, Attention mechanism, Adaptive margin learning

Abstract
The incidence of auricular deformities in newborns is notably high, and even experienced clinicians may misdiagnose or miss cases owing to subjective judgment. Although several studies have explored deep learning methods for auxiliary diagnosis, the highly complex and individualized morphology of the auricle still poses significant challenges for automated identification and fine-grained subtype classification. To address this, we propose HRS-Swin, a progressive representation reconstruction framework built on a Swin Transformer backbone. The model integrates a Class Token Fusion module to enhance global semantic representation, a Stable Semantic Enhancement and Residual Compression mechanism for compact and discriminative embedding learning, and a Dynamic Margin Enhancer that enlarges inter-class separability in the embedding space. Experiments on the BabyEar4K dataset (1,926 newborns) show that HRS-Swin outperforms representative CNN and Transformer baselines, achieving an accuracy of 0.8009 and a macro F1-score of 0.7024 with consistent improvements over the standard Swin Transformer. These results indicate that the proposed framework offers a robust and effective solution for automated auricular deformity classification and early clinical assistance.
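As a rough illustration of the adaptive margin learning idea named in the keywords, the sketch below shows one conventional way a dynamic, confidence-driven margin can be applied to a cosine-similarity classification head to enlarge inter-class separability in the embedding space. It is a minimal PyTorch sketch under our own assumptions: the class name DynamicMarginLoss, its parameters (base_margin, max_margin, scale, momentum), and the per-class confidence update are hypothetical and are not taken from the paper's Dynamic Margin Enhancer.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class DynamicMarginLoss(nn.Module):
    """Illustrative additive-margin softmax with a per-class adaptive margin.

    NOTE: This is a hypothetical sketch, not the authors' Dynamic Margin
    Enhancer. A cosine classifier applies a larger margin to classes the
    model already predicts confidently, pushing class embeddings apart.
    """

    def __init__(self, embed_dim: int, num_classes: int,
                 base_margin: float = 0.2, max_margin: float = 0.5,
                 scale: float = 30.0, momentum: float = 0.9):
        super().__init__()
        self.weight = nn.Parameter(torch.empty(num_classes, embed_dim))
        nn.init.xavier_uniform_(self.weight)
        self.scale = scale
        self.base_margin = base_margin
        self.max_margin = max_margin
        self.momentum = momentum
        # Running estimate of per-class prediction confidence (assumed rule).
        self.register_buffer("class_conf", torch.zeros(num_classes))

    def forward(self, embeddings: torch.Tensor, labels: torch.Tensor) -> torch.Tensor:
        # Cosine similarity between L2-normalized embeddings and class weights.
        logits = F.linear(F.normalize(embeddings), F.normalize(self.weight))

        with torch.no_grad():
            # Update the running confidence of each class seen in the batch.
            probs = logits.softmax(dim=1)
            batch_conf = probs[torch.arange(labels.size(0)), labels]
            for c in labels.unique():
                mask = labels == c
                self.class_conf[c] = (self.momentum * self.class_conf[c]
                                      + (1 - self.momentum) * batch_conf[mask].mean())

        # Margin grows from base_margin toward max_margin as confidence rises.
        margin = self.base_margin + (self.max_margin - self.base_margin) * self.class_conf[labels]

        # Subtract the margin from the target-class logit (additive margin).
        target_logits = logits[torch.arange(labels.size(0)), labels] - margin
        logits = logits.scatter(1, labels.unsqueeze(1), target_logits.unsqueeze(1))
        return F.cross_entropy(self.scale * logits, labels)
```

In a training loop, such a module would stand in for the usual linear classifier plus cross-entropy: it takes the backbone's pooled embedding and the ground-truth label and returns the loss directly.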
License
Copyright (c) 2026 International Journal of Advanced Engineering and Technology Research

This work is licensed under a Creative Commons Attribution-NonCommercial 4.0 International License.