HRS-Swin: A Hierarchical Representation Separation Swin Transformer for Automated Neonatal Auricular Deformity Classification

Authors

  • Daini Li, College of Computer Science and Artificial Intelligence, Southwest Minzu University, Chengdu 610000, China
  • Xiaomeng Yang, College of Computer Science and Artificial Intelligence, Southwest Minzu University, Chengdu 610000, China

DOI:

https://doi.org/10.54097/825wrs61

Keywords:

Vision Transformer, Progressive representation reconstruction, Attention mechanism, Adaptive margin learning

Abstract

Auricular deformities are common in newborns, and even experienced clinicians can misdiagnose or miss them because assessment relies on subjective judgment. Several studies have explored deep learning for auxiliary diagnosis, but the complex and highly individual morphology of the auricle makes automated identification and fine-grained subtype classification difficult for existing approaches. To address this, we propose HRS-Swin, a progressive representation reconstruction framework built on a Swin Transformer backbone. The model integrates a Class Token Fusion module to enhance global semantic representation, a Stable Semantic Enhancement and Residual Compression mechanism to learn compact, discriminative embeddings, and a Dynamic Margin Enhancer to enlarge inter-class separability in the embedding space. Experiments on the BabyEar4K dataset (1,926 newborns) show that HRS-Swin outperforms representative CNN and Transformer baselines, achieving an accuracy of 0.8009 and a macro F1-score of 0.7024 and improving consistently over the standard Swin Transformer. These results indicate that the framework is a robust and effective approach to automated auricular deformity classification and early clinical assistance.
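The abstract reports both accuracy and macro F1-score, and the two can diverge when classes are imbalanced. The sketch below shows how the two metrics are conventionally computed. The labels are invented toy values, not data from BabyEar4K, and the function names are illustrative assumptions rather than anything taken from the paper.

```python
def accuracy(y_true, y_pred):
    """Fraction of samples whose predicted label equals the true label."""
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    return correct / len(y_true)


def macro_f1(y_true, y_pred):
    """Unweighted mean of per-class F1, so rare classes count as much as common ones."""
    classes = sorted(set(y_true) | set(y_pred))
    scores = []
    for c in classes:
        tp = sum(t == c and p == c for t, p in zip(y_true, y_pred))
        fp = sum(t != c and p == c for t, p in zip(y_true, y_pred))
        fn = sum(t == c and p != c for t, p in zip(y_true, y_pred))
        denom = 2 * tp + fp + fn
        # F1 = 2TP / (2TP + FP + FN); defined as 0 when the class never appears.
        scores.append(2 * tp / denom if denom else 0.0)
    return sum(scores) / len(scores)


# Toy, imbalanced example (hypothetical labels): 8 samples of class 0, 2 of class 1.
y_true = [0] * 8 + [1] * 2
y_pred = [0] * 8 + [0, 1]  # one minority-class sample is misclassified

acc = accuracy(y_true, y_pred)
mf1 = macro_f1(y_true, y_pred)
print(f"accuracy={acc:.4f}, macro F1={mf1:.4f}")
```

In this toy case a single error on the minority class lowers macro F1 more than accuracy. That is one common reason a macro F1-score can sit below accuracy, although the abstract does not state why the two reported values differ.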

Published

2026-03-08

Section

Articles

How to Cite

Li, D., & Yang, X. (2026). HRS-Swin: A Hierarchical Representation Separation Swin Transformer for Automated Neonatal Auricular Deformity Classification. International Journal of Advanced Engineering and Technology Research, 1(1), 31-34. https://doi.org/10.54097/825wrs61