Research on Brain Tumor Segmentation Deployment Method Based on Mixed Precision Inference Acceleration

Authors

  • Yamin Wang School of Computer Science and Technology, Henan Polytechnic University, Jiaozuo 454000, China
  • Beibei Hou School of Computer Science and Technology, Henan Polytechnic University, Jiaozuo 454000, China

DOI:

https://doi.org/10.54097/a2ghj869

Keywords:

Brain tumor segmentation, Mixed Precision Acceleration, TensorRT

Abstract

Deep learning models for brain tumor segmentation face several engineering bottlenecks in real-world clinical deployment: high inference latency, large peak GPU memory consumption, and oversized model files. To address these, this paper proposes a Mixed Precision Acceleration (MPA) inference deployment method. The method exports the standard FP32 model to the ONNX format and uses the NVIDIA TensorRT inference engine to apply deep optimizations in FP16 half-precision mode, including computational-graph redundancy elimination, layer fusion, kernel auto-tuning, and dynamic memory reuse. Experiments on the BraTS2020 dataset under three representative missing-modality scenarios show that, while keeping segmentation accuracy nearly lossless (all evaluation-metric differences below 0.01%), the MPA method compresses the model file size by 70.6% on average (down to 17.5 MB), reduces peak memory consumption by 82.6% to 85.7% (as little as 700 MB), and improves inference speed by 13.6% to 16.5%. Compared with the Torch-Compile compilation scheme, the MPA method achieves a better trade-off among inference speed, memory overhead, and storage cost, providing a practical engineering solution for deploying brain tumor segmentation models in computationally constrained environments such as primary-care medical workstations, mobile terminals, and intraoperative navigation systems.
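The paper does not publish its deployment code, but the storage side of the method is easy to illustrate: representing each weight in FP16 (2 bytes) instead of FP32 (4 bytes) halves the raw parameter footprint, with the remaining file-size reduction coming from TensorRT's graph-level optimizations. The following standard-library sketch (illustrative values, not the paper's model) shows the size halving and the small round-trip error that FP16 introduces:

```python
import struct

# Hypothetical weight values standing in for model parameters.
weights = [0.123456789, -1.5, 3.14159265, 2.718281828e-3]

# Pack the same values as FP32 ('f', 4 bytes each) and FP16 ('e', 2 bytes each).
fp32_bytes = struct.pack(f"{len(weights)}f", *weights)
fp16_bytes = struct.pack(f"{len(weights)}e", *weights)

# Round-trip through FP16 to measure the precision loss per value.
recovered = struct.unpack(f"{len(weights)}e", fp16_bytes)
max_err = max(abs(a - b) for a, b in zip(weights, recovered))

print(len(fp32_bytes), len(fp16_bytes))  # 16 8: half the storage
print(max_err < 1e-2)                    # True: round-trip error stays small
```

This is only the precision/storage trade-off in isolation; the latency and memory gains reported in the abstract additionally depend on TensorRT's layer fusion, kernel auto-tuning, and memory reuse, which have no stdlib analogue.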

Published

2026-03-15

Section

Articles

How to Cite

Wang, Y., & Hou, B. (2026). Research on Brain Tumor Segmentation Deployment Method Based on Mixed Precision Inference Acceleration. International Journal of Advanced Engineering and Technology Research, 1(1), 89-97. https://doi.org/10.54097/a2ghj869