A Hybrid CNN–Vision Transformer Framework with Attention-Based Feature Optimization for Medicinal Leaf Classification
Abstract
Accurate identification of medicinal plants plays a crucial role in healthcare, herbal medicine, and biodiversity conservation. Traditional plant identification methods rely heavily on expert knowledge and manual inspection, making them time-consuming and error-prone. Recent advances in deep learning have shown promising results in image-based plant classification; however, existing approaches often suffer from high computational complexity, limited robustness, and poor generalization. This work presents a hybrid medicinal leaf classification framework that integrates convolutional neural networks (CNNs) and Vision Transformers (ViTs) to exploit both local and global feature representations. First, an image quality assessment module filters and enhances input leaf images based on blur and illumination conditions. A lightweight CNN extracts local spatial features, while a ViT encoder captures global contextual information. The extracted features are fused and optimized using an attention-based feature selection mechanism that emphasizes discriminative attributes. Finally, MobileNet and DenseNet models classify the medicinal plant species, achieving approximately 92% accuracy. Experimental results on a publicly available medicinal leaf dataset demonstrate improved classification performance and robustness compared to conventional deep learning models. The proposed system provides an effective and scalable solution for automated medicinal plant identification.
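The core fusion step described above, combining CNN local features with ViT global features and re-weighting them through attention-based feature selection, can be sketched as follows. This is a minimal NumPy illustration, not the authors' implementation: the feature dimensions, the random projection `w_attn`, and the function name `attention_fuse` are all hypothetical placeholders for trained components.

```python
import numpy as np

rng = np.random.default_rng(0)

def attention_fuse(local_feat, global_feat, w_attn):
    """Fuse CNN (local) and ViT (global) feature vectors and
    re-weight them with a softmax attention over the concatenation.
    w_attn stands in for a learned projection; here it is random."""
    fused = np.concatenate([local_feat, global_feat])   # (d_local + d_global,)
    scores = w_attn @ fused                             # one attention score per feature
    weights = np.exp(scores - scores.max())
    weights /= weights.sum()                            # softmax: weights sum to 1
    return weights * fused, weights                     # emphasized feature vector

# Hypothetical dimensions: 64-dim CNN features, 64-dim ViT features
local_feat = rng.standard_normal(64)
global_feat = rng.standard_normal(64)
w_attn = rng.standard_normal((128, 128))

selected, weights = attention_fuse(local_feat, global_feat, w_attn)
```

In a trained system, `w_attn` would be learned jointly with the classifier so that the softmax weights suppress non-discriminative features before the fused vector reaches the MobileNet/DenseNet classification heads.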