Plant diseases impair crop growth and reduce yields, so early detection and timely treatment are crucial for management and decision-making in agricultural production. Most current plant disease diagnosis techniques train models on image data alone. The rapid development of multimodal learning offers new solutions for accurate plant disease diagnosis. However, most existing plant disease databases contain only a single modality, which makes it difficult for diagnosis models to acquire sufficient knowledge of the plant disease domain. To address these issues, we construct a large multimodal plant disease database, PDDM, and propose a series of multimodal pre-trained models, PDDM-Pretrain, to improve diagnostic performance. Specifically, the PDDM database covers 40 plant species and 116 disease or healthy categories, with 205,007 images and two expert diagnostic text descriptions per image. PDDM-Pretrain consists of visual pre-trained models, textual pre-trained models, and a Transformer-based pre-trained model. These pre-trained models can serve as the visual and textual backbones of a multimodal plant disease diagnosis model, providing features with sufficient prior knowledge to the subsequent multimodal fusion module, as shown in the figure below.
The role of pre-trained models in plant disease diagnosis
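
The data flow described above (a visual backbone and a textual backbone feeding a fusion module, followed by a diagnosis head) can be illustrated with a minimal sketch. Everything in it is a hypothetical stand-in and not part of PDDM-Pretrain: `toy_visual_backbone` and `toy_textual_backbone` are hand-written placeholders for pre-trained encoders, `concat_fusion` assumes simple feature concatenation as the fusion strategy, and `MultimodalDiagnoser` with its `class_weights` is an illustrative linear head. The sketch only shows where pre-trained backbones would plug in.

```python
from typing import Callable, List, Sequence

Vector = List[float]


def toy_visual_backbone(pixels: Sequence[Sequence[float]]) -> Vector:
    """Hypothetical stand-in for a pre-trained visual encoder.

    Returns two hand-crafted features: mean and max pixel intensity.
    """
    flat = [p for row in pixels for p in row]
    return [sum(flat) / len(flat), max(flat)]


def toy_textual_backbone(text: str) -> Vector:
    """Hypothetical stand-in for a pre-trained textual encoder.

    Returns the word count and a flag for the word "spot".
    """
    words = text.lower().split()
    return [float(len(words)), float("spot" in words)]


def concat_fusion(visual: Sequence[float], textual: Sequence[float]) -> Vector:
    """Assumed fusion strategy: concatenate the two feature vectors."""
    return list(visual) + list(textual)


class MultimodalDiagnoser:
    """Illustrative pipeline: two backbones -> fusion -> linear scoring head."""

    def __init__(
        self,
        visual_backbone: Callable[[Sequence[Sequence[float]]], Vector],
        textual_backbone: Callable[[str], Vector],
        fusion: Callable[[Sequence[float], Sequence[float]], Vector],
        class_weights: Sequence[Sequence[float]],
    ) -> None:
        self.visual_backbone = visual_backbone
        self.textual_backbone = textual_backbone
        self.fusion = fusion
        # One weight row per class; each row matches the fused feature length.
        self.class_weights = class_weights

    def fuse(self, pixels: Sequence[Sequence[float]], text: str) -> Vector:
        """Encode both modalities and combine them into one feature vector."""
        return self.fusion(self.visual_backbone(pixels), self.textual_backbone(text))

    def predict(self, pixels: Sequence[Sequence[float]], text: str) -> int:
        """Return the index of the class with the highest linear score."""
        fused = self.fuse(pixels, text)
        scores = [sum(w * x for w, x in zip(row, fused)) for row in self.class_weights]
        return max(range(len(scores)), key=scores.__getitem__)


# Toy usage: class 0 = "healthy", class 1 = "diseased" (labels are illustrative).
model = MultimodalDiagnoser(
    toy_visual_backbone,
    toy_textual_backbone,
    concat_fusion,
    class_weights=[[0.0, 0.0, 0.1, -1.0], [0.0, 0.0, 0.0, 1.0]],
)
image = [[0.0, 0.5], [1.0, 0.5]]
prediction = model.predict(image, "brown spot on leaf")
```

Because the backbones are passed in as callables, replacing the toy encoders with pre-trained visual and textual models would change the quality of the features handed to the fusion module while leaving the rest of the pipeline untouched.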