Plant identification has applications in ethnopharmacology and agriculture. Because leaves are one of the distinguishing features of a plant, they are routinely used for identification. Recent developments in deep learning have made it possible to accurately identify the majority of samples in five publicly available leaf datasets. However, the images in each of these datasets were captured in a highly controlled environment. This paper evaluates the performance of EfficientNet models B1 to B6 and several other convolutional neural network (CNN) architectures when applied to a combination of the LeafSnap, Middle European Woody Plants 2014, Flavia, Swedish, and Folio datasets. To mitigate the class imbalance that results from combining the original datasets, oversampling, undersampling, and transfer learning techniques were used to construct an end-to-end CNN classifier. Greater emphasis was placed on metrics that are appropriate for a diverse, imbalanced dataset than on high performance on any one of the original datasets. The B6 model of EfficientNet achieved highly accurate results, with an F-score of 0.9938 on the combined dataset.
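
As an illustration of why an F-score can be more informative than plain accuracy on an imbalanced dataset, the following minimal sketch computes a macro-averaged F1 score in pure Python. The abstract does not state which averaging the authors used, so macro averaging is an assumption here, and the function name macro_f1 and the toy labels are hypothetical.

```python
from collections import Counter


def macro_f1(y_true, y_pred):
    """Macro-averaged F1: the unweighted mean of per-class F1 scores.

    Assumption: macro averaging is used for illustration only; the
    source does not specify how its F-score was averaged.
    """
    tp, fp, fn = Counter(), Counter(), Counter()
    for truth, pred in zip(y_true, y_pred):
        if truth == pred:
            tp[truth] += 1
        else:
            fp[pred] += 1   # predicted class gains a false positive
            fn[truth] += 1  # true class gains a false negative

    classes = sorted(set(y_true) | set(y_pred))
    scores = []
    for c in classes:
        denom = 2 * tp[c] + fp[c] + fn[c]
        scores.append(2 * tp[c] / denom if denom else 0.0)
    return sum(scores) / len(scores)


if __name__ == "__main__":
    # Toy example: nine samples of a common class, one of a rare class,
    # and a classifier that always predicts the common class.
    y_true = ["oak"] * 9 + ["maple"]
    y_pred = ["oak"] * 10
    accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
    print("accuracy:", accuracy)
    print("macro F1:", macro_f1(y_true, y_pred))
```

In this toy case the accuracy is high even though the rare class is never identified, while the macro F1 is pulled down by the zero score for that class.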

Viewing Instructions

The downloadable file is very large and will take additional time to download.

You may wish to download and view these individual folders of images.

There are 364 folders containing a total of 42,420 images.

Contact Information

Viraj Gajjar, vgf4c@mst.edu
Ph.D. candidate, Electrical Engineering Department
Missouri University of Science and Technology

Anand Nambisan, akn36d@mst.edu
Ph.D. candidate, Electrical Engineering Department
Missouri University of Science and Technology

Dr. Kurt Kosbar, kosbar@mst.edu
Associate Professor, Electrical Engineering Department
Missouri University of Science and Technology


Electrical and Computer Engineering


Recommended citations

  • V. Gajjar, A. Nambisan, and K. Kosbar, “Plant Identification in a Combined-Imbalanced Leaf Dataset,” IEEE Access (To be submitted)
  • N. Kumar, P. Belhumeur, A. Biswas, D. Jacobs, W. Kress, I. Lopez, and J. Soares, “Leafsnap: A Computer Vision System for Automatic Plant Species Identification,” in European Conference on Computer Vision, Springer, 2012, pp. 502–516.
  • P. Novotný and T. Suk, “Leaf recognition of woody species in Central Europe,” Biosystems Engineering, vol. 115, no. 4, pp. 444–452, 2013.
  • S. G. Wu, F. S. Bao, E. Y. Xu, Y.-X. Wang, Y.-F. Chang, and Q.-L. Xiang, “A leaf recognition algorithm for plant classification using probabilistic neural network,” in 2007 IEEE International Symposium on Signal Processing and Information Technology, IEEE, 2007, pp. 11–16.
  • O. Söderkvist, Computer vision classification of leaves from Swedish trees, 2001.
  • T. Munisami, M. Ramsurn, S. Kishnah, and S. Pudaruth, “Plant Leaf Recognition Using Shape Features and Colour Histogram with K-nearest Neighbour Classifiers,” in Procedia Computer Science, vol. 58, Elsevier, 2015, pp. 740–747.

Document Type


Document Version

Final Version

File Format


File Size

22 GB




© 2021 Missouri University of Science and Technology, All rights reserved.

kFold.zip (2 kB)
Python code for stratified k-fold cross-validation
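
For readers unfamiliar with the technique, the following is a minimal pure-Python sketch of stratified k-fold splitting. It is not the content of kFold.zip; the function name stratified_k_fold, its parameters, and the toy labels are hypothetical and chosen for illustration only. The idea is that each fold receives a near-equal share of every class, so rare species still appear in every test split.

```python
import random
from collections import defaultdict


def stratified_k_fold(labels, k=5, seed=0):
    """Yield (train_indices, test_indices) pairs, one per fold.

    Each class's samples are shuffled and dealt round-robin across the
    k folds, so per-class counts differ by at most one between folds.
    """
    rng = random.Random(seed)

    # Group sample indices by class label.
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)

    folds = [[] for _ in range(k)]
    offset = 0  # rotate the starting fold so overall fold sizes stay even
    for label in sorted(by_class):
        indices = by_class[label]
        rng.shuffle(indices)
        for position, idx in enumerate(indices):
            folds[(offset + position) % k].append(idx)
        offset += len(indices)

    for i in range(k):
        test = sorted(folds[i])
        train = sorted(
            idx for j, fold in enumerate(folds) if j != i for idx in fold
        )
        yield train, test


if __name__ == "__main__":
    # Toy imbalanced label list: 10 samples of one class, 5 of another.
    labels = ["oak"] * 10 + ["maple"] * 5
    for train, test in stratified_k_fold(labels, k=5):
        print(len(train), len(test), sorted(labels[i] for i in test))
```

A fixed seed keeps the splits reproducible between runs. Libraries such as scikit-learn provide an equivalent, more thoroughly tested utility that would normally be preferred in practice.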