
Saturday, March 30, 2019

Experiment for Plant Recognition

Abstract

In typical sparse representation based classification (SRC) and weighted SRC (WSRC) algorithms, the test samples are sparsely represented by all training samples. These algorithms emphasize the sparsity of the coding coefficients without considering the local structure of the input data. Although more training samples produce a better sparse representation, it is time-consuming to find a global sparse representation for the test sample on a large-scale database. To overcome this shortcoming, and aiming at the difficult problem of plant leaf recognition on a large-scale database, a two-stage local similarity based classification learning (LSCL) method is proposed by combining the local mean-based classification (LMC) method and local WSRC (LWSRC). In the first stage, LMC is applied to coarsely classify the test sample: the k nearest neighbors of the test sample are selected from each training class as a neighbor subset, the local geometric center of each class is calculated, and S candidate neighbor subsets of the test sample are determined with the first S smallest distances between the test sample and each local geometric center. In the second stage, LWSRC is proposed to approximately represent the test sample by a linear weighted sum of all k×S samples of the S candidate neighbor subsets. The rationale of the proposed method is as follows: (1) the first stage aims to eliminate the training samples that are far from the test sample, on the assumption that these samples have no effect on the final classification decision, and then to select the candidate neighbor subsets of the test sample.
Thus the classification problem becomes simpler, with fewer subsets; (2) the second stage pays more attention to the training samples of the candidate neighbor subsets when representing the test sample with weights. This helps to represent the test sample accurately. Experimental results on the leaf image database show that the proposed method not only has high accuracy and low time cost, but can also be clearly interpreted.

Keywords: local similarity-based classification learning (LSCL); local mean-based classification method (LMC); weighted sparse representation based classification (WSRC); local WSRC (LWSRC); two-stage LSCL.

1. Introduction

Similarity-based classification learning (SCL) methods make use of the pair-wise similarities or dissimilarities between a test sample and each training sample to design the classification problem. K-nearest neighbor (K-NN) is a non-parametric, simple, attractive and relatively mature SCL method, and is easy to implement quickly [1,2]. It has been widely applied in many areas, including computer vision, pattern recognition and machine learning [3,4]. Its basic process is: calculate the distance (as dissimilarity or similarity) between the test sample y and each training sample; select the k samples with the k minimum distances as the nearest k neighbors of y; finally, determine the category of y as the one that most of the nearest k neighbors belong to. In weighted K-NN, it is useful to assign weights to the contributions of the neighbors, so that the nearer neighbors contribute more to the classification than the more dissimilar ones. One disadvantage of K-NN is that, when the distribution of the training set is uneven, K-NN may cause misjudgment, because K-NN only cares about the order of the first k nearest neighbor samples but does not consider the sample density.
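The basic and weighted K-NN rules just described can be sketched in a few lines of NumPy. This is a toy illustration, not the paper's implementation; the data and parameter values are invented:

```python
import numpy as np

def knn_predict(X_train, y_train, x, k=3, weighted=False):
    """Classify x by majority (or inverse-distance-weighted) vote of its
    k nearest training samples. X_train: (n, m) array, y_train: (n,) labels."""
    dists = np.linalg.norm(X_train - x, axis=1)
    idx = np.argsort(dists)[:k]                 # indices of the k nearest neighbors
    votes = {}
    for i in idx:
        w = 1.0 / (dists[i] + 1e-12) if weighted else 1.0
        votes[y_train[i]] = votes.get(y_train[i], 0.0) + w
    return max(votes, key=votes.get)

# toy example: two well-separated classes
X = np.array([[0., 0.], [0., 1.], [1., 0.], [5., 5.], [5., 6.], [6., 5.]])
y = np.array([0, 0, 0, 1, 1, 1])
print(knn_predict(X, y, np.array([0.2, 0.3]), k=3))                  # -> 0
print(knn_predict(X, y, np.array([5.2, 5.3]), k=3, weighted=True))   # -> 1
```

The inverse-distance weights realize the idea that nearer neighbors should contribute more to the vote than farther ones.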
Moreover, the performance of K-NN is seriously influenced by outliers and noise samples. To overcome these problems, a number of local SCL (LSCL) methods have been proposed recently. The local mean-based nonparametric classifier (LMC) is regarded as an improved K-NN that can reduce the influence of noise and classify unbalanced data [5,6]. Its main idea is to calculate the local mean-based vector of each class from the nearest k neighbors of the test sample, and to classify the test sample into the category of the nearest local mean-based vector. One disadvantage of LMC is that it cannot represent the similarity between multidimensional vectors well. To improve the performance of LMC, Mitani et al. [5] proposed a reliable local mean-based K-NN algorithm (LMKNN), which employs the local mean vector of each class to classify the test sample. LMKNN has already been successfully applied to group-based classification, discriminant analysis and distance metric learning. Zhang et al. [6] further improved the performance of LMC by utilizing the cosine distance instead of the Euclidean distance to select the k nearest neighbors, which is proved to be better suited to the classification of multidimensional data.

The above SCL, LMC and LSCL algorithms are often not effective when the data patterns of different classes overlap in regions of feature space. Recently, sparse representation based classification (SRC) [8], a modified SCL method, has attracted much attention in various areas. In some cases it can achieve better classification performance than other typical clustering and classification methods such as SCL, LSCL, linear discriminant analysis (LDA) and principal component analysis (PCA) [7]. In SRC [9], a test image is encoded over the original training set with a sparsity constraint imposed on the encoding vector. The training set acts as a dictionary to linearly represent the test samples.
SRC emphasizes the sparsity of the coding coefficients without considering the local structure of the input data [10,11]. However, the local structure of the data has proved to be important for classification tasks. To make use of the local structure of the data, some weighted SRC (WSRC) and local SRC (LSRC) algorithms have been proposed. Guo et al. [12] proposed a similarity WSRC algorithm, in which the similarity matrix between the test samples and the training samples can be constructed by various distance or similarity measurements. Lu et al. [13] proposed a WSRC algorithm to represent the test sample by exploiting the weighted training samples based on the l1-norm. Li et al. [14] proposed an LSRC algorithm to perform the sparse decomposition in a local neighborhood. In LSRC, instead of solving the l1-norm constrained least squares problem over all training samples, they solved a similar problem in the local neighborhood of each test sample.

SRC, WSRC, similarity WSRC and LSRC have several things in common: the individual sparsity and the local similarity between the test sample and the training samples are considered, to ensure that the neighbor coding vectors are similar to each other if they have strong correlation; the weight matrix is constructed by incorporating the similarity information; the similarity-weighted l1-norm minimization problem is constructed and solved; and the obtained coding coefficients tend to be local and robust.

Leaf based plant species recognition is one of the most important branches of pattern recognition and artificial intelligence [15-18].
It is useful for agricultural producers, botanists, industrialists, food engineers and physicians, but it is an NP-hard problem and a challenging research topic [19-21]: plant leaves are quite irregular, so it is difficult to describe their shapes accurately compared with industrial work pieces; some between-species leaves are clearly different from each other, as shown in Fig.1A and B, while within-species leaves are similar to each other, as shown in Fig.1C [22].

Fig.1 Plant leaf examples (test sample and training samples 1-7): (A) four different species of leaves; (B) four different species of leaves; (C) ten leaves of the same species.

SRC can be applied to leaf based plant species recognition [23,24]. In theory, in SRC and modified SRC, the test sample can be well sparsely represented by many training samples. In practice, however, it is time-consuming to find a global sparse representation on a large-scale leaf image database, because leaf images are more complex than face images. To overcome this problem, in this paper, motivated by the recent progress and success in LMC [6], modified SRC [12-14], two-stage SR [25] and SR based coarse-to-fine face recognition [26], a novel plant recognition method is proposed by integrating LMC and WSRC into leaf classification, and is verified on a large-scale dataset.
Different from the classical plant classification methods and the modified SRC algorithms, the proposed method implements plant species recognition through a coarse recognition process followed by a fine recognition process. The major contributions of the proposed method are: (1) a two-stage plant species recognition method is proposed for the first time; (2) a local WSRC algorithm is proposed to sparsely represent the test sample; (3) experimental results indicate that the proposed method is very competitive for plant species recognition on a large-scale database.

The remainder of this paper is arranged as follows: in Section 2, we briefly review LMC, SRC and WSRC. In Section 3, we describe the proposed method and provide some rationale and interpretation. Section 4 presents experimental results. Section 5 offers conclusions and future work.

2. Related works

In this section, some related works are introduced. Suppose there are n training samples x1, x2, ..., xn from C different classes X1, X2, ..., XC. Let ni be the sample number of the ith class; then n = n1 + n2 + ... + nC.

2.1 LMC

Local mean-based nonparametric classification (LMC) is an improved K-NN method [6]. It uses the Euclidean distance or the cosine distance to select nearest neighbors and measure the similarity between the test sample and its neighbors. In general, the cosine distance is more suitable for describing the similarity of multi-dimensional data. LMC is described as follows. For each test sample y:

Step 1: Select the k nearest neighbors of y from the jth class, as a neighbor subset X_j^k = {x_j1, x_j2, ..., x_jk}.

Step 2: Calculate the local mean-based vector m_j of each class by

m_j = (1/k) * sum_{i=1..k} x_ji.  (1)

Step 3: Calculate the distance d(y, m_j) between y and m_j.

Step 4: Classify y into class c = argmin_j ||y - m_j||_2 if the Euclidean distance metric is adopted, or c = argmax_j cos(y, m_j) if the cosine distance metric is adopted.

2.2 SRC

SRC relies on a distance metric to penalize the dissimilar samples and reward the similar samples. Its main idea is to sparsely represent and classify the test sample by a linear combination of all the training samples.
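Steps 1-4 of LMC can be sketched as follows. This is an illustrative NumPy version under the Euclidean/cosine conventions above; the toy data and the choice k=2 are invented:

```python
import numpy as np

def lmc_predict(X_train, y_train, x, k=2, metric="euclidean"):
    """Local mean-based classification (LMC): average the k nearest
    neighbors of x inside each class (Eq.(1)), then assign x to the class
    whose local mean is closest (Euclidean) or most similar (cosine)."""
    best_label, best_score = None, None
    for c in np.unique(y_train):
        Xc = X_train[y_train == c]
        d = np.linalg.norm(Xc - x, axis=1)
        local_mean = Xc[np.argsort(d)[:k]].mean(axis=0)   # Eq.(1)
        if metric == "euclidean":
            score = np.linalg.norm(x - local_mean)        # smaller = better
            better = best_score is None or score < best_score
        else:                                             # cosine similarity: larger = better
            score = x @ local_mean / (np.linalg.norm(x) * np.linalg.norm(local_mean))
            better = best_score is None or score > best_score
        if better:
            best_label, best_score = c, score
    return best_label

# toy example with two clusters
X = np.array([[0., 0.], [0., 1.], [1., 0.], [5., 5.], [5., 6.], [6., 5.]])
y = np.array([0, 0, 0, 1, 1, 1])
print(lmc_predict(X, y, np.array([0.5, 0.5]), k=2))                    # -> 0
print(lmc_predict(X, y, np.array([5.5, 6.0]), k=2, metric="cosine"))   # -> 1
```

Averaging the k nearest neighbors per class, rather than voting over raw neighbors, is what makes LMC less sensitive to single noisy samples.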
The test sample is assigned to the class that produces the minimum residual. SRC is described as follows.

Input: n training samples, a test sample y.
Output: the class label of y.

Step 1: Construct the dictionary matrix A = [A1, A2, ..., AC] from the n training samples. Each column of A is a training sample, called a basis vector or atom. Normalize each column of A to unit l2-norm. A is required to have unit l2-norm (or bounded-norm) columns in order to avoid the trivial solutions that are due to the ambiguity of the linear reconstruction.

Step 2: Construct and solve an l0-norm minimization problem

min ||x||_0  s.t.  y = Ax,  (2)

where x is called the sparse representation coefficient vector of y. Eq.(2) is usually approximated by an l1-norm minimization problem

min ||x||_1  s.t.  ||y - Ax||_2 ≤ ε,  (3)

where ε is the threshold of the residual. Eq.(3) can be generalized as a constrained least squares problem

min ||y - Ax||_2^2 + λ||x||_1,  (4)

where λ > 0 is a scalar regularization parameter which balances the tradeoff between the sparsity of the solution and the reconstruction error. Eq.(4) is a constrained LASSO problem; its detailed solution can be found in Ref.[27].

Step 3: Compute the residual r_i(y) = ||y - A*δ_i(x)||_2, where δ_i is the characteristic function that selects the coefficients associated with the ith class.

Step 4: The class label of y is identified as argmin_i r_i(y).
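A rough, runnable sketch of Steps 1-4 follows. It is not the solver of Ref.[27]; here the LASSO problem of Eq.(4) is solved (up to a constant factor) with simple iterative soft-thresholding (ISTA), and the tiny dictionary and labels are toy assumptions:

```python
import numpy as np

def ista_lasso(A, y, lam=0.01, n_iter=500):
    """Solve min 0.5*||y - Ax||^2 + lam*||x||_1 (the LASSO of Eq.(4))
    by iterative soft-thresholding (ISTA)."""
    L = np.linalg.norm(A, 2) ** 2            # Lipschitz constant of the gradient
    x = np.zeros(A.shape[1])
    for _ in range(n_iter):
        z = x - A.T @ (A @ x - y) / L        # gradient step
        x = np.sign(z) * np.maximum(np.abs(z) - lam / L, 0.0)  # soft-threshold
    return x

def src_classify(A, labels, y, lam=0.01):
    """SRC Steps 1-4: normalize atoms, code y over the dictionary, and pick
    the class with the smallest class-wise reconstruction residual."""
    A = A / np.linalg.norm(A, axis=0, keepdims=True)  # unit l2-norm atoms (Step 1)
    x = ista_lasso(A, y, lam)                         # Step 2
    classes = np.unique(labels)
    residuals = [np.linalg.norm(y - A @ np.where(labels == c, x, 0.0))
                 for c in classes]                    # Step 3: delta_c(x) keeps class-c coefficients
    return classes[int(np.argmin(residuals))]         # Step 4

# toy dictionary: columns are training samples, two per class
A = np.array([[1.0, 0.9, 0.0, 0.1],
              [0.0, 0.1, 1.0, 0.9]])
labels = np.array([0, 0, 1, 1])
print(src_classify(A, labels, np.array([1.0, 0.05])))  # -> 0
```

The test sample is reconstructed almost entirely from class-0 atoms, so the class-0 residual is the smallest.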
2.3 WSRC

WSRC integrates both the sparsity and the locality structure of the data to further improve the classification performance of SRC. It imposes a larger weight on the training samples that are farther from the test sample. Different from SRC, WSRC solves a weighted l1-norm minimization problem

min ||y - Ax||_2^2 + λ||Wx||_1,  (5)

where W is a diagonal weight matrix whose diagonal elements are set by the distance between y and the corresponding training sample. Eq.(5) makes sure that the coding coefficients of WSRC tend to be not only sparse but also local in the linear representation [13], which represents the test sample more robustly.

2.4 LSRC

Although many instances have been reported in which WSRC performs better than SRC on various classification problems, WSRC forms the dictionary from all the training samples, so the size of the generated dictionary may be large, which adversely affects solving the l1-norm minimization problem. To overcome this drawback, local sparse representation based classification (LSRC) was proposed to perform the sparse decomposition in a local manner. In LSRC, the K-NN criterion is exploited to find the nearest k neighbors of the test sample, and the selected samples are used to construct the over-complete dictionary. Different from SRC, LSRC solves the l1-norm minimization problem

min ||x||_1  s.t.  ||y - A_N x||_2 ≤ ε,  (6)

where A_N stands for the data matrix which consists of the k nearest neighbors of y. Compared with the original SRC and WSRC, although the computational cost of LSRC is reduced remarkably when k << n, LSRC does not assign different weights to different training samples.

3. Two-stage LSCL

From the above analysis, it is found that each of LMC, WSRC and LSRC has its advantages and disadvantages. To overcome the difficult problem of plant recognition on a large-scale leaf image database, a two-stage LSCL leaf recognition method is proposed in this section. It is a sparse decomposition problem solved in a local manner to obtain an approximate solution. Compared with WSRC and LSRC, LSCL solves a weighted l1-norm constrained least squares problem in the candidate local neighborhoods of each test sample, instead of solving the same problem over all the training samples.
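The weighted l1-norm problem of Eq.(5) can be sketched by scaling each atom's soft-threshold with its weight. This is a toy ISTA-based illustration of the WSRC idea, not the solver used in the cited papers; the dictionary, weights and λ are invented:

```python
import numpy as np

def weighted_ista(A, y, w, lam=0.01, n_iter=1000):
    """Weighted l1-norm LASSO as in Eq.(5):
    min 0.5*||y - Ax||^2 + lam * sum_i w_i*|x_i|.
    Each atom's soft-threshold is scaled by its weight w_i (e.g. the
    distance between the test sample and that atom), so far-away atoms
    are shrunk harder and the coding stays both sparse and local."""
    L = np.linalg.norm(A, 2) ** 2
    x = np.zeros(A.shape[1])
    for _ in range(n_iter):
        z = x - A.T @ (A @ x - y) / L
        x = np.sign(z) * np.maximum(np.abs(z) - lam * w / L, 0.0)  # per-atom threshold
    return x

# two atoms: the first matches y exactly and carries a small weight,
# the second is penalized five times more heavily
A = np.array([[1.0, 0.8],
              [0.0, 0.6]])
y = np.array([1.0, 0.0])
x = weighted_ista(A, y, w=np.array([1.0, 5.0]))
print(x)  # nearly all weight lands on the first (close, lightly weighted) atom
```

The per-coordinate threshold lam*w_i/L is the proximal step for the weighted l1 penalty, which is how the locality adaptor W changes the solution compared with plain SRC.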
Suppose there are a test sample y and n training samples from C classes, where ni is the sample number of the ith class and x_ij is the jth sample of the ith class. Each sample is assumed to be an m×1 column vector. The proposed method is described in detail as follows.

3.1 First stage of LSCL

Calculate the Euclidean distance d(y, x_ij) between y and each training sample x_ij, and select the k nearest neighbors of y from each class with the first k smallest distances; denote the selected neighbor subset of the ith class as X_i^k = {x_i1, x_i2, ..., x_ik}.

Calculate the average of X_i^k,

m_i = (1/k) * sum_{j=1..k} x_ij.  (7)

Calculate the Euclidean distance d(y, m_i) between y and m_i. From the C neighbor subsets, select the S neighbor subsets with the first S smallest distances d(y, m_i) as the candidate subsets for the test sample. The training samples from these S subsets are reserved as the candidate training samples for the test sample, and the other training samples are eliminated from the training set.

3.2 Second stage of LSCL

From the first stage, there are k×S training samples in all the candidate subsets. For simplicity, we still denote the jth candidate training sample of the ith candidate subset as x_ij. The second stage first represents the test sample as a linear combination of all the candidate training samples, and then exploits this linear combination to classify the test sample.

From the first stage, we have obtained the Euclidean distance d_ij between y and each candidate sample x_ij. A new local WSRC (LWSRC) is proposed to solve the same weighted l1-norm minimization problem as Eq.(5),

min ||y - A_S x||_2^2 + λ||Wx||_1,  (8)

where A_S is the dictionary constructed from the k×S candidate training samples, and W is the weighted diagonal matrix whose diagonal element corresponding to x_ij is the Euclidean distance d_ij between y and x_ij. In Eq.(8), the weight matrix is a locality adaptor that penalizes the distance between y and each candidate sample. In the above SRC, WSRC, LSRC and LSCL, the l1-norm constrained least squares minimization problem is solved by the approach proposed in [28], a specialized interior-point method for large-scale problems. The solution of Eq.(8) can be expressed as

x̂ = argmin_x { ||y - A_S x||_2^2 + λ||Wx||_1 }.  (9)

From Eq.(9), x̂ is the sparse representation of the test sample.
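The first stage (per-class neighbor subsets, local geometric centers via Eq.(7), and candidate selection) might be sketched as follows; the toy data, k and S are invented for illustration:

```python
import numpy as np

def candidate_subsets(X_train, y_train, x, k=2, S=2):
    """First stage of LSCL: for each class, take the k nearest neighbors of x
    (a neighbor subset), average them into a local geometric center (Eq.(7)),
    and keep the S classes whose centers lie closest to x."""
    classes = np.unique(y_train)
    subsets, center_dists = [], []
    for c in classes:
        Xc = X_train[y_train == c]
        d = np.linalg.norm(Xc - x, axis=1)
        nn = Xc[np.argsort(d)[:k]]                 # k nearest neighbors in class c
        subsets.append(nn)
        center_dists.append(np.linalg.norm(x - nn.mean(axis=0)))  # distance to local center
    keep = np.argsort(center_dists)[:S]            # S most promising classes
    return classes[keep], [subsets[i] for i in keep]

# toy data: three classes, the third far away from the test sample
X = np.array([[0., 0.], [1., 0.], [0., 1.],
              [5., 5.], [6., 5.], [5., 6.],
              [100., 100.], [101., 100.], [100., 101.]])
y = np.array([0, 0, 0, 1, 1, 1, 2, 2, 2])
labels, subsets = candidate_subsets(X, y, np.array([0.5, 0.5]), k=2, S=2)
print(labels)  # the distant class 2 is eliminated from the candidate set
```

The k×S samples returned here would then form the dictionary A_S for the second-stage problem of Eq.(8).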
In representing the test sample, the contribution of the ith candidate neighbor subset is the partial reconstruction

ŷ_i = sum_{j=1..k} α_ij * x_ij,  (10)

where α_ij is the sparse coefficient in x̂ corresponding to the jth sample of the ith candidate neighbor subset. Then we calculate the residual of the ith candidate neighbor subset with respect to the test sample y,

r_i(y) = ||y - ŷ_i||_2.  (11)

In Eq.(11), for the ith class, a smaller residual means a greater contribution to representing y. Thus y is finally classified into the class that produces the smallest residual.

3.3 Summary of two-stage LSCL

From the above analysis, the main steps of the proposed method are summarized as follows. Suppose there are n training samples from C different classes, a test sample y, the number k of nearest neighbors of y, and the number S of candidate neighbor subsets.

Step 1: Compute the Euclidean distance between the test sample y and every training sample.

Step 2: Through the K-NN rule, find the k nearest neighbors in each training class as the neighbor subset for y, calculate the neighbor average of each class's subset, and calculate the distance between y and each neighbor average.

Step 3: Determine the S neighbor subsets with the first S smallest distances as the candidate neighbor subsets for y.

Step 4: Construct the dictionary from all training samples of the S candidate neighbor subsets and then construct the weighted l1-norm minimization problem of Eq.(8).

Step 5: Solve Eq.(8) and obtain the sparse coefficients.

Step 6: For each candidate neighbor subset, compute the residual between y and its estimate ŷ_i by Eq.(11).

Step 7: Identify the class label with the minimum residual and classify y into this class.

3.4 Rationale and interpretation of LSCL

In practice, some between-species leaves are very different from the other leaves, as shown in Fig.1A. They can be easily classified by the Euclidean distances between the leaf digital image matrices. However, some between-species leaves are very similar to each other, as shown in Fig.1B.
Such leaves cannot be easily classified by simple classification methods. In Figs.1A and B, suppose the first leaf is the test sample while the other seven leaves are training samples. It is difficult to determine the label of the test leaf by a simple classification method, because the test leaf is very similar to Nos. 4, 5, 6 and 7 in Fig.1B. However, it is certain that the test sample is not Nos. 1, 2 or 3, so we can naturally exclude these three leaves first. This exclusion strategy is the purpose of the first stage of LSCL. From Fig.1C, it is found that there can be large differences between leaves of the same species. Therefore, in plant recognition, an optimal scheme is to select some training samples that are relatively similar to the test sample as the candidate training samples (for example, Nos. 2 and 9 in Fig.1C are similar to the test sample in Fig.1C), instead of considering all training samples. The average neighbor distance is used to coarsely recognize the test sample. The average neighbor distance, as a dissimilarity, is more effective and robust than the original distance between the test leaf and each training leaf, especially in the presence of noise and outliers.

From the above analysis, in the first stage of LSCL, it is reasonable to assume that a leaf close to the test sample has a great effect on the classification decision; on the contrary, if a leaf is far enough from the test sample, it will have little effect and may even have a side-effect on the classification decision. These leaves should be discarded first, so that the subsequent plant recognition task becomes clear and simple. In the same way, we can use the similarity between the test sample and the average of its nearest neighbors to select some neighbor subsets as the candidate training subsets of the test sample. In doing so, we eliminate the side-effect on the classification decision of any neighbor subset that is far from the test sample.
Usually, for a classification problem, the more classes there are, the lower the classification accuracy, so the first stage is very useful. In the second stage of LSCL, there are only S nearest neighbor subsets as candidate class labels of the test sample, so LSCL faces a problem simpler than the original classification problem, because S << C and k×S << n, i.e., far fewer training samples are reserved to match the test sample. Thus the computational cost is greatly reduced and the recognition rate is improved. We analyze the computational cost of LSCL in theory as follows. There are n samples from C classes, and every sample is an m×1 column vector. The first stage needs to calculate the Euclidean distances, select the k nearest neighbors from each class, and calculate the average of the k nearest neighbors, so its computational cost is about O(mn). In the second stage, k×S training samples are used to construct the dictionary A_S, and the dominant cost is solving the weighted l1-norm minimization problem of Eq.(8) over these k×S atoms, whereas the classical SRC algorithm solves the same kind of problem over all n training samples [8,9]. Compared with SRC, the computational cost of LSCL is therefore reduced remarkably when k×S << n.

4. Experiments and result analysis

In this section, the proposed method is evaluated on a plant species leaf database and compared with state-of-the-art methods.

4.1 Leaf image data and experiment preparation

To validate the proposed method, we apply it to the leaf classification task on the ICL dataset. All leaf images of the dataset were collected at the Botanical Garden of Hefei, Anhui Province of China by the Intelligent Computing Laboratory (ICL), Chinese Academy of Sciences. The ICL dataset contains 6000 plant leaf images from 200 species, in which each class has 30 leaf images. Some examples are shown in Fig.2.
In the database, some leaves can be distinguished easily, such as the first 6 leaves in Fig.2A, while some leaves are difficult to distinguish, such as the last 6 leaves in Fig.2A. We evaluate the proposed method in two situations: (1) two-fold cross validation, i.e., 15 leaf images of each class are randomly selected for training, and the remaining 15 samples are used for testing; (2) leave-one-out cross validation, i.e., one image of each class is randomly selected for testing and the remaining 29 leaf images per class are used for training.

Fig.2 Samples of different species from the ICL database: (A) original leaf images; (B) gray-scale images; (C) binary texture images.
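For concreteness, the per-class 15/15 split of situation (1) could be generated as follows. This is a sketch that assumes, hypothetically, that the 6000 images are indexed class by class; the seed is arbitrary:

```python
import numpy as np

def two_fold_split(n_per_class=30, n_classes=200, seed=0):
    """Random per-class 15/15 train/test split, as in situation (1).
    Assumes class c occupies indices c*n_per_class .. c*n_per_class + n_per_class - 1."""
    rng = np.random.default_rng(seed)
    train, test = [], []
    for c in range(n_classes):
        idx = rng.permutation(n_per_class) + c * n_per_class
        train.extend(idx[:n_per_class // 2])     # 15 images for training
        test.extend(idx[n_per_class // 2:])      # 15 images for testing
    return np.array(train), np.array(test)

train_idx, test_idx = two_fold_split()
print(len(train_idx), len(test_idx))  # -> 3000 3000
```

Splitting within each class keeps both halves balanced across all 200 species, matching the protocol described above.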
