The Random Forest (RF) algorithm is known as one of the most efficient classification methods. Because of its inherently interdisciplinary nature, it attracts researchers from different backgrounds. This study investigates the performance of the RF algorithm on multispectral satellite images with different spatial resolutions and scene characteristics. The images used are Ikonos and QuickBird images with four multispectral bands. The Ikonos image, acquired in 2003, covers mainly an urban area, whereas the QuickBird images acquired in 2005 and 2008 cover urban and rural areas, respectively. The 2005 QuickBird image also contains noisy patterns over the Black Sea, caused by waves from windy weather.
To evaluate the performance of RF, its classification results are compared with those obtained from the Gentle AdaBoost (GAB), Support Vector Machine (SVM) and Maximum Likelihood Classification (MLC) algorithms. Preliminary results indicate that RF gives higher classification accuracies than the other methods. For the Ikonos image over the urban area, RF gives 10% higher classification accuracy than SVM, whereas GAB has the lowest classification accuracy (14% lower than RF). For the QuickBird image of the rural area (taken in 2008), RF gives the best result of all the methods. For the QuickBird image containing noisy patterns, RF has around 11% higher overall accuracy than SVM.
To perform Random Forest classification (RFC) with suitably tuned parameters, the MATLAB code follows the procedure below once the data set has been loaded.
1. Decide the number of decision trees (for example, 500).
2. Decide the candidate values for the ratio of the number of explanatory variables (X) used by the decision trees (for example, 0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7 and 0.8).
3. Run RFC for every candidate X-ratio and estimate the values of the objective variable (Y) for the out-of-bag (OOB) samples.
4. Calculate the misclassification rate between the actual Y and the estimated Y for each candidate X-ratio.
5. Select the optimal X-ratio, i.e. the one with the minimum misclassification rate.
6. Construct the RFC model with the optimal X-ratio.
7. Calculate the confusion matrix between the actual Y and the calculated Y (predictions on the training data) for the optimal X-ratio.
8. Calculate the confusion matrix between the actual Y and the estimated Y of the OOB samples for the optimal X-ratio.
9. Estimate Y for new samples with the RFC model from step 6.
If training the RFC model takes too long, decrease the number of decision trees.
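The procedure above can be sketched in Python. This is a minimal, self-contained sketch under stated assumptions, not the MATLAB code referred to in the text: the tree builder is a hypothetical pure-Python stand-in (unpruned trees, Gini impurity, midpoint thresholds), and the X-ratio is assumed to be the fraction of explanatory variables sampled at each node, rounded and kept at one variable or more. All function names and the synthetic two-class data are illustrative.

```python
import random
from collections import Counter

def gini(labels):
    """Gini impurity of a list of class labels."""
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())

def build_tree(X, y, n_sub, rng):
    """Grow an unpruned tree, trying only n_sub randomly chosen variables at each node."""
    if len(set(y)) == 1:
        return y[0]                                    # pure node -> leaf
    best = None
    for f in rng.sample(range(len(X[0])), n_sub):
        vals = sorted(set(row[f] for row in X))
        for a, b in zip(vals, vals[1:]):
            t = (a + b) / 2                            # midpoint threshold
            left = [i for i, row in enumerate(X) if row[f] <= t]
            right = [i for i, row in enumerate(X) if row[f] > t]
            score = (len(left) * gini([y[i] for i in left])
                     + len(right) * gini([y[i] for i in right]))
            if best is None or score < best[0]:
                best = (score, f, t, left, right)
    if best is None:                                   # sampled variables are constant here
        return Counter(y).most_common(1)[0][0]
    _, f, t, left, right = best
    return (f, t,
            build_tree([X[i] for i in left], [y[i] for i in left], n_sub, rng),
            build_tree([X[i] for i in right], [y[i] for i in right], n_sub, rng))

def predict_tree(node, x):
    while isinstance(node, tuple):                     # internal node: (f, t, left, right)
        f, t, left, right = node
        node = left if x[f] <= t else right
    return node

def fit_forest(X, y, n_trees, x_ratio, seed=0):
    """Return a list of (tree, oob_indices) pairs."""
    rng = random.Random(seed)
    n, p = len(X), len(X[0])
    n_sub = max(1, round(x_ratio * p))                # assumption: rounded, at least one variable
    forest = []
    for _ in range(n_trees):
        idx = [rng.randrange(n) for _ in range(n)]     # bootstrap sample (with replacement)
        tree = build_tree([X[i] for i in idx], [y[i] for i in idx], n_sub, rng)
        forest.append((tree, set(range(n)) - set(idx)))
    return forest

def predict_forest(forest, x):
    return Counter(predict_tree(t, x) for t, _ in forest).most_common(1)[0][0]

def oob_predict(forest, X):
    """Vote only among trees for which sample i was out-of-bag (None if it never was)."""
    out = []
    for i, x in enumerate(X):
        votes = Counter(predict_tree(t, x) for t, oob in forest if i in oob)
        out.append(votes.most_common(1)[0][0] if votes else None)
    return out

def error_rate(y, y_hat):
    pairs = [(a, b) for a, b in zip(y, y_hat) if b is not None]
    return sum(a != b for a, b in pairs) / len(pairs)

def confusion(y, y_hat, classes):
    cm = {a: {b: 0 for b in classes} for a in classes}  # cm[actual][estimated]
    for a, b in zip(y, y_hat):
        if b is not None:
            cm[a][b] += 1
    return cm

def tune_x_ratio(X, y, n_trees=500, ratios=(0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8)):
    """Steps 3-5: OOB misclassification rate per candidate ratio; keep the minimum."""
    errors = {r: error_rate(y, oob_predict(fit_forest(X, y, n_trees, r), X)) for r in ratios}
    return min(errors, key=errors.get), errors

# Usage on hypothetical 4-band "pixels" from two well-separated classes
rng = random.Random(1)
X = ([[rng.uniform(20, 30) for _ in range(4)] for _ in range(20)]
     + [[rng.uniform(60, 70) for _ in range(4)] for _ in range(20)])
y = [0] * 20 + [1] * 20
best_ratio, errors = tune_x_ratio(X, y, n_trees=25)    # fewer trees keep the demo fast
model = fit_forest(X, y, 25, best_ratio)               # step 6
cm_fit = confusion(y, [predict_forest(model, x) for x in X], [0, 1])  # step 7
cm_oob = confusion(y, oob_predict(model, X), [0, 1])                  # step 8
print(best_ratio, errors[best_ratio], cm_oob)
```

Here steps 3-5 map to `tune_x_ratio`, step 6 to the final `fit_forest` call, and steps 7 and 8 to `cm_fit` and `cm_oob`. The demo uses only 25 trees so the pure-Python run stays short, which mirrors the advice above about trading the number of trees against training time.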
Image classification is the process of converting Digital Number (DN) values into meaningful land cover information at every pixel location in an image. In other words, image classification assigns the pixels of an image to classes according to statistical decision rules in the spectral domain or logical decision rules in the spatial domain. Spectral-domain decision rules are based on the spectral values of individual pixels, whereas spatial-domain decision rules are based on the neighborhood information of pixels and on spatial context such as shape, texture and pattern.
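To illustrate the two kinds of decision rule, the sketch below first labels each pixel from its band values alone (a spectral-domain rule) and then relabels it from its 3x3 neighborhood (a spatial-domain rule). Minimum distance to class means is swapped in here only as the simplest spectral rule; it is not one of the RF, SVM, GAB or MLC classifiers compared in the study. The class means and the tiny 4-band image are hypothetical.

```python
from collections import Counter

def spectral_classify(image, class_means):
    """Spectral-domain rule: label each pixel with the class whose mean vector is
    nearest in squared Euclidean distance over the bands."""
    def nearest(pixel):
        return min(class_means,
                   key=lambda c: sum((p - m) ** 2 for p, m in zip(pixel, class_means[c])))
    return [[nearest(px) for px in row] for row in image]

def majority_filter(labels):
    """Spatial-domain rule: relabel each pixel with the most common label in its
    3x3 neighborhood (the window is clipped at the image border)."""
    h, w = len(labels), len(labels[0])
    out = []
    for r in range(h):
        row = []
        for c in range(w):
            window = [labels[i][j]
                      for i in range(max(0, r - 1), min(h, r + 2))
                      for j in range(max(0, c - 1), min(w, c + 2))]
            row.append(Counter(window).most_common(1)[0][0])
        out.append(row)
    return out

# Hypothetical 4-band DN values
class_means = {"water": (30, 25, 20, 10), "vegetation": (40, 50, 35, 120)}
veg, wat = (42, 48, 36, 118), (31, 24, 21, 12)
image = [[veg, veg, veg],
         [veg, wat, veg],   # one isolated water-like pixel
         [veg, veg, veg]]
labels = spectral_classify(image, class_means)
smoothed = majority_filter(labels)
print(labels[1][1], smoothed[1][1])  # -> water vegetation
```

The spectral rule alone keeps the isolated pixel as water; the neighborhood rule removes it because the surrounding context is vegetation.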
Ensemble classification methods are learning algorithms that construct a set of classifiers instead of a single classifier and then classify new data points by taking a vote of their predictions. The most commonly used ensemble classifiers are Bagging, Boosting and RF. To initialize the RF algorithm, the user must define two parameters: N, the number of trees to grow, and m, the number of variables used to split each node. First, N bootstrap samples are drawn with replacement from the training data set; each bootstrap sample contains about two-thirds of the distinct training samples. The remaining one-third of the training data, called out-of-bag (OOB) data, is used to estimate the prediction error. Then an unpruned tree is grown from each bootstrap sample: at each node, m predictors are randomly selected from the full set of predictor variables, and the best split among those variables is chosen.
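A short sketch shows why roughly one-third of the training data ends up out-of-bag for each tree. Assuming a training set of n samples indexed 0..n-1, a bootstrap sample draws n indices with replacement, so any given sample is missed with probability (1 - 1/n)^n, which approaches 1/e ≈ 0.368 for large n. The `majority_vote` helper illustrates the voting step of an ensemble; the function names and class labels are hypothetical.

```python
import math
import random
from collections import Counter

def bootstrap_oob(n, rng):
    """Draw one bootstrap sample of size n (with replacement).
    Returns (in-bag indices, sorted out-of-bag indices)."""
    in_bag = [rng.randrange(n) for _ in range(n)]
    oob = sorted(set(range(n)) - set(in_bag))
    return in_bag, oob

def majority_vote(predictions):
    """Ensemble decision: the class predicted by the most trees."""
    return Counter(predictions).most_common(1)[0][0]

rng = random.Random(42)
n = 10_000
fractions = [len(bootstrap_oob(n, rng)[1]) / n for _ in range(20)]
mean_oob = sum(fractions) / len(fractions)
print(f"mean OOB fraction over 20 bootstrap samples: {mean_oob:.3f}")
print(f"theory: (1 - 1/n)^n = {(1 - 1 / n) ** n:.3f}, 1/e = {1 / math.e:.3f}")
print(majority_vote(["urban", "water", "urban", "forest", "urban"]))  # -> urban
```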
This study uses multiple high-resolution images of the city of Trabzon, Turkey, and its vicinity, which contain both urban and rural features. The image data include QuickBird pan-sharpened multispectral (0.6 m) images acquired in 2005 and 2008.
The classification accuracy of the RF method depends on the user-defined parameters N and m; hence, an optimal selection of these parameters increases classification accuracy. To find the optimal values of N and m, multiple combinations are tested and assessed to obtain more reliable thematic maps for the study areas. The OOB error, test accuracy, kappa and computational time for the training set under different N and m combinations are given in Table 1. As seen in Table 1, N = 100 and m = 2 are selected for the Ikonos image over the urban area. For the QuickBird image of the urban area, N = 350 and m = 2 are chosen, whereas N = 500 and m = 2 are selected for the QuickBird image of the rural area.
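Table 1 reports accuracy and kappa for each (N, m) combination. As a sketch of how these two figures are usually derived from a confusion matrix (assuming rows are reference classes and columns are predicted classes), overall accuracy is p_o = sum of diagonal counts / n, and Cohen's kappa is kappa = (p_o - p_e) / (1 - p_e), where p_e = sum over classes of (row total * column total) / n^2. The 3-class matrix below is invented for illustration and is not taken from Table 1.

```python
def overall_accuracy(cm):
    """Fraction of samples on the diagonal of a square confusion matrix."""
    total = sum(map(sum, cm))
    return sum(cm[i][i] for i in range(len(cm))) / total

def cohen_kappa(cm):
    """Cohen's kappa: (p_o - p_e) / (1 - p_e), with chance agreement p_e
    computed from the row (reference) and column (predicted) totals."""
    total = sum(map(sum, cm))
    p_o = overall_accuracy(cm)
    rows = [sum(r) for r in cm]
    cols = [sum(c) for c in zip(*cm)]
    p_e = sum(r * c for r, c in zip(rows, cols)) / total ** 2
    return (p_o - p_e) / (1 - p_e)

# Hypothetical 3-class test-set confusion matrix (rows = reference, cols = predicted)
cm = [[50, 3, 2],
      [4, 40, 6],
      [1, 5, 39]]
print(round(overall_accuracy(cm), 3), round(cohen_kappa(cm), 3))  # -> 0.86 0.789
```

Kappa discounts the agreement expected by chance from the class marginals, which is why it is lower than the overall accuracy for the same matrix.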