Author(s): , ,
Institution(s): 1. National Astronomical Observatories, 2. Wuhan University
Photometric redshifts of quasars have long been estimated by training on all of the available data. In the big-data regime, the available sample is huge and may contain noise. A major research challenge is therefore to design a learning process that selects the most informative data from the available sample, so that the underlying relationships are learned optimally. By filtering out noisy and redundant data, such a process can improve both the accuracy and the speed of estimation. To this end, we develop an active learning approach that automatically trains a series of support vector regression models on small data chunks drawn by different samplings. These models are applied to a validation dataset. Through active learning, the validation data whose estimates vary within a certain range are regarded as informative and are aggregated into multiple training datasets. The estimators trained on the aggregated datasets are then combined into an ensemble estimator through averaging and applied to a test dataset. Our experimental results on SDSS data show that the proposed method helps improve the accuracy of quasar photometric redshift estimation.
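The workflow described in the abstract (chunk models, selection on a validation set, aggregation, averaging) can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the data are synthetic rather than SDSS photometry, a least-squares line stands in for support vector regression so that the example needs only the Python standard library, the "certain range" is read as the interquartile band of the spread among the chunk models' estimates, and all function names (`fit_line`, `select_informative`, `run_sketch`, and so on) are hypothetical.

```python
import math
import random
import statistics


def fit_line(points):
    """Least-squares line through (x, y) pairs; a simple stand-in for SVR."""
    xs = [x for x, _ in points]
    ys = [y for _, y in points]
    mean_x, mean_y = statistics.fmean(xs), statistics.fmean(ys)
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in points)
    slope = sxy / sxx
    return lambda x: mean_y + slope * (x - mean_x)


def prediction_spread(models, x):
    """Range (max minus min) of the chunk models' estimates for one object."""
    preds = [m(x) for m in models]
    return max(preds) - min(preds)


def select_informative(models, validation):
    """Keep validation objects whose spread lies in the interquartile band.

    The band is an assumed reading of "vary within a certain range".
    """
    spreads = [prediction_spread(models, x) for x, _ in validation]
    low, _, high = statistics.quantiles(spreads, n=4)
    return [p for p, s in zip(validation, spreads) if low <= s <= high]


def ensemble_predict(models, x):
    """Average the estimates of all models in the ensemble."""
    return statistics.fmean(m(x) for m in models)


def make_toy_data(rng, n):
    """Synthetic (feature, redshift-like target) pairs; not SDSS data."""
    data = []
    for _ in range(n):
        x = rng.uniform(0.0, 1.0)
        data.append((x, 2.0 * x + rng.gauss(0.0, 0.1)))
    return data


def run_sketch(seed=0, n_chunks=5, chunk_size=20):
    rng = random.Random(seed)
    train = make_toy_data(rng, 300)
    validation = make_toy_data(rng, 200)
    test = make_toy_data(rng, 100)

    # 1. Train one model per small, independently sampled chunk.
    chunks = [rng.sample(train, chunk_size) for _ in range(n_chunks)]
    committee = [fit_line(c) for c in chunks]

    # 2. Pick the validation objects regarded as informative.
    informative = select_informative(committee, validation)

    # 3. Aggregate them into each chunk and retrain on the enlarged sets.
    final_models = [fit_line(c + informative) for c in chunks]

    # 4. Average the retrained models and score them on the test set.
    errors = [ensemble_predict(final_models, x) - y for x, y in test]
    rmse = math.sqrt(statistics.fmean(e * e for e in errors))
    return len(informative), rmse


n_informative, rmse = run_sketch()
print(n_informative, round(rmse, 3))
```

In a realistic setting the stand-in regressor would be replaced by a support vector regression model over photometric features, and the selection band would be tuned rather than fixed at the quartiles; the sketch only shows how the four stages fit together.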