Abstract:
Today, the web is a vast and valuable source of weather data. Every day, several petabytes of meteorological information are
generated, giving rise to weather big data. Using various Machine Learning (ML) techniques, this data is exploited for forecasting
and decision making. However, processing such large volumes of weather data is a challenge for both ML algorithms and computing resources.
Weather big data often includes a very large number of variables, whose analysis and processing require substantial resources. As a result,
the ML techniques applied to it do not always produce efficient forecasts, and they take longer to do so.
To improve prediction precision while minimizing processing time, this paper investigates the influence of data sampling techniques
on the accuracy of ML models used in weather data analysis. To this end, we combined the dimensionality reduction technique "Random
Projection" (RP) with two ML classifiers (Decision Tree and Naïve Bayes) and applied them to weather big data collected from web
sources using web scraping. The results of our experiments show that reducing the dimensionality of weather data considerably
improves the performance of ML models, and thus the accuracy of weather forecasts, while reducing the processing resources required.
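The combination described above (Random Projection followed by a classifier) can be illustrated with a minimal sketch. It is not the authors' pipeline: it assumes a Gaussian random projection (matrix entries drawn from N(0, 1/k)), uses only the Naïve Bayes classifier (the Decision Tree is omitted for brevity), and replaces the scraped weather data with a hypothetical synthetic table in which a binary label (e.g. rain / no rain) shifts the mean of some variables. The helper names `gaussian_random_projection`, `GaussianNaiveBayes` and `make_synthetic_weather` are illustrative, not taken from the paper.

```python
import numpy as np


def gaussian_random_projection(X, n_components, seed=0):
    """Project X (n_samples x n_features) onto n_components random directions.

    Entries of the projection matrix are drawn i.i.d. from N(0, 1/n_components),
    so pairwise Euclidean distances are approximately preserved
    (Johnson-Lindenstrauss lemma).
    """
    rng = np.random.default_rng(seed)
    R = rng.normal(0.0, 1.0 / np.sqrt(n_components), size=(X.shape[1], n_components))
    return X @ R


class GaussianNaiveBayes:
    """Naive Bayes with one independent Gaussian per (class, feature) pair."""

    def fit(self, X, y):
        self.classes_ = np.unique(y)
        self.means_ = np.array([X[y == c].mean(axis=0) for c in self.classes_])
        # Small variance floor avoids division by zero for constant features.
        self.vars_ = np.array([X[y == c].var(axis=0) for c in self.classes_]) + 1e-9
        self.log_priors_ = np.log([np.mean(y == c) for c in self.classes_])
        return self

    def predict(self, X):
        # Per-class log-likelihood summed over features, shape (n_samples, n_classes).
        diff = X[:, None, :] - self.means_[None, :, :]
        log_lik = -0.5 * np.sum(
            np.log(2.0 * np.pi * self.vars_)[None, :, :] + diff**2 / self.vars_[None, :, :],
            axis=2,
        )
        return self.classes_[np.argmax(log_lik + self.log_priors_, axis=1)]


def make_synthetic_weather(n_samples=600, n_features=200, n_informative=40, seed=1):
    """Hypothetical stand-in for a scraped weather table: the binary label
    shifts the mean of the first n_informative variables."""
    rng = np.random.default_rng(seed)
    y = rng.integers(0, 2, size=n_samples)
    X = rng.normal(size=(n_samples, n_features))
    X[y == 1, :n_informative] += 1.5
    return X, y


def accuracy(model, X_train, y_train, X_test, y_test):
    return float(np.mean(model.fit(X_train, y_train).predict(X_test) == y_test))


X, y = make_synthetic_weather()
# The projection matrix does not depend on the data, so projecting all rows
# before the train/test split does not leak test information.
X_rp = gaussian_random_projection(X, n_components=50)

split = 400  # first 400 rows for training, remaining 200 for testing
acc_full = accuracy(GaussianNaiveBayes(), X[:split], y[:split], X[split:], y[split:])
acc_rp = accuracy(GaussianNaiveBayes(), X_rp[:split], y[:split], X_rp[split:], y[split:])

print("reduced shape:", X_rp.shape)
print(f"accuracy, 200 original features: {acc_full:.3f}")
print(f"accuracy, 50 projected features: {acc_rp:.3f}")
```

Because the projection matrix is data-independent, it is cheap to generate and the classifier then works on 50 instead of 200 columns. In practice, scikit-learn's `GaussianRandomProjection`, `GaussianNB` and `DecisionTreeClassifier` provide tested implementations of the same building blocks.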