Abstract:
Many scientific fields now face a data deluge. One approach proposed for processing such volumes is the MapReduce programming paradigm, introduced by Google in 2004. This very simple programming pattern is divided into two phases, map and reduce, between which a massive exchange of data takes place among the machines running the application.
In this article, we propose integrating a distributed range partitioning algorithm into the MapReduce paradigm. The resulting scheme is called MR2P*. This new approach targets the dynamic scheduling of data and the optimization of the shuffle phase (the intermediate phase between map and reduce).
Experiments show that our approach yields promising execution times and improved scalability.
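
As an illustration of where range partitioning sits in the map, shuffle, reduce pipeline, the following is a minimal single-process sketch. It is not the MR2P* scheme itself: the word-count job, the function names (`map_phase`, `range_partition`, `reduce_phase`, `run_job`) and the fixed `boundaries` list are all assumptions introduced here for illustration.

```python
from bisect import bisect_right
from collections import defaultdict


def map_phase(records):
    """Map: emit one (word, 1) pair per word in each input line."""
    for line in records:
        for word in line.split():
            yield word, 1


def range_partition(pairs, boundaries):
    """Shuffle: route each pair to the reducer whose key interval holds its key.

    `boundaries` is a sorted list of split keys (hypothetical, fixed here).
    Reducer i receives keys k with boundaries[i-1] <= k < boundaries[i].
    """
    partitions = [defaultdict(list) for _ in range(len(boundaries) + 1)]
    for key, value in pairs:
        idx = bisect_right(boundaries, key)  # index of the interval containing key
        partitions[idx][key].append(value)
    return partitions


def reduce_phase(partition):
    """Reduce: sum the values collected for each key of one partition."""
    return {key: sum(values) for key, values in sorted(partition.items())}


def run_job(records, boundaries):
    """Run map, range-partitioned shuffle, and reduce in a single process."""
    partitions = range_partition(map_phase(records), boundaries)
    return [reduce_phase(p) for p in partitions]


if __name__ == "__main__":
    lines = ["apple map zoo", "map apple apple"]
    print(run_job(lines, ["h", "p"]))
```

Because each reducer owns a contiguous key interval, concatenating the reducer outputs in order gives a globally sorted result. In this sketch the boundaries are fixed by hand; in a distributed setting they would typically have to be chosen, and possibly adjusted, so that the intervals receive comparable amounts of data.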