In this post, we explain how to write a simple MapReduce job in MongoDB. Before we start coding, we need to understand what a MapReduce job is from a high-level point of view.
A Map operation processes each document and emits key-value pairs, so that the values end up grouped by the specified key. For instance, Amazon might use something like this example to display how well an author has performed overall (we are using the dataset from the Aggregation example):
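As a rough illustration of the map phase, here is a minimal plain-Java sketch (no database involved). It assumes each book document has hypothetical `author`, `title` and `rating` fields, and for every book it emits the pair (author, rating) as a composite key with the title as the value, much like calling `emit(key, value)` inside a mongo shell map function.

```java
import java.util.ArrayList;
import java.util.List;

public class MapStep {
    // Hypothetical book document: these field names are assumptions, not the dataset's real schema.
    record Book(String author, String title, int rating) {}

    // Composite key: one group per (author, rating) pair.
    record AuthorRating(String author, int rating) {}

    // One emitted (key, value) pair, playing the role of emit(key, value).
    record Emitted(AuthorRating key, String value) {}

    // Map: for every book, emit the composite key together with the book title.
    static List<Emitted> map(List<Book> books) {
        List<Emitted> out = new ArrayList<>();
        for (Book b : books) {
            out.add(new Emitted(new AuthorRating(b.author(), b.rating()), b.title()));
        }
        return out;
    }

    public static void main(String[] args) {
        List<Book> books = List.of(
                new Book("Author A", "Title 1", 5),
                new Book("Author A", "Title 2", 5),
                new Book("Author B", "Title 3", 3));
        map(books).forEach(System.out::println);
    }
}
```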
This Map function runs over the documents in the collection and emits the author and the rating value together as a composite key, so the values are grouped by both. The array that the Reduce function receives for each key has the following data structure:
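MongoDB performs this grouping itself between the map and reduce phases. To make the shape concrete, the sketch below (plain Java, with a hypothetical (author, rating) key and book titles as values) collects every emitted value under its key, so each entry pairs one key with the list of values that a single reduce call would receive.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class GroupStep {
    // Hypothetical composite key and emitted pair; names are illustrative only.
    record AuthorRating(String author, int rating) {}
    record Emitted(AuthorRating key, String value) {}

    // Collect every value emitted under the same key into one list;
    // each (key, list) entry is what one reduce call receives.
    static Map<AuthorRating, List<String>> group(List<Emitted> emitted) {
        Map<AuthorRating, List<String>> groups = new LinkedHashMap<>();
        for (Emitted e : emitted) {
            groups.computeIfAbsent(e.key(), k -> new ArrayList<>()).add(e.value());
        }
        return groups;
    }

    public static void main(String[] args) {
        List<Emitted> emitted = List.of(
                new Emitted(new AuthorRating("Author A", 5), "Title 1"),
                new Emitted(new AuthorRating("Author B", 3), "Title 3"),
                new Emitted(new AuthorRating("Author A", 5), "Title 2"));
        group(emitted).forEach((key, titles) -> System.out.println(key + " -> " + titles));
    }
}
```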
A Reduce operation takes a key and the array of values grouped under that key, and combines those values into an answer to the specific problem. In our case, let’s define the reduce function as follows:
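A plain-Java sketch of such a reduce step, assuming the values are book titles grouped under a hypothetical (author, rating) key: it returns the titles sorted alphabetically. In MongoDB the reduce function itself is written in JavaScript; this version only mirrors the logic.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

public class ReduceStep {
    // Hypothetical composite key; not needed for sorting, but part of the reduce(key, values) contract.
    record AuthorRating(String author, int rating) {}

    // Reduce: return the grouped values in alphabetical order.
    static List<String> reduce(AuthorRating key, List<String> values) {
        List<String> sorted = new ArrayList<>(values); // copy so the input list is left untouched
        Collections.sort(sorted);
        return sorted;
    }

    public static void main(String[] args) {
        AuthorRating key = new AuthorRating("Author A", 5);
        System.out.println(reduce(key, List.of("Title 2", "Title 1"))); // prints [Title 1, Title 2]
    }
}
```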
As we can see, the function simply sorts the values in the given array.
We execute the MapReduce job as follows:
This command creates a new collection called “books_ratings”, where the output of the job is stored.
The implementation in Java is pretty straightforward:
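With the MongoDB Java driver, the JavaScript map and reduce functions are passed as strings to the collection's `mapReduce` method and executed on the server. The sketch below takes a different, self-contained route so it runs without a database: it performs the same map, group and reduce phases in plain Java over hypothetical book records (field names are assumptions) and returns an in-memory map standing in for the `books_ratings` collection.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class BooksRatingsJob {
    // Hypothetical document shape; the field names are assumptions.
    record Book(String author, String title, int rating) {}
    // Composite key used by the map phase.
    record AuthorRating(String author, int rating) {}

    // Runs map -> group -> reduce and returns an in-memory stand-in for books_ratings.
    static Map<AuthorRating, List<String>> run(List<Book> books) {
        // Map + group: emit (author, rating) -> title and collect the titles per key.
        Map<AuthorRating, List<String>> groups = new LinkedHashMap<>();
        for (Book b : books) {
            groups.computeIfAbsent(new AuthorRating(b.author(), b.rating()), k -> new ArrayList<>())
                  .add(b.title());
        }
        // Reduce: sort the titles of each group.
        Map<AuthorRating, List<String>> booksRatings = new LinkedHashMap<>();
        for (Map.Entry<AuthorRating, List<String>> e : groups.entrySet()) {
            List<String> sorted = new ArrayList<>(e.getValue());
            Collections.sort(sorted);
            booksRatings.put(e.getKey(), sorted);
        }
        return booksRatings;
    }

    public static void main(String[] args) {
        List<Book> books = List.of(
                new Book("Author A", "Title 2", 5),
                new Book("Author A", "Title 1", 5),
                new Book("Author A", "Title 3", 2),
                new Book("Author B", "Title 4", 4));
        run(books).forEach((key, titles) -> System.out.println(key + " -> " + titles));
    }
}
```

Running `main` prints one line per (author, rating) key with its sorted titles. In a real job, MongoDB does the grouping itself and writes the result to `books_ratings` instead of returning it.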
For the reasons given above, and following the recommendations made at the MongoDB Conference in Stockholm, you should use MapReduce jobs only in a few selected cases, since most (if not all) of what you can do with a MapReduce job can also be done with the Aggregation Framework (explained in this blog here). Therefore, my recommendation is to use MapReduce jobs only when the dataset is not very large (since the performance is not as good as executing Java code) and, of course, to avoid using them for real-time operations.
I hope everything is clear and simple enough for you to continue with the official docs (which are really good, BTW).
Have fun coding!!!