Mongo Aggregation Framework

Aggregation queries in MongoDB

We are going to explain how to use the MongoDB Aggregation framework using what could be a real case scenario.

We are going to assume that we have a simple app that catalogs your books. Since this is just a simple app, let’s assume that our book collection has the following scheme:

{
  author: [
      { name: 'Kiko Fernández', id: 'ABCD'},
      ...
  ],
  title: 'Mastering MongoDB',
  ISBN: 'XXXXXXXXXX',
  description: 'How to master MongoDB.',
  rating: '4'
},
...
]

Simple queries such as giving us all the books that Kiko Fernandez has written would be something like:

db.books.find({'author.id': 'ABCD'});

Nonetheless, if we want to get something such as a list of documents with all the people I have written books with, then we need to use the Aggregation framework or map-reduce job. In this case, we are going to explain how to do this query using the Aggregation Framework:

  1. We want to match only the documents where I am one of the authors
    // CODE
    { $match: { 'author.id': 'ABCD' } },
    
  2. We don’t need the other fields (at least for this query), so let’s work only with the author array:
    //CODE
    { $project: { author: '$author' } },
    
    // DOCUMENTS
        {
          author: [
            {name: 'Kiko Fernandez', id: 'ABCD'},
            {name: 'Pontus Naimell', id: 'XXXX'}
          ]
        },{
          author: [
            {name: 'Kiko Fernandez', id: 'ABCD'},
            {name: 'Alberto Fernandez', id: 'YYYY'}
          ]
        },
        ...
    
  3. Let’s unfold the array:
    // CODE
    { $unwind: '$author' },
    
        // DOCUMENTS
        {
          author: {
            {name: 'Kiko Fernandez', id: 'ABCD'}
          }
        },{
          author: {
            {name: 'Pontus Naimell', id: 'XXXX'}
          }
        },{
          author: {
            {name: 'Kiko Fernandez', id: 'ABCD'}
          }
        },{
          author: {
            {name: 'Alberto Fernandez', id: 'YYYY'}
          }
        },
        ...
      
  4. In this step, as we can see from the code above, we don’t need the author document containing another sub-document. Furthermore, the author is not an array anymore since we unfolded the array, so we are going to re-structure the data:
    // CODE
    { $project: { name: '$author.name', id: '$author.id' }},
    
        // DOCUMENTS
        {
          id: 'ABCD',
          name: 'Kiko Fernandez'
        },{
          id: 'XXXX',
          name: 'Pontus Naimell'
        },{
          id: 'ABCD',
          name: 'Kiko Fernandez'
        },{
          id: 'YYYY',
          name: 'Alberto Fernandez'
        },
        ...
      
  5. Since this is not about me, but about my colleagues with whom I wrote with, I am going to filter myself out.
    // CODE
    { $match: { id: { $ne: 'ABCD' }}},
    
    // DOCUMENTS
    {
      id: 'YYYY',
      name: 'Alberto Fernandez'
    }, {
      id: 'XXXX',
      name: 'Pontus Naimell'
    }
    
  6. Finally, the only thing left is to group by the id. We do the group by id because we might have more than one author repeated twice when we did the unwind operation. Moreover, we did a small trick with the name. This one consists on doing the grouping by id, knowing that ‘name’ attributes have exactly the same data. Thus, we can use the $first operation to select one of those names. The use of the $first operation is usually done after a $sort operation ’cause otherwise the order is not guarantee. In our case, the data is the same and therefore, we can use the $first operation to select one.
      // CODE
      { $group: {
          _id: '$id', // mandatory
          name: { $first: '$name' } // trick
        }
      }
      
    // DOCUMENTS
    {
      id: 'YYYY',
      name: 'Alberto Fernandez'
    }, {
      id: 'XXXX',
      name: 'Pontus Naimell'
    }
    

The code we have written is the following:

db.books.aggregate(
  // 1. we want all the documents where I am
  // one of the authors
  { $match: { 'author.id': 'ABCD' } },

  // 2. we want to work only with the author
  // array of documents
  { $project: { author: '$author' } },

  // 3. we undwind / 'unfold' the array to get a
  // list with all the authors in different documents
  { $unwind: '$author' },

  // 4. we change the structure of the document,
  // so no more author field
  { $project: { name: '$author.name', id: '$author.id' }},

  // 5. remove myself from the list
  { $match: { id: { $ne: 'ABCD' }}},

  // 6. group by id 'cause otherwise we might have
  // duplicate authors
  { $group: {
      _id: '$id', // mandatory
      name: { $first: '$name' } // trick
    }
  }
)

I hope everything is clear so, go ahead and play with the Mongo Aggregation Framework!!!

This entry was published on February 1, 2013 at 1:18 pm. It’s filed under Uncategorized and tagged , . Bookmark the permalink. Follow any comments here with the RSS feed for this post.

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s

%d bloggers like this: