Using GROUP_CONCAT to Combine Rows in a Drupal Query

Recently I was working on a D7 to D8 migration. I was trying to import news items and their taxonomy terms (among many other things). To make things simple I wanted the query results to have (for each node) a single field that contained a comma-separated list of taxonomy terms which I could then explode during the row processing. Using GROUP_CONCAT we can achieve this! Let’s break it down:

The Drupal 7 site has the following structure (focusing on the important bits for this blog post):

The migration relies on the d7_node migrate source plugin, which basically queries for nodes of a specific type. The query object looks like this (simplified for this blog post):

The field_categories field collection field on the news_item content type looks like this in the database (table = field_data_field_categories):

field_categories table

You can see we have an entity_id which is the field collection’s host entity (the news_item).

We can pull the field collection entities into the query with a left join:

Next we have to dive another level deep to get the values of the fields within the field collection. This is the step that gives us the title for the blog post. When we execute the query as we have it so far we retrieve 2899 records. If we simply request the field_term_audience via a join we will end up with 6,867 records (additional rows for each term reference value). We can group by n.nid to have one row per node, but then what happens with all of our taxonomy terms? My goal was to have a single field that contained a comma-separated list of term ids. We can achieve this using GROUP_CONCAT. First, let’s add the GROUP BY:

This brings us from 6,867 records back to 2899 records but we don’t have any term data.

To use GROUP_CONCAT we have to implement the addExpression() method of the Query object.

First, let’s take a look at the field_data_field_term_audience table:field_data_field_term_audience table

We can use the field collection’s entity_id to find all of its audience field terms. After we have that data we can GROUP_CONCAT the data into a single field:

The result is that the query results contain a tag_list_audience element that contains a string like 276,289,301 .

This data can then be exploded in the migration. Thought it’s not the focus of this post I’ll share what that looks like:

Here’s the final code for my overridden query() method (trimmed down to remove distractions):

Note: If you’re stuck on a MySQL version prior to 5.7 you may have issues with the grouping. Do some googling for ONLY_FULL_GROUP_BY if you have issues. On a shared host where I could not modify the MySQL 5.6 configuration I had to add groupBy statements for all of the other fields that are in the standard d7_node query.

That looks something like this:

DISCLAIMER: I still need to figure out how to handle the revision_id throughout these queries. It may not matter, but I’m still working on it.

Tagged with: , , , , , , ,
Posted in Development

Leave a Reply

Your email address will not be published. Required fields are marked *

*