Using GROUP_CONCAT to Combine Rows in a Drupal Query
Recently I was working on a D7 to D8 migration. I was trying to import news items and their taxonomy terms (among many other things). To make things simple I wanted the query results to have (for each node) a single field that contained a comma-separated list of taxonomy terms which I could then explode during the row processing. Using GROUP_CONCAT we can achieve this! Let’s break it down:
The Drupal 7 site has the following structure (focusing on the important bits for this blog post):
1 2 3 |
-- News Item (news_item) <content type> ---- Categories (field_categories) <field collection> ------ Audience (field_term_audience) <taxonomy reference; unlimited values> |
The migration relies on the d7_node migrate source plugin, which basically queries for nodes of a specific type. The query object looks like this (simplified for this blog post):
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 |
// Select node in its last revision. $query = $this->select('node_revision', 'nr') ->fields('n', [ 'nid', 'type', 'language', 'status', 'created', 'changed', 'comment', 'promote', 'sticky', 'tnid', 'translate', ]) ->fields('nr', [ 'vid', 'title', 'log', 'timestamp', ]); $query->addField('n', 'uid', 'node_uid'); $query->addField('nr', 'uid', 'revision_uid'); $query->innerJoin('node', 'n', static::JOIN); $query->condition('n.type', 'news_item'); |
The field_categories field collection field on the news_item content type looks like this in the database (table = field_data_field_categories):
You can see we have an entity_id which is the field collection’s host entity (the news_item).
We can pull the field collection entities into the query with a left join:
1 |
$query->leftJoin('field_data_field_categories', 'fcat', 'fcat.entity_id = n.nid'); |
Next we have to dive another level deep to get the values of the fields within the field collection. This is the step that gives us the title for the blog post. When we execute the query as we have it so far we retrieve 2899 records. If we simply request the field_term_audience via a join we will end up with 6,867 records (additional rows for each term reference value). We can group by n.nid to have one row per node, but then what happens with all of our taxonomy terms? My goal was to have a single field that contained a comma-separated list of term ids. We can achieve this using GROUP_CONCAT. First, let’s add the GROUP BY:
1 |
$query->groupBy('n.nid'); |
This brings us from 6,867 records back to 2899 records but we don’t have any term data.
To use GROUP_CONCAT we have to implement the addExpression() method of the Query object.
First, let’s take a look at the field_data_field_term_audience table:
We can use the field collection’s entity_id to find all of its audience field terms. After we have that data we can GROUP_CONCAT the data into a single field:
1 2 |
$query->leftJoin('field_data_field_term_audience', 'fta', 'fta.entity_id = fcat.field_categories_value'); $query->addExpression('GROUP_CONCAT(DISTINCT fta.field_term_audience_tid)', 'php_tag_list_audience'); |
The result is that the query results contain a tag_list_audience element that contains a string like 276,289,301 .
This data can then be exploded in the migration. Thought it’s not the focus of this post I’ll share what that looks like:
1 2 3 4 5 6 7 8 9 10 11 12 13 |
process: field_tax_audience: - plugin: skip_on_empty method: process source: php_tag_list_audience - plugin: explode delimiter: , - plugin: migration_lookup migration: grin_taxonomy_terms no_stub: true |
Here’s the final code for my overridden query() method (trimmed down to remove distractions):
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 |
/** * {@inheritdoc} */ public function query() { $query = parent::query(); // Only migrate published records. $query->condition('n.status', 1); // field_categories is a field collection containing a term ref field. // Get field_categories field collection ID (used only for joins below). $query->leftJoin('field_data_field_categories', 'fcat', 'fcat.entity_id = n.nid'); // Get the terms associated with the categories field collection. // Concatenate the rows (per field) into a single field (comma-separated). $query->leftJoin('field_data_field_term_audience', 'fta', 'fta.entity_id = fcat.field_categories_value'); $query->addExpression('GROUP_CONCAT(DISTINCT fta.field_term_audience_tid)', 'php_tag_list_audience'); // Only return one row per node. $query->groupBy('n.nid'); return $query; } |
Note: If you’re stuck on a MySQL version prior to 5.7 you may have issues with the grouping. Do some googling for ONLY_FULL_GROUP_BY if you have issues. On a shared host where I could not modify the MySQL 5.6 configuration I had to add groupBy statements for all of the other fields that are in the standard d7_node query.
That looks something like this:
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 |
// Only return one row per node. // This is one way of solving "isn't in GROUP BY" issues with < MySQL 5.7. // We cannot set ONLY_FULL_GROUP_BY in Webfaction, unfortunately. $query->groupBy('n.nid'); $query->groupBy('n.uid'); $query->groupBy('n.type'); $query->groupBy('n.language'); $query->groupBy('n.status'); $query->groupBy('n.created'); $query->groupBy('n.changed'); $query->groupBy('n.comment'); $query->groupBy('n.promote'); $query->groupBy('n.sticky'); $query->groupBy('n.tnid'); $query->groupBy('n.translate'); $query->groupBy('nr.vid'); $query->groupBy('nr.title'); $query->groupBy('nr.log'); $query->groupBy('nr.timestamp'); $query->groupBy('nr.uid'); |
DISCLAIMER: I still need to figure out how to handle the revision_id throughout these queries. It may not matter, but I’m still working on it.