{"id":1336,"date":"2015-09-28T00:23:28","date_gmt":"2015-09-28T04:23:28","guid":{"rendered":"http:\/\/agileadam.com\/?p=1336"},"modified":"2021-08-02T17:00:45","modified_gmt":"2021-08-02T21:00:45","slug":"automatic-screenshots","status":"publish","type":"post","link":"https:\/\/agileadam.com\/2015\/09\/automatic-screenshots\/","title":{"rendered":"Automatic Screenshots of Drupal Content"},"content":{"rendered":"

In an earlier post I recommended webkit2png for automatically screenshotting a list of URLs. A lot of time has passed since that post, and I’ve discovered a more robust tool. Pageres is incredible, and it has both a CLI and an API.

I’ll let you discover, on your own, what the Pageres tool can do. I needed to take screenshots of all of the content types on a site, at all of the important resolutions. Here’s a quick Drupal function I threw together to get N random nodes per content type:

function generate_random_node_urls_by_type($num_per_type = 3, $include_type = FALSE, $alias = FALSE, $node_types = array()) {
  $output = '';
  if (empty($node_types)) {
    foreach (node_type_get_types() as $type) {
      $node_types[] = $type->type;
    }
  }
  foreach ($node_types as $node_type) {
    $result = db_query_range('SELECT n.nid as nid, ua.alias as alias
                              FROM {node} n
                              LEFT JOIN {url_alias} ua ON ua.source = CONCAT(\'node/\', n.nid)
                              WHERE n.type = :ntype
                              ORDER BY RAND()', 0, $num_per_type, array(':ntype' => $node_type));
    if ($result) {
      while ($row = $result->fetchAssoc()) {
        if ($include_type) {
          $output .= str_pad($node_type, 35);
        }
        if ($alias && $row['alias']) {
          $output .= $GLOBALS['base_url'] . '/' . $row['alias'] . "\n";
        }
        else {
          $output .= $GLOBALS['base_url'] . '/node/' . $row['nid'] . "\n";
        }
      }
    }
  }

  return $output;
}

// Example 1: 10 of each specific node type:
dpm(generate_random_node_urls_by_type(10, FALSE, TRUE, array('homepage_feature', 'page')));

// Example 2: 5 of every node type:
dpm(generate_random_node_urls_by_type(5, FALSE, TRUE));

The function spits out a list of URLs ready for use with pageres. Simply save the results to a text file (urls.txt in my example below).
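
If the function lives somewhere Drupal can bootstrap (an enabled custom module, say), a drush one-liner along these lines could write the list straight to disk; a minimal sketch, assuming drush is pointed at the right site:

drush php-eval 'echo generate_random_node_urls_by_type(5, FALSE, TRUE);' > urls.txt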

Here’s the pageres command I used to generate the screenshots:

pageres --delay 1 --header='Cache-Control: no-cache' --filename="<%= date %> - <%= url %> - <%= size %>" 1200x100 1024x100 768x100 520x100 320x100 < urls.txt
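
With that template, each capture should land in a file named something like 2015-09-28 - example.com!node!5 - 1200x100.png (pageres slugifies the URL, swapping slashes for exclamation points, if memory serves), which keeps the shots sortable by date, page, and size.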

Why the 100-pixel height? Well, the height doesn’t really matter unless you enable cropping. I use 100 on all of them so it’s obvious the value doesn’t mean anything. I tried 1200x1, but it breaks pageres; 1200x100 works perfectly.
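
If you actually do want fixed-height captures, cropping is what makes the height meaningful; the versions of the CLI I’ve used expose this as a --crop flag, so something like this should clip each shot to 800 pixels tall:

pageres --crop --delay 1 1200x800 < urls.txt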

How about another quick function? Here’s one to generate a list of URLs within a menu:

function generate_node_urls_in_menu($menu_name, $alias = FALSE) {
  $output = '';
  $result = db_query('SELECT m.link_path as link_path, ua.alias as alias
                      FROM {menu_links} m
                      INNER JOIN {url_alias} ua ON ua.source = m.link_path
                      WHERE menu_name = :mname', array(':mname' => $menu_name));
  if ($result) {
    while ($row = $result->fetchAssoc()) {
      if ($alias) {
        $output .= $GLOBALS['base_url'] . '/' . $row['alias'] . "\n";
      }
      else {
        $output .= $GLOBALS['base_url'] . '/' . $row['link_path'] . "\n";
      }
    }
  }

  return $output;
}

dpm(generate_node_urls_in_menu('menu-for-undergraduates', TRUE));

Now, how does pageres handle many URLs? Unfortunately, not that well. Python comes to the rescue in just a few lines of simple code. This will process one URL at a time, generating all resolutions for each URL before moving on to the next. I’m certain this could be better (the filename should be an argument, for example), but it gets the job done.

import subprocess

with open("urls.txt", "r") as file:
    for line in file:
        print "Generating screenshots for", line
        p = subprocess.Popen("pageres --header='Cache-Control: no-cache' --filename='<%= date %> - <%= url %> - <%= size %>' 1200x100 1024x100 768x100 520x100 320x100",
            shell=True, stdin=subprocess.PIPE, stdout=subprocess.PIPE, stderr=subprocess.STDOUT)
        # Feed the URL to pageres on stdin; communicate() flushes and closes the pipe.
        p.communicate(line)

UPDATE #1: Here’s a rough draft of a Python script that is a little more robust than the code above. It still lacks some niceties, but I’ll wait until the next time I need it to make improvements.

You would execute this like: python ~/repos/pageres_capture/pageres_capture.py urls.txt
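
Since the script starts with a #!/usr/bin/env python shebang, you could also mark it executable once and skip naming the interpreter:

chmod +x ~/repos/pageres_capture/pageres_capture.py
~/repos/pageres_capture/pageres_capture.py urls.txt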

#!/usr/bin/env python

import argparse
import subprocess
import logging
import sys

# Example:
# sizes = "1200x100 1024x100 768x100 520x100 320x100"
sizes = "1200x100"

LOG = logging.getLogger(__name__)
LOG.setLevel(logging.DEBUG)
formatter = logging.Formatter("%(asctime)s [%(levelname)s] %(message)s", "%Y-%m-%d %H:%M:%S")

# Console logging
ch = logging.StreamHandler(sys.stdout)
ch.setLevel(logging.INFO)
ch.setFormatter(formatter)
LOG.addHandler(ch)

parser = argparse.ArgumentParser(description='Captures screenshots of URLs from a file using Pageres', version='1.0', add_help=True)
parser.add_argument('inputfile', action='store', type=file)
args = parser.parse_args()

# Loop through all of the lines in the input file and process them
lines = args.inputfile.read().splitlines()

i = 0
for line in lines:
    # Increase the line number by one for our user messages
    i += 1

    # Clean the line
    lineclean = line.strip()

    if lineclean == '':
        LOG.info('Line %d - Ignoring blank line' % i)
        continue

    LOG.info('Line %d - Capturing %s' % (i, lineclean))
    p = subprocess.Popen("pageres --header='Cache-Control: no-cache' --filename='<%= date %> - <%= url %> - <%= size %>' " + sizes,
        shell=True, stdin=subprocess.PIPE, stdout=subprocess.PIPE, stderr=subprocess.STDOUT)
    # Feed the cleaned URL to pageres on stdin; communicate() flushes and closes the pipe.
    p.communicate(lineclean)

UPDATE #2: Here’s a version that appends the URL to the top of the screenshot using ImageMagick. You can turn it off using --no-overlay. As with the code above, this is alpha code. Looking at it now, it’s clear I should make “sizes” an argument/switch. In fact, I should probably expose several of the pageres options.

This requires ImageMagick. Before running, you must be able to run mogrify successfully from the command line.
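
A quick sanity check; if this prints version information, you’re set:

mogrify -version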

#!/usr/bin/env python

import argparse
import subprocess
import logging
import sys

# Example:
# sizes = "1200x100 1024x100 768x100 520x100 320x100"
sizes = "1200x100"

LOG = logging.getLogger(__name__)
LOG.setLevel(logging.DEBUG)
formatter = logging.Formatter("%(asctime)s [%(levelname)s] %(message)s", "%Y-%m-%d %H:%M:%S")

# Console logging
ch = logging.StreamHandler(sys.stdout)
ch.setLevel(logging.INFO)
ch.setFormatter(formatter)
LOG.addHandler(ch)

parser = argparse.ArgumentParser(description='Captures screenshots of URLs from a file using Pageres', version='1.0', add_help=True)
parser.add_argument('inputfile', action='store', type=file)
parser.add_argument('--no-overlay', help='Do not add URL overlay', action='store_true')
args = parser.parse_args()

# Loop through all of the lines in the input file and process them
lines = args.inputfile.read().splitlines()

i = 0
for line in lines:
    # Increase the line number by one for our user messages
    i += 1

    # Clean the line
    lineclean = line.strip()

    if lineclean == '':
        LOG.info('Line %d - Ignoring blank line' % i)
        continue

    LOG.info('Line %d - Capturing %s' % (i, lineclean))
    p = subprocess.Popen("pageres --header='Cache-Control: no-cache' --filename='<%= date %> - <%= url %> - <%= size %>' " + sizes,
        shell=True, stdin=subprocess.PIPE, stdout=subprocess.PIPE, stderr=subprocess.STDOUT)
    # Feed the cleaned URL to pageres on stdin; communicate() flushes and closes the pipe.
    p.communicate(lineclean)

    if not args.no_overlay:
        # Grab the newest file in the directory (the screenshot we just took),
        # splice an 18px gold banner onto the top, and annotate it with the URL.
        p = subprocess.Popen('OUTPUT="$(ls -Art | tail -n 1)"; mogrify -pointsize 14 -background Gold -gravity North -splice 0x18 -annotate +0+2 \'%s\' "${OUTPUT}"' % lineclean,
            shell=True, stdout=subprocess.PIPE, stderr=subprocess.STDOUT)
        p.communicate()

UPDATE #3: Same as above, but this version behaves correctly when a URL is not accessible (and logs an error as it encounters one). This still only works with Python 2.7:

#!/usr/bin/env python

import argparse
import subprocess
import logging
import sys
from urllib import urlopen

# Example:
# sizes = "1200x100 1024x100 768x100 520x100 320x100"
sizes = "1200x1200"

# CLI arguments from https://www.npmjs.com/package/pageres-cli
# Example:
# options = "--header='Cache-Control: no-cache' --filename='<%= date %> - <%= url %> - <%= size %>'"
options = "--format=png --header='Cache-Control: no-cache' --filename='<%= date %> - <%= url %> - <%= size %>'"

LOG = logging.getLogger(__name__)
LOG.setLevel(logging.DEBUG)
formatter = logging.Formatter("%(asctime)s [%(levelname)s] %(message)s", "%Y-%m-%d %H:%M:%S")

# Console logging
ch = logging.StreamHandler(sys.stdout)
ch.setLevel(logging.INFO)
ch.setFormatter(formatter)
LOG.addHandler(ch)

parser = argparse.ArgumentParser(description='Captures screenshots of URLs from a file using Pageres', version='1.0', add_help=True)
parser.add_argument('inputfile', action='store', type=file)
parser.add_argument('--no-overlay', help='Do not add URL overlay', action='store_true')
args = parser.parse_args()

# Loop through all of the lines in the input file and process them
lines = args.inputfile.read().splitlines()

i = 0
for line in lines:
    # Increase the line number by one for our user messages
    i += 1

    lineclean = line.strip()
    if lineclean == '':
        LOG.info('Line %d - Ignoring blank line' % i)
        continue

    # Skip URLs we cannot reach at all rather than letting pageres fail
    try:
        urlopen(lineclean).getcode()
    except Exception:
        LOG.error('Line %d - Error capturing %s' % (i, lineclean))
        continue

    LOG.info('Line %d - Capturing %s' % (i, lineclean))
    # The URL is passed as an argument here, so no stdin pipe is needed
    p = subprocess.Popen('pageres "' + lineclean + '" ' + options + ' ' + sizes,
        shell=True, stdout=subprocess.PIPE, stderr=subprocess.STDOUT)
    p.communicate()

    if not args.no_overlay:
        # Grab the newest file in the directory (the screenshot we just took),
        # splice an 18px gold banner onto the top, and annotate it with the URL.
        p = subprocess.Popen('OUTPUT="$(ls -Art | tail -n 1)"; mogrify -pointsize 14 -background Gold -gravity North -splice 0x18 -annotate +0+2 \'%s\' "${OUTPUT}"' % lineclean,
            shell=True, stdout=subprocess.PIPE, stderr=subprocess.STDOUT)
        p.communicate()

 <\/p>\n","protected":false},"excerpt":{"rendered":"
