{"id":1336,"date":"2015-09-28T00:23:28","date_gmt":"2015-09-28T04:23:28","guid":{"rendered":"http:\/\/agileadam.com\/?p=1336"},"modified":"2021-08-02T17:00:45","modified_gmt":"2021-08-02T21:00:45","slug":"automatic-screenshots","status":"publish","type":"post","link":"https:\/\/agileadam.com\/2015\/09\/automatic-screenshots\/","title":{"rendered":"Automatic Screenshots of Drupal Content"},"content":{"rendered":"

In an earlier post I recommended webkit2png for automatically screenshotting a list of URLs. A lot of time has passed since that post, and I’ve discovered a more robust tool. Pageres is incredible, and it has both a CLI and an API.

I’ll let you discover, on your own, what the Pageres tool can do. I needed to take screenshots of all of the content types on a site, at all of the important resolutions. Here’s a quick Drupal function I threw together to get N random nodes per content type:

function generate_random_node_urls_by_type($num_per_type = 3, $include_type = FALSE, $alias = FALSE, $node_types = array()) {
  $output = '';
  if (empty($node_types)) {
    foreach (node_type_get_types() as $type) {
      $node_types[] = $type->type;
    }
  }
  foreach ($node_types as $node_type) {
    $result = db_query_range('SELECT n.nid as nid, ua.alias as alias
                              FROM {node} n
                              LEFT JOIN {url_alias} ua ON ua.source = CONCAT(\'node/\', n.nid)
                              WHERE n.type = :ntype
                              ORDER BY RAND()', 0, $num_per_type, array(':ntype' => $node_type));
    if ($result) {
      while ($row = $result->fetchAssoc()) {
        if ($include_type) {
          $output .= str_pad($node_type, 35);
        }
        if ($alias && $row['alias']) {
          $output .= $GLOBALS['base_url'] . '/' . $row['alias'] . "\n";
        }
        else {
          $output .= $GLOBALS['base_url'] . '/node/' . $row['nid'] . "\n";
        }
      }
    }
  }

  return $output;
}

// Example 1: 10 of each specific node type:
dpm(generate_random_node_urls_by_type(10, FALSE, TRUE, array('homepage_feature', 'page')));

// Example 2: 5 of every node type:
dpm(generate_random_node_urls_by_type(5, FALSE, TRUE));

The function spits out a list of URLs ready for use with pageres. Simply save the results to a text file (urls.txt in my example below).
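
If the function lives somewhere Drupal can bootstrap (an enabled custom module, say), a drush one-liner along these lines could write the list straight to disk; a minimal sketch, assuming drush is pointed at the right site:

drush php-eval 'echo generate_random_node_urls_by_type(5, FALSE, TRUE);' > urls.txt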

Here’s the pageres command I used to generate the screenshots:

pageres --delay 1 --header='Cache-Control: no-cache' --filename="<%= date %> - <%= url %> - <%= size %>" 1200x100 1024x100 768x100 520x100 320x100 < urls.txt
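
With that template, each capture should land in a file named something like 2015-09-28 - example.com!node!5 - 1200x100.png (pageres slugifies the URL, swapping slashes for exclamation points, if memory serves), which keeps the shots sortable by date, page, and size.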

Why the 100-pixel height? Well, the height doesn’t really matter unless you enable cropping. I use 100 on all of them so it’s obvious the value doesn’t mean anything. I tried 1200x1, but it breaks pageres; 1200x100 works perfectly.
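
If you actually do want fixed-height captures, cropping is what makes the height meaningful; the versions of the CLI I’ve used expose this as a --crop flag, so something like this should clip each shot to 800 pixels tall:

pageres --crop --delay 1 1200x800 < urls.txt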

How about another quick function? Here’s one to generate a list of URLs within a menu:

function generate_node_urls_in_menu($menu_name, $alias = FALSE) {
  $output = '';
  $result = db_query('SELECT m.link_path as link_path, ua.alias as alias
                      FROM {menu_links} m
                      INNER JOIN {url_alias} ua ON ua.source = m.link_path
                      WHERE menu_name = :mname', array(':mname' => $menu_name));
  if ($result) {
    while ($row = $result->fetchAssoc()) {
      if ($alias) {
        $output .= $GLOBALS['base_url'] . '/' . $row['alias'] . "\n";
      }
      else {
        $output .= $GLOBALS['base_url'] . '/' . $row['link_path'] . "\n";
      }
    }
  }

  return $output;
}

dpm(generate_node_urls_in_menu('menu-for-undergraduates', TRUE));

Now, how does pageres handle many URLs? Unfortunately, not that well. Python comes to the rescue in just a few lines of simple code. This will process one URL at a time, generating all resolutions for each URL before moving on to the next. I’m certain this could be better (the filename should be an argument, for example), but it gets the job done.

import subprocess

with open("urls.txt", "r") as file:
    for line in file:
        print "Generating screenshots for", line
        p = subprocess.Popen("pageres --header='Cache-Control: no-cache' --filename='<%= date %> - <%= url %> - <%= size %>' 1200x100 1024x100 768x100 520x100 320x100",
            shell=True, stdin=subprocess.PIPE, stdout=subprocess.PIPE, stderr=subprocess.STDOUT)
        # Feed the URL to pageres on stdin; communicate() flushes and closes the pipe.
        p.communicate(line)

UPDATE #1: Here’s a rough draft of a Python script that is a little more robust than the code above. It still lacks some niceties, but I’ll wait until the next time I need it to make improvements.

You would execute this like: python ~/repos/pageres_capture/pageres_capture.py urls.txt
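
Since the script starts with a #!/usr/bin/env python shebang, you could also mark it executable once and skip naming the interpreter:

chmod +x ~/repos/pageres_capture/pageres_capture.py
~/repos/pageres_capture/pageres_capture.py urls.txt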

#!/usr/bin/env python

import argparse
import subprocess
import logging
import sys

# Example:
# sizes = "1200x100 1024x100 768x100 520x100 320x100"
sizes = "1200x100"

LOG = logging.getLogger(__name__)
LOG.setLevel(logging.DEBUG)
formatter = logging.Formatter("%(asctime)s [%(levelname)s] %(message)s", "%Y-%m-%d %H:%M:%S")

# Console logging
ch = logging.StreamHandler(sys.stdout)
ch.setLevel(logging.INFO)
ch.setFormatter(formatter)
LOG.addHandler(ch)

parser = argparse.ArgumentParser(description='Captures screenshots of URLs from a file using Pageres', version='1.0', add_help=True)
parser.add_argument('inputfile', action='store', type=file)
args = parser.parse_args()

# Loop through all of the lines in the input file and process them
lines = args.inputfile.read().splitlines()

i = 0
for line in lines:
    # Increase the line number by one for our user messages
    i += 1

    # Clean the line
    lineclean = line.strip()

    if lineclean == '':
        LOG.info('Line %d - Ignoring blank line' % i)
        continue

    LOG.info('Line %d - Capturing %s' % (i, lineclean))
    p = subprocess.Popen("pageres --header='Cache-Control: no-cache' --filename='<%= date %> - <%= url %> - <%= size %>' " + sizes,
        shell=True, stdin=subprocess.PIPE, stdout=subprocess.PIPE, stderr=subprocess.STDOUT)
    # Feed the cleaned URL to pageres on stdin; communicate() flushes and closes the pipe.
    p.communicate(lineclean)

UPDATE #2: Here’s a version that appends the URL to the top of the screenshot using ImageMagick. You can turn it off using --no-overlay. As with the code above, this is alpha code. Looking at it now, it’s clear I should make “sizes” an argument/switch. In fact, I should probably expose several of the pageres options.

This requires ImageMagick. Before running, you must be able to run mogrify successfully from the command line.
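
A quick sanity check; if this prints version information, you’re set:

mogrify -version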

#!/usr/bin/env python

import argparse
import subprocess
import logging
import sys

# Example:
# sizes = "1200x100 1024x100 768x100 520x100 320x100"
sizes = "1200x100"

LOG = logging.getLogger(__name__)
LOG.setLevel(logging.DEBUG)
formatter = logging.Formatter("%(asctime)s [%(levelname)s] %(message)s", "%Y-%m-%d %H:%M:%S")

# Console logging
ch = logging.StreamHandler(sys.stdout)
ch.setLevel(logging.INFO)
ch.setFormatter(formatter)
LOG.addHandler(ch)

parser = argparse.ArgumentParser(description='Captures screenshots of URLs from a file using Pageres', version='1.0', add_help=True)
parser.add_argument('inputfile', action='store', type=file)
parser.add_argument('--no-overlay', help='Do not add URL overlay', action='store_true')
args = parser.parse_args()

# Loop through all of the lines in the input file and process them
lines = args.inputfile.read().splitlines()

i = 0
for line in lines:
    # Increase the line number by one for our user messages
    i += 1

    # Clean the line
    lineclean = line.strip()

    if lineclean == '':
        LOG.info('Line %d - Ignoring blank line' % i)
        continue

    LOG.info('Line %d - Capturing %s' % (i, lineclean))
    p = subprocess.Popen("pageres --header='Cache-Control: no-cache' --filename='<%= date %> - <%= url %> - <%= size %>' " + sizes,
        shell=True, stdin=subprocess.PIPE, stdout=subprocess.PIPE, stderr=subprocess.STDOUT)
    # Feed the cleaned URL to pageres on stdin; communicate() flushes and closes the pipe.
    p.communicate(lineclean)

    if not args.no_overlay:
        # Grab the newest file in the directory (the screenshot we just took),
        # splice an 18px gold banner onto the top, and annotate it with the URL.
        p = subprocess.Popen('OUTPUT="$(ls -Art | tail -n 1)"; mogrify -pointsize 14 -background Gold -gravity North -splice 0x18 -annotate +0+2 \'%s\' "${OUTPUT}"' % lineclean,
            shell=True, stdout=subprocess.PIPE, stderr=subprocess.STDOUT)
        p.communicate()

UPDATE #3: Same as above, but this version behaves correctly when a URL is not accessible (and logs an error as it encounters one). This still only works with Python 2.7:

#!/usr/bin/env python

import argparse
import subprocess
import logging
import sys
from urllib import urlopen

# Example:
# sizes = "1200x100 1024x100 768x100 520x100 320x100"
sizes = "1200x1200"

# CLI arguments from https://www.npmjs.com/package/pageres-cli
# Example:
# options = "--header='Cache-Control: no-cache' --filename='<%= date %> - <%= url %> - <%= size %>'"
options = "--format=png --header='Cache-Control: no-cache' --filename='<%= date %> - <%= url %> - <%= size %>'"

LOG = logging.getLogger(__name__)
LOG.setLevel(logging.DEBUG)
formatter = logging.Formatter("%(asctime)s [%(levelname)s] %(message)s", "%Y-%m-%d %H:%M:%S")

# Console logging
ch = logging.StreamHandler(sys.stdout)
ch.setLevel(logging.INFO)
ch.setFormatter(formatter)
LOG.addHandler(ch)

parser = argparse.ArgumentParser(description='Captures screenshots of URLs from a file using Pageres', version='1.0', add_help=True)
parser.add_argument('inputfile', action='store', type=file)
parser.add_argument('--no-overlay', help='Do not add URL overlay', action='store_true')
args = parser.parse_args()

# Loop through all of the lines in the input file and process them
lines = args.inputfile.read().splitlines()

i = 0
for line in lines:
    # Increase the line number by one for our user messages
    i += 1

    lineclean = line.strip()
    if lineclean == '':
        LOG.info('Line %d - Ignoring blank line' % i)
        continue

    # Skip URLs we cannot reach at all rather than letting pageres fail
    try:
        urlopen(lineclean).getcode()
    except Exception:
        LOG.error('Line %d - Error capturing %s' % (i, lineclean))
        continue

    LOG.info('Line %d - Capturing %s' % (i, lineclean))
    # The URL is passed as an argument here, so no stdin pipe is needed
    p = subprocess.Popen('pageres "' + lineclean + '" ' + options + ' ' + sizes,
        shell=True, stdout=subprocess.PIPE, stderr=subprocess.STDOUT)
    p.communicate()

    if not args.no_overlay:
        # Grab the newest file in the directory (the screenshot we just took),
        # splice an 18px gold banner onto the top, and annotate it with the URL.
        p = subprocess.Popen('OUTPUT="$(ls -Art | tail -n 1)"; mogrify -pointsize 14 -background Gold -gravity North -splice 0x18 -annotate +0+2 \'%s\' "${OUTPUT}"' % lineclean,
            shell=True, stdout=subprocess.PIPE, stderr=subprocess.STDOUT)
        p.communicate()

 <\/p>\n","protected":false},"excerpt":{"rendered":"
