not my Web Front End

This example takes the single-server web-hosting solution covered in the previous article and starts the process of using AWS to offload traffic from the front-end webserver.  If the previous example was followed, non-HTTP traffic has already been offloaded using VPC firewall features – even SSH is not open to the outside world – so now to offload and secure the HTTP traffic.

Route 53

First, to link the website addresses to dynamically provisioned resources such as an Application Load Balancer or CloudFront, it’s convenient to manage the DNS in Amazon Route 53, which allows an address like mysite.com to be aliased to Amazon-managed resources and routed dynamically without a pre-defined IP address.

In short: create the Hosted Zone, copy the DNS entries for the domain and change the nameservers for the domain, all well explained here:

https://docs.aws.amazon.com/Route53/latest/DeveloperGuide/migrate-dns-domain-in-use.html

AWS Certificate Manager

Request a certificate for each domain which will be load balanced via AWS Certificate Manager.  Where domains are not hosted by AWS, Certificate Manager will verify ownership by requesting special DNS records: when these are created, check back with Certificate Manager to ensure that they are correctly detected; Certificate Manager will then issue the certificate.  [If using Route 53, Certificate Manager will detect this and offer to create the records for you.]
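
The same request can also be made from the CLI: a minimal sketch using DNS validation (hypothetical domain names – substitute your own):

aws acm request-certificate --domain-name mysite.com --subject-alternative-names www.mysite.com --validation-method DNS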

Note that although Certificate Manager supports multiple domains per certificate, Chrome may not regard an AWS certificate as valid if it is presented for a web address which does not match the primary certificate domain.

Local server configuration changes

When using a Load Balancer, all inbound HTTP connections will arrive from within the VPC, from the local network address of the Load Balancer itself.  If using CloudFront or another CDN, inbound traffic will come from the IP addresses of the CDN, so adjust server-side logging and security features accordingly before enabling these to avoid problems.
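
For Apache, mod_remoteip can restore the real client IP for logging and security modules – a minimal sketch, assuming mod_remoteip is enabled and a hypothetical VPC CIDR of 10.0.0.0/16:

# httpd.conf: take the client IP from X-Forwarded-For when the connection
# arrives from the load balancer's network (CIDR here is an assumption)
RemoteIPHeader X-Forwarded-For
RemoteIPInternalProxy 10.0.0.0/16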

Application Load Balancer

  • Create a Target Group and add Instance(s), or create an auto-scaling group to feed the Target Group with instances.  Check the Target Group is healthy (it has at least one registered target with Status healthy).
  • Create an Application Load Balancer, pointing at the Target Group
    • Add an HTTPS listener in the first step and, in the second step, select the certificate(s) created in AWS Certificate Manager
  • Use a local hosts entry pointing at the Load Balancer IP to test the website (a quick test sketch follows this list); check that the traffic is passing through the ALB in ALB monitoring, and check on the webserver that traffic is coming in from the ALB and that the original client IP is also detected (sent by the ALB in the X-Forwarded-For header)
  • When satisfactory, in Route 53 edit the Record Set for the web address to an Alias Target of the Application Load Balancer
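
As an alternative to editing the local hosts file, curl’s --resolve flag can pin the hostname to the load balancer’s address for a single request – a minimal sketch with a hypothetical domain and IP:

# Send the request to 203.0.113.10 while presenting the real hostname for SNI/Host
curl -v --resolve www.mysite.com:443:203.0.113.10 https://www.mysite.com/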

Load Balancer Considerations

Although the load balancer listens on HTTP/2, it uses HTTP/1.1 at the back end to distribute requests to healthy nodes in the Target Group, therefore after switching to the load balancer the server logs will show HTTP/2 is no longer being used.  In fact, loading a page with an image library or other large number of files may be notably slower, as the files are now being requested one by one by the load balancer rather than multiplexed to the client over a single HTTP/2 connection.  This is remediated by increasing the servers in the Target Group, or better still by using CloudFront and offloading requests for static files to S3.

AWS Web Application Firewall

AWS WAF filters HTTP traffic with a customised ruleset for blocking HTTP attacks. This performs the same function as ModSecurity but offloads the processing from the servers – at a cost, with charges of $1 per rule or managed rule group per month plus $5 per month per web ACL – and with a limited number of rules per ACL.

AWS WAF cannot be enabled for an EC2 Instance directly; it has to be attached to either a CloudFront distribution or a Load Balancer.

From within the EC2 Load Balancer console, select the Load Balancer, then Integrated Services, AWS WAF; create a web ACL in the desired region and attach it to the Load Balancer.

The standard rule sets are non-customisable groups of rules.  Logging needs to be enabled separately via Kinesis; see: https://docs.aws.amazon.com/waf/latest/developerguide/logging.html

So, for example, one of the standard rule sets disallows file uploads, which may be a commonly required function on WordPress or another CMS.  To find out which rule group is responsible, turn on logging, test the function that isn’t working, wait a minute for Kinesis to stream the results to S3 and then check the logs, which may look something like this [redacted].  Here we see that only a small file has been posted but it has triggered a BLOCK on rule SizeRestrictions_BODY in AWSManagedRulesCommonRuleSet:

{
   "timestamp": 1590398159992,
   "formatVersion": 1,
   "webaclId": "arn:aws:wafv2:[id]",
   "terminatingRuleId": "AWS-AWSManagedRulesCommonRuleSet",
   "terminatingRuleType": "MANAGED_RULE_GROUP",
   "action": "BLOCK",
   "terminatingRuleMatchDetails": [],
   "httpSourceName": "ALB",
   "httpSourceId": "[id]",
   "ruleGroupList": [
      {
         "ruleGroupId": "AWS#AWSManagedRulesCommonRuleSet",
         "terminatingRule": {
            "ruleId": "SizeRestrictions_BODY",
            "action": "BLOCK"
         },
         "nonTerminatingMatchingRules": [],
         "excludedRules": null
      }
   ],
   "rateBasedRuleList": [],
   "nonTerminatingMatchingRules": [],
   "httpRequest": {
      "clientIp": "[id]",
      "country": "GB",
      "headers": [
         {
            "name": "Host",
            "value": "www.mysite.com"
         },
         {
            "name": "Content-Length",
            "value": "43290"
         },
[additional client details redacted]
      ],
      "uri": "/wp-admin/async-upload.php",
      "args": "",
      "httpVersion": "HTTP/2.0",
      "httpMethod": "POST",
      "requestId": null
   }
}

While Amazon does not show the detail of the rules, there are selectable groups of rules within AWSManagedRulesCommonRuleSet, so if file uploads need to be supported (always a security risk), Edit rule on the common rule set allows a rule subset to be overridden:

[screenshot: AWS WAF managed rule set override options]
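
In the underlying web ACL JSON this override appears as an ExcludedRules entry on the managed rule group statement – a sketch of the relevant fragment (field names as per the wafv2 API at the time of writing, rule name taken from the log above):

"ManagedRuleGroupStatement": {
   "VendorName": "AWS",
   "Name": "AWSManagedRulesCommonRuleSet",
   "ExcludedRules": [ { "Name": "SizeRestrictions_BODY" } ]
}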

CloudFront

Now it’s really easy to tell the difference in performance of a website from different global locations using services such as https://www.fastorslow.com/app from the creators of WordFence.

To address global latency effectively, a CloudFront Distribution between the users and the site allows webserver responses to be cached in Edge Locations close to the users, reducing the latency overhead from different geographies – and reducing the load on the webserver.

Not everything can be cached though: for example, we might not want to cache anything to do with logged-in users or site administration, and we may need to allow cookies for the user session and ensure the user’s language, currency, shopping cart etc. can be served correctly.  For WordPress there’s a good article on the AWS blogs for getting started, which would need tuning according to the features enabled – for example the article doesn’t consider WooCommerce requirements for cookies etc.

In short:

  1. Create the CloudFront distribution with the necessary settings for cookie forwarding etc.  Remember to whitelist the Host header when the origin is a webserver hosting multiple sites.
  2. Add a separate behaviour to the distribution for static files, which can be set to ignore cookies and given a longer TTL – eg 1 year (31536000 seconds) for files which are truly static – and optionally set the origin to S3 – see below.
  3. Test!  Change the local hosts file to map the web address to the IP of the CloudFront distribution (an example entry follows this list); if there are any issues, review the previous steps and retest until resolved.
  4. In Route 53 change the Alias for the web address to point to the CloudFront distribution.
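
For step 3, a hosts entry like the following maps the domain to one of the distribution’s edge IPs (hypothetical address, found by resolving the distribution’s *.cloudfront.net domain name with dig or nslookup):

# /etc/hosts – temporary entry for testing only
203.0.113.20  www.mysite.com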

Caching and Cache-Control

The CloudFront TTL options govern how long CloudFront itself should cache responses – see: https://docs.aws.amazon.com/AmazonCloudFront/latest/DeveloperGuide/Expiration.html.

CloudFront does not add Cache-Control headers to the response to the browser; it only passes on those provided by the origin.

Not adding Cache-Control headers will cause browsers to request the files more often than necessary, even if the CloudFront TTLs have been set to allow CloudFront itself to continue caching the file. It is possible to add a caching response header programmatically via Lambda@Edge – but this also has cost implications, and why not fix it at the source:

  • S3 does not provide Cache-Control by default – see below.
  • Apache mod_expires settings in .htaccess like:
<IfModule mod_expires.c>
ExpiresActive On
# note: the MIME type for JPEG images is image/jpeg, not image/jpg
ExpiresByType image/jpeg "access plus 1 year"
ExpiresDefault "access plus 1 month"
</IfModule>
  • Nginx, where used as a caching proxy (eg Engintron), will override Apache .htaccess mod_expires settings using common_http.conf settings per static file type, eg:
location ~* \.(?:ico|jpg|jpeg|gif|png|webp)$ {
    include proxy_params_common;
    include proxy_params_static;
    expires 60d;
}


CloudFront over S3 bucket

Several considerations here:

  • When creating a CloudFront Behaviour on an S3 origin do NOT use the path pattern /wp-content/* as given in the AWS blog article, since this may allow .php and other code files to be downloaded; instead specify simply eg “*.jpg” etc.  Unfortunately path patterns are very simple – they do not support regex or multiple values and are case-sensitive – so multiple behaviours need to be added to catch all static files.
  • When switching Origin, the CloudFront console erases Whitelist Headers and will recommend not using them for S3 – just remember to add them back in again if changing the origin back to the webserver or load balancer.
  • Cache-Control is not added by default when you add files to S3
    • s3 sync can add cache control while adding files (but won’t apply it to unchanged files)
      aws s3 sync /path s3://yourbucket/ --cache-control max-age=31536000
    • s3cmd (https://github.com/s3tools/s3cmd) will allow adding the header programmatically to files already uploaded:
      ./s3cmd --recursive modify --add-header="Cache-Control:max-age=31536000" s3://yourbucket/yourpath

Synchronisation to S3

How will your S3 bucket be kept up to date?

  • a periodic batch synchronisation?
  • a tool to synchronise files as they are uploaded / upload files directly to S3?
  • an s3fs mount for the fileshare?  What happens to your costs if this is continually scanned by your server security software?  Monitor AWS Billing and Cost Management for growth in charges for PUT, COPY, POST, or LIST requests.

Here are some examples for a typical WordPress installation: all of these can work, but for various reasons none of them can be unequivocally recommended, so it depends on the situation:

Cron

Set up a cron job calling a script like this to sync the WordPress uploads directory for the current month:

#!/bin/bash
# Sync this month's WordPress uploads to S3, adding a one-year Cache-Control header
YEAR=$(date +%Y)
MONTH=$(date +%m)
/usr/local/bin/aws s3 sync /home2/[user]/public_html/wp-content/uploads/$YEAR/$MONTH s3://[bucket]/home2/[user]/public_html/wp-content/uploads/$YEAR/$MONTH --cache-control max-age=31536000
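
A crontab entry to run this, say, every ten minutes might look like the following (hypothetical script path):

# m h dom mon dow  command
*/10 * * * * /home2/[user]/bin/sync-uploads.sh >/dev/null 2>&1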

Disadvantages:

  • latency (uploaded file may be unavailable to display until next cron job)
  • excess scan frequency (increased AWS billing cost from running sync operations when nothing has changed)

inotify-tools

With inotify-tools, a simple script could be set up to handle filesystem events and synchronise modified files to S3.  This would be called with a directory parameter for each directory needing monitoring:

#!/bin/bash
# Watch the directory passed as $1 (recursively) and copy each modified file to S3
# (%w%f plus read -r copes better with unusual filenames than splitting on a space)
inotifywait -mr -e modify --format '%w%f' "$1" | while read -r FILECHANGE; do
  /usr/local/bin/aws s3 cp "$FILECHANGE" "s3://[bucket]$FILECHANGE" --cache-control max-age=31536000
done

This gets files uploaded as soon as they are written, but…

Disadvantages: inotifywait suffers various limitations: editors may generate multiple events on the same file, and there are potential race conditions in recursive directory watching (newly created subdirectories are not watched until their watches are added). The lifetime of the monitor also needs managing – a cron @reboot isn’t enough.

Watchman from Facebook

Watchman addresses some of the limitations of other solutions by waiting a configurable settle period before firing a trigger with the list of changed files – and it only runs a single instance of the trigger at a time. Potentially this covers the gaps of inotify-tools while providing the same event-driven response.
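
A minimal sketch of such a trigger (hypothetical paths and script name; Watchman appends the changed filenames as arguments to the command):

# Watch the uploads tree and fire a sync script when image files change
watchman watch /home2/[user]/public_html/wp-content/uploads
watchman -- trigger /home2/[user]/public_html/wp-content/uploads s3-sync '*.jpg' '*.png' -- /home2/[user]/bin/s3-upload.sh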

Disadvantages:  an additional layer of complexity, additional software to maintain and govern (it won’t be covered by a yum update), and Watchman originated for automating build steps rather than running production workloads (though it seems likely to handle these better than a custom inotifywait solution).

Symlink to s3fs

Make images etc. available via S3 as soon as they are uploaded: link the uploads directory directly to the drive mounted from s3fs (a mount sketch follows the commands below) and grant appropriate permissions.

# Replace the local uploads directory with a symlink into the s3fs mount
cd /home2/user/public_html/wp-content
ln -s /backup/user/public_html/wp-content/uploads
chown -R user:user /home2/user/public_html/wp-content/uploads
# Normalise permissions: directories 755, files 644 (no -R needed when find applies chmod per entry)
find . -type d -exec chmod 755 {} \;
find . -type f -exec chmod 644 {} \;
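
This assumes the bucket is already mounted at /backup – a minimal s3fs mount sketch (hypothetical bucket name; allow_other requires user_allow_other in /etc/fuse.conf):

# allow_other lets the webserver user read files owned by the mounting user
s3fs [bucket] /backup -o allow_other -o use_cache=/tmp/s3fs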

Disadvantages:  these mostly relate to the fact that S3 is not a normal file system and Amazon doesn’t really support this type of usage:

  • s3fs is not supported or endorsed by amazon
  • extra configuration required in the mounting
  • the security requirements may conflict with the cPanel multiuser security setup and prevent the webserver from serving up files correctly even when Apache FollowSymLinks is enabled
  • s3fs has to add about 5 additional x-amz-meta- metadata items to each object to track filesystem-like properties
  • limitations as described in the s3fs FAQ
  • s3fs consumes considerable server resources, both memory and tmp space, which offsets the advantages of S3: while the S3 bucket is infinite, server tmp space is not and can easily run out during intensive processes such as backup.

WordPress integration

A solution integrated with the CMS would allow images to be copied to S3 at the most opportune moment and configured individually for each site (or not, as appropriate).

As usual in the case of WordPress, someone else has done the integration already, and WP Offload Media Lite by Delicious Brains (https://wordpress.org/plugins/amazon-s3-and-cloudfront/) seems a fairly thorough solution with 40,000 active installs.  It also adds Cache-Control headers to the S3 objects.

Disadvantages:

CloudFront over S3 implementation

  • In the CloudFront Distribution, Create Origin, select the bucket which will be the source of the files, select the option to Restrict Bucket Access, create an Origin Access Identity and update the Bucket Policy – otherwise CloudFront will not be able to retrieve the files, resulting in 403 errors and missing images etc. in the web pages (a sample policy is sketched after this list).
  • Create/update the behaviours for the static file patterns as per the CloudFront section.
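
The console can update the policy automatically; the resulting bucket policy looks something like this (hypothetical OAI ID and bucket name):

{
   "Version": "2008-10-17",
   "Statement": [
      {
         "Effect": "Allow",
         "Principal": {
            "AWS": "arn:aws:iam::cloudfront:user/CloudFront Origin Access Identity E1EXAMPLE"
         },
         "Action": "s3:GetObject",
         "Resource": "arn:aws:s3:::[bucket]/*"
      }
   ]
}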


part of a series on cPanel migration to AWS
