<?xml version="1.0" encoding="UTF-8"?><rss xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:atom="http://www.w3.org/2005/Atom" version="2.0" xmlns:media="http://search.yahoo.com/mrss/"><channel><title><![CDATA[Posts tagged 'AWS' — Typing with mittens on]]></title><description><![CDATA[Rachel Evans writes about tech, Denmark, and probably other stuff]]></description><link>https://rachelevans.org/blog/tag/aws/</link><image><url>https://rachelevans.org/blog/assets/favicon.png</url><title>Posts tagged &apos;AWS&apos; — Typing with mittens on</title><link>https://rachelevans.org/blog/tag/aws/</link></image><generator>RSS for Node</generator><lastBuildDate>Wed, 18 Feb 2026 09:07:06 GMT</lastBuildDate><atom:link href="https://rachelevans.org/blog/tag/aws/rss/" rel="self" type="application/rss+xml"/><pubDate>Wed, 18 Feb 2026 09:07:06 GMT</pubDate><copyright><![CDATA[Copyright 2026 Rachel Evans]]></copyright><language><![CDATA[en-gb]]></language><managingEditor><![CDATA[Rachel Evans]]></managingEditor><webMaster><![CDATA[Rachel Evans]]></webMaster><ttl>180</ttl><item><title><![CDATA[Not helpful, “aws s3 sync”]]></title><description><![CDATA[The aws s3 sync "--metadata-directive" option: what does it do? Does it work? AWS themselves aren't clear on the matter...]]></description><link>https://rachelevans.org/blog/not-helpful-aws-s3-sync/</link><guid isPermaLink="false">b8c55e82c0e8</guid><category><![CDATA[technology]]></category><category><![CDATA[AWS]]></category><dc:creator><![CDATA[Rachel Evans]]></dc:creator><pubDate>Mon, 24 Apr 2017 08:54:24 GMT</pubDate><content:encoded><![CDATA[<p>I have a requirement to change the Content-Type / Cache-Control headers of a load of objects in S3. At the API level, there&#x27;s no way of modifying the metadata of an existing object — rather, you create a new object with the desired metadata. 
Of course, if this new object is in the same bucket and has the same key as the old object, it&#x27;ll effectively overwrite it. You don&#x27;t have to re-upload your data if you don&#x27;t want to — you can copy the data from the old object to the new one.</p><p>Instead of using the API directly, various tools already exist which encapsulate this behaviour. For example, the aws command line offers “aws s3 sync”. So I&#x27;m wondering if “aws s3 sync” might be the tool for the job.</p><p>But then we come to this gem in the help text:</p><blockquote>--metadata-directive (string) Specifies whether the metadata is copied from the source object or replaced with metadata provided when copying S3 objects. Note that if the object is copied over in parts, the source object&#x27;s metadata will not be copied over, no matter the value for --metadata-directive, and instead the desired metadata values must be specified as parameters on the command line. Valid values are COPY and REPLACE. If this parameter is not specified, COPY will be used by default. If REPLACE is used, the copied object will only have the meta- data values that were specified by the CLI command. Note that if you are using any of the following parameters: --content-type, content-lan- guage, --content-encoding, --content-disposition, --cache-control, or --expires, you will need to specify --metadata-directive REPLACE for non-multipart copies if you want the copied objects to have the speci- fied metadata values.</blockquote><p>Apart from being horrible to read, there&#x27;s a big problem with this. Note the phrases “Note that if the object is copied over in parts” and “for non-multipart copies”: the behaviour varies depending on whether or not multipart copies are in use.</p><p>So, <em>are</em> multipart copies in use?</p><p>Well, we&#x27;re not told. 
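</p><p>(An aside: at the API level, at least, the behaviour is explicit. Here&#x27;s a sketch of the CopyObject arguments for copying an object over itself with replaced metadata, shaped for boto3&#x27;s copy_object; the bucket, key and header values below are placeholders, not anything from a real system:)</p>

```python
def replace_copy_args(bucket, key, *, content_type=None, cache_control=None):
    """Build CopyObject arguments that copy an object over itself,
    replacing (not copying) its metadata."""
    args = {
        "Bucket": bucket,
        "Key": key,
        "CopySource": {"Bucket": bucket, "Key": key},
        "MetadataDirective": "REPLACE",
    }
    if content_type is not None:
        args["ContentType"] = content_type
    if cache_control is not None:
        args["CacheControl"] = cache_control
    return args

# With real credentials, you would then run something like:
#   import boto3
#   boto3.client("s3").copy_object(**replace_copy_args(
#       "my-bucket", "some/key",
#       content_type="text/html", cache_control="max-age=300"))
# (A single-call copy only works up to 5GB; beyond that it has to
# be a multipart copy.)
```

<p>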
The S3 maximum size for non-multipart uploads is 5GB, so we know that for objects over 5GB, multipart uploads <em>must</em> be used, because that&#x27;s the only option. But for smaller objects?</p><p>¯\_(ツ)_/¯</p><p>So the help text explaining <code>--metadata-directive</code> tells us that the behaviour of this option can vary, depending on an implementation detail which is not revealed to us.</p><p>Here&#x27;s my attempt to reword that help text to be (a) clearer, and (b) more honest:</p><pre>--metadata-directive (string)

Valid values are COPY (which is the default), and REPLACE. Specifies
whether the metadata is copied from the source object (&quot;COPY&quot;), or
replaced with metadata provided on the command line (&quot;REPLACE&quot;) when
copying S3 objects.

Note that &quot;COPY&quot; does not work if multipart uploads are used, which is
definitely the case for objects larger than 5GB, and might be the case
for smaller objects too — good luck!</pre><p>Not helpful.</p>]]></content:encoded></item><item><title><![CDATA[The AWS S3 Inventory Service: don't end the destination prefix with “/”]]></title><description><![CDATA[If you end the destination prefix with "/", then you'll end up with an unusable manifest.]]></description><link>https://rachelevans.org/blog/the-aws-s3-inventory-service-dont-end-the-destination-prefix-with-slash/</link><guid isPermaLink="false">8630dfb13c88</guid><category><![CDATA[technology]]></category><category><![CDATA[AWS]]></category><dc:creator><![CDATA[Rachel Evans]]></dc:creator><pubDate>Fri, 21 Apr 2017 10:00:16 GMT</pubDate><content:encoded><![CDATA[<div><p>This started out as a longer blog post, but then a lot of it boiled down to “read the fine documentation, Rachel”. So here&#x27;s the short version.</p><p>Launched in December 2016, S3&#x27;s Inventory Service is an alternative to using the ListObjects / ListObjectsV2 APIs for enumerating the objects in a bucket. You put an inventory configuration to your bucket (broadly speaking: which bit of S3 to list, where to put the results, and how often to do it), then sit back and wait for S3 itself to do all the hard work, so you don&#x27;t have to. 
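</p><p>For what it&#x27;s worth, “putting an inventory configuration” is a single API call. Here&#x27;s a sketch of the configuration document, shaped as the dict you&#x27;d pass to boto3&#x27;s put_bucket_inventory_configuration (the IDs, ARN and prefix are placeholders):</p>

```python
def inventory_configuration(config_id, dest_bucket_arn, prefix):
    """Build an InventoryConfiguration for boto3's
    put_bucket_inventory_configuration. Refuses a trailing "/" on
    the destination prefix, since S3 inserts its own "/" separators
    when building the output keys."""
    if prefix.endswith("/"):
        raise ValueError("don't end the destination prefix with '/'")
    return {
        "Id": config_id,
        "IsEnabled": True,
        "IncludedObjectVersions": "Current",
        "Schedule": {"Frequency": "Weekly"},
        "Destination": {
            "S3BucketDestination": {
                "Bucket": dest_bucket_arn,  # e.g. "arn:aws:s3:::my-inventories"
                "Format": "CSV",
                "Prefix": prefix,           # e.g. "s3-inventories"
            }
        },
    }
```

<p>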
Great!</p><p>The <a href="http://docs.aws.amazon.com/AmazonS3/latest/dev/storage-inventory.html#storage-inventory-location" rel="noopener ugc nofollow" target="_blank">documentation states where the inventory output goes</a>:</p><pre><span><em>destination-prefix</em>/<em>source-bucket</em>/<em>config-ID</em>/<em>YYYY-MM-DDTHH-MMZ</em>/manifest.json<br/><em>destination-prefix</em>/<em>source-bucket</em>/<em>config-ID</em>/<em>YYYY-MM-DDTHH-MMZ</em>/manifest.checksum</span></pre><p>And for the sake of brevity, let&#x27;s cut to the chase: if you end your prefix with a “/” (either accidentally, or because like me you think you&#x27;re being smart whereas in fact you simply haven&#x27;t read the docs — good going, Rach), then due to a bug in the S3 Inventory service, your inventory will not be usable.</p><p>Specifically, I ended up with objects in S3 with keys like this:</p><pre>s3-inventories//media/rachel-test-inventory/data/6eabc318-5ee0-41d9-b32b-a12b40a6f271.csv.gz
s3-inventories//media/rachel-test-inventory/data/b7dff5ea-c83d-4879-bc2a-0d0ced298356.csv.gz</pre><p>whereas the manifest I got contained this (line breaks added for clarity):</p><pre>{
  &quot;files&quot;: [
    {
      &quot;key&quot;: &quot;s3-inventories/media/rachel-test-inventory/
                data/6eabc318-5ee0-41d9-b32b-a12b40a6f271.csv.gz&quot;,
      &quot;size&quot;: 16486333,
      &quot;MD5checksum&quot;: &quot;3c94f6eed1fc3c2d057c098f355afffc&quot;
    },
    {
      &quot;key&quot;: &quot;s3-inventories/media/rachel-test-inventory/
                data/b7dff5ea-c83d-4879-bc2a-0d0ced298356.csv.gz&quot;,
      &quot;size&quot;: 20147436,
      &quot;MD5checksum&quot;: &quot;f0b39e0d85f0f5fb11bc5be73ecc26cf&quot;
    }
  ]
}</pre><p>The problem being that those double-slashes in the keys have become single slashes. On a Linux-ish filesystem, this would make no difference; on S3, it makes all the difference. The keys given in the manifest simply do not exist.</p><p>tl;dr: There&#x27;s a bug in the S3 Inventory service which means that manifests are broken if the destination prefix ends with “/”. Solution: don&#x27;t end your destination prefixes with “/”.</p></div>]]></content:encoded></item><item><title><![CDATA[Save money and be tidy with s3-upload-cleaner]]></title><description><![CDATA[Amazon Web Services (AWS) S3 is a popular, highly-scalable object storage service. It's used by a lot of big companies, including the one I work for. But it's very easy gradually to accumulate billable "invisible" storage.]]></description><link>https://rachelevans.org/blog/save-money-and-be-tidy-with-s3-upload-cleaner/</link><guid isPermaLink="false">7043b8b5332e</guid><category><![CDATA[technology]]></category><category><![CDATA[AWS]]></category><dc:creator><![CDATA[Rachel Evans]]></dc:creator><pubDate>Tue, 01 Dec 2015 12:56:02 GMT</pubDate><content:encoded><![CDATA[<p>Amazon Web Services (AWS) S3 is a popular, highly-scalable object storage service. It&#x27;s used by a lot of big companies, <a href="http://www.computerweekly.com/news/2240219866/Case-study-How-the-BBC-uses-the-cloud-to-process-media-for-iPlayer" rel="noopener ugc nofollow">including the one I work for</a>.</p><p>Getting data — especially large files — into S3 uses a mechanism called Multipart Uploads. For example, to upload a multi-gigabyte file to S3, you might make a sequence of calls like so:</p><ol><li>CreateMultipartUpload</li><li>UploadPart (1 .. n times)</li><li>CompleteMultipartUpload</li></ol><p>On the “complete” call, S3 assembles your parts together to form a single object, that then appears in the bucket. 
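</p><p>Sketched against boto3&#x27;s method names, that sequence looks roughly like this (a simplified sketch: no error handling, and a real uploader would stream each part rather than hold them all in memory):</p>

```python
def multipart_upload(s3, bucket, key, bodies):
    """CreateMultipartUpload / UploadPart / CompleteMultipartUpload,
    in order. On failure, a real uploader should call
    abort_multipart_upload rather than leave the parts behind."""
    upload_id = s3.create_multipart_upload(Bucket=bucket, Key=key)["UploadId"]
    parts = []
    for number, body in enumerate(bodies, start=1):
        resp = s3.upload_part(Bucket=bucket, Key=key, UploadId=upload_id,
                              PartNumber=number, Body=body)
        parts.append({"PartNumber": number, "ETag": resp["ETag"]})
    return s3.complete_multipart_upload(
        Bucket=bucket, Key=key, UploadId=upload_id,
        MultipartUpload={"Parts": parts})
```

<p>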
Or, you can call “AbortMultipartUpload” to abandon it, and throw away the parts.</p><p>So what&#x27;s the catch?</p><p>The catch is that it&#x27;s very easy to forget to ever call either CompleteMultipartUpload or AbortMultipartUpload. And if you neither complete nor abort the upload, then any parts you have uploaded just sit around in S3, waiting. Forever. It&#x27;s relatively hard to <em>see</em> those parts, mind — they don&#x27;t show up in the regular bucket listing. But they are there, and they are costing you money.</p><p>So what&#x27;s the solution?</p><p>Enter <code>s3-upload-cleaner</code>. Simply put, it scans your buckets looking for stale (that is, started a long time ago) incomplete multipart uploads — the premise being, if you haven&#x27;t completed an upload after, say, a week, then you never will — and aborts them. Thus, periodically running s3-upload-cleaner keeps your account&#x27;s multipart uploads under control, and helps keep your bill down.</p><p>(I&#x27;m a little surprised that this isn&#x27;t a native feature of S3, and to be honest, I expect that one day, it will be.)</p><p>Here it is running for a single bucket, and finding nothing to clean:</p><pre>$ sudo apt-get install nodejs npm
$ npm install s3-upload-cleaner aws-sdk
$ export AWS_ACCESS_KEY_ID=…
$ export AWS_SECRET_ACCESS_KEY=…
$ nodejs ./node_modules/s3-upload-cleaner/example/minimal.js
Running cleaner
Clean bucket my-bucket-name
Bucket my-bucket-name is in location eu-west-1
Bucket my-bucket-name is in region eu-west-1
Running cleaner for bucket my-bucket-name
$</pre><p>The code comes with a minimal bootstrap script, though you are encouraged to use your own if you wish.</p><p>To call out a few of its features:</p><ul><li>it&#x27;s multi-region aware (it will attempt to process all of your buckets, no matter what region they are in);</li><li>it can be configured to process only some buckets, or only some regions, or only some keys;</li><li>the threshold for what counts as “stale” is configurable — the minimal bootstrap script uses 1 week as the cutoff age;</li><li>when a stale upload is found, it emits logging data in json form;</li><li>it can be run in “dry run” mode, where all the scanning and logging is performed, but the abort itself is not.</li></ul><p>Finally, here&#x27;s an example of one of its log entries:</p><pre>[
  {
    &quot;event_name&quot;: &quot;s3uploadcleaner.clean&quot;,
    &quot;event_timestamp&quot;: &quot;1448495889.529&quot;,
    &quot;bucket_name&quot;: &quot;my-bucket-name&quot;,
    &quot;upload_key&quot;: &quot;bigfile.mpg&quot;,
    &quot;upload_initiated&quot;: &quot;1447888220000&quot;,
    &quot;upload_storage_class&quot;: &quot;STANDARD&quot;,
    &quot;upload_initiator_id&quot;: &quot;arn:aws:iam::123456789012:user/SomeUser&quot;,
    &quot;upload_initiator_display&quot;: &quot;SomeUser&quot;,
    &quot;part_count&quot;: &quot;135&quot;,
    &quot;total_size&quot;: &quot;2831189760&quot;,
    &quot;dry_run&quot;: &quot;true&quot;
  }
]</pre><p>s3-upload-cleaner typically only takes a few seconds to run, and doesn&#x27;t need to be run very often, so this makes it perfect to run via a scheduled AWS Lambda function.</p><p>You can find the <a href="https://github.com/rvedotrc/node-s3-upload-cleaner" rel="noopener ugc nofollow">code on github</a> and the <a href="https://www.npmjs.com/package/s3-upload-cleaner" rel="noopener ugc nofollow">package on npm</a>.</p>]]></content:encoded></item><item><title><![CDATA[Why I gave the AWS keynotes a miss]]></title><description><![CDATA[The keynote presenters at AWS re:Invent need to focus more on the audience, and less on big launches.]]></description><link>https://rachelevans.org/blog/why-ill-pass-on-the-aws-keynotes/</link><guid isPermaLink="false">4b9c8166bf3</guid><category><![CDATA[AWS]]></category><category><![CDATA[presentations]]></category><dc:creator><![CDATA[Rachel Evans]]></dc:creator><pubDate>Thu, 13 Nov 2014 10:08:40 GMT</pubDate><media:content url="https://rachelevans.org/blog/content/images/2014/11/aws-reinvent-keynote.jpg" medium="image"/><content:encoded><![CDATA[<img src="https://rachelevans.org/blog/content/images/2014/11/aws-reinvent-keynote.jpg" alt="Why I gave the AWS keynotes a miss"><p>Fancy going to a three-hour-long presentation which is in parts self-contradictory, and includes half a dozen product launches with no coherent target audience? I promise, you <em>will</em> find some of it boring.</p><p>No? Not tempted?</p><p>Me neither.</p><p>Last year I went to the <a href="https://reinvent.awsevents.com/" rel="noopener ugc nofollow">Amazon Web Services “re:Invent” conference</a> in Las Vegas. It&#x27;s five days of a rather tightly packed mix of certification sessions, training, product launches, all sorts of “breakout” sessions (some by Amazon themselves, and some by guest speakers — i.e. 
customers), the vendor expo, and of course what I&#x27;ll somewhat euphemistically call the “after-hours” events.</p><p>The breakout sessions cover a wide range of topics — individual AWS technologies, both at basic and advanced levels; and how customers are using AWS, in all sorts of diverse industry sectors. Each session is about 45 minutes long, and you can pick and choose which ones to go to. Great!</p><h2>Keynotes</h2><p>The “keynotes” though, are a very different affair. On two successive days, the keynotes (yes, plural: two different keynotes) are each 90 minutes long.</p><p>For me, a keynote should be a summary of the key themes, typically 30–45 minutes long (I&#x27;m sure I&#x27;ve seen a definition of the word, to this effect), so Amazon&#x27;s keynotes immediately ring at least two alarm bells:</p><ul><li>Why are there two keynotes? It&#x27;s not just two opportunities to see the same speech: it&#x27;s two different presentations. Why are there two different summaries of the key themes?</li><li>Why is each one so long? <em>Each</em> keynote seems to be two to three times longer than is typical, and if the keynotes are <em>different</em> then this means that in total it&#x27;s effectively up to six times longer.</li></ul><p>If your summary is three hours long, then you need to summarise your summary.</p><p>But the explanation of course is that AWS “keynotes” <em>aren&#x27;t</em> keynotes. Yes, they include a summary of some themes; but then they also include some more in-depth look at those themes, and the launches of some new AWS products that tie in with those themes. They&#x27;re really an all-in-one mega-presentation, split into two halves.</p><h2>Information overload</h2><p>Where Amazon get it right with the “breakout” sessions, they get it very wrong with their keynotes: they include a diverse range of subjects all in the same presentation (e.g. 
software deployment, and also financial accounting) so that it&#x27;s highly unlikely that <em>anyone</em> is going to find the whole presentation interesting. And even if it <em>was</em> all interesting, ninety minutes is <em>too damn long</em>.</p><p>While they do at least break it into two halves, on successive days, that doesn&#x27;t go anything like far enough.</p><p>I&#x27;d love it if Amazon would have the keynotes and product launches follow the same model as the breakout sessions:</p><ul><li>The keynote (yes, <em>one</em> keynote), 45 minutes long, which can be a <em>summary</em> of the themes, and <em>brief</em> “teaser” announcements of the product launches;</li><li>Product launch presentations, probably around three separate sessions, each around 45 minutes long, grouped by approximate theme.</li></ul><p>Launching two separate products related to software development? That&#x27;s one session. Two products related to finance? That&#x27;s another. Two unrelated product launches left over? Well, lump &#x27;em together in a third session. (Not ideal, but still way better than the current approach).</p><p>Then we&#x27;ll be free to pick and choose which sessions we go to. Each session will be more focussed on one audience, who will therefore be more <em>engaged</em>. And the worst case is that you go to one of the “mixed” product launch sessions, and find one half interesting, and the other half not. Time wasted: 20 minutes (compared to an hour or two, currently).</p><h2>Summing up</h2><p>The keynote sessions are at odds with the style of the rest of AWS re:Invent.
Whereas most of the week is dynamic, punchy, focussed, and fast-moving, the keynotes come across as over-long self-indulgent ramblings, which include what <em>should</em> be gems of interest, but hidden amongst far too much other content seemingly designed to help the attendees catch up on the sleep lost due to after-hours indulgences.</p><p>By restructuring the keynotes and product announcements into a series of separate sessions, the Amazon “big name” presenters can up their game, and reinvent themselves as the focus of something interesting — so that we might just be tempted enough to go along and listen.</p>]]></content:encoded></item><item><title><![CDATA[Managing AWS CloudFormation templates using stack-fetcher]]></title><description><![CDATA[At the AWSUKUG meetup in September I talked about Video Factory, and a tool we've created for managing stack templates.]]></description><link>https://rachelevans.org/blog/managing-aws-cloudformation-templates-using-stack-fetcher/</link><guid isPermaLink="false">4d798d406fd0</guid><category><![CDATA[technology]]></category><category><![CDATA[AWS]]></category><dc:creator><![CDATA[Rachel Evans]]></dc:creator><pubDate>Tue, 28 Oct 2014 17:14:47 GMT</pubDate><media:content url="https://rachelevans.org/blog/content/images/2014/10/bridge-building.jpg" medium="image"/><content:encoded><![CDATA[<img src="https://rachelevans.org/blog/content/images/2014/10/bridge-building.jpg" alt="Managing AWS CloudFormation templates using stack-fetcher"><p>Last month at the <a href="http://www.meetup.com/AWSUGUK/events/194314272/" rel="noopener ugc nofollow">AWSUKUG meetup</a> I talked about Video Factory, and there was a little section there where I spoke about <a href="http://www.slideshare.net/rvedotrc/bbc-iplayer-bigger-better-faster/53" rel="noopener ugc nofollow">the tooling that we use</a> to manage all of our components.
One of the tools, “stack-fetcher”, generated quite a bit of interest from the audience, and there was interest in open-sourcing it. I definitely want to do this — but we&#x27;re not quite there yet.</p><p>For now, though, I can talk about where stack-fetcher is right now, and what direction I want to take it in.</p><h2>The problem space</h2><p>“<a href="http://aws.amazon.com/cloudformation/" rel="noopener ugc nofollow">AWS CloudFormation</a> gives developers and systems administrators an easy way to create and manage a collection of related AWS resources, provisioning and updating them in an orderly and predictable fashion,” says the documentation. As a developer, you do this by creating a template (JSON which defines one or more desired <a href="http://docs.aws.amazon.com/AWSCloudFormation/latest/UserGuide/aws-template-resource-type-ref.html" rel="noopener ugc nofollow">resources</a>), then submitting that template to CloudFormation — either via the API, or via something which wraps the API (e.g. the <a href="https://console.aws.amazon.com/cloudformation/home" rel="noopener ugc nofollow">web console</a>). Then CloudFormation goes and creates or updates your stack to match your template.</p><p>As a developer who loves automation and consistency, this leaves you with several problems:</p><ul><li>How do I generate the template JSON?</li><li>How do I generate the other JSON required by the stack (e.g. parameter values)?</li><li>If I was to push that JSON to CloudFormation — i.e. 
apply the change — how do I know what changes I&#x27;m actually pushing?</li><li>Can I push some changes but not others?</li><li>Once I know what I want to push, how do I do so?</li></ul><h2>A little BBC Media Services history</h2><p>To put all of the above into a specific story: in BBC Media Services, we found during the development of Video Factory that we were managing more and more stacks, and by the start of this year we had something like 100 stacks to manage in each of our three environments.</p><p>By January 2014, we had a system for generating the JSON, but different people ran the relevant tools in different ways, therefore sometimes yielding differing results. And once the JSON had been generated, we had no way of knowing in what way it was different from the stack&#x27;s existing template, so we didn&#x27;t know what we were actually changing. And finally, we had no consistent approach for actually updating the stacks with the new template — mostly we were using the web console, but not always in the same way. And even then: it&#x27;s a <em>web console</em>, so that&#x27;s just awful from a productivity and automation point of view.</p><p>Thus, stack-fetcher was created, to address all of the above problems.</p><h2>The workflow</h2><p>Once you&#x27;ve updated your source files, the workflow to update a stack consists of three steps:</p><ul><li>Run “stack-fetcher”. This generates a set of three files: <em>current</em>, <em>generated</em>, and <em>next</em>.</li><li>Use your favourite diff/merge tool to compare the <em>current</em>, <em>generated</em> and <em>next </em>files, making whatever changes you wish to <em>next</em>.</li><li>Run “stack-updater” to push <em>next</em> into CloudFormation.</li></ul><h2>The workflow in action</h2><p>Here&#x27;s a demo of a simple change, illustrating the basic workflow, and some of stack-fetcher&#x27;s strengths.</p><p>Before running stack-fetcher, we have two stacks, “resource” and “component”.
The first diff has already been applied: a queue was added to the resource stack. These screenshots show the second diff being applied: to modify the IAM policy defined in the “component” stack, such that access is granted to the queue in the resource stack.</p><figure class="kg-card kg-image-card"><img src="https://rachelevans.org/blog/content/images/2014/10/stack-fetcher-before.png" class="kg-image" alt="screenshot of a terminal session"/><figcaption>Before running stack-fetcher</figcaption></figure><p>We then run <em>stack-fetcher </em>(in this example, “int” is the environment in question — integration). <em>stack-fetcher</em> retrieves the existing stack, generates the desired template, and compares the two. The summary shows “resource: same” (all in sync), and “component: DIFFERENT (20 lines)” (there are 20 lines of differences).</p><figure class="kg-card kg-image-card"><img src="https://rachelevans.org/blog/content/images/2014/10/stack-fetcher-output.png" class="kg-image" alt="screenshot of a terminal session"/><figcaption>The output of stack-fetcher</figcaption></figure><p>stack-fetcher has generated three template files per stack: <em>current, generated, </em>and <em>next</em>. Here we see the three files compared, using vimdiff:</p><figure class="kg-card kg-image-card"><img src="https://rachelevans.org/blog/content/images/2014/10/stack-fetcher-three-files-top.png" class="kg-image" alt="screenshot of a terminal session"/></figure><p>and the bottom half of the same files:</p><figure class="kg-card kg-image-card"><img src="https://rachelevans.org/blog/content/images/2014/10/stack-fetcher-three-files-bottom.png" class="kg-image" alt="screenshot of a terminal session"/></figure><p>You can see that “generated” (in the middle column) has some sections that “current” doesn&#x27;t — these are for the policy change we&#x27;re trying to make. But you can also see that “current” has some lines that “generated” doesn&#x27;t.
This is because in this example, the stack in CloudFormation started off not in sync with our local copy (for example, maybe someone applied a change but neglected to commit the corresponding source).</p><p>So now we modify “next” (the right-hand file) to match whatever changes we want to apply. In this example we choose to pull in the new lines, but elect not to remove the extra, unexpected ones:</p><figure class="kg-card kg-image-card"><img src="https://rachelevans.org/blog/content/images/2014/10/stack-fetcher-merge.png" class="kg-image" alt="screenshot of a terminal session"/><figcaption>Merging the desired template into “next”, in the right-hand column</figcaption></figure><p>After saving these changes (remember, we didn&#x27;t modify “current” or “generated” — only “next”), we run <em>stack-updater</em>:</p><figure class="kg-card kg-image-card"><img src="https://rachelevans.org/blog/content/images/2014/10/stack-fetcher-apply-1.png" class="kg-image" alt="screenshot of a terminal session"/><figcaption>Running stack-updater (first time)</figcaption></figure><p><em>stack-updater</em> now warns us that it has detected a new parameter on the template (“MattressFailQueueArn” in this example): it adds this parameter, with the default value from the template, to the description file; then invites us to check this and edit the description file if we wish.</p><p>In this case the default is fine, so we just run <em>stack-updater</em> again:</p><figure class="kg-card kg-image-card"><img src="https://rachelevans.org/blog/content/images/2014/10/stack-fetcher-apply-2.png" class="kg-image" alt="screenshot of a terminal session"/><figcaption>Running stack-updater (second time)</figcaption></figure><p>Now <em>stack-updater</em> very clearly shows us the diffs between <em>current</em> and <em>next</em>: that is, if we elect to proceed, <em>these are the changes that we&#x27;re actually about to make</em>.</p><p>After confirming that we&#x27;re OK with this, <em>stack-updater</em> 
applies these changes, using the CloudFormation UpdateStack API:</p><figure class="kg-card kg-image-card"><img src="https://rachelevans.org/blog/content/images/2014/10/stack-fetcher-apply-3.png" class="kg-image" alt="screenshot of a terminal session"/><figcaption>Applying the changes using stack-updater</figcaption></figure><p><em>stack-updater</em> polls the stack&#x27;s status, waiting for it to reach a terminal state (i.e. not “in progress”). The stack events are displayed as they occur.</p><p>In this case the stack update completes successfully, and <em>stack-updater</em>&#x27;s work is done.</p><h2>In more detail</h2><p>stack-fetcher is a name given to a collection of scripts, one of which is itself called “stack-fetcher”. The other script that is intended to be manually invoked is “stack-updater”. There are other scripts, but one of the goals of stack-fetcher is to invoke and orchestrate those other scripts so that the user doesn&#x27;t generally have to think about them.</p><h3>stack-fetcher</h3><p>stack-fetcher&#x27;s job is to generate a set of three outputs:</p><ul><li><em>current</em> is the existing stack, fetched from CloudFormation</li><li><em>generated</em> is the stack that you want, generated from your codebase</li><li><em>next</em> is what you&#x27;re going to push back to CloudFormation using “stack-updater”</li></ul><p>When stack-fetcher runs, <em>next</em> is generated simply as a copy of <em>current</em> — that is, if you don&#x27;t edit the <em>next</em> file, then you won&#x27;t push any changes.</p><p>To make <em>generated</em>, stack-fetcher runs a series of scripts. 
Currently, this step is rather BBC-specific: we invoke <em>./generate-templates</em> with PYTHON_LIB set to point to part of the stack-fetcher codebase; if there&#x27;s a <em>transform</em> script, then the json is then filtered through this; then there&#x27;s a <em>cosmos-cloudformation-postproc</em> script which post-processes the json in various ways — primarily, providing defaults for the stack&#x27;s parameters.</p><p>To make <em>current</em>, stack-fetcher needs to know what stack name it should work with — and again, currently calculating this stack name is fairly BBC-specific. Once entered, the stack name is remembered via the <em>./stack_names.json</em> file, so you don&#x27;t have to calculate or enter it again. Once the stack name is known, the existing stack template and descriptor are fetched, and saved as <em>current</em>.</p><p>After this, stack-fetcher <em>normalises</em> both <em>current</em> and <em>generated</em>. The purpose of the normalisation is partly to make the files more readable, but also to get rid of differences that are meaningless. As well as whitespace reformatting and sorting object keys, the normalisation also includes CloudFormation-specific elements, such as sorting parameters, tags and outputs; removing empty arrays, if that would mean the same thing; and even re-ordering statements within IAM Policies.</p><p><em>next </em>always starts off as a copy of <em>current</em>, so that by default no changes are pushed.</p><p>Finally, stack-fetcher compares <em>current</em> and <em>generated</em> and shows a simple summary: they&#x27;re either the “SAME” or “DIFFERENT” (or, if the stack doesn&#x27;t exist yet, “NEW”); then shows some help text describing what to do next.</p><h3>diff/merge</h3><p>The help text displayed by stack-fetcher suggests using <em>vimdiff</em> to compare and edit the files, but of course you can use whatever tools you wish. 
The goal of this step is to update <em>next</em> to reflect what you want pushed back into CloudFormation (whilst leaving the <em>current</em> and <em>generated</em> files unchanged).</p><p>You may wish to simply review that <em>generated</em> is exactly what you want, then copy <em>generated</em> over <em>next</em> (this is probably what you want, ideally); or, you can cherry-pick, and perform more complex merges.</p><h3>stack-updater</h3><p>Once you&#x27;ve updated <em>next</em> to be as desired, you invoke <em>stack-updater</em>, with exactly the same arguments as you did for <em>stack-fetcher</em>.</p><p>If there are any differences between the set of parameters declared in the stack template, and the set of parameters passed in the stack descriptor, then stack-updater shows those differences (e.g. “You&#x27;re passing a parameter called X but it doesn&#x27;t exist”), automatically applies corrections (e.g. removing the no-longer-existent parameter), then stops, so that you can check its changes before re-running stack-updater.</p><p>Assuming the stack already exists, then stack-updater now diffs <em>current</em> against <em>next</em> — that is, it shows you the changes you&#x27;re about to push. It also displays the differences between the stack&#x27;s parameter defaults, and the actual parameter values you&#x27;re passing, so you can check which ones you&#x27;re overriding. 
(If the stack doesn&#x27;t currently exist, then this step is skipped, and the subsequent confirmation step reminds you that you&#x27;re about to create the stack).</p><p>It then asks for confirmation to proceed, and if you say yes, the change is pushed using the CloudFormation “update stack” (or “create stack”) API, and then stack-updater polls the stack status, waiting for completion.</p><p>Finally there&#x27;s another BBC-specific step, wherein the stack can be registered in Cosmos, our deployment manager.</p><h2>Dependencies</h2><p>stack-fetcher is written in ruby, and uses the aws-sdk gem.</p><h2>Benefits</h2><p>By using this tool, we have realised several benefits:</p><ul><li>speed: Using this tool is much quicker than using the several tools that we used before. There are fewer commands to type, with fewer options to remember. And probably most importantly, you never have to leave your terminal.</li><li>consistency: By automating more of the process, and by normalising the output, we now achieve more consistency: by which I mean between developers, between environments, and between components.</li><li>understanding: This tool makes it very obvious what changes you&#x27;re about to apply to live (or whatever environment you&#x27;re updating) — no more blind pasting of a load of json and hoping for the best — which means fewer mistakes.</li></ul><p>All of which means: this tool has helped us to be more productive.</p><h2>Next steps</h2><p>We need to separate out the BBC-specific parts from the rest, so that we can offer this tool to a wider audience.</p><p>I&#x27;d like to make the “generation” phase more uniform: run a series of executables (bash, ruby, whatever — the tool should not care), where the first executable receives null input, and each subsequent tool filters the output of the previous one. 
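For illustration only — none of this exists yet — the driving loop for such a chain might look something like this in ruby, where the filter names are hypothetical:

```ruby
require 'open3'

# Run a chain of filter executables. The first receives empty (null)
# input on stdin; each subsequent filter receives the previous
# filter's stdout. The final output is returned.
def run_chain(filters)
  filters.reduce("") do |input, filter|
    output, status = Open3.capture2(filter, stdin_data: input)
    raise "filter #{filter} failed" unless status.success?
    output
  end
end

# Hypothetical usage — these filter scripts don't exist:
#   run_chain(%w[./make-template ./customise-for-env ./fill-defaults])
```

The appeal of this shape is that the tool itself stays trivial: each filter can be written in any language, and adding an environment-specific step is just adding one more executable to the list.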
So for example you might have filters which do: make the basic template; customise it for this environment; fill in parameter defaults.</p><p>I don&#x27;t have any news yet of <em>when</em> this might happen, but I certainly <em>want</em> it to happen. Please drop me a line via a comment or <a href="https://twitter.com/rvedotrc" rel="noopener ugc nofollow">on twitter</a> if you have thoughts on this — I&#x27;d love to hear your feedback.</p>]]></content:encoded></item><item><title><![CDATA[Personal highlights from the AWS Enterprise Summit]]></title><description><![CDATA[Yesterday I attended the AWS Enterprise Summit in London — I've chosen my highlights, and reflected on the summit as a whole.]]></description><link>https://rachelevans.org/blog/personal-highlights-from-the-aws-enterprise-summit/</link><guid isPermaLink="false">e491a8cdc60a</guid><category><![CDATA[technology]]></category><category><![CDATA[AWS]]></category><dc:creator><![CDATA[Rachel Evans]]></dc:creator><pubDate>Wed, 22 Oct 2014 22:32:59 GMT</pubDate><content:encoded><![CDATA[<div><p>Yesterday I attended the <a href="https://aws.amazon.com/aws-summit-2014/enterprise-summit-oct/" rel="noopener ugc nofollow">AWS Enterprise Summit in London</a>. I&#x27;ve already written about how <a rel="noopener" href="https://rachelevans.org/blog/amazon-web-services-fails-at-diversity/">it was very poor, from a diversity perspective</a>. But, it wasn&#x27;t all bad: some of the content was rather good...</p><h2>All hail the snail</h2><p>The first customer presentation was given by <a href="https://twitter.com/jodbod" rel="noopener ugc nofollow">John O&#x27;Donovan</a> of the <a href="http://www.ft.com/" rel="noopener ugc nofollow">Financial Times</a>. 
He told a fascinating and engaging story of the changing world in which they found themselves: with <a href="http://www.slideshare.net/AmazonWebServices/going-cloud-first-at-the-ft/6" rel="noopener ugc nofollow">print distribution in decline</a>, they needed to refocus on the net — and on future platforms and <a href="http://www.slideshare.net/AmazonWebServices/going-cloud-first-at-the-ft/10" rel="noopener ugc nofollow">devices yet to come, whatever they are</a>. John&#x27;s presentation had a great balance of <a href="http://www.slideshare.net/AmazonWebServices/going-cloud-first-at-the-ft/26" rel="noopener ugc nofollow">information</a>, <a href="http://www.slideshare.net/AmazonWebServices/going-cloud-first-at-the-ft/20" rel="noopener ugc nofollow">insight</a>, and <a href="http://www.slideshare.net/AmazonWebServices/going-cloud-first-at-the-ft/16" rel="noopener ugc nofollow">humour</a>.</p><p>A particular highlight for me — and, judging by the reaction, I&#x27;m going to guess for many other engineers in the audience — was <a href="http://www.slideshare.net/AmazonWebServices/going-cloud-first-at-the-ft/32" rel="noopener ugc nofollow">Chaos Snail</a>. “Like Chaos Monkey, but more chilled”, its job is to slow down I/O on certain instances, to test how software reacts to such degraded conditions. I asked John later if this tool has already been, or will be, open sourced — he says they&#x27;ve had a few requests for this, so yes they will. Good news!</p><p>John also talked about <a href="http://www.slideshare.net/AmazonWebServices/going-cloud-first-at-the-ft/35" rel="noopener ugc nofollow">Tagbot</a>, which locates and terminates untagged instances (“My team loves turning stuff off”, he said). Sounds like a blend between Chaos Monkey and Conformity Monkey.</p><h2>Maximum support</h2><p>After lunch we heard from <a href="https://twitter.com/brentjaye" rel="noopener ugc nofollow">Brent Jaye</a>, VP of AWS Support. 
He emphasised the value of Trusted Advisor as a way of identifying problems, and how they&#x27;re keen on building quick-fix facilities into the web console. (For example: if a volume hasn&#x27;t been backed up for a long time, then highlight this as a potential problem, and show a “backup” button right there).</p><p>“We&#x27;re in the business of you spending less money with us”, he said — which has a nice ring to it.</p><p>Brent also spoke of the value of integrating AWS and the customer&#x27;s support system; and of using Trusted Advisor and AWS Support not just via the console, but by their respective APIs. (John O&#x27;Donovan would, I&#x27;m sure, agree: earlier on he said “We don&#x27;t buy a product unless it has an API”. +1 on that).</p><p>Finally Brent spoke of the importance of engaging with AWS Support <em>early</em>, not just when there&#x27;s a problem.</p><h2>Auntie adapts</h2><p>Next up, <a href="https://twitter.com/rob_shield" rel="noopener ugc nofollow">Robert Shield</a> from <a href="http://www.bbc.co.uk/iplayer/" rel="noopener ugc nofollow">BBC iPlayer</a> spoke about Video Factory: how it uses AWS, the benefits realised over the previous platform, and how the BBC&#x27;s Operations function has adapted with the use of the cloud.</p><p>(I work with Robert, on the same team — I presented <a href="http://www.slideshare.net/rvedotrc/bbc-iplayer-bigger-better-faster" rel="noopener ugc nofollow">the Video Factory story</a> to the AWS UK User Group last month. 
So of course it should be assumed that I&#x27;m biased :-) )</p><p>However, it was obvious that the audience enjoyed it: Rob talked of the <a href="http://www.slideshare.net/AmazonWebServices/evolving-operations-for-bbc-i-player/4" rel="noopener ugc nofollow">benefits of smaller, simpler components</a>; of <a href="http://www.slideshare.net/AmazonWebServices/evolving-operations-for-bbc-i-player/6" rel="noopener ugc nofollow">how much data Video Factory shifts into S3 every day</a>; and of the importance of <a href="http://www.slideshare.net/AmazonWebServices/evolving-operations-for-bbc-i-player/11" rel="noopener ugc nofollow">automation</a> and <a href="http://www.slideshare.net/AmazonWebServices/evolving-operations-for-bbc-i-player/12" rel="noopener ugc nofollow">consistency</a>.</p><p>By re-architecting for smaller, simpler, more easily understandable components, he said, each part also became more reliable, and thus <a href="http://www.slideshare.net/AmazonWebServices/evolving-operations-for-bbc-i-player/19" rel="noopener ugc nofollow">people were more willing to look after the system</a>.</p><h2>News from the cloud</h2><p>The last customer presentation was from <a href="https://www.linkedin.com/pub/chris-birch/0/524/96b" rel="noopener ugc nofollow">Chris Birch</a> of <a href="http://www.news.co.uk/what-we-do/" rel="noopener ugc nofollow">News UK</a>. Like John and Robert before him, Chris told an entertaining and engaging story.</p><p>Much of News UK&#x27;s business is about Sunday publications, and combined with their “paywall” (he didn&#x27;t call it that, but that&#x27;s what the rest of us know it as), this meant that their traffic was sharply spiked around Sunday mornings. And the old system could handle <a href="http://www.slideshare.net/AmazonWebServices/news-uk-our-journey-to-cloud/6" rel="noopener ugc nofollow">only 17 transactions per second</a>! 
But of course things were <em>much</em> faster on the cloud.</p><p>Part of Chris&#x27; talk was about the importance and the difficulty of assessing the Total Cost of Ownership — which is needed to <a href="http://www.slideshare.net/AmazonWebServices/news-uk-our-journey-to-cloud/10" rel="noopener ugc nofollow">make the business case for moving to the cloud</a>. One thing I found very interesting was the idea that an application&#x27;s “App Book” (documentation on what it is, etc) should also document the app&#x27;s TCO.</p><p>There was also a nice section where Chris said that <a href="http://www.slideshare.net/AmazonWebServices/news-uk-our-journey-to-cloud/14" rel="noopener ugc nofollow">48% of their instances had no tags</a>, so it wasn&#x27;t clear what the instances were doing. However Chris also said that “It&#x27;s really boring switching stuff off”, which I have to say I <em>completely </em>disagree with!</p><h2>The two-pizza team</h2><p>Two of the speakers (sorry, I forget which ones exactly) mentioned the idea of the “two-pizza team”. Basically: a team which requires more than two pizzas to feed will have communication problems. I like this concept — it&#x27;s a good rule of thumb that definitely matches my own experience.</p><h2>And the others…</h2><p>You may notice that I only wrote about four of the ten speakers. That&#x27;s because the other speakers very much failed to hold my attention. I enjoyed the customer talks, all of which were interesting and engaging, and got a great reaction from the audience; but the talks from the partners, and from Amazon themselves (with the exception of Brent), seemed to be aimed very much at CxO level — at “suits”, one might say — and as such really weren&#x27;t my thing at all.</p><p>So I saw it as a summit of two opposing audiences: CxO versus techies. 
If the event were larger, it would make more sense to split it into two events, or into two tracks within one.</p><p>As it is, it seems to me that most people would have found half of the talks less than engaging — but it&#x27;s only a one-day event, so that&#x27;s not such a burden.</p><h2>Wrapping up</h2><p>Overall I really enjoyed the day — the CxO-style talks weren&#x27;t for me, and I didn&#x27;t explore the partner and sponsor stands; but the customer presentations were great, and I had a good chat or two with AWS staff, and I loved swapping stories with the other attendees.</p><p>Oh, and there was <a href="https://twitter.com/pipoe2h/status/524611579145637889" rel="noopener ugc nofollow">highly practical swag</a>!</p><p>I think I&#x27;ll be back — maybe not every time, but it was a good day, and I&#x27;d be happy to do it again sometime. See you there!</p></div>]]></content:encoded></item></channel></rss>