I have a requirement to change the Content-Type / Cache-Control headers of a load of objects in S3. At the API level, there's no way of modifying the metadata of an existing object — rather, you create a new object with the desired metadata. Of course, if this new object is in the same bucket and has the same key as the old object, it'll effectively overwrite it. You don't have to re-upload your data if you don't want to — you can copy the data from the old object to the new one.
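To make the copy-onto-itself idea concrete, here is a minimal sketch in Python of what such a request might look like with boto3's copy_object call. The function names, bucket, key and header values are made up for illustration. Note that with REPLACE, any metadata you don't specify again is dropped from the new object.

```python
def build_copy_in_place_params(bucket, key, content_type, cache_control):
    """Build keyword arguments for an S3 CopyObject request that copies an
    object onto itself, replacing its metadata instead of copying it."""
    return {
        "Bucket": bucket,                                # destination bucket
        "Key": key,                                      # destination key (same as the source)
        "CopySource": {"Bucket": bucket, "Key": key},    # source is the same object
        "MetadataDirective": "REPLACE",                  # use the metadata given here
        "ContentType": content_type,
        "CacheControl": cache_control,
    }


def replace_headers(bucket, key, content_type, cache_control):
    """Hypothetical helper: send the request. Needs boto3 and AWS credentials."""
    import boto3  # third-party; imported here so the builder above has no dependencies

    s3 = boto3.client("s3")
    params = build_copy_in_place_params(bucket, key, content_type, cache_control)
    return s3.copy_object(**params)


if __name__ == "__main__":
    # Print the request parameters only; nothing is sent to S3 here.
    print(build_copy_in_place_params(
        "example-bucket", "css/site.css", "text/css", "max-age=3600"))
```

This is a single, non-multipart copy request, so it only applies to objects within the size limit for such copies.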
Rather than using the API directly, you can use one of the existing tools that wrap this behaviour. For example, the AWS command line interface offers “aws s3 sync”. So I'm wondering whether “aws s3 sync” might be the tool for the job.
But then we come to this gem in the help text:
--metadata-directive (string) Specifies whether the metadata is copied from the source object or replaced with metadata provided when copying S3 objects. Note that if the object is copied over in parts, the source object's metadata will not be copied over, no matter the value for --metadata-directive, and instead the desired metadata values must be specified as parameters on the command line. Valid values are COPY and REPLACE. If this parameter is not specified, COPY will be used by default. If REPLACE is used, the copied object will only have the metadata values that were specified by the CLI command. Note that if you are using any of the following parameters: --content-type, content-language, --content-encoding, --content-disposition, --cache-control, or --expires, you will need to specify --metadata-directive REPLACE for non-multipart copies if you want the copied objects to have the specified metadata values.
Apart from being horrible to read, there's a big problem with this. Note the phrases “Note that if the object is copied over in parts” and “for non-multipart copies”: the behaviour varies depending on whether or not multipart copies are in use.
So, are multipart copies in use?
Well, we're not told. The S3 maximum size for non-multipart uploads is 5GB, so we know that for objects over 5GB, multipart uploads must be used, because that's the only option. But for smaller objects?
¯\_(ツ)_/¯
So the help text explaining --metadata-directive tells us that the behaviour of this option can vary, depending on an implementation detail which is not revealed to us.
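The one case we can be sure about is the size limit, so it is at least possible to sort objects into "definitely multipart" and "who knows". Here is a rough sketch, assuming that "5GB" means 5 GiB (5 × 1024³ bytes) and that you already have a mapping of key to size in bytes. The names are hypothetical.

```python
# Assumption: the 5GB single-request limit is interpreted as 5 GiB.
SINGLE_REQUEST_LIMIT_BYTES = 5 * 1024**3


def must_use_multipart(size_bytes):
    """True if the object is too large for a single, non-multipart request."""
    return size_bytes > SINGLE_REQUEST_LIMIT_BYTES


def split_by_certainty(sizes):
    """Split {key: size_in_bytes} into two sorted lists of keys:
    those that must be multipart, and those where we can't tell."""
    forced = sorted(k for k, s in sizes.items() if must_use_multipart(s))
    unknown = sorted(k for k, s in sizes.items() if not must_use_multipart(s))
    return forced, unknown


if __name__ == "__main__":
    example = {
        "small.css": 12_000,
        "video.mp4": 6 * 1024**3,
        "backup.tar": 5 * 1024**3,  # exactly at the limit, so not forced
    }
    forced, unknown = split_by_certainty(example)
    print("definitely multipart:", forced)
    print("unknown:", unknown)
```

This doesn't answer the real question, of course: for everything in the "unknown" list, whether the tool uses multipart is still down to the tool.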
Here's my attempt to reword that help text to be (a) clearer, and (b) more honest:
--metadata-directive (string) Valid values are COPY (which is the default), and REPLACE. Specifies whether the metadata is copied from the source object ("COPY"), or replaced with metadata provided on the command line ("REPLACE") when copying S3 objects. Note that "COPY" does not work if multipart uploads are used, which is definitely the case for objects larger than 5GB, and might be the case for smaller objects too — good luck!
Not helpful.