
#5186 closed enhancement (fixed)

Perform MD5 hash calculation during upload

Reported by: https://www.google.com/accounts/o8/id?id=aitoawnnvt-90frc5_hjyhabpzmeshpw03k3snc
Owned by: dkocher
Priority: low
Milestone: 3.8
Component: s3
Version: 3.6.1
Severity: minor
Keywords:
Cc: wolfgang.nagele@…
Architecture:
Platform:

Description

Currently, an MD5 hash of every file uploaded to S3 is calculated before the upload starts. For large files this can take a long time, and because no progress is shown during that step, the upload time estimate is useless.

I suggest calculating the MD5 hash during the upload, while reading from the stream. For an example, see: http://stackoverflow.com/questions/304268/using-java-to-get-a-files-md5-checksum
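The on-the-fly approach can be sketched with the JDK's `java.security.DigestInputStream`, which updates a `MessageDigest` with every byte read through it, so the file is read only once and progress can be reported from the first byte. This is a minimal sketch, not Cyberduck's implementation: the class `HashingUpload`, the method `copyAndHash`, and the progress callback are hypothetical, and the `OutputStream` merely stands in for the HTTP request body.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.nio.charset.StandardCharsets;
import java.security.DigestInputStream;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.function.LongConsumer;

// Hypothetical helper: hashes the data while it is being copied to the upload stream.
public class HashingUpload {

    /** Copies source to upload and returns the MD5 of the bytes read, as lowercase hex. */
    public static String copyAndHash(InputStream source, OutputStream upload, LongConsumer progress)
            throws IOException, NoSuchAlgorithmException {
        MessageDigest md5 = MessageDigest.getInstance("MD5");
        long transferred = 0;
        // Every read through DigestInputStream also updates md5,
        // so no separate pass over the file is needed before the upload.
        try (DigestInputStream in = new DigestInputStream(source, md5)) {
            byte[] buffer = new byte[8192];
            int n;
            while ((n = in.read(buffer)) != -1) {
                upload.write(buffer, 0, n);
                transferred += n;
                progress.accept(transferred); // progress is available immediately
            }
        }
        return toHex(md5.digest());
    }

    static String toHex(byte[] bytes) {
        StringBuilder sb = new StringBuilder(bytes.length * 2);
        for (byte b : bytes) {
            sb.append(String.format("%02x", b));
        }
        return sb.toString();
    }

    public static void main(String[] args) throws Exception {
        byte[] data = "hello".getBytes(StandardCharsets.UTF_8);
        ByteArrayOutputStream sink = new ByteArrayOutputStream();
        String hex = copyAndHash(new ByteArrayInputStream(data), sink,
                bytes -> System.out.println("transferred " + bytes + " bytes"));
        System.out.println(hex); // 5d41402abc4b2a76b9719d911017c592
    }
}
```

The digest is only final once the whole stream has been read, which is why the check against the server has to happen after the upload completes.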

With this approach, S3 will no longer return an error for a corrupted upload, since it has no hash to compare against. Instead, the ETag returned by S3 has to be used to verify that the upload was successful: http://docs.amazonwebservices.com/AmazonS3/latest/API/RESTObjectPOST.html
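A sketch of the ETag comparison, assuming the ETag header value has already been read from the S3 response. For an object uploaded with a single PUT, S3 returns the hex MD5 of the content enclosed in double quotes. Multipart uploads produce ETags of the form `<hex>-<partCount>` that are not the MD5 of the object, and per AWS documentation objects encrypted with SSE-C or SSE-KMS also have ETags that are not an MD5 digest, so the sketch reports those as not verifiable instead of as corrupted. The class `EtagCheck`, the method `verify`, and the `Result` enum are hypothetical names.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

// Hypothetical helper: compares an S3 ETag with a locally computed MD5.
public class EtagCheck {

    enum Result { MATCH, MISMATCH, NOT_VERIFIABLE }

    /** Compares an ETag header value with a locally computed lowercase-hex MD5. */
    static Result verify(String etag, String localMd5Hex) {
        if (etag == null) {
            return Result.NOT_VERIFIABLE;
        }
        String value = etag.trim();
        // S3 returns the ETag enclosed in double quotes.
        if (value.length() >= 2 && value.startsWith("\"") && value.endsWith("\"")) {
            value = value.substring(1, value.length() - 1);
        }
        // Multipart ETags (<hex>-<partCount>) are not the MD5 of the object.
        if (value.contains("-")) {
            return Result.NOT_VERIFIABLE;
        }
        return value.equalsIgnoreCase(localMd5Hex) ? Result.MATCH : Result.MISMATCH;
    }

    static String md5Hex(byte[] data) throws NoSuchAlgorithmException {
        StringBuilder sb = new StringBuilder();
        for (byte b : MessageDigest.getInstance("MD5").digest(data)) {
            sb.append(String.format("%02x", b));
        }
        return sb.toString();
    }

    public static void main(String[] args) throws Exception {
        String local = md5Hex("hello".getBytes(StandardCharsets.UTF_8));
        System.out.println(verify("\"5d41402abc4b2a76b9719d911017c592\"", local)); // MATCH
        System.out.println(verify("\"d41d8cd98f00b204e9800998ecf8427e\"", local)); // MISMATCH
    }
}
```

The same ETag is also returned by GET and HEAD requests, so the check can be repeated later without storing the hash separately.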

Alternatively, there should at least be an option to disable the hash computation, since there are cases where the overhead is not justified.

Change History (6)

comment:1 Changed on Sep 13, 2010 at 4:24:47 PM by https://www.google.com/accounts/o8/id?id=aitoawnnvt-90frc5_hjyhabpzmeshpw03k3snc

  • Cc wolfgang.nagele@… added

comment:2 Changed on Sep 13, 2010 at 4:35:28 PM by dkocher

I agree. Calculating the hash on the fly would be an improvement. The only downside is that a second request is needed if we still want to store the MD5 value in the file's metadata, as we currently do (see md5-hash in metadata).

comment:3 Changed on Sep 13, 2010 at 5:17:08 PM by https://www.google.com/accounts/o8/id?id=aitoawnnvt-90frc5_hjyhabpzmeshpw03k3snc

Agreed, I was thinking about that as well. However, I think storing the MD5 in the metadata is obsolete; one could use the ETag all the way through instead.

See: http://docs.amazonwebservices.com/AmazonS3/latest/API/RESTObjectGET.html http://docs.amazonwebservices.com/AmazonS3/latest/API/RESTObjectHEAD.html

comment:4 Changed on Sep 18, 2010 at 10:40:01 PM by dkocher

  • Milestone set to 4.1
  • Status changed from new to assigned

comment:5 Changed on Nov 19, 2010 at 11:05:34 AM by dkocher

  • Milestone changed from 4.1 to 4.0
  • Resolution set to fixed
  • Status changed from assigned to closed

If the property s3.upload.metadata.md5 is set to true (the default is false), the Content-MD5 header is set and S3 checks the integrity of the upload. Otherwise, the MD5 is calculated on the fly during the upload and compared to the ETag returned for the upload.
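For the Content-MD5 mode, the header value is the base64 encoding of the raw 16-byte MD5 digest (RFC 1864), not the hex string used for the ETag comparison. Because headers are sent before the body, the file has to be hashed before the request is sent in this mode. A minimal sketch of computing the header value; the class `ContentMd5` and method `contentMd5` are hypothetical, and how the header is attached to the request is not shown.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.Base64;

// Hypothetical helper: computes the Content-MD5 header value for a request body.
public class ContentMd5 {

    /** Reads the whole stream and returns the base64-encoded MD5 digest. */
    public static String contentMd5(InputStream in) throws IOException, NoSuchAlgorithmException {
        MessageDigest md5 = MessageDigest.getInstance("MD5");
        byte[] buffer = new byte[8192];
        int n;
        while ((n = in.read(buffer)) != -1) {
            md5.update(buffer, 0, n);
        }
        // Content-MD5 carries base64 of the raw digest bytes, not the hex string.
        return Base64.getEncoder().encodeToString(md5.digest());
    }

    public static void main(String[] args) throws Exception {
        InputStream in = new ByteArrayInputStream("hello".getBytes(StandardCharsets.UTF_8));
        System.out.println("Content-MD5: " + contentMd5(in)); // Content-MD5: XUFAKrxLKna5cZ2REBfFkg==
    }
}
```

Sending the hex form in this header is a common mistake; S3 rejects a Content-MD5 value that does not decode to the digest of the body.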

In r7665.

comment:6 Changed on Nov 19, 2010 at 11:16:13 AM by dkocher

Same fix for Rackspace Cloudfiles in r7666.
