Downloading gzipped files decompresses and truncates the content #8263

Closed
cyberduck opened this issue Oct 17, 2014 · 20 comments
Labels: bug, fixed, high priority, s3 (AWS S3 Protocol Implementation)

Comments

@cyberduck
Collaborator

fe09999 created the issue

When I download CloudTrail files from AWS S3, the files get decompressed and truncated.
For instance, the file AWSLogs//CloudTrail/////CloudTrail__*.json.gz has a size of 32.5KB.
After downloading, the file is plain text (decompressed) and has a length of 32.5KB. Of course, once decompressed it should be larger than the compressed length, not the same size.

By the way, decompressing should be an option. It is really nice to have, but not useful in all cases.


@cyberduck
Collaborator Author

@dkocher commented

I cannot reproduce this issue. Added a test in 7df441c. If you reopen this issue, please post the transcript from the Transfers window (Ctrl-L). If you have chosen to open the downloaded file with the default application, it could be uncompressed after the download is complete. Refer to Preferences → Transfers → Downloads → Open downloaded files with default application.

@cyberduck
Collaborator Author

fe09999 commented

I added a few files so you can see my results.
I don't believe that the default application has something to do with it. When I try to decompress the files with 7zip I get an error message, and text editors can open the *.gz document and display it. To me this looks like Cyberduck is doing the decompression. (This does not happen when I use an alternative tool to download from S3.)
I am available for an online session if you want. Let me know how to contact you.
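
One quick way to check a downloaded file without 7zip is to look at its first two bytes. This is a minimal Python sketch; the helper name `looks_like_gzip` and the sample JSON data are made up for illustration and are not taken from the attached files. It relies only on the fact that every gzip stream starts with the bytes 0x1f 0x8b.

```python
import gzip

GZIP_MAGIC = b"\x1f\x8b"  # first two bytes of every gzip stream (RFC 1952)


def looks_like_gzip(data: bytes) -> bool:
    """Return True if the bytes start with the gzip magic number."""
    return data[:2] == GZIP_MAGIC


# Illustrative data only (hypothetical log content).
original = b'{"Records": []}\n' * 200
compressed = gzip.compress(original)

# An intact .gz download starts with the magic number.
intact_ok = looks_like_gzip(compressed)

# A file that was decompressed and cut to the compressed length does not.
truncated_plain = original[:len(compressed)]
truncated_ok = looks_like_gzip(truncated_plain)

print(intact_ok, truncated_ok)
```

For a file on disk, the same check can be run on `open(path, "rb").read(2)`.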

@cyberduck
Collaborator Author

@dkocher commented

Replying to [comment:3 thuettner]:

I don't believe that the default application has something to do with it.

Can you let me know the setting in Preferences → Transfers → Downloads → Open downloaded files with default application, and try disabling the feature if it is currently enabled?

@cyberduck
Collaborator Author

fe09999 commented

The flag was not checked and there is no default application defined.

Replying to [comment:7 dkocher]:

Replying to [comment:3 thuettner]:

I don't believe that the default application has something to do with it.

Can you let me know the setting in Preferences → Transfers → Downloads → Open downloaded files with default application, and try disabling the feature if it is currently enabled?

@cyberduck
Collaborator Author

fe09999 commented

I have Windows 8.1 (not Windows 7).

@cyberduck
Collaborator Author

@dkocher commented

Still cannot reproduce the issue using your test file. I must assume there is another process that touches the file after the download is complete.

@cyberduck
Collaborator Author

a9896e4 commented

Guys,

I got the same thing.
Gzipped files are decompressed and truncated to the size of the archive file when downloading from S3.

Platform: Windows 7.
Version: 4.6.1 (tried updating to the current snapshot, 4.6.2; it didn't help).

@cyberduck
Collaborator Author

@dkocher commented

Replying to [comment:14 dkocher]:

Also noted in (https://groups.google.com/forum/#!topic/cyberduck/yo7YldedY9E).

Can you confirm that your use case is manually compressing the content and setting the Content-Encoding header in S3?

@cyberduck
Collaborator Author

a9896e4 commented

No, I can't, unfortunately.
I'm a consumer of those files. They are uploaded by other people.

Metadata-Info tab says this:
Content-Encoding: gzip
Content-Type: text/csv

P.S.
S3Browser downloads the files as is, without unzipping, and so does my self-written Java tool.
That's why I'm sure that the files are valid, and that something is wrong on the Cyberduck side.

@cyberduck
Collaborator Author

@dkocher commented

I can reproduce the bug here with files in S3 that are compressed and have a custom Content-Encoding: gzip header set through metadata. The problem is that we limit reading to the known size of the object. This works in general for WebDAV because the Content-Encoding is applied on the fly when the file is served: the file is stored on the server uncompressed, its length is known, and we read up to the n bytes of the uncompressed file from the decompressed stream. On S3, by contrast, the file is always stored compressed and the decompressed size is not known. We only read n bytes from the decompressed stream, where n is the size of the compressed object.
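
To make the mechanism concrete, here is a minimal Python sketch of the truncation. It is not Cyberduck's actual code (which is Java); the function name `download_buggy` and the sample CSV data are hypothetical. The assumption is that the client decodes the gzip body on the fly but caps the read at the length of the stored, compressed object.

```python
import gzip
import io


def download_buggy(body: bytes, content_length: int) -> bytes:
    """Decode the gzip body on the fly, but stop after content_length bytes.

    content_length is the size of the stored (compressed) object, so the
    limit ends up being applied to the decompressed stream instead.
    """
    decoded = gzip.GzipFile(fileobj=io.BytesIO(body))
    return decoded.read(content_length)


# Hypothetical CSV content, chosen only because it compresses well.
plain = b"timestamp,event\n" + b"2014-10-17T00:00:00Z,ConsoleLogin\n" * 500
stored = gzip.compress(plain)  # what the bucket holds and reports the size of

result = download_buggy(stored, len(stored))

# The result is plain text, cut off at the compressed size.
print(len(plain), len(stored), len(result))
```

The saved file therefore has exactly the size of the .gz object but contains only the beginning of the decompressed text, which matches the reports above.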

As a resolution, I think it is best to disable the detection of Content-Encoding when connected to S3 rather than fix the length handling, because otherwise users would end up with downloaded .gz files that are already decompressed. It is better to retrieve the compressed file as is (as advertised by the object key extension).
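
The idea of retrieving the object as is can be sketched in a few lines of Python. This only illustrates the approach and is not the change made in Cyberduck; `download_raw` and the sample data are hypothetical. The assumption is that the client ignores the Content-Encoding header and copies exactly the stored bytes.

```python
import gzip
import io


def download_raw(body: bytes, content_length: int) -> bytes:
    """Read exactly the stored bytes and ignore any Content-Encoding header."""
    return io.BytesIO(body).read(content_length)


# Hypothetical content; the stored object is its gzip-compressed form.
plain = b"timestamp,event\n" * 1000
stored = gzip.compress(plain)

saved = download_raw(stored, len(stored))

# The saved file is still a valid gzip archive with the full content inside.
roundtrip = gzip.decompress(saved)
print(len(saved), len(roundtrip))
```

Here the length limit and the stream it is applied to refer to the same compressed bytes, so nothing is cut off, and the file on disk matches its .gz extension.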

@cyberduck
Collaborator Author

@dkocher commented

See also Serving Compressed Files.

@cyberduck
Collaborator Author

@dkocher commented

In bf823c1.

@cyberduck
Collaborator Author

a9896e4 commented

Thank you guys! It's working fine now.

@cyberduck
Collaborator Author

fe09999 commented

I just tried it out with the latest version and downloaded CloudTrail files from S3. The files still get decompressed to plain text and then truncated.

@cyberduck
Collaborator Author

a9896e4 commented

Have you updated to the snapshot build?
The fix hasn't been released yet.

Maybe this can help you:
https://trac.cyberduck.io/wiki/help/en/howto/preferences#Update

@cyberduck
Collaborator Author

fe09999 commented

You are right, that solves the issue.

@cyberduck
Collaborator Author

@dkocher commented

Thanks for confirming the issue is resolved.

@cyberduck
Collaborator Author

@dkocher commented

#8263 is a duplicate.

@cyberduck
Collaborator Author

@dkocher commented

Will possibly be reverted due to #11662.

@iterate-ch iterate-ch locked as resolved and limited conversation to collaborators Nov 26, 2021