
Unicode normalization #5162

Closed
cyberduck opened this issue Sep 8, 2010 · 14 comments

@cyberduck
Collaborator

thekiwi created the issue

I have been struggling for a while with uploading filenames with accented characters in them to an FTP site. The file name "seems" to be preserved on upload and it looks correct, but if I copy the filename out of CyberDuck after the upload has completed, and paste it into TextWrangler, it shows a red upside down question mark instead of the accented characters.

And if I retype that name in CyberDuck, I can then upload the file again from my Mac so that it then looks like there are 2 identically named files on the server.

For whatever it's worth, Captain FTP also seems to have this same problem, but FileZilla doesn't.

If you look at this directory listing

http://wmgs.org/tng_utf8/photos/

all the names are mangled (apparently an Apache thing), but the two that end in .CD.2.gif point to different files. If you click them, the name that comes up in the Safari address bar appears to be the same, yet both exist in the same directory on the server. One of these was uploaded with FileZilla, the other with CyberDuck.

This has come about because the files are used in PHP scripts for genealogy and there are problems reading the file names and writing them into the database for later retrieval. It is the files that when the name is copied into TextWrangler show the red upside down question mark that cause issues with the PHP scripts. The file names "look" OK in phpMyAdmin but there's an issue somewhere.

CyberDuck is set to UTF-8 in the Preferences, and in the settings for that Bookmark. I've been told that the server is set to UTF-8 also.

If you need access to this server, let me know and I can email the credentials to you.



@cyberduck
Collaborator Author

@dkocher commented

Have you tried using a different character encoding such as ISO-8859-1?
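A short illustration of one possible limit of switching to ISO-8859-1 (an assumption; the thread does not diagnose it): Latin-1 has single-byte codes for the precomposed letters é, á and î, but none for the combining marks U+0301 and U+0302 that a decomposed spelling of the same name uses. The helper `latin1_or_none` is hypothetical and only exists for this sketch.

```python
import unicodedata

name = "K\u00e9nn\u00e1ts\u00eedeheads.gif"      # precomposed (NFC) spelling
decomposed = unicodedata.normalize("NFD", name)   # base letters + combining marks

def latin1_or_none(text):
    """Return the ISO-8859-1 bytes for text, or None if it cannot be represented."""
    try:
        return text.encode("iso-8859-1")
    except UnicodeEncodeError:
        return None

print(latin1_or_none(name))        # b'K\xe9nn\xe1ts\xeedeheads.gif'
print(latin1_or_none(decomposed))  # None: the combining marks have no Latin-1 code
```

So the same visible name either encodes cleanly or not at all in Latin-1, depending on which Unicode spelling the client starts from.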

@cyberduck
Collaborator Author

thekiwi commented

Replying to [comment:1 dkocher]:

Have you tried using a different character encoding such as ISO-8859-1?

Yes, and that makes things even worse. There is something different about how CyberDuck (and Captain FTP) upload files with accented characters when UTF-8 is chosen, compared to how FileZilla uploads them. If you look in this directory

http://wmgs.org/tng_utf8/photos/FileZillaVsCyberDuck/

you see two files that apparently have the same name, but if you view the source code for the page, you see one file represented as

Ke%cc%81nna%cc%81tsi%cc%82deheads.gif <---- this one uploaded by CyberDuck

and the other as

K%c3%a9nn%c3%a1ts%c3%aedeheads.gif <---- this one uploaded by FileZilla
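Decoding the two percent-encoded names above as UTF-8 shows what differs: the CyberDuck upload spells each accented letter as a base letter followed by a combining mark (decomposed, NFD), while the FileZilla upload uses single precomposed code points (NFC). A minimal sketch using only the Python standard library:

```python
import unicodedata
from urllib.parse import unquote

# The two names from the Apache listing; unquote() decodes the bytes as UTF-8.
cyberduck_name = unquote("Ke%cc%81nna%cc%81tsi%cc%82deheads.gif")
filezilla_name = unquote("K%c3%a9nn%c3%a1ts%c3%aedeheads.gif")

def code_points(text):
    """Format each character of text as U+XXXX."""
    return " ".join(f"U+{ord(c):04X}" for c in text)

print(cyberduck_name == filezilla_name)                  # False: different code points
print(code_points(cyberduck_name[:3]))                   # U+004B U+0065 U+0301
print(code_points(filezilla_name[:2]))                   # U+004B U+00E9
print(unicodedata.is_normalized("NFD", cyberduck_name))  # True
print(unicodedata.normalize("NFC", cyberduck_name) == filezilla_name)  # True
```

The names render identically, which is why the directory appears to contain two files with the same name, but they are different byte sequences to the server and to PHP.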

If I view their names in CyberDuck and copy it out to TextWrangler and turn on "Show Invisibles" then I see what is in the attached image - the CyberDuck file shows the red upside down ? symbol in place of each accented character.

If I turn off the Apache option for showing UTF-8 Directory listings, then the CyberDuck file shows as

Ke�nna�tsîdeheads

and the FileZilla file shows as

Kénnátsîdeheads

I worked yesterday with the owner of Simply Hosting, where this site is hosted, and he confirms that the server is running UTF-8. I can reproduce the same results on my own server, which runs Mac OS X 10.5.8 with PureFTPd as the FTP server: the file uploaded by CyberDuck isn't referenced correctly once it's saved using PHP. This can be seen here

http://wmgs.org/tng_utf8/browsemedia.php?mediatypeID=photos

where, of the bottom two files, clicking the link for the FileZilla one shows the image, but clicking the link for the CyberDuck one doesn't, even though the file name field below the image is the same in each case.

@cyberduck
Collaborator Author

@dkocher commented

Thanks for the detailed analysis! I certainly hope it will bring us closer to finding the cause of the issue. In the meantime, could you try the following? Open a Terminal.app window and paste

defaults write ch.sudo.cyberduck path.normalize.unicode true

Restart Cyberduck. Let me know if that makes any difference. I'll have a closer look next week.
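To make the effect of such a switch concrete, here is a hypothetical sketch (not Cyberduck's actual code; `encode_remote_name` and its `normalize_unicode` flag are invented for illustration) of a client that optionally NFC-normalizes a local file name before encoding it as UTF-8 for the server. It assumes the local file system hands back the decomposed spelling, and checks the resulting bytes against the two names from the Apache listing.

```python
import unicodedata
from urllib.parse import unquote_to_bytes

def encode_remote_name(name, normalize_unicode):
    """Hypothetical: optionally NFC-normalize a file name, then encode it as UTF-8."""
    if normalize_unicode:
        name = unicodedata.normalize("NFC", name)
    return name.encode("utf-8")

# Assumption: the local file system returns the decomposed (NFD) spelling.
local_name = unicodedata.normalize("NFD", "K\u00e9nn\u00e1ts\u00eedeheads.gif")

sent_without = encode_remote_name(local_name, normalize_unicode=False)
sent_with = encode_remote_name(local_name, normalize_unicode=True)

# Compare with the bytes seen for the CyberDuck and FileZilla uploads.
print(sent_without == unquote_to_bytes("Ke%cc%81nna%cc%81tsi%cc%82deheads.gif"))  # True
print(sent_with == unquote_to_bytes("K%c3%a9nn%c3%a1ts%c3%aedeheads.gif"))        # True
```

Under these assumptions, turning normalization on is enough to make the uploaded bytes match FileZilla's.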

@cyberduck
Collaborator Author

thekiwi commented

Yes!!!!!!

That has made all the difference. That same file now uploaded by CyberDuck is recognised correctly by the PHP scripts and can be retrieved and displayed.

http://wmgs.org/tng_utf8/showmedia.php?mediaID=34

(compared to previously with CyberDuck)

http://wmgs.org/tng_utf8/showmedia.php?mediaID=32

where, even though the file name on the page "seemed" correct, the image wasn't displayed.

On the Apache index page here

http://wmgs.org/tng_utf8/photos/FileZillaVsCyberDuck/CyberDuckAsUnicode/

the View Source of that page shows the same encoding of the file name

K%c3%a9nn%c3%a1ts%c3%aedeheads

as resulted from the upload by FileZilla (see above).

It also fixed the same issue on my Mac OS X 10.5.8 server where now the image will display on the PHP pages calling it.

What is the significance of setting that preference to Unicode?

Thanks

Roger

@cyberduck
Collaborator Author

@dkocher commented

Reference to #1965.

@cyberduck
Collaborator Author

@dkocher commented

Refer to UNICODE NORMALIZATION FORMS.
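The Unicode normalization forms are NFC (canonical composition), NFD (canonical decomposition), and their compatibility variants NFKC and NFKD, which additionally fold characters such as ligatures. A quick way to see what each form does, using Python's `unicodedata.normalize`:

```python
import unicodedata

samples = {
    "precomposed e-acute": "\u00e9",
    "e + combining acute": "e\u0301",
    "fi ligature": "\ufb01",
}

for label, text in samples.items():
    for form in ("NFC", "NFD", "NFKC", "NFKD"):
        result = unicodedata.normalize(form, text)
        print(f"{label:22} {form:5}", " ".join(f"U+{ord(c):04X}" for c in result))
```

Both spellings of é collapse to U+00E9 under NFC and to U+0065 U+0301 under NFD; only the K forms turn the ligature into plain "fi".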

@cyberduck
Collaborator Author

@dkocher commented

Replying to [comment:4 thekiwi]:

Yes!!!!!!

That has made all the difference. That same file now uploaded by CyberDuck is recognised correctly by the PHP scripts and can be retrieved and displayed.

Hope you are still here. Can you let me know where (on which volume or network mount) the files in question were when you uploaded them?

@cyberduck
Collaborator Author

@dkocher commented

In 889bed4: NFC-normalize all paths from the local file system.
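A minimal sketch of the idea behind that change, in Python rather than Cyberduck's own code: every entry read from a local directory is NFC-normalized before it is used as a remote name. `nfc_listing` is a hypothetical helper, and the example creates a file whose name is deliberately spelled in decomposed form, standing in (as an assumption) for how an HFS+ volume stores names.

```python
import os
import tempfile
import unicodedata

def nfc_listing(directory):
    """Hypothetical helper: list a local directory with every entry NFC-normalized."""
    return sorted(unicodedata.normalize("NFC", entry) for entry in os.listdir(directory))

with tempfile.TemporaryDirectory() as tmp:
    # Create a file whose name uses the decomposed (NFD) spelling.
    decomposed = unicodedata.normalize("NFD", "K\u00e9nn\u00e1ts\u00eedeheads2.gif")
    open(os.path.join(tmp, decomposed), "wb").close()
    listing = nfc_listing(tmp)

print(listing)
```

Whatever spelling the file system returns, the listing only ever yields the precomposed form, so uploads end up with the same bytes FileZilla produced.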

@cyberduck
Collaborator Author

@dkocher commented

Replying to [comment:4 thekiwi]:

Yes!!!!!!

That has made all the difference. That same file now uploaded by CyberDuck is recognised correctly by the PHP scripts and can be retrieved and displayed.

Can you please let me know if using the latest snapshot build still works with the custom property you set removed using

defaults delete ch.sudo.cyberduck path.normalize.unicode

Thanks for your reply.

@cyberduck
Collaborator Author

60c4428 commented

David, sorry, I hadn't noticed the couple of recent messages from you about this until now.

I've just now done as you asked - removed the custom property (how can I check that it truly was removed?) - and installed the latest nightly build, 4.0b9. As far as I can remember from when this cropped up, the behaviour is now as one would expect: a file with accented characters in its name seems to upload correctly to a server and can be found by the PHP scripts involved. And when the name is copied from CyberDuck and pasted into TextWrangler, it appears as expected.

I tried it with 2 different servers - one at SimplyHosting.net running Linux, and my server on Mac OS X 10.6.5 with pureftpd 1.0.29 - the same computer I upload the files from.

As to the previous question - I'm not entirely sure what you're asking about "volume or network mount" - the files with the accented characters in the file name were on my Mac's startup disk - the same computer, and disk that has the server on it that the files are uploaded to.

Hope this all helps.

Edit about 30 minutes later with some extra information...

Actually I had forgotten to tell TextWrangler to "Show Invisibles" when I checked the file names as copied out of CyberDuck.

1 - for the Linux Server - it was all OK - the file name - Kénnátsîdeheads2.gif appeared as expected.

2 - for my Mac OS X Server, the file name copied from CyberDuck showed an upside down red question mark in each place where an accented character was meant to be. In the Finder, however, the file name looked as expected and was in fact identical to the name of the file I uploaded, at least insofar as copying the uploaded file from the source folder to the destination folder in the Finder reported that the file already existed.

But the PHP scripts had detected the expected file name and written it into the MySQL database as expected - is this perhaps an issue with CyberDuck not displaying the name correctly?
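One plausible reading, not confirmed in this thread: HFS+ stores file names in a decomposed form and compares them normalization-insensitively, so the Mac server may list a name uploaded as NFC back in NFD (the question marks in TextWrangler), while the Finder still treats both spellings as the same file. A byte-exact server such as the Linux one instead keeps both spellings as separate files. The hypothetical helper below groups names that are canonically equivalent but not byte-identical, using the two names from the earlier Apache listing plus one unrelated name.

```python
import unicodedata
from collections import defaultdict

def lookalike_groups(names):
    """Group names that share an NFC form but are spelled differently."""
    groups = defaultdict(set)
    for name in names:
        groups[unicodedata.normalize("NFC", name)].add(name)
    return {key: sorted(members) for key, members in groups.items() if len(members) > 1}

listing = [
    "Ke\u0301nna\u0301tsi\u0302deheads.gif",  # decomposed spelling (NFD)
    "K\u00e9nn\u00e1ts\u00eedeheads.gif",     # precomposed spelling (NFC)
    "other.gif",
]

for canonical, spellings in lookalike_groups(listing).items():
    print(canonical, "has", len(spellings), "byte-level spellings")
```

Comparing names on their NFC form is one way a client could avoid showing two "identical" files or creating a second copy.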

Roger

@cyberduck
Collaborator Author

60c4428 commented

Some more information on the connections to the 2 different servers.

1 - the connection to the Linux Server

220---------- Welcome to Pure-FTPd [privsep] [TLS] ----------
220-You are user number 6 of 50 allowed.
220-Local time is now 21:41. Server port: 21.
220-This is a private system - No anonymous login
220-IPv6 connections are also welcome on this server.
220 You will be disconnected after 15 minutes of inactivity.
AUTH TLS
234 AUTH TLS OK.
USER wmgsorg
331 User wmgsorg OK. Password required
PASS ********
230-User wmgsorg has group access to:  wmgsorg   
230 OK. Current restricted directory is /
PBSZ 0
200 PBSZ=0
PROT P
200 Data protection level set to "private"
FEAT
211-Extensions supported:
 EPRT
 IDLE
 MDTM
 SIZE
 REST STREAM
 MLST type*;size*;sizd*;modify*;UNIX.mode*;UNIX.uid*;UNIX.gid*;unique*;
 MLSD
 AUTH TLS
 PBSZ
 PROT
 ESTA
 PASV
 EPSV
 SPSV
 ESTP
211 End.
NOOP
200 Zzz...
SYST
215 UNIX Type: L8
PASV
227 Entering Passive Mode (64,38,39,4,135,51) 

2 - the connection to my Mac OS X Server

220---------- Welcome to Pure-FTPd [TLS] ----------
220-Local time is now 22:52. Server port: 21.
220-IPv6 connections are also welcome on this server.
220 You will be disconnected after 15 minutes of inactivity.
USER Roger
331 User Roger OK. Password required
PASS ********
230-User Roger has group access to:  12       61       80       98      
230- 100      204      305      20      
230 OK. Current directory is /
FEAT
211-Extensions supported:
 EPRT
 IDLE
 MDTM
 SIZE
 REST STREAM
 MLST type*;size*;sizd*;modify*;UNIX.mode*;UNIX.uid*;UNIX.gid*;unique*;
 MLSD
 ESTP
 PASV
 EPSV
 SPSV
 ESTA
 AUTH TLS
 PBSZ
 PROT
 UTF8
211 End.
OPTS UTF8 ON
200 OK, UTF-8 enabled
PWD
257 "/" is your current location
NOOP
200 Zzz...
SYST
215 UNIX Type: L8
PASV
227 Entering Passive Mode (66,93,200,62,232,203)
MLSD / 

The notable difference is that the Mac OS X server indicates that

OPTS UTF8 ON
200 OK, UTF-8 enabled

while the Linux server doesn't show this. Knowing why this might be is outside my range of knowledge/skill.
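The difference in the logs follows from the FEAT replies: the Mac OS X server lists `UTF8` among its extensions and the client then sends `OPTS UTF8 ON`, while the Linux server's list has no `UTF8` line. A minimal sketch of that decision, assuming a client that only sends `OPTS UTF8 ON` when FEAT advertises it (whether Cyberduck decides exactly this way is an assumption). The sample replies are trimmed from the logs above; with Python's `ftplib`, `FTP.sendcmd("FEAT")` returns such a multi-line reply as one newline-joined string.

```python
def feat_extensions(feat_reply):
    """Parse a multi-line 211 FEAT reply into a set of upper-cased feature keywords."""
    features = set()
    for line in feat_reply.splitlines():
        # Feature lines are indented by one space; "211-..." and "211 End." frame the list.
        if line.startswith(" "):
            keyword = line.strip().split(" ", 1)[0]
            features.add(keyword.upper())
    return features

linux_feat = "211-Extensions supported:\n EPRT\n MLSD\n AUTH TLS\n PASV\n211 End."
mac_feat = "211-Extensions supported:\n EPRT\n MLSD\n AUTH TLS\n PASV\n UTF8\n211 End."

for label, reply in (("Linux", linux_feat), ("Mac OS X", mac_feat)):
    if "UTF8" in feat_extensions(reply):
        print(label, "-> client would send: OPTS UTF8 ON")
    else:
        print(label, "-> no UTF8 feature advertised")
```

Note that `OPTS UTF8 ON` only governs how names are encoded on the control connection; it does not by itself choose between the NFC and NFD spellings discussed above.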

The Linux server seemed to get the filename entirely correct while the Mac OS X server had trouble when the file name was copied out of CyberDuck and pasted into TextWrangler.

I also add that since my earlier testing some months ago on my Mac, I've upgraded to Mac OS X 10.6 from 10.5 and had to reinstall PureFTPd using the PureFTPd Manager application.

Roger

@cyberduck
Collaborator Author

@dkocher commented

Replying to [comment:13 theKiwi]:

I've just now done as you asked - removed the custom property (how can I check that it truly was removed?),

Run the command a second time; it should then output "Defaults have not been changed."

@cyberduck
Collaborator Author

60c4428 commented

OK - I ran it again and got


[MacPro:~] roger% defaults delete ch.sudo.cyberduck path.normalize.unicode
2011-01-02 13:11:45.147 defaults[61874:903] 
There is no (path.normalize.unicode) default for the (ch.sudo.cyberduck) domain.
Defaults have not been changed.

Roger

@cyberduck
Collaborator Author

@dkocher commented

Replying to [comment:16 theKiwi]:

OK - I ran it again and got


[MacPro:~] roger% defaults delete ch.sudo.cyberduck path.normalize.unicode
2011-01-02 13:11:45.147 defaults[61874:903] 
There is no (path.normalize.unicode) default for the (ch.sudo.cyberduck) domain.
Defaults have not been changed.

Roger

The property was successfully removed then.

@iterate-ch iterate-ch locked as resolved and limited conversation to collaborators Nov 26, 2021