Opened on Sep 8, 2010 at 3:18:35 PM
Closed on Dec 19, 2010 at 6:40:44 PM
Last modified on Jan 25, 2011 at 2:08:50 PM
#5162 closed defect (fixed)
Unicode normalization
Reported by: | thekiwi | Owned by: | dkocher |
---|---|---|---|
Priority: | normal | Milestone: | 4.0 |
Component: | core | Version: | 3.6.1 |
Severity: | normal | Keywords: | UTF-8 |
Cc: | | Architecture: | Intel |
Platform: | Mac OS X 10.5 | | |
Description
I have been struggling for a while with uploading filenames with accented characters in them to an FTP site. The file name "seems" to be preserved on upload and it looks correct, but if I copy the filename out of CyberDuck after the upload has completed, and paste it into TextWrangler, it shows a red upside down question mark instead of the accented characters.
And if I retype that name in CyberDuck, I can then upload the file again from my Mac so that it then looks like there are 2 identically named files on the server.
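A minimal Python sketch of how two names can look identical yet be treated as different files. It assumes one name is stored in decomposed form (base letter plus combining accent) and the other in precomposed form; the sample name is the one that appears later in this ticket, and `same_name` is a hypothetical helper for illustration only.

```python
import unicodedata

# Same visible name, two different code point sequences.
precomposed = "K\u00e9nn\u00e1ts\u00eedeheads.gif"    # accented letters as single code points (NFC)
decomposed = "Ke\u0301nna\u0301tsi\u0302deheads.gif"  # base letter + combining accent (NFD)

def same_name(a: str, b: str) -> bool:
    """Compare two file names after bringing both to NFC."""
    return unicodedata.normalize("NFC", a) == unicodedata.normalize("NFC", b)

print(precomposed == decomposed)           # raw comparison of code points
print(len(precomposed), len(decomposed))   # the decomposed form is longer
print(same_name(precomposed, decomposed))  # comparison after normalization
```

A server that compares names byte for byte sees these as two distinct files, which would match the duplicate described above.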
For whatever it's worth, Captain FTP also seems to have this same problem, but FileZilla doesn't.
If you look at this directory listing
http://wmgs.org/tng_utf8/photos/
all the names are mangled (apparently an Apache thing). The 2 that end in .CD.2.gif point to different files, yet if you click them, the name that comes up in the Safari address bar is apparently the same, and both exist in the same directory on the server. One of these was uploaded with FileZilla, the other with CyberDuck.
This has come about because the files are used in PHP scripts for genealogy and there are problems reading the file names and writing them into the database for later retrieval. The files that cause issues with the PHP scripts are the ones whose names show the red upside down question mark when copied into TextWrangler. The file names "look" OK in phpMyAdmin but there's an issue somewhere.
CyberDuck is set to UTF-8 in the Preferences, and in the settings for that Bookmark. I've been told that the server is set to UTF-8 also.
If you need access to this server let me know and I can eMail the credentials to you.
Attachments (1)
Change History (18)
comment:1 follow-up: ↓ 2 Changed on Sep 9, 2010 at 2:32:35 PM by dkocher
Have you tried using a different character encoding such as ISO-8859-1?
Changed on Sep 9, 2010 at 6:11:04 PM by thekiwi
Image showing what TextWrangler thinks the file name is for the same file uploaded by CyberDuck and FileZilla
comment:2 in reply to: ↑ 1 Changed on Sep 9, 2010 at 6:14:40 PM by thekiwi
Replying to dkocher:
Have you tried using a different character encoding such as ISO-8859-1?
Yes, that makes things even worse. There is something about how CyberDuck (and Captain FTP) uploads files with accented characters when UTF-8 is chosen compared to how FileZilla uploads them. If you look in this directory
http://wmgs.org/tng_utf8/photos/FileZillaVsCyberDuck/
you see two files that apparently have the same name, but if you view the source code for the page, you see one file represented as
Ke%cc%81nna%cc%81tsi%cc%82deheads.gif <---- this one uploaded by CyberDuck
and the other as
K%c3%a9nn%c3%a1ts%c3%aedeheads.gif <---- this one uploaded by FileZilla
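The two encodings above can be reproduced with a short Python sketch. It assumes the difference is purely one of Unicode normalization form (NFD for the first name, NFC for the second); `percent_encoded` is a hypothetical helper, and `urllib.parse.quote` emits uppercase hex digits, which are equivalent to the lowercase ones in the page source.

```python
import unicodedata
from urllib.parse import quote

name = "K\u00e9nn\u00e1ts\u00eedeheads.gif"  # sample name from this ticket

def percent_encoded(text: str, form: str) -> str:
    """Normalize to the given Unicode form, then percent-encode the UTF-8 bytes."""
    return quote(unicodedata.normalize(form, text))

print(percent_encoded(name, "NFD"))  # decomposed: base letter followed by %CC%81 / %CC%82
print(percent_encoded(name, "NFC"))  # precomposed: %C3%A9, %C3%A1, %C3%AE
```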
If I view their names in CyberDuck and copy it out to TextWrangler and turn on "Show Invisibles" then I see what is in the attached image - the CyberDuck file shows the red upside down ? symbol in place of each accented character.
If I turn off the Apache option for showing UTF-8 Directory listings, then the CyberDuck file shows as
KeÌnnaÌtsiÌ‚deheads
and the FileZilla file shows as
Kénnátsîdeheads
I worked yesterday with the owner of Simply Hosting, where this site is, and he confirms that the server is running UTF-8. I am able to repeat the same results on my server, which is running Mac OS X 10.5.8 with PureFTPd as the FTP server - the file uploaded by CyberDuck isn't referenced correctly once it's saved using PHP. This can be seen here
http://wmgs.org/tng_utf8/browsemedia.php?mediatypeID=photos
For the bottom 2 files there, if you click the link for the FileZilla one you'll see the image, but if you click the link for the CyberDuck one you don't, yet the file name field below the image is the same in each case.
comment:3 follow-up: ↓ 4 Changed on Sep 9, 2010 at 7:57:48 PM by dkocher
Thanks for the detailed analysis! I certainly hope that will bring us closer to finding the cause of the issue. In the meantime, could you try the following? Open a Terminal.app window and paste
defaults write ch.sudo.cyberduck path.normalize.unicode true
Restart Cyberduck. Let me know if that makes any difference. I'll have a closer look next week.
comment:4 in reply to: ↑ 3 ; follow-ups: ↓ 10 ↓ 12 Changed on Sep 9, 2010 at 8:30:46 PM by thekiwi
Yes!!!!!!
That has made all the difference. That same file now uploaded by CyberDuck is recognised correctly by the PHP scripts and can be retrieved and displayed.
http://wmgs.org/tng_utf8/showmedia.php?mediaID=34
(compared to previously with CyberDuck)
http://wmgs.org/tng_utf8/showmedia.php?mediaID=32
where even though the filename on the page "seemed" correct, the image wasn't displaying
On the Apache index page here
http://wmgs.org/tng_utf8/photos/FileZillaVsCyberDuck/CyberDuckAsUnicode/
the View Source of that page shows the same encoding of the file name
K%c3%a9nn%c3%a1ts%c3%aedeheads
as resulted from the upload by FileZilla (see above).
It also fixed the same issue on my Mac OS X 10.5.8 server where now the image will display on the PHP pages calling it.
What is the significance of setting that preference to Unicode?
Thanks
Roger
comment:5 Changed on Sep 10, 2010 at 8:41:36 PM by dkocher
- Milestone set to 3.6.2
- Status changed from new to assigned
comment:6 Changed on Sep 10, 2010 at 8:53:58 PM by dkocher
Reference to #1965.
comment:7 Changed on Sep 10, 2010 at 8:54:40 PM by dkocher
Refer to UNICODE NORMALIZATION FORMS.
comment:8 Changed on Sep 18, 2010 at 2:16:40 PM by dkocher
- Summary changed from File Names not uploaded properly if contain accented characters to Unicode normalization
comment:9 Changed on Sep 20, 2010 at 3:55:15 PM by dkocher
- Milestone changed from 3.6.2 to 4.1
comment:10 in reply to: ↑ 4 Changed on Dec 19, 2010 at 6:08:30 PM by dkocher
Replying to thekiwi:
Yes!!!!!!
That has made all the difference. That same file now uploaded by CyberDuck is recognised correctly by the PHP scripts and can be retrieved and displayed.
Hope you are still here. Can you let me know from where (which volume or network mount) you were uploading the files in question?
comment:11 Changed on Dec 19, 2010 at 6:40:44 PM by dkocher
- Milestone changed from 4.1 to 4.0
- Resolution set to fixed
- Status changed from assigned to closed
In r8110, all paths from the local file system are NFC-normalized.
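As an illustration of the idea only (this is not the r8110 change itself, and the function names are hypothetical), NFC-normalizing names read from the local file system could look roughly like this in Python:

```python
import os
import unicodedata
from typing import List

def nfc(path: str) -> str:
    """Return the path in Unicode Normalization Form C (precomposed characters)."""
    return unicodedata.normalize("NFC", path)

def list_local_names(directory: str) -> List[str]:
    """List a local directory, NFC-normalizing each name before it is used for an upload."""
    return [nfc(entry) for entry in os.listdir(directory)]

# A decomposed name, as a file system might hand it back, becomes precomposed.
print(nfc("Ke\u0301nna\u0301tsi\u0302deheads.gif") == "K\u00e9nn\u00e1ts\u00eedeheads.gif")
```

Normalizing at the point where local names are read means every later step (display, comparison, upload) sees one consistent form. Mac OS X's HFS+ file system stores names in a decomposed form, which is presumably why local paths were the place to apply this.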
comment:12 in reply to: ↑ 4 Changed on Dec 20, 2010 at 1:46:48 PM by dkocher
Replying to thekiwi:
Yes!!!!!!
That has made all the difference. That same file now uploaded by CyberDuck is recognised correctly by the PHP scripts and can be retrieved and displayed.
Can you please let me know if the latest snapshot build still works once the custom property you set has been removed? To remove it, use
defaults delete ch.sudo.cyberduck path.normalize.unicode
Thanks for your reply.
comment:13 follow-up: ↓ 15 Changed on Jan 2, 2011 at 4:04:12 AM by theKiwi
David - sorry I hadn't noticed before now the couple of recent messages from you about this.
I've just now done as you asked - removed the custom property (how can I check that it truly was removed?), and installed the latest nightly build 4.0b9, and as far as I can remember from when this cropped up, the behaviour is working as one would expect - a file with accented characters in the file name seems to be correctly uploaded to a server, and can be found by the PHP scripts involved. And when the name is copied from CyberDuck and pasted into TextWrangler the name is as expected.
I tried it with 2 different servers - one at SimplyHosting.net running Linux, and my server on Mac OS X 10.6.5 with pureftpd 1.0.29 - the same computer I upload the files from.
As to the previous question - I'm not entirely sure what you're asking about "volume or network mount" - the files with the accented characters in the file name were on my Mac's startup disk - the same computer, and disk that has the server on it that the files are uploaded to.
Hope this all helps.
Edit about 30 minutes later with some extra information...
Actually I had forgotten to tell TextWrangler to "Show Invisibles" when I checked the file names as copied out of CyberDuck.
1 - for the Linux Server - it was all OK - the file name - Kénnátsîdeheads2.gif appeared as expected.
2 - for my Mac OS X Server, the file name copied from CyberDuck showed an upside down red question mark in each place where the accented characters were meant to be, but in the Finder the filename looked as expected, and in fact was identical to the filename that I uploaded, at least in as much as using the Finder to copy the file I uploaded from the source folder to the destination folder said the file already existed.
But the PHP scripts had detected the expected file name and written it into the MySQL database as expected - is this perhaps an issue with CyberDuck not displaying the name correctly?
Roger
comment:14 Changed on Jan 2, 2011 at 5:07:30 AM by theKiwi
Some more information on the connections to the 2 different servers.
1 - the connection to the Linux Server
220---------- Welcome to Pure-FTPd [privsep] [TLS] ----------
220-You are user number 6 of 50 allowed.
220-Local time is now 21:41. Server port: 21.
220-This is a private system - No anonymous login
220-IPv6 connections are also welcome on this server.
220 You will be disconnected after 15 minutes of inactivity.
AUTH TLS
234 AUTH TLS OK.
USER wmgsorg
331 User wmgsorg OK. Password required
PASS ********
230-User wmgsorg has group access to: wmgsorg
230 OK. Current restricted directory is /
PBSZ 0
200 PBSZ=0
PROT P
200 Data protection level set to "private"
FEAT
211-Extensions supported:
 EPRT
 IDLE
 MDTM
 SIZE
 REST STREAM
 MLST type*;size*;sizd*;modify*;UNIX.mode*;UNIX.uid*;UNIX.gid*;unique*;
 MLSD
 AUTH TLS
 PBSZ
 PROT
 ESTA
 PASV
 EPSV
 SPSV
 ESTP
211 End.
NOOP
200 Zzz...
SYST
215 UNIX Type: L8
PASV
227 Entering Passive Mode (64,38,39,4,135,51)
2 - the connection to my Mac OS X Server
220---------- Welcome to Pure-FTPd [TLS] ----------
220-Local time is now 22:52. Server port: 21.
220-IPv6 connections are also welcome on this server.
220 You will be disconnected after 15 minutes of inactivity.
USER Roger
331 User Roger OK. Password required
PASS ********
230-User Roger has group access to: 12 61 80 98
230- 100 204 305 20
230 OK. Current directory is /
FEAT
211-Extensions supported:
 EPRT
 IDLE
 MDTM
 SIZE
 REST STREAM
 MLST type*;size*;sizd*;modify*;UNIX.mode*;UNIX.uid*;UNIX.gid*;unique*;
 MLSD
 ESTP
 PASV
 EPSV
 SPSV
 ESTA
 AUTH TLS
 PBSZ
 PROT
 UTF8
211 End.
OPTS UTF8 ON
200 OK, UTF-8 enabled
PWD
257 "/" is your current location
NOOP
200 Zzz...
SYST
215 UNIX Type: L8
PASV
227 Entering Passive Mode (66,93,200,62,232,203)
MLSD /
The notable difference is that the Mac OS X server indicates that
OPTS UTF8 ON 200 OK, UTF-8 enabled
while the Linux server doesn't show this. Knowing why this might be is outside my range of knowledge/skill.
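For anyone comparing servers the same way, a small Python sketch that checks whether a FEAT reply lists the UTF8 feature (defined in RFC 2640). The function name and the abbreviated sample replies are illustrative assumptions based on the two logs above, not part of Cyberduck.

```python
def advertises_utf8(feat_reply: str) -> bool:
    """Return True if a multi-line FEAT reply lists the UTF8 feature."""
    for line in feat_reply.splitlines():
        if line[:3].isdigit():
            # Skip the "211-Extensions supported:" and "211 End." framing lines.
            continue
        if line.strip().upper() == "UTF8":
            return True
    return False

# Abbreviated versions of the two FEAT replies shown in the logs above.
linux_feat = "211-Extensions supported:\n EPRT\n MLSD\n AUTH TLS\n211 End."
mac_feat = "211-Extensions supported:\n EPRT\n MLSD\n PROT\n UTF8\n211 End."

print(advertises_utf8(linux_feat))
print(advertises_utf8(mac_feat))
```

With Python's standard `ftplib`, the reply text could be obtained from `FTP.sendcmd("FEAT")`. Note that a server not advertising UTF8 may still store UTF-8 file names; the feature only signals that the server announces support for it.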
The Linux server seemed to get the filename entirely correct while the Mac OS X server had trouble when the file name was copied out of CyberDuck and pasted into TextWrangler.
I also add that since my earlier testing some months ago on my Mac, I've upgraded to Mac OS X 10.6 from 10.5 and had to reinstall PureFTPd using the PureFTPd Manager application.
Roger
comment:15 in reply to: ↑ 13 Changed on Jan 2, 2011 at 5:33:24 PM by dkocher
Replying to theKiwi:
I've just now done as you asked - removed the custom property (how can I check that it truly was removed?),
Run the command twice, and the second time it should output "Defaults have not been changed."
comment:16 follow-up: ↓ 17 Changed on Jan 2, 2011 at 6:13:54 PM by theKiwi
OK - I ran it again and got
[MacPro:~] roger% defaults delete ch.sudo.cyberduck path.normalize.unicode
2011-01-02 13:11:45.147 defaults[61874:903] There is no (path.normalize.unicode) default for the (ch.sudo.cyberduck) domain.
Defaults have not been changed.
Roger
comment:17 in reply to: ↑ 16 Changed on Jan 25, 2011 at 2:08:50 PM by dkocher
Replying to theKiwi:
OK - I ran it again and got
[MacPro:~] roger% defaults delete ch.sudo.cyberduck path.normalize.unicode
2011-01-02 13:11:45.147 defaults[61874:903] There is no (path.normalize.unicode) default for the (ch.sudo.cyberduck) domain.
Defaults have not been changed.
Roger
The property was successfully removed then.