Hi, @jhillyerd
I’m encountering an issue where attachment filenames with non-ASCII characters are being parsed incorrectly. When the filename contains multibyte characters (Japanese, Korean, etc.) combined with a double quote ("), the parsed filename includes unwanted metadata appended to it.
Note: The filenames in question are coming from email attachments.
Examples
| Input filename | Parsed result |
| --- | --- |
| `テスト".pdf` | `テスト".pdf"; size=8759; creation-date="Tue, 02 Dec 2025 05:55:00 GMT"; modification-date="Tue, 02 Dec 2025 05:55:00 GMT` |
| `금지문자".pdf` | `금지문자".pdf"; size=8759; creation-date="Tue, 02 Dec 2025 05:54:59 GMT"; modification-date="Tue, 02 Dec 2025 05:54:59 GMT` |

- Filenames containing only ASCII characters are parsed correctly as-is, even when they include a double quote ("): `filename".pdf` → `filename".pdf`
Expected behavior
Filenames containing non-ASCII characters (e.g., Japanese, Korean) should also be parsed exactly as they appear in the attachment filename, without extra metadata appended.
Notes
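The issue appears only when the attachment filename contains non-ASCII characters.
It seems the parser may fail to detect the end of the filename when multibyte characters appear before a quotation mark. In the raw email below, the filename is sent as an RFC 2047 encoded-word (`=?utf-8?B?...?=`) inside the quoted `filename` parameter, and that encoded-word decodes to text that itself contains the `"`.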
Raw email
```
Content-Type: application;
 name="=?utf-8?B?4YSA4YWz4Ya34YSM4YW14YSG4YWu4Yar4YSM4YWhIi5wZGY=?="
Content-Description:
 =?utf-8?B?4YSA4YWz4Ya34YSM4YW14YSG4YWu4Yar4YSM4YWhIi5wZGY=?=
Content-Disposition: attachment;
 filename="=?utf-8?B?4YSA4YWz4Ya34YSM4YW14YSG4YWu4Yar4YSM4YWhIi5wZGY=?=";
 size=8759; creation-date="Tue, 02 Dec 2025 04:34:59 GMT";
 modification-date="Tue, 02 Dec 2025 04:34:59 GMT"
Content-Transfer-Encoding: base64
```
Thanks!