So I believe I figured out my issue. Although AWS Redshift reported the error as Missing newline: Unexpected character 0x20 found at location 226, after converting the string to a byte string I found that the misencoded character was actually \x00 (the NUL character). That explains why myDict[str(k)] = re.sub(r'[^\x00-\x7F]+',' ', myDict[str(k)]) didn't filter it out: the pattern only replaces characters outside the ASCII range, and \x00 is inside it. Instead, I added another try/except block in which I replace \x00 with an empty string, like so: myDict[str(k)] = re.sub('\x00', '', myDict[str(k)])
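Here is a minimal sketch of why the original filter misses NUL and how the extra step removes it. The helper names (ascii_only, clean_value, clean_row) and the sample string are hypothetical and only serve as illustration. The row dictionary is assumed to map keys to string values, as in the myDict[str(k)] loop above. It uses str.replace instead of re.sub for the NUL step because the pattern is a fixed string; both give the same result here.

```python
import re

# The original filter: replace each run of non-ASCII characters with one space.
NON_ASCII = re.compile(r'[^\x00-\x7F]+')


def ascii_only(value: str) -> str:
    # \x00 lies inside the \x00-\x7F range, so this leaves NUL characters in place.
    return NON_ASCII.sub(' ', value)


def clean_value(value: str) -> str:
    # Apply the ASCII filter, then strip NUL explicitly (the fix from the answer).
    return ascii_only(value).replace('\x00', '')


def clean_row(row: dict) -> dict:
    # Hypothetical equivalent of the myDict[str(k)] = ... loop, building a new dict.
    return {str(k): clean_value(v) for k, v in row.items()}


if __name__ == "__main__":
    sample = "caf\u00e9\x00 bar"       # made-up value: one non-ASCII char, then a NUL
    print(repr(ascii_only(sample)))   # 'caf \x00 bar'  (NUL survives the filter)
    print(repr(clean_value(sample)))  # 'caf  bar'
```

Stripping NUL in a separate step keeps the original non-ASCII filter unchanged and makes the second, narrower rule easy to see.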

My .csv files no longer contain the replacement character, so I believe the issue is resolved. It's odd that AWS reported the character as 0x20 when it was actually \x00, but I'm unsure whether that's a bug on their end or whether I'm misunderstanding character encoding. Thank you to everyone who commented with suggestions; I was only able to figure it out through your guidance. I know it's a bit anticlimactic to answer my own question, so if this goes against StackOverflow guidelines, feel free to close this question. Thanks.

Answered 2019-05-07T19:29:20.777