1 Answer
I believe I figured out my issue. Although AWS Redshift reported the error as `Missing newline: Unexpected character 0x20 found at location 226`, after converting the string to a byte string I found that the actual value of the misencoded character was `\x00`. Now it makes sense why `myDict[str(k)] = re.sub(r'[^\x00-\x7F]+',' ', myDict[str(k)])` wouldn't filter out the character: `\x00` falls inside the range `\x00-\x7F`, which the negated character class leaves untouched. I instead added another try/except block, where I now replace `\x00` with an empty string, like so: `myDict[str(k)] = re.sub('\x00', '', myDict[str(k)])`
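For anyone hitting the same thing, here is a minimal sketch of why the first regex misses the NUL byte and how the second step removes it. The sample string and the helper names `strip_non_ascii` and `strip_nul` are made up for illustration; my real values came from `myDict`. I use `str.replace` here instead of `re.sub`, which should behave the same for a fixed single character.

```python
import re

# Hypothetical sample value containing a non-ASCII character and a NUL byte.
raw = "caf\u00e9 latte\x00 large"

# Converting to bytes makes the hidden \x00 visible when printed.
print(raw.encode("utf-8"))

# Matches runs of characters OUTSIDE \x00-\x7F, so \x00 itself never matches.
NON_ASCII = re.compile(r"[^\x00-\x7F]+")


def strip_non_ascii(value: str) -> str:
    """Replace each run of non-ASCII characters with a single space."""
    return NON_ASCII.sub(" ", value)


def strip_nul(value: str) -> str:
    """Remove NUL characters, which the ASCII filter above lets through."""
    return value.replace("\x00", "")


after_ascii_filter = strip_non_ascii(raw)
cleaned = strip_nul(after_ascii_filter)

print("\x00" in after_ascii_filter)  # the NUL survives the first filter
print("\x00" in cleaned)             # and is gone after the second step
```

If you would rather do it in one pass, changing the character class to start at `\x01` might also work, but I have not tested that against my own data.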
My .csv files no longer contain that character, so I believe the issue is resolved. It's odd that AWS reported the character as 0x20 when it was actually \x00; I'm unsure whether that's a bug on their end or whether I'm misunderstanding character encoding. Thank you to everyone who commented with suggestions, as I was only able to figure it out through your guidance. I know it's a bit anticlimactic for me to answer my own question, so if this goes against StackOverflow guidelines, feel free to close this question. Thanks.