I currently have functions in Postgres and Redshift that take a randomly generated string, hash it, and then use part of the hash to generate a random number between 0 and 99. I am trying to replicate this functionality in Azure SQL Data Warehouse so that I get the same value in SQL DW as I do in Postgres and Redshift.

The issue I'm running into is that when I cast the result to a VARCHAR or apply a string function, I get a very different string. I'd like to get the result of the md5 function as an identical VARCHAR.

To illustrate, here is a query in Azure SQL DW:

SELECT
  'abc123' as random_string,
  HASHBYTES('md5', 'abc123') as md5,
  CAST(HASHBYTES('md5', 'abc123') AS VARCHAR) as md5_varchar,
  RIGHT(HASHBYTES('md5', 'abc123'), 5) as md5_right
;

This yields

random_string,md5,md5_varchar,md5_right
abc123,0xE99A18C428CB38D5F260853678922E03,éšÄ(Ë8Õò`…6x’.,6x’.

As you can see, the resulting varchar is very different from the output of the md5 function. Is there a way to convert the result of md5 into an identical string?

In Postgres and Redshift the result of the md5 function is a VARCHAR so it is simple to do transformations on it.

Here are the queries in Redshift and Postgres:

-- Redshift
SELECT
  'abc123' as random_string,
  right(strtol(right(md5('abc123'), 3), 16), 2)::INT as tranche
;

-- Postgres
SELECT
  'abc123' as random_string,
  right(('x' || lpad(right(md5('abc123'), 3), 4, '0')) :: BIT(16) :: INT :: VARCHAR, 2) :: INT AS tranche
;

Both functions return the value 87.

1 Answer

Using CONVERT should solve this:

CONVERT(VARCHAR(32),HashBytes('MD5', 'abc123'),2)

This works because CONVERT accepts a style parameter that controls how a varbinary value is converted. It is described here: https://technet.microsoft.com/pl-pl/library/ms187928(v=sql.105).aspx

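Here is the remarks section of that documentation on binary conversion with CONVERT:

Binary styles: When expression is binary(n), varbinary(n), char(n), or varchar(n), style can be one of the values shown in the following table. Style values that are not listed in the table return an error.

0 (default)

Translates ASCII characters to binary bytes or binary bytes to ASCII characters. Each character or byte is converted 1:1. If the data_type is a binary type, the characters 0x are added to the left of the result.

1, 2

If the data_type is a binary type, the expression must be a character expression. The expression must be composed of an even number of hexadecimal digits (0, 1, 2, 3, 4, 5, 6, 7, 8, 9, A, B, C, D, E, F, a, b, c, d, e, f). If the style is set to 1, the characters 0x must be the first two characters in the expression. If the expression contains an odd number of characters or if any of the characters are invalid, an error is raised. If the length of the converted expression is greater than the length of the data_type, the result will be right truncated. Fixed length data_types that are larger than the converted result will have zeros added to the right of the result. If the data_type is a character type, the expression must be a binary expression. Each binary character is converted into two hexadecimal characters. If the length of the converted expression is greater than the data_type length, it will be right truncated. If the data_type is a fixed size character type and the length of the converted result is less than the length of the data_type, spaces are added to the right of the converted expression to maintain an even number of hexadecimal digits. The characters 0x will be added to the left of the converted result for style 1.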
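
Putting the two directions of style 2 together, the tranche calculation from the question could be sketched roughly as follows. This is a sketch rather than a tested solution: it assumes Azure SQL DW honors binary style 2 the same way the SQL Server documentation above describes, and the column aliases are only illustrative.

```sql
-- Sketch only: assumes CONVERT binary style 2 behaves in Azure SQL DW
-- as documented for SQL Server.
SELECT
  'abc123' AS random_string,
  -- 32-character hex string without the 0x prefix; LOWER matches the
  -- lowercase output of md5() in Postgres and Redshift
  LOWER(CONVERT(VARCHAR(32), HASHBYTES('MD5', 'abc123'), 2)) AS md5_varchar,
  -- Take the last 3 hex digits, left-pad with '0' to an even length,
  -- parse the hex text back to binary (style 2), read it as an integer,
  -- and keep the last two decimal digits via modulo 100
  CONVERT(INT,
    CONVERT(VARBINARY(2),
      '0' + RIGHT(CONVERT(VARCHAR(32), HASHBYTES('MD5', 'abc123'), 2), 3),
      2)
  ) % 100 AS tranche
;
```

For 'abc123' the last three hex digits are E03, which is 3587 in decimal, so the tranche should come out as 87, matching the Postgres and Redshift queries in the question. Note that HASHBYTES hashes the raw bytes of its input, so passing an NVARCHAR value (for example N'abc123') produces a different hash than the VARCHAR literal used here.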

Answered 2017-11-11T21:08:20.560