I have two datasets in Foundry, df1 and df2. df1 contains data and has a schema.

df2 is an empty dataframe with no schema applied.

Using the data proxy, I was able to extract the schema from df1:

{
  "foundrySchema": {
    "fieldSchemaList": [
      {...

 }
    ],
    "primaryKey": null,
    "dataFrameReaderClass": "n/a",
    "customMetadata": {}
  },
  "rows": []
}

How can I apply this schema to the empty dataframe df2 with a REST call?

The Foundry example below shows how to commit an empty transaction, but it does not show how to apply a schema:

curl -X POST \
  -H "Authorization: Bearer ${TOKEN}" \
  -H "Content-Type: application/json" \
  -d '{}' \
  "${CATALOG_URL}/api/catalog/datasets/${DATASET_RID}/transactions/${TRANSACTION_RID}/commit"
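
For comparison, the same commit request could be built in Python roughly as follows. This is only a sketch using the standard library: `build_commit_request` and its parameter names are hypothetical, and the URL and headers simply mirror the curl command above.

```python
import json
import urllib.request


def build_commit_request(catalog_url: str, dataset_rid: str,
                         transaction_rid: str, token: str) -> urllib.request.Request:
    """Build (but do not send) the POST request from the curl example."""
    url = (f"{catalog_url}/api/catalog/datasets/{dataset_rid}"
           f"/transactions/{transaction_rid}/commit")
    return urllib.request.Request(
        url,
        data=json.dumps({}).encode("utf-8"),  # empty JSON body, as in -d '{}'
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


# Sending it would be: urllib.request.urlopen(build_commit_request(...))
```

Building the request separately from sending it makes it easy to inspect the URL and headers before anything reaches the server.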

1 Answer

Here is a Python function that uploads a schema for a dataset that has a committed transaction:

from urllib.parse import quote_plus
import requests


def upload_dataset_schema(dataset_rid: str,
                          transaction_rid: str, schema: dict, token: str, branch='master'):
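    """
    Uploads the foundry dataset schema for a dataset, transaction, branch combination
    Args:
        dataset_rid: The rid of the dataset
        transaction_rid: The rid of the (committed) transaction
        schema: The foundry schema
        token: The bearer token used for authorization
        branch: The branch

    Returns: None

    Raises: requests.HTTPError if the server responds with an error status

    """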
    base_url = "https://foundry-instance/foundry-metadata/api"
    response = requests.post(f"{base_url}/schemas/datasets/"
                             f"{dataset_rid}/branches/{quote_plus(branch)}",
                             params={'endTransactionRid': transaction_rid},
                             json=schema,
                             headers={
                                 'content-type': "application/json",
                                 'authorization': f"Bearer {token}",
                             }
                             )
    response.raise_for_status()
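
To tie this back to the question: the `schema` argument is presumably the `foundrySchema` object from the data proxy response rather than the whole response. That is an assumption, since the answer does not say so explicitly. A minimal sketch of pulling it out, where `extract_foundry_schema` is a hypothetical helper and the variable names in the usage comment are placeholders:

```python
def extract_foundry_schema(proxy_response: dict) -> dict:
    """Return the 'foundrySchema' part of a data proxy response.

    Assumes the response has the shape shown in the question:
    {"foundrySchema": {...}, "rows": [...]}.
    """
    try:
        return proxy_response["foundrySchema"]
    except KeyError:
        raise ValueError("response has no 'foundrySchema' key") from None


# Hypothetical usage with the function above (placeholders, not real values):
# schema = extract_foundry_schema(proxy_response_for_df1)
# upload_dataset_schema(df2_dataset_rid, df2_transaction_rid, schema, token)
```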
Answered 2021-05-04T12:56:27.393