---
title: Hashing - Guide - Apache DataFu Pig
version: 1.6.0
section_name: Apache DataFu Pig - Guide
license: >
  Licensed to the Apache Software Foundation (ASF) under one or more
  contributor license agreements. See the NOTICE file distributed with
  this work for additional information regarding copyright ownership.
  The ASF licenses this file to You under the Apache License, Version 2.0
  (the "License"); you may not use this file except in compliance with
  the License. You may obtain a copy of the License at

      http://www.apache.org/licenses/LICENSE-2.0

  Unless required by applicable law or agreed to in writing, software
  distributed under the License is distributed on an "AS IS" BASIS,
  WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
  See the License for the specific language governing permissions and
  limitations under the License.
---

## Hashing

### MD5

The [MD5 hash](http://en.wikipedia.org/wiki/MD5) of a string can be computed with the
[MD5](https://datafu.apache.org/docs/datafu/<%= current_page.data.version %>/datafu/pig/hash/MD5.html)
UDF.

For example:

```pig
define MD5 datafu.pig.hash.MD5();

-- input: "hello world!"
data_in = LOAD 'input' as (val:chararray);
data_out = FOREACH data_in GENERATE MD5(val) as val;

-- produces: (fc3ff98e8c6a0d3087d515c0473f8677)
DUMP data_out;
```

To output the digest as base64 instead, pass `'base64'` to the constructor.
The default is `'hex'`, which produces a hexadecimal string.

```pig
define MD5 datafu.pig.hash.MD5('base64');
```
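
As a rough sketch of how the two encodings might be used side by side, the script below defines the UDF twice under different aliases. The aliases `MD5Hex` and `MD5Base64` and the output field names are illustrative choices, not part of DataFu, and the script assumes the DataFu jar has already been registered.

```pig
-- Assumes the DataFu Pig jar is already registered in this session.
-- MD5Hex and MD5Base64 are illustrative alias names.
define MD5Hex datafu.pig.hash.MD5();            -- default 'hex' encoding
define MD5Base64 datafu.pig.hash.MD5('base64'); -- base64 encoding

data_in = LOAD 'input' as (val:chararray);

-- Emit the original value alongside both encodings of its MD5 digest
data_out = FOREACH data_in GENERATE
  val,
  MD5Hex(val) as md5_hex,
  MD5Base64(val) as md5_base64;

DUMP data_out;
```

Both columns encode the same 16-byte digest, so the base64 form is more compact than the 32-character hexadecimal form.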

### SHA

A [SHA](http://en.wikipedia.org/wiki/Secure_Hash_Algorithm) hash can be computed with the
[SHA](https://datafu.apache.org/docs/datafu/<%= current_page.data.version %>/datafu/pig/hash/SHA.html)
UDF. The output is a hexadecimal string.

```pig
define SHA datafu.pig.hash.SHA();

-- input: "hello world!"
data_in = LOAD 'input' as (val:chararray);
data_out = FOREACH data_in GENERATE SHA(val) as val;

-- produces: (7509e5bda0c762d2bac7f90d758b5b2263fa01ccbc542ab5e3df163be08e6ca9)
DUMP data_out;
```

By default, the UDF uses SHA-256.
The constructor takes an optional parameter that selects the SHA algorithm.
For example, to use SHA-512 instead:

```pig
define SHA512 datafu.pig.hash.SHA('512');
```
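
A minimal sketch of computing both digests in one script follows. The aliases `SHA256` and `SHA512` and the output field names are illustrative, and the script assumes the DataFu jar has already been registered.

```pig
-- Assumes the DataFu Pig jar is already registered in this session.
-- SHA256 and SHA512 are illustrative alias names.
define SHA256 datafu.pig.hash.SHA();      -- default algorithm (SHA-256)
define SHA512 datafu.pig.hash.SHA('512'); -- SHA-512

data_in = LOAD 'input' as (val:chararray);

-- Emit both digests of the same input value
data_out = FOREACH data_in GENERATE
  SHA256(val) as sha256,
  SHA512(val) as sha512;

DUMP data_out;
```

A SHA-512 digest is 512 bits long, so its hexadecimal form has 128 characters, twice the 64 characters of a SHA-256 digest.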