Java UDF to convert String to date in PIG

About Code

Data often arrives from several source systems, each using its own date format, while the output has to follow one specific format.
Suppose you receive date strings such as the following (the first is read as month-day-year):
12-01-2018 12:22:33
2018/12/01 12:22:33
20181201 12:22:33

and you want all of them to come out as 2018-12-01 12:22:33. In that case you can use the UDF below, which is written in Java for use in Pig scripts.

Build the class into a jar, register the jar in your Pig script, and call the UDF like this:

com.transformation.udf.DateConversion(input_date_col, 'yyyy-MM-dd HH:mm:ss')

Even if input_date_col contains a mix of formats, every value that matches one of the supported patterns is converted to the requested format, here 'yyyy-MM-dd HH:mm:ss'. Values that match none of the patterns come back as null.
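
As a rough illustration of the idea, the sketch below tries a short list of candidate patterns in order and reformats the first one that parses. It is a standalone simplification, not the UDF itself: the class name FormatFallbackSketch, the method normalize, and the three-pattern list are assumptions chosen to cover only the sample inputs above.

```java
import java.text.ParseException;
import java.text.SimpleDateFormat;
import java.util.Date;
import java.util.Locale;

// Hypothetical standalone sketch (not the UDF): try each candidate pattern
// in order and reformat the first successful parse.
class FormatFallbackSketch {

    // Assumed pattern list; it covers only the three sample layouts.
    private static final String[] INPUT_PATTERNS = {
        "MM-dd-yyyy HH:mm:ss",
        "yyyy/MM/dd HH:mm:ss",
        "yyyyMMdd HH:mm:ss"
    };

    // Returns the reformatted date, or null when no pattern matches.
    static String normalize(String input, String outputPattern) {
        for (String pattern : INPUT_PATTERNS) {
            SimpleDateFormat parser = new SimpleDateFormat(pattern, Locale.ENGLISH);
            // Reject out-of-range values instead of rolling them over.
            parser.setLenient(false);
            try {
                Date parsed = parser.parse(input);
                return new SimpleDateFormat(outputPattern, Locale.ENGLISH).format(parsed);
            } catch (ParseException e) {
                // This pattern does not match; fall through to the next one.
            }
        }
        return null;
    }

    public static void main(String[] args) {
        String[] samples = {
            "12-01-2018 12:22:33",
            "2018/12/01 12:22:33",
            "20181201 12:22:33"
        };
        for (String sample : samples) {
            System.out.println(sample + " -> " + normalize(sample, "yyyy-MM-dd HH:mm:ss"));
        }
    }
}
```

Pattern order matters in this approach: SimpleDateFormat.parse(String) accepts an input as soon as its leading part matches the pattern, so more specific patterns should be listed before shorter ones.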

The complete code is below:

package com.transformation.udf;

import java.text.ParseException;
import java.text.SimpleDateFormat;
import java.util.Date;

import org.apache.pig.EvalFunc;
import org.apache.pig.backend.executionengine.ExecException;
import org.apache.pig.data.Tuple;

/**
 * Pig UDF that parses a date string in one of several known layouts and
 * re-formats it. Arguments: (input date string, output date pattern).
 */
public class DateConversion extends EvalFunc<String> {

    @Override
    public String exec(Tuple input) throws ExecException {
        // Both arguments are required; return null for a malformed call.
        if (input == null || input.size() < 2) {
            return null;
        }
        try {
            return dateConvertor((String) input.get(0), (String) input.get(1));
        } catch (Exception e) {
            // A bad record should not fail the whole job: log it and emit null.
            e.printStackTrace();
            return null;
        }
    }

    public static String dateConvertor(String inputDate, String outputDateFormat) throws Exception {
        if (inputDate == null || inputDate.isEmpty()) {
            return null;
        }
        // Normalise '-' and '/' separators to '.' so one pattern set covers all three.
        inputDate = inputDate.replace('-', '.').replace('/', '.');

        SimpleDateFormat[] formats;
        // A separator at index 4 means the string starts with a four-digit year.
        // The length check avoids an out-of-bounds charAt on very short input.
        if (inputDate.length() > 4 && inputDate.charAt(4) == '.') {
            formats = new SimpleDateFormat[] {
                new SimpleDateFormat("yyyy.MM.dd hh:mm:ss.SSS aa"),
                new SimpleDateFormat("yyyy.MM.dd HH:mm:ss.SSS"),
                new SimpleDateFormat("yyyy.MM.dd hh:mm:ss aa"),
                new SimpleDateFormat("yyyy.MM.dd HH:mm:ss"),
                new SimpleDateFormat("yyyy.MM.dd hh:mm aa"),
                new SimpleDateFormat("yyyy.MM.dd HH:mm"),
                new SimpleDateFormat("yyyy.MM.dd hh aa"),
                new SimpleDateFormat("yyyy.MM.dd HH"),
                new SimpleDateFormat("yyyy.MM.dd") };
        } else {
            // Otherwise assume a compact yyyyMMdd layout or a month-first layout.
            formats = new SimpleDateFormat[] {
                new SimpleDateFormat("yyyyMMddHHmmss"),
                // Needed for inputs such as "20181201 12:22:33"; without it the
                // plain "yyyyMMdd" pattern matches the prefix and drops the time.
                new SimpleDateFormat("yyyyMMdd HH:mm:ss"),
                new SimpleDateFormat("MM.dd.yyyy hh:mm:ss.SSS aa"),
                new SimpleDateFormat("MM.dd.yyyy HH:mm:ss.SSS"),
                new SimpleDateFormat("MM.dd.yyyy hh:mm:ss aa"),
                new SimpleDateFormat("MM.dd.yyyy HH:mm:ss"),
                new SimpleDateFormat("MM.dd.yyyy hh:mm aa"),
                new SimpleDateFormat("MM.dd.yyyy HH:mm"),
                new SimpleDateFormat("MM.dd.yyyy hh aa"),
                new SimpleDateFormat("MM.dd.yyyy HH"),
                new SimpleDateFormat("MM.dd.yyyy"),
                new SimpleDateFormat("yyyyMMdd") };
        }
        return dateGenerator(formats, inputDate, outputDateFormat);
    }

    public static String dateGenerator(SimpleDateFormat[] formats, String inputDate,
            String outputDateFormat) throws Exception {
        // Drop anything after the seconds field (for example milliseconds) but keep
        // a trailing AM/PM marker. This assumes zero-padded fields, so that date
        // and time occupy exactly the first 19 characters.
        if (inputDate.length() > 19) {
            if (inputDate.contains(" AM")) {
                inputDate = inputDate.substring(0, 19) + " AM";
            } else if (inputDate.contains(" PM")) {
                inputDate = inputDate.substring(0, 19) + " PM";
            } else if (!inputDate.contains("AM") && !inputDate.contains("PM")) {
                inputDate = inputDate.substring(0, 19);
            }
        }
        // Try each pattern in order; the first one that parses wins.
        for (SimpleDateFormat format : formats) {
            try {
                Date parsedDate = format.parse(inputDate);
                return new SimpleDateFormat(outputDateFormat).format(parsedDate);
            } catch (ParseException e) {
                // Pattern did not match; try the next one.
            }
        }
        // No pattern matched.
        return null;
    }
}
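
One caveat with the listing above: SimpleDateFormat parses leniently by default, so an impossible value such as month 13 is rolled over into a later date rather than rejected. The sketch below compares the two modes. The class name LenientParsingSketch and its reformat helper are hypothetical, and calling setLenient(false) is a suggested variation, not part of the original UDF.

```java
import java.text.ParseException;
import java.text.SimpleDateFormat;
import java.util.Locale;

// Hypothetical sketch: compares lenient (the default) and strict parsing.
class LenientParsingSketch {

    // Returns the reformatted date, or null when the input does not parse.
    static String reformat(String input, boolean lenient) {
        SimpleDateFormat parser = new SimpleDateFormat("MM.dd.yyyy", Locale.ENGLISH);
        parser.setLenient(lenient);
        try {
            return new SimpleDateFormat("yyyy-MM-dd", Locale.ENGLISH).format(parser.parse(input));
        } catch (ParseException e) {
            return null;
        }
    }

    public static void main(String[] args) {
        String bad = "13.45.2018"; // month 13 and day 45 do not exist
        System.out.println("lenient: " + reformat(bad, true));  // a rolled-over date
        System.out.println("strict:  " + reformat(bad, false)); // null
    }
}
```

If silently shifted dates are worse for your pipeline than null values, calling setLenient(false) on each SimpleDateFormat in the pattern arrays is one way to make the UDF return null for such records.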
