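In Hive, an ObjectInspector acts as a kind of data type inspector: it tells Hive what type a value has. The idea is loosely comparable to Hadoop having its own typed wrappers such as Text and IntWritable.
Hive treats a complex object as a pair: an ObjectInspector plus a Java Object. The ObjectInspector not only describes the structure of the object, it also provides the methods for accessing the object's internal fields. An ObjectInspector records only type information, which it can return directly. Its accessor methods take an Object as a parameter. In other words, the inspector stores no data itself; it interprets whatever object is passed in according to the type it describes and returns a typed value.
Hive data types fall into the following categories:
PRIMITIVE, LIST, MAP, STRUCT, UNION;
This can be seen in the source code: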
public interface ObjectInspector extends Cloneable {
    String getTypeName();

    ObjectInspector.Category getCategory();

    public static enum Category {
        PRIMITIVE, // Hive primitive types
        LIST,      // list (array) type
        MAP,       // map type
        STRUCT,    // struct type
        UNION;     // union type

        private Category() {
        }
    }
}
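To make the "inspector plus object" pairing concrete, here is a minimal sketch. It does not use Hive's classes: `Inspector`, `StringInspector`, `ListInspector` and `render` are hypothetical stand-ins that only borrow the method and constant names shown above, so the example can run with nothing but the JDK. The point it illustrates is that the inspector holds no data and that callers dispatch on `getCategory()` to decide how to read the object they were handed.

```java
import java.util.List;

public class InspectorPairSketch {
    // Stand-in for ObjectInspector.Category (same constant names as above).
    public enum Category { PRIMITIVE, LIST, MAP, STRUCT, UNION }

    // Stand-in for ObjectInspector: describes a type, stores no data.
    public interface Inspector {
        String getTypeName();
        Category getCategory();
    }

    // Hypothetical primitive inspector for strings.
    public static final class StringInspector implements Inspector {
        public String getTypeName() { return "string"; }
        public Category getCategory() { return Category.PRIMITIVE; }
    }

    // Hypothetical list inspector; the data object is a java.util.List.
    public static final class ListInspector implements Inspector {
        private final Inspector element;
        public ListInspector(Inspector element) { this.element = element; }
        public String getTypeName() { return "array<" + element.getTypeName() + ">"; }
        public Category getCategory() { return Category.LIST; }
        public Inspector getListElementObjectInspector() { return element; }
        public int getListLength(Object data) {
            return data == null ? -1 : ((List<?>) data).size();
        }
        public Object getListElement(Object data, int index) {
            if (data == null) return null;
            List<?> list = (List<?>) data;
            return index >= 0 && index < list.size() ? list.get(index) : null;
        }
    }

    // Render any (inspector, object) pair by dispatching on the category.
    public static String render(Inspector oi, Object data) {
        switch (oi.getCategory()) {
            case PRIMITIVE:
                return String.valueOf(data);
            case LIST: {
                ListInspector loi = (ListInspector) oi;
                StringBuilder sb = new StringBuilder("[");
                for (int i = 0; i < loi.getListLength(data); i++) {
                    if (i > 0) sb.append(",");
                    sb.append(render(loi.getListElementObjectInspector(),
                                     loi.getListElement(data, i)));
                }
                return sb.append("]").toString();
            }
            default:
                throw new UnsupportedOperationException(oi.getTypeName());
        }
    }
}
```

The same `ListInspector` instance can be reused for every row: only the `data` argument changes from call to call.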
PrimitiveObjectInspector
PrimitiveCategory:
VOID,
BOOLEAN,
BYTE,
SHORT,
INT,
LONG,
FLOAT,
DOUBLE,
STRING,
DATE,
TIMESTAMP,
BINARY,
DECIMAL,
VARCHAR,
CHAR,
INTERVAL_YEAR_MONTH,
INTERVAL_DAY_TIME,
UNKNOWN;
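There are a few main aspects to understand.
The main interfaces and the most commonly used implementation classes are as follows:
StandardListObjectInspector
Note that the getList method of the ListObjectInspector interface can also return the value as a List.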
public interface ListObjectInspector extends ObjectInspector {
    ObjectInspector getListElementObjectInspector();

    Object getListElement(Object data, int index);

    int getListLength(Object data);

    List<?> getList(Object data);
}
StandardListObjectInspector is a very important implementation class. Its source looks like this:
// Inspector for list values
public class StandardListObjectInspector implements SettableListObjectInspector {
    private ObjectInspector listElementObjectInspector;

    protected StandardListObjectInspector() {
    }

    protected StandardListObjectInspector(ObjectInspector listElementObjectInspector) {
        this.listElementObjectInspector = listElementObjectInspector;
    }

    public final Category getCategory() {
        return Category.LIST;
    }

    // Return the inspector for the list's elements
    public ObjectInspector getListElementObjectInspector() {
        return this.listElementObjectInspector;
    }

    // Return the element at index, or null if the index is out of range
    public Object getListElement(Object data, int index) {
        // ... omitted (the elided code obtains `list` from `data`) ...
        return index >= 0 && index < list.size() ? list.get(index) : null;
    }

    public int getListLength(Object data) {
        // ... omitted ...
        return list.size();
    }

    public List<?> getList(Object data) {
        // ... omitted ...
        List<?> list = (List)data;
        return list;
    }

    public String getTypeName() {
        return "array<" + this.listElementObjectInspector.getTypeName() + ">";
    }

    // Create a new list
    public Object create(int size) {
        List<Object> a = new ArrayList(size);
        // ... omitted ...
        return a;
    }

    public Object resize(Object list, int newSize) {
        List a = (List)list;
        // ... omitted ...
        return a;
    }

    // Replace the element at index
    public Object set(Object list, int index, Object element) {
        List<Object> a = (List)list;
        a.set(index, element);
        return a;
    }
}
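There are a few main aspects here.
The following interface and implementation class are also very important.
The source is as follows: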
public interface MapObjectInspector extends ObjectInspector {
    ObjectInspector getMapKeyObjectInspector();

    ObjectInspector getMapValueObjectInspector();

    Object getMapValueElement(Object data, Object key);

    Map<?, ?> getMap(Object data);

    int getMapSize(Object data);
}
StandardMapObjectInspector is a very important implementation class. Here is its source:
public class StandardMapObjectInspector implements SettableMapObjectInspector {
    // Inspectors for the key and the value respectively
    private ObjectInspector mapKeyObjectInspector;
    private ObjectInspector mapValueObjectInspector;

    protected StandardMapObjectInspector() {
    }

    // Constructor taking the key and value inspectors
    protected StandardMapObjectInspector(ObjectInspector mapKeyObjectInspector, ObjectInspector mapValueObjectInspector) {
        this.mapKeyObjectInspector = mapKeyObjectInspector;
        this.mapValueObjectInspector = mapValueObjectInspector;
    }

    // Return the key inspector
    public ObjectInspector getMapKeyObjectInspector() {
        return this.mapKeyObjectInspector;
    }

    // Return the value inspector
    public ObjectInspector getMapValueObjectInspector() {
        return this.mapValueObjectInspector;
    }

    // Look up the value stored under key
    public Object getMapValueElement(Object data, Object key) {
        if (data != null && key != null) {
            Map<?, ?> map = (Map)data;
            return map.get(key);
        } else {
            return null;
        }
    }

    // Return the map size, or -1 if data is null
    public int getMapSize(Object data) {
        if (data == null) {
            return -1;
        } else {
            Map<?, ?> map = (Map)data;
            return map.size();
        }
    }

    // Return the data as a Map
    public Map<?, ?> getMap(Object data) {
        if (data == null) {
            return null;
        } else {
            Map<?, ?> map = (Map)data;
            return map;
        }
    }

    public final Category getCategory() {
        return Category.MAP;
    }

    public String getTypeName() {
        return "map<" + this.mapKeyObjectInspector.getTypeName() + "," + this.mapValueObjectInspector.getTypeName() + ">";
    }

    // Create a new map
    public Object create() {
        Map<Object, Object> m = new HashMap();
        return m;
    }

    // Remove all entries from the map
    public Object clear(Object map) {
        Map<Object, Object> m = (HashMap)map;
        m.clear();
        return m;
    }

    // Put an entry into the map
    public Object put(Object map, Object key, Object value) {
        Map<Object, Object> m = (HashMap)map;
        m.put(key, value);
        return m;
    }

    // Remove the entry stored under key
    public Object remove(Object map, Object key) {
        Map<Object, Object> m = (HashMap)map;
        m.remove(key);
        return m;
    }
}
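Here is an example taken from an implementation class in the Hive source:
// Field holding the inspector for the map argument
private transient MapObjectInspector mapOI;

// Fragment from initialization, where arguments[] holds ObjectInspectors
mapOI = (MapObjectInspector) arguments[0];
// Get the inspector for the map's key type
ObjectInspector mapKeyOI = mapOI.getMapKeyObjectInspector();
// Return a list inspector whose element type is the map's key type
return ObjectInspectorFactory.getStandardListObjectInspector(mapKeyOI);

// Fragment from evaluation, where arguments[] carries the actual row values
Object mapObj = arguments[0].get();
// Interpret the raw object as a map through the inspector
Map<?,?> mapVal = mapOI.getMap(mapObj);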
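The fragments above follow a two-phase pattern: keep the inspector once, then use it on every row to interpret the raw object. The sketch below reproduces that pattern with JDK-only stand-ins. `MapInspector`, `StandardMapInspector` and the `MapKeysSketch` class itself are hypothetical names for illustration, not Hive's API, and the method signatures are simplified.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

public class MapKeysSketch {
    // Stand-in for MapObjectInspector: knows how to read a map, holds no data.
    public interface MapInspector {
        Map<?, ?> getMap(Object data);
    }

    // Stand-in for a java.util.Map backed implementation.
    public static final class StandardMapInspector implements MapInspector {
        public Map<?, ?> getMap(Object data) {
            return data == null ? null : (Map<?, ?>) data;
        }
    }

    private MapInspector mapOI;                           // set once, in initialize
    private final List<Object> keys = new ArrayList<>();  // buffer reused across rows

    // Phase 1, called once: remember the inspector for the argument.
    public void initialize(MapInspector argumentInspector) {
        this.mapOI = argumentInspector;
    }

    // Phase 2, called per row: read the raw object through the inspector.
    public List<Object> evaluate(Object rawMap) {
        keys.clear();
        Map<?, ?> map = mapOI.getMap(rawMap);
        if (map != null) {
            keys.addAll(map.keySet());
        }
        return keys;
    }
}
```

Because `evaluate` never casts `rawMap` itself, swapping in a different `MapInspector` implementation would let the same logic run over a different in-memory representation of the map.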
STRUCT is a special case: there is no interface for it, and an abstract class is used directly.
public abstract class StructObjectInspector implements ObjectInspector {
    public StructObjectInspector() {
    }

    public abstract List<? extends StructField> getAllStructFieldRefs();

    public abstract StructField getStructFieldRef(String fieldName);

    public abstract Object getStructFieldData(Object data, StructField fieldRef);

    public abstract List<Object> getStructFieldsDataAsList(Object data);

    public boolean isSettable() {
        return false;
    }

    public String toString() {
        StringBuilder sb = new StringBuilder();
        List<? extends StructField> fields = this.getAllStructFieldRefs();
        sb.append(this.getClass().getName());
        sb.append("<");
        for (int i = 0; i < fields.size(); ++i) {
            if (i > 0) {
                sb.append(",");
            }
            sb.append(((StructField)fields.get(i)).getFieldObjectInspector().toString());
        }
        sb.append(">");
        return sb.toString();
    }
}
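Reading a struct is a two-step affair: first resolve a field reference by name, then use that reference to pull the field's data out of the row object. The following is a minimal sketch of that access pattern using JDK-only stand-ins. `Field` and `ListBackedStructInspector` are hypothetical classes that mirror the abstract method names above, and the assumption that a row is stored as a `java.util.List` is made only for this example.

```java
import java.util.ArrayList;
import java.util.List;

public class StructSketch {
    // Stand-in for StructField: a field name plus its position in the row.
    public static final class Field {
        private final String name;
        private final int position;
        Field(String name, int position) {
            this.name = name;
            this.position = position;
        }
        public String getFieldName() { return name; }
    }

    // Stand-in struct inspector whose data objects are java.util.List rows.
    public static final class ListBackedStructInspector {
        private final List<Field> fields = new ArrayList<>();

        public ListBackedStructInspector(List<String> fieldNames) {
            for (int i = 0; i < fieldNames.size(); i++) {
                fields.add(new Field(fieldNames.get(i), i));
            }
        }

        public List<Field> getAllStructFieldRefs() {
            return fields;
        }

        // Step 1: resolve the field reference by name.
        public Field getStructFieldRef(String fieldName) {
            for (Field f : fields) {
                if (f.name.equals(fieldName)) {
                    return f;
                }
            }
            throw new IllegalArgumentException("no such field: " + fieldName);
        }

        // Step 2: use the reference to read the field from a row object.
        public Object getStructFieldData(Object data, Field field) {
            if (data == null) {
                return null;
            }
            List<?> row = (List<?>) data;
            return field.position < row.size() ? row.get(field.position) : null;
        }
    }
}
```

Resolving the field reference once and reusing it for every row avoids repeating the name lookup per row.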
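Union type inspector
Factories and utility classes
New inspectors are normally obtained from a factory class rather than created directly with new.
Two factories are available:
PrimitiveObjectInspectorFactory -- for primitive types
ObjectInspectorFactory -- for complex types (list, map, struct, union) and for inspectors derived from Java classes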
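One thing a factory makes possible, and a plausible reason the constructors shown earlier are protected, is sharing a single inspector instance per type instead of allocating a new one on every request. The sketch below illustrates that idea with a small cache. Everything in it (`InspectorFactorySketch`, `Inspector`, `getListInspector`, the `STRING` constant) is a hypothetical JDK-only stand-in, not Hive's factory code.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

public class InspectorFactorySketch {
    public interface Inspector {
        String getTypeName();
    }

    // Constructors are private so callers must go through the factory.
    public static final class PrimitiveInspector implements Inspector {
        private final String typeName;
        private PrimitiveInspector(String typeName) { this.typeName = typeName; }
        public String getTypeName() { return typeName; }
    }

    public static final class ListInspector implements Inspector {
        private final Inspector element;
        private ListInspector(Inspector element) { this.element = element; }
        public String getTypeName() { return "array<" + element.getTypeName() + ">"; }
    }

    // One shared instance for the string type.
    public static final Inspector STRING = new PrimitiveInspector("string");

    // Cache keyed by the element inspector instance.
    private static final Map<Inspector, ListInspector> LIST_CACHE = new ConcurrentHashMap<>();

    // Return the shared list inspector for the given element inspector.
    public static ListInspector getListInspector(Inspector element) {
        return LIST_CACHE.computeIfAbsent(element, ListInspector::new);
    }
}
```

With this arrangement, two requests for a list of `STRING` return the same object, so inspectors can be compared by reference and reused freely.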
The utility class offers a large number of frequently used helper methods. Its fully qualified name is:
org.apache.hadoop.hive.serde2.objectinspector.ObjectInspectorUtils