Looking at Hive from Hadoop's Perspective

Uploaded 2020-09-24 22:49:12. PDF file, 3.11 MB.
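Why use Hive?

Development effort: simple. The slide sets the Java MapReduce WordCount program next to the equivalent Hive query. The Java side, as far as the slide shows it ("..." marks text that is cut off or illegible in the source):

    public class WordCount {

      public static class TokenizerMapper
          extends Mapper<Object, Text, Text, IntWritable> {
        private final static IntWritable one = new IntWritable(1);
        private Text word = new Text();

        public void map(Object key, Text value, Context context)
            throws IOException, InterruptedException {
          StringTokenizer itr = new StringTokenizer(value.toString());
          while (itr.hasMoreTokens()) {
            word.set(itr.nextToken());
            context.write(word, one);
          }
        }
      }

      public static class IntSumReducer
          extends Reducer<Text, IntWritable, Text, IntWritable> {
        private IntWritable result = new IntWritable();

        public void reduce(Text key, Iterable<IntWritable> values, Context context)
            throws IOException, InterruptedException {
          int sum = 0;
          for (IntWritable val : values) {
            sum += val.get();
          }
          result.set(sum);
          context.write(key, result);
        }
      }

      public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        String[] otherArgs = new GenericOptionsParser(conf, args).getRemainingArgs();
        if (otherArgs.length != 2) {
          ...
        }
        ...
        job.setJarByClass(WordCount.class);
        ...

The Hive side:

    SELECT word, count(1)
    FROM (
      SELECT explode(split(line, '\s')) AS word
      FROM article
    ) t
    GROUP BY word
    ORDER BY word;

Hive architecture and execution flow

[Architecture diagram. Components named on the slide: clients (JDBC, ODBC, the Luna offline data platform, Web UI, Console UI); Thrift Server; Driver (Compiler, Optimizer, Executor); Metastore; MapReduce jobs; Hadoop cluster (HDFS + MapReduce) with Name Node, Job Tracker, and Data Node + Task Tracker.]

Hive execution flow and operators

    Operator             Description
    TableScanOperator    Scans Hive table data
    ReduceSinkOperator   Creates the <key, value> pairs to be sent to the reduce side
    JoinOperator         Joins two datasets
    SelectOperator       Selects the output columns
    FileSinkOperator     Builds the result data and writes it to a file
    FilterOperator       Filters the input data
    GroupByOperator      GROUP BY clause
    MapJoinOperator      /*+ mapjoin(t) */ hint
    LimitOperator        LIMIT clause
    UnionOperator        UNION clause

Logical execution plan

    Map stage:     TableScanOperator -> (operator name illegible in the source) -> GroupByOperator -> ReduceSinkOperator
    Reduce stage:  GroupByOperator -> SelectOperator -> FileOutputOperator

SQL: Enhancing SQL Semantics (powerful)

SQL compliance: Hive 0.12 provides a wide array of SQL data types and semantics so your existing tools integrate more seamlessly with Hadoop.

Hive SQL datatypes: INT; TINYINT/SMALLINT/BIGINT; BOOLEAN; FLOAT; DOUBLE; STRING; TIMESTAMP; BINARY; DECIMAL; ARRAY, MAP, STRUCT, UNION; DATE; VARCHAR; CHAR.

Hive SQL semantics: SELECT, INSERT; GROUP BY, ORDER BY, SORT BY; JOIN on explicit join key; inner, outer, cross and semi joins; sub-queries in FROM clause; ROLLUP and CUBE; UNION; windowing functions (OVER, RANK, etc.); custom Java UDFs; standard aggregation (SUM, AVG, etc.); advanced UDFs (ngram, XPath, URL); sub-queries in WHERE, HAVING; expanded JOIN syntax; SQL-compliant security (GRANT, etc.); INSERT/UPDATE/DELETE (ACID).

The slide's legend marks each item as Available, Hive 0.12, or Roadmap; the marking of individual items did not survive extraction.

Hive == RDBMS? TRUE or FALSE?

                        Hive                     RDBMS
    Query language      HQL                      SQL
    Data storage        HDFS                     Raw devices
    Transactions        No                       ACID
    Indexes             Yes (to be tested)       Yes
    Execution           MR                       Executor
    Scalability         Large (text garbled)     20 nodes
    Data scale          Large                    (not legible)
    Execution latency   (not legible)            Low
    Workload            Data analysis            Data analysis or online serving
    Hardware            Ordinary                 (not legible)

Slow queries? Build an index? NO!

[Slide image: an "INDEX ALL THE THINGS" meme attributed to "your Oracle DBA".]

1. Overall architecture optimization
2. MR stage optimization
3. Job optimization
4. SQL job optimization
5. Platform optimization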
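To make the word-count comparison from the opening slide concrete, the following is a minimal plain-Java sketch of what both the MapReduce program and the Hive query compute. It is not Hadoop code: the class name `WordCountSketch`, the method names, and the sample lines are all hypothetical, and the shuffle between map and reduce is imitated with an in-memory `TreeMap`, which also yields the ordering that `ORDER BY word` asks for.

```java
import java.util.AbstractMap;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.StringTokenizer;
import java.util.TreeMap;

// Hypothetical in-memory imitation of the word-count job; no Hadoop classes are used.
public class WordCountSketch {

    // "Map" step: tokenize one line and emit a (word, 1) pair per token.
    static List<Map.Entry<String, Integer>> map(String line) {
        List<Map.Entry<String, Integer>> pairs = new ArrayList<>();
        StringTokenizer itr = new StringTokenizer(line);
        while (itr.hasMoreTokens()) {
            pairs.add(new AbstractMap.SimpleEntry<>(itr.nextToken(), 1));
        }
        return pairs;
    }

    // "Reduce" step: sum every count that was emitted for a single word.
    static int reduce(List<Integer> counts) {
        int sum = 0;
        for (int c : counts) {
            sum += c;
        }
        return sum;
    }

    // Runs map, groups the pairs by word (the "shuffle"), then reduces each group.
    static TreeMap<String, Integer> wordCount(List<String> article) {
        // TreeMap keeps keys sorted, standing in for GROUP BY word ... ORDER BY word.
        TreeMap<String, List<Integer>> grouped = new TreeMap<>();
        for (String line : article) {
            for (Map.Entry<String, Integer> pair : map(line)) {
                grouped.computeIfAbsent(pair.getKey(), k -> new ArrayList<>())
                       .add(pair.getValue());
            }
        }
        TreeMap<String, Integer> result = new TreeMap<>();
        for (Map.Entry<String, List<Integer>> group : grouped.entrySet()) {
            result.put(group.getKey(), reduce(group.getValue()));
        }
        return result;
    }

    public static void main(String[] args) {
        // Sample input invented for illustration.
        List<String> article = Arrays.asList("hive on hadoop", "hadoop runs hive jobs");
        System.out.println(wordCount(article));
        // prints {hadoop=2, hive=2, jobs=1, on=1, runs=1}
    }
}
```

One difference worth keeping in mind: `StringTokenizer` skips runs of whitespace, whereas splitting on a single whitespace character, as the Hive query does, can produce empty strings when a line contains consecutive spaces.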