Guide to Static Functions for Apache Spark v3.0.0 (Preview)
Jean-Georges Perrin
Jean-Georges Perrin
MANNING

Save 50% on this book (eBook, pBook, and MEAP). Enter mesias50 in the Promotional Code box when you check out. Only at manning.com.

Spark in Action, Second Edition
by Jean-Georges Perrin
ISBN 9781617295522
565 pages, $47.99

Guide to Static Functions for Apache Spark v3.0.0 (Preview)
Jean-Georges Perrin
Copyright 2019 Manning Publications

To pre-order or learn more about these books, go to www.manning.com.

For online information and ordering of these and other Manning books, please visit www.manning.com. The publisher offers discounts on these books when ordered in quantity. For more information, please contact:

Special Sales Department
Manning Publications Co.
20 Baldwin Road
PO Box 761
Shelter Island, NY 11964
Email: Erin Twohey, [email protected]

©2019 by Manning Publications Co. All rights reserved.

No part of this publication may be reproduced, stored in a retrieval system, or transmitted, in any form or by means electronic, mechanical, photocopying, or otherwise, without prior written permission of the publisher.

Many of the designations used by manufacturers and sellers to distinguish their products are claimed as trademarks. Where those designations appear in the book, and Manning Publications was aware of a trademark claim, the designations have been printed in initial caps or all caps.

Recognizing the importance of preserving what has been written, it is Manning's policy to have the books we publish printed on acid-free paper, and we exert our best efforts to that end. Recognizing also our responsibility to conserve the resources of our planet, Manning books are printed on paper that is at least 15 percent recycled and processed without the use of elemental chlorine.

Manning Publications Co.
20 Baldwin Road
PO Box 761
Shelter Island, NY 11964

Cover designer: Leslie Haimes

ISBN: 9781617297953
Printed in the United States of America
1 2 3 4 5 6 7 8 9 10 - EBM - 24 23 22 21 20 19

contents

Static functions ease your transformations
  1.1 Functions per category
    Popular functions
    Aggregate functions
    Arithmetical functions
    Array manipulation functions
    Binary operations
    Comparison functions
    Compute function
    Conditional operations
    Conversion functions
    Data shape functions
    Date and time functions
    Digest functions
    Encoding functions
    Formatting functions
    JSON (JavaScript Object Notation) functions
    List functions
    Mathematical functions
    Navigation functions
    Rounding functions
    Sorting functions
    Statistical functions
    Streaming functions
    String functions
    Technical functions
    Trigonometry functions
    UDFs (user-defined functions) helpers
    Validation functions
    Deprecated functions
  1.2 Functions appearance per version of Spark
    Functions appeared in Spark v3.0.0
    Functions appeared in Spark v2.4.0
    Functions appeared in Spark v2.3.0
    Functions appeared in Spark v2.2.0
    Functions appeared in Spark v2.1.0
    Functions appeared in Spark v2.0.0
    Functions appeared in Spark v1.6.0
    Functions appeared in Spark v1.5.0
    Functions appeared in Spark v1.4.0
    Functions appeared in Spark v1.3.0
  1.3 Reference for functions
    abs(Column e)
    acos(Column e)
    acos(String columnName)
    add_months(Column startDate, Column numMonths)
    add_months(Column startDate, int numMonths)
    aggregate(Column expr, Column zero, scala.Function2<Column,Column,Column> merge)
    aggregate(Column expr, Column zero, scala.Function2<Column,Column,Column> merge, scala.Function1<Column,Column> finish)
    approx_count_distinct(Column e)
    approx_count_distinct(Column e, double rsd)
    approx_count_distinct(String columnName)
    approx_count_distinct(String columnName, double rsd)
    array(Column... cols)
    array(String colName, String... colNames)
    array(String colName, scala.collection.Seq<String> colNames)
    array(scala.collection.Seq<Column> cols)
    array_contains(Column column, Object value)
    array_distinct(Column e)
    array_except(Column col1, Column col2)
    array_intersect(Column col1, Column col2)
    array_join(Column column, String delimiter)
    array_join(Column column, String delimiter, String nullReplacement)
    array_max(Column e)
    array_min(Column e)
    array_position(Column column, Object value)
    array_remove(Column column, Object element)
    array_repeat(Column e, int count)
    array_repeat(Column left, Column right)
    array_sort(Column e)
    array_union(Column col1, Column col2)
    arrays_overlap(Column a1, Column a2)
    arrays_zip(Column... e)
    arrays_zip(scala.collection.Seq<Column> e)
    asc(String columnName)
    asc_nulls_first(String columnName)
    asc_nulls_last(String columnName)
    ascii(Column e)
    asin(Column e)
    asin(String columnName)
    atan(Column e)
    atan(String columnName)
    atan2(Column y, Column x)
    atan2(Column y, String xName)
    atan2(Column y, double xValue)
    atan2(String yName, Column x)
    atan2(String yName, String xName)
    atan2(String yName, double xValue)
    atan2(double yValue, Column x)
    atan2(double yValue, String xName)
    avg(Column e)
    avg(String columnName)
    base64(Column e)
    bin(Column e)
    bin(String columnName)
    bitwiseNOT(Column e)
    broadcast(Dataset<T> df)
    bround(Column e)
    bround(Column e, int scale)
    bucket(Column numBuckets, Column e)
    bucket(int numBuckets, Column e)
    callUDF(String udfName, Column... cols)
    callUDF(String udfName, scala.collection.Seq<Column> cols)
    cbrt(Column e)
    cbrt(String columnName)
    ceil(Column e)
    ceil(String columnName)
    coalesce(Column... e)
    coalesce(scala.collection.Seq<Column> e)
    col(String colName)
    collect_list(Column e)
    collect_list(String columnName)
    collect_set(Column e)
    collect_set(String columnName)
    column(String colName)
    concat(Column... exprs)
    concat(scala.collection.Seq<Column> exprs)
    concat_ws(String sep, Column... exprs)
    concat_ws(String sep, scala.collection.Seq<Column> exprs)
    conv(Column num, int fromBase, int toBase)
    corr(Column column1, Column column2)
    corr(String columnName1, String columnName2)
    cos(Column e)
    cos(String columnName)
    cosh(Column e)
    cosh(String columnName)
    count(Column e)
    count(String columnName)
    countDistinct(Column expr, Column... exprs)
    countDistinct(Column expr, scala.collection.Seq<Column> exprs)
    countDistinct(String columnName, String... columnNames)
    countDistinct(String columnName, scala.collection.Seq<String> columnNames)
    covar_pop(Column column1, Column column2)
    covar_pop(String columnName1, String columnName2)
    covar_samp(Column column1, Column column2)
    covar_samp(String columnName1, String columnName2)
    crc32(Column e)
    cume_dist()
    current_date()
    current_timestamp()
    date_add(Column start, Column days)
    date_add(Column start, int days)
    date_format(Column dateExpr, String format)
    date_sub(Column start, Column days)
    date_sub(Column start, int days)
    date_trunc(String format, Column timestamp)
    datediff(Column end, Column start)
    dayofmonth(Column e)
    dayofweek(Column e)
    dayofyear(Column e)
    days(Column e)
    decode(Column value, String charset)
    degrees(Column e)
    degrees(String columnName)
    dense_rank()
    desc(String columnName)
    desc_nulls_first(String columnName)
    desc_nulls_last(String columnName)
    element_at(Column column, Object value)
    encode(Column value, String charset)
    exists(Column column, scala.Function1<Column,Column> f)
    exp(Column e)
    exp(String columnName)
    explode(Column e)
    explode_outer(Column e)
    expm1(Column e)
    expm1(String columnName)
    expr(String expr)
    factorial(Column e)
    filter(Column column, scala.Function1<Column,Column> f)
    filter(Column column, scala.Function2<Column,Column,Column> f)
    first(Column e)
    first(Column e, boolean ignoreNulls)
    first(String columnName)
    first(String columnName, boolean ignoreNulls)
    flatten(Column e)
    floor(Column e)
    floor(String columnName)
    forall(Column column, scala.Function1<Column,Column> f)
    format_number(Column x, int d)
    format_string(String format, Column... arguments)
    format_string(String format, scala.collection.Seq<Column> arguments)
    from_csv(Column e, Column schema, java.util.Map<String,String> options)
    from_csv(Column e, StructType schema, scala.collection.immutable.Map<String,String> options)
    from_json(Column e, Column schema)
    from_json(Column e, Column schema, java.util.Map<String,String> options)
    from_json(Column e, DataType schema)
    from_json(Column e, DataType schema, java.util.Map<String,String> options)
    from_json(Column e, DataType schema, scala.collection.immutable.Map<String,String> options)
    from_json(Column e, String schema, java.util.Map<String,String> options)
    from_json(Column e, String schema, scala.collection.immutable.Map<String,String> options)
    from_json(Column e, StructType schema)
    from_json(Column e, StructType schema, java.util.Map<String,String> options)
    from_json(Column e, StructType schema, scala.collection.immutable.Map<String,String> options)
    from_unixtime(Column ut)
    from_unixtime(Column ut, String f)
    from_utc_timestamp(Column ts, Column tz)
    from_utc_timestamp(Column ts, String tz)
    get_json_object(Column e, String path)
    greatest(Column... exprs)
    greatest(String columnName, String... columnNames)
    greatest(String columnName, scala.collection.Seq<String> columnNames)
    greatest(scala.collection.Seq<Column> exprs)
    grouping(Column e)
    grouping(String columnName)