User-defined Functions

Operators in the User-defined Functions category

Subcategories

1 - Python

Operators in the Python category

Operators

Operator | Description
2-in Python UDF | User-defined function operator in Python script
Python Lambda Function | Modify or add a new column with more ease
Python Table Reducer | Reduce Table to Tuple
1-out Python UDF | User-defined function operator in Python script
Python UDF | User-defined function operator in Python script

Total: 5 operators

1.1 - 1-out Python UDF

User-defined function operator in Python script

Input Properties

Property | Requirement | Type | Default | Description
Python script | | Code (python) | See template below | Input your code here
Worker count | | Integer | 1 | Specify how many parallel workers to launch
Columns | | List | - | The columns of the source
↳ Attribute Name | | String | - |
↳ Attribute Type | | string, integer, long, double, boolean, timestamp, binary, large_binary | - |

Default Code Template

Python script

# from pytexera import *
# class GenerateOperator(UDFSourceOperator):
# 
#     @overrides
#     
#     def produce(self) -> Iterator[Union[TupleLike, TableLike, None]]:
#         yield
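
For illustration, an uncommented version of this template might look like the sketch below. It assumes that a plain dict is accepted as a TupleLike and that the operator's Columns property declares two hypothetical columns, id (integer) and text (string). The except branch only defines stand-ins so the sketch can also run outside Texera; those stand-ins are not part of pytexera.

```python
from typing import Any, Dict, Iterator, List, Union

try:
    # Inside Texera the template gets these names via `from pytexera import *`.
    from pytexera import UDFSourceOperator, TupleLike, TableLike, overrides
except ImportError:
    # Stand-ins (assumptions) so the sketch can run outside Texera.
    TupleLike = Dict[str, Any]
    TableLike = List[Dict[str, Any]]

    class UDFSourceOperator:
        pass

    def overrides(method):
        return method


class GenerateOperator(UDFSourceOperator):

    @overrides
    def produce(self) -> Iterator[Union[TupleLike, TableLike, None]]:
        # Yield one dict per output tuple. The keys "id" and "text" are
        # hypothetical and must match the columns declared for the operator.
        for i in range(3):
            yield {"id": i, "text": f"row {i}"}
```

Under these assumptions the operator would emit three tuples, one per loop iteration.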

Output Ports

Port | Mode
0 | Set Snapshot

1.2 - 2-in Python UDF

User-defined function operator in Python script

Input Properties

Property | Requirement | Type | Default | Description
Python script | | Code (python) | See template below | Input your code here
Worker count | | Integer | 1 | Specify how many parallel workers to launch
Retain input columns | | Boolean | true | Keep the original input columns?
Extra output column(s) | | List | - | Name of the newly added output columns that the UDF will produce, if any
↳ Attribute Name | | String | - |
↳ Attribute Type | | string, integer, long, double, boolean, timestamp, binary, large_binary | - |

Default Code Template

Python script

# Choose from the following templates:
# 
# from pytexera import *
# 
# class ProcessTupleOperator(UDFOperatorV2):
#     
#     @overrides
#     def process_tuple(self, tuple_: Tuple, port: int) -> Iterator[Optional[TupleLike]]:
#         yield tuple_
# 
# class ProcessBatchOperator(UDFBatchOperator):
#     BATCH_SIZE = 10 # must be a positive integer
# 
#     @overrides
#     def process_batch(self, batch: Batch, port: int) -> Iterator[Optional[BatchLike]]:
#         yield batch
# 
# class ProcessTableOperator(UDFTableOperator):
# 
#     @overrides
#     def process_table(self, table: Table, port: int) -> Iterator[Optional[TableLike]]:
#         yield table
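
As a rough sketch of how the port argument could be used with two inputs, the tuple-based template can tag each tuple with the port it arrived on. The column name source_port is hypothetical and would need to be declared under "Extra output column(s)" as an integer. The sketch also assumes that input ports are numbered from 0 and that a tuple supports item assignment. The except branch defines stand-ins so the sketch runs outside Texera, where a dict plays the role of a tuple.

```python
from typing import Any, Dict, Iterator, Optional

try:
    # Inside Texera the template gets these names via `from pytexera import *`.
    from pytexera import UDFOperatorV2, Tuple, TupleLike, overrides
except ImportError:
    # Stand-ins (assumptions) so the sketch can run outside Texera.
    Tuple = Dict[str, Any]
    TupleLike = Dict[str, Any]

    class UDFOperatorV2:
        pass

    def overrides(method):
        return method


class ProcessTupleOperator(UDFOperatorV2):

    @overrides
    def process_tuple(self, tuple_: Tuple, port: int) -> Iterator[Optional[TupleLike]]:
        # `port` identifies the input port the tuple arrived on
        # (assumed here to be 0 or 1 for the two inputs).
        # "source_port" is a hypothetical extra output column.
        tuple_["source_port"] = port
        yield tuple_
```

Branching on port in the same way would let the two inputs be handled by different logic.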

Output Ports

Port | Mode
0 | Set Snapshot

1.3 - Python Lambda Function

Modify or add a new column with more ease

Input Properties

Property | Requirement | Type | Default | Description
Add/Modify column(s) | | List | - |
↳ Attribute Name | | String | - |
↳ Expression | | String | - |
↳ Attribute Type | | string, integer, long, double, boolean, timestamp, binary, large_binary | - |

Output Ports

Port | Mode
0 | Set Snapshot

1.4 - Python Table Reducer

Reduce Table to Tuple

Input Properties

Property | Requirement | Type | Default | Description
Output columns | | List | - |
↳ Attribute Name | | String | - |
↳ Expression | | String | - |
↳ Attribute Type | | string, integer, long, double, boolean, timestamp, binary, large_binary | - |

Output Ports

Port | Mode
0 | Set Snapshot

1.5 - Python UDF

User-defined function operator in Python script

Input Properties

Property | Requirement | Type | Default | Description
Python script | | Code (python) | See template below | Input your code here
Worker count | | Integer | 1 | Specify how many parallel workers to launch
Retain input columns | | Boolean | true | Keep the original input columns?
Extra output column(s) | | List | - | Name of the newly added output columns that the UDF will produce, if any
↳ Attribute Name | | String | - |
↳ Attribute Type | | string, integer, long, double, boolean, timestamp, binary, large_binary | - |

Default Code Template

Python script

# Choose from the following templates:
# 
# from pytexera import *
# 
# class ProcessTupleOperator(UDFOperatorV2):
#     
#     @overrides
#     def process_tuple(self, tuple_: Tuple, port: int) -> Iterator[Optional[TupleLike]]:
#         yield tuple_
# 
# class ProcessBatchOperator(UDFBatchOperator):
#     BATCH_SIZE = 10 # must be a positive integer
# 
#     @overrides
#     def process_batch(self, batch: Batch, port: int) -> Iterator[Optional[BatchLike]]:
#         yield batch
# 
# class ProcessTableOperator(UDFTableOperator):
# 
#     @overrides
#     def process_table(self, table: Table, port: int) -> Iterator[Optional[TableLike]]:
#         yield table
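
One way the tuple-based template might be filled in is sketched below: it adds a computed column and drops tuples that have no text. The column names text and text_length are hypothetical; text_length would need to be declared under "Extra output column(s)" as an integer. The sketch assumes that a tuple supports item access and assignment, and that yielding nothing for an input tuple produces no output for it. The except branch defines stand-ins so the sketch runs outside Texera, where a dict plays the role of a tuple.

```python
from typing import Any, Dict, Iterator, Optional

try:
    # Inside Texera the template gets these names via `from pytexera import *`.
    from pytexera import UDFOperatorV2, Tuple, TupleLike, overrides
except ImportError:
    # Stand-ins (assumptions) so the sketch can run outside Texera.
    Tuple = Dict[str, Any]
    TupleLike = Dict[str, Any]

    class UDFOperatorV2:
        pass

    def overrides(method):
        return method


class ProcessTupleOperator(UDFOperatorV2):

    @overrides
    def process_tuple(self, tuple_: Tuple, port: int) -> Iterator[Optional[TupleLike]]:
        text = tuple_["text"]  # hypothetical input column
        if not text:
            # Returning before any yield ends the generator,
            # so this input tuple contributes no output.
            return
        # "text_length" is a hypothetical extra output column.
        tuple_["text_length"] = len(text)
        yield tuple_
```

With "Retain input columns" left at its default of true, the output would carry the original columns plus the declared extra column.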

Output Ports

Port | Mode
0 | Set Snapshot

2 - Java

Operators in the Java category

Operators

Operator | Description
Java UDF | User-defined function operator in Java script

Total: 1 operator

2.1 - Java UDF

User-defined function operator in Java script

Input Properties

Property | Requirement | Type | Default | Description
Java UDF script | | Code (java) | See template below | Input your code here
Worker count | | Integer | 1 | Specify how many parallel workers to launch
Retain input columns | | Boolean | true | Keep the original input columns?
Extra output column(s) | | List | - | Name of the newly added output columns that the UDF will produce, if any
↳ Attribute Name | | String | - |
↳ Attribute Type | | string, integer, long, double, boolean, timestamp, binary, large_binary | - |

Default Code Template

Java UDF script

import org.apache.texera.amber.operator.map.MapOpExec;
import org.apache.texera.amber.core.tuple.Tuple;
import org.apache.texera.amber.core.tuple.TupleLike;
import scala.Function1;
import java.io.Serializable;

public class JavaUDFOpExec extends MapOpExec {
    public JavaUDFOpExec () {
        this.setMapFunc((Function1<Tuple, TupleLike> & Serializable) this::processTuple);
    }
    
    public TupleLike processTuple(Tuple tuple) {
        return tuple;
    }
}

Output Ports

Port | Mode
0 | Set Snapshot

3 - R

Operators in the R category

Operators

Operator | Description
R UDF | User-defined function operator in R script
1-out R UDF | User-defined function operator in R script

Total: 2 operators

3.1 - 1-out R UDF

User-defined function operator in R script

Input Properties

Property | Requirement | Type | Default | Description
R Source UDF Script | | Code (r) | See template below | Input your code here
Worker count | | Integer | 1 | Specify how many parallel workers to launch
Use Tuple API? | | Boolean | false | Check this box to use Tuple API, leave unchecked to use Table API
Columns | | List | - | The columns of the source
↳ Attribute Name | | String | - |
↳ Attribute Type | | string, integer, long, double, boolean, timestamp, binary, large_binary | - |

Default Code Template

R Source UDF Script

# If using Table API:
# function() { 
#   return (data.frame(Column_Here = "Value_Here")) 
# }

# If using Tuple API:
# library(coro)
# coro::generator(function() {
#   yield (list(text= "hello world!"))
# })

Output Ports

Port | Mode
0 | Set Snapshot

3.2 - R UDF

User-defined function operator in R script

Input Properties

Property | Requirement | Type | Default | Description
R UDF Script | | Code (r) | See template below | Input your code here
Worker count | | Integer | 1 | Specify how many parallel workers to launch
Use Tuple API? | | Boolean | false | Check this box to use Tuple API, leave unchecked to use Table API
Retain input columns | | Boolean | true | Keep the original input columns?
Extra output column(s) | | List | - | Name of the newly added output columns that the UDF will produce, if any
↳ Attribute Name | | String | - |
↳ Attribute Type | | string, integer, long, double, boolean, timestamp, binary, large_binary | - |

Default Code Template

R UDF Script

# If using Table API:
# function(table, port) { 
#   return (table) 
# }

# If using Tuple API:
# library(coro)
# coro::generator(function(tuple, port) {
#   yield (tuple)
# })

Output Ports

Port | Mode
0 | Set Snapshot