User-defined Functions
Operators in the User-defined Functions category
- 1: Python
- 1.1: 1-out Python UDF
- 1.2: 2-in Python UDF
- 1.3: Python Lambda Function
- 1.4: Python Table Reducer
- 1.5: Python UDF
- 2: Java
- 2.1: Java UDF
- 3: R
- 3.1: 1-out R UDF
- 3.2: R UDF
1 - Python
Operators in the Python category
Operators
| Operator | Description |
|---|---|
| 2-in Python UDF | User-defined function operator in Python script |
| Python Lambda Function | Modify or add a new column with more ease |
| Python Table Reducer | Reduce Table to Tuple |
| 1-out Python UDF | User-defined function operator in Python script |
| Python UDF | User-defined function operator in Python script |
Total: 5 operators
1.1 - 1-out Python UDF
User-defined function operator in Python script
Input Properties
| Property | Requirement | Type | Default | Description |
|---|---|---|---|---|
| Python script | ✓ | Code (python) | See template below | Input your code here |
| Worker count | ✓ | Integer | 1 | Specify how many parallel workers to launch |
| Columns | | List | - | The columns of the source |
| ↳ Attribute Name | ✓ | String | - | |
| ↳ Attribute Type | ✓ | string, integer, long, double, boolean, timestamp, binary, large_binary | - | |
Default Code Template
Python script
# from pytexera import *
# class GenerateOperator(UDFSourceOperator):
#
#     @overrides
#
#     def produce(self) -> Iterator[Union[TupleLike, TableLike, None]]:
#         yield
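To make the template's `produce` method more concrete, here is a minimal sketch of a source operator that emits three rows. It is written as plain Python so it runs outside Texera: `UDFSourceOperator` is a stand-in class defined in the snippet, the column names `id` and `text` are hypothetical, and the sketch assumes that a dictionary is accepted as a tuple-like value.

```python
from typing import Any, Dict, Iterator


class UDFSourceOperator:
    """Stand-in for the pytexera base class (assumption, illustration only)."""


class GenerateOperator(UDFSourceOperator):
    def produce(self) -> Iterator[Dict[str, Any]]:
        # Each yielded dict plays the role of one output tuple. The keys are
        # hypothetical column names that would be declared under "Columns".
        for i, word in enumerate(["alpha", "beta", "gamma"]):
            yield {"id": i, "text": word}


# Drive the generator by hand to see what the operator would emit.
rows = list(GenerateOperator().produce())
```

In a real workflow the `Columns` property would presumably need one entry per emitted key, for example `id` as integer and `text` as string.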
Output Ports
| Port | Mode |
|---|---|
| 0 | Set Snapshot |
1.2 - 2-in Python UDF
User-defined function operator in Python script
Input Properties
| Property | Requirement | Type | Default | Description |
|---|---|---|---|---|
| Python script | ✓ | Code (python) | See template below | Input your code here |
| Worker count | ✓ | Integer | 1 | Specify how many parallel workers to launch |
| Retain input columns | ✓ | Boolean | true | Keep the original input columns? |
| Extra output column(s) | | List | - | Names of the new output columns that the UDF produces, if any |
| ↳ Attribute Name | ✓ | String | - | |
| ↳ Attribute Type | ✓ | string, integer, long, double, boolean, timestamp, binary, large_binary | - | |
Default Code Template
Python script
# Choose from the following templates:
#
# from pytexera import *
#
# class ProcessTupleOperator(UDFOperatorV2):
#
#     @overrides
#     def process_tuple(self, tuple_: Tuple, port: int) -> Iterator[Optional[TupleLike]]:
#         yield tuple_
#
# class ProcessBatchOperator(UDFBatchOperator):
#     BATCH_SIZE = 10 # must be a positive integer
#
#     @overrides
#     def process_batch(self, batch: Batch, port: int) -> Iterator[Optional[BatchLike]]:
#         yield batch
#
# class ProcessTableOperator(UDFTableOperator):
#
#     @overrides
#     def process_table(self, table: Table, port: int) -> Iterator[Optional[TableLike]]:
#         yield table
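With two inputs, the `port` argument is what tells `process_tuple` where a tuple came from. The sketch below shows one possible use: tuples on port 0 fill a lookup table and tuples on port 1 are emitted with the looked-up value attached. It is plain Python with a stand-in `UDFOperatorV2` class, the columns `id`, `label`, and `value` are hypothetical, and it assumes that port 0 tuples arrive before the port 1 tuples that need them, which this reference does not state.

```python
from typing import Any, Dict, Iterator, Optional


class UDFOperatorV2:
    """Stand-in for the pytexera base class (assumption, illustration only)."""


class ProcessTupleOperator(UDFOperatorV2):
    def __init__(self) -> None:
        self.labels: Dict[Any, str] = {}

    def process_tuple(
        self, tuple_: Dict[str, Any], port: int
    ) -> Iterator[Optional[Dict[str, Any]]]:
        if port == 0:
            # Hypothetical lookup input: remember the label, emit nothing.
            self.labels[tuple_["id"]] = tuple_["label"]
        else:
            # Hypothetical data input: emit the tuple plus the matching label,
            # or None when no label was seen for this id.
            yield {**tuple_, "label": self.labels.get(tuple_["id"])}


op = ProcessTupleOperator()
out = []
out += list(op.process_tuple({"id": 1, "label": "spam"}, port=0))
out += list(op.process_tuple({"id": 1, "value": 3.5}, port=1))
out += list(op.process_tuple({"id": 2, "value": 7.0}, port=1))
```

Here `label` would be the kind of column to declare under "Extra output column(s)", since it is not part of the port 1 input.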
Output Ports
| Port | Mode |
|---|---|
| 0 | Set Snapshot |
1.3 - Python Lambda Function
Modify or add a new column with more ease
Input Properties
| Property | Requirement | Type | Default | Description |
|---|---|---|---|---|
| Add/Modify column(s) | | List | - | |
| ↳ Attribute Name | ✓ | String | - | |
| ↳ Attribute Expression | ✓ | String | - | |
| ↳ Attribute Type | ✓ | string, integer, long, double, boolean, timestamp, binary, large_binary | - | |
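Each entry pairs an attribute name with an expression string, which presumably is evaluated once per row to produce that column's value. The sketch below illustrates that idea in plain Python. The helper `apply_expressions`, the use of `eval`, and exposing the row to the expression as `tuple_` are all assumptions made for illustration, not a description of how the operator is implemented.

```python
from typing import Any, Dict, List, Tuple


def apply_expressions(
    tuple_: Dict[str, Any], columns: List[Tuple[str, str]]
) -> Dict[str, Any]:
    """Return a copy of the row with each (name, expression) pair applied in order."""
    result = dict(tuple_)
    for name, expression in columns:
        # eval keeps the sketch short; the expression sees the row as `tuple_`.
        result[name] = eval(expression, {"__builtins__": {}}, {"tuple_": result})
    return result


row = {"price": 10.0, "quantity": 3}
updated = apply_expressions(
    row,
    [
        ("total", 'tuple_["price"] * tuple_["quantity"]'),  # adds a new column
        ("quantity", 'tuple_["quantity"] + 1'),             # modifies an existing one
    ],
)
```

An existing attribute name overwrites that column, while a new name adds one, which matches the "Add/Modify" wording of the property.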
Output Ports
| Port | Mode |
|---|---|
| 0 | Set Snapshot |
1.4 - Python Table Reducer
Reduce Table to Tuple
Input Properties
| Property | Requirement | Type | Default | Description |
|---|---|---|---|---|
| Output columns | | List | - | |
| ↳ Attribute Name | ✓ | String | - | |
| ↳ Expression | ✓ | String | - | |
| ↳ Attribute Type | ✓ | string, integer, long, double, boolean, timestamp, binary, large_binary | - | |
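The description "Reduce Table to Tuple" suggests that each output column is computed from the whole input table and that the results together form a single tuple. The sketch below shows that reduction in plain Python, under stated assumptions: the table is modeled as a list of dictionaries, each expression is modeled as a Python callable, and the names `reduce_table`, `city`, and `sales` are hypothetical.

```python
from typing import Any, Callable, Dict, List

Table = List[Dict[str, Any]]


def reduce_table(
    table: Table, outputs: Dict[str, Callable[[Table], Any]]
) -> Dict[str, Any]:
    """Compute one value per output column from the whole table."""
    return {name: fn(table) for name, fn in outputs.items()}


table = [
    {"city": "Irvine", "sales": 120},
    {"city": "Austin", "sales": 80},
    {"city": "Irvine", "sales": 50},
]

summary = reduce_table(
    table,
    {
        "row_count": len,
        "total_sales": lambda t: sum(r["sales"] for r in t),
        "distinct_cities": lambda t: len({r["city"] for r in t}),
    },
)
```

However many rows arrive, the result is one tuple with one field per configured output column.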
Output Ports
| Port | Mode |
|---|---|
| 0 | Set Snapshot |
1.5 - Python UDF
User-defined function operator in Python script
Input Properties
| Property | Requirement | Type | Default | Description |
|---|---|---|---|---|
| Python script | ✓ | Code (python) | See template below | Input your code here |
| Worker count | ✓ | Integer | 1 | Specify how many parallel workers to launch |
| Retain input columns | ✓ | Boolean | true | Keep the original input columns? |
| Extra output column(s) | | List | - | Names of the new output columns that the UDF produces, if any |
| ↳ Attribute Name | ✓ | String | - | |
| ↳ Attribute Type | ✓ | string, integer, long, double, boolean, timestamp, binary, large_binary | - | |
Default Code Template
Python script
# Choose from the following templates:
#
# from pytexera import *
#
# class ProcessTupleOperator(UDFOperatorV2):
#
#     @overrides
#     def process_tuple(self, tuple_: Tuple, port: int) -> Iterator[Optional[TupleLike]]:
#         yield tuple_
#
# class ProcessBatchOperator(UDFBatchOperator):
#     BATCH_SIZE = 10 # must be a positive integer
#
#     @overrides
#     def process_batch(self, batch: Batch, port: int) -> Iterator[Optional[BatchLike]]:
#         yield batch
#
# class ProcessTableOperator(UDFTableOperator):
#
#     @overrides
#     def process_table(self, table: Table, port: int) -> Iterator[Optional[TableLike]]:
#         yield table
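The three templates differ in how much data each call receives: one tuple, a batch of up to `BATCH_SIZE` tuples, or the whole table. The sketch below illustrates the batch case in plain Python. It assumes that tuples are grouped in arrival order and that the last batch may be smaller. The helper `batches`, the list-of-dictionaries batch type, and the columns `x` and `batch_mean` are hypothetical stand-ins and not the pytexera types.

```python
from typing import Any, Dict, Iterator, List

Row = Dict[str, Any]


def batches(rows: List[Row], batch_size: int) -> Iterator[List[Row]]:
    """Group rows into consecutive chunks of at most batch_size rows."""
    if batch_size <= 0:
        raise ValueError("batch_size must be a positive integer")
    for start in range(0, len(rows), batch_size):
        yield rows[start:start + batch_size]


def process_batch(batch: List[Row], port: int) -> Iterator[List[Row]]:
    # Keep the input column and add a hypothetical extra column, batch_mean.
    mean = sum(r["x"] for r in batch) / len(batch)
    yield [{**r, "batch_mean": mean} for r in batch]


rows = [{"x": v} for v in [1, 2, 3, 4, 5]]
out = [
    r
    for batch in batches(rows, 2)
    for result in process_batch(batch, port=0)
    for r in result
]
```

A per-tuple function suits row-wise transformations, while batch and table functions suit computations that need several rows at once, at the cost of holding those rows in memory.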
Output Ports
| Port | Mode |
|---|---|
| 0 | Set Snapshot |
2 - Java
Operators in the Java category
Operators
| Operator | Description |
|---|---|
| Java UDF | User-defined function operator in Java script |
Total: 1 operator
2.1 - Java UDF
User-defined function operator in Java script
Input Properties
| Property | Requirement | Type | Default | Description |
|---|---|---|---|---|
| Java UDF script | ✓ | Code (java) | See template below | Input your code here |
| Worker count | ✓ | Integer | 1 | Specify how many parallel workers to launch |
| Retain input columns | ✓ | Boolean | true | Keep the original input columns? |
| Extra output column(s) | | List | - | Names of the new output columns that the UDF produces, if any |
| ↳ Attribute Name | ✓ | String | - | |
| ↳ Attribute Type | ✓ | string, integer, long, double, boolean, timestamp, binary, large_binary | - | |
Default Code Template
Java UDF script
import org.apache.texera.amber.operator.map.MapOpExec;
import org.apache.texera.amber.core.tuple.Tuple;
import org.apache.texera.amber.core.tuple.TupleLike;
import scala.Function1;
import java.io.Serializable;
public class JavaUDFOpExec extends MapOpExec {
    public JavaUDFOpExec () {
        this.setMapFunc((Function1<Tuple, TupleLike> & Serializable) this::processTuple);
    }
    public TupleLike processTuple(Tuple tuple) {
        return tuple;
    }
}
Output Ports
| Port | Mode |
|---|---|
| 0 | Set Snapshot |
3 - R
Operators in the R category
Operators
| Operator | Description |
|---|---|
| R UDF | User-defined function operator in R script |
| 1-out R UDF | User-defined function operator in R script |
Total: 2 operators
3.1 - 1-out R UDF
User-defined function operator in R script
Input Properties
| Property | Requirement | Type | Default | Description |
|---|---|---|---|---|
| R Source UDF Script | ✓ | Code (r) | See template below | Input your code here |
| Worker count | ✓ | Integer | 1 | Specify how many parallel workers to launch |
| Use Tuple API? | ✓ | Boolean | false | Check this box to use Tuple API, leave unchecked to use Table API |
| Columns | | List | - | The columns of the source |
| ↳ Attribute Name | ✓ | String | - | |
| ↳ Attribute Type | ✓ | string, integer, long, double, boolean, timestamp, binary, large_binary | - | |
Default Code Template
R Source UDF Script
# If using Table API:
# function() {
#   return (data.frame(Column_Here = "Value_Here"))
# }
# If using Tuple API:
# library(coro)
# coro::generator(function() {
#   yield (list(text= "hello world!"))
# })
Output Ports
| Port | Mode |
|---|---|
| 0 | Set Snapshot |
3.2 - R UDF
User-defined function operator in R script
Input Properties
| Property | Requirement | Type | Default | Description |
|---|---|---|---|---|
| R UDF Script | ✓ | Code (r) | See template below | Input your code here |
| Worker count | ✓ | Integer | 1 | Specify how many parallel workers to launch |
| Use Tuple API? | ✓ | Boolean | false | Check this box to use Tuple API, leave unchecked to use Table API |
| Retain input columns | ✓ | Boolean | true | Keep the original input columns? |
| Extra output column(s) | | List | - | Names of the new output columns that the UDF produces, if any |
| ↳ Attribute Name | ✓ | String | - | |
| ↳ Attribute Type | ✓ | string, integer, long, double, boolean, timestamp, binary, large_binary | - | |
Default Code Template
R UDF Script
# If using Table API:
# function(table, port) {
#   return (table)
# }
# If using Tuple API:
# library(coro)
# coro::generator(function(tuple, port) {
#   yield (tuple)
# })
Output Ports
| Port | Mode |
|---|---|
| 0 | Set Snapshot |