Texera Documentation

Input Properties

Property	Requirement	Type	Default	Description
File	✓	String	-
Limit		Integer	-	Max output count
Offset		Integer	-	Starting point of output

Output Ports

Port	Mode
0	Set Snapshot

5.1.1.2 - CSV File Scan

Scan data from a CSV file

Input Properties

Property	Requirement	Type	Default	Description
File	✓	String	-
File Encoding	✓	UTF_8, UTF_16, US_ASCII	UTF_8	Decoding charset to use on input
Limit		Integer	-	Max output count
Offset		Integer	-	Starting point of output
Delimiter		String	,	Delimiter to separate each line into fields
Header		Boolean	true	Whether the CSV file contains a header line

Output Ports

Port	Mode
0	Set Snapshot
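
As a rough illustration of how Delimiter, Header, Limit, and Offset interact, the sketch below mimics a CSV scan with Python's standard `csv` module. The function name `scan_csv`, the `column-N` naming for header-less files, and the choice to apply Offset to data rows before Limit are assumptions made for the example, not a description of Texera's implementation.

```python
import csv
import io
from itertools import islice

def scan_csv(text, delimiter=",", header=True, limit=None, offset=0):
    """Hypothetical sketch: parse already-decoded CSV text into row dicts."""
    rows = list(csv.reader(io.StringIO(text), delimiter=delimiter))
    if not rows:
        return []
    if header:
        names, data = rows[0], rows[1:]
    else:
        # Assumed naming scheme for files without a header line.
        names = [f"column-{i + 1}" for i in range(len(rows[0]))]
        data = rows
    # Skip `offset` data rows, then emit at most `limit` rows.
    stop = None if limit is None else offset + limit
    return [dict(zip(names, row)) for row in islice(data, offset, stop)]

sample = "id;name\n1;ann\n2;bob\n3;cy\n"
result = scan_csv(sample, delimiter=";", limit=1, offset=1)
```

With the sample above, `result` holds only the second data row, because one row is skipped and one row is kept.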

5.1.1.3 - CSVOld File Scan

Scan data from a CSVOld file

Input Properties

Property	Requirement	Type	Default	Description
File	✓	String	-
File Encoding	✓	UTF_8, UTF_16, US_ASCII	UTF_8	Decoding charset to use on input
Limit		Integer	-	Max output count
Offset		Integer	-	Starting point of output
Delimiter		String	,	Delimiter to separate each line into fields
Header		Boolean	true	Whether the CSV file contains a header line

Output Ports

Port	Mode
0	Set Snapshot

5.1.1.4 - File Lister

Select a dataset version and output one filename tuple per file

Input Properties

Property	Requirement	Type	Default	Description
Dataset	✓	String	-

Output Ports

Port	Mode
0	Set Snapshot

5.1.1.5 - File Scan

Scan data from a file

Input Properties

Property	Requirement	Type	Default
File	✓	String	-
Encoding	✓	UTF_8, UTF_16, US_ASCII	UTF_8
Extract		Boolean	false
↳ Include Filename		Boolean	false
Attribute Type	✓	string, single string, integer, long, double, boolean, timestamp, binary, large binary	string
Attribute Name	✓	String	line
Limit		Integer	-
Offset		Integer	-

Output Ports

Port	Mode
0	Set Snapshot

5.1.1.6 - File Scan From Input

Scan data from file paths provided by input tuples

Input Properties

Property	Requirement	Type	Default
Encoding	✓	UTF_8, UTF_16, US_ASCII	UTF_8
Extract		Boolean	false
Include Filename		Boolean	false
Attribute Type	✓	string, single string, integer, long, double, boolean, timestamp, binary, large binary	string
Attribute Name	✓	String	line
Limit		Integer	-
Offset		Integer	-

Output Ports

Port	Mode
0	Set Snapshot

5.1.1.7 - JSONL File Scan

Scan data from a JSONL file

Input Properties

Property	Requirement	Type	Default	Description
File	✓	String	-
File Encoding	✓	UTF_8, UTF_16, US_ASCII	UTF_8	Decoding charset to use on input
Limit		Integer	-	Max output count
Offset		Integer	-	Starting point of output
Flatten	✓	Boolean	false	Flatten nested objects and arrays

Output Ports

Port	Mode
0	Set Snapshot
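
The Flatten option can be pictured with the following minimal sketch, which turns nested objects and arrays into a single-level record. The dotted key naming (`user.name`, `user.tags.0`) is an assumption for illustration; the column names Texera actually produces may differ.

```python
import json

def flatten(value, prefix=""):
    """Flatten nested dicts and lists into one dict with dotted keys."""
    if isinstance(value, dict):
        items = value.items()
    elif isinstance(value, list):
        items = ((str(i), v) for i, v in enumerate(value))
    else:
        # Leaf value: store it under the accumulated key.
        return {prefix: value}
    flat = {}
    for key, child in items:
        name = f"{prefix}.{key}" if prefix else str(key)
        flat.update(flatten(child, name))
    return flat

lines = ['{"id": 1, "user": {"name": "ann", "tags": ["a", "b"]}}']
records = [flatten(json.loads(line)) for line in lines]
```

Each JSONL line becomes one record; in this sketch, empty objects and empty arrays contribute no columns.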

5.1.1.8 - Text Input

Source data from manually entered text

Input Properties

Property	Requirement	Type	Default
Text	✓	String	-
Attribute Type	✓	string, single string, integer, long, double, boolean, timestamp, binary, large binary	string
Attribute Name	✓	String	line
Limit		Integer	-
Offset		Integer	-

Output Ports

Port	Mode
0	Set Snapshot

5.1.2 - Database Connector

Operators in the Database Connector category

Operators

Operator	Description
AsterixDB Source	Read data from an AsterixDB instance
MySQL Source	Read data from a MySQL instance
PostgreSQL Source	Read data from a PostgreSQL instance

Total: 3 operators

5.1.2.1 - AsterixDB Source

Read data from an AsterixDB instance

Input Properties

Property	Requirement	Type	Default	Description
Host	✓	String	-
Port	✓	String	default	A port number or ‘default’
Database	✓	String	-
Table Name	✓	String	-
Limit		Long	-	Max output count
Offset		Long	-	Starting point of output
Keyword Search?		Boolean	false
↳ Keyword Search Column		String	-
↳ Keywords to Search		String	-	"['hello', 'world'], {'mode':'any'}" OR "['hello', 'world'], {'mode':'all'}"
Progressive?		Boolean	false
↳ Batch by Column		String	-
↳ Min		String	auto
↳ Max		String	auto
↳ Batch by Interval		Long	1000000000
Geo Search?		Boolean	false
↳ Geo Search By Columns		List	-	Column(s) to check if any of them is in the bounding box below
↳ Geo Search Bounding Box		List	-	At least 2 entries should be provided to form a bounding box. format of each entry: long, lat
Regex Search?		Boolean	false
↳ Regex Search By Column		String	-
↳ Regex to Search		String	-
Filter Condition?		Boolean	false
↳ Predicates		List	-	Multiple predicates in OR
↳ Attribute	✓	String	-
↳ Condition	✓	=, >, >=, <, <=, !=, is null, is not null	-
↳ Value		String	-

Output Ports

Port	Mode
0	Set Snapshot

5.1.2.2 - MySQL Source

Read data from a MySQL instance

Input Properties

Property	Requirement	Type	Default	Description
Host	✓	String	-
Port	✓	String	default	A port number or ‘default’
Database	✓	String	-
Table Name	✓	String	-
Username	✓	String	-
Password	✓	String	-
Limit		Long	-	Max output count
Offset		Long	-	Starting point of output
Keyword Search?		Boolean	false
↳ Keyword Search Column		String	-
↳ Keywords to Search		String	-
Progressive?		Boolean	false
↳ Batch by Column		String	-
↳ Min		String	auto
↳ Max		String	auto
↳ Batch by Interval		Long	1000000000

Output Ports

Port	Mode
0	Set Snapshot
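
The Progressive? options suggest that the table is read in batches along one column. The sketch below shows one way such batch boundaries could be derived. It assumes half-open ranges `[lower, upper)` and that `auto` values for Min and Max have already been resolved to concrete numbers; `batch_ranges` is a hypothetical helper, not part of Texera.

```python
def batch_ranges(minimum, maximum, interval):
    """Yield (lower, upper) pairs that cover [minimum, maximum] in steps of `interval`."""
    if interval <= 0:
        raise ValueError("interval must be positive")
    lower = minimum
    while lower <= maximum:
        upper = lower + interval
        # Each batch would select rows with lower <= column < upper.
        yield lower, upper
        lower = upper

ranges = list(batch_ranges(0, 25, 10))
```

Each pair could then parameterize one range query on the Batch by Column attribute.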

5.1.2.3 - PostgreSQL Source

Read data from a PostgreSQL instance

Input Properties

Property	Requirement	Type	Default	Description
Host	✓	String	-
Port	✓	String	default	A port number or ‘default’
Database	✓	String	-
Table Name	✓	String	-
Username	✓	String	-
Password	✓	String	-
Limit		Long	-	Max output count
Offset		Long	-	Starting point of output
Keyword Search?		Boolean	false
↳ Keyword Search Column		String	-
↳ Keywords to Search		String	-	E.g. 'sore & throat' for AND; 'sore', 'throat' for OR. See the official PostgreSQL documentation for details
Progressive?		Boolean	false
↳ Batch by Column		String	-
↳ Min		String	auto
↳ Max		String	auto
↳ Batch by Interval		Long	1000000000

Output Ports

Port	Mode
0	Set Snapshot

5.1.3 - Search

Operators in the Search category

Operators

Operator	Description
Dictionary matcher	Matches tuples if they appear in a given dictionary
Keyword Search	Search for keyword(s) in a string column
Regular Expression	Search a regular expression in a string column
Substring Search	Search for substring(s) in a string column

Total: 4 operators

5.1.3.1 - Dictionary matcher

Matches tuples if they appear in a given dictionary

Input Properties

Property	Requirement	Type	Default	Description
Dictionary	✓	String	-	Dictionary values separated by a comma
Attribute	✓	String	-	Column name to match
Result Attribute	✓	String	matched	Column name of the matching result
Matching Type	✓	Scan, Substring, Conjunction	-

Output Ports

Port	Mode
0	Set Snapshot

5.1.3.2 - Keyword Search

Search for keyword(s) in a string column

Input Properties

Property	Requirement	Type	Default	Description
attribute	✓	String	-	Column to search keyword on
keywords	✓	String	-	Keywords

Output Ports

Port	Mode
0	Set Snapshot

5.1.3.3 - Regular Expression

Search a regular expression in a string column

Input Properties

Property	Requirement	Type	Default	Description
Case Insensitive		Boolean	false	Whether the regex match is case insensitive
Attribute	✓	String	-	Column to search regex on
Regex	✓	String	-	Regular expression

Output Ports

Port	Mode
0	Set Snapshot
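
The effect of the Case Insensitive flag can be sketched with Python's `re` module. The helper `regex_filter` is hypothetical, and the choice to keep a row when the pattern is found anywhere in the value (rather than requiring a full match) is an assumption.

```python
import re

def regex_filter(rows, attribute, regex, case_insensitive=False):
    """Keep rows whose `attribute` value contains a match for `regex`."""
    flags = re.IGNORECASE if case_insensitive else 0
    pattern = re.compile(regex, flags)
    return [row for row in rows if pattern.search(str(row[attribute]))]

rows = [{"text": "Hello World"}, {"text": "goodbye"}]
strict = regex_filter(rows, "text", "hello")
relaxed = regex_filter(rows, "text", "hello", case_insensitive=True)
```

With the default setting the pattern `hello` does not match `Hello World`; enabling Case Insensitive keeps that row.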

5.1.3.4 - Substring Search

Search for substring(s) in a string column

Input Properties

Property	Requirement	Type	Default	Description
attribute	✓	String	-	Column to search substring on
Substring	✓	String	-	Substring
Case Sensitive	✓	Boolean	false	Whether the substring match is case sensitive

Output Ports

Port	Mode
0	Set Snapshot

5.1.4 - Data Cleaning

Operators in the Data Cleaning category

Subcategories

Join
Set
Aggregate
Sort

Operators

Operator	Description
Distinct	Remove duplicate tuples
Filter	Performs a filter operation using OR between multiple predicates
Limit	Limit the number of output rows
Projection	Keeps or drops the selected columns
Type Casting	Cast between types

Total: 5 operators

5.1.4.1 - Join

Operators in the Join category

Operators

Operator	Description
Cartesian Product	Append fields together to get the cartesian product of two inputs
Hash Join	Join two inputs
Interval Join	Join two inputs with left table join key in the range of [right table join key, right table join key + constant value]

Total: 3 operators

5.1.4.1.1 - Cartesian Product

Append fields together to get the cartesian product of two inputs

Output Ports

Port	Mode
0	Set Snapshot
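
A minimal sketch of the idea, assuming the two inputs have distinct field names (in this sketch a right-hand field would overwrite a left-hand field of the same name); `cartesian_product` is a hypothetical helper.

```python
from itertools import product

def cartesian_product(left, right):
    """Pair every left row with every right row and append their fields."""
    return [{**l, **r} for l, r in product(left, right)]

pairs = cartesian_product([{"a": 1}, {"a": 2}], [{"b": "x"}, {"b": "y"}])
```

The output size is the product of the two input sizes, so two rows on each side yield four rows.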

5.1.4.1.2 - Hash Join

Join two inputs

Input Properties

Property	Requirement	Type	Default	Description
Left Input Attribute	✓	String	-	Attribute to be joined on the Left Input
Right Input Attribute	✓	String	-	Attribute to be joined on the Right Input
Join Type	✓	inner, left outer, right outer, full outer	inner	Select the join type to execute

Output Ports

Port	Mode
0	Set Snapshot
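
The four join types can be illustrated with a small build-and-probe sketch. It returns `(left_row, right_row)` pairs and uses `None` for a missing side so that no assumption about the merged output schema is needed; `hash_join` and its parameters are hypothetical names, not Texera's API.

```python
from collections import defaultdict

def hash_join(left, right, left_attr, right_attr, join_type="inner"):
    """Return (left_row, right_row) pairs; an unmatched side is None."""
    # Build phase: index right rows by their join key.
    index = defaultdict(list)
    for pos, row in enumerate(right):
        index[row[right_attr]].append(pos)
    matched_right = set()
    out = []
    # Probe phase: look up each left row in the index.
    for lrow in left:
        positions = index.get(lrow[left_attr], [])
        for pos in positions:
            matched_right.add(pos)
            out.append((lrow, right[pos]))
        if not positions and join_type in ("left outer", "full outer"):
            out.append((lrow, None))
    if join_type in ("right outer", "full outer"):
        out.extend((None, row) for pos, row in enumerate(right)
                   if pos not in matched_right)
    return out

users = [{"uid": 1}, {"uid": 2}]
orders = [{"user": 2, "item": "pen"}, {"user": 3, "item": "ink"}]
inner = hash_join(users, orders, "uid", "user")
full = hash_join(users, orders, "uid", "user", join_type="full outer")
```

Here `inner` contains the single matching pair, while `full` also keeps the unmatched user and the unmatched order.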

5.1.4.1.3 - Interval Join

Join two inputs with left table join key in the range of [right table join key, right table join key + constant value]

Input Properties

Property	Requirement	Type	Default	Description
Interval Constant	✓	Long	10	Left attribute lies between right attribute and right attribute + constant
Include Left Bound	✓	Boolean	true	Include the condition left attribute = right attribute
Include Right Bound	✓	Boolean	true	Include the condition left attribute = right attribute + constant
Time interval type		TimeIntervalType	day	Year, Month, Day, Hour, Minute or Second
Left Input attr	✓	String (integer, long, double, timestamp)	-	Choose one attribute in the left table
Right Input attr	✓	String	-	Choose one attribute in the right table

Output Ports

Port	Mode
0	Set Snapshot
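
The join condition and the two bound flags can be sketched as a predicate over numeric keys. The nested-loop join below is chosen for clarity only and is not claimed to be how the operator is implemented; timestamp keys, where Time interval type applies, would need date arithmetic that this sketch omits. `in_interval` and `interval_join` are hypothetical names.

```python
def in_interval(left_value, right_value, constant,
                include_left=True, include_right=True):
    """Check left_value against the range from right_value to right_value + constant."""
    lower, upper = right_value, right_value + constant
    above = left_value >= lower if include_left else left_value > lower
    below = left_value <= upper if include_right else left_value < upper
    return above and below

def interval_join(left, right, left_attr, right_attr, constant=10,
                  include_left=True, include_right=True):
    """Naive nested-loop join that keeps pairs satisfying the interval condition."""
    return [(l, r) for l in left for r in right
            if in_interval(l[left_attr], r[right_attr], constant,
                           include_left, include_right)]

joined = interval_join([{"t": 10}, {"t": 11}], [{"s": 0}], "t", "s")
```

With both bounds included and a constant of 10, the left key 10 matches the right key 0 but the left key 11 does not.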

5.1.4.2 - Set

Operators in the Set category

Operators

Operator	Description
Difference	Find the set difference of two inputs
Intersect	Take the intersection of two inputs
SymmetricDifference	Find the symmetric difference (the set of elements which are in either of the sets, but not in their intersection) of two inputs
Union	Unions the output rows from multiple input operators

Total: 4 operators

5.1.4.3 - Aggregate

Operators in the Aggregate category

Operators

Operator	Description
Aggregate	Calculate different types of aggregation values

Total: 1 operator

5.1.4.3.1 - Aggregate

Calculate different types of aggregation values

Input Properties

Property	Requirement	Type	Default	Description
Aggregations	✓	List	-	Multiple aggregation functions (min: 1, aggregations cannot be empty)
↳ Aggregate Func	✓	sum, count, average, min, max, concat	-	Sum, count, average, min, max, or concat
↳ Attribute	✓	String	-	Column to aggregate
↳ Result Attribute	✓	String	-	Column name of the aggregation result
Group By Keys		List	-	Group by columns

Output Ports

Port	Mode
0	Set Snapshot
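
A minimal sketch of how a list of aggregations combines with Group By Keys. The handling of null values (skipped here) and the comma separator used for `concat` are assumptions; `aggregate` is a hypothetical helper.

```python
from collections import defaultdict

AGG_FUNCS = {
    "sum": sum,
    "count": len,
    "average": lambda xs: sum(xs) / len(xs),
    "min": min,
    "max": max,
    # Assumed separator; the real operator may join values differently.
    "concat": lambda xs: ",".join(str(x) for x in xs),
}

def aggregate(rows, aggregations, group_by=()):
    """aggregations: list of (func, attribute, result_attribute) triples."""
    groups = defaultdict(list)
    for row in rows:
        groups[tuple(row[k] for k in group_by)].append(row)
    result = []
    for key, members in groups.items():
        out = dict(zip(group_by, key))
        for func, attribute, result_attribute in aggregations:
            # Assumption: null values are ignored by every function.
            values = [m[attribute] for m in members if m[attribute] is not None]
            out[result_attribute] = AGG_FUNCS[func](values)
        result.append(out)
    return result

rows = [{"dept": "a", "pay": 10}, {"dept": "a", "pay": 30}, {"dept": "b", "pay": 5}]
summary = aggregate(rows, [("average", "pay", "avg_pay"), ("count", "pay", "n")],
                    group_by=["dept"])
```

Each output row carries the group-by columns followed by one column per aggregation, named by its Result Attribute.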

5.1.4.4 - Sort

Operators in the Sort category

Operators

Operator	Description
Sort	Sort based on the columns and sorting methods
Sort Partitions	Sort Partitions
Stable Merge Sort	Stable per-partition sort with multi-key ordering (incremental stack of sorted buckets)

Total: 3 operators

5.1.4.4.1 - Sort

Sort based on the columns and sorting methods

Input Properties

Property	Requirement	Type	Default	Description
Attributes	✓	List	-	Column to perform sorting on
↳ Attribute	✓	String	-	Attribute name to sort by
↳ Sort Preference	✓	ASC, DESC	-	Sort preference (ASC or DESC)

Output Ports

Port	Mode
0	Set Snapshot
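
Sorting on several attributes with individual ASC or DESC preferences can be sketched by applying a stable sort once per key, from the last key to the first. `sort_rows` is a hypothetical helper, and treating the first listed attribute as the primary key is an assumption.

```python
def sort_rows(rows, attributes):
    """attributes: list of (attribute, preference), preference is 'ASC' or 'DESC'."""
    result = list(rows)
    # Python's sort is stable, so sorting by the least significant key first
    # and the most significant key last yields the multi-key order.
    for attribute, preference in reversed(attributes):
        result.sort(key=lambda row: row[attribute],
                    reverse=(preference == "DESC"))
    return result

rows = [{"n": "b", "s": 1}, {"n": "a", "s": 2}, {"n": "c", "s": 2}]
ordered = sort_rows(rows, [("s", "DESC"), ("n", "ASC")])
```

Rows with the highest `s` come first, and ties on `s` are ordered by `n` ascending.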

5.1.4.4.2 - Sort Partitions

Sort Partitions

Input Properties

Property	Requirement	Type	Default	Description
Attribute	✓	String (integer, long, double)	-	Attribute to sort (must be numerical)
Attribute Domain Min	✓	Long	0	Minimum value of the domain of the attribute
Attribute Domain Max	✓	Long	0	Maximum value of the domain of the attribute

Output Ports

Port	Mode
0	Set Snapshot
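
One plausible reading of the domain properties is that the attribute's domain is split into equal ranges, rows are routed to the partition covering their value, and each partition is then sorted. The sketch below follows that reading; the equal-width split, the clamping of out-of-domain values, and the name `range_partition_sort` are all assumptions rather than documented behavior.

```python
def range_partition_sort(rows, attribute, domain_min, domain_max, num_partitions=2):
    """Route rows to equal-width ranges of the domain, then sort each partition."""
    span = domain_max - domain_min + 1
    partitions = [[] for _ in range(num_partitions)]
    for row in rows:
        # Clamp so out-of-domain values land in the first or last partition.
        clamped = min(max(row[attribute], domain_min), domain_max)
        index = int((clamped - domain_min) * num_partitions // span)
        partitions[index].append(row)
    return [sorted(p, key=lambda r: r[attribute]) for p in partitions]

rows = [{"v": 7}, {"v": 1}, {"v": 5}, {"v": 3}]
parts = range_partition_sort(rows, "v", domain_min=0, domain_max=9)
```

Because the ranges do not overlap, reading the sorted partitions in order gives a globally sorted result.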

5.1.4.4.3 - Stable Merge Sort

Stable per-partition sort with multi-key ordering (incremental stack of sorted buckets)

Input Properties

Property	Requirement	Type	Default	Description
Sort Keys	✓	List	-	List of attributes to sort by with ordering preferences
↳ Attribute	✓	String	-	Attribute name to sort by
↳ Sort Preference	✓	ASC, DESC	-	Sort preference (ASC or DESC)

Output Ports

Port	Mode
0	Set Snapshot
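
The phrase "incremental stack of sorted buckets" can be read as a bottom-up merge sort that pushes each arriving tuple as a one-element bucket and merges neighbouring buckets of equal size. The sketch below implements that reading for a single partition; it is an interpretation of the description, and all function names are hypothetical.

```python
def make_compare(sort_keys):
    """Build a comparator from (attribute, 'ASC' or 'DESC') pairs."""
    def compare(a, b):
        for attribute, preference in sort_keys:
            if a[attribute] != b[attribute]:
                less = a[attribute] < b[attribute]
                if preference == "DESC":
                    less = not less
                return -1 if less else 1
        return 0
    return compare

def merge(older, newer, compare):
    """Stable merge: on ties, rows from the older bucket come first."""
    merged, i, j = [], 0, 0
    while i < len(older) and j < len(newer):
        if compare(newer[j], older[i]) < 0:
            merged.append(newer[j])
            j += 1
        else:
            merged.append(older[i])
            i += 1
    merged.extend(older[i:])
    merged.extend(newer[j:])
    return merged

def stable_merge_sort(rows, sort_keys):
    compare = make_compare(sort_keys)
    stack = []  # sorted buckets, oldest at the bottom
    for row in rows:
        stack.append([row])
        # Merge equal-sized neighbours so bucket sizes stay powers of two.
        while len(stack) >= 2 and len(stack[-1]) == len(stack[-2]):
            newer = stack.pop()
            older = stack.pop()
            stack.append(merge(older, newer, compare))
    # Input exhausted: fold the remaining buckets from the top down.
    while len(stack) >= 2:
        newer = stack.pop()
        older = stack.pop()
        stack.append(merge(older, newer, compare))
    return stack[0] if stack else []

rows = [{"k": 2, "id": "a"}, {"k": 1, "id": "b"}, {"k": 2, "id": "c"},
        {"k": 1, "id": "d"}, {"k": 2, "id": "e"}]
ordered = stable_merge_sort(rows, [("k", "ASC")])
```

Rows with equal keys keep their arrival order, which is what makes the sort stable.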

5.1.4.5 - Distinct

Remove duplicate tuples

Output Ports

Port	Mode
0	Set Snapshot

5.1.4.6 - Filter

Performs a filter operation using OR between multiple predicates

Input Properties

Property	Requirement	Type	Default	Description
Predicates	✓	List	-	Multiple predicates in OR
↳ Attribute	✓	String	-
↳ Condition	✓	=, >, >=, <, <=, !=, is null, is not null	-
↳ Value		String	-

Output Ports

Port	Mode
0	Set Snapshot
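
The OR semantics across predicates can be sketched as follows. Because Value is a String property, the sketch converts it to the type of the field before comparing; that conversion rule, the treatment of null fields as non-matching for comparison operators, and the names `matches` and `filter_rows` are assumptions.

```python
import operator

CONDITIONS = {
    "=": operator.eq, ">": operator.gt, ">=": operator.ge,
    "<": operator.lt, "<=": operator.le, "!=": operator.ne,
}

def matches(row, predicate):
    """predicate: (attribute, condition, value) with value given as a string."""
    attribute, condition, value = predicate
    field = row.get(attribute)
    if condition == "is null":
        return field is None
    if condition == "is not null":
        return field is not None
    if field is None:
        return False
    # Assumed rule: parse the string value using the field's own type.
    return CONDITIONS[condition](field, type(field)(value))

def filter_rows(rows, predicates):
    """Keep a row if at least one predicate holds (OR semantics)."""
    return [row for row in rows if any(matches(row, p) for p in predicates)]

rows = [{"age": 25, "city": None}, {"age": 40, "city": "Irvine"},
        {"age": 31, "city": "LA"}]
kept = filter_rows(rows, [("age", ">", "35"), ("city", "is null", None)])
```

The first row is kept because its city is null, the second because its age exceeds 35, and the third satisfies neither predicate.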

5.1.4.7 - Limit

Limit the number of output rows

Input Properties

Property	Requirement	Type	Default	Description
Limit	✓	Integer	0	The max number of output rows

Output Ports

Port	Mode
0	Set Snapshot

5.1.4.8 - Projection

Keeps or drops the selected columns

Input Properties

Property	Requirement	Type	Default	Description
Drop Option	✓	Boolean	false	Check to drop the selected attributes
Attributes	✓	List	-
↳ Attribute	✓	String	-	Attribute name in the schema
↳ Alias		String	-	Renamed attribute name

Output Ports

Port	Mode
0	Set Snapshot
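
A minimal sketch of the two modes controlled by Drop Option. It assumes that aliases only apply when attributes are kept and are ignored when they are dropped; `project` is a hypothetical helper.

```python
def project(rows, attributes, drop=False):
    """attributes: list of (attribute, alias) pairs; alias may be None."""
    selected = [name for name, _ in attributes]
    if drop:
        # Drop mode: remove the selected attributes and keep everything else.
        return [{k: v for k, v in row.items() if k not in selected}
                for row in rows]
    # Keep mode: emit only the selected attributes, renamed where an alias is set.
    return [{(alias or name): row[name] for name, alias in attributes}
            for row in rows]

rows = [{"id": 1, "name": "ann", "age": 30}]
kept = project(rows, [("id", None), ("name", "full_name")])
dropped = project(rows, [("age", None)], drop=True)
```

In keep mode the output columns follow the order of the Attributes list; in drop mode the remaining columns keep their original order.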

5.1.4.9 - Type Casting

Cast between types

Input Properties

Property	Requirement	Type	Default	Description
TypeCasting Units	✓	List	-	Multiple type castings
↳ Attribute	✓	String	-	Attribute for type casting
↳ Cast type	✓	string, integer, long, double, boolean, timestamp, binary, large_binary	-	Result type after type casting

Output Ports

Port	Mode
0	Set Snapshot
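
Applying several type castings to each tuple can be sketched as a lookup from cast type to conversion function. The parsing rules shown (for example, which strings count as `true` and the ISO timestamp format) are assumptions for illustration, and `cast_rows` is a hypothetical helper.

```python
from datetime import datetime

# Assumed parsing rules per target type; the operator's actual rules may differ.
CASTERS = {
    "string": str,
    "integer": int,
    "long": int,
    "double": float,
    "boolean": lambda v: str(v).strip().lower() in ("true", "1"),
    "timestamp": lambda v: datetime.fromisoformat(str(v)),
    "binary": lambda v: str(v).encode("utf-8"),
}

def cast_rows(rows, units):
    """units: list of (attribute, cast_type) pairs applied to every row."""
    result = []
    for row in rows:
        converted = dict(row)
        for attribute, cast_type in units:
            converted[attribute] = CASTERS[cast_type](row[attribute])
        result.append(converted)
    return result

raw = [{"qty": "3", "price": "1.5", "ok": "True", "day": "2024-01-31"}]
typed = cast_rows(raw, [("qty", "integer"), ("price", "double"),
                        ("ok", "boolean"), ("day", "timestamp")])
```

Attributes not named in a casting unit pass through unchanged.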

5.1.5 - Machine Learning

Operators in the Machine Learning category

Subcategories

5.1.5.1 - Sklearn

Operators in the Sklearn category

Subcategories

Sklearn Training

Operators

Operator	Description
Adaptive Boosting	Sklearn Adaptive Boosting Operator
Bagging	Sklearn Bagging Operator
Bernoulli Naive Bayes	Sklearn Bernoulli Naive Bayes Operator
Complement Naive Bayes	Sklearn Complement Naive Bayes Operator
Decision Tree	Sklearn Decision Tree Operator
Dummy Classifier	Sklearn Dummy Classifier Operator
Extra Tree	Sklearn Extra Tree Operator
Extra Trees	Sklearn Extra Trees Operator
Gaussian Naive Bayes	Sklearn Gaussian Naive Bayes Operator
Gradient Boosting	Sklearn Gradient Boosting Operator
K-nearest Neighbors	Sklearn K-nearest Neighbors Operator
Linear Regression	Sklearn Linear Regression Operator
Linear Support Vector Machine	Sklearn Linear Support Vector Machine Operator
Logistic Regression	Sklearn Logistic Regression Operator
Logistic Regression Cross Validation	Sklearn Logistic Regression Cross Validation Operator
Multi-layer Perceptron	Sklearn Multi-layer Perceptron Operator
Multinomial Naive Bayes	Sklearn Multinomial Naive Bayes Operator
Nearest Centroid	Sklearn Nearest Centroid Operator
Passive Aggressive	Sklearn Passive Aggressive Operator
Linear Perceptron	Sklearn Linear Perceptron Operator
Sklearn Prediction	Sklearn Prediction Operator
Probability Calibration	Sklearn Probability Calibration Operator
Random Forest	Sklearn Random Forest Operator
Ridge Regression	Sklearn Ridge Regression Operator
Ridge Regression Cross Validation	Sklearn Ridge Regression Cross Validation Operator
Stochastic Gradient Descent	Sklearn Stochastic Gradient Descent Operator
Support Vector Machine	Sklearn Support Vector Machine Operator
Sklearn Testing	Generates scorers for a Sklearn model

Total: 28 operators

5.1.5.1.1 - Sklearn Training

Operators in the Sklearn Training category

Operators

Operator	Description
Training: Adaptive Boosting	Sklearn Training: Adaptive Boosting Operator
Training: Bagging Training	Sklearn Training: Bagging Training Operator
Training: Bernoulli Naive Bayes	Sklearn Training: Bernoulli Naive Bayes Operator
Training: Complement Naive Bayes	Sklearn Training: Complement Naive Bayes Operator
Training: Decision Tree	Sklearn Training: Decision Tree Operator
Training: Dummy Classifier	Sklearn Training: Dummy Classifier Operator
Training: Extra Tree	Sklearn Training: Extra Tree Operator
Training: Extra Trees	Sklearn Training: Extra Trees Operator
Training: Gaussian Naive Bayes	Sklearn Training: Gaussian Naive Bayes Operator
Training: Gradient Boosting	Sklearn Training: Gradient Boosting Operator
Training: K-nearest Neighbors	Sklearn Training: K-nearest Neighbors Operator
Training: Linear Regression	Sklearn Training: Linear Regression Operator
Training: Linear Support Vector Machine	Sklearn Training: Linear Support Vector Machine Operator
Training: Logistic Regression	Sklearn Training: Logistic Regression Operator
Training: Logistic Regression Cross Validation	Sklearn Training: Logistic Regression Cross Validation Operator
Training: Multi-layer Perceptron	Sklearn Training: Multi-layer Perceptron Operator
Training: Multinomial Naive Bayes	Sklearn Training: Multinomial Naive Bayes Operator
Training: Nearest Centroid	Sklearn Training: Nearest Centroid Operator
Training: Passive Aggressive	Sklearn Training: Passive Aggressive Operator
Training: Linear Perceptron	Sklearn Training: Linear Perceptron Operator
Training: Probability Calibration	Sklearn Training: Probability Calibration Operator
Training: Random Forest	Sklearn Training: Random Forest Operator
Training: Ridge Regression	Sklearn Training: Ridge Regression Operator
Training: Ridge Regression Cross Validation	Sklearn Training: Ridge Regression Cross Validation Operator
Training: Stochastic Gradient Descent	Sklearn Training: Stochastic Gradient Descent Operator
Training: Support Vector Machine	Sklearn Training: Support Vector Machine Operator

Total: 26 operators

5.1.5.1.1.1 - Training: Adaptive Boosting

Sklearn Training: Adaptive Boosting Operator

Input Properties

Property	Requirement	Type	Default	Description
Target Attribute	✓	String	-	Attribute in your dataset corresponding to target
Count Vectorizer		Boolean	false	Convert a collection of text documents to a matrix of token counts
↳ Text Attribute		String	-	Attribute in your dataset with text to vectorize
↳ Tfidf Transformer		Boolean	false	Transform a count matrix to a normalized tf or tf-idf representation

Output Ports

Port	Mode
0	Set Snapshot
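
The Count Vectorizer and Tfidf Transformer options, which recur on every training operator in this category, can be pictured with the standard-library sketch below. It is modeled on scikit-learn's default settings (lowercasing, tokens of two or more word characters, smoothed idf, L2 row normalization) but does not call scikit-learn, and how the operators themselves wire these steps to the Text Attribute is not shown here. `count_matrix` and `tfidf` are hypothetical helpers.

```python
import math
import re
from collections import Counter

def count_matrix(documents):
    """Token-count matrix: one row per document, one column per vocabulary term."""
    tokenized = [re.findall(r"\b\w\w+\b", doc.lower()) for doc in documents]
    vocabulary = sorted({tok for toks in tokenized for tok in toks})
    counts = [Counter(toks) for toks in tokenized]
    return vocabulary, [[c[term] for term in vocabulary] for c in counts]

def tfidf(matrix):
    """Weight counts by smoothed idf, then scale each row to unit L2 norm."""
    n_docs = len(matrix)
    n_terms = len(matrix[0]) if matrix else 0
    df = [sum(1 for row in matrix if row[j] > 0) for j in range(n_terms)]
    idf = [math.log((1 + n_docs) / (1 + d)) + 1 for d in df]
    normalized = []
    for row in matrix:
        weighted = [row[j] * idf[j] for j in range(n_terms)]
        norm = math.sqrt(sum(x * x for x in weighted))
        normalized.append([x / norm for x in weighted] if norm else weighted)
    return normalized

docs = ["the cat sat", "the cat ran fast"]
vocabulary, counts = count_matrix(docs)
weights = tfidf(counts)
```

Terms that occur in every document (here `cat` and `the`) receive the lowest idf weight, so rarer terms dominate each normalized row.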

5.1.5.1.1.2 - Training: Bagging Training

Sklearn Training: Bagging Training Operator

Input Properties

Property	Requirement	Type	Default	Description
Target Attribute	✓	String	-	Attribute in your dataset corresponding to target
Count Vectorizer		Boolean	false	Convert a collection of text documents to a matrix of token counts
↳ Text Attribute		String	-	Attribute in your dataset with text to vectorize
↳ Tfidf Transformer		Boolean	false	Transform a count matrix to a normalized tf or tf-idf representation

Output Ports

Port	Mode
0	Set Snapshot

5.1.5.1.1.3 - Training: Bernoulli Naive Bayes

Sklearn Training: Bernoulli Naive Bayes Operator

Input Properties

Property	Requirement	Type	Default	Description
Target Attribute	✓	String	-	Attribute in your dataset corresponding to target
Count Vectorizer		Boolean	false	Convert a collection of text documents to a matrix of token counts
↳ Text Attribute		String	-	Attribute in your dataset with text to vectorize
↳ Tfidf Transformer		Boolean	false	Transform a count matrix to a normalized tf or tf-idf representation

Output Ports

Port	Mode
0	Set Snapshot

5.1.5.1.1.4 - Training: Complement Naive Bayes

Sklearn Training: Complement Naive Bayes Operator

Input Properties

Property	Requirement	Type	Default	Description
Target Attribute	✓	String	-	Attribute in your dataset corresponding to target
Count Vectorizer		Boolean	false	Convert a collection of text documents to a matrix of token counts
↳ Text Attribute		String	-	Attribute in your dataset with text to vectorize
↳ Tfidf Transformer		Boolean	false	Transform a count matrix to a normalized tf or tf-idf representation

Output Ports

Port	Mode
0	Set Snapshot

5.1.5.1.1.5 - Training: Decision Tree

Sklearn Training: Decision Tree Operator

Input Properties

Property	Requirement	Type	Default	Description
Target Attribute	✓	String	-	Attribute in your dataset corresponding to target
Count Vectorizer		Boolean	false	Convert a collection of text documents to a matrix of token counts
↳ Text Attribute		String	-	Attribute in your dataset with text to vectorize
↳ Tfidf Transformer		Boolean	false	Transform a count matrix to a normalized tf or tf-idf representation

Output Ports

Port	Mode
0	Set Snapshot

5.1.5.1.1.6 - Training: Dummy Classifier

Sklearn Training: Dummy Classifier Operator

Input Properties

Property	Requirement	Type	Default	Description
Target Attribute	✓	String	-	Attribute in your dataset corresponding to target
Count Vectorizer		Boolean	false	Convert a collection of text documents to a matrix of token counts
↳ Text Attribute		String	-	Attribute in your dataset with text to vectorize
↳ Tfidf Transformer		Boolean	false	Transform a count matrix to a normalized tf or tf-idf representation

Output Ports

Port	Mode
0	Set Snapshot

5.1.5.1.1.7 - Training: Extra Tree

Sklearn Training: Extra Tree Operator

Input Properties

Property	Requirement	Type	Default	Description
Target Attribute	✓	String	-	Attribute in your dataset corresponding to target
Count Vectorizer		Boolean	false	Convert a collection of text documents to a matrix of token counts
↳ Text Attribute		String	-	Attribute in your dataset with text to vectorize
↳ Tfidf Transformer		Boolean	false	Transform a count matrix to a normalized tf or tf-idf representation

Output Ports

Port	Mode
0	Set Snapshot

5.1.5.1.1.8 - Training: Extra Trees

Sklearn Training: Extra Trees Operator

Input Properties

Property	Requirement	Type	Default	Description
Target Attribute	✓	String	-	Attribute in your dataset corresponding to target
Count Vectorizer		Boolean	false	Convert a collection of text documents to a matrix of token counts
↳ Text Attribute		String	-	Attribute in your dataset with text to vectorize
↳ Tfidf Transformer		Boolean	false	Transform a count matrix to a normalized tf or tf-idf representation

Output Ports

Port	Mode
0	Set Snapshot

5.1.5.1.1.9 - Training: Gaussian Naive Bayes

Sklearn Training: Gaussian Naive Bayes Operator

Input Properties

Property	Requirement	Type	Default	Description
Target Attribute	✓	String	-	Attribute in your dataset corresponding to target
Count Vectorizer		Boolean	false	Convert a collection of text documents to a matrix of token counts
↳ Text Attribute		String	-	Attribute in your dataset with text to vectorize
↳ Tfidf Transformer		Boolean	false	Transform a count matrix to a normalized tf or tf-idf representation

Output Ports

Port	Mode
0	Set Snapshot

5.1.5.1.1.10 - Training: Gradient Boosting

Sklearn Training: Gradient Boosting Operator

Input Properties

Property	Requirement	Type	Default	Description
Target Attribute	✓	String	-	Attribute in your dataset corresponding to target
Count Vectorizer		Boolean	false	Convert a collection of text documents to a matrix of token counts
↳ Text Attribute		String	-	Attribute in your dataset with text to vectorize
↳ Tfidf Transformer		Boolean	false	Transform a count matrix to a normalized tf or tf-idf representation

Output Ports

Port	Mode
0	Set Snapshot

5.1.5.1.1.11 - Training: K-nearest Neighbors

Sklearn Training: K-nearest Neighbors Operator

Input Properties

Property	Requirement	Type	Default	Description
Target Attribute	✓	String	-	Attribute in your dataset corresponding to target
Count Vectorizer		Boolean	false	Convert a collection of text documents to a matrix of token counts
↳ Text Attribute		String	-	Attribute in your dataset with text to vectorize
↳ Tfidf Transformer		Boolean	false	Transform a count matrix to a normalized tf or tf-idf representation

Output Ports

Port	Mode
0	Set Snapshot

5.1.5.1.1.12 - Training: Linear Perceptron

Sklearn Training: Linear Perceptron Operator

Input Properties

Property	Requirement	Type	Default	Description
Target Attribute	✓	String	-	Attribute in your dataset corresponding to target
Count Vectorizer		Boolean	false	Convert a collection of text documents to a matrix of token counts
↳ Text Attribute		String	-	Attribute in your dataset with text to vectorize
↳ Tfidf Transformer		Boolean	false	Transform a count matrix to a normalized tf or tf-idf representation

Output Ports

Port	Mode
0	Set Snapshot

5.1.5.1.1.13 - Training: Linear Regression

Sklearn Training: Linear Regression Operator

Input Properties

Property	Requirement	Type	Default	Description
Target Attribute	✓	String	-	Attribute in your dataset corresponding to target
Count Vectorizer		Boolean	false	Convert a collection of text documents to a matrix of token counts
↳ Text Attribute		String	-	Attribute in your dataset with text to vectorize
↳ Tfidf Transformer		Boolean	false	Transform a count matrix to a normalized tf or tf-idf representation

Output Ports

Port	Mode
0	Set Snapshot

5.1.5.1.1.14 - Training: Linear Support Vector Machine

Sklearn Training: Linear Support Vector Machine Operator

Input Properties

Property	Requirement	Type	Default	Description
Target Attribute	✓	String	-	Attribute in your dataset corresponding to target
Count Vectorizer		Boolean	false	Convert a collection of text documents to a matrix of token counts
↳ Text Attribute		String	-	Attribute in your dataset with text to vectorize
↳ Tfidf Transformer		Boolean	false	Transform a count matrix to a normalized tf or tf-idf representation

Output Ports

Port	Mode
0	Set Snapshot

5.1.5.1.1.15 - Training: Logistic Regression

Sklearn Training: Logistic Regression Operator

Input Properties

Property	Requirement	Type	Default	Description
Target Attribute	✓	String	-	Attribute in your dataset corresponding to target
Count Vectorizer		Boolean	false	Convert a collection of text documents to a matrix of token counts
↳ Text Attribute		String	-	Attribute in your dataset with text to vectorize
↳ Tfidf Transformer		Boolean	false	Transform a count matrix to a normalized tf or tf-idf representation

Output Ports

Port	Mode
0	Set Snapshot

5.1.5.1.1.16 - Training: Logistic Regression Cross Validation

Sklearn Training: Logistic Regression Cross Validation Operator

Input Properties

Property	Requirement	Type	Default	Description
Target Attribute	✓	String	-	Attribute in your dataset corresponding to target
Count Vectorizer		Boolean	false	Convert a collection of text documents to a matrix of token counts
↳ Text Attribute		String	-	Attribute in your dataset with text to vectorize
↳ Tfidf Transformer		Boolean	false	Transform a count matrix to a normalized tf or tf-idf representation

Output Ports

Port	Mode
0	Set Snapshot

5.1.5.1.1.17 - Training: Multi-layer Perceptron

Sklearn Training: Multi-layer Perceptron Operator

Input Properties

Property	Requirement	Type	Default	Description
Target Attribute	✓	String	-	Attribute in your dataset corresponding to target
Count Vectorizer		Boolean	false	Convert a collection of text documents to a matrix of token counts
↳ Text Attribute		String	-	Attribute in your dataset with text to vectorize
↳ Tfidf Transformer		Boolean	false	Transform a count matrix to a normalized tf or tf-idf representation

Output Ports

Port	Mode
0	Set Snapshot

5.1.5.1.1.18 - Training: Multinomial Naive Bayes

Sklearn Training: Multinomial Naive Bayes Operator

Input Properties

Property	Requirement	Type	Default	Description
Target Attribute	✓	String	-	Attribute in your dataset corresponding to target
Count Vectorizer		Boolean	false	Convert a collection of text documents to a matrix of token counts
↳ Text Attribute		String	-	Attribute in your dataset with text to vectorize
↳ Tfidf Transformer		Boolean	false	Transform a count matrix to a normalized tf or tf-idf representation

Output Ports

Port	Mode
0	Set Snapshot

5.1.5.1.1.19 - Training: Nearest Centroid

Sklearn Training: Nearest Centroid Operator

Input Properties

Property	Requirement	Type	Default	Description
Target Attribute	✓	String	-	Attribute in your dataset corresponding to target
Count Vectorizer		Boolean	false	Convert a collection of text documents to a matrix of token counts
↳ Text Attribute		String	-	Attribute in your dataset with text to vectorize
↳ Tfidf Transformer		Boolean	false	Transform a count matrix to a normalized tf or tf-idf representation

Output Ports

Port	Mode
0	Set Snapshot

5.1.5.1.1.20 - Training: Passive Aggressive

Sklearn Training: Passive Aggressive Operator

Input Properties

Property	Requirement	Type	Default	Description
Target Attribute	✓	String	-	Attribute in your dataset corresponding to target
Count Vectorizer		Boolean	false	Convert a collection of text documents to a matrix of token counts
↳ Text Attribute		String	-	Attribute in your dataset with text to vectorize
↳ Tfidf Transformer		Boolean	false	Transform a count matrix to a normalized tf or tf-idf representation

Output Ports

Port	Mode
0	Set Snapshot

5.1.5.1.1.21 - Training: Probability Calibration

Sklearn Training: Probability Calibration Operator

Input Properties

Property	Requirement	Type	Default	Description
Target Attribute	✓	String	-	Attribute in your dataset corresponding to target
Count Vectorizer		Boolean	false	Convert a collection of text documents to a matrix of token counts
↳ Text Attribute		String	-	Attribute in your dataset with text to vectorize
↳ Tfidf Transformer		Boolean	false	Transform a count matrix to a normalized tf or tf-idf representation

Output Ports

Port	Mode
0	Set Snapshot

5.1.5.1.1.22 - Training: Random Forest

Sklearn Training: Random Forest Operator

Input Properties

Property	Requirement	Type	Default	Description
Target Attribute	✓	String	-	Attribute in your dataset corresponding to target
Count Vectorizer		Boolean	false	Convert a collection of text documents to a matrix of token counts
↳ Text Attribute		String	-	Attribute in your dataset with text to vectorize
↳ Tfidf Transformer		Boolean	false	Transform a count matrix to a normalized tf or tf-idf representation

Output Ports

Port	Mode
0	Set Snapshot

5.1.5.1.1.23 - Training: Ridge Regression

Sklearn Training: Ridge Regression Operator

Input Properties

Property	Requirement	Type	Default	Description
Target Attribute	✓	String	-	Attribute in your dataset corresponding to target
Count Vectorizer		Boolean	false	Convert a collection of text documents to a matrix of token counts
↳ Text Attribute		String	-	Attribute in your dataset with text to vectorize
↳ Tfidf Transformer		Boolean	false	Transform a count matrix to a normalized tf or tf-idf representation

Output Ports

Port	Mode
0	Set Snapshot

5.1.5.1.1.24 - Training: Ridge Regression Cross Validation

Sklearn Training: Ridge Regression Cross Validation Operator

Input Properties

Property	Requirement	Type	Default	Description
Target Attribute	✓	String	-	Attribute in your dataset corresponding to target
Count Vectorizer		Boolean	false	Convert a collection of text documents to a matrix of token counts
↳ Text Attribute		String	-	Attribute in your dataset with text to vectorize
↳ Tfidf Transformer		Boolean	false	Transform a count matrix to a normalized tf or tf-idf representation

Output Ports

Port	Mode
0	Set Snapshot

5.1.5.1.1.25 - Training: Stochastic Gradient Descent

Sklearn Training: Stochastic Gradient Descent Operator

Input Properties

Property	Requirement	Type	Default	Description
Target Attribute	✓	String	-	Attribute in your dataset corresponding to target
Count Vectorizer		Boolean	false	Convert a collection of text documents to a matrix of token counts
↳ Text Attribute		String	-	Attribute in your dataset with text to vectorize
↳ Tfidf Transformer		Boolean	false	Transform a count matrix to a normalized tf or tf-idf representation

Output Ports

Port	Mode
0	Set Snapshot

5.1.5.1.1.26 - Training: Support Vector Machine

Sklearn Training: Support Vector Machine Operator

Input Properties

Property	Requirement	Type	Default	Description
Target Attribute	✓	String	-	Attribute in your dataset corresponding to target
Count Vectorizer		Boolean	false	Convert a collection of text documents to a matrix of token counts
↳ Text Attribute		String	-	Attribute in your dataset with text to vectorize
↳ Tfidf Transformer		Boolean	false	Transform a count matrix to a normalized tf or tf-idf representation

Output Ports

Port	Mode
0	Set Snapshot

5.1.5.1.2 - Adaptive Boosting

Sklearn Adaptive Boosting Operator

Input Properties

Property	Requirement	Type	Default	Description
Target Attribute	✓	String	-	Attribute in your dataset corresponding to target
Count Vectorizer		Boolean	false	Convert a collection of text documents to a matrix of token counts
↳ Text Attribute		String	-	Attribute in your dataset with text to vectorize
↳ Tfidf Transformer		Boolean	false	Transform a count matrix to a normalized tf or tf-idf representation

Output Ports

Port	Mode
0	Set Snapshot

5.1.5.1.3 - Bagging

Sklearn Bagging Operator

Input Properties

Property	Requirement	Type	Default	Description
Target Attribute	✓	String	-	Attribute in your dataset corresponding to target
Count Vectorizer		Boolean	false	Convert a collection of text documents to a matrix of token counts
↳ Text Attribute		String	-	Attribute in your dataset with text to vectorize
↳ Tfidf Transformer		Boolean	false	Transform a count matrix to a normalized tf or tf-idf representation

Output Ports

Port	Mode
0	Set Snapshot

5.1.5.1.4 - Bernoulli Naive Bayes

Sklearn Bernoulli Naive Bayes Operator

Input Properties

Property	Requirement	Type	Default	Description
Target Attribute	✓	String	-	Attribute in your dataset corresponding to target
Count Vectorizer		Boolean	false	Convert a collection of text documents to a matrix of token counts
↳ Text Attribute		String	-	Attribute in your dataset with text to vectorize
↳ Tfidf Transformer		Boolean	false	Transform a count matrix to a normalized tf or tf-idf representation

Output Ports

Port	Mode
0	Set Snapshot

5.1.5.1.5 - Complement Naive Bayes

Sklearn Complement Naive Bayes Operator

Input Properties

Property	Requirement	Type	Default	Description
Target Attribute	✓	String	-	Attribute in your dataset corresponding to target
Count Vectorizer		Boolean	false	Convert a collection of text documents to a matrix of token counts
↳ Text Attribute		String	-	Attribute in your dataset with text to vectorize
↳ Tfidf Transformer		Boolean	false	Transform a count matrix to a normalized tf or tf-idf representation

Output Ports

Port	Mode
0	Set Snapshot

5.1.5.1.6 - Decision Tree

Sklearn Decision Tree Operator

Input Properties

Property	Requirement	Type	Default	Description
Target Attribute	✓	String	-	Attribute in your dataset corresponding to target
Count Vectorizer		Boolean	false	Convert a collection of text documents to a matrix of token counts
↳ Text Attribute		String	-	Attribute in your dataset with text to vectorize
↳ Tfidf Transformer		Boolean	false	Transform a count matrix to a normalized tf or tf-idf representation

Output Ports

Port	Mode
0	Set Snapshot

5.1.5.1.7 - Dummy Classifier

Sklearn Dummy Classifier Operator

Input Properties

Property	Requirement	Type	Default	Description
Target Attribute	✓	String	-	Attribute in your dataset corresponding to target
Count Vectorizer		Boolean	false	Convert a collection of text documents to a matrix of token counts
↳ Text Attribute		String	-	Attribute in your dataset with text to vectorize
↳ Tfidf Transformer		Boolean	false	Transform a count matrix to a normalized tf or tf-idf representation

Output Ports

Port	Mode
0	Set Snapshot

5.1.5.1.8 - Extra Tree

Sklearn Extra Tree Operator

Input Properties

Property	Requirement	Type	Default	Description
Target Attribute	✓	String	-	Attribute in your dataset corresponding to target
Count Vectorizer		Boolean	false	Convert a collection of text documents to a matrix of token counts
↳ Text Attribute		String	-	Attribute in your dataset with text to vectorize
↳ Tfidf Transformer		Boolean	false	Transform a count matrix to a normalized tf or tf-idf representation

Output Ports

Port	Mode
0	Set Snapshot

5.1.5.1.9 - Extra Trees

Sklearn Extra Trees Operator

Input Properties

Property	Requirement	Type	Default	Description
Target Attribute	✓	String	-	Attribute in your dataset corresponding to target
Count Vectorizer		Boolean	false	Convert a collection of text documents to a matrix of token counts
↳ Text Attribute		String	-	Attribute in your dataset with text to vectorize
↳ Tfidf Transformer		Boolean	false	Transform a count matrix to a normalized tf or tf-idf representation

Output Ports

Port	Mode
0	Set Snapshot

5.1.5.1.10 - Gaussian Naive Bayes

Sklearn Gaussian Naive Bayes Operator

Input Properties

Property	Requirement	Type	Default	Description
Target Attribute	✓	String	-	Attribute in your dataset corresponding to target
Count Vectorizer		Boolean	false	Convert a collection of text documents to a matrix of token counts
↳ Text Attribute		String	-	Attribute in your dataset with text to vectorize
↳ Tfidf Transformer		Boolean	false	Transform a count matrix to a normalized tf or tf-idf representation

Output Ports

Port	Mode
0	Set Snapshot

5.1.5.1.11 - Gradient Boosting

Sklearn Gradient Boosting Operator

Input Properties

Property	Requirement	Type	Default	Description
Target Attribute	✓	String	-	Attribute in your dataset corresponding to target
Count Vectorizer		Boolean	false	Convert a collection of text documents to a matrix of token counts
↳ Text Attribute		String	-	Attribute in your dataset with text to vectorize
↳ Tfidf Transformer		Boolean	false	Transform a count matrix to a normalized tf or tf-idf representation

Output Ports

Port	Mode
0	Set Snapshot

5.1.5.1.12 - K-nearest Neighbors

Sklearn K-nearest Neighbors Operator

Input Properties

Property	Requirement	Type	Default	Description
Target Attribute	✓	String	-	Attribute in your dataset corresponding to target
Count Vectorizer		Boolean	false	Convert a collection of text documents to a matrix of token counts
↳ Text Attribute		String	-	Attribute in your dataset with text to vectorize
↳ Tfidf Transformer		Boolean	false	Transform a count matrix to a normalized tf or tf-idf representation

Output Ports

Port	Mode
0	Set Snapshot

5.1.5.1.13 - Linear Perceptron

Sklearn Linear Perceptron Operator

Input Properties

Property	Requirement	Type	Default	Description
Target Attribute	✓	String	-	Attribute in your dataset corresponding to target
Count Vectorizer		Boolean	false	Convert a collection of text documents to a matrix of token counts
↳ Text Attribute		String	-	Attribute in your dataset with text to vectorize
↳ Tfidf Transformer		Boolean	false	Transform a count matrix to a normalized tf or tf-idf representation

Output Ports

Port	Mode
0	Set Snapshot

5.1.5.1.14 - Linear Regression

Sklearn Linear Regression Operator

Input Properties

Property	Requirement	Type	Default	Description
Target Attribute	✓	String	-	Attribute in your dataset corresponding to target
Degree	✓	Integer	1	Degree of polynomial function

Output Ports

Port	Mode
0	Set Snapshot
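
One common reading of the Degree property is that each numeric feature is expanded into its powers up to the given degree before a linear model is fitted. The sketch below shows only that expansion for a single value. The helper name is hypothetical, and the operator's actual handling of multiple features or interaction terms is not documented here.

```python
def polynomial_features(x, degree):
    """Expand one numeric value into [x, x**2, ..., x**degree]."""
    if degree < 1:
        raise ValueError("degree must be a positive integer")
    return [x ** k for k in range(1, degree + 1)]


linear = polynomial_features(3.0, 1)     # degree 1 keeps the feature as-is
quadratic = polynomial_features(3.0, 2)  # degree 2 adds the squared term
```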

5.1.5.1.15 - Linear Support Vector Machine

Sklearn Linear Support Vector Machine Operator

Input Properties

Property	Requirement	Type	Default	Description
Target Attribute	✓	String	-	Attribute in your dataset corresponding to target
Count Vectorizer		Boolean	false	Convert a collection of text documents to a matrix of token counts
↳ Text Attribute		String	-	Attribute in your dataset with text to vectorize
↳ Tfidf Transformer		Boolean	false	Transform a count matrix to a normalized tf or tf-idf representation

Output Ports

Port	Mode
0	Set Snapshot

5.1.5.1.16 - Logistic Regression

Sklearn Logistic Regression Operator

Input Properties

Property	Requirement	Type	Default	Description
Target Attribute	✓	String	-	Attribute in your dataset corresponding to target
Count Vectorizer		Boolean	false	Convert a collection of text documents to a matrix of token counts
↳ Text Attribute		String	-	Attribute in your dataset with text to vectorize
↳ Tfidf Transformer		Boolean	false	Transform a count matrix to a normalized tf or tf-idf representation

Output Ports

Port	Mode
0	Set Snapshot

5.1.5.1.17 - Logistic Regression Cross Validation

Sklearn Logistic Regression Cross Validation Operator

Input Properties

Property	Requirement	Type	Default	Description
Target Attribute	✓	String	-	Attribute in your dataset corresponding to target
Count Vectorizer		Boolean	false	Convert a collection of text documents to a matrix of token counts
↳ Text Attribute		String	-	Attribute in your dataset with text to vectorize
↳ Tfidf Transformer		Boolean	false	Transform a count matrix to a normalized tf or tf-idf representation

Output Ports

Port	Mode
0	Set Snapshot

5.1.5.1.18 - Multi-layer Perceptron

Sklearn Multi-layer Perceptron Operator

Input Properties

Property	Requirement	Type	Default	Description
Target Attribute	✓	String	-	Attribute in your dataset corresponding to target
Count Vectorizer		Boolean	false	Convert a collection of text documents to a matrix of token counts
↳ Text Attribute		String	-	Attribute in your dataset with text to vectorize
↳ Tfidf Transformer		Boolean	false	Transform a count matrix to a normalized tf or tf-idf representation

Output Ports

Port	Mode
0	Set Snapshot

5.1.5.1.19 - Multinomial Naive Bayes

Sklearn Multinomial Naive Bayes Operator

Input Properties

Property	Requirement	Type	Default	Description
Target Attribute	✓	String	-	Attribute in your dataset corresponding to target
Count Vectorizer		Boolean	false	Convert a collection of text documents to a matrix of token counts
↳ Text Attribute		String	-	Attribute in your dataset with text to vectorize
↳ Tfidf Transformer		Boolean	false	Transform a count matrix to a normalized tf or tf-idf representation

Output Ports

Port	Mode
0	Set Snapshot

5.1.5.1.20 - Nearest Centroid

Sklearn Nearest Centroid Operator

Input Properties

Property	Requirement	Type	Default	Description
Target Attribute	✓	String	-	Attribute in your dataset corresponding to target
Count Vectorizer		Boolean	false	Convert a collection of text documents to a matrix of token counts
↳ Text Attribute		String	-	Attribute in your dataset with text to vectorize
↳ Tfidf Transformer		Boolean	false	Transform a count matrix to a normalized tf or tf-idf representation

Output Ports

Port	Mode
0	Set Snapshot

5.1.5.1.21 - Passive Aggressive

Sklearn Passive Aggressive Operator

Input Properties

Property	Requirement	Type	Default	Description
Target Attribute	✓	String	-	Attribute in your dataset corresponding to target
Count Vectorizer		Boolean	false	Convert a collection of text documents to a matrix of token counts
↳ Text Attribute		String	-	Attribute in your dataset with text to vectorize
↳ Tfidf Transformer		Boolean	false	Transform a count matrix to a normalized tf or tf-idf representation

Output Ports

Port	Mode
0	Set Snapshot

5.1.5.1.22 - Probability Calibration

Sklearn Probability Calibration Operator

Input Properties

Property	Requirement	Type	Default	Description
Target Attribute	✓	String	-	Attribute in your dataset corresponding to target
Count Vectorizer		Boolean	false	Convert a collection of text documents to a matrix of token counts
↳ Text Attribute		String	-	Attribute in your dataset with text to vectorize
↳ Tfidf Transformer		Boolean	false	Transform a count matrix to a normalized tf or tf-idf representation

Output Ports

Port	Mode
0	Set Snapshot

5.1.5.1.23 - Random Forest

Sklearn Random Forest Operator

Input Properties

Property	Requirement	Type	Default	Description
Target Attribute	✓	String	-	Attribute in your dataset corresponding to target
Count Vectorizer		Boolean	false	Convert a collection of text documents to a matrix of token counts
↳ Text Attribute		String	-	Attribute in your dataset with text to vectorize
↳ Tfidf Transformer		Boolean	false	Transform a count matrix to a normalized tf or tf-idf representation

Output Ports

Port	Mode
0	Set Snapshot

5.1.5.1.24 - Ridge Regression

Sklearn Ridge Regression Operator

Input Properties

Property	Requirement	Type	Default	Description
Target Attribute	✓	String	-	Attribute in your dataset corresponding to target
Count Vectorizer		Boolean	false	Convert a collection of text documents to a matrix of token counts
↳ Text Attribute		String	-	Attribute in your dataset with text to vectorize
↳ Tfidf Transformer		Boolean	false	Transform a count matrix to a normalized tf or tf-idf representation

Output Ports

Port	Mode
0	Set Snapshot

5.1.5.1.25 - Ridge Regression Cross Validation

Sklearn Ridge Regression Cross Validation Operator

Input Properties

Property	Requirement	Type	Default	Description
Target Attribute	✓	String	-	Attribute in your dataset corresponding to target
Count Vectorizer		Boolean	false	Convert a collection of text documents to a matrix of token counts
↳ Text Attribute		String	-	Attribute in your dataset with text to vectorize
↳ Tfidf Transformer		Boolean	false	Transform a count matrix to a normalized tf or tf-idf representation

Output Ports

Port	Mode
0	Set Snapshot

5.1.5.1.26 - Sklearn Prediction

Sklearn Prediction Operator

Input Properties

Property	Requirement	Type	Default	Description
Model Attribute	✓	String	model	Attribute corresponding to ML model
Output Attribute Name	✓	String	prediction	Attribute name of the prediction result
Ground Truth Attribute Name To Ignore		String	-	Attribute name of the ground truth

Output Ports

Port	Mode
0	Set Snapshot

5.1.5.1.27 - Sklearn Testing

Generates scorers for a Sklearn model

Input Properties

Property	Requirement	Type	Default	Description
Regression	✓	Boolean	false	Choose to solve a regression task
Model Attribute	✓	String	model	Attribute corresponding to ML model
Target Attribute	✓	String	-	Attribute in your dataset corresponding to target

Output Ports

Port	Mode
0	Set Snapshot

5.1.5.1.28 - Stochastic Gradient Descent

Sklearn Stochastic Gradient Descent Operator

Input Properties

Property	Requirement	Type	Default	Description
Target Attribute	✓	String	-	Attribute in your dataset corresponding to target
Count Vectorizer		Boolean	false	Convert a collection of text documents to a matrix of token counts
↳ Text Attribute		String	-	Attribute in your dataset with text to vectorize
↳ Tfidf Transformer		Boolean	false	Transform a count matrix to a normalized tf or tf-idf representation

Output Ports

Port	Mode
0	Set Snapshot

5.1.5.1.29 - Support Vector Machine

Sklearn Support Vector Machine Operator

Input Properties

Property	Requirement	Type	Default	Description
Target Attribute	✓	String	-	Attribute in your dataset corresponding to target
Count Vectorizer		Boolean	false	Convert a collection of text documents to a matrix of token counts
↳ Text Attribute		String	-	Attribute in your dataset with text to vectorize
↳ Tfidf Transformer		Boolean	false	Transform a count matrix to a normalized tf or tf-idf representation

Output Ports

Port	Mode
0	Set Snapshot

5.1.5.2 - Advanced Sklearn

Operators in the Advanced Sklearn category

Operators

Operator	Description
KNN Classifier	Sklearn KNN Classifier Operator
KNN Regressor	Sklearn KNN Regressor Operator
SVM Classifier	Sklearn SVM Classifier Operator
SVM Regressor	Sklearn SVM Regressor Operator

Total: 4 operators

5.1.5.2.1 - KNN Classifier

Sklearn KNN Classifier Operator

Input Properties

Property	Requirement	Type	Default	Description
Parameter Setting	✓	SklearnAdvancedKNNParameters	-
Ground Truth Attribute Column	✓	String	-	Ground truth attribute column
Selected Features	✓	List	-	Features used to train the model

Output Ports

Port	Mode
0	Set Snapshot

5.1.5.2.2 - KNN Regressor

Sklearn KNN Regressor Operator

Input Properties

Property	Requirement	Type	Default	Description
Parameter Setting	✓	SklearnAdvancedKNNParameters	-
Ground Truth Attribute Column	✓	String	-	Ground truth attribute column
Selected Features	✓	List	-	Features used to train the model

Output Ports

Port	Mode
0	Set Snapshot

5.1.5.2.3 - SVM Classifier

Sklearn SVM Classifier Operator

Input Properties

Property	Requirement	Type	Default	Description
Parameter Setting	✓	SklearnAdvancedSVCParameters	-
Ground Truth Attribute Column	✓	String	-	Ground truth attribute column
Selected Features	✓	List	-	Features used to train the model

Output Ports

Port	Mode
0	Set Snapshot

5.1.5.2.4 - SVM Regressor

Sklearn SVM Regressor Operator

Input Properties

Property	Requirement	Type	Default	Description
Parameter Setting	✓	SklearnAdvancedSVRParameters	-
Ground Truth Attribute Column	✓	String	-	Ground truth attribute column
Selected Features	✓	List	-	Features used to train the model

Output Ports

Port	Mode
0	Set Snapshot

5.1.5.3 - Hugging Face

Operators in the Hugging Face category

Operators

Operator	Description
Hugging Face Iris Logistic Regression	Predict whether an iris is an Iris-setosa using a pre-trained logistic regression model
Hugging Face Sentiment Analysis	Analyzing Sentiments with a Twitter-Based Model from Hugging Face
Hugging Face Spam Detection	Spam Detection by SMS Spam Detection Model from Hugging Face
Hugging Face Text Summarization	Summarize the given text content with a mini2bert pre-trained model from Hugging Face

Total: 4 operators

5.1.5.3.1 - Hugging Face Iris Logistic Regression

Predict whether an iris is an Iris-setosa using a pre-trained logistic regression model

Input Properties

Property	Requirement	Type	Default	Description
Petal Length Cm Attribute	✓	String	-	Attribute in your dataset corresponding to PetalLengthCm
Petal Width Cm Attribute	✓	String	-	Attribute in your dataset corresponding to PetalWidthCm
Prediction Class Name	✓	String	Species_prediction	Output attribute name for the predicted class of species
Prediction Probability Name	✓	String	Species_probability	Output attribute name for the prediction’s probability of being an Iris-setosa

Output Ports

Port	Mode
0	Set Snapshot

5.1.5.3.2 - Hugging Face Sentiment Analysis

Analyzing Sentiments with a Twitter-Based Model from Hugging Face

Input Properties

Property	Requirement	Type	Default	Description
Attribute	✓	String	-	Column to perform sentiment analysis on
Positive Result Attribute	✓	String	huggingface_sentiment_positive	Column name of the sentiment analysis result (positive)
Neutral Result Attribute	✓	String	huggingface_sentiment_neutral	Column name of the sentiment analysis result (neutral)
Negative Result Attribute	✓	String	huggingface_sentiment_negative	Column name of the sentiment analysis result (negative)

Output Ports

Port	Mode
0	Set Snapshot

5.1.5.3.3 - Hugging Face Spam Detection

Spam Detection by SMS Spam Detection Model from Hugging Face

Input Properties

Property	Requirement	Type	Default	Description
Attribute	✓	String	-	Column to perform spam detection on
Spam Result Attribute	✓	String	is_spam	Column name for the spam flag (whether the text is spam)
Score Result Attribute	✓	String	score	Column name for the classification probability

Output Ports

Port	Mode
0	Set Snapshot

5.1.5.3.4 - Hugging Face Text Summarization

Summarize the given text content with a mini2bert pre-trained model from Hugging Face

Input Properties

Property	Requirement	Type	Default	Description
Attribute	✓	String	-	Attribute to perform text summarization on
Result Attribute Name		String	summary	Attribute name of the text summary result

Output Ports

Port	Mode
0	Set Snapshot

5.1.5.4 - Machine Learning General

Operators in the Machine Learning General category

Operators

Operator	Description
Machine Learning Scorer	Scorer for machine learning models

Total: 1 operator

5.1.5.4.1 - Machine Learning Scorer

Scorer for machine learning models

Input Properties

Property	Requirement	Type	Default	Description
Regression	✓	Boolean	false	Choose to solve a regression task
↳ Scorer Functions		List	-	Select metrics for classification tasks
↳ Scorer Functions		List	-	Select metrics for regression tasks
Actual Value	✓	String	-	Specify the label attribute
Predicted Value	✓	String	-	Specify the attribute generated by the model

Output Ports

Port	Mode
0	Set Snapshot
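
To make the roles of Actual Value and Predicted Value concrete, the following sketch computes a few widely used metrics from two parallel lists. The metric selection and function names are assumptions for illustration. The scorer functions actually offered by the operator are not listed in this section.

```python
import math


def classification_scores(actual, predicted, positive):
    """Accuracy, precision, recall and F1 for one positive class."""
    pairs = list(zip(actual, predicted))
    tp = sum(1 for a, p in pairs if a == positive and p == positive)
    fp = sum(1 for a, p in pairs if a != positive and p == positive)
    fn = sum(1 for a, p in pairs if a == positive and p != positive)
    correct = sum(1 for a, p in pairs if a == p)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return {
        "accuracy": correct / len(pairs),
        "precision": precision,
        "recall": recall,
        "f1": f1,
    }


def regression_scores(actual, predicted):
    """Mean squared error, its square root, and mean absolute error."""
    errors = [a - p for a, p in zip(actual, predicted)]
    mse = sum(e * e for e in errors) / len(errors)
    mae = sum(abs(e) for e in errors) / len(errors)
    return {"mse": mse, "rmse": math.sqrt(mse), "mae": mae}


cls = classification_scores(
    ["spam", "ham", "spam", "ham"],   # Actual Value column
    ["spam", "spam", "spam", "ham"],  # Predicted Value column
    positive="spam",
)
reg = regression_scores([3.0, 5.0], [2.0, 7.0])
```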

5.1.6 - Utilities

Operators in the Utilities category

Operators

Operator	Description
Random K Sampling	Random sampling with given percentage
Reservoir Sampling	Reservoir Sampling with k items being kept randomly
Split	Split data to two different ports
Unnest String	Unnest the string values in the column separated by a delimiter to multiple values

Total: 4 operators

5.1.6.1 - Random K Sampling

Random sampling with given percentage

Input Properties

Property	Requirement	Type	Default	Description
Random K Sample Percentage	✓	Integer	0	Random k sampling with given percentage

Output Ports

Port	Mode
0	Set Snapshot
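
A percentage-based sample can be pictured as keeping each tuple independently with probability percentage / 100. The sketch below takes that interpretation. It is an assumption, not a description of the operator's internals, and the function name and `seed` parameter are hypothetical.

```python
import random


def sample_by_percentage(rows, percentage, seed=None):
    """Keep each row independently with probability percentage / 100."""
    if not 0 <= percentage <= 100:
        raise ValueError("percentage must be between 0 and 100")
    rng = random.Random(seed)
    return [row for row in rows if rng.random() < percentage / 100]


rows = list(range(10))
none_kept = sample_by_percentage(rows, 0)
all_kept = sample_by_percentage(rows, 100)
some_kept = sample_by_percentage(rows, 50, seed=7)
```

Under this reading the output size varies from run to run around the requested percentage rather than matching it exactly.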

5.1.6.2 - Reservoir Sampling

Reservoir Sampling with k items being kept randomly

Input Properties

Property	Requirement	Type	Default	Description
Number Of Item Sampled In Reservoir Sampling	✓	Integer	0	Reservoir sampling with k items being kept randomly

Output Ports

Port	Mode
0	Set Snapshot
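
Reservoir sampling keeps a fixed-size sample of k items from a stream of unknown length. A minimal sketch of the classic single-pass variant (often called Algorithm R) is shown below. The function name and `seed` parameter are illustrative, and the operator may implement a different variant.

```python
import random


def reservoir_sample(rows, k, seed=None):
    """Return k rows chosen uniformly at random from an iterable."""
    rng = random.Random(seed)
    reservoir = []
    for i, row in enumerate(rows):
        if i < k:
            # Fill the reservoir with the first k rows.
            reservoir.append(row)
        else:
            # Replace a random slot with probability k / (i + 1).
            j = rng.randint(0, i)  # both ends inclusive
            if j < k:
                reservoir[j] = row
    return reservoir


sample = reservoir_sample(range(100), 5, seed=42)
short_input = reservoir_sample(range(3), 5)
```

When the input has fewer than k rows, every row is kept.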

5.1.6.3 - Split

Split data to two different ports

Input Properties

Property	Type	Default	Description
Split Percentage	Integer	80	Percentage of data going to the upper port
Auto-Generate Seed	Boolean	true	Shuffle the data based on a random seed
↳ Seed	Integer	1	An int for reproducible output across multiple runs

Output Ports

Port	Mode
0	Set Snapshot
1	Set Snapshot
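
The interaction between Split Percentage and Seed can be sketched as a seeded shuffle followed by a cut. This is one possible interpretation for illustration only. The function name is hypothetical, and the operator may route tuples differently, for example one tuple at a time.

```python
import random


def split_rows(rows, split_percentage=80, seed=1):
    """Shuffle rows with a seed, then cut them into two parts."""
    rng = random.Random(seed)  # seed=None stands in for an auto-generated seed
    shuffled = list(rows)
    rng.shuffle(shuffled)
    cut = len(shuffled) * split_percentage // 100
    return shuffled[:cut], shuffled[cut:]


upper, lower = split_rows(range(10), split_percentage=80, seed=1)
```

Running the function again with the same seed returns the same two parts, which is the point of fixing the seed.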

5.1.6.4 - Unnest String

Unnest the string values in the column separated by a delimiter to multiple values

Input Properties

Property	Requirement	Type	Default	Description
Delimiter	✓	String	,	String that separates the data
Attribute	✓	String	-	Column of the string to unnest
Result Attribute	✓	String	unnestResult	Column name of the unnest result

Output Ports

Port	Mode
0	Set Snapshot
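
As an illustration of unnesting, the sketch below turns one input row into one output row per delimited value, copying the other columns. The function name and the sample column names are hypothetical. Whether the operator trims whitespace or drops empty pieces is not specified here, so the sketch does neither.

```python
def unnest_string(rows, attribute, delimiter=",", result_attribute="unnestResult"):
    """Emit one output row per delimited value found in `attribute`."""
    out = []
    for row in rows:
        for part in str(row[attribute]).split(delimiter):
            new_row = dict(row)  # keep the original columns
            new_row[result_attribute] = part
            out.append(new_row)
    return out


unnested = unnest_string([{"id": 1, "tags": "a,b,c"}], attribute="tags")
```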

5.1.7 - External API

Operators in the External API category

Operators

Operator	Description
Reddit Search	Search for recent posts with python-wrapped Reddit API, PRAW
Twitter Full Archive Search API	Retrieve data from Twitter Full Archive Search API
Twitter Search API	Retrieve data from Twitter Search API
URL Fetcher	Fetch the content of a single URL

Total: 4 operators

5.1.7.1 - Reddit Search

Search for recent posts with python-wrapped Reddit API, PRAW

Input Properties

Property	Requirement	Type	Default	Description
Client Id	✓	String	-	Client ID used to access the Reddit API
Client Secret	✓	String	-	Client secret used to access the Reddit API
Query	✓	String	-	Search query
Limit	✓	Integer	100	Maximum number of posts to retrieve (up to 1000)
Sorting	✓	none, controversial, gilded, hot, new, rising, top	none	The sorting method (hot, new, etc.)

Output Ports

Port	Mode
0	Set Snapshot

5.1.7.2 - Twitter Full Archive Search API

Retrieve data from Twitter Full Archive Search API

Input Properties

Property	Requirement	Type	Default	Description
API Key	✓	String	-
API Secret Key	✓	String	-
Stop Upon Rate Limit	✓	Boolean	false	Stop when hitting rate limit?
Search Query	✓	String	-	Up to 1024 characters (Limited by Twitter)
From Datetime	✓	String	2021-04-01T00:00:00Z	ISO 8601 format
To Datetime	✓	String	2021-05-01T00:00:00Z	ISO 8601 format
Limit	✓	Integer	100	Maximum number of tweets to retrieve

Output Ports

Port	Mode
0	Set Snapshot
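
The From Datetime and To Datetime values are ISO 8601 strings in UTC, as in the defaults above. A small sketch for producing and parsing that form with the Python standard library follows. The helper names are hypothetical.

```python
from datetime import datetime, timezone

ISO_FORMAT = "%Y-%m-%dT%H:%M:%SZ"


def to_iso8601(dt):
    """Format an aware datetime as a UTC ISO 8601 string ending in Z."""
    return dt.astimezone(timezone.utc).strftime(ISO_FORMAT)


def from_iso8601(text):
    """Parse a UTC ISO 8601 string of the form 2021-04-01T00:00:00Z."""
    return datetime.strptime(text, ISO_FORMAT).replace(tzinfo=timezone.utc)


from_datetime = to_iso8601(datetime(2021, 4, 1, tzinfo=timezone.utc))
to_datetime = to_iso8601(datetime(2021, 5, 1, tzinfo=timezone.utc))
```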

5.1.7.3 - Twitter Search API

Retrieve data from Twitter Search API

Input Properties

Property	Requirement	Type	Default	Description
API Key	✓	String	-
API Secret Key	✓	String	-
Stop Upon Rate Limit	✓	Boolean	false	Stop when hitting rate limit?
Search Query	✓	String	-	Up to 1024 characters (Limited by Twitter)
Limit	✓	Integer	100	Maximum number of tweets to retrieve

Output Ports

Port	Mode
0	Set Snapshot

5.1.7.4 - URL Fetcher

Fetch the content of a single URL

Input Properties

Property	Requirement	Type	Default	Description
URL	✓	String	-	Only accepts standard URL format
Decoding	✓	UTF-8, RAW BYTES	-	The decoding method for the URL content

Output Ports

Port	Mode
0	Set Snapshot

5.1.8 - User-defined Functions

Operators in the User-defined Functions category

Subcategories

5.1.8.1 - Python

Operators in the Python category

Operators

Operator	Description
2-in Python UDF	User-defined function operator in Python script
Python Lambda Function	Modify or add a new column with more ease
Python Table Reducer	Reduce Table to Tuple
1-out Python UDF	User-defined function operator in Python script
Python UDF	User-defined function operator in Python script

Total: 5 operators

5.1.8.1.1 - 1-out Python UDF

User-defined function operator in Python script

Input Properties

Property	Requirement	Type	Default	Description
Python script	✓	Code (python)	`See template below`	Input your code here
Worker count	✓	Integer	1	Specify how many parallel workers to launch
Columns		List	-	The columns of the source
↳ Attribute Name	✓	String	-
↳ Attribute Type	✓	string, integer, long, double, boolean, timestamp, binary, large_binary	-

Default Code Template

Python script

# from pytexera import *
# class GenerateOperator(UDFSourceOperator):
# 
#     @overrides
#     
#     def produce(self) -> Iterator[Union[TupleLike, TableLike, None]]:
#         yield

Output Ports

Port	Mode
0	Set Snapshot
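
Filling in the template above, a source operator might look like the following sketch. The column names `id` and `square` are hypothetical and would need matching entries under Columns. Yielding a plain dict as a tuple is assumed to be accepted. The `except ImportError` branch only supplies stand-ins so the sketch can be read and run outside Texera; it is not part of the UDF API.

```python
from typing import Iterator, Union

try:
    from pytexera import *  # available inside Texera's Python UDF runtime
except ImportError:
    # Assumption: minimal stand-ins for running the sketch outside Texera.
    TupleLike = TableLike = dict

    class UDFSourceOperator:
        pass

    def overrides(method):
        return method


class GenerateOperator(UDFSourceOperator):

    @overrides
    def produce(self) -> Iterator[Union[TupleLike, TableLike, None]]:
        # Emit three tuples; each dict maps a declared column name to a value.
        for i in range(3):
            yield {"id": i, "square": i * i}
```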

5.1.8.1.2 - 2-in Python UDF

User-defined function operator in Python script

Input Properties

Property	Requirement	Type	Default	Description
Python script	✓	Code (python)	`See template below`	Input your code here
Worker count	✓	Integer	1	Specify how many parallel workers to launch
Retain input columns	✓	Boolean	true	Keep the original input columns?
Extra output column(s)		List	-	Name of the newly added output columns that the UDF will produce, if any
↳ Attribute Name	✓	String	-
↳ Attribute Type	✓	string, integer, long, double, boolean, timestamp, binary, large_binary	-

Default Code Template

Python script

# Choose from the following templates:
# 
# from pytexera import *
# 
# class ProcessTupleOperator(UDFOperatorV2):
#     
#     @overrides
#     def process_tuple(self, tuple_: Tuple, port: int) -> Iterator[Optional[TupleLike]]:
#         yield tuple_
# 
# class ProcessBatchOperator(UDFBatchOperator):
#     BATCH_SIZE = 10 # must be a positive integer
# 
#     @overrides
#     def process_batch(self, batch: Batch, port: int) -> Iterator[Optional[BatchLike]]:
#         yield batch
# 
# class ProcessTableOperator(UDFTableOperator):
# 
#     @overrides
#     def process_table(self, table: Table, port: int) -> Iterator[Optional[TableLike]]:
#         yield table

Output Ports

Port	Mode
0	Set Snapshot
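
With two inputs, the `port` argument tells the UDF which input a tuple arrived on. The sketch below simply records that value in a hypothetical `source_port` column, which would need to be declared as an integer under Extra output column(s). Item assignment on the tuple is assumed to be supported, and the `except ImportError` branch exists only so the sketch runs outside Texera.

```python
from typing import Iterator, Optional

try:
    from pytexera import *  # available inside Texera's Python UDF runtime
except ImportError:
    # Assumption: minimal stand-ins for running the sketch outside Texera.
    Tuple = TupleLike = dict

    class UDFOperatorV2:
        pass

    def overrides(method):
        return method


class ProcessTupleOperator(UDFOperatorV2):

    @overrides
    def process_tuple(self, tuple_: Tuple, port: int) -> Iterator[Optional[TupleLike]]:
        # Tag each tuple with the index of the input port it came from.
        tuple_["source_port"] = port
        yield tuple_
```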

5.1.8.1.3 - Python Lambda Function

Modify or add a new column with more ease

Input Properties

Property	Requirement	Type	Default
Add/Modify column(s)		List	-
↳ Attribute Name	✓	String	-
↳ Expression	✓	String	-
↳ Attribute Type	✓	string, integer, long, double, boolean, timestamp, binary, large_binary	-

Output Ports

Port	Mode
0	Set Snapshot

5.1.8.1.4 - Python Table Reducer

Reduce Table to Tuple

Input Properties

Property	Requirement	Type	Default
Output columns		List	-
↳ Attribute Name	✓	String	-
↳ Expression	✓	String	-
↳ Attribute Type	✓	string, integer, long, double, boolean, timestamp, binary, large_binary	-

Output Ports

Port	Mode
0	Set Snapshot

5.1.8.1.5 - Python UDF

User-defined function operator in Python script

Input Properties

Property	Requirement	Type	Default	Description
Python script	✓	Code (python)	`See template below`	Input your code here
Worker count	✓	Integer	1	Specify how many parallel workers to launch
Retain input columns	✓	Boolean	true	Keep the original input columns?
Extra output column(s)		List	-	Name of the newly added output columns that the UDF will produce, if any
↳ Attribute Name	✓	String	-
↳ Attribute Type	✓	string, integer, long, double, boolean, timestamp, binary, large_binary	-

Default Code Template

Python script

# Choose from the following templates:
# 
# from pytexera import *
# 
# class ProcessTupleOperator(UDFOperatorV2):
#     
#     @overrides
#     def process_tuple(self, tuple_: Tuple, port: int) -> Iterator[Optional[TupleLike]]:
#         yield tuple_
# 
# class ProcessBatchOperator(UDFBatchOperator):
#     BATCH_SIZE = 10 # must be a positive integer
# 
#     @overrides
#     def process_batch(self, batch: Batch, port: int) -> Iterator[Optional[BatchLike]]:
#         yield batch
# 
# class ProcessTableOperator(UDFTableOperator):
# 
#     @overrides
#     def process_table(self, table: Table, port: int) -> Iterator[Optional[TableLike]]:
#         yield table

Output Ports

Port	Mode
0	Set Snapshot
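
As a sketch of how the tuple template and Extra output column(s) fit together, the operator below adds a derived column to every tuple. The column names `text` and `text_length` are hypothetical, and `text_length` would need to be declared as an integer under Extra output column(s). Item assignment on the tuple is assumed to be supported. The `except ImportError` branch only supplies stand-ins so the sketch can run outside Texera.

```python
from typing import Iterator, Optional

try:
    from pytexera import *  # available inside Texera's Python UDF runtime
except ImportError:
    # Assumption: minimal stand-ins for running the sketch outside Texera.
    Tuple = TupleLike = dict

    class UDFOperatorV2:
        pass

    def overrides(method):
        return method


class ProcessTupleOperator(UDFOperatorV2):

    @overrides
    def process_tuple(self, tuple_: Tuple, port: int) -> Iterator[Optional[TupleLike]]:
        # Add a new column computed from an existing one, then pass the tuple on.
        tuple_["text_length"] = len(str(tuple_["text"]))
        yield tuple_
```

With Retain input columns left at its default of true, the output would carry the original columns plus the new one.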

5.1.8.2 - Java

Operators in the Java category

Operators

Operator	Description
Java UDF	User-defined function operator in Java script

Total: 1 operator

5.1.8.2.1 - Java UDF

User-defined function operator in Java script

Input Properties

Property	Requirement	Type	Default	Description
Java UDF script	✓	Code (java)	`See template below`	Input your code here
Worker count	✓	Integer	1	Specify how many parallel workers to launch
Retain input columns	✓	Boolean	true	Keep the original input columns?
Extra output column(s)		List	-	Name of the newly added output columns that the UDF will produce, if any
↳ Attribute Name	✓	String	-
↳ Attribute Type	✓	string, integer, long, double, boolean, timestamp, binary, large_binary	-

Default Code Template

Java UDF script

import org.apache.texera.amber.operator.map.MapOpExec;
import org.apache.texera.amber.core.tuple.Tuple;
import org.apache.texera.amber.core.tuple.TupleLike;
import scala.Function1;
import java.io.Serializable;

public class JavaUDFOpExec extends MapOpExec {
    public JavaUDFOpExec () {
        this.setMapFunc((Function1<Tuple, TupleLike> & Serializable) this::processTuple);
    }
    
    public TupleLike processTuple(Tuple tuple) {
        return tuple;
    }
}

Output Ports

Port	Mode
0	Set Snapshot

5.1.8.3 - R

Operators in the R category

Operators

Operator	Description
R UDF	User-defined function operator in R script
1-out R UDF	User-defined function operator in R script

Total: 2 operators

5.1.8.3.1 - 1-out R UDF

User-defined function operator in R script

Input Properties

Property	Requirement	Type	Default	Description
R Source UDF Script	✓	Code (r)	`See template below`	Input your code here
Worker count	✓	Integer	1	Specify how many parallel workers to launch
Use Tuple API?	✓	Boolean	false	Check this box to use Tuple API, leave unchecked to use Table API
Columns		List	-	The columns of the source
↳ Attribute Name	✓	String	-
↳ Attribute Type	✓	string, integer, long, double, boolean, timestamp, binary, large_binary	-

Default Code Template

R Source UDF Script

# If using Table API:
# function() { 
#   return (data.frame(Column_Here = "Value_Here")) 
# }

# If using Tuple API:
# library(coro)
# coro::generator(function() {
#   yield (list(text= "hello world!"))
# })

Output Ports

Port	Mode
0	Set Snapshot

5.1.8.3.2 - R UDF

User-defined function operator in R script

Input Properties

Property	Requirement	Type	Default	Description
R UDF Script	✓	Code (r)	`See template below`	Input your code here
Worker count	✓	Integer	1	Specify how many parallel workers to launch
Use Tuple API?	✓	Boolean	false	Check this box to use Tuple API, leave unchecked to use Table API
Retain input columns	✓	Boolean	true	Keep the original input columns?
Extra output column(s)		List	-	Names of the newly added output columns that the UDF will produce, if any
↳ Attribute Name	✓	String	-
↳ Attribute Type	✓	string, integer, long, double, boolean, timestamp, binary, large_binary	-

Default Code Template

R UDF Script

# If using Table API:
# function(table, port) { 
#   return (table) 
# }

# If using Tuple API:
# library(coro)
# coro::generator(function(tuple, port) {
#   yield (tuple)
# })

Output Ports

Port	Mode
0	Set Snapshot

5.1.9 - Visualization

Operators in the Visualization category

Operators

Operator	Description
Nested Table	Visualize Data in a Depth Two Nested Table

Total: 1 operator

5.1.9.1 - Basic

Operators in the Basic category

Operators

Operator	Description
Bar Chart	Visualize data in a Bar Chart
Bubble Chart	A 3D Scatter Plot; bubbles are graphed using x and y labels, and their sizes are determined by a z-value.
Dot Plot	Visualize data using a dot plot
Dumbbell Plot	Visualize data in a Dumbbell Plot. A dumbbell plot (also known as a lollipop chart) is typically used to compare two distinct values or time points for the same entity.
Figure Factory Table	Visualize data in a figure factory table
Filled Area Plot	Visualize data in a filled area plot
Gantt Chart	A Gantt chart is a type of bar chart that illustrates a project schedule. The chart lists the tasks to be performed on the vertical axis, and time intervals on the horizontal axis. The width of the horizontal bars in the graph shows the duration of each activity.
Hierarchy Chart	Visualize data in hierarchy
Icicle Chart	Visualize hierarchical data from root to leaves
Line Chart	View the result in a line chart
Pie Chart	Visualize data in a Pie Chart
Range Slider	Visualize data in a Range Slider
Sankey Diagram	Visualize data using a Sankey diagram
Scatter Plot	View the result in a scatterplot
Tables Plot	Visualize data in a table chart.
Time Series Plot	Visualize trends and patterns over time.

Total: 16 operators

5.1.9.1.1 - Bar Chart

Visualize data in a Bar Chart

Input Properties

Property	Requirement	Type	Default	Description
Fields	✓	String	-	Visualize categorical data in a Bar Chart
Category Column		String	No Selection	Optional - Select a column to color-code the categories
Horizontal Orientation		Boolean	false	Orientation Style
Pattern		String	-	Add texture to the chart based on an attribute
Value Column	✓	String (integer, long, double)	-	The value associated with each category

Output Ports

Port	Mode
0	Single Snapshot

5.1.9.1.2 - Bubble Chart

A 3D Scatter Plot; bubbles are graphed using x and y labels, and their sizes are determined by a z-value.

Input Properties

Property	Requirement	Type	Default	Description
X-Column	✓	String	-	Data column for the x-axis
Y-Column	✓	String	-	Data column for the y-axis
Z-Column	✓	String	-	Data column to determine bubble size
Enable Color		Boolean	false	Colors bubbles using a data column
Color-Column	✓	String	-	Picks data column to color bubbles with if color is enabled

Output Ports

Port	Mode
0	Single Snapshot

5.1.9.1.3 - Dot Plot

Visualize data using a dot plot

Input Properties

Property	Requirement	Type	Default	Description
Count Attribute	✓	String	-	The attribute whose values are counted in the dot plot

Output Ports

Port	Mode
0	Single Snapshot

5.1.9.1.4 - Dumbbell Plot

Visualize data in a Dumbbell Plot. A dumbbell plot (also known as a lollipop chart) is typically used to compare two distinct values or time points for the same entity.

Input Properties

Property	Requirement	Type	Default	Description
Category Column Name	✓	String	-	The name of the category column
Dumbbell Start Value	✓	String	-	The start point value of each dumbbell
Dumbbell End Value	✓	String	-	The end value of each dumbbell
Measurement Column Name	✓	String (integer, long, double)	-	The name of the measurement column
Compared Column Name	✓	String	-	The column name that is being compared
Dots		List	-
↳ Dot Column Value	✓	String (integer, long, double)	-	Value for dot axis
Show Legends?		Boolean	false	Whether to show legends in the graph

Output Ports

Port	Mode
0	Single Snapshot

5.1.9.1.5 - Figure Factory Table

Visualize data in a figure factory table

Input Properties

Property	Requirement	Type	Default	Description
Font Size		Double	12	Font size of the Figure Factory Table
Font Color (Hex Code)		String	#000000	Font color of the Figure Factory Table
Row Height		Double	30	Row height of the Figure Factory Table
Add Attribute	✓	List	[1 items]	List of columns to include in the figure factory table
↳ Attribute Name	✓	String	-

Output Ports

Port	Mode
0	Single Snapshot

5.1.9.1.6 - Filled Area Plot

Visualize data in a filled area plot

Input Properties

Property	Requirement	Type	Default	Description
X-axis Attribute	✓	String	-	The attribute for your x-axis
Y-axis Attribute	✓	String	-	The attribute for your y-axis
Line Group		String	-	The attribute used to group each line
Color		String	-	Choose an attribute to color the plot
Split Plot by Line Group	✓	Boolean	false	Whether to split the graph by line group
Pattern		String	-	Add texture to the chart based on an attribute

Output Ports

Port	Mode
0	Single Snapshot

5.1.9.1.7 - Gantt Chart

A Gantt chart is a type of bar chart that illustrates a project schedule. The chart lists the tasks to be performed on the vertical axis, and time intervals on the horizontal axis. The width of the horizontal bars in the graph shows the duration of each activity.

Input Properties

Property	Requirement	Type	Default	Description
Pattern		String	-	Add texture to the chart based on an attribute
Start Datetime Column	✓	String (timestamp)	-	The start timestamp of the task
Finish Datetime Column	✓	String (timestamp)	-	The end timestamp of the task
Task Column	✓	String	-	The name of the task
Color Column		String	-	Column to color tasks
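
The relationship between the start and finish columns and the bar width can be illustrated with a minimal sketch. It assumes timestamps arrive as ISO-formatted strings; the row layout and the helper name `task_durations` are hypothetical and not part of the operator.

```python
from datetime import datetime

def task_durations(rows):
    """Return (task, duration) pairs, where duration = finish - start."""
    durations = []
    for row in rows:
        start = datetime.fromisoformat(row["start"])
        finish = datetime.fromisoformat(row["finish"])
        if finish < start:
            raise ValueError(f"task {row['task']!r} finishes before it starts")
        # The duration is what the width of the horizontal bar represents.
        durations.append((row["task"], finish - start))
    return durations

# Made-up rows for illustration only.
rows = [
    {"task": "Design", "start": "2024-01-01 09:00:00", "finish": "2024-01-03 09:00:00"},
    {"task": "Build", "start": "2024-01-03 09:00:00", "finish": "2024-01-10 17:00:00"},
]
durations = task_durations(rows)
```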

Output Ports

Port	Mode
0	Single Snapshot

5.1.9.1.8 - Hierarchy Chart

Visualize data in hierarchy

Input Properties

Property	Requirement	Type	Default	Description
Chart Type	✓	treemap, sunburst	-	Treemap or Sunburst
Hierarchy Path	✓	List	-	Hierarchy of attributes from a higher-level category to lower-level category
↳ Attribute Name	✓	String	-
Value Column	✓	String (integer, long, double)	-	The value associated with the size of each sector in the chart
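
One way to picture how a hierarchy path and a value column combine is to sum the value for every prefix of the path, so that a parent sector is as large as its children together. The sketch below assumes that summing behavior; the function name `sector_sizes` and the sample rows are hypothetical.

```python
from collections import defaultdict

def sector_sizes(rows, path, value_column):
    """Sum the value column for every prefix of the hierarchy path."""
    sizes = defaultdict(float)
    for row in rows:
        for depth in range(1, len(path) + 1):
            # Key identifies a sector, e.g. ("A",) or ("A", "a1").
            key = tuple(row[name] for name in path[:depth])
            sizes[key] += row[value_column]
    return dict(sizes)

# Made-up rows for illustration only.
rows = [
    {"group": "A", "item": "a1", "value": 2.0},
    {"group": "A", "item": "a2", "value": 3.0},
    {"group": "B", "item": "b1", "value": 4.0},
]
sizes = sector_sizes(rows, ["group", "item"], "value")
```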

Output Ports

Port	Mode
0	Single Snapshot

5.1.9.1.9 - Icicle Chart

Visualize hierarchical data from root to leaves

Input Properties

Property	Requirement	Type	Default	Description
Hierarchy Path	✓	List	-	Hierarchy of attributes from a root (higher-level category) to leaves (lower-level category)
↳ Attribute Name	✓	String	-
Value Column	✓	String (integer, long, double)	-	The value associated with the size of each sector in the chart

Output Ports

Port	Mode
0	Single Snapshot

5.1.9.1.10 - Line Chart

View the result in a line chart

Input Properties

Property	Requirement	Type	Default	Description
Y Label		String	Y Axis	The label for y axis
X Label		String	X Axis	The label for x axis
Lines	✓	List	-
↳ Y Value	✓	String	-	Value for y axis
↳ X Value	✓	String	-	Value for x axis
↳ Line Mode	✓	line, dots, line with dots	line with dots
↳ Line Name		String	-
↳ Line Color		String	-	Must be a valid CSS color or hex color string

Output Ports

Port	Mode
0	Single Snapshot

5.1.9.1.11 - Pie Chart

Visualize data in a Pie Chart

Input Properties

Property	Requirement	Type	Default	Description
Value Column	✓	String (integer, long, double)	-	The value associated with slice of pie
Name Column	✓	String	-	The name of the slice of pie

Output Ports

Port	Mode
0	Single Snapshot

5.1.9.1.12 - Range Slider

Visualize data in a Range Slider

Input Properties

Property	Requirement	Type	Default	Description
Y-axis	✓	String	-	The name of the column to represent y-axis
X-axis	✓	String	-	The name of the column to represent the x-axis
Handle Duplicates		Nothing, Mean, Sum	NOTHING	How to handle duplicate values in y-axis
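
The three Handle Duplicates options can be sketched as follows. This assumes that "duplicates" means several rows sharing the same x value, which the table does not state explicitly; `handle_duplicates` is a hypothetical helper, not the operator's code.

```python
def handle_duplicates(points, mode="nothing"):
    """Combine y values of points that share an x value.

    mode is one of "nothing", "mean", or "sum".
    """
    if mode == "nothing":
        return list(points)
    if mode not in ("mean", "sum"):
        raise ValueError(f"unknown mode: {mode}")
    grouped = {}
    for x, y in points:
        grouped.setdefault(x, []).append(y)
    if mode == "sum":
        return [(x, sum(ys)) for x, ys in grouped.items()]
    return [(x, sum(ys) / len(ys)) for x, ys in grouped.items()]

points = [(1, 10), (1, 20), (2, 5)]
mean_points = handle_duplicates(points, "mean")
sum_points = handle_duplicates(points, "sum")
```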

Output Ports

Port	Mode
0	Single Snapshot

5.1.9.1.13 - Sankey Diagram

Visualize data using a Sankey diagram

Input Properties

Property	Requirement	Type	Default	Description
Source Attribute	✓	String	-	The source node of the Sankey diagram
Target Attribute	✓	String	-	The target node of the Sankey diagram
Value Attribute	✓	String	-	The value/volume of the flow between source and target
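
Sankey renderers commonly expect a list of node labels plus links that refer to nodes by index. The sketch below shows one way source, target, and value rows could be turned into that shape. Adding up repeated source/target pairs is an assumption, and `build_sankey_links` is a hypothetical name.

```python
def build_sankey_links(rows):
    """Turn (source, target, value) rows into node labels and indexed links."""
    totals = {}
    for source, target, value in rows:
        # Assumption: repeated (source, target) pairs are added together.
        totals[(source, target)] = totals.get((source, target), 0) + value
    labels, index = [], {}
    for source, target in totals:
        for node in (source, target):
            if node not in index:
                index[node] = len(labels)
                labels.append(node)
    links = [(index[s], index[t], v) for (s, t), v in totals.items()]
    return labels, links

labels, links = build_sankey_links([("A", "B", 3), ("A", "B", 2), ("B", "C", 4)])
```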

Output Ports

Port	Mode
0	Single Snapshot

5.1.9.1.14 - Scatter Plot

View the result in a scatterplot

Input Properties

Property	Requirement	Type	Default	Description
X-Column	✓	String (integer, double)	-	X Column
Y-Column	✓	String (integer, double)	-	Y Column
Alpha Value		Double	1.0	Alpha (opacity) value from 0.0 (transparent) to 1.0 (opaque)
Color-Column		String	-	Dots will be assigned different colors based on their values of this column
log scale X		Boolean	false	Values in the X-column are log-scaled
log scale Y		Boolean	false	Values in the Y-column are log-scaled
Hover column		String	-	Column value to display when a dot is hovered over

Output Ports

Port	Mode
0	Single Snapshot

5.1.9.1.15 - Tables Plot

Visualize data in a table chart.

Input Properties

Property	Requirement	Type	Default	Description
Add Attribute	✓	List	-	List of columns to include in the table chart
↳ Attribute Name	✓	String	-

Output Ports

Port	Mode
0	Single Snapshot

5.1.9.1.16 - Time Series Plot

Visualize trends and patterns over time.

Input Properties

Property	Requirement	Type	Default	Description
Time Column	✓	String	-	The column containing time/date values (e.g., Date, Timestamp)
Value Column	✓	String	-	The numerical column to plot on the Y-axis (e.g., Sales, Temperature)
Category Column		String	No Selection	Optional - A categorical column to create separate lines
Facet Column		String	No Selection	Optional - A column to create separate subplots
Plot Type	✓	String	line	Select the type of time series plot (line, area)
Show Range Slider		Boolean	false	Display a range slider at the bottom of the plot

Output Ports

Port	Mode
0	Single Snapshot

5.1.9.2 - Statistical

Operators in the Statistical category

Operators

Operator	Description
Box/Violin Plot	Visualize data using either a Box Plot or a Violin Plot. Box plots are drawn as a box with a vertical line down the middle that marks the median value, and have horizontal lines attached to each side (known as “whiskers”). Violin plots provide more detail by showing a smoothed density curve on each side, and also include a box plot inside for comparison.
Continuous Error Bands	Visualize error or uncertainty along a continuous line
Empirical Cumulative Distribution Plot	Visualize the empirical cumulative distribution of a numeric column.
Histogram	Visualize data in a Histogram Chart
Histogram2D	Displays a bivariate histogram as a density heatmap
Scatter Matrix Chart	Visualize datasets in a Scatter Matrix
Strip Chart	Visualize distribution of data points as a strip plot
Tree Plot	Visualize hierarchical data as a top-down, interactive, auto-sizing tree

Total: 8 operators

5.1.9.2.1 - Box/Violin Plot

Visualize data using either a Box Plot or a Violin Plot. Box plots are drawn as a box with a vertical line down the middle that marks the median value, and have horizontal lines attached to each side (known as “whiskers”). Violin plots provide more detail by showing a smoothed density curve on each side, and also include a box plot inside for comparison.

Input Properties

Property	Requirement	Type	Default	Description
Value Column	✓	String (integer, long, double)	-	Data column for box plot
Quartile Method	✓	linear, inclusive, exclusive	linear
Horizontal Orientation		Boolean	false	Orientation style
Violin Plot		Boolean	false	Check this box to overlay a violin plot on the box plot; otherwise, show only the box plot
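
The Quartile Method options are not defined in the table. The sketch below shows the common median-of-halves reading of "exclusive" and "inclusive": the sorted data is split at the median, and for an odd-sized sample the median is either left out of both halves or included in both. This definition is an assumption rather than a statement of how the operator computes quartiles, and the "linear" option (percentile interpolation) is not covered.

```python
from statistics import median

def quartiles(values, method="exclusive"):
    """Return (Q1, median, Q3) using a median-of-halves split."""
    if method not in ("exclusive", "inclusive"):
        raise ValueError(f"unsupported method: {method}")
    data = sorted(values)
    n = len(data)
    if n < 2:
        raise ValueError("need at least two values")
    half = n // 2
    if n % 2 == 0:
        # Even sample: both methods split the data cleanly in two.
        lower, upper = data[:half], data[half:]
    elif method == "exclusive":
        # Odd sample: the median belongs to neither half.
        lower, upper = data[:half], data[half + 1:]
    else:
        # Odd sample: the median belongs to both halves.
        lower, upper = data[:half + 1], data[half:]
    return median(lower), median(data), median(upper)

sample = [1, 2, 3, 4, 5, 6, 7]
exclusive = quartiles(sample, "exclusive")
inclusive = quartiles(sample, "inclusive")
```

With an odd-sized sample the two methods give different boxes, which is why the choice matters for small datasets.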

Output Ports

Port	Mode
0	Single Snapshot

5.1.9.2.2 - Continuous Error Bands

Visualize error or uncertainty along a continuous line

Input Properties

Property	Requirement	Type	Default	Description
X Label		String	X Axis	Label used for x axis
Y Label		String	Y Axis	Label used for y axis
Bands	✓	List	-
↳ Y-Axis Upper Bound	✓	String	-	Represents upper bound error of y-values
↳ Y-Axis Lower Bound	✓	String	-	Represents lower bound error of y-values
↳ Fill Color		String	-	Must be a valid CSS color or hex color string
↳ Y Value	✓	String	-	Value for y axis
↳ X Value	✓	String	-	Value for x axis
↳ Line Mode	✓	line, dots, line with dots	line with dots
↳ Line Name		String	-
↳ Line Color		String	-	Must be a valid CSS color or hex color string

Output Ports

Port	Mode
0	Single Snapshot

5.1.9.2.3 - Empirical Cumulative Distribution Plot

Visualize the empirical cumulative distribution of a numeric column.

Input Properties

Property	Requirement	Type	Default	Description
Value Column	✓	String (integer, long, double)	-	Numeric column used to compute the empirical cumulative distribution
Color Column		String	-	Optional column for coloring ECDF lines by group
Separate By Column		String	-	Optional column for splitting ECDF plots into subplots
Y Axis Mode		String	probability	Display cumulative probability, raw count, or cumulative sum
CDF Mode		String	standard	‘standard’ shows P(X ≤ x), ‘reversed’ shows P(X ≥ x), ‘complementary’ shows 1 - P(X ≤ x)
Orientation		String	vertical	Plot ECDF vertically or horizontally
Show Markers		Boolean	false	Display sample markers on the ECDF line
Marginal Plot		String	none	Optional marginal plot to display alongside the ECDF
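
The three CDF Mode formulas can be checked on a small sample. The sketch below evaluates each mode at a single point; `ecdf_value` is a hypothetical helper written for illustration.

```python
def ecdf_value(sample, x, mode="standard"):
    """Evaluate the empirical distribution of `sample` at `x`.

    standard:      P(X <= x)
    reversed:      P(X >= x)
    complementary: 1 - P(X <= x)
    """
    n = len(sample)
    if n == 0:
        raise ValueError("sample must not be empty")
    at_most = sum(1 for v in sample if v <= x) / n
    if mode == "standard":
        return at_most
    if mode == "reversed":
        return sum(1 for v in sample if v >= x) / n
    if mode == "complementary":
        return 1 - at_most
    raise ValueError(f"unknown mode: {mode}")

sample = [1, 2, 2, 3]
```

Note that "reversed" and "complementary" differ whenever the sample contains values equal to x, because only "reversed" counts them.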

Output Ports

Port	Mode
0	Single Snapshot

5.1.9.2.4 - Histogram

Visualize data in a Histogram Chart

Input Properties

Property	Requirement	Type	Default	Description
Color Column		String	-	Column for differentiating data by its value
SeparateBy Column		String	-	Column for separating histogram chart by its value
Distribution Type		String	-	Distribution type (rug, box, violin)
Pattern		String	-	Add texture to the chart based on an attribute
Value Column	✓	String	-	Column for counting values

Output Ports

Port	Mode
0	Single Snapshot

5.1.9.2.5 - Histogram2D

Displays a bivariate histogram as a density heatmap

Input Properties

Property	Requirement	Type	Default	Description
X Column	✓	String	-	Numeric column for the X axis bins
Y Column	✓	String	-	Numeric column for the Y axis bins
X Bins	✓	Integer	10	Number of bins along the X axis (Default: 10)
Y Bins	✓	Integer	10	Number of bins along the Y axis (Default: 10)
Normalization		density, probability, percent	density	Type of histogram normalization
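
A rough sketch of bivariate binning with the three normalization options follows. The definitions used here are assumptions: "probability" divides each count by the total, "percent" multiplies that by 100, and "density" divides each count by the bin area. The operator may define them differently, and `histogram2d` is a hypothetical helper.

```python
def histogram2d(xs, ys, x_bins=10, y_bins=10, normalization=None):
    """Bin (x, y) pairs into an x_bins-by-y_bins grid of cell values."""
    if len(xs) != len(ys) or not xs:
        raise ValueError("xs and ys must be non-empty and the same length")
    if x_bins < 1 or y_bins < 1:
        raise ValueError("bin counts must be positive")
    x_min, y_min = min(xs), min(ys)
    # Fall back to width 1.0 when all values on an axis are identical.
    x_width = (max(xs) - x_min) / x_bins or 1.0
    y_width = (max(ys) - y_min) / y_bins or 1.0
    counts = [[0.0] * y_bins for _ in range(x_bins)]
    for x, y in zip(xs, ys):
        # The maximum value falls into the last bin rather than past it.
        i = min(int((x - x_min) / x_width), x_bins - 1)
        j = min(int((y - y_min) / y_width), y_bins - 1)
        counts[i][j] += 1
    total = len(xs)
    if normalization is None:
        scale = 1.0
    elif normalization == "probability":
        scale = 1 / total
    elif normalization == "percent":
        scale = 100 / total
    elif normalization == "density":
        scale = 1 / (x_width * y_width)  # assumed: count per unit bin area
    else:
        raise ValueError(f"unknown normalization: {normalization}")
    return [[c * scale for c in row] for row in counts]

xs, ys = [0, 0, 1, 1], [0, 1, 0, 1]
```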

Output Ports

Port	Mode
0	Single Snapshot

5.1.9.2.6 - Scatter Matrix Chart

Visualize datasets in a Scatter Matrix

Input Properties

Property	Requirement	Type	Default	Description
Selected Attributes	✓	List	-	The axes of each scatter plot in the matrix
Color Column	✓	String	-	Column to color points

Output Ports

Port	Mode
0	Single Snapshot

5.1.9.2.7 - Strip Chart

Visualize distribution of data points as a strip plot

Input Properties

Property	Requirement	Type	Default	Description
X-Axis Column	✓	String	-	Column containing numeric values for the x-axis
Y-Axis Column	✓	String	-	Column containing categorical values for the y-axis
Color By		String	-	Optional - Color points by category
Facet Column		String	-	Optional - Create separate subplots for each category

Output Ports

Port	Mode
0	Single Snapshot

5.1.9.2.8 - Tree Plot

Visualize hierarchical data as a top-down, interactive, auto-sizing tree

Input Properties

Property	Requirement	Type	Default	Description
Edge List Column	✓	String	-	Column with [parent, child] pairs
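
To show what a column of [parent, child] pairs encodes, the sketch below builds a child list per node and identifies the roots (nodes that never appear as a child). The helper name `build_tree` and the sample edges are hypothetical.

```python
def build_tree(edges):
    """Return (roots, children) from an iterable of [parent, child] pairs."""
    children = {}
    has_parent = set()
    for parent, child in edges:
        children.setdefault(parent, []).append(child)
        children.setdefault(child, [])  # make sure leaves are present too
        has_parent.add(child)
    roots = [node for node in children if node not in has_parent]
    return roots, children

roots, children = build_tree([["root", "a"], ["root", "b"], ["a", "c"]])
```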

Output Ports

Port	Mode
0	Single Snapshot

5.1.9.3 - Scientific

Operators in the Scientific category

Operators

Operator	Description
Carpet Plot	Visualize data in a Carpet Plot
Contour Plot	Displays terrain or gradient variations in a Contour Plot
Dendrogram	Visualize data in a Dendrogram
Heatmap	Visualize data in a HeatMap Chart
Network Graph	Visualize data in a network graph
Parallel Coordinates Plot	Visualize multivariate data using parallel coordinate axes
Polar Chart	Displays data points in a polar scatter plot
Quiver Plot	Visualize vector data in a Quiver Plot
Radar Chart	Visualize data in a Radar Chart
Radar Plot	View the result in a radar plot.
Ternary Contour	Shows how a measured value changes across all mixtures of three components that sum to a constant
Ternary Plot	Points are graphed on a Ternary Plot using 3 specified data fields
Volcano Plot	Displays statistical significance versus effect size
Wind Rose Chart	Displays wind distribution using a polar bar chart

Total: 14 operators

5.1.9.3.1 - Carpet Plot

Visualize data in a Carpet Plot

Input Properties

Property	Requirement	Type	Default	Description
First Parameter Axis Column	✓	String	-	Column representing the first parameter axis (a)
Second Parameter Axis Column	✓	String	-	Column representing the second parameter axis (b)
Value Column	✓	String	-	Column representing the value at each (a, b) coordinate

Output Ports

Port	Mode
0	Single Snapshot

5.1.9.3.2 - Contour Plot

Displays terrain or gradient variations in a Contour Plot

Input Properties

Property	Requirement	Type	Default	Description
Grid Size		String	10	Grid resolution of the final image
Connect Gaps		Boolean	true	Automatically fill in the missing parts
x	✓	String	-	The column name of X-axis
y	✓	String	-	The column name of Y-axis
z	✓	String	-	The column name of color bar
Coloring Method		heatmap, lines, none	heatmap

Output Ports

Port	Mode
0	Single Snapshot

5.1.9.3.3 - Dendrogram

Visualize data in a Dendrogram

Input Properties

Property	Requirement	Type	Default	Description
Color Threshold		String	-	Value at which separation of clusters will be made
Value X Column	✓	String	-	The x values of points in dendrogram
Value Y Column	✓	String	-	The y value of points in dendrogram
Labels	✓	String	-	The label of points in dendrogram

Output Ports

Port	Mode
0	Single Snapshot

5.1.9.3.4 - Heatmap

Visualize data in a HeatMap Chart

Input Properties

Property	Requirement	Type	Default	Description
Value X Column	✓	String	-	The values along the x-axis
Value Y Column	✓	String	-	The values along the y-axis
Values	✓	String	-	The values of the heatmap

Output Ports

Port	Mode
0	Single Snapshot

5.1.9.3.5 - Network Graph

Visualize data in a network graph

Input Properties

Property	Requirement	Type	Default	Description
Source Column	✓	String	-	Source node for edge in graph
Destination Column	✓	String	-	Destination node for edge in graph
Title		String	Network Graph

Output Ports

Port	Mode
0	Single Snapshot

5.1.9.3.6 - Parallel Coordinates Plot

Visualize multivariate data using parallel coordinate axes

Input Properties

Property	Requirement	Type	Default	Description
Dimensions	✓	List	-	List of numeric columns to visualize as parallel axes (at least one dimension is required)
Color Column		String	-	Column used to color or group the lines

Output Ports

Port	Mode
0	Single Snapshot

5.1.9.3.7 - Polar Chart

Displays data points in a polar scatter plot

Input Properties

Property	Requirement	Type	Default	Description
r	✓	String	-	The column name for radial values (must be numeric)
theta	✓	String	-	The column name for angular values (must be numeric)

Output Ports

Port	Mode
0	Single Snapshot

5.1.9.3.8 - Quiver Plot

Visualize vector data in a Quiver Plot

Input Properties

Property	Requirement	Type	Default	Description
x	✓	String	-	Column for the x-coordinate of the starting point
y	✓	String	-	Column for the y-coordinate of the starting point
u	✓	String	-	Column for the vector component in the x-direction
v	✓	String	-	Column for the vector component in the y-direction
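
Each row of the four columns describes one arrow. A minimal sketch of the geometry, assuming the arrow is drawn unscaled from (x, y) to (x + u, y + v); the helper `arrow` is hypothetical, and a real renderer may rescale arrows for readability.

```python
import math

def arrow(x, y, u, v):
    """Describe the arrow that starts at (x, y) with components (u, v)."""
    return {
        "tip": (x + u, y + v),
        "length": math.hypot(u, v),
        # Angle measured counter-clockwise from the positive x-axis.
        "angle_deg": math.degrees(math.atan2(v, u)),
    }

example = arrow(1, 2, 3, 4)
```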

Output Ports

Port	Mode
0	Single Snapshot

5.1.9.3.9 - Radar Chart

Visualize data in a Radar Chart

Input Properties

Property	Requirement	Type	Default	Description
Name Column	✓	String	-	Column containing entity names for each radar
Value Columns	✓	List	-	Columns containing numeric values for radar chart axes
Fill Opacity	✓	Double	0.5	Opacity value for radar chart fill from 0.0 (transparent) to 1.0 (opaque)

Output Ports

Port	Mode
0	Single Snapshot

5.1.9.3.10 - Radar Plot

View the result in a radar plot.

Input Properties

Property	Requirement	Type	Default	Description
Axes	✓	List	-	Numeric columns to use as radar axes
Trace Name Column		String	No Selection	Optional - Select a column to use for naming each radar trace
Trace Color Column		String	No Selection	Optional - Select a column to use for coloring each radar trace (note: if there are too many traces with distinct coloring values, colors may repeat)
Line Pattern	✓	solid, dash, dot	solid	Pattern of the lines connecting points on the radar plot
Max Normalize	✓	Boolean	true	Normalize radar plot values by scaling them relative to the maximum value on their respective axes
Fill Trace	✓	Boolean	true	Fill the area within each radar trace
Show Point Markers	✓	Boolean	true	Display point markers on the radar plot
Show Legend		Boolean	true	Display the legend (note: without the legend, you are unable to selectively hide or show traces in the plot)
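
Max Normalize can be sketched as dividing every value by the largest value on the same axis, so each axis ends up on a 0 to 1 scale. The sketch assumes non-negative values; `max_normalize` and the sample rows are hypothetical.

```python
def max_normalize(rows, axes):
    """Scale each axis value relative to the maximum value on that axis."""
    maxima = {axis: max(row[axis] for row in rows) for axis in axes}
    return [
        {
            # Guard against an all-zero axis to avoid dividing by zero.
            axis: (row[axis] / maxima[axis] if maxima[axis] else 0.0)
            for axis in axes
        }
        for row in rows
    ]

# Made-up rows for illustration only.
rows = [{"speed": 50, "power": 2}, {"speed": 100, "power": 8}]
normalized = max_normalize(rows, ["speed", "power"])
```

Without this scaling, an axis with large raw values would dominate the shape of every trace.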

Output Ports

Port	Mode
0	Single Snapshot

5.1.9.3.11 - Ternary Contour

Shows how a measured value changes across all mixtures of three components that sum to a constant

Input Properties

Property	Requirement	Type	Default	Description
Variable 1	✓	String	-	First variable data field
Variable 2	✓	String	-	Second variable data field
Variable 3	✓	String	-	Third variable data field
Measured Value	✓	String	-	Measured value data field
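
Because the three variables describe a mixture, each point is usually reduced to fractions that sum to 1 and then placed inside a triangle. The sketch below shows that conversion under an assumed layout: variable 1 at the top vertex, variable 2 at the bottom left, and variable 3 at the bottom right. The function name and layout are illustrative, not taken from the operator.

```python
import math

def ternary_to_cartesian(a, b, c):
    """Return ((fa, fb, fc), (x, y)) for a three-component mixture."""
    total = a + b + c
    if total <= 0:
        raise ValueError("components must sum to a positive value")
    fa, fb, fc = a / total, b / total, c / total
    # Vertices: variable 1 -> (0.5, sqrt(3)/2), variable 2 -> (0, 0),
    # variable 3 -> (1, 0). The point is their weighted average.
    x = 0.5 * fa + fc
    y = (math.sqrt(3) / 2) * fa
    return (fa, fb, fc), (x, y)

fractions, point = ternary_to_cartesian(1, 1, 1)
```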

Output Ports

Port	Mode
0	Single Snapshot

5.1.9.3.12 - Ternary Plot

Points are graphed on a Ternary Plot using 3 specified data fields

Input Properties

Property	Requirement	Type	Default	Description
Variable 1	✓	String	-	First variable data field
Variable 2	✓	String	-	Second variable data field
Variable 3	✓	String	-	Third variable data field
Categorize by Color		Boolean	false	Optionally color points using a data field
Color Data Field		String	-	Specify the data field to color

Output Ports

Port	Mode
0	Single Snapshot

5.1.9.3.13 - Volcano Plot

Displays statistical significance versus effect size

Input Properties

Property	Requirement	Type	Default	Description
Effect Size (log2 Fold Change)	✓	String	-	Select the column representing the effect size or magnitude of change between two experimental groups. This value is typically a log2 fold change and is used for the x-axis of the volcano plot
P-Value Column	✓	String	-	Select the column representing the p-value associated with the statistical test for each feature. This value is transformed using -log10(p-value) and plotted on the y-axis to indicate statistical significance
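
The axis transformation described above can be written out directly: the effect size is used as-is on the x-axis, and the p-value is mapped to -log10(p) on the y-axis. The helper name `volcano_point` is hypothetical.

```python
import math

def volcano_point(log2_fold_change, p_value):
    """Return the (x, y) position of one feature on a volcano plot."""
    if not 0 < p_value <= 1:
        raise ValueError("p-value must be in the interval (0, 1]")
    # Smaller p-values (more significant) end up higher on the y-axis.
    return (log2_fold_change, -math.log10(p_value))

x, y = volcano_point(1.5, 0.01)
```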

Output Ports

Port	Mode
0	Single Snapshot

5.1.9.3.14 - Wind Rose Chart

Displays wind distribution using a polar bar chart

Input Properties

Property	Requirement	Type	Default	Description
Radial Values (r)	✓	String	-	Numeric values representing magnitude (e.g., frequency)
Angular Values (θ)	✓	String	-	Direction or angle categories (e.g., N, NE, E)
Color Group		String	-	Optional grouping column (e.g., wind strength)

Output Ports

Port	Mode
0	Single Snapshot

5.1.9.4 - Financial

Operators in the Financial category

Operators

Operator	Description
Bullet Chart	Visualize data using a Bullet Chart that shows a primary quantitative bar and delta indicator. Optional elements such as qualitative ranges (steps) and a performance threshold are displayed only when provided.
Candlestick Chart	Visualize data in a Candlestick Chart
Funnel Plot	Visualize data in a Funnel Plot
Gauge Chart	Visualize a single value with a radial gauge chart, showing progress towards a goal with optional steps, threshold, and delta.
Waterfall Chart	Visualize data as a waterfall chart

Total: 5 operators

5.1.9.4.1 - Bullet Chart

Visualize data using a Bullet Chart that shows a primary quantitative bar and delta indicator. Optional elements such as qualitative ranges (steps) and a performance threshold are displayed only when provided.

Input Properties

Property	Requirement	Type	Default	Description
Value	✓	String	-	The actual value to display on the bullet chart
Delta Reference	✓	String	-	The reference value for the delta indicator (e.g., 100)
Threshold Value		String	-	The performance threshold value (e.g., 100)
Steps		List	[]	Optional: each step includes a start and end value (e.g., 0, 100)
↳ Start		String	-
↳ End		String	-
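
As a rough illustration of how the value, delta reference, and steps relate, the sketch below computes the delta as value minus reference and finds the step range containing the value. It works on plain numbers, whereas the properties above are entered as strings. Treating step ranges as start-inclusive and end-exclusive is an assumption, and `bullet_summary` is a hypothetical helper.

```python
def bullet_summary(value, reference, steps=()):
    """Return the delta and the (start, end) step that contains the value."""
    delta = value - reference
    # Assumption: a step covers start <= value < end.
    step = next(((start, end) for start, end in steps if start <= value < end), None)
    return {"delta": delta, "step": step}

summary = bullet_summary(120, 100, steps=[(0, 100), (100, 200)])
```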

Output Ports

Port	Mode
0	Single Snapshot

5.1.9.4.2 - Candlestick Chart

Visualize data in a Candlestick Chart

Input Properties

Property	Requirement	Type	Default	Description
Date Column	✓	String	-	The date of the candlestick
Opening Price Column	✓	String	-	The opening price of the candlestick
Highest Price Column	✓	String	-	The highest price of the candlestick
Lowest Price Column	✓	String	-	The lowest price of the candlestick
Closing Price Column	✓	String	-	The closing price of the candlestick

Output Ports

Port	Mode
0	Single Snapshot

5.1.9.4.3 - Funnel Plot

Visualize data in a Funnel Plot

Input Properties

Property	Requirement	Type	Default	Description
X Column	✓	String	-	Data column for the x-axis
Y Column	✓	String	-	Data column for the y-axis
Color Column		String	-	Column to categorically colorize funnel sections

Output Ports

Port	Mode
0	Single Snapshot

5.1.9.4.4 - Gauge Chart

Visualize a single value with a radial gauge chart, showing progress towards a goal with optional steps, threshold, and delta.

Input Properties

Property	Requirement	Type	Default	Description
Gauge Value	✓	String	-	The primary value displayed on the gauge chart
Delta		String	-	The baseline value used to calculate the delta from the gauge value
Threshold Value		String	-	Defines a boundary or target value shown on the gauge chart
Steps		List	-	List of step ranges for the gauge
↳ Start		String	-
↳ End		String	-

Output Ports

Port	Mode
0	Single Snapshot

5.1.9.4.5 - Waterfall Chart

Visualize data as a waterfall chart

Input Properties

Property	Requirement	Type	Default	Description
X Axis Values	✓	String	-	The column representing categories or stages
Y Axis Values	✓	String	-	The column representing numeric values for each stage
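
In a waterfall chart each bar typically starts where the previous one ended. The sketch below computes those running totals, assuming every y value is a relative change from the previous stage; `waterfall_bars` and the sample stages are hypothetical.

```python
from itertools import accumulate

def waterfall_bars(stages, values):
    """Return (stage, start, end) for each bar, using running totals."""
    if len(stages) != len(values):
        raise ValueError("stages and values must have the same length")
    ends = list(accumulate(values))
    starts = [0] + ends[:-1]  # each bar starts where the previous one ended
    return list(zip(stages, starts, ends))

# Made-up stages for illustration only.
bars = waterfall_bars(["Revenue", "Costs", "Tax"], [100, -40, -10])
```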

Output Ports

Port	Mode
0	Single Snapshot

5.1.9.5 - Media

Operators in the Media category

Operators

Operator	Description
HTML Visualizer	Render the result of HTML content
Image Visualizer	Visualize image content
URL Visualizer	Render the content of URL
Word Cloud	Generate word cloud for texts

Total: 4 operators

5.1.9.5.1 - HTML Visualizer

Render the result of HTML content

Input Properties

Property	Requirement	Type	Default	Description
HTML content	✓	String	-

Output Ports

Port	Mode
0	Single Snapshot

5.1.9.5.2 - Image Visualizer

Visualize image content

Input Properties

Property	Requirement	Type	Default	Description
image content column	✓	String	-	The binary data of the image

Output Ports

Port	Mode
0	Single Snapshot

5.1.9.5.3 - URL Visualizer

Render the content of URL

Input Properties

Property	Requirement	Type	Default	Description
URL content	✓	String	-

Output Ports

Port	Mode
0	Single Snapshot

5.1.9.5.4 - Word Cloud

Generate word cloud for texts