Pipeline components¶
Overview¶
Pipelines consist of nodes that are implemented using components. A component typically implements only one unit of work, such as loading data, transforming data, training a model, or deploying a model for serving. The following depicts a basic pipeline in the Visual Pipeline Editor, which uses components to load a data file, split the file, truncate the resulting files, and count the number of records in each file.
The same pipeline could be implemented using a single component that performs all these tasks, but that component might not be as widely reusable. Consider, for example, that for another project the data resides in a different kind of storage. With fine-grained components you’d only have to replace the load data component with one that supports the other storage type and could retain everything else.
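The reuse argument can be sketched in plain Python. This is an analogy only, not Elyra's component API: every function name below is hypothetical, and each function stands in for one single-purpose component. Swapping the storage type means swapping only the loader.

```python
from typing import Callable, List, Tuple

Records = List[str]


def load_from_list(source: List[str]) -> Records:
    """Hypothetical 'load data' component: records already held in memory."""
    return list(source)


def load_from_text(source: str) -> Records:
    """Hypothetical replacement loader for a different kind of storage."""
    return source.splitlines()


def split(records: Records) -> Tuple[Records, Records]:
    """Split the records into two halves."""
    mid = len(records) // 2
    return records[:mid], records[mid:]


def truncate(records: Records, limit: int) -> Records:
    """Keep at most `limit` records."""
    return records[:limit]


def count(records: Records) -> int:
    """Count the records."""
    return len(records)


def run_pipeline(load: Callable[..., Records], source, limit: int = 2) -> Tuple[int, int]:
    """Wire the steps together; only `load` depends on the storage type."""
    first, second = split(load(source))
    return count(truncate(first, limit)), count(truncate(second, limit))
```

Calling `run_pipeline(load_from_list, [...])` and `run_pipeline(load_from_text, "...")` exercises the same split, truncate, and count steps with a different loader.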
Elyra includes three generic components that allow for the processing of Jupyter notebooks, Python scripts, and R scripts. These components are called generic because they can be used in all runtime environments that Elyra pipelines currently support: local/JupyterLab, Kubeflow Pipelines, and Apache Airflow.
Note: Refer to the Best practices topic in the User Guide to learn more about special considerations for generic components.
Custom components are commonly implemented for either Kubeflow Pipelines or Apache Airflow, but not both.
There are many example custom components available that you can utilize in pipelines, but you can also create your own. Details on how to create a component can be found in the Kubeflow Pipelines documentation and the Apache Airflow documentation. Do note that in Apache Airflow components are called operators, but for the sake of consistency the Elyra documentation refers to them as components.
Note: Refer to the Requirements and best practices for custom pipeline components topic in the User Guide to learn more about special considerations for custom components.
Example custom components¶
For illustrative purposes the Elyra component registry includes a few custom components that you can use to get started. These example components and the generic components are pre-loaded into the pipeline editor palette by default.
Component details and demo pipelines can be found in the https://github.com/elyra-ai/examples repository.
Note that example components are provided as is. Unless indicated otherwise they are not maintained by the Elyra community.
Managing pipeline components¶
Components are managed in Elyra using the JupyterLab UI or the Elyra command line interface.
Managing custom components using the JupyterLab UI¶
Custom components can be added, modified, and removed in the Pipeline Components panel.
To access the panel in JupyterLab:
- Click the Open Pipeline Components button in the pipeline editor toolbar, OR
- Select the Pipeline Components tab from the JupyterLab sidebar, OR
- Open the JupyterLab command palette (Cmd/Ctrl + Shift + C) and search for Manage Pipeline Components.
Adding components to the registry¶
To add a component registry entry:
- Click + in the Pipeline Components panel.
- Define the registry entry. Refer to section Configuration properties for a description of each property.
If the registry entry validates correctly, the associated pipeline components are added to the pipeline editor’s palette.
Modifying a component registry entry¶
- Click the edit (pencil) icon next to the entry name.
- Modify the registry entry as desired.
Deleting components from the registry¶
To delete a component registry entry and its referenced component(s) from the Visual Pipeline Editor palette:
- Click the delete (trash) icon next to the entry name.
- Confirm deletion.
Caution: Pipelines that use the referenced components are no longer valid after the component registry entry is deleted.
Managing custom components using the Elyra CLI¶
Custom components can be added, modified, and removed using the elyra-metadata command line interface.
To list component registry entries:
$ elyra-metadata list component-registries
Available metadata instances for component-registries (includes invalid):
Schema              Instance                          Resource
------              --------                          --------
component-registry  elyra-airflow-filename-preconfig  .../jupyter/metadata/component-registries/elyra-airflow-filename-preconfig.json
Adding components to the registry¶
To add a component registry entry, run elyra-metadata install component-registries.
$ elyra-metadata install component-registries \
--display_name="filter components" \
--description="filter text in files" \
--runtime=kfp \
--location_type=URL \
--paths="['https://raw.githubusercontent.com/elyra-ai/elyra/master/etc/config/components/kfp/filter_text_using_shell_and_grep.yaml']" \
--categories='["filter content"]'
Refer to section Configuration properties for parameter descriptions.
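When registry entries are created from a script, the argument list for the command above can be assembled programmatically. The following is a minimal sketch: the helper `build_install_args` is hypothetical, it only uses the options shown in this section, and it assumes list values are passed as Python-style list literals, as in the examples in this topic.

```python
from typing import Dict, List, Union

PropertyValue = Union[str, List[str]]


def build_install_args(properties: Dict[str, PropertyValue], replace: bool = False) -> List[str]:
    """Build an argument list for 'elyra-metadata install component-registries'.

    Hypothetical helper: it formats arguments only and does not run anything.
    """
    args = ["elyra-metadata", "install", "component-registries"]
    for option, value in properties.items():
        # Assumption: list-valued options (paths, categories) are rendered
        # as list literals such as ['a', 'b'].
        rendered = repr(value) if isinstance(value, list) else str(value)
        args.append(f"--{option}={rendered}")
    if replace:
        args.append("--replace")
    return args


args = build_install_args(
    {
        "display_name": "filter components",
        "description": "filter text in files",
        "runtime": "kfp",
        "location_type": "URL",
        "paths": [
            "https://raw.githubusercontent.com/elyra-ai/elyra/master/etc/config/components/kfp/filter_text_using_shell_and_grep.yaml"
        ],
        "categories": ["filter content"],
    }
)
```

Because the result is a list rather than a single string, it can be handed to `subprocess.run(args, check=True)` without additional shell quoting.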
Modifying a component registry entry¶
To replace a component registry entry, run elyra-metadata install component-registries and specify the --replace option:
$ elyra-metadata install component-registries \
--name="filter_components" \
--display_name="filter components" \
--description="filter text in files" \
--runtime=kfp \
--location_type=URL \
--paths="['https://raw.githubusercontent.com/elyra-ai/elyra/master/etc/config/components/kfp/filter_text_using_shell_and_grep.yaml']" \
--categories='["file operations"]' \
--replace
Note: You must specify all property values, not only the ones that you want to modify.
Refer to section Configuration properties for parameter descriptions.
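Because a replace requires the complete set of property values, it can help to start from the current values and overlay only the changes. A minimal sketch, assuming the current values are already available as a dictionary; the helper name `complete_properties` is hypothetical and not part of Elyra.

```python
import copy
from typing import Any, Dict


def complete_properties(existing: Dict[str, Any], changes: Dict[str, Any]) -> Dict[str, Any]:
    """Return the full property set: existing values overlaid with the changes."""
    merged = copy.deepcopy(existing)  # leave the caller's dictionary untouched
    merged.update(changes)
    return merged


current = {
    "name": "filter_components",
    "display_name": "filter components",
    "description": "filter text in files",
    "runtime": "kfp",
    "location_type": "URL",
    "paths": [
        "https://raw.githubusercontent.com/elyra-ai/elyra/master/etc/config/components/kfp/filter_text_using_shell_and_grep.yaml"
    ],
    "categories": ["filter content"],
}

# Only the category changes; every other property is carried over unchanged.
updated = complete_properties(current, {"categories": ["file operations"]})
```

Every key in `updated` then needs to be passed to the command, not only `categories`.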
Deleting components from the registry¶
To delete a component registry entry and its component definitions:
$ elyra-metadata remove component-registries \
--name="filter_components"
Refer to section Configuration properties for parameter descriptions.
Configuration properties¶
The component registry entry properties are defined as follows. The string in parentheses in each heading below denotes the CLI option name.
Name (display_name)¶
A user-friendly name for the registry entry. Note that the registry entry name is not displayed in the palette. This property is required.
Example: data load components
N/A (name)¶
The canonical name for this registry entry. A value is generated from Name if no value is provided.
Example: data_load_components
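The examples in this topic pair the display name "data load components" with the name data_load_components. The sketch below reproduces that pairing with one plausible normalization rule (lowercase, with runs of non-alphanumeric characters replaced by an underscore). This rule is an assumption for illustration; the rule Elyra actually applies may differ.

```python
import re


def derive_name(display_name: str) -> str:
    """Derive a canonical-looking name from a display name.

    Assumed rule, not taken from Elyra: lowercase the text and replace each
    run of characters outside a-z and 0-9 with a single underscore.
    """
    lowered = display_name.strip().lower()
    return re.sub(r"[^a-z0-9]+", "_", lowered).strip("_")
```

Providing an explicit name avoids depending on whatever rule is applied.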
Description (description)¶
A description for the registry entry.
Example: Load data from external data sources
Category (categories)¶
In the pipeline editor palette, components are grouped into categories to make them more easily accessible. If no category is provided, the components defined by this registry entry are added to the palette under "no category". Each category name is limited to 18 characters.
Examples (CLI):
['load data from db']
['train model','pytorch']
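The length limit can be checked before an entry is submitted. A minimal sketch; the function name is hypothetical, and the limit of 18 is taken from the description above.

```python
from typing import List

MAX_CATEGORY_LENGTH = 18  # limit stated in the Category property description


def invalid_categories(categories: List[str]) -> List[str]:
    """Return the category names that exceed the documented length limit."""
    return [name for name in categories if len(name) > MAX_CATEGORY_LENGTH]
```

An empty result means that every category name fits within the limit.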
Runtime (runtime)¶
The runtime environment that supports the component(s). Valid values are the set of configured runtimes that appear in the dropdown (UI) or help-text (CLI). This property is required.
Example:
airflow
Location Type (location_type)¶
The location type identifies the format that the value(s) provided in Paths represent. Supported types are URL, Filename, or Directory. This property is required.
- URL: The provided Paths identify web resources. The pipeline editor loads the specified URLs using anonymous HTTP GET requests.
- Filename: The provided absolute Paths identify files in the file system where JupyterLab/Elyra is running. ~ may be used to denote the user’s home directory.
- Directory: The provided absolute Paths must identify existing directories in the file system where JupyterLab/Elyra is running. ~ may be used to denote the user’s home directory. The pipeline editor scans the specified directories for component specifications. Scans are not performed recursively.
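The Filename and Directory behavior described above (home directory expansion and a non-recursive scan) can be sketched as follows. This is an illustration, not Elyra's implementation: the function names are hypothetical, and the set of file suffixes treated as component specifications is an assumption based on the .yaml and .py examples in this topic.

```python
import os
from pathlib import Path
from typing import List, Tuple


def resolve_filename(path: str) -> Path:
    """Expand a leading ~ to the user's home directory."""
    return Path(os.path.expanduser(path))


def list_directory_candidates(
    directory: str, suffixes: Tuple[str, ...] = (".yaml", ".py")
) -> List[Path]:
    """List candidate specification files directly inside `directory`.

    Subdirectories are not entered, mirroring the non-recursive scan.
    The default suffixes are an assumption for this example.
    """
    root = resolve_filename(directory)
    if not root.is_dir():
        raise ValueError(f"Not an existing directory: {root}")
    return sorted(p for p in root.iterdir() if p.is_file() and p.suffix in suffixes)
```

A specification stored in a subdirectory is not returned, so each subdirectory would need its own entry in Paths.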
Paths (paths)¶
A path defines the location from where the pipeline editor loads one or more component specifications. The provided value must be a valid representation of the selected location type. This property is required.
Examples (GUI):
- URL:
https://raw.githubusercontent.com/elyra-ai/elyra/master/etc/config/components/kfp/run_notebook_using_papermill.yaml
- Filename:
/Users/patti/specs/load_data_from_public_source/http_operator.py
- Filename:
~patti/specs/filter_files/row_filter.yaml
- Directory:
/Users/patti/specs/load_from_database
Examples (CLI):
- URL:
['https://raw.githubusercontent.com/elyra-ai/elyra/master/etc/config/components/kfp/run_notebook_using_papermill.yaml']
- Filename:
['/Users/patti/specs/load_data_from_public_source/http_operator.py']
- Filename:
['~patti/specs/filter_files/row_filter.yaml']
- Directory:
['/Users/patti/specs/load_from_database']
Examples with multiple components (CLI):
- URL:
['URL1', 'URL2']
- Filename:
['/Users/patti/specs/comp1.yaml','/Users/patti/specs/comp2.yaml']
- Directory:
['/Users/patti/load_specs/','/Users/patti/cleanse_specs/']
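The CLI examples above are written as list literals, so a malformed value (a missing bracket or quote) is easy to produce. The sketch below is a local pre-check that uses Python's `ast.literal_eval`; it is an assumption-based convenience with a hypothetical function name and is not presented as the way elyra-metadata parses the option.

```python
import ast
from typing import List


def parse_list_option(value: str) -> List[str]:
    """Parse a value such as "['a', 'b']" into a list of strings.

    Raises ValueError if the text is not a list literal of strings.
    """
    try:
        parsed = ast.literal_eval(value)
    except (ValueError, SyntaxError) as exc:
        raise ValueError(f"Not a list literal: {value!r}") from exc
    if not isinstance(parsed, list) or not all(isinstance(item, str) for item in parsed):
        raise ValueError(f"Expected a list of strings: {value!r}")
    return parsed
```

For example, `parse_list_option("['URL1', 'URL2']")` yields a two-element list, while a bare string without brackets is rejected.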