Introduction
This guide will show you how to import, extend and implement necessary methods of Tru Task Module, so you will be able to implement a custom task/process and use it on ML Pipeline
Configuration
To start using the TruTask module you first need to import the package
from gjirafatech.truai.task import TruTask, TruInputs, TruOutputs
from gjirafatech.truai.task.path import TruPathANY, TruPathCSV, TruPathJSON, TruPathXML
TruPath
TruPath is a custom data type to define inputs and outputs of a task that will be represented as local file and can be shared across different tasks, it contains a path
property of string type, that contains the absolute path to the respective file.
any_path = TruPathANY()
csv_path = TruPathCSV()
json_path = TruPathJSON()
xml_path = TruPathXML()
#exmaple
with open(csv_path.path, 'r') as csv_file:
for line in csv_file:
print(line)
TruInputs
TruInputs is the base class that should be extended to define the inputs of the task. Supported data types are: string, float, int, Enum, bool, TruPath.
class MyEnum(enum.Enum):
Small = 1
Medium = 2
class MyInputs(TruInputs):
string_input: str
float_input: float
int_input: int
enum_input: MyEnum
bool_input: bool
path_input: TruPathANY
TruOutputs
TruOutputs is the base class that should be extended to define the outputs of the task. Supported data types are: string, float, int, TruPath.
class MyOutputs(TruOutputs):
string_output: str
float_output: float
int_output: int
path_output: TruPathANY
Logging
TruTask provides logging through native logging library.
Logger can be retrieved with 'tru_logger' key.
logger = logging.getLogger('tru_logger')
TruTask
TruTask is the base class that you need to extend and implement run method.
The run method is triggered when execution of the task starts, it will receive an instance of defined TruInputs and should return an instance of defined TruOutputs.
def run(self, inputs: MyInputs):
outputs = MyOutputs(
string_output=inputs.string_input,
float_output=inputs.float_input,
int_output=inputs.int_input,
path_output=inputs.path_input
)
return outputs
Example
from gjirafatech.truai.task import TruTask, TruInputs, TruOutputs
from gjirafatech.truai.task.path import TruPathANY
import enum
import logging
class MyEnum(enum.Enum):
Small = 1
Medium = 2
class MyInputs(TruInputs):
string_input: str
float_input: float
int_input: int
enum_input: MyEnum
bool_input: bool
path_input: TruPathANY
class MyOutputs(TruOutputs):
string_output: str
float_output: float
int_output: int
path_output: TruPathANY
class MyTruTask(TruTask):
def __init__(self):
self.logger = logging.getLogger('tru_logger')
def run(self, inputs: MyInputs):
output_path = TruPathANY()
with open(inputs.path_input.path, 'r') as input_file:
with open(output_path.path, 'a') as output_file:
for line in input_file:
output_file.write(line)
self.logger.info("Content copied successfully")
outputs = MyOutputs(
string_output=inputs.string_input,
float_output=inputs.float_input,
int_output=inputs.int_input,
path_output=output_path
)
return outputs