Skip to main content

Introduction

This guide will show you how to import, extend and implement necessary methods of Tru Task Module, so you will be able to implement a custom task/process and use it on ML Pipeline

Configuration

To start using the TruTask module you first need to import the package

from gjirafatech.truai.task import TruTask, TruInputs, TruOutputs
from gjirafatech.truai.task.path import TruPathANY, TruPathCSV, TruPathJSON, TruPathXML

TruPath

TruPath is a custom data type to define inputs and outputs of a task that will be represented as local file and can be shared across different tasks, it contains a path property of string type, that contains the absolute path to the respective file.

TruPath includes four options
any_path = TruPathANY()
csv_path = TruPathCSV()
json_path = TruPathJSON()
xml_path = TruPathXML()

#exmaple
with open(csv_path.path, 'r') as csv_file:
for line in csv_file:
print(line)

TruInputs

TruInputs is the base class that should be extended to define the inputs of the task. Supported data types are: string, float, int, Enum, bool, TruPath.

Example of extending TruInputs
class MyEnum(enum.Enum):
Small = 1
Medium = 2


class MyInputs(TruInputs):
string_input: str
float_input: float
int_input: int
enum_input: MyEnum
bool_input: bool
path_input: TruPathANY

TruOutputs

TruOutputs is the base class that should be extended to define the outputs of the task. Supported data types are: string, float, int, TruPath.

Example of extending TruOutputs
class MyOutputs(TruOutputs):
string_output: str
float_output: float
int_output: int
path_output: TruPathANY

Logging

TruTask provides logging through native logging library.
Logger can be retrieved with 'tru_logger' key.

logger = logging.getLogger('tru_logger')

TruTask

TruTask is the base class that you need to extend and implement run method.
The run method is triggered when execution of the task starts, it will receive an instance of defined TruInputs and should return an instance of defined TruOutputs.

Example of run method
def run(self, inputs: MyInputs):
outputs = MyOutputs(
string_output=inputs.string_input,
float_output=inputs.float_input,
int_output=inputs.int_input,
path_output=inputs.path_input
)
return outputs

Example

from gjirafatech.truai.task import TruTask, TruInputs, TruOutputs
from gjirafatech.truai.task.path import TruPathANY

import enum
import logging

class MyEnum(enum.Enum):
Small = 1
Medium = 2


class MyInputs(TruInputs):
string_input: str
float_input: float
int_input: int
enum_input: MyEnum
bool_input: bool
path_input: TruPathANY


class MyOutputs(TruOutputs):
string_output: str
float_output: float
int_output: int
path_output: TruPathANY


class MyTruTask(TruTask):
def __init__(self):
self.logger = logging.getLogger('tru_logger')

def run(self, inputs: MyInputs):
output_path = TruPathANY()
with open(inputs.path_input.path, 'r') as input_file:
with open(output_path.path, 'a') as output_file:
for line in input_file:
output_file.write(line)

self.logger.info("Content copied successfully")

outputs = MyOutputs(
string_output=inputs.string_input,
float_output=inputs.float_input,
int_output=inputs.int_input,
path_output=output_path
)
return outputs