How to define and use Context Variables in Talend?

What is a Context? 

Contexts are used to configure a Job for standard environments within the Software Development Life Cycle, such as Development, Testing, and Production.  

A Job always runs within a specific context. When a context is selected in the Run view, the corresponding parameters are applied based on the environment in which the Job runs. 

 

What is a Context Variable? 

A context variable is a user-defined variable specific to a context. Some advantages of using context variables include: 

  • Reducing the effort needed to update parameter values across all resources. 
  • Improving consistency. 
  • Enabling reuse across different Jobs. 
  • Automatically adapting to values based on the context in which the Job is run. 
  • Simplifying configuration by allowing you to change context values as needed for different environments. 

 

How to Define Context Variables? 

There are two main methods for defining context variables in Talend:  

→  in the Contexts view (Built-In)  

When the Job is opened in the Designer, the Contexts view displays all variables defined in the current Job, including imported context variables. This view allows you to create and manage contexts and variables specific to a Job. Context variables defined here are referred to as “Built-In” and are available only within the Job in which they were created. 

→  in the Repository (Context Groups)  

Context variables can also be defined in the Repository as context groups under the Contexts section. This method allows for reusability across any Job within the project. 

 

How to access a context variable? 

To access a context variable, use the syntax context.VariableName. 
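Under the hood, Talend generates a Java `context` object whose fields are your variables, so `context.VariableName` is an ordinary field access. Here is a minimal standalone sketch that simulates the generated object outside the Studio (the class layout is a simplification; the variable names anticipate the ones used later in this tutorial):

```java
// Standalone sketch of how Talend's generated code exposes context variables.
// Outside the Studio, we simulate the generated context object with a plain class.
public class ContextAccessDemo {
    static class Context {
        String Host = "localhost";
        Integer Port = 3306;
        String filePath = "C:/Program Files/RPM/Retail_Shop_Data.xlsx";
    }

    static Context context = new Context();

    public static void main(String[] args) {
        // Same context.VariableName syntax you would use in a tJava component
        System.out.println(context.Host + ":" + context.Port);
    }
}
```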

 

How Do Contexts and Context Variables Work Together? 

The execution context must be specified in the Run view, where the context, variables, and values are displayed. The values are automatically updated when the context changes, though they cannot be modified directly in this view. 

 

Business Scenario 

Imagine you’re a member of the Data Engineering team at your company, currently working on a project for a large retail client with supermarkets in every major city in Romania. Each day, these supermarkets generate product sales data, which they store in Excel files. 

Your task? Build an ETL process that seamlessly transfers this data daily from Excel to the client’s database, enabling the Analytics team to create dashboards and gain valuable insights from the data to drive data-informed decisions. In addition to data transfer, you’ll also need to log key information, such as execution duration, process status, number of rows processed, and other relevant details in a dedicated log table.  

To accomplish this task, you’ll use context variables to configure the necessary components and manage logging requirements. 

 

Talend Approach 

For this SkillPill, we’ll assume a single context, DEV. We’ll create context groups for database connection and logging, and define a Built-In context variable for the file name from which we extract data. 

 

Step 1: Create the Job 

Open Talend using the “Run as administrator” option to ensure you have the appropriate privileges. 

→  In the Repository, right-click on Job Designs and select Create job. 

→  In the New job window, enter an appropriate name for the Job. You may also optionally add a purpose and description. 

→  Click Finish. 

 

Step 2: Create the Context, Context Groups, and the Built-In context variable needed for this task 

→ Context Group for Database Connection.  

The database used in this example is MySQL, with a username and password defined during installation, using a localhost connection. 

In the Repository, right-click on Contexts and select Create context group. 

In the Create/Edit a context group window (Step 1 of 2), enter the name DBConnection. You may also add a purpose and a description if needed.  

In the Create/Edit a context group window (Step 2 of 2), click the “+” icon on the right-hand side. The Configure Contexts window appears, displaying a default context. Select it and click Edit. In the Rename Context window, type DEV and click OK. Next, click OK again to confirm.  

Then, click the “+” icon at the bottom of the Create/Edit a context group window (Step 2 of 2) to add the required rows of database credentials as context variables. Be sure to specify the appropriate data type for each variable. Add the following credentials:  

  • Row 1 – Name: Host, Type: String, Value: localhost. 
  • Row 2 – Name: Port, Type: Integer, Value: 3306. 
  • Row 3 – Name: Database, Type: String, Value: training. 
  • Row 4 – Name: Username, Type: String, Value: <your_username>. 
  • Row 5 – Name: Password, Type: Password, Value: <your_pass>.  
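For reference, the generated code assembles these values into a standard MySQL JDBC connection URL. A hedged sketch of that assembly (the exact options Talend appends may differ by component version):

```java
public class JdbcUrlDemo {
    public static void main(String[] args) {
        // Sample values matching the DBConnection context group above
        String host = "localhost";
        Integer port = 3306;
        String database = "training";

        // Standard MySQL JDBC URL shape; Talend builds roughly this from the context values
        String url = "jdbc:mysql://" + host + ":" + port + "/" + database;
        System.out.println(url);
    }
}
```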

 

→ Context Group for Logging Purpose 

A table called job_log has already been created in the training schema within the MySQL database. This table contains the following columns: id, job_name, purpose, schema_name, job_start_dt, job_end_dt, job_duration, job_status, row_count_exported. The id column is defined as a sequence, so we will not explicitly set it as a context variable, as its value auto-increments.  

In the Repository, right-click on Contexts and select Create context group. 

In the Create/Edit a context group window (Step 1 of 2), enter the name Logging. You may also add a purpose and a description if needed. 

In the Create/Edit a context group window (Step 2 of 2), click the “+” icon on the right-hand side. The Configure Contexts window appears, displaying a default context. Select it and click Edit. In the Rename Context window, type DEV and click OK. Next, click OK again to confirm.  

Then, click the “+” icon at the bottom of the Create/Edit a context group window (Step 2 of 2)  to add context variables for logging purposes. Be sure to specify the appropriate data type for each variable. For the Value field, we will add some default values to avoid warnings about strings appearing at runtime. These defaults will be overwritten by the actual values defined during execution. Add the following context variables, matching the columns in the MySQL logging table:  

  • Row 1 – Name: job_start_dt, Type: String, Value: 1900-01-01. 
  • Row 2 – Name: job_end_dt, Type: String, Value: 1900-01-01. 
  • Row 3 – Name: job_name, Type: String, Value: dummy. 
  • Row 4 – Name: purpose, Type: String, Value: dummy. 
  • Row 5 – Name: schema_name, Type: String, Value: dummy. 
  • Row 6 – Name: job_duration, Type: Long, Value: 0. 
  • Row 7 – Name: job_status, Type: String, Value: dummy. 
  • Row 8 – Name: row_count_exported, Type: Integer, Value: 0. 
  • Row 9 – Name: id, Type: Integer, Value: 0.

 

→  Built-In Context Variable for the File Name 

Open the Job and go to the Contexts View.  

In the Contexts View, click the “+” icon on the right-hand side. The Configure Contexts window appears, displaying a default context. Select it and click Edit. In the Rename Context window, type DEV and click OK. Then, click OK again to confirm.  

Next, click the “+” icon at the bottom of the Contexts View to add a context variable for the file name. Be sure to select the appropriate data type. Add the following context variable:  

Row 1 – Name: filePath, Type: String, Value: C:/Program Files/RPM/Retail_Shop_Data.xlsx 

 

→  Adding the Context Groups to the Job.  

There are three options for adding context groups to the Job: 

  • Drag the context groups onto the Workspace Designer. 
  • Drag the context groups onto the Contexts View. 
  • With the Job open, go to the Contexts View and click the button to the right of the down arrow at the bottom. The Select Context Variables menu appears. Select the context groups you want to add, then click OK.  

 

Step 3: Configuring components in the Job flow 

→ Add a tPreJob component 

The tPreJob component is particularly useful for setting up configurations or executing preliminary tasks on which the main Job may depend. 

→  Add a tJava component  

Link this component to the tPreJob component using an OnComponentOk trigger. 

This component is used to initialize some of the context variables. When initializing a context variable in a tJava component, use the following syntax: context.VariableName = <the_value_to_be_assigned>;. Be sure to add a semicolon at the end of each line of code. Initialize the following variables:  

  • context.job_start_dt = TalendDate.formatDate("yyyy-MM-dd HH:mm:ss", TalendDate.getCurrentDate()); 
  • This timestamp format is accepted by MySQL and allows you to store the Job start time in the logging table. It’s typically set at the beginning of the flow.  
  • context.job_name = "<Your_Job_Name>";  
  • The name should match the one you assigned to your Job and should be enclosed in double quotes. This can be set at either the beginning or the end of the flow.  
  • context.purpose = "<The_Purpose_of_the_Job>";  
  • Enter the purpose of the Job, enclosed in double quotes. This can be set at either the beginning or end of the flow.  
  • context.schema_name = "<The_Schema_Name>";  
  • Enter the schema name, enclosed in double quotes. This can also be set at either the beginning or end of the flow.  
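Put together, the tJava body looks like the sketch below. To make it runnable outside Talend, TalendDate.formatDate is replaced by the SimpleDateFormat call it wraps, and the Job name, purpose, and schema are placeholder values:

```java
import java.text.SimpleDateFormat;
import java.util.Date;

public class PreJobInitDemo {
    public static void main(String[] args) {
        // Equivalent of TalendDate.formatDate("yyyy-MM-dd HH:mm:ss", TalendDate.getCurrentDate())
        String jobStartDt = new SimpleDateFormat("yyyy-MM-dd HH:mm:ss").format(new Date());
        String jobName = "LoadRetailShopData";        // placeholder Job name
        String purpose = "Daily Excel-to-MySQL load"; // placeholder purpose
        String schemaName = "training";

        // "yyyy-MM-dd HH:mm:ss" always renders as 19 characters
        System.out.println(jobName + " | " + schemaName + " | " + jobStartDt.length());
    }
}
```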

→ Add a tDBConnection component 

Link this component to the tJava component using an OnComponentOk trigger. 

The tDBConnection component is used to establish a connection to the MySQL Database. Follow these configuration steps:  

  • Click on the component. 
  • From the Database drop-down menu, select MySQL and verify that the DB version is correct.  
  • Configure the context variables for the database connection:  
  • Host: context.Host 
  • Port: context.Port 
  • Database: context.Database 
  • Username: context.Username 
  • Password: click the three dots to the right of the Password field, remove any quotes, type context.Password, and click OK.   
  • Go to the Advanced Settings tab and enable the Auto Commit option. This ensures that data is automatically committed to the database when the Job finishes.  

→ Add a tFileInputExcel component.  

This component marks the start of the main flow in the Job. It is used to read the daily data for a specific supermarket.  

In the Basic Settings tab, next to File name/Stream, add the Built-In context variable, context.filePath. You can do this by either entering it manually or using the auto-completion feature. In the Sheet list field, click on the “+” sign and enter the name of the sheet you are reading data from, enclosed in double quotes. If the Excel file contains column headers, enter 1 in the Header section. Next, click on the three dots to the right of the Edit schema and add the column names along with their respective data types. Depending on the format of the Excel file, you may need to enable the Read excel2007 file format (xlsx) option. 

→ Add a tDBOutput component 

Link this component to the tFileInputExcel using a Main row trigger. 

The tDBOutput component transfers data from the Excel file into the database table named “Shop 1”. This table was previously created in the MySQL training schema. Make sure to use the existing database connection defined at the beginning by enabling the Use an existing connection option. The Action on the table should remain at its default setting, and the Action on data should be set to insert. Also, ensure that the schema is synchronized with the input. In the Table field, enclose the name of the table in double-quotes. 

→  Logging Logic 

If the data transfer is successful 

Add a tJava component 

Link this component to tFileInputExcel using an OnSubJobOk trigger. This instructs Talend to follow this branch if the data transfer completes successfully.    

Define the following context variables:  

→   context.job_end_dt = TalendDate.formatDate("yyyy-MM-dd HH:mm:ss", TalendDate.getCurrentDate()); 

MySQL accepts this timestamp format and allows you to store the Job end time in the logging table. It’s typically set at the end of the flow. 

→   context.job_duration = TalendDate.diffDate(TalendDate.parseDate("yyyy-MM-dd HH:mm:ss", context.job_end_dt), TalendDate.parseDate("yyyy-MM-dd HH:mm:ss", context.job_start_dt), "ss"); 

This Talend expression calculates the time difference, in seconds, between two timestamps: context.job_end_dt and context.job_start_dt. It first converts these timestamp strings into date objects using the specified “yyyy-MM-dd HH:mm:ss” format, then finds the difference between them using diffDate, with “ss” indicating the result in seconds. This is useful for tracking how long a Job takes to run, aiding in performance monitoring and optimization. 
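Outside Talend, the same computation can be reproduced with SimpleDateFormat and plain millisecond arithmetic, which is essentially what diffDate with "ss" does (sample timestamps stand in for the context values):

```java
import java.text.SimpleDateFormat;
import java.util.Date;

public class JobDurationDemo {
    public static void main(String[] args) throws Exception {
        SimpleDateFormat fmt = new SimpleDateFormat("yyyy-MM-dd HH:mm:ss");
        // Sample start/end timestamps in the same format the Job stores
        Date start = fmt.parse("2024-01-01 10:00:00");
        Date end = fmt.parse("2024-01-01 10:02:30");
        // Difference in whole seconds, like diffDate(end, start, "ss")
        long seconds = (end.getTime() - start.getTime()) / 1000;
        System.out.println(seconds); // 150 seconds, i.e. 2m30s
    }
}
```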

→   context.row_count_exported = ((Integer) globalMap.get("tDBOutput_2_NB_LINE")); 

The system variable of the output component is used to retrieve the total number of records that have been transferred. 

→   context.job_status = "Success"; 

Since we are on the success branch, we hard-code the status as "Success" just before inserting it into the log table. 

Add a tDBRow component 

Link this component to the tJava one using an OnComponentOk trigger. 

Make sure to enable the Use an existing connection option and select the one defined at the beginning of the flow. In the Query section, include the INSERT statement. 

The INSERT statement is:  

insert into job_log (job_name, purpose, schema_name, job_start_dt, job_end_dt, job_duration, job_status, row_count_exported) 

values( 

'" + context.job_name + "', 

'" + context.purpose + "', 

'" + context.schema_name + "', 

'" + context.job_start_dt + "', 

'" + context.job_end_dt + "', 

'" + context.job_duration + "', 

'" + context.job_status + "', 

'" + context.row_count_exported + "' 

);  
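Because the Query field is evaluated as a Java string, each '"+context.x+"' piece splices a runtime value into the SQL literal. A trimmed-down sketch of what the generated code effectively does, with sample values standing in for the context variables:

```java
public class QueryRenderDemo {
    public static void main(String[] args) {
        // Sample runtime values standing in for the context variables
        String jobName = "LoadRetailShopData"; // placeholder
        String jobStatus = "Success";
        Integer rowCount = 42;

        // The Query field is a Java string; concatenation splices each value into the SQL
        String query = "insert into job_log (job_name, job_status, row_count_exported) "
                + "values('" + jobName + "', '" + jobStatus + "', '" + rowCount + "')";
        System.out.println(query);
    }
}
```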

Add a tDBClose connection 

Link this component to the tDBRow one using an OnComponentOk trigger. This way, we ensure that the connection is closed at the end of the flow and the data is committed. 

→  If the data transfer is not successful:

Add a tJava component 

Link this component to tFileInputExcel using an OnSubJobError trigger. This instructs Talend to follow this branch if an error occurs during data transfer.    

Define the same context variables as described above, except for context.job_status, which is defined as follows:  

context.job_status = "Failure"; 

Add a tDBRow component.  

Link this component to the tJava using an OnComponentOk trigger. The same configurations and query apply here as presented in the tDBRow component in the successful case.  

Add a tDBClose connection 

Link this component to the tDBRow one using an OnComponentOk trigger. This way, we ensure that the connection is closed at the end of the flow and the data is committed.  

 

Step 4: Run the Job 

Ensure that the settings mentioned in steps 1-3 are applied. 

Run the Job. 

Verify that there are no errors at runtime and that the Job finishes with an Exit code of 0. 

Optionally, check the shop1 and job_log tables in the training schema to confirm that data has been properly inserted.   

 

 

Here is the result in Talend:

Watch the video tutorial here:

About btProvider:

We hold technical and sales certifications for all the products we work with:

Talend, Tableau Desktop, Tableau Public, Tableau Server, Tableau Prep, Tableau Data Management, Vertica, Salesforce, Mulesoft, Write-Back.

With Talend – the best data management platform – you can always understand your data better. Using Talend, you can make intelligent, strategic decisions and seamlessly integrate, quality-assure, and govern your data. See other #skillpill video tutorials here:

How to create a Barcode Chart

How to create a top N Set

Using measure names in parameter actions

Are you interested in learning more about Talend – the catalyst for efficient and comprehensive data management solutions across your company?

Contact us!