Talend DevOps - Continuous Integration

by Robert Griswold and Jason Buys

Introduction

This article describes how a DevOps approach to data management can be implemented with Talend. The term DevOps describes the union of agile development methodology with operations. Continuous integration (CI) is a development practice in which developers integrate code into a shared repository on an on-demand or scheduled basis. The functional steps of CI are building the source code, running predefined unit tests, and finally deploying the built code to an artifact repository. This automated process not only allows for more frequent software releases, but also detects errors introduced by new code and prevents that code from reaching a release.

The sections below cover the environment requirements and then walk through the Jenkins steps needed to create this CI process.

To set up the CI environment, the following are needed:

  • A Jenkins server configured with the JDK, Maven, and Git plugins
  • A second Talend CommandLine (not already dedicated to TAC)
  • The Talend CI Builder plugin installed in Nexus or in a local Maven repository
  • Access to Git/SVN (containing the Talend jobs) and to Nexus (libraries and built artifacts)

While it may appear daunting, the process outlined below has been implemented with customers very quickly. Provided the required environment is available and test jobs have been created, the process is quite simple.

 

The following is a walkthrough of the CI process as it is implemented with Talend jobs.


Trigger for Jenkins CI Project

Step 1 Talend Studio - Code Commit

  • The process starts when a Studio user commits code to the Git/SVN repository.
  • The “code” generated by Talend Studio consists of item and properties XML files.
  • Committing the code is one way to trigger the Jenkins workflow.
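
One way to connect a commit to Jenkins is a repository hook that calls the Jenkins remote build trigger. The following is a minimal sketch, assuming the Jenkins job has "Trigger builds remotely" enabled. The server URL, job name, and token are hypothetical placeholders, and a secured Jenkins server may also require credentials that are not shown here. Having Jenkins poll the repository is an equally valid alternative.

```python
from urllib.parse import quote, urlencode
from urllib.request import Request, urlopen


def build_trigger_url(jenkins_url: str, job_name: str, token: str) -> str:
    """Return the remote-trigger URL for a Jenkins job."""
    base = jenkins_url.rstrip("/")
    # Percent-encode the job name so spaces and slashes do not break the path.
    job = quote(job_name, safe="")
    return f"{base}/job/{job}/build?{urlencode({'token': token})}"


def trigger_build(jenkins_url: str, job_name: str, token: str) -> int:
    """POST to the trigger URL and return the HTTP status code."""
    request = Request(build_trigger_url(jenkins_url, job_name, token), method="POST")
    with urlopen(request, timeout=30) as response:
        return response.status
```

A post-commit hook on the repository server could then call trigger_build("http://jenkins.example.com:8080", "talend-ci", "my-token"), where all three values are stand-ins for the real environment.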

Jenkins Step to Generate and Compile Sources

Step 2 Check Out Sources

  • Jenkins checks out the XML files from the repository.
  • Jenkins also checks out any custom Java code or routines from the repository.

Step 3 Generate Sources

  • The Talend CommandLine service generates the Java code from the XML files.

Step 4 Compile Sources

  • The generated Java source code is then compiled as directed by a Maven POM file.
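
Steps 2 through 4 form an ordered chain in which a failure at any point stops everything after it. The sketch below illustrates that ordering and is not the Jenkins configuration itself. The repository URL, working directory, and POM location are hypothetical. The generation command is deliberately left to the caller, because the exact CommandLine or CI Builder invocation depends on the Talend version in use.

```python
import subprocess
from typing import Callable, List, Sequence, Tuple

Step = Tuple[str, List[str]]


def build_steps(repo_url: str, workdir: str, generate_cmd: Sequence[str]) -> List[Step]:
    """Return the ordered commands for check-out, generation, and compilation."""
    return [
        ("checkout", ["git", "clone", repo_url, workdir]),
        # Placeholder: the caller supplies the Talend generation command.
        ("generate", list(generate_cmd)),
        # Assumes the POM ends up at the root of the working directory.
        ("compile", ["mvn", "-f", f"{workdir}/pom.xml", "compile"]),
    ]


def run_steps(steps: Sequence[Step], runner: Callable[[List[str]], int]) -> List[str]:
    """Run steps in order and stop at the first non-zero exit code."""
    completed = []
    for name, argv in steps:
        if runner(argv) != 0:
            raise RuntimeError(f"step '{name}' failed; later steps were skipped")
        completed.append(name)
    return completed


def shell_runner(argv: List[str]) -> int:
    """Execute a command and return its exit code."""
    return subprocess.run(argv).returncode
```

Passing the runner in as a parameter keeps the ordering logic testable without Git, Maven, or a CommandLine being installed; shell_runner is the one to use for a real run.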

Jenkins Step to Run Unit Tests

Step 5 Run Tests

  • This step runs any unit tests created in Talend Studio by the developer.
  • A unit test is created by right-clicking a component in the Studio and providing the expected input and output for the selected group of components.
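
The idea behind such a test is simple: feed known input to a group of components and compare what comes out with what was expected. The sketch below shows that comparison in plain Python and is not how Talend implements its test cases. The uppercase_names function is a hypothetical stand-in for the components under test.

```python
from typing import Callable, Dict, Iterable, List

Row = Dict[str, object]


def uppercase_names(rows: Iterable[Row]) -> List[Row]:
    """Hypothetical stand-in for the group of components under test."""
    return [{**row, "name": str(row["name"]).upper()} for row in rows]


def check_case(
    transform: Callable[[Iterable[Row]], List[Row]],
    input_rows: List[Row],
    expected_rows: List[Row],
) -> List[str]:
    """Return mismatch descriptions; an empty list means the case passed."""
    actual = transform(input_rows)
    problems = []
    if len(actual) != len(expected_rows):
        problems.append(f"row count: expected {len(expected_rows)}, got {len(actual)}")
    # Compare row by row over the part both lists have in common.
    for index, (got, want) in enumerate(zip(actual, expected_rows)):
        if got != want:
            problems.append(f"row {index}: expected {want}, got {got}")
    return problems
```

In a CI run, any non-empty result would fail the build, which is what keeps a regression from reaching the package and publish step.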

Jenkins Step to Package and Publish

Step 6 Package and Publish

  • This step creates a zip file containing scripts, contexts, JVM parameters, and Java libraries.
  • The zip archive is then published to a Nexus artifact repository.
  • Once published, that version of the artifact does not need to be built again; each target environment can retrieve it from the Nexus repository.
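
A published version can be retrieved without rebuilding because its location follows from its Maven coordinates. The sketch below builds that location using the standard Maven repository layout for release versions. The group ID, artifact ID, version, and repository base URL are hypothetical, and the base URL format differs between Nexus versions.

```python
def artifact_path(group_id: str, artifact_id: str, version: str, packaging: str = "zip") -> str:
    """Return the repository-relative path for a released artifact."""
    return "/".join(
        [
            group_id.replace(".", "/"),  # dots in the group ID become directories
            artifact_id,
            version,
            f"{artifact_id}-{version}.{packaging}",
        ]
    )


def artifact_url(repository_url: str, group_id: str, artifact_id: str, version: str) -> str:
    """Join a repository base URL with the artifact path."""
    return f"{repository_url.rstrip('/')}/{artifact_path(group_id, artifact_id, version)}"
```

For example, artifact_path("org.example.talend", "load_customers", "0.1.0") yields org/example/talend/load_customers/0.1.0/load_customers-0.1.0.zip. Snapshot versions use timestamped file names and are not covered by this sketch.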

Deployment to Job Servers

The Talend CI lifecycle, as officially documented, requires a TAC administrator to retrieve the jobs from Nexus via the Job Conductor and deploy them manually. This is the only point at which the otherwise automated CI/CD process requires human interaction. There is, however, a way to automate the deployment as well, using a native Talend function.

Continuous Deployment

Step 7 Meta Servlet Deployment

  • The Talend MetaServlet can be used to deploy and schedule jobs from Nexus to a Talend job server. The MetaServlet is a REST interface that can accomplish many of the same tasks as the Talend Administration Console (TAC).
  • Rather than writing a script, an actual Talend job can be used to parameterize and build the JSON needed to execute the MetaServlet REST calls.
  • Example of REST parameters sent to the Talend MetaServlet
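
For readers who want to see the shape of such a call outside a Talend job, the sketch below builds a MetaServlet request in Python. It assumes the MetaServlet accepts the request as a Base64-encoded JSON object appended to the URL after a question mark, which should be verified against the TAC version in use. The TAC address, the credentials, and the action fields in the usage note are assumptions for illustration only.

```python
import base64
import json
from urllib.request import urlopen


def metaservlet_url(tac_url: str, action: dict) -> str:
    """Encode the action as Base64 JSON and append it to the MetaServlet URL."""
    payload = json.dumps(action).encode("utf-8")
    encoded = base64.b64encode(payload).decode("ascii")
    return f"{tac_url.rstrip('/')}/metaServlet?{encoded}"


def call_metaservlet(tac_url: str, action: dict) -> dict:
    """Send the request and parse the JSON response (assumed response format)."""
    with urlopen(metaservlet_url(tac_url, action), timeout=60) as response:
        return json.loads(response.read().decode("utf-8"))
```

A call might pass an action such as {"actionName": "runTask", "taskId": 1, "mode": "synchronous", "authUser": "admin@example.com", "authPass": "secret"}, where the action name and its parameters are assumed here and should be taken from the MetaServlet help output of the installed TAC.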

 

Manual Deploy

Step 7 Manual Deployment in TAC

  • Use the Talend Administration Console’s Job Conductor to deploy and schedule the jobs from Nexus to the Talend job servers.

Conclusion

Why tackle continuous integration? It can seem complex, and the benefits might not be immediately apparent. Once you have wrestled with the evolving Talend documentation and learned what each step in the process does, it becomes a very repeatable process that you can help others implement. The main risk is the time spent setting up the Jenkins server and the corresponding technologies. Once CI is set up and test-driven development is in place, it will probably be difficult to live without. It eliminates a lot of manual intervention and brings a great deal of rigor to agile DevOps processes.

 
