Want to replace your legacy data management with something more modern, customizable, and affordable? Look no further than Talend. It doesn't have to be scary: TESCHGlobal can walk you through an end-to-end conversion from your legacy system.
by Robert Griswold and Jason Buys
This article describes how a DevOps approach to data management can be implemented with Talend. The term DevOps describes the union of agile development methodology with operations. Continuous integration (CI) refers to a development practice in which developers integrate code into a shared repository on an on-demand or scheduled basis. The functional steps of CI are building the source code, running pre-defined unit tests, and eventually deploying the code to an artifact repository. This automated process not only allows for more frequent software releases, but also detects errors introduced by new code and prevents that code from reaching a release.
Below are the environment requirements and a functional walkthrough of the Jenkins steps needed to create this CI process.
To set up the CI environment, the following are needed:
- Jenkins server configured with the JDK, Maven and GIT Plugins
- A second Talend CommandLine (not already dedicated to TAC)
- The Talend CI Builder plugin installed in Nexus or local Maven repository
- Access for GIT/SVN (containing Talend jobs) and Nexus (libraries and built artifacts)
While it may appear daunting, the process outlined below has been implemented with customers very quickly; provided the needed environment is available and test jobs are created, the process is quite simple.
The following is a walkthrough of the CI process as it is implemented with Talend jobs.
Trigger for Jenkins CI Project
Step 1 Talend Studio - Code Commit
- The process is started when a studio user commits code to the GIT/SVN repository.
- The “code” generated by Talend Studio consists of item and properties XML files.
- The act of committing the code can be one way to trigger the Jenkins workflow
Jenkins Step to Generate and Compile Sources
Step 2 Check Out Sources
- Jenkins checks out the XML files from the repository
- Jenkins also checks out any custom Java or Routines from the repository
Step 3 Generate Sources
- The Talend CommandLine service generates the Java code from the XML files
Step 4 Compile Sources
- The generated Java source code is now compiled as directed by a Maven POM file
Jenkins Step to Run Unit Tests
Step 5 Run Tests
- This step runs any unit tests created in Talend Studio by the developer
- These unit tests can be created by right-clicking on a component in the Studio and providing expected input and output criteria for the selected group of components
Jenkins Job to Package and Publish
Step 6 Package and Publish
- This step will create a zip file with scripts, contexts, JVM parameters, and java libraries
- This zip archive will then be published to a Nexus server artifact repository
- The resulting versioned artifact need not be built again; it can be retrieved from the Nexus repository by whichever environment it is deployed to.
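In a Talend CI pipeline this step is performed by Maven and the Talend CI Builder plugin. Purely as an illustration of what "package and publish" involves, here is a minimal Python sketch: it zips a built job folder and computes the Nexus 3-style Maven-layout URL to which the archive could be PUT. All names here (host, repository, group, artifact) are hypothetical, not Talend specifics.

```python
import zipfile
from pathlib import Path

def package_job(job_dir: str, archive_path: str) -> str:
    """Bundle a built job folder (scripts, contexts, libraries) into a zip."""
    root = Path(job_dir)
    with zipfile.ZipFile(archive_path, "w", zipfile.ZIP_DEFLATED) as zf:
        for f in sorted(root.rglob("*")):
            if f.is_file():
                # store paths relative to the job folder inside the archive
                zf.write(f, f.relative_to(root))
    return archive_path

def nexus_upload_url(base: str, repo: str, group: str,
                     artifact: str, version: str) -> str:
    """Build the Maven-layout path a Nexus 3 repository expects for a zip
    artifact; an HTTP PUT of the archive to this URL publishes it."""
    return (f"{base}/repository/{repo}/{group.replace('.', '/')}/"
            f"{artifact}/{version}/{artifact}-{version}.zip")
```

Because the artifact is versioned by this path, any environment can later retrieve exactly this build from Nexus instead of rebuilding it.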
Deployment to Job Servers
The Talend CI lifecycle as officially documented requires that a TAC administrator retrieve the jobs from Nexus via the Job Conductor and manually deploy them to be run. This is the only area in which the automated fluidity of the CI/CD process requires human interaction. There is, however, a way to use a native Talend function to automate the deployment as well.
Step 7 Meta Servlet Deployment
- The Talend MetaServlet can be used to deploy and schedule jobs from Nexus to a Talend job server. The MetaServlet is a REST interface used to accomplish many of the same tasks that can be done in the Talend Administration Console (TAC).
- Rather than writing a script, an actual Talend job can be used to parameterize and build the JSON needed to execute the MetaServlet REST calls.
- Example of REST Parameters sent to the Talend metaservlet
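The MetaServlet accepts a single base64-encoded JSON document appended to the servlet URL. As a rough sketch only (the TAC host, credentials, task ID, and payload fields below are invented; available action names and fields vary by TAC version), a call can be assembled like this:

```python
import base64
import json

def metaservlet_url(tac_base: str, payload: dict) -> str:
    """The TAC MetaServlet takes its whole request as one base64-encoded
    JSON document appended after the servlet path."""
    encoded = base64.b64encode(json.dumps(payload).encode("utf-8")).decode("ascii")
    return f"{tac_base}/metaServlet?{encoded}"

# Hypothetical payload: run an already-deployed task by ID.
payload = {
    "actionName": "runTask",
    "authUser": "admin@example.com",
    "authPass": "secret",
    "taskId": 42,
    "mode": "synchronous",
}
url = metaservlet_url("http://tac.example.com:8080/org.talend.administrator", payload)
```

An HTTP GET of `url` would then execute the action; as noted above, the same JSON can just as easily be parameterized and built by a Talend job instead of a script.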
Step 7 (Alternative) Manual Deployment in TAC
- Use the Talend Administration Console’s Job conductor to deploy and schedule the jobs from Nexus to the Talend job servers
Why tackle Continuous Integration? It can seem complex, and the benefits might not be immediately apparent. After wrestling with the evolving Talend documentation and truly learning what each step in the process does, it becomes a very repeatable process with which to help others. The risk is simply the time spent setting up the Jenkins server and corresponding technologies. Once CI is set up and test-driven development is in place, it will probably be difficult to live without. It eliminates a lot of human touch and puts a great deal of rigor into agile DevOps processes.
IoT is the hot new acronym in the tech space. For the uninitiated: IoT stands for “Internet of Things” and it refers to the interconnection of smart devices via embedded technology. Two things are made possible: remote control of the device and the exchange/collection of data to/from the device. A simple execution of this concept is the ability to monitor and control your home thermostat from anywhere. When you take this concept into the healthcare space an evolution occurs (I promised myself I wouldn’t say disruption.)
Electronic Health Records (EHRs) changed the game in the early 2000s. They centralized, personalized, and automated data regarding a patient's health. Standards around EHRs made portals and system integration possible. IoT is quickly becoming the next step in the evolution of healthcare. Instead of patient data being recorded and hand-jammed by overworked healthcare professionals, it can be automatically collected via smart devices FROM ANYWHERE. Think of a more sophisticated FitBit monitoring the vitals of a guy who just had triple bypass surgery, and maybe he's on a new medication. Instead of him commuting back and forth to a hospital to collect SNAPSHOT information, he can go about his life while the smart device constantly monitors and reports his vitals to the healthcare system. His doctor or provider can then be notified if his vitals fall outside acceptable ranges.
So how does TESCHGlobal fit in with IoT?
As a team of software engineers and developers, we dive head-first into new technologies, and for the better part of two years we have been architecting and building IoT solutions! Through our projects, research, and partnerships we've been paddling hard to catch this wave. Here is an overview of our approach to IoT via a reference architecture:
What has TESCHGlobal done with IoT?
With this emerging technology, innovators have been cooking up devices with IoT capabilities. We have been building a solution with one such innovator. QuiO invented a smart injector and needed help building their cloud platform: from the user interface to data management to advanced machine learning. We built a solution that leverages MongoDB, Amazon Cloud, Spark, MQTT, and Karaf. Traditional or Big Data, Cloud or On-Premises: our team is ready to execute.
If you are looking to dive in and catch the wave of IoT or just want to know more: reach out to us!
This article will address leveraging best practices throughout the data management life cycle. We will take into account the sources, processing, storage capabilities, constraints, and end-user needs.
The opportunities and complexities presented by today's information life cycle present us with options and decisions that deserve consideration. We will break down these decisions and opportunities into a common batch life cycle pattern within the context of a reference architecture.
Modernized Data Management Milestones
- What new data sources are available and what is their data management life cycle: Web activity logs, Social Media, System Logs, Sensor data
- Is there a way to process legacy sources that align more accurately with SLAs, technical drivers and business objectives? Cost, Performance, Maintenance windows
Staged Data (Raw Data)
- How to ingest raw data in an efficient and accurate manner
- Accomplish ingestion with the least amount of human touch
- Meta-data governance replaces design drag-and-drop
- Reuse of ingestion jobs
- Operationalization of meta-data
- Where to store
Enterprise Data Warehouse (EDW)
- Which design to use: Kimball, Inmon, or Lindstedt
- Where to process
- Where to store
- What types of business rules are applied within the warehouse
- Hard Business Rules - Type changes, normalizing, denormalizing, tabularizing hierarchical data, or structuring
- Soft Business Rules - Changes to meaning, content, or granularity
Information - Data Marts and OLAP with “Soft Business Rules” (transformations, aggregations, granularity changes and data quality) applied
- Again: When and where are business rules applied?
- Format: OLAP or Star Schema
- Operationalized feeds back to sources
- In Place Analytics
Step 1: Ingest Source into Stage
Data sources can come from traditional means: files, applications, databases, media, and transactions. New unstructured data sources can add recognized value to the data management life cycle. These new log, web, media, and textual sources can be landed on low-cost storage such as HDFS, Google Storage, or Amazon's S3. Ultimately this raw unstructured data can be indexed and linked to EDW (Enterprise Data Warehouse) information. Beyond the warehouse, this data can also be used for operations or to build Data Marts and OLAP tables for decisioning and analytics.
Modern Data Sources
- Relational databases that house application, transaction and batch data
- Document Management
- Emerging Sources
- Web and Social Media
- Server Logs
- Sensor and Machine Data - IoT (Internet of Things)
- Audio Transcriptions
- Facial recognition
TESCHGlobal Value Proposition:
- New sources
- Access methods
- Ingestion strategies
- Leverage low cost, highly performant storage options
- Hadoop, Google, Amazon, and Microsoft low-cost, high-density storage options
- Cloud warehousing offerings such as Redshift, Snowflake and Greenplum
Data ingestion is the landing of raw data into a database or a high-density file system such as HDFS, S3, or Google Storage. This data can be subsequently loaded into your data warehouse. Staged data mimics source data in schema (data typing may be relaxed, e.g., everything is a string) and is referred to as raw data. The goal is to make this process as low-touch as possible, since not a lot of value is added at this step. Useful tools include Spark Import, bulk utilities, Talend Dynamic Schemas (non Big Data), and Talend Java APIs that use metadata for schemas and maps. The choice between static and dynamic schemas and jobs is driven by the volume of changes to existing schemas and the number of new sources added. Industries and sectors such as government, healthcare, manufacturing, and retail that have a large volume of ingestion sources will typically opt for metadata-driven dynamic ingestion jobs over drag-and-drop design-time static jobs, which handle finite schemas.
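The metadata-driven pattern described above can be sketched in a few lines of Python: per-source metadata drives generation of staging DDL, with every column relaxed to a string type. The source metadata here is hard-coded and hypothetical; in a real pipeline it would come from a metadata repository or catalog, so adding a source means adding a metadata record, not designing a new job.

```python
# Hypothetical source metadata; in practice this comes from a metadata
# repository rather than being hard-coded.
SOURCES = [
    {"table": "customers", "columns": ["id", "name", "email"]},
    {"table": "orders", "columns": ["id", "customer_id", "total"]},
]

def stage_ddl(table, columns):
    """Generate staging-table DDL from source metadata, relaxing every
    column to a string type as is common for raw/staged data."""
    cols = ",\n  ".join(f"{c} VARCHAR" for c in columns)
    return f"CREATE TABLE stg_{table} (\n  {cols}\n);"

for src in SOURCES:
    print(stage_ddl(src["table"], src["columns"]))
```

The same metadata records can drive the load maps themselves, which is the essence of a dynamic ingestion job versus a design-time static one.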
TESCHGlobal Value Proposition:
- Identify what tools are available in your enterprise for ingestion
- Leverage Talend out of the box capabilities when possible
- Non Big Data Dynamic Schemas
- Spark Import and other Bulk Load Components
- Extend Talend capabilities by making them metadata driven by the use of a small amount of Java code and metadata sources
- Use the proper tools and processing environment for ingestion based on requirements, constraints, best practices and enterprise architecture compliance.
Staging Raw Data
The main purpose of landing data in the staging area, as opposed to a transactional or operational store, is to reduce the impact on those systems. The staging area can be a relational, MPP, NoSQL, or Hadoop-based store. The data in the staging table should generally be in the format of the source system. The staging area is used to build the data warehouse or data vault.
Some of the possible data generated for the staging table include:
- Load Date
- Record Source
- Sequence Number - For ordering purposes only, not a key
- Hash Keys - Generated from business keys
- Checksums - Used to determine whether a change has taken place in the descriptive attributes of satellite rows
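The generated columns above can be sketched as follows, assuming MD5 hashing over delimited, normalized business keys, a common (though not mandated) Data Vault convention. The field names, record source, and normalization rules are illustrative choices, not a fixed standard.

```python
import hashlib
from datetime import datetime, timezone

def hash_key(*business_keys: str) -> str:
    """Deterministic surrogate key from normalized business key(s)."""
    joined = "||".join(k.strip().upper() for k in business_keys)
    return hashlib.md5(joined.encode("utf-8")).hexdigest()

def checksum(attributes: dict) -> str:
    """Checksum over descriptive attributes; comparing it with the stored
    value detects changed satellite rows without comparing every column."""
    joined = "||".join(str(attributes[k]) for k in sorted(attributes))
    return hashlib.md5(joined.encode("utf-8")).hexdigest()

def stage_row(record: dict, business_keys: list, source: str) -> dict:
    """Add the generated staging columns to a source record."""
    row = dict(record)
    row["load_date"] = datetime.now(timezone.utc).isoformat()
    row["record_source"] = source
    row["hash_key"] = hash_key(*(str(record[k]) for k in business_keys))
    row["checksum"] = checksum(
        {k: v for k, v in record.items() if k not in business_keys})
    return row
```

Normalizing (trim, uppercase) before hashing keeps the same business key from producing different hash keys across sources with different formatting.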
TESCHGlobal Value Proposition:
- Design staging jobs that can be converted between technologies with minimal work
- Leverage job designs that utilize the correct storage strategy that will meet enterprise architecture guidelines for maintainability and scalability.
Loading the EDW from Staged Data
Processing Stage Data
After data is staged, it can be processed and loaded into the data warehouse. The data in the staging area can leverage direct processing power in the form of ELT using SQL and APIs, or ETL data management runtimes. Where to process staging data is typically driven by the data management requirements: you have the choice of database ELT, Hadoop/Spark ELT, or data management ETL runtime servers, and which to leverage is typically based on cost, capacity, performance, and enterprise architecture direction. The amount of processing that takes place in loading the data warehouse depends on the design pattern used. If you are using a relational warehouse design such as Inmon's, or a Star Schema as Ralph Kimball has advocated, you will be transforming the source system format and data prior to loading the warehouse. If you are using a Data Vault design strategy, loading the EDW consists of loading hubs, links, and satellites. The hubs and satellites of a data vault resemble the source system formats; far fewer transformations take place, and only hard business rules are applied.
TESCHGlobal Value Proposition:
- Leverage ELT components in both relational and Big Data technologies
- Help with the proper job composition for vertical and horizontal scalability; specifically for jobs targeted at Hadoop/Spark and data management runtimes
- Help with complex transformations using metadata APIs and industry specifications; such as HL7, X12 EDI and data mappers
- Determine best practice processing patterns that meet your acceptance criteria
- We recommend and enable Data Vault architecture for improved: extensibility, scalability and maintainability.
Provide Actionable Information
Traditional data warehouse
If you are using a traditional data warehouse strategy, the majority of your business rules, hard and soft, have already been applied. Changes to information requirements or sources could impact the gathering of information, and vice versa. This adds a level of complexity and coupling that can increase maintenance costs and reduce agility.
If you are using a data vault strategy, you have raw, high-fidelity data. Most of your heavy lifting (soft business rules) gets applied in the actionable-information stage. I like to use the phrase “it is what it is; now what can we do with it?” Over time your abilities, requirements, and understanding of the data will increase. Why not start with the raw materials? It's kind of like whole wheat or macrobiotic diets: keeping it real and less processed, with organic quality measures that don't impact the truth or wholeness of the information.
TESCHGlobal Value Proposition: Many of the steps are the same as transforming and loading your data warehouse; your strategy determines how much work falls in each phase of the life cycle.
- Again: How to leverage ELT components in both relational and Big Data technologies
- Again: Help with the proper job composition for vertical and horizontal scalability for jobs targeted at Hadoop/Spark and data management runtimes
- Again: Help with complex transformations using metadata APIs, industry specifications such as HL7 and X12 EDI, and data mappers
- Again: Determine best practice processing patterns that meet your acceptance criteria
Whatever your data mission statement, “One Click”, “360 View”, “Right Data at the Right Place and Time”, etc., there is no clear right or wrong answer. Let TESCHGlobal help you get the most value and insight from your data, resources, and capabilities. We can help with the full life cycle using industry-top data scientists.
Modern Data Management Series
HealthLX, Inc. Submits Entry To The HL7 C-CDA Rendering Tool Challenge
Nimble healthcare data integration solutions firm shows off skills by creating a C-CDA viewer prototype.
Grafton, WI - June 2016: In an effort to exhibit their healthcare data management proficiencies, the talented team at HealthLX accepted and submitted their entry in the Health Level Seven (HL7) and Office of the National Coordinator for Health Information Technology (ONC) rendering tool challenge on May 31, 2016.
The challenge was created to inspire the development of HL7 tools, specifically the development of a Consolidated Clinical Document Architecture (C-CDA) rendering tool that makes the data exchange between healthcare providers and patients humanly legible.
According to HL7, the industry is calling for enhanced human readability and relevance, as clinicians are frustrated with the usability of current C-CDA documents. Today, an overabundance of data is displayed by EHR systems, and providers struggle to page and sort through all of it to find the essential, relevant clinical information they need to make decisions. Development of an easy-to-use viewer with a modern user interface gives clinicians the ability to save time and ensures they have the relevant patient information to make correct decisions.
The team at HealthLX aspired to make the tool easy to use and flexible, giving a broad range of clinical users the ability to view relevant patient information quickly and clearly based on their specialty, all in a slick and compatible user interface.
Will Tesch, founder of HealthLX, praises his team's ability to quickly step up to the challenge and deliver a flexible, well-designed solution that elegantly delivers the usability and user experience demanded by this challenge.
Will added, “C-CDA is a foundational document for a patient's care history that, once digitally available and shareable with doctors or care managers, can play a significant role in ensuring decisions are made quickly and accurately by these providers. This benefits everyone, consumers, providers, and payers, and contributes to the ‘Triple Aim’ of healthcare. We look forward to continually demonstrating our ability to rapidly design and develop healthcare data interoperability to the healthcare community in every way we can.”
To take a look at HealthLX's C-CDA rendering tool submission, click HERE.
About Health Level Seven
Founded in 1987, Health Level Seven International (HL7) is a not-for-profit, ANSI-accredited standards developing organization dedicated to providing a comprehensive framework and related standards for the exchange, integration, sharing, and retrieval of electronic health information that supports clinical practice and the management, delivery and evaluation of health services. Learn more at www.hl7.org
About HealthLX, Inc.
HealthLX offers a full integration solution set designed for healthcare software solution firms to bridge the gap between interoperability and innovation, enabling them to quickly implement their solutions in diverse environments. HealthLX's solution set includes open source integration software that creates and manages connections and data flows, integration consulting and connectivity services, and solution management. For healthcare software solution firms that want to make integration a differentiator, visit www.healthlx.com
TESCHGLOBAL FORMS STRATEGIC PARTNERSHIP WITH FOUNDRY HEALTH
Exceeding the Demands of Healthcare Innovation
Milwaukee, WI, May 29, 2015 (PRWeb) -- TESCHGlobal LLC, a global technology professional services firm, has recently established a strategic partnership agreement with Foundry Health LLC, a digital health company that seeks to empower health professionals by building software tools that put the patient back at the center of each encounter. This strategic partnership will assist both TESCHGlobal and Foundry Health in the growth and development of healthcare innovation in the U.S., while also supporting the global growth initiatives of both firms.
"As Foundry Health continues to expand its ClinSpark™ software offering to an ever expanding area of the world, we are delighted to be partnering with TESCHGlobal. The TESCHGlobal team will allow us to scale up to meet the growing demands for the ClinSpark™ system," stated Brock Heinz, CEO at Foundry Health. “We are excited about our new partnership with Foundry Health,” says Will Tesch, President and CEO of TESCHGlobal. "Foundry Health has an innovative team and is led by a true engineering genius. Our support of Foundry Health and their implementation and support needs to scale is aligned closely with our past success with other Healthcare organizations like Foundry Health. This partnership will enable ClinSpark™ to meet its growing presence globally while also focusing on their product development.”
About Foundry Health
Foundry Health is a digital health company focused on building software tools that deliver creative strategic solutions. As the creators of ClinSpark™, the world’s first CDISC ODM certified phase 1 eSource system, Foundry Health is driven to empower health professionals and build successful custom solutions that exceed customer needs. For more information, visit www.foundryhealth.com.