by Robert Griswold
At TESCHGlobal, we have been Gold Partners to Talend for over four years and have taken part in Talend implementations of all types and sizes. We have developed a strategic and repeatable approach to helping clients convert to Talend from Informatica, DataStage, SSIS, and Ab Initio.
A few patterns emerge during a data management migration, and we exploit these patterns to help clients optimize and refactor around their architectural priorities, such as high availability, resource management, maintainability, and scalability.
We can help with strategies to optimize resources both on-premise and in the cloud. We will implement monitoring and management capabilities so that capacity can be provisioned to match load.
In some cases migrations can be fairly straightforward with a one-to-one port approach, but based on discovered pain points and opportunities we push our clients to continuously improve their environments.
Talend at Any Scale
The following patterns can be used when converting from proprietary legacy data integration tools to Talend and open source. The goal in Talend is to create modularized and horizontally scalable jobs that can be run in multiple ways. A common misconception is that Talend cannot handle large loads the way the legacy monolithic providers do. The patterns below can match, and even beat, legacy performance and manageability.
- Use YARN to manage and run jobs in Hadoop or Spark at Big Data scale
- Use the Talend scheduler and virtual servers to create a bank of servers with no data tax
- Use Talend data service REST APIs fronted by a load balancer
- Use a competing consumer pattern with queues. This allows the same job to be deployed across a bank of resources. Each deployed job grabs the next available context or partition of work from the queue and processes it. This pattern allows both load balancing and throttling of work.
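The competing consumer pattern in the last bullet can be sketched in a few lines. This is a minimal, in-process illustration in Python, not Talend code: threads stand in for deployed job instances, an in-memory queue stands in for a message broker, and the names `process_partition`, `worker`, and `run_competing_consumers` are hypothetical.

```python
import queue
import threading


def process_partition(partition):
    """Hypothetical unit of work: here, just sum the records in a partition."""
    return sum(partition)


def worker(name, work_queue, results, lock):
    """One deployed copy of the job: pull work until the queue is empty."""
    while True:
        try:
            partition_id, partition = work_queue.get_nowait()
        except queue.Empty:
            return  # no work left, so this consumer stops
        outcome = process_partition(partition)
        with lock:
            # Record which consumer handled which partition.
            results[partition_id] = (name, outcome)
        work_queue.task_done()


def run_competing_consumers(partitions, worker_count=3):
    """Queue every partition, then let worker_count identical jobs compete for them."""
    work_queue = queue.Queue()
    for partition_id, partition in enumerate(partitions):
        work_queue.put((partition_id, partition))

    results = {}
    lock = threading.Lock()
    threads = [
        threading.Thread(target=worker, args=(f"job-{i}", work_queue, results, lock))
        for i in range(worker_count)
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return results


if __name__ == "__main__":
    done = run_competing_consumers([[1, 2], [3, 4], [5], [6, 7, 8]], worker_count=2)
    for partition_id in sorted(done):
        print(partition_id, done[partition_id])
```

In this sketch, load balancing falls out of the design because a faster consumer simply takes more partitions, and `worker_count` acts as the throttle: it caps how many partitions are processed at once.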
Metadata Driven vs Visual design
We can help you to leverage the correct mix of visual design vs the operationalization of metadata.
- Define the correct mix of data driven vs drag and drop design
- Maximize code reuse
- Implement best practices for creating maintainable jobs
- Use metadata driven processes for patterned activities such as data ingestion
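As a rough illustration of a metadata driven process for ingestion, the sketch below uses one generic routine whose behavior is controlled entirely by a metadata table, so adding a source means adding a row rather than designing a new job. It is plain Python rather than Talend, and the metadata layout, source names, and the `ingest` function are all assumptions made for the example.

```python
import csv
import io

# Hypothetical ingestion metadata: one entry per source, no per-source code.
INGESTION_METADATA = [
    {"source": "customers", "delimiter": ",", "columns": {"id": int, "name": str}},
    {"source": "orders", "delimiter": "|", "columns": {"order_id": int, "amount": float}},
]

# Hypothetical raw inputs, keyed by source name, standing in for files or feeds.
SAMPLE_INPUT = {
    "customers": "id,name\n1,Ada\n2,Lin\n",
    "orders": "order_id|amount\n10|9.5\n",
}


def ingest(text, spec):
    """Parse delimited text and cast each column as described by the metadata."""
    reader = csv.DictReader(io.StringIO(text), delimiter=spec["delimiter"])
    return [
        {column: cast(row[column]) for column, cast in spec["columns"].items()}
        for row in reader
    ]


if __name__ == "__main__":
    for spec in INGESTION_METADATA:
        rows = ingest(SAMPLE_INPUT[spec["source"]], spec)
        print(spec["source"], rows)
```

The trade-off is the one the list above describes: patterned work such as ingestion suits this data driven style, while one-off transformation logic is often clearer when designed visually.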
We offer full-lifecycle professional services delivery. Our goal is to take clients from assessment through to management and operations.
Migration Life Cycle
- Billing options (Retainer of work hours vs. set of estimated)
- Delivery vs. Knowledge Transfer
Discover & Align
- Discover Business and Technical Drivers and Priorities
- Identify stakeholders
- Define Scope
- Discover high level objectives
- Discover and align with the Enterprise Architecture's current and future state
Assess & Refine
- Identify the backlog and align with business & technical priorities
- Estimate resources and timeline for backlog
- Gather requirements and reverse engineer existing legacy jobs
- Develop detailed integration roadmap
- Define integrations approach with guidance from architecture
- Identify patterns and best approaches for the port from legacy systems to Talend
- Identify the preferred run time based on best practices, cost, and enterprise standards
- Job Composition and Modularization
- Runtime: Talend Jobs, ELT, MPP, MapReduce, or Spark
- Solution Architecture
- Design Patterns
- Best Practices
- Identify integration requirements
- Business & routing rules
- Design integrations and unit test
- Architectural Oversight
- Project oversight
Operate & Manage
- Continuous Integration
- Communicate technical and operational requirements
- Monitor performance and error logs
- Maintain integrations
Contain & Retire Legacy jobs
- Set the deprecation date and communicate plan to service consumers
- Develop migration plan for migrating consuming applications off of legacy systems
- Place the integrations in containment
- Retire the integration
- The integration lifecycle enables the growth and maturation of integrations from initial release through retirement
- Best practice data management principles, frameworks and methodologies are used throughout the lifecycle
- The lifecycle is applicable to multiple integration patterns and supports the addition of technologies & practices
- Have a targeted and governed approach to where data is staged, processed and stored based on prioritized architectural attributes.
If you are looking to explore the use of Talend please reach out to [email protected]