Monday, November 26, 2018

Collect Statistics on Oracle Repository Tables


Collect statistics on your PowerCenter repository database tables; this helps improve the performance of fetches and writes of repository data. Follow the steps below.

Execute the queries below to generate the ANALYZE statements:
  • select 'analyze table ', table_name, ' compute statistics;' from user_tables where table_name like 'OPB_%'
  • select 'analyze index ', INDEX_NAME, ' compute statistics;' from user_indexes where INDEX_NAME like 'OPB_%'


Copy the results to a text file, remove any column headers from the file, and execute it against the repository database.

Sample outputs:

'ANALYZETABLE' TABLE_NAME 'COMPUTESTATISTICS;'
-------------- ------------------------------ --------------------
analyze table OPB_ANALYZE_DEP compute statistics;
analyze table OPB_ATTR compute statistics;
analyze table OPB_BATCH_OBJECT compute statistics;
...
'ANALYZEINDEX' INDEX_NAME 'COMPUTESTATISTICS;'
-------------- ------------------------------ --------------------
analyze index OPB_DBD_IDX compute statistics;
analyze index OPB_DIM_LEVEL compute statistics;
analyze index OPB_EXPR_IDX compute statistics;


Note: Make sure to take a repository backup before proceeding with these steps.
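
On recent Oracle versions the ANALYZE command is deprecated in favor of DBMS_STATS. As an alternative to generating and running the ANALYZE statements, here is a minimal PL/SQL sketch (assuming the repository tables live in the schema you are connected to) that gathers statistics on the OPB_ tables and their indexes directly:

  BEGIN
     FOR t IN (SELECT table_name
                 FROM user_tables
                WHERE table_name LIKE 'OPB_%')
     LOOP
        -- CASCADE => TRUE also gathers statistics on the table's indexes
        DBMS_STATS.GATHER_TABLE_STATS (ownname => USER,
                                       tabname => t.table_name,
                                       cascade => TRUE);
     END LOOP;
  END;
  /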

How to Identify Throughput Issues


How do you determine whether a session's performance problem is due to read or write throughput? Below are a few typical issues that result in poor performance (low throughput values).

Target Commit Point is too small

The default target commit point of 10,000 is often left in place even when processing large data volumes. This results in a commit every 10,000 rows even if the mapping is processing millions or hundreds of millions of rows. Setting a commit point of 100,000 or even 1 million rows is not unreasonable as data volumes grow. There is a point where rollback risk and commit frequency should be balanced, but if processing 40 million rows, a commit point under 500,000 is probably too small.
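
To see which sessions are still running with the default commit point, a repository query sketch can follow the same OPB_TASK_ATTR / REP_ALL_TASKS pattern used for the DTM Buffer Size later in this post. The lookup of the attribute ID by name in OPB_ATTR is an assumption; verify the attribute name and the session object type ID in your repository version before relying on the output.

  SELECT b.subject_area,
         b.task_name  AS sess_name,
         a.attr_value AS commit_interval
    FROM opb_task_attr a,
         rep_all_tasks b
   WHERE a.task_id = b.task_id
     AND b.task_type_name = 'Session'
     AND a.attr_id = (SELECT attr_id
                        FROM opb_attr
                       WHERE attr_name = 'Commit Interval'   -- attribute name is an assumption
                         AND object_type_id = 68)            -- session object type (assumption)
  ORDER BY 1, 2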

Auto Memory settings are too small

Adding memory to the session is the single change with the biggest impact if the throughput values in the Workflow Monitor (or in the query results) are relatively small in relation to the number of rows being processed. Testing is needed to find the sweet spot: keep adding memory until it no longer helps, while keeping in mind the resource requirements of other jobs running concurrently. Usually adjusting the Auto Memory to 1 GB or 2 GB makes a difference without being excessive. Be sure to balance the Maximum value for the Auto Memory with the Maximum Percentage of available memory, since the smaller of the two is used.

Multi-pass Sorts

Review the session log for messages about multi-pass sorts; they indicate that more memory is required. Sorter transformations default to Auto for the sorter cache size (and in older versions defaulted to 200). Check the message in the log and either raise the Auto Memory or explicitly set the sorter cache to a larger value so that the sort can be performed in one or two passes. Note that the Auto Memory setting is shared across all transformations that cache data, so you may need to increase it to an appropriate value.


Not Sorting before Aggregators and Joiners

Review the session log for messages about nested-loop algorithms for Aggregators and Joiners. These indicate that the input was not sorted beforehand, or that the transformation was not marked as having sorted input. Sorting does take some cycles, but the sort-merge algorithm is much faster and keeps the other cache files much smaller, so it is worth the effort to improve performance.

Complex SQL Overrides

Using SQL overrides pushes the logic to the database when initially querying the data, in lookups, etc. This is not a best practice for a number of reasons:
a. It hides the metadata of the mapping.
b. It hides the transformation logic being applied to the data.
c. It does not allow the Informatica server to take on some of the load of transforming the data.

It is good practice to take the SQL that will be run against the database from the session log, run an explain plan on it, and tune it appropriately in the database by adding the right indexes, etc., to assist query performance.
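
For example, to review the plan for an overridden query captured from the session log, a simple Oracle sketch (the table and predicate below are placeholders for the actual override):

  EXPLAIN PLAN FOR
     SELECT *                                   -- replace with the SQL override from the session log
       FROM my_source_table
      WHERE load_date > SYSDATE - 1;

  SELECT * FROM TABLE (DBMS_XPLAN.DISPLAY);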

Using a Lookup instead of a Join

Both Lookups and Joiners can be used to reference data in another table based on data in the main source and specified logic. Lookup results are cached either in memory, if they are small enough, or to a file on disk. If the results are large, it does not make sense to use a lookup; it is likely more efficient to use a join instead. This can be a join performed in a Source Qualifier if the table comes from the same database instance as the source table, or a Joiner transformation if the sources are in different databases.

We observed mappings that were caching lookup results for 40 million or more rows. This is a clear indication that a Joiner should have been used instead. Generally, for any table with more than a million rows it is better to use a join instead of a lookup. Lookups flagged in the session log with row counts of this magnitude should become joins because of the number of rows returned.

Unsorted Input in a Lookup

The session log might contain messages with error code TT_11195 regarding unsorted input for a lookup transformation.
This occurs when the SQL Override eliminated the sort, when the ports in the lookup transformation do not match the datatype and exact precision of the definition in the database, or when there is binary data in the lookup results and the sort order in the mapping is ASCII rather than binary.

Some of the items with poor throughput were flagged, and we reviewed those mappings, sessions, and session logs to look at Auto Memory settings, DTM Buffer Size settings, session log errors and bottlenecks, mapping logic around SQL Overrides, lookup size, sorting before Aggregators and Joiners, etc. The candidates for this review were selected because the execution time was longer than 1 hour and the throughput was less than 10,000 rows per second while the number of rows processed was greater than 10,000. This was a starting point to find some of the worst jobs and review them. We did not get input from the development or production support teams to guide our efforts.

List "WRITE" Throughput of the session


  SELECT l.subject_area,
         workflow_name,
         l.session_name,
         l.table_name,
         l.successful_rows,
         l.successful_affected_rows,
         l.throughput AS write_thruput,
         l.start_time,
         l.end_time,
         ROUND ( (l.end_time - l.START_time) * 24 * 60 * 60) Run_Time
    FROM rep_sess_tbl_log l, rep_workflows w
   WHERE     l.start_time > TO_DATE ('20181101', 'YYYYMMDD')
         AND last_error_code IN (0, 7004)         -- last one without an error
         AND (l.start_time, l.session_name) IN
                 (  SELECT MAX (start_time), -- latest run date - uncomment to get all
                                            session_name
                      FROM rep_sess_tbl_log
                  GROUP BY session_name)
         AND l.successful_rows > 0
         AND l.workflow_id = w.workflow_id
ORDER BY l.subject_area,
         w.workflow_name,
         l.session_name,
         l.start_time
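
To narrow the output of the query above to the screening criteria described earlier (run time over one hour, write throughput under 10,000 rows per second, and more than 10,000 rows processed), here is a minimal sketch against the same repository view:

  SELECT l.subject_area,
         l.session_name,
         l.table_name,
         l.successful_rows,
         l.throughput AS write_thruput,
         ROUND ( (l.end_time - l.start_time) * 24 * 60 * 60) AS run_time
    FROM rep_sess_tbl_log l
   WHERE l.start_time > TO_DATE ('20181101', 'YYYYMMDD')
     AND ROUND ( (l.end_time - l.start_time) * 24 * 60 * 60) > 3600   -- ran longer than 1 hour
     AND l.throughput < 10000                                         -- low throughput
     AND l.successful_rows > 10000                                    -- but a large row count
  ORDER BY run_time DESC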


List "READ" Throughput of the session

 SELECT t.task_name  AS workflow_name,
         ti.instance_name AS session_name,
         swl.instance_name AS table_name,
         swl.applied_rows,
         swl.affected_rows,
         swl.thruput   AS read_thruput,
         swl.start_time,
         swl.end_time,
         ROUND ( (swl.end_time - swl.START_time) * 24 * 60 * 60) Run_Time
    FROM opb_swidginst_log swl, opb_task_inst ti, opb_task t
   WHERE     swl.last_err_code IN (0, 7004)
         AND (swl.start_time, swl.instance_name) IN
                 (  SELECT MAX (start_time), instance_name
                      FROM opb_swidginst_log
                  GROUP BY instance_name)
         AND swl.session_id = ti.task_id
         AND TI.WORKFLOW_ID = T.TASK_ID
         AND (applied_rows > 0 OR affected_rows > 0)
GROUP BY t.task_name,
         ti.instance_name,
         swl.instance_name,
         swl.applied_rows,
         swl.affected_rows,
         swl.thruput,
         swl.start_time,
         swl.end_time,
         ROUND ( (swl.end_time - swl.START_time) * 24 * 60 * 60)
ORDER BY t.task_name,
         ti.instance_name,
         swl.instance_name,
         swl.start_time


Average Run Time and Throughput


  SELECT l.subject_area,
         w.workflow_name,
         l.session_name,
         l.table_name,
         AVG (l.throughput) AS write_thruput,
         AVG (ROUND ( (l.end_time - l.START_time) * 24 * 60 * 60)) Run_Time,
         AVG (l.successful_rows)
             AS avg_success_rows,
         AVG (l.successful_affected_rows)
             AS avg_affected_rows
    FROM rep_sess_tbl_log l, rep_workflows w
   WHERE     l.start_time > TO_DATE ('20180210', 'YYYYMMDD')
         --and subject_area='FOLDERName'
         AND l.workflow_id = w.workflow_id
GROUP BY l.subject_area,
         w.workflow_name,
         l.session_name,
         l.table_name
ORDER BY l.subject_area,
         w.workflow_name,
         l.session_name,
         l.table_name


DTM Buffer Size


Typically, the DTM Buffer Size is set to Auto, and the Auto Memory settings determine the value used. Occasionally it is useful to set it higher, but never lower than the Auto Memory default of 512 MB. In older versions of PowerCenter, or for sessions that are auto-generated, the value is typically set to 24000000 (24 MB), which is obviously much smaller than the 512 MB default. Values of less than 512 MB should be set back to Auto so that the Auto Memory settings apply.

The DTM buffer has three sections: one for read, one for write, and one for transform. Rows being processed are loaded into the DTM buffer. The three sections are not each the same size; the split is adjusted at execution time. See the explanation later in this document of how increasing the DTM Buffer Size allows more rows to be processed at once and improves performance.
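As a rough illustration (assuming the default buffer block size of 64 KB; check the session's Default Buffer Block Size setting): a 512 MB DTM buffer provides roughly 8,000 buffer blocks to share across the read, transform, and write sections, while the legacy 24,000,000-byte value provides fewer than 400, so far fewer rows can be in flight at any one time.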

The sessions where the settings were manually set to something other than Auto are too numerous to list here and are instead included in an appendix. Many are set too low, or set manually to 512 MB.

Below are repository queries that list the DTM Buffer Size values for all sessions in a repository.

SELECT B.SUBJECT_AREA,
       B.TASK_NAME  AS SESS_NAME,
       A.ATTR_VALUE AS DTM_BUFFER_SIZE
  FROM OPB_TASK_ATTR A,
       REP_ALL_TASKS B
 WHERE A.ATTR_ID IN (101)
   AND A.TASK_ID = B.TASK_ID
   AND TASK_TYPE_NAME IN ('Session')
ORDER BY 1, 2 ASC
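
A variation of the query above that lists only the sessions with an explicit override; this is a sketch that assumes the literal value 'Auto' is stored in ATTR_VALUE when the session uses auto memory, so verify against your repository.

  SELECT B.SUBJECT_AREA,
         B.TASK_NAME  AS SESS_NAME,
         A.ATTR_VALUE AS DTM_BUFFER_SIZE
    FROM OPB_TASK_ATTR A,
         REP_ALL_TASKS B
   WHERE A.ATTR_ID = 101
     AND A.TASK_ID = B.TASK_ID
     AND B.TASK_TYPE_NAME = 'Session'
     AND A.ATTR_VALUE <> 'Auto'   -- assumption: 'Auto' is stored literally
  ORDER BY 1, 2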

  


SELECT *
  FROM (SELECT Z.SUBJ_NAME,
               A.TASK_NAME     WORKFLOW_OR_WORKLET_NAME,
               B.INSTANCE_NAME SESSION_NAME,
               C.ATTR_VALUE    VALUE
          FROM OPB_TASK       A,
               OPB_TASK_INST  B,
               OPB_TASK_ATTR  C,
               OPB_SUBJECT    Z
         WHERE     Z.SUBJ_ID = A.SUBJECT_ID
               AND A.TASK_TYPE IN (70, 71)
               AND A.TASK_ID = B.WORKFLOW_ID
               AND B.TASK_TYPE = 68
               AND B.INSTANCE_ID = C.INSTANCE_ID
               AND C.TASK_TYPE = 68
               AND B.TASK_ID = C.TASK_ID
               AND ATTR_ID = 101
        UNION
        SELECT Z.SUBJ_NAME,
               A.TASK_NAME     WORKFLOW_OR_WORKLET_NAME,
               B.INSTANCE_NAME SESSION_NAME,
               C.ATTR_VALUE    VALUE
          FROM OPB_TASK       A,
               OPB_TASK_INST  B,
               OPB_TASK_ATTR  C,
               OPB_SUBJECT    Z
         WHERE     Z.SUBJ_ID = A.SUBJECT_ID
               AND A.TASK_TYPE IN (70, 71)
               AND A.TASK_ID = B.WORKFLOW_ID
               AND B.TASK_TYPE = 68
               AND C.INSTANCE_ID = 0
               AND C.TASK_TYPE = 68
               AND B.TASK_ID = C.TASK_ID
               AND ATTR_ID = 101
               AND B.INSTANCE_ID NOT IN
                       (SELECT C.INSTANCE_ID
                          FROM OPB_TASK       A,
                               OPB_TASK_INST  B,
                               OPB_TASK_ATTR  C,
                               OPB_SUBJECT    Z
                         WHERE     Z.SUBJ_ID = A.SUBJECT_ID
                               AND A.TASK_TYPE IN (70, 71)
                               AND A.TASK_ID = B.WORKFLOW_ID
                               AND B.TASK_TYPE = 68
                               AND B.INSTANCE_ID = C.INSTANCE_ID
                               AND C.TASK_TYPE = 68
                               AND B.TASK_ID = C.TASK_ID
                               AND ATTR_ID = 101)) DUAL




Auto Memory Settings


Auto Memory Settings at the Session Level

The session has settings for Maximum Memory Allowed for Auto Memory Attributes and Maximum Percentage of Memory Allowed for Auto Memory Attributes. Whichever is smaller is the amount used for items like the DTM Buffer Size, lookup caches, and other settings. The defaults are 512 MB and 5% of the available server memory, whichever is smaller.
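For example, on a server with 64 GB of RAM, 5% is about 3.2 GB, so the 512 MB maximum is the limit that applies; on a server with only 8 GB, 5% is about 410 MB, so the percentage limit applies instead.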

Tuning these values can make a huge impact on performance of the session, with a very small change that needs little testing and does not create mapping logic changes that require more extensive testing.

Testing Auto Memory settings requires knowing the amount of available memory and the number of concurrent jobs that might be running alongside the session in question, plus some adjustments and sample runs to find out whether adding memory helps. The usual method is to increase the setting until additional memory has no further impact, and then scale back. Set the memory settings within reason: the job will not always run alone, and there must be enough memory left for operating system processes, any other applications, and the other Informatica jobs.

The settings are found on the Config Object tab of the session configuration. They affect all items that use "Auto" as the memory value, including the DTM Buffer Size, DTM Buffer Block Size, and the index and data cache sizes for Lookups, Sorters, Aggregators, etc. Sometimes those specific settings should be set to explicit values independent of the Auto Memory values, but typically only when the explicit value would be larger than the Auto Memory setting, not smaller.



Informatica NFS File Systems


Hello All,

As an Informatica administrator, I recommend that admins know the details of the shared file systems on their Informatica servers. Most jobs read and write on shared file systems, and latency there degrades job performance.

Below are a few inputs on NFS from Informatica; please make sure to discuss any latency issues with your storage or Linux admins.

• NFSv2 has several limitations in terms of locking, reliability, and the inability to access 64-bit file offsets. Also, the industry has mostly migrated to NFSv3 and v4.
• NFSv3 has improved file locking support over v2 but remains a stateless protocol dependent on a separate lock manager, which can cause issues with recovery scenarios. The primary concern is the single point of failure in the NFS server. Based on discussions with NetApp and EMC, this is resolved in most NAS appliances by integrating the NFS server support into their proprietary file systems. An open concern still exists for standalone NFS server processes. Currently supported with NetApp WAFL NAS and EMC Celerra NAS with some caveats; please see the supported shared file system section.
• NFSv4's new design seems to have resolved the lingering issues with NFSv3's locking and recovery, and it is the preferred option for NFS-driven solutions.
• Critical: maintaining a write lock on the master pmserver lck file used to arbitrate split brain.

This indicates a possible latency issue when using the NFSv4 file system between the time the session that creates the dynamic mapping completes and the time the session that uses it starts and needs to read it. A solution is to use a Timer task to insert a delay between the sessions, allowing the file system to complete its writes before the second session tries to read the file.

Please refer to the Informatica link below for more updates.




Sunday, November 4, 2018

Edifecs 9.0 Installation Steps on Linux Server.




1. XEngine Installation Steps.

[user_efx@DEV2003 XE1]$./XEngineInstaller-linux-ix86_64
Checking distribution...  [ OK ]
Checking OS compatibility... [ OK ]
Checking platform compatibility... [ OK ]
Inspecting the system...
        Destination is /opt/Edifecs/9.0
        EDIFECS root is /opt/Edifecs/9.0
Uninstalling older versions of XEngine... NOT FOUND
Performing backup:
Extracting files... [ OK ]
Installing files... [ OK ]
Setting environment variables:
        Updating [.profile]...  skipped (No such file or directory)
        Updating [.bash_profile]...  [ OK ]
        Updating [.login]...  skipped (No such file or directory)
Preparing configuration files...  [ OK ]
Registering product... [ OK ]
Cleaning up... [ OK ]


Installation of XEngine completed successfully!


2. XEServer Installation Steps.

[user_efx@DEV2003 XES1]$./XEServerInstaller-linux-ix86
Checking OS compatibility... [ OK ]
Checking distribution...  [ OK ]
Reading destination:
 Defaulting to EDIFECS root.
  Destination is /opt/Edifecs/9.0
  EDIFECS root is /opt/Edifecs/9.0

Installing files:
  Performing backup... [ OK ]
  Unpacking files... [ OK ]
  Delivering files... [ OK ]
Updating environment variables:
  Updating [/home/user_efx/.profile]... N/A (No such file or directory)
  Updating [/home/user_efx/.bash_profile]...  [OK]
 [ OK ]
Updating configuration files... [ OK ]
Registering product... [ OK ]
Cleaning up... [ OK ]

Installation of XEServer completed successfully!



3. Edifecs Standards Installation Steps.

[user_efx@DEV2003 MultiSetup1]$./MultiSetupInstaller-linux-x86
Inspecting the system...
Checking free space... [ OK ]

Next Product:ECL2
Checking distribution... [OK]
Checking OS compatibility...[OK]
Inspecting the system...
        Destination is /opt/Edifecs/9.0/STANDARDS/ExternalCodeList
        Edifecs root is /opt/Edifecs/9.0
Checking for XEngine version... [OK] - XEngine version 9.0.0.6231 is found.
Checking for previous version... [OK] - Previous version is not found.
Extracting files...[OK]
Installing files...[OK]
Registering product...[OK]
Cleaning up...[OK]

Installation of ExternalCodeList completed successfully!

Next Product:HIPAA
Checking distribution... [OK]
Checking OS compatibility... [OK]
Inspecting the system...
Checking for XEngine version... [OK] - XEngine version 9.0.0.6231 or compatible is found.
        Edifecs root is /opt/Edifecs/9.0
        Destination is /opt/Edifecs/9.0/STANDARDS/HIPAA
        Settings are /opt/Edifecs/9.0/STANDARDS/Shared/X12N
Checking for previous version... [OK] - Previous version is not found.
Performing backup:
        /opt/Edifecs/9.0/Common/StandardsRegistry.xml
Extracting files...[OK]
Installing files...[OK]
Registering product...[OK]
Cleaning up...[OK]

Installation of HIPAA completed successfully!

Next Product:HL7
Checking distribution... [OK]
Checking OS compatibility... [OK]
Inspecting the system...
Checking for XEngine version... [OK] - XEngine version 9.0.0.6231 is found.
        Destination is /opt/Edifecs/9.0/STANDARDS/HL7
        Edifecs root is /opt/Edifecs/9.0
Checking for previous version... [OK] - Previous version is not found.
Extracting files...[OK]
Installing files...[OK]
Registering product...[OK]
Cleaning up...[OK]

Installation of HL7 completed successfully!

Next Product:HL7XSD
Checking distribution... [OK]
Checking OS compatibility... [OK]
Inspecting the system...
Checking for XEngine version... [OK] - XEngine version 9.0.0.6231 is found.
        Destination is /opt/Edifecs/9.0/STANDARDS/HL7XSD
        Edifecs root is /opt/Edifecs/9.0
Checking for previous version... [OK] - Previous version is not found.
Extracting files...[OK]
Installing files...[OK]
Registering product...[OK]
Cleaning up...[OK]

Installation of HL7XSD completed successfully!

Next Product:X12N
Checking distribution... [OK]
Checking OS compatibility...[OK]
Inspecting the system...
Checking for XEngine version... [OK] - XEngine version 9.0.0.6231 or compatible is found.
        Edifecs root is /opt/Edifecs/9.0
        Destination is /opt/Edifecs/9.0/STANDARDS
        Settings are /opt/Edifecs/9.0/STANDARDS/Shared/X12N
Checking for previous version... [OK] - Previous version is not found.
Performing backup:
        /opt/Edifecs/9.0/Common/StandardsRegistry.xml
        /opt/Edifecs/9.0/STANDARDS/Shared/X12N
Extracting files...[OK]
Installing files...[OK]
Registering product...[OK]
Cleaning up...[OK]

Installation of X12N completed successfully!

Next Product:NCCI
Checking distribution... [OK]
Checking OS compatibility... [OK]
Inspecting the system...
Checking for ExternalCodeLists version... [OK] - ExternalCodeLists version 9.0.4.284 is found.
Checking for XEngine version... [OK] - XEngine version 9.0.0.6231 is found.
        Destination is /opt/Edifecs/9.0/STANDARDS/ExternalCodeList
        Edifecs root is /opt/Edifecs/9.0
Checking for previous version... [OK] - Previous version is not found.
Extracting files...[OK]
Cleaning up...[OK]

Installation of NCCI completed successfully!

Next Product:SCRIPT
Checking distribution... [OK]
Checking OS compatibility... [OK]
Inspecting the system...
Checking for XEngine version... [OK] - XEngine version 9.0.0.6231 is found.
        Destination is /opt/Edifecs/9.0/STANDARDS/SCRIPT
        Edifecs root is /opt/Edifecs/9.0
Checking for previous version... [OK] - Previous version is not found.
Extracting files...[OK]
Installing files...[OK]
Registering product...[OK]
Cleaning up...[OK]

Installation of SCRIPT completed successfully!

Next Product:NCPDP
Checking distribution...[OK]
Checking OS compatibility...[OK]
Inspecting the system:
Checking for XEngine version...[OK] - XEngine version 9.0.0.6231 is found.
        Edifecs root is /opt/Edifecs/9.0
        Destination is /opt/Edifecs/9.0/STANDARDS
Checking for previous version...[OK] - Previous version is not found.
Performing backup:
        /opt/Edifecs/9.0/Common/StandardsRegistry.xml
Extracting files...[OK]
Installing files...[OK]
Registering product...[OK]
Cleaning up...[OK]

Installation of NCPDP Guidelines Database completed successfully!



4. EAM Installation Steps.


[user_efx@DEV2003 EAM1]$./EAMInstaller-linux-ix86
Checking distribution...  [ OK ]
Checking OS compatibility... [ OK ]
Inspecting the system...
        Destination is /opt/Edifecs/9.0
        EDIFECS root is /opt/Edifecs/9.0
        Installation type is "Server"
Performing backup:
Extracting files... [ OK ]
Installing files... [ OK ]
Setting environment variables...
  Updating [/home/user_efx/.profile]... N/A (No such file or directory)
  Updating [/home/user_efx/.bash_profile]...  [OK]
 [ OK ]
Preparing configuration files...  [ OK ]
Registering product... [ OK ]
Cleaning up... [ OK ]

Installation of EAM completed successfully!