DataSelf ETL Help

Revision: Feb. 17, 2014

DataSelf ETL v2013.2

Introduction

This help information is a brief guide to the most commonly used features of the DataSelf ETL engine. Some topics are not included in this guide because they should only be used by persons with advanced training from DataSelf.

This guide assumes the reader is familiar with foundational concepts of data warehousing such as source systems; metadata, tables; ODBC connections; filters; grouping of records; formulas; indexing; manual, automatic, and batch refresh; and extract, transformation, and load (ETL).

Organization of this Guide

This guide is indexed by How To Topics.

Word Search this Guide

Type Ctrl-F and enter your search terms in the Find window.

 

HOW TO Topics

How do I  . . .  ?

Do this   or  Refer to

Add or remove columns/fields in a table

Click on the table’s name in  Browse Table button.jpg Browse Table window;  right-click, click  Modify Import_Mapping.jpg Modify Import/Mapping to open the Import Table Wizard.

Add or remove DataSelf ETL users

See User Wizard icon.jpg  DataSelf ETL Users. (This is rarely used.)

Add or remove indexes in a table

Action  >  Wizard  >  Filter.jpg Index/Sort  to open the Index/Sort Wizard.

Filter Wizard title cut.jpg  Follow the wizard’s instructions.

Add or remove profiles

Action  >  Wizard  >  Filter.jpg Profile to open the Profile Wizard.

Follow the wizard’s instructions. (This wizard is rarely used.)

Add or remove relationships

Action  >  Wizard  >  Filter.jpg Relationship to open the Relationship Wizard.

Follow the wizard’s instructions.

Add, change, or delete filters

Action  >  Wizard  >  Filter.jpg Filter  to open the Filter Wizard.

Filter Wizard title cut.jpg Follow the wizard’s instructions.

Add, change, or delete formulas / derived fields

Action  >  Wizard  >  Filter.jpg Formula  to open the Formula Wizard.

Filter Wizard title cut.jpg Follow the wizard’s instructions.

Add, modify, or remove summaries

Action  >  Wizard  >  Filter.jpg Summary to open the Summary Wizard.

Follow the wizard’s instructions.

Batch/Automatic Refresh

See Batch & Automatic Refresh

Browse a table’s data

Double-click on table’s name in the Browse Table button.jpg Browse Table Window.

See also: Meaning of the colored column headings in the Table window

Browse a table’s filters

Click on the table’s name in  Browse Table button.jpg Browse Table window;  right-click, click  Filter.jpg Filter.

Browse a table’s metadata

Metadata about an entire table or Metadata about the columns/fields in a tableSee also: Report Metadata.

Browse metadata about the column’s / field’s in a table 

Click on table’s name in the Browse Table button.jpg Browse Table Window; right-click, click Fields button.jpg Fields.   See also: Fields Window.

Change a column/field’s data type or size

See instructions at Modify Structure.jpg Modify Structure Window

Connect to source data with ODBC

See ODBC Connections and Source Systems.

Connect to Excel Files

See Importing Data from Excel and Source Systems.   (2013.2)

Connect DataSelf ETL to data warehouse /
SQL Server instance

See SQL Server Connection Tool.

Delete a table

Click on the table’s name on the Browse Table button.jpg Browse Table window; right-click, click  Delete.jpg. Delete.

Import a new table

See Import a New Table with the Import Table Wizard.

Manual refresh

See Refresh.  Contrast with Batch & Automatic Refresh.

Refresh a table

See Refresh.

Refresh all Tables

See Refresh.

Reindex a table

See Reindex. Tables.

Reindex all tables

See Reindex. Tables.

Rename a column/field

See instructions at Modify Structure.jpg Modify Structure Window.

Rename a table

See instructions at Modify Structure.jpg Modify Structure Window.

Report Metadata

Click on table’s name in the Browse Table button.jpg Browse Table Window; right-click, click Print metadata.jpgPrint Metadata.

End of How To Topics index

______________________________________________________________________________________________________

 

 

 

 

Browse Table button.jpg  Browse Table Window

The Browse Table window (see below) opens automatically when you log to the ETL. This window is the starting point to several tasks in the ETL such as browsing data in tables, checking ETL metadata, and refreshing tables.Browse table-default presentation crop.jpg

Default presentation of Browse Table window.

Browse Table button.jpg

Browse Table
Icon

To Open:

Browse Table button Browse Table button.jpg 
  or
Action > Browse  >  Table
  or
Alt-T

 

 

Mouse Controls – Browse Table window

Click on row

Select row.

Click on selected row

Open table window. See also double-click on row.

Double-click on row

Open table window.

Right-click

Show sub-menu.

Ctrl-click on column

Adjusts column width making it narrower.

Ctrl-drag on column

Adjusts column width.

Ctrl-click on left or right margin and drag the mouse to the left or right.

Click Maximize button Maximize button.jpg

Shows more columns in the window, exposing the columns of metadata.

 
   Columns in the Browse Table Window  (Table Metadata)  

Browse Table-all columns crop 3.jpg

To Open: Click maximizeMaximize button.jpg on the Browse Table window. (The default settings for the Browse Table window do not show all the columns in the window.)

Columns in the Browse Table Window (Table metadata)

first column
(column with no heading)

Name of table used in the DataSelf ETL.(The name of a SQL view in the data warehouse.)

Physical Name

Name of table (physical table) in the DataSelf BI data warehouse.
(The name used internally by DataSelf ETL to identify the table in the data warehouse.)

Source Table Name

Name of table in source system that supplies data to the physical table. (Data is extracted from the source table and loaded into the physical table during the refresh.

Data Source Name (DSN)

Name of DSN (ODBC connection) that is used by refresh to connect to the source table.

Last Refresh

Date the physical table was last refreshed.

Table Created by

Name of user who first created the physical table in the ETL.

Table Created

Date physical table was created.

See also: Metadata about individual columns / fields in the table.

 

Browse Table button.jpg Table Window

 

To Open:  On the Browse Table window click twice on a table name.

The heading line of the Table Window shows the name of the table followed by the number of records in the table. The example below shows that the ICLOC_Locations table has 6 records.

Browse Table button.jpg Table Window:  Column Headings Color Code

Example of the color coded columns in a Table window.
 In this example the table ICLOC_Locations contains 6 records.

 

 

Red column heading

The table is currently sorted by the values in this column.

Green column heading

Derived field. The data in the column is derived from a formula. Contrast with black heading columns.

Shift-click green column heading to display the formula.

Black column heading

Neither of the types above. The data from the column is extracted directly from a column in the source table.

 

To Browse Formulas

Shift-click green column heading

Displays the formula used to derive the values in the column. That is, the formula used to create the field.

 

 

Browse Table window Sub-menu  Browse Table button.jpg

To Open:  Right-click anywhere on the Browse Table window to display the sub-menu.

To Open for a Particular Table: First select (click on) the table name in the Browse Table window then right-click the selection to open the sub-menu.

This section shows only some of the options on the sub-menu – not all options are described.

 

Fields button.jpg Fields

Browse metadata about individual columns / fields in the selected table.
Metadata includes format, size, decimal position, and formatting. For more Fields window below.

Filter.jpg Filter

Displays the filter settings that are available for the selected table.

Index_Sort.jpg Index/Sort

Display the indexes available for the selected table.

Print metadata.jpg Print Metadata

Print window _ from Print Metadata.jpgA report of all the metadata for the selected table.

Summary.jpg Summary

Shows the summary fields available for the selected table.

Modify Structure.jpg Modify Structure

Renames tables or columns and/or changes the size or type of columns.

Refresh.jpg Reindex

Reindexes all of the indexed columns of the select selected table


Reindex all Tables

Reindexes all of the indexed columns of the select selected table

Refresh.jpg Refresh

Refreshes the data in the selected table. A refresh extracts data from the source system’s database, recalculates formulas, and reindexes as required.

Refresh all tables.jpg
Refresh all Tables
(not recommended)

Performs a refresh on all the tables in the data warehouse.

This procedure is not recommended. Use the manual batch refresh procedure instead. See also the caution in the Refresh section.

Recalculate.jpg Recalculate

Recalculates the formulas for the selected table.

Recalculate all Formulas.jpg Recalculate all Formulas

Recalculates all the formulas for all the tables in the data warehouse.

 

 

Fields button.jpg  Fields Window

View - Fields.jpg

Figure 2  List of columns/fields in a table. This window is opened by the Fields option.

View-Fields expanded_ICLOC-Locations row.jpg

Figure 3  Fields window. Expand window to show metadata for rows.

To Open:  Right-click anywhere on the Browse Table window Browse Table button.jpg  then click Fields Fields button.jpg.

Browse metadata about individual columns / fields in the selected table.
Metadata includes format, size, decimal position, and formatting.

 

Fields button.jpg

 

See also: Browse Table window Sub-menu

 

 

Modify Structure.jpg  Modify Structure Window  (prior to version 2013.4)

To Open: On the Browse Table window click a table name to select it. Then right-click and click Modify Structure.

 

Rename a table

Click the table name in the window’s heading. Modify Structure change table name.jpg

Type in changes to the name and press Enter.

Rename a column/field name.

Click the column heading, type changes in the 3 lines available and press enter.

Change the size or data type of a column/field.

Click on the data in a column below the column heading. Double-click a field model in the Change Field Models window. Click Yes to save in database.

 

Edit Table Name Mode

Clicking on the window header puts the window into Table Edit Name mode. No other actions can be performed until you press Enter or Escape to exit edit mode.

 

To Save Modifications:  When all modifications have been made to a table then right-click and click . Changes are not saved until you click Confirm.

Modify Structure-rt-click sub-menu Confirm selected with cusor.jpg

 

 

 

Import Table Wizard Import Table Wizard_title only cut.jpg

This wizard allows the user to change the mapping properties of tables. For example, changes can be made to the mappings as provided by DataSelf in the out-of-the-box solutions, fields may be added or removed from a table, or filters added to a table’s data refresh process.

To Open: On the Browse Table window click a table name to select it. Then right-click and click Modify Import_Mapping.jpg Modify Import/Mapping

Import Table Wizard_title only cut.jpg  Follow the instructions in the Import Table Wizard.

 

 

Reindex Tables

Reindex defined: A reindex (or re-index) rebuilds the indexes in the indexed columns of tables in the DataSelf BI data warehouse. Reindexing is one of the steps in the refresh process.

Implications.  Reindexing a table may take a few seconds to several minutes to finish depending on the number of records in the table. Re-indexing all tables may take a long time. The performance of DataSelf BI and the server that hosts DataSelf BI may also be impacted.

To Start a Reindex: Click on table’s name in the Browse Table button.jpg Browse Table Window; right-click and click Refresh.jpg Reindex or   Reindex all Tables

Refresh.jpg Reindex

Reindexes all of the indexed columns of the select selected table..


Reindex all Tables

Reindexes all the indexed columns in all tables in the data warehouse. On some installations this may tie up resources on the server that hosts DataSelf BI for some time.

 

 

 

Refresh Wizard icon.jpg Refresh Wizard

To Open: Action > Wizard > Refresh       or click the Refresh Wizard.jpg Refresh Wizard button.

Follow the instructions on the wizard.

 

Refresh Wizard

This example shows refresh Batch Number 0 (zero). This batch has 5 steps.

 

The Step column specifies the order in which the files in this batch will be refreshed. Use the Move arrows to change the order of the steps.

See also:  BatchesBatch Refresh.

 

 

 

Refresh

Refresh Defined: A refresh extracts data from the source system’s database, recalculates formulas, and reindexes tables as required. The primary function of the DataSelf ETL is first to setup, configure, and maintain the DataSelf ETL metadata and second to run the refresh processes. The operation performed by each refresh process is controlled by this metadata.

Source System defined:  The source system is the accounting, ERP, order entry, etc software systems in your organization. The “E” in ETL stands for extract. During a refresh the DataSelf BI ETL engine extracts data from the source system’s database or databases. For this reason please take note of the following caution.

Implications

A refresh usually draws resources from your source system server, your network, and the server that hosts DataSelf BI. Please heed the caution below.

Caution: A refresh is a resource intensive process that calls on the database resources of other computer hardware and software systems within your company. On some installations a refresh, especially a full batch refresh, may have significant impacts on the performance of other business software systems within your organization.  (On the other hand some well-tuned installations report little to no noticeable impact on performance. Check with the DataSelf system manager for more information.)

 

 

Types of Refresh

The data warehouse is normally refreshed on schedule using the batch refresh process but other options are available. A refresh may be either manual or scheduled, on a single table or by batch.

A manual refresh is refresh that is run by you. A scheduled or automatic refresh is a refresh that is run by a job scheduler.

Refresh Single Table

See Manual Refresh on a Single Table

Manual Batch Refresh

See Batch Refresh

Scheduled (automatic) Batch Refresh

See Batch Refresh

 

v              Note: Use of the manual Refresh all Tables option is strongly discouraged. Use the manual batch refresh option instead.

 

 

Manual Refresh on a Single Table.

To Start a Refresh (manual refresh): Click on table’s name in the Browse Table button.jpg Browse Table Window; right-click and click Refresh.jpg Refresh or  Refresh all tables.jpg Refresh all Tables.

Refresh.jpg Refresh

Refreshes the data in the selected table..

Refresh all tables.jpg
Refresh all Tables
(not recommended)

Performs a refresh on all the tables in the data warehouse. This process has been replaced by the batch refresh process and should not be used.

See also: Warning about the Refresh all Tables Option.

 

v              NOTE: Depending on the table’s content, after the data is refreshed from the source, you may have to confirm if you want to recalculate formulas.

 

 

Refresh Wizard icon.jpg Batch Refresh 

A batch is a group of tables that are configured to refresh in a particular sequence. Batches and sequences are specified in the Refresh Wizard.

To Run: To run a batch refresh run the DSRefresh.bat file.  Speak with the designated DataSelf BI system manager for more information.

The batch refresh process is usually scheduled to run overnight. When the refresh process is run by job scheduler the batch refresh is also known as an automatic refresh.

The Refresh Wizard specifies a particular sequence for the refresh process by establishing batches and processing steps within the batches.

Some installations may have several refresh batches. Each batch takes care of specific needs and may run on a separate schedule.  An understanding of the components involved in a batch refresh should help you to better understand how to make changes.

Components of the Batch Refresh

dsrefresh.exe

The batch refresh program.

The program has one optional parameter: a batch number. The default batch number is zero.

DSRefresh.bat

The script which is usually used to run dsrefresh.exe.
(The .bat suffix stands for batch. A batch is a computer term for a script – a sequence of commands which is used to execute a process.)

Job scheduler

A program which runs the .bat file according to a schedule.

Any job scheduler program can be used. The most popular scheduler among DataSelf users is SQL Jobs, the scheduler built into Microsoft’s SQL Server. This option provides several logging features and it is better integrated with the DataSelf BI tool set. Windows Task Scheduler is another popular option.

DataSelf ETL metadata

The metadata is a set of instructions that controls the refresh process. Specifically, the operation of dsrefresh.exe is controlled by the instructions in this metadata. The metadata is setup and maintained by the wizards inn DataSelf ETL.

See also: Out-of-the-box Templates from DataSelf

DataSelf ETL

A major function of the wizards in the DataSelf ETL is to setup and maintain the metadata.

 

 

Batches

The ETL automatic refresh runs via batches. Each batch may have several steps. A batch might for example refresh ERP tables in the first step, refresh CRM tables in the 2nd steps, refresh combined tables in a 3rd step. The configuration of batches and is done in the Refresh Wizard.

 

 

Trouble Shooting a Batch Refresh

If a table in either a manual or batch/automatic refresh is not being refreshed as expected here are the first things that DataSelf’s experts look for.

Trouble Shooting Connection Issues

Verify that the ODBC connections to all data sources are working. Setting up the OCBC connections for the out-of-the-box solutions from DataSelf should have been done during the initial installation but a number of things can break the connection. For more see ODBC Connections.

Run a manual refresh to be sure there are no problems connecting to this table.

Trouble Shooting Mappings Issues with the   Import Table Wizard

The import mappings to the table may be preventing the refresh from working as desired. 

Open the Browse Table window. Click the table’s name, then right-click and click Modify Import_Mapping.jpg Modify Import/Mapping.)

Import Table Wizard - Select Tables window:

Click Back to see the names of the OCBC connections or to change the login and password.

Click Next.

Import Table Wizard - Select Fields window: Verify that the appropriate fields are imported in the Select Fields window. The fields imported are listed on the right side pane.

Click Next.

Import Table Wizard - Select Data Grouping (optional) window: For most tables, you should see no fields on Grouping Fields pane. If there are grouping fields then verify that the appropriate grouping is being used to summarize data.

Click Next.

Import Table Wizard - Select Import Filter (optional) window: This step may not show up for some tables. For most of the remaining tables, this step should have (none) selected. If appropriate, verify that the appropriate filter is being used to select data.

Click Next.

Import Table Wizard – Finish window: Verify the following refresh options are set appropriately:

·         The Maximum number of rows to import box should usually be un-checked.

·         If the Disable automatic refresh box is checked then the refresh will not extract data from the source system. Formulas, however, will still be calculated.

·         Click Refresh Type to see what’s been selected.

·         Click Table Union to see if this table is being merged with other table(s).

·         Read the caution about refreshing tables. If it’s Ok to refresh a table click Finish & Import Now. The table refresh should run without errors.

 

 

Warning about the Refresh all Tables Option

On some systems the use of the manual Refresh all Tables option may corrupt the data warehouse. Use the manual batch refresh procedure instead.

Proper use of this option is beyond the scope of this document.

The Refresh all Tables command ignores the sequence of operations in a batch as configured in the Refresh Wizard. A refresh performed by the Refresh all Tables will probably be partly incorrect.  In some cases a data warehouse refreshed out of proper sequence may be seriously corrupted.

 

 

Out-of-the-box Templates from DataSelf

Part of the DataSelf BI solution includes one or more out-of-the-box templates for each source database DataSelf supports. A main component of each template is metadata for the DataSelf ETL. The out-of-the-box metadata was created by DataSelf using only the DataSelf ETL.

See also: Components of the Batch Refresh

 

 

 

Import A New Table with the Import Table Wizard icon.jpg Import Table Wizard

To Open the Import Table Wizard: Click Action > Wizard > Import Table Wizard.jpg  Table.

Follow the instructions on the Import Table Wizard.

Add the table to the automatic refresh process with the Refresh Wizard icon.jpg  Refresh Wizard..

Requirements: An ODBC connection to the source system that contains the new table is required.

 

 

 

Metadata

Metadata in the context of the DataSelf ETL engine and the refresh process includes everything the refresh process needs to know. Metadata includes information about:

·         Source systems and their databases

·         The relationship between source system database tables and data warehouse tables

·         Format of data warehouse tables and columns (or files and fields).

·         Formulas

·         Indexes

·         Groupings and summarizations.

·         Refresh batches

The out-of-the-box solutions from DataSelf consist largely of metadata.
See also: Out-of-the-box Templates from DataSelf

 

 

 

ODBC Connections

OCBC connections to the source systems for the out-of-the-box solutions from DataSelf should have been done during the initial installation. But a number of things can break an ODBC connection. Setting up and fixing ODBC connections often requires a computer system manager with the right passwords and technical knowledge.

See also: SQL Server Connection Tool

See also:  Advanced Technical Assistance

 

 

 

SQL Server Connection Tool

Connects the ETL to the data warehouse.  Controls the connection of the DataSelf ETL engine to the SQL Server instance where the data warehouse is hosted.

This window also has options which resets the metadata for the data warehouse and data cubes back to their initial out-of-the-box configurations.

Under normal conditions this tools is used once when first installing the ETL.

Caution:  Some of the settings in the SQL Server Connection window can erase all changes made to the data warehouse and data cubes since installation.  

To Open: Click Tools > SQL Server Connection.

v  Important: In most cases, the boxes below the Database name should not be checked.

 

 

 

User Wizard icon.jpg User Wizard

To add, change, or delete users open the User Wizard and follow the instructions.

To Open: Action > Wizard > User Wizard.jpg User

v              Important Note: These user settings apply only to ETL users. Other components in the DataSelf BI solution have their own user settings. Changing the settings here will not impact the user and security settings for the data warehouse and data cubes.

 

 

 

ETL

ETL is an acronym for Extract, Transformation, and Load. The purpose of an ETL engine is to extract data from source databases, transform (clean up, summarize, reformate, and calculate), and load the transformed information into the data warehouse.

 

 

 

Advanced Technical Assistance

Every DataSelf customer should have at least one person designated as a primary contact and person designated as system manager. Your organization’s policy (as well as and the support agreement with DataSelf) may ask you to contact of these people before contacting DataSelf directly.

 

 

 

Contacting DataSelf for Support

Email: support@dataself.com

Home page: www.dataself.com

Tech support line: 408-351-3569