DataSelf ETL Glossary

From DataSelf Knowledge Base
Jump to: navigation, search

{{#breadcrumb: Main}}

Glossary of terms used in DataSelf ETL's documentation.

Business System

See Source System.

Calculated Data

Data that is calculated or derived from other data. In DataSelf's ETL calculated data is defined by the Formula Wizard.
Also known as: Derived data, Calculated columns or fields, Computed columns or fields.
Related concepts: Business Rules.

Computer Platform

Also known as: Platform, Computer System, Server, Computer Server, Server Platform, Network Server.
A combination of computer hardware, operating system and other software that hosts application software such as DataSelf's BI solution. The term is often used when referring to what kind of computer system a certain software program will run on.

Usage: Computer platform is rarely used except to define the term. The use of platform emphasizes the operating system environment. The term server can be ambiguous because server can refer to a computer platform as defined here or to a class of software (such as Microsoft's SQL Server) which performs a type of service that is used by other software.

Related concepts: client / server architecture, cloud server, client software.

Currency Date

The date the data is considered effective or up-to-date. It is also know as the "as of" date or temporal currency. In data warehouse terms the currency date is also the extract or refresh date.
Related concepts: Data Currency, Temporal Currency.

Data Cleansing

The process of checking for data quality problems and attempting to resolve (or at least reduce) them. Removing errors and inconsistencies from data being imported into a data warehouse.

As one aspect of data transformation, data cleansing looks for data that is correct, unambiguous, consistent, and complete. Data cleansing can be applied to source databases and the data warehouse data via the data transformation step of the ETL process.
Related concepts: Data quality assurance.

Data Connection

The service provided by a Data Connector. The process and technology that connects the ETL to a data store and thus allows the ETL read and write data to and from databases, source data, data warehouses and other data stores such as Microsoft Excel workbooks (spreadsheets).

Data Connector

Software used to connect to data and thus creates a data connection. See data connection for more information. Examples of data connectors include ODBC and the data source types seen in the Import Table Wizard.
Data Connectors use the information (metadata) stored in a data to establish the connection to a data store.

Data Transformation

The transformation step of the ETL acronym. The general term for a set of operations that manipulate your source data and transform it into DataSelf's data warehouse. your data Some would say that transformation is the step where ETL can add the most value. Data cleansing is one aspect of data transformation.

Data Source

Loosely the term refers to a connection to source data. The phrase also refers specifically to the configuration information (a type of metadata) that specifies how a data connector shall to connect to data. This data or data store may be, but is not limited to, source data in a file, a source database, or a data warehouse.

  • Each data source is assigned a unique data source name name by it's creator.
  • Data sources and it's related concepts are tied together in the Import Table Wizard.
  • Data source is one of the most difficult terms to define in this glossary because it is easily confused with other terms especially source data. We attempt to dis-entangle the related terms below.
Metadata in a Data Source
The connection information stored in the metadata in a data source includes:
  • Location of the data store's file or database.
  • Type of data connector software required.
  • Data connector level security information (user name, password, etc)
  • For a SQL Server database the name of SQL Server instance which manages the database and any security information such as user name and password.
  • Database manager (DBMS) level security. SQL Server instances often require a user name and password.
  • Security information such as User name & password to the specific file or database.
Data Source Names versus Data Connectors
Data source is not a data connector but rather the configuration information (a type of metadata) that the data connector software uses to connect to data stores.
A Data Source versus Source Data
Data source is used informally as interchangeable with source data and source database. At times this informal usage can get confusing and is not recommended. A source database refers to a database (to data), a data source refers to the metadata describing how to connect to the database.
Imported Tables and Data Sources
To import data from a data source into a data warehouse the ETL requires the metadata from the data source. These concepts are connected in the Import Table Wizard.

More: Import Table Wizard - Start window especially the section on Data Source Types.

Data Sources in DSA/Tableau

A data source is a reusable connection to data. Data sources can include data extracts, or information for a pass-through connection to a live, relational database. A data source can also include a layer of customizations, such as calculations, groups, or sets.

DSA/Tableau Desktop authors can publish data sources to DSA/Tableau Server.

More: | Data Sources from Tableau's online help.

Data Source Name

Every Data Source is assigned a name by the person configuring it. DataSelf follows a set of naming conventions for assigning data source names.

Data Store

A generic term for a file, database, etc where data is stored. Source data, source databases, and data warehouses are types of data stores.

Data Warehouse

DataSelf Data Warehouse

See also: Metadata, Template, DataSelf BI Templates.

Data Warehouse Database

All data in a DataSelf Data Warehouse is stored in a single data base in SQL Server. Each of these data bases is known as a Data Warehouse Database.

Database
A collection of data, usually in the form of tables or files, under the control of a database management system (DBMS).

DataSelf ETL

DataSelf's ETL engine.

DSN

Acronym for Data Source Name. The usage of DSN can be confusing.
A DSN is a file that contains information ( metadata ) for configuring a ODBC connection to a ODBC compatible data store.
Informally DSN is sometimes used to refer to the data source the DSN is configured to connect to.

Enterprise Data Management

Management of the information shared between applications, systems and services within an enterprise.

  • The ability to effectively create, integrate, disseminate and manage data for all enterprise applications, processes and entities requiring timely and accurate data delivery.

The concept addresses the transmission of different data sets within processes and applications that rely on the consumption of these data sets to complete business processes or transactions.

  • The main purpose of EDM is the removal of organizational issues and conflicts resulting from the mismanagement of information and data, by implementing a structured data delivery strategy - from data producer to data consumer.
  • EDM is comprised of software applications, computing and network infrastructure, business logic and policies used to manage enterprise data flow. A organization implements EDM through a variety of processes and requires collaboration through different departments, such as IT, finance and operations.
-- https://www.techopedia.com/definition/28050/enterprise-data-management-edm

Related Concepts: master data management (MDM), data cleansing, point-to-point integration

ETL

Acronym for Extract, Transform(ation), and Load.
Data is extracted from a data store, transformed to match the data warehouse schema, and loaded into DataSelf's data warehouse database. Informally this process is sometimes called mapping or data mapping and is related to the concept of importing data and refreshing data

  • A table loaded into the data warehouse by the ETL is also known as an imported table.
  • Information specifying how the data in extracted, transformed, and loaded as well as the schema (structure) of the table is known as Metadata.
  • ETL is often a complex combination of process and technology. DataSelf's out-of-the-box templates and ETL engine reduces the effort that otherwise would require a significant development effort requiring the skills of business analysts, database designers, and application developers.
  • Running the ETL's extract, transformation, and load process is also known as refreshing the data.
  • The ETL refresh process is usually triggered at least daily, typically overnight.

Imported Table

A table imported into the DataSelf data warehouse by the ETL from a source database.
Imported tables are data warehouse tables; sometimes referred to as a Target Table.

Metadata

Metadata is data about data. Examples of metadata include the name, length, valid values, data type, and description of data elements. Metadata also is a detailed description of the processing that will during the ETL refresh process. The pre-packaged metadata from DataSelf comes in the form of DataSelf BI Templates.
Sometimes spelled: meta-data or meta data.

Metadata in a Data Warehousing Context
The sum of all documentation about the data warehousing process in its entirety. Metadata describes the contents of the data warehouse, its structure, and the processes involved in its setup. Essentially, metadata is all the information in the data warehouse environment that is not the actual data itself.
Business Metadata

The information whereby business users can understand and access the data warehouse. This type of metadata often focuses on what data is in the warehouse, how it was transformed, the source, and the timeliness of the data.

Project

ETL projects enable the ETL to manage more than one {egx|Data Warehouse|data warehouse}}.

Project information for each project is stored in a sub-folder under the ETL's main folder. Project information in the folder includes metadata and the project's data warehouse database.

test link to projects

Refresh

Also known as Data Refresh, Refreshing Data, Extract.
The most granular level of refresh in the DataSelf ETL is the table refresh. In production many tables are refreshed during a refresh job.
Related concepts: Data Currency, Refresh/Extract Frequency.

Some authorities define a refresh as: A process of taking a snapshot from one environment and moving it to another environment overlaying old data with the new data each time. The default setting in the Import Table Wizard is to replace the data in an data in an imported table with new data during a table refresh.

Refresh Batch

See Refresh Batch Concept in Refresh_Wizard.

Refreshing Data

The process of refreshing data. The result of a refresh process is refreshed data.

Refresh Wizard

See Refresh Wizard

Source Data

The point of origin of data that is imported, transformed, and loaded into the data warehouse by the ETL. Source data is a more generic term that includes source databases as well as Excel files.

Source Database

Also known as source data and source systems.
The point of origin of data that is imported, transformed, and loaded into the data warehouse by the ETL. The source database is usually the database of a business system such as an accounting, inventory, or ERP system.

Most ODBC compliant databases can be imported into a data warehouse by the ETL. DataSelf has prepared metadata for a number of source systems; these we sometimes refer to as out-of-the-box solutions or BI templates. List of source databases supported by DataSelf BI templates.

Source System

Alternative term for Source Data or Source Database.
The term source system emphases that the source data comes from a business system/business application.
See DataSelf solutions for Business Systems on DataSelf's main web page or See List of source systems supported by DataSelf BI templates.

TANSTAAFL

Acronym for "There Ain't No Such Thing As a Free Lunch". Developing a data warehouse is work and there aren't many "free lunches".

Target Table

The use of the term Imported Table is generally preferred.

Union

A set operation that is used to merge/combine data from two or more tables. Usage examples:

  • ... a union of ...
  • "union operator" as in "SQL's UNION operator".
SQL UNION operator
In SQL the UNION clause combines the results of two SQL queries into a single table. The two queries must result in the same number of columns and compatible data types in order to unite. Any duplicate records are automatically removed unless UNION ALL is used.
UNION can be useful in data warehouse applications. A simple example would be a database having tables sales2012 and sales2013 that have identical structures but are separated because of performance considerations. A UNION query could combine results from both tables.

Unioned

Informal. Prefered usage is union, refer to that section for usage examples.
Also: UNIONed especially when referring to SQL's UNION operator.

Wizard

A software wizard is user interface type that presents a user with a sequence of dialog boxes that lead the user through a series of well-defined steps. Wizards are provided to simplify lengthy or multi-step tasks by guiding you through a set of steps in the proper order.