SAYS: Import Wizard
Overview
The import wizard is used to clean and import user data into defined data models. It has many features to transform and manipulate the data. This section outlines those features and shows how users can leverage the import wizard to transform their data.
Loading & Saving Import Settings
File Formatting
- Has Column Headers: If this is selected, then the import wizard will take the first row as the column headers of your data.
- Start Row: This will determine which row the import wizard should start reading from. The default is 1 which will read the first row of your file. This includes column headers if they exist.
- End Row: This will determine which row the import wizard should stop reading at. The default is empty which means it will read all the rows of your data.
- Start Column: This will determine which column the import wizard should start reading from. The default is 1.
- End Column: This will determine which column the import wizard should stop importing at. The default is 1.
Data Preview
The data preview section features an Excel-like table that lets you preview your data and see how it will look before it gets saved to the system. You can restrict the number of rows it shows, export the table to Excel, and look over the data and column headers it is showing.
You will be able to preview the data at various steps to see how the import wizard is transforming your data.
Advanced ETL Settings
Data Management Plug-In
The SAYS platform leverages the power of Opturo's VIA data management platform. A key feature of the import wizard is that VIA can be used to parse, transform, and clean your data. If a data management plugin is defined a user can select this to use for importing. These plugins can be specific to a vendor format or custom configuration files that have been developed on the VIA platform.
Map Column
The Map Column setting is used for assigning data columns that have been initially imported from the raw user data to the template columns of the associated data model. For example, in the initial raw file the column may be named as 'Effective Date', but needs to transformed to 'Value Date' in the SAYS system.
This setting enforces the data type of the Template Column and, for columns with date-based data, let's you define the date format, e.g. yyyyMMdd.
Click 'Map Column' when you want to add this to the list of applied settings.
Add Column
The Add Column setting is used for inserting/creating additional columns into the data you are importing that may not have been in the original raw format. These are for creating columns of data that are in the data model, but not in the original raw user data.
These column types described below can be used to map to the data model columns.
Click 'Add Column' when you want to add this to the list of applied settings.
Assign Value
The Assign Value column type is used to assign a constant data value for each row in the mapped column. For example, if you wanted to insert a common fund ID, you may map to a fund id column and insert a constant Text string of 'Portfolio A'. Or if you'd like to insert a constant number, you might insert 1 with the Data Type Number to be used as an indicator column.
You can select from data types: Number, Text, and Date. You can then assign that constant value in the Value input box.
Extract Value from File/Sheet Name
The Extract Value from File/Sheet Name column type is used to get a value from the raw data file's file name or sheet name using a find/search functionality or regex functionality. This way if there is some identifier or date in the name, it can be extracted and imported into the mapped column.
- Prefix: Define the string or regex string you want to search for in the prefix of the file or sheet name.
- Prefix is Regex: Check this box if the string you defined in the Prefix box is a regex string.
- Suffix: Define the string or regex string you want to search for in the suffix of the file or sheet name.
- Suffix is Regex: Check this box if the string you defined in the Suffix box is a regex string.
- Date Format: If you are extracting a date from the file or sheet name, define the format of the date here.
- Test Extract: You can run the test extract to what gets extracted using the settings you provided.
Multi-Column Operation
The Multi-Column Operation column type is used to map to a data model column by applying an operation type on the existing columns of the raw user data. For example, if you want to combine a fund identifier with a group identifier, you might used Addition to concatenate those two strings together.
You can also perform simple arithmetic operations if you have Number data columns that you want to perform operations on. For example, if you have a security position column containing shares and a price column, you might want to multiply those columns together to map to a market value column.
Append the columns you want to operate on in the Operation Column List box and select the operation type form the Multi-Column Operation Type dropdown.
Create Index
The Create Index column type is used to map to an index column defined in the data model.
Filter Data
The Filter Data setting is used to apply filtering rules to the imported data to limit what gets imported based on user-defined rules.
- Where Clause Type: This is a logical operator when you want to do multiple where clauses.
- Data Column: Select the data column you want to apply the filtering rules on.
- Data Type: Select the data type of the data column you are applying the rule on.
- Operation: Select an operation type from the dropdown that will be the filtering rule. For example, you might be looking for a Text that is Equal To a specific value. Or a Number Between two different values.
- Comparison Type: You can apply a filtering rule that does an operation between two data columns, or an operation on a single value. Select column to compare two columns, and select the comparison column from the Comparison Column dropdown. If you select value, then you can define the value in the Comparison Value input box.
- Comparison Value: You can define the specific value you want to use in your rule in this input box.
- Comparison Column: If you want to compare two columns, select the second column you are comparing against from this dropdown.
Click 'Add Where Clause' when you want to add this to the list of applied settings.
Validate Data
The Validate Data setting is used to replace or update columns of data. You may want to fetch values from another column and only replace certain values in a selected column. For example, you may want to replace all rows that are missing an identifier with a constant identifer MISSING. Or if you want to fetch values from a second column, you can replace those missing identifiers with the identifiers from the second column.
Refer to the Filter Data setting for information on the where clause definitions.
- Data Validation Operation: Select whether you want to Replace rows with a constant value, Fetch values from another column, or Mutliply the rows by some constant factor.
- Data Validation Column: Select the column that you are applying the data validation logic on.
- Validation Value: If you are applying a constant factor, or replacing with a constant value, define that value in this input box.
- Validation Value Fetch Column: If you are are fetching values from a second column, specify the column from this dropdown.
Click 'Add Validation Rule' when you want to add this to the list of applied settings.
SQL Engine
The SQL Engine setting can be used to leverage basic SQL queries to manipulate and transform the existing data columns and map them to new columns. This setting leverages the H2SQL engine, which has extensive documentation on how to define and execute queries.
Click 'Run Query' when you want to add this to the list of applied settings.
Data Merge Options
Merge Static Data to Loaded Time Series Data
Allows merge of static data to persisted time series data. For instance, merging account-level currency code which is static through time to account time-series return data.
Merge Time Series Data to Loaded Folder Indexed Time Series Data
Allows merge of Non-Portfolio specific Time-Series data to Portfolio time series persisted data. For instance, merging security-level price data to portfolio holdings.
Generate Unique Start and End Date Attribute Data From Time Series Data
Generates attribute data set (Columns: Key Column, Start Date Column, End Date Column, Attribute Columns..) from time series account or security time series data. For instance, track the change in sector, industry and country for an account or security over a period of time. Map the Value Date to the Attribute Start Date column.
Unioning Additional Data
The SAYS Import Wizard allows users to union together different files based on keys that will map merge the files together. For example, you may have a file with account IDs and dates that should be merged with another file that has account IDs and account detail information. This feature allows user the ability to combine files, and columns within those files, together into one data model.
Model Mapping
The Model Mapping section of the import wizard is for mapping columns of data in the user format to the columns of the data model you are trying to import to. The Mapping Column dropdown lets you select from the list of columns defined in the data model.
Data Preview is also available here to see how the data looks after mapping columns. You can also define the date format of your data here, e.g. yyyyMMdd.
Data Exceptions
The Data Exceptions step of the import wizard is used as an audit step before the data gets saved to the system. Any rows of the data that have violated constraints of the data, e.g. Text length or missing data, will show up here. The preview table lets users review those data exceptions.