Openprise supports two different connectors to S3. Below are guidelines on which connector to use.
- The standard connector allows the user to specify the location in S3 by bucket name or by path specification. The standard connector supports both import and export. Information on this feature is located HERE
- The Open Connector for S3 should be used if you need to specify the location of your files using a wildcard (glob) pattern. The Open Connector supports only the import of data from S3 to Openprise; it does not support export.
Using the Open Connector for S3 with wildcard patterns
1. Authentication: select S3 Credentials
2. Bucket: (required) enter the bucket name
3. AWS Access Key ID: (required) enter your access key
4. AWS Secret Access Key: (required) enter your secret access key
5. AWS Region: (optional) Select the region code (e.g. us-east-2) from the regions listed here.
6. Start Date: (optional) Any file with a modified date before this time will be ignored. The start date must be in this format: 2020-01-20T00:00:00.000000Z
7. Add The List of Streams to Sync: (optional but suggested) By default, Openprise will import all files found in the bucket. To select which files are to be imported, enter one or more streams.
a. Globs: Enter the globs that define which files are to be synced. A glob is a wildcard pattern (not a regular expression) that Openprise uses to match the specific files to import. To import all the files within your bucket, use ** as the pattern. For more precise pattern matching options, refer to the Globs section below.
b. Schemaless: (optional and not recommended) Select this option to skip all validation of the records against a schema and to import all columns with data as one attribute named “data”.
c. Name: Give a Name to the stream. This name will appear in the Parse page during data source configuration.
d. Days To Sync If History Is Full: (optional) This gives you control of the lookback window that will be used to determine which files to sync if the state history is full.
e. Format: CSV is recommended and is the only format that has been tested. When CSV has been selected, also configure the following. Refer to the onscreen help text for each input. You may also consult the Airbyte S3 documentation here.
- Quote character
- Skip Rows Before Header
- Filetype (default and recommendation is csv)
- Skip Rows After Header
- Delimiter
- Strings Can Be Null
- True Values
- False Values
- Encoding (default and recommendation is utf8)
- CSV Header Definition
- Header Definition Type
- Ignore errors on field mismatch
- Double Quote
f. Validation Policy: (optional) Select a Validation Policy to tell Openprise how to handle records that do not match the schema. You may choose to emit the record anyway (fields that aren't present in the schema may not arrive at the destination) or to skip the record altogether. The option to wait until the next discovery is not supported.
8. Name your authentication credentials: (required) The name entered here will be visible in the Administration>Security>Credentials page in Openprise
9. Test Connection: Select to have Openprise verify your credentials. If successful, select Next.
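The Start Date format from step 6 can be produced and checked with a short script. The following is a minimal Python sketch, not part of Openprise: the function names are hypothetical, and it assumes that the trailing Z means UTC and that the comparison is made against the file's last-modified time.

```python
from datetime import datetime, timezone

# Format shown in step 6: 2020-01-20T00:00:00.000000Z
START_DATE_FORMAT = "%Y-%m-%dT%H:%M:%S.%fZ"


def format_start_date(moment: datetime) -> str:
    """Render a timezone-aware datetime in the Start Date format (assumed UTC)."""
    return moment.astimezone(timezone.utc).strftime(START_DATE_FORMAT)


def parse_start_date(text: str) -> datetime:
    """Parse a Start Date string back into a UTC datetime."""
    return datetime.strptime(text, START_DATE_FORMAT).replace(tzinfo=timezone.utc)


def is_ignored(last_modified: datetime, start_date: str) -> bool:
    """Illustrates the rule in step 6: files modified before the start date are ignored."""
    return last_modified < parse_start_date(start_date)


if __name__ == "__main__":
    print(format_start_date(datetime(2020, 1, 20, tzinfo=timezone.utc)))
```

Pass timezone-aware datetimes to these helpers; a naive datetime would be interpreted as local time by astimezone.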
Globs
This connector can sync multiple files by using glob-style patterns, rather than requiring a specific path for every file. This enables:
- Referencing many files with just one pattern, e.g. ** would indicate every file in the bucket.
- Referencing future files that don't exist yet (and therefore don't have a specific path).
You must provide at least one path pattern. You can also provide multiple patterns separated by | for more complex directory layouts.
Each path pattern is a reference from the root of the bucket, so don't include the bucket name in the pattern(s).
Some example patterns:
** : match everything.
**/*.csv : match all files with the .csv extension, in any folder.
myFolder/**/*.csv : match all csv files anywhere under myFolder.
*/** : match everything at least one folder deep.
*/*/*/** : match everything at least three folders deep.
**/file.*|**/file : match every file called "file" with any extension (or no extension).
x/*/y/* : match all files that sit in folder x -> any folder -> folder y.
**/prefix*.csv : match all csv files whose names start with "prefix".
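To try out a pattern before entering it, the example patterns above can be approximated locally. The following Python sketch is an illustration only, not the matcher Openprise actually uses: glob_to_regex and matches are hypothetical helpers, and the sketch assumes that ** spans any number of folders, * stays within a single folder or file name, and | separates alternative patterns.

```python
import re


def glob_to_regex(pattern: str) -> str:
    """Translate one glob into a regular expression (approximation for illustration)."""
    out = []
    i = 0
    while i < len(pattern):
        if pattern.startswith("**/", i):
            out.append(r"(?:[^/]+/)*")  # zero or more whole folders
            i += 3
        elif pattern.startswith("**", i):
            out.append(r".*")  # anything, including slashes
            i += 2
        elif pattern[i] == "*":
            out.append(r"[^/]*")  # anything within one path segment
            i += 1
        elif pattern[i] == "?":
            out.append(r"[^/]")  # exactly one character within a segment
            i += 1
        else:
            out.append(re.escape(pattern[i]))  # literal character
            i += 1
    return "".join(out)


def matches(globs: str, key: str) -> bool:
    """Return True if the object key matches any of the |-separated globs."""
    return any(re.fullmatch(glob_to_regex(p), key) for p in globs.split("|"))


if __name__ == "__main__":
    # Keys are relative to the bucket root, so the bucket name is not included.
    for key in ["report.csv", "myFolder/2020/report.csv", "x/a/y/data.csv"]:
        print(key, matches("myFolder/**/*.csv", key))
```

Because the translation is simplified, confirm the final pattern against the files actually listed on the Parse page.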