Tutorial

Introduction to Streaming SQL in Striim



 

Striim is a next-generation unified data streaming product that offers change data capture (CDC) from popular databases such as Oracle, SQL Server, PostgreSQL, and many others. Streaming data can be tricky for two reasons. First, sources that generate events every second produce large volumes of data, which demands high agility and low latency. Second, in its raw form, streaming data lacks structure and schema, which makes it hard to query with analytic tools. There is an increasing need to process, parse, and structure streaming data before any proper analysis can be done. Striim provides a powerful solution to all your streaming needs through its distributed streaming SQL platform. Striim applies filtering, transformations, aggregations, masking, and enrichment to streaming data in real time, as the data is being delivered to the target in a consumable format, to accelerate the delivery of rich, timely, and reliable insights.

Here is an example showing a Continuous Query inserted between the source and target in the PosApp sample application. It transforms the data in flight, making it analytics-ready on arrival rather than transforming it in the warehouse.

Filter the Data Stream

CsvDataSource outputs the data to CsvStream, which is the input for the query CsvToPosData:

[Image: the CsvToPosData continuous query]

[Image: a sample event from CsvStream showing the data array]
This CQ converts the comma-delimited fields from the source into typed fields in a stream that can be consumed by other Striim components. Here, data refers to the array shown above. The number in brackets specifies a field from the array, counting from zero. Thus data[1] is MERCHANT ID, data[4] is DATETIME, data[7] is AUTH AMOUNT, and data[9] is ZIP.

The TO_STRING, TO_DATEF, and TO_DOUBLE functions cast the fields to the types used in the output stream. The DATETIME field from the source is converted to both a dateTime value, used as the event timestamp by the application, and (via the DHOURS function) an integer hourValue, which is used to look up historical hourly averages from the HourlyAveLookup cache.
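Putting those pieces together, the CQ might look roughly like the following sketch (the exact TQL in the PosApp sample may differ, and the date format string is an assumption based on the sample value 20130312173210):

```sql
CREATE CQ CsvToPosData
INSERT INTO PosDataStream
SELECT TO_STRING(data[1])                          AS merchantId,
       TO_DATEF(data[4], 'yyyyMMddHHmmss')         AS dateTime,
       DHOURS(TO_DATEF(data[4], 'yyyyMMddHHmmss')) AS hourValue,
       TO_DOUBLE(data[7])                          AS amount,
       TO_STRING(data[9])                          AS zip
FROM CsvStream;
```

Any source field not selected here is simply dropped, which is how the twelve-plus raw fields are reduced to the five values listed below.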

The other fields are discarded. Thus the first line of data from posdata.csv has at this point been reduced to the following values:

  • D6RJPwyuLXoLqQRQcOcouJ26KGxJSf6hgbu (merchantId)

  • 20130312173210 (dateTime)

  • 17 (hourValue)

  • 2.20 (amount)

  • 41363 (zip)

The CsvToPosData query outputs the processed data to PosDataStream, which partitions the transformed data:

[Image: the PosDataStream definition]

PosDataStream assigns the remaining fields the names and data types in the order listed above:

  • PRIMARY ACCOUNT NUMBER to merchantId

  • DATETIME to dateTime

  • the DATETIME substring to hourValue

  • AUTH AMOUNT to amount

  • ZIP to zip
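Given the types implied above (string merchant ID and ZIP, DateTime timestamp, integer hour, double amount), the stream and its underlying type could be declared along these lines (a sketch, not the sample's exact TQL; the type name and the choice of partition key are assumptions):

```sql
CREATE TYPE PosData (
  merchantId String,
  dateTime   DateTime,
  hourValue  Integer,
  amount     Double,
  zip        String
);

-- Partitioning distributes events across the cluster by key
CREATE STREAM PosDataStream OF PosData PARTITION BY merchantId;
```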

Striim’s Continuous Queries (CQs) are in most respects similar to SQL, except that they run continuously and act on real-time data instead of relational tables. Queries may be used to filter, aggregate, join, enrich, and transform events. A query may have multiple input streams to combine data from multiple sources, windows, caches, and/or WActionStores. To learn more about Striim’s powerful streaming SQL features, please feel free to reach out to our integration experts to schedule a demo, or try Striim for free.
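As an illustration of the filtering pattern, a CQ over the stream above could keep only high-value transactions (the CQ and output stream names here are hypothetical, shown only to sketch the general shape):

```sql
CREATE CQ FilterLargeTxns
INSERT INTO LargeTxnStream
SELECT merchantId, dateTime, amount, zip
FROM PosDataStream
WHERE amount > 100.0;
```

Because the query runs continuously, every event arriving on PosDataStream is evaluated as it happens, and matching events flow to LargeTxnStream with no batch step in between.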

