Scope of this guide

This is a guide that shows typical queries using RAW. It assumes basic familiarity with the RAW User Interface (see this guide for information on the user interface). For more information and documentation on RAW please visit http://test.raw-labs.com/docs.

Parsing machine Logs

This example shows how to transform an unstructured machine log into a form that can be queried.

We use the file "machine_log.log", which can be found the S3 tutorial bucket: "raw-tutorial". Start by registering the files in RAW (see this guide for information on how to register files).

Then execute the query:

SELECT * FROM machine_log
PARSE AS r"(\\d+/\\d+/\\d+ \\d+:\\d+:\\d+) (\\w+):? (.*)"
INTO (timestamp: _1, debug_level: _2, message: _3)

Note that:

  • The "PARSE AS" takes a single line of unstructured text and parses it into multiple fields using a regular expression;
  • The "INTO" is used to rename the default field names of groups in the regular expression from _1, _2, etc to more meaningful names.

Joining a CSV and a JSON file

This examples shows how to join a CSV and a JSON file.

We use the files "airports.csv" and "trips.json" files that can be found in the S3 bucket "raw-tutorial".
Start by registering the files in RAW (see this guide for information on how to register files).

Then execute the query:

SELECT t.reason, t.dates, start, end
FROM trips t, airports start, airports end
WHERE t.origin=start.IATA_FAA
AND t.destination=end.IATA_FAA

Note that:

  • The result contains a nested structure, where "start" and "end" fields contain the corresponding information on the airport.

Alternative, to obtain a flat structure (i.e. in tabular form), then execute the query:

SELECT t.reason, t.dates,
       start.Name AS start_name,
       end.Name AS end_name
FROM trips t, airports start, airports end
WHERE t.origin=start.IATA_FAA
AND   t.destination=end.IATA_FAA