BigBash It! - Convert SQL queries into bash scripts
Converts your SQL SELECT queries into autonomous Bash one-liners that can be executed on almost any
*nix device for quick analyses or to crunch gigabytes of CSV log files. Perfectly suited for big data tasks on your
local machine. The generated one-liners perform surprisingly well - thanks to
highly optimized Unix tools such as sort and (m)awk - and are often even faster than database queries for larger data sets.
BigBash is open source, and you can find more information, for instance details on the supported SQL syntax,
on the GitHub page.
1. Create a table
Every set of CSV files that you want to query with SELECT needs to be defined as a table,
just as in a normal SQL database. Create a table using a CREATE TABLE statement that mirrors the structure of your CSV file(s).
For example, if your CSV file "persons.csv" has 5 columns (id, name, street, city, country), use
CREATE TABLE persons (id, name, street, city, country);
You can add data types like INT or TEXT as well as the UNIQUE attribute (e.g. id INT UNIQUE), which can speed up the
resulting bash script and ensures correct sorting.
2. Map table to files
After creating a table, use the special MAP command to define the file(s) that should be
mapped to it. You can select a range of files using wildcard operators, or use an arbitrary
bash command whose output lines are used as table rows. For instance, to map a set of gzip-compressed CSV
files to our persons table, write
MAP persons TO 'persons_*.csv.gz' DELIMITER ',' TYPE 'GZ' REMOVEHEADER;
REMOVEHEADER denotes that the first line in every file is a header and should be ignored.
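Conceptually, this mapping behaves like a small bash reader that feeds rows into the query. A rough hand-written equivalent (only an illustration, not the code BigBash actually emits) would be:
# Decompress every matching file and drop its header line before handing the rows to the query.
for f in persons_*.csv.gz; do
  gzip -dc "$f" | tail -n +2
done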
3. The select query
Query the table using a standard SQL SELECT statement. For instance, to get the persons living in
Berlin, sorted by name, enter
SELECT name, street FROM persons WHERE city = 'Berlin' ORDER BY name;
4. Compile to a bash script
If you put these three lines into the editor and hit "Create Bash script", you get
a bash script that you can run directly on the command line to obtain the query result.
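The generated script is not reproduced here, but as a rough sketch of the kind of pipeline it boils down to (hand-written, not BigBash's exact output), the Berlin query could be expressed with standard tools as:
# Columns: id,name,street,city,country. Read all mapped files, filter on the city column,
# project name and street, and sort by name.
for f in persons_*.csv.gz; do gzip -dc "$f" | tail -n +2; done \
  | awk -F',' '$4 == "Berlin" { print $2 "," $3 }' \
  | sort -t',' -k1,1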
--This example uses the cab data set from 'http://www.nyc.gov/html/tlc/html/about/trip_record_data.shtml'
CREATE TABLE yellowCab (VendorID,tpep_pickup_datetime,tpep_dropoff_datetime,
passenger_count INT,trip_distance REAL,pickup_longitude,pickup_latitude,RatecodeID,
store_and_fwd_flag,dropoff_longitude,dropoff_latitude,payment_type,fare_amount REAL,
extra,mta_tax REAL,tip_amount REAL,tolls_amount REAL,improvement_surcharge,total_amount REAL);
--We map the table to the first million lines of the cab trips from 2015/12, which we download on the fly.
--If you want to run more queries, download the data set first (~1 GB) and use the following map command
--MAP yellowCab TO 'yellow_tripdata_2015-12.csv' DELIMITER ',' REMOVEHEADER;
MAP yellowCab TO
'curl -s -N "https://storage.googleapis.com/tlc-trip-data/2015/yellow_tripdata_2015-12.csv" | tail -n+2 | head -n 1000000'
DELIMITER ',' TYPE 'RAW';
--The select query that will be compiled to a bash one-liner: Get the average passenger count grouped by trip_distance
SELECT int(trip_distance + 0.5), SUM(passenger_count)/COUNT(*) FROM yellowCab
GROUP BY int(trip_distance + 0.5)
HAVING count(*) >= 100
ORDER BY int(trip_distance + 0.5);
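For comparison, a hand-rolled (m)awk version of this aggregation (illustrative only, not the script the compiler generates) could look like:
# Bucket trips by rounded distance ($5 = trip_distance, $4 = passenger_count),
# average the passenger count per bucket, keep buckets with at least 100 trips, sort by distance.
curl -s -N "https://storage.googleapis.com/tlc-trip-data/2015/yellow_tripdata_2015-12.csv" \
  | tail -n+2 | head -n 1000000 \
  | awk -F',' '{ d = int($5 + 0.5); sum[d] += $4; cnt[d]++ }
      END { for (d in sum) if (cnt[d] >= 100) print d, sum[d] / cnt[d] }' \
  | sort -n -k1,1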
--This example uses the MovieLens ml-100k set from 'http://grouplens.org/datasets/movielens/'
CREATE TABLE ratings (userId INT,itemId INT,rating INT,timestamp);
CREATE TABLE movies (movieId UNIQUE, movieTitle, releaseDate, videoReleaseDate,IMDbURL,
genre_unknown, genre_Action, genre_Adventure, genre_Animation, genre_Children,
genre_Comedy, genre_Crime, genre_Documentary, genre_Drama, genre_Fantasy,
genre_FilmNoir, genre_Horror, genre_Musical, genre_Mystery, genre_Romance,
genre_SciFi, genre_Thriller, genre_War, genre_Western);
--We map the tables to files directly downloaded from GroupLens
MAP ratings TO 'curl -s -N "http://files.grouplens.org/datasets/movielens/ml-100k/u.data"'
DELIMITER '\t' TYPE 'RAW';
MAP movies TO 'curl -s -N "http://files.grouplens.org/datasets/movielens/ml-100k/u.item"'
DELIMITER '|' TYPE 'RAW';
--Calculate the avg. rating and output the joined movie titles
SELECT movieTitle, SUM(rating)/COUNT(*) FROM ratings
HASH JOIN movies ON movies.movieId = ratings.itemId
GROUP BY itemId
HAVING count(*) >= 10
ORDER BY SUM(rating)/COUNT(*) desc
LIMIT 20;
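The HASH JOIN suggests that the movies table is small enough to be held in memory. Under that assumption, a hand-written awk version of the join and aggregation (a sketch, not the generated one-liner) might be:
# Load movieId -> title from the '|'-separated u.item into an array, then aggregate the
# tab-separated ratings per item, keep items with at least 10 ratings and join the titles.
awk 'NR == FNR { split($0, m, "|"); title[m[1]] = m[2]; next }
     { split($0, r, "\t"); sum[r[2]] += r[3]; cnt[r[2]]++ }
     END { for (i in sum) if (cnt[i] >= 10)
             printf "%s\t%.3f\n", title[i], sum[i] / cnt[i] }' \
  <(curl -s -N "http://files.grouplens.org/datasets/movielens/ml-100k/u.item") \
  <(curl -s -N "http://files.grouplens.org/datasets/movielens/ml-100k/u.data") \
  | sort -t$'\t' -k2,2nr | head -n 20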
--This example uses the MovieLens ml-100k set from 'http://grouplens.org/datasets/movielens/'
CREATE TABLE ratings (userId INT,itemId INT,rating INT,timestamp);
CREATE TABLE movies (movieId UNIQUE, movieTitle, releaseDate, videoReleaseDate,IMDbURL,
genre_unknown, genre_Action, genre_Adventure, genre_Animation, genre_Children,
genre_Comedy, genre_Crime, genre_Documentary, genre_Drama, genre_Fantasy,
genre_FilmNoir, genre_Horror, genre_Musical, genre_Mystery, genre_Romance,
genre_SciFi, genre_Thriller, genre_War, genre_Western);
--We map the tables to files directly downloaded from GroupLens
MAP ratings TO 'curl -s -N "http://files.grouplens.org/datasets/movielens/ml-100k/u.data"'
DELIMITER '\t' TYPE 'RAW';
MAP movies TO 'curl -s -N "http://files.grouplens.org/datasets/movielens/ml-100k/u.item"'
DELIMITER '|' TYPE 'RAW';
--The select query that will be compiled to a bash one-liner:
--Self join the ratings to get the cooccurrence count
SELECT movies.movieTitle, movies2.movieTitle, COUNT(*) FROM ratings
HASH JOIN movies ON movies.movieId = ratings.itemId
JOIN ratings AS ratings2 ON ratings.userId = ratings2.userId
HASH JOIN movies AS movies2 ON movies2.movieId = ratings2.itemId
WHERE ratings.itemId > ratings2.itemId
GROUP BY ratings.itemId, ratings2.itemId
ORDER BY COUNT(*) desc
LIMIT 10;
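Conceptually, the self join collects every pair of items rated by the same user. A compact awk sketch of the same computation (quadratic in the number of ratings per user, and not BigBash's actual output) is:
# Remember each user's rated items, emit every ordered pair (itemId > itemId2),
# count the pairs and print the 10 most frequent ones with their joined titles.
awk 'NR == FNR { split($0, m, "|"); title[m[1]] = m[2]; next }
     { split($0, r, "\t"); items[r[1]] = items[r[1]] " " r[2] }
     END {
       for (u in items) {
         n = split(items[u], it, " ")
         for (i = 1; i <= n; i++)
           for (j = 1; j <= n; j++)
             if (it[i] + 0 > it[j] + 0) pair[it[i] "\t" it[j]]++
       }
       for (p in pair) { split(p, q, "\t"); print title[q[1]] "\t" title[q[2]] "\t" pair[p] }
     }' \
  <(curl -s -N "http://files.grouplens.org/datasets/movielens/ml-100k/u.item") \
  <(curl -s -N "http://files.grouplens.org/datasets/movielens/ml-100k/u.data") \
  | sort -t$'\t' -k3,3nr | head -n 10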
--Wikipedia page statistics. IMPORTANT: set the correct data types on the last two columns, otherwise the output will be wrong
CREATE TABLE Wikipages (project_name, page, nr_of_requests INT, page_size INT);
--Map the table to the pagecount file from 2016-06-24, 8am, which is downloaded on the fly (~85 MB)
MAP Wikipages TO
'curl -s -N https://dumps.wikimedia.org/other/pagecounts-raw/2016/2016-06/pagecounts-20160624-080000.gz | gzip -dc'
DELIMITER ' ' TYPE 'RAW';
--Select the top-10 pages from the English Wikipedia, excluding special pages (titles starting with 'Special:')
SELECT page, nr_of_requests FROM Wikipages WHERE project_name='en' and index(page, 'Special:') != 1
ORDER BY nr_of_requests DESC LIMIT 10;
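A hand-written equivalent of this query (again only a sketch of the kind of pipeline the compiler produces) is short:
# Columns: project_name page nr_of_requests page_size, separated by spaces. Keep English
# Wikipedia entries whose title does not start with 'Special:' and sort by request count.
curl -s -N "https://dumps.wikimedia.org/other/pagecounts-raw/2016/2016-06/pagecounts-20160624-080000.gz" \
  | gzip -dc \
  | awk '$1 == "en" && index($2, "Special:") != 1 { print $2 "\t" $3 }' \
  | sort -t$'\t' -k2,2nr | head -n 10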
--Create your tables using the standard create table command
CREATE TABLE movies (id INT UNIQUE, title TEXT, genres TEXT);
--Map every table to a file or even to an output pipe
MAP movies TO 'movies.dat.gz' DELIMITER '::' TYPE 'GZ' REMOVEHEADER;
--The select query that will be compiled to a bash one-liner.
--You can use joins and groups, but no subselects
SELECT title FROM movies ORDER BY title LIMIT 10;
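By hand, this trivial query would reduce to something like the following (illustrative, not the compiler's output):
# '::'-separated file: drop the header, take the title column, sort it and stop after 10 rows.
gzip -dc movies.dat.gz | tail -n +2 | awk -F'::' '{ print $2 }' | sort | head -n 10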
Disclaimer: In no event do we take any responsibility for any damages arising out of the use of the generated
queries (including but not limited to loss of data and inaccuracies of the results sustained by you or third parties).
Disclaimer 2: This is kind of a hack project and certainly full of bugs. Please open an issue
here if you encounter any.