Tech and stuff
Recently I was working on some tests for which I wanted to generate test data out of small CSV files. In order to easily prepare test data and have it in human readable form I wanted it to be nicely tabulated in the code. Quick search on Google returned couple hits to very neat approach that looked something like this.
cat test.csv | column -t -s, | less -S
cat
- sends file content to standard output
column -t -s,
- organizes lines delimited by ,
into columns
less -S
- partially reads input ignoring line overflow wrapping
This will work great most of the time when there are no missing values. If that’s not the case, the result may not be what you expected. Consider this CSV file content:
id,date,amount
1,2019-01-10,20.00
2,2019-01-11,10.50
3,,35.10
4,2019-01-24,15.75
5,,13.25
Using our command above our result will be:
id date amount
1 2019-01-10 20.00
2 2019-01-11 10.50
3 35.10
4 2019-01-24 15.75
5 13.25
If there is a missing value somewhere in the middle, value from the next column shifts back into its place. This is a simple example but with more columns and especially where columns have same data type it would be hard to tell which value in a row is actually missing.
What we have to do here is add extra space character between each delimiter (comma in this case).
Let’s try this improved command:
cat test.csv | sed -e 's/,/ ,/g' | column -t -s, | less -S
sed -e 's/,/ ,/g'
- for each line replaces ","
with " ,"
Result:
id date amount
1 2019-01-10 20.00
2 2019-01-11 10.50
3 35.10
4 2019-01-24 15.75
5 13.25
Yes, much better!