Tweaking uniq -c
By Bob Mesibov, published 28/08/2017 in Tutorials
I often sort lists on the command line and get the frequencies of listed items with the command chain sort | uniq -c. For example, suppose I have a list of "brown" surnames in a file called demo:
OK, so there are 7 Brauns in the list, 3 Brouns, and so on. But notice the indentation of the numbers? The default behaviour of uniq -c is to right-justify the frequency in a field 7 characters wide, then separate the frequency from the item with a single space. And you can't change that behaviour without re-coding uniq.
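The padding is easier to see if the spaces are made visible. The sketch below uses a small made-up list of surnames (an assumption for illustration, not the demo file) and swaps every space for a dot. The 7-character field is what GNU uniq produces; other implementations may pad to a different width.

```shell
# Hypothetical sample input, generated inline; not the author's demo file.
# Default output: counts are right-justified with leading spaces.
printf 'Brown\nBraun\nBrown\nBroun\nBrown\n' | sort | uniq -c

# Same pipeline, with each space replaced by a dot so the padding shows.
printf 'Brown\nBraun\nBrown\nBroun\nBrown\n' | sort | uniq -c | tr ' ' '.'
```

Counting the dots in front of each number shows how wide the count field is on your system.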
For my purposes I'd really prefer to have each frequency left-justified starting at the beginning of the line, then a tab character, then the item, like this:
How did I do that? With an alias, uniqc, added to my .bashrc file:
alias uniqc="uniq -c | sed 's/^[ ]*//;s/ /\t/'"
The alias starts with uniq -c and pipes the output to a sed command. The first part of the command, 's/^[ ]*//', deletes the leading spaces in the line. The second part, 's/ /\t/', replaces the first space in the now-edited line (the space just after the leading number) with a tab character.
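Aliases are normally only expanded in interactive shells, so in a script the same idea can be written as a shell function. The version below is a sketch of a variant, not the alias itself: it inserts a literal tab generated by printf instead of relying on sed to interpret \t (GNU sed does, but some other sed implementations do not). The sample surnames are again made up for illustration.

```shell
# Function variant of the uniqc alias (a sketch, not the author's alias).
# "$@" passes any extra arguments, such as a file name, through to uniq.
uniqc() {
    # Strip leading spaces, then turn the first remaining space into a tab.
    # $(printf '\t') expands to a literal tab inside the sed script.
    uniq -c "$@" | sed "s/^[ ]*//;s/ /$(printf '\t')/"
}

# Hypothetical sample input; prints count, tab, surname on each line.
printf 'Brown\nBraun\nBrown\nBroun\nBrown\n' | sort | uniqc
```

Because only the first space is replaced, items that themselves contain spaces keep them intact after the tab.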