*BEWARE! A note on terminal characters before you start: apostrophe pairs can be problematic for copy/paste operations between different editors, terminal types, WordPress etc. The common standard is now UTF8, so make sure you set PuTTY correctly using the Translation option:
Even then, a paste may still render one of the two apostrophes in the wrong "direction" and fail on sed and awk command lines - it is easy to miss these visually when they don't translate correctly in a paste! e.g.
awk `{print $2}` (i.e. with backticks, which don't show in the HTML!)
awk '{print $2}'
The first will fail, the second won't (as seen in the WordPress editor but not on the Post page, which translates correctly again!). These pastes were from the same source (this page!), into PuTTY then back here again, before PuTTY was set to UTF8!
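One way to check a suspect paste (my own suggestion, not part of the original note): dump the pasted line into a scratch file and look at the raw characters; genuine ASCII quotes show up as a plain ' while curly "smart" quotes appear as multi-byte sequences:
cat -v scratch.txt
(a curly quote typically shows up as something like M-bM-^@M-^X in cat -v output)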
-
I want to separate out a list of known Trojan/virus ports to experiment with an Nmap scan, so I needed a quick way to cut out just the port numbers from a column listing of some known dodgy ports I found here:
https://www.simovits.com/trojans/trojans.html

As it is conveniently in a column format, Linux has some perfect tools to strip out exactly what you want.
The site seems to have been down since yesterday, so here's a list to practice with:
port 2 Death
port 20 Senna Spy FTP server
port 21 Back Construction, Blade Runner, Doly Trojan, Fore, Invisible FTP, Juggernaut 42 , Larva, MotIv FTP, Net Administrator, Senna Spy FTP server, Traitor 21,
WebEx, WinCrash
port 22 Shaft
port 23 Fire HacKer, Tiny Telnet Server - TTS, Truva Atl
port 25 Ajan, Antigen, Email Password Sender - EPS, EPS II, Gip, Gris, Happy99, Hpteam mail, I love you, Kuang2, Magic Horse, MBT (Mail Bombing Trojan),
Moscow Email trojan, Naebi, NewApt worm, ProMail trojan, Shtirlitz, Stealth, Tapiras, Terminator, WinPC, WinSpy
port 31 Agent 31, Hackers Paradise, Masters Paradise
port 41 Deep Throat, Foreplay or Reduced Foreplay
port 48 DRAT
port 50 DRAT
OR get the txt file already done here:
As I want the port numbers from the 2nd column only at this point, I selected and copied all the text from the browser, and pasted it into a text file on the command line:
vi /badportsBIG.txt
(If you don't want to do all the examples below in depth, a summary of the commands used to quickly convert the port columns to CSV text, with no duplicates, is on the Notepad page.)
Press the "i" key in vim to enter Insert mode, then the middle mouse button (or both side buttons together) to paste all the text in:

Now you just have to strip out columns 1 and 3, which are nicely space delimited, by choosing only column 2. I love Linux's ability to do this kind of operation:
awk '{print $2}' /badportsBIG.txt
This prints to STDOUT on the screen, leaving the original file intact.

Just redirect this output to a new file to keep the data:
awk '{print $2}' /badportsBIG.txt > /badportsBIGports.txt
Now you have a text file with just the port numbers in.
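(A quick sanity check, if you like - not strictly necessary: count the lines and peek at the first few to confirm only port numbers came through:)
wc -l /badportsBIGports.txt
head -5 /badportsBIGports.txt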
From research there appears to be no easy way for Nmap to read port numbers from a file - only hosts/IP addresses etc. I find that hard to believe, but so it seems: the port numbers have to follow the -p switch, comma delimited, placed either before or after the target, in the form e.g.:
nmap -p 123,124,125,148 192.168.1.1
(A year later I found the simple answer to that poor assumption about pulling in file content, using the $( ) command substitution format - Linux can manipulate data how you want, if you know what you're doing!)
This is a pain, because commas now have to be inserted between the file's port numbers, and the newline delimiters removed, so they all sit on one long line that can be pasted onto Nmap's command line.
How do you insert the commas after each port number, then remove the carriage returns/whitespace?
Note: you have to think like a programmer here about the order you do things in. If you removed the whitespace first, you would be left with one massive number, and you couldn't put the commas in afterwards without severe pain.
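Just to see that pitfall in action (a quick sketch using tr, which gets covered properly further down this post), stripping the newlines first mashes all the digits together into one useless number:
tr -d '\n' < /badportsBIGports.txt
With the small practice list above, that would print something like 2202122232531414850.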
Research the net as usual or check the UNIX: Database Approach book I mentioned a few Posts back.
I originally found that book back in 2001 in New Zealand. It is big on fundamental programmer's tools like sed, awk, cut, diff and sort for manipulating text at a character level (databases were written in column and row formats, needing character definitions etc., on the command line only before GUIs back then, as they still can be in SQL/MySQL etc. now), but I never practised with them.
Now's the time for a review with an actual real goal at hand.
There are some sed and awk examples for commas here:
Let's try one on the initial file. This example also splits off the second column, but does the whole required process in one go, provided cat is used to read my file FIRST, so:
awk -vORS=, '{ print $2 }' file.txt | sed 's/,$/\n/'
becomes:
cat /badportsBIG.txt | awk -vORS=, '{ print $2 }' | sed 's/,$/\n/'
which outputs:

Just what is required to paste the port numbers to nmap. That is so cool!
Let's save that STDOUT to a new file:
cat /badportsBIG.txt | awk -vORS=, '{ print $2 }' | sed 's/,$/\n/' > /badportsBIGcommas.txt
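(In case the -vORS=, part looks like magic: ORS is awk's Output Record Separator, normally a newline, so setting it to a comma makes every print end with "," instead of starting a new line; the final sed then swaps the one stray trailing comma for a newline. A tiny demo:)
printf '2\n20\n21\n' | awk -vORS=, '{ print $1 }'
2,20,21,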
Let's just try nmap with those ports on only one host to start:
nmap 192.168.1.1 -p
0,1,2,5,11,16,17,18,19,20,21,22,23,25,27,28,30,31,37,39,41,44,51,52,53,54,66,69,69,70,79,80,80,81,101,102,103,105,107,109,110,111,113,120,121,123,137,137,138,139,143,146,146,166,170,171,200,201,202,211,212,221,222,230,231,232,285,299,334,335,370,400,401,402,411,420,443,445,455,511,513,514,515,520,555,564,589,600,623,635,650,661,666,667,668,669,680,692,700,777,798,808,831,901,902,903,911,956,991,992,999,1000,1001,1005,1008,1010,1011,1012,1015,1016,1020,1024,1025,1025,1026,1026,1027,1028,1028,1029,1029,1030,1031,1031,1032,1032,1033,1034,1035,1036,1037,1039,1041,1042,1042,1043,1044,1044,1047,1049,1052,1053,1054,1080,1081,1082,1083,1092,1095,1097,1098,1099,1104,1111,1111,1115,1116,1116,1122,1122,1133,1150,1151,1160,1166,1167,1170,1180,1183,1183,1200,1201,1207,1208,1212,1215,1218,1219,1221,1222,1234,1243,1245,1255,1256,1272,1313,1314,1349,1369,1386,1415,1433,1441,1492,1524,1560,1561,1600,1601,1602,1703,1711,1772,1772,1777,1826,1833,1834,1835,1836,1837,1905,1911,1966,1967,1978,1981,1983,1984,1985,1985,1986,1991,1999,2000,2000,2001,2001,2002,2002,2004,2005,2023,2060,2080,2101,2115,2130,2140,2140,2149,2150,2156,2222,2222,2281,2283,2300,2311,2330,2331,2332,2333,2334,2335,2336,2337,2338,2339,2339,2343,2345,2407,2418,2555,2565,2583,2589,2600,2702,2702,2772,2773,2774,2800,2929,2983,2989,3000,3006,3024,3031,3119,3128,3129,3131,3150,3150,3215,3215,3292,3295,3333,3333,3410,3417,3418,3456,3459,3505,3700,3721,3723,3777,3791,3800,3801,3945,3996,3996,3997,3999,4000,4092,4128,4128,4156,4201,4210,4211,4225,4242,4315,4321,4414,4442,4444,4445,4447,4449,4451,4488,4567,4653,4666,4700,4836,5000,5001,5002,5005,5011,5025,5031,5032,5050,5135,5150,5151,5152,5155,5221,5250,5321,5333,5350,5377,5400,5401,5402,5418,5419,5419,5430,5450,5503,5534,5550,5555,5555,5556,5557,5569,5650,5669,5679,5695,5696,5697,5742,5802,5873,5880,5882,5882,5888,5888,5889,5933,6000,6006,6267,6400,6521,6526,6556,6661,6666,6666,6667,6667,6669,6670,6697,6711,6712,6713,6714,6715,6718,6723,6766,6766,6767,6767,6771,6776,6838,6891,6912,6969,6970,7000,7001,7007,7020,7030,7119,7215,7274,7290,7291,7300,7301,7306,7307,7308,7312,7410,7424,7424,7597,7626,7648,7673,7676,7677,7718,7722,7777,7788,7788,7789,7800,7826,7850,7878,7879,7979,7983,8011,8012,8012,8080,8090,8090,8097,8100,8110,8111,8127,8127,8130,8131,8301,8302,8311,8322,8329,8488,8489,8489,8685,8732,8734,8787,8811,8812,8821,8848,8864,8888,9000,9090,9117,9148,9301,9325,9329,9400,9401,9536,9561,9563,9870,9872,9873,9874,9875,9876,9877,9878,9879,9919,9999,10000,10000,10001,10002,10003,10008,10012,10013,10067,10067,10084,10084,10085,10086,10100,10100,10167,10167,10498,10520,10528,10607,10666,10887,10889,11000,11011,11050,11051,11111,11223,11225,11225,11660,11718,11831,11977,11978,11980,12000,12310,12321,12321,12345,12345,12346,12348,12349,12361,12362,12363,12623,12623,12624,12631,12684,12754,12904,13000,13013,13014,13028,13079,13370,13371,13500,13753,14194,14285,14286,14287,14500,14501,14502,14503,15000,15092,15104,15206,15207,15210,15382,15432,15485,15486,15486,15500,15512,15551,15695,15845,15852,16057,16484,16514,16514,16515,16515,16523,16660,16712,16761,16959,17166,17449,17499,17500,17569,17593,17777,18753,19191,19216,20000,20001,20002,20005,20023,20034,20331,20432,20433,21212,21544,21554,21579,21957,22115,22222,22223,22456,22554,22783,22784,22785,23000,23001,23005,23006,23023,23032,23321,23432,23456,23476,23476,23477,23777,24000,24289,25002,25002,25123,25555,25685,25686,25799,25885,25982,26274,26681,27160,27184,27184,27373,27374,27379,27444,27573,27665,28218,28431,28678,29104,29292,29559,29589,29589,29891,
29999,30000,30001,30005,30100,30101,30102,30103,30103,30133,30303,30331,30464,30700,30947,31320,31320,31335,31336,31337,31337,31338,31338,31339,31339,31340,31340,31382,31415,31416,31416,31557,31745,31785,31787,31788,31789,31789,31790,31791,31791,31792,31887,32000,32001,32100,32418,32791,33270,33333,33545,33567,33568,33577,33777,33911,34312,34313,34324,34343,34444,34555,35000,35555,35600,36794,37237,37651,38741,38742,40071,40308,40412,40421,40422,40423,40425,40426,41337,41666,43720,44014,44014,44444,44575,44767,44767,45092,45454,45632,45673,46666,46666,47017,47262,47698,47785,47785,47891,48004,48006,48512,49000,49683,49683,49698,50000,50021,50130,50505,50551,50552,50766,50829,50829,51234,51966,52365,52901,53001,54283,54320,54321,55165,55555,55665,55666,56565,57163,57341,57785,58134,58339,59211,60000,60001,60008,60068,60411,60551,60552,60666,61115,61337,61348,61440,61603,61746,61746,61747,61747,61748,61979,62011,63485,64101,65000,65289,65421,65422,65432,65432,65530,65535
Starting Nmap 6.40 ( https://nmap.org ) at 2015-08-21 12:52 BST
WARNING: Duplicate port number(s) specified. Are you alert enough to be using Nmap? Have some coffee or Jolt(tm).

Ok, what's it being sarcastic about?
It's right: on close inspection the list HAS got many duplicate ports, though nmap still works, e.g.:
80,80,8090,8090,61746,61746 etc.
For now though, before searching for more stripping tools, part of the output from the nmap scan for all these ports across 192.168.1.1-254 gave:
Nmap scan report for 192.168.1.244
Host is up (0.000062s latency).
Not shown: 787 closed ports
PORT STATE SERVICE
22/tcp open ssh
80/tcp open http
139/tcp open netbios-ssn
445/tcp open microsoft-ds
This scan shows the words closed or filtered in all cases (good!), but if you want to find any PCs that have ANY of these ports open (bad!?) for further investigation, then pipe the nmap line into grep "open":
nmap 192.168.1.1-254 -p 0,1,2 | grep open
You can then decide from the output which ports on that list are legit and which are suspect, e.g.:
21/tcp open ftp
22/tcp open ssh
23/tcp open telnet
80/tcp open http
443/tcp open http
515/tcp open printer
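Incidentally, if your nmap version supports it, the --open option makes nmap show only open (or possibly open) ports in its output, which saves the grep, e.g.:
nmap 192.168.1.1-254 -p 21,22,23,80,443,515 --open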
-----
So, what tools can remove those duplicates? Uniq!
e.g.:
This leaves the original file unchanged, so you need to redirect the fixed output to another file, e.g.:
cat /aabbcc.txt
a
a
b
b
c
c
d
d
uniq /aabbcc.txt
a
b
c
d
uniq /aabbcc.txt > /abcd.txt
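One caveat worth knowing: uniq only removes duplicate lines that are adjacent to each other. That happens to be fine here, because the Trojan list is already sorted by port number, but for an unsorted list you would sort it numerically first, e.g.:
sort -n /badportsBIGports.txt | uniq
(or sort -nu for the same result in one command)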
-------
Anyway...I don't want to mess up my good original at this point so I'll back it up:
cp -v /badportsBIGcommas.txt /badportsBIGcommas.bak
This sed line is supposed to remove duplicate words, but maybe only if they are whitespace delimited, so it won't work for comma separated text:
sed -ri 's/(.* )1/1/g' /badportsBIGcommas.bak
bash: syntax error near unexpected token `('
Hmm...it doesn't even parse correctly - maybe due to UTF8 chars in Linux, or a keymap error for the apostrophes:
Maybe it means this:
sed -ri 's/(.* )1/1/g'
as it completed with the new apostrophes. It did not change anything though, as seen by looking for known dupes (e.g. port 80) in cat /badportsBIGcommas.bak:
cat /badportsBIGcommas.bak | grep 80
No change as there are 2 consecutive port 80s showing:

Let's check that command works for white space delimited words from his original example:
vi /abc.txt
This is is how it works buddy
What else else you want
sed -ri 's/(.* )1/1/g' /abc.txt
This is is how it works buddy
What else else you want
Nope.
This turned out to be another monumental search for such a seemingly simple operation! It may have to be done with a script containing many tools?
Best read the info page:
info sed
It may have been better to remove dupes before the comma addition operation too...
However, it gave a chance to learn some sed format commands from my UNIX:DB book, to see how this stream editor works, and maybe find out why that command above is flawed. Options that can be used with sed include:
i inserts before a line
a appends after a line
c changes line
d deletes line
p prints line STDOUT
q quits after reading up to addressed line
r fname places content of filename
= prints line number addressed
s/s1/s2/ Substitutes string s1 by string s2
y/s1/s2/ Transforms characters in a line by mapping each char in s1 to its counterpart in s2
Now you can see what the example above is driving at in that substitution.
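A couple of those options can be tried straight away on the 2-line /abc.txt test file from earlier, just to see the line addressing in action (a minimal sketch, GNU sed syntax assumed):
sed '1d' /abc.txt
What else else you want
sed '1a appended line' /abc.txt
This is is how it works buddy
appended line
What else else you want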
sed evolved from the earlier Unix text editors ed and then ex, which use an address space context to identify lines, so for the 2-line example in /abc.txt, you can show the whole file like cat would:
HPMint stevee # sed '2q' /abc.txt
This is is how it works buddy
What else else you want
Where q quits after outputting the requested number of lines; p would print them out instead.
This is similar to:
HPMint stevee # head -1 /abc.txt
This is is how it works buddy
HPMint stevee # tail -1 /abc.txt
What else else you want
HPMint stevee # more -1 /abc.txt
This is is how it works buddy
--More--(55%)
or the initial cursor position for the line number in vim:
HPMint stevee # vi +1 /abc.txt
But a major difference is that sed sends its operations and operands to STDOUT by default, so it will show the lines to be operated on as well as the result - printing them again if the p option is used - so you get line doubling in the output, e.g.:
HPMint stevee # sed 'p' /abc.txt
This is is how it works buddy
This is is how it works buddy
What else else you want
What else else you want
To negate this, use the -n (no automatic printing) switch; with no address to identify a specific line, it shows the whole file, up to max buffer memory:
HPMint stevee # sed -n 'p' /abc.txt
This is is how it works buddy
What else else you want
The line addressing format shows now in:
HPMint stevee # sed -n '1p' /abc.txt
This is is how it works buddy
HPMint stevee # sed -n '2p' /abc.txt
What else else you want
sed also responds to the NOT operator "!", which gives the negative of the line 1 address above, showing line 2 instead, or vice versa, e.g.:
HPMint stevee # sed -n '1!p' /abc.txt
What else else you want
HPMint stevee # sed -n '2!p' /abc.txt
This is is how it works buddy
So, what about the initial example to replace s1 with s2 for the "else" word?
Can it be understood now from the above research? I still need to know the meaning of the "." which I think, from a long time ago, means any single character? So "." is any single char, and ".*" is any combination of zero or more of them, because the * is included...?
His example was:
sed -ri 's/(.* )1/1/g' /abc.txt
From the man page:
s/regexp/replacement/
Attempt to match regexp against the pattern space. If successful, replace that portion matched with replacement. The replacement may contain the special character & to refer to that portion of the pattern space which matched, and the special escapes \1 through \9 to refer to the corresponding matching sub-expressions in the regexp.
It looks to me like another "programmer" type dude NOT checking his work again...
The subexpressions \1 through \9 mentioned there seem to be what he means, but he left out the backslashes! The g is a global flag, so the substitution affects all instances of the pattern in the addressed line.
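A quick standalone way to see the backreference at work (a minimal sketch, not from his page):
echo "foo foo bar" | sed -r 's/(foo )\1/\1/'
foo bar
The (foo ) group is captured, \1 in the pattern demands an immediate repeat of it, and the replacement keeps just the single copy.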
Now it works - though not exactly, as it also seems to have removed a "d" from "buddy". But first, the progression from no change (with the missing backslashes) to the desired outcome:
HPMint stevee # sed -ri 's/(.*)1/1/g' /abc.txt
HPMint stevee # cat /abc.txt
This is is how it works buddy
What else else you want
HPMint stevee # sed -ri 's/(.*)\1/\1/g' /abc.txt
HPMint stevee # cat /abc.txt
This is how it works budy
What else you want
A missing d also in buddy! I'll just repeat that to see if it does it again, with a ? to show it's my text:
HPMint stevee # cat /abc.txt
This is is how it works buddy
What else else you want?
HPMint stevee # cat /abc.txt
This is how it works budy
What else you want?
Well, it removed one "else" and one "is", so I assume the (.*)\1 pattern removes 1 of 2 repeated regular expressions in ANY matching string on all lines, due to the global "g". That also explains the missing "d": the repeated "d" in "buddy" matches the same pattern, so one of them gets swallowed too.
Seems he could have confined it to a line at a time, say by addressing line 2, from what's been learned so far - ah! like this! (Smartass!)
HPMint stevee # sed -ri '2s/(.*)\1/\1/g' /abc.txt
HPMint stevee # cat /abc.txt
This is how it works buddy
What else you want?
Still, his example was the best I found all day for this type of operation, until reading my book at home, and I have still not found anything this simple to remove duplicate numbers in a CSV, single-line file, which is what I need here.
So, my nostalgic £5 spent on Sumitabha Das' UNIX:DB book has paid off after all. I hope he gives a number substitution example.
If I need to change the commas to something else first, to get back to a column view so other tools like "tr" may work more easily with lines, here is another sed example for changing chars on all lines of a file - say a ";" for a "," - for the ports file /badports.csv:

sed 's/,/;/g' /badports.csv
this changes all commas to semicolons:
Wish it was that easy for the dupe numbers...
So, I should be able to reverse the very first column-to-commas example above, by replacing all commas (or semicolons now) with newlines (\n) using that, eh?
Yep!
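(The exact line used isn't shown, but it was presumably something like this - GNU sed treats \n in the replacement as a literal newline, and the output filename matches the one grepped below:)
sed 's/,/\n/g' /badportsBIGcommas.txt > /badportscolumnreal.txt
cat /badportscolumnreal.txt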
65432
65432
65530
65535
Sorted! As you might say...
Now that it is back in its original column format, does uniq work to remove the dupe numbers? Seems so! My test dupe, 61746 from above, does not show twice when grepped:
HPMint stevee # cat /badportscolumnreal.txt | uniq | grep 61746
61746
Let's make the file, finally, with no dupe numbers:
cat /badportscolumnreal.txt | uniq > /badportscolumn.txt
Check for only 1 port 80 too:
HPMint stevee # cat /badportscolumn.txt | grep 80
80
680
Finally, convert this back to a CSV, changing the awk line from above to reflect a single column:
cat /badportscolumn.txt | awk -vORS=, '{ print $1 }' | sed 's/,$/\n/'
Note this leaves a trailing comma:
65432,65530,65535,
Once the extra blank last line I had missed in the original file was fixed, there is no trailing comma:
65432,65530,65535
HPMint stevee # cat /badportscolumn.txt | awk -vORS=, '{ print $0 }' | sed 's/,$/\n/'
I'm still confused here (it doesn't take much!), as I thought the 2nd argument replaced the first in sed? It looks as if newlines should be replacing commas here, yet they are not - and it's still working?
Hmm - look at which stage does what, in order. First cat, which gives the column listing:
65432
65530
65535
HPMint stevee # cat /badportscolumn.txt
Then with awk you get the commas, still with one trailing, even though the original column file's last blank line has been fixed - either $0 or $1 will do:
65432,65530,65535,HPMint stevee # cat /badportscolumn.txt | awk -vORS=, '{ print $1 }'
With sed added, it removes the last comma - so that is where the newline is replacing the final comma ONLY:
65000,65289,65421,65422,65432,65530,65535
HPMint stevee # cat /badportscolumn.txt | awk -vORS=, '{ print $1 }' | sed 's/,$/\n/'
Could sed alone have done this column-to-commas substitution in the first place? It's not been easy to find out how, because of the different ways Carriage Returns and Line Feeds are handled, but this example placed a comma at each line end:
https://www.canbike.org/information-technology/sed-delete-carriage-returns-and-linefeeds-crlf.html
sed 's/$/,/g' /badportscolumn.txt
So sed combined with cut can do all required from the start, except remove the final comma:
cut -d " " -f2 BadPortsBig.txt | sed 's/$/,/g'
Note this uses the regular expression anchor "$" to signify the end-of-line position (just as ^ anchors the first character of a line), i.e. just as both
grep "^" /etc/passwd
grep "$" /etc/passwd
would print the whole file, as every line has to have a first and a last character in it, so all lines are matched in both cases!
Anyway, this still leaves the final comma problem. Not a real issue for my current task, as the port list has to be pasted to nmap anyway, there being no easy way to include the port list after nmap's -p switch.
65422,
65432,
65530,
65535,
So, for pedantic completeness, the 2nd complex loop example works:
where:
:a - create a label 'a'
N - append the next line to the pattern space
$! - if not the last line
ba - branch (go to) label 'a'
s - substitute
/\n/ - regex for new line
/<text>/ - the replacement text "<text>" (a "," in this case)
g - global match (as many times as it can)
sed ':a;N;$!ba;s/\n/,/g' /badportscolumn.txt
65422,65432,65530,65535
In total then, from the first file, to last removed comma, use:
cut -d " " -f2 BadPortsBig.txt | sed ':a;N;$!ba;s/\n/,/g'
The filter "tr" can do the bulk of it most easily, right from the start.
man tr
NAME
tr - translate or delete characters
SYNOPSIS
tr [OPTION]... SET1 [SET2]
DESCRIPTION
Translate, squeeze, and/or delete characters from standard input, writing to standard output.
-c, -C, --complement
use the complement of SET1
-d, --delete
delete characters in SET1, do not translate
-s, --squeeze-repeats
replace each input sequence of a repeated character that is
listed in SET1 with a single occurrence of that character
-t, --truncate-set1
first truncate SET1 to length of SET2
--help display this help and exit
--version
output version information and exit
The format is as simple as can be, with the 2nd expression replacing the first:
cat /badportscolumn.txt
65432
65530
65535
tr '\n' ',' < /badportscolumn.txt
61747,61748,61979,62011,63485,64101,65000,65289,65421,65422,65432,65530,65535,
Note the trailing comma, but how easy is that in one go?!
An instant column to CSV file conversion!
To save this output as a file, you need to redirect STDOUT into a text file as usual, e.g.:
tr '\n' ',' < /badportscolumn.txt > /badportsTR.txt
cat /badportsTR.txt
65530,65535,HPMint stevee # cat /badportsTR.txt
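If that trailing comma needs to go as well, the same sed idiom used after awk earlier can be bolted on, e.g.:
tr '\n' ',' < /badportscolumn.txt | sed 's/,$/\n/'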
Each first and second argument can contain multiple chars inside each apostrophe pair, with each char mapping to its position counterpart in the other set, e.g. a to c and b to d, so c replaces a, and d replaces b. (The | pipe used below is not a separator - tr simply maps it to itself, since it sits in the same position in both sets - so it could be left out entirely.):
cat /TRfile.txt
abaabbaaabbb
cdccddcccdd
tr 'a|b' 'c|d' < /TRfile.txt
cdccddcccddd
cdccddcccddd
You can see all the 2nd arguments - c and d - have replaced all the first arguments - a and b.
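To see the plain position-by-position mapping without the pipes (a minimal sketch):
echo 'abaabb' | tr 'ab' 'cd'
cdccdd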
Another example from the man page:
-d, --delete
delete characters in SET1, do not translate
This should just strip the a's and b's out of the first line - the -d option works on 1 set at a time only:
cat /TRfile.txt
abaabbaaabbb
cdccddcccddd
HPMint stevee # tr -d 'a|b' < /TRfile.txt
cdccddcccddd
or vice versa for 1 set only:
tr -d 'c|d' < /TRfile.txt
abaabbaaabbb
or more chars:
HPMint stevee # tr -d 'a|b|c' < /TRfile.txt
dddddd
SUMMARY - the one-liners from start to finish for this problem:
cut -d " " -f2 badportsbig.txt | sed ':a;N;$!ba;s/\n/,/g'
awk '{print $2}' badportsbig.txt | sed ':a;N;$!ba;s/\n/,/g'
And to scan those ports (against the single example host 192.168.1.1, say):
awk '{print $2}' badportsbig.txt | uniq | sed ':a;N;$!ba;s/\n/,/g' > ports.txt
nmap -p $(cat ports.txt) 192.168.1.1
O_Reilly_-_sed___awk_2nd_Edition.pdf
Awk as a programming tool using contents of {} as a "script":
awk -F: '{print $0 }' /etc/passwd
Sed as an editor using a search and replace function (this example substitutes the first ":" on each line with itself, so the output is unchanged - it just shows the syntax):
sed 's/:/:/' /etc/passwd

