Swap With Dell 9300

Friday, January 15, 2010

Fun with grep and sed

Posted on 12:19 PM by Unknown
At work we have several Java files whose Javadocs contain links that are not wrapped in <a href> tags, so I wanted to convert those plain links into hyperlinks. We wanted to convert only links introduced by "Automates ", that is, the word followed by one or more links that each end with a number. Example: "Automates http://something/12345 and http://something/67890 but not http://something/54321". I wanted to do the conversion with a single bash command line (trying to avoid writing a bash script). While tackling the problem I learnt a few things that I want to share and record here, so that I can look back at them when I forget.

To start with, I needed to find all the files containing "Automates http://". I just wanted the names of the files containing that string, so grep comes to the rescue, with the -l switch to list just the filenames instead of all the lines that match.

grep -R -l "Automates http://" *
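To see what -l does in isolation, here is a small sketch that runs in a scratch directory. The directory layout and the file names Match.java and NoMatch.java are made up for illustration and are not from the real code base.

```shell
# Build a throwaway directory so nothing real is touched (all names are hypothetical).
demo=$(mktemp -d)
mkdir -p "$demo/src"
printf '/** Automates http://something/353571 */\n' > "$demo/src/Match.java"
printf '/** See http://something/353571 */\n' > "$demo/src/NoMatch.java"

# -R recurses into subdirectories; -l prints only the names of files
# that contain at least one matching line.
matches=$(cd "$demo" && grep -R -l "Automates http://" .)
echo "$matches"   # ./src/Match.java
```

Only the file with the "Automates " prefix is listed; the other file has the same link but does not match the search string.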

Then it is time to replace the links with <a href=link>link</a>, but only on those lines containing "Automates http://". For this I want the line number of every matching line in every file. Getting the filename and line number is easy with grep: use the -H switch to get the filename and the -n switch to get the line number. Here is an example

grep -R -H -n "Automates http://" *

The output of the above command looks something like this

/home/chandanp/temp/temp.java:73:  /** Automates http://something/353571 */
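As a rough illustration of the two switches, the sketch below runs grep on a single scratch file, where -H makes a visible difference (when grep is given only one file it normally leaves the filename out). The scratch directory and its contents are assumptions for the demo.

```shell
# Scratch file with the interesting text on line 2 (contents are made up).
demo=$(mktemp -d)
printf 'line one\n/** Automates http://something/353571 */\n' > "$demo/temp.java"

# -n alone: line number and matching line, no filename for a single file.
without_h=$(cd "$demo" && grep -n "Automates http://" temp.java)
# -H -n: filename, line number and matching line, separated by colons.
with_h=$(cd "$demo" && grep -H -n "Automates http://" temp.java)

echo "$without_h"   # 2:/** Automates http://something/353571 */
echo "$with_h"      # temp.java:2:/** Automates http://something/353571 */
```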

To replace the link with a hyperlink we can use sed. All we need is the filename, the line number and the string to replace. Then we use sed like so

sed -i '936s|\(http.*[0-9]\)|<a href="\1">\1</a>|g' Filename.java

Here 936 is the line number I want to change and Filename.java is the file I want to edit. The -i option edits the file in place. The more complicated part is the regex. Anything that matches the part of the regex between \( and \) is stored in a buffer, and buffers are numbered in the order their \( appears. In the example above, the first buffer holds the string that matches \(http.*[0-9]\), which is basically any link that ends with a number. To recall the buffer we use \1, which means: use the value that matched the first parenthesis pair. So the replacement will be <a href="link">link</a>, where link is the string that matched \(http.*[0-9]\). Here is an example of the change


/** Automates <a href="http://something/353571">http://something/353571</a> */
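The same substitution can be tried safely on a pipe before pointing it at a real file with -i. This is only a sketch: the second command uses two made-up one-line links to show that a leading line number (here 2) restricts the substitution to that line.

```shell
# Apply the substitution to a single line fed through a pipe (no file is edited).
line='  /** Automates http://something/353571 */'
linked=$(printf '%s\n' "$line" | sed 's|\(http.*[0-9]\)|<a href="\1">\1</a>|')
echo "$linked"
# →   /** Automates <a href="http://something/353571">http://something/353571</a> */

# With an address in front of s, only that line is touched; line 1 stays as it is.
addressed=$(printf 'http://a/1\nhttp://a/2\n' | sed '2s|\(http.*[0-9]\)|<a href="\1">\1</a>|')
echo "$addressed"
```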

Notice another thing about the way I used sed's replace command above. I used s|match|replace| instead of the usual s/match/replace/. What many people don't know is that one can use almost any character after s instead of the usual /. So you could even write s#match#replace# if you want. I used the pipe symbol because the links themselves are full of slashes.
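A quick way to convince yourself that the delimiter is a free choice is to run the same substitution three ways. The replacement host https://example is just a placeholder for this sketch.

```shell
url='http://something/12345'

# With / as the delimiter, every slash in the pattern has to be escaped.
with_slash=$(printf '%s\n' "$url" | sed 's/http:\/\/something/https:\/\/example/')
# With | or # as the delimiter, the slashes can be written as they are.
with_pipe=$(printf '%s\n' "$url" | sed 's|http://something|https://example|')
with_hash=$(printf '%s\n' "$url" | sed 's#http://something#https://example#')

echo "$with_slash"   # https://example/12345
echo "$with_pipe"    # https://example/12345
echo "$with_hash"    # https://example/12345
```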

Now that we can replace each individual line of each file, we somehow have to combine the previous grep output with this sed command. That was tricky. First we need to break up the output of the grep command into individual filenames and line numbers and then give those to sed. Well, xargs, cut and sed to the rescue. We use the fact that the filename and line number are delimited by : and play some tricks

grep -R -l "Automates http://" * | xargs -I{} grep -H -n "Automates http://" {} | cut -f-2 -d: | sed "s/\(.*\):\(.*\)/filename is \1 and line number is \2/"

Basically, it takes the output of the first grep, which prints the names of the files containing "Automates http://", and pipes it to xargs. xargs hands each filename to another grep that prints filename:line_number:matched_string, and that output is piped to cut, which prints only the first 2 fields delimited by :. We need the cut because the matched string also contains : (in http://), and we don't want sed to use that part of the line when splitting. Then we pipe the output of cut to another sed to print the filename and line number. Here is the output after each stage of the pipeline

$ grep -R -l "Automates http://" *
temp.java

$ grep -R -l "Automates http://" * | xargs -I{} grep -H -n "Automates http://" {}
temp.java:73:  /** Automates http://something/353571 */
temp.java:936:  /** Automates http://something/336439 and http://something/336438 */

$ grep -R -l "Automates http://" * | xargs -I{} grep -H -n "Automates http://" {} | cut -f-2 -d:
temp.java:73
temp.java:936

$ grep -R -l "Automates http://" * | xargs -I{} grep -H -n "Automates http://" {} | cut -f-2 -d: | sed "s/\(.*\):\(.*\)/filename is \1 and line number is \2/"
filename is temp.java and line number is 73
filename is temp.java and line number is 936
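The splitting step can also be looked at on its own. The sketch below feeds one of the lines above through cut, and then, as an alternative I am assuming rather than something the pipeline above uses, lets the shell's read builtin do the same split with IFS set to a colon. The variable names file, lineno and rest are made up.

```shell
sample='temp.java:936:  /** Automates http://something/336439 and http://something/336438 */'

# cut -d: -f-2 keeps fields 1 through 2, so the colons inside the URLs never matter.
first_two=$(printf '%s\n' "$sample" | cut -d: -f-2)
echo "$first_two"   # temp.java:936

# read splits on the first two colons; everything after them lands in "rest".
IFS=: read -r file lineno rest <<EOF
$sample
EOF
echo "filename is $file and line number is $lineno"
# → filename is temp.java and line number is 936
```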

The final piece of the puzzle is to turn the output of the last sed into a command and then run it. So instead of output like "filename is temp.java and line number is 73", we need sed -i '73s|\(http.*[0-9]\)|<a href="\1">\1</a>|g' temp.java. Here is the command to do just that (very complicated, with lots of backslashes and quotes, but I did not know any better :).

$ grep -R -l "Automates http://" * | xargs -I{} grep -H -n "Automates http://" {} | cut -f-2 -d: | sed "s/\(.*\):\(.*\)/sed -i \\\'\2s|\"\\\(\"http.*[0-9]\"\\\)\"|\<a href=\"\\\1\"\>\"\\\1\"\<\/a\>|\\\' \1/"
sed -i \'73s|"\("http.*[0-9]"\)"|<a href="\1">"\1"</a>|\' temp.java
sed -i \'936s|"\("http.*[0-9]"\)"|<a href="\1">"\1"</a>|\' temp.java

Then we need to execute that command using bash. Like so

grep -R -l "Automates http://" * | xargs -I{} grep -H -n "Automates http://" {} | cut -f-2 -d: | sed "s/\(.*\):\(.*\)/sed -i \\\'\2s|\"\\\(\"http.*[0-9]\"\\\)\"|\<a href=\"\\\1\"\>\"\\\1\"\<\/a\>|\\\' \1/" | xargs -I{} bash -v -c "{}"
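For comparison, here is a sketch of the same idea written as a small loop, which avoids generating a command string and therefore most of the escaping. It is not the one-liner above, just one possible alternative under a few assumptions: the scratch file and its contents are made up, the filenames contain no colons or newlines, and -i.bak is used so that a backup copy is kept next to each edited file.

```shell
# Scratch file standing in for a real source tree (names are hypothetical).
demo=$(mktemp -d)
printf '%s\n' 'class Temp {' \
  '  /** Automates http://something/353571 */' \
  '  /** See http://something/111 */' \
  '}' > "$demo/temp.java"

# Collect "filename:line" pairs first, so grep has finished before any file is edited.
hits=$(grep -R -H -n "Automates http://" "$demo" | cut -d: -f-2)

printf '%s\n' "$hits" | while IFS=: read -r file lineno; do
  [ -n "$file" ] || continue   # skip the empty line printed when there are no hits
  # Only the addressed line is rewritten; -i.bak leaves the original as temp.java.bak.
  sed -i.bak "${lineno}s|\(http.*[0-9]\)|<a href=\"\1\">\1</a>|" "$file"
done

cat "$demo/temp.java"
```

Inside double quotes the shell leaves \( and \1 alone, so the only extra escaping needed is for the two quote characters around the href value.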

Ah, finally. There is one problem, however. When there are multiple links on the same line, the .* in the regex is greedy, so sed matches everything from the first http to the last digit on the line as a single link and creates weird output like this:

Automates <a href=http://something/336439 and http://something/336438>http://something/336439 and http://something/336438</a>

I still don't have a good solution for that. Since I have just a few of these lines, I fixed them quickly using tkdiff. But does anyone know how to solve it?
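One possible direction, offered only as an untested-on-the-real-files sketch: replace .* with [^ ]*, so a match can never run across a space, and add the g flag so every link on the line is handled. It assumes links never contain spaces, and it converts every link on the line that ends in a digit, which may be more than is wanted if some links on an "Automates" line should stay plain.

```shell
line='/** Automates http://something/336439 and http://something/336438 */'

# [^ ]* stops at the first space, so each link is matched separately;
# the g flag repeats the substitution for every match on the line.
fixed=$(printf '%s\n' "$line" | sed 's|\(http[^ ]*[0-9]\)|<a href="\1">\1</a>|g')
echo "$fixed"
```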