How to differentiate bots and humans

From the perspective of an HTTP server, all it sees are HTTP headers and an IP address. The HTTP headers are trivial to forge, and the IP address is easy to hide behind a proxy. So it seems impossible to tell a user from a bot (web spider), right? Not quite. By bots, I mean both good bots (which honor robots.txt and identify themselves) and malicious bots (which don’t honor robots.txt and pretend to be human users).

You cannot tell a bot from a human with absolute certainty, but some tricks can raise your confidence in a guess.

1. User-Agent header
A browser sends a User-Agent header that identifies it to the HTTP server, like this one from Firefox:

User-Agent: Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.8.1.14) Gecko/20080404 Firefox/2.0.0.14

A good bot identifies itself too, as MSNBot does:

MSNBOT/0.1 (http://search.msn.com/msnbot.htm)

But bad bots pretend to be browsers, so this header alone cannot tell bad bots and humans apart.
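
Here is a minimal Python sketch of this check. The token list is an assumption: it holds substrings common in the User-Agents of self-identifying crawlers and is not an authoritative registry. A negative answer means nothing, because bad bots copy browser strings.

```python
# Hypothetical token list: substrings that self-identifying ("good") bots
# commonly put in their User-Agent. Adjust it to the crawlers you actually see.
BOT_TOKENS = ("bot", "spider", "crawler", "slurp")


def declares_bot(user_agent: str) -> bool:
    """Return True if the User-Agent openly identifies itself as a bot.

    False proves nothing: a bad bot simply sends a browser's User-Agent.
    """
    ua = user_agent.lower()
    return any(token in ua for token in BOT_TOKENS)


print(declares_bot("MSNBOT/0.1 (http://search.msn.com/msnbot.htm)"))  # True
print(declares_bot("Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; "
                   "rv:1.8.1.14) Gecko/20080404 Firefox/2.0.0.14"))     # False
```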

2. Cache
A browser caches resources such as images by default. If you lay out the same image on several closely related pages, a browser fetches it once, while a poorly written bot will fetch the same resource again and again from every page.
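
Here is a sketch of the cache check. It assumes you have parsed your access log into hypothetical (ip, path, status) tuples. Only full 200 downloads are counted, because a caching browser usually either skips the request or revalidates it and gets a 304 Not Modified. The threshold of 3 is an arbitrary choice.

```python
from collections import Counter
from typing import Iterable, Set, Tuple

# One access-log entry: (client IP, requested path, HTTP status code).
Request = Tuple[str, str, int]


def repeat_fetchers(requests: Iterable[Request], resource: str,
                    threshold: int = 3) -> Set[str]:
    """Return the IPs that fully downloaded `resource` at least `threshold` times.

    Only 200 responses are counted: a caching browser either skips the request
    or revalidates it (usually answered with 304 Not Modified), whereas a naive
    bot downloads the same shared image again from every page that embeds it.
    """
    counts = Counter(ip for ip, path, status in requests
                     if path == resource and status == 200)
    return {ip for ip, n in counts.items() if n >= threshold}
```

Run it once per shared resource, for example the one image you placed on all of the related pages.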

3. Invisible links
Links that cannot be seen or clicked by normal means are unlikely to be reached by the average human. A link like

<a href="http://youcannotseethislink.com/"></a>

should not get clicked by humans. If some IP does request it, a few such visitors might be real humans, but the more likely guess is a bot scraping the HTML.
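
To see requests for such a link at all, it has to point at your own server rather than at another domain. Here is a minimal sketch. It assumes a hypothetical trap path /trap/invisible that is linked only from an empty anchor, and log entries already parsed into (ip, path) pairs.

```python
from typing import Iterable, Set, Tuple

# Hypothetical trap path, linked only from an anchor with no text, e.g.
#   <a href="/trap/invisible"></a>
# Nothing is rendered, so a human has (almost) nothing to click.
TRAP_PATHS = {"/trap/invisible"}


def invisible_link_hitters(requests: Iterable[Tuple[str, str]],
                           traps: Set[str] = TRAP_PATHS) -> Set[str]:
    """Return the IPs that requested any trap path."""
    return {ip for ip, path in requests if path in traps}


log = [("10.0.0.1", "/index.html"), ("10.0.0.9", "/trap/invisible")]
print(invisible_link_hitters(log))  # {'10.0.0.9'}
```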

4. Regular expression
Since many bots use regular expressions to parse web pages, place links inside HTML comments:

<!-- <a href="http://www.grepthis.com/">About me</a> -->

Since few users open the HTML source and follow links from it, whoever requests these links is most likely a bot. You can also put links in places where they should not be, like these:

<img src="http://thisisnormallink.com" alt="http://youshouldnevercomehere.com/">

<thetag href="http://youshouldnevercomehere.com/"></thetag>

Smarter implementations will not fall for this, but a trivial regular expression will. Beware of text-browser users, though.
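
The difference between a trivial regular expression and a real HTML parser can be sketched in Python. The page below reuses the trap links from the text plus one hypothetical real link. The parser ignores comments and, like a browser, only collects <a href> and <img src>. Any URL the naive extractor finds but the parser does not is a trap worth watching for in the logs.

```python
import re
from html.parser import HTMLParser
from typing import List

PAGE = """
<!-- <a href="http://www.grepthis.com/">About me</a> -->
<img src="http://thisisnormallink.com" alt="http://youshouldnevercomehere.com/">
<thetag href="http://youshouldnevercomehere.com/"></thetag>
<a href="http://example.com/real">A real link</a>
"""


def naive_links(html: str) -> List[str]:
    """What a trivial regex-based bot sees: every http:// string in the raw text,
    including those inside comments, alt attributes and made-up tags."""
    return re.findall(r'http://[^"\s]+', html)


class FetchableLinks(HTMLParser):
    """Collects only what a browser would fetch or let a user click:
    <a href> and <img src>. Comments never reach handle_starttag."""

    def __init__(self) -> None:
        super().__init__()
        self.links: List[str] = []

    def handle_starttag(self, tag, attrs):
        wanted = {"a": "href", "img": "src"}.get(tag)
        for name, value in attrs:
            if name == wanted and value:
                self.links.append(value)


def parsed_links(html: str) -> List[str]:
    collector = FetchableLinks()
    collector.feed(html)
    return collector.links


# Links only a naive extractor would find are the trap links to watch for.
traps = sorted(set(naive_links(PAGE)) - set(parsed_links(PAGE)))
print(traps)  # ['http://www.grepthis.com/', 'http://youshouldnevercomehere.com/']
```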

5. Regular expression - hidden typo in link protocols
Some websites are known to obfuscate links from casual users or bots by writing them as

hxxp://www.alinktosomething.com/

htp://www.alinktosomething.com/

ttp://www.alinktosomething.com/ (Japanese users love this one!)

If such links are hidden in a page and still get requested, that is a sign that a bot is extracting links with a loose regular expression.
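
Here is a sketch of why such links catch regex-driven bots. The trap hosts are hypothetical (.example is a reserved TLD). The sloppy pattern illustrates an over-eager extractor and is not code taken from any particular bot.

```python
import re

# Hidden trap links written with deliberately broken schemes (hypothetical
# hosts under the reserved .example TLD; in practice use your own domain).
HIDDEN = """
<span style="display:none">hxxp://trap1.example/</span>
<span style="display:none">htp://trap2.example/</span>
<span style="display:none">ttp://trap3.example/</span>
"""

# A careful extractor only accepts real schemes, so it finds nothing here...
STRICT = re.compile(r"https?://[^\s<]+")

# ...while a sloppy, over-eager one (hypothetical) also accepts near-miss
# schemes and "repairs" them to http://, walking straight into the trap.
SLOPPY = re.compile(r"\b(?:hxxp|htp|ttp|https?)://([^\s<]+)")


def sloppy_links(text: str) -> list:
    return ["http://" + rest for rest in SLOPPY.findall(text)]


print(STRICT.findall(HIDDEN))  # []
print(sloppy_links(HIDDEN))
# ['http://trap1.example/', 'http://trap2.example/', 'http://trap3.example/']
```

Any request for one of these trap hosts means some extractor went out of its way to "fix" a link no browser would follow.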

6. Deep-depth links
Few users are eager enough to browse meaningless links. Make a link to another page, then from that page to yet another, and repeat for several more levels. Fill these deep pages with no meaningful content: most people will quit a few levels in, but some bots will dig really deep.
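
Here is a minimal sketch of the deep-link trap. It assumes a hypothetical /deep/N URL scheme and an arbitrary cutoff of five levels for human curiosity.

```python
import re
from collections import defaultdict
from typing import Iterable, Set, Tuple

# Hypothetical URL scheme for the chain of meaningless pages: /deep/1, /deep/2, ...
DEEP_PATH = re.compile(r"/deep/(\d+)")


def deep_page(level: int) -> str:
    """A page with no real content whose only link leads one level deeper."""
    return f'<html><body><a href="/deep/{level + 1}">more</a></body></html>'


def deep_divers(requests: Iterable[Tuple[str, str]],
                max_human_depth: int = 5) -> Set[str]:
    """IPs whose deepest /deep/N request is beyond `max_human_depth`."""
    deepest: defaultdict = defaultdict(int)
    for ip, path in requests:
        m = DEEP_PATH.fullmatch(path)
        if m:
            deepest[ip] = max(deepest[ip], int(m.group(1)))
    return {ip for ip, level in deepest.items() if level > max_human_depth}
```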

7. Burst of HTTP requests
Humans might refresh a page repeatedly in a short timeframe, but that is only a single page. A human (one IP) is generally unlikely to request many different pages within a short timeframe. Weigh this lightly, though, because many users might share one IP behind a proxy. It also does not work well against distributed bots.
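
Here is a sliding-window sketch of the burst check. It counts distinct pages, so refreshing one page over and over does not trigger it. The limit and window values are hypothetical and would need tuning, especially for users behind proxies.

```python
from collections import defaultdict, deque
from typing import Deque, Dict, Tuple


class BurstDetector:
    """Flags an IP that requests more than `limit` distinct pages within `window` seconds.

    Distinct pages are counted, so a human hammering refresh on one page
    is not flagged.
    """

    def __init__(self, limit: int = 20, window: float = 10.0):
        self.limit = limit
        self.window = window
        # ip -> recent (timestamp, path) pairs, oldest first
        self.recent: Dict[str, Deque[Tuple[float, str]]] = defaultdict(deque)

    def observe(self, ip: str, path: str, timestamp: float) -> bool:
        """Record one request; return True if this IP now looks like a burst."""
        q = self.recent[ip]
        q.append((timestamp, path))
        while q and timestamp - q[0][0] > self.window:
            q.popleft()  # forget requests older than the window
        return len({p for _, p in q}) > self.limit
```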

8. Links in JavaScript
Since JavaScript emulation is still not trivial for bots, generate some links with JavaScript. An IP address that follows many static HTML links but never any of the JavaScript-generated ones could be a bot, because it cannot see links generated by JavaScript.
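
Here is a sketch of the JavaScript-link check. It assumes you know which (hypothetical) paths appear only in JavaScript-generated links, and that ten static hits are enough traffic to judge an IP by.

```python
from collections import Counter
from typing import Iterable, Set, Tuple

# Hypothetical paths that appear only in links created by JavaScript at page
# load (for instance with document.createElement("a")), never in the raw HTML.
JS_ONLY_PATHS = {"/js/news", "/js/about"}


def js_blind_ips(requests: Iterable[Tuple[str, str]],
                 js_paths: Set[str] = JS_ONLY_PATHS,
                 min_static: int = 10) -> Set[str]:
    """IPs that followed at least `min_static` static links but no JS-only link."""
    static_hits: Counter = Counter()
    saw_js: Set[str] = set()
    for ip, path in requests:
        if path in js_paths:
            saw_js.add(ip)
        else:
            static_hits[ip] += 1
    return {ip for ip, n in static_hits.items()
            if n >= min_static and ip not in saw_js}
```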

9. robots.txt
Web servers use robots.txt to tell (good) bots what not to crawl (or to tell bad bots what to crawl =] ). Humans rarely read this file, so an IP fetching it should raise a flag.
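
Here is a sketch that pulls robots.txt readers out of an access log. It assumes the common Apache/Nginx "combined" log format. Treat the result as a flag, not a verdict.

```python
import re
from typing import Iterable, Set

# Start of an Apache/Nginx "combined" access-log line, e.g.
# 10.0.0.5 - - [10/Oct/2008:13:55:40 +0800] "GET /robots.txt HTTP/1.1" 200 64 ...
LOG_LINE = re.compile(r'^(\S+) \S+ \S+ \[[^\]]+\] "(\S+) (\S+)[^"]*"')


def robots_readers(lines: Iterable[str]) -> Set[str]:
    """Return the IPs that fetched /robots.txt."""
    readers = set()
    for line in lines:
        m = LOG_LINE.match(line)
        if m and m.group(3).split("?", 1)[0] == "/robots.txt":
            readers.add(m.group(1))
    return readers
```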

10. robots.txt + unreachable link in other pages
If a path appears nowhere except as a Disallow entry in robots.txt, anyone who requests it must have read robots.txt and then ignored it. Since good bots generally honor robots.txt, such visitors are even more likely to be bad bots or hackers, with a bot being the likelier guess.
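
Here is a sketch of this trap using Python's standard urllib.robotparser. It shows that a crawler honoring the file would skip the trap path. The /secret-admin/ path is hypothetical.

```python
from typing import Iterable, Set, Tuple
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: /secret-admin/ is disallowed here and is linked
# from nowhere else on the site, so only robots.txt readers know it exists.
ROBOTS_TXT = """\
User-agent: *
Disallow: /secret-admin/
"""

rules = RobotFileParser()
rules.parse(ROBOTS_TXT.splitlines())

# A well-behaved crawler consults the rules and skips the trap path.
print(rules.can_fetch("msnbot", "/secret-admin/login"))  # False
print(rules.can_fetch("msnbot", "/index.html"))          # True


def robots_violators(requests: Iterable[Tuple[str, str]],
                     trap_prefix: str = "/secret-admin/") -> Set[str]:
    """IPs that requested a path they could only have learned from robots.txt."""
    return {ip for ip, path in requests if path.startswith(trap_prefix)}
```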

11. Browser fingerprinting
Different browsers have their own implementations, including the order of headers in an HTTP request. Say Firefox sends five headers A B C D E in this order; Internet Explorer might send them as A B D E C, and Opera as A B E D C. A request whose header order fits none of these patterns likely comes from a custom-built bot.
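
Here is a sketch of header-order fingerprinting. The fingerprints are the illustrative A..E orders from the text, not real browser data. A real table would use real header names recorded from traffic you trust.

```python
from typing import Dict, Optional, Tuple

# Illustrative fingerprints copied from the example above: each browser sends
# the same five headers, A..E, in its own order. Real fingerprints would use
# real header names recorded from traffic you trust.
FINGERPRINTS: Dict[Tuple[str, ...], str] = {
    ("A", "B", "C", "D", "E"): "Firefox",
    ("A", "B", "D", "E", "C"): "Internet Explorer",
    ("A", "B", "E", "D", "C"): "Opera",
}


def header_order(raw_request: str) -> Tuple[str, ...]:
    """Header names in the order they appear in a raw HTTP request."""
    names = []
    for line in raw_request.split("\r\n")[1:]:  # skip the request line
        if not line:  # an empty line ends the header block
            break
        names.append(line.split(":", 1)[0].strip())
    return tuple(names)


def match_browser(raw_request: str) -> Optional[str]:
    """Name of the known browser whose header order matches, or None."""
    return FINGERPRINTS.get(header_order(raw_request))


firefox_like = "GET / HTTP/1.1\r\nA: 1\r\nB: 2\r\nC: 3\r\nD: 4\r\nE: 5\r\n\r\n"
custom_bot = "GET / HTTP/1.1\r\nB: 2\r\nA: 1\r\nC: 3\r\nD: 4\r\nE: 5\r\n\r\n"
print(match_browser(firefox_like))  # Firefox
print(match_browser(custom_bot))    # None
```

A request that claims to be Firefox in its User-Agent but fingerprints as None, or as another browser, deserves a closer look.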

12. RFC conformance
If web browsers send RFC-conformant HTTP requests while poorly written bots do not, that is again a good indication of a custom-built bot. I am not exactly sure about this one. If you know more, please let me know, thank you.
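
Since this one is uncertain, here is only a sketch of what such checks might look like. The choice of checks is an assumption. The CRLF line terminator, the request-line shape and the mandatory Host header come from RFC 7230 (since superseded by RFC 9112), but the RFC itself lets servers accept a bare LF, and many servers tolerate other violations too.

```python
import re
from typing import List

# request-line = method SP request-target SP HTTP-version (RFC 7230, section 3.1.1),
# where the method is an RFC "token" (letters, digits and a few symbols).
REQUEST_LINE = re.compile(r"[!#$%&'*+\-.^_`|~0-9A-Za-z]+ \S+ HTTP/[0-9]\.[0-9]")


def conformance_problems(raw: bytes) -> List[str]:
    """List the (few) RFC violations this sketch checks for in a raw request head."""
    problems = []
    head, sep, _body = raw.partition(b"\r\n\r\n")
    if not sep:
        problems.append("header block not terminated by CRLF CRLF")
    try:
        lines = head.decode("ascii").split("\r\n")
    except UnicodeDecodeError:
        return problems + ["non-ASCII bytes in request head"]
    request_line, header_lines = lines[0], lines[1:]
    if not REQUEST_LINE.fullmatch(request_line):
        problems.append("malformed request line")
    if any(":" not in line for line in header_lines):
        problems.append("header line without a colon")
    names = {line.split(":", 1)[0].strip().lower() for line in header_lines}
    if request_line.endswith(" HTTP/1.1") and "host" not in names:
        # RFC 7230, section 5.4: HTTP/1.1 clients MUST send a Host header.
        problems.append("HTTP/1.1 request without Host header")
    return problems
```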

13. IP blocklists
If an IP is on a public IP blocklist and is acting even a little like a bot, it is fairly safe to assume it is a bot.
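
Here is a sketch of the blocklist lookup using Python's ipaddress module. The CIDR entries are RFC 5737 documentation ranges standing in for a real blocklist feed.

```python
import ipaddress

# Hypothetical blocklist entries in CIDR notation. These are RFC 5737
# documentation ranges standing in for a real public blocklist feed.
BLOCKLIST = [ipaddress.ip_network(cidr)
             for cidr in ("192.0.2.0/24", "198.51.100.7/32")]


def on_blocklist(ip: str) -> bool:
    """True if `ip` falls inside any blocklisted network."""
    addr = ipaddress.ip_address(ip)
    return any(addr in net for net in BLOCKLIST)
```

On its own this is only one signal. As the text says, it matters most when the IP is also behaving a little like a bot.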

None of these tells a human from a bot with 100% certainty, but together they should give a reasonable guess about who is behind an IP address. These are the methods I can think of for now, though I am sure many of you know more. Let me know if you know other tricks that can identify bots.
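
One way to combine the signals is a weighted score per IP. This is an assumption, not something the list above prescribes, and the signal names, weights and threshold below are all hypothetical.

```python
from typing import Iterable

# Hypothetical weights: how strongly each signal from the list above suggests
# a bot. Tune them (and the threshold) against traffic you have labelled yourself.
WEIGHTS = {
    "declared_bot_user_agent": 5.0,
    "repeat_fetches_cached_resource": 1.0,
    "hit_invisible_link": 3.0,
    "hit_comment_or_misplaced_link": 3.0,
    "hit_typo_protocol_link": 3.0,
    "deep_dive": 2.0,
    "burst": 1.0,            # weighted lightly: users behind proxies share IPs
    "ignores_js_links": 2.0,
    "read_robots_txt": 1.0,
    "ignored_robots_txt": 4.0,
    "unknown_header_order": 2.0,
    "rfc_violation": 1.0,
    "blocklisted_ip": 2.0,
}


def bot_score(signals: Iterable[str]) -> float:
    """Sum of the weights of the signals observed for one IP (unknown names count 0)."""
    return sum(WEIGHTS.get(s, 0.0) for s in set(signals))


def looks_like_bot(signals: Iterable[str], threshold: float = 4.0) -> bool:
    return bot_score(signals) >= threshold
```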

No Law, Said Chinese Cop

I figure this is not closely related to security, but I would nevertheless like to let you know what’s going on here. Today I was on one of the more touristy streets in Shanghai and took a photo, and a cop evidently wanted to stop that and approached me. Here is the conversation, translated into English.

He begins : “Kid, show me the photos.”
I replied : “What did I do that authorizes you to scrutinize my camera?”
He coldly said : “You do not need to know. Show me your photos, and your ID card.”
I replied : “Under which law, and by what authority, do you have such a privilege?”
He seriously said : “It is in your best interest NOT to know which particular law. If you do know about it, you will be in serious trouble. You should not ask any further.”
I replied : “So, there’s no law here you’re talking about?”
He gravely replied : “Kid, I am telling you again, there is NO law. And even if you knew which law we are talking about, I WILL guarantee that you would regret it deeply, and there would be nothing you could do about it. So, are you going to pursue your question? I am telling you for the LAST time: show me the photos and the ID card NOW.”

I knew he was not kidding, because he was a Chinese cop, and he could frame me for whatever he liked without fear. As an intelligent being, I knew there was no way to win on dignity, so I showed him the photos and my ID card, and of course there was nothing he wanted. He let me go shortly after a series of warnings. I felt angry, but I remembered this is China, a place where the law works even more unexpectedly than in other places such as Hong Kong.

So, whoever you are, be very careful about what you do and say in China. The bar is invisible and can zap you anytime.

URL Redirection Attack With Examples

A URL Redirection takes the browser from one URL to another. For example, if a link at

http://www.example.com/login.php?redirect=http://www.example.com/home.php

brings you to

http://www.example.com/home.php

This is a URL Redirection.

A URL Redirection Attack exploits a redirector that will send you to any page, even one outside the original website, and it is usually combined with a phishing attack. For example, the link

http://www.example.com/login.php?redirect=http://www.examp1e.com/home.php

when clicked, brings you to

http://www.examp1e.com/home.php

The destination could be a malicious page that resembles the original and tries to trick the user into giving up their credentials. Notice the “l” in example versus the “1” in examp1e, which can catch unwary users off guard. This is a URL redirection attack.
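
A common defense, swapped in here for illustration since the code of the sites discussed is unknown, is to refuse redirect targets outside an allowlist. Here is a minimal Python sketch. It assumes a hypothetical ALLOWED_HOSTS set and a redirector that falls back to the site root.

```python
from urllib.parse import urlsplit

# Hypothetical allowlist: the only hosts this site's redirector may send users to.
ALLOWED_HOSTS = {"www.example.com"}


def safe_redirect_target(target: str, default: str = "/") -> str:
    """Return `target` if it stays on this site or an allowed host, else `default`."""
    # Reject backslashes, spaces and control characters outright: browsers
    # "repair" some of these (e.g. "/\evil.example") into off-site URLs.
    if "\\" in target or any(ord(ch) < 33 or ord(ch) == 127 for ch in target):
        return default
    parts = urlsplit(target)
    if not parts.scheme and not parts.netloc:
        # A same-site path such as "/home.php" is fine.
        return target if target.startswith("/") else default
    # Absolute URLs must be plain http(s) to an allowed host; .hostname also
    # strips tricks like "http://www.example.com@evil.example/".
    if parts.scheme in ("http", "https") and parts.hostname in ALLOWED_HOSTS:
        return target
    return default
```

With this in place, redirect=http://www.examp1e.com/home.php, or a protocol-relative //evil.example/, simply lands on the home page.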

Examples

- Yahoo!

This is a Yahoo! ad link I picked at random from the Yahoo! main page:

http://us.ard.yahoo.com/SIG=152qjujd5/M=635447.12008473.12439042.9413843/D=yahoo_top/S=2716149:MKP1/_ylt=Aq5b314JJAcHbNKRSsc_Nc71cSkA;_ylg=X3oDMTA1NnVjODhvBGNjA2pw/Y=YAHOO/EXP=1214212570/L=tPTX3ES00lum1O1VR5SP5Mdozy5cEEhfTboADWZ_/B=0nxmEEWTWU0-/J=1214205370890230/A=4758808/R=0/SIG=13r3d7ici/*http://autos.yahoo.com/newcars/buy.html;_ylc=X3oDMTFjMXJjcHYxBF9TAzI3MTYxNDkEc2VjA2ZwLW1hcmtldHBsYWNlBHNsawNnYXEtdGV4dC0x

Try inserting http://1089059683/# in front of the destination URL:

http://us.ard.yahoo.com/SIG=152qjujd5/M=635447.12008473.12439042.9413843/D=yahoo_top/S=2716149:MKP1/_ylt=Aq5b314JJAcHbNKRSsc_Nc71cSkA;_ylg=X3oDMTA1NnVjODhvBGNjA2pw/Y=YAHOO/EXP=1214212570/L=tPTX3ES00lum1O1VR5SP5Mdozy5cEEhfTboADWZ_/B=0nxmEEWTWU0-/J=1214205370890230/A=4758808/R=0/SIG=13r3d7ici/*http://1089059683/#http://autos.yahoo.com/newcars/buy.html;_ylc=X3oDMTFjMXJjcHYxBF9TAzI3MTYxNDkEc2VjA2ZwLW1hcmtldHBsYWNlBHNsawNnYXEtdGV4dC0x

It effectively brings you to the Google main page. You might argue that the link already looks too suspicious for anyone to click, despite the authentic domain http://us.ard.yahoo.com/.

- Baidu

One more example: Baidu, a very well-known search engine in China. Its main page has a “Set as homepage” link:

http://utility.baidu.com/traf/click.php?id=215&url=http://www.baidu.com

which redirects you to the Baidu main page when clicked.

Now, change the url parameter as below:

http://utility.baidu.com/traf/click.php?id=215&url=http://log0.wordpress.com

This will bring you back here! ( Oh I’m sorry =) )

Just imagine this happening on some other, larger site: it could be used to phish users’ personal information, or to redirect them to malicious sites that exploit browser vulnerabilities to infect them.

Prevention

If you search for “Log0” on Yahoo!, you will find me at rank 3 (as of now). Yahoo! redirects you to me through this:

http://rds.yahoo.com/_ylt=A0oGkmQGTV9IAQUBC0hXNyoA;_ylu=X3oDMTEyMWJuc2o5BHNlYwNzcgRwb3MDMwRjb2xvA3NrMQR2dGlkA0gxMzlfNzI-/SIG=11eshlhg8/EXP=1214291590/**http%3a//log0.wordpress.com/

It is a long link; notice http://log0.wordpress.com at the end. Now let’s change that to www.google.com:

http://rds.yahoo.com/_ylt=A0oGkmQGTV9IAQUBC0hXNyoA;_ylu=X3oDMTEyMWJuc2o5BHNlYwNzcgRwb3MDMwRjb2xvA3NrMQR2dGlkA0gxMzlfNzI-/SIG=11eshlhg8/EXP=1214291590/**http%3a//www.google.com/

This brings you to a 403 Forbidden page on Yahoo! that warns you about the destination. Here, Yahoo! checks whether the link matches the one in its database. That is the protection: it verifies that the redirect destination is the one originally intended.
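
Yahoo!’s check, as described, compares the link against a database. A different technique that reaches the same goal without a lookup is to sign every destination with a server-side secret (an HMAC) and refuse any link whose signature does not match. Here is a sketch with a hypothetical key and link format.

```python
import hashlib
import hmac
from typing import Optional
from urllib.parse import quote

# Hypothetical server-side secret; never expose it in pages or links.
SECRET_KEY = b"replace-with-a-long-random-secret"


def sign(destination: str) -> str:
    """HMAC-SHA256 of the destination URL, as hex."""
    return hmac.new(SECRET_KEY, destination.encode("utf-8"), hashlib.sha256).hexdigest()


def make_redirect_link(destination: str) -> str:
    # Hypothetical link format: /r?to=<percent-encoded URL>&sig=<hex HMAC>
    return f"/r?to={quote(destination, safe='')}&sig={sign(destination)}"


def resolve_redirect(destination: str, sig: str) -> Optional[str]:
    """Return the (already URL-decoded) destination if its signature is valid.

    None means the link was tampered with: answer with a 403 or a warning page.
    """
    expected = sign(destination).encode("ascii")
    if hmac.compare_digest(expected, sig.encode("utf-8")):
        return destination
    return None
```

Swapping log0.wordpress.com for www.google.com, as above, leaves the old signature attached to a new destination, so resolve_redirect returns None.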

By now I hope you are familiar with what a URL Redirection Attack is and have an idea of how to prevent it.

Please do not use this for malicious purposes; I show these examples for educational purposes only. I have notified the webmasters of the above domains about these problems.

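UPDATED:

I forgot to explain how the Google link got in there. That is URL obfuscation. More here.

http://1089059683/ is actually the IP address of a Google web server, written as a single decimal number instead of four dotted octets:

http://64.233.187.99/ = http://www.google.com

$$((64 \times 256 + 233) \times 256 + 187) \times 256 + 99 = (16617 \times 256 + 187) \times 256 + 99 = 4254139 \times 256 + 99 = 1089059683$$

so http://64.233.187.99/ can also be written as http://1089059683/.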
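
The same folding of octets can be written as a small Python sketch and cross-checked against the standard ipaddress module. It does not check whether that address still belongs to Google.

```python
import ipaddress


def dotted_to_decimal(dotted: str) -> int:
    """Fold four octets a.b.c.d into ((a*256 + b)*256 + c)*256 + d."""
    octets = [int(part) for part in dotted.split(".")]
    if len(octets) != 4 or not all(0 <= o <= 255 for o in octets):
        raise ValueError(f"not a dotted-quad IPv4 address: {dotted!r}")
    value = 0
    for octet in octets:
        value = value * 256 + octet
    return value


n = dotted_to_decimal("64.233.187.99")
print(f"http://{n}/")                                   # http://1089059683/
# The standard library agrees, in both directions:
print(int(ipaddress.IPv4Address("64.233.187.99")) == n)  # True
print(ipaddress.IPv4Address(n))                          # 64.233.187.99
```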

Who Cares Beyond The Great Firewall

If you have read George Orwell’s 1984, you know how history gets erased, as if it never existed. The point is that if people can never see or reference something, or locate any concrete information about it, they think it doesn’t exist. If you tell them, most people aren’t skeptical enough to go and verify it. And even if they did, what for?

If you and almost everyone around you have never been to Xanga.com, and it gets blocked, do you care? You don’t, and because no one cares, it goes unnoticed. Blogs? There are Sina blogs, CSDN blogs, etc. Why Xanga? With plenty of substitutes, few care about Xanga, Blogspot and WordPress.

On the other hand, many people here are new to the Internet but somewhat aware of the perils around the next corner. Now suppose the Government tells you that it is protecting you from the perils of the Internet, and your kids from anything malicious. Faced with such glorious claims, can anyone IN China refute them? Of course, no one has, as far as I know. With such glorious claims, one can do anything malicious in the name of virtue, and should that day come, they will have no problem justifying it, at least to their own people (if they even need to).

Tell them about websites that, for them, do not exist? Yahoo News? We got Sina News. eBay? We got our Taobao. YouTube? We got our Tudou. Anobii? We got our Douban. Google? We got our Baidu.

You can bypass the Great Firewall, but most people do not know how. Proxies? Do you expect most people to know what they are and how to find them? Not to mention that you can get into serious trouble for doing so.

Love it or not, the government is winning the game for now.

Blended Threat - Safari Carpet Bomb and More

It has been two weeks since the announcement of the Safari carpet bombing. In case you do not know, Safari has a very nice feature that allows web servers to put arbitrary files onto your local computer without your consent, and, according to Apple, that is by design.

Neat! Remote file management!

What harm could junk files on our local system bring? We users have a lot of junk already, don’t we? The dangers can be summarized in three points:

1. A fake executable that unwary users might click on.
2. A malicious PDF exploiting a vulnerability such as MS07-061.
3. Opening IE7 or IE8 (or possibly other software; more on that below) gets you 0wN3d immediately.

Nitesh Dhanjani wrote about the carpet bomb. The core idea is that Safari downloads a file to the default download location if it cannot handle the file type (Content-Type). Internet Explorer and Firefox prompt the user in the same situation.

Now, combine this with Aviv Raff’s findings on DLL hijacking in IE7 (and IE8 beta 1), and you get a silent download plus an automatically executed file. The core idea is that IE7 (and IE8 beta 1) loads DLLs through the Windows LoadLibrary API, which loads the first matching DLL it finds.

According to MSDN,

If lpFileName does not include a path and there is more than one loaded module with the same base name and extension, the function returns a handle to the module that was loaded first.

More on the DLL search order here.

Knowing this, and because IE loads its DLLs without an absolute path and without any public-key signature check, you can put a malicious DLL with the same name in a directory that IE searches before the real one, and your DLL gets loaded in place of the real one. That’s a DLL hijack. Start IE7 or IE8 beta 1, and you are 0wN3d.
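
The first-match behavior can be modeled in a few lines. This is a simulation, not the Windows loader: the real search order depends on the Windows version and on settings such as SafeDllSearchMode. The directory and DLL names are hypothetical.

```python
import tempfile
from pathlib import Path
from typing import Iterable, Optional, Tuple


def first_match(dll_name: str, search_dirs: Iterable[Path]) -> Optional[Path]:
    """Walk the directories in order and return the first file named dll_name,
    the way a loader given a bare file name stops at its first hit."""
    for directory in search_dirs:
        candidate = directory / dll_name
        if candidate.is_file():
            return candidate
    return None


def demo_hijack() -> Tuple[bytes, bytes]:
    with tempfile.TemporaryDirectory() as root:
        early_dir = Path(root, "downloads")   # hypothetical: searched first
        system_dir = Path(root, "system32")   # where the real DLL lives
        early_dir.mkdir()
        system_dir.mkdir()
        (system_dir / "example.dll").write_bytes(b"real")
        order = [early_dir, system_dir]

        before = first_match("example.dll", order)
        # A file silently dropped into the earlier directory, e.g. by a carpet bomb.
        (early_dir / "example.dll").write_bytes(b"planted")
        after = first_match("example.dll", order)
        return before.read_bytes(), after.read_bytes()


print(demo_hijack())  # (b'real', b'planted')
```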

Technical details, proofs of concept and credit go to Liu Die Yu, who pieced the puzzle together, and of course ultimately to the original vulnerability finders, Nitesh Dhanjani and Aviv Raff.

But wait. As I said, LoadLibrary is a Windows API, so the problem is not specific to IE. Any application that loads an unsigned DLL by relative path is going to suffer the same fate as IE. Yes, you can’t really do anything about that for arbitrary software you have installed. But think again: if someone can put a file onto your computer without your consent, you are already in great danger.

To mitigate the Safari Carpet Bombing, follow this advisory.

But again, what if the downloaded file is not a DLL? It can be a PDF file carrying a nasty MS07-061 remote arbitrary code execution exploit. With IE7 installed, shell32 makes way for the exploit as well. With a lot of things on the desktop, the average person might not remember what the file is and open it... ouch! Get patched with KB943460.

What’s more, what if there is a .url shortcut with a familiar icon (plus the extra shortcut arrow) pointing to a nasty URL? You aren’t much better off with this ending either. All you can do is be wary of what you click.

I consider this Safari threat a really nasty one. It is a good thing I am not using Safari; I for one don’t want to be 0wN3d, do you?

The Wary Almost Tricked

Let me share an embarrassing experience that happened recently.

A few days ago, I checked this apathy-plagued blog, expecting no activity. To my surprise, I found several comments in the Akismet spam queue. Being skeptical, I wondered whether any legitimate readers had been blocked. For a deserted blog, my dear readers and their comments are all the more valuable. Some looked very fake, but some sounded almost genuine: complimentary, authentic and HUMAN.

“He got some really great stuffs there at http://log0.wordpress.com. Pingback at http://www.thegeekyblog.com/ “.

Ouch, I almost thought it was real and wanted to click “Not spam!”.

“Gotcha!” ??

No, not really, though I almost clicked it. I’m human too; that’s why such phrases can hit psychological weak spots. Yeaaa, I write to be read, not to be deserted. Ahh! There goes the trap.

You see, although I don’t deal with phishing and spam for a living, I do security as a hobby, and I consider myself more alert than the average John Doe. I won’t claim to be anywhere close to experts like RSnake and Jeremiah Grossman. However, there are times when even such awareness gives way to a mistake. This is only one trap, and traps are out there all the time. And yes, I once caught an IM virus through a careless click. Ouch.

Now I am thinking: if my awareness only lets me avoid most spam and phishing, and I am still susceptible to a few in several hundred thousand (or fewer?), how much better can the rest of the non-geeky people do? Now think about China, a developing place with even more uneducated people.

I really can’t expect my mum to grasp what Cross-Site Request Forgery is.

Trying to Automate Hacking

A new security paper, “Automatic patch-based exploit generation”, came out recently. Basically, the solution it proposes takes two versions of a file, one vulnerable and one fixed, and generates an input that fails in the vulnerable version and passes in the fixed one.

According to Errata Security, working out an exploit from a patched and an unpatched version of a file generally takes these steps:
1. Find out the differences between the two versions.
2. Find an input such that the unpatched code fails and the patched code passes.
3. Find out how to reach that vulnerable code.
4. Work it out so the shellcode gets executed.
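
Here is a toy illustration of step 2 only. The paper’s approach works from the patch itself. This sketch swaps in plain brute force over a hypothetical pair of functions, just to show what “an input that fails in the unpatched version and passes in the patched one” means.

```python
from typing import Callable, Iterable, Optional


def unpatched(n: int) -> int:
    # Hypothetical vulnerable routine: reads slot n of a 16-slot buffer
    # without a bounds check (here, Python raises IndexError for n >= 16).
    buf = [0] * 16
    return buf[n]


def patched(n: int) -> int:
    # The "patch" adds the missing check and rejects bad input cleanly.
    buf = [0] * 16
    if not 0 <= n < len(buf):
        return -1
    return buf[n]


def fails(f: Callable[[int], int], x: int) -> bool:
    """True if f crashes (raises) on input x."""
    try:
        f(x)
    except Exception:
        return True
    return False


def find_differential_input(old: Callable[[int], int], new: Callable[[int], int],
                            candidates: Iterable[int]) -> Optional[int]:
    """Step 2: first input that makes the old version fail but the new one pass."""
    for x in candidates:
        if fails(old, x) and not fails(new, x):
            return x
    return None


print(find_differential_input(unpatched, patched, range(256)))  # 16
```

Real binaries have input spaces far too large for brute force, which is why the automated approach in the paper matters.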

The paper proposes an automatic approach to the 2nd step. Although that is only part of the process, it is still a step toward automating the whole thing. I should say first that I’m not into this sort of thing, but I have listened to a Blue Hat session on it. From what I know, the 3rd step is usually very hard, as the vulnerable code might be in some process you have never heard of, or in places you have no idea how to invoke. Yes, think of that strange API, now buried in some weird code path: reaching it can be very hard.

Some bash the paper as only a small step, but most things start out unimpressive. If the other steps also get automated with reasonable success, it will probably become, in time, just another easy tool like virus kit generators. That is why saying “it’s only part of it” and then ignoring it is somewhat dangerous, and why it is worth taking note of.

Hacker Safe

McAfee offers a service called “Hacker Safe” that audits clients’ websites daily for security issues. They say you “not only increase sales by increasing shopper confidence, you build your brand with the security seal seen on more top sites than any other.” Yet after such a high-profile declaration, sites bearing the seal got hacked many times, and McAfee got slapped hard in the face.

I presume “Hacker Safe” means “safe from all means of hackers” rather than “safe from most means of hackers” (otherwise I’m not safe!). Claiming that a piece of software has no vulnerabilities is no different from claiming that non-trivial software has no bugs; after all, vulnerabilities are just security bugs. What a misconception, and what a misleading marketing strategy to fool users.

The problem is: how many people believe it is really completely secure? The sad part is that it echoes the very wrong impression people have of SSL certificates: “Having the green lock is the sign of safety! 128-bit encryption!” I’m not kidding; I’ve actually met users who told me so.

Jeez. We had all better be careful; the web is very dangerous.

Antivirus Scanners Never Out of Stock

The average person usually does not scan their computer, despite knowing that some IM viruses are around. When did you last run a scan? Computer software shops and repair shops see a spike in business during global virus outbreaks, and that is when antivirus scanners go out of stock.

Out-of-stock.

Ahhh. Cool. Good for business, right? Well, that’s one side of it; here is the other. Since most people won’t scan their computers even when they look infected, or assume they are virus-free and need no scan, a global virus outbreak can send everyone off to clean out that popular virus. Good, and out go other malware they did not know about, too.

This is a good thing.

Attacks nowadays are no longer noisy, but hit-and-run with the utmost discretion, so such a wondrous event is unlikely to occur again.

Lives made easier again

There are quite a number of custom tools built to make reconnaissance easier, and they are packed into tool sets ready for use. A very well-known one is the Linux distro BackTrack, where you can find hundreds of tools packed together, including nmap, fierce and nikto, to name a few.

These work more at the network and host level. For the web, we also have our favourite Google, which is a very good place to start collecting information. By combining operators like “inurl:”, you can get some really fruitful results. Google search can really do a LOT. Since Google, life has been easier for hackers.

Cult of the Dead Cow (CDC) released Goolag not long ago. It leverages Google’s capabilities to do automated reconnaissance, and from what I checked, the tool is rather comprehensive. Moreover, just learning how its scripts work provides great insight for interested minds. But this tool, if leveraged against us, can be quite scary given its convenience. Remember the Amazon data-mining incident years ago: the author mined the locations and reading preferences of 260,000 Amazon users with ease, and even pinpointed some of their locations. With some work, you could do something similar elsewhere, maybe mapping the relationships between people?

Ahh, let’s be careful about what we publish on the internet.