
List:       httpclient-users
Subject:    Re: HTTPClient - HTTP GETs broken when there is a #anchor in the
From:       Oleg Kalnichevski <olegk () apache ! org>
Date:       2011-10-24 13:31:36
Message-ID: 1319463096.2227.8.camel () ubuntu

On Mon, 2011-10-24 at 14:15 +1100, Jack Hatch wrote:
> Hey all,
> 
> Bit of a weird one. I'm using HTTPClient 4.1.2, and it seems that whenever
> it finds a URL with something like a '#' in it, it does a full GET with
> the # in the URL.
> 
> For example, trying to get the URL http://stks.co/eWt will redirect to the
> URL
> http://news.ichinastock.com/2011/10/jack-ma-alibaba-has-prepared-20-billion-to-acquire-yahoo/#.Tpw-xG61XjU.twitter
> Now this URL is live, but the problem is that HTTPClient sends a GET request
> with the request URI set to
> /2011/10/jack-ma-alibaba-has-prepared-20-billion-to-acquire-yahoo/#.Tpw-xG61XjU.twitter, which
> causes the server to send back a 404 Not Found page.
> 
> Looking at the GET requests sent by IE, Firefox and cURL, they all strip the #...
> from the end of the URI. For example, the request URI in cURL's GET is
> /2011/10/jack-ma-alibaba-has-prepared-20-billion-to-acquire-yahoo/, with
> all the #... removed. This is for the exact same entry URL,
> http://stks.co/eWt.
> 
> As a test, passing this raw URL straight to HTTPClient (i.e.
> HttpGet httpget = new HttpGet("http://news.ichinastock.com/2011/10/jack-ma-alibaba-has-prepared-20-billion-to-acquire-yahoo/#.Tpw-xG61XjU.twitter");)
> gives the same 404 Not Found result.
> The issue is that I don't know in advance whether the URL has an #anchor in
> it, as it comes from a URL-shortening service...
> 
> So the question is: are there any settings in HTTPClient that make it
> automatically remove things like the trailing #... from URLs? Or how
> would I go about removing it from URLs manually (remember that I would
> need to capture all redirect URLs as well)?
> 
> Cheers!

You can use a custom RedirectStrategy and reformat / modify redirect
locations as you see fit. Most likely all you need is to subclass the
DefaultRedirectStrategy and override its #createLocationURI method.
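A minimal sketch of that approach, assuming HttpClient 4.1.x, where DefaultRedirectStrategy exposes `protected URI createLocationURI(String location)` and DefaultHttpClient has `setRedirectStrategy(...)`. The class name FragmentStrippingRedirectStrategy and the helpers `stripFragment` and `newClient` are hypothetical names chosen for illustration, not part of HttpClient. The override lets the default implementation parse the Location header, then cuts everything from the first '#' onward, which is safe because an unescaped '#' in a parsed URI can only be the fragment delimiter.

```java
import java.net.URI;

import org.apache.http.ProtocolException;
import org.apache.http.impl.client.DefaultHttpClient;
import org.apache.http.impl.client.DefaultRedirectStrategy;

/**
 * Hypothetical redirect strategy (sketch for HttpClient 4.1.x) that drops the
 * "#fragment" part of every redirect location before it is followed.
 */
public class FragmentStrippingRedirectStrategy extends DefaultRedirectStrategy {

    /**
     * Hypothetical helper: returns the URI without its fragment. Scheme,
     * authority, path and query are kept as-is (escaped octets are not re-encoded).
     */
    public static URI stripFragment(URI uri) {
        if (uri.getRawFragment() == null) {
            return uri; // no fragment, nothing to remove
        }
        String s = uri.toString();
        // In a parsed URI an unescaped '#' can only be the fragment delimiter,
        // so everything before the first '#' is the fragment-free URI.
        return URI.create(s.substring(0, s.indexOf('#')));
    }

    @Override
    protected URI createLocationURI(final String location) throws ProtocolException {
        // Let the default implementation parse and normalize the Location
        // header value, then remove the fragment before the next request is built.
        return stripFragment(super.createLocationURI(location));
    }

    /** Hypothetical factory: a client that uses this strategy for every redirect. */
    public static DefaultHttpClient newClient() {
        DefaultHttpClient client = new DefaultHttpClient();
        client.setRedirectStrategy(new FragmentStrippingRedirectStrategy());
        return client;
    }
}
```

Since the entry URL coming from the shortener could itself carry a fragment, the same helper can also be applied before building the first request, e.g. `new HttpGet(FragmentStrippingRedirectStrategy.stripFragment(URI.create(url)))`, so both the initial URL and every redirect location are handled.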

Oleg
