[Ilugc] [OT] need of libraries for exploring websites (Python/PERL)

Ashok Gautham thescriptdevil at gmail.com
Mon Dec 29 23:45:46 IST 2008


On Mon, Dec 29, 2008 at 5:34 PM, kish <realmailer at gmail.com> wrote:

> hi
>
> I wish to download a bunch of files from a site which allows directory
> listing
> with links like
>
> http://38.106.111.111/folder/sub%20folder/sub%20folderl%201/zYWIxNDQ2ODliZDhhMTc4NzRjMGQwMzY/file%201.zip
> .......


> ......
> http://38.106.111.111/folder/sub%20folder/sub%20folderl%205/afkafkajfkljlkfjiejjflkjlFJLKJFLAY/file%205.zip
>
> As you can see, the second part of the link changes with every link.
> The file name is a combination of the sub-folder name and .zip.
>
I see such links only on sites that don't want you to do what you
just said you need to do.

E.g., Rapidshare uses them to prevent the use of download managers
other than its own, and to prevent resuming downloads.

SDN uses them to stop someone from repeatedly downloading a file
and using up its bandwidth, and to promote its own Sun Download
Manager, which can stop and resume transfers. I can do the same
with aria2c as long as the download is resumed within a day or two.
(I can't think of a different reason why Sun does that.)

>
> Could anybody point me to a library or modules to explore a given
> website and gather all the links in the site?


>
> After which I can apply a filter to extract only those I need and pass
> the list to wget or the like

In either case, you were not meant to download it without human
intervention.
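That said, for sites that do allow it, the link-gathering step itself
needs nothing beyond Python's standard library. Here is a minimal
sketch using html.parser and urllib.parse (the sample markup and the
base URL below are made up for illustration; BeautifulSoup in Python
or WWW::Mechanize in Perl would do the same job with less code):

```python
# Sketch: collect all <a href> links from a page, resolve them against
# the page's URL, and filter for the .zip files you actually want.
from html.parser import HTMLParser
from urllib.parse import urljoin


class LinkCollector(HTMLParser):
    """Records the href of every <a> tag, resolved against a base URL."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(urljoin(self.base_url, value))


def extract_links(html, base_url):
    """Return every link found in `html`, as absolute URLs."""
    parser = LinkCollector(base_url)
    parser.feed(html)
    return parser.links


# Hypothetical directory-listing markup; a real crawl would fetch each
# page with urllib.request.urlopen() and recurse into sub-folders.
page = '<a href="sub%20folder/">dir</a> <a href="file%201.zip">file</a>'
links = extract_links(page, "http://38.106.111.111/folder/")
zips = [u for u in links if u.endswith(".zip")]
print("\n".join(zips))
```

Write the filtered list to a file and hand it to `wget -i list.txt`
(or aria2c), which covers the "pass the list to wget" step.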

---
Ashok `ScriptDevil` Gautham
