
How do normal people crawl the web?

Status
Not open for further replies.
Level 15
Joined
Aug 7, 2013
Messages
1,337
Hi,

So suppose I want to monitor a site and extract information from it 24/7.

What I would do is write a Python script to do this, and then run it while I'm using my computer. But I'm not willing to leave my computer on 24/7 (bad for its life/hardware).

I am guessing most people who crawl the web / mine it for data work for companies and have access to servers/computers which can run code 24/7.

Those resources I don't have access to.
 
Level 15
Joined
Aug 7, 2013
Messages
1,337
I suddenly feel a bit accused (why is there a negative stigma attached to web crawling in the first place?).

I was trying to convince the folks at makemehost.com to censor game names, since some people put some very profane stuff up on the game list. Any game with a profane name wouldn't be hosted, or would be instantly unhosted, because people who name their games that way aren't actually using the service as intended (i.e. playing WC3 custom maps). That means the bots are being wasted on those users.

When I talked to one of the admins via email, the admin said he/she needed proof that people are occasionally making games with profane names.

So my plan was to simply monitor the gamelist 24/7 and gather a list of all such game names over some period of time, then present it to the admin, who would hopefully add some form of censorship to the hosting process in response.
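A minimal sketch of that monitoring plan, under loud assumptions: `fetch_names` is a hypothetical callable that returns the game names currently on the list (how to obtain them from the real site is not covered in this thread), and `FLAGGED_WORDS` is a placeholder word list. The loop polls a fixed number of times and records when each flagged name was first seen.

```python
import time
from typing import Callable, Dict, Iterable

# Placeholder word list (an assumption); a real list would be much longer.
FLAGGED_WORDS = {"badword", "worseword"}


def is_flagged(name: str) -> bool:
    """Return True if any whitespace-separated token is in FLAGGED_WORDS."""
    return any(token in FLAGGED_WORDS for token in name.lower().split())


def monitor(
    fetch_names: Callable[[], Iterable[str]],
    rounds: int,
    delay_seconds: float = 60.0,
    sleep: Callable[[float], None] = time.sleep,
    now: Callable[[], float] = time.time,
) -> Dict[str, float]:
    """Poll fetch_names() `rounds` times; map each flagged name to its first-seen time."""
    first_seen: Dict[str, float] = {}
    for i in range(rounds):
        for name in fetch_names():
            if is_flagged(name) and name not in first_seen:
                first_seen[name] = now()
        if i < rounds - 1:
            sleep(delay_seconds)  # wait between polls to avoid hammering the site
    return first_seen
```

Passing `sleep` and `now` as parameters is only there so the loop can be tried out without real waiting; the resulting dictionary is the kind of evidence list described above.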
 
Level 15
Joined
Aug 7, 2013
Messages
1,337
I would think the same thing. But he wanted evidence from me. I guess they are too lazy to check themselves?

But this is getting off topic. My original question was how to keep code running 24/7 when my own machine won't be on 24/7.
 
Level 15
Joined
Mar 9, 2008
Messages
2,174
No code will run if your machine isn't on, simple as that. I don't know any actual web crawling software, but you should consider that if that admin wanted to deal with that problem he would have done so, without asking for evidence.
 

Dr Super Good

Spell Reviewer
Level 64
Joined
Jan 18, 2005
Messages
27,232
It is obvious that makemehost are a pretty childish organization, given that they are too stupid to look at their own logs for evidence, or at least make them public/send you a copy so you can do it for them.

That said I personally do not have anything against profane names, after all it is only words. Advertisements on the other hand should definitely be removed as they are used purely for commercial gain.
 
Level 15
Joined
Aug 7, 2013
Messages
1,337
Dr Super Good said:
It is obvious that makemehost are a pretty childish organization the fact they are too stupid to look at their own logs for evidence, or at least make them public/send you a copy so you can do it for them.

That said I personally do not have anything against profane names, after all it is only words. Advertisements on the other hand should definitely be removed as they are used purely for commercial gain.

It's more than just profanity. The fact is that when people host games with profane names, I bet at least 90% of the time they aren't actually playing any games and have no intention of using the free services of makemehost.com the way they are meant to be used.

Since there are only a limited number of bots, they should not be wasted on people who aren't there to actually play games.

So I am implying that games with profane names are just "spam" and should not be tolerated. Obviously this isn't true 100% of the time, but I would think it would be true more than half the time from my own personal observations.

Additionally, I do not think it professional at all to allow the gamelist to be polluted.

But the primary reason would be to reduce the spam on the gamelist, which I think is a worthy cause.

Are they actually offensive or just "bad words" like "fuck" and "shit"?

Sometimes. Other times they can be quite perverted in what they say. Those ones sometimes don't include "dictionary" profane words, and would probably take some n-gram analysis to classify as spam with high probability.
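One way the n-gram idea could look, as a rough sketch rather than a tested classifier: count character trigrams in hand-labelled spam and non-spam game names, then score a new name by a smoothed log-likelihood ratio. The naive-Bayes-style scoring, the function names, and the tiny training lists in the usage note are all assumptions made for illustration; the thread itself only mentions "n-gram analysis".

```python
import math
from collections import Counter
from typing import Iterable, List


def char_ngrams(text: str, n: int = 3) -> List[str]:
    """Split lower-cased text (padded with one space each side) into character n-grams."""
    padded = f" {text.lower()} "
    return [padded[i:i + n] for i in range(len(padded) - n + 1)]


def train(names: Iterable[str], n: int = 3) -> Counter:
    """Count n-gram occurrences over a collection of labelled names."""
    counts: Counter = Counter()
    for name in names:
        counts.update(char_ngrams(name, n))
    return counts


def spam_score(name: str, spam_counts: Counter, ham_counts: Counter, n: int = 3) -> float:
    """Sum of log(P(gram | spam) / P(gram | ham)) with add-one smoothing.

    Positive scores lean towards spam, negative towards normal names.
    """
    vocab = len(set(spam_counts) | set(ham_counts)) + 1  # +1 for unseen n-grams
    spam_total = sum(spam_counts.values())
    ham_total = sum(ham_counts.values())
    score = 0.0
    for gram in char_ngrams(name, n):
        p_spam = (spam_counts[gram] + 1) / (spam_total + vocab)
        p_ham = (ham_counts[gram] + 1) / (ham_total + vocab)
        score += math.log(p_spam / p_ham)
    return score
```

For example, after `spam = train(["xxx spam xxx", "spam spam"])` and `ham = train(["dota apem", "footmen frenzy"])`, the score for `"spam spam"` comes out positive and the score for `"dota apem"` negative. With real data the labelled lists would need to be far larger, and the threshold for "high probability" would have to be chosen by checking against names that were classified by hand.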
 