How to Create a Web Spy with a PHP Crawler

Posted in PHP, Tutorials • 3 years ago • 52 Comments

A crawler, spider, bot, or whatever you want to call it, is a program that automatically fetches and processes data from websites, for many uses.

Google, for example, indexes and ranks pages automatically via powerful spiders, crawlers and bots. We also have link checkers, HTML validators, automated optimizers, and web spies. Yeah, web spies. This is what we will be building now.

Actually, I don’t know if this is a common term, or if it’s ever been used before, but I think it perfectly describes this kind of application. The main goal here is to create software that monitors your competitors’ prices so you can always be up to date with market changes.

You might think “Well, it is useless to me. You know, I’m a freelancer, I don’t have to deal with this ‘price comparison’ thing.” Don’t worry, you are right. But you may have customers that have a lot of competitors they want to watch closely. So you can always offer this as a “plus” service (feel free to charge for it, I’ll be glad to know that), and learn a little about this process.

So, let’s rock!

1 – Requirements

  • PHP server running Linux – We need to use crontab here, so it is better to get a good online server
  • MySQL – We will store data with it, so you will need a database

2 – Basic crawling

We will start by trying a basic crawling function: get some data. Let’s say that I sell shoes, and Zappos is my competitor (just dreaming about it). The first product I want to monitor is a beautiful pair of Nike Free Run+. We will use fopen to open the page, fgets to read each line of the page and feof to check when we have reached the end. For this to work, allow_url_fopen must be enabled on your server so that fopen accepts URLs (you can check it via phpinfo). Our first piece of code will be:

<?php
	if(!$fp = fopen("http://www.zappos.com/nike-free-run-black-victory-green-anthracite-white?zlfid=111" ,"r" )) {
		return false;
	} //our fopen is right, so let's go
	$content = "";

	while(!feof($fp)) { //while it is not the last line, we will add the current line to our $content
		$content .= fgets($fp, 1024);
	}
	fclose($fp); //we are done here, don't need the main source anymore
?>

At this point, if you echo $content you will see the whole page, but without any CSS or JS applied, because the Zappos site references those files with relative paths that do not exist on your server.
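The read loop above can also be wrapped in a small helper. The sketch below is one possible shape, not the code from the demo: the name `fetch_page` and the 10-second timeout are assumptions, and it swaps the `fgets` loop for `file_get_contents`, which reads the whole stream in one call. Like `fopen`, it only accepts URLs when `allow_url_fopen` is enabled.

```php
<?php
// Hypothetical helper: returns the page body as a string, or false on failure.
function fetch_page($url, $timeout = 10) {
	// The 'http' options only apply to http:// and https:// URLs
	$context = stream_context_create(array(
		'http' => array('timeout' => $timeout),
	));
	// @ silences the PHP warning; failure is reported through the return value
	return @file_get_contents($url, false, $context);
}

// Usage, with the same product page as above:
// $content = fetch_page("http://www.zappos.com/nike-free-run-black-victory-green-anthracite-white?zlfid=111");
// if ($content === false) { exit("Could not load the page"); }
```

Checking the result with `=== false` matters here, because an empty page is also "falsy" but is not a failed request.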

Now that we have the content, we need to extract the product price.

How do you tell the difference between a price and the other ordinary data on the page? Well, it is easy to notice that every price has a “$” before it, so we will take the whole content and run a regular expression to find every dollar amount on that page.

But our regular expression will match every price on the page. Since Zappos is a good friend of spies, it always puts the “official” price first. The others are just used in JavaScript, so we can ignore them.

Our REGEX and price output will be something like this:

<?php
//our fopen, fgets here

//our magic regex here: "$", at least one digit, then digits or thousands commas, a dot and two decimals
	preg_match_all("/([$][0-9][0-9,]*[.][0-9]{2})/", $content, $prices, PREG_SET_ORDER);
	echo $prices[0][0]."<br />";
?>

Wow, we now have the price. Don’t forget the other prices, we will need them if Zappos changes something in their site.
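To see what `preg_match_all` hands back with `PREG_SET_ORDER`, here is a small self-contained run. The HTML fragment is made up for illustration (it is not real Zappos markup), and the pattern is a slightly stricter variant that also accepts thousands separators such as `$1,089.95`.

```php
<?php
// Made-up HTML fragment standing in for a product page
$content = '<span class="price">$89.95</span>'
         . '<script>var msrp = "$100.00"; var sale = "$1,089.95";</script>';

// "$", at least one digit, more digits or thousands commas, ".", two decimals
preg_match_all('/[$][0-9][0-9,]*[.][0-9]{2}/', $content, $prices, PREG_SET_ORDER);

// With PREG_SET_ORDER, $prices[n][0] is the full text of the n-th match
$mainPrice = $prices[0][0]; // first price on the page: the "official" one
$others    = array();
foreach (array_slice($prices, 1) as $match) {
	$others[] = $match[0];
}

echo $mainPrice . "\n";            // $89.95
echo implode(" ", $others) . "\n"; // $100.00 $1,089.95
```

The first match is the price we want, and everything after it is the "backup" list mentioned above.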

3 – Save data in MySQL

Let’s prepare our DB to receive this data. Let’s create a table called zappos. Inside of it we will have four columns:

  • ID - Primary key on this table
  • Date - When data was stored. It’s good to store this so you can do some reports.
  • Value - Value that you’ve found
  • Other_Values - Values that aren’t what you want, but it’s important to store them so if the site owner changes the code you have a “backup” of the possible values

In phpMyAdmin I’ve created a database called spy, and inside it my table zappos, this way:

CREATE TABLE IF NOT EXISTS `zappos` (
  `ID` int(5) NOT NULL AUTO_INCREMENT,
  `Date` date NOT NULL,
  `Value` float NOT NULL,
  `Other_Values` char(100) CHARACTER SET utf8 COLLATE utf8_bin NOT NULL,
  PRIMARY KEY (`ID`)
) ENGINE=MyISAM  DEFAULT CHARSET=latin1 AUTO_INCREMENT=3 ;

Once you’ve created your table, we can start adding some data. So we will need to connect to MySQL from our PHP and prepare our prices to be saved.

Since our prices are not clean floats yet, we need to prepare them so we have just digits and a dot.
To connect to our database we will use mysql_connect, then mysql_select_db to select “spy”, and then mysql_query to save or get our data.

<?php

//preparing to save every price found, so we keep a backup besides our "official" price
	$otherValues = "";
	foreach ($prices as $price) {
		$otherValues .= str_replace( array("$", ",", " "), '', $price[0]); //we need a plain "float" value, so strip the dollar sign, commas, spaces and anything else that isn't part of the number
		$otherValues .= ","; //so we can separate each value via explode when we need
	}

//if someday Zappos changes its order (or you change the site you want to spy on), just change the index here
	$mainPrice = str_replace( array("$", ",", " "), '', $prices[0][0]);

//let's save our date in format YYYY-MM-DD
	$date = date('Y-m-d');

	$dbhost  = 'localhost';
	$dbuser  = 'root';
	$dbpass  = '';
	$dbname  = "spy";
	$dbtable = "zappos";

	$conn = mysql_connect($dbhost, $dbuser, $dbpass)
		or die ('Error connecting to mysql');
	echo "<br />Connected to MySQL<br />";

	$selected = mysql_select_db($dbname)
		or die( mysql_error() );
	echo "Connected to Database<br />";

	//save data
	$insert = mysql_query("
		INSERT INTO `$dbname`.`$dbtable` (
			`ID`,
			`Date`,
			`Value`,
			`Other_Values`
		)
		VALUES (
			NULL, '$date', '$mainPrice', '$otherValues'
		);
	");

	//get data
	$results = mysql_query("SELECT * FROM $dbtable");

//all data comes as MySQL resources, so we need to prepare it to be shown
	while($row = mysql_fetch_array($results, MYSQL_ASSOC)) {
		echo "ID :{$row['ID']} " .
			 "Date : {$row['Date']} " .
			 "Value : {$row['Value']}";
		echo "<br />";
	}

	//close the connection only after we have finished reading the results
	mysql_close($conn);

?>
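The `mysql_*` functions used above were deprecated in PHP 5.5 and removed in PHP 7.0, so on a current server the same steps need mysqli or PDO. Below is a minimal sketch with PDO and a prepared statement. The helper names `price_to_float` and `save_price` are made up for illustration, and the DSN in the usage comment assumes the same local `spy` database and credentials as above.

```php
<?php
// Hypothetical helper: "$1,089.95" -> 1089.95
function price_to_float($price) {
	return (float) str_replace(array('$', ',', ' '), '', $price);
}

// Hypothetical helper: stores one crawl result in the zappos table.
// $prices is the PREG_SET_ORDER result from preg_match_all.
function save_price(PDO $db, array $prices) {
	$all = array();
	foreach ($prices as $match) {
		$all[] = price_to_float($match[0]);
	}
	if (empty($all)) {
		return false; // nothing matched, so there is nothing to store
	}
	$stmt = $db->prepare(
		'INSERT INTO `zappos` (`Date`, `Value`, `Other_Values`) VALUES (?, ?, ?)'
	);
	// The first match is the "official" price; the rest are kept as a backup
	return $stmt->execute(array(
		date('Y-m-d'),
		$all[0],
		implode(',', array_slice($all, 1)),
	));
}

// Usage (assumed connection details, same as in the tutorial):
// $db = new PDO('mysql:host=localhost;dbname=spy;charset=utf8', 'root', '');
// $db->setAttribute(PDO::ATTR_ERRMODE, PDO::ERRMODE_EXCEPTION);
// save_price($db, $prices);
```

Binding the values through placeholders means the scraped text never becomes part of the SQL string, which is safer than interpolating it into the query.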

4 – Smarter spy with Crontab

Well, with crontab we can schedule tasks in our (Linux) system so they run automatically. It is useful for backup routines, site optimizing routines and many more things that you just don’t want to do manually.

Since our crawler needs some fresh data, we will create a cron job that runs every day at 2am. On net.tuts+ we have a really good tutorial on how to schedule tasks with cron, so if you aren’t too familiar with it, feel free to check it out.

In short, there are two command lines we could use for it (the second is my favorite):

#here we call php directly with the physical path of the file
#0 2 * * * says that it should run at minute zero, hour two, any day of month, any month and any day of week
0 2 * * * /usr/bin/php /www/virtual/username/cron.php > /dev/null 2>&1

#my favorite, with wget the page is processed as if it were loaded in a common browser
#-q keeps wget quiet and -O /dev/null discards the download instead of saving a new file on every run
0 2 * * * wget -q -O /dev/null http://whereismycronjob/cron.php

5 – Let’s do some pretty charts

If you are planning to use this data, just a db record won’t be too useful. So after all this work we need to present it in a sexier way.

Almost all of our work here will be done by the gvChart jQuery plugin. It reads our data from HTML tables and makes some cool charts out of it. What we actually have to do is print our results as a table, so it can be used by gvChart. Our code this time will be (download our demo for more info!):

<?php
	$dbhost  = 'localhost';
	$dbuser  = 'root';
	$dbpass  = '';
	$dbname  = "spy";
	$dbtable = "zappos";

	$conn = mysql_connect($dbhost, $dbuser, $dbpass)
		or die ('Error connecting to mysql');

	$selected = mysql_select_db($dbname)
		or die( mysql_error() );

	//get data
	$results = mysql_query("SELECT * FROM $dbtable ORDER BY `ID` DESC LIMIT 15");

	$dates  = array();
	$values = array();
	while($row = mysql_fetch_array($results, MYSQL_ASSOC)) {
		$dates[]  = $row['Date'];
		$values[] = $row['Value'];
	}

	//we are done with the database once every row has been read
	mysql_close($conn);

	//an empty table leaves nothing to chart, and no date to build the row label from
	if (empty($dates)) {
		die('No prices stored yet');
	}

	//row label: year and month (YYYY-MM) of the last row fetched
	$label = substr(end($dates), 0, 7);

	echo "<table id='real'>";
		echo "<caption>Real Prices on Zappos.com</caption>";
		echo "<thead>";
			echo "<tr>";
				echo "<th></th>";
				foreach($dates as $date) {
					$parts = explode('-', $date);
					echo "<th>" . $parts[2] . "</th>"; //day of month
				}
			echo "</tr>";
		echo "</thead>";
		echo "<tbody>";
			echo "<tr>";
				echo "<th>" . $label . "</th>";
				foreach($values as $value) {
					echo "<td>" . $value . "</td>";
				}
			echo "</tr>";
		echo "</tbody>";
	echo "</table>";
?>

Are you hungry yet?

I think there’s still a lot to improve. You could, for example, keep a “waiting list” of URLs so you could crawl a lot of them with a single call (of course each URL could have its own regex and its own “official price” position, if they are from different sites).
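One way such a waiting list could look is sketched below. Everything here is hypothetical: the function names `extract_price` and `crawl_targets`, the array layout, and the second URL (a placeholder) are all made up for illustration. The fetching step is passed in as a callable so any loader can be plugged in.

```php
<?php
// Hypothetical helper: returns the $index-th price matched by $regex, or null
function extract_price($html, $regex, $index = 0) {
	if (!preg_match_all($regex, $html, $matches, PREG_SET_ORDER)) {
		return null; // no match at all (or a regex error)
	}
	if (!isset($matches[$index])) {
		return null; // fewer prices on the page than expected
	}
	return (float) str_replace(array('$', ',', ' '), '', $matches[$index][0]);
}

// The "waiting list": one entry per page to watch
$targets = array(
	array(
		'url'   => 'http://www.zappos.com/nike-free-run-black-victory-green-anthracite-white?zlfid=111',
		'regex' => '/[$][0-9][0-9,]*[.][0-9]{2}/',
		'index' => 0, // the "official" price is the first match
	),
	array(
		'url'   => 'http://example.com/some-product', // placeholder site
		'regex' => '/[$][0-9][0-9,]*[.][0-9]{2}/',
		'index' => 1, // on this imaginary site the second match is the one we want
	),
);

// Hypothetical runner: $fetch is any callable that turns a URL into HTML (or false)
function crawl_targets(array $targets, $fetch) {
	$found = array();
	foreach ($targets as $target) {
		$html = call_user_func($fetch, $target['url']);
		$found[$target['url']] = ($html === false)
			? null
			: extract_price($html, $target['regex'], $target['index']);
	}
	return $found;
}
```

Calling `crawl_targets($targets, 'file_get_contents')` would fetch each page with PHP’s built-in loader and return an array of prices keyed by URL, with null wherever a page could not be loaded or parsed.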

And what do you think we could improve?


I'm a web designer and entrepreneur from Itajubá (MG), Brasil. I love writing about obscure topics and doing some cool stuff. And also I do some FREE stuff, check it out: http://www.roch.com.br/

52 Comments Best Comments First
  • Derrick Dedmon

    Saturday, May 28th, 2011 06:42

    18

    This is a really cool post. I will keep this in mind if I find any applications. This may come in handy.

    +3
    • Rochester Oliveira

      Monday, May 30th, 2011 14:55

      26

      Hey Derrick, it can be useful in a lot of applications, for sure.

      If you do something like this, don’t forget to put you link here, so we could see it :D

      0
  • Manuel

    Saturday, May 28th, 2011 21:17

    20

    Yeah, awesome post, as usual. I think I’ll be digging into PHP a little bit more.

    +2
    • Rochester Oliveira

      Monday, May 30th, 2011 14:58

      27

      Thank you, Manuel.

      We have a lot of php related posts to go, don’t forget to keep visiting us :D

      []‘s

      +1
  • Mainual

    Thursday, May 26th, 2011 16:23

    3

    Thx for this great article! Hope to try it in action!

    +1
    • Rochester Oliveira

      Thursday, May 26th, 2011 17:32

      4

      Hey Mainual,

      I’ve sent a demo to our (great) editor, Saad, but he didn’t sent to our server yet.. Hope soon have it linked here too, so you can easily test your spy :D

      []‘s

      -1
  • Sammy

    Friday, May 27th, 2011 21:45

    16

    thanks its a great tutorial very thorough…

    0
    • Rochester Oliveira

      Monday, May 30th, 2011 14:53

      23

      Thank you, betsafe!

      Keep coming, we have a lot of great content to be published

      0
  • Erich

    Thursday, May 26th, 2011 18:00

    1

    I like when you guys do some above average complexity articles. Especially about things like this

    0
    • Rochester Oliveira

      Friday, May 27th, 2011 04:44

      7

      Glad to hear this Erich, we are tryng our best here :D

      []‘s
      Rochester

      0
  • Jasmine

    Wednesday, June 1st, 2011 12:00

    29

    This is a great tutorial… it looks fairly self explanatory. The possibility is endless, perhaps we can even create a crawler to crawl Google’s search result? Hehehe…

    0
    • Rochester Oliveira

      Wednesday, June 1st, 2011 14:35

      30

      Hey Jasmine, how are you?

      Well, google is much smarter then us, if you try to do a search engine monitor, for example, he could block you access after X downloads of the page.. I could try it and if it works I could make another article about search engine position monitor :D

      []‘s

      0
  • Jeremy Bayone

    Thursday, May 26th, 2011 22:08

    6

    I’ll have to give this a try.

    0
    • Rochester Oliveira

      Friday, May 27th, 2011 04:52

      9

      Hi Jeremy,
      If you want to download the demo just follow this link: http://roch.com.br/webspy.zip
      []‘s

      0
  • Eindhoven

    Thursday, May 26th, 2011 20:41

    5

    Cool tutorial, I’ll give it a try!

    0
  • Barry Reynolds

    Thursday, May 26th, 2011 20:32

    2

    This is quite interesting. So basically I can keep ahead of the market by monitoring my competitors prices and adjust mine accordingly, similar to what Tesco does with their price but this is online. This could be a very useful tool in the right company, thanks.

    0
    • Rochester Oliveira

      Friday, May 27th, 2011 04:51

      8

      Yeah, Barry, it is a really good tool.

      I use something really similar to this to update all the prices in my store daily, based on USD to BRL currrency, so I can get prices much more accurately than my competitors (sometimes they lose money, sometimes they lose sales because of higher prices..)

      []‘s
      Rochester

      0
  • sin2384

    Friday, May 27th, 2011 14:25

    13

    Great tutorial, this can be used for so many things! I’ll definitely gonna try it!

    0
    • Rochester Oliveira

      Friday, May 27th, 2011 16:32

      15

      Thank you!

      Yeah, it has a wide range of aplications, this is just one of them :D

      []‘s

      0
  • Adán

    Friday, May 27th, 2011 05:21

    12

    Hi, Rochester!

    well your article is cool,

    some time ago i was with a similar thing, my problem was that i couldnt do this automatically becouse I pay a limited webserver and the administrator told me that i dont have access to the physical server, i only have access to phpmyadmin and ferozo panel.

    the question is: is there any solution to do something like crontab without entering the physical server?

    in any case you could recommend me a web hosting that allows this?

    thank you very much for do cool posts
    Regards, Adán.

    0
    • Rochester Oliveira

      Friday, May 27th, 2011 16:32

      14

      Hey Ádan, how are you?
      What I can think now is what wordpress uses, called wp_cron. It has a kind of listed tasks, and every time a user loads the aplication it compares the current time with the time that the tasks should run.
      It is not so accurate like crontab, but can be a good alternative for you.. You could user in a wordpress site, or do something similar to it in your own system..

      If I remember another way to do this, I’ll come back here :D

      CODEX: http://codex.wordpress.org/Category:WP-Cron_Functions
      []‘s
      Rochester

      0
      • Adán

        Saturday, May 28th, 2011 04:44

        17

        Tanks Rochester!

        I’m going to check that

        Regards, Adán.

        0
    • Bogdan Rusu

      Saturday, May 28th, 2011 19:04

      19

      Actually there’s also another pretty easy way to do this without the cron: create a scheduled task on your own computer to access the php script automatically once in a while (maybe in the background) and get the necessary information. It won’t work as good unless your computer is on all the time when this should be executed, but you can go around any limitations you might have on the server.
      Also, if you have access to another server where you CAN edit the crontab (so not necessarily the same server where the web page is stored), you could add an entry there to do the exact same thing – access the web page automatically. :)
      In both cases, since the PHP script would be publicly accessible, you could think about protecting it with a hashkey sent via GET, to prevent unauthorized access even if someone else knows the URL.

      0
      • Rochester Oliveira

        Monday, May 30th, 2011 14:57

        24

        Hey Webilă,

        The good part is that it is really easy. The bad part is that you start to depend on your computer to have you application (in other, I guess) depending on external resources.

        But is a good alternative, and I’m sure that works in a lot of cases.

        Thank you!

        0
  • Kent

    Friday, October 14th, 2011 22:03

    41

    This utility did not know PHP. I will use it to see how it works.

    0
    • Rochester Oliveira

      Tuesday, October 25th, 2011 13:45

      44

      It’s really good, you should give it a try :)

      []‘s

      0
  • Johnny

    Thursday, May 3rd, 2012 11:31

    52

    Thank you for a great article.
    I have created the db as ‘spy’, imported the sql string and it has created the table ‘zappos’ in the db.
    Any idea why i get this error : Notice: Undefined variable: date in C:\wamp\www\WebSpy\index.php on line 154?

    0
  • amir

    Saturday, March 31st, 2012 21:12

    50

    Hi!
    very nice tutorial, but I still get this error

    ( ! ) Notice: Undefined variable: date in C:\wamp\www\Mywebsearch\webspy\index.php on line 154
    Call Stack
    # Time Memory Function Location
    1 0.0009 379664 {main}( ) ..\index.php:0

    ( ! ) Notice: Undefined variable: date in C:\wamp\www\Mywebsearch\webspy\index.php on line 154
    Call Stack
    # Time Memory Function Location
    1 0.0009 379664 {main}( ) ..\index.php:0
    -
    best regard

    0
  • Rochester Oliveira

    Monday, January 2nd, 2012 19:21

    47

    hahaha

    0
  • Ned

    Sunday, January 22nd, 2012 02:06

    48

    Great tutorial .I think that using CURL instead of fopen is better , because you can set (REFERER) field which is required by many websites .

    0
    • Rochester Oliveira

      Monday, January 23rd, 2012 03:06

      49

      Nice point NokiaThemes,

      CURL is pretty useful if you need to send POST data also (e.g. working to get data from search forms or any kind of post forms)

      0
  • Rohan

    Saturday, October 22nd, 2011 22:51

    42

    This website showing forex rates of different countries, and i want to crwal all of the stored data which can be shown by selecting different dates, Please help me how can i write curl or fpot crawler.

    0
    • Rochester Oliveira

      Tuesday, October 25th, 2011 13:42

      43

      Hey Rohan, reach me via e-mail, I think I could help ya.

      []‘s

      0
  • Onur

    Thursday, September 29th, 2011 09:12

    39

    Hi Rochester,

    I am not a php expert i need to ask this :) If i want to save the data hourly instead of daily, what kind of changes should i made to my php files?

    0
    • Rochester Oliveira

      Thursday, September 29th, 2011 19:55

      40

      Hi Onur!

      So, actually this is not with PHP, this is the crontab job. (part 4)
      ——You’ll change this
      #0 2 * * * says that it should run in minute zero, hour two, any day of month, any month and any day of week
      0 2 * * * wget http://whereismycronjob/cron.php
      ——to
      #minute zero, any hour, any day of month, any month and any day of week
      0 * * * * wget http://whereismycronjob/cron.php
      ——-

      Hope it helps!
      []‘s

      0
  • Nolan

    Thursday, June 9th, 2011 13:57

    33

    Rochester , good tutorial you can use curl instead of fopen.

    0
    • Rochester Oliveira

      Thursday, June 9th, 2011 15:30

      32

      Thank you, novini.

      Yeah, I’ve seen even JS techniques for doing this, pretty amazing :D

      []‘s

      0
      • Dhruv

        Sunday, November 20th, 2011 20:30

        45

        I know i this is an old comment, but cURL is a PHP function/class not JavaScript.

        0
        • Rochester Oliveira

          Monday, January 2nd, 2012 19:21

          46

          Hi Dhruv, I wasn’t talking about cURL, actually. As I said “I’ve seen even JS techniques”, there is a lot of other ways for doing this ;)

          []‘s

          0
  • Rochester Oliveira

    Friday, June 3rd, 2011 22:14

    31

    Thank you zeeshan, we are all doing our best here!

    0
  • Nadeem

    Saturday, July 9th, 2011 16:15

    34

    hi,

    I am web developer in asp.net ans mssql, can this be done in asp.net, or how can I convert this code to asp.net using mssql.

    thanks,
    Nadeem

    0
    • Rochester Oliveira

      Monday, July 18th, 2011 06:19

      35

      Hy Nadeen,
      I’m not a asp guy, sorry I don’t know how to do it.

      []‘s

      0
  • Rochester Oliveira

    Monday, July 18th, 2011 06:20

    36

    Hey Antony, if you need some help, just reach me via email or twitter :)

    []‘s

    0
  • Rescue Leokeng

    Friday, July 22nd, 2011 08:01

    37

    I am receiving this error when I run the DEMO “Cannot read property ‘kh’ of undefined” what could be the fix?

    0
    • Rochester Oliveira

      Tuesday, July 26th, 2011 05:48

      38

      Hi there!
      In which point do you get this error? I don’t remember to have seen anything similar to it off the top of my head, but it could be:
      - Server config
      - Any variable reference that is wrong (maybe with all this copy & paste stuff I’ve done something wrong)
      []‘s

      0
  • Sadaruwan

    Monday, April 2nd, 2012 08:21

    51

    I ran this and works like a charm but what I want is to do is to create bot that’ll search for a specific word in the whole site can you help with this I’ll be a very grateful.

    -1
  • Duro

    Sunday, May 29th, 2011 14:28

    22

    Thank you for a great article.

    One thing that bothers me is, if this is legal? What about robots.txt file settings and other issues regarding crawler (webscraper) usage?

    -2
    • Rochester Oliveira

      Monday, May 30th, 2011 15:04

      28

      Hey Đuro,

      I haven’t thought about it, acctually. I think that in many cases isn’t a problem since what our crawler do is access the page, as one of our employees could do, so there is no reason to worry about.

      But is something that we must to pay attention, for sure :D

      []‘s

      -2
  • Antown

    Sunday, May 29th, 2011 10:36

    21

    Now I’m learning php and for me this information is very important. It’s amazing but just now I am involved in the collection of information from other sites.
    “… easy to notice that all prices must have a” $ “before them …” – it is not always the case. I would have replaced it with a piece of code with something else.
    In any case, this is a very useful article. Thanks for the link for setting up cron

    -2
    • Rochester Oliveira

      Monday, May 30th, 2011 15:01

      25

      Hey Antown,

      Yeah, we have a lot of cases that $ isn’t enough. Some currencies have different signs, so we have to pay attention and adjust the regex with your needs..

      Thank you!
      []‘s

      0

Comments are closed.
