parse the words on HTML's meta tag

Mon Jul 18, 2005 7:09 am

You can use the PHP function get_meta_tags() to read the meta tags into an array, then extract the values and save them into whatever database you are familiar with or already using. What follows is a copy of the explanation from the PHP manual on how to use the function. Good luck :)

(PHP 3 >= 3.0.4, PHP 4)

get_meta_tags -- Extracts all meta tag content attributes from a file and returns an array
array get_meta_tags ( string filename [, int use_include_path])

Opens filename and parses it line by line for <meta> tags in the file. This can be a local file or a URL. The parsing stops at </head>.

Setting use_include_path to 1 will result in PHP trying to open the file along the standard include path as per the include_path directive. This is used for local files, not URLs.
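To try the function without a web server, here is a minimal sketch that writes a small page to a temporary file and parses it. The sample HTML and the temporary file are only assumptions for illustration; the second argument is left at its default, so the include path is not searched.

```php
<?php
// Hypothetical sample page, written to a temporary file for illustration only.
$html = <<<HTML
<html>
<head>
<meta name="author" content="name">
<meta name="keywords" content="php documentation">
</head>
<body></body>
</html>
HTML;

$path = tempnam(sys_get_temp_dir(), 'meta');
file_put_contents($path, $html);

// Only the filename is passed, so include_path is not consulted.
$tags = get_meta_tags($path);
unlink($path);

// Dump the resulting name => content array.
print_r($tags);
```

For a remote page you would pass the URL instead of the path, which requires URL wrappers to be enabled in your PHP configuration.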

Example 1. What get_meta_tags() parses

<meta name="author" content="name">
<meta name="keywords" content="php documentation">
<meta name="DESCRIPTION" content="a php manual">
<meta name="geo.position" content="49.33;-86.59">
</head> <!-- parsing stops here -->

(Pay attention to line endings: PHP uses a native function to parse the input, so a file with Mac line endings won't work on Unix.)
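If line endings are a concern, one possible workaround is to normalize them before parsing. This is only a sketch: normalized_meta_tags() is a hypothetical helper, not part of PHP, and it assumes the file is small enough to read into memory.

```php
<?php
// Hypothetical helper: converts CRLF and bare CR line endings to LF,
// then runs get_meta_tags() on the normalized copy.
function normalized_meta_tags(string $path): array
{
    $html = file_get_contents($path);
    if ($html === false) {
        return [];
    }

    // "\r\n" is replaced first, then any remaining lone "\r".
    $html = str_replace(["\r\n", "\r"], "\n", $html);

    $tmp = tempnam(sys_get_temp_dir(), 'meta');
    file_put_contents($tmp, $html);
    $tags = get_meta_tags($tmp);
    unlink($tmp);

    return $tags === false ? [] : $tags;
}

// Usage with a sample file that uses bare CR (old Mac style) line endings.
$sample = tempnam(sys_get_temp_dir(), 'cr');
file_put_contents($sample, "<head>\r<meta name=\"author\" content=\"name\">\r</head>\r");
$tags = normalized_meta_tags($sample);
unlink($sample);

print_r($tags);
```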

The value of the name property becomes the key, the value of the content property becomes the value of the returned array, so you can easily use standard array functions to traverse it or access single values. Special characters in the value of the name property are substituted with '_', the rest is converted to lower case. If two meta tags have the same name, only the last one is returned.
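The key rules described above can be checked with a small sketch. The sample tags and the temporary file are assumptions for illustration; the point is to show the lowercasing, the '_' substitution, and that the last duplicate wins.

```php
<?php
// Hypothetical sample: two tags whose names differ only in case, plus a dotted name.
$html = <<<HTML
<head>
<meta name="DESCRIPTION" content="first description">
<meta name="geo.position" content="49.33;-86.59">
<meta name="description" content="second description">
</head>
HTML;

$path = tempnam(sys_get_temp_dir(), 'meta');
file_put_contents($path, $html);
$tags = get_meta_tags($path);
unlink($path);

// Standard array functions work on the result.
foreach ($tags as $name => $content) {
    echo $name, ' => ', $content, "\n";
}

// Both description tags collapse into one lowercase key; the later value is kept.
var_dump(isset($tags['DESCRIPTION']));   // bool(false)
var_dump(isset($tags['geo_position']));  // bool(true)
```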

Example 2. What get_meta_tags() returns

// Assuming the above tags are at
$tags = get_meta_tags('');

// Notice how the keys are all lowercase now, and
// how . was replaced by _ in the key.
echo $tags['author']; // name
echo $tags['keywords']; // php documentation
echo $tags['description']; // a php manual
echo $tags['geo_position']; // 49.33;-86.59

Note: As of PHP 4.0.5, get_meta_tags() supports unquoted HTML attributes.
See also htmlentities() and urlencode().
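For the "save them into a database" part, here is a minimal sketch using PDO with an in-memory SQLite database. It assumes the pdo_sqlite extension is available; the meta_tags table and its columns are hypothetical names, and the array is hard-coded in place of a real get_meta_tags() call so the snippet runs on its own. Swap the DSN for whatever database you actually use.

```php
<?php
// Stand-in for the array returned by get_meta_tags().
$tags = [
    'author'       => 'name',
    'keywords'     => 'php documentation',
    'description'  => 'a php manual',
    'geo_position' => '49.33;-86.59',
];

// In-memory SQLite database (assumes pdo_sqlite is installed).
$pdo = new PDO('sqlite::memory:');
$pdo->setAttribute(PDO::ATTR_ERRMODE, PDO::ERRMODE_EXCEPTION);

// Hypothetical table: one row per meta tag.
$pdo->exec('CREATE TABLE meta_tags (name TEXT PRIMARY KEY, content TEXT NOT NULL)');

// A prepared statement keeps the tag values out of the SQL string.
$insert = $pdo->prepare('INSERT INTO meta_tags (name, content) VALUES (:name, :content)');
foreach ($tags as $name => $content) {
    $insert->execute([':name' => $name, ':content' => $content]);
}

// Read the rows back as a name => content array.
$rows = $pdo->query('SELECT name, content FROM meta_tags ORDER BY name')
            ->fetchAll(PDO::FETCH_KEY_PAIR);
print_r($rows);
```

The prepared statement matters here because meta tag content comes from someone else's page and should never be pasted directly into a query.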