String parsing


Etrai
New php-forum User
Posts: 4
Joined: Sun Oct 13, 2002 2:33 am
Location: Sweden

Postby Etrai » Sun Oct 13, 2002 2:46 am

I'm trying to create my own forum using PHP. My problem is posts: if someone posts something that includes HTML tags, the browser will render them. This could prove quite disruptive.

So I was wondering if anyone could point me in the right direction for isolating HTML tags in a string, i.e. show me a way to annihilate and/or replace disruptive tags?

Thanks in advance
// Etrai

DoppyNL

Postby DoppyNL » Sun Oct 13, 2002 2:58 am

You can use the htmlspecialchars() function:


$value = htmlspecialchars($value);

This way ANY HTML in $value is converted so that the browser treats it as plain text.
If a user enters "<a href=www.somelink.com>link!</a>" in a post, it will appear EXACTLY like that (no link is created).
For more details, see the manual.
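A small sketch of what that looks like, using the link from the example above. Passing ENT_QUOTES explicitly is a choice made here, not part of the original advice: it also escapes single and double quotes, and it avoids depending on the default flags, which changed in PHP 8.1.

```php
<?php
// The link from the example above, as a user might type it.
$post = '<a href=www.somelink.com>link!</a>';

// < and > become &lt; and &gt;, so the browser shows the text instead of a link.
$safe = htmlspecialchars($post, ENT_QUOTES);

echo $safe; // &lt;a href=www.somelink.com&gt;link!&lt;/a&gt;
```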

If you still want to give users the ability to format their text (bold, italic, whatever), as is possible on this forum, you could use tags such as [b] and [/b]. You will have to convert those yourself, but then you have full control over what is allowed and what isn't.

Greetz Daan

Etrai

Postby Etrai » Sun Oct 13, 2002 3:24 am

This is kind of what I had in mind. However, say that FOO posts the following:

</td>I'm trying to create my own forum using php. My problem is posts. If someone were to post something including HTML tags, they would be parsed by the browser. This could prove quite disruptive.<br><br>
So I wonder if anyone can point me in the right direction for isolating <i>HTML tags</i> in a string? I.e. show me a way to annihilate and/or replace disruptive tags.<td>

Since "<br>" and "<i></i>" are harmless, they should be rendered by the browser, but "<td></td>", which are very disruptive, shouldn't.
When using htmlspecialchars(), all tags are made ineffective.
What I'm actually looking for is to take out each "<whatever>" and/or "</whatever>" from the post and compare it to a list of valid options. If "<whatever>" is valid, leave it as it is; if not, run htmlspecialchars() on it or replace it with "".
I.e. I'm trying to "create" a reduced version of HTML.
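A minimal sketch of that "reduced HTML" idea, assuming a hypothetical whitelist of bare <b>, <i> and <br> tags (allow_basic_tags() is an illustrative name, not a built-in). It escapes everything first and then turns back only the escaped forms of the allowed tags, so a tag carrying attributes, such as <i onclick="...">, stays escaped. PHP's built-in strip_tags() with its allowed-tags parameter is an alternative, but it deletes disallowed tags instead of showing them, and the manual notes it leaves attributes on allowed tags untouched.

```php
<?php
// Hypothetical whitelist: only bare <b>, <i> and <br> (and their closing forms).
function allow_basic_tags(string $text): string
{
    // Step 1: neutralise every tag the user typed.
    $escaped = htmlspecialchars($text, ENT_QUOTES);

    // Step 2: restore only the escaped whitelisted tags, e.g. &lt;/i&gt; -> </i>.
    // Anything with attributes or another tag name does not match and stays escaped.
    return preg_replace('#&lt;(/?)(br|i|b)&gt;#i', '<$1$2>', $escaped);
}

echo allow_basic_tags('</td>Hi<br><i>x</i><td>');
// &lt;/td&gt;Hi<br><i>x</i>&lt;td&gt;
```

This does not check that tags are balanced, so an unclosed <i> would still italicise the rest of the page.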

DoppyNL

Postby DoppyNL » Sun Oct 13, 2002 3:34 am

I think it will be easier to do it as follows:

Use htmlspecialchars() to make sure every HTML tag is "disabled".
Then check the post for every [b] (for instance) and replace it with <b>. You can do that with any [tag] you can think of.
You can use str_replace() to replace those bracket tags.
It will be much more difficult to detect specific HTML tags and exclude them from htmlspecialchars() than to use [brackets].
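A minimal sketch of the escape-then-replace approach described above. The function name bbcode_to_html() and the tag list are hypothetical; str_replace() accepts arrays, so each [tag] in $search is replaced by the entry at the same position in $replace.

```php
<?php
// Hypothetical set of simple bracket tags; extend both arrays as needed.
function bbcode_to_html(string $text): string
{
    // 1. Disable all HTML the user typed.
    $text = htmlspecialchars($text, ENT_QUOTES);

    // 2. Turn the allowed bracket tags into real HTML.
    $search  = ['[b]', '[/b]', '[i]', '[/i]'];
    $replace = ['<b>', '</b>', '<i>', '</i>'];

    return str_replace($search, $replace, $text);
}

echo bbcode_to_html('[b]hi[/b] <td>'); // <b>hi</b> &lt;td&gt;
```

str_replace() is case-sensitive, so [B] is left alone; str_ireplace() would accept either case.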

Greetz Daan

Etrai

Postby Etrai » Sun Oct 13, 2002 3:37 am

Ok, THANKS!
I'll try that...and if it doesn't work, I'll ask something else later ;)

Etrai

Postby Etrai » Sun Oct 13, 2002 4:35 am

Damn, I'm getting annoying =)
When searching and replacing tags like [b] and [i], there is no problem. But if I need to replace something like [text color="#000000"], I run into a problem: str_replace() doesn't handle wildcards at all. Do you know of a possible solution?
(There is a function called fnmatch() which can match strings against wildcards, but it's only in the CVS version of PHP4.)

Jay

Postby Jay » Sun Oct 13, 2002 8:00 am

You'll want to try a regular expression function such as preg_replace(). Sorry I can't help you more with that; it's one area I still haven't mastered!
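Following that suggestion, here is a sketch using preg_replace() for the [text color="#000000"] tag. Two assumptions not in the thread: the closing tag is taken to be [/text], and the output is a <span> with an inline colour. Because the text is escaped first, the user's double quotes arrive as &quot;, so the pattern matches that form. Only a 6-digit hex colour is accepted, so nothing else can be injected into the style attribute.

```php
<?php
// Hypothetical helper; assumes the tag pair [text color="#RRGGBB"]...[/text].
function text_color_to_html(string $text): string
{
    // Disable raw HTML first; with ENT_QUOTES the " becomes &quot;.
    $text = htmlspecialchars($text, ENT_QUOTES);

    // ~ is the delimiter so # can appear literally; i = case-insensitive,
    // s = let the coloured text span several lines; .*? stops at the first [/text].
    return preg_replace(
        '~\[text color=&quot;(#[0-9a-f]{6})&quot;\](.*?)\[/text\]~is',
        '<span style="color:$1">$2</span>',
        $text
    );
}

echo text_color_to_html('[text color="#FF0000"]red[/text] <td>');
// <span style="color:#FF0000">red</span> &lt;td&gt;
```

Nested [text] tags are not handled; a value like color="red;x" simply stays as escaped text.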

