
4 Answers

8

My somewhat-friend Sean built something that I use for this purpose quite often. You can view the demo here: http://files.seancoates.com/lexentity/ He blogged about it here: http://seancoates.com/blogs/lexentity and you can grab the source here: https://github.com/scoates/lexentity

It might not meet your full language needs, but it's a start with English.

Answered 2012-12-06T19:53:34.890
2

You might be interested in tidy. It is bundled with PHP 5+ (all you need to use it is libtidy). It not only parses HTML but repairs it too.

With localization, though, you are on your own: intl does not seem to have any data about quotation marks, for example. At least I could not find any.

Answered 2012-12-10T23:12:06.570
2

As for quotes, read about the Q tag. For everything else I would use a BBCode library, because it would be really difficult to write an algorithm that tells apart the different dashes you need. BBCode lets the editor choose. Since that requires the editor to take an action, you may want to provide some kind of button for inserting special characters. For things that are easy to recognize, you just create new rules for the BBCode library, and if they have to be locale-aware, you create a different set of rules for each language. Obviously, inheritance in OOP would come in handy here.
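The "one rule set per language, sharing a base class" idea could look roughly like the sketch below. It is only an illustration in Python: the class names, the `typeset` helper, and the two rules are hypothetical and not part of any BBCode library, and the quote and dash choices per language are assumptions you would want to check against your own style guide. It also works on plain text only, so it should not be run over raw HTML markup.

```python
import re


class TypographyRules:
    """Hypothetical base rule set, using English conventions as defaults."""

    open_quote = "\u201c"   # left double quotation mark
    close_quote = "\u201d"  # right double quotation mark
    dash = "\u2014"         # em dash (assumed English default)

    def apply(self, text: str) -> str:
        # Easy-to-recognize rule 1: a balanced pair of straight double quotes.
        text = re.sub(
            r'"([^"]*)"',
            lambda m: f"{self.open_quote}{m.group(1)}{self.close_quote}",
            text,
        )
        # Easy-to-recognize rule 2: a double hyphen surrounded by spaces.
        return text.replace(" -- ", f" {self.dash} ")


class GermanRules(TypographyRules):
    # Only the locale-specific data is overridden; the rules are inherited.
    open_quote = "\u201e"   # low double quotation mark
    close_quote = "\u201c"  # left double quotation mark closes in German
    dash = "\u2013"         # en dash


class FrenchRules(TypographyRules):
    open_quote = "\u00ab\u00a0"   # guillemet followed by a no-break space
    close_quote = "\u00a0\u00bb"  # no-break space followed by a guillemet


RULES_BY_LANGUAGE = {"en": TypographyRules, "de": GermanRules, "fr": FrenchRules}


def typeset(text: str, language: str = "en") -> str:
    """Apply the rule set for `language`, falling back to the base rules."""
    rules = RULES_BY_LANGUAGE.get(language, TypographyRules)()
    return rules.apply(text)
```

Anything ambiguous (for example, which dash a single hyphen should become) is deliberately left out of the automatic rules; that is the part the editor would choose explicitly, e.g. through a button or tag.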

Answered 2012-12-10T23:44:49.537
2

As others have said, a regex-based solution could be dangerous/forbidden...

But if you have a lock-down on the kind of content you want to use this tool on (and it sounds like you do if the content is coming from your CMS), it sounds like an extension to the Perl program Demoroniser could take care of this for you: http://www.fourmilab.ch/webtools/demoroniser/
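To give a feel for the kind of fixed, table-driven substitution that is safe on locked-down content, here is a small Python sketch. It is not Demoroniser's code and the table is not taken from it; the function name and the table are hypothetical. It assumes the common case where Windows-1252 punctuation bytes were wrongly decoded as ISO 8859-1 and therefore appear as C1 control characters.

```python
# Windows-1252 punctuation bytes that turn into C1 control characters when
# decoded as ISO 8859-1, mapped to the characters that were intended.
CP1252_PUNCTUATION = {
    0x85: "\u2026",  # horizontal ellipsis
    0x91: "\u2018",  # left single quotation mark
    0x92: "\u2019",  # right single quotation mark
    0x93: "\u201c",  # left double quotation mark
    0x94: "\u201d",  # right double quotation mark
    0x96: "\u2013",  # en dash
    0x97: "\u2014",  # em dash
}


def repair_cp1252_punctuation(text: str) -> str:
    """Replace stray C1 control characters with the intended punctuation."""
    return text.translate(CP1252_PUNCTUATION)
```

Because every substitution is a single-character lookup, this cannot damage tags or attributes the way a loosely written regex can.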

Answered 2012-12-12T22:57:37.240