Friday, January 7, 2011

Expecting A Baby And Message

Pod::Simple::XHTML and Pod in UTF-8

I'm almost finished with the first round of my attempt to generate $foo as an epub. The results even look reasonably good. For the conversion I use Pod::Simple::XHTML, and the articles are written in Pod and saved as UTF-8.

I used the parser like this:

 my $parser = Pod::Simple::XHTML->new;
 $parser->parse_file('article.pod');

But the umlauts came out as "rubbish". The doctype and the charset specification in the meta tag were correctly set to UTF-8.

A look at the code of Pod::Simple::XHTML showed that the text is run through HTML::Entities::encode_entities. So something had to be going wrong there.

I found nothing about UTF-8 in the documentation of HTML::Entities. So I quickly wrote a test script and found that:

 use HTML::Entities;
 my $text;
 {
     local $/;
     open my $fh, '<', 'article.pod' or die $!;
     $text = <$fh>;
     close $fh;
 }
 print encode_entities($text);

produces wrong entities, whereas

 use HTML::Entities;
 my $text;
 {
     local $/;
     open my $fh, '<', 'article.pod' or die $!;
     binmode $fh, ':encoding(utf-8)';
     $text = <$fh>;
     close $fh;
 }
 print encode_entities($text);

works. So I have to use an I/O layer so that Perl reads the file as UTF-8 (I wrote about I/O layers here recently).
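To make the difference visible, here is a minimal sketch that uses only core modules. The helper name `slurp` and the temporary file are made up for illustration, and `:raw` stands in for "no decoding at all". Without the layer, Perl hands back the two bytes of the umlaut as two separate characters (U+00C3 and U+00A4, which in Latin-1 are Ã and ¤), and those are what encode_entities then gets to see. With the layer, there is a single character U+00E4.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Write a tiny UTF-8 file containing a single "ä" (U+00E4, bytes C3 A4).
my ($out, $path) = tempfile(UNLINK => 1);
binmode $out, ':raw';
print {$out} "\xC3\xA4";
close $out or die "Can't close $path: $!";

# Hypothetical helper: slurp a file through the given open mode.
sub slurp {
    my ($mode, $file) = @_;
    open my $fh, $mode, $file or die "Can't open $file: $!";
    local $/;                       # read the whole file at once
    my $content = <$fh>;
    close $fh;
    return $content;
}

# Format each character of a string as its code point.
sub code_points {
    my ($string) = @_;
    return join ' ', map { sprintf 'U+%04X', ord } split //, $string;
}

my $bytes = slurp('<:raw',             $path);  # undecoded bytes
my $chars = slurp('<:encoding(UTF-8)', $path);  # decoded characters

printf "without layer: %d chars (%s)\n", length($bytes), code_points($bytes);
printf "with layer:    %d chars (%s)\n", length($chars), code_points($chars);
```

Seen this way, the wrong entities are not surprising: encode_entities works on characters, and without the layer it simply gets the wrong ones.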

So the problem is not with HTML::Entities. But how can I tell Pod::Simple::XHTML that the text is UTF-8? I found nothing in the documentation, so I checked how parse_file opens the file. The result was

 {
     local *PODSOURCE;
     open(PODSOURCE, "<$source") || Carp::croak("Can't open $source: $!");
     $self->{'source_filename'} = $source;
     $source = *PODSOURCE{IO};
 }

So no I/O layers at all, and the 2-arg form of open (see also this blog post).

Pod::Simple's policy is that new releases stay backwards compatible with pre-5.8 Perls. I/O layers do not work there, so this is not going to change.

But Pod::Simple supports parse_file(*FILEHANDLE). So I changed my code a little, like this:

 open my $pod_fh, '<:encoding(utf-8)', 'article.pod' or die $!;
 my $parser = Pod::Simple::XHTML->new;
 $parser->parse_file($pod_fh);
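If you want to be sure that the handle you pass to parse_file really carries the encoding layer, the core function PerlIO::get_layers can tell you. The following is only a sketch: the helper name `has_encoding_layer` is made up, and an empty temporary file stands in for article.pod.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# An empty temporary file stands in for article.pod here.
my ($tmp, $path) = tempfile(UNLINK => 1);
close $tmp or die "Can't close $path: $!";

# Hypothetical helper: true if the handle has an :encoding(...) layer.
sub has_encoding_layer {
    my ($fh) = @_;
    return scalar grep { /^encoding\(/ } PerlIO::get_layers($fh);
}

open my $plain_fh, '<', $path
    or die "Can't open $path: $!";
open my $utf8_fh, '<:encoding(utf-8)', $path
    or die "Can't open $path: $!";

printf "plain handle: %s\n", has_encoding_layer($plain_fh) ? 'decodes' : 'raw bytes';
printf "utf-8 handle: %s\n", has_encoding_layer($utf8_fh)  ? 'decodes' : 'raw bytes';
```

Only the second handle is the kind I now hand to parse_file.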

