
I AM GROOT

Submitted by nk on Wed, 2014-09-10 05:36

Or, languages are really hard.

So I was handing over some CSV export functionality to a client who loaded it into Excel as is, without using the import wizard. As a result, Excel misinterpreted the UTF-8 as Windows-1252. I quickly wrote this little function to add a BOM (error handling omitted for brevity):

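<?php
function uconv($text) {
  // Descriptor 0 is the child's stdin, descriptor 1 its stdout.
  $descriptorspec = array(array("pipe", "r"), array("pipe", "w"));
  $process = proc_open("/usr/bin/uconv --add-signature", $descriptorspec, $pipes);
  // Feed the text in, then close stdin so uconv sees end of input.
  fwrite($pipes[0], $text);
  fclose($pipes[0]);
  // Read back the converted text.
  $text = stream_get_contents($pipes[1]);
  fclose($pipes[1]);
  proc_close($process);
  return $text;
}
?>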
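For reference, the UTF-8 signature is just the three bytes EF BB BF, so when the text is already known to be UTF-8 the same effect can probably be had without a subprocess. A minimal sketch under that assumption; add_utf8_bom() is a made-up name for illustration, not something from the export code:

```php
<?php
// Hypothetical helper: prepend the UTF-8 BOM (EF BB BF) unless it is
// already there. Assumes $text is valid UTF-8; no conversion is done.
function add_utf8_bom($text) {
  $bom = "\xEF\xBB\xBF";
  if (strncmp($text, $bom, 3) === 0) {
    // Already signed, leave it alone.
    return $text;
  }
  return $bom . $text;
}
```

Unlike uconv, this does not validate or convert anything, so it only fits when the encoding of the export is not in doubt.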

A quick test of the function showed it working, so I patched the CSV export to call it, deployed it on the dev server and... it died on the first accented character. I checked on the dev server from the command line and it worked. W.T.F. I compared the mbstring ini values: all the same. W.T.F., no, really, this can't be.

Well, there must be something different, right? What could it be? Locale? But what's a locale? Environment variables. Hrm, proc_open takes environment variables too. Well then, let's see whether my shell feeds something into this script that makes it work: env -i php x.php. It breaks! Yay! It's always such a relief when I can reproduce a bug that refuses to be reproduced. The solution is always easy after that: the LANG environment variable is en_US.utf8 in the shell, and C in Apache:

<?php
proc_open("/usr/bin/uconv --add-signature", $descriptorspec, $pipes, NULL, array('LANG' => 'en_US.utf8'));
?>
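To see that the fifth argument of proc_open really decides what the child process gets, something like the following can help. It is only a sketch: child_lang() is a hypothetical helper, and it assumes PHP_BINARY points at a usable command-line PHP, which should hold when the script runs under php-cli.

```php
<?php
// Sketch: report the LANG value a child process actually sees.
// Assumes PHP_BINARY is a usable CLI binary (true under php-cli).
// child_lang() is a hypothetical helper for illustration.
function child_lang(array $env) {
  // Only stdout is needed; the child just prints one value.
  $descriptorspec = array(1 => array("pipe", "w"));
  $cmd = escapeshellarg(PHP_BINARY) . " -r " . escapeshellarg('echo getenv("LANG");');
  // $env replaces the inherited environment for the child.
  $process = proc_open($cmd, $descriptorspec, $pipes, NULL, $env);
  if (!is_resource($process)) {
    return FALSE;
  }
  $lang = stream_get_contents($pipes[1]);
  fclose($pipes[1]);
  proc_close($process);
  return $lang;
}
```

Calling child_lang(array('LANG' => 'C')) and child_lang(array('LANG' => 'en_US.utf8')) side by side shows each child reporting exactly the value it was handed, whatever the parent's own LANG is.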

P.S. Curiously enough, -f utf-8 as a uconv argument didn't help, but -f utf-8 -t utf-8 did. Moral of the story: uconv takes the defaults for both its source and its target encoding from LANG. This is not documented and it's very hard to discover.
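The variant with both encodings spelled out might look roughly like this. It is a sketch rather than the code that went to the client: uconv_explicit() is a hypothetical name, the /usr/bin/uconv path is taken from the function above, and the NULL return for a missing binary is a bit of error handling added here.

```php
<?php
// Sketch: name both encodings so uconv does not have to consult LANG.
// Assumes uconv lives at /usr/bin/uconv as in the post; uconv_explicit()
// is a hypothetical name. Returns NULL when the binary is missing or the
// process cannot be started.
function uconv_explicit($text) {
  $bin = "/usr/bin/uconv";
  if (!is_executable($bin)) {
    return NULL;
  }
  $descriptorspec = array(array("pipe", "r"), array("pipe", "w"));
  $process = proc_open("$bin -f utf-8 -t utf-8 --add-signature", $descriptorspec, $pipes);
  if (!is_resource($process)) {
    return NULL;
  }
  fwrite($pipes[0], $text);
  fclose($pipes[0]);
  $out = stream_get_contents($pipes[1]);
  fclose($pipes[1]);
  proc_close($process);
  return $out;
}
```

The appeal of this form is that it does not depend on whatever environment Apache, cron or a shell happens to provide.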


Submitted by Anonymous on Wed, 2014-09-10 16:10.

Hi Károly,

In the uconv() function, the variable $pipes seems uninitialized, right?

Cheers,
Jeff

Submitted by nk on Thu, 2014-09-11 00:19.

You don't need to init a variable that is received by reference.
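A tiny illustration of that point, as a sketch: fill_pipes() is a made-up stand-in for the way proc_open() fills its third argument, which the PHP manual declares as a by-reference parameter.

```php
<?php
// Sketch: a by-reference parameter is created by the call itself, so the
// caller does not have to initialize it first. fill_pipes() is a made-up
// stand-in for how proc_open() fills its $pipes argument.
function fill_pipes(&$pipes) {
  $pipes = array("stdin", "stdout");
}

fill_pipes($handles);  // $handles was never assigned before this call
```

After the call $handles holds the array, and PHP raises no undefined-variable notice along the way.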