{"id":550,"date":"2011-03-26T16:13:46","date_gmt":"2011-03-26T15:13:46","guid":{"rendered":"https:\/\/d-mueller.de\/blog\/?p=550"},"modified":"2016-01-11T23:34:27","modified_gmt":"2016-01-11T22:34:27","slug":"parallel-processing-in-php","status":"publish","type":"post","link":"https:\/\/d-mueller.de\/blog\/parallel-processing-in-php\/","title":{"rendered":"Parallel processing in PHP"},"content":{"rendered":"<p>Since PHP does not offer native threads, we have to get creative to do parallel processing. I will introduce 3 fundamentally different concepts to emulate multithreading as well as possible.<\/p>\n<h2>Using system calls<\/h2>\n<p>If you have some basic Linux knowledge, you will know that a background process can be started by appending an ampersand to the command (on Windows, it&#8217;s the <a href=\"http:\/\/www.robvanderwoude.com\/ntstart.php\">start<\/a> command):<\/p>\n<pre data-enlighter-language=\"enlighter\" class=\"EnlighterJSRAW\">\r\ndav@david:\/var\/www$ php index.php &amp;\r\n[1] 3229\r\n<\/pre>\n<p>The PHP script is now running silently in the background. The number printed to the shell (<i>3229<\/i>) is the process ID, which lets us kill the process using<\/p>\n<pre data-enlighter-language=\"enlighter\" class=\"EnlighterJSRAW\">\r\nkill 3229\r\n<\/pre>\n<p>A problem with this approach is that the output of the script is not saved anywhere, so we have to redirect the output stream to a file, just like this:<\/p>\n<pre data-enlighter-language=\"enlighter\" class=\"EnlighterJSRAW\">\r\nphp index.php &gt; output.txt 2&gt;&amp;1 &amp;\r\n<\/pre>\n<p>The purpose of the scary <i>2>&#038;1<\/i> is to <b>redirect stderr to stdout<\/b>, so when your script produces any kind of PHP error, it will also end up in the output file. 
Putting everything together, we get<\/p>\n<pre data-enlighter-language=\"php\" class=\"EnlighterJSRAW\">\r\n$cmd = &quot;php script.php&quot;;\r\n\r\n$outputfile = &quot;\/var\/www\/files\/out.&quot;;\r\n$pidfile = &quot;\/var\/www\/files\/pid.&quot;;\r\n$process_count = 4; \/\/ example value: number of background processes to start\r\n\r\n\/\/ start each process in the background, redirect its output and store its pid\r\nfor ($i = 0; $i &lt; $process_count; $i++)\r\n    exec(sprintf(&quot;%s &gt; %s 2&gt;&amp;1 &amp; echo $! &gt;&gt; %s&quot;, $cmd, $outputfile.$i, $pidfile.$i));\r\n<\/pre>\n<p>Looks confusing, right? We&#8217;ve added <i>echo $! >> %s<\/i> to the command, so that the process ID of the background script gets written to a file. This is useful for keeping track of all running processes.<\/p>\n<p>If you want to kill all PHP processes, the following command will do (be aware that it also hits PHP processes that have nothing to do with your script):<\/p>\n<pre data-enlighter-language=\"enlighter\" class=\"EnlighterJSRAW\">\r\nkillall php\r\n<\/pre>\n<p>If you add the PHP <a href=\"http:\/\/en.wikipedia.org\/wiki\/Shebang_(Unix)\">shebang<\/a> <b>#!\/usr\/bin\/php<\/b> to the top of your script and make it executable using <b>chmod +x script.php<\/b>, the command can be changed to <b>.\/script.php<\/b> instead of <b>php script.php<\/b>.<\/p>\n<p>To check if a process is still running, you might use some variation of the <a href=\"http:\/\/linux.about.com\/od\/commands\/l\/blcmdl1_ps.htm\">ps command<\/a> as done here (stolen from <a href=\"http:\/\/www.incloud.de\">Steffen<\/a>):<\/p>\n<pre data-enlighter-language=\"php\" class=\"EnlighterJSRAW\">\r\nfunction is_running($pid)\r\n{\r\n\t$c = &quot;ps -A -o pid,s | grep &quot; . 
escapeshellarg($pid);\r\n\texec($c, $output);\r\n\r\n\tif (count($output) &amp;&amp; preg_match(&quot;~(\\d+)\\s+(\\w+)$~&quot;, trim($output[0]), $m))\r\n\t{\r\n\t\t$status = trim($m[2]);\r\n\t\t\/\/ D = uninterruptible sleep, R = running, S = sleeping\r\n\t\tif (in_array($status, array(&quot;D&quot;,&quot;R&quot;,&quot;S&quot;)))\r\n\t\t{\r\n\t\t\treturn true;\r\n\t\t}\r\n\t}\r\n\t\r\n\treturn false;\r\n}\r\n<\/pre>\n<\/p>\n<h2>Using fork()<\/h2>\n<p>Using the <a href=\"http:\/\/www.php.net\/manual\/en\/intro.pcntl.php\">pcntl<\/a> functions of PHP, you can <a href=\"http:\/\/en.wikipedia.org\/wiki\/Fork_(operating_system)\">fork<\/a> a process (<a href=\"http:\/\/php.net\/manual\/en\/function.pcntl-fork.php\">pcntl_fork<\/a>, not available on Windows). Before you get too excited, read the following quote from a comment on php.net that exactly reflects my experience with forking in PHP:<\/p>\n<blockquote><p>\nYou should be _very_ careful with using fork in scripts beyond academic examples, or rather just avoid it alltogether, unless you are very aware of it&#8217;s limitations.<br \/>\nThe problem is that it just forks the whole php process, including not only the state of the script, but also the internal state of any extensions loaded.<br \/>\nThis means that all memory is copied, but all file descriptors are shared among the parent and child processes.<br \/>\nAnd that can cause major havoc if some extension internally maintains file descriptors.<br \/>\nThe primary example is ofcourse mysql, but this could be any extensions that maintains open files or network sockets.\n<\/p><\/blockquote>\n<p>You have been warned! Look at the following example:<\/p>\n<pre data-enlighter-language=\"php\" class=\"EnlighterJSRAW\">\r\nfor ($i = 0; $i &lt; 4; $i++)\r\n{\r\n    pcntl_fork();\r\n}\r\n\r\necho &quot;hi there! pid: &quot; . getmypid() . 
&quot;\\n&quot;;\r\n<\/pre>\n<p><b>Output:<\/b><\/p>\n<pre data-enlighter-language=\"enlighter\" class=\"EnlighterJSRAW\">\r\ndav@david:\/var\/www$ php script.php\r\nhi there! pid: 3534\r\nhi there! pid: 3536\r\nhi there! pid: 3538\r\nhi there! pid: 3539\r\nhi there! pid: 3540\r\nhi there! pid: 3541\r\nhi there! pid: 3542\r\nhi there! pid: 3537\r\nhi there! pid: 3543\r\ndav@david:\/var\/www$ \r\nhi there! pid: 3544\r\nhi there! pid: 3545\r\nhi there! pid: 3546\r\nhi there! pid: 3548\r\nhi there! pid: 3547\r\nhi there! pid: 3549\r\nhi there! pid: 3550\r\n<\/pre>\n<p>As you can see, we get 2<sup><i>fork count<\/i><\/sup> processes (here 2<sup>4<\/sup> = 16), because every process that already exists calls pcntl_fork() again in each loop iteration. Somewhere in the middle of the output the shell prompt reappears: the original script has finished, but some forks are still running. It&#8217;s even possible to <a href=\"http:\/\/de3.php.net\/manual\/en\/function.pcntl-signal.php\">communicate<\/a> with processes that you forked. Forking is a very interesting area of computer science, but I don&#8217;t recommend using fork in real-world PHP applications.<\/p>\n<h2>Using curl<\/h2>\n<p>The last way to process multiple scripts in parallel is to abuse the web server and <a href=\"http:\/\/php.net\/manual\/de\/book.curl.php\">curl<\/a>. 
With curl, we are able to execute multiple requests in parallel (inspired by <a href=\"http:\/\/gonzalo123.wordpress.com\/2010\/10\/11\/speed-up-php-scripts-with-asynchronous-database-queries\/\">Gonzalo Ayuso<\/a>).<\/p>\n<pre data-enlighter-language=\"php\" class=\"EnlighterJSRAW\">\r\n$url = &quot;http:\/\/localhost\/calc.php&quot;;\r\n$mh = curl_multi_init();\r\n$handles = array();\r\n$process_count = 15;\r\n\r\n\/\/ create one handle per request and register it with the multi handle\r\nwhile ($process_count--)\r\n{\r\n    $ch = curl_init();\r\n    curl_setopt($ch, CURLOPT_URL, $url);\r\n    curl_setopt($ch, CURLOPT_HEADER, 0);\r\n    curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);\r\n    curl_setopt($ch, CURLOPT_TIMEOUT, 30);\r\n    curl_multi_add_handle($mh, $ch);\r\n    $handles[] = $ch;\r\n}\r\n\r\n$running=null;\r\n\r\n\/\/ drive all transfers until none of them is running anymore\r\ndo \r\n{\r\n    curl_multi_exec($mh, $running);\r\n    if ($running &gt; 0)\r\n        curl_multi_select($mh); \/\/ wait for activity instead of busy-looping\r\n} \r\nwhile ($running &gt; 0);\r\n\r\n\/\/ collect the response bodies\r\nfor($i = 0; $i &lt; count($handles); $i++) \r\n{\r\n    $out = curl_multi_getcontent($handles[$i]);\r\n    print $out . &quot;\\r\\n&quot;;\r\n    curl_multi_remove_handle($mh, $handles[$i]);\r\n}\r\n\r\ncurl_multi_close($mh);\r\n<\/pre>\n<p>Here, we call the script <i>calc.php<\/i> 15 times. The content of calc.php is:<\/p>\n<pre data-enlighter-language=\"php\" class=\"EnlighterJSRAW\">\r\n&lt;?php\r\necho &quot;my pid: &quot; . getmypid();\r\n?&gt;\r\n<\/pre>\n<p>The <b>output<\/b> is as follows:<\/p>\n<pre data-enlighter-language=\"enlighter\" class=\"EnlighterJSRAW\">\r\ndav@david:\/var\/www$ php script.php\r\nmy pid: 1401\r\nmy pid: 1399\r\nmy pid: 1399\r\nmy pid: 1403\r\nmy pid: 1403\r\nmy pid: 1398\r\nmy pid: 1398\r\nmy pid: 1402\r\nmy pid: 3767\r\nmy pid: 3768\r\nmy pid: 3769\r\nmy pid: 3772\r\nmy pid: 3771\r\nmy pid: 3773\r\nmy pid: 3770\r\n<\/pre>\n<p>Interestingly, the same process ID shows up several times, presumably because the web server reuses a worker process for more than one request. Keep in mind that every call is an HTTP request, so you lose some performance because the web server has to do work as well. 
Furthermore, the called script runs with the ordinary <i>php.ini<\/i>, and <b>not<\/b> with <i>php-cli.ini<\/i>.<\/p>\n<h2>What about the speed? Benchmarks!<\/h2>\n<p>What would you take away from this post if you didn&#8217;t know which parallel processing method is the fastest? I&#8217;ve written a little benchmark script using the 3 methods described above, did 3 runs and calculated the average. Basically, this is my benchmark script <b>calc.php<\/b>:<\/p>\n<pre data-enlighter-language=\"php\" class=\"EnlighterJSRAW\">\r\n$starttime = time();\r\n$duration = 10; \/\/ seconds to keep calculating\r\n\r\n\/\/ one result file per process, named after its pid\r\n$filename = &quot;\/var\/www\/results\/&quot; . getmypid() . &quot;.out&quot;;\r\n\r\n$loops = 0;\r\n\r\nwhile (true)\r\n{\r\n    for ($i = 0; $i &lt; 10000; $i++)\r\n    {\r\n        sqrt($i);\r\n    }\r\n    \r\n    $loops++;\r\n    \r\n    if ($starttime + $duration &lt;= time())\r\n        break;\r\n}\r\n\r\n\/\/ store the number of completed outer loops\r\nfile_put_contents($filename, $loops);\r\n<\/pre>\n<p>My system:<\/p>\n<pre data-enlighter-language=\"enlighter\" class=\"EnlighterJSRAW\">\r\nUbuntu 10.10 (Kernel 2.6.35-28)\r\n4 GB RAM\r\nIntel Core 2 Duo T7500 (2 * 2.2GHz)\r\n<\/pre>\n<p>I&#8217;m fully aware that this benchmark is in no way representative: writing the result files to the hard disk might influence other processes that are still running, and my time comparison may also be slightly inaccurate. Ah, before you ask: I haven&#8217;t used <a href=\"http:\/\/php.net\/manual\/de\/function.set-time-limit.php\">set_time_limit<\/a> because it sucks. <b>So bring on the results!<\/b><\/p>\n<pre data-enlighter-language=\"enlighter\" class=\"EnlighterJSRAW\">\r\nMethod Proc.  
Iterations\r\n\r\nexec     1    2183 \r\nexec     2    3953 \r\nexec     4    4283 \r\nexec     8    4378\r\nexec    16    4586\r\nexec    32    4868\r\n\r\ncurl     1    2203\r\ncurl     2    2843\r\ncurl     4    3029\r\ncurl     8    3556\r\ncurl    16    3986\r\ncurl    32    4373\r\n\r\nfork     1    2274\r\nfork     2    4299\r\nfork     4    4245\r\nfork     8    4309\r\nfork    16    4177\r\nfork    32    4577\r\n<\/pre>\n<p>As you can see, the total number of iterations mostly grows with the number of parallel processes. For <i>exec<\/i> and <i>fork<\/i>, the big jump happens between 1 and 2 processes, which is what you would expect on a dual-core CPU; beyond that, the gains are small. I haven&#8217;t tested 64 processes and more because my system almost froze (memory usage and CPU utilization). Feel free to interpret the results in any way you want, but in the end it boils down to the <i>exec<\/i> method, because <i>fork<\/i> is evil and <i>curl<\/i> is not a serious alternative.<\/p>\n<p>Finally, if you want to do some testing on your own, here is my benchmark file. Place it in the same folder as the <b>calc.php<\/b> from above, make the file executable and create a folder <i>results<\/i>. The file is invoked by using <i>.\/bench.php method processcount<\/i>, so possible calls are<\/p>\n<pre data-enlighter-language=\"enlighter\" class=\"EnlighterJSRAW\">\r\n.\/bench.php exec 16\r\n.\/bench.php curl 8\r\n.\/bench.php fork 32\r\n.\/bench.php -&gt; no parameter to display results\r\n<\/pre>\n<p><b>The file itself:<\/b><\/p>\n<pre data-enlighter-language=\"php\" class=\"EnlighterJSRAW\">\r\n#!\/usr\/bin\/php\r\n&lt;?php\r\n$mode = isset($argv[1]) ? $argv[1] : &quot;results&quot;;\r\n$process_count = isset($argv[2]) ? 
$argv[2] : 1;\r\n\r\n\/\/cleanup\r\nif ($mode != &quot;results&quot; &amp;&amp; count(glob(&quot;\/var\/www\/results\/*&quot;)))\r\n{\r\n    exec(&quot;rm \/var\/www\/results\/*&quot;);\r\n}\r\n\r\nif ($mode == &quot;exec&quot;)\r\n{\r\n    $cmd = &quot;php calc.php&quot;;\r\n\r\n    $outputfile = &quot;\/var\/www\/results\/out.&quot;;\r\n    $pidfile = &quot;\/var\/www\/results\/pid.&quot;;\r\n\r\n    for ($i = 0; $i &lt; $process_count; $i++)\r\n        exec(sprintf(&quot;%s &gt; %s 2&gt;&amp;1 &amp; echo $! &gt;&gt; %s&quot;, $cmd, $outputfile.$i, $pidfile.$i));\r\n}\r\nelseif ($mode == &quot;curl&quot;)\r\n{\r\n    $url = &quot;http:\/\/localhost\/calc.php&quot;;\r\n    $mh = curl_multi_init();\r\n    \r\n    while ($process_count--)\r\n    {\r\n        $ch = curl_init();\r\n        curl_setopt($ch, CURLOPT_URL, $url);\r\n        curl_setopt($ch, CURLOPT_HEADER, 0);\r\n        curl_setopt($ch, CURLOPT_NOBODY, true);\r\n        curl_setopt($ch, CURLOPT_RETURNTRANSFER, false);\r\n        curl_setopt($ch, CURLOPT_TIMEOUT, 30);\r\n        curl_multi_add_handle($mh, $ch);\r\n    }\r\n    \r\n    $running=null;\r\n    \r\n    do \r\n    {\r\n        curl_multi_exec($mh, $running);\r\n    } \r\n    while ($running &gt; 0);\r\n}\r\nelseif ($mode == &quot;fork&quot;)\r\n{\r\n    for ($i = 0; $i &lt; log($process_count, 2); $i++)\r\n    {\r\n        pcntl_fork();\r\n    }\r\n    \r\n    include &quot;calc.php&quot;;\r\n}\r\nelse\r\n{\r\n    $total = 0;\r\n\r\n    foreach (glob(&quot;\/var\/www\/results\/*.out&quot;) as $f)\r\n    {\r\n        $runtime = file_get_contents($f);\r\n        $total += $runtime;\r\n        echo $runtime . &quot;\\r\\n&quot;;\r\n    }\r\n\r\n    echo &quot;Total: &quot; . $total . &quot;\\r\\n&quot;;\r\n}\r\n<\/pre><\/p>\n","protected":false},"excerpt":{"rendered":"<p>Since PHP does not offer native threads, we have to get creative to do parallel processing. 
I will introduce 3 fundamentally different concepts to emulate multithreading as well as possible. Using system calls If you have some basic Linux knowledge, you &hellip; <a href=\"https:\/\/d-mueller.de\/blog\/parallel-processing-in-php\/\">Continue reading <span class=\"meta-nav\">&rarr;<\/span><\/a><\/p>\n","protected":false},"author":1,"featured_media":0,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[4,10,3,7],"tags":[],"class_list":["post-550","post","type-post","status-publish","format-standard","hentry","category-php","category-performance","category-webdev","category-linux"],"_links":{"self":[{"href":"https:\/\/d-mueller.de\/blog\/wp-json\/wp\/v2\/posts\/550","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/d-mueller.de\/blog\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/d-mueller.de\/blog\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/d-mueller.de\/blog\/wp-json\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/d-mueller.de\/blog\/wp-json\/wp\/v2\/comments?post=550"}],"version-history":[{"count":0,"href":"https:\/\/d-mueller.de\/blog\/wp-json\/wp\/v2\/posts\/550\/revisions"}],"wp:attachment":[{"href":"https:\/\/d-mueller.de\/blog\/wp-json\/wp\/v2\/media?parent=550"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/d-mueller.de\/blog\/wp-json\/wp\/v2\/categories?post=550"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/d-mueller.de\/blog\/wp-json\/wp\/v2\/tags?post=550"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}