session_regenerate_id() old data not copied to new session ID

In PHP I’m using MySQL with a Memcached backend to store session information. Every few minutes I need to regenerate the session ID to prevent replay attacks. However, when using session_regenerate_id(true), session data from the old ID is sometimes not copied to the newly generated ID. As a result, every time the session is regenerated, I get logged out of the web app if I was logged in.

To fix it, I need to regenerate the ID without deleting the old data, flush the session data to the database, then stop and restart the session with the new ID:

session_regenerate_id();
$new_sess_id = session_id();

// mark the data as changed so the session handler writes it to backend storage
$this->data_changed = true; 

// this will call the write() function to save session data to backend
session_write_close(); 

session_id($new_sess_id);
session_start();
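
The "every few minutes" part can be decided with a small check before running the regeneration code above. This is only a sketch: the function name, the 'regenerated_at' session key and the 300 second default are assumptions for illustration, not part of PHP or of my session handler.

```php
<?php
// Hypothetical helper: reports whether the session ID is due for regeneration.
// 'regenerated_at' is an assumed key holding the time of the last regeneration.
function session_id_is_stale(array $session, $now, $interval = 300)
{
    if (!isset($session['regenerated_at'])) {
        return true; // never regenerated yet
    }
    return ($now - $session['regenerated_at']) >= $interval;
}

// Intended use inside a running session (commented out so the sketch
// runs without an active session):
// if (session_id_is_stale($_SESSION, time())) {
//     // ... regenerate the ID as shown above ...
//     $_SESSION['regenerated_at'] = time();
// }
```

Storing the timestamp in the session itself means it travels along with the rest of the data when the ID changes.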

New PHP

PHP is an amazing server-side language for building web applications. The ability to embed executable code in text content is really practical for generating web pages, which makes it the most convenient language for creating dynamic websites. It’s also easy for beginners to learn.

However, the language’s inconsistencies in naming, function output, default behavior and so on are a real PITA for seasoned developers. One inconsistency I’d like to address here is the notation syntax. Namespace declarations, variable names, method and property accessors and arrays all use different notation. Some use a backslash, some an arrow, some a double colon, as a result of borrowing syntax from many other languages.

Therefore I hope PHP 6 brings some groundbreaking changes to the syntax. In my opinion, the way to a much cleaner syntax is to use just one format: dot notation. Here’s the new syntax I suggest, together with some new features that I think would be useful in a new PHP.

Current:

// namespace
namespace org\example\system;

use org\example\system\base;
use org\example\storage;
use org\example\collection\Dictionary;

// class
class Application extends base\Application {

  private static $mode = 0;
  private $id = 0;
  private $db;

  public function __construct() {

    // method / property accessor
    $this->id = 1;
    $this->run();

    // namespaced class
    $this->db = new storage\Database();
    $collection = new Dictionary(array('a','b'));

    // array
    $arr = array(
      'a' => 'A'
    );

    // array php 5.4
    $arr = [
      'b' => 'B'
    ];

    // array operation
    array_pop($arr);
    array_push($arr, 'c');

    // strings
    $str = 'name';
    $n = $str[0];
    $lower = strtolower($str);

    // include files
    require '../lib/functions.php';
    require_once '../lib/functions.php';
    include '../lib/functions.php';
    include_once '../lib/functions.php';

    // static method / property accessor
    $mode = self::$mode;
    $dict = Dictionary::create();

  }

  public function run() {

    // parent class accessor
    parent::run();

  }

}

New PHP:

// namespace
namespace org.example.system;

use org.example.system.base;
use org.example.storage;
use org.example.collection.Dictionary;

// class
class Application extends base.Application {

  private static mode = 0;
  private id = 0;
  private db;

  public function __construct() {

    // method / property accessor
    this.id = 1;
    this.run();

    // namespaced class
    this.db = new storage.Database();
    collection = new Dictionary(['a', 'b']);

    // array can be treated as mixed of hashed map,
    // dictionary, list or set
    // array is now an object type
    arr = [
      'b' : 'B'
    ];

    // array operation
    arr.pop();
    arr.push('c');
    arr[0];
    arr[0:2];
    arr[:2];

    // string is now object type,
    // not treated like an array
    str = 'name';
    str.charAt(0);
    str.substr(0, 1);
    str.toLower();

    // include files, use 'use' statement
    // all diff statement behavior will use 'include' behavior
    // use full path

    use com.example.lib.utils;
    use com.example.lib.functions as fn;

    // class alias
    fn.log('message');

    // static method / property accessor
    mode = self.mode;
    dict = Dictionary.create();

  }

  public function run() {

    // parent class accessor
    parent.run();

  }

}

Sure, it looks like Java (or JavaScript, or Python), and there’s nothing wrong with that. As long as the primary feature, embeddable code, remains intact, PHP will always be the main choice for web app development.

OOP the right way

Most tutorials on object-oriented programming tell the reader how to do it rather than why we have to write it that way. This post attempts to explain why we write object-oriented code the way we do.

Why object oriented? Object-oriented programming is a way to write an application as components in order to reduce code repetition, improve maintainability (components can be unit tested) & let developers focus on specific parts of the app (separation of concerns). Therefore, an app must be structured into components (a component-based application).

When explaining classes & objects in OOP, it’s better to use a real-world example. Here I’ll take the cache component of a web application. The basic functionality of a cache component is to store app-level data in temporary storage & retrieve it quickly. We can assume the basic class of a cache component looks like this:

class Cache {
	public function get($key) {}
	public function set($key, $data, $expired) {}
	public function delete($key) {}
}

Notice we’re using public methods. Public methods expose the functionality of a component for other components to use. This is the basis of designing an API: we’re defining how to use a component, what input it receives & what output it’s expected to produce.

class Cache {
	/**
	 * @param string $key
	 * @return array|false
	 */
	public function get($key) {}
	/**
	 * @param string $key
	 * @param mixed $data
	 * @param string|int $expired
	 */
	public function set($key, $data, $expired) {}

	// ...
}

How about the implementation? We know that a cache component can use many types of backend, such as file-based, SQLite, Memcached, Redis etc. This is where inheritance is useful. Inheritance is used either to implement methods for a specific condition or to change the way data is processed (override).

class Cache {
	// ... public methods ...

	protected function purge() {}
	protected function getExpirationTime() {}
	protected function formatInputData() {}
	protected function formatOutputData() {}
}

Here we are using protected methods. Protected is used for methods that are not going to be used by other components: these methods are going to be overridden by child classes & used within the class itself. Let’s take a look at some example child class implementations:

// using sqlite backend
class SQLiteCache extends Cache {
	private $connection;
	public function __construct() {
		// Cache declares no constructor, so there is no parent::__construct() to call
		$this->connect();
	}
	private function connect() {
		// connect to database
		$this->connection = new PDO('sqlite::memory:');
	}
}

// using memcached backend
class MemcachedCache extends Cache {}

// using file-based backend
class FileCache extends Cache {
	// expose specific hidden methods to let extended class
	// to manipulate the data based on its backend
	protected function formatInputData() {
		if (!is_string($this->data)) {
			$this->data = serialize($this->data);
		}
	}
}

The SQLiteCache class has its own connect() method that is used in this class only. That’s why it’s declared private. We may have a connect() method in MemcachedCache to connect to a Memcached server, but the implementation is totally different from SQLiteCache, so we can write those methods separately without needing a parent class. Only create a parent class to combine child classes that have similar code (to reduce code repetition).

In FileCache, we’re implementing formatInputData() because we can’t store PHP variables other than strings in a file, so we serialize() them. This is the appropriate way to implement the hook pattern (as in WordPress). Since this method is declared in our base Cache class, it can be called before storing the data in the cache store, and each type of backend can manipulate the data format to suit its own requirements.
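
To make the hook idea concrete, here is a minimal sketch of a base class whose public set() and get() call the protected hooks. The class names HookedCache and SerializingCache and the in-memory $store array are assumptions for illustration only (a stand-in for a real backend). Unlike FileCache above, this child serializes every value so that the read side can always unserialize.

```php
<?php
// Hypothetical base class: the public API calls the protected hooks,
// child classes only override the hooks they care about.
abstract class HookedCache
{
    protected $data;
    protected $store = array(); // in-memory stand-in for a real backend

    public function set($key, $data)
    {
        $this->data = $data;
        $this->formatInputData();            // hook runs before storing
        $this->store[$key] = $this->data;
    }

    public function get($key)
    {
        if (!array_key_exists($key, $this->store)) {
            return false;
        }
        $this->data = $this->store[$key];
        $this->formatOutputData();           // hook runs after reading
        return $this->data;
    }

    // default hooks do nothing
    protected function formatInputData() {}
    protected function formatOutputData() {}
}

// Hypothetical child for a backend that can only hold strings
class SerializingCache extends HookedCache
{
    protected function formatInputData()
    {
        $this->data = serialize($this->data);
    }

    protected function formatOutputData()
    {
        $this->data = unserialize($this->data);
    }
}
```

Other components only ever call set() and get(); they never need to know which hooks a particular backend overrides.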

Let’s say this code is packaged in a module released by another developer and we’re not satisfied with the MemcachedCache implementation. We can extend that class and write our own implementation. This way we’re not touching the module code, which is important because we don’t have to hack the module code again every time we update it.

class BetterMemcachedCache extends MemcachedCache {
	protected function getExpirationTime() {
		// return time-to-live instead of expiration timestamp
		return strtotime($this->expired, 0);
	}
}
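
The strtotime() call with a base timestamp of 0 is what turns a relative string into a time-to-live. A small standalone sketch of that trick, assuming the expiry is given as a relative string such as '+1 hour' (the function name expiry_to_ttl is made up for this example):

```php
<?php
date_default_timezone_set('UTC');

// Hypothetical helper: turns a relative expiry string into a TTL in seconds.
function expiry_to_ttl($expired)
{
    // with a base timestamp of 0, the result is the length of the interval itself
    return strtotime($expired, 0);
}
```

For example, expiry_to_ttl('+1 hour') gives 3600, while strtotime('+1 hour') without the second argument would give an absolute timestamp one hour from now.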

Summary: Write OOP code in components. Public methods are the API of a component, protected methods are for specific implementations & private methods are for use within that class only. In general, we may have a lot of protected methods (to achieve the hook pattern) and a few private methods (only to group repetitious code).

Singleton class

This is a continuation of the singleton pattern post.

Here’s the base class for the singleton pattern. Other classes that need to be singletons just need to extend this class.

class singleton {
    private static $instance;
    private function __construct() {}
    private function __clone() {}
    protected static function init($class) {
        if (!isset(self::$instance[$class]) ||
            !self::$instance[$class] instanceof $class) {
            self::$instance[$class] = new $class();
        }
        return self::$instance[$class];
    }
}

We can’t directly use self::$instance = new self(); because self refers to the singleton class itself, not the class that extends it. So the extending class needs to explicitly pass its class name to the parent class’s init().

class app extends singleton {
    var $i = 0;
    static function o() {
        return parent::init(__CLASS__);
    }
    function run() {
        $this->i++;
        echo 'run app...';
    }
}
// to access app class and its methods & attr
app::o()->run();
echo app::o()->i;
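
Since PHP 5.3, late static binding gives the base class a way to find out which class was actually called, so the child no longer has to pass __CLASS__. The following is an alternative sketch, not the class above: the names LateBoundSingleton, Counter and Logger are made up for illustration.

```php
<?php
// Alternative sketch using late static binding (PHP 5.3+)
abstract class LateBoundSingleton
{
    private static $instances = array();

    // protected so that "new static()" below can still reach it
    protected function __construct() {}
    private function __clone() {}

    public static function o()
    {
        // name of the class the static call was made on, not this base class
        $class = get_called_class();
        if (!isset(self::$instances[$class])) {
            self::$instances[$class] = new static();
        }
        return self::$instances[$class];
    }
}

class Counter extends LateBoundSingleton
{
    public $i = 0;

    public function run()
    {
        $this->i++;
    }
}

class Logger extends LateBoundSingleton {}
```

Counter::o()->run(); works the same way as app::o()->run(); but the child classes contain no singleton code at all.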

Filter nudity with PHP

nude.js is a JS-based nudity scanner using HTML5 Canvas and Web Workers. The algorithm is based on this research paper: http://www.math.admu.edu.ph/~raf/pcsc05/proceedings/AI4.pdf.

I’ve been working on porting the script to PHP using the GD library. So, here’s the result: php-nudity-filter. It’s a direct port of the nude.js script to PHP in which I kept the data structures, functions and algorithm, so its performance is not well optimized for PHP. Scanning a 500×500 image takes around 8-10 seconds.

But overall it’s working as expected: it can detect nude pictures at a rate similar to nude.js. Several steps are still incomplete (esp. the bounding polygon), as is optimization. Fork it on GitHub, improve it and make the internet a better place.

PHP timezone handling

Within a PHP app, we need to set the timezone to only one timezone: UTC. All timestamp data going in and out of the database must use the UTC timezone so that it’s easier to convert to other timezones. The first step in a PHP script is to set the default timezone:

date_default_timezone_set('UTC');

Then it is highly encouraged to store all datetime-related data as UNIX timestamps, since retrieving them from the database is faster than retrieving formatted datetimes, they are easy to format using the date() function, and they are easy to convert from one timezone to another using the function below:

timezone_offset_get(new DateTimeZone($timezone), new DateTime());

The $timezone value is one of the timezone identifiers listed in the List of Supported Timezones at php.net. The function returns the offset from UTC in seconds for the DateTime passed in, with any DST in effect at that moment already included.

To summarize, here’s the correct usage and handling of timezones in PHP:

  1. Always set default timezone to UTC, and store user specific timezone info in database or in cookies
  2. Store and retrieve timestamp in UTC timezone
  3. Only convert to local timezone when displaying the timestamp info

// set this early in the script
date_default_timezone_set('UTC');

// data retrieved from database is based on UTC timezone
$timestamp = 1310529794;

// and you're in Los Angeles
$timezone = 'America/Los_Angeles';

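// show the formatted datetime for time in L.A.
// (the offset is taken at $timestamp itself, so the DST in effect on that date is applied)
echo date('F j Y, g:i:s a', $timestamp + timezone_offset_get(new DateTimeZone($timezone), new DateTime('@' . $timestamp)));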
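
The same conversion can also be written with the DateTime class, which avoids adding the offset by hand. This is an alternative sketch rather than the method above, and the function name format_in_timezone is made up for the example.

```php
<?php
date_default_timezone_set('UTC');

// Hypothetical helper: formats a UTC UNIX timestamp in the given timezone.
function format_in_timezone($timestamp, $timezone, $format = 'F j Y, g:i:s a')
{
    // the '@' prefix tells DateTime the value is a UNIX timestamp (always UTC)
    $dt = new DateTime('@' . $timestamp);
    // switch the display timezone; DST for that date is applied automatically
    $dt->setTimezone(new DateTimeZone($timezone));
    return $dt->format($format);
}
```

format_in_timezone(1310529794, 'America/Los_Angeles') returns "July 12 2011, 9:03:14 pm", since the timestamp is 04:03:14 UTC on July 13 and Los Angeles is 7 hours behind UTC in July.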

PHP async request with auth

http://stackoverflow.com/questions/962915/how-do-i-make-an-asynchronous-get-request-in-php shows how to send an asynchronous request from a PHP script, so that the web page is rendered immediately without waiting for the request to finish.

This technique is important if you want to implement background processing in a web app: send a request to your own script that runs the process in the background, while the main script continues to run and produce the web page.

However, the background process page can also be accessed directly, just by entering the correct URL in a browser, and you want to make sure this page is not abused by users.

When sending the request, we can modify the request headers as an authentication method to check whether the request really came from our own web application. You may modify the User-Agent header and set it to your own web app’s UA,

'User-Agent: my-web-app', and in your script, check the value of $_SERVER['HTTP_USER_AGENT'].

Or, for more security, use a custom header name, as follows:

$host = 'localhost';
$path = '/bg.php';
$qs = array(); // query string
if (!empty($qs)) {
    $qs = http_build_query($qs);
    $path = $path .'?'. $qs;
}
$fp = fsockopen($host, 80, $errno, $errdesc);
if ($fp) {
    $req  = "GET $path HTTP/1.0\r\n";
    $req .= "Host: $host\r\n";
    // a GET request has no body, so no Content-Type/Content-Length headers are sent
    $req .= "Anxx0Wjoiw3: asmkd3A0das2wq2\r\n";
    // the extra \r\n (blank line) marks the end of the request headers
    $req .= "Connection: Close\r\n\r\n";
    fputs($fp, $req);
    fclose($fp);
}

And in the background process script, check if the custom header is correct:

if (isset($_SERVER['HTTP_ANXX0WJOIW3']) && $_SERVER['HTTP_ANXX0WJOIW3'] == 'asmkd3A0das2wq2') {
    // do background process
}

Note: Custom headers (and other request headers) can be accessed through the $_SERVER variable, with the key uppercased, hyphens replaced by underscores and prefixed with 'HTTP_'.

In addition, use a hashed value for the header name and value, and change the value every few hours by generating it automatically with mktime() and strtotime(), provided you secure the hashed string with salt data. For example:

$salt = 'secret-key';
$hash = md5($salt . mktime(date('H', strtotime('+6 hour')), 0, 0));
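
A variation on the same idea, sketched here with hash_hmac() and hash_equals() (PHP 5.6+) instead of md5(): the token is derived from a fixed six-hour window, and the checker also accepts the previous window so a request sent just before the boundary still passes. The function names and the 21600 second window are assumptions for illustration.

```php
<?php
// Hypothetical helper: token that stays the same within one time window.
function make_request_token($secret, $now, $window = 21600)
{
    // index of the current window, e.g. one window = 6 hours
    $index = (int) floor($now / $window);
    return hash_hmac('sha256', (string) $index, $secret);
}

// Hypothetical helper: accepts tokens from the current or the previous window.
function is_valid_request_token($token, $secret, $now, $window = 21600)
{
    foreach (array(0, 1) as $age) {
        $expected = make_request_token($secret, $now - $age * $window, $window);
        // hash_equals() compares in constant time
        if (hash_equals($expected, $token)) {
            return true;
        }
    }
    return false;
}
```

The sending script would put make_request_token('secret-key', time()) in the custom header, and the background script would pass the received header value to is_valid_request_token().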

PHP Templating

Here’s a simplified template class inspired by PHP Tip: Extract, Variable Variables and Templating. It supports template inheritance, it’s object oriented & it auto-escapes variables. Furthermore, it doesn’t need to be compiled, since it uses plain PHP.

The class: tpl.class.php

class tpl {

    var $file;
    var $folder = 'template';
    var $vars;

    function __construct($file) {
        $this->file = $file;
    }

    /**
     * Assign variables to class attributes
     */
    function assign($name, $value) {
        // need to render child template
        if ($value instanceof self) {
            ob_start();
            foreach ($this->vars as $k => $v) {
                if (is_scalar($v) || is_array($v)) {
                    // copy variables to child template
                    $value->assign($k, $v);
                }
            }
            $value->render();
            $html = ob_get_contents();
            ob_end_clean();
            // assign output HTML to parent template variable
            $this->vars[$name] =& $html;
        } else {
            $this->vars[$name] =& $value;
        }
    }

    /**
     * Echo variables and auto escape HTML for string vars
     */
    function e($name) {
        if (is_string($this->vars[$name])) {
            echo htmlspecialchars($this->vars[$name], ENT_QUOTES, 'UTF-8');
        } else {
            echo $this->vars[$name];
        }
    }

    /**
     * Display the main template (usually master/layout template)
     * and set header
     */
    function display() {
        if (!headers_sent()) {
            header('Content-type: text/html; charset=utf-8');
        }
        $this->render();
    }

    /**
     * Include template file
     */
    private function render() {
        // plain require: require_once would skip a template that is rendered more than once
        require dirname(__FILE__) .'/'. $this->folder .'/'. $this->file .'.php';
    }
}

This class is just a basic class that organizes the template files. It can easily be applied to your page controller script so that you can separate the presentation layer from the business logic.

To use it, here are some template files that make up 3 inheritance levels.

Top level: layout.tpl.php

<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN"
    "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">
<html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en" lang="en">

<head>
    <title><?php echo $this->vars['title']; ?></title>
    <meta http-equiv="content-type" content="text/html;charset=utf-8" />
</head>

<body>
    <?php echo $this->vars['body']; ?>
</body>

</html>

Middle level: body.tpl.php

<h2><?php $this->e('blog_title'); ?></h2>
<p><?php $this->e('blog_content'); ?></p>
<hr />
<?php echo $this->vars['portlet']; ?>

Bottom level: portlet.tpl.php

<div style="width: 120px; border: 1px solid #ccc; background: #eee; text-align: center; padding: 10px;">
    <span style="color: #333; font-size: 11px; font-family: sans-serif">This is portlet<br/><?php $this->e('portlet_name'); ?></span>
</div>

Then in your page controller code, add this to the bottom of the file to render the template & output:

require_once 'tpl.class.php';

$tpl = new tpl('layout.tpl');
// assign all variables first before display the template
$tpl->assign('title', 'New Test Template');
$tpl->assign('blog_title', 'My First Post');
$tpl->assign('blog_content', 'Lorem ipsum dolor sit amet');
$tpl->assign('portlet_name', 'My lil\' portlet');
$tpl->assign('portlet', new tpl('portlet.tpl')); // this variable needed by body.tpl, so assign it first
$tpl->assign('body', new tpl('body.tpl'));
$tpl->display();

The order of assigning the variables is important, because you cannot echo or access variables that are not yet assigned.

This class works as it is, and you may add more features to it: template caching, setting custom headers to support other output types (such as RSS, XML etc.), data formatting functions, language translation support, gzip support etc.
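
Features such as template caching need the rendered output as a string instead of having it echoed directly. Here is a minimal standalone sketch of that step using extract() and output buffering. The function name render_template_to_string is an assumption, and it is a plain function rather than a method of the tpl class, so $this is not available inside the template.

```php
<?php
// Hypothetical helper: renders a plain PHP template file and returns the HTML.
function render_template_to_string($path, array $vars)
{
    // turn array keys into local variables for the template;
    // EXTR_SKIP keeps $path and $vars from being overwritten
    extract($vars, EXTR_SKIP);
    ob_start();
    require $path;
    // return the buffered output and stop buffering
    return ob_get_clean();
}
```

The returned string can then be saved to a cache store, compressed or echoed as usual.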

Singleton pattern

A singleton class is a class that allows only one instance of itself to be instantiated.

In a lot of the examples I’ve seen, the singleton pattern is implemented in one of these ways:

1. Extend a base singleton class
2. Each class applies the singleton pattern itself (a get_instance() method and a static $instance attribute)
3. Use the registry pattern, where one dedicated class acts as the manager of singleton instances

Well, things shouldn’t be that hard. Here, I will combine the registry pattern and singleton in just one function:

function o($class) {
    static $instances;
    if (!isset($instances[$class]) || !$instances[$class] instanceof $class) {
        $instances[$class] = new $class();
    }
    return $instances[$class];
}

To use it:

o('db')->query(...);
o('db')->fetch();
o('singleton_class')->do_something();

PHP Hooks System

After reading Explaining Hooks, I finally understand the concept of hooks in PHP, why people use them in WordPress and why they say WP code is poetry.

The general idea is that a web application goes through stages of processing at runtime, such as connecting to the database, starting the session, rendering the template etc. These are known as events. When these events occur, some external code can be run as additional processing on top of the core program. These external/additional processes are known as plugins.

So, the hooks system exposes these events for plugins to attach to, so that when an event occurs, the plugin is run. Even though the concept seems simple, there are problems that we need to handle:

Which plugin should be called first when an event occurs? The hooks system needs a priority feature to make sure plugins are called in the correct order to produce the intended result.

Which plugins need to be loaded? If we load them all, wouldn’t that affect site performance? This is why plugins need to be registered with the application’s plugin system, so that the core application knows which plugins to load at what time and which functions to run.

Therefore, the hooks system needs a plugin registration section, a priority section, a list of the hook events available and a way to handle unknown events. The plugin data can be stored in the database, with the configuration data kept temporarily in cache for faster access.
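
A minimal in-memory sketch of the registration, priority and unknown-event parts could look like this. The class name HookRegistry and its method names are made up for illustration, a lower priority number is assumed to run first, and database storage and caching are left out.

```php
<?php
// Hypothetical sketch of a hooks registry with priorities
class HookRegistry
{
    // $hooks[event][priority][] = callback
    private $hooks = array();

    // attach a plugin callback to an event
    public function register($event, $callback, $priority = 10)
    {
        $this->hooks[$event][$priority][] = $callback;
    }

    // run every callback attached to the event, lowest priority number first;
    // each callback receives the value returned by the previous one
    public function trigger($event, $value = null)
    {
        if (empty($this->hooks[$event])) {
            return $value; // unknown event or no plugins: pass the value through
        }
        ksort($this->hooks[$event]);
        foreach ($this->hooks[$event] as $callbacks) {
            foreach ($callbacks as $callback) {
                $value = call_user_func($callback, $value);
            }
        }
        return $value;
    }
}
```

The core application would call trigger() at each stage (for example when rendering the template), and plugins would call register() when they are loaded.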