PHP Data Validation

The WordPress data validation documentation is a good reference, especially the philosophy part. As a best practice, always correct the format first, then validate against a whitelist.

// format correction
$action = (int) $_GET['action'];
// whitelist
switch ($action) {
    case 1:
        do_this();
        break;
    case 2:
        do_that();
        break;
    case 0:
    default:
        die("Don't know this action!");
}

However, the above example only handles one input ($_GET['action']). What if your web app (or controller) needs several inputs? Surely you don't want to write the same code over and over, like this:

// correct the format
$id = (int) $_GET['id'];
$name = (string) trim($_POST['name']);
$email = (string) trim($_POST['email']);
$about = (string) htmlspecialchars(trim($_POST['about']), ENT_QUOTES, 'UTF-8');
// then do validation
if ($id == 0) {
    $id = 1;
}
if (empty($name)) {
    $msg = 'Name is required';
}
...

Most PHP frameworks I know either validate the inputs one by one or use just one input source (GET, POST or COOKIE). It's better to use $_REQUEST, since some inputs can be passed in a <form> or just in the query string. E.g. for /blog/post/?id=10&do=edit, the page may contain <input type="hidden" name="do" value="save" />; $_REQUEST['do'] covers both sources and determines which action needs to be done:

$do = (string) $_REQUEST['do'];
switch ($do) {
    case 'edit':
        get_post($id);
        break;
    case 'save':
        save_post($id);
        break;
    case 'view':
    default:
        view_post($id);
        break;
}

With that pattern in mind, here come the PHP filter functions, in particular filter_input_array(). It takes an input source type (INPUT_GET, INPUT_POST, etc.) and an array of filter definitions, validates each input, and returns false for any value that fails validation. This function is really convenient when we need to validate a long list of inputs.
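A quick sketch of the filter extension in action. filter_input_array() reads directly from the request superglobals; its sibling filter_var_array() applies the same definitions to an arbitrary array, which is what's shown here (the sample data and ranges are illustrative):

```php
<?php
// Validate several inputs at once with one definition array.
// In a real request handler this would be:
//   $result = filter_input_array(INPUT_GET, $definition);
$data = array('id' => '42', 'email' => 'not-an-email');

$definition = array(
    'id'    => array('filter'  => FILTER_VALIDATE_INT,
                     'options' => array('min_range' => 1, 'max_range' => 1000)),
    'email' => FILTER_VALIDATE_EMAIL,
);

$result = filter_var_array($data, $definition);
// $result['id'] holds int(42); $result['email'] is false (failed validation)
```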

However, there's a drawback. Since all the validators for one input are bundled under a single key ('input_name' => array(...list of various validators...)), it is difficult to produce an accurate error message saying which validation failed. Therefore we can recreate our own filter_input_array() and build the previous pattern (format first, then whitelist) into it. Basically the validation function would work like this:

$input = validate_input(array(
    'id' => array('filter' => 'int', 'options' => array(
        'range' => array('max' => 1000, 'msg' => 'Max. value for ID is 1000')
    )),
    'name' => array('filter' => 'string', 'options' => array(
        'required' => array('value' => true, 'msg' => 'This input is required'),
        'alphanumeric' => array('value' => true, 'msg' => 'Name need to be alphanumeric')
    ))
));
// now we can use $input
$input['id'];
$input['name'];
// to get error msg
$err_msg['id']; // which may contain the error msg of specific validator
$err_msg['name'];
// to check if the whole form passed the validator or not, simply:
if (empty($err_msg)) {
    // form is valid
}
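A minimal sketch of how such a validate_input() could be implemented: cast first (format correction), then run each named validator and record its message on failure. The validator names ('range', 'required', 'alphanumeric') and the by-reference $err_msg parameter are this sketch's own choices, not a fixed API:

```php
<?php
// Cast each input to its declared type, then run its validators in order,
// recording the first failing validator's message under $err_msg[$name].
function validate_input(array $rules, array $source, array &$err_msg) {
    $input = array();
    foreach ($rules as $name => $rule) {
        $raw = isset($source[$name]) ? $source[$name] : null;
        // 1. format correction: cast to the declared type
        $value = ($rule['filter'] === 'int') ? (int) $raw : trim((string) $raw);
        // 2. validation, one message per failed validator
        $options = isset($rule['options']) ? $rule['options'] : array();
        foreach ($options as $validator => $opt) {
            $ok = true;
            switch ($validator) {
                case 'required':
                    $ok = ($value !== '');
                    break;
                case 'range':
                    $ok = ($value <= $opt['max']);
                    break;
                case 'alphanumeric':
                    $ok = ctype_alnum($value);
                    break;
            }
            if (!$ok) {
                $err_msg[$name] = $opt['msg'];
                break; // report the first failure for this input
            }
        }
        $input[$name] = $value;
    }
    return $input;
}
```

The $source parameter keeps the function testable; in a controller you would pass $_REQUEST (or $_GET / $_POST).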

Additional note: String input.

To sanitize string input, trim() is actually enough, and the value can be stored directly in the database (assuming the query itself uses prepared statements or proper escaping). You only need to escape the value when outputting it. When echoing it into HTML, make sure to always encode it first:

<p class="comments"><?php echo htmlspecialchars($value, ENT_QUOTES, 'UTF-8'); ?></p>

This not only prevents XSS, but also displays Unicode characters correctly. Avoid htmlentities(), as it may corrupt Unicode characters.
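Since that htmlspecialchars() call gets repeated in every template, it's common to wrap it in a tiny helper (the name e() here is just a convention, not a built-in):

```php
<?php
// Output-escaping helper: encodes <, >, &, and both quote styles as
// HTML entities, treating the input as UTF-8.
function e($value) {
    return htmlspecialchars($value, ENT_QUOTES, 'UTF-8');
}
```

Then the template line becomes: `<p class="comments"><?php echo e($value); ?></p>`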

Apache Caching Proxy Server

This setup uses Apache 2.2 as bundled with XAMPP, on Windows 7.

Create new config file: /xampp/apache/conf/extra/httpd-cache-proxy.conf

LoadModule proxy_module modules/mod_proxy.so
LoadModule proxy_http_module modules/mod_proxy_http.so
LoadModule proxy_connect_module modules/mod_proxy_connect.so
LoadModule proxy_ftp_module modules/mod_proxy_ftp.so
LoadModule cache_module modules/mod_cache.so
LoadModule disk_cache_module modules/mod_disk_cache.so

Listen 3128
NameVirtualHost *:3128
<VirtualHost *:3128>
    ErrorLog "logs/proxy-error.log"
    CustomLog "logs/proxy-access.log" combined

    <IfModule mod_proxy.c>
      ProxyRequests On
      ProxyVia On
      <Proxy *>
        Order deny,allow
        Deny from all
        Allow from 127.0.0.1
      </Proxy>
    </IfModule>

    <IfModule mod_cache.c>
      <IfModule mod_disk_cache.c>
        CacheEnable disk /
        CacheRoot "c:/xampp/apache/proxy/cache"
        CacheDirLevels 3
        CacheDirLength 5
        CacheMaxFileSize 10485760
        CacheMaxExpire 2592000
      </IfModule>
      ProxyTimeout 60
    </IfModule>
</VirtualHost>

Include this file from /xampp/apache/conf/httpd.conf:

Include conf/extra/httpd-cache-proxy.conf

Make sure to create the CacheRoot folder. Restart Apache using the XAMPP control panel or Windows Services (if you installed it as a service), and set your browser's proxy server to 127.0.0.1:3128.
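To check the proxy from PHP rather than the browser, a quick cURL sketch (assuming the cURL extension is loaded and the proxy is running on 127.0.0.1:3128 as configured above):

```php
<?php
// Fetch a URL through the local caching proxy and return the raw
// response (headers included), or false if the proxy is unreachable.
function proxied_request($url, $proxy = '127.0.0.1:3128') {
    $ch = curl_init($url);
    curl_setopt_array($ch, array(
        CURLOPT_PROXY          => $proxy,
        CURLOPT_RETURNTRANSFER => true, // return the body instead of printing it
        CURLOPT_HEADER         => true, // keep headers: look for "Via:" / "Age:"
        CURLOPT_TIMEOUT        => 10,
    ));
    $response = curl_exec($ch);
    curl_close($ch);
    return $response;
}
```

On a repeat request, a response served from the disk cache typically carries an "Age:" header, which is an easy way to confirm caching is working.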

Page Controller Pt.2

However, as the web application grows, the controller part may become very complex. A web page is a collection of sections combined into one page: header, sidebar, main content, footer, banner, etc. (or whatever you call them). It would therefore be very complex to handle all of these in a single controller. We can separate the smaller sections into another class or put them in our toolbox functions class, but that may only be temporary, until a section needs to act like a main controller of its own. Here's where the difference comes in:

  • Most frameworks (that I know of) consist of a single MVC triad, designed to handle one simple request (a CRUD operation on a blog post is a simple request).
  • A way to overcome this is to abstract similar logic into a parent class and let controllers extend it. However, not every request needs all of that extra logic loaded.

Then I read http://techportal.ibuildings.com/2010/02/22/scaling-web-applications-with-hmvc/, about HMVC, an extension of ordinary MVC. It's a collection of MVC triads that are independent of each other, with one MVC acting as the main controller. Using this idea, we can redesign our front controller to support HMVC and make it ready for a complex application:

  1. The front controller receives the request, parses it, and delegates to the target page controller as usual.
  2. From within the main controller, we may call other controllers for the different sections of that particular page.
  3. All sub-controllers called from the main controller are independent of each other, as if each were handling a separate page request. We may even call them asynchronously (from within PHP, using cURL, receiving the response in whatever format).
  4. These sub-controllers don't need to reside on the same server or in the same script directory as the main controller; they can live on a different server, act as a background service, and interact via RPC, SOAP or another web service protocol.
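The in-process variant of step 2 can be sketched with output buffering: the main controller asks another controller for its rendered output as if it were a separate page. Class and method names here are illustrative, not from any particular framework:

```php
<?php
// Internal HMVC sub-request: run another controller's action and
// capture whatever it echoes, so the caller can place it anywhere
// in the page (sidebar, footer, banner, etc.).
class Request {
    public static function factory($controller, $action, array $params = array()) {
        $instance = new $controller();
        ob_start();
        call_user_func_array(array($instance, $action), $params);
        return ob_get_clean();
    }
}

// An example section controller.
class SidebarController {
    public function recent_posts($limit) {
        echo "<ul><!-- {$limit} recent posts --></ul>";
    }
}

// From within the main controller:
$sidebar = Request::factory('SidebarController', 'recent_posts', array(5));
```

Swapping this in-process call for a cURL request (or an RPC/SOAP call) is what makes steps 3 and 4 possible without changing the main controller's shape.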

What's more interesting is that all this can be achieved without allocating too many resources, in terms of development time (all of it can be done in PHP), servers (one server can handle all of this) and maintainability (all components are structured and modularized, provided you write proper documentation and comments in the code). Take a look at Kohana or CodeIgniter if you're interested in adding this feature to your web application.

Page Controller

Having read http://www.phpwact.org/pattern/centralized_request_logic, one question comes to mind: which is the preferred way to code the page controller in PHP?

The page controller is the common pattern where one file handles one web page and all of its actions. Files are accessed directly, and a base class is usually used to refactor shared code. E.g. http://domainname.tld/about.php and http://domainname.tld/search.php are independent of each other. Older PHP scripts use this method, and the code is quite hard to maintain.

Intercepting filters are implemented by prepending the page controller script with an input filter and appending an output filter: each controller includes a header file (input filter) at the beginning of the script and a footer file (output filter) at the end. Having to include the filter files on every page we create is quite cumbersome; the filter files also need to handle many types of request, while individual page controllers often don't need all of those extra functions. phpBB is one PHP script that uses this pattern.

The front controller pattern has one central point that accepts requests and dispatches them to the appropriate page controllers. This is common to all modern frameworks such as CakePHP, Yii Framework and DooPHP. These frameworks also use dynamic invocation: instead of storing a registry of the modules and plugins they have, the framework looks up the script directory to find the appropriate controller for the request. The drawback of static invocation is that it stores a large list of modules or plugins that must be loaded on every request, even though they are rarely all needed by every controller.

Therefore, I conclude that the best way to code the page controller is to implement dynamic invocation in a front controller, similar to other frameworks. Here's the program flow:

  1. The front controller receives a request.
  2. It parses the request to determine which page controller to load, which action to run and which parameters to set.
  3. It loads the target controller, or shows a "page not found" error.
  4. The target controller extends a base class that shares common logic with other controllers.
  5. It runs the target action and determines how to display the output.
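The flow above can be sketched in a few lines. The /controller/action/param URL convention and the class naming scheme are assumptions for illustration, not a fixed standard:

```php
<?php
// Minimal dynamic front controller: map a request path onto a
// controller class and action method, then run it.
function dispatch($path) {
    // 1-2. parse the request into controller / action / params
    $parts = array_values(array_filter(explode('/', $path)));
    $controller = ucfirst(isset($parts[0]) ? $parts[0] : 'home') . 'Controller';
    $action = isset($parts[1]) ? $parts[1] : 'index';
    $params = array_slice($parts, 2);

    // 3. load the target controller, or fall back to an error page
    if (!class_exists($controller) || !method_exists($controller, $action)) {
        return '404 Not Found';
    }

    // 4-5. run the action; a shared base class would hold common logic
    $instance = new $controller();
    return call_user_func_array(array($instance, $action), $params);
}

// An example page controller the dispatcher can discover.
class PostController {
    public function view($id) {
        return "viewing post {$id}";
    }
}
```

In a real application, class_exists() would be backed by an autoloader that looks up the controller file in the script directory, which is exactly the dynamic invocation described above.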

ORM or SQL query

I've been developing websites in PHP for quite a while now, and one thing crossed my mind. The company I'm working for uses basic query functions via PHP PDO:

$stmt = $db->prepare('query...');
$stmt->execute();
$rows = $stmt->fetchAll();

(Well, not actual code, just to show we're using PHP PDO.) We use our own custom framework and don't utilize the ORM pattern that most popular frameworks use, such as CakePHP, Symfony, Yii, DooPHP, etc. I wonder whether using an ORM would improve the performance of the website, or whether it is just a tool to simplify the code.

With an ORM, we don't need to write SQL queries; we treat a database table as an object by mapping it to a PHP model class. Selecting, inserting, updating and deleting rows becomes a matter of calling the right function. However, IMO an ORM causes some performance drawbacks, because the generated SQL is not always optimized: querying unnecessary columns, generating too many queries, etc.

By writing the SQL ourselves and calling the query() function manually, we have the flexibility to tune the SQL for optimum performance. The drawback is that we have to write similar SQL multiple times when it is used in different controller classes, whereas with an ORM we just call a get() function to return a specific row from the db.

Personally, I prefer manual SQL queries, since they offer better flexibility. To avoid rewriting similar SQL, just wrap it in one static function and let other classes call it:

static function get_data() {
    $db = self::get_db(); // however the shared PDO connection is obtained
    $stmt = $db->query('SELECT ...');
    return $stmt->fetchAll();
}

In other class:

...
$data = another_class::get_data();
...