Author Archive

What is a Framework?

Tuesday, June 15th, 2010

Given the diversity of packages that call themselves frameworks, the broadest description we can give is that web development frameworks are collections of reusable software components for building websites and web-based applications.

Frameworks generally belong to one of two camps: “pure frameworks” and “full stack frameworks”.

Most frameworks don’t distinguish between “framework” and “application” features, and as a result they become cluttered. They also become more daunting to newcomers, with a sheer volume of classes that attempt to solve every practical problem a web developer might ever encounter: user management, document and image handling, uploading and downloading, etc.

My favorite PHP framework, Yii, is a pure framework – free from the burden that many frameworks drag around, namely a full “application stack”. Yii consists almost exclusively of components and features designed to support certain practices or paradigms.

One of my favorite things about Yii, as compared to “full stack” frameworks, such as Zend, is that Yii only comes with components and architecture that can be rightfully said to belong in the “framework” domain, and not the “application” domain.

Of course, in some cases it may be hard to draw an exact line between the two. I think the criteria for deciding whether a feature belongs in the framework, rather than in an application stack, should be:

• Is it absolutely general-purpose?
• If not, is it fully extensible?

Certain features, like the URL manager or Active Record, are not absolutely general purpose, in the sense that they may not satisfy every possible need anyone could ever have. But they are sufficiently general purpose in the sense that almost everybody is going to need at least the core functionality of those components. And because they are fully extensible, developers can build on top of them, rather than having to replace them, if they find that a component does not fully cater to their specific needs.
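To make “fully extensible” concrete, here is a minimal sketch in JavaScript (the language used for code elsewhere in this archive) of building on a general-purpose component instead of replacing it. `UrlManager`, `createUrl` and `LocalizedUrlManager` are hypothetical names chosen for illustration; they are not Yii's API.

```javascript
// Hypothetical general-purpose component: builds URLs from a route and parameters.
class UrlManager {
  constructor(baseUrl) {
    this.baseUrl = baseUrl;
  }

  // Core behavior that almost every application needs.
  createUrl(route, params = {}) {
    const query = Object.keys(params)
      .map((k) => encodeURIComponent(k) + "=" + encodeURIComponent(params[k]))
      .join("&");
    return this.baseUrl + "/" + route + (query ? "?" + query : "");
  }
}

// Application-specific need: prefix every route with a language code.
// We extend the component rather than replace it, so the core logic is reused.
class LocalizedUrlManager extends UrlManager {
  constructor(baseUrl, language) {
    super(baseUrl);
    this.language = language;
  }

  createUrl(route, params = {}) {
    return super.createUrl(this.language + "/" + route, params);
  }
}

const urls = new LocalizedUrlManager("https://example.com", "da");
console.log(urls.createUrl("post/view", { id: 42 }));
// → https://example.com/da/post/view?id=42
```

The subclass overrides a single method and delegates to the original, which is the kind of "build on top of it" extension the criteria above ask for.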

I do not want application features in a framework because I know from experience that they will not meet my strict requirements. I would eventually end up replacing many of them, and the features provided by the framework would be reduced to baggage.

Everybody is different – we all have different goals, and even if we share some goals, we usually have different means for reaching those goals.

I believe we web developers converge around a framework because we agree on certain practices. The framework is designed to leverage those practices in a convenient and streamlined way.

The subtle art of the framework is to achieve accepted practice, without getting in the way of individuality – to enable us to adhere to the good practices that we agree upon, while still allowing us to be as different, as versatile and as colorful as we can!

In short, frameworks enable us to focus our efforts on “business logic”, the functionality that has real value to the website users. Thanks to the framework, we can focus on the practical requirements, without sacrificing the values that professional software developers care about: clean, maintainable and extensible code.

With a good framework, we can deliver value and high code quality, without charging extra for quality!

Rasmus Schultz has worked for web development companies, advertising agencies and a music software company during his extensive development career. His main strengths are software development and database design. Rasmus has more than a decade of experience with many development platforms, languages and standards.

Web Tools for Natural Language Processing

Tuesday, September 1st, 2009

We have been researching Web 3.0, the moniker given to the next generation of web applications: applications that really understand what you are trying to do.

Part of creating “smart” web applications is understanding the semantics of what people type in, which implies using natural language processing (NLP). NLP software examines unstructured documents and generates structured metadata that computers can handle.

Our application needed to understand phrases that people enter into a web browser. We found three different approaches to handling this unstructured text:

SaaS APIs

These are hosted applications. All of them offer limited services at no charge; commercial services are generally expensive. The major players appear to be:

Zemanta: offers an API with automatic tagging, among many other features.

OpenCalais: while it is by no means “open”, this API is powered by Reuters – which means that their “corpus” (body of words understood by the system) was composed using one of the world’s largest and most accurate volumes of text.

Alchemy API: offers automated categorization, tagging, keywords, etc.

NLP Toolkits

These are open-source toolkits (APIs that you can install on your own server) for analysis of unstructured text. Learning how to apply one of these might take considerable effort: someone would have to learn at least the basics of NLP to apply this software, or you might choose to hire a consultant with the skills to develop this part of the application.

NLTK.org: a library written in Python, started in 2005, that has been slowly creeping towards release 1.0 for the past year or so. While relatively young, it may be based on newer research than some of the more mature NLP libraries. It comes with many corpora, grammar collections and trained models ready to use.

GATE: General Architecture for Text Engineering. Stable and proven toolkit for Java – this project started in 1995. Countless subprojects leverage this toolkit for various purposes.

FreeLing: a widely used toolkit in C++, with APIs for Java, Perl and Python. Online demos of this library show graphically how a short sentence can be broken down into a tree structure (nested subject/object, verb/adverb, etc.).

These are just a few examples – there are so many toolkits, and applications using these toolkits, that it would be impossible to make a choice based on a superficial analysis. To make a qualified choice, we would need to study at least the basics, or we would need the help of someone who knows enough about it to make a recommendation based on our needs.

Roll-your-own

Use, for example, MySQL, the Porter stemmer, a stop-word list and various other techniques to roll a basic search engine. Perhaps throw in a Bayesian text-similarity measure to help rank the results and to create stronger or weaker links between tables of keywords and posts.
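As a rough illustration of the roll-your-own approach, the sketch below tokenizes text, drops stop words, reduces words with a deliberately crude suffix-stripper, and compares two texts by cosine similarity over term counts. Two substitutions are worth stating plainly: the suffix-stripper is *not* the Porter stemmer, and cosine similarity stands in for the Bayesian similarity measure mentioned above. The stop-word list is a tiny hypothetical sample.

```javascript
// A tiny sample stop-word list (a real one would have hundreds of entries).
const STOP_WORDS = new Set(["a", "an", "the", "and", "or", "of", "to", "in", "is", "was", "for", "on", "with"]);

// Crude suffix stripping: NOT the Porter stemmer, just enough for illustration.
function stem(word) {
  for (const suffix of ["ing", "ed", "es", "s"]) {
    if (word.length > suffix.length + 2 && word.endsWith(suffix)) {
      return word.slice(0, -suffix.length);
    }
  }
  return word;
}

// Lowercase, split on non-letters, drop stop words, stem, and count terms.
function termCounts(text) {
  const counts = new Map();
  for (const token of text.toLowerCase().split(/[^a-z]+/)) {
    if (!token || STOP_WORDS.has(token)) continue;
    const term = stem(token);
    counts.set(term, (counts.get(term) || 0) + 1);
  }
  return counts;
}

// Cosine similarity of two term-count vectors (1 = same proportions, 0 = no shared terms).
function cosineSimilarity(a, b) {
  let dot = 0, normA = 0, normB = 0;
  for (const [term, n] of a) {
    normA += n * n;
    if (b.has(term)) dot += n * b.get(term);
  }
  for (const n of b.values()) normB += n * n;
  return normA && normB ? dot / Math.sqrt(normA * normB) : 0;
}

const post = termCounts("Loading scripts and stylesheets on demand");
const query = termCounts("load script on demand");
console.log(cosineSimilarity(post, query).toFixed(3)); // 0.866
```

In a real setup the term counts would live in keyword tables in the database, and the similarity score would be used to rank posts against a query.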

This is not NLP, and it is not “Web 3.0” or the “semantic web” that everyone is buzzing about these days, because it does not understand semantics and will not yield the same kind of results. NLP systems “understand” unstructured text, where words like “not” and “really” can reverse or amplify the meaning of a subject, whereas anything you roll on your own would most likely treat these words as stop words and ignore them.
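The weakness is easy to reproduce. This small sketch assumes a stop-word list that includes “not” and “really”, as described above; with it, two sentences with opposite meanings reduce to exactly the same keywords.

```javascript
// Hypothetical stop-word list that includes "not" and "really".
const STOP = new Set(["the", "was", "not", "really", "very", "a"]);

// Keep only the lowercase words that are not stop words.
function keywords(sentence) {
  return sentence
    .toLowerCase()
    .split(/[^a-z]+/)
    .filter((w) => w && !STOP.has(w));
}

console.log(keywords("The movie was really good")); // [ 'movie', 'good' ]
console.log(keywords("The movie was not good"));    // [ 'movie', 'good' ]
```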


Dynamic resource management in JavaScript

Wednesday, April 1st, 2009

One of the most important qualities of a modern website is its ability to respond quickly with the information the user requested. We live in an “attention economy” on Web 2.0, and if your site does not respond almost instantly, the user will find a site that does.

One way modern sites achieve this is by loading in smaller increments; for example, loading the detailed content for an item on a list can be deferred using AJAX.

Another way to break the loading of a web application into smaller chunks is to load other resources, specifically JavaScript and CSS files, on demand.

Doing this in a clean and cross-browser-compatible way can be tricky.

The Challenge

Let’s look at some of the requirements that a good solution to this problem must fulfill.

First of all, some of the JavaScript frameworks we work with do provide some sort of resource management. But we work with many different frameworks, and sometimes with no framework at all, so our solution needs to work stand-alone as well as alongside most major JavaScript frameworks.

Secondly, we target all modern browsers, so our solution must be fully cross-browser: Internet Explorer 6 through 8, Firefox, Opera and Chrome, and hopefully any standards-compliant future browsers.

JavaScript is (or can be) object-oriented, meaning that a class in one script could extend a class in another script. Therefore, since we are going to load resources asynchronously, we must ensure that scripts are loaded and executed in order.
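Why order matters can be shown without a browser. In this sketch, two hypothetical “script files” are represented as source strings and executed with `new Function` against a shared namespace object `ns`, which stands in for the global scope. The subclass script only works if the base-class script has already run.

```javascript
// Two hypothetical script files, represented here as source strings.
const baseScript =
  "ns.Widget = function () {};" +
  "ns.Widget.prototype.render = function () { return 'widget'; };";
const childScript =
  "ns.Button = function () {};" +
  "ns.Button.prototype = Object.create(ns.Widget.prototype);" +
  "ns.Button.prototype.render = function () { return 'button:' + ns.Widget.prototype.render.call(this); };";

// Execute script sources one after another against a shared namespace object.
function runInOrder(sources) {
  const ns = {};
  for (const source of sources) {
    new Function("ns", source)(ns);
  }
  return ns;
}

const ns = runInOrder([baseScript, childScript]);
console.log(new ns.Button().render()); // button:widget

try {
  runInOrder([childScript, baseScript]);
} catch (e) {
  console.log(e instanceof TypeError); // true: ns.Widget is undefined when the child runs
}
```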

Another concern is notification of resource readiness. Since resources are loaded asynchronously, and sometimes have already been loaded once, our script needs to provide a callback-notification when the resource is ready and available.

And finally, we don’t want to write code for every project to keep track of what’s been loaded and what has not. In other words, we need a load-once method, so that classes, widgets and stylesheets can be automatically loaded the first time they are needed.
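The load-once and readiness requirements can be sketched independently of the DOM. In this sketch, the actual loading is delegated to a `fetchResource(url, done)` function supplied by the caller, a hypothetical stand-in for injecting a `<script>` or `<link>` tag. Callbacks registered while a resource is still loading are queued; callbacks registered after it has loaded run immediately.

```javascript
// Wrap a caller-supplied loader so each URL is fetched at most once.
// fetchResource(url, done) is a hypothetical stand-in for injecting a tag.
function createOnceLoader(fetchResource) {
  const registry = new Map(); // url -> { state: "loading" | "loaded", callbacks: [...] }

  return function once(url, callback) {
    callback = callback || function () {};
    let entry = registry.get(url);

    if (entry && entry.state === "loaded") {
      callback(); // already available: notify right away
      return;
    }
    if (entry) {
      entry.callbacks.push(callback); // still loading: notify when it finishes
      return;
    }

    entry = { state: "loading", callbacks: [callback] };
    registry.set(url, entry);
    fetchResource(url, function done() {
      entry.state = "loaded";
      const pending = entry.callbacks;
      entry.callbacks = [];
      pending.forEach(function (cb) { cb(); });
    });
  };
}

// Usage with a fake fetcher that lets us finish loads by hand.
const requested = [];
const finish = {};
const once = createOnceLoader(function (url, done) {
  requested.push(url);
  finish[url] = done;
});

const log = [];
once("grid.js", function () { log.push("first"); });
once("grid.js", function () { log.push("second"); }); // queued, no new request
finish["grid.js"]();
once("grid.js", function () { log.push("third"); });  // runs immediately

console.log(requested); // [ 'grid.js' ]
console.log(log);       // [ 'first', 'second', 'third' ]
```

This sketch covers only "load once, notify always"; it does not serialize requests for different URLs, which is a separate concern.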

Limitations

We’re going to accept certain limitations of this script.

For one, we’re not going to attempt to do a lot of error handling – if a resource can’t be loaded, this is a problem that needs to be solved by the developer, and not really something you can provide a “pretty” solution for anyway.

And secondly, some resource management scripts attempt to inject scripts in a way that allows scripts from foreign domains to run without security limitations. This has certain other drawbacks that I won’t get into, but we’re going to assume that you’re loading scripts from your own domain.

Our Solution

Our script comes in the form of a classless singleton object, which provides two methods for loading resources on demand:

Loader.load( url, [callback-function], [context-object], [driver-name] );
Loader.once( url, [callback-function], [context-object], [driver-name] );

The Loader.load() method will allow you to load the same resource repeatedly – for example, you will be able to execute the same JavaScript more than once. Just keep in mind that duplicate class declarations, and duplicate initialization, for example, could cause problems. Most likely, you will only want to use this method if you want to intentionally overwrite existing variables.

The Loader.once() method will ensure that the same resource is only loaded once. Note that the callback is called whenever the resource is ready, including when an earlier request already loaded it, not only the first time the resource actually loads.

Arguments for the two functions are identical:

  • url: required. A relative (to your page) or absolute URL to a JavaScript or CSS resource.
  • callback-function: optional. Called when the resource (and any resources requested before it) has loaded.
  • context-object: optional. The object to use as the context (this) for the callback-function. Use null (or leave it out) if the calling context is unimportant.
  • driver-name: optional, e.g. “css” or “js”. This determines how the loaded resource is handled. If unspecified, the loader will try to determine the driver by file extension, and will default to “js” if the file extension at the end of the URL does not match “.css” or “.js”.
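The driver-selection rule in the last bullet can be isolated as a small function. This is a sketch of the documented behavior (an explicit driver-name wins, otherwise the file extension decides, otherwise “js”); it also ignores any query string or fragment before looking at the extension. `resolveDriver` and `KNOWN_DRIVERS` are hypothetical helpers, not part of the Loader source.

```javascript
const KNOWN_DRIVERS = ["js", "css"];

// Pick a driver: explicit name first, then the URL's file extension, then "js".
function resolveDriver(url, driverName) {
  let name = driverName;
  if (!name) {
    const path = url.split(/[?#]/)[0];      // ignore query string and fragment
    const fileName = path.split("/").pop(); // last path segment only
    const dot = fileName.lastIndexOf(".");
    name = dot === -1 ? "" : fileName.slice(dot + 1).toLowerCase();
  }
  return KNOWN_DRIVERS.includes(name) ? name : "js";
}

console.log(resolveDriver("styles/main.css?v=2")); // css
console.log(resolveDriver("/widgets/grid.js"));    // js
console.log(resolveDriver("/theme.php", "css"));   // css
console.log(resolveDriver("/api/script"));         // js
```

Looking only at the last path segment keeps a dot in a host name (such as `www.example.com`) from being mistaken for a file extension.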

Conclusion

When you need to load a collection of resources, remember that they will be loaded in the order you request them. For example, loading components for a widget-based framework such as ExtJS is possible – you can load a component class declaration, and then load another class that extends it, but you must request them in order.

Also keep in mind that when you need to know when a collection of resources is ready, you don’t need to attach a callback to every request: just attach your callback-function to the last request, as it won’t execute until all previously requested resources are ready.
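The guarantee described above follows from the queue starting one request at a time. This DOM-free sketch mimics that idea with a hypothetical `startLoad(url, done)` function supplied by the caller: because a request is not started until the previous one has finished, a callback attached to the last request cannot fire before the earlier resources are ready.

```javascript
// A queue that starts each load only after the previous one has finished.
function createSerialLoader(startLoad) {
  const queue = [];
  let busy = false;

  function next() {
    if (busy || queue.length === 0) return;
    busy = true;
    const item = queue.shift();
    startLoad(item.url, function done() {
      busy = false;
      item.callback(); // this resource (and everything before it) is ready
      next();
    });
  }

  return function load(url, callback) {
    queue.push({ url: url, callback: callback || function () {} });
    next();
  };
}

// Usage with a fake loader that lets us finish each request by hand.
const finishers = [];
const load = createSerialLoader((url, done) => finishers.push({ url, done }));
const events = [];

load("base.js");
load("widget.js");
load("app.css", () => events.push("all ready")); // callback only on the last request

console.log(finishers.map((f) => f.url)); // [ 'base.js' ]
finishers[0].done(); // base.js ready, widget.js requested
finishers[1].done(); // widget.js ready, app.css requested
console.log(events);  // []
finishers[2].done(); // app.css ready, last callback fires
console.log(events);  // [ 'all ready' ]
```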

Source Code

Finally, here is the source code for you to cut and paste:

/*

Version:   1.1
Developer: Rasmus Schultz
License:   GPL v3 <http://www.gnu.org/licenses/gpl-3.0-standalone.html>

Copyright 2009, Gorges Web Sites <http://www.GORGES.us>

Removing this notice from the source code would be bad karma.

*/

var Loader = {
    q: [],   // the queue of pending items to be loaded
    reg: {}, // a registry to ensure items are loaded only once

    load: function(url, cb, context, driver, once) {
        // An explicit driver wins; otherwise use the file extension
        // (ignoring any query string or fragment), falling back to "js".
        var dd = driver || url.split(/[?#]/)[0].split('.').pop();
        if (!Loader.drivers.hasOwnProperty(dd)) dd = 'js';
        var p = new Loader.Proxy(
            { url: url, cb: cb || function(){}, reg: once, driver: dd, context: context }
        );
        Loader.q.push(p);
        Loader.next();
        return p;
    },

    once: function(url, cb, context, driver) {
        return Loader.load(url, cb, context, driver, true);
    },

    next: function() {
        for (var i = 0; i < Loader.q.length; i++) {
            var l = Loader.q[i];
            if (l.state == 1) return;          // an item is already loading
            if (l.state == 0) return l.load(); // not loading (and not yet loaded)
        }
    }
};

Loader.Proxy = function(opt) {
    this.driver = opt.driver;
    this.context = opt.context || this;
    this.url = opt.url;
    this.reg = opt.reg;
    this.state = 0; // inactive
    this.cb = opt.cb;

    this.load = function() {
        if (this.reg && Loader.reg[this.url]) return this.loaded(); // already loaded once
        this.state = 1; // loading
        var hd = document.getElementsByTagName("head")[0];
        // Append a timestamp so the browser fetches a fresh copy instead of a cached one.
        var el = Loader.drivers[this.driver](this, this.url + (this.url.indexOf('?') == -1 ? '?' : '&') + new Date().getTime());
        hd.appendChild(el);
    };

    this.loaded = function() {
        if (this.state == 2) return; // ignore duplicate notifications (IE may report both "loaded" and "complete")
        this.state = 2; // loaded
        if (this.reg) Loader.reg[this.url] = 1;
        this.cb.call(this.context);
        Loader.next();
    };
};

Loader.drivers = {

    js: function(proxy, url) {
        var el = document.createElement('script');
        el.type = 'text/javascript';
        el.src = url;
        var me = proxy;
        if (el.attachEvent) { // IE
            el.attachEvent('onreadystatechange', function() {
                if (el.readyState == 'loaded' || el.readyState == 'complete') me.loaded();
            });
        } else { // DOM
            el.onload = function() { me.loaded(); };
        }
        return el;
    },

    css: function(proxy, url) {
        var el = document.createElement('link');
        el.rel = 'stylesheet';
        el.type = 'text/css';
        el.href = url;
        el.media = 'all';
        // Poll until the new stylesheet appears in document.styleSheets with rules.
        new (function(link, proxy) {
            this.index = document.styleSheets.length;
            this.link = link;
            this.proxy = proxy;
            var me = this;
            this.check = function() {
                try {
                    var s = document.styleSheets[me.index];
                    if ((s.rules || s.cssRules).length) { // IE || DOM
                        window.clearInterval(me.timer);
                        me.proxy.loaded();
                    }
                } catch (e) {} // not available (or not readable) yet: keep polling
            };
            this.timer = window.setInterval(this.check, 100);
        })(el, proxy);
        return el;
    }

};