Friday, January 29, 2010

Things for Java novices to watch out for


In the process of learning Java one encounters many points to which a novice should pay special attention. Working through the Sun tutorials obviously helps people come to grips with some of these points, but some salient points are highlighted here in random order.
No constructor can have a return type (not even void).
We can redeclare a private member of a superclass in the subclass as a private/public/protected member. Effectively it is not overriding, as the subclass is not even aware of the superclass's private member and sees it as a new definition. The rules of overriding do not apply, so this newly-declared-but-just-happens-to-match method can declare new exceptions, change the return type, or anything else you want to do with it.
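For illustration, a minimal sketch (class names invented):

class Animal {
    private String sound() { return "generic"; }      // not visible to subclasses
}

class Dog extends Animal {
    // not an override: a brand new method, so it may change the return type
    // and declare new checked exceptions
    protected int sound() throws java.io.IOException { return 42; }
}
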
No modifier means a member has default (package) access. Default is similar to protected; however, a protected member is visible to a subclass in a different package, whereas a default-access member is visible only within the same package.
default = package; protected = package+subclasses access.
A protected member can be accessed through inheritance but not through a reference to an instance of the parent class, if the accessing code is in a different package. Once the subclass-outside-the-package inherits the protected member, that member (as inherited by the subclass) becomes private to any code outside the subclass.
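A minimal two-file sketch, with invented package and class names:

// File animals/Animal.java
package animals;
public class Animal {
    protected String name = "generic";
}

// File pets/Dog.java
package pets;
public class Dog extends animals.Animal {
    void show() {
        System.out.println(name);            // OK: accessed through inheritance
        animals.Animal a = new animals.Animal();
        // System.out.println(a.name);       // compile error: access via a reference from another package
    }
}
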
There is never a case where an access modifier can be applied to a local variable. It will not compile. We can only use 'final' modifier for local variables.
A reference variable marked final can’t ever be reassigned to refer to a different object. The data within the object, however, can be modified, but the reference variable cannot be changed. There are no final objects, only final references.
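For example, a small sketch inside any method:

final StringBuilder sb = new StringBuilder("Java");
sb.append(" rules");            // allowed: the object's data can still change
// sb = new StringBuilder();    // compile error: a final reference cannot be reassigned
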
If you declare a final instance variable, you’re obligated to give it an explicit value, and you must do so by the time the constructor completes.
Static methods can’t be overridden, they are hidden.
You can put as many classes in a source code file as you like, but only one (or none) can be public. The file name should match the name of the public class, but if no public class is in the file, you can name it whatever you like. The order in which the classes appear makes no difference.
Not having a proper main() method is a runtime error, not a compiler error!
All variables defined in an interface must be public, static and final, i.e. they can only be constants. If we omit these modifiers, or give only some of them, they are automatically given all three characteristics.
An interface is free to extend multiple interfaces.
An inner class instance has access to all members of the outer class, even those marked private.
Just because a series of threads are started in a particular order doesn't mean they'll run in that order.
sleep() and yield() static methods always affect the thread that's currently executing and not another thread.
If a thread goes to sleep, it holds any locks it has—it doesn't release them.
Default exception handler
– Provided by Java runtime
– Prints out exception description (e.getMessage())
– Prints the stack trace (e.printStackTrace()) , ie hierarchy of methods where the exception occurred
– Causes the program to terminate
Collections.disjoint(l1, l2) — determines whether two Collections are disjoint; that is, whether they contain no elements in common.
All enums implicitly extend java.lang.Enum. Since Java does not support multiple inheritance, an enum cannot extend anything else.
The constructor for an enum type must be package-private or private access.
An enum cannot be declared within a method.
As a non-static inner class is associated with an instance, it cannot define any static members itself. A nested class must be static if it is to be instantiated from a static method without an outer-class instance.
Polymorphism doesn't apply to static methods.
Polymorphism only applies to instance methods, not to instance variables.
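A small self-contained sketch (invented classes) illustrating both of the above points:

class Animal {
    String type = "animal";                        // fields are not polymorphic
    String who() { return "Animal"; }              // instance methods are polymorphic
    static String whoStatic() { return "Animal"; } // static methods are hidden, not overridden
}

class Dog extends Animal {
    String type = "dog";
    String who() { return "Dog"; }
    static String whoStatic() { return "Dog"; }
}

public class PolyDemo {
    public static void main(String[] args) {
        Animal a = new Dog();
        System.out.println(a.type);    // animal - variable resolved by the reference type
        System.out.println(a.who());   // Dog    - method resolved by the actual object
    }
}
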
The System class maintains a Properties object that describes the configuration of the current working environment. To maximize portability, never refer to an environment variable (obtained from System.getenv) when the same value is available as a system property (obtained from System.getProperties); use the latter.
For example, to get the value of path.separator, use the following statement: System.getProperty("path.separator");
System.exit, which terminates the Java virtual machine with an exit status, invokes SecurityManager.checkExit to ensure that the current thread has permission to shut down the application.
By convention, an exit status of 0 indicates normal termination of the application, while any other value is an error code. The exit status is available to the process that launched the application.
All byte stream classes are descended from InputStream and OutputStream.
All character stream classes are descended from Reader and Writer.
For line-oriented I/O use the two classes BufferedReader and PrintWriter. The latter is also used in servlets.
Channel I/O reads a buffer at a time.
There are four buffered stream classes used to wrap unbuffered streams: BufferedInputStream and BufferedOutputStream create buffered byte streams, while BufferedReader and BufferedWriter create buffered character streams. They improve I/O performance.
When you need to create a formatted output stream, instantiate PrintWriter, not PrintStream.
Data streams support binary I/O of primitive data type values (boolean, char, byte, short, int, long, float, and double) as well as String values. All data streams implement either the DataInput interface or the DataOutput interface.
Just as data streams support I/O of primitive data types, object streams support I/O of objects.
Using an int as the return type of a byte input stream's read() allows it to use -1 to indicate that it has reached the end of the stream. We can use a (char) cast to see the character value of the int read, i.e. the int 65 is 'A' after the cast. For line-oriented I/O, test bufferedReader.readLine() != null to detect end of file.
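A minimal sketch of both idioms (the file names are just placeholders):

import java.io.*;

public class ReadDemo {
    public static void main(String[] args) throws IOException {
        FileInputStream in = new FileInputStream("data.bin");
        int b;
        while ((b = in.read()) != -1) {     // -1 signals end of stream
            System.out.print((char) b);     // e.g. the int 65 prints as 'A'
        }
        in.close();

        BufferedReader reader = new BufferedReader(new FileReader("data.txt"));
        String line;
        while ((line = reader.readLine()) != null) {   // null signals end of file
            System.out.println(line);
        }
        reader.close();
    }
}
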
Notice that DataStreams detects an end-of-file condition by catching EOFException, instead of testing for an invalid return value.
When a method accepts a varargs argument, you can pass it a comma-separated list of values or an array of values. The vararg has to be the last argument and there can be only one vararg.
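A small sketch (method and class names invented):

public class VarargsDemo {
    static int sum(String label, int... values) {  // the vararg must be the last parameter, and only one is allowed
        int total = 0;
        for (int v : values) {
            total += v;
        }
        return total;
    }
    public static void main(String[] args) {
        System.out.println(sum("scores", 1, 2, 3));              // comma-separated list
        System.out.println(sum("scores", new int[] {1, 2, 3}));  // or an array
    }
}
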
String s = "Java"; s.concat(" Rules"); System.out.println(s); will give 'Java', not 'Java Rules', because the result of s.concat() is not assigned back to our reference s. To get 'Java Rules' we have to say s = s.concat(" Rules"); without this assignment the newly created String has no reference. In the case of a StringBuffer sb, sb.append() appends the value in place without any reassignment of the reference.
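The same point as a runnable sketch:

public class StringDemo {
    public static void main(String[] args) {
        String s = "Java";
        s.concat(" Rules");            // result is discarded - String is immutable
        System.out.println(s);         // Java
        s = s.concat(" Rules");        // assign the new String back to the reference
        System.out.println(s);         // Java Rules

        StringBuffer sb = new StringBuffer("Java");
        sb.append(" Rules");           // modifies the buffer in place, no reassignment needed
        System.out.println(sb);        // Java Rules
    }
}
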
Static variables are never saved as part of serialisation as they belong to class and not object.
If a serializable subclass is serialised and its parent is non-serializable, then on deserialisation the parent's fields will have the initial values set by its no-argument constructor, as if a new object had been created.
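A small sketch of the behaviour (class names invented):

import java.io.*;

class Parent {                                   // deliberately NOT Serializable
    int parentValue;
    Parent() { parentValue = 1; }                // runs again on deserialisation
}

class Child extends Parent implements Serializable {
    int childValue;
}

public class SerialDemo {
    public static void main(String[] args) throws Exception {
        Child c = new Child();
        c.parentValue = 99;                      // changed before serialisation
        c.childValue = 42;

        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        ObjectOutputStream out = new ObjectOutputStream(bytes);
        out.writeObject(c);
        out.close();

        Child back = (Child) new ObjectInputStream(
                new ByteArrayInputStream(bytes.toByteArray())).readObject();

        System.out.println(back.parentValue);    // 1  - non-serializable parent re-initialised
        System.out.println(back.childValue);     // 42 - serialised child state restored
    }
}
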
When using the classpath, the directory listed in the classpath must be the parent of the package's root directory, i.e. if the package is com.my... then the classpath should contain the directory which has com as a subdirectory.
Put a jar file in jre\lib\ext and Java finds it without it having to be specified in the classpath.
Static imports can be used when we want to use a class's static members.
Collection (singular) is an Interface, but Collections (plural) is a helper class.
equals() returns false if the wrapper object types are different. It does not raise a compiler error.
No inner class (non-static inner class) can have a static member.

Sunday, January 24, 2010

Java EE journey



As people are looking forward to the new version of Java arriving in March, I thought it would be a good idea to highlight where we are in Sun's own words. It has been a tremendous journey, and the platform is becoming stronger by the day.

Thursday, January 21, 2010

Understanding inner classes

We are accustomed to classes having variables and methods. However, in JDK 1.1 inner classes were introduced. They are effectively a class nested within a class, hence also known as nested classes. Their introduction respected the design principles of encapsulation and cohesiveness: any functionality which naturally merited its own class, but was so intertwined with a given class that it was logically necessary to define it within the context of that class, gave birth to an inner class. For example, an event-handling class is inextricably linked to its GUI class, so it is a natural candidate for becoming an inner class. The key advantage is that the nested class instance has access to the instance members of the outer class, including those marked private. This power to access private members could be deemed to flout encapsulation.

On compiling the basic minimum Java code for an inner class

class TestOuter {
    class TestInner {}
}

we get the files TestOuter.class and TestOuter$TestInner.class. The class file name for the inner class makes it clear that it is defined within the context of the outer class. These inner classes can have public, private, protected or default (package) access.

They come in four flavours:

- the normal inner class
- static nested inner class
- method-local inner class
- the anonymous inner class

1. The normal inner class

The normal inner class follows the code pattern below:

public class TestOuter {

    private String name = "Rajeev";

    public static void main(String[] args) {
        TestOuter myOuter = new TestOuter();
        // the syntax for instantiating the inner class makes use of the outer class object
        TestOuter.TestInner myInner = myOuter.new TestInner();
        myInner.innerMethod();
    }

    public class TestInner {
        public void innerMethod() {
            System.out.println("hello " + name + " from outer");
        }
    }
}

The code produces the output ‘hello Rajeev from outer’ and clearly shows the instantiation process and inner class object accessing private members of outer object. The key thing to note is that new is invoked on object of outer class. The instantiation of outer class does not automatically instantiate the inner class.

The inner class can access an outer class variable with the same name as one of its own using the syntax:

OuterClassName.this.variableName

Within the inner class, this refers to the inner class object, so to get hold of the outer class reference we need to use OuterClassName.this
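A short sketch of the shadowing case, reusing the same class names:

public class TestOuter {
    private String name = "outer";

    public class TestInner {
        private String name = "inner";
        public void show() {
            System.out.println(name);                  // inner
            System.out.println(this.name);             // inner
            System.out.println(TestOuter.this.name);   // outer
        }
    }
}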

2. The static nested inner class

All the modifiers which are applicable to any member of the outer class can equally be applied to an inner class. Therefore when we use the static modifier on an inner class it becomes a 'static nested inner class'. Obviously an outer (top-level) class can never be static.

public class TestOuter {
    static class TestInner {}
}

The static just means that the inner class can be accessed without an object of the outer class. The syntax for reference to this static class is

TestOuter.TestInner n = new TestOuter.TestInner();

We can then invoke any method of this static class on this object. However, the key is that a static nested class does not have access to the instance variables of the outer class, nor does it have access to the non-static methods of the outer class, as there is no associated object of the outer class.
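A minimal sketch of this limitation:

public class TestOuter {
    private String name = "Rajeev";
    private static String label = "outer";

    static class TestInner {
        void innerMethod() {
            System.out.println(label);   // OK: static members of the outer class are accessible
            // System.out.println(name); // compile error: no TestOuter instance is associated
        }
    }

    public static void main(String[] args) {
        new TestOuter.TestInner().innerMethod();
    }
}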

3. The method-local inner class

When we place the inner class within a method of the outer class we get a method-local inner class. No other method has access to this class. This class is most suitable for checking pre- and post-conditions in Design-By-Contract. Within the body of a method we can define an AssertionChecker class to check the method's conditions through the assert mechanism. This way the initial values of all the variables required for the post-condition check can be saved, if necessary, in the AssertionChecker object. For example, in a banking system we may be interested in saving the initial balance prior to depositing a new amount, to check that the post-condition of a correct balance update has been met. With this approach, once the assertion mechanism is switched off it doesn't have any side effects.

public void myMethod(final int myVar) {
    // a method-local inner class for checking pre- and post-conditions
    class AssertionChecker {
        // define variables here so they have no side effect when we don't need these checks
        private int assertionVar;
        boolean precondition() {…}
        boolean postcondition() {…}
    }
    // check the method's pre-condition
    AssertionChecker check = null;
    // cannot declare the reference within assert, so it has to be declared outside
    assert (check = new AssertionChecker()) != null && check.precondition();
    //
    // code the method's logic
    //
    // check the method's post-condition
    assert check.postcondition();
}

The parameter to myMethod was marked final because this type of inner class only has access to local variables marked final; it cannot access the method's other local variables. This requirement exists because an instance of the class can still exist after the method's variables have been removed from the stack. Just as local variables cannot have private, public, static etc. modifiers, we cannot apply these to a method-local inner class either.

4. The anonymous inner class

These nameless inner classes effectively subclass an existing class or implement an interface, and the class body just happens to follow the instantiation expression.

Let us say we have an existing Animal class with an eat() method and we use the syntax

Animal myPet = new Animal() {
    public void eat() {
        System.out.println("The pet is eating");
    }
}; // note the semicolon terminating the anonymous class declaration

We now have myPet object which refers not to Animal but to an anonymous subclass of Animal. The runtime polymorphism comes in play whenever we invoke methods on myPet object.

Incidentally, if Animal were actually an interface (OK, its name would then have been Animality) rather than a class, then the anonymous definition becomes an implementation of the interface. We can have this type of anonymous implementation in method arguments as well. The syntax becomes myMethod(new Animal() {// definition}); note the peculiar }); ending. We can see how the logic for an ActionListener can be implemented within our GUI, with each component using this type of anonymous class.

myButton = new JButton();

myButton.addActionListener(
    new java.awt.event.ActionListener() {
        public void actionPerformed(java.awt.event.ActionEvent e) {
            // do something
        }
    }
);

This article should have given a flavour of the number of ways we can use inner classes and of their syntax.


Tuesday, January 12, 2010

Is SaaS a throwback to computer bureaux of yore?

Computer bureaux date back to the era when computers were expensive and batch processing and dumb terminals multiplexed into a central mainframe to maximise usage were the norm. A discrete service like payroll or accounting was offered on a centralised server on a time-sharing basis for multiple clients, and the client data was transferred via magnetic tapes and disks for batch processing. The charging was usage-based for the computer time needed. The SaaS offerings provide web-based, installation-free access to managed services on centralised hosts providing integrated applications like enterprise resource planning systems or customer relationship management. The latter are truly distributed offerings, whereby data from the central repository can be manipulated on the local PC. The charging is normally based on user population and concurrent users. The motivation in the 60s and 70s was sharing expensive resources, but nowadays concerns like availability, scalability, reliability and security are paramount. The disjointed, slow, batch-oriented and cumbersome approach of the bureaux has acquired 24x7 availability, responsiveness and seamless integration in the SaaS world. The whole burden of performing licence management, version control, resilient configuration, secure access, disaster recovery etc. is devolved onto the ASP, and is more complex nowadays. We have moved from the pseudo-parallelism of the bureau to the distributed, concurrent environment of SaaS. So despite the surface-level similarity in their approach, they are two distinctly different beasts. In both cases the application resides outside the enterprise, but the expectations, operations, technology, rationale and scope are totally different.

Computers were bulky, slow and expensive resources in the pre-70s, so it made sense to share them for common business functions like payroll amongst a number of clients. The rapid advances in technology heralded the advent of the PC in the 80s; the subsequent increases in memory availability, CPU speed and communication speed made it viable to have in-house, LAN-based client-server offerings to support these functions. PCs were cheap enough to allow individual ownership without worrying about idle time. Also, business users sought decision-support systems to complement transactional systems, IS/IT departments were established within companies, and time-sharing faded away.

The complexity involved in deploying and upgrading software in a distributed environment, the consequential difficulties in negotiating relevant licences, the interoperability issues, the ubiquity of the browser-based client, fast and cheap communication, affordable scalability, the trend towards outsourcing etc. have all aided the drive towards SaaS. Hardware and software technology is seen as a purchasable commodity, and organisations prefer to concentrate on their core competencies, expecting a secure and resilient service from experts. The ASPs also feel confident that benchmarks exist to provide the requisite concurrency and performance from their server farms, allowing them to focus on their domain expertise. Also, the approach is usually cheaper than an in-house solution when TCO is taken into account. All these factors mean that SaaS offerings will continue to grow in the foreseeable future.

Tuesday, January 5, 2010

Use servlets and JSP judiciously

All sizeable Java EE applications use a framework like Struts, JavaServer Faces etc., where the controller is invariably a servlet object. The key disadvantage of servlet technology is that even a minor modification to static content requires changes to the Java code that outputs HTML. JSP overcomes this shortcoming by combining HTML and Java. The static part can be pure HTML, whereas the dynamic aspect can be managed through JSP directive, scripting and action tags. (Incidentally, JSP tag libraries and the JSP expression language are nowadays preferred over embedding raw Java code in a JSP.) A combination of JSP and servlets provides horses for courses. Historically, servlet technology is the forerunner of JSP for dynamic content and, unsurprisingly, each JSP page translates to servlet code prior to execution. Thus it is common to use both servlet and JSP technologies in applications, as servlets are inevitable but JSP provides convenience, simplicity and ease of development. It also facilitates segregation of responsibilities amongst development teams, as the web designers can focus on the largely static presentation aspects in JSP whilst the Java developers concentrate on processing logic in servlets and custom tag libraries. This can be enforced by declaring some JSP pages as scriptless in the deployment descriptor of the application. Also, in the prevalent model-view-controller architecture, the servlets act as the controller whilst the JSP pages provide the views. Both technologies are capable of invoking each other, so we can focus on the best solution for the task at hand.

However, in the real world, to create a truly interactive web application we will go a step further and use JavaServer Faces (JSF) technology, which builds on these two technologies. With JSF 2.0, JSP is even deemed deprecated for creating views.

Why P2P?

The peer-to-peer (P2P) software architecture has been instrumental in changing the landscape of the music industry. The drive by the music industry to have the free Napster file sharing service outlawed in 2004 to protect its intellectual capital points to the negative impact on its revenue. However, groups like the Arctic Monkeys have seen it as an opportunity, deliberately sharing their demo CDs free of charge to build a fan base with little marketing outlay.

In a pure P2P architecture all the participating computers have equal roles and can act as both client and server. There is no centralised server, so there is no single point of failure; the P2P application itself keeps track of users with the installed software, locates them and searches for files in their P2P storage areas. A hybrid P2P system like Napster had a centralised server keeping an index of all the available files and of the currently signed-on users with their available files, to facilitate searches. The essential feature of P2P file transfer is that once a host has been established which has the required file, the communication is directly between the requesting client and the responding host. The network doesn't suffer from degraded performance when more clients are connected, as in traditional client-server architecture, because each new client brings its own resources (i.e. storage, bandwidth and CPU processing power) to the table and increases network capacity. Also, the files are available from a number of connected clients, so there is potential for selecting the most responsive host and there is built-in fault tolerance as the files are available on a number of hosts.

Saturday, January 2, 2010

Access restrictions and integrity constraints clarified

Access restrictions are enforced by the DBMS facility that ensures that only authorised users gain access to the DBMS. For example, a valid user is allowed to manipulate a table with given access rights.

Integrity constraints are constraints that maintain the consistency and correctness of data.

They protect the contents of the database in totally distinct ways. When a DBMS restricts what a user can do, like granting the right to only view a table rather than update it, it ensures that the data providers with responsibility for maintaining the accuracy of the data content enter, update and delete the data, while the data consumers just have the privilege to review the data they need for their proper functioning. Also, sensitive company data pertaining to the finance and personnel functions can be shielded from the prying eyes of those who have no need for direct access to it by not authorising access to these data areas. Essentially, access restriction ensures that only the data necessary and sufficient for carrying out a job is made available to a person, and the rest of the data is hidden from them. Through access restrictions we can segregate responsibilities within the organisation by providing access authorisation to the data horizon necessary for a role. These security access restrictions are centrally defined and the DBMS automatically enforces them while accessing the database (Block1,p24). Data correctness is achieved by proper responsibility sharing through access privileges, obviating the potential for unauthorised rogue data manipulations. Any SQL statement issued by a user must be commensurate with their authorised access profile or the DBMS will not execute it.

The ability of integrity constraints to enforce consistency and correctness is best understood through an example. If we have a geographical hierarchy with levels of company, country, region and world (e.g. Unilever UK, UK, Europe, Global held in COMPANIES, COUNTRIES, REGIONS and WORLD tables) then, while defining a company in the COMPANIES table, the DBMS ensures that it is only linked to valid countries defined in the COUNTRIES table, and that countries are linked to valid regions in the REGIONS table. If we had linked a company to a country not defined in the database, then while aggregating regional data this rogue country would have been missed in the table linkages as it is not part of the hierarchy. Also, if we try to delete a region in our REGIONS table while there are countries linked to that region in the COUNTRIES table, an integrity constraint could either prevent us from carrying out this operation or cascade the delete to all the records in the COUNTRIES table that are linked to the REGIONS table with region_id as the foreign key. All these types of rules governing referential integrity are stored in the system catalog managed by the DBMS and are automatically enforced by the DBMS without requiring any programming intervention by the developer (Block1,p24). Although we have focussed on referential integrity in our example, we can also see data integrity being enforced by the DBMS when it enforces user-defined rules, like a date of birth having to be earlier than the date of starting school, or a numeric phone number not having arithmetic operations performed on it. We can define a number of integrity constraints: an email address must include '@', the values for a particular column must be within the 100-800 range, an area code is restricted to a subset of predefined codes, etc. These are all examples of integrity constraints defined to ensure the correctness and validity of the data contained in the database. Integrity constraints could also be implemented through pre-insert, pre-update and pre-delete triggers on the tables.

In brief, access restriction enforces correctness through security measures, whilst integrity constraints enforce the correctness and consistency of the contents through defined constraints on the columns of a table. With access restriction, no person is able to insert, update or delete information in a table without the appropriate privilege, whilst the integrity constraints ensure that authorised personnel can only add data which conforms to pre-defined rules.

Why datawarehouse and OLAP tools when there is data duplication?

The data warehouse used by OLAP tools holds large quantities of integrated, normally summarised, historical data which is time-stamped. The data is normally added to the data warehouse at regular frequencies rather than being updated, to form an enterprise-wide, integrated repository to support data mining. All the updates to the various transactional systems are incrementally added to the data warehouse to accurately reflect the reality on the date of the last extract. In an OLTP system the data has to be absolutely accurate as it is dealing with operational transactions, and it has to respond in a timely fashion. Also, in OLTP systems the non-current data is archived to reduce storage needs and enhance performance. The time-stamped nature of the data in the warehouse means that the business reality on a defined date can be analysed for strategic purposes without worrying about the performance impact on the transactional systems. The historical data could span a number of years to facilitate trend analysis and to seek correlations. The OLAP tools allow business users to slice and dice data, discover anomalies and drill down to the root causes. For example, the decline in a brand's performance could be correlated to the rise of a new launch by a competitor, the decline in advertising expenditure supporting the brand, or even the changing economic climate. The powerful data mining tools can carry out statistical analysis, use artificial intelligence, neural networks, machine learning etc. to unearth unexpected correlations and anomalies. There is no way such an analysis could have been done in a transactional system, as it would not have access to competitors' information or macroeconomic data. Also, the normal star schema of a data warehouse is optimised for analytical processing and may hold aggregates. Thus the data duplication in the warehouse is being used to support a different business objective from the one expected of an OLTP system. The governance structure around the warehouse ensures the accuracy of the data as of the date of the last extract from the transactional systems, which is incrementally added. Apart from the data extraction overhead, the OLAP system doesn't impact the OLTP system but allows a wider business objective of data analysis to be achieved, making the investment in a data warehouse worthwhile despite the seeming duplication of data.

Advantages of namespaces

A namespace is a set of names, distinguished from other sets by being identified with a particular URI. It is a mechanism for differentiating elements. The syntax for defining a namespace with a non-null prefix is

xmlns:prefix="URI"

This prefix followed by a colon is added to each tag within a vocabulary to make it unique. For example, if the prefix is 'rdf' then the elements would be of the form <rdf:elementName>. As xmlns is a reserved word, we can find all the namespace definitions in a given file by searching for xmlns.

They are an indispensable part of XML documents, so a clear appreciation of their role is a pre-requisite for anyone embarking on XML/XSL development. The key benefits offered by namespaces are:

  • Facilitate use of different XML vocabularies in the same XML document by resolving conflicts stemming from identical tags being used in different vocabularies. These name clashes need to be averted; the namespace qualifier makes the tag globally unique, thus obviating any ambiguities. It is not unusual to have multiple vocabularies in a single document; for example an XSLT stylesheet needs three different XML vocabularies, so avoiding name clashes is of fundamental importance.
  • Provides a simple, abbreviated, XML-compliant prefix for a unique uniform resource identifier (URI), thus avoiding the syntactical difficulties stemming from non-compliant characters if one were forced to use the full URI for qualifying tags.
  • Improves readability - from a parser perspective short prefix is identical to full namespace name.
  • Allows an organisation to have its own distinct tags with distinct meanings, through different namespaces associated with different URIs, and they can all be used within the same XML document, e.g. Adobe can have different tags for different image formats and they can all be used in a single document.
  • Avoids the tedium of typing by providing a mechanism for defining a default namespace within a document, so unqualified names automatically acquire this full qualification from the parser's perspective.
  • To constrain a document to an XML vocabulary, we need an XML Schema, which can only be defined by using the reserved namespace http://www.w3.org/2001/XMLSchema which has all the attributes and elements of the W3C XML Schema specification. Also, document validators check a document instance against the structures, elements, attributes, datatypes and constraints defined in the associated XML schema; the link to the associated schema in a document instance is through a namespace. Incidentally, we should appreciate that a namespace URI does not necessarily point to anything at the implied location.
  • Allows search engines to find similar documents with tags conforming to a namespace. In practice, namespace brings all the elements and attributes of a vocabulary together to be exploited by software.

Eclipse as an XML editing tool

A good XML editing tool should ideally be syntax-aware, context-sensitive, graphical and support namespaces to enhance productivity. Any experimentation with a simple text-editor like Notepad brings out all the frustration when the edited document acquires any complexity. It becomes tedious to indent the tags to improve readability, difficult to reorganise structures and a nightmare to debug errors. You may be a genius at handling XML but the fitness for purpose of a simple text-editor is undeniably questionable. Try converting a sub-element in an XML Schema instance into an attribute in a text-editor to understand the radical surgery required to achieve this simple task and the potential for errors. There are good commercial products like Stylus Studio (Progress Software Corporation) available to handle all aspects of XML development effectively but a free, feature-rich, open source product, namely Eclipse, is also admirable in its capabilities. Just look into Help\Software Update\Manage Configuration section of Eclipse to get a feel for the extraordinary variety of features provided by this integrated development environment. I will focus on three features which add value in editing XML documents and XML Schema instances.

Firstly, the syntax awareness is pervasive throughout the product. When we look at a simple XML document in Eclipse, it is immediately apparent that the elements are shown in one colour and their values in another, and the same is applicable to attributes. The colour preferences for the different syntax elements are governed through the preferences management. Also, if the document is linked to a schema, it automatically validates the content against the schema and underlines the text, in the manner of wrong spellings in Word, if the added content doesn't adhere to the constraints of the data type of the element. Incidentally, contents which are not dictionary words are also underlined, but in a different colour, to support correction of possibly mis-spelt words. The other dimension of this syntax awareness is context sensitivity. For example, when I type '<' in a document governed by a schema, the editor proposes the elements that are valid at that point.

The second major feature is the support for graphical editing. There is an 'outline' view available which allows us to easily navigate through the tree structure by letting us expand and collapse various nodes. Clicking on any node in this outline view takes us to the corresponding element in the XML document. If this were not enough, we can toggle between design and source views in the main editing panel. In the design view, which is a refinement of the outline view with content, we can easily edit the contents, whilst the source view lets us rapidly cut and paste appropriate sections. Right-clicking on any node in the design or outline view allows us to add appropriate processing instructions, comments, elements etc.

The IDE automatically synchronises the editing position in these two distinct views to correctly reflect changes. Incidentally, the design view is displayed slightly differently while editing an XML Schema instance. Carrying on the context-relevance theme, it shows the sections relevant to defining a schema and their relationships. Right-clicking on any of the constituents of these sections brings up a pop-up window for handling that type, i.e. for an element we may be able to set its multiplicity, while for a complex data type we may be able to add or remove elements or attributes. Also, double-clicking on any constituent of a section takes us to its full definition graphically. All this has the beauty of letting us focus on one aspect thoroughly. Once we get down to an element we can define constraints in the graphic view, like maximum length, enumerations etc., and see the code appear automatically in the text mode. The only caution I will add is that the design and source views can occasionally fall out of sync in an incomprehensible fashion. I particularly recall cutting and pasting a schema in my environment and playing with it: it insisted that the definition of an attribute from the original schema was in my schema instance, whereas the source view clearly showed that there was no such attribute. I believe this stems from multiple tools working together to harness the full power of XML, and occasionally they don't behave as well as one might expect.

The third impressive feature is the general XML support. We can easily click on the format button to auto-indent the elements and improve readability, or clean up a document by automatically compressing empty element tags, inserting required attributes, inserting missing tags, or adding quotes around attribute values. The validate option can check a document against its schema and check that it is well-formed. The refactor option allows a tag to be renamed throughout a project to make it more meaningful. The schema definition template automatically refers to the appropriate URI http://www.w3.org/2001/XMLSchema. In fact, the various templates themselves are customisable. An XML document can be created from an existing schema, and various options are allowed during creation. Also, a catalogue of standard schemas is available. These XML support features can have the occasional hiccup. I remember a reformat of the text in the text editor making a document that was valid against its schema invalid, because the reformatting placed a tag value on a new line to make it more readable; the content was from an enumeration, and the validator didn't like the extra whitespace.

It should be clear that, despite the small niggles, the synergistic impact of these features makes Eclipse a potent environment for editing XML documents and XML Schema instances. However, fans of NetBeans will notice that similar features are available in that tool.

SOA recommendations for an SME

Here are some of the recommendations an SME should follow if it is toying with the idea of SOA and doesn't want to come to grief.

  • SOA advantages of speed-to-market, reduced cost, reuse, better business-IT alignment, and better and faster decision making have to be tempered with an awareness of high implementation failure rates, increased governance and long-term commitment (Guah,2008,pp139-40). Practitioners advise making governance a priority and to 'think big but start small' (Sumar,2008).
  • REST is currently a religion for many, who prophesy the demise of WS-*, consider that this 'second generation' style offers better maintenance, performance, scalability, extensibility, simplicity and security, and condemn SOAP RPC as 'DCOM for the Internet' (Prescod,2002). SOAP uses POST to send its payload, so the results cannot be cached, thus thwarting scalability. If you are dealing with images then be aware that mashup with image galleries like Flickr is easier with REST (Simpkins,2009,p6). SOAP demands an image payload be packed using base64 encoding, while REST allows the image to be retrieved as a resource. If the business handles images and has CRUD-like interactions then RESTful services commend themselves.

Although the lightweight REST architectural style is deemed simpler and leverages the existing HTTP protocol with intuitive resource-based URIs for web services, and Amazon sees 80% REST and 20% SOAP usage (Anderson,2006), we should adhere to the WS-* standards as the trading partners are likely to be using and expecting them. The standardisation, flexibility, reliability, ESB support and ubiquitous toolkits offering productivity enhancements are compelling arguments for SOAP-based services (Simpkins,2009e,p5;Haas,2005). Web services orchestration using the process-modelling language BPEL requires a WSDL contract, so BPEL cannot be used with RESTful services (Simpkins,2009,pp5-6).

A compromise of using both approaches will not be possible for an SME, as supporting both styles stretches the IT skill-base and investment. A SOAP solution offers the benefits of a large historical investment by standards bodies, tool suppliers, governments and business users, and is currently more versatile, so it is recommended despite its need for higher infrastructure investment. The WS-* standards are likely to win in the long run (Simpkins,2009,p6).

  • Protect existing investment by building adapters for legacy applications. Various wrapping approaches are available to expose the functionality (Al-Belushi and Baghdadi,2007).
  • Use document/literal wrapped as the encoding model for WSDL as it is most versatile and WS-I compliant (Butek,2005).
  • An Eclipse IDE with plugins, plus a jUDDI, MySQL and Tomcat open-source environment, is fit for cheap experimentation and implementation without incurring the huge licensing costs of commercial products. The Apache Synapse ESB can be considered. Eclipse supports both SOAP and RESTful services.
  • Discover domain-specific services and build relationship with suppliers rather than construct everything internally. For example, banks may provide currency conversion service. Publish selected internally developed services in public UDDI to expose image catalogue and win business.
Although many other aspects could be discussed, this should give a head start with tool selection and provide a handle on the rather fanatical SOAP vs REST debate.

References

Al-Belushi, W. and Baghdadi, Y. (2007) 'An Approach to Wrap Legacy Applications into Web Services', Proc. Int'l Conf. Service Systems and Service Management (ICSSSM '07), pp.1-6, June 2007.

Anderson, T. (2006) 'WS-* vs the REST', Reg Developer, 26 April [online], http://www.regdeveloper.co.uk/2006/04/29/oreilly_amazon/ (accessed 3 June 2009)

Butek, R. (2005) Which style of WSDL should I use? [online], IBM, http://www.ibm.com/developerworks/webservices/library/ws-whichwsdl/ (accessed 3 June 2009)

Guah, M. W. (2008) Managing Very Large IT Projects in Businesses and Organizations, Idea Group, Pennsylvania

Prescod, P. (2002) Second Generation Web Services [online], O'Reilly Media, Inc., http://webservices.xml.com/pub/a/ws/2002/02/06/rest.html (accessed 3 June 2009)

Sumar, S. (2008) Making Your SOA Journey Successful – Key Aspects [online], Infosys, available at http://www.infosysblogs.com/soa/2008/08/making_your_soa_journey_succes.html (accessed 3 June 2009)

Simpkins, N. (2009a) ‘Block 3 part 5: Web services messaging with HTTP’, in T320 E-business Technologies: Foundations and Practice, The Open University, Milton Keynes