Friday, September 9, 2011

DB2 pureXML for the Protein Data Bank - Managing Atoms and Molecules with XML

Yesterday an interesting new article was published on developerWorks, "Managing the Protein Data Bank with DB2 pureXML". It describes how scientists working with highly complex data (the Protein Data Bank, which describes the atoms and molecules that make up proteins) can benefit from a sophisticated database management system, DB2.

At the core of the solution is pureXML, which reduces the schema complexity (see the graphic of a relational design in the article). Compression keeps the database small and I/O to a minimum. On top of that, database partitioning across multiple nodes, range partitioning, and multi-dimensional clustering help organize the data, cut complexity, and improve performance.
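To give a feel for why keeping the data as XML can simplify the schema, here is a minimal sketch in Python. It is not taken from the article and does not use DB2: the XML fragment, its element and attribute names, and the helper `carbon_atom_ids` are all made up for illustration and do not reflect the real Protein Data Bank schema. The point is only that one hierarchical document can hold what would otherwise be spread over several relational tables, and can still be queried by path.

```python
import xml.etree.ElementTree as ET

# Hypothetical, simplified fragment. Element and attribute names are
# illustrative only and are NOT the actual Protein Data Bank XML schema.
DOC = """
<entry id="EXAMPLE">
  <atom id="1" element="N"><x>1.0</x><y>2.0</y><z>3.0</z></atom>
  <atom id="2" element="C"><x>1.5</x><y>2.5</y><z>3.5</z></atom>
  <atom id="3" element="C"><x>2.0</x><y>3.0</y><z>4.0</z></atom>
</entry>
""".strip()


def carbon_atom_ids(xml_text):
    """Return the ids of all carbon atoms in one entry document."""
    root = ET.fromstring(xml_text)
    # Path expression with an attribute predicate; no join tables needed.
    return [atom.get("id") for atom in root.findall("./atom[@element='C']")]


print(carbon_atom_ids(DOC))
```

In DB2 itself the document would sit in an XML column and be queried with SQL/XML or XQuery rather than with Python; the sketch above only mirrors the idea.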

What a good example of combining many of DB2's features for the benefit of science.

BTW: This work was made possible by the IBM Academic Initiative, which, as one of its benefits, allows free use of DB2.