Last week’s OR09 meeting unleashed a number of new epiphanies around DSpace 2.0. I feel these realizations should lead us to the next step in the way we represent entities such as DSpace Items.
One of these ideas has to do with the recognition that an Item is actually a “View” or a “Join” of properties from separate functional areas such as “content”, “history”, “policy”, and “presentation”. Here are a few examples of those areas.
1.) Content: An underlying store in DSpace might be JCR, Fedora, DSpace, or S3. Each of these maintains its own model for expressing content in its domain: Fedora (Fedora Objects and Datastreams), JCR (JCR Nodes, JCR Properties, JCR References), DSpace (Collections, Communities, Items, Bundles, Bitstreams, etc.).
2.) History: Each of the storage systems above represents Versioning, History, or Provenance differently in its ER (Entity Relationship) model. While the Content domain may provide a representation of that version history, a separate service (layered on top of it or not) is still required to work with that Versioning model.
3.) Policy: Certainly each of these expresses a set of policy information, which is generally managed in a separate service or layer on top of the Entities expressed therein.
4.) Presentation: Once these are expressed, external tools will want varying “expressions” of the previous three areas that are specific to a particular domain or tool (HTML, AJAX/Web 2.0, OAI-PMH, LOD, ATOM/APP). Some of these communities also seek to constrain which parts of the previous three areas can be expressed in their transmissions.
I feel that, being at the nexus of all these functional domains, DSpace 2.0 has an opportunity to be something very powerful. If we recognize that the “presentation” of any DSpace 2.0 Entity is a JOIN of the data from each of these areas, and that different “Services” are responsible for expressing, accessing, and persisting that “domain-specific” data, then it’s clear that for any one Entity in the DSpace model we actually have N possible domain-specific representations:
Content: Content in an underlying store’s model, on a per-Entity basis.
History: Provenance, Versioning, and Change History Trail in Harmony per Entity.
Policy: Access Control and other Policy rules per Entity.
Presentation: Rendering details specific to a serialization of the Entity.
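To make the “Item as a JOIN” idea concrete, here is a minimal sketch, assuming each functional domain is exposed as a service that returns the properties it holds for a given Entity id. Everything here (`DomainService`, `InMemoryService`, `join_entity`, `render_summary`, the `"item-1"` id) is hypothetical and illustrative, not part of any DSpace API.

```python
from dataclasses import dataclass, field
from typing import Dict, Iterable, Protocol


class DomainService(Protocol):
    """Hypothetical interface: one service per functional domain."""
    domain: str

    def properties(self, entity_id: str) -> Dict[str, object]: ...


@dataclass
class InMemoryService:
    """Toy stand-in for a content, history, or policy store."""
    domain: str
    data: Dict[str, Dict[str, object]] = field(default_factory=dict)

    def properties(self, entity_id: str) -> Dict[str, object]:
        # Return a copy so callers cannot mutate the service's own state.
        return dict(self.data.get(entity_id, {}))


def join_entity(entity_id: str, services: Iterable[DomainService]) -> Dict[str, Dict[str, object]]:
    """Build one view of an Entity by joining each domain's properties,
    keyed by domain so the origin of every property stays visible."""
    return {s.domain: s.properties(entity_id) for s in services}


def render_summary(view: Dict[str, Dict[str, object]]) -> str:
    # A presentation decides which domains it exposes; this one shows only
    # the title from the content domain and the version from history.
    title = view["content"].get("dc.title", "(untitled)")
    version = view["history"].get("version", "?")
    return f"{title} (v{version})"


services = [
    InMemoryService("content", {"item-1": {"dc.title": "Thesis"}}),
    InMemoryService("history", {"item-1": {"version": 3}}),
    InMemoryService("policy", {"item-1": {"READ": ["anonymous"]}}),
]
view = join_entity("item-1", services)
print(render_summary(view))  # Thesis (v3)
```

Keying the joined view by domain, rather than flattening it into one property map, keeps each domain’s data separable, so a serialization such as OAI-PMH or ATOM can pick only the domains it is allowed or willing to carry.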
For much of the week I had been wrestling with whether we would want to make Access Control configurable on a per-property basis in DSpace 2.0, so that one would set explicit access rights on each property of an Entity (regardless of its “functional use”). That approach seemed likely to become very unwieldy, bloated, and slow as the store grew. As an alternative, I was exploring mediating access control only at the Entity level, on a per-Service basis. Given the above analysis, I’m beginning to think this is the more reasonable approach.
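A rough sketch of the per-Service alternative, assuming a rule table keyed by (Entity, service) pairs rather than by individual properties. The names `PerServicePolicy` and `guarded_view`, and the principals `"anonymous"` and `"admin"`, are hypothetical and only illustrate the idea; this is not how DSpace stores its policies.

```python
from typing import Dict, Set, Tuple

# Hypothetical rule table: one entry per (entity, service), not per property.
Rules = Dict[Tuple[str, str], Set[str]]


class PerServicePolicy:
    def __init__(self, rules: Rules):
        self.rules = rules

    def can_read(self, principal: str, entity_id: str, domain: str) -> bool:
        # One lookup covers every property that service holds for the entity.
        return principal in self.rules.get((entity_id, domain), set())


def guarded_view(principal, entity_id, stores, policy):
    """stores maps domain name -> {entity_id: {property: value}}.
    Domains the principal may not read are dropped from the view entirely."""
    return {
        domain: dict(store.get(entity_id, {}))
        for domain, store in stores.items()
        if policy.can_read(principal, entity_id, domain)
    }


stores = {
    "content": {"item-1": {"dc.title": "Thesis", "dc.creator": "Doe, J."}},
    "history": {"item-1": {"version": 3, "edited_by": "admin"}},
}
policy = PerServicePolicy({
    ("item-1", "content"): {"anonymous", "admin"},
    ("item-1", "history"): {"admin"},
})

print(guarded_view("anonymous", "item-1", stores, policy))
# {'content': {'dc.title': 'Thesis', 'dc.creator': 'Doe, J.'}}
```

In this sketch the number of rules grows with Entities × services rather than Entities × properties, which is the scaling concern raised above.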
To give an example, an Item might have a set of System properties, Descriptive properties, Provenance properties, and Policy controls, all stored in separate services. In DSpace 1.x, provenance was stored directly in the metadata table (i.e., the Content Domain). That makes it difficult to mediate rights on the captured change history, because the changes themselves are encoded as changes to the Item’s “Content Domain”.
Mackenzie Smith had work under way at MIT to separate that change history into an independent service based on a “history” triple-store. This is the right direction for DSpace 2.0, and I think we may now see the correct path to integrate into DSpace 2.0 not just the work completed in the Pledge project, but possibly also some of the significant work completed in last year’s Google Summer of Code on Fedora integration and versioning of DSpace Items.
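As an illustration of keeping provenance out of the Content Domain, the sketch below writes the new metadata value to a content store and records the change as triples in a separate, append-only history store. It is an assumption-laden toy: `HistoryStore`, `update_metadata`, and the predicate names (`"changed"`, `"oldValue"`, etc.) are made up for this example and are not the MIT history service, the Harmony model, or any real RDF vocabulary.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple

# (subject, predicate, object); predicates below are hypothetical.
Triple = Tuple[str, str, object]


@dataclass
class HistoryStore:
    """Toy append-only triple store for change events."""
    triples: List[Triple] = field(default_factory=list)
    next_id: int = 1

    def record_change(self, item: str, prop: str, old, new, agent: str) -> str:
        event = f"event-{self.next_id}"
        self.next_id += 1
        self.triples += [
            (event, "changed", item),
            (event, "property", prop),
            (event, "oldValue", old),
            (event, "newValue", new),
            (event, "agent", agent),
        ]
        return event

    def events_for(self, item: str) -> List[str]:
        return [s for s, p, o in self.triples if p == "changed" and o == item]


def update_metadata(content, history, item, prop, value, agent):
    """Store only the current value in the content store; log the change in
    the history store instead of appending provenance to the metadata."""
    old = content.setdefault(item, {}).get(prop)
    content[item][prop] = value
    return history.record_change(item, prop, old, value, agent)


content: Dict[str, Dict[str, object]] = {"item-1": {"dc.title": "Draft"}}
history = HistoryStore()
update_metadata(content, history, "item-1", "dc.title", "Thesis", "admin")

print(content)                       # {'item-1': {'dc.title': 'Thesis'}}
print(history.events_for("item-1"))  # ['event-1']
```

Because the change trail lives in its own store, a per-Service policy can grant read access to the Item’s content while restricting its history, without the history leaking through the metadata.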