17 July 2010

JPA 2.0: Mapping a Map

JPA 2.0 has added support for persistent maps where keys and values may be any combination of basic types, embeddables or entities.

Let's start with a use case:

The Use Case


In an internationalized application, working with plain old Strings is not enough. Sometimes you also need to know the language of a string, and given a string in English, you may need to find an equivalent string in German.

So you come up with a LocalizedString, which is nothing but a plain old String together with a language code, and then you build a MultilingualString as a map of language codes to LocalizedStrings. Since you want to reuse LocalizedStrings in other contexts, and you don't need to address them individually, you model them as an embeddable class, not as an entity.

The special thing about this map is that the keys are part of the value. The map contents look like

'de' -> ('de', 'Hallo')
'en' -> ('en', 'Hello')
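
As a rough illustration before any JPA comes into play, the use case can be sketched in plain Java. The names below (UseCaseSketch, Localized, multilingual, translate) are made up for this sketch and are not part of the model that follows. The point is only that each key is taken from its value, and that an English string can be translated by finding the map that contains it.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Plain-Java sketch of the use case, without any persistence.
// All names here are hypothetical and only serve the illustration.
class UseCaseSketch {

    // A string together with its language code.
    static final class Localized {
        final String language;
        final String text;

        Localized(String language, String text) {
            this.language = language;
            this.text = text;
        }
    }

    // Builds one multilingual string from alternating language/text arguments.
    static Map<String, Localized> multilingual(String... languageTextPairs) {
        Map<String, Localized> map = new LinkedHashMap<>();
        for (int i = 0; i + 1 < languageTextPairs.length; i += 2) {
            Localized value = new Localized(languageTextPairs[i], languageTextPairs[i + 1]);
            // The key is taken from the value, so the two cannot diverge.
            map.put(value.language, value);
        }
        return map;
    }

    // Returns the text in language 'to' that corresponds to 'text' in
    // language 'from', or null if no such translation is known.
    static String translate(List<Map<String, Localized>> strings,
                            String from, String text, String to) {
        for (Map<String, Localized> map : strings) {
            Localized source = map.get(from);
            if (source != null && source.text.equals(text)) {
                Localized target = map.get(to);
                if (target != null) {
                    return target.text;
                }
            }
        }
        return null;
    }

    public static void main(String[] args) {
        List<Map<String, Localized>> strings = new ArrayList<>();
        strings.add(multilingual("de", "Hallo", "en", "Hello"));
        System.out.println(translate(strings, "en", "Hello", "de")); // prints Hallo
    }
}
```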

The Model


This is the resulting model:


[Update 20 July 2010: There is a slight misconception in my model, as pointed out by Mike Keith in his first comment on this post. Editing the post in place would render the comments meaningless, so I think I'd better leave the original text unchanged and insert a few Editor's Notes. The @MapKey annotation below should be replaced by @MapKeyColumn(name = "language", insertable = false, updatable = false) to make the model JPA 2.0 compliant.]

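// Imports needed by the two classes below (each class goes into its own file)
import java.util.HashMap;
import java.util.Map;
import javax.persistence.*;

@Embeddable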
public class LocalizedString {

    private String language;

    private String text;

    public LocalizedString() {}

    public LocalizedString(String language, String text) {
        this.language = language;
        this.text = text;
    }
    
    // autogenerated getters and setters, hashCode(), equals()
} 
 
@Entity
@Table(schema = "jpa", name = "multilingual_string")
public class MultilingualString {

    @Id
    @GeneratedValue(strategy = GenerationType.SEQUENCE)
    @Column(name = "string_id")
    private long id;

    @ElementCollection(fetch=FetchType.EAGER)
    @MapKey(name = "language")
    @CollectionTable(schema = "jpa", name = "multilingual_string_map", 
                     joinColumns = @JoinColumn(name = "string_id"))
    private Map<String, LocalizedString> map = new HashMap<String, LocalizedString>();

    public MultilingualString() {}
    
    public MultilingualString(String lang, String text) {
        addText(lang, text);
    }
    
    public void addText(String lang, String text) {
        map.put(lang, new LocalizedString(lang, text));
    }

    public String getText(String lang) {
        if (map.containsKey(lang)) {
            return map.get(lang).getText();
        }
        return null;
    }
    
    // autogenerated getters and setters, hashCode(), equals()
}



The SQL statements for creating the corresponding tables:

CREATE TABLE jpa.multilingual_string
(
  string_id bigint NOT NULL,
  CONSTRAINT multilingual_string_pkey PRIMARY KEY (string_id)
);

CREATE TABLE jpa.multilingual_string_map
(
  string_id bigint,
  language character varying(255) NOT NULL,
  text character varying(255)
);
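
To make the relation between the map and the second table more concrete, here is a small plain-Java sketch (no JPA involved) that flattens one map into the rows one would expect to find in jpa.multilingual_string_map. The class name CollectionTableSketch, the helper toRows and the String[] row representation are assumptions made for this illustration only; a real persistence provider generates INSERT statements instead.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper, only meant to visualize the intended table contents.
class CollectionTableSketch {

    // Flattens one multilingual string into (string_id, language, text) rows.
    // The map key gets no column of its own: it is the language column.
    static List<String[]> toRows(long stringId, Map<String, String> textByLanguage) {
        List<String[]> rows = new ArrayList<>();
        for (Map.Entry<String, String> entry : textByLanguage.entrySet()) {
            rows.add(new String[] {
                Long.toString(stringId), entry.getKey(), entry.getValue()
            });
        }
        return rows;
    }

    public static void main(String[] args) {
        Map<String, String> hello = new LinkedHashMap<>();
        hello.put("de", "Hallo");
        hello.put("en", "Hello");
        for (String[] row : toRows(1L, hello)) {
            System.out.println(String.join(" | ", row));
        }
        // prints:
        // 1 | de | Hallo
        // 1 | en | Hello
    }
}
```

Each map entry becomes exactly one row with three columns, which is the table layout the mapping above is aiming for.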

The Specification


The most important and most difficult annotation in this example is @MapKey. According to JSR-317, section 2.1.7 Map Keys:

If the map key type is a basic type, the MapKeyColumn annotation can be used to specify the column mapping for the map key. [...]
The MapKey annotation is used to specify the special case where the map key is itself the primary key or a persistent field or property of the entity that is the value of the map.

Unfortunately, in our case it is not quite clear whether we should use @MapKey or @MapKeyColumn to define the table column for our map key. Our map key is a basic type and our map value is not an entity, so this seems to imply we should use @MapKeyColumn.

On the other hand, our key is a persistent field of the map value, and I think the whole point of the @MapKey annotation is to indicate the fact that we simply reuse a property of the map value as the map key, so we do not need to provide an extra table column, as the given property is already mapped to a column.

The way I see it, replacing @MapKey by @MapKeyColumn(name = "language_key") - note the _key suffix! - is also legal, but then we get a different table model and different semantics: the table jpa.multilingual_string_map would have a fourth column language_key, and this language_key would not necessarily have to be equal to the language of the map value.
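
The difference in semantics can be illustrated without a database. The following plain-Java sketch uses hypothetical names (IndependentKeySketch, Localized, keysMatchValues) and simply shows that, once the key is stored independently of the value, nothing prevents the two from diverging.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch: a map whose keys are stored independently of the values,
// as they would be with a separate language_key column.
class IndependentKeySketch {

    static final class Localized {
        final String language;
        final String text;

        Localized(String language, String text) {
            this.language = language;
            this.text = text;
        }
    }

    // Returns true if every map key equals the language of its value.
    static boolean keysMatchValues(Map<String, Localized> map) {
        for (Map.Entry<String, Localized> entry : map.entrySet()) {
            if (!entry.getKey().equals(entry.getValue().language)) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        Map<String, Localized> map = new HashMap<>();
        map.put("en", new Localized("en", "Hello"));
        System.out.println(keysMatchValues(map)); // true

        // With an independent key column, this entry is representable, too.
        map.put("fr", new Localized("de", "Hallo"));
        System.out.println(keysMatchValues(map)); // false
    }
}
```

With a shared language column, the second entry could not be stored as such, since there would be only one place to put the language code.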

Another open question: Is it legal to write @MapKeyColumn(name = "language")? If so, this should indicate that the language column is to be used as the map key, so this would be equivalent to the @MapKey annotation. On the other hand, you might say that this annotation indicates that the application is free to use map keys that are independent of the map values, so this contract would be violated if the column name indicated by the annotation is already mapped.

The Persistence Providers


I've tried implementing this example with the current versions of Hibernate, Eclipselink, OpenJPA and DataNucleus. I did not succeed with any of them. Only OpenJPA provided a workable solution using @MapKeyColumn, but as I said, I'm not sure if this usage is really intended by the specification.

[Update 20 July 2010: With the corrected model, the updated verdict is: Only OpenJPA passes the test, the other three bail out for various reasons.]

Let's look at the contestants in turn:

Hibernate


Using the mapping defined above, Hibernate 3.5.3-Final complains:

org.hibernate.AnnotationException: Associated class not found: LocalizedString

Apparently Hibernate is expecting the map value to be an entity, not an embeddable.

Using @MapKeyColumn(name = "language"), the exception is

org.hibernate.MappingException: Repeated column in mapping for collection: MultilingualString.map column: language

Finally, with @MapKeyColumn(name = "language_key"), Hibernate no longer complains about duplicate columns, but I end up with a redundant table column in my database which I was trying to avoid.

Another problem with Hibernate is different behaviour when working with XML mapping data instead of annotations (which is what I prefer for various reasons, but that's a topic for another post).

Using XML metadata for this example, Hibernate happily ignores the table names from the metadata and simply uses the default names. I filed a bug report in April 2010 (HHH-5136), with no reaction ever since.


Eclipselink


Using Eclipselink 2.1.0, I simply get a rather cryptic exception

java.lang.NullPointerException
 at org.eclipse.persistence.internal.queries.MapContainerPolicy.compareKeys(MapContainerPolicy.java:234)

With @MapKeyColumn(name = "language"), Eclipselink also complains about a duplicate column. After changing the name to language_key, my test finally passes, at the expense of a redundant column, as with Hibernate.

OpenJPA


With OpenJPA 2.0.0, the message is

org.apache.openjpa.persistence.ArgumentException: Map field "MultilingualString.map" is attempting to use a map table, 
but its key is mapped by another field.  Use an inverse key or join table mapping.

which I can't make sense of. Switching to @MapKeyColumn(name = "language"), the new message is

org.apache.openjpa.persistence.ArgumentException: 
"LocalizedString.text" declares a column that is not compatible with the expected type "varchar".  

It seems OpenJPA is confused by the column name text, which sounds like a column data type. After adding @Column(name = "_text") to LocalizedString.text, my test case works and my database table only has three columns.

DataNucleus


DataNucleus 2.1.1 complains

javax.persistence.PersistenceException: Persistent class "LocalizedString" has no table in the database, 
but the operation requires it. Please check the specification of the MetaData for this class.

I'm getting the same message with all three variants of the annotation, so it appears that DataNucleus simply cannot handle embeddable map values and expects them to be entities.

Conclusion


Mapping maps with JPA is much harder than you would think, both for the user and for the implementor. Hibernate, Eclipselink and OpenJPA have all passed the JPA TCK. DataNucleus would have liked to do so, but they have not yet been granted access to the TCK.

All four implementors failed this simple map example to various degrees, which implies that there are features in the JPA 2.0 specification which are not sufficiently covered by the TCK.

An Open Source TCK for JPA would help in detecting and eliminating such gaps instead of leaving that to the initiative of individuals.

18 comments:

Harald Wellmann said...

Sorry, I accidentally deleted a comment asking for the Hibernate issue number, and there is no undo button :-(

Anyway, here is the bug report for Hibernate:
http://opensource.atlassian.com/projects/hibernate/browse/HHH-5393

Mike Keith said...

Hi Harald,

JPA Providers are far from perfect, and occasionally you will hit a bug in one or another of them, but the fact that this doesn't work in *all* of them should raise a flag that this is a problem with the mappings and not the providers ;-).

You aren't really to blame, of course, since Maps can get quite complex, hence the mappings can also be complex. (This is why people like me write books about it :-)

Your point that @MapKey is only allowed when the value is an entity was correct. We didn't actually make any provisions in the spec for embeddable properties to be used as the key because in theory embeddables can in general only be uniquely identified by the combination of all of their properties, not any one of them. However, I freely admit that is a fairly feeble defense since @MapKey may reference any property in an entity and there is no guarantee that property is unique either. It is up to the developer to ensure that the property being used as the key is unique within the context of the Map that contains the object, and I can't think of a good reason why we could not offer the same functionality for embeddables with the same proviso attached.

In any case, your last guess was the most reasonable, except that it was incomplete. The way that providers should allow you to do what you want to do and that gets you the sharing is this:

@ElementCollection(fetch=FetchType.EAGER)
@MapKeyColumn(name="language",insertable=false,updatable=false)
@CollectionTable(schema="jpa",name ="multilingual_string_map",
joinColumns=@JoinColumn(name="string_id"))
private Map<String, LocalizedString> map = new HashMap<String, LocalizedString>();

This way you are sharing the same column, but only one of the mappings is writable. (It is not as good as using @MapKey since you have to duplicate the mapping, but it should work, at least.)

Regards,
-Mike Keith
(Pro JPA 2)

Harald Wellmann said...

Hi Mike,

yep, nothing's ever perfect, not even Java software, or else we'd be unemployed...

Anyway, thanks a lot for pointing out the original intentions of @MapKey - I think it's safe to take this as authoritative from a JSR 317 member :-)

So the expert group did not consider embeddables in this context, but as you said yourself, there is no compelling reason not to admit them - in that case, I think I'll file an official enhancement request.

But since I had already considered the @MapKeyColumn option in my post, the corrected outcome of my experiment is not much different:

It would have been 0:4 with my incorrect but not-so-stupid interpretation, and now with the @MapKeyColumn variant including your additions "insertable = false, updatable = false" (which I've tried, and they don't make any difference regarding the exceptions), the score is 1:3.

OpenJPA is the only one to pass the test, Hibernate and Eclipselink choke on the duplicate column mapping and DataNucleus doesn't like the embeddable map value at all.

So even if my enhancement request should not get accepted, maybe the JPA spec should clarify the following:

- @MapKey is not to be used for embeddable values, even though it may seem tempting to do so.

- A "duplicate mapping" of a value property in the @MapKeyColumn is absolutely legitimate.

Thanks and best regards,
Harald

Mike Keith said...

Harald,

I am a little surprised that so many of the providers did not support the duplicate mapping on the @MapKeyColumn. Duplicate mapping is a reasonably common approach in many of the providers. I suspect that since it generally happens using @Column it was not accounted for in the @MapKeyColumn. You should file bugs with the providers that barf (if you have the time and the inclination :-).

Please do file an official enhancement request for the spec. Until now I did not think there was any reason to allow @MapKey for an embeddable value but your use case looks like a valid one to me.

I guess I should be at least partly glad the spec is not perfect or maybe I would be out of a job as well ;-)

-Mike

Guy said...

Harald,

I just tried your model using @MapKey(name = "language") on a nightly EclipseLink 2.2 build (20100723) and everything passed for me.

You can download nightly builds here:

http://www.eclipse.org/eclipselink/downloads/nightly.php

Having said that, I did run into issues when using @MapKeyColumn(name = "language", insertable=false, updatable=false)

I'm going to look into that a little further and will post my findings.

Cheers,
Guy Pelletier

Guy said...

Hi Harald,

I just tried your model again on the Eclipselink nightly 2.2 build and both configurations are working for me. That is,

@MapKey(name = "language") and
@MapKeyColumn(name = "language", insertable=false, updatable=false)

My original issue with map key column was caused by an error in EclipseLink's DDL generation. Creating the tables myself, my basic CRUD tests pass.

Any way you can try your tests on a nightly EclipseLink 2.2 build?

Cheers,
Guy

Guy said...

Looks like this problem will be fixed in the Eclipselink 2.1.1 release. FYI: https://bugs.eclipse.org/bugs/show_bug.cgi?id=298322

Harald Wellmann said...

Hi Guy,

thanks for your feedback and your efforts! This sounds good - if there's a Maven repository with the Eclipselink nightly builds, I could rerun my tests changing just one line...

It seems that Eclipselink has a couple of issues with DDL generation, which is one of a number of reasons why we use Hibernate in our project, as we rely on generating our table schema from our entity model.

Actually, the pros and cons of individual persistence providers would be a topic for a separate post...

For now, all I can say is I did a trial project with Eclipselink 2.0.0 back in March and I was rather disappointed with it.

Not that I'm too enthusiastic about Hibernate, but it just seems to be a better match for our requirements, despite its long list of problems and omissions.

Cheers,
Harald

Guy said...

Hi Harald,

The following link provides info on the EclipseLink Maven repository.

http://wiki.eclipse.org/EclipseLink/Maven

Cheers,
Guy

Harald Wellmann said...

Just tried Eclipselink 2.1.1-SNAPSHOT -
the DDL problem is still there, so my test fails as it's based on table auto-generation.

What's worse, I still can't query the map, see http://hwellmann.blogspot.com/2010/07/jpa-20-querying-map.html.

And I stumbled into a new problem this morning: CriteriaQueries with maps. They're also broken in Hibernate but work fine in OpenJPA.

All in all, my impression is that both Hibernate and Eclipselink (and also the top-secret JPA TCK) did a rather sloppy job with persistent maps.

Currently only OpenJPA has a usable map implementation.

andy said...

DataNucleus (SVN) of 28/09/2010 works perfectly fine for schema generation on such a case (embedded persistable value in a join table, with the key being a field of the value)

andy said...

PS, your geotools link is broken - missing an "o"

Harald Wellmann said...

Thanks - I've fixed the link.

Henno Vermeulen said...

Interesting how much time it can take to implement something as a multi language String...

I myself am struggling with OpenJPA and the same use case. I managed to get it working correctly by creating a Map where LocalizedString is a full-blown Entity. (Actually I use Locale as a map key which automatically gets mapped to a String).

As an alternative to both approaches I tried a simple @ElementCollection Map.
This works but I run into an exception when putting something into the map after detaching the entity (https://issues.apache.org/jira/browse/OPENJPA-1919).

With all approaches that use a Map there is a very serious performance problem with OpenJPA: it performs N+1 selects when selecting multiple entities (https://issues.apache.org/jira/browse/OPENJPA-1920).

I guess we don't live in a perfect world yet :).

To fix this I think I'll just map it as a @ElementCollection List and perform entry lookup in this list myself.

I am now using it in multiple different entities like:
@Entity
public class Product {
@OneToOne(optional = false, cascade = ALL, fetch = EAGER, orphanRemoval = true)
// multiLanguageNaam
private MultilingualString name = new MultilingualString();
}

A MultilingualString should really be a value object without its own identity. Therefore it would be nicer if it was an @Embeddable, but I fear the great adventure that awaits when using an @ElementCollection in an @Embeddable and embedding this @Embeddable in multiple entities... (e.g. will there be one LocalizedString table or one per entity???).

Finally, from a database perspective the entire MultilingualString table doesn't really have to exist because its sole column is an id which could be directly contained in a foreign key column where we use it, e.g. Product could have a column MultilingualString_ID that directly maps to a column in the LocalizedString.

Again funny how something that seems easy can be hard to implement :S.

CodeMonkey said...

I've been looking into this general problem lately (l10n/i18n of domain objects), and I was considering using a separate persistence unit for each localization. Initially, this might seem like a lot of extra overhead, but it could simplify your persistent entities a bit (i.e., no maps required -- the instance is specific to a locale already, you could stick with the more consistently-implemented portions of the JPA spec). And if you're working with a distributed app, you could dedicate nodes to serving specific locales.
You would do something like
getEntityManager(Locale l).find(...), etc.
I'm still working out the details, but I was wondering if anyone else has gone down this path?

@Harald, did you consider this as an option?

Harald Wellmann said...

@MattD: Interesting idea, but depending on your environment, it seems to raise a whole lot of new questions.

I think you may get away with getEntityManager(locale) in a Java SE environment, but if you try to do the same in a managed container (Java EE or Spring), things will start getting difficult.

The container only lets you define a static 1:1 mapping between persistence units and data sources, so how would you do the switch on the locale...?

Colm said...

For anyone actually looking for a solution to the problem posed in this article (internationalisation of strings), I have put a working approach up on SO. I struggled to get this article's approach working at all (even though we are now a couple of versions on from when the article was originally written). The approach I took was to have the ElementCollection inside an Entity, with a ManyToOne relationship in the source Entity to the Localised Entity. A similar approach to Henno I think, I am linking it here as I actually have working code on SO.

http://stackoverflow.com/questions/13426273/jpa-database-structure-for-internationalisation

Anonymous said...

Thanks very much! This example helped.