
Hi Antoine,

If you are going to work with identifiers in a linked (open) data situation, then you need to be certain that the URIs don't change.
Although human-readable URIs sound like a good idea because we can read them, in my experience they are also an invitation to debate and change: Johannes Sebastian Bach may become Johann Sebastiaan Bach, and so on. Words change, names change, and people have a tendency to change human-readable things.

We (at the Rijksmuseum) created persistent URIs without any human-readable reference for technical purposes (e.g. machines reading them): since they are just a set of numbers, no one feels the urge to debate them. And we have human-readable URLs (server-optimized to a certain level, though this could be improved): we can change the website, the names of objects, etc., but the technical references stay in place.
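
To illustrate the idea, here is a minimal sketch in Python. It is not our actual implementation: the example.org addresses, the `slugs` table and the function names are all made up for the illustration. It only shows that the opaque URI stays the same while the readable address behind it is free to change.

```python
# Hypothetical sketch: an opaque persistent URI that resolves to a
# human-readable URL which may change over time.
# Both base addresses are assumptions, not real Rijksmuseum addresses.
PERSISTENT_BASE = "http://example.org/id/"
WEBSITE_BASE = "http://example.org/collection/"

# Current human-readable slug for each opaque identifier.
# Editors may change a slug at any time; the key never changes.
slugs = {"12345": "johannes-sebastian-bach"}


def persistent_uri(identifier: str) -> str:
    """Build the stable, number-only URI that machines should store."""
    return PERSISTENT_BASE + identifier


def resolve(uri: str) -> str:
    """Return the current human-readable URL for a persistent URI."""
    if not uri.startswith(PERSISTENT_BASE):
        raise ValueError(f"not a persistent URI: {uri}")
    identifier = uri[len(PERSISTENT_BASE):]
    return WEBSITE_BASE + slugs[identifier]


uri = persistent_uri("12345")
before = resolve(uri)

# The name is debated and changed; only the lookup table is touched.
slugs["12345"] = "johann-sebastiaan-bach"
after = resolve(uri)
```

Anything that stored `uri` keeps working after the rename; only the page address it resolves to is different.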

This is why I voted for option 1 in your poll.

Happy to hear others' opinions and the outcome of your poll!

Best wishes,
Lizzy

-----Original message-----
From: Discussion list for Europeana Technical Developments [mailto:[log in to unmask]] On behalf of Antoine Isaac
Sent: Tuesday 24 March 2015 15:45
To: [log in to unmask]
Subject: Your advice on minting URIs for contextual entities

Dear all,

We're about to mint identifiers (URIs) for contextual entities to be used in Europeana. This will concern concepts, agents or places to be used for enrichment [1] and a couple of other things. The data will be adapted from external or providers' datasets, and will eventually have to be available as linked data on data.europeana.eu.

After internal discussions, we have to choose between two options:

1. A bare numerical identifier, as in
http://data.europeana.eu/agent/12345

2. A number combined with a human-readable label, as in http://data.europeana.eu/agent/12345_johannes_sebastian_bach

In any case URIs would lead to machine-readable data for software clients, while humans would be directed to pages like [2]. But human-readable labels in identifiers would help to identify and discuss the resources more easily. So option 2 is very tempting.
However, option 2 is slightly harder to implement. Also, we would have to choose one field in the data, and one language (as we do for other communication, including this mail). Both field and language could change from one source to the other, when we merge different datasets.
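
As a rough illustration of the two URI shapes, here is a minimal Python sketch. The function names and the way the label is turned into a slug are assumptions made for this example, not a decided design; reading only the leading number back out of the URI is likewise just one possible way to keep option 2 working if a label is ever reworded.

```python
import re

BASE = "http://data.europeana.eu/agent/"


def mint_bare(number: int) -> str:
    """Option 1: a bare numerical identifier."""
    return f"{BASE}{number}"


def mint_labelled(number: int, label: str) -> str:
    """Option 2: the number followed by a label taken from one chosen field.

    Assumed slug rule: lowercase, with every run of characters outside
    a-z and 0-9 replaced by a single underscore (so accented letters are
    lost, which is part of the field/language problem).
    """
    slug = re.sub(r"[^a-z0-9]+", "_", label.lower()).strip("_")
    return f"{BASE}{number}_{slug}"


def numeric_id(uri: str) -> int:
    """Recover the number from either URI shape, ignoring any label."""
    match = re.match(re.escape(BASE) + r"(\d+)(?:_.*)?$", uri)
    if match is None:
        raise ValueError(f"not an agent URI: {uri}")
    return int(match.group(1))


option_1 = mint_bare(12345)
option_2 = mint_labelled(12345, "Johannes Sebastian Bach")
respelled = mint_labelled(12345, "Johann Sebastiaan Bach")
```

Here `option_2` and `respelled` are different strings for the same entity, while `numeric_id` returns 12345 for all three URIs.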


We're curious to hear whether you have a preference! We have created a small poll:
http://doodle.com/sdpftvqq6e3shw4v


Note that it is not a majority vote. We may end up not having the resources to implement the more complex option. Also, one could have a killer argument for one option that defeats all other considerations :-). You can leave comments at the bottom of the poll page.

Thanks a lot for the advice!

Antoine

[1] https://docs.google.com/document/d/1JvjrWMTpMIH7WnuieNqcT0zpJAXUPo6x4uMBj1pEx0Y/
[2] http://invis.io/RU2G1HUBG