List: nepomuk
Subject: Re: [Nepomuk] Duplicates merging
From: Sebastian_Trüg <trueg () kde ! org>
Date: 2011-10-31 11:12:35
Message-ID: 4EAE82A3.1090902 () kde ! org
Hi Christian,
let's meet up this week to discuss the problem and hopefully fix it. So
far I have stayed clear of the storeResources code, but with Vishesh not
having much time I will dive into it.
Cheers,
Sebastian
On 10/31/2011 12:42 PM, Christian Mollekopf wrote:
> Hey,
>
> This issue is getting pressing; a solution is needed for 4.8.
> The feeders are currently broken because of it.
>
> The code in storeResources is beyond me, and my attempts to fix it have
> failed so far. So if no one fixes it there, I'll have to work around the
> issue in the feeder code.
>
> I don't mean to push anyone, I'd just like to know if somebody from the
> nepomuk team (yes vishesh I'm looking at you ;-) is going to fix this, or if
> I'm on my own. As said, I do understand if you currently lack the time to make
> this happen, just tell me.
>
> Thanks,
> Christian
>
> PS: I added the pastes inline before they are deleted from pastie
>
> On Saturday, October 08, 2011 03:12:51 PM Christian Mollekopf wrote:
> > Hi Vishesh,
> >
> > The duplicates merging code doesn't cut it for the feeders yet.
> > As far as I could track it down, the problem is that I have hierarchies of
> > resources which need to be merged together.
> > E.g. I add a contact with its email address several times to the graph. The
> > email addresses are now correctly merged, but because the contacts had
> > different email URIs in the first hashing run (before the email addresses
> > were merged), the contacts remain duplicated.
> >
> > Here is the test which currently fails:
> > http://paste.kde.org/131371/
>
> void DataManagementModelTest::testStoreResources_duplicates2()
> {
>     SimpleResource contact1;
>     contact1.addType( NCO::Contact() );
>     contact1.addProperty( NCO::fullname(), QLatin1String("Spiderman") );
>     contact1.addProperty( NAO::prefLabel(), QLatin1String("test") );
>
>     SimpleResource email1;
>     email1.addType(NCO::EmailAddress());
>     email1.addProperty(NCO::emailAddress(), QLatin1String("email@foo.com"));
>     contact1.addProperty(NCO::hasEmailAddress(), email1.uri());
>
>     SimpleResource contact2;
>     contact2.addType( NCO::Contact() );
>     contact2.addProperty( NCO::fullname(), QLatin1String("Spiderman") );
>     contact2.addProperty( NAO::prefLabel(), QLatin1String("test") );
>
>     SimpleResource email2;
>     email2.addType(NCO::EmailAddress());
>     email2.addProperty(NCO::emailAddress(), QLatin1String("email@foo.com"));
>     contact2.addProperty(NCO::hasEmailAddress(), email2.uri());
>
>     SimpleResourceGraph graph;
>     graph << email1 << contact1 << email2 << contact2;
>
>     m_dmModel->storeResources( graph, "appA" );
>     QVERIFY(!m_dmModel->lastError());
>
>     int contactCount = m_model->listStatements( Node(), RDF::type(), NCO::Contact() ).allStatements().size();
>     QCOMPARE( contactCount, 1 );
>
>     int emailCount = m_model->listStatements( Node(), RDF::type(), NCO::EmailAddress() ).allStatements().size();
>     QCOMPARE( emailCount, 1 );
>
>     QCOMPARE( m_model->listStatements( Node(), NCO::fullname(), Node() ).allStatements().size(), 1 );
>     QCOMPARE( m_model->listStatements( Node(), NAO::prefLabel(), Node() ).allStatements().size(), 1 );
>
>     QVERIFY(!haveTrailingGraphs());
> }
>
> add to qtest_dms.cpp:
>
> model.addStatement( NCO::emailAddress(), RDF::type(), RDF::Property(), graph );
> model.addStatement( NCO::emailAddress(), RDFS::range(), XMLSchema::string(), graph );
> model.addStatement( NCO::emailAddress(), RDFS::domain(), NCO::EmailAddress(), graph );
>
> model.addStatement( NCO::hasEmailAddress(), RDF::type(), RDF::Property(), graph );
> model.addStatement( NCO::hasEmailAddress(), RDFS::range(), NCO::EmailAddress(), graph );
> model.addStatement( NCO::hasEmailAddress(), RDFS::domain(), NCO::Contact(), graph );
>
> model.addStatement( NCO::EmailAddress(), RDF::type(), RDFS::Resource(), graph );
> model.addStatement( NCO::EmailAddress(), RDF::type(), RDFS::Class(), graph );
> model.addStatement( NCO::EmailAddress(), RDFS::subClassOf(), NCO::ContactMedium(), graph );
>
> >
> > And here's an excerpt of the debugging output which shows the problem in the
> > actual feeders:
> > http://paste.kde.org/131377/
> >
>
> nepomukstorage(21806)/nepomuk (storage service)
> Nepomuk::DataManagementModel::storeResources:
> "_:zre""<http://www.semanticdesktop.org/ontologies/2007/08/15/nao#prefLabel>"""Sebastian Trueg""
> nepomukstorage(21806)/nepomuk (storage service)
> Nepomuk::DataManagementModel::storeResources:
> "_:zre""<http://www.w3.org/1999/02/22-rdf-syntax-ns#type>""<http://www.semanticdesktop.org/ontologies/2007/03/22/nco#PersonContact>"
> nepomukstorage(21806)/nepomuk (storage service)
> Nepomuk::DataManagementModel::storeResources:
> "_:zre""<http://www.semanticdesktop.org/ontologies/2007/03/22/nco#fullname>"""Sebastian Trueg"^^<http://www.w3.org/2001/XMLSchema#string>"
> nepomukstorage(21806)/nepomuk (storage service)
> Nepomuk::DataManagementModel::storeResources:
> "_:zre""<http://www.semanticdesktop.org/ontologies/2007/03/22/nco#hasEmailAddress>""_:gqe"
>
> nepomukstorage(21806)/nepomuk (storage service)
> Nepomuk::DataManagementModel::storeResources:
> "_:gqe""<http://www.w3.org/1999/02/22-rdf-syntax-ns#type>""<http://www.semanticdesktop.org/ontologies/2007/03/22/nco#EmailAddress>"
> nepomukstorage(21806)/nepomuk (storage service)
> Nepomuk::DataManagementModel::storeResources:
> "_:gqe""<http://www.semanticdesktop.org/ontologies/2007/03/22/nco#emailAddress>"""sebastian@trueg.de"^^<http://www.w3.org/2001/XMLSchema#string>"
>
> nepomukstorage(21806)/nepomuk (storage service)
> Nepomuk::DataManagementModel::storeResources:
> "_:fqe""<http://www.semanticdesktop.org/ontologies/2007/08/15/nao#prefLabel>"""Sebastian Trueg""
> nepomukstorage(21806)/nepomuk (storage service)
> Nepomuk::DataManagementModel::storeResources:
> "_:fqe""<http://www.w3.org/1999/02/22-rdf-syntax-ns#type>""<http://www.semanticdesktop.org/ontologies/2007/03/22/nco#PersonContact>"
> nepomukstorage(21806)/nepomuk (storage service)
> Nepomuk::DataManagementModel::storeResources:
> "_:fqe""<http://www.semanticdesktop.org/ontologies/2007/03/22/nco#fullname>"""Sebastian Trueg"^^<http://www.w3.org/2001/XMLSchema#string>"
> nepomukstorage(21806)/nepomuk (storage service)
> Nepomuk::DataManagementModel::storeResources:
> "_:fqe""<http://www.semanticdesktop.org/ontologies/2007/03/22/nco#hasEmailAddress>""_:gqe"
>
> This is the error returned after the storeResources call:
> nepomukstorage(21806)/nepomuk (storage service)
> Nepomuk::DataManagementModel::storeResources: Setting error! "Invalid argument
> (1)": "http://www.semanticdesktop.org/ontologies/2007/03/22/nco#fullname has a
> max cardinality of 1. Provided 2 values - "Sebastian
> Trueg"^^<http://www.w3.org/2001/XMLSchema#string>, "Sebastian
> Trueg"^^<http://www.w3.org/2001/XMLSchema#string>. Existing - Affected
> Resource: nepomuk:/res/75164167-3ae0-413f-a991-ed73a08ca9ec, new card: 2, old
> card: 0"
> "/opt/devel/KDE/bin/nepomukservicestub(21806)" Soprano: "Invalid argument
> (1)": "http://www.semanticdesktop.org/ontologies/2007/03/22/nco#fullname has a
> max cardinality of 1. Provided 2 values - "Sebastian
> Trueg"^^<http://www.w3.org/2001/XMLSchema#string>, "Sebastian
> Trueg"^^<http://www.w3.org/2001/XMLSchema#string>. Existing - Affected
> Resource: nepomuk:/res/75164167-3ae0-413f-a991-ed73a08ca9ec, new card: 2, old
> card: 0"
>
> > As I understand your code, you generate a hash of each resource to check
> > whether two are exactly the same. That probably works for most use cases,
> > but I'm not sure it is the best solution.
> > Given the problem above, you'd have to rerun the hashing for every resource
> > that was modified because a resource it references got merged, which
> > already complicates matters.
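
The rerun described above can be pictured as a loop that keeps merging until a pass changes nothing. The following is only a minimal sketch under stated assumptions, not the storeResources implementation: `Props`, `Res` and `mergeDuplicates` are hypothetical names, resources are compared by their full property set instead of a hash, and literal values are assumed never to coincide with a resource URI.

```cpp
#include <cassert>
#include <map>
#include <set>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-in for a resource's statements: (property, value) pairs.
// A value is either a literal or the URI of another resource in the graph.
using Props = std::set<std::pair<std::string, std::string>>;

// Hypothetical stand-in for SimpleResource.
struct Res {
    std::string uri;
    Props props;
};

// Merge resources whose property sets are identical, rewrite references to
// the merged-away resources, and repeat until a pass merges nothing.
std::vector<Res> mergeDuplicates(std::vector<Res> graph) {
    bool changed = true;
    while (changed) {
        changed = false;
        std::map<Props, std::string> canonical;       // props -> surviving URI
        std::map<std::string, std::string> redirect;  // dropped URI -> surviving URI
        std::vector<Res> kept;
        for (const Res& r : graph) {
            auto it = canonical.find(r.props);
            if (it == canonical.end()) {
                canonical.emplace(r.props, r.uri);
                kept.push_back(r);
            } else {
                redirect[r.uri] = it->second;
                changed = true;
            }
        }
        // Point every reference to a dropped resource at its survivor, so
        // that parents which now look identical are caught in the next pass.
        for (Res& r : kept) {
            Props rewritten;
            for (const auto& pv : r.props) {
                auto target = redirect.find(pv.second);
                rewritten.emplace(pv.first,
                                  target == redirect.end() ? pv.second : target->second);
            }
            r.props = std::move(rewritten);
        }
        graph = std::move(kept);
    }
    return graph;
}
```

For the contact/email case from the failing test, the first pass merges only the two email addresses, and the second pass merges the contacts because both now reference the same email URI. The cost is one extra pass per level of the hierarchy.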
> >
> > I thought maybe it would be possible to leave the merging up to the normal
> > resource merger. This would have the effect that not only exactly equal
> > resources would be merged, but every resource that the resource merger
> > would normally merge.
> > If you think of the SimpleResourceGraph as a tree, a post-order traversal of
> > the tree would allow you to store each resource one by one, starting from
> > the leaves of each branch and working towards the root. The ResourceMerger
> > would then automatically merge all resources as necessary.
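
The post-order idea can be sketched as follows. This is an illustration under assumptions, not Nepomuk code: `Store` is a hypothetical in-memory stand-in for the resource merger that treats identical property sets as the same resource, `storePostOrder` is a hypothetical helper, and the graph is assumed to be acyclic (a cycle would make the recursion loop forever).

```cpp
#include <cassert>
#include <map>
#include <set>
#include <string>
#include <utility>

using Props = std::set<std::pair<std::string, std::string>>;

// Hypothetical stand-in for the resource merger: storing a property set that
// is already known returns the existing URI instead of creating a new one.
struct Store {
    std::map<Props, std::string> byProps;

    std::string put(const Props& props) {
        auto it = byProps.find(props);
        if (it != byProps.end()) return it->second;
        std::string uri = "res:" + std::to_string(byProps.size() + 1);
        byProps.emplace(props, uri);
        return uri;
    }
};

// Store everything `id` references first, then `id` itself (post-order).
// `resolved` remembers the stored URI of each graph node already handled.
std::string storePostOrder(const std::string& id,
                           const std::map<std::string, Props>& graph,
                           Store& store,
                           std::map<std::string, std::string>& resolved) {
    auto done = resolved.find(id);
    if (done != resolved.end()) return done->second;

    Props rewritten;
    for (const auto& pv : graph.at(id)) {
        // A value naming another node of the graph is a reference: store the
        // child first and use its final URI; anything else is a literal.
        bool isRef = graph.count(pv.second) > 0;
        rewritten.emplace(pv.first,
                          isRef ? storePostOrder(pv.second, graph, store, resolved)
                                : pv.second);
    }
    std::string uri = store.put(rewritten);
    resolved[id] = uri;
    return uri;
}
```

Because each contact is stored only after its email address has been resolved to a final URI, the two contacts arrive at the store with identical statements and are merged in a single pass, without any rehashing.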
> >
> > Do you think that would be a viable option?
> >
> > Cheers,
> > Christian
> >
> > _______________________________________________
> > Nepomuk mailing list
> > Nepomuk@kde.org
> > https://mail.kde.org/mailman/listinfo/nepomuk
>