List:       nepomuk
Subject:    Re: [Nepomuk] Duplicates merging
From:       Sebastian Trüg <trueg () kde ! org>
Date:       2011-10-31 11:12:35
Message-ID: 4EAE82A3.1090902 () kde ! org

Hi Christian,

let's meet up this week to discuss the problem and hopefully fix it. So
far I have stayed clear of the storeResources code, but with Vishesh not
having much time I will dive into it.

Cheers,
Sebastian

On 10/31/2011 12:42 PM, Christian Mollekopf wrote:
> Hey,
> 
> This issue is becoming pressing; a solution is needed for 4.8.
> Currently the feeders are broken because of it.
> 
> The storeResources code is beyond me, and my attempts to fix it have failed
> so far. So if no one fixes it there, I'll have to work around the issue in
> the feeder code.
> 
> I don't mean to push anyone; I'd just like to know whether somebody from the
> Nepomuk team (yes, Vishesh, I'm looking at you ;-) is going to fix this, or
> whether I'm on my own. As I said, I do understand if you currently lack the
> time to make this happen; just tell me.
> 
> Thanks,
> Christian
> 
> PS: I've included the pastes inline before they get deleted from the pastebin.
> 
> On Saturday, October 08, 2011 03:12:51 PM Christian Mollekopf wrote:
> > Hi Vishesh,
> > 
> > The duplicate-merging code doesn't cut it for the feeders yet.
> > As far as I could track it down, the problem is that I have hierarchies of
> > resources which need to be merged together.
> > E.g. I add a contact with its email address several times to the graph. The
> > email addresses are now correctly merged, but because the contacts had
> > different email URIs in the first hashing run (before the emails were
> > merged), the contacts remain duplicated.
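A toy illustration of the failure described above, under simplifying assumptions: this is not Nepomuk's SimpleResource or its hashing code. Each resource is modeled as a sorted property-to-value map, a reference to another resource is stored as its blank-node URI (such as "_:email1"), and the hypothetical contentKey() stands in for the per-resource hash (a plain string keeps the comparison exact). Because a contact's key includes its email's URI, two contacts pointing at identical but not yet merged emails get different keys.

```cpp
#include <cassert>
#include <map>
#include <string>

// Toy model (hypothetical, not Nepomuk's SimpleResource): property -> value.
// A value such as "_:email1" is a reference to another resource's blank-node URI.
using Properties = std::map<std::string, std::string>;

// Hypothetical stand-in for the per-resource hash: a canonical string of all
// property/value pairs. std::map iterates in sorted key order, so resources
// with equal content always produce equal keys.
std::string contentKey(const Properties& props) {
    std::string key;
    for (const auto& [property, value] : props) {
        key += property + '=' + value + ';';
    }
    return key;
}

// The email address from the failing test; every copy has identical content.
Properties makeEmail() {
    return {{"rdf:type", "nco:EmailAddress"}, {"nco:emailAddress", "email@foo.com"}};
}

// The contact from the failing test, referencing the given email URI.
Properties makeContact(const std::string& emailUri) {
    return {{"rdf:type", "nco:Contact"},
            {"nco:fullname", "Spiderman"},
            {"nao:prefLabel", "test"},
            {"nco:hasEmailAddress", emailUri}};
}
```

Both copies of the email yield the same key, but makeContact("_:email1") and makeContact("_:email2") do not; the contact keys only match once the second reference has been rewritten to point at the surviving email.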
> > 
> > Here is the test which currently fails:
> > http://paste.kde.org/131371/
> 
> void DataManagementModelTest::testStoreResources_duplicates2()
> {
>     // Two identical contacts, each pointing to its own copy of the same email address
>     SimpleResource contact1;
>     contact1.addType( NCO::Contact() );
>     contact1.addProperty( NCO::fullname(), QLatin1String("Spiderman") );
>     contact1.addProperty( NAO::prefLabel(), QLatin1String("test") );
> 
>     SimpleResource email1;
>     email1.addType(NCO::EmailAddress());
>     email1.addProperty(NCO::emailAddress(), QLatin1String("email@foo.com"));
>     contact1.addProperty(NCO::hasEmailAddress(), email1.uri());
> 
>     SimpleResource contact2;
>     contact2.addType( NCO::Contact() );
>     contact2.addProperty( NCO::fullname(), QLatin1String("Spiderman") );
>     contact2.addProperty( NAO::prefLabel(), QLatin1String("test") );
> 
>     SimpleResource email2;
>     email2.addType(NCO::EmailAddress());
>     email2.addProperty(NCO::emailAddress(), QLatin1String("email@foo.com"));
>     contact2.addProperty(NCO::hasEmailAddress(), email2.uri());
> 
>     SimpleResourceGraph graph;
>     graph << email1 << contact1 << email2 << contact2;
> 
>     m_dmModel->storeResources( graph, "appA" );
>     QVERIFY(!m_dmModel->lastError());
> 
>     // Expected: the emails are merged, which makes the contacts identical,
>     // so exactly one contact and one email address remain
>     int contactCount = m_model->listStatements( Node(), RDF::type(), NCO::Contact() ).allStatements().size();
>     QCOMPARE( contactCount, 1 );
> 
>     int emailCount = m_model->listStatements( Node(), RDF::type(), NCO::EmailAddress() ).allStatements().size();
>     QCOMPARE( emailCount, 1 );
> 
>     // No duplicated property values may be left behind either
>     QCOMPARE( m_model->listStatements( Node(), NCO::fullname(), Node() ).allStatements().size(), 1 );
>     QCOMPARE( m_model->listStatements( Node(), NAO::prefLabel(), Node() ).allStatements().size(), 1 );
> 
>     QVERIFY(!haveTrailingGraphs());
> }
> 
> Add to qtest_dms.cpp:
> 
> // nco:emailAddress: a string-valued property of nco:EmailAddress
> model.addStatement( NCO::emailAddress(), RDF::type(), RDF::Property(), graph );
> model.addStatement( NCO::emailAddress(), RDFS::range(), XMLSchema::string(), graph );
> model.addStatement( NCO::emailAddress(), RDFS::domain(), NCO::EmailAddress(), graph );
> 
> // nco:hasEmailAddress: links an nco:Contact to an nco:EmailAddress
> model.addStatement( NCO::hasEmailAddress(), RDF::type(), RDF::Property(), graph );
> model.addStatement( NCO::hasEmailAddress(), RDFS::range(), NCO::EmailAddress(), graph );
> model.addStatement( NCO::hasEmailAddress(), RDFS::domain(), NCO::Contact(), graph );
> 
> // nco:EmailAddress: a class, subclass of nco:ContactMedium
> model.addStatement( NCO::EmailAddress(), RDF::type(), RDFS::Resource(), graph );
> model.addStatement( NCO::EmailAddress(), RDF::type(), RDFS::Class(), graph );
> model.addStatement( NCO::EmailAddress(), RDFS::subClassOf(), NCO::ContactMedium(), graph );
> 
> > 
> > And here's an excerpt of the debugging output which shows the problem in the
> > actual feeders:
> > http://paste.kde.org/131377/
> > 
> 
> nepomukstorage(21806)/nepomuk (storage service) Nepomuk::DataManagementModel::storeResources: "_:zre""<http://www.semanticdesktop.org/ontologies/2007/08/15/nao#prefLabel>"""Sebastian Trueg""
> nepomukstorage(21806)/nepomuk (storage service) Nepomuk::DataManagementModel::storeResources: "_:zre""<http://www.w3.org/1999/02/22-rdf-syntax-ns#type>""<http://www.semanticdesktop.org/ontologies/2007/03/22/nco#PersonContact>"
> nepomukstorage(21806)/nepomuk (storage service) Nepomuk::DataManagementModel::storeResources: "_:zre""<http://www.semanticdesktop.org/ontologies/2007/03/22/nco#fullname>"""Sebastian Trueg"^^<http://www.w3.org/2001/XMLSchema#string>"
> nepomukstorage(21806)/nepomuk (storage service) Nepomuk::DataManagementModel::storeResources: "_:zre""<http://www.semanticdesktop.org/ontologies/2007/03/22/nco#hasEmailAddress>""_:gqe"
> nepomukstorage(21806)/nepomuk (storage service) Nepomuk::DataManagementModel::storeResources: "_:gqe""<http://www.w3.org/1999/02/22-rdf-syntax-ns#type>""<http://www.semanticdesktop.org/ontologies/2007/03/22/nco#EmailAddress>"
> nepomukstorage(21806)/nepomuk (storage service) Nepomuk::DataManagementModel::storeResources: "_:gqe""<http://www.semanticdesktop.org/ontologies/2007/03/22/nco#emailAddress>"""sebastian@trueg.de"^^<http://www.w3.org/2001/XMLSchema#string>"
> nepomukstorage(21806)/nepomuk (storage service) Nepomuk::DataManagementModel::storeResources: "_:fqe""<http://www.semanticdesktop.org/ontologies/2007/08/15/nao#prefLabel>"""Sebastian Trueg""
> nepomukstorage(21806)/nepomuk (storage service) Nepomuk::DataManagementModel::storeResources: "_:fqe""<http://www.w3.org/1999/02/22-rdf-syntax-ns#type>""<http://www.semanticdesktop.org/ontologies/2007/03/22/nco#PersonContact>"
> nepomukstorage(21806)/nepomuk (storage service) Nepomuk::DataManagementModel::storeResources: "_:fqe""<http://www.semanticdesktop.org/ontologies/2007/03/22/nco#fullname>"""Sebastian Trueg"^^<http://www.w3.org/2001/XMLSchema#string>"
> nepomukstorage(21806)/nepomuk (storage service) Nepomuk::DataManagementModel::storeResources: "_:fqe""<http://www.semanticdesktop.org/ontologies/2007/03/22/nco#hasEmailAddress>""_:gqe"
> 
> This is the error returned after the storeResources call:
> nepomukstorage(21806)/nepomuk (storage service) Nepomuk::DataManagementModel::storeResources: Setting error! "Invalid argument (1)": "http://www.semanticdesktop.org/ontologies/2007/03/22/nco#fullname has a max cardinality of 1. Provided 2 values - "Sebastian Trueg"^^<http://www.w3.org/2001/XMLSchema#string>, "Sebastian Trueg"^^<http://www.w3.org/2001/XMLSchema#string>. Existing -  Affected Resource: nepomuk:/res/75164167-3ae0-413f-a991-ed73a08ca9ec, new card: 2, old card: 0"
> "/opt/devel/KDE/bin/nepomukservicestub(21806)" Soprano: "Invalid argument (1)": "http://www.semanticdesktop.org/ontologies/2007/03/22/nco#fullname has a max cardinality of 1. Provided 2 values - "Sebastian Trueg"^^<http://www.w3.org/2001/XMLSchema#string>, "Sebastian Trueg"^^<http://www.w3.org/2001/XMLSchema#string>. Existing -  Affected Resource: nepomuk:/res/75164167-3ae0-413f-a991-ed73a08ca9ec, new card: 2, old card: 0"
> 
> > As I understand your code, you generate a hash of each resource to check
> > whether two are exactly the same. That probably works for most use cases,
> > but I'm not sure it is the best solution.
> > Given the problem above, you'd have to rerun the hashing for the resources
> > that were modified because a resource they reference was merged, which
> > already complicates matters.
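Rerunning the hashing could look like the fixed-point loop below: merge exact duplicates, redirect references from removed resources to the survivor, and repeat until a pass merges nothing. This is a minimal sketch over a hypothetical toy Resource type (a blank-node URI plus a property map), not the actual storeResources implementation; contentKey() and mergeDuplicates() are illustrative names, and a canonical string stands in for the hash.

```cpp
#include <cassert>
#include <map>
#include <string>
#include <utility>
#include <vector>

// Toy model (hypothetical, not Nepomuk's SimpleResource): a blank-node URI plus
// a sorted property -> value map. Values equal to another resource's URI are references.
struct Resource {
    std::string uri;
    std::map<std::string, std::string> props;
};

// Canonical content key; the resource's own URI is deliberately left out.
// A real implementation would hash this; the string keeps the comparison exact.
std::string contentKey(const Resource& r) {
    std::string key;
    for (const auto& [property, value] : r.props) key += property + '=' + value + ';';
    return key;
}

// Merges exact duplicates until nothing changes. After each pass, references to
// removed resources are redirected to the survivor, which can make their
// parents (e.g. contacts) identical for the next pass.
// Returns the number of passes that merged at least one resource.
int mergeDuplicates(std::vector<Resource>& graph) {
    int mergingPasses = 0;
    while (true) {
        std::map<std::string, std::string> canonical;  // content key -> surviving URI
        std::map<std::string, std::string> redirect;   // removed URI -> surviving URI
        std::vector<Resource> kept;
        for (const Resource& r : graph) {
            const auto [it, isNew] = canonical.emplace(contentKey(r), r.uri);
            if (isNew) {
                kept.push_back(r);
            } else {
                redirect[r.uri] = it->second;
            }
        }
        if (redirect.empty()) return mergingPasses;  // fixed point reached
        ++mergingPasses;
        for (Resource& r : kept) {
            for (auto& [property, value] : r.props) {
                if (const auto target = redirect.find(value); target != redirect.end()) {
                    value = target->second;
                }
            }
        }
        graph = std::move(kept);
    }
}
```

With the four resources from the failing test, the first pass merges only the two emails; the redirect makes both contacts reference "_:email1", and the second pass merges them. Each additional level of nesting can need one more pass over the whole graph, which is the extra complexity mentioned above.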
> > 
> > I thought it might be possible to leave the merging up to the normal
> > resource merger. Then not only exactly equal resources would be merged,
> > but all resources that the resource merger would normally merge.
> > If you think of the SimpleResourceGraph as a tree, a post-order traversal of
> > the tree would allow you to store the resources one by one, starting from
> > the leaves and working up to the root. The ResourceMerger would then
> > automatically merge all resources as necessary.
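The post-order proposal can be sketched as follows, under loudly simplified assumptions: the graph is acyclic (the tree view from the paragraph above), any value equal to another resource's blank-node URI is a reference, and the hypothetical ToyStore class stands in for storage plus the ResourceMerger by reusing a stored resource whose already resolved properties are identical. The real ResourceMerger's identification rules are not modeled; storePostOrder(), storeGraph() and ToyStore are illustrative names only.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <string>

// Toy model (hypothetical): blank-node URI -> (property -> value). A value equal
// to another key of the graph is treated as a reference to that resource.
using Properties = std::map<std::string, std::string>;
using Graph = std::map<std::string, Properties>;

// Hypothetical stand-in for the store plus ResourceMerger: storing a resource
// whose properties equal an already stored one returns the existing URI.
class ToyStore {
public:
    std::string store(const Properties& props) {
        std::string key;
        for (const auto& [property, value] : props) key += property + '=' + value + ';';
        const std::string newUri = "res:" + std::to_string(byContent_.size() + 1);
        return byContent_.emplace(key, newUri).first->second;
    }
    std::size_t size() const { return byContent_.size(); }

private:
    std::map<std::string, std::string> byContent_;  // content key -> stored URI
};

// Post-order: store everything `uri` references first, substitute the stored
// URIs, then store `uri` itself. `resolved` memoizes blank node -> stored URI,
// so shared children are stored once. Assumes the graph has no cycles.
std::string storePostOrder(const Graph& graph, const std::string& uri, ToyStore& db,
                           std::map<std::string, std::string>& resolved) {
    if (const auto done = resolved.find(uri); done != resolved.end()) return done->second;
    const auto node = graph.find(uri);
    assert(node != graph.end());
    Properties props = node->second;
    for (auto& [property, value] : props) {
        if (graph.count(value) != 0) value = storePostOrder(graph, value, db, resolved);
    }
    const std::string stored = db.store(props);
    resolved[uri] = stored;
    return stored;
}

// Stores every resource in the graph and returns blank node -> stored URI.
std::map<std::string, std::string> storeGraph(const Graph& graph, ToyStore& db) {
    std::map<std::string, std::string> resolved;
    for (const auto& entry : graph) storePostOrder(graph, entry.first, db, resolved);
    return resolved;
}
```

Because each email is stored before the contact that references it, both contacts reach the store with the same resolved email URI and collapse into one resource in a single traversal, with no rehashing. The trade-off is one store/merge step per resource rather than one batch.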
> > 
> > Do you think that would be a viable option?
> > 
> > Cheers,
> > Christian
> > 
> > _______________________________________________
> > Nepomuk mailing list
> > Nepomuk@kde.org
> > https://mail.kde.org/mailman/listinfo/nepomuk