From solr-dev Tue Dec 03 12:07:28 2013
From: Michal Hlavac
Date: Tue, 03 Dec 2013 12:07:28 +0000
To: solr-dev
Subject: more generic lucene-morfologik integration
Message-Id: <2559310.0YUvc6vo1E () hlavki>
X-MARC-Message: https://marc.info/?l=solr-dev&m=138607249530130
MIME-Version: 1
Content-Type: multipart/mixed; boundary="--nextPart4156608.FA1aGKTExa"

--nextPart4156608.FA1aGKTExa
Content-Transfer-Encoding: 7Bit
Content-Type: text/plain; charset="us-ascii"

Hi,

I have a small proposal for the morfologik Lucene module. The current
module is tightly coupled to the Polish DICTIONARY enumeration, but other
people (like me) can build their own dictionaries into an FSA and use them
with Lucene.

You can find the proposal in the attachment, along with an example of its
use in an analyzer (SlovakLemmaAnalyzer). It takes the dictionary property
as a String naming a classpath resource, not as an enumeration value. One
consequence is that the dictionary variable must be set explicitly in
MorfologikFilterFactory (there is no default value).

thanks, m.

--nextPart4156608.FA1aGKTExa
Content-Disposition: attachment; filename="morfologik.zip"
Content-Transfer-Encoding: base64
Content-Type: application/zip; name="morfologik-proposal.zip"

[base64-encoded zip data omitted. Archive contents:
 morfologik/MorfologikLemmatizer.java,
 morfologik/MorphosyntacticTagsAttributeImpl.java,
 morfologik/MorfologikFilter.java,
 morfologik/MorphosyntacticTagsAttribute.java,
 morfologik/MorfologikFilterFactory.java,
 morfologik/MorfologikAnalyzer.java]

--nextPart4156608.FA1aGKTExa
Content-Disposition: attachment; filename="SlovakLemmaAnalyzer.java"
Content-Transfer-Encoding: 7Bit
Content-Type: text/x-java; charset="utf-8"; name="SlovakLemmaAnalyzer.java"

package org.apache.lucene.analysis.sk;

/*
 * Licensed to the Apache Software Foundation (ASF) under one or more
 * contributor license agreements.
 * See the NOTICE file distributed with
 * this work for additional information regarding copyright ownership.
 * The ASF licenses this file to You under the Apache License, Version 2.0
 * (the "License"); you may not use this file except in compliance with
 * the License. You may obtain a copy of the License at
 *
 * http://www.apache.org/licenses/LICENSE-2.0
 *
 * Unless required by applicable law or agreed to in writing, software
 * distributed under the License is distributed on an "AS IS" BASIS,
 * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
 * See the License for the specific language governing permissions and
 * limitations under the License.
 */

import java.io.IOException;
import java.io.Reader;

import org.apache.lucene.analysis.Analyzer;
import org.apache.lucene.analysis.TokenStream;
import org.apache.lucene.analysis.Tokenizer;
import org.apache.lucene.analysis.core.LowerCaseFilter;
import org.apache.lucene.analysis.core.StopFilter;
import org.apache.lucene.analysis.lemma.morfologik.MorfologikFilter;
import org.apache.lucene.analysis.miscellaneous.ASCIIFoldingFilter;
import org.apache.lucene.analysis.miscellaneous.SetKeywordMarkerFilter;
import org.apache.lucene.analysis.standard.StandardFilter;
import org.apache.lucene.analysis.standard.StandardTokenizer;
import org.apache.lucene.analysis.util.CharArraySet;
import org.apache.lucene.analysis.util.StopwordAnalyzerBase;
import org.apache.lucene.analysis.util.WordlistLoader;
import org.apache.lucene.util.IOUtils;
import org.apache.lucene.util.Version;

/**
 * {@link Analyzer} for Slovak language.
 * <p>
 * Supports an external list of stopwords (words that will not be indexed at all). A default set of stopwords
 * is used unless an alternative list is specified.
 * </p>
 * <p>
 * You must specify the required {@link Version} compatibility when creating SlovakLemmaAnalyzer.
 * </p>
 */
public final class SlovakLemmaAnalyzer extends StopwordAnalyzerBase {

  /**
   * File containing default Slovak stopwords.
   */
  public static final String DEFAULT_STOPWORD_FILE = "stop-words.txt";

  private final CharArraySet stemExclusionSet;
  private final Dictionary dictionary;

  /**
   * Dictionaries known to this analyzer. Each constant carries the name of the
   * dictionary resource that is handed to {@link MorfologikFilter} as a String.
   */
  public enum Dictionary {

    DEFAULT("sk"), MLTEAST("mlteast-sk");

    private final String resource;

    private Dictionary(String resource) {
      this.resource = resource;
    }

    public String getResource() {
      return resource;
    }
  }

  /**
   * Returns an unmodifiable instance of the default stop words set.
   *
   * @return default stop words set.
   */
  public static CharArraySet getDefaultStopSet() {
    return SlovakLemmaAnalyzer.DefaultSetHolder.DEFAULT_STOP_SET;
  }

  /**
   * Atomically loads the DEFAULT_STOP_SET in a lazy fashion once the outer class accesses the static final
   * set the first time.
   */
  private static class DefaultSetHolder {

    static final CharArraySet DEFAULT_STOP_SET = getStopSet();

    private static CharArraySet getStopSet() {
      try {
        return WordlistLoader.getWordSet(IOUtils.getDecodingReader(SlovakLemmaAnalyzer.class,
            DEFAULT_STOPWORD_FILE, IOUtils.CHARSET_UTF_8), "#", Version.LUCENE_CURRENT);
      } catch (IOException ex) {
        // default set should always be present as it is part of the
        // distribution (JAR); keep the original exception as the cause
        throw new RuntimeException("Unable to load default stopword set", ex);
      }
    }
  }

  /**
   * Builds an analyzer with the default dictionary and the default stop words:
   * {@link #getDefaultStopSet}.
   *
   * @param matchVersion lucene compatibility version
   */
  public SlovakLemmaAnalyzer(Version matchVersion) {
    this(matchVersion, Dictionary.DEFAULT, SlovakLemmaAnalyzer.DefaultSetHolder.DEFAULT_STOP_SET);
  }

  /**
   * Builds an analyzer with the given dictionary and the default stop words:
   * {@link #getDefaultStopSet}.
   *
   * @param matchVersion lucene compatibility version
   * @param dictionary dictionary resource
   */
  public SlovakLemmaAnalyzer(Version matchVersion, Dictionary dictionary) {
    this(matchVersion, dictionary, SlovakLemmaAnalyzer.DefaultSetHolder.DEFAULT_STOP_SET);
  }

  /**
   * Builds an analyzer with the given stop words.
   *
   * @param matchVersion lucene compatibility version
   * @param dictionary dictionary resource
   * @param stopwords a stopword set
   */
  public SlovakLemmaAnalyzer(Version matchVersion, Dictionary dictionary, CharArraySet stopwords) {
    this(matchVersion, dictionary, stopwords, CharArraySet.EMPTY_SET);
  }

  /**
   * Builds an analyzer with the given stop words. If a non-empty stem exclusion set is provided this
   * analyzer will add a {@link SetKeywordMarkerFilter} before lemmatization.
   *
   * @param matchVersion lucene compatibility version
   * @param dictionary dictionary resource
   * @param stopwords a stopword set
   * @param stemExclusionSet a set of terms not to be stemmed
   */
  public SlovakLemmaAnalyzer(Version matchVersion, Dictionary dictionary, CharArraySet stopwords,
      CharArraySet stemExclusionSet) {
    super(matchVersion, stopwords);
    this.dictionary = dictionary;
    this.stemExclusionSet = CharArraySet.unmodifiableSet(CharArraySet.copy(
        matchVersion, stemExclusionSet));
  }

  /**
   * Creates {@link org.apache.lucene.analysis.Analyzer.TokenStreamComponents} used to tokenize all the text
   * in the provided {@link Reader}.
   *
   * @return {@link org.apache.lucene.analysis.Analyzer.TokenStreamComponents} built from a
   * {@link StandardTokenizer} filtered with {@link StandardFilter}, {@link LowerCaseFilter},
   * {@link StopFilter}, {@link MorfologikFilter} (only if version is >= LUCENE_31) and
   * {@link ASCIIFoldingFilter}. If the version is >= LUCENE_31 and a stem exclusion set is provided via
   * {@link #SlovakLemmaAnalyzer(Version, Dictionary, CharArraySet, CharArraySet)}, a
   * {@link SetKeywordMarkerFilter} is added before {@link MorfologikFilter}.
   */
  @Override
  protected TokenStreamComponents createComponents(String fieldName, Reader reader) {
    final Tokenizer source = new StandardTokenizer(matchVersion, reader);
    TokenStream result = new StandardFilter(matchVersion, source);
    result = new LowerCaseFilter(matchVersion, result);
    result = new StopFilter(matchVersion, result, stopwords);
    if (matchVersion.onOrAfter(Version.LUCENE_31)) {
      // protect the excluded terms from being lemmatized
      if (!this.stemExclusionSet.isEmpty()) {
        result = new SetKeywordMarkerFilter(result, stemExclusionSet);
      }
      // the dictionary is passed by resource name, not as an enum constant
      result = new MorfologikFilter(result, dictionary.getResource(), matchVersion);
    }
    // fold diacritics last, so the dictionary lookup sees the accented forms
    result = new ASCIIFoldingFilter(result);
    return new TokenStreamComponents(source, result);
  }
}

--nextPart4156608.FA1aGKTExa
Content-Type: text/plain; charset=us-ascii

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: dev-help@lucene.apache.org

--nextPart4156608.FA1aGKTExa--