
List:       wikitech-l
Subject:    Re: [Wikitech-l] Machine-utilizable Crowdsourced Lexicons
From:       David Cuenca Tudela <dacuetu () gmail ! com>
Date:       2018-05-30 8:32:38
Message-ID: CAJBSGSp8yYKU6ZN=a8cMdueSXabAn=r_5jB15R4Jjc-7TDnW=Q () mail ! gmail ! com

Hi Adam,

Thanks for your well-intentioned letter. Do you know about Wikidata and the
recent developments to support machine-readable Lexicographical data? I
would like to invite you to take a look at:
https://www.wikidata.org/wiki/Wikidata:Lexicographical_data

The system is still in its early stages, but you can take a look at
examples like:
https://www.wikidata.org/wiki/Lexeme:L11
https://www.wikidata.org/wiki/Lexeme:L403

If you have any questions about this, please do ask.

Regards,
Micru


On Wed, May 30, 2018 at 3:01 AM, Adam Sobieski <adamsobieski@hotmail.com>
wrote:

> INTRODUCTION
>
> Machine-utilizable lexicons can enhance a great number of speech and
> natural language technologies. Scientists, engineers and technologists –
> linguists, computational linguists and artificial intelligence researchers
> – eagerly await the advancement of machine lexicons which include rich,
> structured metadata and machine-utilizable definitions.
>
> Wiktionary, a collaborative project to produce a free-content multilingual
> dictionary, aims to describe all words of all languages using definitions
> and descriptions. The Wiktionary project, brought online in 2002, includes
> 139 spoken languages and American Sign Language [1].
>
> This letter hopes to inspire exploration into and discussion regarding
> machine wiktionaries, machine-utilizable crowdsourced lexicons, and
> services which could exist at https://machine.wiktionary.org/ .
>
> LEXICON EDITIONING
>
> The premise of editioning is that one version of the resource can be more
> or less frozen, e.g. a 2018 edition, while wiki editors collaboratively
> work on a next version, e.g. a 2019 edition. Editioning can provide
> stability for complex software engineering scenarios utilizing an online
> resource. Some software engineering teams, however, may choose to utilize
> fresh dumps or data exports of the freshest edition.
>
> SEMANTIC WEB
>
> A machine-utilizable lexicon could include a semantic model of its
> contents and a SPARQL endpoint.
>
> MACHINE-UTILIZABLE DEFINITIONS
>
> Machine-utilizable definitions, available in a number of knowledge
> representation formats, can be granular, detailed and nuanced.
>
> There exist a large number of use cases for machine-utilizable
> definitions. One use case is providing natural language processing
> components with the capabilities to semantically interpret natural
> language, to utilize automated reasoning to disambiguate lexemes, phrases
> and sentences in contexts. Some contend that the best output after a
> natural language processing component processes a portion of natural
> language is each possible interpretation, perhaps weighted via statistics.
> In this way, (1) natural language processing components could process
> ambiguous language, (2) other components, e.g. automated reasoning
> components, could narrow sets of hypotheses utilizing dialogue contexts,
> (3) other components, e.g. automated reasoning components, could narrow
> sets of hypotheses utilizing knowledgebase content, and (4)
> mixed-initiative dialogue systems could also ask users questions to narrow
> sets of hypotheses. Such disambiguation and interpretation would utilize
> machine-utilizable definitions of senses of lexemes.
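
The narrowing steps (1) to (4) above can be sketched in a few lines. This is only an illustration under assumptions: the sense labels, the weights and the `narrow` helper are hypothetical, and a real component would derive its constraints from dialogue context or a knowledgebase rather than from a hand-written predicate.

```python
def narrow(hypotheses, is_consistent):
    """Drop interpretations that a constraint rules out, then renormalize.

    hypotheses: dict mapping an interpretation label to its weight.
    is_consistent: predicate supplied by another component (context,
    knowledgebase lookup, or a user's answer to a clarifying question).
    """
    kept = {label: weight for label, weight in hypotheses.items()
            if is_consistent(label)}
    total = sum(kept.values())
    if total == 0:
        return {}  # every hypothesis was ruled out
    return {label: weight / total for label, weight in kept.items()}


# Hypothetical weighted interpretations of the ambiguous token "fly".
interpretations = {
    "fly/verb/travel-through-air": 0.5,
    "fly/noun/insect": 0.25,
    "fly/noun/trouser-opening": 0.25,
}

# Suppose the surrounding syntax tells us the token must be a noun.
nouns_only = narrow(interpretations, lambda label: "/noun/" in label)
```

Each later component can call `narrow` again with its own predicate, so the set of hypotheses shrinks step by step instead of being resolved in a single pass.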
>
> CONJUGATION, DECLENSION AND THE URL-BASED SPECIFICATION OF LEXEMES AND
> LEXICAL PHRASES
>
> A grammatical category [2] is a property of items within the grammar of a
> language; it has a number of possible values, sometimes called grammemes,
> which are normally mutually exclusive within a given category. Verb
> conjugation, for example, may be affected by the grammatical categories of:
> person, number, gender, tense, aspect, mood, voice, case, possession,
> definiteness, politeness, causativity, clusivity, interrogativity,
> transitivity, valency, polarity, telicity, volition, mirativity,
> evidentiality, animacy, associativity, pluractionality, reciprocity,
> agreement, polypersonal agreement, incorporation, noun class, noun
> classifiers, and verb classifiers in some languages [3].
>
> By combining the grammatical categories from each and every language
> together, we can precisely specify a conjugation or declension. For
> example, the URL:
>
> https://machine.wiktionary.org/wiki/lookup.php?edition=2018&language=en-US&lemma=fly&category=verb&person=first-person&number=singular&tense=past&aspect=past_simple&mood=indicative&…
>
> includes an edition, a language of a lemma, a lemma, a lexical category,
> and conjugates (with ellipses) the verb in a language-independent manner.
>
> We can further specify, via URL query string, the semantic sense of a
> grammatical element:
>
> https://machine.wiktionary.org/wiki/lookup.php?edition=2018&language=en-US&lemma=fly&category=verb&person=first-person&number=singular&tense=past&aspect=past_simple&mood=indicative&...&sense=4
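
For illustration, a lookup URL of this shape could be assembled as follows. This is a minimal sketch, assuming the proposed `lookup.php` endpoint (which is the letter's suggestion, not an existing service); the function name `build_lookup_url` is hypothetical, and only the parameters spelled out in the letter are passed, since the elided ones are unknown.

```python
from urllib.parse import urlencode

# Endpoint proposed in the letter; it is not a live service.
LOOKUP_ENDPOINT = "https://machine.wiktionary.org/wiki/lookup.php"


def build_lookup_url(edition, language, lemma, category, sense=None, **grammemes):
    """Return a lookup URL for one inflected form, optionally with a sense."""
    params = {
        "edition": edition,
        "language": language,
        "lemma": lemma,
        "category": category,
    }
    # Grammatical categories (person, number, tense, ...) keep call order.
    params.update(grammemes)
    if sense is not None:
        params["sense"] = sense
    return LOOKUP_ENDPOINT + "?" + urlencode(params)


url = build_lookup_url(
    2018, "en-US", "fly", "verb",
    person="first-person", number="singular", tense="past",
    aspect="past_simple", mood="indicative", sense=4,
)
```

Passing the grammatical categories as keyword arguments keeps the function independent of any one language's inventory of categories.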
>
> Specifying a grammatical item fully in a URL query string, as indicated in
> the previous examples, could result in a redirection to another URL.
>
> That is, the URL:
>
> https://machine.wiktionary.org/wiki/lookup.php?edition=2018&language=en-US&lemma=fly&category=verb&person=first-person&number=singular&tense=past&aspect=past_simple&mood=indicative&…
>
> could redirect to:
>
> https://machine.wiktionary.org/wiki/index.php?edition=2018&id=12345678
>
> or to:
>
> https://machine.wiktionary.org/wiki/2018/12345678/
>
> and the URL with a specified semantic sense:
>
> https://machine.wiktionary.org/wiki/lookup.php?edition=2018&language=en-US&lemma=fly&category=verb&person=first-person&number=singular&tense=past&aspect=past_simple&mood=indicative&...&sense=4
>
> could redirect to:
>
> https://machine.wiktionary.org/wiki/index.php?edition=2018&id=12345678&sense=4
>
> or to:
>
> https://machine.wiktionary.org/wiki/2018/12345678/4/
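
The redirection described here could work roughly like the sketch below, which resolves a lookup URL to the path-style form. Everything in it is an assumption made for illustration: the in-memory `FORM_INDEX`, the `canonical_url` helper, and the use of the letter's placeholder ID 12345678. Only the parameters written out in the letter are indexed; the elided ones are left out.

```python
from urllib.parse import parse_qs, urlsplit

BASE = "https://machine.wiktionary.org/wiki"  # host proposed in the letter

# Toy index: (edition, set of grammatical parameters) -> form ID.
_FLEW = {
    "language": "en-US", "lemma": "fly", "category": "verb",
    "person": "first-person", "number": "singular", "tense": "past",
    "aspect": "past_simple", "mood": "indicative",
}
FORM_INDEX = {("2018", frozenset(_FLEW.items())): "12345678"}


def canonical_url(lookup_url):
    """Map a lookup URL to its canonical URL, or None if the form is unknown."""
    query = parse_qs(urlsplit(lookup_url).query)
    flat = {key: values[0] for key, values in query.items()}
    edition = flat.pop("edition", None)
    sense = flat.pop("sense", None)
    form_id = FORM_INDEX.get((edition, frozenset(flat.items())))
    if form_id is None:
        return None
    target = f"{BASE}/{edition}/{form_id}/"
    if sense is not None:
        target += f"{sense}/"
    return target
```

Keying the index on a set of parameters rather than on the raw query string means the order of the query parameters does not affect the result.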
>
> The URL https://machine.wiktionary.org/wiki/2018/12345678/ is intended to
> indicate a conjugation or declension with one or more meanings or senses.
> The URL https://machine.wiktionary.org/wiki/2018/12345678/4/ is intended
> to indicate a specific sense or definition of a conjugation or declension.
> A benefit of having URLs both for conjugations or declensions and for
> specific meanings or senses is that HTTP request headers [4] can specify
> the languages and content types of the output desired for a particular URL.
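
As a sketch of that header-based negotiation, the snippet below prepares a request with the standard `Accept` and `Accept-Language` headers without sending it. The helper name `negotiated_request` is hypothetical, and whether the proposed service would honor these headers, or offer JSON-LD at all, is an assumption.

```python
from urllib.request import Request


def negotiated_request(url, media_type, language):
    """Prepare (but do not send) a GET request with negotiation headers."""
    return Request(
        url,
        headers={"Accept": media_type, "Accept-Language": language},
        method="GET",
    )


# Ask for sense 4 of the form, as JSON-LD, with descriptions in English.
request = negotiated_request(
    "https://machine.wiktionary.org/wiki/2018/12345678/4/",
    "application/ld+json",
    "en",
)
```

The same URL could then serve HTML to a browser and a knowledge representation format to a reasoning component, depending only on the headers sent.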
>
> The provided examples are intended to indicate that an ID number can be
> assigned to each complete, language-independent conjugation or declension
> rather than to each headword or lemma. Instead of one ID number for all
> variations of "fly", there is one ID number for "flew", another for "have
> flown", another for "flying", and so on for each conjugation or declension.
> Reasons for indexing the conjugations and declensions instead of
> traditional headwords or lemmas include that, at least for some knowledge
> representation formats, the formal semantics of the definitions vary per
> conjugation or declension.
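
One way to picture this indexing is a record per inflected form, each with its own ID, with the lemma kept only as an attribute. The class name, field names, ID values and grammeme labels below are all hypothetical and chosen just to make the idea concrete.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class InflectedForm:
    form_id: int          # hypothetical ID, one per conjugation or declension
    lemma: str            # traditional headword, kept as an attribute only
    surface: str          # the written form
    grammemes: frozenset  # grammatical category values for this form


FORMS = [
    InflectedForm(1001, "fly", "flew", frozenset({"tense=past"})),
    InflectedForm(1002, "fly", "have flown", frozenset({"aspect=perfect"})),
    InflectedForm(1003, "fly", "flying", frozenset({"form=participle"})),
]

# Primary index is the form ID; the lemma is a secondary grouping.
BY_ID = {form.form_id: form for form in FORMS}


def forms_of(lemma):
    """Return every indexed form that shares the given lemma."""
    return [form for form in FORMS if form.lemma == lemma]
```

Definitions would then attach to a `form_id` rather than to the lemma, which leaves room for formal semantics that differ between, say, "flew" and "flying".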
>
> CONCLUSION
>
> This letter broached machine wiktionaries and some of the services which
> could exist at https://machine.wiktionary.org/ . It is my hope that this
> letter indicated a few of the many exciting topics with regard to
> machine-utilizable crowdsourced lexicons.
>
>
> REFERENCES
>
> [1] https://en.wiktionary.org/wiki/Index:All_languages#List_of_languages
> [2] https://en.wikipedia.org/wiki/Grammatical_category
> [3] https://en.wikipedia.org/wiki/Grammatical_conjugation
> [4] https://en.wikipedia.org/wiki/List_of_HTTP_header_fields#Request_fields
> _______________________________________________
> Wikitech-l mailing list
> Wikitech-l@lists.wikimedia.org
> https://lists.wikimedia.org/mailman/listinfo/wikitech-l




-- 
Etiamsi omnes, ego non