List: cassandra-dev
Subject: Re: [VOTE] CEP-30 ANN Vector Search
From: "Andrew Cobley (Staff)" <a.e.cobley () dundee ! ac ! uk>
Date: 2023-06-16 18:44:11
Message-ID: DU0PR04MB9418BB72A929B915768FCC0EE758A () DU0PR04MB9418 ! eurprd04 ! prod ! outlook ! com
Thanks Jonathan,
That's good to know.
Andy
From: Jonathan Ellis <jbellis@gmail.com>
Date: Friday, 16 June 2023 at 18:04
To: dev@cassandra.apache.org <dev@cassandra.apache.org>
Subject: Re: [VOTE] CEP-30 ANN Vector Search
Correct. They will be ordered closest-first.
Unfortunately, farthest-first won't be possible in the near or medium term. The HNSW index gets to log(n) query time by keeping only a subset of the closest neighbors for each vector. So you'd need a separate index with an inverse-cosine similarity metric, and it's not possible today to use a custom metric function.
(This has been GA for over a year in Elastic and Solr, and so far nobody has needed farthest-first badly enough to add it as an option to the underlying Lucene library.)
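To make the ordering argument concrete, here is a minimal brute-force sketch in Python. It is an illustration only: it is neither HNSW nor Cassandra code. It ranks vectors by a score where higher means better; swapping cosine similarity for its negation (a hypothetical `inverse_cosine` metric) turns the closest-first ranking into a farthest-first one. An HNSW graph built for one metric keeps only the near neighbours under that metric, which is why the reverse ordering would need its own index. The names `cosine_similarity`, `inverse_cosine` and `top_k` are assumptions made for this sketch.

```python
import math
from typing import Callable, List, Sequence

Vector = Sequence[float]


def cosine_similarity(a: Vector, b: Vector) -> float:
    """Cosine of the angle between a and b (1.0 = same direction)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def inverse_cosine(a: Vector, b: Vector) -> float:
    """Hypothetical 'inverse' metric: negated cosine similarity.

    Ranking by this score puts the farthest vectors first.
    """
    return -cosine_similarity(a, b)


def top_k(query: Vector, vectors: List[Vector], k: int,
          metric: Callable[[Vector, Vector], float]) -> List[int]:
    """Exhaustive top-k search: indices of the k best-scoring vectors.

    This costs O(n) per query; HNSW avoids that by keeping only each
    vector's closest neighbours under one fixed metric.
    """
    scored = [(metric(query, v), i) for i, v in enumerate(vectors)]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [i for _, i in scored[:k]]


vectors = [[1.0, 0.0], [0.7, 0.7], [0.0, 1.0], [-1.0, 0.0]]
query = [1.0, 0.1]
print(top_k(query, vectors, 2, cosine_similarity))  # [0, 1]: closest-first
print(top_k(query, vectors, 2, inverse_cosine))     # [3, 2]: farthest-first
```

The exhaustive scan can serve either ordering because it scores every vector; an index that prunes to near neighbours cannot, which is the trade-off described above.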
You can get the distances back today, like this:
SELECT my_text, similarity_cosine(my_embedding, ?)
FROM my_table
ORDER BY my_embedding ANN OF ? LIMIT 2
Then just pass the query vector into both bind variables.
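The behaviour described above (results returned closest-first, truncated by LIMIT, with one query vector feeding both the score column and the ordering) can be emulated locally with an exact search. This is a minimal sketch under stated assumptions: `my_table`, its rows and `ann_query` are hypothetical, the search is exhaustive rather than approximate, and the score is plain cosine similarity, whereas Cassandra's `similarity_cosine` may scale its result differently.

```python
import math
from typing import List, Sequence, Tuple


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Plain cosine similarity: 1.0 means same direction."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) *
                  math.sqrt(sum(y * y for y in b)))


def ann_query(rows: List[Tuple[str, List[float]]],
              query: Sequence[float],
              limit: int) -> List[Tuple[str, float]]:
    """Exact stand-in for the ANN query: (my_text, score) pairs, closest-first.

    The same query vector drives both the score column and the ordering,
    mirroring binding one vector to both '?' placeholders.
    """
    scored = [(text, cosine_similarity(emb, query)) for text, emb in rows]
    scored.sort(key=lambda row: row[1], reverse=True)  # closest-first
    return scored[:limit]                               # LIMIT n


# Hypothetical rows of (my_text, my_embedding).
my_table = [
    ("apple", [0.9, 0.1, 0.0]),
    ("banana", [0.1, 0.9, 0.0]),
    ("cherry", [0.8, 0.2, 0.1]),
]
query_vector = [1.0, 0.0, 0.0]
for text, score in ann_query(my_table, query_vector, limit=2):
    print(f"{text}: {score:.3f}")
```

A real HNSW-backed query may occasionally miss a true nearest neighbour, since the search is approximate; the exhaustive version here always returns the exact top n.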
On Fri, Jun 16, 2023 at 7:09 AM Andrew Cobley (Staff) <a.e.cobley@dundee.ac.uk> wrote:
Hi,
I've got a question and a request about this CEP.
In the example:
SELECT * FROM test.foo WHERE j ANN OF [3.4, 7.8, 9.1] limit 1;
I presume that limit n will return the n nearest neighbours?
If that's the case, what order will they be in? Is it possible to reverse the order?
Secondly, would it be possible to return the calculated distances? This might be particularly important when n neighbours are returned.
Andy
________________________________
From: Patrick McFadin <pmcfadin@gmail.com>
Sent: 15 June 2023 01:03
To: dev@cassandra.apache.org <dev@cassandra.apache.org>
Subject: Re: [VOTE] CEP-30 ANN Vector Search
Andy,
Good to see you on the ML again! CEP-30 is slated for release with 5.0 later in the year. Until then, you'll need to do a local build or try it out in a preview in Astra. A few of us have been talking about creating a preview Docker image, since there is some interest in running it in k8ssandra. In any case, this is very alpha code and should be treated as such. Reports of errors or unusual results would be greatly appreciated!
Patrick
On Wed, Jun 14, 2023 at 7:10 AM Andrew Cobley (Staff) <a.e.cobley@dundee.ac.uk> wrote:
Hi All,
Great news that this has gone through. I was wondering whether we have a timescale for this making it to beta or release? I'm asking because we have a project that would benefit from this approach.
Andy
From: Jonathan Ellis <jbellis@gmail.com>
Date: Tuesday, 30 May 2023 at 14:44
To: dev <dev@cassandra.apache.org>
Subject: Re: [VOTE] CEP-30 ANN Vector Search
Thanks, all. Closing the vote as accepted, with 8 binding +1 votes (including mine) and 11 non-binding votes.
On Thu, May 25, 2023 at 10:45 AM Jonathan Ellis <jbellis@gmail.com> wrote:
Let's make this official.
CEP: https://cwiki.apache.org/confluence/display/CASSANDRA/CEP-30%3A+Approximate+Nearest+Neighbor%28ANN%29+Vector+Search+via+Storage-Attached+Indexes
POC that demonstrates all the big rocks, including distributed queries: https://github.com/datastax/cassandra/tree/cep-vsearch
--
Jonathan Ellis
co-founder, http://www.datastax.com
@spyced
--
Jonathan Ellis
co-founder, http://www.datastax.com
@spyced
The University of Dundee is a registered Scottish Charity, No: SC015096
--
Jonathan Ellis
co-founder, http://www.datastax.com
@spyced