
List:       cassandra-user
Subject:    Poor Performance of Cassandra UDF/UDA
From:       Xin Jin <xjin () telesign ! com>
Date:       2017-09-26 18:22:33
Message-ID: B081BD69-4B63-40C6-A881-7C949A4B830D () telesign ! com

Hi All,

I am new to the Cassandra community. Thank you in advance for any comments on
an issue we ran into recently.

We have found that running a query with direct UDF execution is about ten times
faster than with async UDF execution. The in-line comment "Using async UDF
execution is expensive (adds about 100us overhead per invocation on a Core-i7
MBPr)" in UDFunction.java
https://insight.io/github.com/apache/cassandra/blob/trunk/src/java/org/apache/cassandra/cql3/functions/UDFunction.java?line=293
shows that this is known behavior. My questions are below:

1. What are the main pros and cons of these two methods? Is there any
documentation that discusses this?

2. Are there any plans to improve the performance of async UDF execution? One
simple approach that comes to mind is batching, e.g., replacing the current
row-by-row execution with execution over several rows at a time. Are there any
concerns with this?

3. How do people generally solve this performance issue? It does not seem to be
treated as urgent or important, since it is known and still present, so I
assume people already have a good workaround.

4. Can anyone share your experience on a solution for this?
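
To make the batching idea in question 2 concrete, here is a rough sketch in
plain Java. It is not Cassandra's code: the class BatchedUdfSketch, the methods
applyPerRow and applyBatched, and the use of an IntUnaryOperator as a stand-in
for a UDF are all hypothetical names chosen for illustration. It only shows the
difference between one executor hand-off per row and one hand-off per batch of
rows.

```java
import java.util.Arrays;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.function.IntUnaryOperator;

// Hypothetical illustration only; none of these names exist in Cassandra.
public class BatchedUdfSketch {

    // Row-by-row: every row pays for one submit() and one blocking get().
    static int[] applyPerRow(ExecutorService pool, IntUnaryOperator udf, int[] rows)
            throws Exception {
        int[] out = new int[rows.length];
        for (int i = 0; i < rows.length; i++) {
            final int value = rows[i];
            Callable<Integer> task = () -> udf.applyAsInt(value);
            out[i] = pool.submit(task).get();
        }
        return out;
    }

    // Batched: one submit()/get() pair covers up to batchSize rows,
    // so the hand-off cost is shared by all rows in the batch.
    static int[] applyBatched(ExecutorService pool, IntUnaryOperator udf, int[] rows,
                              int batchSize) throws Exception {
        int[] out = new int[rows.length];
        for (int start = 0; start < rows.length; start += batchSize) {
            final int from = start;
            final int to = Math.min(start + batchSize, rows.length);
            Callable<Void> task = () -> {
                for (int i = from; i < to; i++) {
                    out[i] = udf.applyAsInt(rows[i]);
                }
                return null;
            };
            pool.submit(task).get(); // get() makes the writes to out visible here
        }
        return out;
    }

    public static void main(String[] args) throws Exception {
        ExecutorService pool = Executors.newSingleThreadExecutor();
        try {
            int[] rows = {1, 2, 3, 4, 5};
            IntUnaryOperator udf = x -> x * 2; // stand-in for a real UDF body
            System.out.println(Arrays.toString(applyPerRow(pool, udf, rows)));
            System.out.println(Arrays.toString(applyBatched(pool, udf, rows, 2)));
        } finally {
            pool.shutdown();
        }
    }
}
```

Both methods return the same results. In this sketch, any time limit enforced
by waiting on the future would cover a whole batch rather than a single
invocation, which may be one of the concerns with this approach.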

I would really appreciate your comments.

Best regards,

Xin




