![](/style/images/good.png)
![](/style/images/bad.png)
Performance difference in prepared statements compared to jTDS due to differing...
source link: https://github.com/microsoft/mssql-jdbc/issues/1196
Go to the source link to view the article. You can view the picture content, updated content and better typesetting reading experience. If the link is broken, please click the button below to view the snapshot at that time.
Performance difference in prepared statements compared to jTDS due to differing execution plan #1196
Comments
Driver version7.4.1.jre12 SQL Server versionMicrosoft SQL Server 2017 (RTM-GDR) (KB4505224) - 14.0.2027.2 (X64) Jun 15 2019 00:26:19 Copyright (C) 2017 Microsoft Corporation Enterprise Edition (64-bit) on Windows 10 Enterprise 10.0 (Build 18362: ) Client Operating SystemWindows 10 Pro (1909) JAVA/JVM versionOpenJDK 12.0.2+10 Table schemaA simple table with a few columns and one index Problem descriptionWe would like to switch from JTDS to MS SQL JDBC driver. The problem is that during long data crunching, the MS SQL driver is much slower. Our data crunching, which includes various other things, takes 77 minutes with JTDS and 104 minutes with MS SQL.
Reproduction codeI narrowed down the test case to a relatively small backup (14MB, 170MB unzipped) and a Java class. MSvsJTDS_bak.zip
We got these results: 2nd Try: 3rd Try: 4th Try: On a different client and host machine, we got: |
Contributor
ulvii commented on Nov 29, 2019
Hi @Chrriis, |
Contributor
ulvii commented on Nov 30, 2019
Hi @Chrriis, I also modified your test code a bit and seeing some strange behavior. When I hardcode the ID in the query
and update the JAVA code accordingly
jTDS driver becomes slower too, both drivers perform the same. So parametrizing IDs makes jTDS faster and I am currently not sure why. The only difference between drivers is jTDS calls |
Author
Chrriis commented on Dec 1, 2019
Thank you very much for the update! Maybe I should tell more about this schema. This schema contains a primary key (Col1), a conceptual key (Col2, Col3, Col4, Col5) with a versioning column (Col28). A version can contain many rows of different conceptual keys. I am not a specialist in statistics computing in databases, but it is possible that the layout after insertion helps running the update statement (only the most recent versions are of interest, so last inserted rows per conceptual keys / PK) as done by JTDS. You say that the only difference is in using sp_prepexec instead of sp_prepare. But in the test case, we clearly prepare the statement before executing it multiple times. Do you mean that sp_prepexec is invoked at the first batch execution (and sp_execute after that) or used at each batch execution (what would it mean to have a prepare step everytime)? Have you tried modifying the MS code to use sp_prepare like JTDS to see if you get the same result than JTDS (so we are sure there is nothing else at play here, like a subtle difference in the use of the TDS protocol, data types, etc.)? I will do a few experiments tomorrow, I will let you know if I find anything of interest. |
Author
Chrriis commented on Dec 2, 2019
I tried another variation of the statement which is less performant but more basic SQL (no window function). |
Author
Chrriis commented on Dec 2, 2019
I tried changing various settings in the connection string of SQL Server, but no improvement:
I also tried to change JTDS parameters to see how the driver behaves. Still fast when changing:
JTDS became as slow as the MS driver when changing: I was not surprised by that last change result. The JTDS documentation says:
|
Author
Chrriis commented on Dec 2, 2019
The test case uses a batch but I found this is irrelevant. If you replace |
Author
Chrriis commented on Dec 2, 2019
The funny thing is that if you hardcode the 1st and 3rd parameters in the query (only leaving the ID parameterized), then the query is still fast for JTDS and slow for MS. |
Contributor
ulvii commented on Dec 4, 2019
Hi @Chrriis ,
MS JDBC
|
@ulvii I looked at the code and figured that out :) That being said, I disagree when you say that it is the only thing that differs. I simplified the test case and added logs of the bytes that are sent for a given execution of the statement. I see less bytes sent with JTDS although we should be under the same conditions in both cases: statement was prepared long time ago, there were the same number of executions, etc. Of course it is possible that I made a mistake in my test case, so please do cross-check my findings! The zip
It seems that the JTDS prepared statement executions do not have the header (size 22). Maybe only the preparation packet needs it and adding it to executions creates problems? Thanks a lot for your help! |
Author
Chrriis commented on Dec 4, 2019
I think I succeeded modifying the code so that the sp_prepexec call is just a sp_prepare instead. That way we can compare the same scenario between MS and JTDS. Well, unless I made a mistake, total time has not changed: MS is still 3x slower... |
I've done some profiling that may or may not be interesting for this case (I'm attaching it.) Here's what I see. It looks like the number of requests are roughly the same between the two drivers, yet the MS driver requests appear to take a much longer time. The MS driver is spending all of it's time here
On the flip side, the Jtds path looks as follows for where most of it's time is spent
This was using JTDs 1.3.1 and MS 7.4.1 I ran sql server locally in a docker container. As far as I can tell, it looks like both are doing essentially the same thing. Some protocol or connection setting seems to be the likely culprit here, IMO. Interestingly enough, in both profiling sessions I did (One with jtds first, one with it second, to make sure that DB caching wasn't messing with things), it looks like the first 2 batches for the MS driver ran faster than JTDs (~80ms). It was the subsequent requests that ended up taking ~1 second. JTDS consistently spent around 160ms per request. Is there some "expected batch size" or "look ahead" parameter going on that the MS driver isn't sending? |
@Chrriis It looks to me (and I could be wrong) that the MS driver is setting the "SQLBatch" packet data section and JTDs is not. |
Author
Chrriis commented on Dec 6, 2019
Thanks @cogman. I analyzed the bytes that were sent vs TDS spec and suspected they were the MARS section stuff, so thank you for the confirmation. I think JTDS uses an older version of the protocol that does not have those. About profiling, I too found that time was spent reading, which is why I started analyzing the sent messages in the first place, but they seemed to be the same. So I think you are right saying there is a protocol or connection setting, but I fail to find which could be causing this huge difference. Any help is welcome there! :) |
Author
Chrriis commented on Dec 11, 2019
@ulvii I found the issue! The root cause is that the statement that is passed to the server is slightly different between MS and JTDS: the MS driver adds spaces at the end of the statement when replacing the Of course, the fix must be more subtle than this: the statement that is passed to the driver should be left untouched when replacing the parameters. What I mean is that
With that fix, the MS driver would do as is told (passing command as is) and for us it would become as fast as JTDS (at least for this case). Please let us know what you think! :) |
Author
Chrriis commented on Dec 11, 2019
The fix in
That way, the statement is untouched, spaces that were part of the original statement are preserved and no new spaces are added. |
cogman commented on Dec 11, 2019 •
TBH, this method is a fair bit more complex than needs be. (Looks like a C->Java translation) Make this into a string builder and use strings/Integer.toString and this will be both more idiomatic and readable. The performance hit will be negligible, potentially non-existent due to JVM optimizations. For example, "makeParameterName" could be simplified to Also, weird that spaces have that big of an impact on performance from the server. I'd never have guessed that. Same method, more idiomatic
|
Hi @ulvii, I am not sure what you did, because on our systems, the problem and the fix are consistent. Here are our results:
The only change between both is that:
is replaced with:
|
Contributor
ulvii commented on Dec 13, 2019
Author
Chrriis commented on Dec 13, 2019
Thanks @ulvii, I would greatly appreciate it if you could updated the driver with the changes I suggested. It does make a huge difference! Of course, what would be great for the future is if SQL Server itself were fixed to treat statements with and without trailing spaces in the same manner. If you know anyone at Microsoft who can do something in the server itself, you would make the world a better place :) About the changes @cogman proposed, I would have to try them when time allows. That being said, I think the immediate need is a fix that is the least intrusive, and I think improving the code as suggested deserves its own bug report / PR to better evaluate if it is equivalent and does not breaks anything (for example, I am not sure how |
cogman commented on Dec 14, 2019 •
@Chrriis I've ran the test a few times with a few permutations of the driver changes (mine, yours, part of yours, part of mine). I didn't see any driver differences even though the string output at the end was the same.
I even tried running against sql server 2017 vs 2019 just to see if it was something about the database version. I threw in print statements into the driver to make sure I wasn't somehow getting an older version of the driver. I ran this change with 8.1.1 built on my box. Edit: Just for fun, I also added a bit of logic to strip out repeated whitespace in the query. No difference.
|
@ulvii I looked at @cogman's code. I changed a few things, e.g. to not allocate an array of chars just to get the length and to take into account the OUT parameter length. I truly think that rewriting this method should be a different issue/PR focused on improving the code and which could have its own dedicated thread of comments (let's not highjack this one which is about a bug in performance). By-the-way, I did not have the bug today but I restored the database backup and could reproduce. I probably had background processes altering statistics. It is important to restore the backup before measuring again. So, I would be grateful if you could fix the issue by replacing:
with:
I hope this can make it soon in a release so we can continue our performance investigations between JTDS and MS drivers. |
@Chrriis Did you miss where I ran your code and did not see performance improvements? Specifically, the statement that got pushed out was I didn't just test my code change. Neither change on 8.1.1 resulted in a difference in performance. I can cook up a test docker compose to demonstrate if you like. |
@cogman This bug is related to SQL Server caches of query plans, parameter sniffing, statistics computing, etc. It is known that having trailing spaces, sometimes magical number of them, confuse SQL Server and make it not reuse an optimal plan. Do you mean that you are not able to reproduce the issue? If so, this kind of issue is like concurrency issues: it is not because you do not see it that it does not exist. In any case, the statement does have useless spaces and it is known to cause problems. |
@ulvii, @cogman, I have very interesting findings. I ported the whole thing to SQL Server Express 2012 (our earlier investigations were on SQL Server 2017). And these are the times we get (without fix): So, on that server, the performance penalty is 10 times! But that is not all. If I apply the fix I suggested, the performance penalty remains. This might explain what @cogman is experiencing. So, I applied another difference I found between JTDS and MS JDBC: spaces around parameters. Depending on the test case, it does or does not help. I was wondering why JTDS was adding these spaces, but maybe they (sometimes?) help parameter sniffing and query plan reuse in older versions of SQL Server, who knows. I would not urge to add the spaces around parameters like JTDS does though, this needs much more analysis. So: the trailing spaces at the end of the statement are definitely a problem and this fix should be applied. But, removing them does not fix all of the performance penalties of MS driver compared to JTDS. Here is the backup for SQL Express 2012: |
I found how to fix the remaining performance penaly found on SQL Server 2012: I changed the MS driver code (temporary hack) to do a prepare instead of a prepExec. Then it becomes as fast... So, we can infer that:
(of course, do check on your side if you come to the same conclusions). |
Contributor
ulvii commented on Dec 19, 2019
Hi @Chrriis , @cogman , |
Author
Chrriis commented on Dec 20, 2019
Thanks @ulvii, I will try the next official release that contains this first fix. Unfortunately, we need to support SQL Server 2012 so we will stick to JTDS in production for the time being. About the PrepExec vs Prepare + Execute, JTDS has the notion of modes using the prepareSQL connection property. Here is its documentation: prepareSQL (default - 3 for SQL Server, 1 for Sybase)
While I don't think the MS driver should have these modes, maybe you could add some modes at least to switch between PrepExec and Prepare + Execute. |
Hi @otbutz , Edit: we look into getting this done in the next production release before January 31st, will update you on progress. |
Is this going to be released in 9.2.0 ? |
Highly doubt it. The issue is open since Nov 2019 with a very clear diagnosis done by @Chrriis who posted unpatched vs patched benchmark results. And yet a slowdown of factor 3-4 compared to jtds doesn't seem to justify prioritizing this issue. |
Contributor
peterbae commented on Jan 19, 2021
The issue is still pending prioritization, we will let you know if we have any progress on this issue in the future. |
@lukaseder maybe a tweet to draw attention to this issue? |
Sure, that'll be 250 EUR per tweet |
That's actually cheaper than i thought |
Anyone want to take a stab at a pull request that adds a mode in the driver configuration (and elsewhere) so that it will switch to prepare+exec instead of prepexec like the JTDS driver does? Alternative for our use case is I guess to clone and build our own with Chrriis's hack |
Author
Chrriis commented on May 18, 2021
It is not straightforward to implement, which is why I had hopes that the original developers would take a look at it.
The hack is coupled with the test so that there is always a dummy call first. It cannot be turned into a proper feature. |
Dito. We can only hope that MS changes its prioritization strategy and finally pays more attention to the fundamental problems of its driver. But backlogged issues like this one and #1538 are red flags that we'll have to stick to jTDS for the foreseeable future. |
Amazing work @Chrriis. We are facing the exact same performance drop switching from jTDS to ms driver. Subscribed to this issue just in case Microsoft wakes up. |
Hi all, This issue, along with others were considered when we do planning for the semester but it didn't make the cut. We will review the list again when we do planning for the next semester but please note we have very limited resources so can only work on those features/issues that are highest priority. Since this project is open source, you are welcome to contribute and create a PR if you have a fix, that would help expedite the process. |
Here I thought that SQL Server was a commercial product offered by Microsoft, a company worth more than 2,000,000,000,000 $, my bad. The next time somebody brings up whether we should chose SQL Server or an alternative I'll bring up that the JDBC driver is just an open source project with very limited resources. |
Any updates? We are approaching the second birthday of this issue. |
changed the title
Big performance penalty in prepared statements compared to JTDS
Performance difference in prepared statements compared to jTDS due to differing execution plan
I just encountered another performance issue with the MS JDBC driver. It takes about 39 minutes to insert 85K records into a MS SQL Server table using the MS SQL Server driver. If I use the Open Source JTDS driver then it takes only 39 seconds. The issue is only occurring when I have fields defined as BigDecimal in the application and that generates setBigDecimal for the prepared statement. If I change the data type from BigDecimal to Int, Float or Double, I do not get that performance issue. Is setBigDecimal inherently slow performing and should be avoided with MS SQL Server driver? Why do I not experience the same issue with the Open source JTDS driver? |
Hi @dospencer, we are currently in progress of investigating performance related problems in regards to this issue. However, we are preparing for an upcoming release and will try to address this issue after. |
@dospencer We expect to have our release out sometime late January or early February |
Hi @dospencer, can you create a separate github issue for your problem? Thanks |
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
No one assigned
Recommend
About Joyk
Aggregate valuable and interesting links.
Joyk means Joy of geeK