Does use of large datatypes, where much smaller ones will do, affect performance? – SQLServerCentral Forums


  • polkadot

    SSChampion

    Points: 13664


    November 7, 2018 at 1:03 am

    #378561

    When a table has one hundred columns, where 40 of them have datatypes far exceeding the max(len()) of the data actually in them, will this alone impact query performance?

    Breakdown
    varchar(100) - 19 having max length 20
    varchar(300) - 4 having max length 51
    varchar(500) - 12 having max length 94
    varchar(1000) - 2 having max length 86
    varchar(2000) - 2 having max length 115
    up to varchar(4000) - 2 having max lengths 3999 & 392
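
    (For reference, a quick way to gather max-length numbers like these; the table and column names below are placeholders:)

        -- Hypothetical names: find the longest value actually stored in each column
        SELECT MAX(LEN(col_a)) AS col_a_max,
               MAX(LEN(col_b)) AS col_b_max
        FROM dbo.WideTable;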

    --Quote me

  • John Mitchell-245523

    SSC Guru

    Points: 148809


    November 7, 2018 at 1:52 am

    #2012654

    Probably not, although it may affect data integrity. If the maximum length of any value in the column is going to be 50, then you don't want varchar(300) for that column, since there's a danger you will accidentally admit invalid values.

    One way you may get a small performance improvement is if you have any fixed-length columns - credit card numbers for example. In those cases you should use char instead of varchar, both for data integrity and to avoid the two-byte overhead that varchar incurs over char. (Of course, if it's credit card numbers we're talking about, there are all sorts of security concerns on top of that, but that's beyond the scope of this topic.)
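    A minimal sketch of that sizing advice (the table and column names are made up):

        -- Sized-to-fit columns document intent and reject bad data outright
        CREATE TABLE dbo.Payment (
            CardNumber char(16)    NOT NULL, -- always exactly 16 characters: char avoids
                                             -- varchar's two-byte length prefix per value
            ShortName  varchar(50) NOT NULL  -- real maximum is 50: a 300-character paste
                                             -- now fails instead of being silently stored
        );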

    John

  • Eddie Wuerch

    SSChampion

    Points: 12532


    Memory allocated for the execution of a query is based on the assumed row width. The assumed row width is not based on reality - it's based on the declared size of the columns.

    I posted details in another thread (https://www.sqlservercentral.com/Forums/1997686/NVarchar4000-and-Performance#bm1997863). That shows the memory requests and grants associated with selecting the same ~40-byte value out of columns of different data types. The larger the column size, the more wasted memory granted to the query.
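    A rough repro of the effect (object names and row counts are illustrative): SQL Server assumes a variable-length column is half its declared size when calculating a grant, so sorting identical 40-byte strings out of a wider column requests far more memory.

        -- Two columns, identical 40-byte contents, different declared sizes
        CREATE TABLE dbo.GrantDemo (
            val_narrow varchar(100),
            val_wide   varchar(4000)
        );

        INSERT INTO dbo.GrantDemo (val_narrow, val_wide)
        SELECT TOP (100000) REPLICATE('x', 40), REPLICATE('x', 40)
        FROM sys.all_columns a CROSS JOIN sys.all_columns b;

        -- A sort forces a memory grant; compare the MemoryGrantInfo node in each
        -- actual execution plan, or watch sys.dm_exec_query_memory_grants while
        -- each statement runs
        SELECT val_narrow FROM dbo.GrantDemo ORDER BY val_narrow;
        SELECT val_wide   FROM dbo.GrantDemo ORDER BY val_wide;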

    -Eddie

    Eddie Wuerch
    MCM: SQL

  • polkadot

    SSChampion

    Points: 13664


    November 7, 2018 at 12:38 pm

    #2012742

    John says no; Eddie gives a fully explained yes (in the referenced link).

    Because of the well-reasoned explanation, I accept Eddie's answer.

    Large datatypes require more space to index as well, don't they?

    --Quote me

  • Eddie Wuerch

    SSChampion

    Points: 12532


    November 7, 2018 at 3:30 pm

    #2012765

    polkadot - Wednesday, November 7, 2018 12:38 PM

    Large datatypes require more space to index as well, don't they?

    With character data, no.
    Physically, there's no difference between storing or indexing a 40-byte string as varchar(128) or varchar(512).
    An ASCII value stored as nvarchar(100) will take twice the space of the same value stored as varchar(100), unless row-level compression is in use.
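
    That last point is easy to confirm:

        -- Same 20-character ASCII string: nvarchar stores 2 bytes per character
        SELECT DATALENGTH(CAST('forty-byte-ish value' AS varchar(100)))  AS varchar_bytes,  -- 20
               DATALENGTH(CAST('forty-byte-ish value' AS nvarchar(100))) AS nvarchar_bytes; -- 40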

    Eddie Wuerch
    MCM: SQL

  • polkadot

    SSChampion

    Points: 13664


    November 7, 2018 at 11:05 pm

    #2012775

    Oh, so then you and John are both saying that if the varchar datatype size is much bigger than the actual length of the data, this will not affect query performance or indexing.

    If I take LEN() of a column value and it returns 38, is this 38 bytes?

    --Quote me

  • John Mitchell-245523

    SSC Guru

    Points: 148809


    November 8, 2018 at 1:50 am

    #2012783

    It won't affect indexing, no, but Eddie is right - it does affect memory grants, which in turn affects performance. I overlooked that in my first reply.

    LEN() returns the number of characters in a string. DATALENGTH() returns the number of bytes it uses.
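    For example:

        SELECT LEN('abc ')        AS len_chars,      -- 3: LEN ignores trailing spaces
               DATALENGTH('abc ') AS varchar_bytes,  -- 4: DATALENGTH counts them
               LEN(N'abc')        AS nlen_chars,     -- 3
               DATALENGTH(N'abc') AS nvarchar_bytes; -- 6: 2 bytes per character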

    John

  • Eddie Wuerch

    SSChampion

    Points: 12532


    November 9, 2018 at 3:01 am

    #2012936

    polkadot - Wednesday, November 7, 2018 11:05 PM

    Oh, so then you and John are both saying that if the varchar datatype size is much bigger than the actual length of the data, this will not affect query performance or indexing.

    Declared varchar column length will not affect the physical (on-disk) or data cache storage. It will, however, affect the performance of actually using that index. The values must be loaded into a query's executing memory space in order to be read and processed, and the memory requested to load that data is based on the declared size, not the actual size. The data must be read into memory before the engine can analyze it and figure out how large it actually is.

    varchar(100) - 19 having max length 20
    varchar(300) - 4 having max length 51
    varchar(500) - 12 having max length 94
    varchar(1000) - 2 having max length 86
    varchar(2000) - 2 having max length 115
    up to varchar(4000) - 2 having max lengths 3999 & 392

    With the exception of the varchar(4000) column storing up to 3999 bytes, any query accessing those columns will request much more memory than it actually needs in order to execute the query. This isn't a cache - it's single-use and each running query requests a separate grant. Memory wasted on queries can't be used for caches or keeping sorts out of tempdb.

    The cool thing is that shrinking the column definition (such as turning the varchar(2000) columns into varchar(200) columns) doesn't require any writes to the table, just a read to ensure the change is safe. Making similar moves across the other columns will have a positive effect on performance, which could be significant on a highly-concurrent system that touches this data frequently on many threads.
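
    A sketch of that shrink (the object names are illustrative; remember to restate NULL/NOT NULL in the ALTER, or the column defaults to nullable):

        -- Verify the change is safe first: nothing longer than 200 exists
        SELECT MAX(LEN(comments)) FROM dbo.Orders;

        -- SQL Server reads the column to validate the new size,
        -- but rewrites no rows
        ALTER TABLE dbo.Orders
            ALTER COLUMN comments varchar(200) NOT NULL;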

    Eddie Wuerch
    MCM: SQL

  • Jeff Moden

    SSC Guru

    Points: 1003831


    November 10, 2018 at 8:15 am

    #2013068


    You should write an article on this, Eddie. Too many people think that NUMERIC(18,0) and NVARCHAR(256) (the defaults of a lot of "table designer" software) are OK to use for just about everything.

    And while Knuth was absolutely correct, too many people confuse doing things the right way with "premature optimization".

    --Jeff Moden

    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.

    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

