Using a Variable-length Datatype Without Explicit Length: The Whys and Wherefores (BP007/8) | Redgate (2024)

Phil Factor

23 January 2019

Guest post

This is a guest post from Phil Factor. Phil Factor (real name withheld to protect the guilty), aka Database Mole, has 30 years of experience with database-intensive applications.

Despite having once been shouted at by a furious Bill Gates at an exhibition in the early 1980s, he has remained resolutely anonymous throughout his career.

He is a regular contributor to Simple Talk and SQLServerCentral.

Phil Factor

23 January 2019

  • SQL Prompt
  • SQL Code Analysis

If you declare a variable-length string , or coerce a string, without specifying its length, you can fall foul of ‘silent’ string truncation. Some developers resort to using the (MAX) specification, which is a mistake too. Phil Factor explains the dangers and then offers a workaround for the problem, when you're importing text and simply don't know the correct length of each string.

Guest post

This is a guest post from Phil Factor. Phil Factor (real name withheld to protect the guilty), aka Database Mole, has 30 years of experience with database-intensive applications.

Despite having once been shouted at by a furious Bill Gates at an exhibition in the early 1980s, he has remained resolutely anonymous throughout his career.

He is a regular contributor to Simple Talk and SQLServerCentral.

In SQL, if you declare a string in any of the four formats, CHAR, NCHAR, VARCHAR or NVARCHAR, without specifying its length, the string is given the length of one character. If you coerce a string, using CAST or CONVERT, and make the same mistake, it gets a length of thirty characters. Why one and thirty? Historical reasons only, but other RDBMS react in similar ways.

In either case, you can get yourself into trouble with ‘silent’ string truncation, so SQL Prompt has some built-in code analysis rules to warn you if you forget to specify string length both during variable-length string declaration (BP007) and coercion (BP008).

Using a Variable-length Datatype Without Explicit Length: The Whys and Wherefores (BP007/8) | Redgate (3)

Some developers assume that, if they leave the length out, SQL Server will work out what they need. This isn’t as far-fetched as it might seem, because it does so in other contexts, as I’ll go on to demonstrate. Often, the context for this mistake is when you’re importing from a data source where you don’t know the right string length for the columns, and you need to create the destination table. Alternatively, you might end up using MAX as the length, which brings troubles of its own.

Later in the article, I’ll show you a handy trick to get around this. It uses Prompt’s Script as Insert feature to script out the first 1000 rows of a table, and then we persuade SQL Server to assign the correct datatypes and string lengths, by importing the data via a table-value constructor (TVC) within a SELECT INTO statement.

Problems with forgetting the string length

Normally, but not always, your mistake will be quickly apparent.

1

2

3

4

5

6

7

8

CREATE TABLE dbo.Deleteme (MyString VARCHAR NOT null)

INSERT INTO dbo.Deleteme (MyString) VALUES ('first'),('Second'),('Third')

/*

Msg 8152, Level 16, State 14, Line 5

String or binary data would be truncated.

The statement has been terminated.

*/

drop TABLE dbo.Deleteme

This can get more unpleasant if you do the same with a variable, because the string gets truncated to a single character, but there is no error.

1

2

3

DECLARE @MyString VARCHAR ='this ends badly'

SELECT @MyString

--no error. Produces value 't'

The same is true of parameters in a function or procedure …

1

2

3

4

5

6

7

8

9

CREATE FUNCTION dbo.deleteme (@param1 VARCHAR, @param2 CHAR)

RETURNS VARCHAR(100)

AS

BEGIN

RETURN @param1 + @param2;

END;

Go

SELECT dbo.deleteme('this is likely to','go very badly')

--returns 'tg'

And in the returned value of a function:

1

2

3

4

5

6

7

8

9

CREATE FUNCTION dbo.deletemeAlso (@param1 VARCHAR(100), @param2 Varchar(100))

RETURNS VARCHAR

AS

BEGIN

RETURN @param1 + @param2;

END;

GO

SELECT dbo.deletemeAlso('this is likely to','go very badly')

--returns 't'

If you forget to declare the variable string length during a CONVERT or CAST operation, the behavior is different. You get a string of thirty characters.

1

2

3

4

5

6

SELECT Convert(nVARCHAR,'12345678901234567890123456789012345678901234567890' ,113)

/*

------------------------------

123456789012345678901234567890

(1 row affected)

*/

Conventional wisdom at the time said that most database strings were less than thirty characters in length. Even a string that is inevitably less than thirty characters gets the treatment.

1

2

3

4

5

6

SELECT Convert(CHAR,GetDate() ,113)

SELECT system_type_name FROM

sys.dm_exec_describe_first_result_set(

'SELECT Convert(VARCHAR,GetDate() ,113)',

NULL, 1)

--varchar(30)

I’m not sure why anyone would want to use a VARCHAR without specifying its length. It may be a habit caught from procedural coding. Things are different for any RDBMS; you either ensure that strings are stored economically or just airily wave your hand and consign the string to a (MAX) length specification, with all the indexing and performance compromises that entails (these datatypes cannot be specified as index key columns).

Using an alias data type

If you need a default string length as part of your design, you can create an alias data type. This is much safer:

1

2

3

4

5

6

7

--create an alias data type

CREATE TYPE dbo.StringFROM VARCHAR(30) NOT NULL

GO

DECLARE @MyString String ='this ends well'

SELECT @MyString

------------------------------

--this ends well

However, the downside is that, for some reason, you can’t use it in either CAST or CONVERT operations on the grounds that it isn’t a defined system type.

1

2

3

4

5

6

SELECT Convert(dbo.String,GetDate() ,113)

SELECT Cast(GetDate() AS String)

/*Msg 243, Level 16, State 2, Line 40

Type dbo.String is not a defined system type.

Msg 243, Level 16, State 2, Line 41

Type String is not a defined system type.*/

Using a Table-Value Constructor to assign reasonable string lengths during data import

SQL Server can do much better than shrug and give up like this. When it needs to be, it can be very clever. It can, for example, work out the length of the datatype and its nullability in a Table-Value Constructor (TVC) statement like this.

1

2

3

4

5

6

7

8

9

10

11

12

13

14

15

16

17

18

19

20

21

22

SELECT name

FROM (VALUES --one to twenty

('Yan'), --in Lincolnshire dialect

('Tyan'),

('Tethera'),

('Methera'),

('Pimp'),

('Sethera'),

('Lethera'),

('Hovera'),

('Dovera'),

('Dik'),

('Yanadik'),

('Tyanadik'),

('Tetheradik'),

('Metheradik'),

('Bumfitt'),

('Yanabumfit'),

('Tyanabumfitt'),

('Tetherabumfitt'),

('Metherabumfitt'),

('Giggot'))f(name)

If you SELECT INTO a table using this TVC, you will be able to see that it has created a VARCHAR column with the length of the longest strings in the values within the column (in this cas, ‘Tetherabumfitt’ and ‘Metherabumfitt’; fourteen characters).

We can verify this easily by SELECT INTO with a temporary table…

1

2

3

4

5

6

SELECT name INTO #MyTemp

FROM (VALUES --one to twenty

('Yan'), --in Lincolnshire dialect

('Tyan'),

...etc ...

('Giggot'))f(name)

…and then checking on the width of the column created…

1

2

3

4

5

SELECT system_type_name FROM

sys.dm_exec_describe_first_result_set('SELECT * FROM #MyTemp',NULL,1)

/*system_type_name

varchar(14)

*/

This suggests that if you need to import from an external text-based source, which contains a tediously-wide table with a lot of strings, then the best way to ensure that you get reasonable-length string datatypes is to use a TVC.

The TVC has a limit of 1000 rows when used in an INSERT INTO…VALUES statement, and you’ll see Error 10738 if you exceed it. However, there is no limit that I can detect when using a TVC in a SELECT INTO statement that uses VALUES, like the one illustrated above.

Let’s try it. Here is a first import of a business directory, spoofed with SQL Data Generator of course. If you want to play along, you can download the build script and .sqlgen file from my Table does not have clustered index (BP021) article. To demonstrate this, we get the first thousand rows into the grid view of SSMS.

1

SELECT TOP 1000 * FROM bigdirectory

Using a Variable-length Datatype Without Explicit Length: The Whys and Wherefores (BP007/8) | Redgate (4)

Now, if you have SQL Prompt, you’ll be bobbing up and down in your chair whilst reading this, because you have a rather cute advantage here. Click the top left square of the grid to highlight the whole lot then right-click and select the Script as Insert option.

Using a Variable-length Datatype Without Explicit Length: The Whys and Wherefores (BP007/8) | Redgate (5)

SQL Prompt uses a TVC in an INSERT INTO <tablename> VALUES statement, which means its subject to the 1000-row limit. If there are more, then Prompt switches to using individual INSERT statements.

We’ll stick to 1000-rows in this case, and adapt this code slightly to use SELECT INTO with a temporary table, from a TVC, thereby escaping the 1000-row limit:

Using a Variable-length Datatype Without Explicit Length: The Whys and Wherefores (BP007/8) | Redgate (6)

…and so on until…

Using a Variable-length Datatype Without Explicit Length: The Whys and Wherefores (BP007/8) | Redgate (7)

Execute the above code, and let’s use the handy sys.dm_exec_describe_first_result_set DMV again, to get the metadata of the resultselt for a SELECT * FROM #temptable query:

1

2

3

4

SELECT name + ' ' + system_type_name

+ CASE WHEN is_nullable = 1 THEN ' NULL' ELSE ' NOT NULL' END

FROM <a id="post-468314-_Hlk535921789"></a>sys.dm_exec_describe_first_result_set('

SELECT * FROM #temptable', NULL, 1);

It’s specified the datatypes, lengths and nullability for each column, as follows (for this to work well, you’ll need some long strings and nulls in your sample rows):

1

2

3

4

5

6

7

8

9

10

11

12

13

14

id int NOT NULL

Name varchar(37) NOT NULL

Address1 varchar(34) NULL

Address2 varchar(23) NULL

Town varchar(22) NOT NULL

City varchar(19) NOT NULL

County varchar(18) NOT NULL

Postcode varchar(8) NOT NULL

Region varchar(22) NOT NULL

BusinessType varchar(55) NULL

Leads varchar(17) NULL

Phone varchar(12) NULL

Fax varchar(12) NULL

Website varchar(56) NULL

I’ve tested this up to 10,000 rows but others have done more. Once you have the right datatypes, length and nullability for all the columns, you can do a bit of a rounding up of the lengths to allow for outliers, create a good tidy table and then use it to import the entire data (four million rows in this experiment).

1

2

3

4

5

6

7

8

9

10

11

12

13

14

15

16

17

18

19

20

21

22

23

CREATE TABLE BusinessDirectory

(

id INT NOT NULL,

Name VARCHAR(40) NOT NULL,

Address1 VARCHAR(40) NULL,

Address2 VARCHAR(50) NULL,

Town VARCHAR(30) NOT NULL,

City VARCHAR(20) NOT NULL,

County VARCHAR(20) NOT NULL,

Postcode VARCHAR(8) NOT NULL,

Region VARCHAR(30) NOT NULL,

BusinessType VARCHAR(60) NULL,

Leads VARCHAR(20) NULL,

Phone VARCHAR(15) NULL,

Fax VARCHAR(15) NULL,

Website VARCHAR(60) NULL

);

INSERT INTO BusinessDirectory

(id, Name, Address1, Address2, Town, City, County, Postcode, Region,

BusinessType, Leads, Phone, Fax, Website)

SELECT id, Name, Address1, Address2, Town, City, County,

Postcode, Region, BusinessType, Leads, Phone, Fax, Website

FROM BigDirectory;

If you dislike this method you can, of course, get the maximum actual lengths easily from the original table in this case.

1

2

3

4

5

SELECT Max(Len(Name)), Max(Len(Address1)), Max(Len(Address2)), Max(Len(Town)),

Max(Len(City)), Max(Len(County)), Max(Len(Postcode)), Max(Len(Region)),

Max(Len(BusinessType)), Max(Len(Leads)), Max(Len(Phone)), Max(Len(Fax)),

Max(Len(Website))

FROM BigDirectory;

However, it is trickier to detect whether the columns are nullable.

When you have imports from external text-based sources, the TVC technique could be handy. The main point, however, is to show you that SQL Server can be very good at detecting the right length and nullability of a string datatype, if it wants to be.

Conclusions

SQL Server requires you to specify the lengths of string datatypes. You might assume that, because it doesn’t object when you leave out the length, it obligingly detects the length for you. No, not at all. If you declare a column as a CHAR, NCHAR, VARCHAR or NVARCHAR without a length, SQL Server reads that as a length of 1. This is such a silly length for any variable-length string that it amounts to unhelpful nonsense, but we are stuck with it.

If you define a variable string without a length, it will exact an even more horrible vengeance on you. Not only will it assume that it has a length of 1 character, it will also silently and discreetly reduce every value assigned to it to one character. SQL Server must have been in a silly mood when it decided on reducing a varchar to one character if you didn’t specify the length, because if you make the same mistake when casting a datatype to an NVARCHAR or VARCHAR, it produces one with a more sensible length of 30.

Always specify a length to any text-based datatype such as NVARCHAR or VARCHAR. Don’t over-use the MAX specification either as the resulting column then can’t be indexed and comes with performance baggage.

Was this article helpful?

Tools in this post

SQL Data Generator

Generate realistic test data for SQL Server databases

Find out more

SQL Prompt

Write, format, and refactor SQL effortlessly

Find out more

SQL Prompt

Write, format, and refactor SQL effortlessly

Find out more

  • Article

    Building reusable table build scripts using SQL Prompt

    You need a fast, general-purpose way to save the results of a query or batch or procedure into any sort of worktable, such as a temporary table or a table variable or table valued parameter. A simple SELECT…INTO isn't versatile enough for these requirements, and the alternative ways to handcraft the list of columns are slow and error prone. Phil Factor shows how to create a 'table-build generator' that will do all this, and save you a lot of time, especially if you use a lot of working tables in your code.

  • Article

    SQL Prompt ALT S Magic

    Use the SQL Prompt Command Palette to find and execute 'hidden' SQL Prompt functionality, plus any of the common SSMS commands, as well as search for database objects.

  • Article

    How to implement a T-SQL code formatting standard using SQL Prompt

    To what extent should a team attempt to impose a uniform SQL coding format? It can sometimes seem like a futile task. All programmers have a preferred formatting style, to which they profess an unwavering attachment. Check the code of ten programmers, though, and you will likely discover at least 15 different formats, since few

  • Article

    SQL Code Analysis from a PowerShell Deployment Script

    This article shows how to define SQL code analysis rules using SQL Prompt and them run them automatically from a PowerShell script, displaying the detected code issues in a handy HTML report.

  • Forums

    SQL Prompt Forum

    Write, format, and refactor SQL effortlessly.

Using a Variable-length Datatype Without Explicit Length: The Whys and Wherefores (BP007/8) | Redgate (2024)
Top Articles
Latest Posts
Article information

Author: Nicola Considine CPA

Last Updated:

Views: 5606

Rating: 4.9 / 5 (69 voted)

Reviews: 84% of readers found this page helpful

Author information

Name: Nicola Considine CPA

Birthday: 1993-02-26

Address: 3809 Clinton Inlet, East Aleisha, UT 46318-2392

Phone: +2681424145499

Job: Government Technician

Hobby: Calligraphy, Lego building, Worldbuilding, Shooting, Bird watching, Shopping, Cooking

Introduction: My name is Nicola Considine CPA, I am a determined, witty, powerful, brainy, open, smiling, proud person who loves writing and wants to share my knowledge and understanding with you.