Showing posts with label tsql. Show all posts
Showing posts with label tsql. Show all posts

Thursday, February 23, 2017

Remove duplicate data from query results

Leave a Comment

I have the following query:

 select      C.ClientID,     C.FirstName + ' ' + C.LastName as ClientName,     CAST(V.StartDate as date) as VisitDate,     count(*) as 'Number of Visits'  from     Visit V  Inner Join Client C on     V.ClientID = C.ClientID  group by      C.ClientID,     C.FirstName + ' ' + C.LastName,     CAST(V.StartDate as date)  having     count(*) > 3  order by     C.ClientID,      CAST(V.StartDate as date) 

which gives the following results (names are fake in case anyone is wondering)

 ClientID   ClientName            VisitDate      Number of Visits  75         Kay Taylor            2016-06-07     4  372         Moses Mcgowan       2016-09-03      4  422         Raven Mckay         2016-03-11      4  422         Raven Mckay         2016-06-14      4  679         Ulysses Booker      2016-01-09      4  696         Timon Turner        2016-07-06      4  1063        Quyn Wall           2016-06-25      4  1142        Garth Moran         2016-11-20      4  1142        Garth Moran         2016-11-21      4  1563        Hedley Gutierrez    2016-01-07      4  1563        Hedley Gutierrez    2016-01-17      4  1563        Hedley Gutierrez    2016-01-21      4  1563        Hedley Gutierrez    2016-01-27      4  1563        Hedley Gutierrez    2016-01-28      4  1563        Hedley Gutierrez    2016-01-30      4  1563        Hedley Gutierrez    2016-02-27      4  1563        Hedley Gutierrez    2016-03-26      4  1563        Hedley Gutierrez    2016-04-06      4  1563        Hedley Gutierrez    2016-04-09      4  1563        Hedley Gutierrez    2016-04-22      4  1563        Hedley Gutierrez    2016-05-06      4  1563        Hedley Gutierrez    2016-05-26      4  1563        Hedley Gutierrez    2016-06-02      4  1563        Hedley Gutierrez    2016-07-14      4  1563        Hedley Gutierrez    2016-07-29      4  1563        Hedley Gutierrez    2016-08-09      7  1563        Hedley Gutierrez    2016-09-01      4  1563        Hedley Gutierrez    2016-09-23      4  1563        Hedley Gutierrez    2016-12-07      4  1636        Kiara Lowery        2016-01-12      4  2917        Cynthia Carr        2016-06-21      4  2917        Cynthia Carr        2016-10-21      4  3219        Alan Monroe         2016-01-02      4  3219        Alan Monroe         016-02-27       4  3219        Alan Monroe         2016-09-01      5  4288        Natalie Mitchell    2016-03-19      4 

How can I get the results to show only the ClientID and ClientName once so the results are like this?

 ClientID   ClientName            VisitDate      Number of Visits  75         Kay Taylor            2016-06-07     4  372         Moses Mcgowan       2016-09-03      4  422         Raven Mckay         2016-03-11      4                                  2016-06-14      4  679         Ulysses Booker      2016-01-09      4  696         Timon Turner        2016-07-06      4  1063        Quyn Wall           2016-06-25      4  1142        Garth Moran         2016-11-20      4                                  2016-11-21      4  1563        Hedley Gutierrez    2016-01-07      4                                  2016-01-17      4                                  2016-01-21      4                                  2016-01-27      4                                  2016-01-28      4                                  2016-01-30      4                                  2016-02-27      4                                  2016-03-26      4                                  2016-04-06      4                                  2016-04-09      4                                  2016-04-22      4                                  2016-05-06      4                                  2016-05-26      4                                  2016-06-02      4                                  2016-07-14      4                                  2016-07-29      4                                  2016-08-09      7                                  2016-09-01      4                                  2016-09-23      4                                  2016-12-07      4  1636        Kiara Lowery        2016-01-12      4  2917        Cynthia Carr        2016-06-21      4                                  2016-10-21      4  3219        Alan Monroe         2016-01-02      4  3219                            016-02-27       4                                  2016-09-01      5  4288        Natalie Mitchell    2016-03-19      4 

6 Answers

Answers 1

Actually, what you want is not to remove duplicates, but not display them.

In order to do this you can use a CASE statement with ROW_NUMBER() and show the value on the 1st row and display either NULL or '' on the ELSE branch (the other rows):

select     CASE        WHEN ROW_NUMBER() OVER (PARTITION BY C.ClientID ORDER BY CAST(V.StartDate as date) ASC) = 1             THEN C.ClientID        ELSE NULL    END as ClientID,    CASE         WHEN ROW_NUMBER() OVER (PARTITION BY C.ClientID ORDER BY CAST(V.StartDate as date) ASC) = 1             THEN C.FirstName + ' ' + C.LastName        ELSE NULL     END as ClientName,    CAST(V.StartDate as date) as VisitDate,    count(*) as 'Number of Visits' from    Visit V Inner Join Client C on    V.ClientID = C.ClientID group by     C.ClientID,    C.FirstName + ' ' + C.LastName,    CAST(V.StartDate as date) having    count(*) > 3 order by    C.ClientID,     CAST(V.StartDate as date) 

Answers 2

Try this:

DECLARE @Table TABLE (ClientId NVARCHAR(5), ClientName NVARCHAR(6), VisitDate DATE, NumOfVisits INT)  INSERT INTO @Table VALUES ('75' , 'A_NAME' , '2016-06-07' , '4' ),('372' , 'B_NAME' , '2016-09-03' , '4' ),   ('422' , 'C_NAME' , '2016-03-11' , '4' ),('500' , 'D_NAME' , '2016-03-15' , '4'),   ('500' , 'D_NAME' , '2016-03-19' , '4' ),('500' , 'D_NAME' , '2016-03-20' , '4'),   ('500' , 'D_NAME' , '2016-07-15' , '4' ),('500' , 'D_NAME' , '2016-09-13' , '4'),   ('600' , 'E_NAME' , '2016-03-19' , '4' ),('600' , 'E_NAME' , '2016-03-20' , '4'),   ('600' , 'E_NAME' , '2016-07-15' , '4' ),('600' , 'E_NAME' , '2016-09-13' , '4')  ;WITH A AS ( SELECT ROW_NUMBER() OVER(PARTITION BY ClientID ORDER BY ClientID) row_id,* FROM (                       ----------------------------------------- SELECT * FROM @Table --- replace this line with your query----                      -----------------------------------------   ) Main_Result ) SELECT ISNULL(BB.ClientID,'')ClientID,ISNULL(BB.ClientName,'')ClientName,AA.VisitDate,AA.NumOfVisits FROM A AA LEFT JOIN (SELECT * FROM A BB WHERE BB.row_id=1) BB ON AA.ClientID = BB.ClientID AND AA.row_id =BB.row_id              ORDER BY CONVERT(INT,AA.ClientID) 

Hope this helps. :)

you can execute this directly to get sample result from sample data. :)

Answers 3

have you tried using distinct in your query: https://www.w3schools.com/sql/sql_distinct.asp

Answers 4

You can solved this with GROUP BY, grouping by (ClientID, VisitDate).

See the response 1097 here: Using group by on multiple columns

Note: In your ORDER BY is not necessary to use CAST(V.StartDate as date) you can use VisitDate because it exists in your SELECT: ... CAST(V.StartDate as date) as VisitDate,

EDIT: Try this:

SELECT       C.ClientID,     C.FirstName + ' ' + C.LastName as ClientName,     CAST(V.StartDate as date) as VisitDate,     count(*) as 'Number of Visits'  from     Visit V  Inner Join Client C on     V.ClientID = C.ClientID  group by      (C.ClientID, VisitDate)  having     count(*) > 3  order by     C.ClientID,      VisitDate 

Answers 5

I hope the below query would do that.... :)

WITH CTE AS  (  select top 100 percent     cast(C.ClientID as nvarchar(255)) as ClientID,     C.FirstName + ' ' + C.LastName as ClientName,     CAST(V.StartDate as date) as VisitDate,     count(*) as 'Number of Visits',     row_number() over (partition by C.ClientID,C.FirstName + ' ' + C.LastName  ORDER BY CAST(V.StartDate as date) ) as rw_num  from     Visit V  Inner Join Client C on     V.ClientID = C.ClientID  group by      C.ClientID,     C.FirstName + ' ' + C.LastName,     CAST(V.StartDate as date)  having     count(*) > 3  order by     min(C.ClientID),      min(CAST(V.StartDate as date))  )  select case         when rw_num<>1 then '' else  ClientID end as ClientID,        case        when rw_num<>1 then '' else  ClientName end as ClientName,        VisitDate, [Number of Visits] from CTE 

Result:

enter image description here

My test Data in my test tables:

enter image description here

Answers 6

I would use your initial query as a CTE or as subquery to replace #TMP_DATA. Here is what I would do:

SELECT CASE WHEN CAST( LAG(T.ClientID,1,'') OVER (PARTITION BY T.ClientID ORDER BY T.ClientID,T.VisitDate) AS VARCHAR ) = T.ClientID THEN '' ELSE CAST(T.ClientID AS VARCHAR) END AS ClientID, CASE WHEN LAG(T.ClientName,1,'') OVER (PARTITION BY T.ClientID ORDER BY T.ClientID,T.VisitDate) = T.ClientName THEN '' ELSE T.ClientName END ClientName, T.VisitDate, T.[Number of Visits] FROM #TMP_DATA AS T

The result set is: enter image description here

Read More

Monday, February 6, 2017

Import error using Openrowset

Leave a Comment

I am trying to import a column with numeric and alphanumeric values but when I run the openrowset procedure it imports the numeric rows properly but for alphanumeric values it defaults to null.

Table A

ID,A,B,C 1,12,hh,i 2,ab12,tt,b 3,2,aa,o 4,bc12,ee,l 

Code used

SELECT      *  FROM      OPENROWSET         (             'Microsoft.ACE.OLEDB.12.0','Text;Database=C:\;IMEX=1;','SELECT * FROM abc.csv'         ) t 

I used IMEX =1 and no change at all.

2 Answers

Answers 1

The problem cause is the Oledb provider

When importing csv file or excel files with mixed data types column it will replace non dominant types by null. (Using Oledb or Ace.Oledb)

Workarounds

You can do some workaround by adding a first row that contain string values then removing it after impirting is finish

ID,A,B,C 0,a,a,a 1,12,hh,i 2,ab12,tt,b 3,2,aa,o 4,bc12,ee,l 

And it will fix the issue if using IMEX=1

This will read columns as string and ID column as number. (0 is used)

Or add HDR=NO property to connection string so the header is the first line imported (all its values are strings)

Read more about mixed data types in this article

Other methods

Or try to achieve this without aceoledb provider just import csv file in other way like the following:

Using Microsoft Text Driver

SELECT * FROM OPENROWSET('MSDASQL', 'Driver={Microsoft Text Driver (*.txt; *.csv)}; DefaultDir=C:\;', 'SELECT * FROM abc.csv') 

Using BULK INSERT

CREATE TABLE dbo.MyTable  (ID INTEGER,  A VARCHAR(50),  B VARCHAR(50),  C VARCHAR(50) )  BULK INSERT dbo.MyTable FROM 'C:\abc.csv' WITH    (     FIELDTERMINATOR = ',',      ROWTERMINATOR = '\n'    ) 

Other Details in these articles:

Answers 2

The below example should work. Note that On my system I couldn't use the provider that you use (because I haven't installed this in a while and figuring out what to install is driving me crazy.) The key piece: You need a format file to tell SQL server the expected values in each column. Here's a good blog post using csv in your queries I can't paste the XMl as the version may vary depending on your database:

The easiest way to create the xml file is to:

1) Create a table in the database. You won't be inserting to this. It will just allow you to have the next command create the table:

CREATE TABLE dbo.TestInsert  (ID INTEGER,  A VARCHAR(10),  B VARCHAR(10),  C VARCHAR(10) ) 

2) Next, run the following command line bcp program to generate the appropriate file:

bcp YourDatabase.dbo.TestInsert format nul -c -x -fimport.xml -t, -T 

3) Finally, use import.xml as your format file.

SELECT    *  FROM     OPENROWSET     (         BULK 'c:\test.csv',          FORMATFILE='c:\import.xml',         FIRSTROW=2     ) t 
Read More

Tuesday, June 21, 2016

FileTable External Process Like Zip using T-SQL

Leave a Comment

Has anyone ever tried to use tsql to launch external processes against files in a FileTable? I have not been able to find anything so if may not be possible.

In particular I am looking into PGP and ZIP operations. My backup plan is to use C# in combination with queries against the filetable.

I am curious to know if it can be done with T-SQL instead. I have looked at xpcmdshell to launch a process but many people recommend against this.

Thoughts and ideas are much appreciated.

1 Answers

Answers 1

Can set folder to compressed (or on EFS)?

See http://blog.brucejackson.info/2013/04/sql-file-table-step-by-step.html

Read More

Monday, April 18, 2016

SQL Server - conditional aggregation with correlation

Leave a Comment

Background:

The original case was very simple. Calculate running total per user from highest revenue to lowest:

CREATE TABLE t(Customer INTEGER  NOT NULL PRIMARY KEY                ,"User"   VARCHAR(5) NOT NULL               ,Revenue  INTEGER  NOT NULL);  INSERT INTO t(Customer,"User",Revenue) VALUES (001,'James',500),(002,'James',750),(003,'James',450), (004,'Sarah',100),(005,'Sarah',500),(006,'Sarah',150), (007,'Sarah',600),(008,'James',150),(009,'James',100); 

Query:

SELECT *,     1.0 * Revenue/SUM(Revenue) OVER(PARTITION BY "User") AS percentage,     1.0 * SUM(Revenue) OVER(PARTITION BY "User" ORDER BY Revenue DESC)          /SUM(Revenue) OVER(PARTITION BY "User") AS running_percentage FROM t; 

LiveDemo

Output:

╔════╦═══════╦═════════╦════════════╦════════════════════╗ ║ ID ║ User  ║ Revenue ║ percentage ║ running_percentage ║ ╠════╬═══════╬═════════╬════════════╬════════════════════╣ ║  2 ║ James ║     750 ║ 0.38       ║ 0.38               ║ ║  1 ║ James ║     500 ║ 0.26       ║ 0.64               ║ ║  3 ║ James ║     450 ║ 0.23       ║ 0.87               ║ ║  8 ║ James ║     150 ║ 0.08       ║ 0.95               ║ ║  9 ║ James ║     100 ║ 0.05       ║ 1                  ║ ║  7 ║ Sarah ║     600 ║ 0.44       ║ 0.44               ║ ║  5 ║ Sarah ║     500 ║ 0.37       ║ 0.81               ║ ║  6 ║ Sarah ║     150 ║ 0.11       ║ 0.93               ║ ║  4 ║ Sarah ║     100 ║ 0.07       ║ 1                  ║ ╚════╩═══════╩═════════╩════════════╩════════════════════╝ 

It could be calculated differently using specific windowed functions.


Now let's assume that we cannot use windowed SUM and rewrite it:

SELECT c.Customer, c."User", c."Revenue"     ,1.0 * Revenue / NULLIF(c3.s,0) AS percentage     ,1.0 * c2.s    / NULLIF(c3.s,0) AS running_percentage FROM t c CROSS APPLY         (SELECT SUM(Revenue) AS s         FROM t c2         WHERE c."User" = c2."User"             AND c2.Revenue >= c.Revenue) AS c2 CROSS APPLY         (SELECT SUM(Revenue) AS s         FROM t c2         WHERE c."User" = c2."User") AS c3 ORDER BY "User", Revenue DESC; 

LiveDemo

I have used CROSS APPLY because I do not like correlated subqueries in SELECT colums list and c3 is used twice.

Everything work as it should. But when we look closer c2 and c3 are very similiar. So why not combine them and use simple conditional aggregation:

SELECT c.Customer, c."User", c."Revenue"     ,1.0 * Revenue        / NULLIF(c2.sum_total,0) AS percentage     ,1.0 * c2.sum_running / NULLIF(c2.sum_total,0) AS running_percentage FROM t c CROSS APPLY         (SELECT SUM(Revenue) AS sum_total,                 SUM(CASE WHEN c2.Revenue >= c.Revenue THEN Revenue ELSE 0 END)                  AS sum_running         FROM t c2         WHERE c."User" = c2."User") AS c2 ORDER BY "User", Revenue DESC; 

Unfortunately it is not possible.

Multiple columns are specified in an aggregated expression containing an outer reference. If an expression being aggregated contains an outer reference, then that outer reference must be the only column referenced in the expression.

Of course I could circumvent it wrapping with another subquery, but it becomes a bit "ugly":

SELECT c.Customer, c."User", c."Revenue"     ,1.0 * Revenue        / NULLIF(c2.sum_total,0) AS percentage     ,1.0 * c2.sum_running / NULLIF(c2.sum_total,0) AS running_percentage FROM t c CROSS APPLY (   SELECT SUM(Revenue) AS sum_total,            SUM(running_revenue) AS sum_running      FROM (SELECT Revenue,                   CASE WHEN c2.Revenue >= c.Revenue THEN Revenue ELSE 0 END                    AS running_revenue            FROM t c2            WHERE c."User" = c2."User") AS sub ) AS c2 ORDER BY "User", Revenue DESC 

LiveDemo


Postgresql version. The only difference is LATERAL instead of CROSS APPLY.

SELECT c.Customer, c."User", c.Revenue     ,1.0 * Revenue        / NULLIF(c2.sum_total,0) AS percentage      ,1.0 * c2.running_sum / NULLIF(c2.sum_total,0) AS running_percentage  FROM t c ,LATERAL (SELECT SUM(Revenue) AS sum_total,                  SUM(CASE WHEN c2.Revenue >= c.Revenue THEN c2.Revenue ELSE 0 END)                   AS running_sum         FROM t c2         WHERE c."User" = c2."User") c2 ORDER BY "User", Revenue DESC; 

SqlFiddleDemo

It works very nice.


SQLite/MySQL version (that is why I preferLATERAL/CROSS APPLY):

SELECT c.Customer, c."User", c.Revenue,     1.0 * Revenue / (SELECT SUM(Revenue)                      FROM t c2                      WHERE c."User" = c2."User") AS percentage,     1.0 * (SELECT SUM(CASE WHEN c2.Revenue >= c.Revenue THEN c2.Revenue ELSE 0 END)            FROM t c2           WHERE c."User" = c2."User")  /            (SELECT SUM(c2.Revenue)            FROM t c2            WHERE c."User" = c2."User") AS running_percentage FROM t c ORDER BY "User", Revenue DESC; 

SQLFiddleDemo-SQLite SQLFiddleDemo-MySQL


I've read Aggregates with an Outer Reference:

The source for the restriction is in the SQL-92 standard, and SQL Server inherited it from the Sybase codebase. The problem is that SQL Server needs to figure out which query will compute the aggregate.

I do not search for answers that only show how to circumvent it.

The questions are:

  1. Which part of standard disallow or interfere with it?
  2. Why other RDBMSes do not have problem with this kind of outer dependency?
  3. Do they extend SQL Standard and SQL Server behaves as it should or SQL Server does not implement it fully(correctly?)?.

I would be very grateful for references to:

  • ISO standard (92 or newer)
  • SQL Server Standards Support
  • official documenation from any RDBMS that explains it (SQL Server/Postgresql/Oracle/...).

EDIT:

I know that SQL-92 does not have concept of LATERAL. But version with subqueries (like in SQLite/MySQL) does not work too.

LiveDemo

EDIT 2:

To simplify it a bit, let's check only correlated subquery only:

SELECT c.Customer, c."User", c.Revenue,        1.0*(SELECT SUM(CASE WHEN c2.Revenue >= c.Revenue THEN c2.Revenue ELSE 0 END)               FROM t c2               WHERE c."User" = c2."User")         / (SELECT SUM(c2.Revenue)           FROM t c2           WHERE c."User" = c2."User") AS running_percentage FROM t c ORDER BY "User", Revenue DESC; 

The version above works fine in MySQL/SQLite/Postgresql.

In SQL Server we get error. After wraping it with subquery to "flatten" it to one level it works:

SELECT c.Customer, c."User", c.Revenue,       1.0 * (               SELECT SUM(CASE WHEN r1 >= r2 THEN r1 ELSE 0 END)               FROM (SELECT c2.Revenue AS r1, c.Revenue r2                     FROM t c2                     WHERE c."User" = c2."User") AS S)  /               (SELECT SUM(c2.Revenue)               FROM t c2               WHERE c."User" = c2."User") AS running_percentage FROM t c ORDER BY "User", Revenue DESC; 

The point of this question is how does SQL standard regulate it.

LiveDemo

2 Answers

Answers 1

There is an easier solution:

SELECT c.Customer, c."User", c."Revenue",        1.0 * Revenue/ NULLIF(c2.sum_total, 0) AS percentage,        1.0 * c2.sum_running / NULLIF(c2.sum_total, 0) AS running_percentage FROM t c CROSS APPLY      (SELECT SUM(c2.Revenue) AS sum_total,              SUM(CASE WHEN c2.Revenue >= x.Revenue THEN c2.Revenue ELSE 0 END)                   as sum_running       FROM t c2 CROSS JOIN            (SELECT c.REVENUE) x       WHERE c."User" = c2."User"      ) c2 ORDER BY "User", Revenue DESC; 

I am not sure why or if this limitation is in the SQL '92 standard. I did have it pretty well memorized 20 or so years ago, but I don't recall that particular limitation.

I should note:

  • At the time of the SQL 92 standard, lateral joins were not really on the radar. Sybase definitely had no such concept.
  • Other databases do have problems with outer references. In particular, they often limit the scoping to one level deep.
  • The SQL Standard itself tends to highly political (that is, vendor-driven) rather than driven by actual database user requirements. Well, over time, it does move in the right direction.

Answers 2

There is no such limitation in the SQL standard for LATERAL. CROSS APPLY is a vendor-specific extension from Microsoft (Oracle adopted it later for compatibility) and its limitations are obviously not owed to the ANSI SQL standard, since the MS feature pre-dates the standard.

LATERAL according to ANSI SQL is basically just a modifier for joins to allow lateral references in the join tree. There is no limit to the number of columns that can be referenced.

I wouldn't see a reason for the odd restriction to begin with. Maybe it's because CROSS APPLY was originally intended to allow table-valued functions, which was later extended to allow sub-SELECTs.

The Postgres manual explains LATERAL like this:

The LATERAL key word can precede a sub-SELECT FROM item. This allows the sub-SELECT to refer to columns of FROM items that appear before it in the FROM list. (Without LATERAL, each sub-SELECT is evaluated independently and so cannot cross-reference any other FROM item.)

The Postgres version of your query (without the more elegant window functions) can be simpler:

SELECT c.*      , round(revenue        / c2.sum_total, 2) END AS percentage       , round(c2.running_sum / c2.sum_total, 2) END AS running_percentage  FROM   t c, LATERAL (    SELECT NULLIF(SUM(revenue), 0)::numeric AS sum_total  -- NULLIF, cast once         , SUM(revenue) FILTER (WHERE revenue >= c.revenue) AS running_sum    FROM   t    WHERE  "User" = c."User"    ) c2 ORDER  BY c."User", c.revenue DESC; 
  • Postgres 9.4+ has the more elegant aggregate FILTER for conditional aggregates.

  • NULLIF was redundant. revenue is defined NOT NULL, the aggregates are guaranteed to find 1 or more rows, and the LATERAL sub-SELECT is joined in a CROSS JOIN, so sum_total cannot be NULL. That was backwards, I thought about COALESCE. NULLIF makes sense, I only suggest a minor simplification.

  • Cast sum_total to numeric once.

  • Round result to match your desired result.

Read More

Sunday, March 6, 2016

Sql geometry polygon direction and crossing the International Date Line (IDL)

Leave a Comment

As you can see in the following, the result of the geography polygon is different from the geometry polygon. What I want is the yellow square, but what I'm getting is the green square. Is there any way to achieve this?

If I create a geography polygon it returns the expected area but with curves, which I don't want. Why is the result of my geometry polygon different from my geography polygon? Is it possible to create a geometry polygon that crosses the dateline?

I know that geography polygons should be anticlockwise. I tried both clockwise and anticlockwise direction for my geometry polygon and the result was the same. How can I set the direction of the geometry polygon?

declare @maxLat  varchar(50);  declare @minLong varchar(50); declare @minLat  varchar(50); declare @maxLong varchar(50);  set @maxLat ='-17.041470974676777' set @minLong ='107.1076781691894' set @minLat ='-41.6281068235708' set @maxLong='-169.2204468308106'   declare @boundingRectGeography varchar(1000) SET @boundingRectGeography = 'POLYGON((' +   @minLong + ' '  + @minLat + ', ' +                                         @maxLong + ' ' + @minLat + ', ' +                                          @maxLong + ' ' + @maxLat + ', ' +                                          @minLong + ' ' + @maxLat + ', ' +                                          @minLong + ' ' + @minLat + '))'  DECLARE @BoundsGeography AS Geography =GEOGRAPHY::STPolyFromText(@boundingRectGeography,4326)  select @BoundsGeography     DECLARE @boundingRectGeometry varchar(1000) SET @boundingRectGeometry = 'POLYGON((' +   @minLong + ' '  + @minLat + ', ' +                                         @maxLong + ' ' + @minLat + ', ' +                                          @maxLong + ' ' + @maxLat + ', ' +                                          @minLong + ' ' + @maxLat + ', ' +                                          @minLong + ' ' + @minLat + '))'  DECLARE @BoundsGeometry AS geometry =geometry::STPolyFromText(@boundingRectGeometry,4326)  select @BoundsGeometry 

map

enter image description here


To give you some background, I have list of landmarks (latitude, longitude) that I want to load on Google Maps. Since there are too many landmarks, I cannot return all of them at once. I need to return the landmarks that are in the areas that are visible to the user, in their viewing boundary.

I'm getting north/west (maximum latitude, mininmum longitude) and south/east (minimum latitude, maximum longitude) of a Google Maps boundary and sending it to my stored procedure to return back the list of the landmarks within that boundary. However, as I explained above I have issues and I need your help.

@dotMorten kindly looked into my issue and suggested to add +360 to my maxLong. I added the following longitude before creating my polygon but it's not working in all scenarios like this following set of latitudes and longitudes:

set @maxLat ='69.00668202899128' set @minLong ='158.5892594884939' set @minLat ='-17.38989745726571' set @maxLong='133.2767594884939'  set @maxLong=convert(VARCHAR(20), convert(float,@maxLong) +360 ) 

enter image description here

enter image description here

1 Answers

Answers 1

To Begin

The difference between a geometry column and a geography column, is that a geometry column is applied to a plane (flat), while a geography is applied to a spheroid (curved).

To explain how these are different: Imagine walking forward 6,225 miles, turning left, walking forward 6,225 miles, turning left, and walking forward 6,225 miles.

On a plane, you have just made a U shape. |_| On the earth, you are back where you started (or relatively close).

If you start at the north pole, walk to the equator, turn left, walk along the equator, turn left, and now are walking north, you will end up back at the north pole.

This is why the determining the distance between 2 GPS coordinates includes a bunch of math, even though distance seems like a fairly simple thing.

Projections

There are a lot of different maps of the earth. There is the one you usually see, there are the one that that are curved around the edges, and there are the ones that have weird shapes. This is because we are trying to draw the surface of a ball on a piece of paper.

Imagine pealing off the surface of a globe and trying to lie it flat. If you want it to be a rectangle, you are going to have take the top and bottom parts, and stretch them, because there just isn't enough material. This is the concept behind the Mercator projection.

Relevant XKCD

Curved Results

When you map results in SQL Server, the default projection is equirectangular. The equirectangular projection draws the earth so that latitude and longitude vary evenly throughout the map, i.e. longitude = x and latitude = y.

When viewing the entire planet, this causes some distortion. Things that are further away from the equator are more affected than things closer, and things further away from (0,0) are affected. Australia fits both of these, and as a result, its shape is distorted.

When you draw a box in that area, SQL Server will, by default, distort the shape. Different projection systems will distort the shapes you see differently. On the right side of the spatial results tab in Management Studio, there will be three drop down boxes; the last drop down box allows you to change your projection. Try flipping between the different options to see how they change.

For Your Question

A rectangular box on a map seems like a simple request, but to translate that into a shape on the surface of a sphere is actually quite complicated for all the same reasons listed above, but in reverse. On a sphere, the shape you are describing is equally distorted. This means you have two options:

  1. Accept the distortion (curviness) of your rectangle.
  2. Describe your rectangle using many points. The more points you use, the closer to a rectangle your shape will look.

    Example: Simple Geography

vs.

Complex Geography

Direction of Points

Once again, your issue here is that you are on a sphere.
For a geography data type, a polygon is any part of the surface of the sphere. This means that it can be the entire sphere except a tiny area, or it can be that tiny area. Direction lets you decide which shape you are describing.

For a geometry data type this does not apply. While longitude=181 is the same as longitude=-179, x=181 is NOT the same as x=-179. This means that for the shape you declare to actually be a shape (rather than an infinite plane with a tiny hole), SQL Server will automatically assume you wanted the square described by those points instead, i.e. the big box. For a geometry data type to create the shape you see in geography, a multipolygon would be necessary.

However, this is simply due to the geography datum. In a geometry data type, the plane extends forever, but in a geography data type it wraps around. This means that if you used a different location for (0,0), the shape would appear whole, rather than in two pieces.

Hopefully this has cleared some stuff up.

Read More