Query StackOverflow with SQL
Query StackOverflow with SQL and StackExchange Data Explorer.
Thanks to my friend Glenn, for the tip that StackExchange has a UI where you can query all SE sites like StackOverflow, Ask Ubuntu, and more (about 100 sites).
- Create an account with StackExchange Data Explorer here using OpenID.
- Click on
Compose Query
to see the UI where to enter a query. It shows the database schema on the right. - Enter a name and description (optional) for the query
- Compose the SQL query and click
Run Query
Database Schema
It’s not very intuitive to understand the schema, even as a StackOverflow user. I assumed that all answers belonged to the Comments table.
This the structure of Posts:
- All questions and answers are in the Posts table
- These are identified using
PostTypeId
(see tablePostTypes
)
- These are identified using
- In StackOverflow, the comments that you see under questions or answers are in the Comments table
It’s easier to understand the schema by looking at the ERD and querying individual tables.
Find all PostTypes
SELECT * FROM PostTypes
Questions have a PostTypeId
of 1
and answers are 2
.
Find all Tags
There are a lot of tags
SELECT COUNT(Tags.Id) FROM Tags
Find top 20 Tags with most count
SELECT TOP 20 Tags.Name FROM Tags ORDER BY Tags.Count DESC
The UI notifies if you made a SQL or schema mistake. The above query shows Invalid column name 'Name'
. Correct to:
SELECT TOP 20 Tags.TagName, Tags.Count FROM Tags ORDER BY Tags.Count DESC
These are the first 5 rows:
javascript | 2470759
python | 2097773
java | 1885215
c# | 1579042
php | 1454705
What are badges?
SELECT DISTINCT Badges.Name, COUNT(Badges.Name) AS Total FROM Badges GROUP BY Badges.Name ORDER BY Total DESC
If the queries start to get longer, then use your preferred best practice for indentation.
SELECT
DISTINCT TOP 10 Badges.Name
, COUNT(Badges.Name) AS Total
FROM
Badges
GROUP BY
Badges.Name
ORDER BY
Total DESC
These are the first 3 results:
Popular Question | 6988567
Notable Question | 3460033
Editor | 3292748
Find the top 10 users by most reputation
SELECT
TOP 10 Users.Id AS [User Link]
, Users.Reputation
FROM
Users
ORDER BY
Users.Reputation DESC
As seen in the documentation help here, you can add special links called “magic columns” to access content on the site, for example, [Post Link]
, [User Link]
, [Comment Link]
. Use the Id
attribute of the corresponding table.
The top 3 results are:
User | Reputation
Jon Skeet | 1382399
Gordon Linoff | 1224614
VonC | 1185445
Find users with the most down votes…sorry :(
SELECT TOP 10 Users.Id as [User Link], Users.DownVotes FROM Users ORDER BY Users.DownVotes DESC
Find the top 10 users by most reputation and location
SELECT
TOP 10 Users.Id AS [User Link]
, Users.Reputation
FROM
Users
WHERE
Users.Location LIKE ##Location:string##
ORDER BY
Users.Reputation DESC
-- Location: Enter a city or country
This query uses SQL syntax specific to this UI as seen in the docs here. Enable user input with ##Variable##
, optionally add the data type int
, float
, or string
. Create a comment with the variable name with more information (field label) about the variable. This comment can be added at the end of the query.
For Location enter Miami
. These are top 3 results.
User | Reputation
Eduardo... | 13402
tik27 | 2698
codeSF | 1152
Join Tables
Find users with the most answers by location
SELECT
TOP 10 Users.Id AS [User Link]
, Users.AboutMe AS About
, COUNT(Posts.Id) AS Answer_Count
FROM
Users
JOIN Posts ON Posts.OwnerUserId = Users.Id
WHERE
Posts.PostTypeId = 2
AND Users.Location LIKE ##Location:string##
GROUP BY
Users.Id
ORDER BY
Answer_Count DESC
-- Location: Enter city, state, or country
Find users by location and tag
Run my query directly in Data Explorer: Find users by location and tag in StackOverflow
More queries
Go to my profile in StackExchange Data Explorer to see the list of queries I made.