1. Summary
As a data worker, SQL is the most commonly used data extraction & simple preprocessing language in daily work. Because of its widespread use and easy-to-learn degree, it has also been widely learned and used by other positions such as product managers and R & D. This article mainly combines classic interview questions to give SQL methods and practical practices for interviews through data development. The following questions are all from my experience and medium to high difficulty SQL questions shared online.
2. Problem solving ideas
Simple-we will examine some usages such as group by & limit, or functions that are not commonly used such as rand() class; we will involve some associations between tables
Moderate-the basic usage of some window functions will be examined; there will be associations between tables, but relatively tricky is that there will be some self-associations.
Difficulties-there will be median or more complex retrieval concepts, which may require generating columns according to certain requirements; generally, this type of problem construction will make the solution clearer
3. SQL real questions
the first question
order order table, the fields are: goods_id, amount ;
pv browsing table, the fields are: goods_id, uid;
Goods are sorted according to the total sales amount and are divided into top10,top10~top20, and the other three groups
Find the number of browsing users in each group of products (the same user in the same group can only be counted once)
create table if not exists test.nil_goods_category as
select goods_id
,case when nn<= 10 then 'top10'
when nn<= 20 then 'top10~top20'
else 'other' end as goods_group
from
(
select goods_id
,row_number() over(partition by goods_id order by sale_sum desc) as nn
from
(
select goods_id,sum(amount) as sale_sum
from order
group by 1
) aa
) bb;
select b.goods_group,count(distinct a.uid) as num
from pv a
left join test.nil_goods_category b
on a.goods_id = b.goods_id
group by 1;
the second question
Commodity activity table goods_event, g_id (may be repeated), t1 (start time),t2 (end time)
For a given time period (t3,t4), find the number of products that are active during the time period
1.
select count(distinct g_id) as event_goods_num
from goods_event
where (t1<=t4 and t1>=t3)
or (t2>=t3 and t2<=t4)
select count(distinct g_id) as event_goods_num
from goods_event
where (t1<=t4 and t1>=t3)
union all
the third question
Commodity activity schedule, table name is event, field: goods_id, time;
Find the most recent event time for the product that participated in the event
select a.goods_id,a.time
from event a
inner join
(
select goods_id,count(*)
from event
group by gooods_id
order by count(*) desc
limit 1
) b
on a.goods_id = b.goods_id
order by a.goods_id,a.time desc
the fourth question
The log data of the user's login defines the session, and the login of the same user within one hour is counted as a session;
Generating session column
drop table if exists koo.nil_temp0222_a2;
create table if not exists koo.nil_temp0222_a2 as
select *
,row_number() over(partition by userid order by inserttime) as nn1
from
(
select a.*
,b.inserttime as inserttime_aftr
,datediff(b.inserttime,a.inserttime) as session_diff
from
(
select userid,inserttime
,row_number() over(partition by userid order by inserttime asc) nn
from koo.nil_temp0222
where userid = 1900000169
) a
left join
(
select userid,inserttime
,row_number() over(partition by userid order by inserttime asc) nn
from koo.nil_temp0222
where userid = 1900000169
) b
on a.userid = b.userid and a.nn = b.nn-1
) aa
where session_diff >10 or nn = 1
order by userid,inserttime;
drop table if exists koo.nil_temp0222_a2_1;
create table if not exists koo.nil_temp0222_a2_1 as
select a.*
,case when b.nn is null then a.nn+3 else b.nn end as nn_end
from koo.nil_temp0222_a2 a
left join koo.nil_temp0222_a2 b
on a.userid = b.userid
and a.nn1 = b.nn1 - 1;
select a.*,b.nn1 as session_id
from
(
select userid,inserttime
,row_number() over(partition by userid order by inserttime asc) nn
from koo.nil_temp0222
where userid = 1900000169
) a
left join koo.nil_temp0222_a2_1 b
on a.userid = b.userid
and a.nn>=b.nn
and a.nn<b.nn_end
the fifth question
Order table, fields with order number and time;
Pick up the last three orders on the last day of the month
select *
from
(
select *
,rank() over(partition by mm order by dd desc) as nn1
,row_number() over(partition by mm,dd order by inserttime desc) as nn2
from
(select cast(right(to_date(inserttime),2) as int) as dd,month(inserttime) as mm,userid,inserttime
from koo.nil_temp0222) aa
) bb
where nn1 = 1 and nn2<=3;
the sixth problem
The database table Torres records the number of tourists visiting a certain attraction every day in July as follows:
id date visits 1 2017-07-01 100... Coincidentally, the id field exactly equals the number in the date.
Now please filter out dates that are greater than 100 days for three consecutive days.
The output of the above example is: date 2017-07-01...
select a.*,b.num as num2,c.num as num3
from table a
left join table b
on a.userid = b.userid
and a.dt = date_add(b.dt,-1)
left join table c
on a.userid = c.userid
and a.dt = date_add(c.dt,-2)
where b.num>100
and a.num>100
and c.num>100
Question 7
The existing Table A has 21 columns. The first column is id, and the rest are listed as characteristic fields. The column names range from d1 to d20, with a total of 10W pieces of data!
The other table B is called the pattern table, and has the same structure as table A, with a total of 5W pieces of data.
Please find the data whose characteristics in Table A match the pattern in Table B, and record the corresponding id
There are two situations that meet the requirements:
- When every feature column matches exactly
- At most one feature column does not match, and the other 19 feature columns all match exactly, but which column does not match is unknown
select aa.*
from
(
select *,concat(d1,d2,d3……d20) as mmd
from table
) aa
left join
(
select id,concat(d1,d2,d3……d20) as mmd
from table
) bb
on aa.id = bb.id
and aa.mmd = bb.mmd
select a.*,sum(d1_jp,d2_jp……,d20_jp) as same_judge
from
(
select a.*
,case when a.d1 = b.d1 then 1 else 0 end as d1_jp
,case when a.d2 = b.d2 then 1 else 0 end as d2_jp
,case when a.d3 = b.d3 then 1 else 0 end as d3_jp
,case when a.d4 = b.d4 then 1 else 0 end as d4_jp
,case when a.d5 = b.d5 then 1 else 0 end as d5_jp
,case when a.d6 = b.d6 then 1 else 0 end as d6_jp
,case when a.d7 = b.d7 then 1 else 0 end as d7_jp
,case when a.d8 = b.d8 then 1 else 0 end as d8_jp
,case when a.d9 = b.d9 then 1 else 0 end as d9_jp
,case when a.d10 = b.d10 then 1 else 0 end as d10_jp
,case when a.d20 = b.d20 then 1 else 0 end as d20_jp
,case when a.d11 = b.d11 then 1 else 0 end as d11_jp
,case when a.d12 = b.d12 then 1 else 0 end as d12_jp
,case when a.d13 = b.d13 then 1 else 0 end as d13_jp
,case when a.d14 = b.d14 then 1 else 0 end as d14_jp
,case when a.d15 = b.d15 then 1 else 0 end as d15_jp
,case when a.d16 = b.d16 then 1 else 0 end as d16_jp
,case when a.d17 = b.d17 then 1 else 0 end as d17_jp
,case when a.d18 = b.d18 then 1 else 0 end as d18_jp
,case when a.d19 = b.d19 then 1 else 0 end as d19_jp
from table a
left join table b
on a.id = b.id
) aa
where sum(d1_jp,d2_jp……,d20_jp) = 19
Question 8
We represent the user's rating of the product as a sparse vector and save it in the database table t:
- The fields of t are: uid, goods_id, star. uid is the user id
- goodsid is product id = star is the user's rating for the product, with values ranging from 1 to 5
Now we want to calculate the inner product between pairs of vectors. The semantics of the inner product here are:
For two different users, if they both score the same batch of products, then the scores for each person in the list are multiplied and the products are summed up.
For example, the database table contains the following data:
U0 g0 2
U0 g1 4
U1 g0 3
U1 g1 1
The calculated result is:
U0 U1 23+41=10 ……
select aa.uid1,aa.uid2
,sum(star_multi) as result
from
(
select a.uid as uid1
,b.uid as uid2
,a.goods_id
,a.star * b.star as star_multi
from t a
left join t b
on a.goods_id = b.goods_id
and a.udi<>b.uid
) aa
group by 1,2
select uid1,uid2,sum(multiply) as result
from
(select t.uid as uid1, t.uid as uid2, goods_id,a.star*star as multiply
from a left join b
on a.goods_id = goods_id
and a.uid<>uid) aa
group by goods
Question 9
Give a table of numbers and frequencies, and count the median of this pile
select a.*
,b.s_mid_n
,c.l_mid_n
,avg(b.s_mid_n,c.l_mid_n)
from
(
select
case when mod(count(*),2) = 0 then count(*)/2 else (count(*)+1)/2 end as s_mid
,case when mod(count(*),2) = 0 then count(*)/2+1 else (count(*)+1)/2 end as l_mid
from table
) a
left join
(
select id,num,row_number() over(partition by id order by num asc) nn
from table
) b
on a.s_mid = b.nn
left join
(
select id,num,row_number() over(partition by id order by num asc) nn
from table
) c
on a.l_mid = c.nn
Question 10
The order table has three fields, store ID, order time, and order amount
Check the stores that have sales every week for a month
select distinct credit_level
from
(
select credit_level,count(distinct nn) as number
from
(
select userid,credit_level,inserttime,month(inserttime) as mm
,weekofyear(inserttime) as week
,dense_rank() over(partition by credit_level,month(inserttime) order by weekofyear(inserttime) asc) as nn
from koo.nil_temp0222
where substring(inserttime,1,7) = '2019-12'
order by credit_level ,inserttime
) aa
group by 1
) bb
where number = (select count(distinct weekofyear(inserttime))
from koo.nil_temp0222
where substring(inserttime,1,7) = '2019-12')