sql經典面試10題講解

sql經典面試10題講解

以下題目均來與筆者經歷&網上分享的中高難度sql題。

最后更新 2022/5/2 下午6:23
俊欣 关于数据分析与可视化
预计阅读 10 分钟
分类
資料庫
标签
資料庫

一、提要

作為一名數據工作人員,sql 是日常工作中最常用的數據提取&簡單預處理語言。因為其使用的廣泛性和易學程度也被其他崗位比如產品經理、研發廣泛學習使用,本篇文章主要結合經典面試題,給出通過數據開發面試的 sql 方法與實戰。以下題目均來於筆者經歷&網上分享的中高難度 sql 題。

二、解題思路

  • 簡單--會考察一些 group by & limit 之類的用法,或者平時用的不多的函數比如 rand()類;會涉及到一些表之間的關聯

  • 中等--會考察一些窗口函數的基本用法;會有表之間的關聯,相對 tricky 的地方在於會有一些自關聯的使用

  • 困難--會有中位數或者更加複雜的取數概念,可能要求按照某特定要求生成列;一般這種題建中間表會解得清晰些

三、sql 真題

第一題

  • order 訂單表,欄位為:goods_id, amount ;

  • pv 瀏覽表,欄位為:goods_id,uid;

  • goods 按照總銷售金額排序,分成 top10,top10~top20,其他三組

求每組商品的瀏覽用戶數(同組內同一用戶只能算一次)

create table if not exists test.nil_goods_category as
select goods_id
,case when nn<= 10 then 'top10'
      when nn<= 20 then 'top10~top20'
      else 'other' end as goods_group
from
(
    select goods_id
    ,row_number() over(partition by goods_id order by sale_sum desc) as nn
    from
    (
        select goods_id,sum(amount) as sale_sum
        from order
        group by 1
    ) aa
) bb;
select b.goods_group,count(distinct a.uid) as num
from pv a
left join test.nil_goods_category b
on a.goods_id = b.goods_id
group by 1;

第二題

商品活動表 goods_event,g_id(有可能重複),t1(開始時間),t2(結束時間)

給定時間段(t3,t4),求在時間段內做活動的商品數

1.

select count(distinct g_id) as event_goods_num
from goods_event
where (t1<=t4 and t1>=t3)
or (t2>=t3 and t2<=t4)
select count(distinct g_id) as event_goods_num
from goods_event
where (t1<=t4 and t1>=t3)
union all

第三題

商品活動流水錶,表名為 event,欄位:goods_id, time;

求參加活動次數最多的商品的最近一次參加活動的時間

select a.goods_id,a.time
from event a
inner join
(
    select goods_id,count(*)
    from event
    group by gooods_id
    order by count(*) desc
    limit 1
) b
on a.goods_id = b.goods_id
order by a.goods_id,a.time desc

第四題

用戶登錄的 log 數據,劃定 session,同一個用戶一個小時之內的登錄算一個 session;

生成 session 列

drop table if exists koo.nil_temp0222_a2;
create table if not exists koo.nil_temp0222_a2 as
select *
    ,row_number() over(partition by userid order by inserttime) as nn1
from
(
    select a.*
    ,b.inserttime as inserttime_aftr
    ,datediff(b.inserttime,a.inserttime) as session_diff
  from
  (
    select userid,inserttime
      ,row_number() over(partition by userid order by inserttime asc) nn
    from koo.nil_temp0222
    where userid = 1900000169
  ) a
  left join
  (
     select userid,inserttime
      ,row_number() over(partition by userid order by inserttime asc) nn
    from koo.nil_temp0222
    where userid = 1900000169
  ) b
  on a.userid =  b.userid and a.nn = b.nn-1
) aa
where session_diff >10 or nn = 1
order by userid,inserttime;

drop table if exists koo.nil_temp0222_a2_1;
create table if not exists koo.nil_temp0222_a2_1 as
select a.*
,case when b.nn is null then a.nn+3 else b.nn end as nn_end
from koo.nil_temp0222_a2 a
left join koo.nil_temp0222_a2 b
on a.userid = b.userid
and a.nn1 = b.nn1 - 1;

select a.*,b.nn1 as session_id
from
(
  select userid,inserttime
    ,row_number() over(partition by userid order by inserttime asc) nn
  from koo.nil_temp0222
  where userid = 1900000169
) a
left join koo.nil_temp0222_a2_1 b
on a.userid = b.userid
and a.nn>=b.nn
and a.nn<b.nn_end

第五題

訂單表,欄位有訂單編號和時間;

取每月最後一天的最後三筆訂單

select *
from
(
  select *
  ,rank() over(partition by mm order by dd desc) as nn1
  ,row_number() over(partition by mm,dd order by inserttime desc) as nn2
  from
  (select cast(right(to_date(inserttime),2) as int) as dd,month(inserttime) as mm,userid,inserttime
  from koo.nil_temp0222) aa
) bb
where nn1 = 1 and nn2<=3;

第六題

資料庫表 tourists,記錄了某個景點 7 月份每天來訪遊客的數量如下:

id date visits 1 2017-07-01 100 …… 非常巧,id 欄位剛好等於日期裡面的幾號。

現在請篩選出連續三天都有大於 100 天的日期。

上面例子的輸出為:date 2017-07-01 ……

select a.*,b.num as num2,c.num as num3
from table  a
left join table b
on a.userid = b.userid
and a.dt = date_add(b.dt,-1)
left join table c
on a.userid = c.userid
and a.dt = date_add(c.dt,-2)
where b.num>100
and a.num>100
and c.num>100

第七題

現有 a 表,有 21 個列,第一列 id,剩餘列為特徵欄位,列名從 d1-d20,共 10w 條數據!

另外一個表 b 稱為模式表,和 a 表結構一樣,共 5w 條數據

請找到 a 表中的特徵符合 b 表中模式的數據,並記錄下相對應的 id

有兩種情況滿足要求:

  • 每個特徵列都完全匹配的情況下
  • 最多有一個特徵列不匹配,其他 19 個特徵列都完全匹配,但哪個列不匹配未知
select aa.*
from
(
  select *,concat(d1,d2,d3……d20) as mmd
  from table
) aa
left join
(
  select id,concat(d1,d2,d3……d20) as mmd
  from table
) bb
on aa.id = bb.id
and aa.mmd = bb.mmd
select a.*,sum(d1_jp,d2_jp……,d20_jp) as same_judge
from
(
  select a.*
  ,case when a.d1 = b.d1 then 1 else 0 end as d1_jp
  ,case when a.d2 = b.d2 then 1 else 0 end as d2_jp
  ,case when a.d3 = b.d3 then 1 else 0 end as d3_jp
  ,case when a.d4 = b.d4 then 1 else 0 end as d4_jp
  ,case when a.d5 = b.d5 then 1 else 0 end as d5_jp
  ,case when a.d6 = b.d6 then 1 else 0 end as d6_jp
  ,case when a.d7 = b.d7 then 1 else 0 end as d7_jp
  ,case when a.d8 = b.d8 then 1 else 0 end as d8_jp
  ,case when a.d9 = b.d9 then 1 else 0 end as d9_jp
  ,case when a.d10 = b.d10 then 1 else 0 end as d10_jp
  ,case when a.d20 = b.d20 then 1 else 0 end as d20_jp
  ,case when a.d11 = b.d11 then 1 else 0 end as d11_jp
  ,case when a.d12 = b.d12 then 1 else 0 end as d12_jp
  ,case when a.d13 = b.d13 then 1 else 0 end as d13_jp
  ,case when a.d14 = b.d14 then 1 else 0 end as d14_jp
  ,case when a.d15 = b.d15 then 1 else 0 end as d15_jp
  ,case when a.d16 = b.d16 then 1 else 0 end as d16_jp
  ,case when a.d17 = b.d17 then 1 else 0 end as d17_jp
  ,case when a.d18 = b.d18 then 1 else 0 end as d18_jp
  ,case when a.d19 = b.d19 then 1 else 0 end as d19_jp
  from table a
  left join table b
  on a.id = b.id
) aa
where sum(d1_jp,d2_jp……,d20_jp) = 19

第八題

我們把用戶對商品的評分用稀疏向量表示,保存在資料庫表 t 裡面:

  • t 的欄位有:uid,goods_id,star。uid 是用戶 id
  • goodsid 是商品 id = star 是用戶對該商品的評分,值為 1-5

現在我們想要計算向量兩兩之間的內積,內積在這裡的語義為:

對於兩個不同的用戶,如果他們都對同樣的一批商品打了分,那麼對於這裡面的每個人的分數乘起來,並對這些乘積求和。

例子,資料庫表里有以下的數據:

U0 g0 2
U0 g1 4
U1 g0 3
U1 g1 1

計算後的結果為:

U0 U1 23+41=10 ……

select aa.uid1,aa.uid2
,sum(star_multi) as result
from
(
  select a.uid as uid1
  ,b.uid as uid2
  ,a.goods_id
  ,a.star * b.star as star_multi
  from t a
  left join t b
  on a.goods_id = b.goods_id
  and a.udi<>b.uid
) aa
group by 1,2
select uid1,uid2,sum(multiply) as result
from
(select t.uid as uid1, t.uid as uid2, goods_id,a.star*star as multiply
from a left join b
on a.goods_id = goods_id
and a.uid<>uid) aa
group by goods

第九題

給出一堆數和頻數的表格,統計這一堆數中位數

select a.*
,b.s_mid_n
,c.l_mid_n
,avg(b.s_mid_n,c.l_mid_n)
from
(
  select
  case when mod(count(*),2) = 0 then count(*)/2 else (count(*)+1)/2 end as s_mid
  ,case when mod(count(*),2) = 0 then count(*)/2+1 else (count(*)+1)/2 end as l_mid
  from table
) a
left join
(
  select id,num,row_number() over(partition by id order by num asc) nn
  from table
) b
on a.s_mid = b.nn
left join
(
  select id,num,row_number() over(partition by id order by num asc) nn
  from table
) c
on a.l_mid = c.nn

第十題

表 order 有三個欄位,店鋪 id,訂單時間,訂單金額

查詢一個月內每周都有銷量的店鋪

select distinct credit_level
from
(
  select credit_level,count(distinct nn) as number
  from
  (
    select userid,credit_level,inserttime,month(inserttime) as mm
    ,weekofyear(inserttime) as week
    ,dense_rank() over(partition by credit_level,month(inserttime) order by weekofyear(inserttime) asc) as nn
    from koo.nil_temp0222
    where substring(inserttime,1,7) = '2019-12'
    order by credit_level ,inserttime
  ) aa
  group by 1
) bb
where number = (select count(distinct weekofyear(inserttime))
from koo.nil_temp0222
where substring(inserttime,1,7) = '2019-12')
Keep Exploring

延伸阅读

更多文章