AlphaStar Revisited: Upgrading the State of the Art

Following the Meta

The more I play, the more I feel that keeping up with the meta is essential.

A visualisation of the AlphaStar agent during game two of the match against MaNa. This shows the game from the agent’s point of view: the raw observation input to the neural network, the neural network’s internal activations, some of the considered actions the agent can take such as where to click and what to build, and the predicted outcome. MaNa’s view of the game is also shown, although this is not accessible to the agent.
The game AlphaStar lost. Watching it, I found areas where the agent could still be improved.

It is impressive, but is it how we play the game?

Seiok Kim, DRL Researcher
How PySC2 defines a player's actions.
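In PySC2 an action is a function call: a function identifier plus its arguments. Screen-targeted commands such as Move, Patrol and Attack take a `queued` flag and an (x, y) screen point. The sketch below models that shape in plain Python so it runs without PySC2 installed. The `Action` class, `screen_command` helper and the 84x84 screen size are hypothetical stand-ins, not PySC2's own `FunctionCall` type.

```python
from dataclasses import dataclass
from typing import Optional, Tuple

SCREEN_SIZE = 84  # assumed screen resolution; PySC2 lets you configure this

@dataclass(frozen=True)
class Action:
    """Hypothetical stand-in for a PySC2 function call: a function name plus arguments."""
    function: str
    queued: bool = False                      # PySC2's "now" vs. "queued" argument
    target: Optional[Tuple[int, int]] = None  # screen (x, y), only for *_screen commands

def screen_command(function: str, x: int, y: int, queued: bool = False) -> Action:
    """Build a screen-targeted command, rejecting points that fall off the screen."""
    if not (0 <= x < SCREEN_SIZE and 0 <= y < SCREEN_SIZE):
        raise ValueError(f"({x}, {y}) is outside the {SCREEN_SIZE}x{SCREEN_SIZE} screen")
    return Action(function, queued, (x, y))

# The three command families that the Move2Beacon figures colour-code.
move = screen_command("Move_screen", 40, 22)
patrol = screen_command("Patrol_screen", 10, 70, queued=True)
attack = screen_command("Attack_screen", 63, 5)
select = Action("select_army")  # selection needs no screen point
```

In PySC2 itself, the equivalent move command is built with `actions.FUNCTIONS.Move_screen("now", (x, y))`, which validates the arguments against the function's declared argument types.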
Before training: the agent taking random actions in the Move2Beacon environment. Green annotations mark normal actions, blue annotations mark Patrol commands, and red annotations mark Attack commands.
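Before training, the policy behaves much like a uniform random agent. At each step it picks one of the available actions and, for screen commands, a random point. The sketch below assumes a hypothetical `available` list of action names and an 84x84 screen. The colour labels follow the figure's annotation scheme.

```python
import random
from typing import List, Optional, Tuple

SCREEN_SIZE = 84  # assumed screen resolution

# Colour used in the figure for each command type; anything else is a normal (green) action.
COLOURS = {"Patrol_screen": "blue", "Attack_screen": "red"}

def random_step(available: List[str], rng: random.Random) -> Tuple[str, Optional[Tuple[int, int]], str]:
    """Pick a uniformly random available action and, if screen-targeted, a random point."""
    name = rng.choice(available)
    point = None
    if name.endswith("_screen"):
        point = (rng.randrange(SCREEN_SIZE), rng.randrange(SCREEN_SIZE))
    return name, point, COLOURS.get(name, "green")

rng = random.Random(0)  # fixed seed so the rollout is reproducible
available = ["no_op", "select_army", "Move_screen", "Patrol_screen", "Attack_screen"]
rollout = [random_step(available, rng) for _ in range(10)]
```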
After training: the converged result. There is a short delay before the agent first selects its unit, but it then clicks exactly on the Beacon's location.
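The converged behaviour (select the unit, then click the beacon) matches what a hand-written policy does on this map. PySC2's scripted MoveToBeacon agent finds the beacon in the `player_relative` feature layer and moves to the centre of its pixels. The sketch below reproduces that idea in plain Python. It relies on PySC2's convention that neutral units (the beacon) have value 3 in `player_relative`. The function names are hypothetical, and this is a scripted baseline, not the trained agent.

```python
from typing import List, Optional, Tuple

PLAYER_NEUTRAL = 3  # value PySC2 uses for neutral units (the beacon) in player_relative

def beacon_center(player_relative: List[List[int]]) -> Optional[Tuple[int, int]]:
    """Return the rounded (x, y) centre of all neutral pixels, or None if there are none."""
    xs, ys = [], []
    for y, row in enumerate(player_relative):
        for x, value in enumerate(row):
            if value == PLAYER_NEUTRAL:
                xs.append(x)
                ys.append(y)
    if not xs:
        return None
    return round(sum(xs) / len(xs)), round(sum(ys) / len(ys))

def scripted_step(player_relative: List[List[int]], unit_selected: bool):
    """Select the army first, then move to the beacon centre."""
    if not unit_selected:
        return ("select_army", None)
    target = beacon_center(player_relative)
    if target is None:
        return ("no_op", None)
    return ("Move_screen", target)
```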

Stolen BATTLECRUISER? - Starcraft 2: Gumiho vs. Dark

Some of the highlights and how we learn new tactics

So how do we come up with new strategies? One of the most prominent behaviors is imitation.

Implementing Reaper Control and Imitation Learning

Below is the result of implementing a Reaper-control agent. Could it be used to train a stronger agent?
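One way to use the bot is behaviour cloning. Record (observation, action) pairs from the scripted Reaper bot, then fit a policy that maximises the likelihood of the bot's actions. When the observation space is small and discretised, the maximum-likelihood policy is just the empirical action distribution per state. The sketch below uses hypothetical discretised states such as `("enemy_close", "low_hp")` and made-up action labels. A real agent would replace the table with a neural network trained with a cross-entropy loss.

```python
from collections import Counter, defaultdict
from typing import Dict, Hashable, Iterable, Tuple

def fit_behaviour_cloning(demos: Iterable[Tuple[Hashable, str]]) -> Dict[Hashable, Dict[str, float]]:
    """Tabular behaviour cloning: maximum-likelihood pi(a | s) = count(s, a) / count(s)."""
    counts = defaultdict(Counter)
    for state, action in demos:
        counts[state][action] += 1
    policy = {}
    for state, action_counts in counts.items():
        total = sum(action_counts.values())
        policy[state] = {a: c / total for a, c in action_counts.items()}
    return policy

def act(policy: Dict[Hashable, Dict[str, float]], state: Hashable, default: str = "no_op") -> str:
    """Greedy action from the cloned policy; fall back to a default for unseen states."""
    if state not in policy:
        return default
    return max(policy[state], key=policy[state].get)

# Hypothetical demonstrations logged from the scripted Reaper bot.
demos = [
    (("enemy_close", "low_hp"), "retreat"),
    (("enemy_close", "low_hp"), "retreat"),
    (("enemy_close", "low_hp"), "attack"),
    (("enemy_close", "full_hp"), "attack"),
    (("no_enemy", "full_hp"), "scout"),
]
policy = fit_behaviour_cloning(demos)
```

A cloned policy can at best match the demonstrator. To get a stronger agent, it would typically serve as the starting point for reinforcement learning, which is how AlphaStar was initialised from human replays.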

The bot implementation is based on the following:

Basic Build Order

  • When the first SCV pops, build a supply depot.
  • Barracks, then Refinery (both at 16 supply).
  • Do not stop producing workers while you add a 2nd and 3rd Barracks as soon as possible.
  • Right after your 3rd Barracks, build a 2nd Refinery.
  • When the 1st Barracks finishes, start a Reaper and build a supply depot.
  • Once you have 3 Reapers (including ones in production), upgrade to an Orbital Command.
  • Keep building Reapers until you have a healthy count (6-9); if you reach 400 minerals while doing so, take your natural expansion.
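For a bot, the build order above can be encoded as rules evaluated against the game state every step. The sketch below uses a hypothetical `State` snapshot. A real bot would read these counts from the game observation, and it would also check resource costs, which are left out here apart from the 400-mineral trigger. The thresholds come straight from the list. `REAPER_TARGET = 6` is an assumption taken from the low end of "6-9".

```python
from dataclasses import dataclass
from typing import List

REAPER_TARGET = 6  # assumed: lower end of the "healthy count (6-9)"

@dataclass
class State:
    supply: int           # supply currently used
    minerals: int
    depots: int           # supply depots, started or finished
    barracks: int         # barracks, started or finished
    barracks_done: int    # finished barracks
    refineries: int
    reapers: int          # including ones in production
    orbital: bool = False
    natural_taken: bool = False

def next_orders(s: State) -> List[str]:
    """Return every build-order item whose trigger fires in this state."""
    orders = ["SCV"]  # never stop producing workers
    if s.supply >= 13 and s.depots == 0:
        orders.append("SupplyDepot")      # first SCV pops -> supply depot
    if s.supply >= 16 and s.barracks == 0:
        orders.append("Barracks")         # Barracks on 16
    if s.supply >= 16 and s.refineries == 0:
        orders.append("Refinery")         # Refinery on 16
    if 1 <= s.barracks < 3:
        orders.append("Barracks")         # 2nd and 3rd Barracks asap
    if s.barracks >= 3 and s.refineries == 1:
        orders.append("Refinery")         # 2nd gas right after the 3rd Barracks
    if s.barracks_done >= 1:
        if s.depots == 1:
            orders.append("SupplyDepot")  # depot alongside the first Reaper
        if s.reapers < REAPER_TARGET:
            orders.append("Reaper")
    if s.reapers >= 3 and not s.orbital:
        orders.append("OrbitalCommand")
    if s.barracks_done >= 1 and s.minerals >= 400 and not s.natural_taken:
        orders.append("CommandCenter")    # take the natural at 400 minerals
    return orders
```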
Taking on the Insane Level AI with Reaper control.
So what is the tactic here? In chess, AlphaZero first outperformed Stockfish after just 4 hours; in shogi, AlphaZero first outperformed Elmo after 2 hours; and in Go, AlphaZero first outperformed the version of AlphaGo that beat the legendary player Lee Sedol in 2016 after 30 hours. Note: each training step represents 4,096 board positions.
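The note converts training steps into positions: each step covers 4,096 board positions, so the total grows linearly with the number of steps. The small helper below makes that conversion explicit. The step counts in the example are illustrative, not figures from the AlphaZero runs.

```python
POSITIONS_PER_STEP = 4096  # from the note: one training step = 4,096 board positions

def positions_seen(steps: int) -> int:
    """Total board positions processed after `steps` training steps."""
    return steps * POSITIONS_PER_STEP

print(f"{positions_seen(1_000):,}")    # 4,096,000
print(f"{positions_seen(100_000):,}")  # 409,600,000
```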

AlphaStar Resources

SC2 replay files from the matches between AlphaStar and Team Liquid’s Grzegorz “MaNa” Komincz, and AlphaStar and Team Liquid’s Dario “TLO” Wunsch.

Camera interface

  • MaNa v AlphaStar (24 January 2019): Exhibition game

Raw interface

  • TLO v AlphaStar (12 December 2018): Game 1, Game 2, Game 3, Game 4, Game 5
  • MaNa v AlphaStar (19 December 2018): Game 1, Game 2, Game 3, Game 4, Game 5

Please note that the raw interface agents weren’t using the camera directly. The 10 replays have therefore been post-processed to add heuristic camera movements, such that the target location of each agent action is visible on screen. This is to make the replays easier to follow from the agent’s perspective.
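The camera post-processing can be approximated by a simple rule. Leave the camera where it is while an action's target is inside the visible window, and recentre on the target when it falls outside, clamped so the view stays on the map. This is an assumption about what "heuristic camera movements" means, because the exact heuristic is not described here. The `Camera` class, view half-sizes and map size below are hypothetical.

```python
from dataclasses import dataclass
from typing import List, Tuple

Point = Tuple[float, float]

@dataclass(frozen=True)
class Camera:
    center: Point
    half_w: float = 12.0  # hypothetical half-width of the view, in map cells
    half_h: float = 8.0   # hypothetical half-height of the view, in map cells

    def sees(self, p: Point) -> bool:
        return abs(p[0] - self.center[0]) <= self.half_w and abs(p[1] - self.center[1]) <= self.half_h

def clamp(v: float, lo: float, hi: float) -> float:
    return max(lo, min(hi, v))

def add_camera_track(targets: List[Point], start: Point, map_size: Tuple[int, int]) -> List[Point]:
    """Return one camera centre per action so that every action target is on screen."""
    cam = Camera(start)
    track = []
    for t in targets:
        if not cam.sees(t):
            # Recentre on the target, keeping the whole view inside the map bounds.
            cx = clamp(t[0], cam.half_w, map_size[0] - cam.half_w)
            cy = clamp(t[1], cam.half_h, map_size[1] - cam.half_h)
            cam = Camera((cx, cy), cam.half_w, cam.half_h)
        track.append(cam.center)
    return track
```

Moving the camera only when needed keeps the replay calmer than recentring on every action, while still showing each target.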

To load the replays:

  • Install StarCraft II. It is free to play and runs on Windows and Mac.
  • Create the StarCraft Maps directory:
      • Windows: C:\Program Files (x86)\StarCraft II\Maps
      • Mac: /Applications/StarCraft II/Maps
  • Download the map.
  • Move it into the Maps directory.
  • Download the replays above, and open them as usual.
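The directory and move steps can be scripted. The sketch below picks the default Maps path for the current platform, using the two paths listed in the steps, creates the directory if needed, and moves a downloaded map file into it. It assumes StarCraft II is in its default install location. `install_map` and any map filename you pass it are hypothetical.

```python
import platform
import shutil
from pathlib import Path
from typing import Optional

# Default StarCraft II Maps directories from the steps above.
DEFAULT_MAPS_DIRS = {
    "Windows": r"C:\Program Files (x86)\StarCraft II\Maps",
    "Darwin": "/Applications/StarCraft II/Maps",  # platform.system() reports macOS as "Darwin"
}

def default_maps_dir(system: Optional[str] = None) -> str:
    """Return the default Maps directory for the given (or current) OS."""
    system = system or platform.system()
    if system not in DEFAULT_MAPS_DIRS:
        raise RuntimeError(f"no default StarCraft II Maps directory known for {system!r}")
    return DEFAULT_MAPS_DIRS[system]

def install_map(map_file: str, maps_dir: Optional[str] = None) -> Path:
    """Create the Maps directory if needed and move the downloaded map into it."""
    target_dir = Path(maps_dir or default_maps_dir())
    target_dir.mkdir(parents=True, exist_ok=True)
    destination = target_dir / Path(map_file).name
    shutil.move(str(map_file), str(destination))
    return destination
```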

More information on AlphaStar and the matches played between MaNa and TLO can be found on the DeepMind blog.