\documentclass[11pt]{article}
\usepackage{hyperref}
\usepackage[margin=0.75in]{geometry}
\begin{document}
\title{LING 575K HW2}
\date{\vspace{-0.2in}Due 11PM on Apr 15, 2021}
\maketitle
In this assignment, you will answer some written questions about word2vec and then implement it, specifically the method \emph{skip-gram with negative sampling (SGNS)}. By doing so, you will:
\begin{itemize}
\item Count parameters
\item Take derivatives of a loss
\item Translate mathematics into implemented code
\item Train your own set of word vectors and briefly analyze them
\end{itemize}
We \textbf{strongly recommend} doing this assignment in order. Your answers in the written portion will make your implementation much easier, especially for the gradient computations.
\section{Understanding Word2Vec [30 pts]}
\noindent {\bf Q1: Parameters [3 pts]} How many parameters are there in the SGNS model? Write your answer in terms of $|V|$ (the size of the vocabulary $V$) and $d_e$ (the embedding dimension). [Hint: one parameter is \emph{a single real number}.]
\vspace{2em}
\noindent {\bf Q2: Sigmoid [7 pts]} Sigmoid is the logistic curve $\sigma(x) = \frac{1}{1+e^{-x}}$.
\begin{itemize}
\item What is the range of $\sigma(x)$? [1 pt]
\item How is it used in the SGNS model? [2 pts]
\item Compute $\frac{d\sigma}{dx}$; show your work. [Hint: write your final answer in terms of $\sigma(x)$.] [5 pts]
\end{itemize}
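
\noindent One way to sanity-check a derivative computed by hand is to compare it against a finite-difference estimate. The following is a minimal sketch, not part of the provided starter code; the helper names \texttt{sigmoid} and \texttt{numerical\_derivative} and the step size $h$ are assumptions made for illustration. It is demonstrated on $f(t) = t^2$, whose derivative is known to be $2t$.
\begin{verbatim}
```python
import math


def sigmoid(x: float) -> float:
    """Logistic function, written to avoid overflow for large |x|."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)  # x < 0 here, so exp(x) cannot overflow
    return z / (1.0 + z)


def numerical_derivative(f, x: float, h: float = 1e-5) -> float:
    """Central-difference estimate of f'(x); h is an illustrative step size."""
    return (f(x + h) - f(x - h)) / (2.0 * h)


# Demonstration on a function with a known derivative: d/dt t^2 = 2t.
estimate = numerical_derivative(lambda t: t ** 2, 3.0)
print(abs(estimate - 6.0) < 1e-6)
```
\end{verbatim}
Evaluating \texttt{numerical\_derivative(sigmoid, x)} at a few values of $x$ and comparing the results with your closed-form answer should show agreement to several decimal places.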
\vspace{2em}
\noindent {\bf Q3: Loss function's gradients [20 pts]} In the slides for lecture 3, we saw that the total loss for one positive example and $k$ negative examples is given by:
$$ L_{CE} = -\log P(1 | w, c_+) - \sum_{i=1}^k \log P(0 | w, c_{-i})$$
In what follows, where $x$ is a vector and $f$ a function of $x$ and possibly more variables, we will define $\nabla_x f := \langle \frac{\partial f}{\partial x_1} , \frac{\partial f}{\partial x_2}, \dots , \frac{\partial f}{\partial x_n} \rangle$.
\begin{itemize}
\item Rewrite this loss in terms of the parameter matrices $E$ and $C$ (i.e. replace the $P(\cdot)$s with the definition of the model). [2 pts]
Use $w$ as the integer index of the target word, $c_+$ as the integer index of the positive context word, and $c_{-i}$ as the integer index of the $i$th negative sampled context word.
\item Using the chain rule, compute $\frac{d}{dx} \left[ -\log\sigma(x) \right]$. [Hint: first, show that $\sigma(x) = \frac{e^x}{e^x+1}$. Note: $\log$ here is the natural logarithm, i.e. logarithm with base $e$.] [4 pts]
\item Show that $\nabla_x (x \cdot y) = y$ (where $x \cdot y$ is the dot product of two vectors). [2 pts]
\item Compute $\nabla_{C_{c_+}} L_{CE}$. [4 pts]
\item Compute $\nabla_{C_{c_{-i}}} L_{CE}$. [4 pts]
\item Compute $\nabla_{E_w} L_{CE}$. [4 pts]
\end{itemize}
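
\noindent Gradients with respect to a vector can be checked numerically by perturbing one coordinate at a time. The sketch below is an illustration only and assumes plain Python lists as vectors; the helpers \texttt{dot} and \texttt{numerical\_gradient} are hypothetical names, not part of the starter code. It checks the identity $\nabla_x (x \cdot y) = y$ stated above on arbitrary example values.
\begin{verbatim}
```python
from typing import Callable, List

Vector = List[float]


def dot(x: Vector, y: Vector) -> float:
    """Dot product of two equal-length vectors."""
    return sum(a * b for a, b in zip(x, y))


def numerical_gradient(
    f: Callable[[Vector], float], x: Vector, h: float = 1e-5
) -> Vector:
    """Central-difference estimate of the gradient of f at x."""
    grad = []
    for i in range(len(x)):
        plus, minus = list(x), list(x)
        plus[i] += h   # perturb only coordinate i upward
        minus[i] -= h  # and downward
        grad.append((f(plus) - f(minus)) / (2.0 * h))
    return grad


# Arbitrary example vectors, chosen only for illustration.
x = [0.5, -1.0, 2.0]
y = [3.0, 0.25, -4.0]
approx = numerical_gradient(lambda v: dot(v, y), x)
max_error = max(abs(a - b) for a, b in zip(approx, y))
print(max_error < 1e-6)
```
\end{verbatim}
The same helper can be pointed at your own implementation of the loss to compare against the gradients you derive by hand, treating one parameter vector at a time as the variable.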
\section{Implementing Word2Vec [45 pts]}
Before getting started, a few notes on the implementation:
\begin{itemize}
\item Always start with small data! To test various components of the pipeline, you can use the toy files in \texttt{/dropbox/20-21/575k/data/}.
\item All files referenced here are in \texttt{/dropbox/20-21/575k/hw2} on patas.
\item The main training loop is at the bottom of \texttt{word2vec.py}. You do not have to touch this, but can read it to see how the various components you implement are being used.
\item This implementation uses a \texttt{Vocabulary} class, as implemented in HW1. We will make a reference implementation available on Monday morning (after the late submission deadline); until then, you can use your own by placing \texttt{vocabulary.py} in the same directory as your copy of the files for this assignment.
\end{itemize}
\vspace{2em}
\noindent {\bf Q1: Data generation [10 pts]} In \texttt{data.py}:
\begin{itemize}
\item Implement \texttt{get\_positive\_samples}, which generates positive examples from a list of tokens. [7 pts]
\item Implement \texttt{negative\_samples}, which samples negative context words. [Hint: \texttt{random.choices} is your friend.] [3 pts]
\end{itemize}
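
\noindent For reference, \texttt{random.choices} draws with replacement from a population, optionally according to relative weights. The snippet below is a small illustration of that standard-library call only; the toy word list and weights are made up, and it does not assume anything about the signature or sampling distribution expected in \texttt{data.py}.
\begin{verbatim}
```python
import random
from collections import Counter

random.seed(0)  # fixed seed so repeated runs draw the same samples

# Hypothetical toy vocabulary and relative weights, for illustration only.
words = ["the", "cat", "sat", "mat"]
weights = [5, 2, 2, 1]  # relative weights; they need not sum to 1

# k draws with replacement, returned as a list.
samples = random.choices(words, weights=weights, k=10)
print(samples)

# With many draws, empirical counts roughly follow the weights.
counts = Counter(random.choices(words, weights=weights, k=10_000))
print(counts.most_common())
```
\end{verbatim}
Because sampling is with replacement, the same word can appear more than once in a single call.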
\vspace{2em}
\noindent {\bf Q2: Model computation [10 pts]} In \texttt{word2vec.py}:
\begin{itemize}
\item Implement \texttt{SGNS.forward}. This represents one ``forward pass'' of the skip-gram with negative sampling model, i.e. this computes $P(1 | w, c)$. Note: use \texttt{self.embeddings} and \texttt{self.context\_embeddings}, which are defined in \texttt{\_\_init\_\_}.
\end{itemize}
\vspace{2em}
\noindent {\bf Q3: Gradient computation [15 pts]} In \texttt{word2vec.py}, implement the following methods:
\begin{itemize}
\item \texttt{get\_positive\_context\_gradient}: this computes $\nabla_{C_{c_+}} L_{CE}$.
\item \texttt{get\_negative\_context\_gradients}: this computes the list of $\nabla_{C_{c_{-i}}} L_{CE}$ for each negative context word $c_{-i}$.
\item \texttt{get\_target\_word\_gradient}: this computes $\nabla_{E_w} L_{CE}$.
\end{itemize}
\vspace{2em}
\noindent {\bf Q4: Train word vectors [10 pts]} Run the main training loop by calling \texttt{word2vec.py} with the following command-line arguments (defined in \texttt{util.py}):
\begin{itemize}
\item 15 epochs
\item Save vectors to a file called \texttt{vectors.tsv}
\item Embedding dimension: 15
\item Learning rate: 0.2
\item Minimum frequency: 5
\item Number of negative samples: 15
\end{itemize}
After that, run \texttt{python analysis.py --save\_vectors vectors.tsv --save\_plot vectors.png}. This will load your saved vectors and produce a plot of the vectors for a selected set of words (after using PCA to reduce their dimensionality to 2). In your readme file, please include:
\begin{itemize}
\item The total run-time of your training loop. This will be printed by the main script.
\item The generated plot.
\item A description, in 2--3 sentences, of any trends that you see in these embeddings.
\end{itemize}
\vspace{2em}
\noindent {\bf Testing your code} In the dropbox folder for this assignment, we have included a file \texttt{test\_all.py} with a few very simple unit tests for the methods that you need to implement. You can verify that your code passes the tests by running \texttt{pytest} from your code's directory, with the course's conda environment activated.
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\section*{Submission Instructions}
In your submission, include the following:
\begin{itemize}
\item readme.(txt$\mid$pdf) that includes your answers to \S1 as well as Q4 of \S2.
\item \texttt{hw2.tar.gz} containing:
\begin{itemize}
\item \texttt{run\_hw2.sh}
\item \texttt{word2vec.py}
\item \texttt{data.py}
\end{itemize}
\end{itemize}
\end{document}