\documentclass[11pt]{article}
\usepackage{hyperref}
\usepackage[margin=0.75in]{geometry}

\begin{document}

\title{LING 575K HW2}
\date{\vspace{-0.2in}Due 11PM on Apr 14, 2022}
\maketitle


In this assignment, you will answer some written questions about and then implement word2vec; in particular, the method \emph{skip-gram with negative sampling (SGNS)}.  By doing so you will:
\begin{itemize}
  \item Count parameters
  \item Take derivatives of a loss
  \item Translate mathematics into implemented code
  \item Train your own set of word vectors and briefly analyze them
\end{itemize}
We \textbf{strongly recommend} doing this assignment in order.  Your answers in the written portion will make your implementation much easier, especially for the gradient computations.


\section{Understanding Word2Vec [30 pts]}

\noindent {\bf Q1: Parameters [3 pts]}  How many parameters are there in the SGNS model?  Write your answer in terms of $V$ (the vocabulary) and $d_e$, the embedding dimension.  [Hint: one parameter is \emph{a single real number}.]


\vspace{2em}
\noindent {\bf Q2: Sigmoid [7 pts]}  Sigmoid is the logistic curve $\sigma(x) = \frac{1}{1+e^{-x}}$.
\begin{itemize}
  \item What is the range of $\sigma(x)$? [1 pt]
  \item How is it used in the SGNS model? [2 pts]
  \item Compute $\frac{d\sigma}{dx}$; show your work.  [Hint: write your final answer in terms of $\sigma(x)$.] [5pts]
\end{itemize}

\vspace{2em}
\noindent {\bf Q3: Loss function's gradients [20 pts]}  In the slides for lecture 3, we saw that the total loss for one positive example and $k$ negative examples is given by:
$$ L_{CE} = -\log P(1 | w, c_+) - \sum_{i=1}^k \log P(0 | w, c_{-i})$$
In what follows, where $x$ is a vector and $f$ a function of $x$ and possibly more variables, we will define $\nabla_x f := \langle \frac{\partial f}{\partial x_1} , \frac{\partial f}{\partial x_2}, \dots , \frac{\partial f}{\partial x_n} \rangle$.
\begin{itemize}
  \item Rewrite this loss in terms of the parameter matrices $E$ and $C$ (i.e. replace the $P(\cdot)$s with the definition of the model). [2 pts]

        Use $w$ as the integer index of the target word, $c_+$ as the integer index of the positive context word, and $c_{-i}$ as the integer index of the $i$th negative sampled context word.
  \item Using the chain rule, compute $\frac{d}{dx} -\log\sigma(x)$.  [Hint: first, show that $\sigma(x) = \frac{e^x}{e^x+1}$.  Note: $\log$ here is the natural logarithm, i.e. logarithm with base $e$.] [4 pts]
  \item Show that $\nabla_x x \cdot y = y$ (where $x \cdot y$ is the dot product of two vectors). [2 pts]
  \item Compute (and show your work) $\nabla_{C_{c_+}} L_{CE}$. [4 pts]
  \item Compute (and show your work) $\nabla_{C_{c_{-i}}} L_{CE}$. [4 pts]
  \item Compute (and show your work) $\nabla_{E_w} L_{CE}$. [4 pts]
\end{itemize}


\section{Implementing Word2Vec [45 pts]}

Before getting started, a few notes on the implementation:
\begin{itemize}
  \item Always start with small data!  To test various components of the pipeline, you can use the toy files in \texttt{/dropbox/21-22/575k/data/}.
  \item All files referenced here are in \texttt{/dropbox/21-22/575k/hw2} on patas.
  \item The main training loop is at the bottom of \texttt{word2vec.py}.  You do not have to touch this, but can read it to see how the various components you implement are being used.
  \item This implementation uses a Vocabulary class, as implemented in HW1.  We will make a reference implementation available for use on Monday morning (after the late submission deadline); until then, you can use your own, by placing vocabulary.py in the same directory as your copy of the files for this assignment.
\end{itemize}

\vspace{2em}
\noindent {\bf Q1: Data generation [10 pts]} In \texttt{data.py}
\begin{itemize}
  \item Implement \texttt{get\_positive\_samples}, which generates positive examples from a list of tokens. [7 pts]
  \item Implement \texttt{negative\_samples}, which samples negative context words.  [Hint: \texttt{random.choices} is your friend.] [3 pts]
\end{itemize}

\vspace{2em}
\noindent {\bf Q2: Model computation [10 pts]} In \texttt{word2vec.py}
\begin{itemize}
  \item Implement \texttt{SGNS.forward}.  This represents one ``forward pass'' of the skip-gram with negative sampling model, i.e. this computes $P(1 | w, c)$.  Note: use \texttt{self.embeddings} and \texttt{self.context\_embeddings}, which are defined in \texttt{\_\_init\_\_}.
\end{itemize}

\vspace{2em}
\noindent {\bf Q3: Gradient computation [15 pts]} In \texttt{word2vec.py}, implement the following methods
\begin{itemize}
  \item \texttt{get\_positive\_context\_gradient}: this computes $\nabla_{C_{c_+}} L_{CE}$.
  \item \texttt{get\_negative\_context\_gradients}: this computes the list of $\nabla_{C_{c_{-i}}} L_{CE}$ for each negative context word $c_{-i}$.
  \item \texttt{get\_target\_word\_gradient}: this computes $\nabla_{E_w} L_{CE}$.
\end{itemize}

\vspace{2em}
\noindent {\bf Q4: Train word vectors [10 pts]} Run the main training loop by calling \texttt{word2vec.py} with the following command-line arguments (defined in \texttt{util.py}):
\begin{itemize}
  \item 15 epochs
  \item Save vectors to a file called vectors.tsv
  \item Embedding dimension: 15
  \item Learning rate: 0.2
  \item Minimum frequency: 5
  \item Number of negative samples: 15
\end{itemize}
After that, run \texttt{python analysis.py --save\_vectors vectors.tsv --save\_plot vectors.png}.  This will take your saved vectors and produce a plot with the vectors (after using PCA to reduce dimensionality to 2) of a select choice of words.  In your readme file, please include: 
\begin{itemize}
  \item The total run-time of your training loop.  This will be printed by the main script.
  \item The generated plot.
  \item Describe in 2-3 sentences any trends that you see in these embeddings.
\end{itemize}


\vspace{2em}
\noindent {\bf Testing your code} In the dropbox folder for this assignment, we have included a file \texttt{test\_all.py} with a few very simple unit tests for the methods that you need to implement.  You can verify that your code passes the tests by running \texttt{pytest} from your code's directory, with the course's conda environment activated.


%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\section*{Submission Instructions}

In your submission, include the following:
\begin{itemize}
  \item readme.(txt$\mid$pdf) that includes your answers to \S1 as well as Q4 of \S2. 
  \item \texttt{hw2.tar.gz} containing:
  \begin{itemize}
    \item run\_hw2.sh
    \item word2vec.py
    \item data.py
  \end{itemize}
\end{itemize}





\end{document}