Web scraping job: interviews with film industry professionals

We’ve successfully completed a project to source interviews with specific film industry professionals. Our task was to scrape the text of each interview and save it to a separate text file named according to a predefined convention. We collected a minimum of 10 interviews per person from a selected list of sources, following a systematic approach:

  1. Identified how to access the targeted data sources (websites, magazines, and newspapers), including Time Out, New York Times, Variety, Washington Post, Entertainment Weekly, Los Angeles Times, Hollywood Reporter, Interview, Filmmaker, Moviemaker, and ShortList.
  2. Queried these sources for each person on the list and checked whether each result contained direct quotes from the interviewee. Interviews meeting this criterion were scraped and stored accordingly.
  3. Supplemented our findings with top Google search hits for each individual, ensuring inclusion of interviews from sources not covered in the initial list. We repeated the scraping process for these additional sources until reaching a satisfactory number of interviews per person.
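
The filtering and storage in steps 2 and 3 can be sketched in Python roughly as follows. This is a minimal sketch under stated assumptions, not the code used on the project: the quote-detection heuristic (at least three quoted passages of 20 or more characters), the file naming pattern, and the names has_direct_quotes, slugify, interview_path, and store_interviews are all hypothetical. Fetching pages from each source is left out, since access methods differ from site to site.

```python
import re
from pathlib import Path

MIN_INTERVIEWS = 10  # per-person minimum stated in the project description

# Assumed heuristic: a passage of 20+ characters inside straight or curly double quotes.
QUOTE_RE = re.compile(r'["\u201c]([^"\u201c\u201d]{20,})["\u201d]')


def has_direct_quotes(text: str, min_quotes: int = 3) -> bool:
    """Treat a page as an interview if it contains several quoted passages."""
    return len(QUOTE_RE.findall(text)) >= min_quotes


def slugify(value: str) -> str:
    """Lowercase the value and replace runs of non-alphanumerics with underscores."""
    return re.sub(r"[^a-z0-9]+", "_", value.lower()).strip("_")


def interview_path(out_dir: Path, person: str, source: str, index: int) -> Path:
    """Hypothetical naming convention: <person>__<source>__<NN>.txt"""
    return out_dir / f"{slugify(person)}__{slugify(source)}__{index:02d}.txt"


def store_interviews(out_dir: Path, person: str, candidates) -> int:
    """Save qualifying (source, text) pairs as text files; return how many were saved."""
    out_dir.mkdir(parents=True, exist_ok=True)
    saved = 0
    for source, text in candidates:
        if not has_direct_quotes(text):
            continue  # skip pages without enough quoted speech
        saved += 1
        path = interview_path(out_dir, person, source, saved)
        path.write_text(text, encoding="utf-8")
    return saved
```

Comparing the count returned by store_interviews with MIN_INTERVIEWS shows whether a person still needs additional sources, which is the point at which step 3 applies.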

The project encompassed three lists: directors (380), producers (605), and actors/actresses (713). The successful candidate had expertise in web crawlers and text scraping, adapted to a variety of source materials, took a proactive approach, and showed creative problem-solving skills. If you’re interested and possess these qualifications, we eagerly await your application!

