Mining-Away/jupyter-notes/Panda Bamboo.ipynb

{
 "cells": [
  {
   "attachments": {},
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Pandas Crash Course\n",
    "As usual, notes on the pandas features used in this notebook.\n",
    "\n",
    "Docs:\n",
    "- https://pandas.pydata.org/docs/getting_started/index.html#getting-started"
   ]
  },
  {
   "attachments": {},
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Cleaning Game/Score/Rating Dataset\n",
    "Problem found: values in the `GameName` column have the word `Review` (and any text after it) attached."
   ]
  },
  {
   "attachments": {},
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Game Datasets"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "import pandas as pd\n",
    "from tkinter.filedialog import askopenfilename\n",
    "\n",
    "# Pick the CSV through a file dialog, then load it\n",
    "filename = askopenfilename()\n",
    "df1 = pd.read_csv(filename)"
   ]
  },
  {
   "attachments": {},
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "\n",
    "### Cleaning: Removing the word `Review` and anything after it"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Preview the raw names before cleaning (first 3 rows)\n",
    "unclean = df1\n",
    "print(unclean[['GameName']].head(3))"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Cleaning: keep only the text before the word 'Review'\n",
    "raw_names = df1['GameName'].to_list()\n",
    "clean_names = [name.split('Review')[0] for name in raw_names]\n",
    "\n",
    "# Store the cleaned names back in df1\n",
    "df1['GameName'] = clean_names\n",
    "\n",
    "# Copy of df1 with the cleaned column stored as 'Name'\n",
    "clean = df1.drop(columns=['GameName'])\n",
    "clean['Name'] = clean_names\n",
    "\n",
    "# Preview the cleaned names (first 3 rows)\n",
    "print(clean[['Name']].head(3))"
   ]
  },
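  {
   "attachments": {},
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The same cleanup can be written with pandas' vectorized string methods instead of a Python loop. A minimal sketch, assuming a small made-up frame: the `demo` rows are hypothetical and not taken from the dataset, and the extra `str.strip()` (which drops the space left in front of `Review`) is an addition the loop above does not do.\n",
    "\n",
    "```python\n",
    "import pandas as pd\n",
    "\n",
    "# Hypothetical sample rows standing in for the real CSV\n",
    "demo = pd.DataFrame({'GameName': ['Halo 3 Review', 'Portal 2 Review - PC', 'Tetris']})\n",
    "\n",
    "# Split on the word 'Review', keep the part before it,\n",
    "# then strip the whitespace left at the end\n",
    "demo['Name'] = demo['GameName'].str.split('Review', n=1).str[0].str.strip()\n",
    "demo = demo.drop(columns=['GameName'])\n",
    "\n",
    "print(demo['Name'].tolist())\n",
    "```\n",
    "\n",
    "`.str[0]` keeps the text before the first `Review`, the same piece the loop keeps with `[0]`. Names without the word are left unchanged."
   ]
  },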
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# CSV output: write the cleaned frame\n",
    "clean.to_csv('cleaned_games.csv')"
   ]
  },
  {
   "attachments": {},
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Integrating Game Datasets together"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "\n",
    "filename = askopenfilename()\n",
    "df2 = pd.read_csv(filename)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# pd.merge(df1, df2, how='inner', sort=True) did not work here\n",
    "# join() lines rows up by index label, not by game name;\n",
    "# overlapping column names from df2 get the suffix 'merged'\n",
    "merged = df2.join(df1, lsuffix='merged')\n",
    "print(merged.head(10))"
   ]
  },
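  {
   "attachments": {},
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Because `join` above pairs rows by index, it only lines games up correctly if both files list them in the same order. A minimal sketch of matching by name instead, assuming both frames carry a `Name` column (the columns of the second file are not shown here, so this is an assumption). The `left` and `right` frames and their `Score` and `Rating` columns are hypothetical.\n",
    "\n",
    "```python\n",
    "import pandas as pd\n",
    "\n",
    "# Hypothetical frames; every column other than 'Name' is made up\n",
    "left = pd.DataFrame({'Name': ['Halo 3', 'Portal 2', 'Tetris'], 'Score': [9.5, 9.0, 8.0]})\n",
    "right = pd.DataFrame({'Name': ['Tetris', 'Halo 3', 'Doom'], 'Rating': ['E', 'M', 'M']})\n",
    "\n",
    "# Inner merge keeps only the games present in both frames, matched by name\n",
    "by_name = pd.merge(left, right, on='Name', how='inner')\n",
    "\n",
    "print(by_name)\n",
    "```\n",
    "\n",
    "Passing `on='Name'` names the key column explicitly, so rows are matched on the game name rather than on row position."
   ]
  },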
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "merged.to_csv('merged_games.csv')"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# The cleaned names live in clean['Name']; df1 has no 'Name' column\n",
    "print(clean[['Name']].head(100))\n",
    "print(df2[['Name']].head(100))"
   ]
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 3",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.10.9"
  },
  "orig_nbformat": 4
 },
 "nbformat": 4,
 "nbformat_minor": 2
}