Channel: RapidMiner Studio Forum topics
Viewing all 2122 articles
Browse latest View live

How do I load the data stored by a store operator outside RapidMiner?


Hey, 

I have a stored object that is around 10 GB in size. When I try to process this object, RapidMiner stops and reports that there isn't enough memory to continue the process. Is there a way I can load the data into a Python or Java object so that I can perform the operations I want on it?

 

Regards,

Naveen


My RapidMiner Studio 8.1 seems to have crashed - how can I solve it?


Hi,

Since yesterday, my RapidMiner Studio 8.1 installation seems to have crashed. If I connect only a ReadXLS operator to the output and run it, nothing appears in the results, whereas before it showed the data retrieved from the .xls file. If I run any of the VCFs that ran properly and almost instantaneously before yesterday, I now only see an elapsed-time counter that never ends...

 

The VCFs that I am running are simple ones with only statistical analysis, such as ANOVA and chi-square, applied to fewer than 300 examples. Before yesterday they ran instantaneously; now I only see the elapsed time in seconds counting up in the left corner of the window, and nothing happens from there, even if I leave it for a long time. So something happened.

 

RapidMiner started to malfunction after I installed the statistical extension and the tool extension to get the Tukey and chi-square operators; after I tested them, something happened that made RapidMiner crash. I have run a malware scan, and it is still the same. I have uninstalled and reinstalled RapidMiner 8.1 three times, and it is still the same. I uninstalled all extensions, and it is still the same. What I find strange is that "Java(TM) Platform SE binary" rises from 2000 MB to almost 4000 MB in the Task Manager. My computer has an Intel(R) Core(TM) i7-6700HQ CPU with 16.0 GB of RAM and a 64-bit Windows operating system, so it should work well and fast! I am lost. Can anyone give me a tip about the cause of my problem and how to solve it? I would be very grateful,

Jud

 

Uploading own Association Rules (not creating them in rapid miner)


I have an example list with text information which I want to classify according to content (not sentiment).

Now I have a second list with fixed rules, like a dictionary: e.g. if word A, B or C occurs, it is definitely category 1. So the list has two columns: the category, and the words that are 100% related to this category.

My question is: how can I upload this information to my RapidMiner model? I can create rules from an example set and apply them, but how do I add an Excel list as a set of rules?

Import a Word document to Rapidminer


On a project for a recent client I needed to apply some common Natural Language Processing (NLP) techniques to surveys they had gathered, but one of the requirements for the project was that the source document had to remain in Word's .docx format and couldn't be exported to .txt. RapidMiner was the tool of choice for this engagement since it is graphical in nature and has a very usable library for text analysis, but what it doesn't have is an operator that specifically imports .docx files.

 

Microsoft Word files are basically zip files that contain an XML representation of the actual document. It stands to reason that if you can unzip the wrapper and get to the XML inside, you have a good chance of being able to read the document and do whatever you need in terms of analysis. RapidMiner has an operator for executing custom Python scripts (if you download the Python extension), so I chose to start there and see if it could handle those tasks.

Using Python in RapidMiner

First we'll need to download the Python extension, which you can do by going to Extensions > Marketplace in the menu at the top of the page. It's one of the most popular downloads, so just go to "Top Downloads," select it from the list, and click "Install Packages" at the bottom of the window. You'll need to restart RapidMiner afterwards for the extension's operators to become available.

 

 

To use a custom Python script, search for the "Execute Python" operator and drag it onto the workflow. Double-click and you'll see the usual parameter editing box on the top right of the screen, which should contain a button labeled "Edit Text." This is where we'll enter the code.

 

The Code

I try not to reinvent the wheel when coding, so I Googled the problem to see if someone had tackled it before me and someone definitely had. The code I used is below:

 

 

If you want to download it straight from Etienne's blog, just follow this link:

http://etienned.github.io/posts/extract-text-from-word-docx-simply/
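
For readers who cannot follow the link, the idea described above (unzip the .docx, then walk the XML for text runs) can be sketched with the Python standard library alone. This is a minimal illustration and not necessarily the code from the linked post; the function name `docx_to_text` is an assumption, and the sketch ignores headers, footnotes and other parts of the archive.

```python
import zipfile
import xml.etree.ElementTree as ET

# Namespace used by WordprocessingML elements inside word/document.xml.
W_NS = "{http://schemas.openxmlformats.org/wordprocessingml/2006/main}"


def docx_to_text(path_or_file):
    """Return the body text of a .docx file, one blank line between paragraphs.

    Hypothetical helper for illustration: accepts a file path or a
    file-like object, since zipfile.ZipFile handles both.
    """
    with zipfile.ZipFile(path_or_file) as archive:
        xml_bytes = archive.read("word/document.xml")

    root = ET.fromstring(xml_bytes)
    paragraphs = []
    for para in root.iter(W_NS + "p"):
        # Each paragraph holds runs; the visible text lives in <w:t> nodes.
        pieces = [node.text for node in para.iter(W_NS + "t") if node.text]
        if pieces:
            paragraphs.append("".join(pieces))
    return "\n\n".join(paragraphs)
```

Inside an "Execute Python" operator, a function like this would be called with the path of the Word file and its result written to a text file or returned for the downstream operators.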

The initial workflow looked like this:

 

 

After using Etienne's code to unwrap the .docx file, it was easily readable by the "Read Document" operator. After that I transformed all words to lowercase, tokenized them, removed stop words, then converted the resulting word list to data and loaded it into a database for analysis. Simple.

N-grams do not sort in ascending/descending order


Hello,

 

I have an example set that was converted to a word list with the Process Documents from Data operator to count the term and document occurrences of my dataset. I also have a duplicate of this process that generates n-grams. The word list without n-grams can be sorted in ascending/descending order by clicking on an attribute column such as "term occurrences" or "document occurrences"; however, this does not work on the result that contains the n-grams. Does anyone know why the results with the n-grams cannot be sorted?

 

 

flood forecasting


Hello! I'm working on a multivariate time series with 7 independent variables (rainfall depth, mm/hr) and one dependent variable (streamflow, cumecs). I want to develop a flood forecasting model using past values of the streamflow time series, together with the joint time series of the observed present and past, as well as anticipated future, values of the rainfall time series. I want to forecast rainfall and then, from the forecasted rainfall, predict the streamflow.

Let $y$ be the variable (scalar or vector-valued) to be forecasted, and let $Y_t$ be the joint time series of the present and past values of $y$:

$Y_t = [y_t, y_{t-1}, \ldots, y_{t-n}]$

Let $u$ be the variable (scalar or vector-valued) that is in a causal relationship with $y$, and let $U_t$ be the joint time series of the observed present and past, as well as any anticipated future values (denoted by a hat) of $u$, such that

$U_t = [\hat{u}_{t+\alpha}, u_t, u_{t-1}, \ldots, u_{t-n}]$, and let

$Z_t = [Y_t, U_t]$, with $\alpha > 0$.

The lead-time forecast of the $y$ variable is $p(y_{t+\alpha} \mid Z_t)$. For $\alpha = 1$:

$y_{t+1} = [y_t, y_{t-1}, \ldots, y_{t-n}; \hat{u}_{t+1}, u_t, u_{t-1}, \ldots, u_{t-n}]$

 

I would like to use neural networks and SVM to carry out this task.
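
As a rough sketch of how the input vector defined above could be assembled before training a neural network or SVM: each training row holds the present and lagged streamflow values, the anticipated rainfall at the lead time, and the present and lagged rainfall values. The function name `build_lagged_rows` is hypothetical, rainfall is treated as a single scalar series for brevity (seven gauges would simply contribute seven such blocks), and the observed future rainfall stands in for the anticipated value during training, which is an assumption.

```python
def build_lagged_rows(y, u, n_lags, lead=1):
    """Build (features, targets) for forecasting y[t + lead].

    Each feature row is
        [y_t, ..., y_{t-n}, u_{t+lead}, u_t, ..., u_{t-n}]
    and the matching target is y_{t+lead}.
    """
    rows, targets = [], []
    # Start where n_lags past values exist; stop where t + lead is still in range.
    for t in range(n_lags, len(y) - lead):
        y_part = [y[t - k] for k in range(n_lags + 1)]
        # Assumption: the observed u[t + lead] replaces the anticipated value.
        u_part = [u[t + lead]] + [u[t - k] for k in range(n_lags + 1)]
        rows.append(y_part + u_part)
        targets.append(y[t + lead])
    return rows, targets


if __name__ == "__main__":
    flow = [0, 1, 2, 3, 4, 5]
    rain = [10, 11, 12, 13, 14, 15]
    X, y_next = build_lagged_rows(flow, rain, n_lags=2)
    print(X[0], y_next[0])
```

The resulting rows can be exported as an ordinary table, which is the form a windowing step would also produce.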

Concatenate all text from Twitter feed


I am collecting all of the text from a Twitter feed using the Twitter operator and storing this data in a MySQL database. I extract this information from the MySQL table to process it through a sentiment analysis engine; as part of the process I need to put all of the examples into an API format before submitting. I use the Generate Attribute operator to create the API secret and key required. However, when I join and then concatenate the data into a single file (using Select Attribute to drop the attributes I do not need), only the first tweet that was pulled from the DB is included in the API submission file; the rest of the tweets are missing.

 

What am I doing incorrectly? I have tried turning the data into documents and combining them, and I have tried creating collections and flattening them. I have really tried everything, but I am just unable to insert the roughly 1.5k tweets that I have pulled down into the file format.

 


curl -X POST --header 'Content-Type: application/json' --header 'Accept: application/hal+json' --header 'X-API-SECRET-KEY:XXXXXXXxXXXxXXXXXX' --header 'X-API-XXXXXXXXXXXXX' -d '{ "name": "account_name", "gender": 0, "content": { "content_handle": "account_handle", "content_source": 1,2 "content_date": "10/04/2018 22:00:46 PM SAST", "language_content": "need to insert the free text examples from twitter in here." }, "person_handle": "key_account_manager"

 

In the code above I need to insert the Twitter Text into the "language_content" field, but can only ever insert a single line of Twitter data. 

 

@RitualGym Hey guys, the gym here in Illovo was supposed to be opening in April, is it still happening? I’m keen to get stared. 💪🏼 @RitualGym Hey guys, the gym here in Illovo was supposed to be opening in April, is it still happening? I’m keen to get stared. 💪🏼
@OneDayOnlycoza House of Chards 🥦 https://t.co/ljZQD93t3i @OneDayOnlycoza House of Chards 🥦 https://t.co/ljZQD93t3i
@ThatDarnKitteh @Nick_Frost If this isn’t your handle by the end of the day I’m unfollowing 😂 @ThatDarnKitteh @Nick_Frost If this isn’t your handle by the end of the day I’m unfollowing 😂
@Nick_Frost Have you seen what happens when people try combine their names for their kids? 😣😖🤢🤮 @Nick_Frost Have you seen what happens when people try combine their names for their kids? 😣😖🤢🤮
This is the best thing on the internet right now. 😂 https://t.co/TchhxFUGqT This is the best thing on the internet right now. 😂 https://t.co/TchhxFUGqT
@OneDayOnlycoza UK size 7. 😉 https://t.co/DAIaK0V5oF @OneDayOnlycoza UK size 7. 😉 https://t.co/DAIaK0V5oF

Above is the text that I need to insert into the file. This text comes from a MySQL DB, so it has \r\n characters separating the fields (which I believe is causing the issue).

 

At my wits end, please help me see the wood for the trees. 
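
One possible way to approach this outside the operator chain, for example in an Execute Python step, is sketched below. It assumes the tweets are already available as a list of strings; it collapses every carriage return and line feed to a single space and lets `json.dumps` handle quoting and escaping. The function name `build_payload` is hypothetical, and only the `name`, `content` and `language_content` fields from the request above are reproduced.

```python
import json


def build_payload(tweets, account_name="account_name"):
    """Join all tweets into one string and wrap it in the request body."""
    # str.split() with no argument splits on any whitespace, including \r\n,
    # so each tweet is reduced to a single line.
    cleaned = [" ".join(tweet.split()) for tweet in tweets]
    text = " ".join(c for c in cleaned if c)
    payload = {
        "name": account_name,
        "content": {"language_content": text},
    }
    # ensure_ascii=False keeps emoji readable instead of \u escapes.
    return json.dumps(payload, ensure_ascii=False)


if __name__ == "__main__":
    print(build_payload(["first tweet\r\nsecond line", "another tweet"]))
```

Building the body with a JSON library rather than string concatenation also avoids broken requests when a tweet contains quotation marks.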

Finding most common words in text attribute


Hello all,

 

This is my first post on this forum, though I have been using RapidMiner for some time now, so hi to all of you! I hope you can help me out with a problem that I just can't seem to solve.

 

I want to get a list (like a top 10 or a top 20) of the most common words throughout a text attribute. I have already performed the basics (Nominal to Text, Process Documents, Tokenize, Filter Stopwords) and even developed some prediction models, but I am just not finding any operator that will show me the words that occur most commonly throughout the dataset (or better yet, the most common words per label). Can anyone help?

 

Thank you so much in advance. Regards, Rick
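
For comparison, the same count can be sketched in a few lines of Python. This is an illustration under stated assumptions rather than a RapidMiner feature: the function names are hypothetical, tokenization is a simple regular expression, and the stopword set has to be supplied by the caller.

```python
import re
from collections import Counter, defaultdict


def top_words(texts, k=10, stopwords=frozenset()):
    """Return the k most frequent tokens across all texts as (word, count) pairs."""
    counts = Counter()
    for text in texts:
        tokens = re.findall(r"[a-z']+", text.lower())
        counts.update(t for t in tokens if t not in stopwords)
    return counts.most_common(k)


def top_words_per_label(texts, labels, k=10, stopwords=frozenset()):
    """Return a dict mapping each label to its own top-k word list."""
    grouped = defaultdict(list)
    for text, label in zip(texts, labels):
        grouped[label].append(text)
    return {label: top_words(group, k, stopwords) for label, group in grouped.items()}


if __name__ == "__main__":
    docs = ["the cat sat", "the cat ran", "dogs bark"]
    print(top_words(docs, k=3, stopwords={"the"}))
    print(top_words_per_label(docs, ["pet", "pet", "noise"], k=1, stopwords={"the"}))
```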


Twitter tweets extract

<?xml version="1.0" encoding="UTF-8"?><process version="8.1.000"><operator activated="true" class="social_media:search_twitter" compatibility="8.1.000" expanded="true" height="68" name="Search Twitter" width="90" x="45" y="34"><parameter key="connection" value="Sentiment.analysis"/><parameter key="query" value="trump"/><parameter key="result_type" value="recent or popular"/><parameter key="limit" value="100"/><parameter key="language" value="english"/><parameter key="until" value="2018.04.11 21:57:56 +0530"/><parameter key="filter_by_geo_location" value="false"/><parameter key="radius_unit" value="miles"/></operator></process>

Hi, I am not able to extract tweets for any keywords. Can somebody help?

Returning website HTML code


I am a RapidMiner learner and need to be able to download the HTML code for any given website in order to determine whether any of the accompanying pages include some form of login, form submission or other workflow. The thought is to download the HTML code and then search for identifiers unique to such functionality. My question is:

 

a) Is this the best way to accomplish the task?

b) What is the best sequence of operators to do so?

 

Thank you in advance for your help, it is greatly appreciated. BK
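
To make the "search for identifiers" idea concrete, here is a minimal sketch using Python's standard `html.parser`. It counts `<form>` tags and flags password inputs as a sign of a login; the class and function names are hypothetical, and real pages that build their forms with JavaScript would not be caught by this approach.

```python
from html.parser import HTMLParser


class FormDetector(HTMLParser):
    """Collects simple indicators of form submission and login on a page."""

    def __init__(self):
        super().__init__()
        self.form_count = 0
        self.has_password_field = False

    def handle_starttag(self, tag, attrs):
        attributes = dict(attrs)
        if tag == "form":
            self.form_count += 1
        elif tag == "input" and (attributes.get("type") or "").lower() == "password":
            self.has_password_field = True


def inspect_html(html):
    """Return a small summary dict for one page of HTML source."""
    detector = FormDetector()
    detector.feed(html)
    return {"forms": detector.form_count, "login": detector.has_password_field}


if __name__ == "__main__":
    sample = '<form action="/login"><input type="password" name="pw"/></form>'
    print(inspect_html(sample))
```

The HTML string itself could come from `urllib.request.urlopen(url).read().decode()` or from whatever operator retrieves the page.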

weka operator does not appear when it has installed weka extension

Please help me. I have installed the latest version of RapidMiner, version 8, and I have also installed the Weka extension, but I cannot find the Weka operators. How do I make them appear?

Why does Keras LSTM expect 3 dimensions instead of the offered samples, time steps, and features?


I want to classify a time sequence of OHLC data using LSTM.

This exampleset is converted to samples, time steps, and features to feed the LSTM using a reshape layer.

When I run this model I get an error message as shown in the following picture.

What am I doing wrong?

 

Screen Shot 2018-04-13 at 07.37.01.png

My model is listed below.

My input example set consists of 2000 examples of OHLC data.

I reshape this data to 100 samples of 20 timesteps of 4 (OHLC) features.

My label is PhasesNum. It contains 3 classes indicated with the real values 10, 100, and 1000.

<?xml version="1.0" encoding="UTF-8"?><process version="8.1.001"><context><input/><output/><macros/></context><operator activated="true" class="process" compatibility="8.1.001" expanded="true" name="Process"><parameter key="logverbosity" value="init"/><parameter key="random_seed" value="2001"/><parameter key="send_mail" value="never"/><parameter key="notification_email" value=""/><parameter key="process_duration_for_mail" value="30"/><parameter key="encoding" value="SYSTEM"/><process expanded="true"><operator activated="true" class="retrieve" compatibility="8.1.001" expanded="true" height="68" name="Retrieve Data" width="90" x="45" y="34"><parameter key="repository_entry" value="../Data/inputdata"/></operator><operator activated="true" class="select_attributes" compatibility="8.1.001" expanded="true" height="82" name="Select Attributes" width="90" x="179" y="34"><parameter key="attribute_filter_type" value="subset"/><parameter key="attribute" value=""/><parameter key="attributes" value="Close|High|Low|Open|PhasesNum"/><parameter key="use_except_expression" value="false"/><parameter key="value_type" value="attribute_value"/><parameter key="use_value_type_exception" value="false"/><parameter key="except_value_type" value="time"/><parameter key="block_type" value="attribute_block"/><parameter key="use_block_type_exception" value="false"/><parameter key="except_block_type" value="value_matrix_row_start"/><parameter key="invert_selection" value="false"/><parameter key="include_special_attributes" value="false"/></operator><operator activated="true" class="numerical_to_real" compatibility="8.1.001" expanded="true" height="82" name="Numerical to Real" width="90" x="313" y="34"><parameter key="attribute_filter_type" value="subset"/><parameter key="attribute" value=""/><parameter key="attributes" value="PhasesNum"/><parameter key="use_except_expression" value="false"/><parameter key="value_type" value="numeric"/><parameter key="use_value_type_exception" 
value="false"/><parameter key="except_value_type" value="real"/><parameter key="block_type" value="value_series"/><parameter key="use_block_type_exception" value="false"/><parameter key="except_block_type" value="value_series_end"/><parameter key="invert_selection" value="false"/><parameter key="include_special_attributes" value="true"/></operator><operator activated="true" class="set_role" compatibility="8.1.001" expanded="true" height="82" name="Set Role" width="90" x="447" y="34"><parameter key="attribute_name" value="PhasesNum"/><parameter key="target_role" value="label"/><list key="set_additional_roles"><parameter key="Close" value="regular"/><parameter key="High" value="regular"/><parameter key="Low" value="regular"/><parameter key="Open" value="regular"/></list></operator><operator activated="true" class="filter_example_range" compatibility="8.1.001" expanded="true" height="82" name="Filter Example Range" width="90" x="45" y="136"><parameter key="first_example" value="1"/><parameter key="last_example" value="2000"/><parameter key="invert_filter" value="false"/></operator><operator activated="true" class="normalize" compatibility="8.1.001" expanded="true" height="103" name="Normalize" width="90" x="179" y="136"><parameter key="return_preprocessing_model" value="false"/><parameter key="create_view" value="false"/><parameter key="attribute_filter_type" value="subset"/><parameter key="attribute" value=""/><parameter key="attributes" value="Close|High|Low|Open"/><parameter key="use_except_expression" value="false"/><parameter key="value_type" value="numeric"/><parameter key="use_value_type_exception" value="false"/><parameter key="except_value_type" value="real"/><parameter key="block_type" value="value_series"/><parameter key="use_block_type_exception" value="false"/><parameter key="except_block_type" value="value_series_end"/><parameter key="invert_selection" value="false"/><parameter key="include_special_attributes" value="false"/><parameter key="method" 
value="Z-transformation"/><parameter key="min" value="0.0"/><parameter key="max" value="1.0"/><parameter key="allow_negative_values" value="false"/></operator><operator activated="true" class="multiply" compatibility="8.1.001" expanded="true" height="103" name="Multiply" width="90" x="313" y="136"/><operator activated="true" class="keras:sequential" compatibility="1.0.003" expanded="true" height="166" name="Keras Model" width="90" x="246" y="289"><parameter key="input shape" value="(2000,4)"/><parameter key="loss" value="categorical_crossentropy"/><parameter key="optimizer" value="Adam"/><parameter key="learning rate" value="0.001"/><parameter key="momentum" value="0.0"/><parameter key="rho" value="0.9"/><parameter key="beta 1" value="0.999"/><parameter key="beta 2" value="0.999"/><parameter key="epsilon" value="1.0E-8"/><parameter key="decay" value="0.0"/><parameter key="schedule decay" value="0.004"/><parameter key="Nesterov" value="false"/><parameter key="use metric" value="false"/><enumeration key="metric"/><parameter key="epochs" value="128"/><parameter key="batch size" value="100"/><enumeration key="callbacks"><parameter key="callbacks" value="TensorBoard(log_dir='./logs', histogram_freq=0, write_graph=True, write_images=False, embeddings_freq=0, embeddings_layer_names=None, embeddings_metadata=None)"/></enumeration><parameter key="verbose" value="1"/><parameter key="validation split" value="0.0"/><parameter key="shuffle" value="false"/><parameter key="fix seed" value="false"/><parameter key="random seed" value="0"/><process expanded="true"><operator activated="true" class="keras:core_layer" compatibility="1.0.003" expanded="true" height="82" name="Add Core Layer" width="90" x="112" y="34"><parameter key="layer_type" value="Reshape"/><parameter key="no_units" value="1"/><parameter key="activation_function" value="None"/><parameter key="use_bias" value="true"/><parameter key="kernel_initializer" value="glorot_uniform(seed=None)"/><parameter 
key="bias_initializer" value="Zeros()"/><parameter key="kernel_regularizer" value="None"/><parameter key="bias_regularizer" value="None"/><parameter key="activity_regularizer" value="None"/><parameter key="kernel_constraint" value="None"/><parameter key="bias_constraint" value="None"/><parameter key="rate" value="0.1"/><parameter key="noise_shape" value="None"/><parameter key="seed" value="None"/><parameter key="target_shape" value="(100,20,4)"/><parameter key="dims" value="1.1"/><parameter key="repetition_factor" value="1"/><parameter key="function" value="None"/><parameter key="l1" value="0.0"/><parameter key="l2" value="0.0"/><parameter key="mask_value" value="0.0"/></operator><operator activated="true" class="keras:recurrent_layer" compatibility="1.0.003" expanded="true" height="82" name="Add Recurrent Layer" width="90" x="313" y="34"><parameter key="layer_type" value="LSTM"/><parameter key="no_units" value="3"/><parameter key="activation" value="softmax"/><parameter key="recurrent_activation" value="sigmoid"/><parameter key="use_bias" value="true"/><parameter key="kernel_initializer" value="glorot_uniform(seed=None)"/><parameter key="recurrent_initializer" value="glorot_uniform(seed=None)"/><parameter key="bias_initializer" value="Zeros()"/><parameter key="unit_forget_bias" value="true"/><parameter key="kernel_regularizer" value="None"/><parameter key="recurrent_regularizer" value="None"/><parameter key="bias_regularizer" value="None"/><parameter key="activity_regularizer" value="None"/><parameter key="kernel_constraint" value="None"/><parameter key="recurrent_constraint" value="None"/><parameter key="bias_constraint" value="None"/><parameter key="dropout" value="0.0"/><parameter key="recurrent_dropout" value="0.0"/><parameter key="stateful" value="true"/><parameter key="unroll" value="false"/><parameter key="implementation" value="0"/></operator><connect from_op="Add Core Layer" from_port="layers 1" to_op="Add Recurrent Layer" to_port="layers"/><connect 
from_op="Add Recurrent Layer" from_port="layers 1" to_port="layers 1"/><portSpacing port="sink_layers 1" spacing="0"/><portSpacing port="sink_layers 2" spacing="0"/></process></operator><operator activated="true" class="keras:apply" compatibility="1.0.003" expanded="true" height="82" name="Apply Keras Model" width="90" x="447" y="289"><parameter key="batch_size" value="100"/><parameter key="verbose" value="0"/></operator><connect from_op="Retrieve Data" from_port="output" to_op="Select Attributes" to_port="example set input"/><connect from_op="Select Attributes" from_port="example set output" to_op="Numerical to Real" to_port="example set input"/><connect from_op="Numerical to Real" from_port="example set output" to_op="Set Role" to_port="example set input"/><connect from_op="Set Role" from_port="example set output" to_op="Filter Example Range" to_port="example set input"/><connect from_op="Filter Example Range" from_port="example set output" to_op="Normalize" to_port="example set input"/><connect from_op="Normalize" from_port="example set output" to_op="Multiply" to_port="input"/><connect from_op="Multiply" from_port="output 1" to_op="Apply Keras Model" to_port="unlabelled data"/><connect from_op="Multiply" from_port="output 2" to_op="Keras Model" to_port="training set"/><connect from_op="Keras Model" from_port="model" to_op="Apply Keras Model" to_port="model"/><connect from_op="Apply Keras Model" from_port="labelled data" to_port="result 1"/><portSpacing port="source_input 1" spacing="0"/><portSpacing port="sink_result 1" spacing="0"/><portSpacing port="sink_result 2" spacing="0"/></process></operator></process>
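
For reference, Keras recurrent layers take 3-dimensional input of the form (samples, time steps, features), and both the model's input shape and a Reshape layer's target shape describe a single sample, without the sample count. Under that reading, one likely cause here is that the process above declares an input shape of (2000,4) and a target shape of (100,20,4), where the per-sample shape would be (20,4). The pure-Python sketch below only illustrates the regrouping of 2000 rows into 100 windows; the helper name `to_windows` is hypothetical, and it assumes non-overlapping windows with one label per window.

```python
def to_windows(rows, timesteps):
    """Group consecutive rows into non-overlapping windows of `timesteps` rows."""
    # Drop trailing rows that do not fill a complete window.
    usable = len(rows) - len(rows) % timesteps
    return [rows[i:i + timesteps] for i in range(0, usable, timesteps)]


# 2000 dummy OHLC rows with 4 features each.
rows = [[float(i), float(i), float(i), float(i)] for i in range(2000)]
windows = to_windows(rows, 20)

# (samples, time steps, features)
print(len(windows), len(windows[0]), len(windows[0][0]))  # 100 20 4
```

With data arranged this way, each of the 100 samples needs exactly one PhasesNum label, not one label per original row.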

 

 

Fuzzy Match of Strings


I'm trying to work through a problem in Rapidminer. I'm trying to find approximate matches of strings of one dataset in another dataset.  Is there a way we can perform a fuzzy match on a string in rapidminer? Any help will be appreciated! 
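
As an illustration of what an approximate match between two lists of strings can look like, here is a small sketch with Python's standard `difflib`. The function name `fuzzy_lookup` and the cutoff of 0.8 are assumptions; other similarity measures such as Levenshtein distance would follow the same pattern.

```python
import difflib


def fuzzy_lookup(names, candidates, cutoff=0.8):
    """Map each name to its closest candidate, or None if nothing is close enough."""
    # Compare case-insensitively but return the candidate in its original form.
    lowered = {c.lower(): c for c in candidates}
    result = {}
    for name in names:
        hits = difflib.get_close_matches(name.lower(), list(lowered), n=1, cutoff=cutoff)
        result[name] = lowered[hits[0]] if hits else None
    return result


if __name__ == "__main__":
    print(fuzzy_lookup(["Rapid Miner", "zzz"], ["RapidMiner", "Weka"]))
```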

How to convert predicted value back to its original value after normalisation?


Hi everyone, I'm having trouble transforming the predicted data back to its original value after normalisation. Is there any method of doing so? Thank you very much.
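
In arithmetic terms, a Z-transformation maps x to z = (x - mean) / std, so the inverse is x = z * std + mean, using the same mean and standard deviation that were used for normalising. A minimal sketch follows; the function names are hypothetical, and the population standard deviation is used here only as an assumption, so whichever statistics the normalisation actually used must be reused for the inverse.

```python
import statistics


def fit_z(values):
    """Return the (mean, std) pair used for the transformation."""
    return statistics.mean(values), statistics.pstdev(values)


def z_transform(x, mean, std):
    """Normalise a raw value."""
    return (x - mean) / std


def inverse_z_transform(z, mean, std):
    """Map a normalised (e.g. predicted) value back to the original scale."""
    return z * std + mean


if __name__ == "__main__":
    original = [2.0, 4.0, 6.0]
    mean, std = fit_z(original)
    z = z_transform(6.0, mean, std)
    print(inverse_z_transform(z, mean, std))
```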

Anomaly Detection Extension Generate ROC Operator


Hi all;

I'm a newbie at RapidMiner. I'm using the Anomaly Detection extension, but I cannot use the Generate ROC operator, and I don't know what I am doing wrong. I couldn't find any documentation or video. Is there anyone who can help me with this operator?

Thanks.


Use a hardcoded exampleset on ApplyModel in java


I have trained the model via the RapidMiner GUI and now I am using the .rpm file in Java. The problem is that I want to simulate a data example in Java via the ApplyModel operator. How do I do that?

Decision tree


Hello everyone:

The problem that brings me here today should not be very complicated, but I can not think of any way to solve it.

The question is this: a decision tree returns a series of information that would be useful for me to know. First, it would be useful to know which path is followed to make a new classification; second, I would like to know the class distribution in that rule and how many of the total examples fall into it. It is possible to see this by placing the cursor on the final leaf of the tree in the diagram, but how could I get all this automatically when I classify a new example?

Decision Tree with R


Hello,

I am trying to build a decision tree using R and I already have the process. What I need to do is import the process (rpm file) into RapidMiner and load it with the xlsx and csv files. I have done that, and when running it I get the following error message:  File :  memory buffered file

I tried to google it but I can't get it sorted.  Does anyone know how to fix it?

Thanks 

Loading Financial Data into Rapidminer


Dear Rapidminer Community, 

 

I went through some posts in the forum looking for a convenient way to load financial data (mainly stocks and indices) but I think there are only workarounds which work in some cases but nothing in general? The operators Yahoo Stock Data, Yahoo Historical Stock Data don't work? Or have I configured them wrongly?  Also the workarounds from Scott and Thomas only work partly?

 

As most of these posts are almost a year old, I wanted to ask if there is already something new. Is there an operator that loads financial data, or are there any new (more convenient) workarounds to load such data into RapidMiner?

 

Best regards

Felix

 

https://community.rapidminer.com/t5/RapidMiner-Studio-Knowledge-Base/Real-Time-Financial-Data-via-Alpha-Venture-API-alternative-to/ta-p/41119

 

https://community.rapidminer.com/t5/RapidMiner-Studio-Forum/Downloading-historical-financial-data-since-change-of-Yahoo-API/m-p/39145#M26819

Convert IOObjectCollection to Document


Can anyone help me with how I should tackle the error below?

"Wrong input of type 'IOObjectCollection' at port 'Fuzzy Matching: Document'." I'm trying to perform a fuzzy match and the port requires a document input.
